Reflection #3 – [1/23] – [Deepika Kishore Mulchandani]

[1]. Cheng, J., Danescu-Niculescu-Mizil, C. and Leskovec, J. 2015. Antisocial behavior in online discussion communities.

Summary:

In this paper, Justin Cheng et al. study antisocial behavior in three online discussion communities: CNN.com, a general news site, Breitbart.com, a political news site, and IGN.com, a computer gaming site. The authors first characterize antisocial behavior by comparing the activity of users banned from the community (FBUs) with that of users never banned (NBUs). They then perform a longitudinal analysis, i.e., a study of users’ behavior over their active tenure in the community. They also consider the readability of posts and a user’s post deletion rate as features to train their model. After developing the model, they predict which users will be banned in the future; the model needs to observe only 5 to 10 of a user’s posts to accurately predict that the user will be banned. They present two hypotheses and try to answer the following research questions: Do users become antisocial later? Does a community’s reaction affect their behavior? Can antisocial users be identified early?

Reflection:

Antisocial behavior is a worry no matter whether it occurs online or in person. That said, this research is an indication of the progress being made to alleviate the ill effects of such behavior. The authors mention four features that help in recognizing antisocial users in a community. Of these, the most salient in this study are the ‘moderator features’. Moderators delete posts and ban users in a community, following a particular set of guidelines for deciding which posts they consider antisocial. This raises a few questions: do moderators delete posts based only on the language of the post, or do factors like the number of down votes and whether the post was reported also affect the decision? The point of this question is to figure out which signals they weigh more heavily. It also opens up further questions, such as: do moderator demographics (e.g., age) play a role in how offensive they find a post to be? The authors mention that there were more swear words in the posts written by FBUs; moderators who are more tolerant of swear words may not delete the posts of potential FBUs.

I admire the authors’ efforts in studying the entire history of a particular user to identify patterns in user behavior over time. I also like the other features used by the authors. The activity features (e.g., time spent in a thread) are not that intuitive and yet end up playing a significant role. The authors made an important observation that a model trained on one community performs relatively well on the other communities too. They also report that FBUs survived about 42 days on CNN, 82 days on Breitbart, and 103 days on IGN. This could reflect the category of the online discussion community. One could expect a community that hosts only political news to be more tolerant of antisocial behavior, by virtue of the fact that opposition is inherent in the news; most posts in such a community could have down votes and replies to comments, which are both significant features of the model as well as factors that influence a moderator’s decision. Thus, the question arises: does the category of the online discussion community affect the banning of an antisocial user? I also agree with the authors that it is difficult to track users who might instigate arguments while maintaining NBU-like behavior. This could be a crucial research question to look into.


Reflection #3 – [1/25] – Aparna Gupta

Paper: Antisocial Behavior in Online Discussion Communities

Summary:

This paper talks about characterizing anti-social behavior, which includes trolling, flaming, and griefing, in online communities. For this study the authors focus on CNN.com, Breitbart.com, and IGN.com. They present a retrospective longitudinal analysis to quantify anti-social behaviour throughout an individual user’s tenure in a community. They divide users into two groups – Future-Banned Users (FBUs) and Never-Banned Users (NBUs) – and compare the language and frequency of their posts.

Reflection:

The paper ‘Antisocial Behavior in Online Discussion Communities’ focuses on detecting anti-social users at an early stage by evaluating their posts. The results are based on four kinds of features – post content, user activity, community response, and the actions of community moderators. In my opinion, “what leads an ordinary user to exhibit trolling behaviour” should also have been considered as a contributing factor.

For example, in communities or forums where political discussions are held, comments exhibiting strong opinions are bound to be seen. I therefore feel that “what is considered anti-social depends on the particular community and the topic around which the respective community is formed” [1].

What struck me was that there can be scenarios where the discussion context determines the trolling behaviour of an individual. However, the ‘Readability Index’ parameters that the authors considered look promising.
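Since the paper leans on standard readability metrics, here is a minimal sketch (my own, not the authors’ code) of one such measure, the Automated Readability Index; the example sentences are invented and the tokenization is deliberately simple.

import re

def automated_readability_index(text: str) -> float:
    # ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43; higher = harder to read
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

print(automated_readability_index("This is a short, simple post."))
print(automated_readability_index("Notwithstanding the aforementioned considerations, the argumentation remains fundamentally unconvincing."))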

In the Data Preparation stage, to measure “undesired behaviour” the authors state that “at the user-level, bans are similarly strong indicators of antisocial behaviour”. But how does a user getting banned from an online community determine antisocial behaviour? For example, a user could get banned from Stack Overflow because all of the questions they posted were out of scope.

The paper majorly revolves around the two hypotheses the authors state to explain an increase in the post deletion rate – H1: a decrease in posting quality, and H2: an increase in community bias. To test H1 and H2, the authors conduct two studies: 1. Do writers write worse over time? This study is somewhat agreeable, since one can analyse how a user’s writing changes over time. 2. Does community tolerance change over time? According to the results presented by the authors, this indeed looks true. However, in my opinion it also depends on how opinions and comments are perceived by other members of the community.

On a closing note, the paper presents some interesting findings about how to identify trolls and ban them at a very early stage.

 

[1] https://www.dailydot.com/debug/algorithm-finds-internet-trolls/


Reflection #3 -[01/25]- [Vartan Kesiz-Abnousi]

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary

The authors of the article aim to study antisocial behavior in online discussion communities, as the title suggests. They use data from three online discussion-based communities: Breitbart.com, CNN.com, and IGN.com. Specifically, they use the comments on articles that are posted on these websites. The data covers a period of over 18 months and includes 1.7 million users who contributed nearly 40 million posts. The authors characterize antisocial behavior by comparing the activity of users who are later banned from a community, namely Future-Banned Users (FBUs), with that of users who were never banned, or Never-Banned Users (NBUs). They find significant differences between the two groups. For instance, FBUs tend to write less similarly to other users, and their posts are harder to understand according to standard readability metrics. In addition, they are more likely to use language that may stir further conflict. FBUs also make posts that tend to be concentrated in individual threads rather than spread out across several, and they receive more replies than average users. In the longitudinal analysis, the authors find that the behavior of an FBU worsens over their active tenure in a community. Moreover, they show that not only do FBUs enter a community writing worse posts than NBUs, but the quality of their posts also worsens more over time. They also find that the distribution of users’ post deletion rates is bimodal: some FBUs have high post deletion rates, while others have relatively low deletion rates. Finally, they demonstrate that a user’s posting behavior can be used to predict who will be banned in the future with over 80% AUC.
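To make the prediction setup concrete, the following is a minimal, hypothetical sketch of that kind of task in Python: a handful of per-user features computed from early posts, a logistic regression classifier, and AUC as the evaluation metric. The features and data below are synthetic stand-ins, not the authors’ actual pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users = 1000
# Illustrative per-user features: e.g., mean readability, fraction of deleted posts,
# replies received per post, posts per thread (all synthetic here)
X = rng.normal(size=(n_users, 4))
# Synthetic labels: 1 = later banned (FBU), 0 = never banned (NBU)
y = (X[:, 1] + 0.5 * X[:, 3] + rng.normal(scale=1.0, size=n_users) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))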

Reflections

User-generated content has become important for the success of websites. As a result, maintaining a civil environment is important. Anti-social behavior includes trolling, bullying, and harassment. Therefore, platforms implement mechanisms designed to discourage antisocial behavior, such as moderation, up and down voting, reporting posts, mute functionality and blocking users’ ability to post.

The design of the platform might render the results non-generalizable, or at least platform-specific. It would be interesting to see whether the results hold for other platforms. There are discussion platforms where moderators have the option of issuing a temporary ban; perhaps this could work as a mechanism to “rehabilitate” users. For instance, the authors find two groups of users: some FBUs have high post deletion rates, while others have relatively low deletion rates. It should be noted that the authors excluded users who were banned multiple times so as not to confound the effects of temporary bans with behavior change.

In addition, it should be stressed that these specific discussion boards have an idiosyncrasy. The primary function of these websites is to be a news network, not a discussion board. This is important because the level of scrutiny is different on such platforms. For instance, they might choose to ban opposing views expressed in inflammatory language more frequently, to support their editors or authors. The authors write that “In contrast, we find that post deletions are a highly precise indicator of undesirable behavior, as only community moderators can delete posts. Moderators generally act in accordance with a community’s comment policy, which typically covers disrespectfulness, discrimination, insults, profanity, or spam”. The authors do not provide evidence to support this position. This does not necessarily mean they are wrong, since their criticism of other methods is valid.

However, the authors propose a method that explores this problem by measuring text quality. They do this by sending a sample of posts to Amazon Mechanical Turk for labeling, and then train a classification model on this sample to generalize the text-quality scores to the rest of the data.

They find some interesting results. The deletion rate increases over time for FBUs but stays constant for NBUs. In addition, they find that text quality decreases for both groups. This could be attributed either to a decrease in posting quality, which would explain the deletions, or to community bias. Interestingly enough, the authors find evidence that supports both hypotheses. For the community bias hypothesis, the authors use propensity score matching and find that, for the same text quality, later posts (the last 10% of a user’s tenure) are more likely to be deleted than early posts (the first 10%) for FBUs but not for NBUs. They also find that excessive censorship causes users to write worse.
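As a rough illustration of the matching idea (my own sketch under simplified assumptions, not the paper’s implementation), propensity score matching here amounts to matching late posts to early posts of similar text quality and then comparing deletion rates on the matched sample:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 2000
text_quality = rng.uniform(0, 1, n)          # covariate to control for
late = rng.integers(0, 2, n)                 # "treatment": post made late in the user's tenure
# Synthetic outcome: deletion depends on both quality and (by construction) lateness
deleted = (rng.uniform(0, 1, n) < 0.1 + 0.2 * late - 0.15 * text_quality).astype(int)

# 1. Estimate the propensity of being a late post given text quality
tq = text_quality.reshape(-1, 1)
ps = LogisticRegression().fit(tq, late).predict_proba(tq)[:, 1]

# 2. Match each late post to the early post with the nearest propensity score
late_idx, early_idx = np.where(late == 1)[0], np.where(late == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[early_idx].reshape(-1, 1))
_, matches = nn.kneighbors(ps[late_idx].reshape(-1, 1))
matched_early = early_idx[matches.ravel()]

# 3. Compare deletion rates on the matched sample
print("Deletion rate, late posts:   ", deleted[late_idx].mean())
print("Deletion rate, matched early:", deleted[matched_early].mean())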

Questions

  1. How would a mechanism of temporary bans affect the discussion community?
  2. The primary purpose of these websites is to deliver news to their target audience. Are the results the same for websites whose primary purpose is to provide a discussion platform, such as discussion boards?
  3. Propensity score matching is biased if there are unobserved variables, which is usually the case in non-experimental, observational studies. Nearest-neighbor matching with fixed effects to control for contemporaneous trends, or matching users by time in addition to text quality, might be a better strategy.


Reflection #3 -[01/25]- [Jamal A. Khan]

This paper studies the anti-social behavior of people on three different platforms, which I believe is quite relevant given the ever-increasing consumption of social media. First off, in my opinion what the authors have studied is not anti-social behavior, but rather negative, unpopular and/or inflammatory behavior (which also might not be the case, as I’ll highlight a bit later). Nonetheless, the findings are interesting.

Referring to Table 1 in the paper, I’m surprised to see so few posts deleted. I was expecting something in the vicinity of 9-10%, but that might be just me though! Maybe I have a tendency to run into more trolls online 🙁 . What are other people’s experiences – do these numbers reflect the number of trolls you find online?

Now, a fundamental problem that I have with the paper is the use of moderators’ actions of “banning or not banning” as the ground truth. This approach fails to address a few things. First, what of the moderators’ biases? One moderator might consider certain comments on a certain topic acceptable while another might not, and this varies based on how the person in question feels about the topic at hand. For example, I very rarely talk or care about politics, hence most comments seem innocuous to me, even ones that I see other people react very strongly to. That being the case, if I were a moderator who saw some politically charged comments, I would most probably ignore them.

Second, unpopular opinions expressed by people most certainly don’t count as anti-social behavior or troll remarks, or even as attempts to derail the discussion or cause inflammation. For example, one such topic that could pop up on IGN is the strict gender binary enforced by most video game studios, which will, in my experience, get down-voted pretty quickly because people are resistant to such changes. So this raises a few questions: what metric is used to deal with unpopular posts? Are down-votes used by the moderators as a signal to remove posts?

Third, varying use of English based on demographics would throw off the language similarity between the posts of FBUs and NBUs by a fair margin, and the authors don’t seem to have catered for it. The paper relies quite heavily on this metric for making a lot of the observations. So, if we were conducting a follow-up study, how would we go about taking cultural differences in the use of English into account? Do we even have to, i.e., will demographically more diverse platforms automatically have a normalizing effect?

Finally, the idea of detecting problematic people beforehand seems good at first, but on second thought I think it might not be! But that depends on how the tool is used. The reason I say this is: suppose we had an omnipotent classifier that could predict with 100% accuracy; what would we do once we have the predictions? Ban the users beforehand? Wouldn’t that be a violation of the right to opinion and freedom of speech? Wouldn’t the classifier just reflect what people like to see and hear, and end up tailoring content to their points of view? And, in a dystopian scenario, wouldn’t it just lead to snowflake culture?

As a closing note, how would the results look if the study were repeated on Facebook pages? Would the results from this study generalize?


Reflection #3 – [1/25] – Ashish Baghudana

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary

In this paper, Cheng et al. attempt to characterize anti-social behavior in three online communities – CNN, IGN, and Breitbart. The study addresses three major questions:

  • Is anti-social behavior innate or dependent on community influences?
  • Does the community help improve behavior or worsen it?
  • Can anti-social users be identified early on?

They find that banned users’ posts are less readable, receive more replies, and are concentrated in fewer threads. The authors also find that communities affect writing styles over time: if a user’s posts are unfairly deleted, their writing quality is likely to decrease. Subsequently, the authors characterized the banned users based on their deleted posts and found the distribution to be bimodal. An interesting characteristic of banned users is that their posting frequency is much higher than that of non-banned users. Finally, the authors build a classifier to predict whether a user might be banned based on their first 10 posts and report an AUC of about 0.8.

Reflection

The paper comes at a time when cyberbullying, harassment, and trolling are at their peak on the Internet. I found their research methodology very instructive in how it summarizes and divvies up 1.7 million users and 40 million posts. It is also interesting to read about their use of Amazon Mechanical Turk to generate text-quality scores, especially because no standard metric for this exists in the NLP sphere.

At several instances in the paper, I found myself asking the question: what kinds of anti-social behavior do people exhibit online? While the paper focused on the users involved and the characteristics of their posts that made them undesirable in such online communities, it would have been much more informative had the authors also focused on the posts themselves. Topic modeling (LDA) or text clustering would have been a great way of analyzing why they were banned. Many of the elements of anti-social behavior discussed in the paper would hold true for bots and spam as well.
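As a toy illustration of that suggestion (my own sketch, not part of the paper), LDA could be run over deleted posts to surface the broad themes behind deletions; the example posts below are invented.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

deleted_posts = [
    "you are all idiots this site is garbage",
    "click here for free prizes limited offer",
    "this article is fake news written by shills",
    "free prizes click now best offer online",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(deleted_posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top_terms)}")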

Another fascinating aspect that the paper only briefly touched upon was the community effect. The authors chose the three discussion communities very effectively – CNN (left of center), Breitbart (right of center) and IGN (video games). Analyzing the posts of the banned users on each of these communities might indicate community bias and allow us to ask questions such as are liberal views generally banned on Breitbart?

The third set of actors on this stage (the first two being banned users and the community) are the moderators. Since the final decision of banning a user rests with the moderators, it would be interesting to ask the question what kind of biases do the moderators display? Is their behavior erratic or does it follow a trend?

One tiny complaint I had with the paper was their visualizations. I often found myself squinting to be able to read the graphs!

Questions

  • How could one study the posts themselves rather than the users?
    • This would help understand anti-social behavior holistically, and not just from the perspective of non-banned users
  • Is inflammatory language a key contributor to banning certain users, or are users banned even for disagreeing with long-standing community beliefs?
  • Do the banned posts on different communities exhibit distinct features or are they generalizable for all communities?


Reflection #3 – [1/24] – Hamza Manzoor

Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec. “Antisocial Behavior in Online Discussion Communities” 

 

Summary:

In this paper, Cheng et al. present a study of the antisocial behavior of users from the moment they join a community up to when they get banned. The paper addresses a very important topic – cyberbullying – and the identification of cyberbullies is one of the most relevant problems of the current digital age. The authors study user behavior on three online discussion communities (CNN, Breitbart, and IGN) and characterize antisocial behavior by analyzing users who were banned from these communities. The analysis of these banned users reveals that, over time, they start writing worse than other users and that the community’s tolerance towards them decreases.

 

Reflection:

Overall, I felt that the paper was well organized and showed all the steps of the analysis, from data preparation to results, along with visualizations. However, the correctness of the analysis is questionable because the entire analysis is based on the number of deleted posts, and the authors did not consider all the reasons a post might be deleted. Some posts get deleted because they are in a different language, or, on controversial topics like politics, because the reported post does not conform to the opinions of the moderators. Sometimes users engage in an off-topic discussion and those posts are deleted to keep the comments relevant to the article. The biases of moderators should be considered.

The paper does not mention the population size for some analyses, which makes me question whether the sample size was significant. For example, when the authors analyze whether excessive censorship causes users to write worse, one population consists of users with four or more posts deleted among their first five posts, which, unless stated otherwise, I believe would be a negligible group. Also, the entire analysis is more or less dependent on the first five or ten posts, which is also questionable because these posts could all be in the same thread on a single day. This approach has two caveats. First, since the authors did not analyze the text, it is unfair to ban a user based on their first few posts; the user might simply have held a conflicting opinion rather than being a troll. Second, the paper itself shows that many NBUs initially wrote negative posts and improved over time. Therefore, banning users based on their first few deleted posts means they will not have an opportunity to improve.

The strongest features in the statistical analysis are the moderator features, and without them the results drop significantly. These moderator features require moderators, whereas the purpose of the analysis was to automate the process of finding FBUs; the heavy dependence on these features makes the analysis look less significant.

Finally, my take on the analysis in this paper is that using the number of deleted posts is simplistic, and the text of the posts should be analyzed before automating any process that bans users from posting.

 

Questions:

Are posts deleted only because of inflammatory language, or because of differences of opinion as well?

One question that everyone might raise is that analyzing users based on their first few posts is unfair, but what should the threshold be? Can we come up with a solid analysis without topic modeling and analyzing the text?

What kinds of biases do moderators display? Do they play a role in post deletions and, ultimately, user bans?


Reflection #2 – [1/24] – Hamza Manzoor

Mitra, Tanushree, and Eric Gilbert. “The language that gets people to give: Phrases that predict success on Kickstarter.” 

Summary:

In this paper, Mitra et al. present a study to answer the research question of how the language used in a pitch gets people to fund a project. The authors analyze the text of 45K Kickstarter project pitches. They clean the text and keep the phrases that appear across all 13 project categories, finally using about 20K phrases along with 59 control variables (such as project goal, duration, and number of pledge levels) to train a penalized logistic regression model that predicts whether a project will be funded. Using the phrases in the model decreases the error rate from 17.03% to 2.4%, which shows that the text of a project pitch plays a vital role in getting funded. The paper compares the features of funded and non-funded projects and explains that campaigns that show reciprocity (giving something in return), scarcity (limited availability), and social proof have a higher tendency to get funded.

Reflection:

The authors address a question about which features or language help a project get more funds. The insights the paper provides are realistic: people generally tend to give if they see a benefit for themselves, perhaps because they get something in return. The paper offers a very useful insight to startups looking for funding: they should focus more on their pitch and demonstrate reciprocity, scarcity, and social proof. Still, the results are somewhat astonishing to me, because the first 100 predictors all belong to the language of the pitch, which makes me question whether language is sufficient to predict whether a project will be funded.

There are also a few phrases that do not make sense when taken out of context. For example, ‘trash’ has a very high beta score, but does that make sense? Unless we look at the entire sentence, we cannot say.

The authors show that using phrases in the model significantly decreases the error rate, but the choice of model is not well motivated. Why did they use penalized logistic regression? Even though penalized logistic regression (LASSO) makes sense, a comparison with other models should have been provided. Ensemble methods like a random forest classifier should work well on this type of data, so a comparison of different models would have provided more insight into the choice of model.
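A quick sketch of the comparison suggested above (on synthetic stand-in data, not the actual Kickstarter phrase features): L1-penalized, “LASSO-style” logistic regression versus a random forest, compared by cross-validated AUC.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Many weakly informative features, loosely mimicking sparse phrase indicators
X, y = make_classification(n_samples=2000, n_features=500, n_informative=50, random_state=0)

lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
rf = RandomForestClassifier(n_estimators=200, random_state=0)

print("L1 logistic regression AUC:", cross_val_score(lasso_lr, X, y, cv=5, scoring="roc_auc").mean())
print("Random forest AUC:         ", cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean())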

Furthermore, treating every campaign equally is another questionable assumption in this paper: how can a product asking for $1M and meeting its goal be equivalent to a product with a $1,000 goal, and is every category of campaign equivalent?

Finally, this paper was about the language used in pitches, but it also raises new research questions, such as: is there a difference between the types of people funding different projects? Do most backers come from wealthy societies? Another interesting question would be: can we process the text within video pitches to perform a similar analysis? Do infographics help? And can we measure the usefulness of a product and use it for prediction?

 

Questions:

Is language sufficient to predict whether a project will be funded?

Why the use of penalized logistic regression over other models?

Is every category of campaign equivalent?

Is there a difference between types of people funding different projects?

Can we process the text within video pitches to perform a similar analysis?

Can we measure the usefulness of a product and use it for prediction?


Reflection #1 – [1/24] – Hamza Manzoor

[1]. Danah Boyd & Kate Crawford (2012). Critical Questions for Big Data.

Summary:

In this paper, the authors describe big data as a cultural, technological, and scholarly phenomenon. They explain that the way we handle the emergence of an era of big data is critical, because current decisions about how we define the use of big data will shape the future. They also describe different pitfalls and discuss six provocations about the issues of big data. In these six points they argue that big data has created a radical shift in how we think about research and has changed the definition of knowledge. They also challenge the common myth among researchers that data solves all problems, and point out that access to data for a privileged few is creating a new divide. Furthermore, they explain that big data, especially social media data, can sometimes be misleading because it does not necessarily represent the entire population. They further discuss the ethics of using big data in research and the lack of regulations on ethical research practices.

[2]. D. Lazer and J. Radford, Data ex Machina: Introduction to Big Data

Summary:

In this paper, the authors define big data and the institutional challenges it presents to sociology. They describe three types of big data sources and enumerate the promises and pitfalls common across them. The authors are of the opinion that cutting across these three types of big data is the possibility for sociologists to study human behavior. They also discuss the opportunities available to sociologists in the huge amount of data produced by various social systems, natural and field experiments, and other digital traces, and explain how targeted samples from a huge chunk of data can be used to study the behavior of minorities. They further discuss the vulnerabilities of big data, including the generalization that the data represents the entire population, fake data generated by bots, and different sources of data with different levels of accessibility, along with the issues these vulnerabilities present.

Reflections:

From both Boyd & Crawford’s and Lazer & Radford’s descriptions, I took away that big data should be used carefully, keeping ethical issues in mind. Furthermore, the key takeaway from these papers for me is that big data is not just about size but also about how we manipulate the data to generate insights about human behavior.

I particularly liked Boyd & Crawford’s provocation #3, that bigger data is not necessarily better data. We computer scientists commonly believe that more data can solve all problems, but in actuality this is not necessarily true, because the data at hand, no matter how big, might not be representative at all. For example, a trillion rows of Twitter data will still only represent a small portion of Twitter users, and therefore generalizing and making claims about behaviors and trends can be misleading. Predictions made using this data will thus carry inherent biases. Since social media data is the biggest source of big data, the question that comes to mind is: how do we know whether the data is truly representative? If it is not, then where do we get data that truly represents the entire population?

I have concerns about Lazer & Radford’s solution to generalizability, that data from different systems should be merged. Is that even possible for an ordinary sociology researcher? Will companies provide access to their entire datasets? Boyd & Crawford’s paper explains that people with different privileges have different levels of access to data. Even if we consider an ideal world where we have access to data from all sources, how will we link data from different sources – for example, a Twitter handle to a Facebook profile and a Snapchat username – given that the chunk of Facebook data currently available might not contain the same users as the Twitter data? Will Facebook provide access to its entire dataset?

Nonetheless, the papers prompted me to think about how big data can be used in the context of social science and what ethical vulnerabilities are associated with it.

 

Questions:

 

How do we know whether data is truly representative? Where do we get data that truly represents the entire population?

Is it possible to link data from different sources?

How do we know whether what companies are doing at the back end is ethical?

Do people behave in the same way on different digital platforms?

Can computational social science correctly explain human behavior with the data we currently have? The papers suggest that the data we have is not truly representative unless merged with other sources.


Reflection #3 – [01/25] – [John Wenskovitch]

This paper describes a study regarding antisocial behavior in online discussion communities, though I feel that labeling the behavior as “negative” rather than “antisocial” may be more accurate.  In this study, the authors looked at the comment sections of CNN, Breitbart, and IGN, identifying users who created accounts and were banned during the 18-month study window.  Among other findings, the authors noted that these negative users write worse than other users, they both create new discussions and respond to existing discussions, and they come in a variety of forms.  The authors also found that the response from the rest of the community has an influence on the behavior of these negative users, and also that they are able to predict whether or not a user will be banned in the future with great accuracy just by evaluating a small number (5-10) of the user’s posts.

Overall, I felt that this paper was very well organized.  I saw the mapping pattern discussed during Tuesday’s class linking the data analysis process to the sections of the paper.  The data collection, preprocessing, and results were all presented clearly (though I had a visualization/data presentation gripe with many of the subfigures being rendered far too small with extra horizontal whitespace between them).  Their results in particular were neatly organized by research finding, so it was clear what was being discussed from the bolded introductory text.

One critique that I have which was not well addressed by the authors was the fact that all three of the discussion communities that they evaluated used the Disqus commenting platform.  In a way, this works to the authors’ advantage by having a standard platform to evaluate.  However, near the end of the results, the authors note that “moderator features… constitute the strongest signals of deletion.”  It would be interesting to run a follow-up study with websites that use different commenting platforms, as moderators may have access to different moderation tools.  I would be interested to know if the specific actions taken by moderators have a similar effect to the community response, if these negative users respond differently to more gentle moderation steps like shadowbanning or muting than to harsher moderation steps like post deletion and temporary or permanent bans.  From research like this, commenting platform creators can modify their tools to support actions that mitigate negative behavior.

In a similar vein, the authors have no way of knowing precisely how moderators located comments from these negative users to begin the punishment process.  I would be interested to know if there is a cause and effect relationship between the community response and the moderator response (e.g., the moderators look for heavily downvoted comments to delete and ban users), or if the moderators simply keep track of problem users and evaluate every comment made by those users.  Unfortunately, this information would likely require moderator interviews or further knowledge of moderation tools and tactics, rather than something that could be scraped or easily provided by Disqus.

The “factors that help identify antisocial users” and “predicting antisocial behavior” sections were quite interesting in my opinion, because problem users could be identified and moderated early on instead of after they begin causing severe problems within the discussion communities.  The authors’ use of inferential statistics here was well written and easy to follow.  Their discussion at the end of these sections regarding the generalizability of these classifiers was also pleasing to see included in the paper, showing that negative users share enough features that a classifier trained on CNN trolls could be used elsewhere.

Finally, I wanted to make note of the discussions under Data Preparation regarding the various ways that undesired behavior could be defined.  The discussion was helpful both from an explanatory perspective, describing negative tactics like baiting users, provoking arguments, and derailing discussions, as well as from a methodological perspective for understanding what behaviors were being measured and included throughout the rest of the study.  However, I’m curious whether there are cases that the authors did not measure, or whether bans that do not reflect genuinely antisocial behavior may have been introduced into the data.  For example, several reddit communities are known for banning users who simply comment with different political views.  Though I don’t want to visit Breitbart myself, second-hand information that I’ve heard about the community makes me suspect that a similar approach might exist there.  It was not clear to me whether the authors would have removed comments and banned users from consideration in this study if, for example, they simply expressed unwanted content (liberal views) in polite ways on a conservative website.  It still counts as “undesired behavior,” but I wouldn’t count it in the same tier as some of the other behaviors noted.


Reflection #3 – [1/25] – [Pratik Anand]

The paper deals with a very relevant topic for social media – antisocial behavior, including trolling and cyberbullying.
The authors make a point of understanding the patterns of trolls via their online posts, the effects of the community on them, and whether they can be predicted. It is understandable that the anonymity of the internet can cause regular users to act differently online. A popular cartoon caption says, “On the Internet, nobody knows you’re a dog.” Anonymity is a two-way street: you can act any way you want, but so can someone else.

The community’s response to trolling behavior is also interesting, as it shows that strict censorship results in more drastic bad behavior. Hence, some communities use shadowbans, where users don’t get to know that they have been banned; their posts are visible only to themselves and not to others. Are those kinds of bans included among the FBUs? The bias of community moderators should also be brought into question – some moderators are sensitive towards certain topics and ban users for even a small offense. Thus, moderators’ behavior can also result in more post deletions and bans, so the use of post deletions as ground truth is questionable here. One funny observation is that IGN has more deleted posts than reported posts. What could be the reason?
The paper doesn’t cover all the ground related to trolling and abuse. A large number of bans happen when trolling users abuse others over personal messages, which the paper doesn’t seem to take into account. The paper also does not include temporarily banned users; I believe including them would provide crucial insight into corrective behavior and self-control by some users. I don’t think deleted/reported posts should be the metric for measuring anti-social behavior. Some people post on controversial topics or go off-topic and their posts get reported; this does not constitute anti-social behavior, but it would be counted by a metric based on deleted posts. The bias of moderators has already been mentioned above. Cultural differences play a role too. In my experience, many times a legitimate post has been branded as troll behavior because the user was not very comfortable with English or with American sentence structure. For example, the phrase “having a doubt” communicates something different in Indian English than it does in American English. A better approach would be to analyze discussions and debates on a community forum and how users react to them.
Given the issues discussed above, the prospect of predicting anti-social behavior from only 10 posts is problematic, since users can be banned based on such decisions. In communities like Steam (a gaming marketplace), getting banned means losing access to one’s account and purchased video games. Thus, banning users can have real consequences, and banning users based on 10 posts could be over-punishment: a single bad day could make someone lose their online account.

In conclusion, the paper is a good step towards understanding trolling behavior, but such a multi-faceted problem cannot be identified with simple metrics. It requires social context and a more sophisticated approach to identify such behavior. The application of such identification also requires some thought, so that it is fair and not heavy-handed.
