Summary
In this paper, Cheng et al. attempt to characterize anti-social behavior in three online communities: CNN, IGN, and Breitbart. The study addresses three major questions:
- Is anti-social behavior innate, or does it depend on community influences?
- Does the community help improve behavior, or does it worsen it?
- Can anti-social users be identified early on?
They find that banned users write less readable posts, receive more replies, and concentrate their activity in fewer threads. The authors also find that communities affect writing style over time: if a user's posts are unfairly deleted, their writing quality is likely to decrease. The authors then characterize banned users based on their deleted posts and find the distribution to be bimodal. Another notable characteristic of banned users is that they post far more frequently than non-banned users. Finally, the authors build a classifier that predicts whether a user will eventually be banned from their first 10 posts and report an accuracy of 0.8.
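For intuition, here is a minimal sketch of how such an early-warning classifier could be set up. This is not the authors' exact feature set or model; the input file and column names are hypothetical stand-ins for per-user features aggregated over the first 10 posts.

```python
# Hedged sketch: predict future bans from a user's first 10 posts.
# The CSV file and column names below are hypothetical, not the paper's pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("first_ten_posts.csv")  # one row per user, aggregated over their first 10 posts

features = df[[
    "mean_readability",      # e.g. an automated readability score averaged over the posts
    "mean_replies",          # average number of replies each post received
    "thread_concentration",  # fraction of posts in the user's most-active thread
    "deletion_rate",         # fraction of the first 10 posts deleted by moderators
    "posts_per_day",         # posting frequency
]]
labels = df["was_banned"]    # 1 if the user was later banned, 0 otherwise

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, features, labels, cv=5, scoring="roc_auc")
print("Cross-validated AUC:", scores.mean())
```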
Reflection
The paper comes at a time when cyberbullying, harassment, and trolling are at their peak on the Internet. I found their research methodology instructive: they effectively summarize and partition 1.7 million users and 40 million posts. It is also interesting to read about their use of Amazon Mechanical Turk to generate text quality scores, especially because no standard metric for this exists in NLP.
At several points in the paper, I found myself asking: what kinds of anti-social behavior do people exhibit online? While the paper focused on the users involved and on the characteristics of their posts that made them undesirable in these communities, it would have been much more informative had the authors also focused on the posts themselves. Topic modeling (LDA) or text clustering would have been a great way of analyzing why users were banned. Many of the elements of anti-social behavior discussed in the paper would also hold true for bots and spam.
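As a concrete example, the following is a minimal sketch of the kind of LDA analysis I have in mind. The `banned_posts` list is a hypothetical placeholder; in practice it would be the corpus of deleted posts written by banned users, and the topic count and preprocessing are purely illustrative.

```python
# Hedged sketch: topic modeling over banned users' posts with LDA.
# `banned_posts` is a placeholder corpus, not data from the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

banned_posts = [
    "example deleted post text goes here",
    "another hypothetical banned-user post",
]

# Build a document-term matrix, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(banned_posts)

# Fit LDA with an illustrative number of topics.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(doc_term)

# Inspect the top words per topic to see what banned posts tend to discuss.
vocab = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[-10:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```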
Another fascinating aspect that the paper only briefly touched upon is the community effect. The authors chose the three discussion communities very effectively: CNN (left of center), Breitbart (right of center), and IGN (video games). Analyzing the posts of banned users on each of these communities might reveal community bias and allow us to ask questions such as: are liberal views generally banned on Breitbart?
The third set of actors on this stage (the first two being banned users and the community) is the moderators. Since the final decision to ban a user rests with the moderators, it would be interesting to ask what kinds of biases they display. Is their behavior erratic, or does it follow a trend?
One minor complaint I had with the paper was its visualizations. I often found myself squinting to read the graphs!
Questions
- How could one study the posts themselves rather than the users?
  - This would help us understand anti-social behavior holistically, and not just from the perspective of non-banned users.
- Is inflammatory language a key contributor to banning certain users, or are users banned even for disagreeing with long-standing community beliefs?
- Do banned posts in different communities exhibit distinct features, or do the same features generalize across communities?