Reflection #2 – [8/30] – [Parth Vora]

[1] Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary

In this paper, Cheng et al. study three discussion-driven online communities (CNN, IGN, and Breitbart) to understand the dynamics of antisocial behavior. Their results come largely from a quantitative study: they inspect the data for trends in antisocial behavior and use those trends to support their conclusions. The study also tracks how users' activity changes over time. The main criterion for classifying an individual as antisocial is that moderators eventually banned them. The paper then describes a model that predicts antisocial behavior using the features discussed earlier.

 

Reflection

The paper answers many questions while leaving many others open. Although many of its conclusions seem intuitive, it is striking how simply going through the data answers the same questions. One example is the authors' discussion of whether excessive censorship causes users to write worse. Intuitively, if one is punished for doing the right thing, the chance of doing it again drops considerably.

What exactly is antisocial behavior? The definition changes from one online community to another, and there cannot be a single dividing line. For instance, 4chan users will tolerate more antisocial elements than users on Quora. Speaking habits also change as we move from one geographical area to another: what is offensive and inappropriate in one culture might be acceptable in another. So, what content is acceptable, and to what extent?

Antisocial posts in this paper are labeled by the moderators. These are human moderators, and their views are subjective. How can we validate that these posts are actually antisocial and not constructive criticism or some form of sarcasm? Secondly, on huge social networking websites that produce millions of posts every day, how can moderation be carried out at such a large scale? The paper uses four sets of features, and among them the “moderator” features carry more weight in the classifier than the others. But on such large-scale networks, how can one rely on community and moderator features? The model has decent accuracy, but when extrapolated to a large user base, even a small error rate could result in banning millions of innocent user accounts.
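
To make the scale concern concrete, here is a rough back-of-the-envelope sketch. The user count and false positive rate below are hypothetical numbers chosen only for illustration; they do not come from the paper.

```python
def expected_wrongful_bans(innocent_users: int, false_positive_rate: float) -> int:
    """Expected number of innocent users flagged by a classifier.

    Both arguments are assumptions supplied by the caller, not measured values.
    """
    return round(innocent_users * false_positive_rate)

# Hypothetical platform: 100 million well-behaved users, 2% false positive rate.
wrongful = expected_wrongful_bans(100_000_000, 0.02)
print(wrongful)
```

Under these assumed numbers, a classifier that looks accurate on paper would still flag about two million innocent accounts.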

Coming to the technical side, the model shows relatively high accuracy during cross-platform testing using simple random forest classifiers and basic NLP techniques. While a “bag of words” model with a random forest classifier is a strong combination, it is insufficient for building the “post features” in this case. Users have many different writing styles, and much depends on the context in which words appear, so something more advanced than “bag of words” is needed. Word vectors would be a very good choice, since words that appear in similar contexts end up close together in the vector space. They can also be trained on the platform's own posts to reflect its common writing style.
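
The difference between the two representations can be sketched with a toy example. The two-dimensional word vectors below are hand-picked, hypothetical values used only to illustrate the idea; real vectors would be learned from the platform's posts, and this is not the feature pipeline used in the paper.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors, invented for this illustration.
TOY_VECTORS = {
    "idiot": (0.9, 0.1),
    "moron": (0.8, 0.2),
    "great": (0.1, 0.9),
    "article": (0.5, 0.5),
}

def cosine(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bow_vector(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def mean_vector(tokens):
    """Average the toy word vectors of the known tokens in a post."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(known) for dim in zip(*known))

vocab = sorted(TOY_VECTORS)
post_a = ["idiot"]   # two posts that use different insults
post_b = ["moron"]

bow_sim = cosine(bow_vector(post_a, vocab), bow_vector(post_b, vocab))
vec_sim = cosine(mean_vector(post_a), mean_vector(post_b))
print(bow_sim, vec_sim)
```

With bag of words, the two posts share no tokens and look completely unrelated. With the toy vectors, they come out as near neighbours, which is the property that would let a classifier generalize across writing styles.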

By taking posts from the same user, we can build a sentiment index for each user. The sentiment index would help predict how the user generally feels about a particular topic and prevent incorrect banning. It is comparable to a browser keeping your search history to understand your usage patterns. One can also look at all posts from a general perspective and create an “antisocial index” for each post; only if the index is above a certain threshold should the user be banned or penalized. Penalties could include disabling a user's posting privileges for a few hours, so that even in the case of a false positive, an NBU (never-banned user) is not banned outright.
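
A minimal sketch of such a thresholded, graduated penalty scheme is shown below. The per-post scores, the threshold values, and the penalty names are all hypothetical; how the per-post “antisocial index” would actually be computed is left open here.

```python
from statistics import mean

# Hypothetical thresholds on a 0..1 antisocial index.
WARN_THRESHOLD = 0.5
SUSPEND_THRESHOLD = 0.8

def user_index(post_scores):
    """Aggregate per-post antisocial scores into one per-user index."""
    return mean(post_scores) if post_scores else 0.0

def choose_penalty(post_scores):
    """Pick a graduated penalty instead of an outright ban."""
    index = user_index(post_scores)
    if index >= SUSPEND_THRESHOLD:
        return "suspend_posting_temporarily"
    if index >= WARN_THRESHOLD:
        return "warn"
    return "none"

print(choose_penalty([0.1, 0.2]))       # low scores
print(choose_penalty([0.6, 0.5]))       # borderline scores
print(choose_penalty([0.9, 0.9, 0.8]))  # consistently high scores
```

Because the harshest action here is a temporary suspension, a user who is wrongly flagged loses posting rights for a while but keeps the account.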

In conclusion, the paper provides an informative and intriguing baseline for tracking antisocial behavior. Many techniques can be used to enhance the proposed model and create an autonomous content filtering mechanism.
