Paper-
Antisocial Behavior in Online Discussion Communities. Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec.
Summary-
The paper explores the characteristics of “antisocial” users (e.g., trolls and online bullies) by defining a category of users called FBUs (Future Banned Users) and trying to distinguish their habits from those of NBUs (Never Banned Users). It finds that FBUs do not write in tune with the rest of the discussion, write less comprehensibly, and express more negative emotion than NBUs. It also builds a model to predict whether a user will be banned, based on features such as post content, posting frequency, community interaction, and moderator interaction. The results are presented quantitatively.
Reflection-
Firstly, the paper seems limited in its choice of source websites for data gathering. It uses only three: CNN, Breitbart News, and IGN. Its results could be strengthened by similar analyses on other websites, or by a single analysis over a large, diverse set of websites.
CNN is a news organization with a left-wing bias (Source), and Breitbart News is an extremely right-wing biased website (Source), while IGN, being a gaming website, can be thought of as politically neutral. It may be a coincidence, but IGN has the best average cross-domain generalizability for the ban prediction system. This might suggest that a website's political leaning affects how well a model trained on its data generalizes to other websites, since the politically neutral source generalizes best.
Early on, the paper asks whether negative community feedback on antisocial behavior encourages or discourages that behavior, and finds that it actually exacerbates the problem. There are clear parallels between this finding and certain studies on the effectiveness of the prison system, where “correctional” facilities do nothing to actually steer criminals away from crime once they are released.
The paper compares the behavioral patterns of FBUs against those of NBUs, but through a process called “matching”, it only selects NBUs who have the same posting frequency as FBUs. It is worth noting that this frequency is 10 times the posting frequency of regular users, so the matched NBUs may themselves have anomalous usage patterns, or might be a special subset of users. Despite the paper’s claim that this selection gives better results, it might be useful to balance it out by collecting the same statistics for a third set of randomly sampled users.
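To make the matching concern concrete, here is a minimal sketch of frequency matching next to the suggested random baseline. It assumes a simple greedy 1:1 nearest-neighbor match on a hypothetical `posts_per_day` value; the paper's actual matching procedure may differ, and `User`, `match_by_frequency`, and `random_baseline` are illustrative names only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    posts_per_day: float  # hypothetical activity measure used for matching


def match_by_frequency(fbus, nbus):
    """Greedy 1:1 nearest-neighbor matching without replacement (an assumed scheme).

    For each FBU, pick the not-yet-used NBU whose posting frequency is closest.
    The result depends on the order of `fbus`, a simplification of real matching.
    """
    available = list(nbus)
    pairs = []
    for fbu in fbus:
        if not available:
            break  # ran out of candidate NBUs
        best = min(available, key=lambda n: abs(n.posts_per_day - fbu.posts_per_day))
        available.remove(best)
        pairs.append((fbu, best))
    return pairs


def random_baseline(users, k, seed=0):
    """The suggested third group: k users sampled uniformly, ignoring activity."""
    rng = random.Random(seed)
    return rng.sample(users, k)


fbus = [User("f1", 10.0), User("f2", 4.0)]
nbus = [User("n1", 1.0), User("n2", 9.5), User("n3", 4.2), User("n4", 0.5)]
pairs = match_by_frequency(fbus, nbus)
print([(f.user_id, n.user_id) for f, n in pairs])  # [('f1', 'n2'), ('f2', 'n3')]
```

Because matching deliberately picks the most active NBUs, low-activity users such as `n1` and `n4` never enter the comparison; the random baseline is one simple way to see how typical users behave on the same statistics.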
Moreover, the paper claims that FBUs, despite not contributing anything positive to the discussion, receive many more replies to their comments. A parallel can be drawn with news shows that air sensationalized, inflammatory stories or feature deliberately incendiary panel guests: the panel discussion does not enlighten viewers about the issue, but the ensuing argument attracts a large audience.
The predictive model that the authors create could be augmented with other features, such as posting times, login/logout times, and the frequency and duration of private messages between antisocial users and other community members. I suspect that antisocial users would have a number of short, high-volume private message exchanges with other users (perhaps arguments with users angry enough to message the antisocial individual directly), but few sustained long-term exchanges. The predictive model, as the paper mentions, could also be something more powerful or expressive than a simple piecewise linear model, such as a neural network or an SVM.
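The private-message features proposed above could be computed roughly as follows. This is a sketch under stated assumptions: messages are hypothetical `(sender, recipient, timestamp)` tuples, all messages with one partner are treated as a single exchange, and the thresholds for a "bursty" versus "sustained" exchange are arbitrary illustrative values, not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def exchange_features(messages, user,
                      burst_window=timedelta(hours=1),  # hypothetical threshold
                      burst_min_msgs=10,                # hypothetical threshold
                      long_span=timedelta(days=7)):     # hypothetical threshold
    """Count bursty vs. sustained private-message exchanges for one user.

    messages: iterable of (sender, recipient, timestamp) tuples (assumed format).
    Every message between `user` and a given partner counts as one exchange.
    """
    threads = defaultdict(list)  # partner -> timestamps of messages with `user`
    for sender, recipient, ts in messages:
        if user in (sender, recipient):
            partner = recipient if sender == user else sender
            threads[partner].append(ts)

    bursty = sustained = 0
    for stamps in threads.values():
        span = max(stamps) - min(stamps)
        if span <= burst_window and len(stamps) >= burst_min_msgs:
            bursty += 1      # many messages packed into a short window
        elif span >= long_span:
            sustained += 1   # conversation kept up over a long period
    return {"partners": len(threads),
            "bursty_exchanges": bursty,
            "sustained_exchanges": sustained}


t0 = datetime(2015, 1, 1, 12, 0)
msgs = [("troll", "a", t0 + timedelta(minutes=2 * i)) for i in range(12)]
msgs += [("b", "troll", t0), ("troll", "b", t0 + timedelta(days=10))]
msgs += [("a", "b", t0)]  # does not involve "troll", so it is ignored
print(exchange_features(msgs, "troll"))
# {'partners': 2, 'bursty_exchanges': 1, 'sustained_exchanges': 1}
```

Under the hypothesis above, FBUs would show a high ratio of bursty to sustained exchanges; whether that holds is an empirical question the paper's data does not answer.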
Lastly, the predictive model, if deployed in real-world systems to ban potentially antisocial users, has some problems. Firstly, as the paper briefly mentions, it raises the question of whether we should give these “antisocial” users the benefit of the doubt, and whether it is acceptable for an algorithm to pre-emptively ban someone before society (in this case the moderators or the community) has decided that the time has come for them to be banned, as is the case in today’s online and real-world systems.