Summary
The authors of the article aim to study antisocial behavior in online discussion communities, as the title suggests. They use data from three online discussion-based communities: Breitbart.com, CNN.com, and IGN.com. Specifically, they use the comments posted on articles on these websites. The data covers a period of over 18 months and comprises 1.7 million users who contributed nearly 40 million posts. The authors characterize antisocial behavior by comparing the activity of users who are later banned from a community, namely Future-Banned Users (FBUs), with that of users who were never banned, or Never-Banned Users (NBUs).

They find significant differences between the two groups. For instance, FBUs tend to write less similarly to other users, and their posts are harder to understand according to standard readability metrics. In addition, they are more likely to use language that may stir further conflict. FBUs also tend to concentrate their posts in individual threads rather than spread them across several, and they receive more replies than average users.

In the longitudinal analysis, the authors find that the behavior of an FBU worsens over their active tenure in a community. Moreover, they show that FBUs not only enter a community writing worse posts than NBUs, but that the quality of their posts also worsens more over time. They also find that the distribution of users' post deletion rates is bimodal: some FBUs have high post deletion rates, while others have relatively low ones. Finally, they demonstrate that a user's posting behavior can be used to predict who will be banned in the future, with an AUC above 0.8.
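The prediction result is easy to illustrate in outline. Below is a minimal sketch of such a ban classifier on synthetic data; the feature set and the random forest model are assumptions for illustration, not a reproduction of the authors' exact pipeline.

```python
# Minimal sketch of a ban-prediction setup; features and model choice are
# illustrative assumptions, not the authors' exact pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_users = 1000

# Hypothetical per-user features computed over a user's early posts:
# deletion rate, mean readability, thread concentration, replies received.
X = rng.random((n_users, 4))
y = rng.integers(0, 2, n_users)  # 1 = later banned (FBU), 0 = never banned (NBU)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
# On this random synthetic data the AUC hovers near 0.5; the paper reports
# above 0.8 with real behavioral features.
print(f"Cross-validated AUC: {auc:.2f}")
```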
Reflections
User-generated content has become central to the success of many websites, making it important to maintain a civil environment. Antisocial behavior includes trolling, bullying, and harassment. Platforms therefore implement mechanisms designed to discourage it, such as moderation, up- and down-voting, post reporting, muting, and blocking users' ability to post.
The design of the platform might render the results platform-specific rather than generalizable, and it would be interesting to see whether they hold on other platforms. There are discussion platforms where moderators have the option of issuing a temporary ban, which could perhaps work as a mechanism to "rehabilitate" users. For instance, the authors find two groups of FBUs: some with high post deletion rates and others with relatively low ones. It should be noted that the authors excluded users who were banned multiple times, so as not to confound the effects of temporary bans with behavior change.
In addition, it should be stressed that these specific discussion boards have an idiosyncrasy: the primary function of these websites is to be a news network, not a discussion board. This matters because the level of scrutiny is different on such platforms. For instance, they might ban opposing views expressed in inflammatory language more frequently, to support their editors or authors. The authors write that "In contrast, we find that post deletions are a highly precise indicator of undesirable behavior, as only community moderators can delete posts. Moderators generally act in accordance with a community's comment policy, which typically covers disrespectfulness, discrimination, insults, profanity, or spam". However, they do not provide evidence to support this position. This does not necessarily mean they are wrong, since their criticism of other methods is valid.
To address this, the authors propose measuring text quality directly. They do this by sending a sample of posts to Amazon Mechanical Turk for labeling, and then train a classification model on that labeled sample to generalize the text-quality labels to the rest of the posts, as sketched below.
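A rough sketch of that generalization step, assuming a TF-IDF plus logistic regression pipeline (the toy posts and model choice are assumptions, not the authors' actual setup):

```python
# Train a text-quality classifier on a small crowd-labeled sample, then
# score the remaining posts; the pipeline choice is an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hand-labeled sample, as if rated on Mechanical Turk.
labeled_posts = [
    "thanks, that clarifies the article",
    "interesting point about the data",
    "you are all idiots",
    "get lost, nobody asked you",
]
labels = [1, 1, 0, 0]  # 1 = acceptable quality, 0 = poor quality

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(labeled_posts, labels)

# Generalize the crowd labels to the unlabeled corpus.
unlabeled_posts = ["the moderation policy seems fair", "this thread is garbage"]
quality_scores = model.predict_proba(unlabeled_posts)[:, 1]
```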
They find some interesting results. The deletion rate increases over time for FBUs but stays constant for NBUs. In addition, they find that text quality decreases for both groups. This could be attributed either to a genuine decline in posting quality, which would explain the deletions, or to community bias. Interestingly, the authors find evidence supporting both hypotheses. For the community bias hypothesis, the authors use propensity score matching and find that, at the same text quality, later posts (from the last 10% of a user's tenure) are more likely to be deleted than early posts (from the first 10%) for FBUs but not for NBUs. They also find that excessive censorship causes users to write worse.
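To make the matching idea concrete, here is a hedged sketch on synthetic data: each late-tenure post is paired with the early-tenure post closest in predicted text quality, and deletion rates are compared across the matched pairs. This reduces the authors' propensity score matching to one-dimensional nearest-neighbor matching.

```python
# Sketch of the matched early-vs-late comparison on synthetic data;
# a simplification of the paper's propensity score matching.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 500
quality_early = rng.random(n)        # predicted quality, first 10% of tenure
quality_late = rng.random(n)         # predicted quality, last 10% of tenure
deleted_early = rng.random(n) < 0.2  # whether each early post was deleted
deleted_late = rng.random(n) < 0.3   # whether each late post was deleted

# For each late post, find the early post closest in text quality.
nn = NearestNeighbors(n_neighbors=1).fit(quality_early.reshape(-1, 1))
_, idx = nn.kneighbors(quality_late.reshape(-1, 1))
matched_early_rate = deleted_early[idx.ravel()].mean()

# A deletion gap at equal quality points to changing community tolerance.
print(f"late: {deleted_late.mean():.2f} vs matched early: {matched_early_rate:.2f}")
```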
Questions
- How would a mechanism of temporary bans affect the discussion community?
- The primary purpose of these websites is to deliver news to their target audience. Are the results the same for websites whose primary purpose is to provide a discussion platform, such as dedicated discussion boards?
- Propensity score matching is biased if there are unobserved confounders, which is usually a concern in non-experimental, observational studies. Nearest-neighbor matching with fixed effects to control for contemporaneous trends, or matching users by time in addition to text quality, might be a better strategy; a sketch follows below.
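As a sketch of that alternative, assuming synthetic data and hypothetical variable names, one could regress post deletion on group membership while absorbing time-period fixed effects:

```python
# Linear probability model with time-period fixed effects; the
# specification and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "deleted": rng.integers(0, 2, n),  # post deleted by moderators
    "fbu": rng.integers(0, 2, n),      # 1 = future-banned user
    "quality": rng.random(n),          # predicted text quality
    "period": rng.integers(0, 10, n),  # decile of community lifetime
})

# C(period) absorbs contemporaneous trends shared by FBUs and NBUs,
# so the fbu coefficient reflects the group gap net of quality and time.
model = smf.ols("deleted ~ fbu + quality + C(period)", data=df).fit()
print(model.params["fbu"])
```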