Reflection #3 – [1/23] – [Jiameng Pu]

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary:

User contributions such as posts, comments, and votes are an important part of many social platforms. While most users are civil, a small number of antisocial users greatly degrade the environment. By studying users who were banned from specific communities and comparing two groups, FBUs (Future-Banned Users) and NBUs (Never-Banned Users), the authors try to characterize antisocial behavior, e.g., how FBUs write and how they generate activity around themselves. The “Evolution over time” analysis shows that FBUs write worse than other users over time and tend to exacerbate their antisocial behavior when the community criticizes them more strongly. By designing features based on these observations and grouping them into categories, the work can potentially relieve community moderators of heavy manual labor. The paper also proposes a typology of antisocial users based on post deletion rates. Finally, it introduces a system for identifying undesired users early in their community life.

Reflection:

The paper offers an extensive discussion and analysis of antisocial behavior; here I highlight the points that impressed or inspired me most. First, the analysis of how to measure undesired behavior in the data preparation section is a useful one. It reminds me that down-votes cannot simply be read as a sign of “antisocial behavior”, which is a much narrower concept. Personally, I don’t use the down-vote function much when I browse Q&A websites like Quora, Zhihu, and StackOverflow, and it turns out many people have the same habit. This is a good instance where considering fewer features or less data, i.e., report records and post deletion rates, makes more sense. Second, instead of predicting whether a particular post or comment is malicious, the authors focus on individual users and their whole community life. This is harder to analyze but more convenient for community moderators, since they can act like real community police rather than simply cleaners. Third, the four feature categories cover the feature classes well, but the authors do not mention some potentially important features in Table 3, e.g., comments on a post, which could be categorized as post features, and a user’s followings and followers, which could be categorized as community features. Intuitively, these two features are strong indicators of a user’s properties: like-minded people fall into the same group, and harsh criticism tends to show up in the comment area of malicious posts.

I notice that the authors perform the prediction task on a balanced dataset of FBUs and NBUs (N=18758 for CNN, 1164 for IGN, 1138 for Breitbart) and suggest that the learned models generalize across communities. Though the number of FBUs and NBUs is balanced, would the very different number of user samples from the three platforms influence how well the resulting classifier generalizes? In my view, it would be more rigorous for the authors to rebalance the lopsided samples or to add more discussion of how the data should be sampled.
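One way to explore the sampling concern is to downsample so that every community and class contributes the same number of users. The sketch below is not the authors' procedure; it is a minimal illustration that assumes each user is a dict with hypothetical `community` and `label` keys, and that each reported N is split evenly between FBUs and NBUs.

```python
import random
from collections import defaultdict


def make_toy_users(sizes):
    """Build placeholder users; assumes each N is half FBU and half NBU."""
    users = []
    for community, n in sizes.items():
        for i in range(n):
            label = "FBU" if i < n // 2 else "NBU"
            users.append({"community": community, "label": label, "id": i})
    return users


def balance_across_communities(users, seed=0):
    """Downsample so every (community, label) cell has the same size."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for user in users:
        cells[(user["community"], user["label"])].append(user)
    # The smallest cell sets the size every other cell is cut down to.
    target = min(len(members) for members in cells.values())
    sample = []
    for key in sorted(cells):
        sample.extend(rng.sample(cells[key], target))
    return sample


if __name__ == "__main__":
    toy = make_toy_users({"CNN": 18758, "IGN": 1164, "Breitbart": 1138})
    balanced = balance_across_communities(toy)
    print(len(toy), "users before,", len(balanced), "after balancing")
```

Downsampling discards most of the CNN users, so weighting samples by community would be an alternative worth comparing.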

Questions & thoughts:

  1. Where is the proper line between antisocial and non-antisocial behavior? We should avoid confusing unpleasant users with antisocial users.
  2. Compared to the last paper, there is less description of the implementation tools used in the different phases of the research. I’m curious about how specific procedures are carried out in practice, e.g., data collection, feature categorization, and the investigation of how user behavior and community response evolve.
  3. I think the choice of classifier probably makes a difference in prediction accuracy, so it might be better to compare the performance of several classifiers to find the most suitable one for this task.
  4. Although we can roughly see the contribution of each feature category from Table 4, I think a more extensive and quantitative analysis would make the research more complete.
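Points 3 and 4 could be explored with a small cross-validation harness that accepts any classifier and reports how much accuracy drops when one feature category is removed. The following is only a sketch on synthetic data, not the paper's method: the group names and column indices in `FEATURE_GROUPS` are illustrative assumptions, and the nearest-centroid classifier is a stand-in for whichever models one wants to compare.

```python
import random

# Hypothetical feature categories; the column indices are illustrative only.
FEATURE_GROUPS = {
    "post": [0, 1],
    "activity": [2, 3],
    "community": [4],
    "moderator": [5],
}


def make_synthetic_users(n_per_class=100, seed=0):
    """Toy data in which only the 'post' columns separate the two classes."""
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(6)]
            if label == 1:
                x[0] += 2.0
                x[1] += 2.0
            rows.append((x, label))
    rng.shuffle(rows)
    return rows


def centroid_classifier(train, cols):
    """Fit a nearest-centroid classifier on the selected columns."""
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in train if y == label]
        centroids[label] = [sum(x[c] for x in members) / len(members) for c in cols]

    def predict(x):
        def dist(label):
            return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
        return min((0, 1), key=dist)

    return predict


def cv_accuracy(rows, cols, fit=centroid_classifier, k=5):
    """Mean accuracy over k folds; `fit` can be swapped to compare classifiers."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        predict = fit(train, cols)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k


def ablation_report(rows, fit=centroid_classifier):
    """Return full accuracy and the accuracy drop when each group is removed."""
    all_cols = sorted(c for cols in FEATURE_GROUPS.values() for c in cols)
    full = cv_accuracy(rows, all_cols, fit)
    drops = {}
    for group, cols in FEATURE_GROUPS.items():
        kept = [c for c in all_cols if c not in cols]
        drops[group] = full - cv_accuracy(rows, kept, fit)
    return full, drops


if __name__ == "__main__":
    full, drops = ablation_report(make_synthetic_users())
    print("full accuracy:", round(full, 3))
    for group, drop in drops.items():
        print(group, "drop:", round(drop, 3))
```

Because every classifier is scored on the same folds, passing a different `fit` function gives a like-for-like comparison, and the per-group drops give a rough number for each category's contribution.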
