Reflection #2 – [08/30] – [Dhruva Sahasrabudhe]

Paper-

Antisocial Behavior in Online Discussion Communities – Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec.

Summary-

The paper explores the characteristics of “antisocial” users, i.e. trolls, online bullies, etc., by defining a category of users called FBUs (Future Banned Users) and trying to distinguish their habits from those of NBUs (Never Banned Users). It finds that FBUs write less in tune with the rest of the discussion, write less readable posts, and express more negative emotion than NBUs. Furthermore, it builds a model that tries to predict whether a user will be banned, based on features like post content, posting frequency, community interaction and moderator interaction. The results are presented quantitatively.

Reflection-

Firstly, the paper seems limited in its choice of source websites for data gathering. It uses only three: CNN, Breitbart News, and IGN. Its results could be strengthened by similar analyses on other websites, or on a large, diverse set of source websites at once.

CNN is a news organization with a left-wing bias (Source), Breitbart News is an extremely right-wing biased website (Source), while IGN, being a gaming website, can be considered politically neutral. It may be a coincidence, but IGN has the best average cross-domain generalizability for the user banning prediction system. This might suggest that political leaning has some effect on how well a model trained on one website generalizes to others, since the politically neutral source generalizes best.

Early on, the paper asks whether negative community responses to antisocial behavior encourage or discourage that behavior, and finds that they actually exacerbate the problem. There are clear parallels between this finding and certain studies on the effectiveness of the prison system, which find that “correctional” facilities do nothing to actually steer criminals away from their previous life of crime once they are released.

The paper compares the behavioral patterns of FBUs against those of NBUs, but, through a process called “matching”, it only selects NBUs who have the same posting frequency as FBUs. It is worth noting that this frequency is 10 times the posting frequency of regular users, so these NBUs may themselves have anomalous usage patterns, or might be a special subset of users. Despite the paper’s claims that this selection choice gives better results, it might be useful to balance it out by collecting the same statistics for a third, randomly selected set of users.
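As a rough illustration of what frequency matching plus a random baseline could look like, here is a minimal sketch. It assumes each user is reduced to a single posting rate (posts per day), and it uses greedy nearest-neighbour matching without replacement as a stand-in; the paper's own matching procedure may differ. All function names, user ids and numbers are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def match_by_frequency(
    fbu_rates: Dict[str, float],
    nbu_rates: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Greedily pair each FBU with the unused NBU whose posting rate is closest.

    Hypothetical stand-in for the paper's matching step, not its actual method.
    """
    available = dict(nbu_rates)  # NBUs that have not been matched yet
    pairs = []
    for fbu, rate in sorted(fbu_rates.items()):  # deterministic order by user id
        if not available:
            break
        best = min(available, key=lambda nbu: abs(available[nbu] - rate))
        pairs.append((fbu, best))
        del available[best]  # match without replacement
    return pairs


def random_baseline(all_users: List[str], k: int, seed: int = 0) -> List[str]:
    """Draw k users uniformly at random, ignoring posting frequency entirely."""
    rng = random.Random(seed)
    return rng.sample(all_users, k)


# Hypothetical posting rates (posts per day).
fbus = {"f1": 10.0, "f2": 4.0}
nbus = {"n1": 0.5, "n2": 9.5, "n3": 4.2, "n4": 1.0}
pairs = match_by_frequency(fbus, nbus)  # [("f1", "n2"), ("f2", "n3")]
baseline = random_baseline(list(nbus) + list(fbus), k=2)
```

Comparing statistics across all three groups (FBUs, matched NBUs, random users) would show whether the matched NBUs are themselves unusual.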

Moreover, the paper claims that FBUs, despite not contributing anything positive to the discussion, receive many more replies to their comments. A parallel can be drawn to news shows that air sensationalized, inflammatory stories or invite deliberately incendiary panel guests: the panel discussion does not enlighten viewers about the issue, but the ensuing argument attracts a large audience.

The authors’ predictive model could be augmented with other features, like posting times, login/logout times, and the frequency and duration of private messages between antisocial users and other community members. I suspect that antisocial users would have a number of short, high-volume private message exchanges with other users (perhaps arguments with users angry enough to message the antisocial individual directly), but few sustained, long-term exchanges. As the paper mentions, the predictive model itself could also be something more powerful or expressive than a simple piecewise linear model, like a neural network or an SVM.
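To make the suggested private-message features concrete, here is a minimal sketch of how they might be computed. It assumes messages are available as hypothetical (sender, recipient, timestamp-in-seconds) tuples and treats all messages between a user and one partner as a single exchange; the threshold for a “short” exchange is an arbitrary assumption. These features are not part of the paper.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical message record: (sender, recipient, unix timestamp in seconds).
Message = Tuple[str, str, float]


def pm_features(user: str, messages: List[Message], short_len: int = 5) -> Dict[str, float]:
    """Summarise a user's private-message exchanges (hypothetical features).

    An "exchange" is every message between `user` and one partner;
    exchanges with fewer than `short_len` messages count as short.
    """
    convs: Dict[str, List[float]] = defaultdict(list)  # partner -> timestamps
    for sender, recipient, ts in messages:
        if sender == user:
            convs[recipient].append(ts)
        elif recipient == user:
            convs[sender].append(ts)
        # messages not involving `user` are ignored
    if not convs:
        return {"n_partners": 0, "frac_short": 0.0, "median_duration_s": 0.0}
    lengths = [len(ts) for ts in convs.values()]
    durations = [max(ts) - min(ts) for ts in convs.values()]
    return {
        "n_partners": len(convs),
        "frac_short": sum(n < short_len for n in lengths) / len(lengths),
        "median_duration_s": median(durations),
    }


msgs = [
    ("u", "a", 0.0), ("a", "u", 60.0), ("u", "a", 120.0),  # short burst with "a"
    ("u", "b", 0.0), ("b", "u", 3600.0), ("u", "b", 7200.0),
    ("b", "u", 10800.0), ("u", "b", 14400.0), ("b", "u", 18000.0),  # longer with "b"
    ("a", "b", 5.0),  # does not involve "u"
]
feats = pm_features("u", msgs)
# {"n_partners": 2, "frac_short": 0.5, "median_duration_s": 9060.0}
```

Per-user dictionaries like this could then be appended to the existing feature vectors before training whichever classifier is used.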

Lastly, if the predictive model were deployed in real-world systems to ban potentially antisocial users, it would raise some concerns. As the paper briefly mentions, there is the question of whether we should give these “antisocial” users the benefit of the doubt, and whether it is acceptable for an algorithm to pre-emptively ban someone before society (in this case the moderators or the community) has decided that the time has come for them to be banned, as is the case in today’s online and real-world systems.


Reflection #1 – [08/28] – Dhruva Sahasrabudhe

Papers:

  1. Identity and deception in the virtual community – Judith S. Donath.
  2. 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community – Michael S. Bernstein, Andres Monroy-Hernandez, Drew Harry, Paul Andre, Katrina Panovich and Greg Vargas.

Short Summary:

Both papers deal with issues related to the identity of users on online social platforms. Both spend some time describing how the design of these systems affects the trade-off between, on the one hand, the credibility, status, reputation building, and accountability that identity affords the user, and, on the other, the freedom from consequence and judgement, and the equality of consideration, that anonymity provides. Both papers also discuss how users deal with or adapt to the design of these systems, creating their own methods to tip the balance of these trade-offs as the situation demands.

[1] focuses on the users of Usenet, an online topic-based discussion/advice forum. It broadly discusses identity, attempts at identity deception, and the mechanisms users create on Usenet to verify identity and prevent deception.

[2] discusses not just identity, but particularly how anonymity and the transitory nature of data on the website affect content, user interaction, and behavior. It focuses on the random (/b/) board of the website 4chan.

Reflections:

[1] describes the concept of signalling as a way to establish possession of a desirable trait, e.g. experience in a community, or domain knowledge of the topic under discussion. In particular, [1] mentions signatures containing programming jokes, “Geek Code”, or riddles, used in programming-related Usenet communities. [2] also mentions similar “signatures” on 4chan /b/, like the “triforce”. It might be interesting to explore the usage of such explicit “signatures” on modern anonymous or pseudonym-based platforms, like Reddit, StackOverflow, etc.

More ubiquitous are implicit signalling mechanisms, like language, grammar, references, usernames/email-ids, etc. While investigating these in a data-driven way might be harder, it would be interesting to track how a single new user’s use of in-group language and references changes over time, as they go from being a new member to a seasoned member of the group and start using the same language and references as the group. It would also be interesting to track how these rates vary among groups, and how easily a user who is simultaneously part of multiple disconnected online social groups can switch between the different implicit signalling mores of each group.

[2] mentions that “sites like Twitter … feel ephemeral” because of their continuous streams of content, despite not actually being ephemeral. Would this affect users in a way similar to true ephemerality? Would there be any 4chan-like “bumping” (or subtler forms of such behavior) on Twitter, because the end user subconsciously perceives the site as ephemeral?

Side note: [2] mentions in passing that to combat the ephemerality of data on 4chan, users comment “bump” or sometimes “bamp” (a linguistic variant of “bump”). This highlighted for me the importance of spending a lot of time on the particular website or online community being studied before conducting data analysis: a casual 4chan user would be aware of “bumping”, but not of the keyword “bamp”, which is something of an intermediate reference. An analysis that only searched for “bump” would therefore miss posts that were “bamped”.

[1] also states that Usenet went from being ephemeral to being permanent, as far as data storage was concerned. This offers an interesting opportunity to explore how this transition affected the usage patterns and behavior of its users.

[1] talks about how a user’s personal home page is a useful, believable way of declaring identity, since it is time-consuming to make, harder to fake, and can’t be discarded or replicated easily. Thus, it increases the cost of deception. Most online users today do not have personal web pages, but many connect certain website accounts (e.g. Goodreads, Instagram, etc.) to their Facebook or Google accounts. If these accounts are sufficiently old or regularly used, this greatly increases the cost of deception on any website linked to them, since they contain large amounts of important social information and may also be linked to other websites. It would be interesting to compare user speech and behavior patterns, like the rudeness/politeness of speech, lies told, etc., on websites linked to the user’s Facebook account versus websites where there is no need to link another social media account.
