Reflection #2 – [08/30] – [Subil Abraham]

Summary:

This paper examines the behavior of antisocial users – trolls – in the comment sections of three different websites. The authors' goal was to identify and characterize these users and separate them from normal users, i.e. the ones who are not intentionally creating problems. The paper analyzed more than a year's worth of comment section activity on these sites and identified that users who had a long history of post deletions and were eventually banned were the trolls (referred to as "Future Banned Users (FBUs)" in the paper). The authors analyzed the posting history and activity of the trolls, looking at post content, posting frequency, and the distribution of their posts across different articles, and compared them with non-problematic users ("Never Banned Users (NBUs)") who had similar posting activity. The trolls were found to post more on average than regular users, tended to have more posts under a single article, wrote replies whose text was less similar to earlier posts in the thread than an NBU's, and engaged more users. Trolls also aren't a homogeneous bunch: one subset has a higher proportion of deleted posts than the rest, is banned faster, and consequently spends less time on the site. The results of this analysis were used to create a model that identifies trolls with reasonable accuracy by examining their first 10 posts and determining whether they will be banned in the future.


Reflection:

This paper seems to me like an interesting follow-up to the section on Donath's "Identity and Deception" paper on trolls. Where Donath studied and documented troll behavior, Cheng et al. went further and performed quantitative analysis of trolls and their life in the online community. Their observation that a troll's behavior gets worse over time as the rest of the community actively stands against them seems to parallel human behavior in the physical world. Children who grow up abused tend not to be the most well-adjusted adults, with studies showing higher rates of crime among adults who were abused or shunned by family and/or community as children compared to those who were treated well. Of course, the difference here is that trolls start off with the intention of making trouble, whereas children do not. So an interesting question we could look at is: if an NBU is treated like an FBU in an online community, without a chance for reconciliation, will they take on the characteristics of an FBU over time?

It is interesting that the authors were able to get an AUC of 0.80 for their model, but I feel that is hardly sufficient (my machine learning background is minimal, so I cannot comment on whether 0.80 is a relatively good result from an ML perspective). The authors touched on this point as well and recommended having a human moderator on standby to verify the algorithm's claims. Considering that roughly 1 in 5 cases would be misclassified, what other features could we add to increase the accuracy? Given that memes play a big role in the activities of trolls these days, could meme usage also be factored into the analysis, or is it still too small compared to plain text to be a useful signal?
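To ground my intuition about what AUC 0.80 means, here is a minimal, self-contained sketch. It is not the paper's actual model or data: the score distributions below are entirely my own synthetic assumptions, chosen only so the resulting AUC lands near 0.80. AUC is the probability that a randomly chosen positive (FBU) receives a higher classifier score than a randomly chosen negative (NBU).

```python
import random

random.seed(0)

def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs where the
    positive is scored higher (ties count as half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores (my assumption, not the paper's data):
# FBUs tend to score higher than NBUs, but the distributions overlap.
fbu_scores = [random.gauss(0.62, 0.15) for _ in range(500)]  # positives
nbu_scores = [random.gauss(0.45, 0.15) for _ in range(500)]  # negatives

print(round(auc(fbu_scores, nbu_scores), 2))  # roughly 0.8 for this overlap
```

The sketch makes the limitation concrete: even at AUC 0.80, the two score distributions overlap substantially, so any single decision threshold will misclassify a noticeable fraction of users — which is why the authors' suggestion of a human moderator in the loop makes sense.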


Other Questions for consideration:

  1. How will the statistics change if we analyze cases where multiple trolls work together? Are they banned faster? Or can they cover for and support each other in the community, leading to them being banned more slowly?
  2. What happens when you stick a non-antisocial user into a community of antisocial users? Will they adopt the community mindset, or will they leave? How many will stay, how many will leave, and what factors determine which?



Reflection #1 – [08/28] – [Subil Abraham]

1. Donath, Judith S. “Identity and deception in the virtual community.” Communities in cyberspace. Routledge, 2002. 37-68.
2. Bernstein, Michael S., et al. “4chan and/b: An Analysis of Anonymity and Ephemerality in a Large Online Community.” ICWSM. 2011.

The two papers examine how identity is used (and abused) in the online world, but at opposite ends of the spectrum.

"Identity and Deception" discusses the dynamics of users in the Usenet newsgroups, where every post has an associated account name that ties it to a particular identity. This has the benefit that users can build reputation and gain trust in their particular groups over time, but also the disadvantages that it makes impersonation easier and that anonymous posts tend to be looked down upon.

On the flip side, "4chan and /b/" examines how the imageboard 4chan thrives despite the fact that over 90% of its posts are anonymous (which the userbase encourages) and that posts tend to be deleted very quickly as new posts come in (unlike most other places, where data is stored permanently). Even though the board is anonymous, other ways of identifying oneself as being 'in with the crowd' have sprung up, through particular language use and tricks (like the so-called 'triforcing') that mark someone as a true member.

One thing that stood out to me in "Identity and Deception" was the parallels between the identity dynamics in Usenet and those on today's websites. The act of sticking to a single identity to build reputation mirrors what we see today on Reddit and Stack Overflow, but with the explicit addition of a real point system that other users can vote with. This turns gaining reputation from an invisible social practice into a visible, tangible thing provided by the website itself. Category deception when a point system is involved could include not just pretending to be something more than the user actually is, but also vote manipulation (by hacking or vote bots) to inflate the user's virtual reputation and give them an air of legitimacy. The widespread use of LinkedIn seems to be today's analogue of having a personal webpage linked from your signature, especially for someone who identifies as a professional, with both serving to provide a curated view of said professional. Perhaps all this is evidence that humans behave in the same ways even as technologies change and shift over time. Also, I guess this means that trolls will never die off. Oh well!

"4chan and /b/" provides an interesting study of posting behavior in the face of ephemerality and anonymity. But one shouldn't read it and assume that /b/ alone is representative of 4chan as a whole. /b/ is the most popular board, sure, but it is only one among many. Different conclusions could likely be drawn if the authors performed similar analyses on the other boards. Maybe posts on other boards last comparatively longer or shorter (after normalizing for posting activity relative to /b/, so we are not looking at a skewed comparison). For example, the /r9k/ board does not allow reposts, while reposts form a not-insignificant chunk of /b/'s activity. "How will things differ on other boards?" is always an important question to ask.

Having read both papers, a potential future direction would be a study of the identity dynamics and interactions on 4chan, similar to what "Identity and Deception" did for Usenet. I think it could be a fascinating case to see how things change (or don't) on 4chan compared to Usenet. "4chan and /b/" touches on this a little in the later part of the paper, but its main focus is the data analysis of ephemerality and anonymity, and it doesn't deeply examine the dynamics of user interaction.

