- S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.
Discourse has been one of the modes through which we humans have used our collective knowledge to further our individual understanding of the world. These conversations have helped us look at the world through the lens of another, an outlook we may or may not agree with. Over the past few decades, this discourse has moved online. Although this move has opened up pulpits to the masses and given them opportunities to express their opinions, it has created a few issues of its own. One of the issues plaguing these discussion forums, owing to how identity functions in online settings, is sock puppeteering. Although variations of it likely existed in the past, the scale and reach of online discussion forums make the dangers more profound.
In this paper, the authors try to understand the behaviour and characteristics of these “sockpuppets” and use the findings to perform two tasks: first, to differentiate pairs of sockpuppets from pairs of ordinary users, and second, to predict whether an individual account is a sockpuppet. The dataset comes from nine communities on the online commenting platform Disqus. The authors identify sockpuppet accounts using a couple of factors, such as IP addresses, length of posts, and time between posts.
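To make the identification step concrete, here is a minimal sketch of how candidate sockpuppet pairs might be flagged from shared IP addresses and posting times. It is not the authors' exact procedure: the record layout, the 15-minute window, and the minimum number of co-occurrences are all assumptions chosen for illustration, and the post-length factor is left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical post records: (account, ip, discussion_id, unix_timestamp).
def candidate_pairs(posts, window_seconds=900, min_cooccurrences=2):
    """Return account pairs that repeatedly post from the same IP, in the
    same discussion, within `window_seconds` of each other.
    The thresholds are illustrative assumptions, not values from the paper."""
    by_key = defaultdict(list)
    for account, ip, discussion, ts in posts:
        by_key[(ip, discussion)].append((ts, account))

    counts = defaultdict(int)
    for entries in by_key.values():
        close_pairs = set()
        for (t1, a1), (t2, a2) in combinations(sorted(entries), 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                close_pairs.add(tuple(sorted((a1, a2))))
        # Count each pair at most once per (IP, discussion) group.
        for pair in close_pairs:
            counts[pair] += 1

    return sorted(p for p, n in counts.items() if n >= min_cooccurrences)

posts = [
    ("alice", "10.0.0.1", "d1", 100),
    ("alt_alice", "10.0.0.1", "d1", 400),
    ("alice", "10.0.0.1", "d2", 5000),
    ("alt_alice", "10.0.0.1", "d2", 5300),
    ("bob", "10.0.0.2", "d1", 200),
]
pairs = candidate_pairs(posts)
```

Requiring several co-occurrences rather than one is a design choice meant to reduce false positives from shared networks such as households or offices.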
As the authors indicate, writing style seems to persist across accounts because the content is written by the same person. To use this as a feature, the authors rely on LIWC and ARI. Even though these are shown to be effective here, I believe they could be replaced by better-quality vectors that look not only at vocabulary but also at semantic and structural elements, such as how sentences are constructed, to identify the “master.” Building feature vectors in this fashion would, I believe, help identify these actors more robustly.
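For reference, and assuming ARI here means the Automated Readability Index, the score depends only on surface counts (characters per word and words per sentence), which is why it says little about how sentences are constructed. The sketch below uses the standard ARI formula; the whitespace tokenizer and the punctuation-based sentence splitter are simplifying assumptions.

```python
import re

def automated_readability_index(text):
    """Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43.
    Tokenization below is a rough approximation for illustration only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Keep whitespace-separated tokens that contain at least one letter or digit.
    words = [w for w in text.split() if any(c.isalnum() for c in w)]
    if not sentences or not words:
        return 0.0
    # Count only letters and digits as characters.
    chars = sum(c.isalnum() for w in words for c in w)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

score = automated_readability_index("The cat sat. It ran.")
```

Two accounts with similar ARI values over many posts would be consistent with a shared author, but the score alone cannot distinguish authors who simply write at the same reading level.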
Once the master is identified, it would be interesting to analyze the characteristics of the puppet accounts. Given that some accounts might elicit more responses than others, it would be worthwhile to study how they manage to do so. One could look for a temporal aspect: identify the best time to post in order to get a response, and see how these actors optimize their strategies to achieve it.
One could also look into the behaviors of these sock puppeteers that warrant bans from the moderators of these online communities. These features could then be recorded and given as guidelines for identifying such accounts. How long these observations remain valid is a different question altogether.
Given that some communities with human moderators have been addressing this issue using “mod bans”, one could try to build supervised models for identifying sockpuppet accounts, using the ban information as ground-truth labels.
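As a rough sketch of the labeling step, ban records could be joined with per-account features to produce training rows. Everything here is hypothetical: the ban-log fields, the "sockpuppet" reason string, and the feature names are assumptions, since real moderation logs vary by community and may not record a reason at all.

```python
# Hypothetical ban log entries: {"account": str, "reason": str}.
# Hypothetical features per account: any dict of numeric values.
def build_training_rows(account_features, ban_log, positive_reason="sockpuppet"):
    """Pair each account's features with a 0/1 label derived from mod bans.
    Only bans whose reason matches `positive_reason` count as positives, so
    that bans for unrelated behavior do not pollute the labels."""
    positives = {b["account"] for b in ban_log if b["reason"] == positive_reason}
    return [
        (account, features, 1 if account in positives else 0)
        for account, features in sorted(account_features.items())
    ]

account_features = {
    "alice": {"avg_post_length": 42.0, "posts_per_day": 3.1},
    "alt_alice": {"avg_post_length": 40.5, "posts_per_day": 2.8},
    "bob": {"avg_post_length": 120.0, "posts_per_day": 0.4},
}
ban_log = [
    {"account": "alt_alice", "reason": "sockpuppet"},
    {"account": "bob", "reason": "spam"},
]
rows = build_training_rows(account_features, ban_log)
```

The resulting rows could feed any standard classifier. Accounts that were never caught will still be labeled 0, so the negatives are noisy and any evaluation should keep that in mind.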
On a different note, it would be worthwhile to see how these actors respond to such bans. Do they create more accounts immediately, or do they wait it out? This is an interesting question that could certainly be looked into.
Given that uncovering or profiling the identity of users is the way forward to counteract sock puppeteering, this raises a valid concern for users whose identity needs to be kept under wraps for legitimate reasons. Given that even allegations about a particular person have led to violence being directed at them, how can one ensure these people are protected? This is one issue that the authors' method, which uses IP addresses to identify sock puppetry, needs to address. How can one differentiate users with legitimate reasons for creating multiple identities online from those without?