Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities”
Summary:
The authors try to automatically detect sockpuppets, which they define as “a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account”. They study data from nine online discussion communities and examine the characteristics of sockpuppets from several perspectives:
- Linguistic traits: what language they tend to use
- Activities and interactions: how sockpuppets communicate and behave with each other and with their communities, and how their communities react to their activities
- Reply network structure: the interaction networks of sockpuppets, studied from a social network analysis perspective
They also identified different types of sockpuppets based on two different criteria:
- Deceptiveness: Pretenders and non-pretenders
- Supportiveness: Supporters and non-supporters
They also built a predictive model to:
- Differentiate pairs of sockpuppets from pairs of ordinary users
- Predict whether an individual user account is a sockpuppet or an ordinary account
Reflection:
The authors did fairly comprehensive work on detecting sockpuppets and classifying accounts as ordinary or sockpuppet accounts.
However, I have a few comments and suggestions on their work:
- I wondered why the discovered sockpuppets almost always appeared in groups of two accounts. I believe this is because the authors set very restrictive constraints when identifying sockpuppets, such as: 1) the accounts must post from the same IP address, or 2) their interactions must fall within a very small time window of 15 minutes for them to be attributed to the same puppetmaster. I would suggest that the authors:
- Remove or relax the IP address constraint in order to catch more sockpuppets that belong to the same group, since in a more realistic scenario a puppetmaster would control more than two accounts (nobody runs an online campaign with only two accounts)
- Increase the time window, since the puppetmaster would need more time to coordinate the interactions among those sockpuppets
- The model needs to be modified in order to generalize to other online discussion platforms such as Facebook and Twitter; as it stands, it is tailored (overfitted) to the Disqus communities. Features from those much larger and more interactive platforms would improve and enrich the model
- As always, I have observations from during and after the Arab Spring, when social media platforms were often used as battlefields between the opposition parties and the old regimes:
- They were used to promote or support figures or parties during the different stages of the Egyptian elections
- They were used to demoralize opponents or the resistance
- They were used to spread rumors and to amplify their effect and persistence simply by repeating them through sockpuppets. Psychologically, when a lie is repeated over and over, it settles in people's memory as a fact (the illusory truth effect)
- People started to recognize sockpuppets and their patterns, and gave them an Arabic name, “لجنه” (committee). During the Arab Spring, a well-known term emerged for a group of sockpuppets that share the same objective and are controlled by the same puppetmaster: “لجنه الكترونيه”, or “electronic battalion/committee” in English
- The authors approached the problem as a classification problem: ordinary versus sockpuppet accounts. I would suggest also addressing it as a clustering problem. That could be achieved by encoding several features (linguistic traits, activities and interactions, ego-networks) into a single objective function that measures the similarity within the discovered communities of sockpuppets. The better the value of this function, the more similar the sockpuppets within each discovered community
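
To make the clustering suggestion more concrete, here is a minimal Python sketch of how it might look. This is not the authors' method. The feature names (`words`, `threads`, `neighbors`), the weights, the threshold, and the toy accounts are all hypothetical and chosen only for illustration. Each feature family is reduced to a set, compared with Jaccard similarity, and combined into one weighted score; accounts whose score passes the threshold are merged with union-find, so a group can grow beyond two accounts.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(u, v, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of three per-feature similarities, each in [0, 1].

    The weights are an assumption, not values taken from the paper.
    """
    w_words, w_threads, w_neighbors = weights
    return (w_words * jaccard(u["words"], v["words"])              # linguistic traits
            + w_threads * jaccard(u["threads"], v["threads"])      # activity
            + w_neighbors * jaccard(u["neighbors"], v["neighbors"]))  # ego-network


def cluster_accounts(accounts, threshold=0.5):
    """Group accounts whose pairwise similarity meets the threshold."""
    parent = {name: name for name in accounts}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(accounts, 2):
        if similarity(accounts[a], accounts[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in accounts:
        groups.setdefault(find(name), set()).add(name)
    # Keep only groups with at least two accounts.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical toy data: three similar accounts and one unrelated account.
accounts = {
    "a1": {"words": {"totally", "agree", "fake"}, "threads": {1, 2, 3}, "neighbors": {"x", "y"}},
    "a2": {"words": {"totally", "agree", "fake"}, "threads": {1, 2, 3}, "neighbors": {"x", "y"}},
    "a3": {"words": {"totally", "agree", "news"}, "threads": {1, 2, 3}, "neighbors": {"x", "y"}},
    "b1": {"words": {"recipe", "garden"}, "threads": {9}, "neighbors": {"z"}},
}

groups = cluster_accounts(accounts)
print(groups)
```

On this toy data the sketch returns a single group containing a1, a2, and a3, while b1 is left out. A real system would need richer features and a principled way to choose the weights and threshold, for example by tuning them on labeled sockpuppet pairs.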