Paper-
An army of me: Sockpuppets in online discussion communities. - Kumar, et al.
Summary-
This paper identifies “sockpuppets”, i.e. groups of multiple accounts controlled by a single user (the puppet master), and analyzes their behavioral and linguistic patterns. It collects data from 9 online communities and conducts an extensive statistical analysis to infer the characteristics of sockpuppet accounts, such as the way they phrase messages (e.g. they use I, we, you, etc. more, write shorter comments, have smaller vocabularies, etc.). The authors show that the accounts in a sockpuppet pair have similar usage patterns, which argues against the hypothesis that one account behaves like an ordinary user’s account while the other is the malicious one. They find that sockpuppets can be supporters (in agreement with each other), dissenters (disagreeing with each other), or may not be malicious at all. They use a random-forest classifier to predict sockpuppets, achieving fairly good results.
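To make the linguistic traits mentioned above concrete, here is a minimal sketch of how pronoun rate, comment length, and vocabulary size could be computed per account. The pronoun list, the tokenizer, and the function name `linguistic_features` are assumptions made for illustration; they are not the paper's lexicon or feature set.

```python
import re

# Hypothetical pronoun list chosen for illustration; not the paper's lexicon.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your"}


def linguistic_features(comments):
    """Compute simple per-account features from a list of comment strings."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens_per_comment = [re.findall(r"[a-z']+", c.lower()) for c in comments]
    all_tokens = [t for tokens in tokens_per_comment for t in tokens]
    total = len(all_tokens)
    if total == 0:
        return {"pronoun_rate": 0.0, "mean_comment_length": 0.0, "vocabulary_size": 0}
    return {
        # Fraction of tokens that are first- or second-person pronouns.
        "pronoun_rate": sum(t in PRONOUNS for t in all_tokens) / total,
        # Average number of tokens per comment.
        "mean_comment_length": total / len(comments),
        # Number of distinct tokens used by the account.
        "vocabulary_size": len(set(all_tokens)),
    }
```

Features like these could then be fed to a classifier such as a random forest, though the paper's actual feature set is broader than this sketch.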
Reflection-
Firstly, the study combines several very distinct communities for data gathering purposes. The communities range from Breitbart News to IGN. We know from previous papers that behavior depends to some extent on the type of community, so it would be worth examining how the results differ from community to community. It is interesting to note that the communities themselves have different levels of sockpuppeting (e.g. Breitbart News has a disproportionately high number, almost 2.5 times that of CNN when adjusting for number of users).
This paper reminds me of work previously discussed in class, especially the paper on anti-social behavior in online communities, because of the similar data collection and data-driven analysis. It has some very interesting ideas for collecting statistics and testing hypotheses (e.g. using entropy to convey usage information, and using the Levenshtein distance between sockpuppet usernames to find which users are non-malicious). Some of its results are similar to those of the study on anti-social behavior (e.g. sockpuppets make a large number of posts compared to normal users, just like anti-social users). However, it makes the interesting finding that the readability index (ARI) for sockpuppets is about as high as for normal users, unlike what was found for anti-social users.
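The three measures mentioned above can be sketched in a few lines each. This is a minimal illustration using the standard textbook definitions (edit distance by dynamic programming, Shannon entropy in bits, and the usual ARI formula). The tokenization and sentence splitting are rough assumptions, and nothing here is meant to reproduce the paper's exact computation.

```python
import math
import re
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings, e.g. two usernames."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def shannon_entropy(items):
    """Entropy (in bits) of the empirical distribution of items,
    e.g. the discussions in which an account posts."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def ari(text):
    """Automated Readability Index with naive word and sentence splitting."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

A small username distance (say, between two hypothetical names such as `john_smith` and `john_smith2`) suggests that the user is not trying to hide the link between the accounts, which is the intuition behind using it to spot non-malicious pairs.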
The study also finds that sockpuppet pairs behave similarly to each other. This raises the question of what kind of users are more inclined to create malicious sockpuppets. Maybe the type of activity and behavior seen in the sockpuppets can be traced to a superset of users (of which puppet masters are only one category), and is a characteristic of that superset. Maybe there are even more similarities with other types of users, like trolls. This is worth investigating.
This paper focuses on pairwise sockpuppeting, and its technique for finding sockpuppet groups for data collection relies crucially on the IP address. This technique is effective for studying “casual” sockpuppeting, where a user is just making multiple accounts to browse different topics or upvote their own answers, but such behavior is fairly harmless when done in an uncoordinated manner by many individual users. The technique would fail when trying to detect or gather data about coordinated, deliberate attacks that propagate misinformation or a personal agenda through sockpuppeting, which is the truly dangerous strain of such behavior. For example, if someone were to hire people to create multiple accounts and spread a false consensus or misinformation, the people doing this could access the website through multiple IP addresses and conceal the source. The paper also focuses on pairs of accounts, and not on a huge mass of accounts being controlled by the same user.
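The limitation described above can be illustrated with a deliberately simplified stand-in for IP-based grouping. The sketch below assumes a hypothetical log of `(account, ip)` records and flags every pair of accounts that ever shared an IP address. It is not the paper's actual identification procedure, only an illustration of why a puppet master who uses a separate IP for each account would go unnoticed.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records):
    """Return the set of account pairs that share at least one IP address.

    records: iterable of (account, ip) tuples (a hypothetical log format).
    """
    accounts_by_ip = defaultdict(set)
    for account, ip in records:
        accounts_by_ip[ip].add(account)

    pairs = set()
    for accounts in accounts_by_ip.values():
        # Sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(accounts), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical records: the "casual" pair shares an IP, while the
# coordinated pair posts from two unrelated addresses.
records = [
    ("casual_main", "192.0.2.10"),
    ("casual_alt", "192.0.2.10"),
    ("hired_one", "198.51.100.7"),
    ("hired_two", "203.0.113.42"),
]
found = candidate_pairs(records)
```

Here only the casual pair is flagged; the two hired accounts never share an address, so an IP-based grouping cannot link them.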