Reflection #4 – [09/06] – [Bipasha Banerjee]

Kumar, Srijan, et al. (2017) – "An Army of Me: Sockpuppets in Online Discussion Communities" – Proceedings of the 26th International Conference on World Wide Web (WWW '17). http://dx.doi.org/10.1145/3038912.3052677

Summary

The authors mainly discuss how sockpuppets (user accounts owned by a person who has at least one other account) engage in online discussions. They found that sockpuppets tend to write differently from normal users in general and use more first-person pronouns. To remove false positives while identifying sockpuppets, IP addresses used by many users were discarded. The authors also introduced the Kmin concept to identify sockpuppet posts that are close in time and similar in length.
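
To make that last idea concrete, here is a minimal sketch of such a time-and-length filter. This is not the paper's actual implementation: the Post record, the 15-minute gap, and the 2x length ratio are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical post record; the fields are illustrative, not from the paper.
@dataclass
class Post:
    user: str
    timestamp: float  # seconds since epoch
    text: str

def close_and_similar(a: Post, b: Post,
                      max_gap_seconds: float = 15 * 60,
                      max_length_ratio: float = 2.0) -> bool:
    """Flag two posts as candidate sockpuppet activity when they are
    close in time and similar in length (assumed thresholds)."""
    close_in_time = abs(a.timestamp - b.timestamp) <= max_gap_seconds
    shorter = max(1, min(len(a.text), len(b.text)))
    similar_length = max(len(a.text), len(b.text)) / shorter <= max_length_ratio
    return close_in_time and similar_length
```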

Sockpuppetry across nine discussion communities was studied, and it was found that these accounts were created early in a user's lifespan, which in turn suggests they were not a consequence of the social interaction the user had in the community. Sockpuppets are more likely to swear and discuss controversial topics, and they use fewer parts of speech. They tend to generate a lot of communication and are treated harshly by the community, often receiving downvotes and at times even being blocked by moderators. Sockpuppets owned by the same puppetmaster tend to contribute similar content, and those working in pairs try to increase each other's popularity.

Reflection

This paper emphasizes distinguishing sockpuppets from normal users and gives us a way to understand sockpuppets in depth. They essentially differ from the antisocial users in online discussion communities discussed by Justin Cheng et al. [1]. Sockpuppets can be both pretenders and non-pretenders, which suggests that a part of the sockpuppet group is not trying to deceive. They created multiple accounts (often with similar usernames) to post differently in varied discussion communities. Antisocial trolls, however, generally create accounts with the negative intention to disrupt. I believe that the non-pretenders, since some have similar usernames, are not trying to hide and are benign when it comes to the intention behind creating a different account.

The most valid and important concept that comes to my mind is a form of digital central authority that can moderate online accounts across the internet. (I know, it sounds a bit too ambitious. But please, hear me out!) India has in recent years introduced the Aadhaar card concept (the Indian take on the SSN). Since last year, the government has been trying to link all mobile accounts, bank accounts, and mobile wallets (Amazon Pay etc.) to the unique Aadhaar number. This would ensure authenticity. However, I would not recommend using the same ID for online purposes as well: once an online account is hacked, a person's identity can easily be stolen. Instead, some kind of online digital signature could be introduced. This is similar to Twitter's and YouTube's verified (blue tick) concept. However, I want to emphasize that it should be central, with the same "digital signature" used across all kinds of social media, discussion platforms, and so on. A central authority would need to govern the generation of this digital signature. The verification could be applied when a person first opens an email account. This way the account is virtually linked to a person, and impersonation, sockpuppetry, etc. would become significantly more difficult.

References

[1] Cheng, Justin et al. (2015) – “Antisocial Behavior in Online Discussion Communities”- Proceedings of the Ninth International AAAI Conference on Web and Social Media (61-70).


Reflection #4 – [09/06] – [Prerna Juneja]

An Army of Me: Sockpuppets in Online Discussion Communities

Summary:

In this paper, the authors study sockpuppetry across nine different communities on Disqus, a commenting social media platform. They study how sockpuppets differ from ordinary users in terms of the language they use, online posting activity, and social network. They discover that sockpuppets write shorter sentences, use more first-person singular pronouns like "I", swear more, participate in controversial topics and existing discussions, and are likely to interact with other sockpuppets. Keeping these findings in mind, the authors build predictive models to differentiate pairs of sockpuppets from pairs of ordinary users.

Reflection:

"Twitter Bots And Sockpuppets Used Trump's Tweets To Mess With Virginia's Governor's Race" [1]. Facebook detected and removed 32 fake accounts and pages from both Facebook and Instagram after identifying coordinated inauthentic behaviour before the midterm elections [2]. Fake accounts have not only affected elections but have also taken lives in some cases [e.g., the death of Megan Meier [3]].

Sockpuppetry has afflicted almost every social media platform, though the intention behind creating sockpuppets may vary. While the primary purpose of creating fake accounts on online discussion platforms might be to sway public opinion by manufacturing consensus, on Quora it might be to gain upvotes, and on Facebook to create a false second identity [the double-life hypothesis]. I've come across multiple instances where people create multiple profiles with incorrect age, gender, marital status, and sexual orientation. I also came across a website that sells Quora downvotes: "https://www.upyourviews.com/buy-quora-downvotes/". I wonder if the operating model of such companies involves the creation of several bots or sockpuppet accounts! Also, I can't think of any motivation behind the existence of sockpuppet accounts on websites like 4chan, where users are anonymous and content is ephemeral.

I wonder if online communities are willing to share user data along with IP traces like Disqus did. As the authors mentioned, the availability of ground-truth data is also a big issue in this research; it is very difficult to build classifiers that give high accuracy without such data. The data used by the authors will also miss users who use different physical machines to access different accounts. I believe that when sockpuppetry happens at a larger, professional scale, the miscreants will have the infrastructure to deploy multiple computers for multiple accounts. How do we detect sockpuppets then?

Detection of sockpuppets is also only one problem: how do we stop the creation of such accounts? By tying accounts to some unique social identity like Aadhaar or SSN? For example, Aadhaar verification is mandatory on multiple matrimonial sites in India [4].

One of the author’s observation is that sockpuppets have a high page rank than ordinary users. They didnt justify or elaborate on this. Can this be contributed to their account’s high posting activity and larger network?

The authors say, "Non-pretenders on the other hand have similar display names and this may implicitly signal to the community that they are controlled by the same individual, and thus may be less likely to be malicious." The first part of the statement will only be true if both sockpuppet accounts share the same network.

The authors divide the sockpuppets into three groups: supporters, non-supporters, and dissenters, with the majority being non-supporters. While the roles of supporters and dissenters are clear, since they are the two extremes, I am not sure how non-supporters behave. A few examples from this category could have made it clearer.

The authors restricted their study to sockpuppet groups of size 2, since the majority of groups contained that number of sockpuppets. Studying groups with three or more sockpuppets might lead to interesting discoveries.

[1] https://www.huffingtonpost.com/entry/twitter-bots-sockpuppets-trump-virginia_us_5a01039de4b0368a4e869817

[2] https://www.cnbc.com/2018/07/31/facebook-has-detected-attempts-to-interfere-in-mid-terms.html

[3] https://en.wikipedia.org/wiki/Sockpuppet_(Internet)

[4] http://www.forbesindia.com/article/special/aadhaar-latest-tool-against-fake-matrimonial-profiles/50215/1


Reflection #4 – [09/06] – [Dhruva Sahasrabudhe]

Paper-

An Army of Me: Sockpuppets in Online Discussion Communities – Kumar et al.

Summary-

This paper attempts to identify, and analyze the behavioral and linguistic patterns of, "sockpuppet" accounts, i.e., groups of multiple accounts controlled by a single user, the puppetmaster. It collects data from 9 online communities and conducts an extensive statistical analysis to infer the characteristics of sockpuppet accounts, such as the way they phrase messages (e.g., they use "I", "we", "you", etc. more, write shorter comments, and have smaller vocabularies). The authors show that sockpuppet account pairs have similar types of usage, disproving the hypothesis that one account may be like an ordinary user's account while the other is a malicious account. They find that sockpuppets can be supporters (in agreement), dissenters (disagreeing with each other), or may not be malicious at all. They use a random-forest classifier to predict sockpuppets, achieving fairly good results.
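
As a rough illustration of that final step, the sketch below trains a random-forest classifier and scores it with ROC AUC using scikit-learn. The features and labels here are synthetic placeholders, not the paper's actual activity and linguistic features, so this shows only the shape of the pipeline, not its results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data: one row per account pair, with made-up columns standing in
# for features like posting-rate difference or reply-network similarity.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))        # stand-in feature matrix
y = rng.integers(0, 2, 1000)     # 1 = sockpuppet pair, 0 = ordinary pair

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# With random features this hovers around 0.5; real features drive it higher.
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```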

Reflection-

Firstly, the study combines multiple very distinct communities for data-gathering purposes. The communities range from Breitbart News to IGN. We know from previous papers that behavior depends to some extent on the type of community, so it would be worth examining the differences in the results obtained from community to community. It is interesting to note that the communities themselves have different levels of sockpuppeting (e.g., Breitbart News has a disproportionately high number, almost 2.5 times that of CNN when adjusting for the number of users).

This paper reminds me of work previously discussed in class, especially the paper on antisocial behavior in online communities, due to the similar nature of the data collection and data-driven analysis. This paper has some very interesting ideas for collecting statistics and testing hypotheses (e.g., using entropy to convey usage information, and finding which users are non-malignant using the Levenshtein distance between sockpuppet usernames). It has some results similar to the study on antisocial behavior (e.g., sockpuppets make a large number of posts compared to normal users, just like antisocial users). It does, however, make the interesting find that the readability index (ARI) for sockpuppets is about as high as for normal users, as opposed to the result found for antisocial users.
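
For concreteness, here is a self-contained sketch of that Levenshtein measure applied to display names. The usernames and the idea of thresholding the distance are my own illustration, not the paper's exact procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Very similar display names may signal a non-pretending sockpuppet pair.
print(levenshtein("alice_smith", "alice_smith2"))   # 1 -> near-identical names
print(levenshtein("alice_smith", "trollhunter99"))  # large -> unrelated names
```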

The study also finds that sockpuppet pairs behave similarly to each other. This brings up the question of what kind of users have more of a tendency to create malicious sockpuppets? Maybe the type of activity and behavior seen in the sockpuppets can be traced to a superset of users (of which puppet masters are only one category), and is a characteristic of that superset. Maybe there are even more similarities with other types of users, like trolls. This is worth investigating.

This paper focuses on pairwise sockpuppeting, and its techniques for finding a sockpuppet group for data collection rely crucially on the IP address. This technique is effective for studying "casual" sockpuppeting, where a user is just making multiple accounts to browse different topics or upvote their own answers, which is fairly harmless when done in an uncoordinated manner by many individual users. These techniques would fail when trying to detect or gather data about coordinated, deliberate attacks that propagate misinformation or a personal agenda through sockpuppeting, which is the truly dangerous strain of such behavior. For example, if someone were to hire people to create multiple accounts and spread a false consensus or misinformation, the people doing this could access the website through multiple IP addresses and conceal the source. The paper also focuses on pairs of accounts, not on a huge mass of accounts being controlled by the same user.

Reflection #4 – [09/06] – [Subhash Holla H S]

Paper: S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.

Summary: The goal of the paper is the online activity analysis of "a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account." In it, the authors identify, characterize, and predict the behavior of sockpuppetry. The adopted definition of sockpuppets differs from the one usually understood at the mention of the word. Whether a pair of accounts constitutes sockpuppets is methodically established by:

  • First, identifying them using the IP address, the time signature of the comments, and the discussions posted in. This was limited to discussions with at least 3 recurring posts (a toy sketch of this identification rule follows the list below).
  • Second, characterizing them using hypothesis testing to infer that sockpuppets do not lead double lives. Linguistic traits helped differentiate them from normal users, showing that they mostly use first- and second-person singular personal pronouns. The activity analysis of these sockpuppets led to the conclusions that they start fewer discussions, participate in controversial topics, are treated harshly by the community, and have a lot of mutual interaction.
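
Here is a minimal sketch of the identification rule described in the first step, assuming hypothetical post tuples and a 15-minute co-posting window; the exact thresholds and data layout in the paper may differ.

```python
from collections import defaultdict
from itertools import combinations

WINDOW = 15 * 60  # assumed 15-minute co-posting window, in seconds

def candidate_pairs(posts, window=WINDOW, min_events=3):
    """posts: iterable of (user, ip, discussion_id, unix_timestamp).
    Flag pairs of accounts that post from the same IP in the same
    discussion within `window` seconds of each other at least
    `min_events` times (a toy version of the rule above)."""
    by_key = defaultdict(list)
    for user, ip, disc, ts in posts:
        by_key[(ip, disc)].append((ts, user))
    hits = defaultdict(int)
    for events in by_key.values():
        events.sort()
        for (t1, u1), (t2, u2) in combinations(events, 2):
            if u1 != u2 and t2 - t1 <= window:
                hits[frozenset((u1, u2))] += 1
    return {tuple(sorted(pair)) for pair, n in hits.items() if n >= min_events}
```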

Reflections: The past few readings have been probing similar areas of social computing platforms. They have been trying to answer a security-based question: ideally, all platforms will want to know the origins of each user, their behavior pattern, and how to predict future use patterns. This paper essentially introduces another possible concern in the same area. While a lot of research (which is becoming more and more apparent with the readings) addresses the problem from a statistical standpoint, the question that popped into my head is whether this can be viewed from another viewpoint. Maybe we need to wear a different hat to get some new information. My answer came from my home base of Human Factors. I wish to give three possible viewpoints that a combination of Human Factors and Human-Computer Interaction advocates for:

  • Ontology: This is a generalized map of the behavior patterns one would display if one had a set of traits. In the case of sockpuppeteers, this would essentially mean that we generalize them into categories and learn their behavior model to predict the behavior of future sockpuppeteers. This could help in the automated filtering of fake accounts, probing into non-human sockpuppets that help spread misinformation, etc. For this, we would first need to build a persona of the common sockpuppeteer and then draw conclusions based on that.
  • Work Domain Analysis: The social computing platform can be considered a work domain with the task being to post information. Since there is no normative, or "one best way," to analyze it, we can take a "formative" approach similar to Kim J. Vicente's in his book on Cognitive Work Analysis. This could help us understand the different strategies sockpuppeteers could use, the social organization and cooperation they have, as well as their competencies.
  • Social Network Theory: The use of social network theory can help identify the string of sockpuppets that a user could potentially be using. This could prove to be a useful tool to find the root of a group of accounts. This could also help understand the interaction patterns of these accounts giving valuable insight to build the behavioral model of such individuals.

Another area where I have a few burning questions after reading this paper, and which I am hoping to get some insight into, is trolling.

  1. Who is a troll?
  2. How is a troll different from a sockpuppet?
  3. Can one become the other?
  4. Do they ever interact?
  5. What is their relationship?

I am hoping to get a better understanding with more reading on the same topic. I think it will be interesting to study the above-mentioned interactions.


Reflection #4 – [09/06] – [Parth Vora]

[1] Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary
This paper analyzes various aspects of sockpuppets and their behavior in online discussion communities through the study of a dataset of 62,744,175 posts, along with the users and discussions within them. The authors identify and highlight various traits of sockpuppets and sockpuppet pairs and then compare them with ordinary users to gain insight into how sockpuppets operate. They also propose two models: one to distinguish sockpuppets from regular users, and the other to identify pairs of sockpuppets in the online community.

Reflection
Widespread access to the internet and the ease with which one can create an account online have encouraged many individuals to create sockpuppets and use them to deceive and drive online consensus. The scale of online communities, which results in delayed repercussions for deceptive behavior, has only encouraged such individuals. If one has ever used social media, the chances are that one has been followed or mentioned by a suspicious-looking account. Facebook and Twitter have millions of such fake sockpuppet accounts, and there are entire industries built around this. The paper brings to light some fascinating facts about the dynamics of sockpuppets in online communities and how we can identify and handle them. However, it leaves some questions unanswered.

Sockpuppetry, coupled with the power of spambots and astroturfing, has become a powerful tool for organizations, and in some cases states, to manipulate public opinion and spread misinformation. One can come up with a system to flag such users and fake posts, but when people are operating at such a high level of expertise, can such a system actually work? Even if we ban such accounts, it barely takes a few minutes to create a new account and come back online; how do we deal with this?

Twitter has an anti-spam measure where it takes hashtags out of the trending section if the content of the tweets is irrelevant. While it sounds like a good measure, consider a scenario where an actual topic of concern is buried because sockpuppets flood Twitter with spam content over critical trending hashtags. The mechanism used to defeat spam is then itself burying essential topics. How can we guarantee that such systems will adequately serve the purpose they are designed for? Also, in large-scale social media settings, do sockpuppets actually exist in pairs?

Not only do sockpuppets create a disturbance, they also develop a sense of doubt among ordinary users. Although people have grown accustomed to nonsense-speaking accounts on social media, there has been a significant shift in trust in content published online: the credibility of fake news has increased, while that of genuine news has decreased. This is very prevalent in the Indian political sphere. Operating under the guise of "IT cells" (party-sponsored organizations responsible for online campaigns), these groups use sockpuppets masquerading as influential people to draw attention away from essential topics. Follow the comment threads [Example 1][Example 2].

From the technical point of view, the models could be improved by using "Empath" [1] instead of LIWC. Empath is built using word-embedding models like word2vec and GloVe and has a larger lexicon than LIWC. One problem with using unigram-based features is that the model fails to capture the underlying meaning of the sentence; for example, to the model there is no difference between the sentences "the suspect was killed by the woman" and "the woman was killed by the suspect." Studies have also shown that deep-learning-based models perform significantly better than standard machine learning models, especially in text/image classification [2]. Such complex models with advanced feature sets could be considered for effective labeling of posts.
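
As a rough sketch of how Empath could slot in, assuming the empath Python package is installed (pip install empath); the example post is invented, and the chosen category names should be checked against Empath's built-in list.

```python
from empath import Empath

lexicon = Empath()

# Score a post against a few of Empath's built-in categories.
# normalize=True returns per-word rates instead of raw counts.
post = "You people are all liars, I saw the whole thing myself."
scores = lexicon.analyze(post,
                         categories=["swearing_terms", "anger", "negative_emotion"],
                         normalize=True)
print(scores)
```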

In conclusion, although the paper highlights essential features for detecting sockpuppets and proposes a model to identify them, sockpuppets have evolved to be more sophisticated and technology-backed. One must think of an efficient way to stop them at the source rather than filter them after the damage is done.

References
[1] Fast, Ethan, Binbin Chen, and Michael S. Bernstein. “Empath: Understanding topic signals in large-scale text.” Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016.
[2] Bengio, Yoshua, and Yann LeCun. “Scaling learning algorithms towards AI.” Large-scale kernel machines 34.5 (2007): 1-41.


Reflection #4 – [09/06] – [Deepika Rama Subramanian]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.”

SUMMARY & REFLECTION

This paper deals with the identification of sockpuppets and groups of sockpuppets. It defines sockpuppets simply as multiple accounts controlled by a single user; the authors don't assume that this is always done with malicious intent. The study uses nine different online discussion communities with varied interests. The study also helped identify types of sockpuppetry based on deceptiveness (pretenders vs. non-pretenders) and supportiveness (supporters vs. dissenters).

In order to identify sockpuppets, the authors used four factors: the posts should come from the same IP address, be in the same discussion, be similar in length, and be posted close together in time. However, they eliminated the top 5% of users who posted from heavily shared IP addresses, which could have come from behind a nationwide proxy. If a puppetmaster were backed by an influential group able to set up such a proxy in order to propagate false information, these cases would be eliminated right up front. Is it possible that the most incriminating evidence is being excluded?

Further, the linguistic traits that were identified and considered in this study were largely those used in the previous discussions about antisocial behaviour in online communities. Even the frequency of posting of sockpuppets versus ordinary users and the fact that they participate in more discussions than they start make these user accounts similar to trolls.

In pairs/groups, sockpuppets tend to interact with one another more than with any other user in terms of replies or upvotes. The paper states at the beginning that it is harder to find the first sockpuppet account, and that once one is found, the pair or the group is easily identified. Cheng et al., in their paper 'Antisocial Behaviour in Online Communities', have already described a model that would be able to weed out antisocial users early in their lives. Once they have been identified, we could apply the non-linguistic criteria outlined in this paper to identify the rest of the sockpuppets.

A restrictive way of solving the issue of sockpuppet accounts could be to have users tie their discussion board accounts not only to an email ID but also to a phone number. The process of obtaining a phone number is longer and involves submitting documentation that can firmly tie the account to the puppetmaster. This would discourage multiple accounts being spawned by the same individual.

The author’s classification of the sockpuppets gives us some insight into the motives of the puppetmasters. While the supporters are in the majority, they don’t seem to have much credibility, most supporters being pretenders. However, how could puppetmasters use dissent effectively to spread consensus on issues they are concerned about? One way could be to have their sockpuppets disagree with one other until the dissenter gets ‘convinced’ by the opinion of the puppetmaster. Ofcourse, this would require longer posts that are uncharacteristic of sockpuppets in general. So why do people jump through such hoops when they are highly likely to be flagged by the community over time? I wonder if the work in sockpuppets is a sort of introductory work on spambots because a human puppetmaster could hardly wreak the same havoc that bots could in online platforms.


Reflection #4 – [09/06] – [Lindah Kotut]

Reading Reflection:

  • Kumar, S., Cheng, J., Leskovec, J., and Subrahmanian, V.S. "An Army of Me: Sockpuppets in Online Discussion Communities"

Brief:
The authors make the case that anonymity encourages deception via sockpuppets, and so propose a means of identifying, characterizing, and predicting sockpuppetry using user IP (if there are at least 3 posts from the same IP) and user session data (if posts occur within 15 minutes). Some of the characteristics found to be attributable to sockpuppets include the use of more first-person pronouns, fewer negations, and fewer English parts of speech (worse writing than the average user). They also found sockpuppets to be responsible for starting fewer conversations but participating in more replies in the same discussion than can be attributed to random chance. They were also more likely to be downvoted, reported, and/or deleted by moderators, and tended to have a higher PageRank and a higher local clustering coefficient.

The authors also note some concerns regarding the use of sockpuppets in discussion communities: notably, the potential for showing a false equivalence, and for acts of vandalism and/or cheating.

Reflection:
What happens when deceptive sockpuppets are capable of usurping and undermining the will of the majority? I do not have a good example of a case where this is true on social media (separate from the battle of bots during the 2016 U.S. election cycle), but there are ample cases where this could be examined: the FCC request for comment during the net neutrality debate in 2017/2018 and the saga of Boaty McBoatface serve as placeholder cautionary tales, for there was no do-over to correct for sockpuppets, especially in the case of the FCC. This is a concern because this phenomenon can erode the very fabric upon which trust in the democratic process is built (beyond the fact that some of these events happened over two years ago with no recourse/remedies applied to date). A follow-up open question would be: what then would replace the eroded system? If there is no satisfactory answer to this, then maybe we should have some urgency in shoring up the systems. How then do we mitigate sockpuppetry, apart from using human moderators to moderate and/or flag suspected accounts? A hypothetical solution that uses the characteristics pointed out by the authors to automate the identification and/or suspension of suspected accounts is not sufficient as a measure in itself.

The authors, in giving an example of an exchange between two sockpuppets and of a user who identifies the sockpuppet as such, reveal the presence and power of user skepticism. How many users are truly fooled by these sockpuppets, beyond finding them a nuisance? A simple way to gauge this would be to recruit users to determine whether certain discussions can be attributed to regular users or to sockpuppets. This consideration can lead down the path of measuring for over-correction:

  • Does pervasive knowledge of the presence of these sockpuppets lead to users doubting even legitimate discussions (and to what extent is this prevalent)?

This paper's major contribution is in looking at sockpuppets in discussions/replies (so this point is not to detract from that contribution). On the matter of the (mis)use of pseudonyms, uses range from the benign, such as Reddit's "throwaway account", used when a regular user wants to discuss a controversial topic that they do not want to associate with their regular account, to the extreme end of a journalist using one to "hide" their activities in alt-right community discussions.

  • Can these discussions be merged, or does the fact that they do not strictly adhere to the authors' definition disqualify them? (I believe it is worth considering why users resort to sockpuppets beyond faking consensus/discussion and sowing discord.)

A final point regards positive(ish) uses. A shopkeeper with a new shop who wants customers can loudly hawk their wares in front of their shop to attract attention: which is to say, could we consider positive use cases of this behavior, or do we categorize it as all bad? A forum could attract shy contributors and spark a debate by using friendly sockpuppetry to get things going. Is that ethical?


Reflection #4 – [09/06] – [Neelma Bhatti]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary

Kumar et al. focus this paper on the identification, characterization, and prediction of sockpuppets in online discussion communities. They define sockpuppets as extra account(s) created and managed by a single user for influencing or manipulating public opinion, igniting debates, or vandalizing content (in the case of Wikipedia). They characterize sockpuppets as pretenders vs. non-pretenders and supporters vs. dissenters based on their linguistic traits, online activity, and reply network structure.

Reflection

Although this study nicely situates itself in the body of work currently done in the domain of deception, I felt that it does not establish a very strong objective for being carried out.

It would also be interesting to see if a sockpuppet account, or a pair of them, is operated by more than one person interchangeably, which not only makes the concept of one puppetmaster imprecise but also weakens statistics computed with a single user in mind. The hypothesis of puppetmasters leading double lives reminded me of Facebook, where spouses access each other's accounts without any problem, sometimes simply to peek into the content of ladies-only groups, or to comment and react on posts just for fun. Although very different from the topic under discussion, this raises the question of whether a study of the online behavior of such individuals would produce accurate results, given the multiple users associated with a single account.

The authors also used IP as a means to cluster different sockpuppets. I was wondering whether users logging in to the social platform through proxy servers would be easy to identify using the same approach. What if the puppetmaster uses both sockpuppets and bots to steer the discussion? In such cases, the detection system could be made more robust by incorporating mechanisms that consider not only linguistic traits and activity but also the amount of customization in the user profile and geographical metadata [1]. This would not only help detect sockpuppets but also distinguish bots from sockpuppets.

The authors also rightly point out that a study of behavior or personality traits would add another dimension to this research. The reasons for having more than one identity online can go beyond sadism; it can also be a product of sheer boredom or done for the sake of bragging in front of friends. A puppetmaster can also create multiple identities to avenge a previous ban.

[1] Bessi, A., & Ferrara, E. (2016). Social bots distort the 2016 US Presidential election online discussion.


Reflection #4 – [09/06] – [Vibhav Nanda]

Reading:

[1] An Army of Me: Sockpuppets in Online Discussion Communities

Summary:

The authors of this paper devote their energy to discussing sockpuppets in online discussion communities. To comprehensively study sockpuppets and their associated online behavior, the authors obtained data from nine different online discussion communities consisting of 2,129,355 discussions, 2,897,847 users, and 62,744,175 posts. They then identified sockpuppets by using a combination of three elements: the IP address, the activity in the discussion post, and the time at which the comment(s) were made. Using this combination of factors, they were able to formally define sockpuppets — "a user account that posts from the same IP address in the same discussion in close temporal proximity at least 3 times." Utilizing this formal definition and an analytical model, the authors identified 1,623 sockpuppet groups and 3,656 sockpuppets across the nine online discussion communities. The project yielded a plethora of intuitive but interesting results, including but not limited to the following:

  1. Sockpuppets start fewer discussions and post more in existing discussions.
  2. Sockpuppets tend to participate in discussions with more controversial topics.
  3. Sockpuppets are treated harshly by the community.
  4. Sockpuppets in a pair interact with each other more.

Reflection and Questions:

I had really never thought about this area of research, and hence this reading ensnared my attention and interest. Howbeit, as I read through the paper, it seemed more focused on the pretenders and less on the non-pretenders, and that is reflected in the way they defined sockpuppets — which is totally fine, but in my opinion the authors should have mentioned this focus somewhere in the introduction. Since I didn't find much material on non-pretenders, I started thinking about how I would define sockpuppets with respect to non-pretenders. Assuming complete access to a user's profile, I would start by correlating the user's basic information: for instance their birthday, secret questions, name (in some cases), small variations in username, family information (if available), and contact information. Since non-pretenders do not masquerade and simply use different accounts for different use cases, I would assume that they have no reason to manipulate their basic information — unless the platform prevents them from doing so.

Whilst reading the paper, I started to contemplate what the emboldening factor behind puppetmasters could be. The only reason I could think of was the motivation to push their (or their sponsors') political and ideological agenda, or to dilute an opponent's agenda. Howbeit, in both cases I would assume that puppetmasters would be more articulate in their writing so as to effectively sway the audience in either direction, so the results of this paper — that sockpuppets write shorter sentences with more swear words and use more personal pronouns — were counterintuitive to me.

As I was reading through the fifth section of the paper, it occurred to me to ask how long these accounts have been active, and how frequently a supposed puppetmaster creates new accounts. I am not sure yet what new things we might discover by seeking answers to these questions, but I think they are interesting. Another correlation I thought strongly about was checking whether sockpuppets are recycled among different puppetmasters/groups. If we find this to be true, and do some analysis of the topics these sockpuppets try to propagate or demolish support for, then we can group the groups according to their affiliations; and if we add a spatial aspect to these groups of groups, we may be able to identify what kinds of ideologies are widespread in which parts of the world. We might also be able to find out whether a group is trying to propagate its own ideology or demolish another region's. For instance, if a group from country X is spreading hate towards topic Y, but topic Y is in fact appreciated in country X, then we know that this group is demolishing an ideology in a different region; the opposite holds where topic Y is hated in country X.


Reflection #4 – [09/06] – [Shruti Phadke]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary:

Sockpuppets, in reference to this paper, are online discussion accounts belonging to the same user, referred to here as the "puppetmaster". Kumar et al.'s work studying sockpuppets in online communities observes the posting behavior, interactions, and linguistic characteristics of sockpuppets, finally leading to a predictive model. They observe that sockpuppets tend to comment more, write shorter posts, use more personal pronouns such as "I", and are more likely to interact with each other. Finally, in the predictive task the authors find that activity, community, and post features are most relevant to detecting sockpuppets, with 0.68 AUC. Here are some thoughts on the data collection, method, and impact of this work:

Why is this research even important? Even though this paper has excellent technical and analytical aspects, I believe that there should have been some more stress on why sockpuppetry is harmful in the first place.

“In 2011, a California company called Ntrepid was awarded a $2.76 million contract from US Central Command for “online persona management” operations[42] to create “fake online personas to influence net conversations and spread US propaganda” in Arabic, Persian, Urdu and Pashto.” (Wikipedia)

I found some more reasons which I think are important to situate this research in terms of community betterment:

  1. Bypassing the ban on the account by creating another account (mentioned in the paper)
  2. Sockpuppeting during an online poll to submit multiple votes in favor of the puppeteer.
  3. Endorsing a product by writing multiple good reviews
  4. Enforcing public opinion about a policy or candidate by sheer numbers

How to build a better ground truth? One obvious point of contention with this paper is the way the data is collected and labeled as sockpuppet accounts: there is no solid validation of whether the selected accounts are actually sockpuppets. The authors mention that they used conservative filters while selecting the sockpuppet accounts, but that also means they might have missed significant true positives. So what can be done to build a better ground truth?

  1. Building a strong "anti-ground truth". There are performance comparisons between sockpuppets and ordinary users throughout the paper. If the sampled list of ordinary accounts were vetted more strongly (if there were a stronger "anti" group), the comparisons would have been more telling. One way to do this is to collect accounts that posted from different IPs or locations at the exact same time (see the sketch after this list).
  2. Asking the discussion groups for known sockpuppets. Even though this seems harder, it could form a very strong ground truth and validation point.
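
A minimal sketch of the first suggestion, assuming hypothetical (user, ip, timestamp) records; the 60-second window and the quadratic scan are illustrative choices, not a vetted procedure.

```python
from itertools import combinations

def vetted_ordinary_pairs(posts, window=60):
    """posts: iterable of (user, ip, unix_timestamp).
    Return pairs of accounts seen posting from *different* IPs within
    `window` seconds of each other; such pairs are unlikely to be one
    puppetmaster, so they can serve as a stronger negative class."""
    events = sorted((ts, user, ip) for user, ip, ts in posts)
    pairs = set()
    for (t1, u1, ip1), (t2, u2, ip2) in combinations(events, 2):
        if t2 - t1 <= window and u1 != u2 and ip1 != ip2:
            pairs.add(tuple(sorted((u1, u2))))
    return pairs
```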

Lastly, there are several comparisons between pairs of sockpuppets and pairs of ordinary users. I am not sure whether the ordinary users' measure was a normalized aggregate of all pairwise ordinary measures. In any case, instead of comparing the sockpuppet pair activity with generic pairwise activity, it would be better to compare it with pairs of ordinary users that have some probability of interaction (e.g., same discussion, location, posting time, etc.). Also, while comparing pretenders and non-pretenders, it would be beneficial to have a comparison with ordinary users as a ground-truth measure.

In the discussion, the authors claim that not all sockpuppets are malicious. Further research can be focused on finding characteristics of only malicious sockpuppets or online deception “artists”!
