Reflection #4 – [09/06] – [Karim Youssef]

The growth of online discussion communities has made them central venues for exchanging opinions, information, and knowledge, giving them a significant role in shaping the views of many of their users. With little ability to verify either content or identity, an ordinary user is easily exposed to identity deception and misinformation. One challenge to a healthy online discussion community is sockpuppetry, where multiple “virtual” users are controlled by a single actual user. Sockpuppetry can be used for deception, or for manufacturing a fake public consensus to manipulate opinion about an event or a person.

In their work “An Army of Me: Sockpuppets in Online Discussion Communities,” Kumar et al. conduct a valuable study that analyzes the behavior of sockpuppets in online communities. They take a data-driven approach, analyzing data from nine different online discussion communities, and examine the posting activity and linguistic characteristics of content posted by sockpuppet accounts. They then identify different types of sockpuppets based on deceptive behavior and/or supportive behavior towards other sockpuppets. Finally, they use these analyses to build predictive models that aim to determine whether an account is a sockpuppet and whether a pair of accounts forms a sockpuppet pair.

My reflection on this study could be summarized in the following points:

  1. This study helps to build a clear definition of a sockpuppet and highlights some of the motivations behind creating such online identities. There is a wide variety of motivations behind sockpuppetry, some of them benign, but it is especially important to understand the malicious ones.
  2. Activity traces of sockpuppets are highly predictive, while community reactions towards them are less so. For other types of antisocial behavior, the opposite holds: community-reaction features were more predictive, as shown by Cheng et al. in “Antisocial Behavior in Online Discussion Communities.” This contrast could convey multiple signals. It may be that sockpuppets are hard for the surrounding community to detect, or that users are more alert to antisocial content than to a particular activity pattern. In other words, a community tends to react negatively when a sockpuppet posts clearly undesirable content, but not when a strange activity pattern occurs, unless it is blatantly suspicious.
  3. Sockpuppets tend to work in groups. Although the study shows that most identified sockpuppets work in pairs, there could be other activity patterns associated with larger groups of sockpuppet accounts that are worth studying.
  4. As with other data mining problems in online communities, it still seems hard to develop an automated system that could reliably replace human moderators. However, automatically raising a flag on suspicious content makes moderators’ lives easier and helps contain the spread of any type of undesirable behavior more quickly (a toy version of such a flagging model is sketched below).
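
To make the last point concrete, here is a minimal sketch of the kind of activity-feature flagging model I have in mind. It is only an illustration under stated assumptions: the feature names are hypothetical, the data is random noise, and this is not the authors’ actual model or feature set.

```python
# Toy sketch of an activity-based flagging model (not the authors' model).
# Feature names are hypothetical; real values would come from posting logs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-account features:
# [posts_per_day, fraction_of_replies, mean_minutes_between_posts, discussions_started]
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)  # placeholder labels: 1 = sockpuppet, 0 = ordinary

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC on placeholder data: {scores.mean():.2f}")  # ~0.5, as expected for noise
```

With real labeled activity traces in place of the random arrays, flagged accounts could be routed to human moderators rather than banned automatically.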

This study stimulated my interest in several directions. I would be interested in studying the role of sockpuppets in historical events such as the Arab Spring or the 2016 US elections. I would also be interested in studying how effective sockpuppets are at shaping the opinions of ordinary users, or at spreading misinformation and making it widely accepted by users of online communities.

There are also multiple directions in which to extend the current study. Among them are studying the activity patterns of larger sockpuppet groups, analyzing the content posted by sockpuppets more deeply (beyond their activity patterns) to derive more content-based features, and comparing the activity patterns and content features of sockpuppets across different types of online communities.

 


Reflection #4 – [09/06] – [Eslam Hussein]

Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities”

 

Summary:

The authors of this work try to automatically detect sockpuppets, which they define as “a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account”. They study data from nine different online discussion communities and address the features of sockpuppets from different perspectives:

– Linguistic traits: what language they tend to use

– Activities and Interactions: how sockpuppets communicate and behave with each other and with their communities, and how their communities react to their activities.

– Reply network structure: studying the interaction networks of sockpuppets from a social network analysis perspective

They also identified different types of sockpuppets based on two different criteria:

  1. Deceptiveness: Pretenders and non-pretenders
  2. Supportiveness: Supporters and non-supporters

They also built a predictive model to:

  1. Differentiate pairs of sockpuppets from pairs of ordinary users
  2. Predict whether an individual user account is a sockpuppet or ordinary one

 

Reflection:

The authors did fairly comprehensive work on the problem of detecting sockpuppets and classifying accounts as ordinary or sockpuppet accounts.

But I have a few comments/suggestions on their work:

  • I wondered why the discovered sockpuppets almost always appeared in groups of two accounts; I believe that is because the authors set very restrictive constraints when identifying sockpuppets, such as: 1) the accounts must have posted from the same IP address, and 2) within a very small time window of 15 minutes between their interactions, in order to be identified as sockpuppets played by the same puppetmaster. I would suggest that the authors:
    • Remove or relax the IP address constraint in order to catch more sockpuppets that belong to the same group, since a more realistic scenario would have a puppetmaster controlling more than two accounts (nobody forms an online campaign of only two accounts)
    • Increase the time window, since a puppetmaster would need more time to synchronize the interactions among those sockpuppets
  • The model needs to be modified in order to generalize to other online discussion communities such as Facebook and Twitter; it is tailored/overfitted to the Disqus communities. Features from those much larger and more interactive platforms would likely improve and enrich this model
  • As always, I have observations from during and after the Arab Spring, when social media platforms were often used as battlefields between opposition parties and the old regimes:
    • They have been used to promote or support figures or parties during the different stages of the Egyptian elections.
    • They were used to demoralize opponents or the resistance.
    • They were used to spread rumors and amplify their effect and permanence simply by repeating/spreading them through sockpuppets. Psychologically, when a lie is repeated over and over, it settles in people’s memory as fact, and vice versa (the illusory truth effect).
    • People started to recognize sockpuppets and their patterns and gave them an Arabic name: a group of sockpuppets sharing the same objective and controlled by the same puppetmaster came to be called a “لجنه الكترونيه” (roughly, an electronic battalion/committee), a term that became widely known during the Arab Spring.
  • The authors approached the problem as a classification task (ordinary vs. sockpuppet accounts). I would suggest also addressing it as a clustering problem, by encoding several feature groups (linguistic traits, activities and interactions, ego-networks) into one objective function that represents the similarity of the discovered sockpuppet communities: the better this objective, the more internally similar the discovered groups (a toy version is sketched below).
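
To illustrate my clustering suggestion, here is a minimal sketch under stated assumptions: each account is encoded by concatenating hypothetical linguistic, activity, and ego-network features, and the silhouette score stands in for the objective function. This is my illustration, not anything from the paper.

```python
# Toy sketch of the clustering suggestion: encode each account as one
# feature vector (linguistic + activity + ego-network features concatenated),
# then cluster and score cohesion. All numbers here are placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
linguistic = rng.random((200, 5))  # e.g., pronoun rate, swearing rate, ARI
activity = rng.random((200, 3))    # e.g., posts per day, reply fraction
network = rng.random((200, 4))     # e.g., ego-network degree, clustering coefficient

X = StandardScaler().fit_transform(np.hstack([linguistic, activity, network]))
labels = AgglomerativeClustering(n_clusters=10).fit_predict(X)

# Silhouette score as a stand-in for the objective function: higher means
# the discovered groups are internally more similar.
print(silhouette_score(X, labels))
```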


Reflection #4 – [09/06] – [Viral Pasad]

Paper:

An Army of Me: Sockpuppets in Online Discussion Communities. Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian.

 

Summary:

The paper deals with the analysis of “a user account that is controlled by an individual (or puppet master) who controls at least one other user account.” The authors analyze various aspects of sock puppets and their behavior over nine online discussion communities, using a dataset of 62,744,175 posts and studying the users along with the discussions they take part in. They discuss how sock puppets are often found in pairs, assuming the roles of primary and secondary, or supporter and dissenter.

 

Reflection:

  • The authors broadly define a sock puppet as a user account that is controlled by an individual (or puppet master) who controls at least one other user account. However, I prefer the traditional definition of the word: “a false online identity that is used for the purposes of deceiving others.”
  • Furthermore, it would be wise to highlight that sock puppets are often part of paid partnerships with companies to push their products; more often than not, they also take part in affiliate marketing, where they sell products and earn commissions.

Not only that, these “stealth influencers” could also potentially sway public opinion on a political issue/candidate.

  • Another interesting point about paired sock puppets that I pondered was the dissenting good-cop/bad-cop roles they might play: one disagrees with or puts down a product/feature, at which point the primary sock puppet can swoop in and make the same product shine by highlighting its pros (the very ones intentionally questioned by the secondary sock puppet). This is a dynamic between paired sock puppets that I would want to investigate.
  • Another metric worth investigating is the language/linguistic cues used by sock puppets to market products. Typical marketing campaigns keep jargon to a bare minimum for the lay consumer (e.g., “10x faster”, “2.5x lighter”). Sock puppets, by contrast, while using impartial terms to seem unbiased and neutral, could also be using more jargon to come across as domain experts and intimidate a user into thinking they really know the technicalities of the product (a toy jargon-density measure is sketched after this list).
  • Furthermore, I know how difficult it is to obtain clean and complete datasets, but the Disqus dataset barely contains data about products and purchases. The metrics used in the paper, plus a few others, applied to an Amazon reviews or eBay comments dataset, would yield a great amount of knowledge about sock puppets and their behavior.
  • Another point worth considering about sock puppets living a dual life is their activity in their ordinary and fake accounts. A genuine user would have a legitimate profile history and personal data such as friend lists and interests beyond the one topic being discussed in the post comments.
  • Another question worth asking concerns false positives and false negatives: how would one verify the results of such a system?
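
To make the jargon point above more concrete, here is a rough sketch of a jargon-density cue. The domain-term list is a made-up placeholder; a real study would need a curated lexicon for each product category.

```python
# Rough sketch of a jargon-density cue. The domain-term list is a made-up
# placeholder; a real study would need a curated per-domain lexicon.
JARGON = {"latency", "throughput", "soc", "thermals", "nits", "ips"}

def jargon_density(text: str) -> float:
    """Fraction of tokens that are domain jargon."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if not tokens:
        return 0.0
    return sum(t in JARGON for t in tokens) / len(tokens)

print(jargon_density("Great phone, the SoC keeps thermals low and latency is tiny"))
print(jargon_density("I just love this phone, it works great for me"))
```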


Reflection #4 – [09/06] – [Nitin Nair]

  1. S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.

Discourse has been one of the modes through which we humans use our collective knowledge to further our individual understanding of the world. These conversations help us look at the world through the lens of another, an outlook we may or may not agree with. In the past few decades this discourse has moved to the online world. While this movement opened up pulpits to the masses, giving them opportunities to express their opinions, it has created a few issues of its own. One issue that plagues these discussion forums, owing to how identity functions in online settings, is sockpuppetry. Although variations of it have surely existed in the past, the scale and reach of online discussion forums make the dangers more profound.

The authors of this paper try to understand the behaviour and characteristics of these “sockpuppets” and use their findings for two tasks: first, to differentiate pairs of sockpuppets from pairs of ordinary users, and second, to predict whether an individual account is a sockpuppet. They use data obtained from nine communities on the online commenting platform Disqus as their dataset, and classify an account as part of a sockpuppet group using a couple of factors, such as IP addresses, length of posts, and time between posts.

As the authors note, writing style persists across accounts because the content is written by the same person. To use this as a feature, they rely on LIWC and ARI which, even though shown to be effective here, could I believe be improved by higher-quality vectors that look not only at the vocabulary but also take into account semantic and structural elements, like the construction of sentences, to identify the “master.” Building feature vectors in this fashion would, I believe, help identify these actors in a more robust manner (a small illustration follows).
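
As a small illustration of this suggestion (and explicitly not the paper’s LIWC/ARI pipeline), character n-gram profiles capture punctuation habits and some sentence-construction cues beyond raw vocabulary. The example posts are invented.

```python
# Sketch of an authorship feature that looks past raw vocabulary: character
# n-grams pick up punctuation, morphology, and some sentence-construction
# habits. This illustrates the suggestion; it is not the paper's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts_a = ["I really think this is wrong, honestly.", "Honestly, who believes this?"]
posts_b = ["The committee reviewed the evidence in detail."]

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform([" ".join(posts_a), " ".join(posts_b)])
print(cosine_similarity(X[0], X[1]))  # low similarity suggests different authors
```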

Once the master is identified, it would be interesting to analyze the characteristics of the puppet accounts. Given that some accounts elicit more responses than others, it would be a worthwhile study to see how they manage to do so. One could also look for a temporal aspect: identify the best time to probe in order to get a response, and how these actors optimize their response strategies to achieve it.

One could also look into the behaviors of these sockpuppeteers that warrant bans from the moderators of these online communities. The identified features could be recorded and provided as guidelines for spotting the same behavior, though how long such observations remain valid is another question altogether.

Given that some communities with human moderators have been addressing this particular issue using “mod bans”, one could try to create supervised models for identifying sockpuppeteering accounts using this ban information as the ground truth or label.

Also, on a different note, it would be a worthwhile pursuit to see how these actors respond to such bans. Do they create more accounts immediately, or do they wait it out? An interesting thought that is worth looking into.

Given that uncovering or profiling the identity of users is the way forward to counteract sockpuppetry, it is a valid concern for users whose identities need to be kept under wraps for legitimate reasons. Given that even allegations about a particular person have led to violence being directed at them, how can one ensure these people are protected? This is one issue that the method described by the authors, which uses IP addresses to identify sockpuppetry, needs to address. How can one differentiate users with legitimate reasons for creating multiple online identities from those without?


Reflection #4 – [09/06] – [Bipasha Banerjee]

Kumar, Srijan, et al. (2017) – “An Army of Me: Sockpuppets in Online Discussion Communities” – Proceedings of the International World Wide Web Conference Committee (IW3C2). ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3038912.3052677

Summary

The authors discuss how sockpuppets (user accounts owned by a person who controls at least one other account) engage in online discussions. They find that sockpuppets tend to write differently from normal users in general, for instance using more first-person pronouns. To remove false positives while identifying sockpuppets, IP addresses shared by very many users were discarded. The authors also introduce the Kmin concept to identify sockpuppet posts that are close in time and similar in length.

Sockpuppetry across nine discussion communities was studied, and it was found that these accounts are created early in a user’s lifespan, which in turn suggests they are not a consequence of the social interactions the user had in the community. Sockpuppets are more likely to swear and discuss controversial topics, and they use fewer parts of speech. They tend to generate a lot of communication and are treated harshly by the community, often receiving downvotes and at times being blocked by moderators. Sockpuppets owned by the same puppetmaster tend to contribute similar content, and those working in pairs try to increase each other’s popularity.

Reflection

This paper emphasizes distinguishing sockpuppets from normal users and gives us a way to understand sockpuppets in depth. They differ in essence from the antisocial users in discussion communities discussed by Justin Cheng et al. [1]. Sockpuppets can be both pretenders and non-pretenders, which suggests that part of the sockpuppet group is not trying to deceive: they create multiple accounts (often with similar usernames) to post differently in varied discussion communities. Antisocial trolls, by contrast, generally create accounts with the negative intention to disrupt. I believe that the non-pretenders, since some have similar usernames, are not trying to hide and are benign in their intentions for creating a different account.

The most interesting idea that comes to my mind is a form of central digital authority that could moderate online accounts across the internet. (I know, it sounds a bit too ambitious. But please, hear me out!) India has introduced in recent years the Aadhaar card (the Indian take on the SSN). Since last year, the government has been trying to link all mobile accounts, bank accounts, and mobile wallets (Amazon Pay, etc.) to the unique Aadhaar number, which would ensure authenticity. However, I would not recommend using the same ID for online purposes as well: once an online account is hacked, a person’s identity can easily be stolen. Instead, some kind of online digital signature could be introduced. This resembles Twitter’s and YouTube’s verified (blue tick) concept, but I want to emphasize its being central, with the same “digital signature” used across all kinds of social media, discussion boards, and so on. A central authority would govern the generation of this digital signature, and verification could be applied when a person first opens an email account. This way the account is linked virtually to a person, and impersonation, sockpuppetry, etc. would become significantly more difficult.

References

[1] Cheng, Justin et al. (2015) – “Antisocial Behavior in Online Discussion Communities”- Proceedings of the Ninth International AAAI Conference on Web and Social Media (61-70).

Read More

Reflection #4 – [09/06] – [Prerna Juneja]

An Army of Me: Sockpuppets in Online Discussion Communities

Summary:

In this paper the authors study sockpuppetry across nine different communities on Disqus, a commenting social media platform. They study how sockpuppets differ from ordinary users in terms of the language they use, their online posting activity, and their social networks. They discover that sockpuppets write shorter sentences, use more singular pronouns like “I”, swear more, participate in controversial topics and existing discussions, and are likely to interact with other sockpuppets. Keeping these findings in mind, the authors build predictive models to differentiate pairs of sockpuppets from pairs of ordinary users.

Reflection:

“Twitter Bots And Sockpuppets Used Trump’s Tweets To Mess With Virginia’s Governor’s Race” [1]. Facebook detected and removed 32 fake accounts and pages from both Facebook and Instagram after identifying coordinated inauthentic behaviour before the midterm elections [2]. Fake accounts have not only affected elections but have in some cases taken lives [e.g., the death of Megan Meier [3]].

Sockpuppetry has afflicted almost every social media platform, though the intention behind creating sockpuppets varies. While the primary purpose of creating fake accounts on online discussion platforms might be to sway public opinion by manufacturing consensus, on Quora it might be to gain upvotes, and on Facebook to create a false second identity [the double-life hypothesis]. I have come across multiple instances where people create multiple profiles with incorrect age, gender, marital status, and sexual orientation. I also came across a website that sells Quora downvotes (https://www.upyourviews.com/buy-quora-downvotes/); I wonder if the operating model of such companies involves creating several bots or sockpuppet accounts! Also, I cannot think of any motivation for sockpuppet accounts on websites like 4chan, where users are anonymous and content is ephemeral.

I wonder if online communities are willing to share user data along with IP traces like Disqus did. As the authors mention, the availability of ground-truth data is also a big issue in this research; it is very difficult to build classifiers that achieve high accuracy without such data. Also, the data used by the authors will miss users who use different physical machines to access different accounts. I believe that when sockpuppetry happens at a larger, professional scale, the miscreants will have the infrastructure to deploy multiple computers for multiple accounts. How do we detect sockpuppets then?

Detecting sockpuppets is one problem; how do we stop the creation of such accounts in the first place? Perhaps by tying accounts to some unique social identity like Aadhaar or SSN? For example, Aadhaar verification is mandatory on multiple matrimonial sites in India [4].

One of the authors’ observations is that sockpuppets have a higher PageRank than ordinary users, but they did not justify or elaborate on this. Could it be attributed to the accounts’ high posting activity and larger networks? (A toy reply-network example illustrating this intuition is sketched below.)
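
The toy reply network below illustrates why this intuition is plausible. The edge convention (replier points to the account being replied to) and the example graph are my assumptions, not the paper’s construction.

```python
# Minimal check of the PageRank observation on a toy reply network.
# Edge direction: replier -> account replied to (one reasonable convention;
# the paper's exact network construction may differ).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("sock1", "sock2"), ("sock2", "sock1"),  # mutual replies between puppets
    ("user1", "sock1"), ("user2", "sock1"),  # community replies drawn by high activity
    ("user1", "user3"),
])
pr = nx.pagerank(G)
print(sorted(pr.items(), key=lambda kv: -kv[1]))  # the sock accounts rank near the top
```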

The authors say, “Non-pretenders on the other hand have similar display names and this may implicitly signal to the community that they are controlled by the same individual, and thus may be less likely to be malicious.” The first part of the statement will only hold if both sockpuppet accounts are visible on the same network.

The authors divide sockpuppets into three groups: supporters, non-supporters, and dissenters, with the majority being non-supporters. While the roles of supporters and dissenters are clear, since they are the two extremes, I am not sure how non-supporters behave; a few examples from this category would have made it clearer.

The authors restricted their study to sockpuppet groups of size 2, since the majority of groups contained that number of sockpuppets. Studying groups with three or more sockpuppets might lead to interesting discoveries.

[1] https://www.huffingtonpost.com/entry/twitter-bots-sockpuppets-trump-virginia_us_5a01039de4b0368a4e869817

[2] https://www.cnbc.com/2018/07/31/facebook-has-detected-attempts-to-interfere-in-mid-terms.html

[3] https://en.wikipedia.org/wiki/Sockpuppet_(Internet)

[4] http://www.forbesindia.com/article/special/aadhaar-latest-tool-against-fake-matrimonial-profiles/50215/1


Reflection #4 – [09/06] – [Dhruva Sahasrabudhe]

Paper-

An Army of Me: Sockpuppets in Online Discussion Communities – Kumar, et al.

Summary-

This paper attempts to identify “sockpuppets”, i.e., groups of multiple accounts controlled by a single user (the puppetmaster), and to analyze their behavioral and linguistic patterns. It collects data from nine online communities and conducts an extensive statistical analysis to infer the characteristics of sockpuppet accounts, such as the way they phrase messages (e.g., they use “I”, “we”, “you”, etc. more, write shorter comments, and have smaller vocabularies). The authors show that sockpuppet account pairs have similar types of usage, disproving the hypothesis that one account behaves like an ordinary user’s account while the other is malicious. They find that sockpuppets can be supporters (in agreement), dissenters (disagreeing with each other), or not malicious at all. They use a random-forest classifier to predict sockpuppets, achieving fairly good results.

Reflection-

Firstly, the study combines multiple very distinct communities for data-gathering purposes, ranging from Breitbart News to IGN. We know from previous papers that behavior depends to some extent on the type of community, so it would be worth examining the differences in results from community to community. It is interesting to note that the communities themselves have different levels of sockpuppetry (e.g., Breitbart News has a disproportionately high number, almost 2.5 times that of CNN after adjusting for the number of users).

This paper reminds me of work previously discussed in class, especially the paper on antisocial behavior in online communities, due to the similar nature of the data collection and data-driven analysis. This paper has some very interesting ideas for collecting statistics and testing hypotheses (e.g., using entropy to convey usage information, and using the Levenshtein distance between sockpuppet usernames to find non-malignant users; a toy version of the latter is sketched below). It has some results similar to the study on antisocial behavior (e.g., sockpuppets make a large number of posts compared to normal users, just like antisocial users). It does, however, make the interesting finding that the readability index (ARI) for sockpuppets is about as high as for normal users, in contrast to the result found for antisocial users.
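
Here is a toy version of the username-similarity idea, with a dependency-free Levenshtein implementation. The example names, and any threshold for what counts as “similar enough,” are my assumptions rather than the paper’s exact procedure.

```python
# Sketch of the username-similarity signal: a small edit distance between
# display names is treated as a hint of a non-pretending pair. Pure-Python
# Levenshtein, so no external dependency is assumed.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(levenshtein("john_doe", "john_doe2"))  # 1: plausibly a non-pretender pair
print(levenshtein("john_doe", "sarah1990"))  # large: no such signal
```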

The study also finds that sockpuppet pairs behave similarly to each other. This raises the question of which kinds of users have a greater tendency to create malicious sockpuppets. Maybe the type of activity and behavior seen in sockpuppets can be traced to a superset of users (of which puppetmasters are only one category) and is a characteristic of that superset. Maybe there are even more similarities with other types of users, like trolls. This is worth investigating.

This paper focuses on pairwise sockpuppetry, and its technique for finding a sockpuppet group for data collection relies crucially on the IP address. This technique is effective for studying “casual” sockpuppetry, where a user just makes multiple accounts to browse different topics or upvote their own answers, which is fairly harmless when done in an uncoordinated manner by many individual users. These techniques would fail when trying to detect, or gather data about, coordinated, deliberate attacks that propagate misinformation or a personal agenda through sockpuppetry, which is the truly dangerous strain of this behavior. For example, if someone were to hire people to create multiple accounts and spread a false consensus or misinformation, those people could access the website through multiple IP addresses and conceal the source. The paper also focuses on pairs of accounts, not on a huge mass of accounts controlled by the same user.


Reflection #4 – [09/06] – [Subhash Holla H S]

Paper: S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.

Summary: The goal of the paper is to analyze the online activity of “a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account.” In it, the authors identify, characterize, and predict the behavior of sockpuppetry. The adopted definition of a sockpuppet is broader than the one usually understood at the mention of the word. Whether a pair of accounts constitutes sockpuppets is methodically established by:

  • First, identifying them using the IP address, the time signature of the comments, and the discussions posted in, limited to discussions with at least 3 recurring posts (a toy version of this style of filter is sketched after this list).
  • Second, characterizing them using hypothesis testing to infer that sockpuppets do not lead double lives. Linguistic traits help differentiate them from normal users, showing that they mostly use first- and second-person singular personal pronouns. Analyzing their activity leads to the conclusions that they start fewer discussions, participate in controversial topics, are treated harshly by the community, and interact with each other a great deal.
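
Here is the toy version of the first step. The 15-minute window follows the paper’s description, but the data layout and field names are assumptions made for illustration.

```python
# Hedged sketch of the identification-style filter: pair up posts made from
# the same IP, in the same discussion, within a short time window. Field
# names and the example records are assumptions for illustration.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)

posts = [
    {"user": "a1", "ip": "1.2.3.4", "disc": 7, "t": datetime(2017, 1, 1, 12, 0)},
    {"user": "a2", "ip": "1.2.3.4", "disc": 7, "t": datetime(2017, 1, 1, 12, 9)},
    {"user": "b1", "ip": "5.6.7.8", "disc": 7, "t": datetime(2017, 1, 1, 12, 10)},
]

candidate_pairs = {
    (p["user"], q["user"])
    for p in posts for q in posts
    if p["user"] < q["user"] and p["ip"] == q["ip"]
    and p["disc"] == q["disc"] and abs(p["t"] - q["t"]) <= WINDOW
}
print(candidate_pairs)  # {('a1', 'a2')}
```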

Reflections: The past few readings have probed similar areas of social computing platforms, trying to answer a security-based question: ideally, every platform wants to know the origins of each user, their behavior patterns, and how to predict their future use patterns. This paper essentially introduces another possible concern in the same area. While a lot of research (as is becoming more and more apparent with the readings) addresses the problem from a statistical standpoint, the question that popped into my head is whether it can be viewed from another viewpoint. Maybe we need to wear a different hat to get some new information. My answer came from my home base of Human Factors. I wish to offer three possible viewpoints that a combination of Human Factors and Human-Computer Interaction advocates for:

  • Ontology: a generalized map of the behavior patterns one would display given a set of traits. In the case of sockpuppeteers, this would essentially mean generalizing them into categories and learning their behavioral models to predict the behavior of future sockpuppeteers. This could help in the automated filtering of fake accounts, probing into non-human sockpuppets that help spread misinformation, etc. For this, we would first need to build a persona of the common sockpuppeteer and then draw conclusions based on it.
  • Work Domain Analysis: the social computing platform can be considered a work domain in which the task is to post information. Since there is no normative “one best way” to analyze it, we can take a “formative” approach, similar to Kim J. Vicente in his book on Cognitive Work Analysis. This could help us understand the different strategies sockpuppeteers could use, their social organization and cooperation, and their competencies.
  • Social Network Theory: social network theory can help identify the string of sockpuppets that a user could potentially be using. This could prove a useful tool for finding the root of a group of accounts, and could also help in understanding the interaction patterns of these accounts, giving valuable insight for building a behavioral model of such individuals.

Another area where I have a few burning questions after reading this paper, and where I hope to gain some insight, is trolling.

  1. Who is a troll?
  2. How is a troll different from a sockpuppet?
  3. Can one become the other?
  4. Do they ever interact?
  5. What is their relationship?

I hope to get a better understanding with more reading on the topic. I think it will be interesting to study the above-mentioned interactions.


Reflection #4 – [09/06] – [Parth Vora]

[1] Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary
This paper analyzes various aspects of sockpuppets and their behavior in online discussion communities through the study of a dataset of 62,744,175 posts, examining both the users and the discussions among them. The authors identify and highlight various traits of sockpuppets and sockpuppet pairs, and compare them with ordinary users to gain insight into how sockpuppets operate. They also propose two models: one to distinguish sockpuppets from regular users, and another to identify pairs of sockpuppets in the online community.

Reflection
Widespread access to the internet and the ease with which one can create an account online have encouraged many individuals to create sockpuppets and use them to deceive and drive online consensus. The scale of online communities, which results in delayed repercussions for deceptive behavior, has only encouraged them further. If one has ever used social media, the chances are that one has been followed or mentioned by a suspicious-looking account. Facebook and Twitter have millions of such fake sockpuppet accounts, and entire industries are built around them. The paper brings to light some fascinating facts about the dynamics of sockpuppets in online communities and how we can identify and handle them. However, it leaves some questions unanswered.

Sockpuppetry, coupled with the power of spambots and astroturfing, has become a powerful tool for organizations, and in some cases states, to manipulate public opinion and spread misinformation. One can come up with a system to flag such users and fake posts, but when people are operating at such a high level of expertise, can such a system actually work? Even if we ban such accounts, it barely takes a few minutes to create a new account and come back online; how do we deal with this?

Twitter has an anti-spam measure whereby it removes hashtags from the trending section if the content of the tweets is irrelevant. While it sounds like a good measure, consider a scenario where an actual topic of concern is buried because sockpuppets flood Twitter with spam content over critical trending hashtags. The mechanism used to defeat spam is then itself burying essential topics. How can we guarantee that such systems will adequately serve the purpose they are designed for? Also, in large-scale social media settings, do sockpuppets actually exist in pairs?

Sockpuppets not only create a disturbance; they also sow a sense of doubt among ordinary users. Although people have grown accustomed to nonsense-spouting accounts on social media, there has been a significant shift in trust in content published online: the credibility of fake news has increased, while the credibility of genuine news has decreased. This is very prevalent in the Indian political sphere. Operating under the banner of “IT cells” (party-sponsored organizations responsible for online campaigns), these groups use sockpuppets masquerading as influential people to draw attention away from essential topics. Follow the comment threads [Example 1][Example 2].

From a technical point of view, the models could be improved by using “Empath” [1] instead of LIWC. Empath is built using word embedding models like word2vec and GloVe and has a larger lexicon than LIWC. One problem with unigram-based features is that the model fails to capture the underlying meaning of a sentence: for the model, there is no difference between “the suspect was killed by the woman” and “the woman was killed by the suspect” (a quick demonstration is given below). Studies have also shown that deep learning based models perform significantly better than standard machine learning models, especially in text/image classification [2]. Such complex models with advanced feature sets could be considered for effective labeling of posts.
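
The following snippet demonstrates this unigram blindness using the two sentences above: they receive identical bag-of-words vectors, so no unigram model can tell them apart.

```python
# Quick demonstration of the unigram limitation: the two sentences have
# opposite meanings but identical bag-of-words representations.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

s1 = "the suspect was killed by the woman"
s2 = "the woman was killed by the suspect"
X = CountVectorizer().fit_transform([s1, s2]).toarray()
print(np.array_equal(X[0], X[1]))  # True: a unigram model cannot tell them apart
```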

In conclusion, although the paper highlights essential features for detecting sockpuppets and proposes a model to identify them, sockpuppets have evolved to be more sophisticated and technologically backed. One must think of an efficient way to stop them at the source rather than filter them after the damage is done.

References
[1] Fast, Ethan, Binbin Chen, and Michael S. Bernstein. “Empath: Understanding topic signals in large-scale text.” Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016.
[2] Bengio, Yoshua, and Yann LeCun. “Scaling learning algorithms towards AI.” Large-scale kernel machines 34.5 (2007): 1-41.


Reflection #4 – [09/06] – [Deepika Rama Subramanian]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.”

SUMMARY & REFLECTION

This paper deals with the identification of sockpuppets and groups of sockpuppets. It defines sockpuppets simply as multiple accounts controlled by a single user; the authors do not assume that this is always done with malicious intent. The study uses nine different online discussion communities with varied interests, and also identifies types of sockpuppetry based on deceptiveness (pretenders vs. non-pretenders) and supportiveness (supporters vs. dissenters).

To identify sockpuppets, the authors used four factors: the posts should come from the same IP address, appear in the same discussion, be similar in length, and be posted close together in time. However, they eliminated the top 5% of users posting from shared IP addresses, which could have come from behind a nationwide proxy. If a puppetmaster were backed by an influential group able to set up such a proxy in order to propagate false information, these cases would be eliminated right up front. Is it possible that the most incriminating evidence is being excluded?

Further, the linguistic traits identified and considered in this study are largely those used in our previous discussions of antisocial behaviour in online communities. The posting frequency of sockpuppets versus ordinary users, and the fact that they participate in more discussions than they start, also make these accounts resemble trolls.

In pairs/groups, sockpuppets tend to interact with one another more than with any other user, in terms of replies and upvotes. The paper states at the beginning that it is harder to find the first sockpuppet account; once one is found, the pair or group is easily identified. Cheng et al., in their paper “Antisocial Behaviour in Online Communities,” have already described a model that can weed out antisocial users early in their lives. Once such users are identified, we could apply the non-linguistic criteria outlined in this paper to identify the rest of the sockpuppets.

A restrictive way of addressing the issue of sockpuppet accounts could be to have users tie their discussion board accounts not only to an email ID but also to a phone number. The process of obtaining a phone number is longer and involves submitting documentation, which ties the account firmly to the puppetmaster. This would discourage multiple accounts being spawned by the same individual.

The authors’ classification of sockpuppets gives us some insight into the motives of puppetmasters. While supporters are in the majority, they do not seem to have much credibility, most of them being pretenders. How, then, could puppetmasters use dissent effectively to spread consensus on issues they care about? One way could be to have their sockpuppets disagree with one another until the dissenter gets “convinced” by the puppetmaster’s opinion. Of course, this would require longer posts, which are uncharacteristic of sockpuppets in general. So why do people jump through such hoops when they are highly likely to be flagged by the community over time? I wonder if this work on sockpuppets is a sort of introduction to work on spambots, since a human puppetmaster could hardly wreak the same havoc that bots can on online platforms.
