Reflection #5 – [09/11] – [Prerna Juneja]

Paper 1: Exposure to ideologically diverse news and opinion on Facebook

Summary:

In this paper the authors claim that our tendency to associate with like-minded people traps us in echo chambers; the central premise is that “like attracts like”. The authors conduct a study on a data set of 10.1 million active U.S. users who have reported their ideological affiliation and 7 million distinct URLs shared by them. They find that individuals are less likely to click on cross-cutting content than on ideologically consistent content, with a reduction of roughly 17% for conservatives and 6% for liberals. After algorithmic ranking, there is less cross-cutting news in the feed, since the ranking algorithm takes into account how the user interacts with friends as well as previous clicks.

Reflection:

Out of the 7 million URLs, only 7% were found to be hard content (politics, news, etc.). This suggests that Facebook is used more for sharing personal content. Since we don’t know the affiliation of all of a user’s friends, it’s difficult to say whether Facebook friendships are based on shared political ideologies. A similar study should be conducted on platforms where people share more hard content, probably Twitter, or perhaps on Google search history. The combined results would give better insight into whether people associate with others of similar political ideology on online platforms.

We could conduct a study to find out how adaptive and intelligent Facebook’s news feed algorithm is by having a group of people who have declared their political ideology start liking, clicking, and sharing articles of the opposing ideology (both in support and in disapproval). We should then compare the before and after news feeds to see if the ranking of the news articles changes. Does the algorithm figure out whether the content was shared to show support or to denounce the news piece, and modify the feed accordingly?
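
A simple way to quantify this before/after comparison is sketched below. This is my own illustrative sketch in Python, assuming the feed snapshots have already been collected and labeled by ideology; the data format and function name are invented, not part of the study.

```python
# Hypothetical sketch: compare the share and average rank of cross-cutting
# articles in a participant's feed before and after the intervention.
# feed_before / feed_after are assumed to be ordered lists of
# (article_id, ideology) tuples captured from feed snapshots.

def cross_cutting_stats(feed, user_ideology):
    """Return (share, mean_rank) of articles opposing the user's ideology."""
    cross_ranks = [rank for rank, (_, ideo) in enumerate(feed, start=1)
                   if ideo not in (user_ideology, "neutral")]
    share = len(cross_ranks) / len(feed) if feed else 0.0
    mean_rank = sum(cross_ranks) / len(cross_ranks) if cross_ranks else float("nan")
    return share, mean_rank

feed_before = [("a1", "conservative"), ("a2", "liberal"), ("a3", "liberal")]
feed_after  = [("a4", "conservative"), ("a5", "conservative"), ("a6", "liberal")]

before = cross_cutting_stats(feed_before, user_ideology="liberal")
after  = cross_cutting_stats(feed_after,  user_ideology="liberal")
print("cross-cutting share/mean rank before:", before, "after:", after)
```

If the share rises or the mean rank moves toward the top of the feed after the intervention, that would suggest the algorithm responded to the new interaction pattern.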

I wonder if users are actually interested in getting access to cross-cutting content. A longitudinal study could be conducted in which users are shown balanced news (half supporting their ideology and half opposing it) to see whether, after a few months, their click pattern changes: whether they click on more cross-cutting content or, in the extreme case, change their political ideology. This kind of study would show whether people really care about being trapped in an echo chamber. If they don’t, then we certainly can’t blame Facebook’s algorithms.

This study is not generalizable. It was conducted on a young population, specifically those who chose to reveal their political ideologies. Similar studies should be performed in different countries with users from different demographics. Also, the paper doesn’t say much about users who are neutral: how are political articles ranked for their news feeds?

This kind of study would probably not hold for soft content. People usually don’t hold extreme views about soft content like music, sports, etc.

Paper 2: “I always assumed that I wasn’t really that close to [her]”: Reasoning about invisible algorithms in the news feed

Summary:

In this paper the authors study whether users should be made aware of the presence of the news feed curation algorithm and how this insight affects their future experience. They conduct a study in which 40 Facebook users used FeedVis, a system that conveys the difference between the algorithmically altered and unadulterated news feeds. More than half were not aware of the algorithm’s presence and were initially angry. They were upset that they couldn’t see close friends and family members in their feed and attributed this to their friends’ decisions either to deactivate their accounts or to exclude them. Following up with the participants a few months later revealed that awareness of the algorithm’s presence made them engage more actively with Facebook.

Reflection:

In the paper “Experimental evidence of massive-scale emotional contagion through social networks”, the authors conducted a scientific study on “emotional contagion”. The results showed that displaying fewer positive updates in people’s feeds causes them to post fewer positive and more negative messages of their own. That’s how powerful Facebook’s algorithms can be!

In this paper the authors try to answer two important questions: should users be made aware of the presence of algorithms in their daily digital lives, and how will this insight affect their future experience with the online platform? We find out how ignorance of these algorithms can be dangerous: it can lead people to develop misconceptions about their personal relationships. How to educate users about the presence of these algorithms is still a challenge. Who will take up this role? Online platforms? Or do we need third-party tools like FeedVis?

I found the “Manipulating the Manipulation” section extremely interesting. It’s amazing to see the ways people adopted to manipulate the algorithm. The authors could have included a section describing how far these users succeeded in this manipulation. Which technique worked best? Were the changes in the news feed evident?

Two of my favourite lines from the paper: “Because I know now that not everything I post everyone else will see, I feel less snubbed when I make posts that get minimal or no response. It feels less personal” and “whenever a software developer in Menlo Park adjusts a parameter, someone somewhere wrongly starts to believe themselves to be unloved”.

It’s probably the best qualitative paper I’ve read so far.


Reflection #5 – [09/10] – [Deepika Rama Subramanian]

Eslami, Motahhare, et al. “I always assumed that I wasn’t really that close to [her]: Reasoning about Invisible Algorithms in News Feeds.”

Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook.”

SUMMARY

Both papers deal with Facebook’s algorithms and how they influence people in their everyday lives.

The first paper deals with the ‘hidden’ news feed curation algorithm employed by Facebook. Through a series of interviews, the authors do a qualitative analysis of:

  • Algorithm Awareness – Whether users are aware that an algorithm is behind what they see on their news feed and how they found out about this
  • Evaluation (user) of the algorithm – The study tested if the users thought that the algorithm was providing them with what they needed/wanted to see
  • Algorithm Awareness to Future Behaviour – The study also asked users whether, after they discovered the algorithm and its possible parameters, they tried to manipulate it in order to personalise their own view or to boost their posts on the platform

The second paper deals with how the bias of Facebook’s newsfeed algorithm leads to the platform being an echo chamber, i.e., a place where your ideas are reinforced with no challenges because you tend to engage with posts that you believe in.

REFLECTION

Eslami et al.’s work shows how a majority of users are unaware that an algorithm controls what they see on their newsfeed. In turn, they believe that either Facebook is blocking them out or their friends are. It is possible to personalize the Facebook feed extensively under News Feed Preferences – prioritizing what we see first and choosing to unfollow people and groups. The issue with the feed algorithm is that the ‘unaware participants’, who form a large chunk of the population, don’t know that they can tailor their experience. If Facebook made it known, through more than a small header under Settings, that an algorithm is tailoring the newsfeed, it would be more helpful and less likely to cause outrage among users. Placing the News Feed Preferences on either side of the newsfeed itself is a good option.

There was a recent rumour in January that had users believe that Facebook was limiting their feed to 25 friends. Many users were asked to copy-paste a message against this so that Facebook would take notice and alter its algorithm. Twitter has made sure that its newsfeed shows posts from followed accounts in reverse-chronological order, with occasional suggested tweets liked by someone else you follow. Reddit has two newsfeeds of sorts – best and hot. Best contains posts tailored to your tastes based on how you engaged with previous posts; hot, on the other hand, shows the posts trending worldwide. This gives an eclectic and obvious mix, thereby making sure it doesn’t become an echo chamber.

Most recently, Zuckerberg announced that Facebook’s goal was no longer ‘helping you find relevant content’ but to ‘have more meaningful interactions’. Facebook tried the Reddit-style two-newsfeed model in an experiment: it removed posts from reputed media houses and placed them in an Explore feed. This was to ensure that the social media site promoted interactions, i.e., increased organic content (not just content shared from other sites). They also hoped to stop their platform acting like an echo chamber. This experiment was run in six small countries – Sri Lanka, Guatemala, Bolivia, Cambodia, Serbia and Slovakia. Following this, major news sites in these countries (especially in Bolivia and Guatemala) showed a sharp decrease in traffic. Unfortunately, this means that Facebook has become one of the biggest sources of news, making it a ripe platform for spreading fake news (for which, currently, it has limited or no checks).

However, I wonder to what extent Facebook is now responsible for providing complete news, with views from both sides. It began purely to support interactions between individuals and has evolved to its current form; its role as a news provider is not yet entirely clear. However, as far as echo chambers go, this isn’t new. Print media, TV, talk show hosts – their ideologies influence the content they provide, and people tend to watch and enjoy only shows that agree with them in general.


Reflection #5 – [09/10] – [Vibhav Nanda]

Readings:

[1] “I always assumed that I wasn’t really that close to [her]”: Reasoning about invisible algorithms in the news feed

Summary:

This paper covers a plethora of items regarding our digital lives and the ubiquitous curation algorithms. The authors talk about varying awareness levels among users, pre-study conceptions of the Facebook news feed, post-study conceptions of the news feed, and participants’ reactions to finding out about a hidden curation algorithm and how it changed their perceptions. In order to show the difference between raw and curated feeds, the authors created a tool called FeedVis that displays users’ unfiltered feeds from friends and pages. By asking open- and close-ended questions, the authors were able to gauge the users’ levels of understanding of the curation algorithm. The authors tried to answer three different research questions within one paper and were successful in delivering adequate answers and directions for future work.

Reflection/Questions:

It was interesting for me to read that various users had started actively making an effort to manipulate the algorithm, especially because I am aware of it and it doesn’t bother me at all. In the initial part of the paper the authors discuss the idea of disclosing the mechanisms of the curation algorithm in order to create a bond of trust between users and the platform; however, I would argue that if the working mechanism of the curation algorithm is made public, then trolls, fake news agencies, and other malicious actors could use that information to further increase the reach of their posts/propaganda. The authors also describe their participants as “typical Facebook users”, a statement I would disagree with because the meaning of a “typical” Facebook user is fluid: it meant something different a few years ago (millennials) and means something different now (baby boomers and Generation X). In my view, Facebook should show users unfiltered results on some days and curated results on others, then track their activity online; if there is an increase or decrease in user activity, for instance likes/comments/shares, then from that data it should decide whether the user would prefer curated or unfiltered results. Facebook should also give users the option to let the algorithm know which friends/pages the specific user would be more interested in, which might also help the algorithm learn more about the user.
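
A minimal sketch of this alternating curated/unfiltered idea follows, with invented engagement weights and thresholds; none of this reflects Facebook’s actual mechanism.

```python
# Hypothetical sketch: serve unfiltered and curated feeds on alternating days,
# log engagement, and pick whichever condition the user engages with more.
from collections import defaultdict

engagement_log = defaultdict(lambda: {"curated": 0, "unfiltered": 0})

def log_engagement(user_id, condition, likes, comments, shares):
    # Weight shares and comments more heavily than likes (arbitrary weights).
    engagement_log[user_id][condition] += likes + 2 * comments + 3 * shares

def preferred_condition(user_id, min_gap=0.1):
    scores = engagement_log[user_id]
    total = (scores["curated"] + scores["unfiltered"]) or 1
    gap = (scores["curated"] - scores["unfiltered"]) / total
    if abs(gap) < min_gap:
        return "undecided"  # keep alternating until a clear signal emerges
    return "curated" if gap > 0 else "unfiltered"

log_engagement("u1", "curated", likes=5, comments=1, shares=0)
log_engagement("u1", "unfiltered", likes=9, comments=3, shares=2)
print(preferred_condition("u1"))  # -> "unfiltered"
```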

[2] Exposure to ideologically diverse news and opinion on Facebook

Summary:

The authors of this paper focus on understanding how various Facebook users interact with news on social media, the diversity of news spread on Facebook in general, and the diversity of news spread within friend networks. They also study the kind of information the curation algorithm decides to display to a user and how selective consumption of news affects the user. They explain that selective consumption is a combination of two factors: people tend to have more friends with the same ideology, so they see reinforcing news, and the curation algorithm tends to display what it thinks the user will like most, which is news reinforcing the user’s ideology (I would argue that this is the reason fake news will never die).

Reflection/Questions:

In my view, people with a certain ideological standpoint will never be able to fathom the other side and hence [for the most part] will never even put effort into reading or watching news from a different ideological point of view. Historically we can see this in cable television: conservative people tend to watch Fox more often, while moderates and liberals tend to watch CNN. Each of these channels also understood its user base and delivered content bespoke to it. Now, instead of companies determining news content, a curation algorithm does it for us. I don’t think this is something that needs to be fixed or a problem that needs to be tackled (unless, of course, it is fake news). It is basic human psychology to find comfort in the familiar, and if users are forced to digest news content they are unfamiliar with, it will, on a very basic level, make them uncomfortable. I also think it would be crossing the line for developers to manipulate a user’s news feed in a way that is not consistent with their usage of Facebook, their friend circle, and the pages they follow.


Reflection #5 – [09/10] – [Shruti Phadke]

Paper 1: Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook.” Science 348.6239 (2015): 1130-1132.

Paper 2: Eslami, Motahhare, et al. “I always assumed that I wasn’t really that close to [her]: Reasoning about Invisible Algorithms in News Feeds.” Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, 2015

Algorithmic bias and influence on social networks is a growing research area. Algorithms can play an important role in shifting the tide of online opinions and public policies. Both Bakshy et al.’s and Eslami et al.’s papers discuss the effect of peer and algorithmic influence on social media users. Seeing an ideologically similar feed, as well as a feed based on past interactions, can lead to extremist views and associations online. The “echo chambers” of opinion can go viral unchallenged within a network of friends and can range from harmless stereotypes to radical extremism. This exposure bias is not limited to posts but extends to comments: in any popular thread, the default setting shows only the comments that are either made by friends or are popular.

Eslami et al.’s work shows how exposing users to the algorithm can potentially improve the quality of online interaction. Having over 1000 friends on Facebook, I barely see stories or feeds from most of them. While Eslami et al. do insightful qualitative research on how users perceive the difference between “all stories” and “shown stories”, along with their future choices, I believe the study is limited in the number of users as well as in the range of user behaviors. To observe the universality of this phenomenon, a bigger group of users should be observed, with users varying in frequency of access, posting behavior, lurking, and promotional agendas. Such a study could be performed with AMT. Even though it would restrict the open-coding options and detailed accounts, this paper can serve as a basis for forming a more constrained and precisely defined questionnaire, which can lead to quantitative analysis.

Bakshy et al.’s work, on the other hand, ties the political polarity in online communities to the choices the user has made. It is interesting to understand the limitations of their data labeling process and content. For example, they selected only the users who volunteered their political affiliation on Facebook. Users who volunteer this information might not represent the average Facebook population. A better classification of such users could have been done by running text classification on their posts, without relying on their proclaimed political affiliation. One more reason to avoid the declared political status is that many users can have a political label attached to them due to peer pressure or the negative stigma attached to their favored ideology in their “friend” circle.
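
A minimal sketch of such a text-based classifier is shown below, using a bag-of-words baseline with scikit-learn; the example posts and labels are invented for illustration and any real study would need far more data and a held-out evaluation.

```python
# Hypothetical sketch: infer political leaning from post text instead of
# relying on self-reported profile labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "lower taxes and smaller government",
    "we need universal healthcare now",
    "protect the second amendment",
    "climate action cannot wait",
]
labels = ["conservative", "liberal", "conservative", "liberal"]

# TF-IDF over unigrams and bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(posts, labels)
print(clf.predict(["cut government spending"]))
```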

Finally, even though getting exposed to similar or algorithmically influenced content may be potentially harmful or misleading, it also raises the question of how much invasion of data privacy is acceptable in order to de-bias the feed on your timeline. Consciously building algorithms that show cross-cutting content can end up knowing more about a user than they intend. The question of addressing this algorithmic influence should be approached with caution and with better legal policies.


Reflection #5 – [09/10] – [Lindah Kotut]

  • Motahhare Eslami et al. “‘I always assumed that I wasn’t really that close to [her]’: Reasoning about invisible algorithms in the news feed”.
  • Eytan Bakshy, Solomon Messing and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook”

Reflection: 
1. Ethics
A Facebook study is apt, given the recent New Yorker spotlight on Mark Zuckerberg. The piece, while focusing on Zuckerberg rather than the company, gives good insight into the company ethos, which also gives context to Bakshy’s claims. Considering the question of the invisible algorithm: Eslami’s paper addresses it directly in outlining the benefits of making the consequences of the algorithm’s changes visible, rather than the algorithm itself. Given the anecdotes of users who changed their mind about which users they’d like to hear more from, this is a good decision, allowing for a sense of control and trust in the algorithm curation process. Eslami’s paper proceeds to raise a concern about the effect these unknowns have on decision making. Considering the (in)famous Nature paper on large-scale experimental social vs. informational messaging affecting election turnout, and the other infamous paper experimenting on emotional contagion, both of which used millions of users’ data, the issue of ethics arises. Under GDPR, for instance, Facebook is obligated to let users know when and how their data is collected and used. What about when the information is manipulated? This question is explicitly considered by Eslami’s paper, where they found users felt angered (I thought it was betrayal more than anger, judging from the anecdotes) after finding out about design decisions that had a real-life impact — explicitly: “it may be that whenever a software developer in Menlo Park adjusts a parameter, someone somewhere wrongly starts to believe themselves to be unloved.”

2. Irony
Bakshy et al. consider their work a neutral party in the debate about whether (over)exposure to politics is key to a healthy democracy or whether it leads to a decreased level of participation in democratic processes. They then conclude that the power to expose oneself to differing viewpoints lies with the individual. Yet Facebook curates what a user sees in their newsfeed, and their own research showed that contentious issues promote engagement, and that engagement raises the prominence of the same content — raising the chances of a typical user viewing it. They attempt to temper this by noting that the nature of the newsfeed depends on the user’s logging and activity behavior, but this goes to show that they place the onus on the user again… to behave in a certain manner for the algorithm to succeed and obtain consistent data?

3. Access, Scale and Subjectivity
I found it interesting how the two papers sourced their data. Eslami et al., though they had access to respondents’ data, still had to deal with the throttle imposed by the Facebook API. Bakshy et al., on the other hand, had millions of anonymized data points. This disparity does not present a threat to the validity of either study; it is just a glaring contrast. It would be interesting if Eslami’s work could be scaled to a larger audience — the interview process is not very scalable, but elements such as users’ knowledge of the effects of the algorithm are especially important for knowing how well the findings scale.

The issue of subjectivity manifested differently in these two works: Eslami et al. were able to probe users on personal reasons for their actions on Facebook, giving interesting insights into their decisions. Bakshy et al.’s work regarded the sharing of content as a marker of ideology. What of sharing for criticism, irony, or reference? (From what I understood, alignment was measured from the source and the click on the shared link, rather than also including the accompanying commentary in the measurement.) The reasons why posts are shared range between the two extremes of support and criticism, and the motivation behind the sharing makes a consequential difference in what we can conclude from engagement. The authors note this limitation both in the source of data (self-reported ideological affiliation) and in their vague distinction between exposure and consumption.


Reflection #5 – [09/10] – [Neelma Bhatti]

  • Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
  • Eslami, M., Rickman, A., Vaccaro, K., Aleyasen, A., Vuong, A., Karahalios, K., … & Sandvig, C. (2015, April). I always assumed that I wasn’t really that close to [her]: Reasoning about Invisible Algorithms in News Feeds. In Proceedings of the 33rd annual ACM conference on human factors in computing systems (pp. 153-162). ACM.

Reading reflections:

Most of us have wondered at some point whether a friend who no longer shows up on our Facebook news feed has blocked or restricted us. At times, we forget about them until they react to some post on our timeline, bringing their existence back to our notice.

People are becoming more aware that some mechanism is used to populate their news feed with stories from their friends, the groups they have joined, and the pages they have liked. However, not all of them know whether the displayed content is just randomly selected, or whether there is a more sophisticated way of not only arranging and prioritizing what is displayed but also filtering out what Facebook “deems” unnecessary or uninteresting for us, namely a curation algorithm.

  • There needs to be some randomization in what is displayed to us to break the echo chambers and filter bubbles created around us. This applies both to the news we want to read and to the stories displayed in the news feed. It is just like going to Target for a water bottle and finding an oddly placed but awesome pair of headphones in the aisle: one might not end up buying them, but they will certainly catch the attention and might even lead you to the electronics section to explore (a rough sketch of such random injection follows this list).
  • As regards political news, not all people choose to read only what is aligned with their ideology. Some people prefer reading the opposite party’s agenda, if only to pick points to use against the opponent in an argument, or simply to stay in the know. Personalizing the news displayed to them based on what they “like” may not be what they are looking for, whatever their reason for reading that news may be.
  • Eslami et al. talk about the difference in acceptance of the new knowledge, with some users demanding to know the back story, while more than half (n=21) ultimately appreciated the algorithm. While some users felt betrayed by the invisible curation algorithm, knowing that an algorithm controls what is displayed on their news feed overwhelmed some participants. This rings true for some elderly people who have not been social media users for long, or for users who are not very educated. The authors also talk about future work on determining the optimal amount of information displayed to users “to satisfy the needs of trustworthy interaction” and “protection of propriety interest”. An editable log of changes made to the news feed content (by hiding a story, or due to lack of interaction with a friend’s/page’s/group’s stories, etc.), accessible to the user only if they choose to see it, seems a reasonable solution to this issue.
  • I liked the clear and interesting narrative from participant selection to data analysis in the second paper, especially after reading the follow-up paper [1]. I do think there should have been more information about how participants reacted to stories missing from the groups they follow or pages they’ve liked, or about the extent to which they preferred keeping those displayed. It would have given some useful insights into their thought process (or “folk theories”) about what they think goes on with the news feed curation algorithm.
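
A minimal sketch of the random-injection idea mentioned in the first bullet above, with an invented data format and an arbitrary randomization ratio:

```python
# Hypothetical sketch: replace a small fraction of the ranked feed with
# randomly sampled stories from outside the user's usual interests.
import random

def diversified_feed(ranked_stories, random_pool, random_fraction=0.1, seed=None):
    """Return the ranked feed with ~random_fraction of slots given to random stories."""
    rng = random.Random(seed)
    feed = list(ranked_stories)
    n_random = max(1, int(len(feed) * random_fraction))
    slots = rng.sample(range(len(feed)), k=min(n_random, len(feed)))
    picks = rng.sample(random_pool, k=len(slots))
    for slot, story in zip(slots, picks):
        feed[slot] = story
    return feed

ranked = [f"curated_{i}" for i in range(10)]
pool = [f"serendipity_{i}" for i in range(50)]
print(diversified_feed(ranked, pool, random_fraction=0.2, seed=42))
```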

 

[1] Eslami, M., Karahalios, K., Sandvig, C., Vaccaro, K., Rickman, A., Hamilton, K., & Kirlik, A. (2016, May). First I like it, then I hide it: Folk theories of social feeds. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 2371-2382). ACM.

 

 


Reflection #4 – [09/06] – [Karim Youssef]

The growth of online discussion communities has made them central to exchanging opinions, information, and knowledge, giving them a significant role in forming the opinions and knowledge of many of their users. With little chance to verify either content or identity, a normal user can easily fall prey to identity deception and misinformation. One of the challenges to a healthy online discussion community is sockpuppetry, where multiple “virtual” users are controlled by a single actual user. Sockpuppetry can be used for deception, or for creating a fake public consensus to manipulate public opinion about an event or a person.

In their work An Army of Me: Sockpuppets in Online Discussion Communities, Kumar et al. conduct a valuable study that aims to analyze the behavior of sockpuppets in online communities. They take a data-driven approach that analyzes data from nine different online discussion communities, examining the posting activity and the linguistic characteristics of content posted by sockpuppet accounts. They then identify different types of sockpuppets based on deceptive behavior and/or supportive behavior towards other sockpuppets. Finally, they use these analyses to build a predictive model that aims to predict whether an account is a sockpuppet and whether a pair of accounts is a sockpuppet pair.

My reflection on this study could be summarized in the following points:

  1. This study helps to build a clear definition of a sockpuppet and highlights some of the motivations behind creating such online identities. There could be a wide variety of motivations behind sockpuppetry, some could be benign, but it is highly important to understand the malicious motivations behind sockpuppetry.
  2. Activity traces of sockpuppets are highly predictive, while community reactions towards them are less predictive. Compared to other types of antisocial behavior, community-reaction features were more predictive, as shown by Cheng et al. in Antisocial Behavior in Online Discussion Communities. This fact could convey multiple signals. It could be that sockpuppets are hard for their surrounding community to detect, or that a user in an online community is alerted more by antisocial content than by a specific activity pattern. That may mean a community tends to react negatively when a sockpuppet account posts significantly undesirable content, more than when a strange activity pattern occurs, unless that pattern is blatantly suspicious.
  3. Sockpuppets seem to tend to work in groups. Although the study shows that most of the identified sockpuppets work in pairs, there could be other activity patterns associated with larger groups of sockpuppet accounts that are worth studying.
  4. Similar to other data mining problems on online communities, it still seems hard to develop an automated system that could reliably replace human moderators; however, reaching a point where an automatic flag can be raised on some content makes life easier for moderators and helps toward faster control of the spread of any type of undesirable behavior in online communities.

This study stimulated my interest in proceeding in several directions. I would be interested in studying the role of sockpuppets in historical events such as the Arab Spring or the 2016 US elections. I would also be interested in studying how effective sockpuppets are in forming the opinions of normal users, or in spreading misinformation and making it widely accepted by users of online communities.

There are also multiple directions for improving the current study. Among them are studying the activity patterns of larger sockpuppet groups, a deeper analysis of the content posted by sockpuppets beyond their activity patterns to derive more content-based features, and a further analysis and comparison of the activity patterns and content features of sockpuppets across different types of online communities.

 


Reflection #4 – [09/06] – [Eslam Hussein]

Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities”

 

Summary:

The authors of this work try to automatically detect sockpuppets, which they define as “a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account”. They study data from 9 different online discussion communities. In this paper they address the features of sockpuppets from different perspectives:

– Linguistic traits: what language they tend to use

– Activities and Interactions: how sockpuppets communicate and behave with each other and with their communities, and how their communities react to their activities.

– Reply network structure: study the interaction networks of sockpuppets from a social network analysis perspective

They also identified different types of sockpuppets based on two different criteria:

  1. Deceptiveness: Pretenders and non-pretenders
  2. Supportiveness: Supporter and non-supporter

They also built a predictive model to:

  1.  Differentiate pairs of sockpuppets from pairs of ordinary users
  2. Predict whether an individual user account is a sockpuppet or ordinary one

 

Reflection:

The authors did pretty comprehensive work in approaching the problem of detecting sockpuppets and classifying accounts as ordinary or sockpuppet accounts.

But I have a few comments/suggestions on their work:

  • I wondered why the discovered sockpuppets almost always appeared in groups of two accounts, but I believe that is because the authors set very restrictive constraints when identifying sockpuppets, such as: 1) the accounts must have posted from the same IP address, and 2) a very small time window of 15 minutes between their interactions is used to identify them as sockpuppets played by the same puppetmaster. I would suggest that the authors:
    • Remove or relax the IP address constraint in order to catch more sockpuppets that belong to the same group, since a more realistic scenario would suggest that a puppetmaster controls more than two accounts (nobody forms an online campaign with only two accounts)
    • Increase the time window, since a puppetmaster would need more time to synchronize the interactions between those sockpuppets
  • This model needs to be modified in order to generalize to more online discussion communities such as Facebook and Twitter; it is tailored/overfitted to the Disqus communities. Other features from those much larger and more interactive platforms would definitely improve and enrich this model
  • As always, I have observations from during and after the Arab Spring, since social media platforms were often used as battlefields between opposition parties and the old regimes:
    • They have been used to promote or support figures or parties during the different stages of the Egyptian elections.
    • They were used to demoralize the opponents or the resistance
    • They were used to spread rumors and to amplify their effect and permanence simply by repeating/spreading them through sockpuppets. Psychologically, when a lie is repeated over and over, it settles in people’s memory as a fact, and vice versa (the illusory truth effect)
    • People started to recognize sockpuppets and their patterns and coined an Arabic name for them, “لجنه”. A very common and well-known term for a group of sockpuppets that share the same objective and are controlled by the same puppetmaster evolved during the Arab Spring: “لجنه الكترونيه”, or “electronic committee” in English.
  • The authors approached the problem as a classification problem of ordinary versus sockpuppet accounts. I would suggest also addressing it as a clustering problem, not only a classification one. That could be achieved by encoding several features (linguistic traits, activities and interactions, ego-networks) into one objective function representing the similarity of the discovered communities of sockpuppets: the more optimal this function, the more similar the discovered sockpuppet communities (a minimal sketch of this idea follows this list).
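
A minimal sketch of the clustering suggestion from the last bullet, with invented feature columns and values; real features would have to come from the linguistic, activity, and network analyses described in the paper.

```python
# Hypothetical sketch: concatenate linguistic, activity, and ego-network
# features into one vector per account and cluster them, so that accounts
# run by the same puppetmaster land close together.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# rows: accounts; columns: [swear-word rate, readability, posts/day,
#                           replies in own threads, ego-network density]
X = np.array([
    [0.08, 6.1, 14.0, 9, 0.71],
    [0.09, 6.3, 13.5, 8, 0.69],   # suspiciously similar to the first account
    [0.01, 11.2, 1.2, 0, 0.20],
    [0.02, 10.8, 0.9, 1, 0.22],
])

X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)  # accounts sharing a cluster are candidates for the same puppetmaster
```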

 

 

 


Reflection #4 – [09/06] – [Viral Pasad]

Paper:

An Army of Me: sock puppets in Online Discussion Communities Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian.

 

Summary:

The paper deals with the analysis of “a user account that is controlled by an individual (or puppet master) who controls at least one other user account.” The authors analyze various aspects of sock puppets and their behavior across nine online discussion communities, using a dataset of 62,744,175 posts and studying the users along with the discussions within them. They discuss how sock puppets may often be found in pairs, assuming the roles of primary and secondary, or supporter and dissenter.

 

Reflection:

  • The authors broadly define a sock puppet as a user account that is controlled by an individual (or puppet master) who controls at least one other user account. However, I prefer the traditional definition of the word: “a false online identity that is used for the purposes of deceiving others.”
  • Furthermore, it would be wise to highlight that sock puppets often work in paid partnerships with companies to push their products, and more often than not they are also part of affiliate marketing, where they sell products and earn commissions.

Not only that, these “stealth influencers” could also potentially sway public opinion on a political issue/candidate.

  • Another interesting point about paired sock puppets that I pondered was the dissenting good cop/bad cop roles they might play, wherein one disagrees with or puts down a product/feature, at which point the primary sock puppet can swoop in and make the same product shine by highlighting its pros (which were intentionally questioned by the secondary sock puppet). This is a dynamic between paired sock puppets that I would want to investigate.
  • Another metric worth investigating is the language/linguistic cues used by sock puppets to market products. Typical marketing campaigns keep jargon to a bare minimum for the lay consumer (e.g., 10x faster, 2.5x lighter); sock puppets, though, while using impartial terms to seem unbiased and neutral, could also be using more jargon to seem like domain experts and to intimidate a user into thinking that they really know the technicalities of the product (a rough jargon-density sketch follows this list).
  • Furthermore, I know how difficult it is to obtain clean and complete datasets, but the Disqus dataset barely contains data referring to products and purchases. The metrics used in the paper, plus a few others, if applied to an Amazon reviews or eBay comments dataset, would yield a great amount of knowledge about sock puppets and their behavior
  • Another great point to be considered about sock puppets living a dual life is their activity in their ordinary and fake accounts. A genuine user would have a legitimate profile history and personal data such as friend lists and interests other than the one topic being discussed in the post comments.
  • Another question worth asking concerns false positives and false negatives: how would one verify the results of such a system?
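
A rough sketch of the jargon-density metric suggested above; the jargon list and example comments are invented for illustration.

```python
# Hypothetical sketch: compare how often an account's comments use domain
# jargon versus a casual consumer register.
JARGON = {"latency", "throughput", "ip68", "oled", "snapdragon", "mah", "nits"}

def jargon_density(comments):
    """Fraction of tokens across all comments that are domain jargon."""
    tokens = [t.strip(".,!?").lower() for c in comments for t in c.split()]
    if not tokens:
        return 0.0
    return sum(t in JARGON for t in tokens) / len(tokens)

suspect = ["The OLED panel hits 1200 nits and the Snapdragon keeps latency low."]
casual  = ["Love this phone, battery lasts forever and the screen looks great!"]
print(jargon_density(suspect), jargon_density(casual))
```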


Reflection #4 – [09/06] – [Nitin Nair]

  1. S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.

Discourse has been one of the modes through which we humans have used our collective knowledge to further our individual understanding of the world. These conversations have helped us look at the world through the lens of another, an outlook we may or may not agree with. In the past few decades this discourse has moved to the online world. While this movement has opened up pulpits to the masses, giving them opportunities to express their opinions, it has created a few issues of its own. One of the issues that plague these discussion forums, due to how identity functions in online settings, is sockpuppetry. Although variations of the same must have existed in the past, the scale and reach of these online discussion forums make the dangers more profound.

The authors of this paper try to understand the behaviour and characteristics of these “sockpuppets” and use the findings to perform two tasks: first, to differentiate pairs of sockpuppets from pairs of ordinary users, and second, to predict whether an individual account is a sockpuppet or not. They use data obtained from nine communities on the online commenting platform Disqus as their dataset, and they identify sockpuppets using a couple of factors such as IP addresses, length of posts, and time between posts.

As indicated by the authors, writing style tends to persist because the content is written by the same person. To use this as a feature the authors rely on LIWC and ARI, which, even though shown to be effective here, I believe could be improved by higher-quality feature vectors that not only look at the vocabulary but also take into account semantic and structural elements, such as the construction of sentences, to identify the “master”. Building feature vectors in this fashion would, I believe, help identify these actors in a more robust manner.
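
A minimal sketch of what such richer style vectors could look like, combining function-word rates with sentence-length and punctuation statistics; the feature choices here are my own, not the paper's.

```python
# Hypothetical sketch: build a simple stylometric vector per account and
# compare two accounts by cosine similarity.
import numpy as np

FUNCTION_WORDS = ["the", "and", "but", "of", "to", "in", "that", "is"]

def style_vector(text):
    words = text.lower().split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    sent_lens = [len(s.split()) for s in sentences]
    fw_rates = [words.count(w) / max(len(words), 1) for w in FUNCTION_WORDS]
    stats = [np.mean(sent_lens) if sent_lens else 0.0,   # average sentence length
             np.std(sent_lens) if sent_lens else 0.0,    # sentence length variability
             text.count(",") / max(len(words), 1)]       # comma rate
    return np.array(fw_rates + stats)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

a = style_vector("Well, I think that the idea is fine. But the execution, honestly, is poor.")
b = style_vector("I think that the plan is fine. But the timing, frankly, is poor.")
print(cosine(a, b))  # higher similarity suggests the same author behind both accounts
```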

Once the master is identified, it would be interesting to analyze the characteristics of the puppet accounts. Given that some accounts might elicit more responses than others, it would be a worthwhile study to see how they achieve this. One could see if there is any temporal aspect to it: identify when the best time to probe is in order to get a response, and how these actors optimize their response strategies to achieve it.

One could also look into behaviors by these sockpuppeteers that warrant bans from the moderators of these online communities. These identifying features could then be recorded and given out as guidelines for spotting the same behavior. How long such observations would remain valid is something different altogether.

Given that some communities with human moderators have been addressing this particular issue using “mod bans”, one could try to create supervised models for identifying sockpuppet accounts, using this ban information as the ground truth or labels.
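
A minimal sketch of that supervised setup, with invented per-account features and ban labels standing in for real forum logs. A real study would also have to deal with severe class imbalance, since banned accounts are rare.

```python
# Hypothetical sketch: train a classifier on per-account features with
# "was banned by moderators" as the ground-truth label.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# columns: [posts/day, fraction of replies in own threads, distinct IPs, mean post length]
X = np.array([
    [12.0, 0.60, 1, 35], [10.5, 0.55, 1, 40], [0.8, 0.05, 3, 120],
    [1.1, 0.02, 2, 150], [14.2, 0.70, 1, 30], [0.6, 0.03, 4, 200],
])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = account was banned by moderators

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0, stratify=y)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```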

Also, on a different note, it would be a worthwhile pursuit to see how these actors respond to such bans. Do they create more accounts immediately, or do they wait it out? An interesting thought that can certainly be looked into.

Given that uncovering or profiling the identity of users is the way forward to counteract sockpuppetry, it is a valid concern for users whose identities need to be kept under wraps for legitimate reasons. Given that even allegations about a particular person have led to violence being directed at them, how can one ensure these people are protected? This is one issue that the method described by the authors, which uses IP addresses to identify sockpuppetry, needs to address. How can one differentiate users with legitimate reasons for creating multiple identities online from those without?
