Reflection #13 – [11/29] – [Karim Youssef]

In the last decade, the evolution of digital systems and connectivity has created a gushing stream of data that contains latent treasures. Multiple domains such as biology, astronomy, and social science are facing an unprecedented challenge: how to deal with Big Data? Big data is perhaps one of the most resonant scientific terms of the last decade, and a specific field of study in computer science has been established to develop hardware and software systems that scale with it.

In social science, the big data gushing out of online social platforms represents an invaluable gold mine. In their work titled Data ex Machina: Introduction to Big Data, David Lazer et al. explore the opportunities and challenges of using big data to study and analyze different social phenomena. From their perspective, big social data comes from three sources: digital life (e.g., online social platforms), digital traces (e.g., call records), and digitized life (e.g., digitized old books and newspapers).

Many opportunities exist in analyzing data generated from the aforementioned sources. These data could reflect actual patterns of social activities that are hard to extract from research surveys and questionnaires. They also create the opportunity to analyze social interactions with breaking events as they happen, rather than performing a retrospective analysis on past events.

From my perspective, archived data from social platforms could also serve as a treasure for retrospectively analyzing special events. An example that always influences my ideas is the Arab Spring uprisings. The Egyptian people, for example, lived through a singular experience between 2011 and 2013. In some places in the world, online social platforms could be the best places to record the traces of such events.

Online social platforms could also serve as a natural experiment field, a great opportunity but one with great ethical concerns. An example is Facebook's experiment on social influence and emotional contagion, a very promising experiment that raised a huge ethical debate.

David Lazer et al. also explore a set of challenges and vulnerabilities. These include the generalizability of analyses performed on a single data source (e.g., Twitter). On this point, they shed light on the problem that individuals' social activities are spread across multiple social platforms. They also discuss the credibility and legitimacy of data given the prevalence of bots and fake identities.

From my view, I believe that no matter how hard the aforementioned challenges are, the continuous research effort to understand and solve them will likely converge at some point. The harder challenge that is likely to linger is the ethical one. David Lazer et al. shed light on research ethics: how can we guarantee the rights of human research subjects, have informed consent in place, and still preserve the quality and benefits of collecting and analyzing their online social data? Some people have commented on social platforms being free by saying, “if it is offered to you for free, then probably you are the goods being sold”. From my perspective, establishing and applying rules that preserve the rights of every user, and being transparent about everything, is a lingering challenge.


Reflection #12 – [10/23] – [Karim Youssef]

Social influence is a well-known phenomenon that individuals and groups experience through their social interactions. There are multiple aspects of social influence. Individuals may feel pressured to practice certain behaviors or follow certain beliefs under the influence of their close social circle. People are also affected by the emotions of those surrounding them; some use expressions like “happiness is in the air” or “depression is in the air” to describe their perception of a surrounding emotion.

With the prevalence of online social platforms, many questions arise regarding how social influence is shaped, and how it affects and/or is affected by real-life social interactions. In their work A 61-million-person experiment in social influence and political mobilization, Bond et al. studied the effect of social influence on encouraging people to vote in elections. Their experiment consisted of conveying a message through individuals' Facebook Newsfeeds that encouraged them to vote and to let others know that they did. The message also showed some of an individual's friends who had already voted. Their results indicate a significant influence on individuals who know that people in their close circle have taken an action, pressuring them to take a similar action.

In another study titled Experimental evidence of massive-scale emotional contagion through social networks, the core data science team at Facebook showed how a decrease in posts with positive emotions in someone's Newsfeed rendered them less positive, while a decrease in posts with negative emotions rendered them less negative. Their work presented experimental evidence of the phenomenon known as emotional contagion.

The two aforementioned studies analyze two different aspects of social influence and how they are shaped in online social platforms. My reflection on these studies could be summarized in the following points:

  1. A common point between the two studies is that both provide a perspective on how to interpret the effect size. An effect size that may seem small in an experiment on a tiny subset of a giant social platform such as Facebook should be taken seriously given how large the aggregate effect could be; Bond et al. estimate, for instance, that their single voting message ultimately generated about 340,000 additional votes.
  2. The first study claims that the influence of close friends (strong ties) is much more significant than the influence of other friends (weak ties), with whom the number of interactions is much lower. I would be interested in studying the influence of social media public figures, whom some people like to call social influencers. These influencers may have a significantly large number of apparently weak ties; however, their aggregate effect may be more significant than that of a smaller number of strong ties.
  3. Given some experimental evidence on certain aspects of social influence, it is interesting to study how this influence could be used for opinion manipulation. During the Arab Spring uprisings, social media played a pivotal role in shaping public opinion, which I believe contributed significantly to today's outcome in many countries, e.g., Egypt.
  4. The emotional contagion study refuted the claim that trending positivity in the Newsfeed may have a negative effect on an individual. However, it is hard to conclude this from the outcome of this study alone, as there might be another explanation in which an individual is pressured to act positively to comply with the trend.


Reflection #11 – [10/16] – [Karim Youssef]

In social communication, there are multiple values that people tend to respect in order to gain different types of benefits. Being polite is one of the most important among them. In modern online communities, politeness plays a great role in ensuring healthy interactions for the community and in maximizing benefits for individuals, whether the interaction is a request for help, conveying an opinion to an audience, or any other type of online social exchange.

In their work “A computational approach to politeness with application to social factors”, Danescu-Niculescu-Mizil et al. presented a valuable approach to computationally analyzing politeness in online communities based on linguistic traits. Their work consisted of labeling a large dataset of requests on Wikipedia and Stack Exchange using human annotators, extracting linguistic features, and building a machine learning model that automatically classifies requests as polite or impolite with close-to-human performance. They then used their model to analyze the correlation between politeness and social factors such as power and gender.
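To make the pipeline concrete, here is a minimal sketch of such a request classifier in Python. It is an illustration under assumptions, not the authors' implementation: the toy data is invented, and a generic bag-of-words model stands in for their hand-crafted politeness features (hedges, greetings, gratitude, etc.) and SVM classifier.

```python
# Minimal sketch of a politeness request classifier -- NOT the authors'
# exact feature set or model; bag-of-words stands in for their
# linguistically informed politeness strategies.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: annotated requests with binary politeness labels.
requests = [
    "Could you please take a look at this edit when you get a chance?",
    "Fix this now, it's obviously wrong.",
    "Would you mind clarifying what you meant here? Thanks!",
    "Why did you revert my change without asking anyone?",
]
labels = [1, 0, 1, 0]  # 1 = polite, 0 = impolite

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(requests, labels)
print(model.predict(["Please could you review my request?"]))
```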

My reflection on their work consists of the following points:

  1. It is nontrivial to define a norm for politeness. One way of learning this norm is to use human annotators, as Danescu-Niculescu-Mizil et al. did. It could be interesting to conduct a similar annotation of the same dataset using human annotators from different cultures (e.g., different geographic locations) to understand how the norm for politeness may differ. It could also be interesting to study people's perception of politeness across different domains; for example, the norm of politeness may differ between comments on a political news website and technical discussions about computer programming.
  2. The model evaluation shows a noticeable difference between the in-domain and cross-domain settings, as well as between the cross-domain performance of the model trained on Wikipedia and that trained on Stack Exchange. A simple explanation could be that community-specific vocabularies make a model trained on data from one community generalize poorly to others. From this, we may conclude that the vocabulary used in comments on Stack Exchange is more generic than that used in Wikipedia edit requests, which gives an advantage to the cross-domain model trained on Stack Exchange. I believe it is highly important to categorize the communities and analyze their community-specific linguistic traits in order to make an informed decision when training a cross-domain model.
  3. Such a study could help moderate social platforms that are keen to maintain a certain level of “politeness” in their interactions. It could help moderators automatically detect impolite comments, and help individuals see how likely their comments are to be perceived as polite before sharing them.
  4. Given the negative correlation between social power and politeness as inferred by the study, could it be useful to rethink the design of online social systems to encourage maintaining politeness in individuals with higher social power?
  5. Although the study has some limitations, such as the performance of the cross-domain models, it represents a robust and coherent analysis that could serve as a guideline for many similar data-driven studies.

To conclude, there are multiple benefits in studying the traits of politeness and automatically predicting it in online social platforms. This study inspires me to start from where the authors stopped, enhance their models, and apply them to multiple useful domains.


Reflection #10 – [10/02] – [Karim Youssef]

The prevalent use of online social platforms such as Twitter has altered the process of news sharing from content created and revised only by journalists to user-generated content with few or no guidelines to ensure quality and credibility. This is not necessarily negative: user-generated news content is useful as a means of quickly reporting a breaking event from multiple eyewitnesses. However, such a system is highly prone to the spread of misinformation and rumors without a well-established technique to prevent these types of undesirable content.

One implication of the prevalence of user-generated news media is that these media became a fertile ground for promoting the spread of alternative media websites, or themselves became a source of alternative media. In her work Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter, Kate Starbird found in Twitter a gold mine from which she extracted highly informative insights about alternative media sources and inferred valuable relations between these sources from the posting activity of the Twitter users who include them in their tweets.

Starbird’s analysis represents a valuable step towards understanding and revealing some truths about alternative media websites. Her work inspires me and raises multiple questions. One question is how alternative media contribute to shaping the knowledge and perceptions of the public; in other words, alternative media can sometimes contribute positively even if driven by a political agenda. One obvious example is the information spread during the Egyptian revolution. In its early days, in January 2011, much of the mainstream media spread information that was later proven incorrect, while a majority of the news shared through usually dubious alternative channels turned out to be correct.

In many parts of the world, it is hard to judge which type of media is conveying credible news, as political agendas may be driving both mainstream and alternative media sources. There must be a means of verifying the information that reaches an individual, and sometimes a healthy amount of skepticism is required. An interesting factor to study could be deviance in alternative media: we could study how credible the most shared alternative media sources are across a period of time, and compare them to some mainstream media sources. One of the main challenges is how to guarantee neutrality of judgment.

Another suggestion would be to design social systems that encourage users to be healthily skeptical of the news that reaches them by encouraging verification through easy tasks such as a Google search. Such a design could assign each user a credibility score that increases as the user verifies news before sharing. Imagine that beside the share button there is a “verify” button that, when clicked, retrieves the top n relevant links from a Google search and extracts keywords that may indicate how credible the news is. When a user presses the verify button, the news is flagged as “credible and safe to share”, “needs more verification”, or perhaps “highly dubious”. As users verify more, they become more credible because they share verified news.
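A rough sketch of how such a “verify” button might score a piece of news follows. Everything here is assumed for illustration: the domain whitelist, the thresholds, and the search helper, which is a stub standing in for a real search API.

```python
# Hypothetical sketch of the proposed "verify" button's backend logic.
# The whitelist, thresholds, and search stub are illustrative assumptions.
from typing import List

CREDIBLE_DOMAINS = {"reuters.com", "apnews.com", "bbc.com"}  # assumed whitelist

def search_top_links(claim: str, n: int = 10) -> List[str]:
    # Stand-in for a real web-search API call returning the top-n result URLs.
    return ["https://www.reuters.com/article1", "https://example-blog.net/post"]

def verify(claim: str) -> str:
    links = search_top_links(claim)
    # Fraction of top results that come from the assumed credible domains.
    hits = sum(any(d in link for d in CREDIBLE_DOMAINS) for link in links)
    ratio = hits / max(len(links), 1)
    if ratio >= 0.5:
        return "credible and safe to share"
    if ratio >= 0.2:
        return "needs more verification"
    return "highly dubious"

print(verify("Some breaking headline"))  # -> "credible and safe to share"
```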

It is hard to judge which type of media carries the truth, but it is always possible to go the extra mile and verify before sharing. This might not be as easy as it seems, because a user may be blinded by a piece of news that reinforces an existing opinion.


Video Reflection #9 – [09/27] – [Karim Youssef]

Partisanship is an inherently social phenomenon in which people form groups around different ideologies and their representatives. However, if partisans become isolated inside their groups without constructive exposure to the ideas and opinions of other groups, society may start to deviate from being healthy and democratic. Talia Stroud works towards promoting constructive engagement between partisans of different groups and mitigating the negative effects of partisanship in online news media.

The first study presented by Stroud leverages the Stereotype Content Model to promote the idea of distinguishing likeness from respect. Its results show a significant effect from changing the names of the reaction buttons on comments. Some questions arise here, such as: what is the long-term effect of such a solution in terms of actually refining negative partisan behavior? The results partially answer this question by showing that people actually “respect” opposing ideas. But of all the people who pass by an opposing comment, how many are actually willing to engage with it positively? How can we encourage people to engage with and respect an opposing comment that deserves it?

From my perspective, I would suggest answering these questions as follows:

  1. Extending the study within the context of selective exposure and selective judgment by studying the percentage of people who stop by opposing comments, read them, and give them the respect they deserve.
  2. Extending the design to include feedback to the user, for example a healthy engagement score that increases when a user reads and respects an opposing opinion (a toy sketch follows below).
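As a toy illustration of the feedback idea in point 2, a per-user score could be incremented whenever the user reads or respects an opposing comment. The class, actions, and weights below are all made up for the sketch.

```python
# Toy sketch of a "healthy engagement score"; names and weights are
# arbitrary illustrative assumptions.
from dataclasses import dataclass

@dataclass
class EngagementScore:
    score: float = 0.0

    def record(self, action: str, opposing: bool) -> None:
        # Reward engagement with opposing opinions more than aligned ones.
        weights = {"read": 1.0, "respect": 2.0}
        bonus = 2.0 if opposing else 1.0
        self.score += weights.get(action, 0.0) * bonus

user = EngagementScore()
user.record("read", opposing=True)
user.record("respect", opposing=True)
print(user.score)  # 6.0
```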

The second study presented in the video analyzes the effect of incivility in online news media comments by analyzing the triggers of reward and punishment for comments. In this regard, the study compares three behavioral acts: profanity, incivility, and partisanship. It is no surprise that profanity is the act rejected by both commenters and moderators. However, it is a fact that conversations with incivility sometimes attract views and even engagement. Many in my generation grew up watching TV shows with political opponents fighting on air. These types of media always claim the good cause of promoting fruitful discussions between opposing mindsets; however, as Stroud mentioned, there are business incentives behind promoting some controversial discussions.

In a perfect world, we may wish that fruitful interactions between partisans of different groups became as engaging as those situations when partisans fight to defend their ideologies. The question is how to encourage news organizations to define clear thresholds for the amount of acceptable incivility in discussions about hot issues. From another perspective, is it feasible to do so? Or should researchers focus on promoting desirable engagement among users rather than moving towards stricter moderation of online comments?

From my perspective, the current model for news organizations is the best we can do in terms of having a set of rules and (human and/or automated) moderators enforcing them to some extent. What we can change is the user interface design of online news organizations, to promote healthier engagement (e.g., the first study with my suggestions added to it), integrated with some of the ideas surveyed in Bursting Your (Filter) Bubble: Strategies for Promoting Diverse Exposure. Another important step could be auditing (and maybe redesigning) the recommendation algorithms to ensure that they do not contribute to the so-called filter bubble effect.

Reflection #8 – [09/25] – [Karim Youssef]

Nowadays, thanks to the abundance and accessibility of online information sources, people have access to an overwhelmingly wide range of information. Since the early days of this online information explosion, many researchers have been concerned about how the internet would affect and shape an individual's exposure to information; in other words, how the selective exposure theory would manifest in online news and information sources.

One notable study in this field was presented by R. Kelly Garrett in his work Echo chambers online?: Politically motivated selective exposure among Internet news users. This work analyzes the factors that affect a user's selection of an online news source, as well as the time the user spends reading the selected source. The results tend to reinforce the author's initial hypotheses, which can be summarized as follows: the motivation for a user to favor a news item that matches their opinion over one that challenges it is to seek opinion reinforcement rather than to avoid opinion challenge.

Although the author mentions that these results are somewhat reassuring with respect to worries that the internet contributes to creating “echo chambers”, there is an important missing piece. This paper studies the effect of the internet as a resource that gives users abundant choices and control over what they read; the fear was that users' selectivity may directly create the echo chamber effect. The missing piece is the contribution of the technology itself to this effect through personalization techniques. Although the study shows that users are unlikely to avoid opinion-challenging information in itself, their continuous tendency to favor opinion-reinforcing information, in the presence of these personalization techniques, could lead to a misperceived dominance of their own opinions and a gradual isolation from opposing ideas.

The effect of selective exposure combined with online recommendation and personalization technologies concerned Paul Resnick et al., as presented in their work Bursting Your (Filter) Bubble: Strategies for Promoting Diverse Exposure. They survey existing solutions that aim to encourage exposure to diverse and cross-cutting content, including user interface designs that encourage a user to read opposing opinions or that show users how balanced their reading is.

Despite the attractiveness and creativity of the solutions proposed to promote diverse exposure, it is necessary to keep moving towards a comprehensive understanding of why these “filter bubbles” exist. R. Kelly Garrett's study, as well as Eytan Bakshy et al.'s work Exposure to ideologically diverse news and opinion on Facebook, suggests that individuals' choices play the most significant role in shaping their online exposure. Supposing we accept this, an important question is: do hidden personalization algorithms by themselves further limit diverse exposure, or are they only a reflection of the individual's behavior? To answer these questions and reach a complete understanding and enhancement of online exposure, we need to connect the dots between research on selective exposure as human nature, auditing of online personalization algorithms, and techniques to promote more diverse online exposure.

Understanding an individual's motivations, and studying their role and that of other factors in driving online recommendation algorithms, could lead to the best strategy for developing a more diversity-promoting online world.

Reflection #7 – [09/18] – [Karim Youssef]

The continuous evolution of computer systems and network infrastructures has connected the world, making it possible for anyone on the internet to easily communicate and interact with acquaintances as well as strangers in various contexts. With new possibilities, new challenges arise, and a question imposes itself: how can we maximally convey the traits of real-life social interactions through a computer application?

Electronic online communication dates back to the early 1970s, when the email service was introduced. This was no doubt revolutionary; however, with the evolution of computer systems and applications, it became possible to create other, more synchronous contexts of online communication, where people could have an online conversation as similar as possible to a real-life one. This possibility raises the question above.

In an attempt to address this question, Thomas Erickson and Wendy A. Kellogg introduced the concept of social translucence to the design of social applications. In their work “Social Translucence: An Approach to Designing Systems that Support Social Processes”, they first define social translucence in terms of three aspects of real-life social interactions: visibility, awareness, and accountability. They then present the design of an online social application that serves as a knowledge community. Although their design looks simple, its details attempt to capture many aspects of social interactions. Their application, called Babble, consists of multiple textual conversation threads where people chat about various topics. They also designed a graphical representation of the conversation called a social proxy, which depicts a conversation as a large circle, with people as small circles moving within it to reflect their activity in the conversation.

From my view, one of the most successful parts of their design is the way the conversation is organized. The date and time stamps, followed by the name and the text message, convey a lot of information from a social perspective: they reflect the flow of the conversation and how fast people respond to each other, and they give the sense of a lively conversation because everyone sees what all other parties are saying in near real time. It also makes it possible for people who join a conversation later to catch up with at least its most recent part. We can notice that this is the convention for most of today's chat tools.

The design of the social proxy adds awareness of some characteristics of the conversation. From my view, this idea succeeds in reflecting the activity of speakers and listeners within a conversation; however, there could be different meanings and interpretations associated with users' spatial patterns. This point of spatial patterns is one of the things I like most about the Chat Circles series project.

The Chat Circles series is an attempt to add more liveliness and awareness to online conversations by introducing the concepts of hearing range and movement in space. The original Chat Circles and its successors try to move as close as possible to a real-life style of conversation. Although this is good for portraying many aspects of social interaction, it could have some drawbacks.

If I imagine traveling back in time to advise the designers, I would focus on how online conversations would become an inherent part of everyday life, and on how, to cope with the pace of users' lifestyles, their design would need to become much simpler. Although some parts of the Chat Circles designs are used in today's chat tools, for example the circles growing and shrinking to indicate who is speaking in a Skype group call, I would stand for the simplicity and readability of Erickson and Kellogg's chat thread design. Their design is widely used in current chat tools; combined with newer features such as “last active”, “seen by”, and the possibility of reacting with an emotion icon (e.g., emojis), it conveys sufficient social information.

Finally, my ultimate belief is that although it is highly useful and important to bring as many real-life social traits as we can into digital life, this will never replace the value and importance of a real-life conversation.


Reflection #6 – [09/13] – [Karim Youssef]

Personalization in online platforms can be considered a double-edged sword. At first sight, personalization looks beneficial, giving individual users a better experience when surfing the web; it also seems sensible given that the amount of accessible information is overwhelming. On a deeper look, personalization and other hidden filtering algorithms raise many questions. From the fear of filter bubbles to potential implicit discrimination, scrutinizing the black boxes that decide on our behalf what we see online has become a matter of public interest.

Revealing the functionality of hidden filtering algorithms is a challenging process. In their work Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, Christian Sandvig et al. propose multiple research methods for auditing online algorithms. These methods include auditing the algorithm's code, surveying users of a specific online platform, scraping data, using sockpuppets, and crowdsourcing. Each proposed technique faces its own set of challenges, whether ethical, legal, or due to insufficient data or knowledge. The work closely analyzes each technique and the challenges it faces, which inspires a lot of research in this area.

A few general takeaways from the work presented by Christian Sandvig et al. are:

  • As mentioned in the paper, it is hard to generalize the results of an audit beyond the specific platform studied. Such platform-specific audit studies could advantage competitors of the studied platform unless regulation ensures fairness in studies across different platforms providing the same service.
  • There are many legal restrictions on performing such studies. Whether workarounds are considered ethical depends on the importance of the results and the right of the wide base of users to know what happens behind the scenes.
  • Combining two or more of the techniques mentioned in the paper could lead to more beneficial results, such as combining crowdsourcing with sockpuppet accounts to design more controlled experiments, or, if possible, combining code auditing with crowdsourcing to help reverse-engineer the parts that are not clear.
  • Finally, algorithm auditing is becoming highly important, and it is necessary to open the way and relax some conditions to allow more efficient auditing that ensures the transparency of different online platforms.

One valuable algorithm audit study was performed by Aniko Hannak et al. and presented in their work Measuring Personalization of Web Search. This well-designed study analyzes how the Google search engine personalizes search results. The beauty of this work lies in the interpretability of its experimental setup and results, as well as the generality of its approach. The study examines the factors that contribute to the personalization of Google search results: the authors analyzed the similarity between search results for queries made at the same time with the same keywords, and studied the effect of factors such as geolocation, demographics, search history, and browsing history. They quantified personalization for these different factors as well as for different search categories.
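To illustrate the kind of measurement involved, the sketch below compares the result lists two accounts receive for the same query using the Jaccard index, a standard list-overlap metric for such audits. This is a simplification of the paper's methodology (which also controls for carry-over effects and measures rank differences), and the URLs are invented.

```python
# Simplified illustration of quantifying personalization: compare the
# result sets returned to two accounts for the same query at the same time.
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical top results for the same query from two different accounts.
control = ["cnn.com", "bbc.com", "nytimes.com", "reuters.com"]
treated = ["cnn.com", "foxnews.com", "bbc.com", "breitbart.com"]

# Lower similarity => more personalization for this query.
print(f"Jaccard similarity: {jaccard(control, treated):.2f}")
```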

Some of the takeaways from this work could be:

  • This work serves as a first step towards building a methodology that measures web search personalization quantitatively. Although there could be more parameters and conditions to look at, the method presented by this work is a guiding step.
  • The generality of their approach backs the previous point: their method could be applied to different online platforms to reveal initial traits of hidden ranking algorithms, such as product search on e-commerce websites or the ordering of a newsfeed.
  • As they mention, their findings reflect the most obvious factors that drive personalization algorithms. Starting from their work, deeper analysis could reveal other hidden traits that may carry some form of discrimination or limit exposure to certain information.

As mentioned at the beginning, a personalization algorithm can bring various benefits; however, auditing these algorithms is necessary to ensure that no undesirable effects result from using them.


Reflection #5 – [9/11] – [Karim Youssef]

The amount of information, news, and opinions we are exposed to every day, and the number of sources they come from, have significantly increased thanks to online social media platforms. With these platforms serving as mediators for sharing information between their users, multiple questions arise regarding how their design and function affect the information that reaches an end user.

In designing an online platform with an overwhelming amount of information flowing every day, it may make sense to build personalization techniques that optimize the user's experience. It might also make sense, for example, for an advertising platform to optimize the display of ads in a way that affects what an end user sees. Many design goals may result in the end user receiving filtered information; however, the lack of transparency of these techniques to end users, as well as the effect of filtering on the quality and diversity of the content that reaches a user, are significant concerns that need to be addressed.

In their work Exposure to ideologically diverse news and opinion on Facebook, Eytan Bakshy et al. study the factors affecting the diversity of the content that Facebook users are exposed to. They used a data-driven approach to analyze the proportion of content from a different ideology versus content from an aligned ideology that a user sees in their Facebook newsfeed. They inferred that the factors that most limit the diversity of a user's newsfeed are the structure of the user's friend network and what the user chooses to interact with. The study found that the newsfeed ranking algorithm affects the diversity of the content that reaches a user, but that this algorithm adapts to the user's behavior and interactions. From this perspective, they concluded that “the power to expose oneself to perspectives from the other side in social media lies first and foremost with individuals”, as stated in the paper.

I agree to some extent with the findings and conclusions of the study discussed above. However, one major concern is the question of the extent to which Facebook users are aware of these newsfeed ranking algorithms. Eslami et al. try to answer this critical question in their work “I always assumed that I wasn’t really that close to [her]”: Reasoning about invisible algorithms in the news feed. They conducted a qualitative study gathering information from 40 Facebook users about their awareness of the newsfeed curation algorithms. The study showed that the majority were not aware of these algorithms and that there is widespread misinterpretation of their effect among users. Although a majority of users came to appreciate the importance of these algorithms once aware that Facebook controls what they see, the initial response to learning that these algorithms exist was highly negative. The study also revealed how people make wrong assumptions, for example when they do not see a post from a friend for a while.

I'll imagine myself as a mediator between the users and the designers of a Facebook-like social platform, trying to close the gap. I fully agree that every user has the right to know how their newsfeed works. Every user should feel in full control over what they see, with any hidden algorithm only helping them personalize their newsfeed. On the other hand, it is a hard design problem for platform designers to reveal all their techniques to end users, simply because the more complex the platform becomes to use, the more likely users are to abandon it for simpler platforms.

If I imagine being hired to alter the design of a social platform to make users more aware of any hidden techniques, I would start with a very simple message conveyed through an animated video that raises users' awareness of how their newsfeed works. This could simply say, “we are working to ensure you the best experience by personalizing your newsfeed; we would appreciate your feedback”. To gather that feedback, users could see occasional messages asking simple questions like “you’ve been interacting with x recently; to see more posts from x, you can go to settings and set this and that”. After a while, users would become more aware of how to control what they see in their newsfeed. Continuously collecting feedback on users' satisfaction with the platform would also help improve the design over time.

I understand that addressing such a problem is more complex and challenging than this, and that there may be hundreds of other reasons why hidden algorithms control what an end user receives. However, ensuring a higher level of transparency is crucial to the health of online social platforms and to users' satisfaction with them.

Reflection #4 – [09/06] – [Karim Youssef]

The growth of online discussion communities has made them central to exchanging opinions, information, and knowledge, giving them a significant role in forming the opinions and knowledge of many of their users. With little chance to verify either content or identity, a normal user could easily fall prey to identity deception and misinformation. One challenge to a healthy online discussion community is sockpuppetry, where multiple “virtual” users are controlled by a single actual user. Sockpuppetry can be used for deception, or for creating a fake public consensus to manipulate public opinion about an event or a person.

In their work An Army of Me: Sockpuppets in Online Discussion Communities, Kumar et al. conducted a valuable study that aims to analyze the behavior of sockpuppets in online communities. Their data-driven study analyzes data from nine different online discussion communities, examining the posting activity and the linguistic characteristics of content posted by sockpuppet accounts. They then identify different types of sockpuppets based on deceptive behavior and/or supportive behavior towards other sockpuppets. Finally, they use these analyses to build a predictive model that determines whether an account is a sockpuppet and whether a pair of accounts forms a sockpuppet pair.
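A minimal sketch of what the final predictive step might look like. The pairwise features below are invented for illustration and are not the paper's exact feature set, which is derived from posting activity, community reactions, and linguistic cues.

```python
# Sketch of a sockpuppet-pair predictor; features and values are
# hypothetical stand-ins for the paper's activity/linguistic features.
from sklearn.ensemble import RandomForestClassifier

# Per-pair features: [shared discussions, seconds between posts,
# text similarity of posts, fraction of posts from the same IP]
X = [
    [12, 40.0, 0.91, 0.8],    # suspicious pair
    [1, 86400.0, 0.12, 0.0],  # likely unrelated accounts
    [9, 120.0, 0.85, 0.6],
    [0, 50000.0, 0.05, 0.0],
]
y = [1, 0, 1, 0]  # 1 = sockpuppet pair, 0 = not

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[7, 300.0, 0.7, 0.5]]))  # -> [1]
```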

My reflection on this study could be summarized in the following points:

  1. This study helps build a clear definition of a sockpuppet and highlights some of the motivations behind creating such online identities. There could be a wide variety of motivations behind sockpuppetry, some of them benign, but it is highly important to understand the malicious ones.
  2. The activity traces of sockpuppets are highly predictive, while community reactions towards them are less so. For other types of antisocial behavior, community reaction features were more predictive, as shown by Cheng et al. in Antisocial Behavior in Online Discussion Communities. This could convey multiple signals: sockpuppets may be hard for their surrounding community to detect, or users in an online community may be more alerted by antisocial content than by a particular activity pattern. That may mean a community tends to react negatively when a sockpuppet account posts significantly undesirable content, more than when a strange activity pattern occurs, unless it is blatantly suspicious.
  3. Sockpuppets tend to work in groups. Although the study shows that most of the identified sockpuppets work in pairs, there could be other activity patterns associated with larger groups of sockpuppet accounts that are worth studying.
  4. As with other data mining problems in online communities, it still seems hard to develop an automated system that could reliably replace human moderators; however, reaching the point where an automatic flag can be raised on some content makes moderators' lives easier and helps toward faster control of the spread of any type of undesirable behavior.

This study stimulated my interest in proceeding in different directions. I would be interested in studying the role of sockpuppets in historical events such as the Arab Spring or the 2016 US elections. I would also like to study how effective sockpuppets are at forming the opinions of normal users, and at spreading misinformation and making it widely accepted by users of online communities.

There are also multiple directions in which to improve the current study. Among them are studying the activity patterns of larger sockpuppet groups; analyzing the content posted by sockpuppets more deeply, in addition to their activity patterns, to derive more content-based features; and further analyzing and comparing the activity patterns and content features of sockpuppets across different types of online communities.