Reflection #13 – [11/29] – [Subil Abraham]

Lazer, David, and Jason Radford. “Data ex machina: Introduction to big data.” Annual Review of Sociology 43 (2017): 19-39.

This article introduces sociologists to the world of big data. It talks about the possibilities of using big data to identify new phenomena in human behavior. It also goes over the potential pitfalls of relying on big data and makes the case that big data works best when used in combination with other methods rather than in isolation. It concludes with what the future could hold for using big data in Sociology.

This article is an appropriate bookend for this semester. It goes over the major themes that we covered in detail in our classes and provides a good summary of the things we’ve learned. Noticing that this article was published in a Sociology journal reminded me that despite all the computing-related things we’re doing, the ultimate goal is to further the study of humans in this connected world. All the machine learning and data analysis is a means to an end, which is obvious in retrospect but is not really at the forefront of your mind when you are deep in the throes of writing code.

The authors made an interesting point about scientists fixating on single platforms like Twitter, comparing them to ‘model organisms’ in Biology. Behavior on Twitter is exclusive to Twitter and doesn’t necessarily reflect the wide range of human behavior. But even within these so-called model organisms, there is such rich research potential for studying human behavior. I don’t believe that findings need to be generalizable to be useful. Interesting observations can be made on single platforms, and there is no need to constantly worry about how generalizable the information is. Consider Finstagrams [1], a phenomenon where users create secondary accounts viewable only by select people. Regular Instagram accounts are often curated to be perfect and public facing. Finstagrams provide an outlet where users can just ‘be themselves’. I believe this could be a fascinating study: looking at what causes a user to make a Finstagram account, when they first started appearing, what their real-world analogues are, and so on. Looking at single platforms exclusively should not be dismissed for lack of generalizability, for sometimes it is that very lack of generalizability that makes findings interesting.

I think there is an interesting future ahead for this combination of Sociology and Computer Science. With everything that is happening with Facebook and the problems that arise from social media in general, I think this field holds the key to figuring out how to help solve the problems of humans in the online world, just as Sociology tries to solve the problems of humans in the real world. It is worth keeping an eye on to see where things go from here.

[1] https://medium.com/bits-pixels/finstagram-the-instagram-revolution-737999d40014


Reflection #12 – [10/23] – [Subil Abraham]

  1. Bond, Robert M., et al. “A 61-million-person experiment in social influence and political mobilization.” Nature 489.7415 (2012): 295.
  2. Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. “Experimental evidence of massive-scale emotional contagion through social networks.” Proceedings of the National Academy of Sciences (2014): 201320040.

 

Summary:

The two papers talk about how people are influenced and how this influence spreads through the online social networks that people build. Both experiments were conducted on Facebook and the networks of friends that users build there. The first paper tracks the influence of a banner that promoted voting in the election, shown with and without the friends who stated that they voted. The second paper tracks how users’ emotions are influenced by the sentiment of the posts in their news feed.

 

Reflection:

The whole contagion effect of Facebook influencing people to vote parallels the unconscious social pressures of real life. For example, I noticed this social pressure affecting me when I felt compelled to buy clothes emblazoned with the VT logo because that is what I saw everyone around me wearing. The compulsion to conform is extremely strong in humans, so it is not surprising that people are more likely to vote when they are explicitly made aware that people around them, people they know, have voted.

The two experiments make me wonder how much further people can be manipulated in their thoughts and actions via social media. If you can manipulate buying habits, advertisers now have a fantastic new weapon. If you can manipulate human thought and initiative, rogue governments now have a fantastic new weapon. This isn’t something that may happen in the far-off future either. Oh no! This is something that is happening now. The most extreme example is the massacre and displacement of the Rohingya people in Myanmar. The hatred for these people is being perpetuated through social media, with support from the government (though they would deny such a thing). You can spark good effects like getting people to vote through contagion, but remember that this darker side exists as well.

We cannot escape talking about the ethical issues surrounding the emotional contagion experiment, especially considering the outrage it sparked when people became aware that it had happened. I wondered why the emotional contagion paper struck me as something I’d heard about before; then I realized that I’d read about it, and the huge controversy it caused, in the news. One has to ask: is it really informed consent when people have been conditioned over the years to simply check the ‘I agree’ box on the terms of service without bothering to read it (because who wants to read several pages of impenetrable legal language)? I have to question the researchers’ reasoning that people consented to the experiment. It is well known that in the vast majority of cases, people don’t read the terms of service. Even if the researchers were unaware of this, the problem remains that informed consent did not really exist, so this experiment could be classified as unethical and wrong. The experiment’s results rested on the users not being aware that the experiment was taking place, but it does seem like the users’ choice was taken away from them, which is most certainly not a good standard to set, especially by someone as big and influential as Facebook. Then again, Facebook isn’t exactly a paragon of ethics, so I don’t think they care either way.

 


Reflection #11 – [10/16] – [Subil Abraham]

The paper is an interesting attempt at quantifying politeness in requests. The authors used MTurk workers to annotate requests from Wikipedia talk pages and Stack Overflow questions and answers. From the annotated data, they were able to build a classifier that could label new request texts with close to human-level accuracy.
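Just to make that training pipeline concrete, here is a minimal sketch of how such a classifier could be built from annotated requests; the `requests` and `labels` variables are hypothetical placeholders, and the bag-of-words logistic regression is a simple stand-in for the richer, linguistically informed features the paper actually uses.

```python
# Minimal sketch: training a request-politeness classifier from annotated text.
# `requests` and `labels` are hypothetical placeholders for the MTurk-annotated data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

requests = [
    "Could you please take a look at this edit when you get a chance?",
    "Would you mind explaining why you removed my answer?",
    "Fix this now, it's obviously broken.",
    "Why would you even post something this wrong?",
]
labels = [1, 1, 0, 0]  # 1 = polite, 0 = impolite

# TF-IDF unigrams/bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(requests, labels)

# Probability that a new, unseen request reads as polite.
print(clf.predict_proba(["Can you check my answer, please?"])[0][1])
```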

The analysis on Wikipedia showed that editors who are more polite are more likely to be promoted to admins. But the question now is: what can be done to make sure someone continues to be polite even after gaining power? More generally, what incentive system can be built to prevent someone’s power from going to their head? We already have the obvious checks and balances, like banning someone, even an admin, if they become too disruptive. But what about preventing even the small devolution that was observed? Simply stripping the privileges one gained at the slightest sign of impoliteness would surely be a bad idea. We could think about implementing the trained politeness classifier as a browser extension (like Grammarly [1] but for politeness) to tell you how polite what you are typing is. But this might end up suffocating the user, who has to deal with an application constantly telling them that they are not as polite as they should be. And, as Lindah [2] pointed out, the classifier is far from perfect, so this might not be a good idea either.

This isn’t a very shrewd insight or anything, but I may as well point out that the classifier is obviously skewed toward the U.S. understanding of what politeness is. The authors were upfront about how they chose their annotators, but this does mean that the final results mark a request as polite or impolite based on politeness norms in the context of the USA. Though the biggest parts of Stack Overflow and Wikipedia are in English, people from all over the world end up contributing to them. What someone from the US may consider impolite (like the ‘Direct question’ strategy) would seem perfectly polite to a Scandinavian, whose culture values directness in speech. Any future work that builds on this one must keep this in mind.

All in all, I believe this was a pretty solid paper. The authors set out to do something and did it, documenting the process along the way. Potential future work would be to take this idea and redo it in a different English-speaking culture, to identify how its ideas of politeness differ from the US perspective.

 

[1] https://www.grammarly.com/

[2] https://wordpress.cs.vt.edu/cs5984/2018/10/07/conversational-behavior/


Reflection #10 – [10/02] – [Subil Abraham]

Starbird’s paper is an interesting examination of the connections between the many “alternative news” domains, with mass shooter conspiracies being the theme connecting them all.

The paper mentions that the author is a politically left-leaning individual and points out that this may have biased their perceptions when writing the paper. The author mentioning their bias made me think about my own possible left-leaning bias when consuming the news. When I see some news from an “alt-left” site that someone on the right would call a conspiracy theory, am I taking that news as the truth because it agrees with my perceptions, perceptions which may have been sculpted by years of consuming left-leaning news? How would I, as an individual on the left, be able to conduct a study of left-leaning alternative narratives without my bias skewing the conclusions? Scientists are humans, and you cannot eliminate bias from a human entirely. You could bring on people from the right as equal partners when conducting the study and keep each other in check to try to cancel out each other’s biases. How well you would be able to work together and produce good, solid research, considering that this is essentially an adversarial relationship, I do not know.

It’s interesting that Infowars has such a large share of the tweets but only one edge connecting to it. Given how prominent Infowars is, one would think that it would have a lot more edges, i.e., users who tweet out other alt-right websites would tweet out Infowars too. But it seems like the bulk of those users just tweet out Infowars and nothing else. This means that the audience of Infowars, for the most part, does not overlap with the audience of other alt-right news sites. Now, why would that be? Is it because Infowars’ audience is satisfied with the news they get there and don’t go anywhere else? Is it because the audience of other alt-right sites thinks Infowars is unreliable, or maybe too nutty? Who knows. A larger examination of the reading habits of Infowars’ audience would be interesting. Since this study focuses only on mass shooter conspiracies, it would be interesting to know whether and how widely Infowars’ audience reads across the wider field of topics the alt-right websites talk about.
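As a sketch of how one might quantify that audience overlap, assuming a hypothetical list of (user, domain) pairs extracted from the tweet data, one could compute the Jaccard overlap between Infowars’ audience and every other domain’s audience:

```python
# Sketch: measuring audience overlap between domains from (user, domain) tweet pairs.
# `tweets` is a hypothetical placeholder for pairs extracted from the collected data.
from collections import defaultdict

tweets = [("u1", "infowars.com"), ("u1", "infowars.com"),
          ("u2", "beforeitsnews.com"), ("u2", "veteranstoday.com"),
          ("u3", "infowars.com"), ("u3", "beforeitsnews.com")]

audience = defaultdict(set)            # domain -> set of users who tweeted it
for user, domain in tweets:
    audience[domain].add(user)

target = "infowars.com"
for domain, users in audience.items():
    if domain == target:
        continue
    # Jaccard overlap between the two audiences: 0 = disjoint, 1 = identical.
    overlap = len(users & audience[target]) / len(users | audience[target])
    print(f"{target} vs {domain}: audience overlap = {overlap:.2f}")
```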

The conclusions from this paper tie really well into the theme of selective exposure we talked about in the last two reflections. People see different sources all spouting the same thing in different ways, and repeated exposure reinforces their opinions. You only need to plant a seed that something might be true and then barrage them with sources that seemingly confirm that truth; confirmation bias will take care of the rest. It is especially problematic when it comes to exposure to alternative narratives because the extreme opinions that form will be particularly damaging. This is how the anti-vaxxer movement grew, and now we have the problem that diseases like measles are coming back because of the loss of herd immunity, thanks to anti-vaxxers not vaccinating their children [1]. Trying to suppress alternative narratives is a lost cause, as banning one account or website will just lead to the creation of two others. How can we delegitimize false alternative narratives for people who are deep in the throes of their selective exposure? Simply pointing to the truth clearly doesn’t work; otherwise it would be easy and we wouldn’t be having this problem. People need to be deprogrammed by replacing their information diet and enforcing this change for a long time. This is basically how propaganda works, and it is fairly effective. Conducting a study of how long it takes to change someone’s entrenched mindset from one position to the opposite through information consumption alone (controlling for things like personal life events) would be a good first step toward understanding how we can change people (of course, we run into the ethical question of whether we should be changing people en masse at all, but that is a can of worms I don’t want to open just yet).


Video Reflection #9 – [09/27] – [Subil Abraham]

Dr. Stroud’s talk on her research on partisanship and its effects in the comments was an enlightening look at how news organizations need to consider their incentives and the design of their comment systems. One thing Dr. Stroud brought up while discussing the NY Times study was the business incentives of the news organizations themselves, something we as a class have not discussed in detail (besides a few mentions here and there). We have been focused mainly on the user side of things, and I think it is important to consider how one could incentivize the organization to take part in solving the problems, because right now they see the partisanship in their comment sections as good for business. More engagement means that you can serve more ads to more people and bring in more money. You could spin conspiracy theories that engagement and the revenue it generates are why Reddit doesn’t ban controversial subreddits unless they attract a lot of negative media attention, but that is a rabbit hole we don’t want to dive into.

I would agree with Dr. Stroud that severe partisanship is an obviously bad thing, but I don’t think that enforcing civility in every comment conversation is the right way to go. Humans are passionate, emotional creatures, prone to wild gesticulation to try and get their point across. People will blow their tops when talking about a topic they feel strongly about, especially when arguing with someone who has an opposing view. And like Dr. Stroud said, the idea of civility is subjective. What is stopping an organization from morphing this idea of civility over time into something that means “anything that opposes the organization’s views”? Remember that no great change has ever been brought about by people being civil. Even Gandhi, the icon of peace, wasn’t civil. His movements were peaceful, yes. But they were disruptive (i.e. most certainly not civil) which is why they were so effective and popular. The goal should be to incentivize people to listen to each other and help them find common ground, not to try and enforce civility which will at best create a facade of good vibes while not actually producing any understanding between the two groups.

Let’s speculate about what a discussion system that gives users the ability to listen to and understand the opposing side (while still allowing for passionate discussion) might look like. The first thing we would like is for users to declare their allegiances by setting where they stand politically (on the left or the right) on a sliding scale that would be visible when they comment. This lets other users know where a commenter stands and keep that in mind while engaging with them. For now, let us assume that we don’t have to deal with trolls and that everyone sets their position on the scale honestly. Now, when a comment (or reply) is posted, other users could vote on how well articulated and well argued the post is (we are not using ‘like’ or ‘recommend’ here because, as Dr. Stroud said, the choice of wording is important and leads to different results). If someone on the right wrote a well-argued reply that refutes a comment written by someone on the left, and other people acknowledge this by voting on how well written and articulated the reply is (giving more weight to votes from people on the other side of the scale), it could serve as a point for people on the left to think about, even if they are ideologically opposed to it.

If the comments just devolve into name calling and general rudeness, then nobody is getting votes for being well written and articulate. But this system could allow passionate discussions that do not necessarily fall into the bucket of “civility,” yet are still found to be valuable, to be voted up and brought to the notice of the people who oppose them. Seeing votes from people on their own side would provide a strong incentive to try to understand a point that they might otherwise oppose without thinking too deeply about.
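Here is a minimal sketch of how that cross-aisle vote weighting could work, assuming each user self-reports a position on a -1 (left) to +1 (right) scale; the weighting function, names, and example positions are purely illustrative, not a claim about how any real system scores comments.

```python
# Sketch: scoring a comment by "well-argued" votes, weighting votes from readers
# on the opposite side of the political scale more heavily. Positions run from
# -1 (left) to +1 (right); the weighting scheme is purely illustrative.

def vote_weight(author_position, voter_position):
    """Weight grows with the political distance between author and voter."""
    distance = abs(author_position - voter_position)   # 0 (same spot) .. 2 (opposite ends)
    return 1.0 + distance                              # cross-aisle votes count up to 3x

def comment_score(author_position, voter_positions):
    return sum(vote_weight(author_position, v) for v in voter_positions)

# A right-leaning author (+0.8) endorsed by three left-leaning readers earns a
# noticeably higher score than one endorsed only by fellow travellers.
print(comment_score(0.8, [-0.9, -0.7, -0.5]))   # cross-aisle endorsements
print(comment_score(0.8, [0.9, 0.7, 0.5]))      # same-side endorsements
```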


Reflection #8 – [09/25] – [Subil Abraham]

 

[1] Garrett, R. Kelly. “Echo chambers online?: Politically motivated selective exposure among Internet news users.” Journal of Computer-Mediated Communication 14.2 (2009): 265-285.

[2] Resnick, Paul, et al. “Bursting your (filter) bubble: strategies for promoting diverse exposure.” Proceedings of the 2013 conference on Computer supported cooperative work companion. ACM, 2013.

 

This week’s papers study and discuss the filter bubble effect – the idea that people only expose themselves to viewpoints they agree with, to the detriment of obtaining diverse viewpoints. This effect is especially prominent when it comes to reading political news. The first paper studies the reading habits of users of partisan news sites. From their results, the authors conclude that the desire to seek out news that reinforces one’s own opinions does not necessarily mean that one goes out of their way to avoid challenges to those opinions. In fact, they found that people engage more with opposing views, perhaps to try to find flaws in the arguments and reinforce their own views. The second paper is a discussion by multiple authors of strategies for decreasing the filter bubble effect, such as gamifying news reading, encouraging users to make lists of pros and cons on the issues they read about, and having news sources push through opposing views when the content is of high quality.

The first paper’s finding that people don’t actively avoid stories that challenge their opinions is very interesting to me. It means that in most cases people are willing to engage with opposing opinions even if they don’t necessarily agree with them. Big internet companies that rely on providing personalization could safely tweak their recommendation algorithms at least a bit to allow some opposing views to filter through, without the fear of losing business that may have been a concern for them.
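To make that concrete, here is a hypothetical sketch of a re-ranker that reserves a small share of recommendation slots for the highest-scoring counter-attitudinal stories; the `stories` data, the `lean` labels, and the 20% share are all illustrative assumptions, not a description of any real recommender.

```python
# Sketch: a re-ranker that reserves a share of recommendation slots for stories
# whose political lean differs from the user's. `stories` is hypothetical data
# with a personalization `score` and a coarse `lean` label.

def rerank(stories, user_lean, k=10, counter_share=0.2):
    counter = sorted((s for s in stories if s["lean"] != user_lean),
                     key=lambda s: s["score"], reverse=True)
    same = sorted((s for s in stories if s["lean"] == user_lean),
                  key=lambda s: s["score"], reverse=True)
    n_counter = max(1, int(k * counter_share))      # always keep at least one cross-cutting slot
    picked = counter[:n_counter]
    picked += same[:k - len(picked)]
    return sorted(picked, key=lambda s: s["score"], reverse=True)

stories = [{"id": 1, "score": 0.9, "lean": "left"},
           {"id": 2, "score": 0.8, "lean": "right"},
           {"id": 3, "score": 0.7, "lean": "left"},
           {"id": 4, "score": 0.6, "lean": "right"}]
print([s["id"] for s in rerank(stories, user_lean="left", k=3)])
```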

Something I would like to know, which the first paper didn’t cover, is how much people actually comprehend the stories and how much the stories influence them. For opinion-reinforcing stories, are they tuning out once they realize that the story agrees with their opinions? For opinion-opposing stories, are they spending that excess time trying to understand, and does that extra engagement lead to at least a shift in their opinion compared to their earlier stance? Perhaps this study could be done with a final quiz that asks questions about the story content, along with interviews to measure whether there has been a shift in opinion.

Gamifying the task of getting a person to reduce their selective exposure, through the use of the stick figure balancing on a tightrope discussed in the second paper, is a very promising idea. Stack Overflow thrives with an enormous amount of content because it has gamified answering questions: you can earn different levels of badges, and that gives you prestige. Perhaps the same thing can be done to encourage people to read more widely, by giving them points and badges and having a leaderboard track scores across the country. People actively chase the high of getting better scores, which is why I think this might be fairly effective. Of course, you don’t want people to game the game, so maybe this gamification could be combined with the paper’s other ideas of having people discuss, interact, and rate comments, to prevent botting.
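As a rough sketch of how the scoring behind such a game could work, the snippet below awards more points the more evenly a user’s recent reading is split between the two sides; the entropy-based formula and the point values are purely hypothetical.

```python
# Sketch: a toy "reading balance" score for gamification, rewarding users whose
# recent reading spans both sides. The formula and point values are hypothetical.
import math

def balance_points(reads):
    """`reads` is a list of 'left'/'right' labels for recently read stories."""
    if not reads:
        return 0
    p_left = reads.count("left") / len(reads)
    p_right = 1 - p_left
    # Binary entropy: 1.0 for a perfect 50/50 split, 0.0 for one-sided reading.
    entropy = -sum(p * math.log2(p) for p in (p_left, p_right) if p > 0)
    return round(100 * entropy)      # up to 100 points per scoring period

print(balance_points(["left"] * 8 + ["right"] * 2))   # lopsided reader: ~72 points
print(balance_points(["left"] * 5 + ["right"] * 5))   # balanced reader: 100 points
```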


Reflection #7 – [09/18] – [Subil Abraham]

  1. Donath, Judith, and Fernanda B. Viégas. “The chat circles series: explorations in designing abstract graphical communication interfaces.” Proceedings of the 4th conference on Designing interactive systems: processes, practices, methods, and techniques. ACM, 2002.
  2. Erickson, Thomas, and Wendy A. Kellogg. “Social translucence: an approach to designing systems that support social processes.” ACM transactions on computer-human interaction (TOCHI) 7.1 (2000): 59-83.

 

We have finally arrived at talking about design and its effects on interaction in social systems. The two papers closely deal with the idea of social translucence – making visible the social actions that are typically unseen in digital interactions – and the effects it has on participants. Erickson and Kellogg introduce the idea and propose a system for knowledge sharing driven by conversations. Its aims are more formal, targeting knowledge gathering and interaction in organizations, and it also introduces a social proxy that maps how conversations are going. Donath and Viégas cover similar ideas of translucence through the lens of an informal chat system, which uses animated circles in a space and allows moving around and joining conversations, somewhat simulating real-life interactions.

I think the chat circles are interesting in that they simulate a real-life social function while stripping away all customization, which means you can’t be judged by your appearance but only by your words and actions. But there is still a section of people who won’t want to use it: the introverts. Imagine you’re at a party. You arrive somewhat late, so almost everyone is already in groups engaged in conversation. You don’t want to be that outcast standing alone at the back, but you don’t want to try joining a conversation because you start thinking: what if they don’t want me in? What should I even say? Will I just end up standing there and not get a word out and look dumb? Can I even hold a conversation? Will they think I’m weird? Are they thinking that now? They probably are, aren’t they? I don’t want to be here anymore. I want to go home…

The chat circle space is not that different from a party, but you want everyone to engage and have fun. How can the system serve the shy ones better? Perhaps allow circles to change their color to green or something similar to show that they are open to conversation. Perhaps encourage the circles in the space to go talk to circles who seem to be by themselves. There are any number of ways to make the place more welcoming.

Another problem we might encounter is excessive gatekeeping. This could happen in both the knowledge system and chat circles. Erickson’s knowledge system already has features for requesting entry into a community. At the same time, you don’t want those protections to be used to turn away, say, newbies who are just trying to gain knowledge or are simply interested. You don’t want the admins to throw their weight around against the powerless. Stack Overflow is already suffering from the problem of being very unwelcoming to newcomers and old hands alike, a problem they are trying to fix [1]. The same problem could occur among congregations in chat circles, where the circles in a group can tell well-meaning people off. How one can be more welcoming is a very broad question that affects a lot of social systems, not just the ones described in the papers. I don’t think there is an algorithmic solution, so the best solution right now is to have community guidelines and enforce them well.

One last thing I’d like to point out is that the idea of Teledirection was very prophetic. It describes to a T what happens in IRL streaming, a genre of live streaming popularized on Twitch where the streamer (the tele-actor) goes about their day in real life outside their home and the chat (the tele-directors) make requests or give directions on what the streamer should do. The limits on what can be done have to be enforced by the streamer. A very famous example is IcePoseidon, an IRL streamer whose rabid fanbase causes havoc wherever he goes. His presence at any place triggers the fanbase to start disturbing the business, making prank calls, and attacking the business on review sites. I find it fascinating how well the paper managed to predict this.

[1] https://stackoverflow.blog/2018/04/26/stack-overflow-isnt-very-welcoming-its-time-for-that-to-change/

 

Read More

Reflection #6 – [09/13] – [Subil Abraham]

  1. Sandvig, Christian, et al. “Auditing algorithms: Research methods for detecting discrimination on internet platforms.” Data and discrimination: converting critical concerns into productive inquiry (2014): 1-23.
  2. Hannak, Aniko, et al. “Measuring personalization of web search.” Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.

Both papers deal with examining the personalization and recommendation algorithms that lie at the heart of many online businesses, be it travel booking, real estate, or your plain old web search engine. The first paper, “Auditing algorithms,” brings up the potential for bias creeping in, or even the intentional manipulation of the algorithms to advantage or disadvantage someone. It talks about the need for transparency and proposes that algorithms be examined via audit study, suggesting several methods for doing so. The second paper attempts to identify the triggers of, and measure the amount of, personalization in a search engine, while controlling for other aspects that could change results but are not relevant to personalization.

I think the first problem one might run into when attempting an audit study of the algorithms of an enormous entity like Google, Facebook, or Youtube is the sheer scale of the task. When you are talking about algorithms serving millions, even billions, of people across the world, you have algorithms working with thousands or tens of thousands of variables, all optimizing toward the best values for each individual user. I speculate that a slight change in a user’s behavior might set off a chain reaction of variable changes in the algorithm. At this scale, human engineers are no longer in the picture, the algorithm is evolving on its own (thanks, machine learning!), and it is possible that even the people who created the algorithm no longer understand how it works. Why do you think Facebook and Youtube are constantly fighting PR fires? They don’t have as much knowledge of or control over their algorithms as they might claim. Even the most direct method, a code audit, might see the auditors make some progress before they lose it all because the algorithm changed out from under them. How do you audit an ever-shifting algorithm of that size and complexity? The only thing I can think of is to use another algorithm to audit the first, since humans can’t do it at scale. But now you run into the problem of possible bias in the auditor algorithm. It’s turtles all the way down.

Even if we are talking about auditing something at a smaller scale, an audit study is still not a perfect solution because of the possibility of things slipping through the cracks. Linus’s law, “Given enough eyeballs, all bugs are shallow,” doesn’t really hold even when everything is out in the open for scrutiny. OpenSSL was open source and a critical piece of infrastructure, but the Heartbleed bug lay there unnoticed for two years despite many people looking for bugs. What can we do to improve audit study methods to catch all instances of bias without the study becoming impractically expensive?

Coming to the second paper, I find the vast difference in how much the lower-ranked results change compared to rank 1 fascinating. What I want to know is: why are the rank 1 results so relatively stable? Is it simply a matter of having a very high PageRank and being of maximum relevance? Are there cases where a result is hard-coded in for certain search queries (like how you often see a link to Wikipedia as the first or second result)? I think focusing specifically on the rank 1 results would be an interesting experiment: tracking the search results over a longer period of time, looking at the average time between changes in the rank 1 result, and looking at what kinds of search queries see the most volatility at rank 1.
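As a sketch of how that rank 1 experiment could be run, assuming daily snapshots of the top result for each query had been collected (the `snapshots` data below is a hypothetical placeholder), one could measure how often and how quickly the rank 1 result changes:

```python
# Sketch: measuring rank-1 volatility from daily search snapshots. `snapshots` maps
# each query to the chronologically ordered list of rank-1 URLs observed; the data
# here is a hypothetical placeholder.
snapshots = {
    "heartbleed": ["wikipedia.org", "wikipedia.org", "heartbleed.com", "heartbleed.com"],
    "cheap flights": ["kayak.com", "expedia.com", "kayak.com", "google.com/flights"],
}

for query, urls in snapshots.items():
    changes = sum(1 for prev, cur in zip(urls, urls[1:]) if prev != cur)
    volatility = changes / (len(urls) - 1)        # fraction of days rank 1 changed
    avg_days_stable = len(urls) / (changes + 1)   # mean run length before a change
    print(f"{query!r}: volatility={volatility:.2f}, avg days at rank 1={avg_days_stable:.1f}")
```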


Reflection #4 – [09/06] – [Subil Abraham]

Summary:

This paper analyzes the phenomenon of sock puppets – multiple accounts controlled by a single user. The authors attempt to detect these accounts on discussion platforms from the various signals that are characteristic of sock puppets, and build a model to identify them automatically. They also characterize different kinds of sock puppet behavior and show that not all sock puppets are malicious (though keep in mind that they use a wider definition of what a sock puppet account is). They found that it is easier to identify a pair of sock puppets (of a sock puppet group) from their behavior with respect to each other than it is to find a single sock puppet in isolation.

 

Reflection:

Though this paper specifically mentions that it takes a broad definition of what a sock puppet is and distinguishes between pretenders and non-pretenders, it seems to be geared more toward the study and identification of pretenders. The model that is built seems better trained at identifying the deceptive kinds of sock puppets (specifically, pairs of deceptive sock puppets from the same group), given the features it uses to identify them. I think that is fair, since the paper mentions that most sock puppets are used for deception and identifying them is of high benefit to the discussion platform. But I feel that if the authors were going to discuss non-pretenders too, they should be explicit about their goals with regard to the detection they are trying to do. Just stating “Our previous analysis found that sockpuppets generally contribute worse content and engage in deceptive behavior” seems to go against their earlier and later statements about non-pretenders and clumps them together with the pretenders. I know I’m rambling a bit here, but it stood out to me. I would say separate out the discussion of non-pretenders and only briefly mention them, and focus exclusively on pretenders.

Following that train of thought, let’s talk about non-pretenders. I like the idea of having multiple online identities and using different identities for different purposes. I believe it was something more widely practiced in the earlier era when everyone was warned not to use their real identity on the internet (but in the era of Facebook, Instagram, and personal branding, everyone seems to have gravitated toward using one identity – their real identity). It’s nice to see that there are still some holdouts, and it’s something that I would like to see studied. I want to ask questions like: Why use different identities? How many explicitly try to keep their separate identities separate (i.e., not allow anyone to connect them)? How would you identify non-pretender sock puppets, since they don’t share the same features as the pretenders that the model seems (at least to me) to be optimized for? Perhaps one could compare the writing styles of suspected sockpuppets using word2vec, or look at what times they post (i.e., looking at the time periods in which they are active rather than looking at how quickly they post one after another, as you would for a pretender).
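A rough sketch of that comparison might look like the following, averaging word2vec vectors over each account’s posts and comparing them with cosine similarity, alongside a check of overlapping active hours; the post histories, timestamps, and the tiny training corpus are hypothetical placeholders, so the numbers mean nothing beyond illustrating the approach.

```python
# Sketch: comparing two suspected non-pretender sockpuppet accounts by writing style
# (averaged word2vec vectors + cosine similarity) and by overlapping active hours.
# The posts, hours, and tiny training corpus are hypothetical placeholders.
import numpy as np
from gensim.models import Word2Vec

posts_a = [["i", "really", "enjoy", "hiking", "and", "photography"],
           ["photography", "gear", "is", "getting", "expensive"]]
posts_b = [["hiking", "trails", "near", "me", "are", "great"],
           ["i", "enjoy", "landscape", "photography"]]
hours_a = [22, 23, 23, 1]          # hour of day each post was made
hours_b = [22, 0, 1, 23]

# Train a small word2vec model on the pooled posts (a real study would use far more text).
model = Word2Vec(posts_a + posts_b, vector_size=50, min_count=1, seed=0)

def account_vector(posts):
    words = [w for post in posts for w in post]
    return np.mean([model.wv[w] for w in words], axis=0)

va, vb = account_vector(posts_a), account_vector(posts_b)
style_sim = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))   # cosine similarity
hour_overlap = len(set(hours_a) & set(hours_b)) / len(set(hours_a) | set(hours_b))

print(f"style similarity={style_sim:.2f}, active-hour overlap={hour_overlap:.2f}")
```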

The authors have pointed out that sock puppets share some linguistic similarities with trolls. This takes me back to the paper on antisocial users [1] we read last week. Obviously, not all sock puppets are trolls. But I think an interesting question is how many of the puppet masters fall under the umbrella of antisocial users, seeing as they are somewhat similar. The antisocial behavior paper focused on single users, but what if you threw the idea of sock puppets into the mix? I think with the findings of this paper and that one, you would be able to more easily identify the antisocial users who use sock puppet accounts. But they are probably only a fraction of all antisocial users, so it may or may not be very helpful in the larger-scale problem of identifying all antisocial users.

One final thing I thought about was studying and identifying teams of different users who post and interact with each other similarly to how sock puppet accounts work. How would identifying these be different? I think they might have activity feature values similar to sock puppets and at least slightly different post features. Will having different users, rather than the same user, post and interact and reinforce each other muddy the waters enough that ordinary users, moderators, and algorithms can’t identify them and kick them out? Can they muddy the waters even further by having each user in the team run their own sock puppet group, where the sock puppets within a group avoid interacting with each other like a regular pretender sock puppet group would, and instead interact only with the sock puppets of the other users on their team? I think the findings of this paper could effectively be used to identify these cases as well, with some modification, since the teams of users are essentially doing the same thing as single-user sock puppets. But I wonder what these teams could do to bypass that. Perhaps they could write longer and more varied posts than a usual sock puppet to bypass the post features test. Perhaps they could post at different times and interact more widely to fool the activity tests. The model in this paper could provide a basis but would definitely need tweaks to be effective here.

 

[1] Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” Icwsm. 2015.


Reflection #3 – [09/04] – [Subil Abraham]

Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

 

Summary:

The authors of this paper examined how people perceive the credibility of events reported on Twitter. The goal was to build a model that could identify, with a good degree of accuracy, how credible a person will perceive an event to be, based on the text of the tweets (and replies) related to the event. To do this, they used a dataset of collected Twitter streams, with the tweets grouped by event and the time when they were collected. These tweets were also rated for their credibility on a 5-point scale. The authors put forward 15 language features that could influence credibility perception. With all this in hand, they were able to identify the words and phrases in the different language feature categories that corresponded to high and low perceived credibility scores across the various events. Though the authors advise against using the model on its own, it can complement other systems such as fact-checking systems.

 

Reflection:

What I found most interesting was how the perception of credibility seems to flip from positive to negative, and vice versa, between the original tweets and the replies. I first thought there might be a parallel here between tweets and replies on the one hand and news articles and their comments on the other, but that doesn’t quite work, because there are cases where the replies are perceived as more credible than the original; the original tweets are not always the credible ones. (Then again, there are cases where a news article is not necessarily credible either, so maybe there is a parallel here after all? I’m sorry, I’m grasping at straws here.)

“Everything you read on the internet is true. – Mahatma Gandhi.” This is a joke that you’ll sometimes see on Reddit, but it also serves as a warning against believing everything you read just because you perceive it to be credible. The authors of this paper mentioned how the model can be used to identify content with low credibility and boot it from the platform before it can spread. But could it also be used by trolls or people with malicious intent to tweak their own tweets (or other output) in order to increase their perceived credibility? This could certainly cause some damage: we would be talking about false information being more widely believed because it was polished with algorithmic help, where otherwise it may have had low perceived credibility.

Another thing to consider is longer-form content. This analysis is necessarily limited by the dataset, which only consists of tweets. But I have often found that I am more likely to believe something if it is longer and articulate. This is especially apparent to me when browsing Reddit, where I am more likely to believe a well-written, long paragraph or a multi-paragraph comment. I try to be skeptical, but I still catch myself believing something because it happens to be longer (and believing it even more when it is gilded, but that’s for a different reflection). So the questions that arise are: What effect does length have on perceived credibility? And how do the 15 language factors the authors identified affect perceived credibility in such longer-form text?

 
