Reflection 11 – [10/16] – [Karim Youssef]

In social communication, there are multiple values that people tend to respect in order to gain different types of benefits, and politeness is one of the most important among them. In modern online communities, politeness plays a great role both for the community, by ensuring healthy interactions, and for individuals, by maximizing the benefit they get from a request for help, an opinion conveyed to an audience, or any other type of online social interaction.

In their work “A computational approach to politeness with application to social factors”, Danescu-Niculescu-Mizil et al. presented a valuable approach to computationally analyzing politeness in online communities based on linguistic traits. Their work consisted of labeling a large dataset of requests from Wikipedia and Stack Exchange using human annotators, extracting linguistic features, and building a machine learning model that automatically classifies requests as polite or impolite with close-to-human classification performance. They then used their model to analyze the correlation between politeness and social factors such as power and gender.

My reflection about their work consists of the following points:

  1. It is nontrivial to define a norm for politeness. One way of learning this norm is to use human annotators, as Danescu-Niculescu-Mizil et al. did. It could be interesting to conduct a similar annotation of the same dataset with human annotators from different cultures (e.g. different geographic locations) to understand how the norm for politeness may differ. It could also be interesting to study people’s perception of politeness across different domains. For example, the norm of politeness may differ between comments on a political news website and technical discussions about computer programming.
  2. The model evaluation shows a noticeable difference between the in-domain and cross-domain settings, as well as another noticeable difference between the cross-domain performance of the model trained on Wikipedia and that trained on StackExchange. A simple explanation could be that community-specific vocabularies prevent a model trained on data from one community from generalizing well to other communities. From this point, we may conclude that the vocabulary used in comments on StackExchange is more generic than that used in requests to edit on Wikipedia, which gives an advantage to the cross-domain model trained on StackExchange. I believe it is highly important to categorize the communities and to analyze the community-specific linguistic traits in order to make an informed decision when training a cross-domain model (a small sketch of such a vocabulary comparison follows after this list).
  3. Such a study could be used to help moderate social platforms that are keen to maintain a certain level of “politeness” in their interactions. It could help moderators automatically detect impolite comments, and it could also tell individuals how likely their comments are to be perceived as polite before they share them.
  4. Given the negative correlation between social power and politeness as inferred by the study, could it be useful to rethink the design of online social systems to encourage maintaining politeness in individuals with higher social power?
  5. Although the study has some limitations, such as the performance of the cross-domain models, it represents a robust and coherent analysis that could serve as a guideline for many similar data-driven studies.
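
As a rough illustration of the vocabulary comparison suggested in point 2, here is a minimal sketch using scikit-learn’s CountVectorizer. The wiki_requests and se_requests lists and their example sentences are made-up placeholders, not the released corpora.

    # Minimal sketch: compare the vocabularies of two request corpora before
    # deciding which one to use for training a cross-domain politeness model.
    # `wiki_requests` and `se_requests` are illustrative placeholders.
    from sklearn.feature_extraction.text import CountVectorizer

    def vocabulary(texts, min_df=1):
        # Terms that appear in at least `min_df` documents of the corpus.
        vec = CountVectorizer(lowercase=True, min_df=min_df)
        vec.fit(texts)
        return set(vec.get_feature_names_out())

    wiki_requests = ["Could you please review my recent edit to the article?",
                     "This page needs a citation, can you add one?"]
    se_requests = ["Why does this snippet throw a NullPointerException?",
                   "Could you explain how the garbage collector handles this case?"]

    wiki_vocab = vocabulary(wiki_requests)
    se_vocab = vocabulary(se_requests)

    shared = wiki_vocab & se_vocab
    jaccard = len(shared) / len(wiki_vocab | se_vocab)
    print(f"shared terms: {len(shared)}, Jaccard overlap: {jaccard:.2f}")
    print("Wikipedia-specific terms:", sorted(wiki_vocab - se_vocab))
    print("StackExchange-specific terms:", sorted(se_vocab - wiki_vocab))

A low Jaccard overlap, or long lists of community-specific terms, would hint that lexical features learned in one community will transfer poorly to the other.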

To conclude, there are multiple benefits in studying the traits of politeness and automatically predicting it on online social platforms. This study inspires me to start from where the authors stopped, enhance their models, and apply them to multiple useful domains.


Reflection 11 – [10/16] – [Neelma Bhatti]

  • Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors.
  • Zhang, J., Chang, J. P., Danescu-Niculescu-Mizil, C., Dixon, L., Hua, Y., Thain, N., & Taraborelli, D. (2018). Conversations Gone Awry: Detecting Early Signs of Conversational Failure.

The authors of the first paper strive to develop a computational framework that identifies politeness, or the lack thereof, on Wikipedia and Stack Exchange. They uncover connections between politeness markers, context, and syntactic structure to develop a domain-independent classifier for identifying politeness. They also investigate the notion that politeness is inversely proportional to power: the higher one ranks in a social (online) setting, the less polite they tend to become.

Reflection:

  • In the introduction section of the paper, the authors mention established results about the relationship between politeness and gender. The paper also claims that the politeness-prediction framework is applicable to different communities and geographical regions. However, I did not quite understand how the results relate to gender roles in determining politeness. I am also skeptical about the claimed applicability of the framework to different communities and geographical regions, since languages vary greatly in their politeness markers and have different pronominal forms and syntactic structures, and all the human annotators in this experiment were residing in the US.
  • Stemming from the above comment is another research direction that seemed interesting to me: does a particular gender tend to be politer than the other in discussions? What about incivility? Is gender also a politeness marker in such a case?
  • The authors talk about politeness and power being inversely proportional to each other, showing the increase in politeness of unsuccessful Wikipedia editors after the elections. This somehow does not seem intuitively correct. What if some unsuccessful candidates feel that the results are unjust or unfair; will they still continue being politer than their counterparts? The results seem to indicate that all such aspiring editors keep striving to achieve the position by being humble and polite, which might not always be the case.
  • Research on incorporating automatic spell checking and correction, by finding word equivalents for misspelled words, could help reduce false positives in the results produced by Ling. (the linguistically informed classifier); a small sketch of this idea follows below.
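
A minimal sketch of that spell-normalization idea, using only difflib from the Python standard library. The small politeness lexicon here is purely illustrative and is not the feature set used by Ling.

    # Minimal sketch: normalize misspelled words before feature extraction so a
    # lexicon-based politeness classifier does not miss markers such as
    # "please" or "thanks" just because they are misspelled.
    # The vocabulary below is illustrative, not the paper's actual lexicon.
    import difflib

    POLITENESS_VOCAB = ["please", "thanks", "thank", "could", "would", "sorry", "appreciate"]

    def normalize(token, vocab=POLITENESS_VOCAB, cutoff=0.8):
        # Map a token to its closest in-vocabulary spelling, if any.
        match = difflib.get_close_matches(token.lower(), vocab, n=1, cutoff=cutoff)
        return match[0] if match else token.lower()

    request = "Pleese could you take anohter look at my edit, thnaks!"
    tokens = [normalize(t.strip(",.!?")) for t in request.split()]
    print(tokens)
    # Only words close to the illustrative lexicon are corrected, so "pleese"
    # and "thnaks" are fixed while "anohter" is left untouched.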

The second paper talks about detecting the derailing point of conversations between editors on Wikipedia. Although the authors talk at length about the approaches and limitations, there did not seem to be (at least to me) a strong motivation for the work. Possible applications that I could think of are as follows:

  • Giving early warnings to members (attackers) involved in such conversations before imposing a ban. The notion of ‘justice’ can then prevail in the community, by making commenters aware of their conduct beforehand.
  • Another application can be muting/shadow banning such members by early detection of conversations which can possibly go awry, to maintain a healthy environment in discussion communities.


Reflection #11 – [10/16] – [Subil Abraham]

The paper is an interesting attempt at quantifying politeness when it comes to requests. The authors used MTurk to annotate requests from talk pages on Wikipedia and from Stack Exchange questions and answers. From the annotated data, they were able to build a classifier that could label new request texts with close to human-level accuracy.

The analysis on Wikipedia shows that editors who are more polite are more likely to be promoted to admins. But the question now is: what can be done to make sure someone continues to be polite even after gaining power? More generally, what incentive system can be built to prevent someone’s power from getting to their head? We already have the obvious checks and balances, like banning someone, even an admin, if they become too disruptive. But what about preventing even the small devolution that was observed? Simply stripping the privileges one gained at the slightest sign of impoliteness would surely be a bad idea. We could think about implementing the trained politeness classifier as a browser extension (like Grammarly [1], but for politeness) that tells you how polite the text you are typing is likely to come across; a rough sketch of that idea follows below. But this might end up being suffocating to a user who has to deal with an application constantly telling them that they are not as polite as they should be. And, as Lindah [2] pointed out, the classifier is far from perfect, so this might not be a good idea either.
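
A minimal sketch of what the feedback logic behind such an extension might look like. The marker names echo strategies from the paper, but the regular expressions and weights are illustrative stand-ins rather than the authors’ trained model.

    # Minimal sketch of the feedback a "politeness meter" extension might give
    # on text being typed. Marker patterns and weights are illustrative only.
    import re

    MARKERS = {
        "gratitude":        (r"\b(thank(s| you)?|appreciate)\b", +1.0),
        "please":           (r"\bplease\b",                      +0.5),
        "counterfactual":   (r"\b(could|would) you\b",           +0.5),
        "direct_question":  (r"^\s*(what|why|who|how)\b",        -0.5),
        "direct_start":     (r"^\s*(so|then|and|but)\b",         -0.5),
        "2nd_person_start": (r"^\s*you\b",                       -0.5),
    }

    def politeness_feedback(text):
        # Return a crude score, the markers that fired, and a verdict.
        fired, score = [], 0.0
        for name, (pattern, weight) in MARKERS.items():
            if re.search(pattern, text, flags=re.IGNORECASE):
                fired.append(name)
                score += weight
        if score > 0:
            verdict = "likely polite"
        elif score == 0:
            verdict = "neutral"
        else:
            verdict = "may read as impolite"
        return score, fired, verdict

    print(politeness_feedback("Could you please take a look when you get a chance? Thanks!"))
    print(politeness_feedback("Why haven't you fixed this yet?"))

A real extension would of course call the trained classifier instead of hand-set weights, but the interaction would be the same: score the draft text and surface which markers pushed the score up or down.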

This isn’t a very shrewd insight or anything, but I may as well point out that the classifier is obviously skewed toward the U.S. understanding of what politeness is. The authors were upfront about how they chose their annotators, but this does mean that the final results end up marking a request as impolite or polite based on politeness in the context of the USA. Though the biggest parts of Stack Overflow and Wikipedia are in English, people from all over the world contribute to them. What someone from the US may consider impolite (like the ‘Direct question’ strategy) could seem perfectly polite to a Scandinavian, as their culture is one of directness in speech. Any future work that builds on this one must keep this in mind.

All in all, I believe this was a pretty solid paper. They set out to do something and they did it, documenting the process. Potential future work would be to take this idea and redo it in a different English-speaking culture, to identify how their ideas of politeness differ from the US perspective.

[1] https://www.grammarly.com/

[2] https://wordpress.cs.vt.edu/cs5984/2018/10/07/conversational-behavior/


Reflection 11 – [10/11] – [Lindah Kotut]

  • Cristian Danescu-Niculescu-Mizil et al., “A computational approach to politeness with application to social factors”

The authors consider politeness in discussions and its impact on power dynamics. Using crowd-labelled data from Stack Exchange and Wikipedia discussions, they are able to identify politeness markers and train a classifier on those markers toward automating the labeling process. They make the data (and the resulting tool) publicly available as part of their contribution. The politeness tool provides further insight into how the auto-labeling works, and into how the use and placement of keywords affect the general tone of a sentence.

Beyond what makes a post/question (im)polite, they offer interesting insights: they are able to distinguish politeness by region (mid-westerners are more polite), by programming language (Python programmers are the most impolite, Ruby programmers the most polite), and by gender (women are unsurprisingly more polite). These findings serve to ground their research in a platform-independent way.

Balanced vs Power Dynamics
The major consideration of this work was probing power dynamics by looking at the imbalance between administrators and normal users. It would be very interesting to extend this to the general tone of users in a balanced discourse: if there are no explicit rules in a forum on how to conduct a discussion, is there an imbalance in the conversation? Does an impolite original question affect users’ willingness to reply? Using the same classifier to categorize these new discussions would be a straightforward step.

Bless your heart: Sarcasm and other language features
The figure below showcases aspects that are not considered by the classifier, namely sarcasm and colloquial terms. This is expected, as no classifier is perfect without constant learning. The politeness tool also provides a means for the user to relabel sentences according to whether they agree with the label (and confidence) or not. Presumably there is a mechanism to improve the classifier’s prediction accuracy through this human-in-the-loop provision. This does not diminish the paper’s contribution and impact, but it does raise the general question of the efficacy of machine learning in understanding human language.

Beyond Wikipedia and Stack Overflow
Is bot-speak polite? And are there language markers that distinguish bots from a typical user? Such markers would be a useful complement to current means of identifying bots, which rely on profile features and swarm behavior. This knowledge could further be used to strengthen spam filters in places such as discussions under articles, against posts with ‘bot-like’ language.

Do users care whether their language is considered impolite? Beyond a would-be Wikipedia editor using the knowledge about the impact of politeness on their chances of being granted the post, does a questioner on StackOverflow, or on any other platform with a clear power imbalance, care that their tone is impolite? Does the respondent (or the original questioner)? Or do they only care that the answer has been received? Further behaviors can then be discussed: whether a change in behavior is successful in getting users their desired goals, or whether the gains do not depend on the user at all. In the latter case, factors such as personality come into play: successful would-be Wikipedia administrators, for example, may have a predilection toward the leadership position that cannot be explained by language alone (or one that leads them to use their language skills to obtain the rewards).

Sample results from the authors’ politeness tool: http://politeness.cornell.edu/


Reflection 10 – [10/02] – [Neelma Bhatti]

  • Starbird, K. (2017, May). Examining the Alternative Media Ecosystem Through the Production of Alternative Narratives of Mass Shooting Events on Twitter. In ICWSM (pp. 230-239).
  • Samory, M., & Mitra, T. (2018). Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events.

Summary:

The authors of both papers explore alternative narratives of events such as mass shootings, bombings, and other crises, and the users who engage with them by either creating or propagating such news in the name of challenging the ‘corporate controlled media’, presenting what they believe are the actual ‘facts’.

Reflection:

Humans have a tendency to seek surprises, and from my understanding it arises from the boredom induced by the ‘regular’ and ‘mainstream’ news one consumes all the time. Believing in, or at least giving a thought to, a conspiracy theory also gives a sense of being a responsible citizen who takes all news sources into account before settling on believing in one, unlike a regular news reader who readily believes in mainstream media.

Some thoughts and future research directions that came to my mind (a fact worth considering: I’m not an active Reddit user) after reading Samory et al.’s paper are as follows:

  • What insights could be gained by taking into consideration the number of users who leave the subreddit? What forces or urges them to do so? Do they grow sick of hearing the ‘alternate’ versions of theories? Do some ‘joiners’ only join briefly to get the alternate version of a particular event and then leave? Studying these figures may produce more insight into what goes on in these communities and how they shape over time.
  • What is the ratio of real users vs. bots and sock puppets in this subreddit? Do bots and sock puppets exist to promote a sense of diversity in spreading alternate narratives of the events?
  • What other, dissimilar subreddits have they joined, and how do those relate to each other? A qualitative analysis of such data, by making a graph like that in Starbird’s paper, may produce valuable insight.
  • Also, since veterans have been in the game for long and may identify fellow veterans (or even some converts), do they join the subreddits joined by their favourite fellow veterans/converts, perhaps purely out of interest or friendship?

Also, since I have a great interest in human psychology, in what forms people’s opinions, and in how one thought leads to another, the first paper had me thinking about the following few things:

  • This is an excellent piece of work, but what about users who create their own conspiracy theories? In the case of Twitter, do such users post the theories they come up with on their own blogs?
  • How does gender play out in this whole conspiracy theory business?
  • Do any of the conspiracy theories ever turn out to be true?
  • What are such users’ regular sources of news consumption, and how do they end up at the conspiracy-theorist news sources? Is there a pattern in how one such news source links to another so as to lead the user into the labyrinth of conspiracy theories?


Reflection 10 – [10/02] – [Karim Youssef]

The prevalent use of online social platforms such as Twitter has altered the process of news sharing, from content created and revised only by journalists to user-generated content with little or no guidelines or regulations to ensure quality and credibility. This is not necessarily negative: user-generated news content is useful as a means of quickly reporting a breaking event from multiple eyewitnesses. However, the system is highly prone to the spread of misinformation and rumors, without a well-established technique to prevent these types of undesirable content.

One implication of the prevalence of user-generated news media is that these media became fertile ground for promoting the spread of alternative media websites, or themselves became a source of alternative media. In her work “Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter”, Kate Starbird found in Twitter a gold mine from which she extracted some highly informative insights about alternative media sources and inferred valuable relations between these sources from the posting activity of Twitter users who include them in their tweets.

Starbird’s analysis represents a valuable step towards understanding and revealing some truths about alternative media websites. Her work inspires me and raises multiple questions. One question is: how do alternative media contribute to shaping the knowledge and perceptions of the public? In other words, alternative media could sometimes contribute positively even if driven by a political agenda. One obvious example is the information spread during the Egyptian revolution. In the early days of the revolution in January 2011, the mainstream media was spreading information that was later proven incorrect, while much of the news shared through alternative media, which would usually be considered dubious, turned out to be correct.

In many parts of the world, it is hard to judge which type of media is conveying credible news, as political agendas may be driving both mainstream and alternative media sources. There must be a means of verifying the information that reaches an individual, and sometimes a healthy amount of skepticism is required. An interesting factor to study could be the deviance of alternative media. We might study the most-shared alternative media sources in terms of how credible they are over a period of time, and compare this to some of the mainstream media sources. One of the main challenges would be how to guarantee neutrality of judgment.

Another suggestion would be to design social systems that encourage users to be healthily skeptical of the news that reaches them, by encouraging them to verify the news through easy tasks such as a Google search. Such a design could assign each user a credibility score that increases as the user verifies news before sharing. Imagine that beside the share button there is a “verify” button that, when clicked, retrieves the top n relevant links from a Google search and extracts some keywords that may indicate how credible the news is. If a user presses the verify button, the news will be flagged as “credible and safe to share”, “needs more verification”, or perhaps “highly dubious”. As users verify more, they become more credible users, since they share verified news. A rough sketch of this flow follows below.
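
A minimal sketch of the flow behind such a “verify” button, under the assumption that some search backend is available. The search_top_links function is a hypothetical stand-in for that backend, and the keyword lists and thresholds are placeholders.

    # Minimal sketch of the "verify before sharing" flow described above.
    # `search_top_links` is a hypothetical stand-in for whatever search API the
    # platform would call; keyword lists and thresholds are placeholders.

    DEBUNK_TERMS = {"hoax", "debunked", "false claim", "fact check", "misleading"}
    SUPPORT_TERMS = {"confirmed", "official statement", "verified", "report"}

    def search_top_links(query, n=5):
        # Hypothetical search call; a real system would hit a search API here.
        raise NotImplementedError("plug in a real search backend")

    def verify(headline, results=None):
        # Return a coarse credibility flag based on the retrieved snippets.
        snippets = results if results is not None else search_top_links(headline)
        text = " ".join(snippets).lower()
        debunk = sum(term in text for term in DEBUNK_TERMS)
        support = sum(term in text for term in SUPPORT_TERMS)
        if debunk > support:
            return "highly dubious"
        if support > debunk and support >= 2:
            return "credible and safe to share"
        return "needs more verification"

    # Example with canned snippets instead of a live search:
    snippets = ["Fact check: the viral claim is misleading",
                "Officials say the report is a hoax"]
    print(verify("Shocking breaking story", results=snippets))  # -> "highly dubious"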

It is hard to judge which type of media carries the truth, but it is always possible to go the extra mile and verify before sharing. This might not be as easy as it seems, because a user may be blinded by a piece of news that reinforces their opinion about something.


Reflection #10 – [10/02] [Eslam Hussein]

Paper: Kate Starbird, “Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter”

Summary:

The author of this paper did a very deep analysis of the alternative media ecosystem through data collected from Twitter about shooting events during 2016. She performed both quantitative and qualitative analyses and included different dimensions (such as political leaning, narrative stance, account type, etc.) to facilitate a better understanding of the data and findings.

Reflection:

  • I would like to do a similar analysis of voting events such as the Brexit referendum and the 2016 US presidential election. I believe the findings about political leaning would be similar (anti-globalist alternative media promoting Brexit and the election of Trump).
  • I would have preferred if the author had also done a similar analysis for each event separately and compared the similarity between them, since the data represent events scattered temporally and geographically.
  • The findings in the paper about how the U.S. alt-right media always accused mainstream media of making fake news, while introducing themselves as anti-globalists, remind me of how often Trump called the mainstream media “fake news” in his tweets (more than 500 times) and declared himself an anti-globalist in his UN statement. These anti-globalist movements in the media inspire me to do more analysis of how social media played a significant role in promoting the rise of right-wing anti-globalists in the Western world (especially during the political elections in Europe in the past few years).
  • I think it would have helped the author’s analysis if she had employed some community structure analysis on the graph in order to abstract and summarize it (see the sketch after this list).
  • The author mentioned that she is left-leaning, while the findings in Figure 3 showed that right-leaning alternative media are the most dominant source of conspiracy news; given that the analysis relied on a lot of qualitative work, I do not know whether the results are biased, but this raises questions about those findings and might require more analysis to verify them. It also made me wonder how an author can control their own preferences (political, religious, sexual, etc.) while conducting a qualitative study that requires the author’s own interventions and selections.
  • How could we teach people about the leaning of such media? Mainstream media are easier to classify by bias because their sources of funding and their connections may be clear to the ordinary user, while non-mainstream media are harder to track, and it is harder to verify their agendas. I think a regularly updated website/database of such media would help to track them, classify their leaning, and understand the message they are spreading.
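
For the community-structure suggestion above, here is a minimal sketch using networkx. The edge list is a tiny made-up stand-in for the actual domain network, in which domains are linked when the same Twitter accounts cite them.

    # Minimal sketch: community structure analysis on a domain network like the
    # one in Starbird's paper. The edge list and domain names are made up;
    # edge weights stand in for the number of accounts sharing both domains.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    edges = [
        ("altsite-a.com", "altsite-b.com", 12),
        ("altsite-a.com", "altsite-c.com", 8),
        ("altsite-b.com", "altsite-c.com", 5),
        ("mainstream-x.com", "mainstream-y.com", 20),
        ("mainstream-x.com", "altsite-a.com", 1),
    ]

    G = nx.Graph()
    G.add_weighted_edges_from(edges)

    communities = greedy_modularity_communities(G, weight="weight")
    for i, community in enumerate(communities):
        print(f"Community {i}: {sorted(community)}")

Grouping the domains this way would let the qualitative coding be summarized per community (e.g. political leaning or narrative stance per cluster) instead of per individual domain.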


Reflection 10 – [10/02] – [Deepika Rama Subramanian]

[1] Starbird, Kate – Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter

SUMMARY

This paper examines alternative media systems and their role in propagating fake news. This is done by creating network graphs, based on tweets related to alternative narratives of mainstream events, broken down by media type, narrative stance, and political leaning.

REFLECTION

This paper uses a Twitter dataset that may be slightly small and that is limited to a couple of incidents for which people/media have alternative explanations. Examining the r/conspiracy subreddit for more events may give more definitive results. There we may also be able to answer some of the other questions that come to mind –

  1. What kind of news generates the most popular alternative narratives – science, politics, man-made and natural disasters, or terrorist attacks?
  2. While there may not be sufficient information to determine these links – what else are popular conspiracy theorists interested in? Are those interests in any way a result of their interest in conspiracy theories, or do they fuel their fantasies in some way?

What surprised me in the paper initially was the inclusion of mainstream media in the network graphs at such a large scale – The Washington Post, The New York Times, etc. However, this made me realize that anything can seem like a clue to a person who is willing to believe in something. Though it would be tough and highly subjective, we could study whether people are predisposed to believing conspiracy theories – is it something they already believe before they join these communities, i.e., before they start using social media to read about them? Or is it the result of influence from joining and reading many alternative narratives? How about the timing of these conspiracy theories? Does it take a while for people to come up with alternative narratives, or are these theories generated as the events unfold? Another interesting thing to find out is whether these theories have begun to come out at shorter and shorter intervals after the events they narrate. We would then know whether theorists are ‘learning’, or are influenced by past theories and are coming up with new ones in shorter periods of time.

I next wanted to see how these alternative narratives tie in with other concepts we’ve previously examined in class: anonymity, sockpuppets, and the filter bubble. It seems evident that all of the aforementioned issues could cause or inflame cases where people lean towards believing in conspiracy theories. Sockpuppets are particularly useful for pushing propaganda, while the filter bubble helps by making sure users are bombarded with “false” narratives. The anonymity of the users who propagate alternative narratives doesn’t seem to be an issue for those willing to subscribe, since the ‘news sells itself’.


Reflection 10 – [10/02] – [Nitin Nair]

  • Kate Starbird, Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter

In this paper, the author analyzes Twitter data from a 10-month period to create what they term a “domain network”, which is studied through qualitative analysis to explore a subset of “fake news.” Through this qualitative analysis, the paper finds different groups that propagate this fake news. As the author shows, these groups do not fit the general left-right political spectrum, due to overarching commonalities between them.

The shift away from traditional methods of news propagation, which had their own demerits, has put the burden of wading through a lot of information and judging its validity onto people who are not trained to do so. This has exacerbated the issue of “fake news” and created a market for alternative news, which is not a new phenomenon. I believe the way to tackle this issue is by stemming the funding of these sources. Censoring by deletion, given that we live in places where the right to speech exists, impinges on the rights of these rumor mongers, but stemming their funding could ride the thin line and achieve the result we want. Creating barriers such as temporary automatic demonetization of content when “clickbait” titles or explicit content are found could be a way forward. But systems should be in place for genuine members to appeal this stance, considered case by case by moderators. The moderator logs could then be made public along with the content, to increase transparency. Putting such a burden in place might hurt genuine content makers, but a well-thought-out design to do so, as discussed, is, I believe, the right way forward.

Believing in one conspiracy theory makes one more likely to believe another, as shown in [2], which is an interesting finding. I believe this issue is exacerbated by the selective exposure caused by the “filter bubbles” these news sources create. Creating mechanisms through which users can select or populate their feed with views different from their own could ameliorate the spread of alternative narratives. It is necessary to understand that one can only control the spread, not eliminate the purveyors of such narratives. Such narratives, in the right amount, could even be part of the entertainment portion of our information diet.

Another design element which could be considered is showing people a report of their “information diversity” at regular intervals. This could be an effective nudge to promote a more diverse information diet; a small sketch of one way to compute such a score follows below.
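
A minimal sketch of one way such an “information diversity” report could be computed: the normalized Shannon entropy of the source categories a user consumed over some period. The category labels and counts below are made up.

    # Minimal sketch: an "information diversity" score as the normalized
    # Shannon entropy of the source categories a user consumed in a period.
    # Category labels and the example week are made-up placeholders.
    import math
    from collections import Counter

    def diversity_score(source_categories):
        # Normalized entropy in [0, 1]; higher means a more varied diet.
        counts = Counter(source_categories)
        total = sum(counts.values())
        probs = [c / total for c in counts.values()]
        entropy = -sum(p * math.log2(p) for p in probs)
        max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
        return entropy / max_entropy

    week = ["mainstream-left", "mainstream-left", "mainstream-right",
            "alternative", "mainstream-left", "public-broadcaster"]
    print(f"Diversity this week: {diversity_score(week):.2f}")

Surfacing this single number (or its trend) at regular intervals would be a lightweight way to implement the nudge described above.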

But how does one create barriers for such content, especially in places where moderation by an informed and unbiased party is not possible? IM services are a good example of such grounds. I do not have a solution for the spread of misinformation through such channels, but given its impact, it is an area that deserves focus.

Creating an alternative business model, although a hard task, is, I believe, an essential step forward. Any of the strategies discussed above that might prove effective in the short term are stop-gap mechanisms. To put things in perspective, the CPM model we follow at the moment was introduced by the team at HotWired in October 1994, when the number of people on the internet was less than 0.4% of the earth’s population, compared to more than 50% right now.

[2] Van Prooijen, J. W., & Acker, M. (2015). The influence of control on belief in conspiracy theories: Conceptual and applied extensions. Applied Cognitive Psychology, 29(5), 753-761.


Reflection #10 – [10/02] [Viral Pasad]

Paper-

[1] Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter – Kate Starbird

Summary-

This paper constructs a network graph of the various mainstream, alternative, left- and right-leaning, and government-controlled media sources using Twitter data. Certain insights and observations regarding conspiracy theories are extracted from the network graph.

Reflection –

  • Thanks to users being able to self-report stories via various platforms and mediums, it is very difficult to keep track of credible sources and genuine news content. This paper got me wondering whether there is a way to actually curb misinformation and fake news online. During a discussion in class, when lateral fact checking was mentioned, I was inspired to think of an automated online fact checker as a very ambitious project to tackle misinformation in online sources. But after reading this paper, I am sceptical about the feasibility and accuracy of such a system.

However, one approach that I believe can be employed to curb misinformation, fake news, and conspiracy theories is using manipulations of weights and bias adjustments to counteract the selective exposure and selective judgement of human readers.

  • Further, one way to perhaps reduce (not eliminate altogether) the spread of alternative news is for real stories with solid backing and research proof to also use clickbait titles that grab readers’ attention, invoking the same selective exposure and selective judgement in them.
  • As mentioned, users being able to self-report stories via various platforms and mediums makes it very difficult to keep track of credible sources and genuine news content. However, going back to a previous reflection, the reputation and karma of users could be taken into account before/while displaying the stories they post.

This way, users can be made aware of the context the writer is posting in and adjust for their own bias (take it with a pinch of salt).

  • Further, as discussed in class, the geolocation, timestamp, and verbosity of a tweet can also be used to more accurately distinguish genuine reporting of events from fake alternative news spreading misinformation.
