Reflection #3 – [9/4] – [Deepika Rama Subramanian]

  1. T. Mitra, G. P. Wright, and E. Gilbert, “A Parsimonious Language Model of Social Media Credibility Across Disparate Events”

SUMMARY:

This paper proposes a model that classifies the perceived credibility level of a post/tweet as Low, Medium, High, or Perfect. The classification is based on 15 linguistic measures, including lexicon-based measures such as modality and subjectivity, and non-lexicon-based measures such as questions and hashtags. The study uses the CREDBANK corpus, which contains events, tweets, and crowdsourced credibility annotations. The model takes into consideration not only the original tweet but also retweets and replies to it, along with control variables such as tweet length. The penalized ordinal regression model shows that several linguistic factors affect perceived credibility, subjectivity most of all, followed by positive and negative emotions.
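To make the modeling setup concrete, here is a minimal sketch of fitting a penalized ordinal regression over hand-built linguistic features. The feature set, the toy data, and the use of the third-party mord package are my own illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: penalized ordinal regression over linguistic features.
# Feature names, toy data, and the choice of the `mord` package are
# illustrative assumptions, not the paper's actual pipeline.
import numpy as np
from mord import LogisticAT  # L2-penalized all-threshold ordinal logistic regression

# Each row: [subjectivity, positive_emotion, negative_emotion, hashtags, questions]
X = np.array([
    [0.9, 0.1, 0.7, 3, 1],
    [0.2, 0.4, 0.0, 0, 0],
    [0.5, 0.7, 0.1, 1, 0],
    [0.8, 0.0, 0.8, 4, 2],
    [0.3, 0.2, 0.1, 0, 1],
    [0.1, 0.5, 0.0, 1, 0],
])
# Ordinal credibility labels: 0 = Low, 1 = Medium, 2 = High, 3 = Perfect
y = np.array([0, 3, 2, 0, 1, 2])

# alpha is the regularization strength -- the "penalized" part of the model
model = LogisticAT(alpha=1.0)
model.fit(X, y)

print(model.predict(X))  # predicted credibility level for each tweet
print(model.coef_)       # one signed weight per linguistic feature
```

In a setup like this, the sign of each coefficient indicates whether a feature pushes a tweet toward higher or lower perceived credibility, which is how the paper's per-feature findings (e.g., on subjectivity) can be read.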

REFLECTION:

  1. The first thing that concerned me was tweet length, which was set as a control. We have, however, discussed in the past how shorter tweets tend to be perceived as truthful, because the tweeter wouldn’t have much time to type while in the middle of a major event. Indeed, the original tweet’s length itself correlated negatively with perceived credibility.
  2. Language itself is constantly evolving; wouldn’t we have to continuously retrain with newer lexicons as time goes by? Ten years ago, the words ‘dope’ and ‘swag’ (nowadays used interchangeably with ‘amazing’ or ‘wonderful’) meant very different things.
  3. A well-known source is one of the most credible ways of getting news offline. Perhaps combining this model with one that assesses perceived credibility based on the source could give us even better results. Twitter already has select verified accounts that carry higher credibility than others. The platform could look to assign something akin to karma points to accounts that have historically shared only credible information.
  4. This paper clearly outlines that some words evoke a sense of credibility more than others. Could miscreants intentionally use these words to seem credible and spread false information? Since the model is lexicon-based, it may not be able to adjust for such manipulation automatically (see the sketch after this list).
  5. One observation that initially irked me was that negative emotion was tied to low credibility. This seems about right when we consider that the first stage of the Kübler-Ross model is denial. If that were the whole story, I wondered how anyone would ever be able to deliver bad news to the world. Taking a closer look, however, the words with a negative correlation are specifically ones that sound accusatory (cheat, distrust, egotist) as opposed to sad (missed, heartbroken, sobbed, devastate). While we may be able to get the word out about, say, a tsunami and be believed, outing someone as a cheat may be a little more difficult.
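The sketch referenced in point 4: a toy lexicon-based scorer in which the word lists and weights are entirely invented for illustration. It shows how a purely lexicon-driven score can be gamed by stuffing a false tweet with credibility-associated terms.

```python
# Toy lexicon scorer illustrating point 4. Word lists and weights are
# invented, not the paper's lexicons. A purely lexicon-based score can
# be gamed by inserting words that correlate with high credibility.
CREDIBLE_WORDS = {"confirmed": 1.0, "official": 0.8, "reported": 0.6}
NEGATIVE_ACCUSATORY = {"cheat": -0.9, "distrust": -0.7, "egotist": -0.8}

def lexicon_score(tweet: str) -> float:
    """Sum signed weights of lexicon hits, normalized by tweet length."""
    tokens = tweet.lower().split()
    score = sum(CREDIBLE_WORDS.get(t, 0.0) + NEGATIVE_ACCUSATORY.get(t, 0.0)
                for t in tokens)
    return score / max(len(tokens), 1)

honest = "many distrust the egotist who tried to cheat voters"
gamed = "confirmed official reported sources say the bridge collapsed"  # false but word-stuffed

print(lexicon_score(honest))  # negative: accusatory words dominate
print(lexicon_score(gamed))   # positive: credibility words inflate the score
```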
