[1] Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.
Summary
This paper proposes a model to quantify the perceived credibility of posts shared over social media. Using Twitter, the authors collect data on 1377 topics spanning three months, comprising a total of 66 million tweets. They study how the language used can define an event's perceived credibility. Mechanical Turk workers were asked to label the posts on a 5-point Likert scale ranging from -2 to +2. The authors then defined four credibility classes and 15 linguistic measures as indicators of perceived credibility level. The rest of the paper discusses the results and what can be learned from them.
Reflections
Social media has penetrated deeply into our daily lives. It has become a platform for people to share their views, opinions, feelings, and thoughts. The language individuals use is frequently accompanied by ambiguity and figures of speech, which makes it difficult even for humans to comprehend. When any event occurs around the world, tweets are among the first reports to surface online. This calls for a credibility check.
Twitter is an apt choice because its character limit keeps the language brief, which makes it ideal for studying language features. Although the paper performs a thorough analysis to create a feature set that can help quantify credibility, several additional features could improve the model.
Social media is filled with informal language, which is hard to process from a natural language processing point of view. It is unclear how the model deals with it. For example, the word “happy” has a positive sentiment, while the word “happppyyyyyyyyy” expresses an even more positive sentiment. The paper considers punctuation marks like question marks and quotation marks but fails to acknowledge a very important sentence modifier – the exclamation mark. It serves as an emotion booster. For example, observe the difference between the sentences “the royals shutdown the giants” and “the royals shutdown the giants !!!!”.
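The two style cues mentioned above could be operationalized with simple string processing. The sketch below is illustrative only; the function name and feature set are hypothetical and not part of the paper's model:

```python
import re


def style_features(tweet: str) -> dict:
    """Extract two hypothetical style cues: character elongation and
    exclamation emphasis. (Illustrative, not the paper's feature set.)"""
    # Elongation: any character repeated three or more times in a row,
    # as in "happppyyyyyyyyy".
    elongated = bool(re.search(r"(\w)\1{2,}", tweet))
    # Normalized form collapses such runs to at most two characters,
    # so a sentiment lexicon can still match the base word.
    normalized = re.sub(r"(\w)\1{2,}", r"\1\1", tweet)
    # Exclamation emphasis: raw count of '!' characters.
    exclamations = tweet.count("!")
    return {
        "elongated": elongated,
        "normalized": normalized,
        "exclamation_count": exclamations,
    }
```

For instance, `style_features("the royals shutdown the giants !!!!")` would flag four exclamation marks, separating that tweet from its unpunctuated counterpart.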
Twitter has evolved over the years, and with it the way people use it. Real-time event reporting now spans multiple tweets, where each reply is a continuation of the previous tweet. Tweets reporting news also include images or other visual media to give a better idea of the ground reality. The credibility of the author and of the people retweeting also affects the perceived credibility. For example, if someone with a high follower-to-following ratio posts or retweets a tweet, its credibility will naturally increase. Can we include these changes to better understand perceived credibility?
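The author- and context-level signals suggested above could be encoded as features alongside the paper's linguistic ones. This is a minimal sketch under my own assumptions; the function and field names are hypothetical and none of these features appear in the paper:

```python
def author_context_features(followers: int, following: int,
                            has_media: bool, is_thread: bool) -> dict:
    """Hypothetical author/context features: reputation proxy, attached
    media, and whether the tweet is part of a multi-tweet thread."""
    # Follower-to-following ratio as a rough proxy for author reputation;
    # guard against accounts that follow no one.
    ratio = followers / max(following, 1)
    return {
        "follower_ratio": ratio,
        "has_media": has_media,
        "is_thread": is_thread,
    }
```

A feature like `follower_ratio` would then let a model test the intuition that tweets from high-reputation accounts are perceived as more credible.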
Subjective words like those associated with trauma, anxiety, fear, surprise, and disappointment are observed to contribute to credibility. This raises the question: can emotion detection in these tweets contribute to perceived credibility? Having worked with emotional intelligence over Twitter data, I believe we could come up with complex feature sets that consider the emotion of the tweet as well as the topic at hand, and study how emotions play a role in estimating credibility.
One contradiction I observed is that hedge words like “to my knowledge” contribute to higher perceived credibility, whereas evidential words like “reckon” result in lower perceived credibility. In everyday language the two can be used interchangeably, yet evidently one increases credibility while the other decreases it. Why would this be the case?
One more general trend in the observations is intriguing. In most cases, the credibility of a post is high if it tends to agree with the situation at hand. Does that mean a post will have high credibility if it agrees with a fake event and low credibility if it disagrees with it?
In conclusion, the paper performs an exhaustive study of the different linguistic cues that change the perceived credibility of posts and discusses in detail how credibility varies from one language feature to another. However, considering how social media has evolved over time, many extensions could be made to the existing model to create an even more robust and general one.