Reflection #3 – [9/04] – Eslam Hussein

Tanushree Mitra, Graham P. Wright, and Eric Gilbert, “A Parsimonious Language Model of Social Media Credibility Across Disparate Events”

Summary:

In this paper the authors did great work measuring the credibility of a tweet/message based on its linguistic features, and they also incorporated some non-linguistic ones. They built a statistical credibility classifier that depends on 15 linguistic features, which fall into two main categories:

1- Lexicon based: features that depend on lexicons built for special tasks (negation, subjectivity …)

2- Non-lexicon based: questions, quotations … etc.

They also included some control features that measure the popularity of the content, such as the number of retweets, tweet length … etc.
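To make these three feature groups concrete for myself, here is a minimal Python sketch. The tiny word lists, the function names and the example tweet are my own assumptions for illustration; they are not the lexicons or the code used in the paper, which relies on much larger published lexicons.

```python
import re

# Hypothetical toy lexicons (assumption): stand-ins for the real,
# much larger lexicons that the paper uses.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
SUBJECTIVE_WORDS = {"amazing", "terrible", "believe", "shocking"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text, retweet_count):
    """Return a small feature dictionary for one tweet."""
    tokens = tokenize(text)
    return {
        # Lexicon-based features: count tokens found in each word list.
        "negation": sum(t in NEGATION_WORDS for t in tokens),
        "subjectivity": sum(t in SUBJECTIVE_WORDS for t in tokens),
        # Non-lexicon-based features: surface patterns in the raw text.
        "questions": text.count("?"),
        "quotations": len(re.findall(r'"[^"]+"', text)),
        # Control features: popularity and size of the message.
        "length": len(text),
        "retweets": retweet_count,
    }


example = extract_features(
    'Officials say it is "not confirmed". Is this shocking?', 120
)
print(example)
```

A feature dictionary like this, computed for every tweet, could then be fed to whatever statistical model one prefers.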

They used CREDBANK, a credibility-annotated dataset, to build and test their model, and they used several lexicons to measure the features of each tweet. Their model achieved 67.8% accuracy, which suggests that language usage has a considerable effect on assessing the credibility of a message.

 

Reflection:

1- I like how the authors addressed the credibility of information on social media from a linguistic perspective. They left out the source credibility factor when assessing the credibility of the information, citing studies that show that information receivers pay more attention to the content than to its source. In my opinion the credibility of the source is a very important feature that should have been integrated into their model. Most people tend to believe information delivered by credible sources and to question information that comes from unknown sources.

2- I would like to see the results after training a deep learning model with this data and those features.

3- Although this study is a very important step in countering misinformation and rumors on social media, I wonder how people/groups who spread misinformation might misuse these findings and linguistically engineer their false messages in order to deceive such models. What other features could be added to prevent them from using the language features to deceive their audience?

4- This work inspires me to study the linguistic features of rumors that were spread during the Arab Spring.

5- I find the following findings very interesting and deserving of further study. “The authors found that the number of retweets was one of the top predictors of low perceived credibility. Which means the higher number of retweets the less credible the tweet, and also retweets and replies with longer message lengths were associated with higher credibility scores”. That finding reminds me of the online misinformation and rumor attacks during the political conflict between Qatar and its neighboring countries, where paid online campaigns were organized to spread misinformation through Twitter. They were characterized by a huge number of retweets without any further replies or comments, just retweets. It shows how misleading numbers can be.
