Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.
This paper presents an in-depth and meticulous analysis of the linguistic and social network features that affect the credibility of a post. Mitra et al. draw intuition from linguistic cues such as subjectivity, positive emotion, and hedging, together with social network signals such as retweets and replies, to build a model that maps these features to levels of perceived credibility. Through thorough experimentation and validation, the study not only provides strong evidence for the effects of these features but also offers qualitative insights and implications of the research.
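To make the mapping concrete, here is a minimal sketch of the kind of feature-to-credibility model the paper describes. The feature values, the labels, and the choice of an L1-penalized multinomial logistic regression (a simple stand-in for the penalized regression the authors fit over far richer lexicon-based cues) are all my own illustrative assumptions:

```python
# Toy stand-in for a feature-to-credibility mapping. All values below
# are fabricated for illustration; the paper uses many more lexicon
# cues (subjectivity, emotion, hedges, ...) and a larger corpus.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Columns: [subjectivity, positive_emotion, hedging, retweets, replies]
X = np.array([
    [0.9, 0.1, 0.8, 500, 120],
    [0.2, 0.6, 0.1,  30,   5],
    [0.5, 0.3, 0.4, 200,  40],
    [0.1, 0.7, 0.0,  10,   2],
])
y = np.array([0, 2, 1, 2])  # credibility level: 0 = low ... 2 = high

# The L1 penalty keeps the model parsimonious by zeroing weak features.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
)
model.fit(X, y)
print(model[-1].coef_)  # per-class feature weights; zeros = pruned cues
```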
The model fit comparison section is particularly revealing about social network characteristics. For example, the improvement in explanatory power after including both the original texts and their replies highlights the role of context and the conversational nature of social media interaction. Given the low predictive power of non-lexicon-based features such as hashtags, capitalization, and question marks, I am curious whether all such features could be grouped into a "readability index" of the corpus corresponding to each event. Lower readability might be a good predictor of lower credibility, although it is not clear by intuition alone whether higher readability would likewise predict higher credibility.
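As a rough sketch of what I have in mind: compute one readability score per event corpus and check its rank correlation with annotated credibility. The events, texts, and 1-5 credibility scores below are invented, and Flesch reading-ease stands in for whatever readability index one prefers:

```python
# Hypothetical readability-vs-credibility check over event corpora.
import textstat                 # provides the Flesch reading-ease score
from scipy.stats import spearmanr

# event -> (tweets in that event's corpus, invented credibility score 1-5)
events = {
    "event_a": (["OMG!!! #wow u wont BELIEVE what happened???"], 1),
    "event_b": (["Reports suggest the bridge was closed for repairs."], 3),
    "event_c": (["Officials confirmed the evacuation ended at 9 am."], 5),
}

readability, credibility = [], []
for tweets, score in events.values():
    corpus = " ".join(tweets)
    readability.append(textstat.flesch_reading_ease(corpus))
    credibility.append(score)

rho, p = spearmanr(readability, credibility)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```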
Credibility in non-anonymous networks can be strongly tied to how the source is viewed by the reader. The authors note that they did not include source credibility among the features, but I think "poster status" can also affect perceived credibility. For example, I am more likely to believe fake news posted by a colleague than the same content posted by a stranger, and more likely to trust information from a user with high karma than from one with low karma. Because the credibility annotations were produced by Turkers, the effect of poster status cannot be assessed in the current setup. In a retrospective study, however, one could add non-lexicon-based features such as user statistics and the tie strength between the poster and the reader.
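A sketch of what such retrospective features might look like; every field here (follower counts, account age, interaction counts) is an assumption about what a crawl could provide, and the formulas are only one plausible operationalization:

```python
# Hypothetical poster-status and tie-strength features for a
# retrospective study; all fields and weightings are assumptions.
from dataclasses import dataclass
import math

@dataclass
class Poster:
    followers: int
    account_age_days: int

def poster_status(p: Poster) -> float:
    # Log-scale followers so mega-accounts do not dominate,
    # then weight by (log) account age as a crude trust proxy.
    return math.log1p(p.followers) * math.log1p(p.account_age_days)

def tie_strength(mutual_interactions: int, total_interactions: int) -> float:
    # Fraction of a reader's interactions that involve this poster.
    return mutual_interactions / max(total_interactions, 1)

features = {
    "poster_status": poster_status(Poster(followers=1200, account_age_days=900)),
    "tie_strength": tie_strength(mutual_interactions=15, total_interactions=200),
}
print(features)
```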
Such an analysis, combining strong linguistic and non-linguistic features, could also be applied to detecting fake news. Websites such as Snopes and PolitiFact tag news stories and their fact-check reviews with "original content", "fact rating", and "sources", which could be used either for stand-alone analysis or for labeling Twitter event streams as fake or credible.
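As a sketch, a fact-check rating could be collapsed into a coarse label for an event stream. The rating strings below mimic PolitiFact's scale, and the pairing of events with ratings is assumed to come from manual matching:

```python
# Hypothetical mapping from fact-check ratings to event-stream labels.
RATING_TO_LABEL = {
    "true": "credible", "mostly-true": "credible", "half-true": "uncertain",
    "mostly-false": "fake", "false": "fake", "pants-fire": "fake",
}

def label_event(fact_rating: str) -> str:
    return RATING_TO_LABEL.get(fact_rating.lower(), "unknown")

# Invented event/rating pairs for illustration.
events = [("bridge_collapse_rumor", "false"), ("evacuation_notice", "true")]
for name, rating in events:
    print(name, "->", label_event(rating))
```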
Finally, I believe the consequences of misjudged credibility range from disbelief in scientific and well-supported information, such as the importance of vaccination and the reality of climate change, to belief in conspiracy theories and propaganda. Fast-paced online interaction does not give users time to analyze every piece of information they encounter, which makes the linguistic and social-influence perspective on credibility all the more relevant and important for debiasing online interaction.