- “A Parsimonious Language Model of Social Media Credibility Across Disparate Events.” Tanushree Mitra, Graham P. Wright, Eric Gilbert
Mitra et al. present a study assessing the credibility of events and the related content posted on social media sites like Twitter. They propose a parsimonious model that maps linguistic cues to perceived credibility levels, and their results show that certain linguistic categories and their associated phrases are strong predictors across disparate social media events. The model captures text from tweets covering 1,377 events (66M tweets); labeled credibility annotations were obtained via Amazon Mechanical Turk. The authors trained a penalized ordinal logistic regression using 15 linguistic and other control features to predict the credibility (Low, Medium, or High) of event streams.
The authors mention that the model is not deployable. However, the study is a great base for future work on this topic. It is a simple model that deals only with linguistic cues, and penalized ordinal regression seems like a prudent choice; coupled with other parameters such as location and timestamp, among other things, it could be designed into a complete system in itself.
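To make the modeling choice concrete, here is a minimal sketch of predicting Low/Medium/High credibility from linguistic-cue features with a penalized logistic regression. This is not the authors' code: the features and labels are synthetic, and scikit-learn's L1-penalized multinomial regression is used as a readily available stand-in for the paper's penalized *ordinal* regression.

```python
# Sketch (assumption-laden): L1-penalized logistic regression over toy
# linguistic-cue features, standing in for the paper's penalized ordinal model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy features per event: [hedge-word rate, booster rate, negation rate]
n = 300
X = rng.random((n, 3))

# Invented labeling rule: more hedging -> lower credibility,
# more boosters -> higher credibility, plus a little noise.
score = -2.0 * X[:, 0] + 2.0 * X[:, 1] + 0.2 * rng.standard_normal(n)
y = np.digitize(score, [-0.5, 0.5])  # 0 = Low, 1 = Medium, 2 = High

# L1 penalty drives uninformative feature weights toward zero,
# which is the "parsimonious" aspect the paper emphasizes.
clf = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
clf.fit(X, y)
print(clf.score(X, y))
```

A true ordinal model would instead fit one set of coefficients with ordered thresholds between the three classes, but the penalization-for-parsimony idea is the same.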
- The study mentions that the content of a tweet is more reliable than its source when it comes to assessing credibility. This would hold true almost always, except when the account posting a certain news item or article is notorious for fake news or conspiracy theories. A simple additional classifier could weed out such outliers from general consideration.
- A term used in the paper, ‘stealth advertisers’, stuck in my head and got me thinking about ‘stealth influencers’ masquerading as unbiased and reliable members of the community. They often use click-bait, and the linguistic cues they exhibit tend toward extremes, such as “Best Gadget of the Year!!” or “Worst Decision of my Life”.
- Their tweets may often fool a naive user or model looking for linguistic cues to assess credibility. This relates to the study by Flanagin and Metzger: there are characteristics worthy of being believed, and then there are characteristics likely to be believed. [2] This begs the question: is the use of linguistic cues to assess credibility on social media hackable?
- Further, location, or location-based context, is a great asset for assessing credibility. Consider the flash-flood thunderstorm warning issued recently in Blacksburg. A similar downpour or notification would not be taken as seriously in a place that experiences more intense rainfall. Thus location-based context can be a great marker in the estimation of credibility.
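One way to picture this bullet: the same rainfall report is notable or routine depending on a region's baseline, so a location-aware feature could normalize against that baseline. The baseline table and the ratio rule below are invented purely for illustration.

```python
# Illustrative sketch of location-based context: normalize a reported
# rainfall amount by the region's typical heavy-rain baseline (values invented).
REGIONAL_BASELINE_MM = {"Blacksburg": 10.0, "Cherrapunji": 60.0}

def notability(region: str, reported_mm: float) -> float:
    """Ratio > 1.0 means the report is unusually heavy for that region."""
    baseline = REGIONAL_BASELINE_MM.get(region, 20.0)  # fallback baseline
    return reported_mm / baseline

print(notability("Blacksburg", 30.0))   # unusually heavy for Blacksburg
print(notability("Cherrapunji", 30.0))  # routine where rain is intense
```

The normalized value, rather than the raw amount, would be the location-aware marker fed into a credibility model.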
- The authors included the number of retweets as a predictive measure; however, if the reputation, verified status, or karma of the retweeters were factored in, prediction might become much easier. This is because multiple trolls retweeting a sassy or fiery comeback is different from reputable users retweeting genuine news.
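The suggestion above could be sketched as a reputation-weighted retweet count instead of a raw count. The `verified`/`karma` fields and the weighting scheme are assumptions for illustration, not fields from the paper's dataset.

```python
# Sketch: weight each retweeter by reputation rather than counting retweets
# equally. Field names and weights are illustrative assumptions.
def weighted_retweets(retweeters):
    """retweeters: list of dicts with 'verified' (bool) and 'karma' (int)."""
    total = 0.0
    for user in retweeters:
        weight = 1.0
        if user.get("verified"):
            weight += 1.0  # trust verified accounts more
        weight += min(user.get("karma", 0), 1000) / 1000.0  # cap karma's effect
        total += weight
    return total

trolls = [{"verified": False, "karma": 0}] * 5
reputable = [{"verified": True, "karma": 1000}] * 5
print(weighted_retweets(trolls), weighted_retweets(reputable))  # 5.0 15.0
```

Five troll retweets and five reputable retweets are identical to a raw count, but the weighted feature separates them cleanly.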
- Another factor is that linguistic cues picked up from a certain region, community, or discipline may not generalize, as every community has a different way of speaking online, with its own jargon and argot. The community here may be a different academic discipline or an ethnicity. The point is that linguistic-cue knowledge has to be learned per community and cannot simply be transferred.
[2] Andrew J. Flanagin and Miriam J. Metzger. “Digital Media and Youth: Unparalleled Opportunity and Unprecedented Responsibility.”