Today’s topic of discussion is Credibility and Misinformation Online.
Mitra, Tanushree et al. (2017) – “A Parsimonious Language Model of Social Media Credibility Across Disparate Events” – CSCW 2017 (pp. 126–145).
Summary
The paper focuses on modeling the perceived credibility of news reported on social media. The authors identified 15 theoretically grounded linguistic dimensions and used the CREDBANK corpus to build a model that maps language to perceived levels of credibility. Credibility has been broadly described in terms of believability, trust and reliability, along with other related concepts; however, researchers have treated it as either subjective or objective depending on their field of expertise.
CREDBANK [1] is a corpus of tweets, topics, events and associated human credibility judgements, with annotations on a 5-point scale from -2 (“Certainly Inaccurate”) to +2 (“Certainly Accurate”). The paper measured the perceived credibility of the Twitter reporting on each event through the proportion of “Certainly Accurate” annotations: Pca = (number of “Certainly Accurate” ratings for the event) / (total number of ratings for the event). Every event was then assigned one of four ordered credibility classes, Low < Medium < High < Perfect; an event belongs to the Perfect credibility class when 0.9 ≤ Pca ≤ 1.
The linguistic dimensions served as the potential predictors of perceived credibility. These credibility markers were Modality, Subjectivity, Hedges, Evidentiality, Negations, Exclusions and Conjunctions, Anxiety, Positive and Negative Emotions, Boosters and Capitalization, Quotation, Questions and Hashtags. Nine control variables were used: the number, average length and number of words of original tweets, retweets and replies. The model was fit with a penalized regression whose mixing parameter alpha (set to 1) determines how the penalty, and therefore the weight, is distributed among the variables; in the standard elastic-net formulation, alpha = 1 is a pure L1 (lasso) penalty, which shrinks many coefficients to exactly zero and keeps the model parsimonious. It was also found that retweets and replies with longer messages were associated with higher credibility scores, whereas a higher number of retweets was associated with lower credibility scores.
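The Pca computation and class assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each event's ratings are stored as integers on CREDBANK's -2 to +2 scale, and the function names are hypothetical. Only the 0.9 lower bound for the Perfect class comes from the paper; the High and Medium cut-offs below are assumed placeholders, included just to make the example complete.

```python
CERTAINLY_ACCURATE = 2  # top of the 5-point scale (-2 .. +2)

def proportion_certainly_accurate(ratings):
    """Return Pca = ("Certainly Accurate" ratings) / (all ratings) for one event."""
    if not ratings:
        raise ValueError("event has no ratings")
    if any(r not in (-2, -1, 0, 1, 2) for r in ratings):
        raise ValueError("ratings must lie on the -2..+2 scale")
    certain = sum(1 for r in ratings if r == CERTAINLY_ACCURATE)
    return certain / len(ratings)

# Lower bound of each class, checked from the highest class down.
# Only the Perfect bound (0.9) is taken from the summary above;
# the High and Medium bounds are ASSUMED placeholders for illustration.
CLASS_LOWER_BOUNDS = [
    ("Perfect", 0.9),
    ("High", 0.8),    # assumption
    ("Medium", 0.6),  # assumption
    ("Low", 0.0),
]

def credibility_class(p_ca):
    """Map a Pca value in [0, 1] to an ordered credibility class."""
    if not 0.0 <= p_ca <= 1.0:
        raise ValueError("Pca must lie in [0, 1]")
    for label, lower in CLASS_LOWER_BOUNDS:
        if p_ca >= lower:
            return label

# Hypothetical event: 9 of 10 annotators chose "Certainly Accurate".
ratings = [2] * 9 + [1]
p = proportion_certainly_accurate(ratings)
print(p, credibility_class(p))  # 0.9 Perfect
```

Checking the classes from the highest bound down means every Pca value lands in exactly one class, which preserves the ordering Low < Medium < High < Perfect.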
Reflection
It has become increasingly common for people to get their news through social media, and with this comes the problem of verifying the authenticity of that news. The paper examined a set of credibility markers, focusing on the kinds of words used in posts and how those words are perceived.
First, I would like to point out that certain groups have their own jargon: millennials speak in their own slang, and medical professionals use specialized terminology. Such language may be perceived as negative or dubious, which could in turn lower perceived credibility. Does the corpus include enough informal terms and group-specific language to avoid erroneous results?
Additionally, the paper states, “Moments of uncertainty are often marked with statements containing negative valence expressions.” However, negative expressions are also used to describe unfortunate events. Take the example of the missing Malaysia Airlines flight MH370: people are likely to express negative emotion when tweeting about that incident, but this certainly does not make their tweets uncertain or less credible.
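To make this concern concrete, here is a toy sketch of a lexicon-based negative-emotion count, the kind of word-counting feature a negative-valence marker relies on. The word list, the function name and both example tweets are hypothetical, written only for illustration; they are not the lexicon or data used in the paper.

```python
import re

# Toy, HYPOTHETICAL negative-emotion word list; a real system would use
# an established lexicon rather than this handful of words.
NEGATIVE_WORDS = {"tragic", "sad", "missing", "lost", "grief", "terrible", "feared"}

def negative_valence_count(text):
    """Count tokens in `text` that appear in the toy negative-emotion list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for t in tokens if t in NEGATIVE_WORDS)

# Hypothetical tweets written for illustration (not taken from CREDBANK).
factual_but_sad = "Tragic news: flight MH370 is still missing, families in grief"
vague_rumour = "Not sure if true but heard something terrible happened??"

print(negative_valence_count(factual_but_sad))  # 3
print(negative_valence_count(vague_rumour))     # 1
```

A plain count has no notion of why a negative word appears, so the accurate but sad report scores higher on this feature than the vague rumour.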
Although this paper dealt with the credibility of news on social media, specifically Twitter, credibility is a valid concern for every kind of news source. Could this approach be applied to television and print media as well? They are often accused of reporting unverified news or even of being biased. If such outlets were also measured by a credibility score, in addition to the infamous TRP (Television Rating Point) or ratings, it could push them to become more credible: news agencies would be pressured to verify their sources, and readers and viewers could use the score to judge the authenticity of the news being delivered.
[1] Mitra, Tanushree et al. (2015) – “CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations”.