Reflection #3 – [09/04] – [Lindah Kotut]

  • Mitra, T., Wright, G.P., Gilbert, E. “A Parsimonious Language Model of Social Media Credibility Across Disparate Events”.

Brief:
Mitra et al. approach the problem of credibility: how to determine it from text by mapping language cues to perceived levels of credibility (established through crowdsourcing). The model draws on language expressions, linguistic markers (modality, subjectivity, hedges, anxiety, etc.), and Twitter behaviors during major, rapidly unfolding social media events, using a 1% sample of tweets collected during each event. (It is unclear to me whether collection covered only the active event or also extended past the peak; the “king mlk martin” collection time in Table 2 appears instantaneous, unless I misunderstood the process.) Unlike work that considers the source in ascertaining credibility, this work looks only at the information quality of the tweet (and retweets) when considering credible news. Features of the tweet such as length, number of replies, and retweet count were also included in the model as controls for the effects of content popularity.

The authors found that linguistic measures were predictive of perceived credibility. The original tweet’s subjectivity (e.g., words denoting perfection, agreement, and newness) carried the largest predictive power, followed by positive emotions. In the replies to tweets, both positive and negative emotions provided significant predictive power.
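Purely as an illustration of the kind of model described above (a penalized regression from lexicon-based cue counts, plus popularity controls, to crowdsourced credibility labels), here is a minimal sketch. The lexicons, feature names, toy data, and hyperparameters are hypothetical placeholders, not the paper’s actual features or CREDBANK annotations.

```python
# Minimal sketch: map counts of linguistic cues (hedges, positive-emotion
# words) plus popularity controls to a crowdsourced credibility label with
# a penalized ("parsimonious") logistic regression. All lexicons and data
# below are made up for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical cue lexicons (stand-ins for the paper's lexicon-based markers).
HEDGES = {"maybe", "perhaps", "reportedly", "allegedly"}
POSITIVE = {"great", "agree", "confirmed", "new"}

def cue_features(tweet_text, n_replies, n_retweets):
    """Count cue occurrences and append popularity controls."""
    tokens = tweet_text.lower().split()
    return [
        sum(t in HEDGES for t in tokens),     # hedging cues
        sum(t in POSITIVE for t in tokens),   # positive-emotion cues
        len(tokens),                          # tweet length (control)
        n_replies,                            # reply count (control)
        n_retweets,                           # retweet count (control)
    ]

# Toy examples: (text, replies, retweets, label); 1 = perceived credible.
examples = [
    ("officials confirmed the road is now open", 3, 40, 1),
    ("reportedly maybe an explosion downtown, unverified", 12, 5, 0),
]
X = np.array([cue_features(t, rp, rt) for t, rp, rt, _ in examples])
y = np.array([label for *_, label in examples])

# L1 penalty keeps the model sparse, in the spirit of a parsimonious model;
# the hyperparameters here are arbitrary.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
print(model.coef_)  # which cue classes carry predictive weight
```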

Reflection:
The authors do not claim that the model would be effective if deployed as-is, but that it would serve as a useful augmentation to existing or proposed models. Looking at the different theories/approaches that make up the omnibus model:

  • Emerging (trending) events have the advantage of a large number of participants contributing to them, whether by giving context or otherwise. This work is a great follow-up to previous readings on the problem of finding signal in the noise. Assume an event where the majority of contributions are credible and in English-ish: what would be the effect of colloquialism on language models? Considering “sectors” of Twitter use such as BlackTwitter, where some words connote a different meaning from the traditional sense, is this effect considered in language models in general, or is it considered too fringe (for lack of a better term) to affect the significance of the whole corpus? Is this a non-trivial problem?
  • Tweet vs Thread Length: Twitter recently doubled the length of tweets to 280 characters, from 140. According to the omnibus model presented in this paper, tweet length did not have a significant effect on establishing credibility. Threading, a Twitter phenomenon that lets a complete thought be written across connected tweets, allows for context-giving where a single tweet, or a series of disconnected tweets, would not. Does threading, and the nuances it introduces (each tweet drawing its own replies and retweets, and each focusing on a different part of the whole story), have an effect on how these control features relate to credibility?
  • Retrospective scoring: One of the paper’s major contributions is its non-reliance on retrospection as a scoring mechanism, given the importance of establishing the credibility of news at the outset. It would still be interesting to apply a retrospective view to see how sentiments changed over time, with deleted tweets, etc.
  • Breaking the model: Part of the theoretical implications presented by the authors is the use of this approach towards sense-making during these significant events. I wonder: could the same approach also be used to learn how to “mimic” credibility and sow discord?

P.S. Ethics aside, and in continuation of the second reflection above: is it… kosher to consider how models can be used unethically (regardless of whether such considerations are within the scope of the work)?
