Tweeting is Believing? Understanding Microblog Credibility Perceptions

Paper: Morris, M. R., Counts, S., Roseway, A., Hoff, A., & Schwarz, J. (2012). Tweeting is Believing?: Understanding Microblog Credibility Perceptions. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (pp. 441–450). New York, NY, USA: ACM.

Discussion Leader: Sukrit V

Summary:

Twitter has increasingly become a source of breaking news, and so assessing the credibility of these news tweets is important. Generally, users access tweets in one of three ways: through their news feed (from people they follow), through Twitter’s recommended trending topics, and through search. When searching for tweets, users generally have less information on which to base their credibility judgements than they do for content from people they follow (whom they presumably know or know of).

This paper investigates the features that impact users’ assessments of tweet credibility. The authors ran an initial pilot study in which they observed users thinking aloud while conducting a search on Twitter, and noted that participants often commented on the tweet author’s avatar, user name, and the contents of the tweet. They derived 26 such features from the pilot study, which they used to design a survey.

In the survey, respondents were asked to rate how each feature impacts credibility on a five-point Likert scale and to provide demographic information. The authors found that tweets encountered through search elicited more concern for credibility than those from people the respondents followed. In addition, topics such as breaking news, natural disasters, and politics raised more credibility concerns than celebrity gossip, restaurant reviews, and movie reviews. The features that most enhanced a tweet’s perceived credibility included the author’s influence, topical expertise, and reputation; content-related features included a URL leading to a high-quality site and the existence of other tweets conveying similar information. In short, participants were aware of features that convey credibility, but most of these features are obscured by Twitter’s interface and at-a-glance nature.

Following their findings from the survey and pilot study, they conducted two experiments, both aimed at measuring the impact of these tweet features on credibility perceptions. In the first experiment, they varied the message topic, user name, and user image. They found that participants’ tweet credibility and author credibility assessments were highly correlated, and that participants were generally unable to discern the actual truth values of the tweets, i.e., which ones were true and which were false. In addition, message topic and user name affected credibility: topical user names received higher ratings than traditional names and internet names. However, due to the design of the experiment, each participant only saw profiles with a single avatar type (default, female, male, generic, or topical). They therefore designed a second experiment in which each person was exposed to two different image types, and found that the default Twitter icon avatar received significantly lower ratings, while the other image types did not differ significantly.
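As an aside on the correlation finding, here is a minimal sketch of how the relationship between the two sets of Likert ratings could be checked. The ratings below are invented for illustration, and the paper does not spell out its exact statistical procedure; this is just one plausible analysis:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical 5-point Likert ratings for the same set of tweets:
tweet_credibility = [4, 2, 5, 3, 4, 1, 5, 2]
author_credibility = [5, 2, 4, 3, 4, 2, 5, 1]

# Pearson measures linear association; Spearman is rank-based and
# often preferred for ordinal data like Likert scales.
r, p = pearsonr(tweet_credibility, author_credibility)
rho, p_s = spearmanr(tweet_credibility, author_credibility)
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```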

The authors concluded with the implications of their findings. For individual users, those who plan to tweet on only one topic should pick topical user names; otherwise, a traditional user name is the better choice. Non-standard grammar and default user photos also damaged credibility. The authors also suggested design changes to Twitter’s interface: author credentials should be visible at a glance, and metrics on the number of retweets or the number of times a link has been shared, along with who is sharing those tweets, would give users more context for assessing credibility. Ultimately, the authors found that users’ credibility judgements relied on heuristics and were often systematically biased, which highlights the importance of presenting more information about tweet authors who are unknown to the user.

Reflection:

Overall, this was a very well-designed study. I particularly liked how the authors approached the survey design: first performing a pilot study to determine what to ask, then asking those questions, determining which features were important from the answers, and finally narrowing the set down to three features whose effects they studied.

Their follow-up experiment was necessary and could easily have been overlooked had they not hypothesized about why user image type showed no significant impact on tweet credibility and author credibility ratings.

The entire study rests on Twitter users using its search feature and encountering random tweets. Clearly, search is not always implemented that way: results may be ordered by popularity, by recency, or possibly by how ‘distant’ the tweet’s author is from the searcher (in degrees of connection). Mimicking this in a study is admittedly difficult. Still, it would be interesting if Twitter’s search ordered tweets by each author’s distance from the user; a sketch of this idea follows.
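To make the ‘distance’ idea concrete, here is a minimal sketch (not from the paper) of ranking search results by degrees of connection. Everything here is hypothetical: the follow graph is assumed to be an in-memory adjacency dict, and `degrees_of_connection`, `rank_by_distance`, and the tweet structure are names invented for illustration.

```python
from collections import deque

def degrees_of_connection(follow_graph, source, target, max_depth=6):
    """BFS over the follow graph; returns hops from source to target,
    or max_depth + 1 if target is unreachable within max_depth."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand beyond the depth cap
        for neighbor in follow_graph.get(user, ()):
            if neighbor == target:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return max_depth + 1  # treat unreachable authors as maximally distant

def rank_by_distance(tweets, follow_graph, searcher):
    """Order search results so tweets from 'closer' authors come first."""
    return sorted(
        tweets,
        key=lambda t: degrees_of_connection(follow_graph, searcher, t["author"]),
    )

# Toy data: alice follows bob, bob follows carol.
follow_graph = {"alice": ["bob"], "bob": ["carol"]}
tweets = [{"author": "carol", "text": "..."}, {"author": "bob", "text": "..."}]
print(rank_by_distance(tweets, follow_graph, "alice"))  # bob's tweet ranks first
```

In practice such a ranking would have to be blended with recency and popularity signals, but even this simple ordering surfaces the social context that the paper argues users lack when assessing search results.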

I appreciate that the authors accounted for race- and age-related bias when choosing the different user photos. Further, they mention that there was no difference in credibility ratings between male and female profiles. Of special mention is Thebault-Spieker et al.’s recent work [1] on the absence of race and gender bias among Amazon Mechanical Turk workers. The participants in this study were recruited from an email list, however, which raises the question: why is there no gender-related bias? Or, if there is, how statistically significant is it?

The tweet content they use in the study does not seem particularly ecologically valid; however, having only recently joined Twitter, I do not believe I am the right person to judge. For example, they used only bit.ly links in the tweets, and the text itself is not written the way most people write on Twitter. Their use of ecologically valid Twitter avatars and gender-neutral user names, however, is commendable.

Their finding that participants were generally unable to distinguish between true and false (but equally plausible) information was of particular note.

They also noted that topical user names did better than the other two types of user names, perhaps because of the lack of any biographical information (as they hypothesize). An interesting follow-up study would test whether topical or traditional user names are perceived as more credible when both are paired with similar biographical information.

Questions:

  1. Do you think journalists or technology-savvy people would perform searches and assess search results on Twitter differently from typical users? If so, how?
  2. The participants in this study were recruited from an email list, which raises the question: why is there no gender-related bias? Or, if there is, how statistically significant is it? Perhaps this is due to the design of the experiment, and not necessarily because all 200-odd participants were free from gender-related bias. [Re: Thebault-Spieker’s CSCW paper (please ask me for a copy)]
  3. Do you believe the content of their tweets was ecologically valid?
  4. Why do you think topical user names did better than traditional and internet user names?
  5. How could search results be better displayed on social networks? Would you prefer a most-recent, random, or ‘distance’-based ordering of tweets?
  6. Aside: Do you think the authors asked how often the participants used Bing Social Search (and not only Google Real Time Search) because they work at Microsoft Research?

References:

[1] Jacob Thebault-Spieker, Daniel Kluver, Maximilian Klein, Aaron Halfaker, Brent Hecht, Loren Terveen, and Joseph Konstan. 2018. Simulation Experiments on (the Absence of) Ratings Bias in Reputation Systems. In Proceedings of the 20th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’18).