Benjamin D. Horne and Sibel Adali. “This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News.”
Summary:
The main goal of the
research is to build a model to differentiate fake news from real news. The
authors also analyze satire news, which is considered a type of fake news in
this paper. The data is collected from three data sets: the BuzzFeed 2016 election
data set, the Burfoot and Baldwin data set, and a data set containing fake, real, and
satire news created by the authors. The articles are analyzed and compared with
each other based on three main feature categories: stylistic, complexity, and
psychological. A Support Vector Machine classification model is built with the
four most significant features, selected using ANOVA and the Wilcoxon rank-sum
test.
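The feature-selection step described above can be illustrated with a small sketch. It is not the authors' code: it only ranks feature columns by a one-way ANOVA F statistic and keeps the top four, and it leaves out the Wilcoxon rank-sum test. The function names, the feature names, and the numbers in the toy data are all hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic; `groups` holds one list of values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_features(rows, labels, names, k=4):
    """Return the k feature names with the highest F statistic."""
    classes = sorted(set(labels))
    scores = {}
    for j, name in enumerate(names):
        groups = [[r[j] for r, y in zip(rows, labels) if y == c] for c in classes]
        scores[name] = anova_f(groups)
    return sorted(names, key=lambda nm: scores[nm], reverse=True)[:k]


if __name__ == "__main__":
    # Made-up feature values for six articles (hypothetical, not from the paper).
    names = ["nouns", "adverbs", "quotes", "pronouns", "word_count"]
    rows = [
        [30, 9, 1, 8, 400],
        [28, 8, 0, 9, 520],
        [31, 10, 1, 7, 450],
        [45, 4, 5, 3, 480],
        [47, 3, 6, 2, 410],
        [44, 5, 4, 3, 500],
    ]
    labels = ["fake", "fake", "fake", "real", "real", "real"]
    print(top_features(rows, labels, names))
```

The selected columns would then be passed to an SVM classifier (for example scikit-learn's `SVC`) and scored with cross-validation.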
The results of the
research are consistent with previous studies of fake news and can be summarized
in three main points:
1. Fake news articles have more lexical
redundancy, more adverbs, more personal pronouns, fewer nouns, fewer analytic
words, and fewer quotes. This means that fake news articles are much less
informative and require a lower educational level to read than real news
articles.
2. Real news articles convince readers through
solid reasoning, while fake news supports its claims through heuristics.
3. The content and title of fake news and
satire articles are extremely similar.
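The "lexical redundancy" in point 1 is commonly measured with a type-token ratio: the number of unique words divided by the total number of words, where a lower ratio means more repetition. Below is a minimal sketch of that measure, assuming a crude regex tokenizer that is not the paper's actual preprocessing; the function name is hypothetical.

```python
import re


def type_token_ratio(text):
    """Unique words divided by total words; lower values mean more repetition."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

Because the ratio tends to fall as a text gets longer, it is only meaningful to compare articles of similar length.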
Reflection:
First of all, the authors definitely practiced what they preached. The title of the paper is packed with verb phrases and uses few stop words. Moreover, all three main conclusions are included in the title, which hopefully will increase the chance of this article, or at least its main points, being read. Although it might sound like a half-hearted solution, the authors’ suggestion of transforming real news articles’ titles to resemble those of fake news articles is actually a good idea. What will happen if we create the titles of real news articles using the formula for fake news articles? Will people be more likely to read them? Will people classify them as fake news based on their titles?
Although the authors successfully build a classification model that distinguishes fake and satire articles from real news articles with relatively high accuracy, the research fails to deliver any new findings. Indeed, the result is nothing but a reconfirmation of previous studies. Furthermore, the difference between fake and real news articles seems obvious to most people. As with clickbait, detecting fake news articles is relatively easy. The bigger question is how we can make people more willing to read real news articles instead of scrolling through a list of fake news titles.
One interesting finding is the difference in the
title features of fake news articles between the BuzzFeed 2016 election news
dataset and the political news dataset collected by the authors. The former
uses significantly more analytical words, whereas the latter has more
verb phrases and past tense words. That suggests that there is more than one
type of fake news. The difference in word choice also suggests they might be
targeting different groups or trying to provoke different reactions. It would be
interesting to study what causes these differences in features between fake news
articles.
In addition, it is surprising that the SVM model classifies satire articles much more accurately than fake news articles. The authors “achieve a 91% cross-validation accuracy over a 50% baseline on separating satire from real articles.” On the other hand, they only “achieve a 71% cross-validation accuracy over a 50% baseline when separating the body texts of real and fake news articles.” Is that because of the mocking tone that distinguishes satire from real news articles? Furthermore, the model has low accuracy when separating satire from fake news articles. This might pose an issue, as we might want to treat satire articles differently from fake news articles.
Lastly, the distinction between the titles of clickbait and fake news articles is quite intriguing. Because both lack validity, ethics, and valuable information, many people put clickbait and fake news in the same category. Yet these two types of articles serve completely different purposes. Clickbait encourages readers to visit the web page, so its titles “have many more function words, more stop words, more hyperbolic words (extremely positive), more internet slangs, and more possessive nouns rather than proper nouns.” Fake news, meanwhile, wants to deliver its message even if the majority of the links are never clicked. Hence, its titles, loaded with claims about people and entities, are extremely concise summaries of the whole articles. One way or the other, both fake news and clickbait have found their strategies to attract readers’ attention and engagement. So why are real news articles falling so far behind? Is it because their writers are not aware of the tricks fake news and clickbait are using? Or are they too proud to give up their formal and boring titles?