Reading Reflection #2 – 1/31/19 – Dat Bui | CS4984 Spring19: Data Science & Analytics Capstone

This Just In: Fake News Packs a Lot in the Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary: The goal of the writers was to build a classifier that could distinguish fake news from real news. They approached the problem with two well-known hypothesis testing methods, ANOVA and the Wilcoxon rank-sum test. They found that what mainly distinguishes real articles from fake articles is the title. Fake news titles use significantly fewer stop-words and nouns, and significantly more proper nouns and verb phrases, than real news titles. They also found that fake news is generally more similar to satire than it is to real news. Compared to real news articles, fake news articles use fewer technical words, shorter words overall, less punctuation, fewer quotes, and more redundancy. Fake news articles were also found to carry more negative emotion than real or satire news.

Reflection: This paper was interesting to read, especially considering today’s political climate and biased news sources. It is important to distinguish real news from fake news because in today’s age of instant news, it is easy to read any article and automatically believe it.

Things to consider: The authors state that they do not know whether there was any selection bias in collecting the data, and that they cannot say anything about the traffic generated by any of the stories. The main thing to be aware of is that the selected articles may suffer from selection bias, which could have heavily influenced the results. There is a chance that those collecting the data intentionally looked for simple, repetitive articles with crazy headlines for the fake news samples, and more eloquent articles for the real news samples.

Accuracy: The findings seem correct in that fake news articles are indeed different from real and satire articles, but because of the low accuracies, the classifier is almost useless. The authors’ features achieve “between 71% and 91% accuracy” when distinguishing fake articles from real news. While 91% accuracy may seem high, it is not a great indicator of anything. It means that 91% of the time the classifier labels an article correctly, but the other 9% of the time it labels a fake article as real or a real article as fake. In other words, out of 10,000 articles, 9,100 would be classified correctly and 900 would not.
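The arithmetic above can be checked with a short Python sketch. The function name `misclassified` and the 10,000-article sample are illustrative assumptions, not something taken from the paper; only the 71% and 91% figures come from the quoted range.

```python
def misclassified(total: int, accuracy: float) -> int:
    """Return how many of `total` articles would be labeled wrongly.

    Hypothetical helper for illustration: `accuracy` is the fraction of
    articles the classifier labels correctly (e.g. 0.91 for 91%).
    """
    correct = round(total * accuracy)
    return total - correct


if __name__ == "__main__":
    sample_size = 10_000  # illustrative sample size, as in the text above
    for acc in (0.71, 0.91):  # the two ends of the reported accuracy range
        wrong = misclassified(sample_size, acc)
        print(f"accuracy {acc:.0%}: {sample_size - wrong} correct, {wrong} wrong")
```

Note that accuracy alone does not say how the errors split between fake articles labeled real and real articles labeled fake; that would require the full confusion matrix.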

Stylistic differences: I find it interesting that, while fake news articles seem a lot like clickbait, structurally they are not as similar as we may think. It is concerning to think that fake news articles are not as easy to spot as we initially thought. The paper does find, however, that there are definite stylistic differences between fake news titles and real news titles, namely in length and word complexity. Fake news titles tend to be longer and use simpler words.
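As a rough illustration of the kind of title features discussed here (length, stop-word use, word complexity), the following is a minimal Python sketch. It is not the authors' feature extraction: the function `title_features`, the tiny stop-word list, and the use of average word length as a stand-in for word complexity are all assumptions made for this example.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "for", "at", "by"}


def title_features(title: str) -> dict:
    """Compute a few simple stylistic features of a headline.

    Hypothetical helper: returns the word count, the share of words that
    are stop-words, and the average word length in characters.
    """
    words = re.findall(r"[A-Za-z']+", title)
    n = len(words)
    if n == 0:
        return {"word_count": 0, "stop_word_share": 0.0, "avg_word_length": 0.0}
    stop_count = sum(1 for w in words if w.lower() in STOP_WORDS)
    return {
        "word_count": n,
        "stop_word_share": stop_count / n,
        "avg_word_length": sum(len(w) for w in words) / n,
    }


if __name__ == "__main__":
    # Made-up sentence used only to exercise the function.
    print(title_features("The cat is on the mat"))
```

Comparing these numbers across a set of fake and real titles would be one simple way to see whether the longer-title, simpler-word pattern shows up in a given sample.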

Questions: Because fake news titles are more similar to satire than to real news, is there a possibility that people used to click on them as a “joke”, only to find that they eventually started to believe what they were reading?

Are titles intentionally written to imitate satire, in order to draw more clicks from people who think the article may be a funny read?

If a fake news article had a short title with complex words, and a writing style similar to a real news article, could we distinguish it from a real article?
