Summary
The goal of this paper was to identify specific statistical differences between fake news and real news. Unlike similar papers that only try to classify the difference between satirical and real pieces, its primary concern is categorizing the features that characterize fake news. Using a wide variety of stylistic, complexity, and other linguistic features, the authors were able to distinguish fake news from real news 71% and 78% of the time. Along the same lines, they could correctly distinguish satire from real news 91% of the time. Lastly, they had a harder time separating satire from fake news, with success rates of 55% and 67%.
Reflection
Much like the last paper, I didn’t find many of the results from this paper to be surprising either. Even so, I believe the authors did what they set out to do, which was to find features that can distinguish fake news from real news. Although I wasn’t surprised, I felt the paper was well written. The authors followed the data science workflow and succinctly described everything they were doing and why. In particular, their explanations of the limitations of their datasets and how they tried to combat them were quite thorough. One thing I did notice was that they explained their reasoning for using ANOVA but completely left out why the Wilcoxon rank-sum test would work when ANOVA wouldn’t (presumably because the rank-sum test is non-parametric and does not assume normally distributed features). As we read more articles, I think this paper could be a great reference for structure and linguistics going into the semester project.
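To illustrate that point for myself, here is a minimal sketch comparing the two tests on a skewed feature. The word-count data are made up purely for illustration; only the scipy calls reflect the actual library.

```python
# Hypothetical comparison of one-way ANOVA and the Wilcoxon rank-sum test
# on a skewed feature (e.g., per-article word counts). Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
real_wc = rng.lognormal(mean=6.5, sigma=0.6, size=75)  # simulated real-news word counts
fake_wc = rng.lognormal(mean=6.0, sigma=0.8, size=75)  # simulated fake-news word counts

# ANOVA assumes roughly normal, equal-variance groups...
f_stat, f_p = stats.f_oneway(real_wc, fake_wc)

# ...while the rank-sum test only compares ranks, so it still applies
# when the feature distribution is skewed or heavy-tailed.
w_stat, w_p = stats.ranksums(real_wc, fake_wc)

print(f"ANOVA:             F = {f_stat:.2f}, p = {f_p:.4f}")
print(f"Wilcoxon rank-sum: z = {w_stat:.2f}, p = {w_p:.4f}")
```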
One thing I felt was missing from the paper was reflection. The authors do very well in telling us the who, what, how, and why, but they don’t really offer ideas on what all of these conclusions might mean or what this research could lead to. I have a few ideas for research branches that could come from this:
- Maybe try to do the same type of classification work, but use news articles from different geographical areas or languages. This could test whether fake news features generalize across regions and can be studied in a broader light.
- Although a bit tangential, I think it would be interesting to specifically look at the positive sentiment differences between the three types of news pieces they defined. They looked at negative sentiment, but I wonder whether adding positive sentiment into the mix would make them better able to distinguish satire from fake news (a rough sketch of what that might look like follows this list).
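To make that second idea concrete, here is a minimal sketch of how positive and negative sentiment could both be scored per article using NLTK's VADER analyzer. The example texts and labels are invented; only the NLTK calls reflect the real library.

```python
# Hypothetical sketch: scoring both positive and negative sentiment per
# article, so the two could be compared across real, fake, and satire pieces.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

articles = {
    "real":   "Officials confirmed the budget passed after a lengthy debate.",
    "fake":   "SHOCKING: the government is hiding a terrifying disaster from you!",
    "satire": "Local man heroically survives meeting that could have been an email.",
}

for label, text in articles.items():
    scores = analyzer.polarity_scores(text)  # returns 'pos', 'neg', 'neu', 'compound'
    print(f"{label:>6}: pos={scores['pos']:.2f}  neg={scores['neg']:.2f}")
```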
With regard to the upcoming semester project, I liked the general methodologies and statistical analyses the authors used in their research. I might look to do something similar with natural language processing and lexical analysis. Combining many different types of features into a coherent argument looks impressive to me, and I wonder how they were able to bring all of these factors together and even form a classifier from the results; a minimal sketch of how such a feature combination might work follows.
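This is not the authors' actual pipeline, just an illustration of the general idea: compute a few hand-picked stylistic and complexity features per article, stack them into one matrix, and feed that to a standard scikit-learn classifier. The feature choices, toy texts, and labels are my own assumptions.

```python
# Minimal sketch of combining several lexical/complexity features into one
# classifier; not the authors' pipeline, just an illustration of the idea.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def extract_features(text: str) -> list[float]:
    """Turn one article into a small numeric feature vector."""
    words = text.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / max(n_words, 1)
    type_token_ratio = len(set(w.lower() for w in words)) / max(n_words, 1)
    exclamations = text.count("!")
    return [n_words, avg_word_len, type_token_ratio, exclamations]

# Toy corpus with made-up labels (1 = fake, 0 = real) purely for illustration.
texts = [
    "You won't BELIEVE what they found!!! Share before it's deleted!",
    "The committee released its quarterly report on infrastructure spending.",
    "SHOCKING truth the media refuses to cover! Wake up!!!",
    "Researchers published peer-reviewed findings on regional water quality.",
] * 10  # repeated so cross-validation has enough samples
labels = np.array([1, 0, 1, 0] * 10)

X = np.array([extract_features(t) for t in texts])
model = make_pipeline(StandardScaler(), LinearSVC())
print("CV accuracy:", cross_val_score(model, X, labels, cv=5).mean())
```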