Reading Reflection #2 – [1/31/2019] – [Sourav Panth]

This Just In: Fake News Packs a Lot in the Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary:

The question this paper asks is “is there any systematic stylistic and other content differences between fake and real news?” To answer it, Benjamin D. Horne and Sibel Adalı used three data sets. The first had been featured by Buzzfeed in its analysis of real and fake news items from the 2016 US Elections. The second was collected by the authors and contains news articles on US politics from real, fake, and satire news sources. The third contains real and satire articles from a past study. From these they drew several conclusions, primarily that fake news articles are shorter and use more repetitive content.

Reflection:

First, I enjoyed this paper a lot more than the first article we read. It was more structured and organized around the questions the authors asked, and they didn’t go too broad. I also thought it was really interesting how they used feature engineering to detect whether an article is fake or real.

The authors report that real news articles are significantly longer than fake news articles and that fake news articles use fewer technical words and quotes. This is not surprising to me at all: it’s hard to be technical and use quotes when the article is fake. I believe a lot of it also has to do with the fact that fake news is primarily used as click bait, commonly as a source of revenue from getting consumers to view ads on the site. Real news sites also earn revenue from ads, but one of their main goals is to inform the public. Because of this, real news is probably backed up with more data and facts, and an article may even be updated on the same page.

Another finding that was not surprising to me was that fake news articles often had longer titles than real news articles. This goes back to what I said in the previous paragraph: fake news publishers are just trying to catch the eye of the consumer and get them to click on their link. One example the authors give is that fake news often uses more proper nouns in the title. This fits the click bait theory, because people will click on a link if it’s related to a celebrity or public figure they are interested in. I’m not sure if this is what they were going for, but it seems like the paper’s own title was longer than it needed to be as well.

Finally, the authors talk about fake news being more closely related to satire than to real news. Again, this doesn’t surprise me at all, because satire is essentially fake news that isn’t advertised as real: its consumers know that it’s fake and just for entertainment. One thing that really surprised me was that they were able to distinguish whether an article was satire or fake news over 50% of the time.

Future Work:

I think the first thing I would work on is figuring out whether the top four features they use (number of nouns, lexical redundancy (TTR), word count, and number of quotes) are really the best features, and whether other features could be added to increase accuracy.
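As a starting point for that kind of experiment, here is a minimal sketch of how three of those four features might be computed from an article body using only the Python standard library. This is my own illustration, not the authors' code: the function names `tokenize` and `stylistic_features`, the simple regex tokenizer, and the choice to count a quote as a pair of double quotation marks are all assumptions. Counting nouns would need a part-of-speech tagger, so it is left out here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stylistic_features(text):
    """Return word count, type-token ratio (TTR), and an approximate quote count.

    TTR is the number of unique tokens divided by the total number of tokens,
    so a lower value means more repeated words.
    """
    tokens = tokenize(text)
    word_count = len(tokens)
    ttr = len(set(tokens)) / word_count if word_count else 0.0
    # Assumption: each quotation uses one opening and one closing double quote mark.
    quote_marks = len(re.findall(r'["“”]', text))
    return {
        "word_count": word_count,
        "ttr": ttr,
        "quote_count": quote_marks // 2,
    }


if __name__ == "__main__":
    sample = 'The cat saw the cat. "Hello," she said.'
    print(stylistic_features(sample))
```

With features like these in a table, it would be easy to add new candidate features as extra keys and check whether a classifier's accuracy actually improves.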

Another thing that could be very interesting is seeing when the publishing of fake news articles peaks and whether those peaks correlate with important events like the US election. This could help show whether fake news is a recent trend or has always been around but just wasn’t publicly known.
