Reading Reflection #2

Fake news is an increasingly frequent problem and has a negative effect on readers. Horne and Adali used three data sets to try to find distinguishing characteristics of fake news, real news, and satire. They set out to answer the question, “Is there any systematic stylistic and other content differences between fake and real news?” Their work is important because misleading or incorrect information has been found to have a higher potential to go viral, so it matters to be able to identify fake news and understand what characteristics set it apart. The authors reported the following findings.

-The content of fake and real news is substantially different.

-Titles are a strong differentiating factor.

-Fake content is more closely related to satire than to real news.

-Real news persuades through arguments, while fake news persuades through heuristics.

I thought that the paper was well done. The most surprising findings to me were that real news persuades with arguments and fake news persuades through heuristics. The Elaboration Likelihood Model was interesting and future work could be done to further study the effect of fake news on readers and how fake news is perceived. I found it interesting that so much can be determined from the titles of articles. If the intent of fake news publishers is to spread incorrect information, could they change their titles and structure to more closely match real news in order to spread misinformation more effectively? In addition, I thought that the data sets used had limitations and that future work would benefit from more comprehensive data. The authors also acknowledged this. A big challenge seems to be deciding how to classify fake news and real news consistently. Future work could be done to define this further.


Reading Reflection #2 – 1/31/19 – Dat Bui

This Just In: Fake News Packs a Lot in the Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary: The authors’ goal was to build a classifier that could distinguish fake news from real news. They approached the problem with two well-known hypothesis tests, ANOVA and the Wilcoxon rank sum test. They found that what mainly distinguishes real articles from fake articles is the title. Fake news titles use significantly fewer stop words and nouns, while using significantly more proper nouns and verb phrases than real news titles. They also found that fake news is generally more similar to satire than it is to real news. Compared to real news articles, fake news articles use fewer technical words, smaller words overall, less punctuation, fewer quotes, and more redundancy. Fake news articles were also found to carry more negative emotion than real or satire news.
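
As a quick illustration of how this kind of comparison might be run (this is my own sketch, not the authors’ code), the snippet below applies both tests to a single hypothetical title feature, stop-word counts per headline; the numbers are invented.

```python
# Hedged sketch: comparing one title feature between fake and real headlines
# with the two tests the paper uses. The counts below are made up.
from scipy.stats import f_oneway, ranksums

fake_title_stopwords = [1, 0, 2, 1, 0, 1]   # hypothetical per-title stop-word counts
real_title_stopwords = [4, 3, 5, 2, 4, 3]

# One-way ANOVA assumes roughly normal data; the Wilcoxon rank sum test does not
_, anova_p = f_oneway(fake_title_stopwords, real_title_stopwords)
_, ranksum_p = ranksums(fake_title_stopwords, real_title_stopwords)

print(f"ANOVA p-value: {anova_p:.4f}, Wilcoxon rank sum p-value: {ranksum_p:.4f}")
```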

Reflection: This paper was interesting to read, especially considering today’s political climate and biased news sources. It is important to distinguish real news from fake news because, in today’s age of instant news, it is easy to read any article and automatically believe it.

Things to consider: The authors stated that they are not aware of whether or not there was any selection bias in collecting the data, and that they cannot say anything about the traffic generated by any of the stories. The main thing to be aware of is that the selected articles may carry selection bias, which could have heavily influenced the results. There is a chance that those collecting the data intentionally looked for simple, repetitive articles with sensational headlines as fake news samples, and more eloquent articles as real news samples.

Accuracy: The findings seem correct in that fake news articles are indeed different from real and satire articles, but because of the relatively low accuracies, the classifier is almost useless in practice. The authors’ features hit “between 71% and 91% accuracy” when distinguishing fake articles from real news. While 91% accuracy may seem high, it is not a great indicator on its own. It means the classifier labels articles correctly 91% of the time, but the other 9% of the time it either classifies fake news articles as real or real news articles as fake; in other words, out of 10,000 articles, 9,100 would be classified correctly, while 900 would not.
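
To make the error breakdown concrete, here is a toy confusion-matrix example of my own (a ten-article, 90%-accurate case with made-up labels), showing that overall accuracy alone does not say whether the mistakes are fake articles labeled real or real articles labeled fake.

```python
# Illustrative sketch with invented labels: 1 = fake, 0 = real.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # one fake article slips through as "real"

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / len(y_true)
print(f"accuracy: {accuracy:.0%}, fake labeled real: {fn}, real labeled fake: {fp}")
```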

Stylistic differences: I find it interesting that, while fake news articles seem a lot like clickbait, structurally they are not as similar as we might think. It is concerning that fake news articles are not as easy to spot as we initially thought. It has been found, however, that there are definite stylistic differences between fake news titles and real news titles, namely in length and word complexity: fake news titles tend to be longer and to use simpler words.

Questions: Because fake news titles are more similar to satire than to real news titles, is there a possibility that people used to click on them as a “joke,” only to find that eventually they started to believe what they were reading?

Are titles intentionally written in a way as to imitate satire, to evoke more clicks from people who think it may be a funny read?

If a fake news article had a short title with complex words, and a writing style similar to a real news article, could we distinguish it from a real article?


[Reflection #2] – [1/31] – [Matthew, Fishman]

Quick Summary

Why is this study important? With the sheer quantity of news stories shared across social media platforms, news consumers must use quick heuristics to decide if a news source is trustworthy. Horne et al. set out to find distinguishing characteristics between real news and fake news, to help news consumers better tell the two apart.

How did they do it? They studied stylistic, complexity, and psychological features of real news, fake news, and satire articles to help classify each category from the others. Horne et al. used ANOVA tests on normally distributed features and Wilcoxon rank sum tests on non-normally distributed features to find which features differ between the categories of news. They then selected the top four distinguishing features for both the body text and the title text of the articles to build an SVM model with a linear kernel and 5-fold cross-validation.
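
A rough sketch of that pipeline in scikit-learn, using a placeholder feature matrix rather than the paper’s actual features and data, might look like this:

```python
# Hedged sketch: ANOVA-based top-4 feature selection feeding a linear-kernel SVM,
# evaluated with 5-fold cross-validation. X and y are synthetic placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))       # placeholder stylistic/complexity/psychological features
y = rng.integers(0, 2, size=200)     # placeholder labels (0 = real, 1 = fake)

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=4)),   # keep the 4 strongest features
    ("svm", SVC(kernel="linear")),             # linear-kernel SVM
])

scores = cross_val_score(pipeline, X, y, cv=5)  # 5-fold cross-validation
print("mean CV accuracy:", scores.mean())
```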

What were the results? Their classifier achieved 71% to 91% cross-validation accuracy over a 50% baseline when distinguishing between real news and either satire or fake news. Some of the major differences they found between fake news and real news are that real news has a much more substantial body, while fake news has much longer titles with simpler words. This suggests that fake news writers are attempting to squeeze as much substance into the titles as possible.

Reflection

Again, I was not as surprised by the outcomes of this study as I had hoped. It seems obvious that, for example, fake news articles would have a less substantial body and use simpler words in their titles than real news. However, the lack of stop words and length of fake news titles did surprise me; I had always associated fake news with click-bait, which usually has very short titles.

A few problems I had with the study included:

  • The data sets used. If the real enemy here is fake news, then why would the researchers use only a total of 110 fake news sources (in comparison to 308 satire news sources and 4111 real news sources)? No wonder the classifier had an easier time distinguishing real news from the other two.
  • The features extracted. The researchers could have used credibility features or different user interaction metrics like shares or clicks to better distinguish fake news from real. If the study utilized more than just linguistic features, their classifier could have been much more accurate.

Going Forward:

Some improvements on the study (in addition to the dataset size and features extracted) could be to do some user research:

  • How much time do users spend reading a fake news article in comparison to real news?
  • What characteristics of a news consumer are correlated with sharing, liking, or believing fake news?
  • What is the ratio of fake news articles clicked or shared to that of real news?

Questions Raised:

  • Can we predict which types of users are more susceptible to fake news?
  • How have fake news’ linguistics changed over the years? What can we learn from this in predicting how they might change in the future?
  • Should real news sources change the format of their titles to give lazy consumers as much information as possible without needing to read the article? Or, would this hurt their baseline as their articles might not get as many clicks if all the information is in the title?
  • Should news aggregates like Facebook and Reddit be using similar classifiers to mark how potentially “fake” a news article is?


[Reading Reflection 2] – [01/31] – [Numan, Khan]

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary:

This paper’s overall goal is to determine whether fake news differs systematically from real news in style and language use. The authors’ motivation is to disprove the assumption that fake news is written to look similar to real news, in other words, that it fools readers who don’t check the credibility of the source or the arguments made in the article. The paper tests this assumption by studying three data sets and their features using the one-way ANOVA test and the Wilcoxon rank sum test. The first data set comes from Buzzfeed’s analysis of real and fake news from the 2016 US Elections. The second data set contains news articles on US politics from real, fake, and satire news sources. The third data set contains real and satire articles from a previous study. The paper chose to include satire articles in order to differentiate itself from other papers on fake news. The paper concluded that fake news articles have less content, more repetitive language, and less punctuation than real news. Furthermore, fake news articles have longer titles and use fewer stop words and fewer nouns than real news. When comparing fake news to satire news, Horne and Adali were able to conclude that fake news is more similar to satire news than to real news, disproving the assumption stated at the beginning of the paper.

Reflection:

The assumption that this paper is trying to prove wrong is a belief that I held. When fake news became more prevalent during the 2016 Presidential Election, I viewed those articles as trying to appear as real news while lacking credibility in the sources and arguments used. Initially, I found it interesting that the authors were trying to disprove this assumption, because my view was that fake and real news were similar. However, I became fascinated with their inclusion of satire news in the data sets. I had never thought of comparing fake news to satire news, and I agree with the way the paper defines satire news as “…explicitly produced for entertainment.” Why would fake news, whose purpose is to deceive, be similar to news that is read for entertainment and mockery? But thinking about it more and looking at the bigger picture, I have a much different view of fake news after reading this paper. Fake news articles are not only trying to deceive people; they are also written like parody, since satirical news can easily grab people’s attention too. Therefore, it would make a lot of sense that fake news is similar to satire news.

While I don’t have much experience with Natural Language Processing (NLP) or Understanding (NLU), the features defined in this paper for the data sets made sense to me; there seem to be no unnecessary or overlooked features. Being able to gauge word syntax, sentence- and word-level complexity, and sentiment all makes sense given the goal of the paper, which is to determine whether fake news differs systematically from real news in style and language use. These features provide information at both a high and a low level for language analysis. Personally, due to my inexperience in this field, I would be eager to learn how to analyze natural language in Python in the future.
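
For anyone equally curious, here is a small, hedged sketch of the kind of stylistic feature extraction the paper describes, using NLTK on an invented headline; the exact features and tooling the authors used may differ.

```python
# Hedged sketch of stylistic feature extraction with NLTK; the headline is invented.
import nltk
from nltk.corpus import stopwords

for pkg in ("punkt", "averaged_perceptron_tagger", "stopwords"):
    nltk.download(pkg, quiet=True)

text = "BREAKING: Famous Senator Caught Hiding Shocking Secret"
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)
stops = set(stopwords.words("english"))

features = {
    "word_count": len(tokens),
    "stopword_count": sum(t.lower() in stops for t in tokens),
    "proper_noun_count": sum(tag in ("NNP", "NNPS") for _, tag in tags),
    "all_caps_count": sum(t.isupper() and len(t) > 1 for t in tokens),
    "type_token_ratio": len({t.lower() for t in tokens}) / len(tokens),
}
print(features)
```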

Something else I appreciated was that Horne and Adali acknowledged that a statistical test says nothing about predicting classes in the data. Therefore, they used the statistical tests as a form of feature selection for a Support Vector Machine (SVM) model that would classify news articles based on small feature subsets. It was impressive that the subsets they used in their classifier significantly improved the prediction of fake and satire news, achieving between 71% and 91% accuracy in separating them from real news stories. One question I was curious about is Horne and Adali’s selection of features for their SVM. Why did they choose specifically the top 4 features from their hypothesis testing? Is it because they are trying to avoid over-fitting? Would we see a difference if they used Principal Component Analysis as a way of telling which features would be best for classifying fake versus real news?
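
As a back-of-the-envelope way to explore that last question, one could compare the two reduction strategies on the same feature matrix; the sketch below is only my own illustration of the idea on synthetic placeholder data, not the authors’ setup.

```python
# Hedged sketch: top-4 ANOVA-selected features vs. the first 4 principal components,
# each feeding the same linear-kernel SVM. X and y are synthetic placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))      # placeholder feature matrix
y = rng.integers(0, 2, size=200)    # placeholder labels

for name, reducer in [("top-4 ANOVA", SelectKBest(f_classif, k=4)),
                      ("4-component PCA", PCA(n_components=4))]:
    model = Pipeline([("reduce", reducer), ("svm", SVC(kernel="linear"))])
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```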

The reflections on the statistical tests based on their defined features were clear and made sense. The paper often reflects effectively on the results found; however, some of the reflections are not surprising. For example, I already knew that fake news articles tend to be shorter in content and have longer titles than real news. My belief, from before reading this paper, was that fake news intends to grab readers’ attention through those long titles. Therefore, it makes sense that the writers of fake news articles pack as much information as possible into the titles. This leads me to think that readers tend to be more interested in the titles of fake news than in those of real news, which leads to another question that could be a project idea: what can we do to help the general public easily tell real news from fake news, when readers are clearly attracted to the titles of fake news articles? Should real news articles adjust their titles? While I critique some of the reflection as obvious, many of the results from the syntax features were very interesting, such as that fake news uses less punctuation and its titles use fewer stop words but more proper nouns. Overall, this paper was very well written, and while it could have included more analysis of its results, I really appreciate the authors creating a working SVM model for classifying fake and real news.


[Reading Reflection 2] – [01/30] – [Raghu, Srinivasan]

Summary:

This paper was primarily about how fake news appears to resemble satirical news as opposed to the commonly-held belief that fake news resembles real news. Through analyzing data sets containing fake, real, and satirical news, comparisons were made between these groups based on a set of features. Some of the key conclusions drawn are listed below.

  • The content of fake news and real news articles is vastly different from one another. Real news articles tend to be slightly longer, whereas fake articles use fewer technical words and typically require a lower educational level to read. Fake news also contains more redundancy and personal pronouns.
  • Titles are a significant factor in differentiating between fake and real news articles. Fake news titles tend to be longer and usually contain more capitalized words and proper nouns. It’s also common for fake news titles to have several verb clauses, as the writers try to squeeze in as much substance as they can.
  • Fake news articles are more similar to satire than real news articles. Both types of articles usually contain smaller words and tend to use fewer technical words. In addition, both fake news and satirical articles tend to contain more redundancy.

Reflection:

I have listed below a few of the lines that interested me in the paper.

“SentiStrength is a sentiment analysis tool that reports a negative sentiment integer score between -1 and -5 and a positive sentiment integer score between 1 and 5, where -5 is the most negative and 5 is the most positive.”

SentiStrength is one of the more interesting features these data sets are analyzed on. The results of the study showed a correlation between fake news and generally negative sentiment. Although the sentiment of real news articles tends to be more neutral, the majority of satirical articles have a negative sentiment as well. This begs the question: could there be a sentiment analysis tool that could analyze sarcasm and humor? Such a tool could certainly aid in distinguishing fake news articles from satire.
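
SentiStrength itself is a standalone tool; as a rough stand-in, the sketch below scores an invented headline with NLTK’s VADER analyzer just to show what a sentiment feature looks like in practice (the headline and the choice of tool are my own assumptions, not the paper’s).

```python
# Hedged sketch: scoring a made-up headline with VADER as a SentiStrength stand-in.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

headline = "Outrage as officials hide disastrous report from the public"
scores = SentimentIntensityAnalyzer().polarity_scores(headline)
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
```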

“Fake news packs the main claim of the article into its title, which often is about a specific person and entity, allowing the reader to skip reading the article, which tends to be short, repetitive, and less informative.”

Fake news has been one of the prevalent issues that our country has faced in recent years, primarily due to having a significant impact on the 2016 US Presidential Election. If Facebook or Google users only read a headline without delving in and determining the legitimacy of an article, then fake news suddenly changes from being an annoyance to a danger. Machine learning has an obvious application here in tracking down fake news, but this raises several other questions.

  • Will users be alright with Facebook restricting what they can and cannot share with their friends?
  • What happens if there are articles with a mix of real news (to bypass fake news detectors) but also contain fake news?
  • Is this task too much for AI and does it require the presence of a human?

Other Thoughts:

Overall, the paper’s results were not surprising at all, as satire is one of the primary forms of fake news. Although with a different agenda, satire does not contain legitimate news, and has the potential to mislead readers. One of the principal questions asked in determining the credibility of an article is “Does this article seem like satire?”


Reading Reflection #2 – [01/31] – [Alon Bendelac]

Summary:

The issue of fake news has received a lot of attention lately due to the 2016 US Presidential Elections. This research paper compares real, fake, and satire sources using three datasets. The study finds that fake news is more similar to satire news than to real news, and that it persuades through heuristics rather than arguments. It also finds that article titles differ significantly between real and fake news. Three categories of features were studied: stylistic, complexity, and psychological. The study uses the one-way ANOVA test and the Wilcoxon rank sum test to determine whether the news categories (real, fake, and satire) show statistically significant differences in any of the features studied. Support Vector Machine (SVM) classification was used to demonstrate that the strong differences between real, fake, and satire news can be used to predict the class of news articles of unknown classification.

Reflection:

Punctuation: In Table 3(c), one of the stylistic features is “number of punctuation.” The study looks at all types of punctuation as a whole and only considers the total count of punctuation marks. I think it would be interesting to look at specific punctuation types separately. For example, maybe fake news articles are more likely than real news articles to have an ellipsis in the title. An ellipsis might be a common technique used by fake news organizations to attract readers. Similarly, question marks in the title might also be commonly used in fake news articles.
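
A quick sketch of the finer-grained punctuation counts suggested here (the example titles are invented):

```python
# Hedged sketch: per-type punctuation counts for a few made-up titles.
titles = [
    "You Won't Believe What Happened Next...",
    "Is the Government Hiding the Truth?",
    "Senate Passes Budget Bill After Long Debate",
]

for title in titles:
    features = {
        "question_marks": title.count("?"),
        "exclamations": title.count("!"),
        "ellipses": title.count("..."),
        "total_punctuation": sum(title.count(p) for p in ".?!,:;"),
    }
    print(title, "->", features)
```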

Neural networks: The study used a Support Vector Machine (SVM) to predict whether an article is real or fake. I wonder how a neural network, which is more abstract and flexible than an SVM, would perform. In Table 6, only two of the three categories (real, fake, and satire) are tested at a time, because an SVM is designed for two classes. A neural network could be designed to classify articles into one of the three categories. This would make more sense than an SVM, since we usually can’t eliminate one of the three categories and then test for just the other two.
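
A minimal sketch of the three-class idea, with synthetic placeholder data rather than the paper’s features, could use a small multilayer perceptron, which handles more than two classes natively:

```python
# Hedged sketch: a small neural network predicting real / fake / satire in one model.
# The feature matrix and labels below are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))          # placeholder feature matrix
y = rng.integers(0, 3, size=300)        # 0 = real, 1 = fake, 2 = satire

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
print("3-class CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```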

Processing embedded links: This study only looks at the bodies of articles as plain text, without considering possible links within the text. I think looking into where embedded links direct you could help detect fake news. For example, if an article contains a link to another article known to be fake news, then the first article is most likely also fake news. The research question could be: Can embedded links be used to predict if a news article is fake or real?
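
To make that research question concrete, here is a small sketch of my own: pull the outgoing links from an article’s HTML and flag any that point at domains already labeled as fake. The HTML snippet and the domain list are hypothetical.

```python
# Hedged sketch: extract embedded links and check them against a hypothetical
# list of domains already labeled as fake news sources.
from urllib.parse import urlparse
from bs4 import BeautifulSoup

KNOWN_FAKE_DOMAINS = {"totally-real-news.example", "daily-truth.example"}

html = '<p>As reported <a href="http://totally-real-news.example/story">here</a>...</p>'
links = [a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]

suspicious = [u for u in links if urlparse(u).netloc in KNOWN_FAKE_DOMAINS]
print("links to known fake sources:", suspicious)
```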

Number of citations and references: I believe that real news stories are more likely to contain citations, references, and quotes than fake news stories. Number of quotes was one of the stylistic features, but number of references was not studied. A reference could point to a study or another news article related to the one in question, or to a past event.


Reflection #2 – [1/31/2019] – [Phillip Ngo]

Summary

The goal of this paper was to identify specific statistical differences between fake news and real news. The paper focuses on characterizing the features of fake news, rather than, like similar papers, only classifying the differences of satirical pieces. Using a wide variety of stylistic, complexity, and psychological features, the authors were able to identify fake news 71% and 78% of the time when compared to real news. Along the same lines, they could also separate satire from real news with 91% accuracy. Lastly, they had a harder time distinguishing satire from fake news, with 55% and 67% success rates.

Reflection

Much like the last paper, I didn’t find many of the results surprising. Even so, I believe the authors did what they set out to do, which was to find features that can separate fake news from real news. Even though I wasn’t surprised, I felt the paper was well written. The authors followed the data science workflow and succinctly described everything they were doing and why. In particular, their explanation of the limitations of their datasets and how they tried to combat them was quite thorough. One thing I did notice was that they explained their reasoning for using ANOVA but left out why the Wilcoxon rank sum test would work when ANOVA wouldn’t. As we read more articles, I think this paper could be a great reference for structure and linguistics going into the semester project.

One thing I felt was missing from the paper was reflection. The authors did very well in telling us the who, what, how, and why, but don’t really offer ideas on what these conclusions might mean and what the research could lead to. I have a few ideas on different research branches that could come from this:

  • Maybe try to do the same type of classification work, but use news articles from different geographical areas or languages. This could be used to test whether or not fake news can be generalized and studied in a broader light.
  • Although a bit tangential, I think it would be interesting to specifically look at the positive sentiment differences between the three types of news pieces they defined. They looked at negative sentiment, but I wonder: if they were to add positive sentiment into the mix, would they be better able to distinguish satire from fake news?

In regard to the upcoming semester project, I liked the general methodologies and statistical analyses that the authors used in their research. I might look to do something similar to this type of research with natural language processing and lexical analysis. Using many different types of features and combining them into a coherent argument looks impressive to me, and I wonder how they were able to combine all of these factors and form a classifier from the results.




Reading Reflection #2 – [1/31/2019] – [Sourav Panth]

This Just In: Fake News Packs a Lot in the Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary:

The premise of this article was “is there any systematic stylistic and other content differences between fake and real news?” To conduct their studies, Benjamin D. Horne and Sibel Adalı used three different data sets. The first one had been featured by Buzzfeed through their analysis of real and fake news items from 2016 US Elections. The second data set was collected by them and contains news articles on US politics from real, fake, and satire news sources. The third data set contains real and satire articles from a past study. This allowed them to draw some conclusions, primarily that fake news is shorter and uses a lot of repetitive content.

Reflection:

First, I enjoyed this paper a lot more than the first article we read; it was much more structured and organized in the questions it asked, and the authors didn’t go too broad. I also thought it was really interesting how they used feature engineering to detect whether an article is fake or real.

The authors talk about how real news articles are significantly longer than fake news articles and how fake news articles use fewer technical words and quotes. This is not surprising at all to me; it’s hard to be technical and use quotes when the article is fake. I believe a lot of it also has to do with the fact that fake news is primarily used as clickbait and commonly as a source of revenue just from getting consumers to view ads on the site. While real news also earns revenue from ads on its website, one of its main goals is to inform the public. Because of this, “real news” is probably backed up with more data and facts, and may even have updates on the same page.

Another thing they talked about that was not surprising to me was that fake news articles often have longer titles than real news articles. This goes back to what I was talking about in the previous paragraph, where fake news publishers are just trying to catch the eye of the consumer and get them to click on the link. An example they give is that fake news will often use more proper nouns in the title; this fits the clickbait theory because people will click on a link if it’s related to a celebrity or public figure they have an interest in. I’m not sure if this is what they were going for, but it also seems like the paper’s own title was longer than it needed to be.

Finally, the authors discuss fake news being more closely related to satire than to real news. Again, this doesn’t surprise me at all, because satire is essentially fake news; however, satire sites don’t advertise their articles as being real, and their consumers know the content is fake and just for entertainment. One thing that really surprised me was that they were able to distinguish whether an article was satire or fake news only just over 50% of the time.

Future Work:

I think the first thing I would work on is figuring out whether the top four features they use (number of nouns, lexical redundancy (TTR), word count, and number of quotes) are really the best features, and whether features could be added to increase accuracy.
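
A rough sketch of that kind of check, with synthetic placeholder columns standing in for the real features since the paper’s data is not reproduced here, could compare cross-validated accuracy with and without candidate additions:

```python
# Hedged sketch: does adding candidate features on top of the paper's four
# improve cross-validated accuracy? All data below is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 300
base = rng.normal(size=(n, 4))      # stand-ins for nouns, TTR, word count, quotes
extra = rng.normal(size=(n, 3))     # stand-ins for candidate new features
y = rng.integers(0, 2, size=n)

for name, X in [("4 baseline features", base),
                ("4 baseline + 3 candidates", np.hstack([base, extra]))]:
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```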

Another thing that could be very interesting is seeing when fake news publishing peaks and whether that correlates with important events like the US election. This could help show whether fake news is a recent trend or has always been around but just not publicly known.


[Reflection #2] – [01/31] -[Liz Dao]

Benjamin D. Horne and Sibel Adali. “This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News.”

Summary:

The main goal of the research is to build a model to differentiate fake news from real news. The authors also analyze satire news, which is considered a type of fake news in this paper. The data comes from three data sets: the Buzzfeed 2016 election data set; the Burfoot and Baldwin data set; and a data set containing fake, real, and satire news created by the authors. The articles are analyzed and compared with each other on three main feature categories: stylistic, complexity, and psychological. A Support Vector Machine classification model is built with the four most significant features selected via the ANOVA and Wilcoxon rank sum tests.

The result of the research is in sync with previous studies of fake news, which can be summarized in three main points:

1.    Fake news articles have more lexical redundancy, more adverbs, more personal pronouns, fewer nouns, fewer analytic words, and fewer quotes. This means that fake news articles are much less informative and require a lower educational level to read than real news articles.

2.    Real news articles convince readers through solid reasoning, while fake news supports its claims through heuristics.

3.    The content and title of fake news and satire articles are extremely similar.

Reflection:

First of all, the authors definitely practiced what they preached. The title of the paper is packed with verb phrases yet contains zero stop words. Moreover, all three main conclusions are included in the title, which hopefully increases the chance of this article, or at least its main points, being read. Although it might sound like a half-hearted solution, the authors’ suggestion of transforming real news articles’ titles to resemble those of fake news articles is actually a good idea. What would happen if we created the titles of real news articles using the formula for fake news articles? Would people be more likely to read them? Would people classify them as fake news based on their titles?

    In spite of successfully building a classification model with relatively high accuracy in distinguishing fake and satire from real news articles, the research fails to deliver any new findings. Indeed, the result is little more than a reconfirmation of previous studies. Furthermore, the difference between fake and real news articles seems obvious to most people. Similar to clickbait, detecting fake news articles is relatively easy. But the bigger question is how we can make people more willing to read real news articles instead of scrolling through a list of fake news titles.

    One interesting finding is the difference in title features of fake news articles between the BuzzFeed 2016 election news dataset and the political news dataset collected by the authors. The former uses significantly more analytical words, while the latter has more verb phrases and past-tense words. That suggests there is more than one type of fake news. The difference in word choice also suggests they might be targeting different groups or trying to provoke different reactions. It would be interesting to study the cause of these feature differences among fake news articles.

    In addition, it is surprising that the SVM model produces much more accurate classification with satire articles than with fake news articles. It “achieve[s] a 91% cross-validation accuracy over a 50% baseline on separating satire from real articles.” On the other hand, the model only “achieve[s] a 71% cross-validation accuracy over a 50% baseline when separating the body texts of real and fake news articles.” Is that because of the mocking tone that distinguishes satire from real news articles? Furthermore, the model has low accuracy when separating satire from fake news articles. This might pose an issue, as we might want to treat satire articles differently than fake news articles.

    Lastly, the distinction between clickbait and fake news titles is quite intriguing. Because of their shared lack of validity, ethics, and valuable information, many people put clickbait and fake news in the same category. Yet these two types of articles serve completely different purposes. Clickbait encourages readers to visit the web page, thus its titles “have many more function words, more stop words, more hyperbolic words (extremely positive), more internet slangs, and more possessive nouns rather than proper nouns.” Fake news, meanwhile, wants to deliver its message even if the majority of the links are never clicked. Hence, its titles, loaded with claims about people and entities, are an extremely concise summary of the whole article. One way or the other, both fake news and clickbait have found strategies to attract readers’ attention and engagement. So why are real news articles falling so far behind? Is it because they are not aware of the tricks fake news and clickbait are using? Or are they too proud to give up their formal and boring titles?


[Reading Reflection 2] – [1/30] – [Henry Wang]

Summary

Horne and Adali focus on the problem of fake news and how its contents differ from real news through an analysis of three different data sets. Previous work has focused on how fake news spreads in networks; the two researchers take a new approach and investigate content-specific differences such as style and complexity. The researchers conducted an ANOVA test and a Wilcoxon rank sum test to compare statistics for different metrics such as all caps, swear words, and pronouns, and also used a support vector machine for classification of fake vs. real, satire vs. real, and satire vs. fake news articles.

Reflection

This paper was an interesting read, and incredibly relevant given today’s technology that facilitates instantaneous news alerts through mobile and other handheld devices. What was surprising to me was that the researchers offered additional insights beyond the data itself, unlike the previous article. The paper notes that its main finding, that real news and fake news target people differently (real news connects with readers through arguments while fake news connects with readers through heuristics), can be explained by the Elaboration Likelihood Model. This is an interesting idea. Something that is mentioned only briefly in the paper, but that may also contribute to the spread of fake news, is echo chambers. If all news comes from biased, unobjective sources, people most likely have a harder time discerning real news from fake news. People can be incredibly trusting toward news sources that their friends and family listen to, so this creates an issue where objectively fake news becomes real news and vice versa, and any contradictions are then deemed fake news.

Another interesting point raised in the paper is that classification of satire vs. fake news has a much lower accuracy with the support vector machine than satire vs. real news. On Reddit, there is a subreddit called “AteTheOnion” which contains screenshots of people on social media who respond to articles without realizing the articles are satire. It would be interesting to analyze the contents of the articles referenced in that subreddit to see where audiences incorrectly classified news, to better determine why exactly the misclassification between satire and real news occurred. To a careful reader, satire should be clear just from examining the headline (does this make sense logically, could this be a joke; these are relevant questions for media literacy and engaging with online media), but there are so many factors as to why people may be susceptible to interpreting satire and fake news as real news that it would be hard to predict whether a person will interpret a new fake news or satire article as real news.

Additional Questions

  • Given what we know about the typical headline and overall structure of fake news, can we automate generation of fake news articles that are believable?
  • Based on the previous paper, how does audience perception differ for real news vs fake news in terms of Twitter tweets, favorites, or other metrics?
  • Given a person’s reading history can we reliably predict how susceptible they are to incorrectly classifying an article as real, fake, or satire?
