[Reflection #2] – [01/31] – [Christopher Blair]

Perhaps the most interesting yet somewhat obvious conclusion of this paper appears on the sixth page, where the authors state simply that “Titles are a strong differentiating factor between real and fake news.” This raises the question of where the burden of engagement lies: a real news article wants to communicate without giving away the best parts of the story, while fake news articles by and large spell out in the title exactly what the article says. This is an interesting proposition because it could be the basis for further explaining the phenomenon of fake news. In particular, if the self-referential, bias-confirming appeal of fake news really lives in the titles we click, and if we suppose that the title explains most of the story, then could we delve deeper into the psyche of someone who frequently reads fake news? For example, if we ran a study on people identified as frequent consumers of fake news sites, the research question I would ask is: how long do they spend reading the article? If the title alone describes each story point, as was presented in the paper, why do they even need to read the article in the first place? I believe that, first, it is a form of social validation and constrained exploration: people will not venture out of their comfort zones into places where they disagree, and this is what I believe such a study could explain. Second, I believe it is a form of instant gratification: we know what is going to happen because we read the title, so we feel gratified when we click on the story and confirm that notion. Finally, to answer the research question, I believe that much less time would be spent on a fake news article than on a real news story because of the over-descriptive title; however, the fake news website would likely try to drive intra-site engagement by presenting the user with further fake news articles to click on afterward. I am fixated on the titles of fake news more so than on the content, because the titles themselves drive users to click, to the detriment of public discourse.

When it comes to the claim about heuristics, I am very interested in understanding where such statistics could be derived in a way that allows authors to make claims without simply synthesizing the numbers. This notion is complicated, but I remember learning about Simpson's Paradox in a relatively basic statistics class; it is a particularly interesting case where you can get data to “work for you.” If you take statistics, decouple them, and present them stripped of their context, you can make essentially ridiculous claims, like this segment from a webpage (https://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html):

Since 2000, the median US wage has risen about 1%, adjusted for inflation.

But over the same period, the median wage for:

  • high school dropouts,
  • high school graduates with no college education,
  • people with some college education, and
  • people with Bachelor’s or higher degrees

have all decreased. In other words, within every educational subgroup, the median wage is lower now than it was in 2000.
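
To make the paradox concrete, here is a tiny, entirely made-up numerical sketch of the pattern described in the quote: every education subgroup's median wage falls, yet the overall median rises, because the mix of workers shifts toward the higher-paid groups.

```python
# Hypothetical wages illustrating Simpson's Paradox: all numbers are invented.
from statistics import median

# 2000: mostly lower-education workers; 2020: mostly higher-education workers.
wages_2000 = [20] * 40 + [30] * 30 + [40] * 20 + [50] * 10
wages_2020 = [18] * 10 + [28] * 20 + [38] * 30 + [48] * 40  # every subgroup's wage is lower

print(median(wages_2000))  # 30
print(median(wages_2020))  # 38 -> the overall median still went up
```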

Considering that most fake news is motivated by a specific special interest and works to shift people's perspectives with fallacious statistical claims that would never be valid in context, I believe an extension of this paper could look at the incidence of statistical claims within fake news articles. We could build one classifier for statistical claims in real news and a separate one for claims in fake news, and determine whether there is a predominant style in either. My conjecture is that fake news will tend to cite singular statistics in isolation, so that many readers are unable to consider the context and externalities needed to interpret them. This would be an interesting project because it would give us some meta-knowledge within the field of data analytics to be on the lookout for publications that commit this fallacy, as well as keeping people informed that the statistics they read could be skewed to implant a specific idea in their heads.
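
As a starting point for that extension, here is a very rough sketch of a “statistical claim” counter. The regular expressions and the sentence-level counting are my own assumptions about what such a feature might look like, not anything taken from the paper.

```python
# Rough sketch: count sentences containing a statistic-like pattern.
# The patterns are illustrative assumptions (percentages, "X in Y", large counts).
import re

STAT_PATTERNS = [
    r"\b\d+(\.\d+)?\s*(%|percent\b)",   # "45%" or "45 percent"
    r"\b\d+\s+in\s+\d+\b",              # "1 in 5"
    r"\b\d{1,3}(,\d{3})+\b",            # "1,000,000"
]

def statistical_claim_count(text):
    """Number of sentences that contain at least one statistic-like pattern."""
    sentences = re.split(r"[.!?]+", text)
    return sum(
        any(re.search(p, s, flags=re.IGNORECASE) for p in STAT_PATTERNS)
        for s in sentences
    )

print(statistical_claim_count("Wages rose 1% since 2000. 1 in 5 workers disagree."))  # 2
```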

Reflection #2 – [1/30] – [Kibur Girum]

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News 

Summary:

The purpose of the study was to debunk the assumption that fake news is written to look like real news and to show that it is more closely related to satire. This work can help in developing technologies that detect malicious fake news. The study was conducted on three data sets, using features that capture the style and language of articles. Based on multiple findings, the study drew the following conclusions:

  • Fake news is more similar to satire news than real news 
  • Persuasion in fake news is achieved through heuristics rather than the strength of arguments 
  • Fake news is targeted at audiences who are not likely to read beyond titles 

Reflection:

Social media, especially Twitter, has changed the way we acquire information in our society. This also brings many challenges, one of which is the spread of fake news. We need look no further than the 2016 election to see the threat it poses to our society. I believe that this study provides a step forward in tackling this problem. Even though I am really impressed by their findings and conclusions, the study lacked concrete arguments and a broader data set to back them up. Moreover, many assumptions were made, which affects the credibility of the study. Nevertheless, their research gives great insight for future studies. Considering their findings and summarization, we can reflect on different aspects and their implications for how we perceive fake news. 

Part 1: The finding that amazes me the most is that fake news is more like satire than real news. This makes me question the intentions and motives of fake news writers. Is the purpose of some fake news articles to demonize a specific person rather than simply to spread false information? Conducting more intensive research on the writers' social media accounts and organizational affiliations might provide great awareness.

Part 2: According to their study, “Fake news titles use significantly fewer stop-words and nouns, while using significantly more proper nouns and verb phrases.” I think this is an eye-opening discovery that can open doors for further studies.  

Questions and further research 

  1. One question we can ask is whether fake news consumers are interested only in the title rather than the content of the article. I believe that conducting research on fake news consumers might give insight into why fake news articles take the format described above and why it is effective. 
  2. Can we determine the standards and political affiliation of a news source based on the titles of its published articles? This would help us better distinguish fake news from real news.  

Part 3: Another question that occurred to me after reading the paper is whether fake news articles differ in terms of their approach. Do they change their approach from time to time or stay consistent? I believe that doing more research on multiple fake news sources will help us combat fake news. 

Reading Reflection 2 – 1/30 – Yoseph Minasie

Summary

Fake news gained a lot of attention from the 2016 presidential election, and emphasis on combating it has increased since then. People have become more critical of the authenticity of publicly shared news articles that grab a large amount of attention. However, it seems as if fake news is still everywhere. There are currently many websites that check whether news articles are real or fake, but the accuracy could be better. This study was intended to help fact-checkers detect fake news early on. The main problem the study addressed was whether there are any systematic stylistic and other content differences between fake and real news. The main conclusions were: 

  • The content of fake and real news articles is substantially different. 
  • Titles are a strong differentiating factor between fake and real news. 
  • Fake content is more closely related to satire than to real news. 
  • Real news persuades through arguments, while fake news persuades through heuristics. 

Reflection 

Their logic of real news coming only from real news sources and fake news coming only from fake news sources is flawed. There could have been some articles for which the opposite was true. This could have compromised their results if there were a good number of such cases.

The sharing of information conforms to one's beliefs:

This concept makes sense and I've noticed it happening around me. I remember one person I know, whom I follow on social media, shared an article that aligned with her views but not with mine. It was about an attack on some politician. When I opened the article, the arguments did not appear to be valid and there wasn't much hard evidence. I don't think she read it carefully. In psychology there's a term called confirmation bias: people are more likely to seek out information that parallels their beliefs, and they are also more likely to remember that information. This ties into the notion of an echo chamber and how fake news persuades through heuristics. Another interesting study would be how much of a news article, real or fake, people read before they share it. This could be done by tracking how long people stay on the link. This would only be an estimate and doesn't account for different reading speeds, idle users, multitasking while reading, and other factors. 

Titles are a strong differentiating factor between fake and real news:

The example given in the paper is an obvious example of fake vs. real news titles. Since not every fake news article will have the same clear signs, another example should have been given with fewer indicators in order to highlight the subtle differences between the two.

Fake content is more closely related to satire than to real:

The purpose of satire is to make a point using humor, irony, and exaggeration, so it makes sense that satire would be more closely related to fake content than to real news. Satirical news sources are very popular and do highlight key points. I follow some of them on social media and sometimes wonder whether it's apparent to everyone that these articles are satire. I say this because I've seen several comments from people who take the news literally and do not differentiate fact from satire. They then come forward with negative criticism. Some people find that funny and reply with a sarcastic remark to get a reaction from that person.

Since this paper and future work on this topic will be public, one obstacle could be the use of this research by people with malicious intent. If they create fake news and change their style a little based on this research, it could be harder for fact-checkers to detect fake articles, lowering their accuracy. More in-depth methods would need to be applied in order to combat this issue. 

Reading Reflection 2 – 01/30 – Alec Helyar

Summary

This article analyzes the language style of fake news articles and compares them to those of real and satirical pieces.  To do this, the researchers assembled three datasets: a Buzzfeed collection of high-impact articles, 75 hand-picked articles in each category, and a Burfoot and Baldwin collection of “non-newsy” satirical articles.  From there, the researchers extract natural language features from the articles which measure the style, complexity, and psychology of the language choices used.  Using these features, the researchers use ANOVA and Wilcoxon tests to detect differences between the three groups of articles.  These tests deduced that fake news articles were noticeably distinguishable from real news, and were in many ways closer to satire.  Finally, the researchers built a linear SVM model to classify articles, and found that their features could predict fake and satire news at an accuracy of 71% and 91%, respectively.
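
As a way to picture the pipeline described above, here is a minimal sketch, not the authors' exact feature set or data, of extracting a few crude stylistic features and fitting a linear SVM. The two articles, the labels, and the feature choices are placeholders of my own.

```python
# Minimal sketch of a style-feature + linear-SVM pipeline; placeholder data.
import numpy as np
from sklearn.svm import LinearSVC

def style_features(title, body):
    """Crude stand-ins for the paper's style/complexity features."""
    words = body.split()
    title_words = title.split()
    return [
        len(words),                              # body word count
        len(set(words)) / max(len(words), 1),    # type-token ratio (lexical redundancy)
        body.count('"'),                         # number of quote marks
        len(title_words),                        # title length in words
    ]

# Placeholder corpus; in practice these would be the labeled article datasets.
articles = [
    ("Senator unveils budget plan", 'The senator said "the plan is balanced" in a lengthy statement.'),
    ("YOU WON'T BELIEVE WHAT THE SENATOR JUST DID", "Shocking. Just shocking. Everyone is talking."),
]
labels = [0, 1]  # 0 = real, 1 = fake (placeholder labels)

X = np.array([style_features(t, b) for t, b in articles])
clf = LinearSVC(max_iter=10000).fit(X, labels)
print(clf.predict(X))  # sanity check on the training examples themselves
```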

Reflection

First, I found the data collection methodology much more prudent than that of the last article.  All three datasets are valid, but flawed.  So, it was commendable that the researchers chose to measure all of them separately, to account for their shortcomings.  This level of depth is impressive, since it would have been sufficient to simply pick their second, hand-picked dataset.  It captures a strong list of well-known sources for each category, and the results would remain statistically significant so long as they pick the articles at random.  The fact that they felt that they needed to use all three datasets shows the complexity involved in distinguishing and sampling fake news.

Second, I found the list of the top 4 features they extracted to be very interesting.  They were: number of nouns, lexical redundancy (TTR), word count, and number of quotes.  Out of hundreds of features which measure the style, complexity, and psychological impact of the article, the most important factors are essentially how many quotes, nouns, and unique words they used.  This is an interesting finding, and I'm left wondering: why didn't the researchers include a Principal Component Analysis or feature-importance analysis to tell us how much of the variance or accuracy could be explained by these four features?  It would be an insight by itself to learn that, say, 40% of the variance could be explained by word count alone.
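
A hedged sketch of what that missing analysis might look like: run PCA on a standardized feature matrix and report the cumulative variance explained by the leading components. The matrix below is random placeholder data standing in for the paper's features.

```python
# Sketch of an explained-variance check via PCA; X is placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))  # 300 articles x 50 stylistic features (placeholder)

pca = PCA().fit(StandardScaler().fit_transform(X))
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("variance explained by the first 4 components:", round(cumulative[3], 3))
```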

Finally, the concluding remarks in the article have opened up an interesting question.  The researchers describe how difficult it is to obtain non-noisy ground truth for fake and real news, especially when real news becomes increasingly opinion-based.  Could the researchers repeat this experiment with real vs. opinion-based/biased/editorial pieces?  And if the amount of bias present is a scale, could they instead model a regression to determine the level of bias present in modern news sources?

[Reflection #2] – [01/30] -[Jonathan Alexander]

Overview

This paper describes a study conducted to characterize antisocial behavior in large online discussion communities. The study aims to answer three questions: are there users who only become antisocial later in their community life, does a community's reaction to antisocial behavior improve that behavior, and can antisocial users be identified before they cause any issues in the community? The study used CNN.com, Breitbart.com, and IGN.com to conduct its analysis of online discussion communities. The authors found that users who were eventually banned from these communities concentrated their efforts on a smaller number of threads, wrote progressively worse than other users over time, and had their behavior exacerbated by other users over time. Using these conclusions, the authors created a classifier to try to identify negative users before their antisocial behavior got them banned by the websites' moderators.

Reflection

  • The websites chosen by the authors for use in the study are known to house biased and controversial discussions. The banning of users or antisocial behavior on these websites may be inherent to the nature of the topics being discussed. This is especially true of CNN.com and Breitbart.com. Given the polarizing nature of politics, these websites are not so much discussion communities, and they may be more susceptible to users engaging in undesired behaviors than other discussion communities that are actually communities. Websites such as Breitbart.com have content that is inflammatory to some and may lead to outbursts that this study would call antisocial.
  • The paper also states that users who violate the community norms are eventually permanently banned from the discussion community. The authors use this as their “ground truth” for analyzing antisocial behavior online. However, communities such as Breitbart.com and CNN.com likely contain many users with similar views and biases. A user who violates these norms and is banned may not necessarily be displaying antisocial behavior. It is also possible that the banned user disagreed with the views of the community or did not hold the same beliefs. These differences between users and the specific community they engaged with could have led to them violating that community's norms and being banned. That does not mean their behavior was antisocial; rather, they could have just been following a different set of norms than that found in the subset of society on Breitbart.com or CNN.com. I think an interesting question would be: what portion of users banned from websites like Breitbart.com and CNN.com held viewpoints opposing the majority of users on those platforms?
  • The paper employs human workers to label the appropriateness of posts. Doing this introduces the bias inherent in human judgment into the ratings. The three websites chosen represent three widely different communities, each of which differs in some way from mainstream culture. Asking workers who have no experience with the community they are rating, or with what is appropriate for that setting, to rate these posts has the potential to produce inaccurate or biased ratings, as each of the workers has to rely on their own opinions and beliefs to decide what is appropriate.
  • Later in the paper, the authors use the fact that those who are banned from websites tend to write posts that are less similar to the others on the forum. I do not think this is a clear indication that the posts included antisocial behavior. If the users are very different from the community they are communicating with, that may be the reason they are banned. It could be a difference of opinion or personality, not necessarily antisocial behavior.
  • The authors also use the rate of post deletion to predict banning and characterize the user as antisocial. However, a high rate of post deletion could predict banning simply because it shows that the moderator, who eventually bans the user, does not like having that user's content on their forum. Using this as a predictor could be gauging the moderator's opinion of the user more than any antisocial behavior the user is displaying. Furthermore, user banning as a metric of antisocial behavior seems like it could miss many cases. Being banned from an online community, especially one that acts as an echo chamber for a similar set of opinions, does not necessarily mean that the behavior was antisocial. How can we measure antisocial behavior without including differences between users and the community?

I think this paper addresses an important issue as online discussion and discourse becomes more prevalent. I found some issues with how the study was conducted, as it seems the authors are measuring how different some users are from the overall community, and how the moderator feels about them, more so than their antisocial behavior. By using the deletion of posts and difference from other posts as the main measures of antisocial behavior, they are really measuring the moderator's view of the user and how much the user sticks out from the overall community. The article states that it becomes harder for their classifier to predict whether a user will be banned the further back in time it is from when they are actually banned. This helps illustrate that they are measuring the moderator's opinion of the user rather than the underlying antisocial behavior, as it makes sense that a moderator will delete more of the user's posts (their main feature in classification) soon before banning the user. For future research, analysis of the content of posts by users who display antisocial behavior over the life of their account would be interesting, to see how this behavior begins and how community features interact with it.

Reading Reflection #2 – 01/31 – Jacob Benjamin

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Horne and Adalı (2017) sought to explore the differences between real and fake news.  Unlike prior studies, satirical news was considered as an additional category. The primary question approached in the study was: “Is there any systematic stylistic and other content differences between fake and real news?”  To answer this question, Horne and Adalı (2017) gathered three different data sets:

  1. Buzzfeed’s analysis of real and fake news items from the 2016 US Elections.
  2. Self-collected news articles on US politics from real, fake, and satire news sources.
  3. A previous study’s data set containing real and satire articles.

Contrary to the assumption that fake news is written to look like real news and fool the reader, Horne and Adalı (2017) found that this assumption does not hold.  Rather, fake news articles rely on heuristics such as title structure and noun choice.  They also concluded that fake news specifically targets readers who do not go beyond the title. 

While I found many of the findings fascinating, I was once again unsurprised by most of them.  The conclusion that fake news is most easily discernible, and most effective, via its title is something I have observed for years through shared posts and their associated comment sections on Facebook and Twitter.  Beyond this initial observation, the study raised a number of concerns and questions:

  • Foremost of these concerns is their strategy for dividing sources into real, fake, and satirical categories.  While these categories will work in most cases, reputable (real) sources increasingly often have vast differences in the news they report.  Depending on the event, both sources cannot be correct, and perhaps neither source is correct.  Thus, bias also plays a large part in the real-versus-fake news cycle.  It would be erroneous to assume that all real sources post only real news and that fake news sources post only fake news. 
  • Additionally, many, if not all, of the news articles concerned US politics.  This raises the question as to whether or not these findings can be generalized to other issues. 
  • While Horne and Adalı (2017) raised some of the issues with reversing or combating fake news, they failed to offer suggestions as to how to utilize their data.  I feel that as researchers and information scientists, it is also our duty to take the next step beyond the study, even if that next step is just providing possible uses for the data or suggesting approaches derived from the findings.  We are responsible for the information we find. 

Horne, B. D., & Adalı, S. (2017, March 28). This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News. Retrieved January 30, 2019, from https://arxiv.org/abs/1703.09398

Reflection #2 – [01/30] – [Heather Robinson]

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News


Summary

As noted in the title, this paper claims that fake news is more similar to explicit satire than to real news. Although the study utilized many data features that seemed to be useful and topical, no raw data was ever shown and the authors provided little reflection on the results.


Reflection

First, I would like to discuss the insufficiency of reflection shown for each finding. The features seemed very in-depth, but were later somewhat disregarded in the overall picture of analysis. While at a glance it seems like there’s plenty of content and discussion after each major finding is presented, very little of the discussion provokes thought or brings up new ideas.

For example, at one point the author writes,

The significant features support this finding: fake news places a high amount of substance and claims into their titles and places much less logic, technicality, and sound arguments in the body text of the article.

This is hardly a “finding”; I'm sure this statement is an early realization for anyone who has seen a few fake news articles. That said, despite the lack of reflection, this paper does a better job than the one we read last week.

Secondly, I would like to note the lack of expansive data sets. I am unsure why the authors used a data set by Buzzfeed — a source that is not exactly professional — as a major basis for their academic findings. I believe that if they truly needed more scholarly information to analyze, they should have expanded their own set of articles (which only had 75 pieces from each type of news source).

Lastly, I would like to say that though most of the findings are not surprising, it is good to see that our notions of fake news are actually supported by concrete, statistical evidence.


Further Questions

  1. What would the distribution across source types look like if we had a feature that denoted the frequency of misspellings throughout the article? I predict that it would be “Fake > Satire = Real.” (A rough sketch of such a feature follows this list.)
  2. How would the distribution of these same features look:
    1. across articles that are not political in nature?
    2. across the Real News sources that were mentioned?
    3. between more polarized news sources such as CNN vs. Fox News?
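
A rough sketch of the misspelling-frequency feature from question 1. It assumes the third-party pyspellchecker package (any dictionary lookup would do), so the exact rates it produces should be treated as illustrative only.

```python
# Sketch: fraction of alphabetic tokens a dictionary does not recognize.
# Assumes `pip install pyspellchecker`; the tokenization is deliberately crude.
import re
from spellchecker import SpellChecker

def misspelling_rate(text):
    spell = SpellChecker()
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(spell.unknown(tokens)) / len(tokens)

print(misspelling_rate("Teh president anounced a new policy today."))  # ~0.29
```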

Reading Reflection 2 – [1/30/2019] – [Tucker, Crull]

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in

Text Body, More Similar to Satire than Real News

Summary:

In this paper, Benjamin D. Horne and Sibel Adalı set out to answer the question “Is there any systematic stylistic and other content differences between fake and real news?” To answer the question, the authors use three separate data sets. The first is a data set featured by Buzzfeed through its analysis of real and fake news items from the 2016 US election. Next is a data set that the authors collected themselves, focused on US politics and drawn from real, fake, and satire news sources. The last data set is one that the authors got from a past study containing real and satire articles. The authors showed that “real news articles are significantly longer than fake news articles,” that fake news uses fewer technical words, and that fake news titles are longer than real news titles.

Reflection:

  • “The content of fake and real news articles is substantially different.”
  • In this paper, the authors claim that real news articles are longer and use more technical words, more nouns, more analytic words, and more quotes. None of this is surprising to me, because when you are coming up with a lie it's easier to be less precise than when you are telling the truth. Hence, real news should find it easier to include quotes and facts that back up its argument.
  • “Titles are a strong differentiating factor between fake and real news.”
  • I find this completely unsurprising, because fake news sites will use more clickbait-style titles than real news sites. Also, I find the finding that fake news uses more proper nouns unsurprising, because I feel like fake news titles will include any name to try and hook people.
  • “Fake content is more closely related to satire than to real”
  • This is also really not surprising, because satire is meant to entertain and make fun of things. So it's really not hard to see that satire sites would use “fewer technical, and fewer analytic words, as well as, fewer quotes, fewer punctuation, more adverbs, and fewer nouns than real articles.”  I did think it was interesting that they were still able to predict that an article was satire and not fake news 67% of the time.

Additional Questions:

  • I think it would be instructive to see whether it's easier to tell if an article is fake or real depending on which political party the article is targeting.

Reflection #2 – [1/30] – Kyle Czech

Article:

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

By: Benjamin D. Horne and Sibel Adalı

Brief Summary:

The main problem addressed in this paper is whether there are any systematic stylistic and other content differences between fake and real news.

Reflection:

There are a few quotes from this article that spawned some questions and thought:

  • Quote 1: “… we find that real news articles are significantly longer than fake news articles and that fake news articles use fewer technical words, smaller words, fewer punctuation, fewer quotes, and …”
  • Reflection 1: I found these findings from the paper fairly obvious, without the need to go through a mountain of data using statistical analysis. Fake articles are always going to have fewer quotes, due to the fact that the stories aren't real to begin with and therefore lack most valid documentation. The same goes for the finding that real news articles are significantly longer than fake news articles, as real news articles should typically have more data from an actual event, more opinions from multiple parties, and, more importantly, a network of other news stations reporting the same thing, creating a much larger pool of data.
  • Quote 2: “Precisely, we find that fake news titles are longer than real news titles...”
  • Reflection 2: I find these findings surprising, given that I would assume that most fake news articles would try to get viewers to click, almost like ads, by having a shorter, catchy title to draw the user in. However, it appears that they do the opposite. They try to provide the user with as much information up front as possible, almost as if trying to prevent the user from looking further into the story and noticing the lack of accurate sources, eyewitness accounts, quotes, etc.
  • Quote 3: “Fake content is more closely related to satire than to real”
  • Reflection 3: I found this finding to be very straightforward. Given that satire is fake news, except with different intentions, I would expect fake news and satire articles to share largely similar structures.
  • Quote 4: “We collected this data by first gathering known real, fake, and satire news sources, which can be found in Table 1.” The fake news sources were collected from Zimdars' list of fake and misleading news websites (Zimdars 2016) and have had at least one story show up as false on a fact-checking website like snopes.com in the past. The real sources come from Business Insider's “Most-Trusted” list (Engel 2014) and are well-established news media companies.
  • Reflection 4: There were a few things in this quote that I noticed. First, it was interesting to see that FOX and CNN were left off of the table that listed “Real,” “Fake,” and “Satire” sources. I wonder if this is because of the latest presidential election and the fire that these two news outlets were under, and whether they were removed to prevent any controversy. Second, there is no mention of how Zimdars determined whether a news website was fake or real, or how Business Insider determined its “Most-Trusted” list. I find this questionable, because “trusted” websites can differ from source to source, and you're putting all your eggs in one basket, research-wise, if you only draw from one source.

I find that the problem with fake news today isn't necessarily the smaller fake websites, which this paper seems to focus on, but the major and well-established news media companies, which this paper seems to immediately mark as innocent because of the reputation they have to maintain. The issue in this past presidential campaign was that major news outlets like FOX and CNN had conflicting reports of what actually happened during the presidential election. Well-established news media companies are just as capable of spreading fake news. In today's society, some media companies are politically motivated, and if they are twisting stories to conform to a political agenda and influence people's decisions on how they should view certain topics, isn't that fake news?

Future Work:

Some future work that could build off this article is examining the rates of fake news posts before the 2016 presidential election, when fake news was first highly publicized, and whether there is any trend between the US election cycle and the rise of fake news posting rates on social media. Further work could also analyze where the vast majority of fake news is read, whether on a specific social media site or mobile platform.

Reading Reflection 2 – [1/30/2019] – [Taber, Fisher]

This Just In: Fake News Packs a Lot in the Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

Summary:

The main question that this paper wanted to address was: is there any systematic stylistic and other content differences between fake and real news? The authors used three different datasets during this study. The first set used an analysis of real and fake news from the 2016 elections. The second contained articles about US politics from real, fake, and satire sources. Finally, the third set contained real and satire news articles. The authors show that “fake news articles tend to be shorter in terms of content, but use repetitive language and fewer punctuation.” The authors show that they can use the features they found in the paper to predict whether an article is real, fake, or satire news with an SVM.

Reflection:

I thought this paper was better structured and focused than the last paper we read. The authors had a much clearer direction regarding the question they wanted to answer. It was also stated early in the paper how and why this work differed from the studies that came before it: “The inclusion of satire as a third category of news is a unique contribution of our work.” In the results section, the authors also expanded their discussion of what the statistical findings could mean and why they turned out the way they did.

While, again, I do not find the results too shocking, it was cool to see that a model can be built using the features the authors found to predict whether an article is real or fake. (I also really enjoyed how the authors structured the title.)

Questions Brought Up:

Why just SVM?

One of the main questions I wondered about throughout the paper was why the authors didn't try multiple types of machine learning algorithms to see which one would best predict fake news articles. Maybe they were leaving this for future papers, but it would have been neat to see how different types of classifiers would perform on this kind of prediction.
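
For what it's worth, a comparison like the one I'm describing is cheap to sketch: fit a handful of standard classifiers on the same feature matrix and compare cross-validated accuracy. The feature matrix and labels below are random placeholders, so the numbers mean nothing; only the shape of the experiment is the point.

```python
# Sketch: compare several off-the-shelf classifiers on one feature matrix.
# X and y are random placeholders standing in for article features and labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))
y = rng.integers(0, 2, size=150)  # 0 = real, 1 = fake (placeholder labels)

models = {
    "linear SVM": LinearSVC(max_iter=10000),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```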

Would having an evenly distributed data set cause problems with the prediction models that the authors use?

I noticed that two out of three of their data sets were evenly distributed between real and fake/satire news sources. I wonder if having the same percentage of sources would mess up the prediction algorithm. Assuming that the number of real news stories is not one to one with fake news stories, would the algorithm think that fake news stories are more or less prevalent than they actually are? I know that they could randomize the way they fed in the fake and real stories, but I wonder if it is better to have data sets that model the real world more closely.
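
One way to probe this, sketched below, is to train on a balanced set but evaluate on a test set re-sampled to a more realistic real-to-fake ratio, and watch what happens to precision and recall on the fake class. The 90/10 split and all of the data here are assumptions and placeholders, not measured prevalences.

```python
# Sketch: balanced training data, imbalanced evaluation data (placeholders).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))          # balanced training features
y_train = np.array([0, 1] * 100)              # 0 = real, 1 = fake

X_test = rng.normal(size=(200, 10))           # test features
y_test = (rng.random(200) < 0.1).astype(int)  # assumed ~10% fake "in the wild"

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)
pred = clf.predict(X_test)
print("precision on fake:", precision_score(y_test, pred, zero_division=0))
print("recall on fake:", recall_score(y_test, pred, zero_division=0))
```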

Personally, for the training on walking prediction that I have done in the past, we had to vary the way the data was pre-processed because the algorithm would learn the specific walking course before it would learn to predict a person's step. We needed a more random walking course, and to just take the delta of the steps, to get a model that would better suit the real world.

Future Questions/topics:

Could a browser extension house the prediction model that they have developed and alert users when an article is potentially fake?
