[Reading Reflection 3] – [2/4] – [Henry Wang]

Article 1: “Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos”

Summary

In this paper, the researchers analyze a dataset of Zika-virus videos on YouTube. The videos analyzed were relatively popular (40,000 or more views) and were broadly classified into two distinct groups: informational and conspiracy theory. The main research questions focus on the differences between the two groups of videos as well as the differences in how viewers reacted to them. The investigators used several analysis methods; in particular, topic modeling and semantic network analysis were used for the comment/reply analysis.
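
To make the comment analysis more concrete, below is a minimal sketch of topic modeling on a handful of made-up comments, using LDA from scikit-learn; the specific algorithm, library, and placeholder comments are my assumptions, not necessarily what the authors used.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Placeholder comments standing in for scraped YouTube comments/replies
    comments = [
        "zika is spread by mosquitoes in brazil",
        "the virus is a government cover up",
        "vaccines are being developed for zika",
    ]

    # Bag-of-words representation of the comments
    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(comments)

    # Fit a small LDA model; the number of topics is a modeling choice
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    # Show the top words for each discovered topic
    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top_words = [terms[j] for j in topic.argsort()[-5:][::-1]]
        print(f"Topic {i}: {', '.join(top_words)}")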

Reflection

This article was an interesting change of pace, focusing now on conspiracy theories and how they relate to the interpretation of real events. This topic has always been interesting to me, and I definitely feel it could be a potential research topic. Conspiracy theories involve far-fetched ideas, for example that the world is flat or that Australia does not exist. Anyone can say those words, but how can we analyze the behaviors and sentiments of those who buy into such theories? This is clearly a very tough question to answer, and it is disappointing to see that the investigators of this paper did not find significant differences.

One issue I found with the paper is that the researchers never explain whether trolls may have affected the comment analysis. The researchers cleaned the comments with typical preprocessing steps such as removing punctuation and lowercasing words, but they do not account for troll interactions with the videos. YouTube's comment section is, for the most part, unmoderated. How can the researchers be sure the comments they analyzed were authentic?
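
For reference, the kind of cleaning described above might look like the following sketch; the exact preprocessing steps in the paper may differ, and the example comment is made up.

    import re
    import string

    def clean_comment(text: str) -> str:
        # Lowercase, strip punctuation, and collapse repeated whitespace
        text = text.lower()
        text = text.translate(str.maketrans("", "", string.punctuation))
        return re.sub(r"\s+", " ", text).strip()

    print(clean_comment("ZIKA is a HOAX!!!  Wake up, people..."))
    # -> "zika is a hoax wake up people"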

Additional Questions

  • What differences in user reactions would we see if we analyzed posts on other platforms that referenced these Zika videos (for example, a Facebook or Reddit post linking to a video)?
  • YouTube’s recommender system is personalized, so people who engage with specific content, such as conspiracy-based videos, see related videos recommended to them. How can we stop the spread of misinformation on a platform like YouTube?


Article 2: “Automated Hate Speech Detection and the Problem of Offensive Language”

Summary

This article addresses automated hate-speech detection using a classifier that labels tweets as hate speech, offensive but not hate speech, or neither. Previous studies have combined the first two categories into one broad category, and although there is no official definition of hate speech, the researchers attempt to build a classifier that can accurately distinguish among the three categories. The investigators tried several different models and ultimately settled on a logistic regression model for the dataset.
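
As a rough illustration of the classification task, here is a hedged sketch of a three-class tweet classifier built from TF-IDF features and logistic regression in scikit-learn; the placeholder tweets and the exact feature set are my assumptions, and the paper's actual pipeline is more involved.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny placeholder data; labels: 0 = hate speech, 1 = offensive, 2 = neither
    tweets = [
        "example hateful tweet text",
        "example offensive tweet text",
        "example neutral tweet text",
    ]
    labels = [0, 1, 2]

    # TF-IDF unigram/bigram features feeding a multiclass logistic regression
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(tweets, labels)
    print(clf.predict(["another example tweet"]))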

Reflection

The discussion of the model used was relatively brief. For a research paper, I would have liked to know more about why the investigators chose to first test “logistic regression, naïve Bayes, decision trees, random forests, and linear SVMs,” because it is not immediately clear to me what these models have in common or why each is a suitable choice for the data.
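
For context, the usual reason to test several standard classifiers is to compare them on the same features with cross-validation and keep the best performer. The sketch below shows that kind of comparison on synthetic data; the data and scoring are placeholders, not the paper's setup.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import LinearSVC

    # Synthetic three-class data standing in for the paper's tweet features
    X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                               n_classes=3, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": GaussianNB(),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(random_state=0),
        "linear SVM": LinearSVC(),
    }

    # 5-fold cross-validated accuracy for each candidate model
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")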

What was most interesting to me was that the researchers’ model incorrectly classified rarer types of hate speech. This matters because it suggests the model is only useful for flagging a tweet as hateful when it belongs to one of the more prevalent types of hate speech on Twitter. Future work should focus on identifying the causes of this misclassification and discuss the problem in more depth rather than simply referencing another researcher’s paper that had the same issue.

Additional Questions/Research

  • Future work could combine attributes of a Twitter user, such as account age and location, with this analysis to see whether they lead to better or worse classification.
  • Twitter is a social network platform, so people might naturally monitor their language. How can we apply these same techniques to gaming platforms with online chatrooms and similar environments to automate hate-speech detection, and are the systems already in place (e.g., auto-banning) sufficient?
  • Can we use a similar approach to hate-speech detection for verbalized content (e.g., voice chat in video games)?




[Reading Reflection 2] – [1/30] – [Henry Wang]

Summary

Horne and Adali focus on the problem of fake news and how its content differs from real news, with an analysis of three different data sets. Previous work has focused on how fake news spreads through networks; the two researchers take a new approach and investigate content-specific differences such as style and complexity. The researchers conducted an ANOVA test and a Wilcoxon rank-sum test to compare metrics such as the number of all-caps words, swear words, and pronouns, and also used a support vector machine to classify news articles as fake vs. real, satire vs. real, and satire vs. fake.
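
To illustrate the statistical comparisons, here is a small sketch of an ANOVA and a Wilcoxon rank-sum test on a single hypothetical feature (per-article all-caps counts) using SciPy; the numbers are made up and are not from the paper.

    from scipy import stats

    # Hypothetical per-article counts of all-caps words
    all_caps_fake = [5, 8, 7, 12, 9, 6, 11]
    all_caps_real = [2, 1, 3, 2, 4, 1, 2]

    f_stat, p_anova = stats.f_oneway(all_caps_fake, all_caps_real)
    w_stat, p_ranksum = stats.ranksums(all_caps_fake, all_caps_real)

    print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.4f}")
    print(f"Wilcoxon rank-sum: W={w_stat:.2f}, p={p_ranksum:.4f}")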

Reflection

This paper was an interesting read, and incredibly relevant given today’s technology that delivers instantaneous news alerts through mobile and other handheld devices. What surprised me was that the researchers offered additional insights beyond the data itself, unlike the previous article. The paper notes that its main finding, that real news and fake news target people differently (real news connects with readers through arguments while fake news connects through heuristics), can be explained by the Elaboration Likelihood Model. Another factor, mentioned only briefly in the paper, that may contribute to the spread of fake news is echo chambers. If all of a person’s news comes from biased, non-objective sources, they most likely have a harder time discerning real news from fake news. People can be incredibly trusting of news sources that their friends and family follow, which creates a situation where objectively fake news becomes “real” news and vice versa, and any contradicting reports are then dismissed as fake news.

Another interesting point raised in the paper is that classifying satire vs. fake news yields much lower accuracy with the support vector machine than classifying satire vs. real news. On Reddit there is a subreddit called “AteTheOnion”, which collects screenshots of people on social media responding to articles without realizing the articles are satire. It would be interesting to analyze the contents of the articles referenced there to see where audiences incorrectly classified news, and to better determine why the misclassification between satire and real news occurs. To a careful reader, satire should be clear just from examining the headline (does this make sense logically, could this be a joke; these are relevant questions for media literacy and engaging with online media). But there are so many factors behind why people may be susceptible to interpreting satire and fake news as real news that it would be hard to predict whether a given person will misread a new article as real, fake, or satire.

Additional Questions

  • Given what we know about the typical headline and overall structure of fake news, can we automate generation of fake news articles that are believable?
  • Based on the previous paper, how does audience perception differ for real news vs. fake news in terms of tweets, favorites, or other Twitter metrics?
  • Given a person’s reading history can we reliably predict how susceptible they are to incorrectly classifying an article as real, fake, or satire?



[Reading Reflection 1] – [1/28] – [Henry Wang]

Summary

Bagdouri investigates the intersection of journalism and Twitter with a multi-faceted analysis that spans a set of “5000 news producers and 1 million news consumers”. The main research questions Bagdouri seeks to answer are:

  • Is Twitter usage the same or different for news producers compared to news consumers?
  • Can previous findings be applied to a wholly different group of journalists?

The motivation for this investigation is that previous studies focused on data sets so small and specific that it was difficult to apply their findings to a larger and more varied population. With a larger data set, Bagdouri analyzes different features of over 13 million tweets and over 5,000 Twitter accounts of journalists and news organizations using several statistical analyses, in particular Welch’s test and the Kolmogorov-Smirnov test.
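
For reference, the two tests named above can be run as follows; the per-account numbers here are hypothetical, and “tweets per day” is simply an example feature I chose, not necessarily one of Bagdouri’s.

    from scipy import stats

    # Hypothetical tweets-per-day values for two groups of accounts
    journalists = [12.1, 8.4, 15.0, 9.7, 11.3, 14.2]
    consumers = [2.3, 1.1, 4.0, 0.8, 3.5, 2.9]

    # Welch's t-test (unequal variances) and the two-sample K-S test
    t_stat, p_welch = stats.ttest_ind(journalists, consumers, equal_var=False)
    ks_stat, p_ks = stats.ks_2samp(journalists, consumers)

    print(f"Welch's t-test: t={t_stat:.2f}, p={p_welch:.4f}")
    print(f"Kolmogorov-Smirnov: D={ks_stat:.2f}, p={p_ks:.4f}")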

Reflection

Having rarely used Twitter as a social media platform before reading this article, I found that Bagdouri’s research gave me insight into some of the more practical uses of the platform: mainly broadcasting and disseminating information.

The article begins by selecting criteria for comparing the groups of Twitter accounts, and Bagdouri comes up with eighteen different criteria. How were those chosen, and are they good metrics? In my opinion, some of these metrics are not as easily quantifiable as Bagdouri suggests. For example, one feature analyzed was audience reaction, treated as a binary classification: did another user retweet the tweet, or did a user favorite it? In a study that aims to describe how journalists and consumers interact through social media, I think we need a richer classification that captures not only positive reactions but negative ones as well. For instance, measuring the number of times people saw a particular tweet but neither favorited nor retweeted it could serve as a third category in this sub-classification.

Other categories are easier to define and collect data for, though their purpose is not entirely clear. For instance, the author uses publication medium to differentiate between tweets made through mobile or desktop applications, but to me this does not seem like a feature that would yield much insight into the way journalists and news consumers interact.

One thing I still have questions about is how Bagdouri accounts for cultural differences. The author is aware that differences may exist between Arabic and English journalists and analyzes each group independently, but then compares the two without providing an explanation or hypothesis for the difference (e.g., “We note first the unsurprising observation that journalists are more likely to have a verified account”: 8.46% vs. 0.35% for Arabic, and 14.84% vs. 2.96% for English).

In the end, the paper seems to move from the broad comparison of journalism sources vs. consumers to a more specific one. The paper suggests that the two strategies, Twitter for independent journalists vs. Twitter for news organizations, reach the same audience, based on the fact that the two groups receive roughly the same number of favorites. Is this conclusion flawed because Bagdouri does not compare tweets covering the same particular event? To me it is, because viewers may relate to certain events more closely; for example, news about a hurricane in Florida may see more retweets and favorites than news about the oldest cat in Ireland.

Additional Questions

  • How can we explore, track, and analyze the dissemination of information not just from news source to viewer, but from viewer A to viewer A’s friend?
  • How does indirect news affect perceptions of whether or not a source is accurate in the age of “fake news”?
