Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos
Summary:
As YouTube emerged as a popular social media platform, it also became a favorite playground for fake news and conspiracy theory videos. Notably, a large share of the content users seek out on the site is health-related. This raises the concern that fake news and conspiracy theory content can mislead and instill fear in the audience, thereby affecting the spread of epidemics. The researchers collected the 35 most widely shared YouTube videos from the first phase of the Zika-virus outbreak – December 30, 2015, to March 30, 2016 – along with user reactions to those videos. The paper examines differences between informational and conspiracy videos in user activity (numbers of comments, shares, likes, and dislikes) and in the sentiment and content of user responses.
Unexpectedly, there is no significant difference between the two types of videos in most respects. Informational and conspiracy videos attract not only similar numbers of responses and unique users per view but also a similarly low rate of additional responses per unique user. Furthermore, the user comments on both types of videos are slightly negative, which challenges Vosoughi, Roy, and Aral's conclusion that false news provokes more negative sentiment than true news. However, the content of user responses does differ: responses to informational videos focus on the consequences of the virus, whereas responses to conspiracy theory videos are more concerned with finding out who is responsible for the outbreak.
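The paper does not publish its analysis code, so the following is only a minimal sketch of the kind of comparison summarized above: a rank-based test on per-view response rates for the two groups of videos. The library choice and all numbers are my own placeholders for illustration, not the authors' method or data.

```python
# Minimal sketch (not the authors' code): comparing per-view response rates
# between informational and conspiracy videos with a rank-based test.
# The numbers below are placeholders, not values from the study.
from scipy.stats import mannwhitneyu

# Responses per 1,000 views for each video (placeholder values).
informational = [4.1, 2.8, 5.0, 3.3, 4.7, 2.2, 3.9]
conspiracy = [3.6, 4.4, 2.9, 5.2, 3.1, 4.0, 3.5]

# A two-sided Mann-Whitney U test makes no normality assumption,
# which suits small, skewed engagement samples like these.
stat, p_value = mannwhitneyu(informational, conspiracy, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# A large p-value would be consistent with the paper's finding that
# engagement per view does not differ significantly between the two groups.
```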
Reflection:
Even though misleading health-related content on YouTube is worth studying, the authors' choice to limit their research to the Zika-virus is, unfortunately, not a wise one. First of all, although YouTube has over one billion highly active users watching around six billion hours of video per month, the distribution of user age and interests is extremely skewed. A large percentage of YouTube users are teenagers and young adults who have little or no interest in the Zika-virus. In fact, there is a huge gap in user activity and engagement between popular topics (gaming and beauty) and other videos, and much of the popular health-related content (anorexia, tanning, vaccines, etc.) is also linked to beauty trends. In other words, YouTube is not an ideal social media forum for studying user responses to different types of videos about the Zika-virus or most other health issues. The clearest evidence of this is the extremely small dataset the authors were able to acquire on the topic. Had the same question been asked about vaccine-related videos, the results might have been more interesting.
Another concern about the dataset is the lack of explanation and inspection of the data collected. What are the user statistics: age range, gender, prior behavior, and so on? Does the dataset include videos that were reported, given that many conspiracy theory videos can be flagged as inappropriate? Why did the authors choose 40,000 as the cutoff for their dataset? How popular are Zika-virus-related videos compared to other videos posted on the same day?
As mentioned above, the most active group of YouTube users is not attracted to the topic of the Zika-virus, and older users are less likely to react on social media forums, especially on controversial topics. This inactivity and passiveness can make the differences in user activity between informational and conspiracy theory videos appear insignificant. Likewise, the low rate of additional responding may simply reflect typical user behavior on social media forums. More analysis of user behavior is needed before concluding that user activity is similar across the two types of videos.
The most interesting analysis in this research is the comparison of the semantic maps of user comments on informational and conspiracy theory videos. The clusters for the informational videos are bigger and more concentrated. Surprisingly, offensive words (shit, ass, etc.) are used more frequently on informational videos. Moreover, the comments on conspiracy theory videos are more concerned with foreign actors such as Brazil and Africa, whereas the audience of informational videos focuses more on the nation's internal conflicts between parties and religions. What might cause the difference in interests and in the use of offensive language? It is also intriguing that Bill Gates, the only businessman mentioned, frequently has his name misspelled. Why does he appear as often as presidents of the United States in the comments? Does the common misspelling of his name indicate the education level of the audience?
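As a rough illustration of the kind of vocabulary contrast that the semantic maps summarize, one could simply compare term frequencies across the two comment sets. The snippet below is a simplified stand-in with placeholder comments, not the mapping technique the authors actually used.

```python
# Simplified stand-in for the paper's semantic-map comparison: contrast how
# often each term appears in comments on the two video types.
# The comment lists are placeholders, not data from the study.
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np

informational_comments = ["the virus causes birth defects", "mosquito control matters"]
conspiracy_comments = ["who is responsible for this outbreak", "they released it on purpose"]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(informational_comments + conspiracy_comments)
terms = vectorizer.get_feature_names_out()

n_info = len(informational_comments)
info_freq = np.asarray(counts[:n_info].sum(axis=0)).ravel()
consp_freq = np.asarray(counts[n_info:].sum(axis=0)).ravel()

# Smoothed ratio: terms with low values lean toward the conspiracy comments,
# terms with high values toward the informational comments.
ratio = (info_freq + 1) / (consp_freq + 1)
for i in np.argsort(ratio):
    print(f"{terms[i]:15s} informational={info_freq[i]} conspiracy={consp_freq[i]}")
```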
Automated Hate Speech Detection and the Problem of Offensive Language
Summary:
The paper addresses one of the biggest challenges of automatic hate-speech detection: distinguishing hate speech from offensive language. The authors define hate speech as "language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group." Even with a stricter definition of hate speech than previous studies, the authors still find the distinction between the two types of expression ambiguous and case-specific. The raw data consist of tweets containing terms from a hate-speech lexicon compiled by the website Hatebase.org. The tweets were then reviewed by human coders and grouped into three categories: hate speech, offensive language, and neither. After trial and error, the authors settled on a logistic regression model. Although the model achieves a relatively high overall precision of 91%, it does not accurately differentiate hate speech from offensive language, misclassifying roughly 40% of hate-speech tweets.
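For readers unfamiliar with this setup, a common realization of such a text classifier is sketched below: TF-IDF n-gram features feeding a multi-class logistic regression. The tweets are placeholders, and the sketch deliberately omits any additional features the authors may have used, so it illustrates the general approach rather than reproducing their pipeline.

```python
# Minimal sketch of a three-class tweet classifier in the spirit of the
# paper: TF-IDF n-grams plus logistic regression. Not the authors' code;
# the labeled tweets below are placeholders for the human-coded data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

tweets = ["example hateful tweet", "example offensive tweet", "example neutral tweet"]
labels = ["hate", "offensive", "neither"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3), lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(tweets, labels)

# Predict the class of a new, unlabeled tweet.
print(model.predict(["another tweet to classify"]))
```

The class imbalance the paper describes (hate speech is much rarer than offensive language) is one reason a weighting scheme such as `class_weight="balanced"` is worth considering in practice.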
Despite the improvement in precision and accuracy over previous models, the authors acknowledge that the current model still has plenty of issues. The algorithm often rates tweets as less hateful or offensive than human coders do. Its heavy reliance on the presence of particular terms, while neglecting the tone and context of the tweets, results in a high misclassification rate. Moreover, sexist tweets are noticeably less likely to be considered hateful than racist or homophobic tweets. The authors suggest that future research should consider the social context and conversations in which hate speech arises; studying the behavior and motivations of hate speakers could also provide more insight into the characteristics of these expressions. Another pitfall of the current model is that it performs poorly on less common types of hate speech, such as that targeting Chinese immigrants.
Reflection:
It is not an exaggeration to say that even humans struggle to differentiate hateful speech from speech that is merely offensive, let alone an algorithm. This research shows that the presence and frequency of keywords are far from sufficient for distinguishing hate speech. Indeed, in most cases the decisive factors in determining whether a tweet is hateful or offensive are the context of the conversation, the sentiment of the tweet, and the user's speaking patterns. Analyzing these factors, however, is much easier said than done.
The authors suggest that user attributes such as gender and ethnicity could help classify hate speech, but acquiring such data may be impossible. Even in the few cases in which users agree to provide their demographic information, its trustworthiness is not guaranteed. However, these attributes can still be inferred by analyzing a user's behavior on the platform: for example, a user who often likes and retweets cooking recipes and cat pictures is more likely to be a woman. Future studies could therefore look into a user's prior behavior to predict their tendency to express hate speech and to classify the nature of their tweets.
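A minimal sketch of this suggestion is given below, assuming hypothetical behavioral signals (the column names and values are invented purely for illustration): the tweet text and per-user features are combined in a single pipeline feeding the same kind of classifier.

```python
# Sketch of the suggestion above: augment text features with user-level
# behavioral features. Column names and values are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({
    "text": ["example offensive tweet", "example neutral tweet", "example hateful tweet"],
    "prior_flagged_tweets": [3, 0, 7],    # hypothetical per-user history
    "retweets_per_day": [1.2, 0.4, 5.0],  # hypothetical activity signal
    "label": ["offensive", "neither", "hate"],
})

# TF-IDF on the tweet text, standardized numeric features for the user.
features = ColumnTransformer([
    ("text", TfidfVectorizer(ngram_range=(1, 2)), "text"),
    ("user", StandardScaler(), ["prior_flagged_tweets", "retweets_per_day"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
model.fit(data[["text", "prior_flagged_tweets", "retweets_per_day"]], data["label"])
```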
One of the main causes of the high misclassification rate for hate speech is that word choices and writing styles vary greatly among users. Even though most hateful expressions are aggressive and contain many curse words or offensive terms, there are plenty of exceptions. An expression can still be "intended to be derogatory, to humiliate, or to insult" disadvantaged groups without using a single offensive term. In fact, this is the most dangerous form of hate speech, as it often disguises itself as a statement of fact or a solid argument. Conversely, there are statements that use flagged terms yet convey positive messages. Therefore, rather than focusing on the vocabulary of the tweets, it might be better to analyze the messages the users want to convey.
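One partial step in that direction, under the assumption that overall sentiment captures some of the intended message, is to attach a sentiment score to each tweet and feed it to the classifier alongside the lexical features. The snippet below uses the off-the-shelf VADER analyzer on made-up examples purely as an illustration; sentiment alone still misses sarcasm and coded language.

```python
# Illustration only: score the overall sentiment of each tweet so that a
# flagged term used in a friendly, in-group way is separated from an attack.
# Requires the `vaderSentiment` package; the example tweets are made up.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
examples = [
    "love my crazy friends, we had a great night",
    "those people are vermin and should be driven out",
]
for text in examples:
    # The compound score ranges from -1 (very negative) to +1 (very positive).
    score = analyzer.polarity_scores(text)["compound"]
    print(f"{score:+.2f}  {text}")
# The compound score could be appended to the classifier's feature vector
# as one message-level signal among several.
```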