Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos
This paper examines the initial user reactions to informational and conspiracy videos about the Zika virus on YouTube. The authors' data set was the 35 most popular videos on the topic. Through their analysis, they found that most of the videos (23) were purely informational, while the remaining 12 offered various conspiracy theories. User activity looked much the same across both types, with low engagement overall, whether the content was informational or conspiratorial. The difference between the two lay in the content and sentiment of the reactions: responses to informational videos typically focused on the disease itself and its effects on babies and women in Brazil, while responses to conspiracy videos tended to target certain people or demographics as having started the virus.
One thing that surprised me was how oddly specific the subject matter was. The authors chose to study YouTube videos about a virus that had emerged two years earlier. I initially thought such a targeted subject would limit the analysis, but it does make some sense: most, if not all, of the content about the Zika virus has already been uploaded, so the data set is essentially fixed rather than continuously growing. I would have liked to see them tackle several different health issues on YouTube to avoid biases that may arise for specific subjects. That said, the narrower scope kept the project small, which makes it a good example of how to choose a feasible semester project.
One thing that didn’t make sense to me was why the authors chose only the “first phase” of the Zika virus outbreak in 2016. They apparently wanted to test the population's initial response, but they don't explain why. How does the initial phase differ from the entire ordeal altogether? I suspect the results and types of user interaction would have been the same as, or at least similar to, the reported results, but the data set would have been much larger and more reliable than the 35 videos they were able to find.
Some additional research questions that could come out of this:
- Given only the comments on individual YouTube videos, can we predict whether a video is a conspiracy or an informational video?
- Following up on the point about the “initial phase,” could we compare the different phases of the Zika virus outbreak in terms of user interaction and dissemination?
Automated Hate Speech Detection and the Problem of Offensive Language
This paper investigates the differences between hate speech and offensive language. The data set consisted of 25,000 tweets containing terms from the hate speech lexicon on Hatebase.org. Through their analysis, the authors found that offensive language was quite easy to classify, with a 91% prediction rate, but hate speech was harder to identify, with only a 61% prediction rate. Tweets correctly identified as hate speech tended to be extreme, containing multiple racist and homophobic slurs.
What I would have liked to see is more explanation of the features they used. What are the TF-IDF, POS, and reading-score methods they decided to use, and why are they helpful to the research? This is something I found useful in some of the other papers we have read so far.
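For readers like me who wanted that explanation, here is a minimal sketch of two of those feature types, implemented from scratch rather than taken from the paper: TF-IDF weights each word by how frequent it is in one document relative to how rare it is across the corpus, and the Flesch reading-ease formula scores text by sentence length and syllable counts (the syllable counter below is a crude vowel-group approximation, my assumption, not the paper's method).

```python
import math
import re

def tf_idf(docs):
    """TF-IDF for a list of tokenized documents: a word scores high in a
    document when it is frequent there but appears in few other documents."""
    n = len(docs)
    df = {}  # document frequency: number of docs containing each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights

def flesch_reading_ease(text):
    """Flesch reading-ease score: higher means easier to read.
    Syllables are approximated by counting groups of vowels per word."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

# Toy corpus: a word unique to one tweet gets positive weight there,
# while a word shared by every tweet gets weight zero (idf = log 1 = 0).
docs = [
    "this tweet is fine".split(),
    "this tweet is hateful and offensive".split(),
]
w = tf_idf(docs)
print(w[1]["hateful"] > 0)   # distinctive term
print(w[0]["this"] == 0.0)   # term common to all docs
print(flesch_reading_ease("See Spot run. Spot runs fast."))
```

These are exactly the kinds of signals a classifier can use: slurs are distinctive terms with high TF-IDF in hateful tweets, and readability scores capture stylistic differences between tweet populations. (POS tagging, the third feature, needs a trained tagger such as the one in NLTK, so it is omitted here.)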
Some additional research questions that could come out of this:
- Using the terms generally associated with hate speech, can we estimate what percentage of the tweets containing them are actually hateful or offensive?
- As noted in the conclusions, another direction would be to study users known to use hate speech by looking at their social structures and motivations. I also think it would be interesting to see data on their demographics, and on how their behavior changes over time in response to the reactions their hate speech receives.