Automated Hate Speech Detection and the Problem of Offensive Language
Summary
In this paper, the authors used crowdsourcing to classify tweets into three categories: hate speech, offensive but not hate speech, and neither. From this labeled data, they trained a fairly accurate model for detecting hate speech on Twitter.
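To make the setup concrete, below is a minimal sketch of what a three-class tweet classifier like this could look like. The TF-IDF features, logistic regression model, and the placeholder `tweets`/`labels` data are my own illustrative assumptions, not necessarily the pipeline the authors actually used.

```python
# A minimal sketch, assuming a TF-IDF + logistic regression pipeline
# (my illustration; the paper may use different features or a different model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder data standing in for the crowdsourced corpus:
# 0 = hate speech, 1 = offensive but not hate speech, 2 = neither.
tweets = [
    "placeholder hateful tweet",
    "placeholder offensive tweet",
    "placeholder neutral tweet",
    "another placeholder offensive tweet",
]
labels = [0, 1, 2, 1]

# Word uni/bigram TF-IDF features feeding a multinomial logistic regression.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(tweets, labels)

# Predict the class of an unseen tweet.
print(model.predict(["some new tweet to classify"]))
```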
Reflection
I found it very helpful and interesting that the authors went over the common pitfalls of previous models similar to theirs. This likely gave them the insight needed to make their own model more accurate. I could tell that they weighed many considerations during the data selection process based on these findings from previous projects, which made the paper’s results much more trustworthy.
Additionally, I would like to point out that while the authors noted some of the features they examined initially, they never specified which features were used in the final model. I think it would be useful to know which features ended up being the deciding factors for hate speech.
Lastly, I would like to applaud the authors’ extensive reflection. Their analysis of where and why their model was strong or weak will definitely be useful to others in the field. I found it particularly thought-provoking that they noticed that “people identify racist and homophobic slurs as hateful but tend to see sexist language as merely offensive.” I will definitely use their reflection as a guideline for my future projects.
Further Questions
- Why do people generally perceive sexist speech as less hateful than racist or homophobic speech?
- How can we improve on some of the weak points listed for their model?
- How could this model be used to improve research or social media in the future?
- How can we draw a sharper line between hate speech and merely offensive speech? Why is that line so blurry, even for us humans?
- What kind of content on the Internet do these hateful users consume that may make them more likely to lash out at minority groups?
Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos
Summary
In this paper, the authors analyze the analytics of 35 YouTube videos to find patterns distinguishing “informational” from “conspiracy theory” videos. They find that user engagement with the two types is actually very similar, which leads them to discuss public relations strategies for health organizations.
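As a rough illustration of the kind of engagement comparison the authors describe, here is a small sketch. The metric values below are placeholders I made up, not the paper’s data, and the choice of a Mann-Whitney U test is my own assumption about how such a comparison might be run.

```python
# A rough sketch of comparing engagement between the two video categories;
# the per-video counts below are placeholders, not the paper's data.
from statistics import mean
from scipy.stats import mannwhitneyu

# Hypothetical per-video engagement counts (e.g., likes or comments).
informational = [120, 85, 240, 60, 150]
conspiracy = [110, 95, 220, 70, 160]

# Compare central tendency and test whether the two distributions differ.
print("mean informational:", mean(informational))
print("mean conspiracy:", mean(conspiracy))
stat, p_value = mannwhitneyu(informational, conspiracy, alternative="two-sided")
print("Mann-Whitney U p-value:", p_value)  # a large p-value is consistent with similar engagement
```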
Reflection
I would primarily like to address the sample size that the authors used for this paper. Despite the vast amount of content on YouTube, the authors analyzed only 35 videos in total. I believe that if they want their study to be significant enough to offer advice to world health organizations, they should have used a larger sample size.
Either way, I found the distribution of comment topics across video types quite interesting. Viewers of informational videos mostly discussed the effects of the disease, whereas viewers of conspiracy videos focused on its causes. This seems somewhat predictable given the direction each video category takes, but it was intriguing nonetheless.
Further Questions
- How could this information be used by YouTube to possibly promote more factual content rather than conspiracies? Should YouTube be allowed to do that?
- How is this information useful to the general public?
- Should we take these results seriously considering the size of the data set?