Automated Hate Speech Detection and the Problem of Offensive Language
Davidson et al. (2017) studied the detection of hate speech and offensive language on social media platforms. Previous lexical detection methods failed to separate offensive speech from hate speech. The study defined hate speech as language that is used to express hatred towards a targeted group, or is intended to be derogatory, to humiliate, or to insult the members of the group. This definition allows language that is merely offensive, using words other models would normally flag as hate speech, to be classified separately. The model was trained on a sample of tweets that CrowdFlower workers labeled as hate speech, offensive speech, or normal speech. The authors found that creating additional classifications for different types of speech (for example, offensive speech) leads to more reliable models. While the study found methods to increase reliability in some cases, it also addresses ways the model could be improved with further research.
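To make that setup concrete, here is a minimal sketch of the kind of three-class classifier the paper describes, written with scikit-learn. It is an illustration under assumptions, not the authors' implementation: the file name, column names, features, and hyperparameters are all placeholders.

```python
# Minimal sketch of a three-class tweet classifier (hate / offensive / neither).
# Assumes a hypothetical CSV with columns "tweet" and "label"; this is not the
# authors' exact preprocessing, feature set, or hyperparameters.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("labeled_tweets.csv")  # hypothetical file of crowd-labeled tweets

X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3), min_df=5)),  # word n-grams
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X_train, y_train)

# Per-class precision and recall show where hate speech and merely offensive
# language get confused with each other.
print(classification_report(y_test, model.predict(X_test)))
```

The per-class report is the useful part here: it is exactly the hate-versus-offensive confusion that motivates the paper's three-way labeling.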
While I found the paper fascinating, and this is most definitely an area that needs further research, I see a few issues with the methods used to gather and interpret the data fed into the model, as well as with the model itself.
- Judging from the results, especially the classification of sexism as offensive language rather than hate speech, there is likely a human-created bias in the classification of the tweets. This may be due to a non-random sample of workers provided by CrowdFlower. Another possible, albeit unfortunate, explanation is that sexism is commonplace enough that it is not regarded as hate speech. Either way, the bias of the people analyzing and feeding data into models is something that should be explored further (see Weapons of Math Destruction for further insight into this issue).
- Another issue is that the coders themselves appear to be prone to errors, which can affect the reliability of the model. Davidson et al. (2017) found a small number of cases where the coders misclassified speech. This concern is somewhat related to the previous point. Using convenience sampling (CrowdFlower, Mechanical Turk, and similar platforms are arguably forms of convenience sampling) introduces threats to internal validity. With convenience sampling, there is the possibility of introducing bias into the study, as well as reducing the generalizability of the results.
- The lexical method appears to look more at word choice than at the context in which words are used, which led to a large majority of the misclassifications in the model (the sketch after this list illustrates the limitation). While the method can be used as part of a more holistic approach, it seems like a whole new method should be explored. One potential approach is classifying the likelihood of an individual user posting hate speech, creating more of a predictive model, although the ethical implications of such a model would need to be explored first.
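As a concrete illustration of the word-choice-versus-context problem raised in the last bullet, a purely lexical detector reduces to a set-membership check over tokens. The lexicon and example tweets below are invented placeholders, not material from the study.

```python
# Sketch of a purely lexical detector: it flags any tweet containing a listed
# term, with no notion of target, intent, or context. Placeholder lexicon only.
OFFENSIVE_LEXICON = {"slur_a", "slur_b", "insult_c"}

def lexical_flag(tweet: str) -> bool:
    """Return True if any token in the tweet appears in the lexicon."""
    tokens = {token.strip(".,!?").lower() for token in tweet.split()}
    return not tokens.isdisjoint(OFFENSIVE_LEXICON)

# Both tweets are flagged identically, even though only one is plausibly
# hate speech once the target and intent are taken into account.
print(lexical_flag("someone called me slur_a today and it hurt"))
print(lexical_flag("all of you slur_a people should leave"))
```

A context-aware approach would need signals beyond bag-of-words membership, which is roughly the gap the bullet above points at.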
Overall, I found Davidson et al. (2017) thought-provoking and a good source of additional research directions.
Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos
Nerghes et al. (2018) explored differences in user activity between informational and conspiracy videos on YouTube, specifically those related to the 2016 Zika-virus outbreak. The study sought to answer the following questions:
- What type of Zika-related videos (informational vs. conspiracy) were most often viewed on YouTube?
- How did the number of comments, replies, likes and shares differ across the two video types?
- How did the sentiment of the user responses differ between the two video types?
- How did the content of the user responses differ between the video types?
The study inspected 35 of the most popular Zika-related videos to answer these questions. No statistically significant differences were found between informational and conspiracy videos in how often they were viewed, nor in the number of comments, replies, likes, and shares they received. It was also found that users respond differently to the sub-topics within the videos.
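As a rough illustration of how such a comparison might be run (the authors may well have used a different test), engagement counts for the two video groups could be compared with a Mann-Whitney U test, which tolerates the heavy skew typical of view counts. The numbers below are invented.

```python
# Hedged illustration of comparing engagement between two groups of videos;
# the view counts are invented and this is not necessarily the authors' test.
from scipy.stats import mannwhitneyu

informational_views = [120_000, 95_000, 300_500, 40_200, 88_000]   # invented
conspiracy_views    = [110_000, 150_000, 60_000, 210_000, 75_000]  # invented

stat, p_value = mannwhitneyu(informational_views, conspiracy_views,
                             alternative="two-sided")
# A p-value above the usual 0.05 threshold would be read as "no significant
# difference," matching the kind of null result the paper reports.
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```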
One of the biggest questions the Nerghes et al. (2018) study raised for me was:
- Are these results generalizable to other topics? While the Zika-virus outbreak was significant, I have to ask how many people were truly invested in pursuing new information. I find it likely that, for other issues, the results could be different.
Assuming that the results do generalize to other topics, the study raises further questions:
- Setting aside the fact that this study and its results were focused on the health field, what can be done to increase user engagement?
- The next question we must ask is whether or not we should attempt to direct traffic away from conspiracy videos. This depends in part on whether discussions on conspiracy videos can yield positive results. It would be interesting to explore what comes of non-toxic engagement with items we would label as fake news or conspiracy theories, and what some of the best approaches to starting and maintaining those discussions would be.
Overall, I find that there are many new directions that could be pursued related to this subject.
Works Cited:
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM '17). Retrieved February 3, 2019, from https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15665
Nerghes, A., Kerkhof, P., & Hellsten, I. (2018). Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos. Proceedings of the 10th ACM Conference on Web Science (WebSci '18). doi:10.1145/3201064.3201086