Automated Hate Speech Detection and the Problem
of Offensive Language
Davidson et al. (2017) studied the detection of hate speech and offensive language on social media platforms. Previous lexical detection methods failed to separate offensive speech from hate speech. The study defined hate speech as language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group. This definition allows offensive language, which uses words other models would normally flag as hate speech, to be classified separately. The model was trained on a sample of tweets that CrowdFlower workers labeled as hate speech, offensive speech, or normal speech. The authors found that creating additional categories for different types of speech (for example, offensive speech) leads to more reliable models. While the study found methods to increase reliability in some cases, it also addresses ways that the model could be improved with further research.
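To make the setup concrete, here is a minimal sketch of a three-class tweet classifier in the spirit of the paper's approach (TF-IDF features feeding a logistic regression). This is not the authors' exact pipeline, and the example tweets and labels are placeholders invented for illustration.

```python
# Minimal sketch of a three-class tweet classifier (hate / offensive / neither).
# The tiny toy dataset below is invented for illustration; the study used
# thousands of crowd-labeled tweets and a richer feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labels: 0 = hate speech, 1 = offensive but not hateful, 2 = neither
tweets = [
    "placeholder tweet expressing hatred towards a targeted group",
    "placeholder tweet with profanity but no targeted group",
    "placeholder ordinary tweet about the weather",
]
labels = [0, 1, 2]

# TF-IDF unigrams and bigrams feeding a multinomial logistic regression
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)

print(model.predict(["another placeholder tweet to score"]))
```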
While I found the paper fascinating, and the topic most definitely an area needing further research, I noticed a few issues with the methods used to gather and interpret the data fed into the model, as well as with the model itself.
- Judging from the results, especially the classification of sexism as offensive language rather than hate speech, there is likely a human-created bias in the classification of the tweets. This may be due to a non-random sample of workers provided by CrowdFlower. Another possible, albeit unfortunate, explanation is that sexism is commonplace enough that it is not regarded as hate speech. Either way, the bias of the people analyzing and feeding data into models is something that should be explored further (see Weapons of Math Destruction for further insight into this issue).
- Another issue is that the coders appear to be prone to errors as well, which can affect the reliability of the model. Davidson et al. (2017) found a small number of cases where the coders misclassified speech. This concern is somewhat related to the previous point. Using convenience sampling (CrowdFlower, MTurk, and similar platforms are arguably forms of convenience sampling) introduces threats to internal validity. Thus convenience sampling can introduce bias into the study and reduce the generalizability of the results.
- The lexical method appears to look more at word choice than at the context of that word choice, which led to a large majority of the misclassifications in the model (a toy illustration follows this list). While the method can be used as part of a more holistic approach, it seems like a whole new method should be explored. One potential approach is classifying the likelihood of an individual user posting hate speech, creating more of a predictive model, although the ethical implications of such a model would need to be explored first.
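As a toy illustration of the context problem raised in the last bullet, the sketch below shows how a purely lexical filter flags any post containing a listed term, regardless of how the term is used. The term list and example strings are placeholders, not material from the paper.

```python
# Toy illustration of why a purely lexical filter over-flags: it matches
# listed terms regardless of who they target or how they are used.
# The term list and example strings are placeholders, not from the paper.
OFFENSIVE_TERMS = {"slur_a", "slur_b", "expletive"}

def lexical_flag(post: str) -> bool:
    """Flag a post if any listed term appears, ignoring context entirely."""
    return any(token in OFFENSIVE_TERMS for token in post.lower().split())

# A quoted or self-referential use of a term is flagged exactly like a
# targeted attack, which is the context problem described above.
print(lexical_flag("quoting the expletive only to criticize it"))      # True
print(lexical_flag("expletive aimed at members of a targeted group"))  # True
print(lexical_flag("an ordinary post about the weather"))              # False
```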
Overall, I found Davidson et al. (2017) thought-provoking and a source of additional research directions.
Early Public Responses to the Zika-Virus on YouTube:
Prevalence of and Differences Between Conspiracy Theory and Informational
Videos
Nerghes et al. (2018) explored differences in user activity between informational and conspiracy videos on YouTube, specifically those related to the 2016 Zika-virus outbreak. The study sought to answer the following questions:
- What type of Zika-related video (informational vs. conspiracy) was most often viewed on YouTube?
- How did the number of comments,
replies, likes and shares differ across the two video types?
- How did the sentiment of the user
responses differ between the two video types?
- How did the content of the user
responses differ between the video types?
The study examined 35 of the most popular Zika-virus-related videos to answer these questions. The authors found no statistically significant differences in viewership between informational and conspiracy videos, and no significant differences in the number of comments, replies, likes, and shares between the two video classifications. They also found that users respond differently to different sub-topics.
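To make the comparison concrete, here is a rough sketch of one way such an engagement comparison could be run. The Mann-Whitney U test and the comment counts below are my own assumptions for illustration; the paper's actual tests and data may differ.

```python
# Rough sketch of comparing an engagement metric (here, comment counts)
# between informational and conspiracy videos. The counts are invented, and
# the Mann-Whitney U test is assumed as a reasonable choice for small,
# skewed samples; the paper's actual tests and data may differ.
from scipy.stats import mannwhitneyu

informational_comments = [120, 45, 300, 15, 88, 210]  # placeholder counts
conspiracy_comments = [95, 60, 250, 30, 110, 180]     # placeholder counts

stat, p_value = mannwhitneyu(
    informational_comments, conspiracy_comments, alternative="two-sided"
)
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# A large p-value here would be consistent with the paper's finding of no
# significant engagement difference between the two video types.
```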
One of the biggest questions the Nerghes et al. (2018) study raised for me was:
- Are these results generalizable to other topics? While the Zika-virus outbreak was significant, I have to ask how many people were truly invested in pursuing new information about it. I find it likely that, for other issues, the results could be different.
Assuming the results do generalize to other topics, the study raises further questions:
- Setting aside the fact that this study and its results were directed towards the health field, what can be done to increase user engagement?
- The next question we must ask is whether or not we should attempt to direct traffic away from conspiracy videos. This question further depends on whether or not discussions on conspiracy videos yield positive results. It would be interesting to explore what non-toxic engagement with items we would label as fake news or conspiracy theories actually produces, and what some of the best approaches to starting and maintaining those discussions would be.
Overall, I find that there are many new research directions that could be pursued on this subject.
Works Cited:
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM 2017). Retrieved February 3, 2019, from https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15665
Nerghes, A., Kerkhof, P., & Hellsten, I. (2018). Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos. Proceedings of the 10th ACM Conference on Web Science (WebSci '18). doi:10.1145/3201064.3201086