Summary:
This paper studies the presence of hateful, violent, and discriminatory content in YouTube channels of two categories: right-wing and baseline. The study analyzes the lexicon, topics, and implicit biases in the text of the videos. More specifically, the text in video titles, captions, and comments was studied. The dataset consists of over 7,000 videos and 17 million comments. The study found that the right-wing channels contain more “negative” words, cover more topics about war and terrorism, and are more discriminatory toward minorities such as Muslims and the LGBT community.
Reflection:
Compare against left-wing channels: Instead of comparing right-wing YouTube channels to the rest of YouTube, I think the study should have narrowed its baseline to a set of left-wing channels. This would make the comparison more symmetric and would bring out the results more clearly.
Video Transcript: A significant aspect of a YouTube video is the transcript (i.e., what is being said). I think that including it would have given the study a lot more data to look at.
I think the introduction could be improved. Although the idea of the study is clear, the introduction did not specify a motivation for studying this topic. It did not state what kind of real-world applications the study could be used for. Also, related work should have been at the beginning of the paper, not at the end.
In addition to looking at specific words, I think the study should look at the presence of n-grams.
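To make the n-gram idea concrete, here is a minimal Python sketch of how word n-grams could be counted across a set of comments. The function names, the simple regex tokenizer, and the sample comments are all my own assumptions for illustration; none of them come from the paper.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return the list of word n-grams (as tuples) in a piece of text."""
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n=2):
    """Count how often each n-gram occurs across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts


# Made-up comments, standing in for real channel comments.
comments = ["This is a made up comment", "This is another made up comment"]
bigram_counts = count_ngrams(comments, n=2)
```

The resulting counts for each category (right-wing and baseline) could then be compared in the same way the study compares single words.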
Data over time: Is there a pattern to the findings when viewed over an extended time period? I think it would be interesting to do a similar study that looks at how the contents of right-wing channels change over time, and to see how the findings relate to real-world events.
In order to study the content of the video itself, and not just the text surrounding it, I think crowdsourcing could be used to label videos with tags that describe their contents. This dataset could be used to study any other classifications of YouTube videos.
Blocked words: Channels can create a list of words that they want to review before allowing them in the comments section. This might limit what users are able to say. Do the right-wing channels block more or fewer comments than the baseline channels? How might this impact the results of the study?
Verified channels: Is the percentage of channels that are verified different in the two categories (right-wing and baseline)?