Reading Reflection #3 - [2/5/19] - [Numan Khan]

Automated Hate Speech Detection and the Problem of Offensive Language

Summary:

This paper used crowd-sourcing to label a sample of tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. The researchers then trained a multi-class classifier on these labels to differentiate between the three categories. The main obstacle they addressed was distinguishing hate speech from merely offensive speech, since the two share many of the same offensive terms.
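As a rough sketch of what a three-class tweet classifier like this might look like, here is a minimal pipeline; the TF-IDF features, hyperparameters, and toy data are my assumptions, not the paper's exact setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the crowd-labeled tweets; the three labels
# mirror the paper's categories.
tweets = [
    "hateful slur aimed at a group",       # hate speech
    "generic profanity in a casual post",  # offensive but not hateful
    "a perfectly ordinary tweet",          # neither
]
labels = ["hate_speech", "offensive", "neither"]

# Multi-class classifier: TF-IDF n-grams feeding a logistic regression.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(tweets, labels)
print(model.predict(["another casual post with profanity"]))
```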

Reflection:

Something that I’m interested to see in the future is how major social media platforms will respond to the criticism they receive about regulating hate speech. Because of the increasing legal implications for individuals who post hate speech, Twitter and Facebook must be careful when identifying it. If social media platforms removed posts autonomously, how accurate would their algorithms be? As numerous other studies on distinguishing hate speech from offensive speech show, these algorithms are still being improved. On the other hand, if platforms removed posts manually, would their removal rate be too slow compared to the rate at which hateful posts are created? Whatever happens in the future, social media platforms must address the growing problem of hate speech in a careful manner.

Another thing that caught my attention in this paper was that the researchers properly defined hate speech and clearly described the process they used for labeling the data. By giving three or more coders their specific definition of hate speech to apply to each tweet, their process makes a lot of sense and does a good job of ensuring that tweets are accurately labeled for the classifier.
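As a rough illustration, the final label for each tweet could be decided by majority vote among the coders. The helper below is a minimal sketch under that assumption; the function name, label names, and the rule for no-majority cases are mine, not the paper's:

```python
from collections import Counter

# Hypothetical coder annotations: each tweet is labeled by 3+ coders
# with one of three classes, mirroring the paper's categories.
HATE, OFFENSIVE, NEITHER = "hate_speech", "offensive", "neither"

def majority_label(codes):
    """Return the majority label, or None if no class wins a majority."""
    label, votes = Counter(codes).most_common(1)[0]
    return label if votes > len(codes) / 2 else None

print(majority_label([HATE, HATE, OFFENSIVE]))      # hate_speech (2 of 3 agree)
print(majority_label([HATE, OFFENSIVE, NEITHER]))   # None (no majority)
```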

Lastly, I appreciate the fact that they evaluated a variety of models to find which one performs best, instead of simply choosing one or two models. However, one thing I am curious about is which features were used in the final model, a logistic regression with L2 regularization.
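If the final model really is an L2-regularized logistic regression over text features, one way to probe which features it relies on is to inspect the per-class coefficients. This is only a sketch under assumed TF-IDF features, not the paper's actual feature set:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Refit a tiny model so the example is self-contained; penalty="l2"
# matches the paper's stated regularization (and sklearn's default).
tweets = [
    "hateful slur aimed at a group",
    "generic profanity in a casual post",
    "a perfectly ordinary tweet",
]
labels = ["hate_speech", "offensive", "neither"]

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(tweets)
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, labels)

# The highest-weighted features per class hint at what the model uses.
feature_names = np.array(vec.get_feature_names_out())
for cls, coefs in zip(clf.classes_, clf.coef_):
    top = feature_names[np.argsort(coefs)[-3:][::-1]]
    print(cls, "->", list(top))
```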

Further Work:

I believe some future work to improve this paper’s model would be to check whether quotes from a song are being used in appreciation or as hate speech, and to account for the cultural context of some offensive language. Furthermore, I am curious whether the current definition of hate speech could be made even more specific in order to improve the labeling of tweets and, in turn, the classifier. Lastly, the best way of truly addressing hate speech is to understand its root cause. Perhaps by researching the media sources that incite hate, we could better identify users who spread hate speech rather than only individual hateful posts.

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary:

This paper sought to research how much informational and conspiracy theory videos differ in user activity, such as the number of comments, shares, likes, and dislikes. The researchers also analyzed the sentiment and content of the user responses. They collected data by finding YouTube videos about Zika with at least 40,000 views as of July 11, 2016, resulting in a data set of 35 videos. Of these, 12 were focused on conspiracy theories. However, no statistically significant differences were found in user activity or sentiment between informational and conspiracy theory videos.

Reflection:

In the present day, YouTube is one of the largest platforms, where countless people access and post new videos. It can be said that communication has been substantially influenced by platforms like YouTube, since it is very easy for people around the world to post videos. With the growth of YouTube come many challenges, such as the spread of misinformation during disease outbreaks like Zika. I appreciate the effort this study made in trying to differentiate informational and conspiracy theory videos. The researchers provided detailed definitions of the two types of videos and clearly explained their data collection process. Personally, I am surprised that the sentiment in both types of videos was similar; I had thought there would be a significant difference. However, this study had a small dataset, which weakens its arguments.

Future Work:

A sample size of 35 seems too small for any sort of significance test. In the present day, YouTube is one of the largest video platforms, with numerous videos posted every hour, yet the researchers of this study found only 35 videos. My suggestion to these researchers is to increase their sample size by finding more videos. In addition, they could research other features that might help differentiate informational and conspiracy theory videos.
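To make the small-sample concern concrete: comparing, say, comment counts between the 12 conspiracy videos and the 23 informational ones might use a rank-based test such as Mann-Whitney U. I am not claiming this is the paper’s exact test, and the counts below are invented purely for illustration:

```python
from scipy.stats import mannwhitneyu

# Hypothetical comment counts for the two groups (12 vs. 23 videos),
# made up to show how little power such small samples provide.
conspiracy = [120, 80, 45, 200, 95, 60, 150, 30, 110, 70, 85, 55]
informational = [100, 90, 40, 180, 75, 65, 140, 35, 120, 60, 95, 50,
                 130, 45, 88, 72, 160, 58, 105, 82, 66, 48, 115]

stat, p = mannwhitneyu(conspiracy, informational, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")  # with n this small, p is rarely < 0.05
```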
