[Reflection #3] – [2/4/2019] – [Jonathan Alexander]

Early Public Responses to the Zika-Virus on YouTube:

Overview

This paper analyzes videos on YouTube concerning the Zika virus. In this study the authors group videos about the virus into two categories: informational videos and conspiracy videos. The researchers then compare different aspects of user activity surrounding these videos, such as number of comments, shares, likes, dislikes, and replies. The researchers use this data to attempt to ascertain:

  1. Are informational or conspiracy videos viewed more?
  2. Did the quantity of user activity (comments, replies, likes, shares) differ between informational and conspiracy videos?
  3. Did the content of user activity (comments and replies) differ between informational and conspiracy videos?

The researchers concluded that there was no significant difference in user activity across the two types of videos, that the sentiment of comments to both types of videos was the same, and that comments seem to frame the virus differently across the two types of videos.

Reflection

  • The study only uses a sample size of 35 videos, and only those about the Zika virus. I think a larger sample size would be beneficial to the study. Furthermore, if the aim of the study is to compare informational and conspiracy videos generally, there are many more instances that should have been considered, such as 9/11 and the moon landing. I would be interested in seeing a similar study comparing informational and conspiracy videos that uses a larger sample size and more trials (instances of events that had a large presence of both informational and conspiracy videos).
  • Early on in the paper the authors cite another study by Vosoughi, Roy, and Aral that stated that the sentiment in comments on conspiracy theory news is often more negative than that of scientific news. The authors then report the opposite result, stating that there was no difference in sentiment between the informational and conspiracy videos. Given this disagreement, I would be interested in seeing further research into the differences in user sentiment across informational and conspiracy news.
  • The article categorizes the top 35 videos (based on number of views) found using the input string “Zika-virus” into either informational or conspiracy. Firstly, I think there could be other categories of videos resulting from that search besides informational and conspiracy; for instance, comedy, reaction, and satire videos concerning the Zika virus could all have been returned by that search. Furthermore, the videos were categorized “based on close watching the sample of videos”. This introduces personal bias and the potential for human error into the study, especially as the authors themselves point out that conspiracy theories can be true or false.
  • The study concludes that comments from the two different types of videos discuss the virus using different “framings”. The authors give examples but do not dive too deep into the difference in “framing”. I would be interested in seeing further research into how discussions are framed differently between informational and conspiracy news and how these differences could be generalized to other forms of social media news such as fake news.

Automated Hate Speech Detection and the Problem of Offensive Language

Overview

This paper describes a group of researchers who attempt to use crowd-sourced data to train a multi-class classifier to distinguish between hate speech, offensive language, and normal speech. The authors are very careful to define hate speech as “language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult a member of the group” to highlight what they see as the difference between hate speech and offensive language. The authors crowd-source a lexicon of hate speech, a sentiment lexicon, and hand-classified tweets to train their classifier. The authors state that their best-performing model has a precision of .91, but that around 40% of hate speech was misclassified. The paper concludes that although their classifier was moderately successful, there is often a conflation of offensive language with hate speech, and that further work should be done to utilize algorithms to differentiate between the two.
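
The pipeline described above (hand-labeled tweets, text features, a multi-class classifier) can be approximated with a short scikit-learn sketch. This is a minimal illustration, not the authors' exact setup: the file name, column names, and hyperparameters below are assumptions.

    # Minimal sketch of a three-class tweet classifier in the spirit of the paper
    # (TF-IDF features + regularized logistic regression). The CSV file and its
    # columns ("tweet", "label") are placeholders, not the authors' data release.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("labeled_tweets.csv")  # columns: tweet, label in {hate, offensive, neither}
    X_train, X_test, y_train, y_test = train_test_split(
        df["tweet"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3), min_df=5, sublinear_tf=True),
        LogisticRegression(max_iter=1000, class_weight="balanced"))
    model.fit(X_train, y_train)

    # Per-class precision/recall makes the hate-vs-offensive confusion visible.
    print(classification_report(y_test, model.predict(X_test)))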

Reflection

  • The paper begins by describing hate speech as language that targets disadvantaged social groups in a way that is potentially harmful to them. I think this framing introduces long-term challenges, as the classifier will have to keep up with disadvantaged social groups that are always changing and evolving. When the researchers give their formal definition, “language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group”, I think new issues arise because this definition is hard to differentiate from the vast majority of insults. To illustrate these issues, look no further than the ambiguity of hate speech throughout the paper: several times the authors mention inherent bias or that their sources of data conflicted about what counted as hate speech. I would be interested in research into classifying hateful sentiment rather than hate speech, as the latter is too susceptible to only learning the current slurs and target groups of today without being general enough to use in the future.
  • To decide what tweets to collect the researchers used a “lexicon of words and phrases identified by the internet as hate speech compiled by Hatebase.org”. The use of such a lexicon introduces a large amount of human bias into the study and focuses the classification on the words or slurs we call hate speech today instead of the overall sentiment of hate speech, which can be expressed in ways we have yet to see. I think research into the quality of the words and phrases identified on Hatebase.org could shed light on the quality of studies that rely on such lexicons and potentially advance our efforts to classify hate speech.
  • After collecting the tweets the researchers had each tweet manually classified as hate speech, offensive language, or neither by a service called CrowdFlower. As with the use of Hatebase.org, having the tweets manually classified adds a large amount of bias to the study and also introduces the possibility of human error if some tweets are misclassified, an issue that comes up in the authors' discussion. Research into the classification of hate speech that does not rely on human classification may be impossible, but the use of human classification will always introduce the possibility of error and bias.
  • The authors note in their results that 40% of hate speech was misclassified and that the tweets with the highest predicted probability of being hate speech were the ones with multiple slurs or racist terms. This let horrible instances of hate speech slip by simply because they were written without slurs or specific key words. I would be interested in seeing how much their classifier relies on key words or phrases and if there is a way to classify speech as hate speech without relying on such things.
  • The authors note that the classifier is better at detecting hate speech targeted at specific groups, such as “anti-black racism and homophobia”, but had a harder time detecting hate speech aimed at other groups such as the Chinese. This further illustrates in my mind that the classifier relies too much on key words or terms. Since the social groups that are the targets of hate speech are constantly changing and evolving, relying on group- or time-specific slurs to detect hate speech is not a long-term solution. Further research into what defines hate speech beyond specific groups, slurs, or key words could generalize our classification to be useful in a wider context and for a longer period of time.


Reading Reflection #3 – [2/04/2019] – [Taber, Fisher]

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary:

This paper analyzed the most popular videos on YouTube about the Zika outbreak in 2016. The authors wanted to analyze how the user responses varied on informational versus conspiracy videos. The research questions that they wanted to answer were:

1. What type of Zika-related videos (informational vs. conspiracy) were most often viewed on YouTube?
2. How did the number of comments, replies, likes and shares differ across the two video types?
3. How did the sentiment of the user responses differ between the two video types?
4. How did the content of the user responses differ between the video types?

The team found that “results on user activity showed no statistically significant difference across the video types”.

Reflection:

My first impression of this paper was that it did not really accomplish anything. I think there were about seven times that the authors said that there were no findings that were statistically significant across the video types. I understand that finding nothing significant should still be published so that others will know what to look for in future work, but I think this paper was based on too small a sample size. I don’t think 35 videos was enough data points to gather. I would have been more comfortable with the authors finding no statistical significance if a larger number of videos had been analyzed.

Future Work:

Expand this work to include multiple conspiracy videos on a variety of different sites.

This would address my main concern with the paper: if multiple conspiracy and informational videos on a variety of topics were covered, instead of a very niche topic, the study would most likely yield more insightful results.

Can you determine the validity of a video based on the interactions between users?

If you study a large data set of videos and comments, would it be possible to tell which videos are conspiracy vs. informative based on the community comments?

Automated Hate Speech Detection and the Problem of Offensive Language

Summary:

In this paper the authors wanted to find out if there was a way to separate offensive language from hate speech, since offensive language can often be classified incorrectly by algorithms as hate speech because it contains many lexical similarities.

Reflection:

I thought this paper did a nice job explaining the methods behind the machine learning that they were doing, but the models did not seem very accurate. I think computers might still be a little way off from being able to classify different types of language correctly. I also think it would have been interesting to see how algorithms from other Python libraries, such as TensorFlow or Keras, stacked up, to see how ‘deep learning’ would do on this data set.

Future Work:

  • Finding algorithms that will be able to filter offensive and hate-speech posts and feed posts that fall in a gray area to human moderators.


[Reading Reflection 3] – [2/4] – [Henry Wang]

Article 1: “Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos”

Summary

In this paper, the researchers analyze the differences within a dataset of Zika-virus videos. The videos that were analyzed were relatively popular, at 40,000 or more views, and were broadly classified into two distinct groups: informational and conspiracy theory. The main research questions are focused on the differences between the two groups of videos as well as the differences between the reactions to the two groups of videos. The investigators of this paper used quite a few different analysis methods; in particular, topic modeling and semantic network analysis were used for comment/reply analysis.
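
To make the comment/reply analysis step concrete, a topic model over comment text could look roughly like the sketch below. This is a generic LDA example with placeholder comments, not the authors’ actual pipeline or settings.

    # Illustrative LDA topic model over video comments (placeholder data, not the
    # authors' method). Each comment is treated as one document.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    comments = [
        "zika is spread by mosquitoes in brazil",
        "this outbreak was engineered, classic conspiracy",
        "mosquito control and vaccines are the real answer",
        "the report links the brazil outbreak to mosquitoes",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(comments)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term)

    # Print the top words per topic to see how comments cluster around sub-topics.
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:6]]
        print(f"topic {k}:", ", ".join(top))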

Reflection

This article was an interesting change of pace, focusing now on conspiracy theories as they relate to the real interpretation of events. This particular topic has always been interesting to me and I definitely feel like it could be a potential research topic. Conspiracy theories involve far-fetched ideas, for example that the world is flat or that Australia does not exist. Anyone can say those words, but how can we analyze the behaviors and sentiments of those who buy into those theories? This is clearly a very tough question to answer, and based on the results of this paper it is disappointing to see that the investigators did not find significant differences.

One issue I found with the paper is that the researchers never explain whether or not trolls may impact the comment analysis. The researchers cleaned up the comment sections by doing typical things such as removing punctuation, making words lowercase, etc., but they do not account for troll interactions with videos. YouTube’s comment section is unmoderated, for the most part. How can the researchers be sure the comments that they analyzed were authentic?
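
For reference, the kind of clean-up described (lowercasing, stripping punctuation) might look roughly like this generic sketch; it is not the authors’ actual preprocessing code, and notably it does nothing to filter out trolls.

    # Rough sketch of typical comment clean-up (lowercase, drop URLs and punctuation,
    # collapse whitespace); not the authors' code.
    import re
    import string

    def clean_comment(text: str) -> str:
        text = text.lower()
        text = re.sub(r"http\S+", " ", text)  # drop URLs
        text = text.translate(str.maketrans("", "", string.punctuation))
        return re.sub(r"\s+", " ", text).strip()

    print(clean_comment("ZIKA is a HOAX!!! see http://example.com ..."))
    # -> "zika is a hoax see"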

Additional Questions

  • What differences in user reactions would we see if we analyzed posts that referenced these Zika videos from another platform (for example Facebook/Reddit post linking to the video)?
  • YouTube’s recommender system is personalized so people who engage with specific content see related videos recommended, such as conspiracy-based videos. How can we stop the spread of misinformation in a platform like YouTube?


Article 2: “Automated Hate Speech Detection and the Problem of Offensive Language”

Summary

This article addresses automating hate-speech detection using a classifier that labels tweets as hate speech, offensive but not hate speech, or neither. Previous studies have combined the first two categories into one broad category, and though there is no official definition of hate speech, the researchers in this paper attempt to build a classifier that can accurately distinguish between the three categories. The investigators tried different models and finally proceeded to use a logistic regression model for the dataset.

Reflection

The discussion of the model used was relatively brief, and knowing that this is a research paper I would have liked to know more about why the investigators chose to first test “logistic regression, naïve Bayes, decision trees, random forests, and linear SVMs”, because to me it’s not immediately clear what all of these models have in common and how they are all suitable choices for the data.
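
One plausible reading is that these are simply the standard supervised baselines for sparse text features, and they are cheap to compare side by side with cross-validation. The sketch below does exactly that on a stand-in corpus (20 Newsgroups), with illustrative settings only; it is not the authors’ experiment.

    # Illustrative comparison of the model families the paper lists, via 5-fold
    # cross-validation on TF-IDF features. The 20 Newsgroups corpus is a stand-in.
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import LinearSVC

    data = fetch_20newsgroups(subset="train",
                              categories=["talk.politics.misc", "rec.sport.hockey"])
    X = TfidfVectorizer(min_df=5).fit_transform(data.data)
    y = data.target

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": MultinomialNB(),
        "decision tree": DecisionTreeClassifier(),
        "random forest": RandomForestClassifier(n_estimators=100),
        "linear SVM": LinearSVC(),
    }
    for name, clf in models.items():
        scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
        print(f"{name}: macro-F1 = {scores.mean():.3f}")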

What was most interesting to me was the fact that with the researchers’ model they discovered that rare types of hate speech were incorrectly classified. This is most interesting to see because it seems like the model would only be useful for classifying whether a tweet is hateful based on whether or not it’s one of the more prevalent types of hate speech on Twitter. Future work should focus on identifying causes of this misclassification and give more discussion on this problem instead of referencing another researcher’s paper that had the same issue.

Additional Questions/Research

  • Future work could be to combine aspects of a Twitter user such as age of account, location, etc. with this analysis and see if this contributes to better or worse classification.
  • Twitter is a social network platform, so naturally people might monitor their language. How can we apply these same techniques to gaming platforms that have online chatrooms and similar environments to automate hate-speech detection, and are the systems in place (e.g. auto banning) sufficient?
  • Can we use a similar approach of hate-speech detection for verbalized content (e.g. voice chat in videogames)? 




Reading Reflection #3 – 02/05/2019 – Jacob Benjamin

Automated Hate Speech Detection and the Problem of Offensive Language

            Davidson, T. et al (2017) studied the detection of hate speech and offensive language on social media platforms.  Previous lexical detection methods failed to separate offensive speech from hate speech.  The study defined hate speech as: language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.  This definition allows for the classification of offensive language that uses words other models would normally classify as hate speech.  The model was formed by having a collection of CrowdFlower workers classify a sample of tweets as hate speech, offensive speech, or normal speech.  It was found that creating additional classifications for different types of speech (for example, offensive speech) leads to more reliable models.  While this study found methods to increase reliability in some cases, it also addresses ways that the model could be improved with further research.

            While I found the paper to be fascinating, and most definitely an area in which we need further research, I found a few issues with the methods used in gathering and interpreting the data fed into the model, as well as with the model itself.

  • Judging from the results, especially with sexism being classified as offensive language rather than hate speech, there is likely to be a human-created bias in the classification of the tweets.  This may be due to a non-random sample provided by CrowdFlower.  Another possible, albeit unfortunate, explanation is that sexism is commonplace enough that it is not regarded as hate speech.  Either way, the bias of the people analyzing and feeding data into models is something that should be explored further (see Weapons of Math Destruction for further insight into this issue).
  • Another issue is that the coders appear to be prone to errors as well, which can affect the reliability of the model.  Davidson, T. et al (2017) found that there was a small number of cases where the coders misclassified speech.  This concern is somewhat related to the previous point I made.  Using convenience sampling (CrowdFlower, M-Turk, etc. are arguably forms of convenience sampling) introduces threats to internal validity.  Thus, with convenience sampling, there is the possibility of introducing bias into the study, as well as reducing the generalizability of the results.
  • The lexical method appears to look more at word choice than at the context of word choice, which led to a large majority of the misclassifications in the model.  While the method can be used as part of a more holistic approach, it seems like a whole new method should be explored.  Perhaps a potential approach is classifying the likelihood of an individual user posting hate speech, thus creating more of a predictive model, although the ethical implications of such a model would need to be explored first.

Overall, I found Davidson, T. et al (2017) to be thought-provoking and a source of additional research directions.

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

            Nerghes, A. et al (2018) explored the user activity differences between informational and conspiracy videos on YouTube, specifically related to the 2016 Zika-virus outbreak.  The study sought to answer the following questions:

  • What type of Zika-related videos (informational vs. conspiracy) were most often viewed on YouTube?
  • How did the number of comments, replies, likes and shares differ across the two video types?
  • How did the sentiment of the user responses differ between the two video types?
  • How did the content of the user responses differ between the video types?

The study inspected 35 of the most popular Zika-virus related videos to answer these questions.  It was found that there were no statistically significant differences between informational and conspiracy videos, including no differences in the number of comments, replies, likes, and shares between the two video classifications.  It was also found that users respond differently to sub-topics.

            One of the largest questions the Nerghes, A. et al (2018) study raised for me was:

  • Are these results generalizable to other topics?  While the Zika-virus outbreak was significant, I have to ask how many people were truly invested in pursuing new information.  I find it likely that given other issues, the results could be different. 

Assuming that the results are generalizable to additional topics, the study continues to raise additional questions:

  • Disregarding the fact that this study, and the associated results, were directed towards the health field, what can be done to increase user engagement?
  • The next question we must ask is whether or not we should attempt to direct traffic away from conspiracy videos.  This question further depends on whether or not discussions on conspiracy videos yield positive results.  It would be interesting to explore what comes of non-toxic engagement with items we would label as fake news or conspiracy theories, and what some of the best approaches to starting and maintaining those discussions would be.

Overall, I find that there are many new directions that could be pursued related to this subject.

Works Cited:

Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. Retrieved February 3, 2019, from https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15665

Nerghes, A., Kerkhof, P., & Hellsten, I. (2018). Early Public Responses to the Zika-Virus on YouTube. Proceedings of the 10th ACM Conference on Web Science – WebSci 18. doi:10.1145/3201064.3201086


Reading Reflection #3 – 02/04 – Alec Helyar

Summary

This paper focused on analyzing the social media response to the initial outbreak of the Zika-virus. More specifically, the paper compares the user activity and sentiment on informational versus conspiratorial content on YouTube. The researchers hoped to determine the difference between the virality and sentiment of the video types and their comments. Among 35 videos related to the Zika-virus, the researchers gathered 20,745 user comments. For each video, the researchers extracted the meta-information and category of the video. From the comments, the researchers performed sentiment analysis and topic model generation. The researchers discovered that the virality and sentiment of the users were very similar across the two types of videos. They also discovered, though, that the content of the users’ responses was different across the two types.
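
As a rough illustration of the comment-level sentiment step, a rule-based scorer such as VADER could be applied per comment, as in the sketch below. This is a stand-in example and not necessarily the tool or settings the authors used.

    # Generic comment-level sentiment scoring with VADER (illustrative stand-in,
    # not necessarily the authors' tool).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    comments = [
        "Thank you for explaining the symptoms so clearly.",
        "This outbreak was engineered, wake up people!",
    ]
    for c in comments:
        # compound score is in [-1, 1]; negative values indicate negative sentiment
        print(f"{sia.polarity_scores(c)['compound']:+.3f}  {c}")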

Reflection

I found it interesting that the researchers only used a sample size of 35 videos. At first, I wondered whether this was simply because there were only 35 videos to be found. Further down the article, I picked up that the researchers used every video they found under a string search on July 11th. I suppose the researchers wanted to only observe the videos produced early on in the outbreak, but why didn’t they simply wait until after the outbreak and use the API to only find videos within a certain time frame? There could have been many more videos to use. Why was it necessary that the researchers observe the early period of the outbreak? It didn’t seem to contribute to their research questions.

I also wondered about the data processing choices that they made. It was odd to me that the researchers looked at the raw values of views, comments, replies, likes, dislikes, and shares when analyzing the virality of the videos. After all, it’s inherent within the definition of virality that there is an “explosion” of popularity. To me, the researchers should have taken the log of these heavily skewed values. Luckily, this choice did not impact the results of their findings, since the researchers chose to use Spearman correlations instead of Pearson.
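
The reason the choice is harmless is that Spearman correlation only uses ranks, and a log transform is monotonic, so the ranks (and hence the correlation) are unchanged, while Pearson would shift. A quick check on synthetic heavy-tailed counts (illustrative data only):

    # Check that a log transform changes Pearson but not Spearman correlation,
    # using synthetic heavy-tailed "view count"-like data.
    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    rng = np.random.default_rng(0)
    views = rng.lognormal(mean=10, sigma=2, size=200)
    comments = views * rng.lognormal(mean=-4, sigma=1, size=200)  # noisy, roughly proportional

    print("Pearson  raw vs. log:",
          round(pearsonr(views, comments)[0], 3),
          round(pearsonr(np.log(views), np.log(comments))[0], 3))
    print("Spearman raw vs. log:",
          round(spearmanr(views, comments)[0], 3),
          round(spearmanr(np.log(views), np.log(comments))[0], 3))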

Overall, I found the paper to be well handled. The topic was fairly specific in scope, so I’m not sure if the findings ended up being impactful. As far as future uses for the research go, it would also have been interesting if the researchers could have spoken to the ability to classify conspiratorial vs. informational videos or comments using ML, rather than categorizing them manually. Perhaps something along these lines could result in more impactful findings.


[Reading Reflection #3] -[02/04] – [Kyle Czech]

Title:

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Brief Summary:

This paper analyzed the content of the most popular videos posted on YouTube in the first phase of the Zika-virus outbreak in 2016, and the user responses to those videos. The results show that 12 out of the 35 videos in the data set focused on conspiracy theories, but no statistical differences were found in user activity and sentiment between the two types of videos. The content of the user responses shows that users respond differently to sub-topics related to the Zika-virus.

Reflection:

There were a few memorable quotes in this paper that stood out to me:

Quote 1: “YouTube videos have been discussed as a source of misinformation on various health related issues such as….”

Reflection 1: I found this interesting, as YouTube is notorious for taking down videos due to copyright issues; however, is it YouTube’s responsibility to maintain the integrity of its videos as well? This debate can build on Facebook’s current issue with fake news and whether it should be held accountable for the fake accounts that are allowed to post the things that they do to mislead people.

Quote 2: “… in an analysis of videos related to rheumatoid arthritis, 30% was qualified as misleading, …”

Reflection 2: I found this line interesting, as the article doesn’t clarify what would label a video as “misleading”. For example, if there was only one line in the entire video that wasn’t accurate, does that make the entire video “misleading”, or is there a certain amount of content needed in the video that is false in order for it to be labeled as “misleading”?

Future Research:

The following ideas were thought of to expand the research of this paper:

  • Future research in the use of detecting sarcasm – Given that the article examined conspiracy theories in relation to real news, there is a probability that some of the comments on the conspiracy videos were marked as “positive”, although a viewer might have been sarcastic (a more negative intention). Some conspiracy videos are very “out there” and some viewers might be simply commenting on that aspect of the conspiracy video in a more teasing way.
  • Exploring the relationship between news outlets reporting about a topic and the number of conspiracy videos – Some future research that might be interesting is analyzing the trend in conspiracy videos against how frequently (either by amount of time or number of videos) news outlets cover an outbreak. It would be interesting to see if there is a connection between media over-reporting and the number of more “creative” theories about an outbreak.

Title:

Automated Hate Speech Detection and the Problem of Offensive Language

Brief Summary:

The authors used crowd-sourcing to label a sample of tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither, and then trained a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when hate speech can be reliably separated from other offensive language and when this differentiation is more difficult.

Reflection:

There were a number of quotes that stood out to me:

Quote 1: “Both Facebook and Twitter have responded to criticism for not doing enough to prevent hate speech on their sites by instituting policies to prohibit the use of their platforms for attacks on people based on characteristics like race, ethnicity, gender, and sexual orientation, or threats of violence towards others”

Reflection 1: After reading this quote, I wondered if their new policy would extend to the spread of known fake news sites that have their own accounts on these social media platforms. If the fake news that they spread on these platforms leads to attacks on people based on the above criteria, would that be a violation of the policy? Although it’s not directly linked to the policy above, the argument could be made that it certainly has the ability to cause issues depending on the content of the fake news released.

Quote 2: “… people use terms like h*e and b*tch when quoting rap lyrics…”

Reflection 2: I wonder to what extent the program will begin to identify lyrics when it comes to hate speech. For example, if I was just quoting a song and how it relates to my life on Twitter, it’s not hate speech, although I might use those words. However, if I directed it at someone who I got into an argument with recently, using the “@” symbol and the above words, would it be able to identify that as possible hate speech based upon the context of the lyrics and who my directed audience is?

Future Research: 

Future work that could build on this research is looking into the specific targets of the slang that certain cultural groups use, such as “n*gga”, and whether it might be classified as hate speech if the person using it isn’t from the background associated with it. Although that kind of language is used and deemed “socially acceptable” when used by some cultures, it becomes more subjective when other cultures use that kind of language, depending on the intended target of the message.


Reflection #3 – 02/04 – Tucker Crull

Early Public Responses to the Zika-Virus on YouTube: 

Summary: 

In this study, the authors analyzed the content of the most popular videos posted on YouTube in the first phase of the Zika-virus outbreak in 2016 and how users responded to those videos. They examined how much informational and conspiracy theory videos differ in number of comments, shares, likes and dislikes, and in the sentiment and content of the user responses. Their study shows that there are no statistical differences in user activity and sentiment between the two types of videos. They also found that the user response content was different between the two video types, but that the users of the videos do not engage in conversation.

Reflection: 

The low engagement of YouTube users viewing Zika-virus related content is an important finding, showing that these users express their opinion in their responses without further participating in conversations: Getting users to engage and interact with social media content is probably one of the hardest and most valuable things that a content creator can do. So, it’s not surprising that the users only post their opinion in the comment section. It would be interesting to see if users engage more with this topic on other social media platforms. It would also be fascinating to see how the engagement would change if the creator of the video asked a question at the end of the video.

To counter the spread of misinformation, the monitoring of the content posted on YouTube deserves more attention by health organizations: I feel like monitoring content on YouTube would be a very costly endeavor for health organizations, because there is no good way of finding the misinformation in a video. I think a better solution for fighting misinformation is the authors’ suggestion that “online health interventions can be targeted on the most active social media users.” However, this solution has the potential problem that the interventions could give the misinformation a bigger platform.

Automated Hate Speech Detection and the Problem of Offensive Language:

Summary: 

This study set out to solve a key challenge for automatic hate-speech detection on social media: how to separate hate speech from other forms of offensive language. The challenge comes from the fact that lexical detection methods tend to have low precision because they overclassify messages as hate speech. The results show that racist and homophobic tweets are more likely to be classified as hate speech, that sexist tweets are generally classified as offensive, and that tweets without hate keywords are harder to classify.

Reflection:

I found it very helpful that the paper shows examples of how hard it is to distinguish hate speech from offensive language. I thought this paper did a great job reflecting on and finding reasons why some tweets were misclassified, like how they found a few recurring phrases, such as “these h*es ain’t loyal”, that were actually lyrics from rap songs that users were quoting. This is really interesting because I believe that our society is trying to eliminate hate speech but also listens to popular rap music that promotes hate speech.

We also found a small number of cases where the coders appear to have missed hate speech that was correctly identified by our model:

I thought it was surprising that the coders missed hate speech, because their classifications are what the model is trained on. This could have led to the study results being off, or could have improved the results.


Reflection #3 – 02/04 – Heather Robinson

Automated Hate Speech Detection and the Problem of Offensive Language

Summary

In this paper, the authors used crowdsourcing to classify tweets into a few categories: those which are hate speech, those which are offensive but not hate speech, and those which are neither. From this data, they trained a pretty accurate model for detecting hate speech on Twitter.

Reflection

I found it very helpful and interesting how the authors went over the common pitfalls of previous models similar to their research. This probably gave them the insight to make their model much more accurate in the first place. I could tell that they made many considerations during the data selection process based on these findings from previous projects, which made the paper’s results much more trustworthy.

Additionally, I would like to point out that while the authors noted some of the features that they looked at initially, they never wrote about which features were used in the final model. I think it would be useful to know which statistics were the deciding factors for hate speech.
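
For a linear model like the logistic regression the paper describes, one way to get at this would be to inspect the learned per-class weights. The sketch below assumes a fitted scikit-learn pipeline of a TfidfVectorizer followed by a LogisticRegression on the three classes; the step names and the `model` variable are assumptions, not anything the authors published.

    # Sketch: list the highest-weighted features per class in a fitted
    # TfidfVectorizer + LogisticRegression pipeline (names here are assumptions).
    # Assumes a multi-class (three-label) model so coef_ has one row per class.
    import numpy as np

    def top_features(pipeline, n=10):
        vec = pipeline.named_steps["tfidfvectorizer"]
        clf = pipeline.named_steps["logisticregression"]
        terms = np.array(vec.get_feature_names_out())
        for cls, coefs in zip(clf.classes_, clf.coef_):
            top = terms[np.argsort(coefs)[::-1][:n]]
            print(f"{cls}: {', '.join(top)}")

    # top_features(model)  # `model` would be the fitted pipeline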

Lastly, I would like to applaud the authors’ extensive reflection. Their analysis on where and why their model was strong or weak will definitely be very useful for others in the same field. I particularly found it thought-provoking how they noticed that “people identify racist and homophobic slurs as hateful but tend to see sexist language as merely offensive.” I will definitely use their reflection as a guideline for my future projects.

Further Questions

  1. Why are people generally likely to believe that sexist speech is less hateful than racist or homophobic speech?
  2. How can we improve on some of the weak points listed for their model?
  3. How could this model be used to improve research or social media in the future?
  4. How can we add more definition to the line between hate speech and merely offensive speech? Why is it so blurred, even for us humans?
  5. What kind of content on the Internet do these hateful users consume that may make them more likely to lash out at minority groups?

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary

In this paper, the authors look at 35 different YouTube videos’ analytics to find patterns between “informational” and “conspiracy theory” videos. They find that user engagement between these two types is actually very similar, which leads them to discuss public relations strategies for health organizations.

Reflection

I would primarily like to address the sample size that the authors used for this paper. Somehow, despite the infinitely vast array of content on YouTube, the authors only seemed to scrape up 35 videos total. I believe that if they want their study to be significant enough to give advice to world health organizations, they should’ve had a larger sample size.

Either way, I found the distribution of comment topics between video types to be pretty interesting. Informational viewers mostly spoke about the effects of the disease, whereas conspiracy viewers focused on the causes. This seems somewhat predictable considering the directions each video category takes, but it was intriguing nonetheless.

Further Questions

  1. How could this information be used by YouTube to possibly promote more factual content rather than conspiracies? Should YouTube be allowed to do that?
  2. How is this information useful to the general public?
  3. Should we take these results seriously considering the size of the data set?


Reading Reflection 3 – 2/04 – Yoseph Minasie

Early Public Responses to the Zika-Virus on YouTube: 

Summary: 

YouTube is a popular social media platform for sharing videos. Many of these videos cover a wide range of health topics. Out of these videos, some are very informative and helpful, while others, as on any other social media site, spread false information or conspiracy theories. This particular study had to do with YouTube videos regarding the Zika-virus in 2016, during the first phase of its outbreak. Videos were grouped based on whether they were informational or conspiracy. The empirical research questions had to do with comparing the views, replies, likes, and shares for each group, and the differences in sentiment and user response content between the two. The results concluded that there was almost no significant difference in user response or views. However, user response content was different between the two, and users of both types of videos don’t really engage in conversation.

Reflection: 

Users express their opinion in their responses without further participating in conversations:

One way to overcome this could be by asking questions towards the end of the video. This will promote engagement in conversations and more commenting as well. On Twitter, there’s a poll feature in which you can ask a question for up to 7 days and offer up to 4 different answers. People can respond, and after they do, the percentage of people who chose each answer is shown. This could be another great way to engage people about health issues.

Online health interventions can be targeted on the most active social media users

This could be a great way of combating misinformation that’s being spread by popular accounts. However, there might be some unforeseen obstacles. If the users are given too much attention, they might obtain a larger spotlight, which will increase their following. There are many people in today’s society who have spread misinformation and sparked outrage among many people. This popularity and reaction is then continued and their message is shared among more people. Even if the majority won’t agree with their message, it’s likely to reach a few people who feel well-aligned with it and boost their platform.

Some conspiracy theories can be entertaining to think about, and some of them have turned out to be true in the past, so it would be wrong to dismiss every theory. Facebook has recently been trying more forcefully to stop fake news on its site. It has partnered with fact-checkers, and if a post has false information tied to it, they don’t take it down; they just decrease its visual reach. This decreases the impact it will have on their users. This approach could be applied to many other social media sites, including YouTube. However, fact-checking a video might take longer and might introduce new obstacles.

Automated Hate Speech Detection and the Problem of Offensive Language:

Summary: 

Hate speech is a large issue on social media. Many forms of detection, such as lexical detection, overclassify many posts as hate speech, leading to low precision. The goal of this study was to better understand how to automate the distinction between hate speech, offensive language, or neither. After the best model was created and tested, the results concluded that future work must better account for context and heterogeneity in hate speech.

Reflection: 

We also found a small number of cases where the coders appear to have missed hate speech that was correctly identified by our model

I was surprised the workers misclassified the tweet mentioned, considering it takes three people to check one tweet. Since these human classifications are used to teach the final model, the model might group tweets into the wrong categories, leading to skewed results. Better human classification must be used in order to have a more accurate model.

We see that almost 40% of hate speech is misclassified 

31% of hate speech in the data was categorized just as offensive language. Context is important in social situations, as the report concludes. Another study could be created just based on AI reading context. Natural language processing has been applied to this problem, and there have also been numerous other approaches. In 2017, MIT scientists developed an AI that could supposedly detect sarcasm better than humans [1]. They too were trying to find an algorithm to detect hate speech on Twitter, but they concluded that the meaning of many tweets could not be properly understood without taking sarcasm into consideration. Future work could use the sarcasm detector to help with the recognition of hate speech.

Lexical methods are effective ways to identify potentially offensive terms but are inaccurate at identifying hate speech 

Again, context is very important and could provide extensive support for this study. Not only context in the sentence, but context in the conversation is also beneficial. A reply to a tweet doesn’t always tell the whole story. If there is a large list of replies to a comment, context might need to be read for each reply in order to understand the last reply. Also, some tweets reference current events without doing so explicitly, so that also has to be taken into account.

[1] https://www.breitbart.com/tech/2017/08/07/new-mit-algorithm-used-emoji-to-learn-about-sarcasm/


Reading Reflection #2 – [1/31/2019] – [Kayla Moore]

Summary: 

In this paper, researchers explored the stylistic and linguistic differences between real news, fake news, and satirical news. They used three different data sets with the first one being a BuzzFeed election data set, the second being the authors’ political news data set, and the last being the Burfoot and Baldwin data set of satire and real news stories. They concluded that there are significant differences between fake news stories and real news stories and that fake news stories resembled satirical news stories. 

Reflection: 

In June of 2017, Facebook announced an update that would help limit the spread of fake news on their site. They found that some users were sharing a lot of links to “low quality content such as clickbait, sensationalism, and misinformation” [1]. Along with other solutions, such as banning fake news sites from their ad-selling services, Facebook also ‘deprioritized’ these links so that they would show up less frequently on users’ timelines [1]. In their research, Horne and Adali discovered that fake news articles tend to be shorter and less complex than real news articles, amongst other findings. Upon reflecting on this research, some questions that are prompted are:

  • How could the analytic techniques in this article be used by social media companies in limiting the spread of fake news?  
  • How can we better provide resources to the public to distinguish between real and fake news? 

In the methodology section of the paper, Horne and Adali explained the limitations of their data sets and what they did with their own data set to counter these limitations. Their primary target was real news, fake news, and satirical news related to the United States election and other political news. As far as future work goes,

  • How does the style, language, and content of fake news differ in different regions, particularly, regions where English is not the dominant language?  
  • What other ‘genres’ show stylistic and linguistic similarities with fake news? 

An interesting finding in this study is the significant difference between the titles of fake news articles and those of real news articles. They found that fake news articles typically have longer titles and use what seem to be attention-grabbing techniques such as more capitalization, more proper nouns, and fewer stop words [2] (a rough feature sketch follows the questions below).

  • How do the titles of fake news articles and real news articles differ for the same story?  
  • How does political bias play a role in fake news? 
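
The title differences noted above can be made concrete with a small feature-extraction sketch. The features and example title below are illustrative only and are not Horne and Adali’s exact feature set.

    # Illustrative extraction of a few title features discussed above (length,
    # all-caps words, stop-word fraction); not the paper's exact feature set.
    STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "is"}

    def title_features(title: str) -> dict:
        words = title.split()
        n = max(len(words), 1)
        return {
            "n_words": len(words),
            "pct_all_caps": sum(w.isupper() for w in words) / n,
            "pct_stopwords": sum(w.lower() in STOPWORDS for w in words) / n,
        }

    print(title_features("BREAKING: You Won't BELIEVE What The Candidate Did Next"))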

According to the study, fake news requires a lower education level to read, has less substantial information, and persuades through heuristics rather than arguments [2]. They also acknowledge that the purpose of fake news is to spread misinformation. It seems as if fake news specifically targets people with lower levels of education who are less inclined to check the credibility of these stories.

[1] https://www.usatoday.com/story/tech/news/2017/06/30/facebook-aims-filter-more-fake-news-news-feeds/440621001/ 

[2] https://arxiv.org/abs/1703.09398 
