Reading Reflection #4 – [02/07/2019] – [Chris Blair]

In this article, perhaps the first and biggest flaw that jumped out at me was the seeming disconnect in how the authors picked the baseline channels. I agree with their selection of right-wing channels, and I realize they do not consider the baseline neutral in thought or politics. However, some of the baseline channels, particularly “Vice News” and “The Young Turks,” can be considered extremely left-wing with their own flavor of conspiracy theories, which risks confusing attribution for any classifier trying to measure levels of hate speech relative to the baseline. I would almost go as far as to say the authors may have picked these baseline channels in order to make their results seem more plausible; would it not have been fairer to take mainstream media outlets’ YouTube channels and show the difference there? “Anonymous Official” spins vast webs of conspiracy and falsehood that sometimes trade in lies and focus on hate and discrimination, and “DramaAlert” is merely a YouTube tabloid; I could drone on and on, but the writers should have put more time into constructing their basis for comparison, because it damages results that are actually pretty telling. The topic words are more proof of this false equivalency: while most of the right-wing topic words focus on news interpretation, many of the baseline’s topic words stay within the realm of YouTube itself, seeking only to create, transform, or communicate YouTube drama within the platform to make money through controversy. In short, the combination of a weak baseline and the troubling topic words found within their own corpus leads me to conclude that the baseline should really mirror actual news sources, since the current channels lessen the accuracy of the comparison by pursuing tangentially different objectives than the right-wing channels do.

However, I appreciate what this study has done because it gives us a good foundation for future projects. I really want to advance the cause of mental health through data analytics, so I think extending this study would be worthwhile. Starting from the topic words later in the paper, a similar approach could take posts and comments from YouTubers’ videos on various subjects and examine their sentiment: conduct the three-fold analysis described in this paper on the captions, headlines, and comments of popular YouTube channels within certain age brackets, particularly 13-15, 16-18, and 19-21, which are likely the most representative populations on YouTube currently. First, we would take the most popular channels, run sentiment analysis on the captions, comments, and headlines, determine the median and the outliers, and classify the baseline as a mix of good and bad sentiment; then we can build our test groups out of the overwhelmingly positive and overwhelmingly negative channels. Second, applying the same three-fold analysis and topic modeling the paper outlines, we can build the topic matrix and look for troubling words within these captions, comments, or headlines. People in the 13-21 age bracket often leave these distressing comments as a form of escapism, which leads to more isolation as they keep trying to escape their isolation and can eventually spiral into depression and social withdrawal. Finding these comments would allow us to offer these people help and support before their condition worsens, and it lets us widen our net as we tighten the algorithm.
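As a rough illustration of that first step, here is a minimal sketch, assuming the comments have already been collected: it scores each channel’s comments with VADER sentiment, then uses the median and IQR to separate a mixed-sentiment baseline from overwhelmingly positive or negative candidate test groups. The channel names and comments are hypothetical placeholders, not real data.

```python
# Minimal sketch: per-channel comment sentiment plus simple outlier detection.
# Assumes comments are already collected; channels and comments below are
# hypothetical placeholders.
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

channel_comments = {
    "ChannelA": ["I love this, it really helped me", "great video"],
    "ChannelB": ["this made me feel worse", "nobody cares about us"],
    "ChannelC": ["interesting take", "not sure I agree but ok"],
}

# mean compound score per channel, in [-1 (negative), +1 (positive)]
channel_scores = {
    ch: float(np.mean([analyzer.polarity_scores(c)["compound"] for c in comments]))
    for ch, comments in channel_comments.items()
}

scores = np.array(list(channel_scores.values()))
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1

# channels far outside the interquartile range become candidate "good"/"bad"
# test groups; everything else forms the mixed-sentiment baseline
outliers = {ch: s for ch, s in channel_scores.items()
            if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr}

print("median sentiment:", np.median(scores))
print("candidate test-group channels:", outliers)
```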


Reading Reflection #4 2/7/18 Tucker Crull

 Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination

Brief Summary:

This paper sets out to study issues related to hate, violence and discriminatory bias in right-wing YouTube videos. The authors of the paper used as a baseline the “ten most popular channels (in terms of numbers of subscribers on November 7, 2017) of the category ‘news and politics’ according to the analytics tracking site Social Blade.” They then used this baseline to analyze twelve right-wing “news” channels that they selected.  The research questions they wanted to answer were the following:

Is the presence of hateful vocabulary, violent content and discriminatory biases more, less or equally accentuated in right-wing channels?

 Are, in general, commentators more, less or equally exacerbated than video hosts in an effort to express hate and discrimination? 

The authors found that right-wing channels tend to have a higher number of words from “negative” semantic fields, more content directed towards war and terrorism, and more discriminatory bias towards LGBTQ individuals and Muslims.

Reflection:

  • The paper notes: “Alex Jones’ YouTube channel had more than 2 million subscribers as of October 2017. As stated in an article in The Guardian (24), ‘The Alex Jones Channel, the broadcasting arm of the far-right conspiracy website InfoWars, was one of the most recommended channels in the database of videos’ used in a study which showed that YouTube’s recommendation algorithm was not neutral during the presidential election of 2016 in the United States of America (23, 25). At the moment of our data collection, Alex Jones expressed support to 12 other channels in his public YouTube profile. We visited these channels and confirmed that, according to our understanding, all of them published mainly right-wing content.”
    • I thought it was interesting how the authors chose the channels they were going to study. The Alex Jones Channel makes sense, and its inclusion is defended by it being one of the most recommended channels in the database of videos. However, for the other 12, I think it was a little biased to defend the selections by saying only “according to our understanding, all of them published mainly right-wing content.”
    • Another choice I thought was odd was including “YouTube Spotlight” and “YouTube Spotlight UK” in the baseline, because the authors specifically mention the possibility that YouTube itself is politically biased.

Future Work:

I would love to see a study like this being done on left wing channels, comparing the results to those found on the right-wing channels.


Reading Reflection 4 – 2/6/2019 – Jonathan Alexander

Overview

This paper analyzes a selection of right-wing YouTube channels, aiming to ascertain whether the presence of hateful vocabulary, violent content, and discriminatory biases is more accentuated in right-wing channels and whether commentators are more or less exacerbated than video hosts. The authors first selected a sample of YouTube channels they identified as right-wing and a baseline selection of the top ten YouTube channels in the news and politics category. They analyzed the comments of the videos and the videos themselves for hate, violence, and bias using a threefold method of lexical, topic, and implicit bias analysis. From their analysis and data, the authors drew several conclusions: that right-wing channels are more specific in their content, have a negative bias towards Muslims, and that commenters on right-wing YouTube channels are more exacerbated than the hosts of the videos.

Reflection

  • The paper focuses on “right-wing” content on YouTube. Their definition of right-wing seems to be based on their personal opinions and a connection to InfoWars, “a right-wing news site founded by…” This definition seems lacking to me, and the view of the paper seems one-sided. They also state, “It is valuable to investigate whether behaviors connected to hate, violence, and discriminatory bias come into sight in right-wing videos,” but they never explain what this value is or what conclusions they hope to draw. I would instead choose to inquire into the presence of hate, violence, and discriminatory bias in partisan YouTube channels generally.
  • The paper creates its sample of right-wing YouTube channels by examining channels connected to Alex Jones, which the authors said they “visited and confirmed that, according to our understanding, all of them published mainly right-wing content”. In total, the researchers collected 13 YouTube channels. I think this method of sample selection introduces a huge amount of bias into the study, as it relies on the researchers’ own opinions and verification of what counts as right-wing.
  • The researchers’ lexical analysis made use of semantic fields to categorize words in the captions or comments as positive or negative. They selected a collection of categories and grouped them as positive or negative; if a word fit a semantic category in the positive or negative grouping, it added to the corresponding word count for that source. The researchers chose far more categories for the negative grouping than for the positive one, and it seems possible that this skewed their results. From 194 total categories, they chose “15 categories related to hate, violence, discrimination and negative feelings, and (b) 5 categories related to positive matters in general.” It seems to me that they cherry-picked these categories to support what they are trying to showcase in this article instead of using a representative sampling of semantic fields. I think a lexical analysis of a wide variety of political news YouTube channels using all of the Empath semantic field categories (see the sketch after this list) could provide insight into implicit bias or sentiments embedded in our political process.
  • The authors point out that some of the top-ranked topics for right-wing videos had to do with war and terrorism, which the authors categorize as negative. However, these issues are hallmarks of the right-wing platform, and their presence alone does not indicate hate, violence, or bias. Instead, I think such topic analysis across all news sources could give some insight into the use of fear in media to persuade, influence, or simply sell more media.
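Picking up the suggestion above about running all of the Empath categories instead of a hand-chosen subset, here is a minimal sketch using the Empath library; the example caption is a hypothetical placeholder, and the paper’s actual pipeline may differ.

```python
# Minimal sketch: score a caption or comment against all of Empath's built-in
# semantic categories rather than a hand-picked positive/negative subset.
# The example text is a hypothetical placeholder, not data from the paper.
from empath import Empath

lexicon = Empath()
text = "The host warned that the attack would bring more war and suffering."

# With no category filter, analyze() scores the text against all ~194 built-in
# categories; normalize=True divides each count by the number of tokens.
scores = lexicon.analyze(text, normalize=True)

# Show the categories that actually fired, strongest first.
for category, value in sorted(scores.items(), key=lambda kv: -kv[1])[:10]:
    if value > 0:
        print(f"{category}: {value:.3f}")
```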


Reading Reflection 4 – 02/07 – Yoseph Minasie

Summary:

This study focused on right-wing YouTube videos that promote hate, violence, and discriminatory acts.  It compared comments to video content and right-wing channels to baseline channels while analyzing lexicon, topics, and implicit biases. The research questions were:

  • Is the presence of hateful vocabulary, violent content and discriminatory biases more, less or equally accentuated in right-wing channels? 
  • Are, in general, commentators more, less or equally exacerbated than video hosts in an effort to express hate and discrimination? 

The results concluded that right-wing channels usually contained a higher degree of words from negative semantic fields, included more topics relating to war and terrorism, and showed more discriminatory bias against Muslims in the videos and against LGBT people in the comments. One of the main goals of this study is a better understanding of right-wing speech and how the content is related to the reaction it provokes.

Reflection:

Even though this is an analysis, how would this research be used to better understand hate speech? Right-wing users post videos, and people who subscribe to them, or people with similar views and ideals, watch these videos and respond with their own similar viewpoints; the same goes for non-right-wing users. This research doesn’t add much to the discussion of understanding the rise of alt-right extremism in relation to the internet. However, the methodology used in the study was interesting and very beneficial, particularly the WEAT. Other studies can also benefit from using a multi-layered investigation to better understand context.

YouTube’s recommendations often lead users to channels that feature highly partisan viewpoints 

I’ve read about how YouTube’s algorithm does this in order to keep users online. An example of this occurred after the Las Vegas shooting, when YouTube recommended to viewers videos claiming the event was a government conspiracy [1]. YouTube has since changed its algorithm, but it still suggests some highly partisan and conspiracy content. A future study could examine the number of videos it takes for YouTube to recommend a highly partisan viewpoint or conspiracy theory. This could help the YouTube engineers who work on the suggestion algorithm better understand how and why this occurs and help mitigate it.

It is important to notice that this selection of baseline channels does not intend to represent, by any means, a “neutral” users’ behavior (if it even exists at all) 

I thought this idea was interesting. There’s bias all around us, but some people unintentionally or intentionally turn a blind eye to it. A future study could look at the implicit biases of popular sites, such as social media and news sites, and the different types of viewers’ biases. This could help open people’s eyes and also help them take their own implicit biases into account before acting. This may not create “neutral” behavior in users, but it might help users become more neutral.

[1] https://www.techspot.com/news/73178-youtube-recommended-videos-algorithm-keeps-surfacing-controversial-content.html


Reflection #4 – 02/06 – Heather Robinson

Analyzing Right-wing YouTube Channels: Hate, Violence, and Discrimination


Summary

The authors used a multi-layered analysis to compare (1) video captions vs. comments and (2) right-wing captions and comments vs. baseline captions and comments.


Reflection

Overall, I think this paper was awesome. There were few to no flaws in the data collection process, the authors explained how they arrived at their final statistics in great detail, they had a pretty extensive discussion interpreting what each part of the data meant, and the conclusion addressed specific shortcomings and future research directions.

I was particularly blown away by the structure of the graphs in the lexicon analysis. The authors were able to display such a large amount of data in a small space while still maintaining its readability and meaning. I found the actual results of this study to be more understandable than usual, perhaps because the authors often used simplified graphs rather than extensive amounts of numbers. I will definitely be using the structure of this paper’s statistics as a reference for future projects.

Although I thoroughly enjoyed this study, I would like to address a small issue that I noticed: there is no big-picture analysis. The authors do an extraordinary job at explaining what the data means, but they don’t interpret or guess why it is that way. For example, why might right-wing channels have a higher focus on terrorism? What are the over-arching trends?

In the conclusion, the authors wrote,

“These findings contribute to a better understanding of the behavior of general and right-wing YouTube users.”

To what end? Why does it matter? What changes do the authors want to see by the publishing of this study? Do they want YouTube to change the way its algorithm promotes content in some way? None of this is discussed.


Further Questions

  1. Why were most of the baseline comments focused around celebrities? Is it perhaps because approximately 40% of the comments used are from the channel DramaAlert?
  2. Why do right-wing comments generally exhibit more anger? Could we somehow find out who this anger is directed towards?
  3. What would very left-wing channels look like with these same features applied?
  4. What other methods could we use to select “baseline” channels? Could we hand-select them?
  5. What would these features look like applied to other types of content on YouTube? For example, the “Entertainment” category instead of “News.”


Reading Reflection #4 – 02/07/2019 – Jacob Benjamin

Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination

            Ottoni et al. (2018) investigated the differences between user interactions on far-right YouTube channels and on a baseline sample of channels.  The study specifically searched for lexical and topical differences between the two types of channels, as well as the instances and degree of implicit bias in the texts.  The two main research questions were as follows:

  • Is hateful vocabulary, violent content and discriminatory biases more, less, or equally prevalent in right-wing channels?
  • Are video commenters more, less or equally aggravated than video hosts, and do they express more, less, or an equal amount of hate and discrimination?

Ottoni et al. (2018) hoped that the methods detailed in the study could be extended to any type of text.  The following results were found:

  • Right-wing channels tended to contain more words from “negative” semantic fields, such as “hate”, “kill”, “anger”, “sadness”, and “swearing”, as opposed to “positive” fields such as “joy”, “love”, “optimism”, and “politeness”. 
  • Typically approached topics related to war and terrorism.
  • Demonstrated more discriminatory bias against Muslims and towards LGBT people.

I personally found this study, and some of its results, fascinating.  I disagree that the results of a study have to be surprising to be worthwhile.  We are very good at misleading ourselves (e.g., superstitious behaviors), so I find merit in unsurprising results if they answer a research question that has not been pursued before.  Having said that:

  • I did not find the results of Ottoni et al. (2018) to be particularly surprising.  While I am sure group members with right-wing leanings likely have different opinions on the matter, I believe there has already been a lot of research on the emotions and language choices displayed by left- and right-wing leaning people.  I personally conducted a badly designed study on the differences in emotion displayed by left- and right-wing leaning people when viewing internet memes (a study I would like to revisit eventually).  Many of the results I found corresponded to the results found in this study.
  • The study implemented a variety of analysis methods beyond lexical analysis.  In the previous reflection, I made the point that lexical analysis on its own may not provide as much precision as we would hope for.  The inclusion of topical analysis and implicit bias analysis appears to cover a few additional facets of perceiving the intent behind the text. 

Work Cited:

Ottoni, R., Cunha, E., Magno, G., Bernardina, P., Meira Jr., W., & Almeida, V. (2018). Analyzing right-wing YouTube channels: Hate, violence and discrimination. Proceedings of the 10th ACM Conference on Web Science (WebSci ’18). doi:10.1145/3201064.3201081


Reading Reflection #3 – [02/05/2019] – [Chris Blair]

What I found most interesting about “Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos” is taking this concept and applying it tangentially to a different problem. If we take disease outbreaks, even at the regional or town level, and feed them into a machine learning algorithm, we could find trends that would allow us to be even more proactive in detecting and preventing outbreaks. For example, recent measles, mumps, and rubella outbreaks (https://www.who.int/news-room/detail/29-11-2018-measles-cases-spike-globally-due-to-gaps-in-vaccination-coverage) have been traced back to the increased incidence of anti-vaccination conspiracy theories and their effect on populations choosing not to vaccinate their children. Using these principles, we could study social media activity on popular anti-vaccination pages to target hotbeds where disease could spread and push specific media campaigns to dissuade people from joining this damaging movement. Continuing on this, we could also use it for mental health awareness: with a specific machine learning model, we could look at troubling tweets or Facebook posts made by individuals and analyze them to determine problem areas or typical mental health triggers, be it environment, drugs, poverty, abuse, etc., giving us a holistic view of the issues plaguing our society the most and bringing them to the forefront. These are just two possible ideas, but based on sentiment mapping we could also see how commonly spreading diseases like the flu have historically been represented on social media, and then develop rules that let us use historical data to predict, or see from a bird’s-eye perspective, the current condition of an outbreak and what we can do to influence it.

What I also found interesting is that this project used a no-context approach, not in the sense that the grammar had no context, but rather that they didn’t consider the user to whom the words were being said. This inspired an idea that could be an extension of the current paper: first, use image recognition to determine someone’s race or ethnicity, then use that context about the person receiving the hateful threats to figure out whether a message is actually hateful or, as the paper mentions, simply part of a song. Continuing this, we could use signals like a Facebook user profile to infer religion as well and determine whether the user could be the recipient of hateful speech from that standpoint. However, after researching it, this seems more difficult, as much facial recognition software has been trained substantially on Caucasian models, so it appears overtuned in that regard. This paper does give us a great baseline to work with in preventing hate speech, since we now have the grammar to figure out whether a certain tweet or post is in fact hate speech; now all we have to do is make the model more accurate by adding more context, perhaps looking at whether the posting account has a history of posting hate speech, or whether the targeted account has a history of receiving it. All these details could make the model more accurate by eliminating confounding factors like song lyrics, sarcasm, quoting, etc.


Reading Reflection #4 – [02/05/2019] – [Taber Fisher]

Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination

Summary:

This paper set out to “observe issues related to hate, violence and discriminatory bias” in different types of videos. The researchers partitioned their data set into right-wing and baseline videos. To compare the two partitions, the researchers used a three-layered approach; the three layers are lexicon, topics, and implicit bias. The research questions they wanted to answer were:

1. Is the presence of hateful vocabulary, violent content and discriminatory biases more, less or equally accentuated in right-wing channels?
2. Are, in general, commentators more, less or equally exacerbated than video hosts in an effort to express hate and discrimination?

The study found that “right-wing channels are more specific in their content, discussing topics such as terrorism and war, and also present a higher percentage of negative word categories” and that they were able to capture a bias against Muslim communities.

Reflection:

I liked this paper; it brought up a lot of interesting practices for analyzing different types of words, such as the Word Embedding Association Test (WEAT). I did not think it was possible for the Implicit Association Test, or at least some version of it, to be encoded in an algorithm. I am a little skeptical about how accurate WEAT is, since the authors make it sound like a simple cosine similarity between words that share a common context. I also would like to see how the WEAT was applied to the Wikipedia pages on topics like Bayes’ Rule. Also, would the results change if a different site was used as a starting point? For example, if Twitter or the Encyclopedia Britannica were used, I would imagine the results would be somewhat different.
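For what it’s worth, WEAT is indeed built from cosine similarities, but they are aggregated into an effect size over two target sets and two attribute sets. Below is a minimal sketch of that computation, assuming word vectors have already been trained on whatever corpus is being tested; the tiny 3-dimensional vectors are hypothetical placeholders, not real embeddings.

```python
# Minimal sketch of the WEAT effect size: cosine similarities between target
# sets (X, Y) and attribute sets (A, B), aggregated as in Caliskan et al.'s
# formulation. The 3-d vectors are hypothetical placeholders; in practice they
# would come from embeddings trained on the corpus under study.
import numpy as np

vectors = {
    "muslim":    np.array([0.9, 0.1, 0.0]),
    "christian": np.array([0.1, 0.9, 0.0]),
    "attack":    np.array([0.8, 0.2, 0.1]),
    "peace":     np.array([0.2, 0.8, 0.1]),
}

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus mean similarity to B
    return (np.mean([cos(vectors[w], vectors[a]) for a in A])
            - np.mean([cos(vectors[w], vectors[b]) for b in B]))

def weat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size comparing the two target sets
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)

# e.g. targets = religion terms, attributes = unpleasant vs. pleasant terms
print(weat_effect_size(X=["muslim"], Y=["christian"], A=["attack"], B=["peace"]))
```

Swapping in vectors trained on Wikipedia versus Twitter or Britannica text would directly answer the “different starting point” question: the effect size would shift with the corpus.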

There are a lot of sites that would meet their criteria for choosing Wikipedia. I wish there was more discussion about why they chose the twenty categories they selected from the 194 Empath categories. Also, why did they not choose the same number of positive and negative categories? It feels as if they were only trying to find things in the negative categories and the positive categories were tacked on as an afterthought.

I was surprised to find that a lot of hate and toxicity was found in the general videos as well as the right-wing videos. I wonder if the same would be true on other platforms such as Reddit, Twitter, or 4chan.

Future Work and Questions:

Can commentators’ activity on other videos be tracked to see what they usually watch and respond to? There might be a difference between how people in different groups interact, or don’t interact, with other groups.

With a lot of sites now banning right-wing content, I would be interested to see how the audiences and content creators act on different sites. Do they progressively get more hateful the more they are banned from websites?


Reading Reflection #4 – [02/05/2019] – [Kyle Czech]

Title: Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination

Brief Summary:

In this paper, the authors used YouTube to observe issues related to hate, violence, and discriminatory bias in a dataset containing more than 7,000 videos and 17 million comments. They investigate similarities and differences between user comments and video content in a selection of right-wing channels and compare them to a baseline set using a three-layered approach, analyzing (a) lexicon, (b) topics, and (c) implicit biases present in the texts. Among the results, the analyses show that right-wing channels tend to (a) contain a higher degree of words from “negative” semantic fields, (b) raise more topics related to war and terrorism, and (c) demonstrate more discriminatory bias against Muslims (in videos) and towards LGBT people (in comments).

Reflection:

There were some quotes that really stood out to me:

Quote 1: “As stated in a The Guardian’s article [24], ‘The Alex Jones Channel, the broadcasting arm of the far-right conspiracy website InfoWars, was one of the most recommended channels in the database of videos’ used in a study which showed that YouTube’s recommendation algorithm was not neutral during the presidential election of 2016 in the United States of America [23, 25]”

Reflection 1: I found this very interesting because, despite mentioning the possibility that YouTube showed bias during the 2016 presidential campaign, the authors still included two YouTube-run channels, “YouTube Spotlight” and “YouTube Spotlight UK”, in the list of baseline channels. I found this highly irregular; I don’t believe these channels should have been included in the baseline of a political analysis paper when the authors themselves mention that YouTube may have a history of political bias, as it weakens the credibility of the selected baseline channels.

Quote 2: “From the 194 total Empath categories, we selected the following (a) 15 categories related to hate, violence, discrimination and negative feelings, and (b) 5 categories related to positive matters in general:

negative: aggression, anger, disgust, dominant personality, hate, kill, negative emotion, nervousness, pain, rage, sadness, suffering, swearing terms, terrorism, violence.

positive: joy, love, optimist, politeness, positive emotion”

Reflection 2: I found this information to be misleading and not entirely complete. First, some of the selected categories don’t necessarily fit the grouping they are in; for example, I don’t see how “nervousness” is purely negative, as it can also be read as excitement or anxiousness. Also, I believe they should have selected the same number of negative categories as positive ones. When two sets are used for the same kind of measure, they should contain the same number of categories; otherwise it can appear that the data is being fit to the hypothesis or biased towards the desired results. It is also unclear how they decided on the number of categories to use, or why they chose the ones they did.

Future Work:

I think it would be interesting to do a similar analysis on left-wing YouTube channels, comparing the “hateful” vocabulary used in captions and comments on those channels before and after the 2016 presidential election, given the election of President Trump. I would expect that left-wing YouTube channels would currently show similar results, with a high usage of “hateful” vocabulary.

Also, I agree with the future work mentioned in the paper’s conclusion about analyzing negations of these so-called “negative” words. For example, a caption like “End of Terrorism?” would probably be scored as negative because of the mention of “terrorism”; however, without looking into the content itself, that “negative” classification might be misleading.
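As a toy illustration of why that matters (not the paper’s actual method), the sketch below shows how a bare keyword lexicon flags “End of Terrorism?” as negative, while even a crude check for negation or termination cues just before the keyword suppresses the hit; the term lists here are hypothetical.

```python
# Toy illustration of the negation problem in lexicon counting (not the paper's
# method). A bare keyword match flags "End of Terrorism?" as negative; a crude
# check for negation/termination cues right before the keyword suppresses it.
# Real systems would use dependency parsing or negation-scope detection.
NEGATIVE_TERMS = {"terrorism", "war", "hate", "violence"}
NEGATION_CUES = {"end", "no", "not", "never", "stop", "without"}

def tokenize(caption: str):
    return [t.strip("?.,!").lower() for t in caption.split()]

def naive_negative_hits(caption: str) -> int:
    return sum(t in NEGATIVE_TERMS for t in tokenize(caption))

def negation_aware_hits(caption: str) -> int:
    tokens = tokenize(caption)
    hits = 0
    for i, t in enumerate(tokens):
        if t in NEGATIVE_TERMS:
            # ignore the hit if a cue like "end of" appears just before it
            window = tokens[max(0, i - 3):i]
            if not any(cue in window for cue in NEGATION_CUES):
                hits += 1
    return hits

caption = "End of Terrorism?"
print(naive_negative_hits(caption))    # 1 -> naively scored as negative
print(negation_aware_hits(caption))    # 0 -> the "end" cue suppresses the hit
```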


Reading Reflection #3

Early Public Responses to the Zika-Virus on Youtube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

This paper aimed to analyze the public response to different YouTube videos concerning the Zika virus. Both conspiracy theory and informational videos were analyzed. The paper defined a conspiracy theory as “explanations for important events that involve secret plots by powerful and malevolent groups”. Since the spread of conspiracy theory videos is harmful, because they contain misleading or untrue information and distract from important health messages, the researchers looked to analyze the sentiment and content of user responses so that the implications for online health campaigns and knowledge dissemination are known. False news, which conspiracy videos fall under, has been shown to spread faster and farther than true news. The researchers used 35 of the most popular Zika-virus YouTube videos for their study. They used metrics such as views, comments, replies, likes, dislikes, and shares and compared them for conspiracy and informational videos. They found that there was no statistically significant difference between the two groups of videos. One conclusion was that users respond in similar ways, in terms of these metrics, to both types of videos. In addition, topic modeling showed that informational videos center on the causes and consequences of the virus while conspiracy videos focus on unfounded theories.

I did not find this paper to provide any surprising results. I think the most significant conclusion was that users responded in similar ways, in terms of views, shares, and likes, to both types of content, meaning both types of content spread in similar ways. The researchers found that Zika-virus video comments were all slightly negative on average, which contradicts prior research finding that false news triggers more negative sentiment than true news; however, they did not suggest why this happened. A takeaway from this study is that health organizations looking to spread helpful health information should give careful thought to how to target and engage audiences. Future research could explore subjects other than the Zika virus; since some of the findings contradicted prior work, I think it would be interesting to see whether they hold for other topics. The most effective techniques for spreading true news could also be studied. Is it effective to debunk conspiracy videos? What is the best way to engage viewers with true news?

Automated Hate Speech Detection and the Problem of Offensive Language

Hate speech classifiers have been created before with limited accuracy. Hate speech is particularly challenging to classify because it overlaps with offensive language, and context must be taken into account when analyzing it. This paper looks to create a hate speech detection model that distinguishes between hate speech and offensive language, with the goal of reaching higher accuracy. The authors trained a logistic regression model on a sample of about 25,000 tweets.
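For a sense of what such a model looks like, here is a minimal sketch using TF-IDF features and scikit-learn’s logistic regression with the three labels the paper distinguishes (hate speech, offensive language, neither). The handful of labeled tweets is a hypothetical placeholder; the paper’s actual feature set and 25,000-tweet corpus are far richer.

```python
# Minimal sketch of a three-class tweet classifier in the spirit of the paper:
# TF-IDF features plus multinomial logistic regression. The toy labeled tweets
# are hypothetical placeholders standing in for the real annotated corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "I can't stand that whole group of people",   # toy stand-in for hate speech
    "you're such an idiot lol",                   # offensive but not hate speech
    "great game last night, well played",         # neither
]
labels = ["hate_speech", "offensive", "neither"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)

print(model.predict(["what an idiot"]))
```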

I thought the paper had significant implications, since many parties are interested in flagging hate speech, such as Twitter, Instagram, or countries with hate speech laws. The researchers shared a significant number of challenges related to classifying hate speech. I found it interesting that the definition of hate speech is not well established and that the researchers had to settle on a definition for the premise of this paper. In addition, they found and mentioned several tweets that were misclassified by human coders. It seems difficult to train an algorithm to classify something that is not well defined and that even humans often get wrong. Future work could involve similar research taking into account human biases and using more data. I think a more established definition of hate speech is necessary before these algorithms can become more accurate.
