Reflection #4 – 02/06 – Heather Robinson

Analyzing Right-wing YouTube Channels: Hate, Violence, and Discrimination


Summary

The authors used a multi-layered analysis to compare (1) video captions against their comments and (2) right-wing captions and comments against those of a set of baseline channels.
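
To make that comparison concrete, here is a rough sketch of what a lexicon-style comparison between two comment corpora could look like. This is my own illustration, not the authors' actual pipeline; the tiny "terrorism" word list and the example corpora are placeholders.

```python
# A rough, self-contained sketch of a lexicon-based comparison between two
# corpora (e.g., right-wing comments vs. baseline comments). The word list
# and the example corpora are made up for illustration only.
import re

TERRORISM_LEXICON = {"terror", "terrorism", "attack", "bomb", "threat"}  # hypothetical

def category_rate(texts, lexicon):
    """Share of word tokens in `texts` that belong to `lexicon`."""
    hits = 0
    total = 0
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        total += len(words)
        hits += sum(1 for w in words if w in lexicon)
    return hits / max(total, 1)

right_wing_comments = ["placeholder comment about a terror attack", "another comment"]
baseline_comments = ["placeholder comment about a celebrity", "another comment"]

print("right-wing:", category_rate(right_wing_comments, TERRORISM_LEXICON))
print("baseline:  ", category_rate(baseline_comments, TERRORISM_LEXICON))
```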


Reflection

Overall, I think this paper was awesome. There were few, if any, flaws in the data collection process; the authors explained in great detail how they arrived at their final statistics; they included a fairly extensive discussion interpreting what each part of the data meant; and the conclusion addressed specific shortcomings and directions for future research.

I was particularly blown away by the structure of the graphs in the lexicon analysis. The authors were able to display a large amount of data in a small space while still maintaining its readability and meaning. I found the results of this study more understandable than usual, perhaps because the authors often used simplified graphs rather than long lists of raw numbers. I will definitely be using this paper's presentation of its statistics as a reference for future projects.

Although I thoroughly enjoyed this study, I would like to address a small issue that I noticed: there is no big-picture analysis. The authors do an extraordinary job of explaining what the data means, but they don't interpret or speculate about why it is that way. For example, why might right-wing channels have a higher focus on terrorism? What are the overarching trends?

In the conclusion, the authors wrote,

“These findings contribute to a better understanding of the behavior of general and right-wing YouTube users.”

To what end? Why does it matter? What changes do the authors hope to see as a result of publishing this study? Do they want YouTube to change the way its algorithm promotes content? None of this is discussed.


Further Questions

  1. Why were most of the baseline comments focused on celebrities? Is it perhaps because approximately 40% of the comments used come from the channel DramaAlert?
  2. Why do right-wing comments generally exhibit more anger? Could we somehow find out who this anger is directed towards?
  3. What would very left-wing channels look like with these same features applied?
  4. What other methods could we use to select “baseline” channels? Could we hand-select them?
  5. What would these features look like applied to other types of content on YouTube? For example, the “Entertainment” category instead of “News.”


Reflection #3 – 02/04 – Heather Robinson

Automated Hate Speech Detection and the Problem of Offensive Language

Summary

In this paper, the authors used crowdsourcing to classify tweets into three categories: hate speech, offensive but not hate speech, and neither. From this data, they trained a reasonably accurate model for detecting hate speech on Twitter.
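
As a point of reference, here is a minimal sketch of what training such a three-class classifier could look like. This is not the authors' exact model; the tweets, labels, and feature choices below are my own placeholder assumptions.

```python
# A minimal sketch of a three-class tweet classifier trained on crowdsourced
# labels (0 = hate speech, 1 = offensive but not hate speech, 2 = neither).
# Not the paper's actual model; the data below is placeholder text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "example hateful tweet",    # placeholder text
    "example offensive tweet",  # placeholder text
    "example harmless tweet",   # placeholder text
]
labels = [0, 1, 2]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word unigrams and bigrams
    LogisticRegression(max_iter=1000),    # simple linear classifier
)
model.fit(tweets, labels)
print(model.predict(["another example tweet"]))  # -> predicted class label
```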

Reflection

I found it very helpful and interesting how the authors went over the common pitfalls of previous models similar to theirs. This likely gave them the insight needed to make their own model more accurate in the first place. I could tell that they took these prior findings into account during the data selection process, which made the paper's results much more trustworthy.

Additionally, I would like to point out that while the authors noted some of the features they looked at initially, they never wrote about which features were used in the final model. It would be useful to know which features ended up being the deciding factors for hate speech.

Lastly, I would like to applaud the authors' extensive reflection. Their analysis of where and why their model was strong or weak will definitely be useful to others in the field. I found it particularly thought-provoking that they noticed that "people identify racist and homophobic slurs as hateful but tend to see sexist language as merely offensive." I will definitely use their reflection as a guideline for my future projects.

Further Questions

  1. Why are people generally likely to believe that sexist speech is less hateful than racist or homophobic speech?
  2. How can we improve on some of the weak points listed for their model?
  3. How could this model be used to improve research or social media in the future?
  4. How can we add more definition to the line between hate speech and merely offensive speech? Why is it so blurred, even for us humans?
  5. What kind of content on the Internet do these hateful users consume that may make them more likely to lash out at minority groups?

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary

In this paper, the authors examine the analytics of 35 YouTube videos to find differences between "informational" and "conspiracy theory" videos. They find that user engagement with the two types is actually very similar, which leads them to discuss public relations strategies for health organizations.
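
For intuition, here is a small sketch of how one crude engagement measure could be compared across the two video types. This is my own illustration, not the authors' analysis, and the analytics numbers are invented.

```python
# A small sketch comparing a simple engagement proxy across video types.
# The (type, views, likes, comments) tuples are invented placeholder data.
videos = [
    ("informational", 12000, 340, 85),
    ("informational", 8000, 150, 40),
    ("conspiracy", 9500, 310, 120),
    ("conspiracy", 7000, 180, 60),
]

def engagement(views, likes, comments):
    """Likes plus comments per view: one crude proxy for engagement."""
    return (likes + comments) / views

by_type = {}
for vtype, views, likes, comments in videos:
    by_type.setdefault(vtype, []).append(engagement(views, likes, comments))

for vtype, rates in by_type.items():
    print(vtype, round(sum(rates) / len(rates), 4))  # mean engagement per type
```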

Reflection

I would primarily like to address the sample size used in this paper. Despite the vast amount of content on YouTube, the authors only managed to gather 35 videos in total. I believe that if they want their study to carry enough weight to advise world health organizations, they should have used a larger sample.

Either way, I found the distribution of comment topics between video types to be pretty interesting. Informational viewers mostly spoke about the effects of the disease, whereas conspiracy viewers focused on the causes. This seems somewhat predictable considering the directions each video category takes, but it was intriguing nonetheless.

Further Questions

  1. How could this information be used by YouTube to possibly promote more factual content rather than conspiracies? Should YouTube be allowed to do that?
  2. How is this information useful to the general public?
  3. Should we take these results seriously considering the size of the data set?


Reflection #2 – 01/30 – Heather Robinson

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News


Summary

As noted in the title, this paper claims that fake news is more similar to explicit satire than to real news. Although the study utilized many data features that seemed to be useful and topical, no raw data was ever shown and the authors provided little reflection on the results.


Reflection

First, I would like to discuss the insufficient reflection on each finding. The features seemed very in-depth, but they were later largely set aside in the overall analysis. While at a glance there appears to be plenty of content and discussion after each major finding, very little of that discussion provokes thought or brings up new ideas.

For example, at one point the author writes,

“The significant features support this finding: fake news places a high amount of substance and claims into their titles and places much less logic, technicality, and sound arguments in the body text of the article.”

This is hardly a “finding”; I’m sure it is an early realization for anyone who has read a few fake news articles. That said, despite the lack of reflection, this paper does a better job than the one we read last week.

Second, I would like to note the lack of expansive data sets. I am unsure why the authors used a data set from BuzzFeed (a source that is not exactly scholarly) as a major basis for their academic findings. I believe that if they truly needed more material to analyze, they should have expanded their own set of articles, which contained only 75 pieces from each type of news source.

Lastly, I would like to say that though most of the findings are not surprising, it is good to see that our notions of fake news are actually supported by concrete, statistical evidence.


Further Questions

  1. What would the distribution across source types look like if we had a feature that denoted the frequency of misspellings throughout the article? I predict that it would be “Fake > Satire = Real.” (A rough sketch of such a feature follows this list.)
  2. How would the distribution of these same features look:
    1. across articles that are not political in nature?
    2. across the Real News sources that were mentioned?
    3. between more polarized news sources such as CNN vs. Fox News?
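
As mentioned in question 1, a misspelling-frequency feature would be fairly simple to compute. Here is a rough sketch using the pyspellchecker package; the choice of tool and the placeholder article texts are my own assumptions, not anything from the paper.

```python
# A rough sketch of the misspelling-frequency feature proposed in question 1,
# using the pyspellchecker package (my choice of tool, not the paper's).
# The article bodies below are placeholders.
import re
from spellchecker import SpellChecker  # pip install pyspellchecker

spell = SpellChecker()

def misspelling_rate(text: str) -> float:
    """Fraction of word tokens that the spell checker does not recognize."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return 0.0
    unknown = spell.unknown(words)  # unique unrecognized words
    return sum(w in unknown for w in words) / len(words)

articles = {"fake": "...", "satire": "...", "real": "..."}  # placeholder bodies
for source_type, body in articles.items():
    print(source_type, round(misspelling_rate(body), 3))
```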


Reflection #1 – 01/28 – Heather Robinson

Journalists and Twitter: A Multidimensional Quantitative Description of Usage Patterns


Summary

Using millions of tweets from thousands of users, Mossaab Bagdouri conducted what is perhaps the most ambitious and thorough analysis of journalists on Twitter to date. Overall, the University of Maryland researcher found that official news organizations often share links to their own stories, whereas individual journalists seek engagement with other users more directly. Lastly, Bagdouri found that language barriers may also shape these journalists’ online behavior, at times dividing them and at times unifying them.


Reflection

As an avid Twitter user, I didn’t find many of Bagdouri’s results to be extremely surprising. For example, it was mentioned extensively that professional accounts were less likely to retweet other posts or to engage with other users. From a practical standpoint, this makes sense. The more an organization posts or responds, the more chances there are to mess up.

A retweet from an official, verified account is almost always read as an endorsement of both (1) the opinion expressed in the post and (2) the post’s creator. On Twitter, retweeting from an official account is therefore risky because of the community’s critical culture; many individuals must walk on eggshells to avoid being “canceled,” or rejected, by the platform’s highly active user base.

Furthermore, engaging in reply threads on Twitter often descends into heated arguments. Someone could easily reply with benevolent intentions and find themselves in an argument about racial inequality within an hour of the post. From a public relations standpoint, replying to comments on Twitter would almost never result in a positive interaction for the brand. Thus, it makes sense that companies are less likely than individual journalists to interact directly with the community.


Further Questions

  • What would happen if we considered that some of the “news consumers” were also journalists rather than disregarding that fact?
  • How would the data look:
    • with a survey of a larger number of countries that share the same language?
    • with a survey of journalists within the same country who speak different languages (perhaps within the United States)?
    • if we also included other forms of journalism, such as YouTube?
    • if we separated the accounts based on whether they were verified?
  • What would the overall sentiment analysis of journalists vs. organizations look like?
  • Could we map a political graph (liberal vs. moderate vs. conservative) by looking at an organization’s most commonly mentioned terms, the other accounts its consumers follow, etc.? Using this map, could we predict a user’s political affiliations only by looking at who they follow on Twitter?
