[Reading Reflection 4] – [02/07] – [Raghu, Srinivasan]

Summary:

This paper was primarily about comparing comments and video content across a selection of right-wing and baseline YouTube channels. Through analyzing the lexicon, topics, and implicit biases present in the texts, comparisons were made between these groups based on a set of features. Some of the key conclusions drawn are listed below.

  • Right-wing accounts are typically more specific in their content and contain a higher percentage of negative words, such as “aggression”, “kill”, and “violence”. Baseline channels, on the other hand, typically contain a higher percentage of positive words, such as “joy” and “optimism” (a rough sketch of this kind of lexicon comparison follows this list).
  • Comments generally use more words such as “disgust” and “hate”, whereas captions typically use more words such as “aggression”, “rage”, and “violence”. YouTube commenters tend to be more inflamed than video hosts on topics of hate and discrimination.
  • There was not a significant difference in bias against immigrants or LGBT groups between the two types of accounts. However, right-wing accounts tend to have a negative bias towards Muslim communities.
  • Right-wing accounts tend to show more bias against immigrants and Muslim communities in their captions, and more bias against LGBT groups in their comments. They also typically raise more topics related to war and terrorism.
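
To make the lexical comparison concrete, below is a minimal sketch of the kind of word-category counting described above. The word lists and caption snippets are hypothetical placeholders, not the paper’s actual lexicon categories or data.

```python
# Minimal sketch of a lexicon-share comparison; the word lists and captions
# below are hypothetical placeholders, not the paper's actual categories.
NEGATIVE_WORDS = {"aggression", "kill", "violence", "hate", "rage", "disgust"}
POSITIVE_WORDS = {"joy", "optimism", "love", "hope"}

def lexicon_share(text, lexicon):
    """Fraction of whitespace-separated tokens in `text` found in `lexicon`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)

captions = {
    "right-wing": "their aggression and violence will never stop",
    "baseline": "there is so much joy and optimism in this community",
}

for group, text in captions.items():
    print(group,
          "negative:", round(lexicon_share(text, NEGATIVE_WORDS), 2),
          "positive:", round(lexicon_share(text, POSITIVE_WORDS), 2))
```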

Reflection:

I have listed below a line that interested me in the paper.

“Among the top ranked topics for the right-wing captions, we observe a relevant frequency of words related to war and terrorism, including nato, torture and bombing, and a relevant frequency of words related to espionage and information war, like assange, wikileaks”

This statistic did not come as any surprise to me while reading this paper, since most right-wing people heavily support increased defense spending to help end the war on terrorism. WikiLeaks would also be a hot topic amongst right-wing people, since Hillary Clinton’s emails from her private email server were leaked on WikiLeaks. I’m also left wondering how the similarities and differences between comments and video content on left-wing videos would compare to those found here. Would there be just as much negative sentiment on those videos as well? Or would they employ more positive sentiment? A potential glimpse into the answer can be inferred from the fact that the study used The Young Turks as a baseline source, which is interesting since their views are fairly liberal. This source ended up having a more positive sentiment when referring to immigrant, LGBT, and Muslim groups. However, that’s not to say there aren’t other groups that left-wing videos may be more hostile towards.

Other Thoughts:

Overall, I wasn’t too surprised with the results of the paper, given the nature of right-wing politics. I’m very interested in seeing how these results compare with left-wing videos, as I believe that hate is likely to be found on the far right and far left ends of the political spectrum.


Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary:

This paper was primarily about the differences between informational and conspiracy videos referring to the Zika virus during the most recent outbreak. Through analyzing the 35 most popular informational and conspiracy videos concerning the virus, comparisons were made between these types of videos based on a set of features. Some of the key conclusions drawn are listed below.

  • A majority of the videos were classified as being informational videos (23 out of 35).
  • There were no differences found in user activity or the sentiment of user responses between the two types of videos.

Reflection:

I have listed below a line that interested me in the paper.

“YouTube considers the number of views as the fundamental parameter of video popularity. Hence, collecting videos with the highest number of views, for our given search string, allows us to capture those videos that would be listed first by search engines”

I’m very hesitant to believe that views are the fundamental parameter of video popularity. Although views are the most obvious indicator of a video’s popularity, they can easily be skewed. It’s impossible to determine whether views are paid or organic, and if a user wants to get a point across with a video that likely won’t garner much attention (i.e. a conspiracy theory), they may resort to buying views. I think another potential factor to consider in choosing videos would be audience retention, as there are users who see a title and click on a video, but stop watching after a few seconds.
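
As a rough illustration of the alternative selection criterion suggested above, the sketch below ranks videos by retention-weighted views instead of raw views. The video records and the weighting scheme are entirely made up for illustration.

```python
# Hypothetical ranking that discounts raw view counts by average audience
# retention; the video records below are invented for illustration.
videos = [
    {"title": "Video A", "views": 500_000, "avg_retention": 0.12},
    {"title": "Video B", "views": 200_000, "avg_retention": 0.65},
]

def retention_weighted_views(video):
    """Scale views by the average share of the video that viewers watched."""
    return video["views"] * video["avg_retention"]

for video in sorted(videos, key=retention_weighted_views, reverse=True):
    print(video["title"], round(retention_weighted_views(video)))
```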

Other Thoughts:

Overall, I was surprised by the similar sentiment found in both types of videos, as I had imagined there to be a drastic difference. One statistic that did not surprise me was the lower vaccination rates in states where misinformation was prevalent.

Automated Hate Speech Detection and the Problem of Offensive Language

Summary:

This paper was primarily about hate-speech detection, and differentiating between hate speech, offensive language, and neither. Through analyzing tweets from each of these categories, a classifier was trained on crowd-sourced labels to distinguish between these groups. Some of the key conclusions drawn are listed below.

  • Tweets with the highest probability of being hate speech usually contain racial or homophobic slurs.
  • Lexical methods are effective ways to identify offensive language, but are inaccurate in determining hate speech.

Reflection:

I have listed below a line that interested me in the paper.

“While these tweets contain terms that can be considered racist and sexist it is apparent than many Twitter users use this type of language in their everyday communications.”

Classifying hate speech appears to be an arduous task, arguably more difficult than detecting fake news articles, especially because many Twitter users use hate speech trigger words when they are not tweeting hate speech (a common misclassification involves song lyrics). In order to better detect and classify hate speech, I believe there needs to be a more consistent definition of what the term encompasses. More research is also needed on handling words such as “love” that appear benign but consistently fool hate speech detectors.
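
Below is a minimal sketch of why this misclassification happens under a purely lexical approach: a keyword-based flagger marks any tweet containing a trigger word, so quoted song lyrics get flagged right alongside genuinely hateful text. The trigger list and example tweets are hypothetical placeholders.

```python
# Sketch of a purely lexical flagger; the trigger list and tweets are
# hypothetical placeholders (mild words stand in for actual slurs).
TRIGGER_WORDS = {"trash", "stupid"}

def lexical_flag(tweet):
    """Flag a tweet if any token matches a trigger word, ignoring context."""
    return any(token in TRIGGER_WORDS for token in tweet.lower().split())

tweets = [
    "singing along: you're stupid but i love you anyway",  # quoted lyric
    "those people are stupid trash and should get out",    # hateful intent
]

for tweet in tweets:
    print(lexical_flag(tweet), "-", tweet)  # both flagged; context ignored
```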

Other Thoughts:

Overall, I found myself pretty engaged in this topic. I think that the topic of targeting hateful content is interesting, and I’m heavily considering working on it for the semester project.


[Reading Reflection 2] – [01/30] – [Raghu, Srinivasan]

Summary:

This paper was primarily about how fake news appears to resemble satirical news as opposed to the commonly-held belief that fake news resembles real news. Through analyzing data sets containing fake, real, and satirical news, comparisons were made between these groups based on a set of features. Some of the key conclusions drawn are listed below.

  • The content of fake news and real news articles is vastly different from one another. Real news articles tend to be slightly longer, whereas fake articles use fewer technical words and typically require a lower educational level to read. Fake news also contains more redundancy and personal pronouns.
  • Titles are a significant factor in differentiating between fake and real news articles. Fake news titles tend to be longer and usually contain more capitalized words and proper nouns. It’s also common for fake news titles to have several verb clauses, as the writers try to squeeze in as much substance as they can.
  • Fake news articles are more similar to satire than real news articles. Both types of articles usually contain smaller words and tend to use fewer technical words. In addition, both fake news and satirical articles tend to contain more redundancy (a rough sketch of a few of these stylistic features follows this list).
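
Here is a rough sketch of a few of the stylistic features listed above (body length, personal-pronoun rate, title length, capitalized title words). These are simplified approximations for illustration, not the paper’s exact feature definitions, and the example headline and body are invented.

```python
# Simplified approximations of a few stylistic features; not the paper's
# exact definitions. The example headline and body are invented.
PERSONAL_PRONOUNS = {"i", "we", "you", "he", "she", "they", "me", "us", "them"}

def article_features(title, body):
    body_tokens = body.lower().split()
    title_tokens = title.split()
    return {
        "body_length": len(body_tokens),
        "pronoun_rate": sum(t in PERSONAL_PRONOUNS for t in body_tokens)
                        / max(len(body_tokens), 1),
        "title_length": len(title_tokens),
        "title_all_caps_words": sum(t.isupper() for t in title_tokens),
    }

print(article_features(
    "SHOCKING: You Won't Believe What THEY Did",
    "they did it again and you know they did it because we saw them do it",
))
```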

Reflection:

I have listed below a few of the lines that interested me in the paper.

“SentiStrength is a sentiment analysis tool that reports a negative sentiment integer score between -1 and -5 and a positive sentiment integer score between 1 and 5, where -5 is the most negative and 5 is the most positive.”

SentiStrength provides one of the more interesting features on which these data sets are analyzed. The results of the study showed a correlation between fake news and generally negative sentiment. Although the sentiment of real news articles tends to be more neutral, the majority of satirical articles have a negative sentiment as well. This raises the question: could there be a sentiment analysis tool that analyzes sarcasm and humor? Such a tool could certainly aid in distinguishing between fake news articles and satire.
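
As a small illustration of how a dual-polarity output like the one described in the quote might be collapsed into a single label, here is a hedged sketch. Only the score convention (1 to 5 positive, -1 to -5 negative) is taken from the quote; the combination rule and example scores are my own assumptions, not SentiStrength’s actual behavior.

```python
# Sketch of collapsing SentiStrength-style dual scores into one label. The
# summation rule and example scores are assumptions for illustration only.
def overall_polarity(positive, negative):
    """Combine a positive score (1..5) and a negative score (-5..-1)."""
    net = positive + negative
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"

print(overall_polarity(positive=1, negative=-4))  # -> negative
print(overall_polarity(positive=2, negative=-2))  # -> neutral
```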

“Fake news packs the main claim of the article into its title, which often is about a specific person and entity, allowing the reader to skip reading the article, which tends to be short, repetitive, and less informative.”

Fake news has been one of the prevalent issues our country has faced in recent years, primarily due to its significant impact on the 2016 US Presidential Election. If Facebook or Google users only read a headline without delving in and determining the legitimacy of an article, then fake news suddenly changes from being an annoyance to a danger. Machine learning has an obvious application here in tracking down fake news, but this raises several other questions.

  • Will users be alright with Facebook restricting what they can and cannot share with their friends?
  • What happens with articles that mix real news (to bypass fake news detectors) with fake news?
  • Is this task too much for AI and does it require the presence of a human?

Other Thoughts:

Overall, the paper’s results were not surprising at all, as satire is one of the primary forms of fake news. Although with a different agenda, satire does not contain legitimate news, and has the potential to mislead readers. One of the principal questions asked in determining the credibility of an article is “Does this article seem like satire?”


[Reading Reflection 1] – [01/28] – [Raghu, Srinivasan]

Summary:

This paper was primarily about how Twitter was being used by news producers and consumers, particularly in analyzing the differences between how journalists and organizations in Arab and European English-speaking countries use the platform. Through crawling thousands of tweets from various accounts, comparisons were made between these groups based on a set of features. Some of the key conclusions drawn are listed below.

  • News outlets tend to have a more official style and share more links than journalists on their accounts.
  • Journalists tend to target their communication and maintain a personal engagement with their audience. It’s also found that journalists may be using Twitter to gather information. Organizations, however, tend to broadcast their tweets and avoid the personal engagement that journalists pursue in their tweets.
  • Arab journalists prefer to broadcast their tweets more than the average English journalist does. They also tend to be more easily distinguishable from news consumers in the Arab world.
  • Print and radio journalists have a large number of differences between each other, whereas TV journalists tend to share characteristics with both groups.
  • Journalists who speak the same language but reside in different countries tend to share many similarities with each other.

Reflection:

I have listed below a few of the lines that interested me in the paper.

“These two features perhaps suggest that people who want to get the news from Twitter expect to find them in the timelines of the organizations more than from the journalists.”

This suggestion based on the collected data did not surprise me at all, as organizations tend to be more well-known than journalists reporting similar information. Therefore, it makes sense that more people expect to find news in the timelines of organizations. Organizations also have a greater chance of being verified compared to individual journalists. However, this does make me wonder: if journalists broadcast information or were verified just as often, would users flock to their timelines to gather information? Although journalists tend to have more personal engagement on Twitter, I’m curious whether users would visit journalists’ timelines to gather information if those accounts appeared in their feeds more often.

“The broadcast communication behavior is evident for Arab journalists. They tweet more than twice as much as the English ones, share 75% more links, and use 39% more hashtags.”

This statistic was interesting to me, as I’m curious why Arab journalists broadcast their information more than their English counterparts do. Could this have anything to do with the fact that Arab journalists are more distinguishable from news consumers? It’s also interesting to me that Arab journalists are on Twitter more often, as I’m interested in which regions of the world Twitter is a more dominant news source.

“British journalists have more followers than the Irish ones (45K vs. 10K) and are included in more lists (262 vs. 98). But these facts are not surprising when we take in consideration the number of inhabitants in these two countries.”

This statistic surprised me because I believed that different regions of the world would have different levels of Twitter activity. However, this statistic shows that journalists who speak the same language from different countries have a lot in common. This makes me wonder whether or not there is a relationship between language and level of Twitter activity.

Other Considerations:

Is the President’s use of Twitter to deliver key information influencing others to join Twitter? Is it making Twitter users more likely to use Twitter as their primary medium of obtaining news?
