Reading Reflection #3 – 2/5/2019 – Bright Zheng

Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos

Summary

This paper compares viewer responses to informational and conspiracy videos about an epidemic in its early phase (the Zika virus). The authors collected the 35 top-ranked YouTube videos and studied user activity along with the comments and replies. The results show no statistically significant differences between the informational and conspiracy video types in terms of user activity, and no differences in the sentiment of user responses to the two video types. The only significant result is that neither type of video content promotes additional responding per unique user.

Reflection

This is a very well-written paper. It gives a detailed definition of conspiracy theories and thorough background on the problem. The authors carefully explain how they obtained the dataset and in which scenarios each of the statistical tests used in the research is appropriate.

Unlike the previous papers we read for this class, this paper does not draw any significant insights from its tests: the final conclusion is that there are no significant differences between informational and conspiracy theory videos on the selected features.

This conclusion definitely surprised me. The 12 collected conspiracy videos are very much like fake news: they are mostly negative and mention many proper nouns in their content. So the fact that these conspiracy videos received the "same" reactions as the informational ones does not match what comparisons between real and fake news would lead us to expect.

I thought the size of the dataset (only 35 videos) was a limitation of this work. However, this research focuses solely on the first phase of the Zika virus, so it might be difficult to collect a larger sample of videos. I also realized that people are very unlikely to go further than the top 35 videos on YouTube for any search query, let alone interact with those videos (like, dislike, comment, etc.).

One way to address this limitation could be to survey videos from different phases of the Zika epidemic. This future work raises the following questions:

  • Is there a shift in topic weights in the comments on both types of videos?
  • Does user activity on informational and conspiracy videos change across phases?

In the first phase of any new epidemic outbreak, scientists know only a few facts. As more studies are done, scientists gain a better understanding of the epidemic, and informational videos may shift their focus from consequences to causes. This topic shift in informational videos might also be reflected in the comments. We can already see that the comments on informational videos put more weight on "affected stakeholders" and "consequences of Zika" than on "causes of Zika", while the comments on conspiracy videos focus more on the causes.
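One way to quantify such a shift, assuming comments could be collected separately for each phase, would be to fit a topic model and compare the average topic weights per phase. Below is a minimal sketch using scikit-learn's LatentDirichletAllocation; the comment lists are made-up placeholders, not data from the paper.

```python
# Minimal sketch: compare average topic weights of comments across two phases.
# The comment lists below are hypothetical placeholders, not the paper's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

phase1_comments = [
    "so many families affected by zika",
    "the consequences for pregnant travelers are scary",
    "what does this mean for affected communities",
]
phase2_comments = [
    "mosquitoes spread the virus between regions",
    "researchers explain what causes the birth defects",
    "the cause seems to be better understood now",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(phase1_comments + phase2_comments)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)  # one row of topic weights per comment

n1 = len(phase1_comments)
print("Phase 1 mean topic weights:", doc_topics[:n1].mean(axis=0))
print("Phase 2 mean topic weights:", doc_topics[n1:].mean(axis=0))
```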

There might also be changes in user activity on the two types of videos. The content of conspiracy videos might stay consistent across phases, since conspiracies are all "explanations for important events that involve secret plots by powerful and malevolent groups." The content of informational videos will certainly change as known facts and evidence accumulate, so people might become more willing to share and view informational videos.

We could also further this study by surveying the first phase of other epidemics; however, YouTube might not be the best social platform if we want to survey a wide range of epidemics.

Automated Hate Speech Detection and the Problem of Offensive Language

Summary

This paper addresses automated hate speech detection, using a classifier to distinguish hate speech tweets from offensive-language tweets. The authors collected tweets from 33,458 Twitter users and selected 25k random tweets out of a total of 85.4 million. These tweets were then manually labeled as "hate speech", "offensive language", or "neither" to supervise the learning algorithms and models. The paper then discusses the output of the trained classifier.
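As a reminder of what this kind of supervised setup looks like, here is a minimal sketch using TF-IDF features and logistic regression; the tweets and labels are placeholders, and the feature set and model are illustrative rather than the paper's exact pipeline.

```python
# Minimal sketch of a three-class supervised setup (hate / offensive / neither).
# `tweets` and `labels` are hypothetical placeholders, not the paper's data,
# and the features/model here are illustrative, not the paper's exact pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

tweets = ["example hateful tweet", "example offensive tweet", "example neutral tweet"] * 10
labels = ["hate_speech", "offensive_language", "neither"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.2, random_state=0, stratify=labels)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)

clf.fit(vectorizer.fit_transform(X_train), y_train)
print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```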

Reflection

The topic of hate speech vs. offensive language is something I had never thought about, and I'm surprised that there has been research systematically studying the difference between the two categories of language. I like how this paper demonstrates the difficulty of the topic by listing errors in previous studies. I don't like how brief the paper is in the Model section; it does not explain in detail why they decided to use these particular algorithms or which features the models rely on.

Because of the need for large amounts of crowdsourcing and outside resources (hatebase.org), there are many inaccuracies in both the dataset and the results. One obvious solution to the imprecision of the Hatebase lexicon is to find an outside resource that identifies hate speech terms more precisely. However, it might be difficult to find a "perfect" outside resource, since there is no formal definition of "hate speech".

Figure 1 in the paper shows that the classifier is more accurate on the “Offensive” and “Neither” categories than on the “Hate” category. I’m wondering whether this is because of the strict definition of “hate speech”.

Future work on this topic may include trying other methods on the dataset. For example:

  • Can natural language processing techniques, such as named entity disambiguation/recognition, help the classifier determine what the event trigger is and then make a decision? (A rough sketch of this idea follows the list.)
  • Can non-textual features, such as location and registered race, help identify the context?
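As a rough illustration of the first question, assuming spaCy and its en_core_web_sm model are installed, one could extract named entities from each tweet and feed them to the classifier as extra features; this is only a sketch of the idea, not something evaluated in the paper.

```python
# Sketch: augment a tweet's text features with named entities found by spaCy.
# Assumes spaCy and the en_core_web_sm model are installed; the tweet below is
# a made-up example, not from the paper's dataset.
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_features(tweet: str) -> list[str]:
    """Return tokens like 'ENT_PERSON' for each named entity in the tweet."""
    doc = nlp(tweet)
    return [f"ENT_{ent.label_}" for ent in doc.ents]

tweet = "Some people in Chicago are angry at the mayor today"
print(entity_features(tweet))  # e.g., ['ENT_GPE'], depending on the model
```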


Reflection #1 – [1/29] – Bright Zheng

Journalists and Twitter: A Multi-dimensional Quantitative Description of Usage Patterns – Mossaab Bagdouri

Summary

Twitter is a great platform for journalists and news organizations to reach out to and interact with their audiences. In this paper, Bagdouri studies the interaction of journalists and news organizations with their audiences by surveying 18 features across three account categories (journalists, news organizations, and news consumers), two languages and cultural backgrounds (English-speaking and Arabic-speaking countries), and three types of news media (print, radio, and television). By performing Welch's t-test and the Kolmogorov-Smirnov test on these features across the different categories of Twitter accounts, Bagdouri found that journalists tend to target and have personal engagements with their audience, whereas news organizations prefer broadcasting their posts and are more official. The same pattern appears when comparing Arab journalists with English journalists: Arab journalists broadcast more tweets and are more distinguishable from their audience than English journalists. The paper also finds that journalists across different media types are very different.
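For reference, here is a small sketch of what these two tests look like in code, assuming SciPy and two made-up arrays of a per-account feature (tweets per day) for journalists and news organizations; the numbers are placeholders, not the paper's data.

```python
# Sketch: compare one per-account feature between two groups with the same
# tests the paper uses. The arrays below are made-up numbers, not Bagdouri's data.
import numpy as np
from scipy import stats

journalist_tweets_per_day = np.array([3.1, 5.4, 2.2, 8.0, 4.5, 6.3, 3.9])
organization_tweets_per_day = np.array([20.5, 18.2, 25.1, 22.7, 19.9, 30.4])

# Welch's t-test: do the group means differ, without assuming equal variances?
t_stat, t_p = stats.ttest_ind(journalist_tweets_per_day,
                              organization_tweets_per_day, equal_var=False)

# Kolmogorov-Smirnov test: do the two samples come from the same distribution?
ks_stat, ks_p = stats.ks_2samp(journalist_tweets_per_day,
                               organization_tweets_per_day)

print(f"Welch t-test: t={t_stat:.2f}, p={t_p:.4f}")
print(f"KS test: D={ks_stat:.2f}, p={ks_p:.4f}")
```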

Reflection

This is an overall very interesting paper. I really like how Bagdouri compares journalists from different cultural backgrounds and how he compares journalists and news organizations with their audiences. These comparisons were probably not made in previous work, since this research surveys the largest set of Twitter accounts and tweets so far and is more focused on journalists. However, the paper's definition of "audience" intrigues me. The "audience" in this paper consists of accounts that "have a bidirectional follower / friend relationship" with the selected journalists. The limitation the paper acknowledges is that some of these audience accounts might themselves belong to journalists, but since the number of such journalists is statistically insignificant compared to the total, this isn't really a limitation. This definition of "audience", however, omits Twitter accounts that follow these journalists without a bidirectional follower relationship, and in my opinion those accounts are the journalists' true audience. The people journalists follow back may include other journalists and their real-life friends, who are not representative of the actual audience's perceptions and reactions. I think it would be better to survey all the followers instead of just those with a bidirectional relationship with the journalists.

Another thing that intrigues me is that the journalists are not separated into news categories. Sports journalists and news organizations might tweet differently and interact with their audience differently than those who cover politics. Audiences might also react differently to different categories of news. For example, I often see sports fans tweet their excitement or disappointment as original tweets, which means less interaction with the journalists, whereas people react to political figures' tweets or news by retweeting the original tweet extensively. Analyzing journalists and news organizations by news category could definitely be developed as part of the future work of this research.

This research could also consider a third type of user: opinion writers (authors of opinion pieces). Opinion writers are not journalists, but they still talk about the news and are often quite influential. It would be interesting to compare journalists with opinion writers to see how similar they are and how personal and targeted their engagement is, since the research already showed that journalists are more targeted and relatable than news organization accounts.

Since this research can be seen as an extension of the classifier research by De Choudhury, Diakopoulos, and Naaman (2012), an account classifier based on Bagdouri's data would be interesting. Such a classifier would make the verification process easier and could suggest more relevant accounts to new users; it should at least be able to identify whether an account belongs to a journalist, a news organization, or a news consumer, and which language it primarily uses. If more analysis were done on different news categories, the classifier could also categorize accounts by news category.
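Here is a rough sketch of what such a classifier could look like, assuming a handful of account-level features (follower count, tweets per day, fraction of replies) and scikit-learn; the data and feature choices below are placeholders, not derived from Bagdouri's dataset.

```python
# Sketch: a three-class account classifier (journalist / organization / consumer)
# built on hypothetical account-level features. The data and feature choices are
# placeholders, not Bagdouri's dataset or feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Columns: follower count, tweets per day, fraction of tweets that are replies
X = np.array([
    [12_000, 5.0, 0.40],    # journalist-like: moderate reach, many replies
    [900_000, 25.0, 0.05],  # organization-like: huge reach, mostly broadcasts
    [150, 1.2, 0.60],       # consumer-like: small reach, mostly replies
] * 20)
y = ["journalist", "organization", "consumer"] * 20

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # toy accuracy on toy data
```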
