Reading Reflection #3 – 02/04 – Alec Helyar

Summary

This paper analyzed the social media response to the initial outbreak of the Zika virus. More specifically, it compares user activity and sentiment on informational versus conspiratorial content on YouTube. The researchers hoped to determine how the two video types differed in virality and in the sentiment of their comments. From 35 videos related to the Zika virus, they gathered 20,745 user comments. For each video, they extracted its meta-information and category; on the comments, they performed sentiment analysis and topic modeling. They found that virality and user sentiment were very similar across the two video types. However, they also found that the content of users' responses differed between the two types.

Reflection

I found it interesting that the researchers used a sample size of only 35 videos. At first, I wondered whether this was simply because only 35 videos could be found. Further into the article, I learned that the researchers used every video returned by a string search on July 11th. I suppose they wanted to observe only videos produced early in the outbreak, but why not simply wait until after the outbreak and use the API to find videos within a certain time frame? There could have been many more videos to use. Why was it necessary to observe the early period of the outbreak? It didn't seem to contribute to their research questions.

I also wondered about the data processing choices they made. It seemed odd that the researchers analyzed the raw, linear values of views, comments, replies, likes, dislikes, and shares when measuring virality. After all, it is inherent in the definition of virality that popularity "explodes," so these counts tend to span several orders of magnitude. In my view, the researchers should have taken the log of these values. Fortunately, this choice did not affect their findings, since they used Spearman correlations rather than Pearson: Spearman operates on ranks, which are unchanged by a monotone transformation like the log.
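The point about Spearman being invariant to a log transform can be checked directly. The sketch below uses made-up view and comment counts (all names and numbers are illustrative, not from the paper) and implements Spearman as Pearson computed on ranks:

```python
import math

def ranks(xs):
    # Assign 1-based ranks by sorted order (data here has no ties)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman = Pearson correlation of the rank-transformed data
    return pearson(ranks(x), ranks(y))

# Hypothetical engagement counts spanning several orders of magnitude
views    = [120, 4_500, 98_000, 1_200_000, 310, 27_000]
comments = [3, 40, 850, 9_100, 8, 260]

log_views = [math.log10(v) for v in views]

# Log is monotone, so the ranks (and hence Spearman) are unchanged:
print(spearman(views, comments))      # → 1.0 (ranks match exactly here)
print(spearman(log_views, comments))  # → 1.0, identical
# Pearson, by contrast, shifts under the transform:
print(pearson(views, comments))
print(pearson(log_views, comments))
```

Because log preserves ordering, the rank vectors are identical before and after the transform, which is why the researchers' use of Spearman shields their results from the linear-vs-log choice.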

Overall, I found the paper well executed. The topic was fairly narrow in scope, though, so I'm not sure how impactful the findings end up being. As for future uses of the research, it would also have been interesting if the researchers had explored classifying conspiratorial vs. informational videos or comments with ML, rather than categorizing them manually. Perhaps something along these lines could yield more impactful findings.
