Reading Reflection #4 – [02/07/2019] – [Chris Blair] | CS4984 Spring19: Data Science & Analytics Capstone

In this article, perhaps the first and biggest flaw that jumped out at me was the seeming disconnect in how the authors picked the baseline channels. I agree with their selection of right-wing channels, and I realize that the authors do not consider the baseline channels neutral in thought or politics. However, some of the baseline channels, particularly “Vice NEWS” and “The Young Turks,” can be considered extremely left-wing, with their own flavor of conspiracy theories, which could confuse the attribution any classifier makes about levels of hate speech relative to the baseline. I would almost go as far as to say the authors may have picked these baseline channels in order to make their results seem more plausible. Would it not be fair enough to take the YouTube channels of mainstream media outlets and show the difference there? “Anonymous Official” creates vast webs of conspiracies and falsehoods that sometimes focus on hate and discrimination, and “Drama Alert” is merely a YouTube tabloid. I could drone on and on, but the authors should have put more time into constructing their basis for comparison, because the weak baseline damages results that are actually pretty telling. The topic words are more proof of this false equivalency: while most of the right-wing topic words focus on interpreting the news, many of the baseline topic words stay within the realm of YouTube itself. Those baseline channels only seek to create, transform, or communicate YouTube drama within the platform in order to make themselves more money through controversy. In short, the combination of a weak baseline and the troubling topic words found within the authors' own corpus suggests that the baseline should really mirror actual news sources. The current baseline channels pursue objectives that are only tangentially related to those of the right-wing channels, which makes the comparison less accurate.

However, I appreciate what this study has done because it gives us a good foundation for future projects. I really want to advance the cause of mental health through data analytics, so I think extending this study would be worthwhile. Picking up again at the topic words later in the paper, we could use a similar approach: take the posts and comments on YouTubers' videos about a variety of subjects, examine the sentiment, and conduct the three-fold analysis described in the paper on captions, headlines, and comments from popular YouTube channels within certain age brackets, particularly 13-15, 16-18, and 19-21, since these are likely the most representative populations on YouTube currently. First, we would take the most popular channels, run sentiment analysis on the captions, comments, and headlines, and determine where the median is and where the outliers are. We would classify the baseline as the channels with a mix of good and bad sentiment, and then build our test groups out of the channels that are overwhelmingly good or overwhelmingly bad. Second, we would apply the same approach the paper outlines: with the three-fold analysis and topic modeling, we can draw up the topic matrix and look for troubling words within these YouTube video captions, comments, or headlines. I suspect that people in the 13-21 age bracket often leave these distressed comments as a form of escapism, which can lead them into more isolation as they keep trying to escape their own isolation, and which may eventually spiral into depression and social ineptitude. Finding these comments would allow us to provide these people with the help and support they need before their condition worsens, and it would let us widen our net as we tighten the algorithm as well!
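To make the first step a bit more concrete, here is a rough sketch of how the median and outlier split could work. Everything in it is my own assumption and not something from the paper: the tiny word lists `POSITIVE` and `NEGATIVE` are toy stand-ins for a real sentiment lexicon, the function names are made up for illustration, and the 1.5 × IQR rule is just one common way to decide what counts as an outlier.

```python
from statistics import median, quantiles

# Hypothetical toy lexicon; a real project would use a proper sentiment tool.
POSITIVE = {"love", "great", "happy", "fun", "awesome"}
NEGATIVE = {"hate", "alone", "sad", "worthless", "awful"}


def text_sentiment(text):
    """Score one caption, headline, or comment from -1.0 (bad) to 1.0 (good)."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Text with no lexicon words is treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


def channel_sentiment(texts):
    """Average the sentiment over all texts collected for one channel."""
    scores = [text_sentiment(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


def split_channels(channel_scores):
    """Split channels into a mixed baseline and two outlier test groups.

    channel_scores maps a channel name to its average sentiment.
    Outliers are channels beyond 1.5 * IQR from the quartiles (an assumption).
    """
    values = sorted(channel_scores.values())
    q1, _, q3 = quantiles(values, n=4)  # needs at least two channels
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    groups = {"baseline": [], "negative": [], "positive": []}
    for name, score in channel_scores.items():
        if score < low:
            groups["negative"].append(name)
        elif score > high:
            groups["positive"].append(name)
        else:
            groups["baseline"].append(name)
    groups["median"] = median(values)
    return groups
```

The idea would be to call `channel_sentiment` once per channel on its captions, headlines, and comments, collect the results in a dictionary, and pass that to `split_channels`. The "negative" group is where I would start looking for the distressed comments described above.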

chrisb56
