Reading Reflection #4 – [02/07/2019] – [Chris Blair]

In this article, perhaps the first and biggest flaw that jumped out at me was the seeming disconnect in how the authors picked their baseline channels. I agree with their selection of right-wing channels, and I realize they do not consider the baseline channels neutral in thought or politics. However, some of them, particularly “Vice News” and “The Young Turks,” can be considered extremely left-wing with their own flavor of conspiracy theories, which risks confusing attribution for any classifier measuring levels of hate speech relative to the baseline. I would almost go as far as to say the authors may have picked these baseline channels to make their results seem more plausible; would it not have been fairer to take mainstream media outlets’ YouTube channels and show the difference there? “Anonymous Official” spins vast webs of conspiracy and falsehood that are sometimes built on lies and focused on hate and discrimination, and “Drama Alert” is merely a YouTube tabloid; I could go on, but the writers should have put more care into constructing their basis of comparison, because it damages results that are otherwise quite telling. The topic words are further proof of this false equivalency: while most of the right-wing topic words focus on news interpretation, many of the baseline’s stay within the realm of YouTube itself. Those channels only seek to create, transform, or communicate YouTube drama within the platform to make money through controversy. In conclusion, the combination of a weak baseline and the troubling topic words found within their own corpus suggests that the baseline should mirror actual news sources, as the current channels lessen the accuracy of the prediction by pursuing objectives tangentially different from those of the right-wing channels.

However, I appreciate what this study has done because it gives us a good foundation for future projects. I want to advance the cause of mental health through data analytics, so I think extending this study would be worthwhile. Starting from the topic words later in the paper, we could use a similar approach: take the comments and captions of YouTubers’ videos about various topics and examine their sentiment, conducting the three-fold analysis described in this paper on captions, headlines, and comments across popular YouTube channels within certain age brackets, particularly 13–15, 16–18, and 19–21, since these are likely the most representative populations on YouTube today. First, we would take the most popular channels, run sentiment analysis on the captions, comments, and headlines, determine the median and the outliers, and classify the baseline as a mix of good and bad sentiment; we could then build our test groups out of the overwhelmingly positive and negative channels. Second, applying the same three-fold analysis and topic modeling the paper outlines, we could build the topic matrix and look for troubling words within these video captions, comments, or headlines. People in the 13–21 age bracket often leave distressed comments as a form of escapism, which tends to deepen their isolation as they keep trying to escape it, eventually spiraling into depression and social ineptitude. Finding these comments would allow us to offer these people the help and support they need before their condition worsens, and to widen our net as we tighten the algorithm.
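The first step above can be sketched in code. This is a minimal, hypothetical illustration, not the paper’s method: the word lists stand in for a real sentiment lexicon (such as VADER), and the quartile-based grouping is my own assumption about how to split channels into clearly-positive, clearly-negative, and mixed baseline groups.

```python
from statistics import median, quantiles

# Hypothetical word lists standing in for a real sentiment lexicon.
POSITIVE = {"great", "love", "helpful", "happy"}
NEGATIVE = {"alone", "hate", "hopeless", "worthless"}

def sentiment(text):
    """Crude lexicon score: (#positive - #negative) / #tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def classify_channels(channel_comments):
    """Score each channel by the median sentiment of its comments, then
    split channels into clearly-positive, clearly-negative, and mixed
    (baseline) groups using the quartiles of the score distribution."""
    scores = {ch: median(sentiment(c) for c in comments)
              for ch, comments in channel_comments.items()}
    q1, _, q3 = quantiles(list(scores.values()), n=4)
    groups = {"positive": [], "negative": [], "baseline": []}
    for ch, s in scores.items():
        if s > q3:
            groups["positive"].append(ch)
        elif s < q1:
            groups["negative"].append(ch)
        else:
            groups["baseline"].append(ch)
    return scores, groups
```

The same scoring could then be run separately on captions and headlines to complete the three-fold analysis.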


Reading Reflection #3 – [02/05/2019] – [Chris Blair]

What I found most interesting about “Early Public Responses to the Zika-Virus on YouTube: Prevalence of and Differences Between Conspiracy Theory and Informational Videos” is the possibility of applying its approach to a different problem. If we fed disease-outbreak data, even at the regional or town level, into a machine learning algorithm, we could find trends that would let us be more proactive in detecting and preventing outbreaks. For example, recent measles, mumps, and rubella outbreaks (https://www.who.int/news-room/detail/29-11-2018-measles-cases-spike-globally-due-to-gaps-in-vaccination-coverage) have been traced back to the increased incidence of anti-vaccination conspiracy theories and their effect on populations that choose not to vaccinate their children. Using these principles, we could study social media activity on popular anti-vaccination pages to target the hotbeds where disease could spread and push specific media campaigns to dissuade people from joining this damaging movement. Continuing this line of thought, we could also apply it to mental health awareness: using a machine learning model, we could look at troubling tweets or Facebook posts made by individuals and analyze them to determine problem areas or typical mental health triggers, be it environment, drugs, poverty, or abuse, giving us a holistic view of the issues plaguing our society most so we can bring them to the forefront. These are just two possible ideas, but with sentiment mapping we could also study how a commonly spreading disease like the flu has historically been represented on social media, and then develop rules that let us predict, from a bird’s-eye perspective, the current state of an outbreak and what we can do to influence it.

What I also found interesting is that this project used a no-context approach: not that the grammar lacked context, but that the authors did not consider the user the words were directed at. This inspired an idea that could extend the current paper. First, we could use image recognition to estimate someone’s race or ethnicity, and then use that context to judge whether speech directed at them is actually hateful or, as the paper mentions, simply part of a song. Continuing this, we could use signals like a Facebook user’s profile to infer their religion and determine whether the user could be the recipient of hateful speech on that basis. After some research, however, this appears more difficult than it sounds: much facial recognition software has been trained predominantly on Caucasian faces, so it seems overtuned in that regard. This paper nonetheless gives us a great baseline for preventing hate speech, since we now have the grammar to judge whether a tweet or post is in fact hate speech; all that remains is to make the model more accurate by adding context, perhaps looking at whether the posting account has a history of producing hate speech, or whether the targeted account has a history of receiving it. All of these details could make the model more accurate by eliminating confounding factors like song lyrics, sarcasm, and quoting.
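One way the context idea could layer on top of the paper’s text-only classifier is as a score adjustment. The function below is purely illustrative: the weights and the `sender_history`/`recipient_history` features are my own assumptions, and `base_score` stands in for whatever probability the paper’s classifier produces.

```python
def context_adjusted_score(base_score, sender_history, recipient_history,
                           is_song_lyric=False, is_quotation=False):
    """Combine a text-only hate-speech probability with account context.

    sender_history / recipient_history: fraction of the account's past
    posts flagged as sent/received hate speech (assumed available).
    Weights are illustrative, not fitted.
    """
    score = base_score
    # A sender who frequently posts hate speech makes a borderline
    # post more likely to be genuine hate speech.
    score += 0.2 * sender_history
    # A recipient who frequently receives hate speech suggests targeting.
    score += 0.1 * recipient_history
    # Confounders like quoted song lyrics lower confidence.
    if is_song_lyric or is_quotation:
        score -= 0.3
    return min(max(score, 0.0), 1.0)
```

In practice these weights would be learned jointly with the text features rather than hand-set.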


[Reflection #2] – [01/31] – [Christopher Blair]

Perhaps the most interesting yet somewhat obvious conclusion of this paper appears on the sixth page, where the authors state simply that “Titles are a strong differentiating factor between real and fake news.” This begs the question of where the engagement burden lies: a real news article wants to communicate without giving away the best parts of the story, while fake news articles seem by and large to explain in the title exactly what they say in the article. This is an interesting proposition because it could be the basis for further explaining the phenomenon of fake news. In particular, the self-referential bias we bring to the fake news articles we click: if we suppose that the title explains most of the story, could we delve deeper into the psyche of someone who frequently reads fake news? For example, if we ran a study on people identified as frequent consumers of fake news sites, the research question I would ask is how long they spend reading each article. If the title alone describes every story point, as the paper presents, why read the article at all? I believe, first, that it is a form of social validation and constrained exploration: people will not venture out of their comfort zones into places where they disagree, and this is what such a study could illuminate. Second, I believe it resembles instant gratification: we know what is going to happen because we read the title, so we feel gratified when we click on the story and confirm that notion.
Finally, to answer the research question, I believe that much less time would be spent on a fake news article than on a real news story because of the over-descriptive title; however, the fake news website would likely try to drive on-site engagement by presenting the user with further fake news articles to click on afterward. I fixate on the titles of fake news more than the content because the titles themselves drive users to click, to the detriment of public discourse.

When it comes to the heuristical claims, I am very interested in understanding where the authors could derive these statistics in a way that allows them to make claims without simply synthesizing the numbers. This notion is complicated, but I remember Simpson’s Paradox from my relatively basic statistics class, a particularly interesting trap where you can get data to “work for you.” If you take statistics, decouple them, and present them stripped of their previous context, you can make essentially ridiculous claims, like this segment from a webpage (https://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html):

Since 2000, the median US wage has risen about 1%, adjusted for inflation.

But over the same period, the median wage for:

  • high school dropouts,
  • high school graduates with no college education,
  • people with some college education, and
  • people with Bachelor’s or higher degrees

have all decreased. In other words, within every educational subgroup, the median wage is lower now than it was in 2000.
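The paradox in the quoted example is easy to reproduce numerically. The wages below are made-up toy numbers, not the article’s data; they show how every subgroup’s median can fall while the overall median rises, simply because the workforce mix shifted toward the higher-earning group.

```python
from statistics import median

# Illustrative made-up wages: two education groups, 2000 vs. today.
wages_2000 = {"no_college": [10, 10, 10], "college": [30, 30]}
wages_now  = {"no_college": [9],          "college": [29, 29, 29, 29]}

def overall(groups):
    """Median wage pooled across all groups."""
    return median([w for ws in groups.values() for w in ws])

# Every subgroup's median decreased...
assert all(median(wages_now[g]) < median(wages_2000[g]) for g in wages_2000)
# ...yet the overall median increased, because more workers now sit
# in the higher-earning (college) group.
assert overall(wages_now) > overall(wages_2000)
```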

Considering that most fake news is motivated by a specific special interest, it works to shift people’s perspectives through fallacious statistical claims that would never hold up in context. I believe that, as an extension of this paper, we could look at the incidence of statistical claims within fake news articles and build one classifier for the statistical claims in real news and a separate one for those in fake news, to see whether there is a predominant style in either. My conjecture is that fake news tends to cite singular statistics in isolation, so that many of its readers cannot consider the context and externalities that need to be considered to interpret them. This would be an interesting project because it would give the field of data analytics some meta-knowledge, putting us on the lookout for publications that commit this fallacy, and it would keep people informed that the statistics they read may be skewed to implant a specific idea in their heads.
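A crude first pass at my conjecture could be a heuristic detector for statistics cited without context, before training any real classifier. Everything here is an assumption of mine: the context-word list, the percentage-only pattern, and the idea that fake news would score a higher "isolation rate."

```python
import re

# Hypothetical qualifiers whose absence suggests a statistic cited in isolation.
CONTEXT_WORDS = {"compared", "adjusted", "subgroup", "within", "versus", "relative"}
STAT_PATTERN = re.compile(r"\d+(\.\d+)?\s*%")

def isolated_stat(sentence):
    """True if the sentence cites a percentage with no contextual qualifier."""
    has_stat = bool(STAT_PATTERN.search(sentence))
    has_context = any(w in sentence.lower() for w in CONTEXT_WORDS)
    return has_stat and not has_context

def isolation_rate(article_sentences):
    """Fraction of statistic-bearing sentences that lack context; my
    conjecture is that this rate is higher in fake news articles."""
    stats = [s for s in article_sentences if STAT_PATTERN.search(s)]
    if not stats:
        return 0.0
    return sum(isolated_stat(s) for s in stats) / len(stats)
```

Comparing the distribution of this rate between a real-news corpus and a fake-news corpus would be one way to test whether the two styles actually differ.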
