Summaries
The papers assigned for this class touch upon the influence of social media on healthcare and medical conditions. The first paper profiles Twitter users into three groups (pro-vaccination, anti-vaccination, and converts to anti-vaccination) based on linguistic features from their Twitter feeds over four years. The authors build an attitude (stance) classifier, which they then use to label a user as pro- or anti-vaccination if 70% or more of that user's tweets lean one way. They run the Meaning Extraction Method on these users' tweets to find themes. A between-groups analysis shows that anti-vaccination tweeters are anti-government, discuss the effects of organic food, and mention family-related words often, whereas pro-vaccination tweeters mention chronic health issues and technology more. The authors also find that converts to anti-vaccination were influenced by the "#CDCWhistleBlower" hoax story.
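A minimal sketch of the user-level labeling rule as I read it; the per-tweet stance classifier is abstracted into a stand-in function, and the names here are mine, not the authors':

```python
# User-level labeling from per-tweet stance predictions: a user is labeled
# pro or anti only if at least 70% of their tweets lean one way.
# classify_stance is a hypothetical stand-in for the paper's tweet classifier.
THRESHOLD = 0.70

def label_user(tweets, classify_stance):
    stances = [classify_stance(t) for t in tweets]  # "pro" or "anti" per tweet
    pro_frac = stances.count("pro") / len(stances)
    if pro_frac >= THRESHOLD:
        return "pro-vaccination"
    if 1 - pro_frac >= THRESHOLD:
        return "anti-vaccination"
    return "unlabeled"  # mixed users, presumably excluded from the analysis
```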
De Choudhury et al. conduct an interesting study on predicting depression via social media. In the same vein as the first paper, they analyze linguistic cues, as well as other behavioral traits, that precede a depression diagnosis. They create a gold-standard corpus of tweets from users who were diagnosed with MDD using the CES-D test. With user consent, they quantify an individual's social media behavior for the year leading up to the date of diagnosed depression. The authors find that individuals with depression show lowered social activity, greater negative emotion, increased medicinal concerns, and heightened expression of religious thoughts. They eventually build a classifier that can predict depression ahead of time with an accuracy of about 70%.
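A sketch of the kinds of per-user behavioral signals the paper describes; the word lists and feature names below are illustrative guesses, not the authors' actual measures:

```python
# Illustrative per-user features in the spirit of the paper's measurements
# (word lists and names are invented for this sketch, not taken from it).
NEGATIVE_WORDS = {"sad", "alone", "hopeless", "tired"}
MEDICINAL_WORDS = {"meds", "dosage", "prescription", "therapy"}

def user_features(tweets):
    words = [w.lower() for t in tweets for w in t.split()]
    n = max(len(words), 1)
    return {
        "volume": len(tweets),  # proxy for (lowered) social activity
        "neg_emotion": sum(w in NEGATIVE_WORDS for w in words) / n,
        "medicinal": sum(w in MEDICINAL_WORDS for w in words) / n,
    }
```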
Reflections
I found the content of both papers very engaging and socially relevant. In some sense, I expected anti-vaccination proponents to hold similar beliefs about other conspiracy theories, as well as an angry and aggressive tone in their tweets. This was validated by the research. The paper would have been even more engaging if the authors had discussed the fourth class, users who converted to pro-vaccination, since the ultimate goal would be to encourage more users to get vaccinated and support herd immunity. I suspect such an analysis would be useful for dispelling other conspiracy theories as well. However, I had two concerns with the dataset:
- The authors found 2.12M tweets from active-pro users (373 users), 0.46M from active-anti users (70 users), and 0.85M from joining-anti users (223 users). These users tweeted roughly four times a day on average (see the back-of-the-envelope check after this list). Is it likely some of them are bots?
- The authors also assume that all users have the same time of inflection from pro-vaccination to anti-vaccination. I am not certain how valid that assumption is.
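The back-of-the-envelope check behind the posting-rate claim, assuming the reported tweet counts are spread evenly over the four-year collection window:

```python
# Per-user posting rates implied by the reported corpus sizes, assuming
# the tweets are spread evenly over the 4-year collection window.
groups = {
    "active-pro": (2_120_000, 373),
    "active-anti": (460_000, 70),
    "joining-anti": (850_000, 223),
}

DAYS = 4 * 365  # four years

for name, (tweets, users) in groups.items():
    print(f"{name}: {tweets / users / DAYS:.1f} tweets/user/day")

# active-pro: 3.9 tweets/user/day
# active-anti: 4.5 tweets/user/day
# joining-anti: 2.6 tweets/user/day
```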
Methodologically, the authors use the Meaning Extraction Method (MEM) to extract topics. While MEM works well in their case, it would be nice to see their motivation for choosing a non-standard method when LDA or one of its variants might have worked too. Are there cases where MEM performs better?
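For reference, a minimal LDA sketch with scikit-learn; the toy tweets, topic count, and vectorizer settings are mine, not anything from the paper:

```python
# Minimal LDA topic modeling with scikit-learn on toy data (not the
# paper's corpus; parameter choices here are arbitrary).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "vaccines cause harm wake up people",
    "got my flu shot today feeling great",
    "organic food heals the body naturally",
    "the government hides the truth about vaccines",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic, analogous to MEM's extracted themes.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```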
I found the experiments in the second paper very well designed. It was nice to see the authors account for bias and noise on Amazon Mechanical Turk by (1) ignoring users who finished within two minutes and (2) using an auxiliary screening test. That said, I took the CES-D test myself and wasn't quite sure how I felt about the results. I really liked that they publish the depression lexicon (Table 3 in the paper), which shows which linguistic features correlate with depression. However, I was concerned about the model's recall. The authors highlight precision and accuracy, but when it comes to predicting depression, high recall is probably more important: we wouldn't mind false positives as long as we were able to identify all people who were potentially depressed. Moreover, while the field of social science calls for interpretability, scenarios such as depression perhaps call for simply better models over interpretable ones. I was also surprised to find that ~36% of their users showed signs of depression. While it is certainly possible that the authors deliberately built a balanced dataset, the number seems on the higher side (especially when global depression rates are ~5%).
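To make the recall point concrete, here is a toy confusion matrix (counts invented for illustration, not the paper's numbers) where precision and accuracy look respectable while most depressed users are missed:

```python
# Toy confusion-matrix counts: 1000 users, 360 truly depressed, and a
# cautious model that flags only 150 of them. Numbers are invented.
tp, fp = 120, 30   # flagged: 120 truly depressed, 30 not
fn, tn = 240, 610  # missed: 240 depressed users go undetected

precision = tp / (tp + fp)              # of those flagged, how many are right
recall = tp / (tp + fn)                 # of the depressed, how many we catch
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision {precision:.2f}, recall {recall:.2f}, accuracy {accuracy:.2f}")
# precision 0.80, recall 0.33, accuracy 0.73
```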
Questions
- Facebook recently came out with their depression-predicting AI. Professor Munmun De Choudhury was quoted in a Mashable article: "I think academically sharing how the algorithm works, even if they don't reveal every excruciating detail, would be really beneficial," she says, "…because right now it's really a black box." Even if it weren't a black box and details about the features were made available, would one expect them to be very different from the results in the second paper?
- Ever since Russian meddling in the US elections came to light, people have realized the power of bots in influencing public opinion. I expect anti-vaccination campaigns were similarly propelled. Is there a way this can be confirmed (see the naive sketch below)? Are bots the way to change public opinion on Twitter / Facebook?