Reflection #9 – [02/22] – [Nuo Ma]

In [1], the authors studied the attitudes of people who are against vaccines by analyzing the participants involved in the vaccination debate on Twitter. They gathered 315,240 tweets matching certain phrases from 144,817 users over a three-year period. Users were then classified into a pro-vaccine group, an anti-vaccine group, and a joining-anti-vaccine group by comparing linguistic styles, topics of interest, and social characteristics. The authors found that long-term anti-vaccination supporters hold conspiratorial views, mistrust the government, and are resolute; these supporters use more direct language and express more anger than their pro-vaccine counterparts. Also, the "joining-anti" users share similar conspiracy thinking but tend to be less assured and more social in nature.

I am curious whether, when they first started the analysis, they identified "typical Twitter users" and did a manual analysis first. I would assume that a persistent anti-vaccine person's tweets will be consistently very aggressive, but here a user's tweets are not considered as a consistent whole. By using the user ID that comes with the raw tweet data, we might be able to find users with conflicting stances and filter out some noisy data, or it would be interesting to see why such conflicts in attitude arise. I would also consider users who constantly tweet about anti-vaccine topics to be extreme, because most people simply don't tweet about this. It would be interesting to see how anti-vaccine tweets spread during flu season and how people view the issue then. The spread pattern of tweets can show us who the opinion leaders with real impact are. In some sense, we can view this as a step toward fake-news detection, since those conspiracy stories can be classified as fake news.



[1] Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.”

[2] De Choudhury, Munmun, et al. “Predicting depression via social media.”


Reflection #9 – [02/22] – Jamal A. Khan

  • Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.”
  • De Choudhury, Munmun, et al. “Predicting depression via social media.”

While both papers target serious and important issues, the first is the more interesting of the two, perhaps due to the nature of its question. The fact that anti-vaxxers are prone to believing in conspiracy theories and in general exhibit distrust and a phobia of sorts seems highly logical. I was surprised to see that while the paper highlighted people who joined the anti-vax group, it ignored the people who left! What are the linguistic and topical cues that the reverse transition (anti-vax to pro-vax) exhibits? I believe this is important to understand in order to fight the "self-sealing" quality of conspiracies, or in this case anti-vaccination theories. Overall, though, the paper was very convincing and thorough.

A follow-up question would be to find the trends and growth patterns of these theories: how contagious they are and how long they take to die out. Another interesting thing that could be mined is the source of the claims they make and the validity thereof; this would provide insight into the processes involved in the birth of these conspiracies.

Coming to the second paper, I feel the main motive is to predict depression; however, the choice of classifier or model, as I always complain, is weak again. Since the features were hand-designed, interpretability wouldn't have been an issue. Therefore, ensemble techniques should have been opted for; in particular, gradient boosting should have been used here.
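A minimal sketch of the suggested alternative, assuming a scikit-learn workflow and synthetic stand-in features (the paper's actual features and labels are not public, so everything below is illustrative):

```python
# Gradient boosting over hand-designed behavioral features,
# on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical features: posting volume, negative-affect score, night-post ratio
X = rng.normal(size=(n, 3))
# Synthetic label loosely tied to the features
y = (0.8 * X[:, 1] - 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
importances = clf.feature_importances_  # per-feature contribution
```

Note that boosted trees still expose per-feature importances, so the interpretability of hand-designed features is not entirely lost.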

Regardless of the choice of classifier, a direct follow-up question is whether the same techniques can be applied to newer forms of social media: Facebook posts aren't limited to short sentences and dyads are mostly in personal messages, Instagram's content leans heavily toward pictures and videos, etc. Can image analysis provide better insights, i.e., do non-textual forms of media such as pictures and videos contain a better signal? Another interesting follow-up question is whether these episodes of depression are isolated cases or instances of crowd/mass depression. A graph/network analysis might provide good insight.

From a more ethical perspective, an important question is whether social media platforms even have the right to monitor depression or behavioral traits. If they do find a person who is highly vulnerable, what sort of action can they take? I'm interested to know what other people think of this.

Finally, I would like to mention that most of the social media posts I come across these days are either sarcastic or play on being super busy, stressed, or sad in an attempt to be funny. I believe a lot of these posts would pollute the dataset, and it doesn't seem like the authors have catered for them. Simply relying on the law of large numbers isn't going to get rid of this issue, because it's a prevailing trend rather than an outlying one.


Reflection #9 – [02/22] – [Jiameng Pu]

De Choudhury, Munmun, Michael Gamon, Scott Counts, and Eric Horvitz. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.

Summary & Reflection:

Tens of millions of people around the world suffer from depression each year, but global provisions and services for identifying, supporting, and treating mental illness of this nature are considered insufficient. Since people’s virtual activities on social media can potentially indicate their mental state to some degree, the paper explores the potential of social media to detect and diagnose major depressive disorder in individuals. By compiling a set of Twitter users who report being diagnosed with clinical depression and observing their social media postings over the year preceding the onset of depression, the authors measure behavioral attributes, such as social engagement, emotion, language, and linguistic style, and feed them to a statistical classifier that estimates the risk of depression. Results indicate there are useful signals for characterizing the onset of depression, which could further be instrumental in developing practical detection tools.

I’m pretty impressed by the use of Amazon’s Mechanical Turk interface to conduct the clinical depression survey, which is obviously a great advance that can reach more participants. But for the survey design there remains a common question: can participants stay objective and honest when answering the questionnaire and providing self-reported information? Sometimes respondents will unconsciously, or even consciously, hide their true situation. Although I thought about how to improve this, I did not come up with a better way. What we need to pay special attention to is the design of the questions, which should appropriately guide the psychological state of the participants; the questions should not be blunt or irritating. For measuring depressive behavior, I’m impressed by some of the measures, such as defining the egocentric social graph, but I’m not convinced by some hypotheses, like “Individuals with depression condition are likely to use these names in their posts”. My intuition is that depression patients will not actively seek feedback on treatment effects during the course of treatment. I also strongly feel that one of the most important things in social science research is to alleviate the biases that exist in many places; for example, the authors conduct an auxiliary screening test in addition to the CES-D questionnaire to get rid of noisy responses.

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” In ICWSM, pp. 269-278. 2016.

Summary & Reflection:

Public health can be threatened by an anti-vaccination movement, which reduces the likelihood of disease eradication. Anti-vaccine information can be disseminated on social media such as Twitter, so Twitter data can help us understand the drivers of attitudes among participants in the vaccination debate. By collecting tweets of users who persistently hold pro- and anti-vaccination attitudes, and of those who newly adopt anti-vaccination attitudes, the authors find that those with long-term anti-vaccination attitudes manifest conspiratorial thinking, mistrust in government, and are resolute and in-group focused in their language.

By comparing linguistic styles, topics of interest, and social characteristics across over 3 million tweets, Mitra et al. categorize users into three groups: anti-vaccine, pro-vaccine, and a joining-anti-vaccine cohort. The data collection process involves two main phases: in the first, they extracted a tweet sample from the Twitter Firehose stream, and in the second they built a classifier to label the collected posts as pro-vaccine or anti-vaccine. The MEM model, which extracts dimensions along which users express themselves, seems pretty interesting. Since it captures clusters of co-occurring words that identify psychologically meaningful linguistic themes, it could also be a good tool in other areas, such as the personalized recommendation functionality of a social platform.
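The core idea behind MEM, binarize word occurrence per document, then factor the matrix so that co-occurring words load on shared dimensions, can be sketched roughly as follows (toy documents are my own; MEM proper adds frequency cutoffs and varimax rotation):

```python
# Rough stand-in for the Meaning Extraction Method on toy documents.
import numpy as np

docs = [
    "government hiding vaccine truth",
    "government conspiracy truth media",
    "doctor recommends vaccine schedule",
    "doctor vaccine schedule safe",
]
vocab = sorted({w for d in docs for w in d.split()})
# Binary occurrence matrix: 1 if the word appears in the document
X = np.array([[1 if w in d.split() else 0 for w in vocab] for d in docs], float)
Xc = X - X.mean(axis=0)                      # center before factoring
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]                            # top 2 "theme" dimensions (word loadings)
top_words = [[vocab[i] for i in np.argsort(-np.abs(row))[:3]] for row in loadings]
```

Words that co-occur across documents end up with large loadings on the same dimension, which is what makes the extracted themes interpretable.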


Reflection #9 – [02/22] – [Hamza Manzoor]

[1]. Mitra, T., Counts, S., Pennebaker, J. Understanding Anti-Vaccination Attitudes in Social Media. International AAAI Conference on Web and Social Media, North America, Mar. 2016.

[2]. Choudhury, M.D., Counts, S., Gamon, M., & Horvitz, E. (2013). Predicting Depression via Social Media. ICWSM.


In [1], Mitra et al. examine anti-vaccination attitudes in social media, trying to understand the attitudes of participants involved in the vaccination debate on Twitter. They used five phrases related to vaccines to gather data from January 1, 2012 to June 30, 2015, totaling 315,240 tweets generated by 144,817 unique users. After filtering, the final dataset had 49,354 tweets by 32,282 unique users. These users were classified into three groups: pro-vaccine, anti-vaccine, and joining-anti (those who convert to anti-vaccination). The authors found that long-term anti-vaccination supporters have conspiratorial views and are firm in their beliefs. Also, the “joining-anti” users share similar conspiracy thinking but tend to be less assured and more social in nature.

In [2], Choudhury et al. predict depression via social media. The authors use crowdsourcing to compile a set of Twitter users who report being diagnosed with clinical depression, based on a standard psychometric instrument. A total of 1,583 crowd-workers completed the human intelligence tasks, and only 637 participants provided access to their Twitter feeds. After filtering, the final dataset had 476 users who self-reported having been diagnosed with depression in the given time range. The authors measured behavioral attributes relating to social engagement, emotion, language and linguistic styles, ego networks, and mentions of antidepressant medications for these 476 users over the year preceding the onset of depression. They built a statistical classifier that estimates the risk of depression before the reported onset, with an average accuracy of ~70%. The authors found that individuals with depression show lowered social activity, greater negative emotion, and much greater medicinal concerns and religious thoughts.


Both papers are very socially relevant, and I really enjoyed both readings. The first paper says that long-term anti-vaccination supporters are very firm in their conspiracy views and that we need new tactics to counter the damaging consequences of anti-vaccination beliefs, but I think the paper missed a very key analysis of a fourth class of people: “joining-pro”. Analyzing this fourth class might have provided key insights into the “tactics” to counter anti-vaccination beliefs. I also have major concerns regarding the data preparation. Even though MMR is related to vaccines, autism has more to do with genes, since autism tends to run in families, which makes me question why three of the five phrases involved autism. Secondly, the initial dataset used to identify pro and anti stances contained 315,240 tweets generated by 144,817 individuals, and the final dataset had 49,354 tweets by 32,282 unique users. This means that each user on average had about 1.5 tweets relating to vaccines over almost 3.5 years. Is this enough data to classify the users as pro-vaccination or anti-vaccination? Because millions of tweets from these same users are then analyzed in the study.
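The back-of-envelope figure above can be checked directly (numbers taken from the post):

```python
# Quick check of the averages quoted above
tweets, users = 49_354, 32_282
avg_tweets_per_user = tweets / users       # ~1.53 vaccine-related tweets per user
years = 3.5                                # Jan 2012 - Jun 2015
per_year = avg_tweets_per_user / years     # well under one tweet per user per year
```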

The second paper was also an excellent read, and even though 70% accuracy might not be the most desirable result in this context, I really liked the way the entire analysis was conducted. The authors mined a 10% sample of a snapshot of the “Mental Health” category of Yahoo! Answers. One thing I would like to know is whether mining websites is ethical, because scraping often violates the terms of service of the target website, and publishing scraped content may be a breach of copyright. I also doubt that all five “randomly” selected posts in Table 2, out of 2.1 million tweets, could be exactly related to depression. Was this really random at all?

I also feel that the third post has more to do with sarcasm than depression, which makes me wonder why sarcasm was not catered for in the analysis.

Furthermore, just like with the first paper, I have concerns about the data collection. It put me off when the authors said that they paid 90 cents for completing the tasks. Is everyone really going to answer everything with complete honesty for 90 cents? Secondly, 476 of the final 554 users self-reported having been diagnosed with depression, which is ~86% of users, whereas the depression rate in the US is ~16%. This again makes me question whether they filled out the survey honestly, especially for 90 cents. Other than that, the authors did a terrific job in the analysis, especially in the features they created: they covered all the ground from number of inlinks to linguistic styles to even the use of antidepressants. I believe that, except for the data part, the analyses of both papers were thoroughly done and are excellent examples of good experimental design.



Reflection #9 – [02/22] – [Aparna Gupta]

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” ICWSM. 2016.
De Choudhury, Munmun, et al. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.

Paper 1 by Mitra et al. focuses on understanding anti-vaccination attitudes in social media. The authors collected over 3 million tweets from Twitter and compared and contrasted their linguistic styles, topics of interest, social characteristics, and underlying social-cognitive dimensions. They categorized users into three groups: anti-vaccine, pro-vaccine, and a joining-anti-vaccine cohort. Their analysis mainly examines individuals’ overt expressions toward vaccination on a social media platform. The data collection process involved two main phases: phase 1 extracted a tweet sample from the Twitter Firehose stream between January 1 and 5, 2012 and classified tweets based on five phrases; using these phrases, they then fetched more tweets spanning four calendar years. After data collection, the authors built a supervised classifier to label the collected posts as pro-vaccine or anti-vaccine, using trigrams and hashtags as features, which gave an accuracy of 84.7%. They then segregated users into three groups: long-term advocates of pro- and anti-vaccination attitudes, and new users adopting the anti-vaccination attitude. I really like the method Mitra et al. adopted to analyze the “what” aspect, i.e., the topics people generally talk about. The MEM topic modeling approach looks quite convincing, and I wonder, as the authors suggest, how this study could be extended to other social media platforms, and whether it would produce similar results. I didn’t find anything unconvincing in the paper; however, I wonder if the same approach can be applied to domains other than public health.
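As a rough illustration of what a trigram-plus-hashtag feature extractor could look like (tokenization choices here are mine, not the paper's):

```python
# Sketch of trigram + hashtag feature extraction for a stance classifier.
import re

def features(tweet):
    """Return the set of word trigrams and hashtags found in a tweet."""
    text = tweet.lower()
    hashtags = re.findall(r"#\w+", text)
    words = re.findall(r"[a-z']+", text)
    trigrams = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]
    return set(trigrams) | set(hashtags)

f = features("Vaccines cause autism they say #CDCwhistleblower")
```

These sparse features would then feed a standard supervised classifier, with each trigram or hashtag acting as one binary dimension.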

Paper 2 by De Choudhury et al. addresses depression, a serious challenge in personal and public health. The objective of this paper is to explore the potential use of social media to detect and diagnose major depressive disorder in individuals. The authors collected tweets of users who report being diagnosed with clinical depression, recruited via crowdsourcing. I wonder how we can differentiate whether an individual posts depressing content on Twitter only to seek attention or is actually depressed. The hypothesis is that “changes in language, activity, and social ties may be used jointly to construct statistical models to detect and even predict MDD in a fine-grained manner”. Based on individuals’ social media behavior, the authors derived measures such as user engagement and emotion, the egocentric social graph, linguistic style, depressive language use, and mentions of antidepressant medications to quantify an individual’s social media behavior. It was interesting that the authors conducted an auxiliary screening test in addition to the CES-D questionnaire to eliminate noisy responses. Although the authors did not explicitly indicate in the HITs that the two tests were depression screening questionnaires, I believe the questions in the CES-D are obvious enough for individuals to understand that the questionnaire is related to depression. Hence, I am not quite sure this approach would have helped minimize the possible bias. In the prediction framework section, where the authors describe the models implemented to build the classifier, it would have been helpful if they had reported the number of dimensions retained after dimensionality reduction (PCA).
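One conventional way to report the retained PCA dimensionality is the cumulative variance explained; a sketch on random stand-in data (the paper's feature matrix is not public):

```python
# Choose how many principal components to keep by variance explained.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))             # e.g., 200 users x 20 behavioral features
Xc = X - X.mean(axis=0)                    # center before PCA
_, S, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = S**2 / np.sum(S**2)            # variance explained per component
k = int(np.searchsorted(np.cumsum(var_ratio), 0.95)) + 1  # components kept for 95%
```

Reporting `k` (and the 95% threshold) is the kind of detail the post is asking for.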

In the end, this pair of papers presents some quite interesting results. Re-iterating what I mentioned earlier, I didn’t find anything unconvincing in either paper and was quite impressed by both studies.


Reflection #9 – [02/22] – Ashish Baghudana

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” ICWSM. 2016.
De Choudhury, Munmun, et al. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.


The papers assigned for this class touch upon the influence of social media on healthcare and medical conditions. The first paper tries to profile users on Twitter into three groups — pro-vaccination, anti-vaccination, and converts to anti-vaccination — based on linguistic features from their Twitter feeds over four years. The authors build an attitude (stance) classifier, which is subsequently used to classify users as pro- or anti-vaccination if 70% or more of their tweets lean one way. Then, the authors run the Meaning Extraction Method on these users’ tweets to find themes. They perform a between-groups analysis and observe that anti-vaccination tweeters are anti-government, discuss the effects of organic food, and mention family-related words often, compared to pro-vaccination tweeters, who mention chronic health issues and technology more. They also found that converts to anti-vaccination were influenced by the hoax story around “#CDCWhistleBlower”.
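The user-level labeling rule described above can be sketched as follows (the function name and the "unlabeled" fallback are my own; only the 70% threshold comes from the paper):

```python
# Label a user pro- or anti-vaccine only if >= 70% of their
# stance-classified tweets agree.
def label_user(tweet_stances, threshold=0.7):
    """tweet_stances: list of 'pro' / 'anti' labels from the tweet classifier."""
    if not tweet_stances:
        return "unlabeled"
    for stance in ("pro", "anti"):
        if tweet_stances.count(stance) / len(tweet_stances) >= threshold:
            return stance
    return "unlabeled"

label = label_user(["pro"] * 8 + ["anti"] * 2)   # 80% pro -> "pro"
```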

Choudhury et al. conduct an interesting study on predicting depression via social media. In the same vein as the first paper, they analyze linguistic cues, as well as other behavioral traits, that precede a depression diagnosis. They create a gold-standard corpus of tweets from users who were diagnosed with MDD using the CES-D test. Based on user consent, they quantify an individual’s social media behavior for the year preceding the date of diagnosed depression. The authors find that individuals with depression show lowered social activity, greater negative emotion, increased medicinal concerns, and heightened expression of religious thoughts. They eventually build a classifier that can predict depression ahead of time with an accuracy of 70%.


I found the content of both papers very engaging and socially relevant. In some sense, I expected anti-vaccination proponents to hold similar beliefs about other conspiracy theories, as well as an angry and aggressive tone in their tweets; this was validated by the research. The paper would have been even more engaging if the authors had discussed the fourth class, users who became pro-vaccination, as the ultimate goal would be to encourage more users to get vaccinated and provide herd immunity. I suspect such an analysis would be useful for dispelling other conspiracy theories as well. However, I had two concerns with the dataset:

  • The authors found 2.12M active-pro (373 users), 0.46M active-anti (70 users), and 0.85M joining-anti (223 users). These users are tweeting almost ~4 times a day. Is it likely some of them are bots?
  • The authors also assume that all users have the same time of inflection from pro-vaccination to anti-vaccination. I am not certain how valid the assumption will be.

Methodologically, the authors also use the Meaning Extraction Method (MEM) to extract topics. While MEM works well in their case, it would be nice to see their motivation to use a non-standard method when LDA or one of its variants might have worked too. Are there cases where MEM performs better?

I found the experiments in the second paper very well designed. It was nice to see the authors account for bias and noise on Amazon Mechanical Turk by (1) ignoring users who finished within two minutes and (2) using an auxiliary screening test. However, I took the CES-D test myself and wasn’t quite sure how I felt about the results. I really liked that they publish the depression lexicon (Table 3 in the paper), which shows which linguistic features correlate well with depressed individuals. However, I was concerned about the model’s recall. The authors highlight precision and accuracy, but when it comes to predicting depression, a high recall is probably more important: we wouldn’t mind false positives as long as we were able to identify all people who were potentially depressed. Moreover, while the field of social science calls for interpretability, scenarios such as depression perhaps call for simply better models over interpretable ones. I was also surprised to find that ~36% of their users showed signs of depression. While it is certainly possible that the authors attempted to use a balanced dataset, the number seems on the higher side (especially when global depression percentages are ~5%).
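The recall concern can be made concrete with a toy confusion matrix (counts invented purely for illustration, not taken from the paper):

```python
# Why recall matters here: a toy confusion matrix for a depression classifier.
tp, fp, fn, tn = 60, 30, 40, 370             # 100 truly depressed users, 40 missed

precision = tp / (tp + fp)                   # ~0.667: most flagged users are at risk
recall = tp / (tp + fn)                      # 0.60: but 40% of at-risk users are missed
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 0.86: high, yet it hides the misses
```

With an imbalanced population, accuracy can look strong while a large share of the people we most want to find go undetected, which is exactly the point above.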


  1. Facebook recently came out with their depression-predicting AI. Professor Munmun De Choudhury was quoted in a Mashable article – “I think academically sharing how the algorithm works, even if they don’t reveal every excruciating detail, would be really beneficial,” she says, “…because right now it’s really a black box.”
    Even if it weren’t a black box and details about the features were made available, does one expect them to be very different from their results?
  2. Ever since Russian meddling in the US elections has come out, people have realized the power of bots in influencing public opinion. I expect anti-vaccination campaigns were similarly propelled. Is there a way this can be confirmed? Are bots the way to change public opinion on Twitter / Facebook?


Reflection #9 – [02/22] – [John Wenskovitch]

This pair of papers examines the role of social media in aspects of healthcare: attitudes toward vaccination and predicting depression.  In Mitra et al., the authors look at Twitter data to understand linguistic commonalities among users who are consistently pro-vaccine, consistently anti-vaccine, or transition from pro to anti.  They found that consistently anti-vaccine users are (my wording) conspiracy nutjobs who distrust government and also communicate very directly, whereas users who transition to anti-vaccine seem to be actively looking for that information, being influenced more by specific concerns about vaccination than by being generally conspiracy-minded.  The De Choudhury et al. paper also uses Twitter (along with mTurk) to measure the social media behavioral attributes of depression sufferers.  They analyze factors such as engagement, language, and posting-time distributions to understand which social media factors can be used to separate depressed and non-depressed populations.  Following this analysis, they build a ~70% accurate predictor for depression from social media signs.

My biggest surprise with the Mitra et al. paper was the authors’ decision to exclude a cohort of users who transition from anti-vaccine to pro-vaccine.  I understand the goals and motivations the authors have presented, but it feels to me that research focused on understanding how best to bring these misguided fools back to reality is just as important as the other way around.  Understanding how to prevent others from diving into the anti-vaccine pit is also clearly useful research, but I’d be more interested in reading a study that gives recommendations for rehabilitation rather than prevention, as well as simply displaying what topics are commonly found in discussions around the time these users return to sanity.  I guess it’s a bit late to propose a new class project now, but this really interests me.

Going beyond the linguistic and topical analysis, I’d also be curious to run a network analysis study in this dataset.  Twitter affords unidirectional relationships, where an individual follows another user with no guarantee of reciprocation.  This leads to interesting research questions such as (1) if a prominent member of the anti-vaccine community follows me back, am I more influenced to join the community?  (2) Is the interconnectedness of follow relationships within the anti-vaccine community stronger than in the general population?  (3) How long does it take for an incoming member of the anti-vaccine group to be indistinguishable from a long-time member with respect to the number/strength of these follow relationships?

As a depression sufferer myself, the De Choudhury et al. paper was also a very interesting read.  I paused in reading the paper to score myself on the revised version of the CES-D test cited, and the result was pretty much what I expected.  So there’s one more validation point to demonstrate that the test is accurate.

I thought it was interesting that the authors acquired their participants via mTurk instead of going through more “traditional” routes like putting up flyers in psychiatrist offices.  There’s certainly an advantage to getting a large number of participants easily through computational means, and the authors did work hard to ensure that they restricted their study to quality participants, but I’m still a bit wary about using mTurkers for a study.  This is especially true in this case, where the self-reporting nature of mTurk is going to stack with the self-reporting nature of depression.  Using public Twitter data from these users clearly helps firm up their analysis and conclusions, but my wariness about taking this route in a study of my own hasn’t faded since reading through the paper.


Reflection #9 – 02-22 – Pratik Anand

Paper 1 : Predicting Depression via Social Media

Paper 2 : Understanding Anti-Vaccination Attitudes in Social Media


The two papers are more distinct than the previous papers for reflections, even though they deal with the issues within the same sphere.

Paper 1 takes up a very important topic that is relevant to everyone: predicting traits of depression through social posts.
The authors observe depression through users’ tweets. A user who is not very active or expressive on Twitter cannot be of much help in predicting depression. On the other hand, active Twitter users show many signs of sliding into depression, as well as of using antidepressants. The authors observed that the traits included negative language in tweets and fewer interactions via replies and DMs. Are these results generalizable to other social platforms? Can a YouTuber’s depression be predicted from the facial expressions and language in his/her videos? Extending this to YouTube, more parameters would need to be taken into consideration.

Paper 2 analyzes the behavior of anti-vaxxers. It observes how the earlier notion of herd immunity through vaccination is failing due to online information sharing. The authors focus on three groups – pro-vaccine, anti-vaccine, and those who recently switched to anti-vaccine – and on what triggers that switch. One interesting note is that the paper does not take into account the people who switch to pro-vaccine and their triggers. I believe that would shed some light on what makes someone realize they were part of a conspiracist group, which could be used to create methods that reverse the effects of brainwashing by such anti-vaccine groups.

The paper uses the MEM topic model to categorize user tweets into the themes that anti-vaxxers tweet and care about. It doesn’t take into account the virality of news topics. For example, during the Syrian revolution, a lot of people were tweeting about government, war, violence, etc. The authors don’t mention whether they took care to minimize the effects of viral news on the tweet topics.
The anti-vaxxer group shows close-knit group characteristics. This is equally true in real life: people generally stay with other people who hold similar, or at least compatible, viewpoints. This connects to the paper’s conclusion that small triggers are enough for a person to join the anti-vaccine group. In my opinion, those people have long exposure to such thinking outside of Twitter and only become vocal once they join a certain group.



Reflection #9 – [02/22] – [Vartan Kesiz-Abnousi]

First Paper Reviewed
[1] Mitra, T., Counts, S., Pennebaker, J. Understanding Anti-Vaccination Attitudes in Social Media. International AAAI Conference on Web and Social Media, North America, Mar. 2016. Date accessed: 21 Feb. 2018.


The authors examine the attitudes of people who are against vaccines. They compare them with a pro-vaccine group and examine their differences from people who are just joining the anti-vaccination camp. The data is four years of longitudinal Twitter data capturing vaccination discussions. They identify three groups: those who are persistently pro-vaccine, those who are persistently anti-vaccine, and users who newly join the anti-vaccination cohort. After fetching each cohort’s entire timeline of tweets, totaling more than 3 million tweets, they compare and contrast the groups’ linguistic styles, topics of interest, social characteristics, and underlying cognitive dimensions. Subsequently, they built a classifier to determine positive and negative attitudes toward vaccination. They find that people holding persistent anti-vaccination attitudes use more direct language and have higher expressions of anger compared to their pro-vaccine counterparts. New adopters of anti-vaccine attitudes show similar conspiratorial ideation and suspicion toward the government.


The article stresses that alternative methods (non-official sources) should be adopted in order to change the opinions of those in the anti-vaccination group. However, this would only work on the targeted groups that already hold anti-vaccination attitudes. If the informational channel changes, it might have adverse effects, in the sense that it might turn pro-vaccination people anti-vaccination.

I wonder if they could use unsupervised learning and perform an exploratory analysis in order to find more groups of people. In addition, I didn’t know that population attitudes extracted from tweet sentiment have been shown to correlate with traditional polling data.

For the first phase, the authors use snowball sampling. However, such samples are subject to numerous biases; for instance, people who have many friends are more likely to be recruited into the sample. I also find it interesting that the final set of query terms basically consisted of permutations of the words mmr, autism, vaccine, and measles. Is this what anti-vaccination groups mainly focus on? Through a qualitative examination, the authors find that trigrams and hashtags were prominent cues of a tweet’s stance towards vaccination. Interestingly enough, only “Organic Food” is statistically significant both Between Groups and Within Time.


  1. What kind of qualitative examination led the authors to choose trigrams and hashtags as the prominent cues of a tweet’s stance towards vaccination?
  2. I wonder whether the authors could find more than the three groups by using an unsupervised learning method.
  3. The number of Pre-Time tweets is significantly smaller than the number of Post-Time tweets. Was that intentional?


Second Paper Reviewed

[2] Choudhury, M.D., Counts, S., Gamon, M., & Horvitz, E. (2013). Predicting Depression via Social Media. ICWSM.


The main goal of the paper is to predict Major Depressive Disorder (henceforth MDD), as the title suggests, through social media. The authors collect their data via crowdsourcing, specifically Amazon Mechanical Turk. They ask participants to complete a standardized depression questionnaire (CES-D) and compare the answers to another standardized instrument (BDI) in order to see whether they are correlated. They quantify the users’ behavior through their Twitter posts. They include two groups, those who suffer from depression and those who do not, and compare the two. Finally, they build a classifier that predicts MDD with an accuracy of about 70%.

The authors suggest that Twitter posts contain useful signals for characterizing the onset of depression in individuals, as measured through decreased social activity, raised negative affect, highly clustered ego-networks, heightened relational and medicinal concerns, and greater expression of religious involvement.
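As an illustration of how such per-user behavioral signals can be quantified from a tweet timeline, here is a toy sketch of two features of the kind described (the paper itself uses many more, including LIWC categories and ego-network measures; the word list and the night-time window below are assumptions for illustration):

```python
# Illustrative per-user features from a tweet timeline. The lexicon and
# the 9pm-6am window are toy assumptions, not the paper's definitions.
from datetime import datetime

NEGATIVE_WORDS = {"sad", "alone", "tired", "hopeless"}  # toy lexicon

def night_post_fraction(timestamps):
    """Fraction of posts made between 9pm and 6am (a proxy for insomnia)."""
    night = [t for t in timestamps if t.hour >= 21 or t.hour < 6]
    return len(night) / len(timestamps)

def negative_affect(tweets):
    """Share of tokens appearing in the negative-affect lexicon."""
    tokens = [w for t in tweets for w in t.lower().split()]
    return sum(w in NEGATIVE_WORDS for w in tokens) / len(tokens)

posts = [datetime(2013, 1, 1, 23), datetime(2013, 1, 2, 14)]
print(night_post_fraction(posts))                  # 0.5
print(negative_affect(["feeling sad and alone"]))  # 0.5
```

Features like these, computed per user, would then feed the supervised classifier the paper describes.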


It should be noted that the classification is based on the behavioral attributes of people who already had depression. Did they ask how long the participants have suffered from major depressive disorder? I imagine someone who was diagnosed with depression years ago might have different behavioral attributes than someone diagnosed a few months ago. In addition, being diagnosed with depression is not equivalent to the actual onset of depression.

What if they collected the pre-depression-onset tweets and compared them with the post-depression tweets? That might be an interesting extension. In addition, since the tweets are from the same individuals, factors that do not change over time could be controlled for.

Something that puzzles me is their seemingly ad-hoc choice of onset dates. Specifically, they keep individuals with depression onset dates anytime in the last year, but no later than three months prior to the day the survey was taken. Are they discarding individuals whose depression onset dates are more than one year back? There is an implicit assumption that people who suffer from MDD are homogeneous.


  1. Why do they keep depression onset dates within the last year? Why not go further back?
  2. There is an implicit assumption by the authors: that people who suffer from MDD are the same (i.e., homogeneous). Is someone who has suffered from MDD for years the same as someone who has suffered for a few months? This lack of distinction might affect the classification model.
  3. An extension would be to study the Twitter posts of people who have MDD through time, specifically pre-MDD vs. post-MDD behavior for the same users. Since they are the same users, the authors would be able to control for factors that do not change through time.


Reflection 8 [Anika Tabassum]

Experimental evidence of massive-scale emotional contagion through social networks


The paper analyzes how people’s emotions on social networks influence their friends and other people. The authors run a large-scale real-world experiment on Facebook, manipulating the News Feeds of roughly 689,000 users for one week. They identify posts as positive or negative from the words the posts contain. They observe behavior under two conditions: first, reducing positive content in users’ News Feeds; second, reducing negative content. Their observations show that people who see more negative content in their News Feed post more negative status updates, and vice versa.
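The word-based labeling the study relies on can be sketched very simply: a post counts as positive or negative if it contains at least one word from the respective list. The actual study used the LIWC lexicon; the word sets below are toy examples.

```python
# Toy sketch of word-list labeling of posts (the study used LIWC;
# these lists are invented examples for illustration).
POSITIVE = {"happy", "great", "love"}
NEGATIVE = {"sad", "awful", "hate"}

def label(post: str) -> str:
    words = set(post.lower().split())
    has_pos = bool(words & POSITIVE)
    has_neg = bool(words & NEGATIVE)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neutral/mixed"

print(label("I love this great day"))  # positive
print(label("what an awful game"))     # negative
```

This simplicity is exactly what the questions below probe: a word-count rule cannot see sarcasm or reader-dependent interpretation.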



Some challenges and questions:

The paper identifies positive/negative content and posts by their words. What if some positive words are used in the posts in a negative or sarcastic way?

Can it happen that, in reaction to some negative content, the status updates posted are positive? People’s perspectives can differ.

How is content identified as positive or negative? The same post or content can be negative to one person while positive to another.

Some ideas:

A closer observation to understand which kinds of content change people’s reactions most: is the effect stronger for text posts or for videos, photos, etc.?

