Reflection #10 – [03/22] – [Ashish Baghudana]

March 22, 2018 ashish

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

Summary 1

Kumar et al. present a data-driven view of sockpuppets in online discussion forums and social networks. They identify pairs of sockpuppets in online discussion communities hosted by Disqus. Since the data isn’t labeled, the authors devise an automatic technique for identifying sockpuppets that looks at posting frequency and at multiple logins from the same IP address. They find linguistic differences between the posts of sockpuppets and those of ordinary users. Primarily, sockpuppets use the first and second person more often than normal users. They also write poorer English and are more likely to be downvoted, reported, or deleted. The authors note that there is a dichotomy of sockpuppets: pretenders and non-pretenders. Finally, the authors build a classifier for sockpuppet pairs and determine the predictors of sockpuppetry.

Critique 1

I found the paper very well structured and the motivations clearly explained. The authors received non-anonymous data of users on Disqus. Their dataset creation technique using IP addresses, user sessions, and frequency of posting was very interesting. However, it appears that they relied on intuition in choosing these three factors to identify sockpuppets. In my opinion, they should have attempted to validate their ground truth externally, possibly using Mechanical Turk workers. Their results also seem to suggest that there is one primary account and that the remaining accounts are secondary. In today’s world of fake news and propaganda, I wonder if accounts are created solely for the purpose of promoting one view. I was equally fascinated by the dichotomy of sockpuppets. In the non-pretenders group, users post different content on different forums. This would mean that the sockpuppets are non-interacting. Why then would a person create two identities?

Summary 2

Following the theme of today’s class, the second paper attempts to identify social spammers in social networking contexts like MySpace and Facebook. The authors propose a social honeypot framework to lure spammers and record their activity, behavior, and information. A honeypot is a user profile with no activity. On a social network platform, if this honeypot receives unsolicited friend requests (MySpace) or followers (Twitter), the sender is likely a social spammer. The authors collect information about such candidate spam profiles and build an SVM classifier to differentiate spammers from genuine profiles.

Critique 2

Unlike a traditional machine learning model, the authors opt for a human-in-the-loop model. A set of profiles selected by the classifier is passed to human validators. Based on the feedback from the validators, the model is revised. I think this is a good approach to data collection, validation, and model training. As more feedback is incorporated, the model keeps improving and covers more kinds of social spam behavior. The authors also find an interesting classification of social spammers: more often than not, they attempt to sell pornographic content or enhancement pills, promote their businesses, or phish user details by redirecting people to phishing websites. Since the paper is from 2010, they also use MySpace (a now defunct social network?). It would have been nice to see an analysis of which features stood out in their classification task; however, the authors only presented results of different models.

Reflection #10 – 03/15 – Pratik Anand

March 14, 2018 pratik

Paper 1 : An Army of Me: Sockpuppets in Online Discussion Communities

Paper 2 : Uncovering Social Spammers: Social Honeypots + Machine Learning

The first paper is of special interest to me. It deals with sockpuppets and fake accounts on social media and forums. Being an active user, I see a lot of this in action. The paper defines sockpuppet accounts as those maintained by a single person, referred to as the puppeteer, who uses them either to promote/denounce a certain viewpoint or to cause overall dissent without suffering the consequences of account bans by moderators.
The authors acknowledge that it is very hard to get ground truth data for this matter, so they use observational studies to get insights into sockpuppet behavior. Could the techniques used to obtain ground truth on spam messages be applied to sockpuppets? In my opinion, a list of banned users in a social forum is a good way to get started.
Some of the traits observed are:

1. Sockpuppet accounts are created quite early on by the users.
2. They usually post at the same time and on the same topics.
3. They rarely start a discussion but usually participate in discussions.
4. The discussion topics are always very controversial.
5. They write very similar content.

The authors also observed that sockpuppets are treated harshly by the community. Is this due to their behavior, or is it just a side effect of the fact that they only participate in posts about controversial topics? Not all sockpuppets are malicious. The question of pretenders vs. non-pretenders was very intriguing. Some people keep a sockpuppet for entirely comical or other purposes, and I don’t believe that the authors’ method of classifying them based on username is effective enough. This is because many non-pretenders may keep multiple sockpuppet accounts based around a joke, and the authors’ method would fail to classify these as non-pretending accounts.

The authors provide a case where two sockpuppets, run by the same puppetmaster, argue against each other. They explain this behavior as a means to increase traffic to the given post. I am not sure that is the reason. They didn’t provide a way to verify that those sockpuppets are indeed handled by the same person. There is also the possibility that a group of people maintains certain sockpuppet accounts. This would make their patterns ever-changing and would also provide an alternate explanation for the arguing behavior described above.

The second paper deals with creating honeypots to learn about the traits of spam accounts and using them to build a spam classifier. The authors do a good job of explaining how social spam differs from email spam: it carries a personalized touch, which is a more effective strategy for luring users. Though the paper doesn’t go into detail about how the honeypots were set up, the authors share their observations from analysing the spammers who fell into them. The honeypots were created on MySpace and Twitter, and spammer behavior is very different in the two cases. The authors note that MySpace is more of a long-form social communication platform. Thus, they identify the “About Me” section as the most important part of a spammer profile for classification. They make the assumption that it won’t change radically, as it is like a sales pitch, and thus the spam classifiers will be able to detect it. I believe this is a limitation of the technique. About Me can be changed as easily as any other section. It is indeed important, but replacing it is just like replacing one sales pitch with another. Hence, that justification doesn’t hold up.

The paper notes that the authors created MySpace profiles with geographic locations covering every state in the USA. What was the reasoning behind this? Do different geographic locations provide a level of genuineness that these honeypot profiles require?
Lastly, could spammers use a reverse technique to identify honeypot profiles and take safeguards against them?

Reflection #9 – [02/22] – [Nuo Ma]

February 22, 2018 lorinma

In [1], the authors study the attitudes of people who are against vaccines by analyzing participants in the vaccination debate on Twitter. They gathered 315,240 tweets matching certain phrases from 144,817 users over a period of roughly three years. Users were then classified into a pro-vaccine group, an anti-vaccine group, and a joining-anti-vaccine group, and compared on linguistic style, topics of interest, and social characteristics. The authors found that long-term anti-vaccination supporters hold conspiratorial views, mistrust the government, and are resolute; they also use more direct language and express more anger than their pro-vaccine counterparts. The “joining-anti” group shares similar conspiracy thinking but tends to be less assured and more social in nature. I am curious: when they first started the analysis, did they identify “typical Twitter users” and do a manual analysis first? I would assume that a persistent anti-vaccine person’s tweets would be consistently very aggressive, but here a user’s tweets are not examined for consistency. By using the user ID that comes with the raw tweet data, we might be able to find users with conflicting tweets and filter out some noisy data, or it would be interesting to see why there is such a conflict in attitude. Also, I would consider users who constantly tweet against vaccines to be extreme, because most people just don’t tweet about this. It would be interesting to see how anti-vaccine tweets spread during flu season, and how people view this issue. The spread pattern of tweets can show us which opinion leaders make an impact. In some sense, we can see this as potential fake news detection, because those conspiracy stories can be defined as fake news.

[1] Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.”

[2] De Choudhury, Munmun, et al. “Predicting depression via social media.”

Reflection #9 – [02/22] – Jamal A. Khan

February 22, 2018 jamal93

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.”
De Choudhury, Munmun, et al. “Predicting depression via social media.”

While both papers target serious and important issues, the first one is the more interesting of the two, perhaps due to the nature of the question. The finding that anti-vaxxers are prone to believing in conspiracy theories and in general exhibit distrust and a phobia of sorts seems highly logical. I was surprised to see that while the paper highlighted people who joined the anti-vax group, it ignored the people who left! What linguistic and topical cues do those transitions (anti-vax to pro-vax) exhibit? I believe this is important to understand in order to fight the “self-sealing” quality of conspiracies, or in this case anti-vaccination theories. Overall, though, the paper was very convincing and thorough.

A follow-up question would be to find the trends and growth patterns of these theories: how contagious they are and how long they take to die out. Another interesting thing that could be mined is the source of the claims they make and the validity thereof; this would provide insight into the processes involved in the birth of these conspiracies.

Coming to the second paper, I feel its main motive is to predict depression; however, the choice of classifier or model, as I always complain, is weak again. Since the features were hand-designed, interpretability wouldn’t have been an issue. Therefore, ensemble techniques should have been opted for; in particular, gradient boosting should have been used here.

Regardless of the choice of classifier, a direct follow-up question is whether the same techniques can be applied to other forms of social media. For example, Facebook posts aren’t limited to short sentences and dyads are mostly in personal messages, while Instagram’s content leans heavily towards pictures and videos. Can image analysis provide better insights, i.e., do non-text media such as pictures and videos contain a better signal? Another interesting follow-up question is whether these episodes of depression are isolated cases or part of a crowd-wide mass depression. A graph/network analysis might provide good insight.

From an ethics perspective, an important question is whether social media platforms even have the right to monitor depression or behavioral traits. If they do find a person who is highly vulnerable, what sort of action can they take? I’m interested to know what other people think of this.

Finally, I would like to mention that most of the social media posts I come across these days are either sarcastic or play on being super busy, stressed, or sad in an attempt to be funny. I believe a lot of these posts would pollute the dataset, and it doesn’t seem like the authors have accounted for it. Simply relying on the law of large numbers isn’t going to get rid of this issue, because it’s a prevailing trend rather than an outlying one.

Reflection #9 – [02/22] – [Jiameng Pu]

February 22, 2018 jiameng

De Choudhury, Munmun, Michael Gamon, Scott Counts, and Eric Horvitz. “Predicting depression via social media.” ICWSM13 (2013): 1-10.

Summary & Reflection:

Tens of millions of people around the world suffer from depression each year, but global provisions and services for identifying, supporting, and treating mental illness of this nature are considered insufficient. Since people’s activities on social media can indicate their mental state to some degree, the paper explores the potential of using social media to detect and diagnose major depressive disorder in individuals. The authors compile a set of Twitter users who report being diagnosed with clinical depression and observe their social media postings over the year preceding the onset of depression. They measure behavioral attributes, such as social engagement, emotion, language, and linguistic styles, and feed them to a statistical classifier that estimates the risk of depression. Results indicate there are useful signals for characterizing the onset of depression, which can further be instrumental in developing practical detection tools for depression.

I’m pretty impressed by the use of Amazon’s Mechanical Turk interface to conduct the clinical depression survey, which is obviously a great advance that can reach more participants. For the survey design, however, there is still one common question: can participants remain objective and honest when answering the questionnaire and providing self-reported information? Respondents sometimes unconsciously or even consciously hide their true situation. Although I thought about how to improve this, I could not think of a better way. What we need to pay special attention to is the design of the questions, which should appropriately guide the psychological state of the participants. The questions should not be blunt or irritating. In the section on measuring depressive behavior, I’m impressed by some of the measures, such as defining the egocentric social graph, but I’m not convinced by hypotheses like “Individuals with depression condition are likely to use these names in their posts”. My intuition is that patients with depression do not actively seek feedback on the effects of these drugs during the course of treatment. I also strongly feel that one of the most important things in social science research is alleviating the biases that exist in many places; for example, the authors of this paper conduct an auxiliary screening test in addition to the CES-D questionnaire to get rid of noisy responses.

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” In ICWSM, pp. 269-278. 2016.

Summary & Reflection:

Public health can be threatened by an anti-vaccination movement which reduces the likelihood of disease eradication. Anti-vaccine information can be disseminated on social media like Twitter, thus Twitter data would help understand the drivers of attitudes among participants involved in the vaccination debate. By collecting tweets of users who persistently hold pro and anti-attitudes, and those who newly adopt anti attitudes towards vaccination, they find that those with long-term anti-vaccination attitudes manifest conspiratorial thinking, mistrust in government, and are resolute and in-group focused in language.

By comparing the linguistic styles, topics of interest, and social characteristics of over 3 million tweets from Twitter, Mitra et al. categorize users into 3 groups: anti-vaccine, pro-vaccine, and a joining-anti-vaccine cohort. The data collection process involves 2 main phases: they extracted a tweet sample from the Twitter Firehose stream in the first phase and then built a classifier to label the collected posts as pro-vaccine or anti-vaccine tweets. The MEM model, which extracts dimensions along which users express themselves, seems pretty interesting. It should be a good tool in other areas, such as personalized recommendation on a social platform, since it can capture clusters of co-occurring words that identify linguistic dimensions representing psychologically meaningful themes.

Reflection #9 – [02/22] – [Hamza Manzoor]

February 22, 2018 hamzamanzoor

[1]. Mitra, T., Counts, S., & Pennebaker, J. (2016). Understanding Anti-Vaccination Attitudes in Social Media. International AAAI Conference on Web and Social Media.

[2]. Choudhury, M.D., Counts, S., Gamon, M., & Horvitz, E. (2013). Predicting Depression via Social Media. ICWSM.

Summaries:

In [1], Mitra et al. try to understand anti-vaccination attitudes in social media by studying participants in the vaccination debate on Twitter. They used five phrases related to vaccines to gather data from January 01, 2012 to June 30, 2015, totaling 315,240 tweets generated by 144,817 unique users. After filtering, the final dataset had 49,354 tweets by 32,282 unique users. These users were classified into three groups: pro-vaccine, anti-vaccine, and joining-anti (those who convert to anti-vaccination). The authors found that long-term anti-vaccination supporters hold conspiratorial views and are firm in their beliefs. The “joining-anti” group shares similar conspiracy thinking but tends to be less assured and more social in nature.

In [2], De Choudhury et al. predict depression via social media. The authors use crowdsourcing to compile a set of Twitter users who report being diagnosed with clinical depression, based on a standard psychometric instrument. A total of 1,583 crowd-workers completed the human intelligence tasks, and only 637 participants provided access to their Twitter feeds. After filtering, the final dataset had 476 users who self-reported to have been diagnosed with depression in the given time range. For these 476 users, the authors measured behavioral attributes relating to social engagement, emotion, language and linguistic styles, ego network, and mentions of antidepressant medications over the year preceding the onset of depression. They built a statistical classifier that provides estimates of the risk of depression before the reported onset with an average accuracy of ~70%. The authors found that individuals with depression show lowered social activity, greater negative emotion, and much greater medicinal concerns and religious thoughts.

Reflections:

Both papers are very socially relevant and I really enjoyed both readings. The first paper says that long-term anti-vaccination supporters are very firm in their conspiracy views and that we need new tactics to counter the damaging consequences of anti-vaccination beliefs, but I think the paper missed a key analysis of a fourth class of people: “joining-pro”. Analyzing this fourth class might have provided some key insights into “tactics” to counter anti-vaccination beliefs. I also have major concerns regarding the data preparation. MMR is related to vaccines, but autism has more to do with genes, because autism tends to run in families. This makes me question why 3 out of the 5 phrases had autism in them. Secondly, the initial dataset used to identify pro and anti stance contained 315,240 tweets generated by 144,817 individuals, and the final dataset had 49,354 tweets by 32,282 unique users. This means that each user on average had 1.5 tweets relating to vaccines in almost 3.5 years. Is this enough data to classify users as pro-vaccination or anti-vaccination? After all, millions of tweets from these same users are then examined in the analysis.
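The per-user figure above can be checked with a few lines of arithmetic. This is only a sketch: the counts are the ones quoted in this reflection, the variable names are my own, and the collection window is assumed to run from January 1, 2012 to June 30, 2015 as stated in the summary.

```python
from datetime import date

# Counts as quoted in the reflection (final, filtered dataset).
final_tweets = 49_354
final_users = 32_282

# Average number of vaccine-related tweets per user.
tweets_per_user = final_tweets / final_users

# Length of the collection window, counting both end dates (assumed window).
window_days = (date(2015, 6, 30) - date(2012, 1, 1)).days + 1
window_years = window_days / 365.25

print(f"{tweets_per_user:.2f} tweets per user over {window_years:.1f} years")
```

The average comes out at roughly one and a half vaccine-related tweets per user across the whole window, which is the basis for the concern about labeling users from so little evidence.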

The second paper was also an excellent read, and even though 70% accuracy might not be the most desirable result in this case, I really liked the way the entire analysis was conducted. The authors mined a 10% sample of a snapshot of the “Mental Health” category of Yahoo! Answers. One thing I would like to know is whether mining websites is ethical. Scraping often violates the terms of service of the target website, and publishing scraped content may be a breach of copyright. I also doubt that 5 “randomly” selected posts out of 2.1 million tweets (Table 2) could all be so directly related to depression. Was the selection really random at all?

I also feel that the third post has more to do with sarcasm than depression, which makes me wonder why sarcasm was not accounted for in the analysis.

Furthermore, just like the first paper, I have concerns with the data collection for this paper as well. I was put off when they said that they paid 90 cents for completing the tasks. Is everyone really going to fill in anything with complete honesty for 90 cents? Secondly, 476 out of the final 554 users self-reported having been diagnosed with depression, which is ~86% of users, whereas the depression rate in the US is ~16%. This again makes me question whether participants filled in the survey honestly, especially when they were paid 90 cents. Other than that, the authors did a very good job in the analysis, especially with the features they created. They covered all grounds, from the number of inlinks to linguistic styles to even the use of antidepressants. I believe that, except for the data part, the analyses in both papers were thoroughly done and are excellent examples of good experimental design.

Reflection #9 – [02/22] – [Aparna Gupta]

February 22, 2018 agupta12

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” ICWSM. 2016.

De Choudhury, Munmun, et al. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.

Reflection:

Paper 1, by Mitra et al., focuses on understanding anti-vaccination attitudes in social media. The authors collected over 3 million tweets from Twitter and compared and contrasted their linguistic styles, topics of interest, social characteristics, and underlying social cognitive dimensions. They categorized users into 3 groups: anti-vaccine, pro-vaccine, and a joining-anti-vaccine cohort. Their analysis mainly involves examining individuals’ overt expressions towards vaccination on a social media platform. The data collection process involved 2 main phases. Phase 1 involved extracting a tweet sample from the Twitter Firehose stream between January 1 and 5, 2012 and filtering tweets based on 5 phrases. Using these phrases, they fetched more tweets spanning four calendar years. After data collection, the authors built a supervised classifier, using trigrams and hashtags as features, to label the collected posts as pro-vaccine or anti-vaccine; it achieved an accuracy of 84.7%. The authors then segregated users into 3 groups: long-term advocates of pro- and anti-vaccination attitudes, and new users adopting the anti-vaccination attitude. I really like the method adopted by Mitra et al. to analyze the “what” aspect of the topics people generally talk about. The MEM topic modeling approach implemented by the authors looks quite convincing, and I wonder, as the authors suggest, how this study can be extended to other social media platforms. Will it produce similar results? I didn’t find anything unconvincing in the paper; however, I wonder if the same approach can be applied to domains other than public health.

Paper 2, by De Choudhury et al., talks about depression, which is a serious challenge in personal and public health. The objective of this paper is to explore the potential use of social media to detect and diagnose major depressive disorder in individuals. Using crowdsourcing, the authors collected tweets of users who report being diagnosed with clinical depression. I wonder how we can tell whether an individual posts depressing content on Twitter only to seek attention or is actually depressed. The hypothesis is: “changes in language, activity, and social ties may be used jointly to construct statistical models to detect and even predict MDD in a fine-grained manner”. To quantify an individual’s social media behavior, the authors derive measures such as user engagement and emotion, egocentric social graph, linguistic style, depressive language use, and mentions of antidepressant medications. It was interesting that the authors conducted an auxiliary screening test in addition to the CES-D questionnaire to eliminate noisy responses. Although the authors did not explicitly indicate in the HITs that the two tests were depression screening questionnaires, I believe the questions in CES-D are obvious enough for individuals to understand that the questionnaire is related to depression. Hence, I am not quite sure this approach would have helped minimize the possible bias. In the prediction framework section of the paper, where the authors describe the models they used to build the classifier, it would have been helpful if they had given information about the dimensions retained after dimensionality reduction (PCA).

In the end, both papers present some quite interesting results. To reiterate what I mentioned earlier, I didn’t find anything unconvincing in either paper and was quite impressed by both studies.

Reflection #9 – [02/22] – Ashish Baghudana

February 22, 2018 ashish

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” ICWSM. 2016.

De Choudhury, Munmun, et al. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.

Summaries

The papers assigned for this class touch upon the influence of social media on healthcare and medical conditions. The first paper tries to profile users on Twitter into three groups (pro-vaccination, anti-vaccination, and converts to anti-vaccination) based on linguistic features from their Twitter feed over four years. The authors build an attitude (stance) classifier, which is subsequently used to classify users as pro- or anti-vaccination if 70% or more of their tweets leaned one way. Then, the authors run the Meaning Extraction Method on these users’ tweets to find the themes. They perform a between-groups analysis and observe that anti-vaccination tweeters are anti-government, discuss the effects of organic food, and mention family-related words often, whereas pro-vaccination tweets mention chronic health issues and technology more. They also found that converts to anti-vaccination were influenced by the hoax story on “#CDCWhistleBlower”.
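The 70% rule described above can be sketched as a small function. This is a minimal illustration under my own assumptions, not the authors’ code: the name `label_user`, the label strings, and the idea that each tweet already carries a stance label from the classifier are all hypothetical.

```python
from collections import Counter

def label_user(tweet_stances, threshold=0.7):
    """Return 'pro' or 'anti' if that stance covers at least `threshold`
    of the user's vaccine-related tweets; otherwise return None."""
    if not tweet_stances:
        return None
    counts = Counter(tweet_stances)
    for stance in ("pro", "anti"):
        if counts[stance] / len(tweet_stances) >= threshold:
            return stance
    return None  # mixed stance: the user is left unlabeled

# Hypothetical per-tweet stance labels for three users.
print(label_user(["pro"] * 7 + ["anti"] * 3))
print(label_user(["anti"] * 8 + ["pro"] * 2))
print(label_user(["pro"] * 5 + ["anti"] * 5))
```

A user whose tweets are split evenly is left unlabeled here, which is one reasonable reading of “leaned one way”.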

De Choudhury et al. conduct an interesting study on predicting depression via social media. In the same vein as the first paper, they look to analyze linguistic cues, as well as other behavioral traits, that precede a depression diagnosis. They create a gold standard corpus of tweets from users who were diagnosed with MDD using the CES-D test. Based on user consent, they quantify an individual’s social media behavior for a year ahead of the date of diagnosed depression. The authors find that individuals with depression show lowered social activity, greater negative emotion, increased medicinal concerns, and heightened expression of religious thoughts. The authors eventually build a classifier that can predict depression ahead of time with an accuracy of 70%.

Reflections

I found the content of both papers very engaging and socially relevant. In some sense, I expected anti-vaccination proponents to have similar beliefs about other conspiracy theories, as well as an angry and aggressive tone in their tweets. This was validated by the research. The paper would have been even more engaging if the authors had discussed a fourth class, users who became pro-vaccination, as the ultimate goal would be to encourage more users to get vaccinations and provide herd immunity. I suspect such an analysis would be useful to dispel other conspiracy theories as well. However, I had two concerns with the dataset:

The authors found 2.12M tweets from active-pro users (373 users), 0.46M from active-anti users (70 users), and 0.85M from joining-anti users (223 users). These users are tweeting roughly 4 times a day. Is it likely that some of them are bots?
The authors also assume that all users have the same time of inflection from pro-vaccination to anti-vaccination. I am not certain how valid the assumption will be.
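The tweeting rate behind the first concern can be reproduced with a rough calculation. This is a sketch under stated assumptions: the counts are the ones quoted above, and the four-year span (taken as 4 × 365 days) is my reading of “over four years” in the summary, so the exact rates shift if a shorter window is used.

```python
# (total tweets, number of users) per cohort, as quoted in the reflection.
cohorts = {
    "active-pro": (2.12e6, 373),
    "active-anti": (0.46e6, 70),
    "joining-anti": (0.85e6, 223),
}

# Assumed observation span: four years of 365 days each.
span_days = 4 * 365

# Tweets per user per day for each cohort.
rates = {
    name: tweets / users / span_days
    for name, (tweets, users) in cohorts.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tweets per user per day")
```

Under these assumptions the cohorts land between roughly 2.5 and 4.5 tweets per user per day, sustained over the whole period.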

Methodologically, the authors also use the Meaning Extraction Method (MEM) to extract topics. While MEM works well in their case, it would be nice to see their motivation to use a non-standard method when LDA or one of its variants might have worked too. Are there cases where MEM performs better?

I found the experiments in the second paper very well designed. It was nice to see the authors account for bias and noise on Amazon Mechanical Turk by (1) ignoring users who finished within two minutes and (2) using an auxiliary screening test. However, I took the CES-D test and wasn’t quite sure how I felt about the results. I really liked the fact that they publish the depression lexicon (Table 3 in the paper), which shows which linguistic features correlate well with individuals with depression. However, I was concerned about the model’s recall value. The authors highlight the precision and accuracy. However, when it comes to predicting depression, having a high recall value is probably more important. We wouldn’t mind false positives as long as we were able to identify all people who were potentially depressed. Moreover, while the field of social science calls for interpretability, scenarios such as depression perhaps call for simply better models over interpretability. I was also surprised to find that ~36% of their users showed signs of depression. While it is certainly possible that the authors attempted to use a balanced dataset, the number seems on the higher side (especially when global depression percentages are ~5%).
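To make the recall concern concrete, here is a small sketch of how accuracy, precision, and recall can diverge. The confusion-matrix counts are entirely made up for illustration and are not taken from the paper; they are chosen only so that accuracy lands at 70%.

```python
# Hypothetical confusion-matrix counts (NOT from the paper).
tp = 60  # depressed users correctly flagged
fn = 40  # depressed users the model missed
fp = 20  # non-depressed users wrongly flagged
tn = 80  # non-depressed users correctly cleared

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these invented numbers, a model that is 70% accurate and 75% precise still misses 40% of the depressed users, which is the kind of gap that accuracy and precision alone would hide.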

Questions

Facebook recently came out with their depression-predicting AI. Professor Munmun De Choudhury was quoted in a Mashable article: “I think academically sharing how the algorithm works, even if they don’t reveal every excruciating detail, would be really beneficial,” she says, “…because right now it’s really a black box.”
Even if it weren’t a black box and details about the features were made available, does one expect them to be very different from their results?
Ever since Russian meddling in the US elections has come out, people have realized the power of bots in influencing public opinion. I expect anti-vaccination campaigns were similarly propelled. Is there a way this can be confirmed? Are bots the way to change public opinion on Twitter / Facebook?

Reflection #9 – [02/22] – [John Wenskovitch]

February 22, 2018 John Wenskovitch

This pair of papers examines the role of social media in two aspects of healthcare: attitudes toward vaccination and predicting depression. In Mitra et al., the authors look at Twitter data to understand linguistic commonalities among users who are consistently pro-vaccine, users who are consistently anti-vaccine, and users who transition from pro- to anti-vaccine. They found that consistently anti-vaccine users are (my wording) conspiracy nutjobs who distrust government and also communicate very directly, whereas users who transition to anti-vaccine seem to be actively looking for that information, being influenced more by concerns about vaccination than by being generally conspiracy-minded. The De Choudhury et al. paper also uses Twitter (along with mTurk) to measure social media behavioral attributes of depression sufferers. They analyze factors such as engagement, language, and posting time distributions to understand which social media factors can be used to separate depressed and non-depressed populations. Following this analysis, they build a ~70% accurate predictor for depression via social media signs.

My biggest surprise with the Mitra et al. paper was the authors’ decision to exclude a cohort of users who transition from anti-vaccine to pro-vaccine. I understand the goals and motivations that the authors have presented, but it feels to me that research focused on understanding how best to bring these misguided fools back to reality is just as important as the other way around. Understanding how to prevent others from diving into the anti-vaccine pit is also clearly useful research, but I’d be more interested in reading a study that gives recommendations for rehabilitation rather than prevention, or one that simply shows which topics are commonly found in discussions around the time that these users return to sanity. I guess it’s a bit late to propose a new class project now, but this really interests me.

Going beyond the linguistic and topical analysis, I’d also be curious to run a network analysis study on this dataset. Twitter affords unidirectional relationships, where an individual follows another user with no guarantee of reciprocation. This leads to interesting research questions such as (1) If a prominent member of the anti-vaccine community follows me back, am I more influenced to join the community? (2) Is the interconnectedness of follow relationships within the anti-vaccine community stronger than in the general population? (3) How long does it take for an incoming member of the anti-vaccine group to be indistinguishable from a long-time member with respect to the number/strength of these follow relationships?
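As a rough sketch of how question (2) might be approached, the snippet below compares the density of follow edges inside two groups and computes overall reciprocity on a directed follow graph. The user names, edges, and group assignments are entirely hypothetical toy data, and the helper functions `density` and `reciprocity` are my own illustrations rather than anything from the paper.

```python
def density(follows, members):
    """Fraction of possible directed follow edges that exist among members."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    internal = {(a, b) for a, b in follows
                if a in members and b in members and a != b}
    return len(internal) / (n * (n - 1))


def reciprocity(follows):
    """Fraction of directed follow edges that are followed back."""
    edges = {(a, b) for a, b in follows if a != b}
    if not edges:
        return 0.0
    mutual = sum(1 for a, b in edges if (b, a) in edges)
    return mutual / len(edges)


# Hypothetical toy data: (follower, followed) pairs.
follows = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"),
    ("x", "y"), ("y", "z"), ("x", "a"),
]
anti_vaccine = {"a", "b", "c"}  # assumed group labels
general = {"x", "y", "z"}

anti_density = density(follows, anti_vaccine)
general_density = density(follows, general)
overall_reciprocity = reciprocity(follows)
print(anti_density, general_density, overall_reciprocity)
```

On real data, a noticeably higher within-group density for the anti-vaccine cohort than for a comparable sample of general users would be one piece of evidence for a more tightly interconnected community.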

As a depression sufferer myself, the De Choudhury et al. paper was also a very interesting read. I paused in reading the paper to score myself on the revised version of the CES-D test cited, and the result was pretty much what I expected. So there’s one more validation point to demonstrate that the test is accurate.

I thought it was interesting that the authors acquired their participants via mTurk instead of going through more “traditional” routes like putting up flyers in psychiatrist offices. There’s certainly an advantage to getting a large number of participants easily through computational means, and the authors did work hard to ensure that they restricted their study to quality participants, but I’m still a bit wary about using mTurkers for a study. This is especially true in this case, where the self-reporting nature of mTurk is going to stack with the self-reporting nature of depression. Using public Twitter data from these users clearly helps firm up their analysis and conclusions, but my wariness about taking this route in a study of my own hasn’t faded since reading through the paper.

Reflection #9 – 02-22 – Pratik Anand

February 21, 2018 pratik

Paper 1 : Predicting Depression via Social Media

Paper 2 : Understanding Anti-Vaccination Attitudes in Social Media

These two papers are more distinct from each other than the paper pairs in previous reflections, even though they deal with issues within the same sphere.

Paper 1 takes on an important topic that is relevant to everyone: predicting traits of depression through social media posts.
The authors observe depression through a user’s tweets. A user who is not very active or expressive on Twitter cannot be of much help in predicting depression. Active Twitter users, on the other hand, show many signs of the onset of depression as well as of antidepressant use. The authors observed that these signs included negative language in tweets and fewer interactions through replies and DMs. Are these results generalizable to other social platforms? Can a YouTuber’s depression be predicted from the facial expressions and language in his/her videos? Extending this to YouTube would require taking more parameters into consideration.

Paper 2 analyses the behavior of anti-vaccine users. It observes how the earlier concept of herd immunity through vaccination is failing due to online information sharing. The authors focus on three groups: pro-vaccine users, anti-vaccine users, and those who recently switched to anti-vaccine, along with what triggers that switch. One interesting note is that the paper doesn’t take into account the people who switch to pro-vaccine and their triggers. I believe studying them would shed some light on what makes someone realise that they were part of a conspiracist group. This could be used to create methods that reverse the effects of brainwashing by such anti-vaccine groups.

The paper uses the MEM topic model to categorize user tweets into the themes that anti-vaccine users tweet and care about. It doesn’t take into account the virality of news topics. For example, during the Syrian revolution, a lot of people were tweeting about government, war, violence, etc. The authors don’t mention whether they took care to minimize the effect of viral news on the tweet topics.
The anti-vaccine group shows the characteristics of a close-knit group. This is equally true in real life: people generally stay with others who hold similar, or at least compatible, viewpoints. This brings us to the paper’s conclusion, which states that small triggers are enough for a person to join the anti-vaccine group. In my opinion, those people have had long exposure to such thinking outside of Twitter, and they become vocal only when they join a certain group.