Reflection #13 – [04/10] – [Ashish Baghudana]

Pryzant, Reid, Young-joo Chung, and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).

Summary

Pryzant et al.’s paper focuses on predicting product sales from product descriptions. The problem is not straightforward because sales are often driven by brand loyalty and pricing strategies, not merely by the language of the product descriptions. The authors address this by training a neural network with an adversarial objective – one that is a good predictor of sales but a bad predictor of brand and price. They use this adversarial network to mine textual features, which are then fed into a mixed-effects model where the textual features form the fixed effects and the brand and product form the random effects.
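
For concreteness, here is a rough sketch of that mixed-effects formulation in my own notation (not necessarily the paper's exact specification), with y_ij the sales of product i from brand j, x_ij the mined textual features (fixed effects), and u_j, v_i the brand and product random effects:

    \log(y_{ij}) = \beta^{\top} x_{ij} + u_j + v_i + \epsilon_{ij},
    \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad v_i \sim \mathcal{N}(0, \sigma_v^2)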

The authors use the Japanese e-commerce website Rakuten as their data source and choose two product categories – chocolate and health. Their motivation for choosing these two is that:

  • chocolate products show high variability
  • health (pharmaceutical) goods are often sold wholesale
  • the two categories sit at opposite ends of a spectrum

Performance metrics showed that models combining textual features with random effects consistently achieved higher R2 values than those without the textual features. On further analysis, the authors find that influential keywords had the following properties – informativeness, authority, seasonal relevance, and politeness.

Critique

I enjoyed reading the paper for multiple reasons. Firstly, the focus of the paper, despite being neural network-based, was not on the model itself but on the final goal of finding influential keywords and explaining why they were influential. Secondly, the neural network in this work was used not just to make predictions (though I’m sure it would be good at that) but to perform feature selection. As some of the other critiques already mentioned, this is counter-intuitive but seems to work really well. Thirdly, adversarial objectives haven’t worked very well for text previously, but the authors were able to put the technique to good use for textual feature selection.

A few points I had comments on:

  • The authors chose chocolates and health products as the two categories. Health products (in my opinion) do not have any brand loyalty associated with them, especially if you have only one choice, or if the drug is generic. In such a category, why does one need to control for the brand?
  • On a related note, could the authors have done this analysis on a product category like shoes or electronics?
  • Thirdly, the results that arose out of the influential keyword analysis reflect Japanese culture. The keywords are likely to differ in a country like the US (where product descriptions might play a significantly more important role) or India (where price would play a very important role).
  • Finally, images are becoming key to selling products in the online space. I have no study to prove this, but users increasingly have shorter attention spans and rely on the images and the reviews to decide whether they want a product. It would be interesting to incorporate those features as well in future work.


Reflection #12 – [04/05] – [Ashish Baghudana]

Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).
Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia tools and applications 76.8 (2017): 10653-10676.

Summary 1

In the first paper, Felbo et al. build an emoji predictor on an extremely large dataset of 1.2 billion tweets and obtain state-of-the-art performance on sentiment, emotion, and sarcasm detection tasks. Since the tweets are not explicitly labeled, the authors use distant supervision, treating the emojis themselves as noisy labels. The authors demonstrate the success of their DeepMoji model on this dataset and transfer the learned representations to other target tasks. Transfer learning is achieved through a new approach they name “chain-thaw” that fine-tunes one layer at a time. The experiments section shows DeepMoji (with d=1024) achieving a top-5 accuracy of 43.8%, a 5% increase over fastText’s classifier. The benchmarking experiments also show DeepMoji (chain-thaw) outperforming state-of-the-art techniques for each specific dataset and task.
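
As I understand it, “chain-thaw” unfreezes and fine-tunes one layer at a time (starting with the new task-specific layer) before a final pass over all layers. Below is a minimal PyTorch sketch of that schedule; the toy model and the empty training loop are placeholders of my own, not the actual DeepMoji architecture or code:

    import torch
    import torch.nn as nn

    # Toy stand-in for a pretrained model; the real DeepMoji stacks an embedding,
    # bi-LSTMs, and an attention layer before the task-specific softmax layer.
    model = nn.Sequential(
        nn.EmbeddingBag(10000, 256),   # "pretrained" embedding layer
        nn.Linear(256, 512),           # "pretrained" encoder layer
        nn.ReLU(),
        nn.Linear(512, 2),             # new task-specific layer
    )

    def fine_tune(model, steps=100):
        """Placeholder: train on the target task using only the unfrozen parameters."""
        params = [p for p in model.parameters() if p.requires_grad]
        optimizer = torch.optim.Adam(params, lr=1e-4)
        # ... forward pass on target-task batches, loss.backward(), optimizer.step() ...

    def set_trainable(module, flag):
        for p in module.parameters():
            p.requires_grad = flag

    # Chain-thaw schedule: new layer first, then each pretrained layer in turn,
    # and finally the whole network together.
    layers = [m for m in model.children() if any(True for _ in m.parameters())]
    for thawed in [layers[-1]] + layers[:-1]:
        set_trainable(model, False)   # freeze everything ...
        set_trainable(thawed, True)   # ... except the layer being thawed
        fine_tune(model)
    set_trainable(model, True)        # final pass: all layers unfrozen
    fine_tune(model)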

Critique 1

The paper does numerous things really well. Firstly, their dataset is huge! This could indeed be one of the reasons for the success of their model. While the approach seems nearly perfect, I would love to know how long training a model on such a large dataset takes. Secondly, they built a website around their model – https://deepmoji.mit.edu/ – and I really liked the way users can type in a sentence to obtain the emojis associated with it. It is interesting to note that this dataset was obtained before Twitter shifted from 140 to 280 characters. The DeepMoji website refuses to process anything over 140 characters. I am not sure if this is a limitation of the front-end, or if the model’s accuracy diminishes beyond this limit. Finally, the paper is definitely more in the machine learning space than the computational social science space, at least in its current form. A good follow-up would be to use the DeepMoji model to detect bullying or trolls on Twitter (if they are associated more with specific emojis). It is also nice to see the code and the model open-sourced and easily available for other researchers to use.

Summary 2

In the second paper, Nguyen et al. use linguistic and topic analysis-based features to classify sub-groups of online depression communities. They study the online social media platform LiveJournal (LJ). LJ is divided into multiple communities, and each community has several users posting about topics related to it. The authors select a final cohort of 24 communities with 38,401 posts, which they group into 5 subgroups – depression, bipolar disorder, self-harm, grief/bereavement, and suicide. Their features include LIWC psycholinguistic categories and weights from the corpus-topic and topic-word distributions. Using these features, they build 4 different classifiers and find that Lasso performs the best.

Critique 2

I had several problems with this paper. The motivation was confusing – the authors wish to analyze characteristics of depression, but they immediately deviate from this problem statement. They then categorize five kinds of communities – depression, bipolar disorder, self-harm, grief, and suicide – but do not say why there are five categories rather than more or fewer. The dataset collected is small, and the authors do not provide any information about how it was labeled. If the authors labeled the communities themselves, this might have introduced bias into the training data, which could easily have been alleviated by using Amazon Mechanical Turk workers.

From my understanding of the features used, the authors run a LIWC analysis to get 68 psycholinguistic features and then compute a topic distribution for each post. They subsequently run a feature selection technique and show which features were important for four binary classifiers, i.e., depression vs. each of bipolar disorder, self-harm, grief, and suicide. Running feature selection and building four binary classifiers makes it difficult to interpret the coefficients of the models. The five communities could have been compared better if the authors had built a multi-class classifier. Furthermore, without looking at the topics themselves, I could not understand their semantic meaning or why some of them had higher weights for certain classifiers. The authors also do not provide any justification for running LDA with 50 topics. They should have run a perplexity-vs.-number-of-topics plot and chosen the number of topics by the elbow method; a minimal sketch of such a sweep follows below. Finally, I also did not find any information about their train-test or cross-validation process.
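
For illustration, here is a minimal sketch of the perplexity sweep I have in mind, using gensim as a stand-in (the tokenized toy posts and the candidate topic counts are placeholders); one would plot held-out perplexity against the number of topics and pick the elbow:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Placeholder tokenized posts; in practice these would be the 38,401 LJ posts.
    docs = [
        ["feeling", "low", "cannot", "sleep"],
        ["grief", "support", "group", "meeting"],
        ["new", "medication", "side", "effects"],
    ] * 50

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # Sweep candidate topic counts and record the per-word likelihood bound;
    # perplexity is exp(-bound), so a higher bound means lower perplexity.
    for k in (2, 5, 10, 20, 50):
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=k, passes=5, random_state=0)
        print(k, lda.log_perplexity(corpus))

In a real sweep the perplexity should of course be computed on a held-out split rather than on the training corpus itself.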

Overall, it feels like this paper could do with a rework of the dataset and more discussion. I was not left with any feel for what distinguishes depression vs. self-harm vs. bipolar disorder, and so on.


Reflection #11 – [03/27] – [Ashish Baghudana]

King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.
Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

Summary 1

King et al. conducted a large-scale experimental study of censorship in China by creating their own social media websites, submitting different posts, and observing how these were reviewed and/or censored. They obtained technical know-how in the use of automatic censorship software from the support services of the hosting company. Based on user guides, documentation, and personal interviews, the authors deduced that most social media websites in China conduct an automatic review through keyword matching, with the keywords generally hand-curated. They reverse-engineer the keyword list by posting their own content and observing which posts get through. Finally, the authors find that posts that invoke collective action are censored, whereas criticisms of the government or its leaders largely are not.

Reflection 1

King et al. conduct fascinating research in the censorship domain. (The paper felt as much like a covert spy operation as research work.) The most interesting observation from the paper is that posts about collective action are censored, but not those that criticise the government. This is labeled the collective action hypothesis vs. the state critique hypothesis. This means two things – negative posts about the government are not filtered, and positive posts about the government can get filtered. The paper also finds that automated reviews are not very useful. The authors observe a few edge cases too – posts about corruption, wrongdoing, senior leaders in the government (however innocuous their actions might be), and sensitive topics such as Tibet are automatically censored. These may not bring about any collective action, either online or offline, but are still deemed censor-worthy. The paper makes the claim that certain topics are censored irrespective of whether posts are for or against them.

I came across another paper by the same set of authors from 2017 – King, Gary, Jennifer Pan, and Margaret E. Roberts. “How the Chinese government fabricates social media posts for strategic distraction, not engaged argument.” American Political Science Review 111.3 (2017): 484-501. If censorship is one side of the coin, then bots and sockpuppets constitute the other. It would not be too difficult to imagine “official” posts by the Chinese government that favor their point of view and distract the community from more relevant issues.

The paper threw open several interesting questions. Firstly, is there punishment for writing posts that go against country policy? Secondly, the Internet infrastructure in China must be enormous. From a systems perspective, do they ensure that each and every post goes through their censorship system?

Summary 2

The second paper, by Hiruncharoenvate et al., carries the idea of keyword-based censoring forward. It builds on the observation that activists have employed homophones of censored words to get past automated reviews. The authors develop a non-deterministic algorithm that generates homophones for censored keywords. They suggest that countering these homophone transformations would cost Sina Weibo an additional 15 hours per keyword per day. They also find that posts with homophones tend to stay on the site 3 times longer on average. The authors tie up the paper by demonstrating that native Chinese readers were not confused by the homophones – i.e., they were able to decipher the intended meaning.

Reflection 2

Of all the papers we have read for this Computational Social Science class, I found this one the most engaging, and I liked the treatment of the motivation, design of experiments, results, and discussion. However, I also felt disconnected because of the language barrier. I feel natural language processing tasks in Mandarin can be very different from those in English. I was therefore intrigued by the choice of algorithm (tf-idf) that the authors use to obtain the censored keywords, and by the downstream processing that follows. I am curious to hear from a native Chinese speaker how the structure of Mandarin influences NLP tasks!

I liked Experiment 2 and the questions in the AMT task. The questions were unbiased and actually evaluated whether the person understood which words were mutated.

However, the paper also raised other research questions. Given the publication of this algorithm, how easy would it be to reverse-engineer the homophone generation and block posts that contain the homophones as well? The keyword-matching algorithm could be tweaked just a little to add homophones to the list and to check whether several of these homophones occur together or alongside other banned keywords.

Finally, I am also interested in the definitions of free speech and how they are implemented across different countries. I am unable to draw the line between promoting free speech and respecting the sovereignty of a nation and I am open to hearing perspectives from the class about these issues.


Reflection #10 – [03/22] – [Ashish Baghudana]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

Summary 1

Kumar et al. present a data-driven view of sockpuppets in online discussion forums and social networks. They identify pairs of sockpuppets in online discussion communities hosted by Disqus. Since the data isn’t labeled, the authors devise an automatic technique for identifying sockpuppets based on posting frequency and multiple logins from the same IP address. In their research, the authors find linguistic differences between the posts of sockpuppets and those of ordinary users. Primarily, they find that sockpuppets use the first or second person more often than normal users. They also find that sockpuppets write poorer English and are more likely to be downvoted, reported, or have their posts deleted. The authors note that there is a dichotomy of sockpuppets – pretenders and non-pretenders. Finally, the authors build a classifier for sockpuppet pairs and determine the predictors of sockpuppetry.

Critique 1

I found the paper very well structured and the motivations clearly explained. The authors received non-anonymous data about users on Disqus. Their dataset creation technique using IP addresses, user sessions, and posting frequency was very interesting. However, it appears they relied on intuition to choose these three factors for identifying sockpuppets. In my opinion, they should have attempted to validate their ground truth externally – possibly using Amazon Mechanical Turk. Their results also seem to suggest that there is a primary account and that the remaining accounts are secondary. In today’s world of fake news and propaganda, I wonder if accounts are created solely for the purpose of promoting one view. I was equally fascinated by the dichotomy of sockpuppets. In the non-pretenders group, users post different content on different forums. This would mean that the sockpuppets are non-interacting. Why then would a person create two identities?

Summary 2

Following the theme of today’s class, the second paper attempts to identify social spammers in social networking contexts like MySpace and Twitter. The authors propose a social honeypot framework to lure spammers and record their activity, behavior, and information. A honeypot is a user profile with no activity. On a social network platform, if this honeypot receives unsolicited friend requests (MySpace) or followers (Twitter), the sender is likely a social spammer. The authors collect information about such candidate spam profiles and build an SVM classifier to differentiate spammers from genuine profiles.

Critique 2

Unlike a traditional machine learning setup, the authors opt for a human-in-the-loop model. A set of profiles selected by the classifier is sent to human validators, and based on their feedback, the model is revised. I think this is a good approach to data collection, validation, and model training. As more feedback is incorporated, the model keeps improving and encompasses more varieties of social spam behavior. The authors also find an interesting categorization of social spammers – more often than not, they attempt to sell pornographic content or enhancement pills, promote their businesses, or phish user details by redirecting people to phishing websites. Since the paper is from 2010, it also uses MySpace (a now largely defunct social network). It would have been nice to see an analysis of which features stood out in the classification task – however, the authors only presented results for different models.


Reflection #9 – [02/22] – Ashish Baghudana

Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.” ICWSM. 2016.
De Choudhury, Munmun, et al. “Predicting depression via social media.” ICWSM 13 (2013): 1-10.

Summaries

The papers assigned for this class touch upon the influence of social media on healthcare and medical conditions. The first paper profiles Twitter users into three groups — pro-vaccination, anti-vaccination, and converts to anti-vaccination — based on linguistic features from their Twitter feeds over four years. The authors build an attitude (stance) classifier, which is subsequently used to label users as pro- or anti-vaccination if 70% or more of their tweets lean one way. Then, the authors run the Meaning Extraction Method on these users’ tweets to find the underlying themes. They perform a between-groups analysis and observe that anti-vaccination tweeters are anti-government, discuss the effects of organic food, and mention family-related words often, as compared to pro-vaccination tweeters, who mention chronic health issues and technology more. They also found that converts to anti-vaccination were influenced by the “#CDCWhistleBlower” hoax story.

De Choudhury et al. conduct an interesting study on predicting depression via social media. In the same vein as the first paper, they analyze linguistic cues, as well as other behavioral traits, that precede a depression diagnosis. They create a gold-standard corpus of tweets from users who were diagnosed with MDD, as determined using the CES-D test. Based on user consent, they quantify each individual’s social media behavior for the year leading up to the onset of depression. The authors find that individuals with depression show lowered social activity, greater negative emotion, increased medicinal concerns, and heightened expression of religious thoughts. The authors eventually build a classifier that can predict depression ahead of time with an accuracy of 70%.

Reflections

I found the content of both papers very engaging and socially relevant. In some sense, I expected anti-vaccination proponents to hold similar beliefs about other conspiracy theories, as well as to use an angry and aggressive tone in their tweets. This was validated by the research. The paper would have been even more engaging if the authors had discussed a fourth class – users who became pro-vaccination – as the ultimate goal would be to encourage more users to get vaccinated and provide herd immunity. I suspect such an analysis would be useful for dispelling other conspiracy theories as well. However, I had two concerns with the dataset:

  • The authors found 2.12M active-pro tweets (373 users), 0.46M active-anti tweets (70 users), and 0.85M joining-anti tweets (223 users). These users are tweeting almost ~4 times a day (a quick back-of-the-envelope check follows this list). Is it likely that some of them are bots?
  • The authors also assume that all users have the same time of inflection from pro-vaccination to anti-vaccination. I am not certain how valid the assumption will be.
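
The posting rate in the first point, assuming the full four-year collection window, works out roughly as:

    \frac{2.12 \times 10^{6}\ \text{tweets}}{373\ \text{users}} \approx 5{,}700\ \text{tweets per user},
    \qquad \frac{5{,}700}{4 \times 365\ \text{days}} \approx 3.9\ \text{tweets per day}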

Methodologically, the authors use the Meaning Extraction Method (MEM) to extract topics. While MEM works well in their case, it would be nice to see their motivation for using a non-standard method when LDA or one of its variants might have worked too. Are there cases where MEM performs better?

I found the experiments in the second paper very well designed. It was nice to see the authors account for bias and noise on Amazon Mechanical Turk by (1) ignoring users who finished within two minutes and (2) using an auxiliary screening test. However, I took the CES-D test myself and wasn’t quite sure how I felt about the results. I really liked the fact that they publish the depression lexicon (Table 3 in the paper), which shows which linguistic features correlate well with individuals with depression. I was concerned, though, about the model’s recall. The authors highlight precision and accuracy, but when it comes to predicting depression, a high recall value is probably more important (a toy illustration follows below): we wouldn’t mind false positives as long as we were able to identify all the people who were potentially depressed. Moreover, while the field of social science calls for interpretability, scenarios such as depression perhaps call simply for better models over interpretability. I was also surprised to find that ~36% of their users showed signs of depression. While it is certainly possible that the authors attempted to use a balanced dataset, the number seems on the higher side (especially when global depression rates are ~5%).
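
To illustrate the recall point with entirely made-up labels (nothing from the paper): a model that misses most depressed users can still look respectable on accuracy and precision while its recall collapses.

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Toy labels: 1 = depressed, 0 = not depressed.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # only one true case is caught

    print(accuracy_score(y_true, y_pred))    # 0.70 -- looks acceptable
    print(precision_score(y_true, y_pred))   # 1.00 -- looks great
    print(recall_score(y_true, y_pred))      # 0.25 -- the number that matters here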

Questions

  1. Facebook recently came out with their depression-predicting AI. Professor Munmun De Choudhury was quoted in a Mashable article – “I think academically sharing how the algorithm works, even if they don’t reveal every excruciating detail, would be really beneficial,” she says, “…because right now it’s really a black box.”
    Even if it weren’t a black box and details about the features were made available, does one expect them to be very different from their results?
  2. Ever since Russian meddling in the US elections came to light, people have realized the power of bots in influencing public opinion. I expect anti-vaccination campaigns were similarly propelled. Is there a way this can be confirmed? Are bots the way to change public opinion on Twitter / Facebook?


Reflection #8 – [02/20] – [Ashish Baghudana]

Bond, Robert M., et al. “A 61-million-person experiment in social influence and political mobilization.” Nature 489.7415 (2012): 295.
Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. “Experimental evidence of massive-scale emotional contagion through social networks.” Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.

Summary – 1

This week’s paper readings focus on influence on online social networking platforms. Both papers come from researchers at Facebook (collaborating with Cornell University). The first paper, by Bond et al., discusses how social messages on Facebook can encourage users to vote. They run a large-scale controlled study, divvying up their population into three segments – a “social message” group, an “informational message” group, and a control group. In the “informational message” group, users were shown a note telling them that “Today is election day”, along with a count of Facebook users who had voted and an “I voted” button. In the “social message” group, users were additionally shown 6 randomly selected friends who had voted. In the control group, no message was shown.
The researchers discovered that users who received the social message were 2.08% more likely to click on the I voted button as compared to the informational message group. Additionally, they matched the user profiles with public voting records to measure real-world voting. They observed that the “social message” group was 0.39% more likely to vote than users who received no message at all or received the informational message. Finally, the authors measured the effect of close friends on influence and discovered that a user was 0.224% more likely to have voted if a close friend had voted.

Reflection – 1

Firstly, it is difficult to imagine how large 61 million really is! I was personally quite awed at the scale of experimentation and data collection through online social media and what they can tell us about human behavior.

This paper dealt with multiple issues and could have been a longer paper. An immediately interesting observation is that the effect of online appeals to vote is very small. In more traditional survey-based experiments, this increase would probably not have been noticeable. However, I found it odd that the “social message” group consisted of over 60 million people while the “informational message” and control groups had only ~600,000 each. The imbalance seems uncharacteristic for an experiment of this scale, and I am curious what the class thought about the distribution of the dataset.

Finally, I found the definition of close friends arbitrary. The authors define close friends as users who were in the eightieth percentile or higher of interaction frequency. This definition seems engineered to retrofit the observation that 98% of users had at least one close friend, with an average of 10 close friends.

Summary – 2

The 2014 paper on emotional contagion on social networks is mired in controversy. The researchers ask the question – do emotional states spread through social networks? Quoting verbatim from the paper:

The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed.

The researchers use LIWC within the News Feed algorithm to filter out positive or negative content. They find that when negative content was reduced, users posted more positive content, and conversely, when positive content was reduced, users posted less positive content.

Reflection – 2

While the experiment itself is within the realms of Facebook’s data use policy, there are several signs in the Editorial Expression of Concern and Correction that this experiment might not have passed an Institutional Review Board. However, if these two experiments are deemed ethical, they are examples of great experiment design.

Neither paper builds any model; they rely on showing correlation between their dependent and independent variables. As with the previous paper, the effects are small, but applied to a large population, even small percentage increases are large enough to take note of.

Questions

  1. I personally find it quite scary that neither study could be done by a university alone – each explicitly needs access from Facebook (or whichever other social media platform is involved). The consolidation of social media analytics capability in the hands of a few may not bode well for ordinary citizens. How can academic research make this data more available and accessible?
  2. The first paper lays the basis for how societies can be influenced online. Can this be used to target only a small section of users? Can it also be used to identify groups that are under-represented and help vocalize their opinions?


Reflection #7 – [02/13] – Ashish Baghudana

Niculae, Vlad, et al. “Linguistic harbingers of betrayal: A case study on an online strategy game.” arXiv preprint arXiv:1506.04744 (2015).

In very much the same vein as The Language that Gets People to Give: Phrases that Predict Success on Kickstarter by Mitra et al., the authors in this paper look for linguistic cues that foretell betrayal in relationships. Their research focuses on the online game Diplomacy, which is set in the pre-World War I era. An important aspect of this paper is understanding the game and its intricacies. Each player chooses a country, forms alliances with other players, and tries to win the game by capturing different territories in Europe. Central to the game are these alliances and betrayals, and the conversations that happen when a player becomes disloyal to a friend.

The paper draws on prior research in extracting politeness, sentiment, and other linguistic cues for several of its features, and it was instructive to see some of these social computing tools used in their research.

The authors find that there are subtle signs that predict betrayal, namely:

  1. An imbalance of positive sentiment before the betrayal, where the betrayer uses more positive sentiment;
  2. Less argumentation and discourse from the betrayer;
  3. Fewer planning markers in the betrayer’s language;
  4. More polite behavior from the betrayer; and
  5. An imbalance in the number of messages exchanged.

Intuitively, I can relate to observations #2, #3, and #5. However, positive sentiment and polite behavior would perhaps not indicate betrayal in an offline context. I do wish that these results were explained better and more examples given to indicate why they made sense.

I also felt that the machine learning model to predict betrayal could have been described better. I could not immediately understand their feature extraction mechanism — were linguistic cues used as binary features or count features? Assuming it wasn’t a thin-slicing study and they used count features, did they normalize the counts over the number of messages the two players exchanged? (A toy sketch of these featurization options follows below.) Additionally, they compared the performance of their model against the players themselves (who were never able to predict a betrayal, i.e. their accuracy was 0%). While 0% -> 57% seems like a big jump, the machine learning model could have predicted at random and still obtained a 50% accuracy rate. This raises the question of how accurate the model really is and which features it found important.
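
To make the featurization question concrete, here is a toy sketch (the messages and the politeness markers are entirely made up) of the three options I am distinguishing between – binary presence, raw counts, and counts normalized by the number of messages exchanged:

    from collections import Counter

    messages_from_betrayer = ["thanks so much!", "sounds great, thanks", "sure, please do"]
    politeness_markers = {"thanks", "please", "sure"}

    tokens = [w.strip("!,.").lower() for m in messages_from_betrayer for w in m.split()]
    counts = Counter(tokens)

    binary_feature = int(any(w in politeness_markers for w in tokens))     # present / absent
    count_feature = sum(counts[w] for w in politeness_markers)             # raw count
    normalized_feature = count_feature / len(messages_from_betrayer)       # per-message rate

    print(binary_feature, count_feature, normalized_feature)               # 1, 4, 1.33...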

Papers in computational social science often need to define (otherwise abstract) social constructs precisely, and quantitatively. Niculae et al. attempt to define friendships, alliances, and betrayals in this paper. While I like and agree with their definitions with respect to the game, it is important to recognize that these definitions are not necessarily generalizable. The paper studies a small subset of relationships online. I would be interested in seeing how this could be replicated for more offline contexts.


Reflection #5 – [02/06] – [Ashish Baghudana]

Garrett, R. Kelly. “Echo chambers online?: Politically motivated selective exposure among Internet news users.” Journal of Computer-Mediated Communication 14.2 (2009): 265-285.
Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook.” Science 348.6239 (2015): 1130-1132.

Summary

The central theme of both papers is echo chambers on the Internet, and specifically on social media websites such as Facebook. The fundamental premise the authors build on is that likes attract and opposites repel. Garrett identifies selective exposure as the main driver behind people’s news choices on the Internet. Algorithms on social media websites and recommender systems suggest similar content to users, driving them towards an extreme: opinions are reinforced, and differing opinions are neither read nor recommended.

However, the papers differ in their methods. Garrett conducts a small and specific behavior-tracking study (N = 727), with users recruited from the readers of two news websites – AlterNet (left-leaning) and WorldNetDaily (right-leaning). Since readers already visit these websites, ground truths about their political affiliation are assumed. Once these users sign up for the study, their news-reading behavior is tracked. Bakshy et al. perform a similar analysis at the significantly larger scale of 10.1 million active Facebook users who self-report their political affiliations. Their evaluation involves sophisticated data collection spanning ~3.8 billion potential exposures, 903 million exposures, and 59 million clicks. The paper by Bakshy et al. is very dense in content, and I referred to [1, 2] for more explanation.

Both papers conclude by confirming our suspicions about users’ content preferences – they spend more time reading opinion-reinforcing articles than opinion-challenging ones.

Reflections

Kelly Garrett’s paper, though published in 2009, uses data collected in 2005. This was a time before global social media websites like Facebook and Twitter were prevalent. At the time, the author chose the best available means of generating ground truth by looking at left-leaning and right-leaning websites. However, this mechanism of classification feels naïve. It is possible that certain users merely stumbled upon the website or participated in the survey for the reward. Equally importantly, the sample is not truly reflective of the American population, as a vast majority may look for news from more unbiased sources.

One of the undercurrents of the “Echo chambers online?” paper is the effect of the Internet in making these biases more pronounced. However, the study does not speak to, or attempt to measure, users’ preferences before the advent of the Internet. Would the same citizenry buy partisan newspapers, or is this behavior reflective only of news on the Internet?

Bakshy et al.’s paper is considerably more recent (2015). While it evaluates many of the same questions as Garrett’s paper, the methodology and mechanism are fundamentally different, as is the time period. Therefore, comparing the two papers feels a little unfair. Facebook and Twitter are social platforms and, in that sense, very different from news websites. These are platforms where you do not fully choose the content you want to see: the content served to you is an amalgamation of what your friends share and what a ranking algorithm surfaces. However, a distinction must be made between a website like Facebook and one like Twitter. The authors themselves highlight an important point:

Facebook ties primarily reflect many different offline social contexts: school, family, social activities, and work, which have been found to be fertile ground for fostering cross-cutting social ties.

Therefore, it is substantially more likely that a user is exposed to an opinion-challenging article. However, interaction is sometimes poorly defined, because there is no real way of knowing whether a user merely looked at the article’s summary without clicking on it. Hence, tracking exposure can be tricky and is an avenue for further research.

Questions

  1. Almost 10 years later, especially after the wave of nationalism across the world, is there more polarization of opinion on the Internet?
  2. Is polarization an Internet phenomenon or are we measuring it just because most content is now served digitally? Was this true back in 2005?
  3. Can and should recommendation algorithms have individual settings to allow users to modify their feed and allow more diverse content?

References

[1] https://solomonmessing.wordpress.com/2015/05/24/exposure-to-ideologically-diverse-news-and-opinion-future-research/

[2] https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LDJ7MS


Reflection #4 – [1/29] – Ashish Baghudana

Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.

Reflection

In this paper, the authors pose two interesting research questions – can political misperceptions and inaccuracies be corrected by a fact-checker, and if so, does real-time correction work better on users than delayed correction? The authors run an experiment in which participants first read an accurate post about electronic health records (EHRs), followed by an inaccurate post from a political blog. The readers were divided into three groups according to when corrections to the inaccurate post were presented:

  • immediately after reading a false post
  • after a distraction activity
  • never

Garrett and Weeks report that corrections can be effective, even on politically charged topics. Based on the questionnaire at the end of the experiment, the authors concluded that users who were presented with corrections were more accurate in their knowledge of EHRs in general. Specifically, immediate-correction users were more accurate than delayed-correction users. However, immediate correction also accentuated the attitudinal bias of these users: people who viewed the issue negatively showed increased resistance to correction.

This paper is unlike any of the papers we have read in this class so far. In many senses, I feel it deals entirely with psychology. While it is applicable to computer scientists designing fact-checking tools, it has more far-reaching effects. The authors created separate material for each group in their experiment and physically administered the experiment to each of their users. This research paper is a demonstration of meticulous planning and execution.

An immediate question from the paper is – would this experiment be possible using Amazon Mechanical Turk (mTurk)? This would have helped collect more data easily. It would also have enabled the authors to run multiple experiments with different cases, i.e. more contentious issues than EHRs. The authors mention that the second (factually incorrect) article was associated with a popular political blog. If the political blog was right-leaning or left-leaning and this was known to the users, did it affect their ratings in the questionnaire? The authors could have included an intermediate survey (after stage 1) to understand users’ prior biases.

A limitation that the authors mention is the reinforcement of corrections. Unfortunately, running experiments involving humans is a massive exercise, and it would be difficult to repeat this several times. Another issue with these experiments is that users are likely to treat the questionnaire as a memory test and answer based on that, rather than on their true beliefs. I also took issue with the racial diversity of the sample population, which is majority white (~86%).

This study could be extended to examine how party affiliation and political views correlate with users’ willingness to accept corrections. Are certain groups of people more prone to incorrect beliefs?


Reflection #3 – [1/25] – Ashish Baghudana

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary

In this paper, Cheng et al. attempt to characterize anti-social behavior in three online communities – CNN, IGN, and Breitbart. The study addresses three major questions:

  • Is anti-social behavior innate or dependent on community influences?
  • Does the community help in improving behavior or worsen it?
  • Can anti-social users be identified early on?

They find that banned users’ posts were less readable, received more replies, and were concentrated in fewer threads. The authors also find that communities affect writing styles over time: if a user’s posts are unfairly deleted, their writing quality is likely to decrease. Subsequently, the authors characterized the banned users based on their deleted posts and found the distribution to be bimodal. An interesting characteristic of banned users is that they post much more frequently than non-banned users. Finally, the authors build a classifier to predict whether a user might be banned based on their first 10 posts and report an accuracy of 0.8.

Reflection

The paper comes at a time when cyberbullying, harassment, and trolling are at their peak on the Internet. I found their research methodology very instructive – they effectively summarize and divvy up 1.7 million users and 40 million posts. It is also interesting to read about their use of Amazon Mechanical Turk to generate text quality scores, especially because such a metric does not exist in the NLP sphere.

At several instances in the paper, I found myself asking: what kinds of anti-social behavior do people exhibit online? While the paper focused on the users involved, and on the characteristics of their posts that made them undesirable in such online communities, it would have been much more informative had the authors also focused on the posts themselves. Topic modeling (LDA) or text clustering would have been a great way of analyzing why users were banned. Many of the elements of anti-social behavior discussed in the paper would hold true for bots and spam as well.

Another fascinating aspect that the paper only briefly touched upon was the community effect. The authors chose the three discussion communities very effectively – CNN (left of center), Breitbart (right of center), and IGN (video games). Analyzing the posts of the banned users on each of these communities might indicate community bias and allow us to ask questions such as: are liberal views generally banned on Breitbart?

The third set of actors on this stage (the first two being banned users and the community) are the moderators. Since the final decision to ban a user rests with the moderators, it would be interesting to ask: what kinds of biases do the moderators display? Is their behavior erratic or does it follow a trend?

One tiny complaint I had with the paper was their visualizations. I often found myself squinting to be able to read the graphs!

Questions

  • How could one study the posts themselves rather than the users?
    • This would help understand anti-social behavior holistically, and not just from the perspective of non-banned users
  • Is inflammatory language a key contributor to banning certain users, or are users banned even for disagreeing with long-standing community beliefs?
  • Do the banned posts on different communities exhibit distinct features or are they generalizable for all communities?
