Reflection #13 – [04/10] – [Meghendra Singh]

April 10, 2018 meghs

Hu, Nan, Ling Liu, and Jie Jennifer Zhang. “Do online reviews affect product sales? The role of reviewer characteristics and temporal effects.” Information Technology and Management 9.3 (2008): 201-214.

The paper discusses an interesting study that tries to link customer reviews with product sales on Amazon.com, while taking into account the temporal nature of customer reviews. The authors present evidence that the market understands the difference between favorable and unfavorable news (reviews): bad reviews lead to lower sales and good reviews lead to higher sales. The authors also find that consumers pay attention to reviewer characteristics such as reputation and exposure. Hence the hypothesis that higher-quality reviewers drive the sales of a product on e-commerce platforms. The authors use the Wilcoxon Z-statistic and multiple regression to support their findings.

First, it would be interesting to see if a review coming from a reviewer who has actually purchased the product (a “verified purchase”) has more impact on product sales than one coming from someone who hasn’t purchased the product. This is one of the things I look for when using Amazon.com. Another aspect is that products like books, DVDs and videos are neither consumables nor necessities. What I mean is that people can have very particular tastes when it comes to what they read and watch. For example, Alice might be a big fan of Sci-Fi movies while Bob might like Drama more. In my opinion, the sales of these products would depend more on how these “genre-preferences” are distributed in the market (i.e. among people who have the time and money to relish these products). It would be interesting to re-do this study with a greater variety of product categories. I feel that reviews would play a much bigger role in the sales of consumables (e.g., groceries, cosmetics, food) and necessities (e.g., bags, umbrellas, electronics), because almost everyone “needs” these products and trusted online reviews would act as a signal for their quality. It would be interesting to verify this intuition.

Second, given that the authors mention bounded rationality and opportunism, I feel that when buying a product on Amazon.com, it is very unlikely that I would check the “reputation/quality” of the reviewer before making a purchase. Again, what would matter the most for me is whether the review is coming from an actual customer (a verified purchase). Additionally, the amount of time I would spend researching a product and digging into its reviews is directly proportional to the cost of the product. Also, the availability of discounts, free shipping, etc. can greatly bias my purchase decisions. I am not sure whether classifying reviewers as high/low quality and products as high/low-coverage based on the median of the sample is a good idea. What happens to those who lie on the median? Why choose the median? Why not the mean? Moreover, it would be interesting to see the distribution of sales-rank over the three product categories. Do these distributions follow a power law?
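To make the median question concrete, here is a minimal sketch of a median split versus a mean split. The scores are made-up values for illustration only (they are not from the paper), and `split_at` is a hypothetical helper; the paper does not say how ties at the median were handled.

```python
from statistics import mean, median

# Hypothetical per-reviewer quality scores (made-up values, not from the paper).
scores = [1, 2, 2, 3, 3, 3, 4, 50]

def split_at(values, threshold):
    """Partition values into (low, high, tied) relative to a threshold."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v > threshold]
    tied = [v for v in values if v == threshold]
    return low, high, tied

med = median(scores)  # barely affected by the single extreme value
avg = mean(scores)    # pulled upward by the single extreme value

low_med, high_med, tied_med = split_at(scores, med)
low_avg, high_avg, tied_avg = split_at(scores, avg)

print("median split:", low_med, high_med, "tied:", tied_med)
print("mean split:  ", low_avg, high_avg, "tied:", tied_avg)
```

With skewed data like this, the mean split puts almost everyone in the “low” group, which may be one reason to prefer the median. The median split, however, leaves several values sitting exactly on the threshold, and some rule is needed to assign them.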

In summary, I enjoyed reading the paper and feel that it was very novel for 2008 and as the authors mention, the work can be extended in various ways now that we have more data, computing power and analysis techniques.

Reflection #12 – [04/05] – [Meghendra Singh]

April 5, 2018 meghs

Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).
Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia tools and applications 76.8 (2017): 10653-10676.

The first paper presents DeepMoji, an emoji prediction deep neural network, trained using 1.2 billion tweets. I found the paper and the DeepMoji demo available at https://deepmoji.mit.edu/ very compelling and fascinating. The key contribution of this work was to show how emoji occurrences in tweets can be used to learn richer emotional representation of text. To this end the authors construct a deep neural network model (DeepMoji) using an embedding layer, two bidirectional LSTM layers, Bahdanau attention mechanism and a Softmax classifier at the end. The authors detail the pretraining process, three transfer learning approaches (full, last and chain-thaw) and evaluate the models obtained on 3 NLP tasks on 8 benchmark datasets across 5 domains. Results of the evaluation suggest that the model trained using the chain-thaw transfer learning procedure beats the state of the art on all the benchmark datasets.
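As I understand the paper's description, chain-thaw first trains any newly added layer, then fine-tunes each pretrained layer on its own starting from the first, and finally trains all layers together. The sketch below only generates that schedule; the function name and layer names are hypothetical labels for the architecture summarized above, and no actual training happens here.

```python
def chain_thaw_schedule(pretrained_layers, new_layers):
    """Return the fine-tuning stages; each stage lists the layers left trainable."""
    stages = []
    # Stage 1: train only the newly added layers (e.g. a fresh softmax).
    stages.append(list(new_layers))
    # Middle stages: unfreeze one pretrained layer at a time, first layer first.
    for layer in pretrained_layers:
        stages.append([layer])
    # Final stage: train every layer together.
    stages.append(list(pretrained_layers) + list(new_layers))
    return stages

# Hypothetical layer names mirroring the architecture described above.
stages = chain_thaw_schedule(
    ["embedding", "bilstm_1", "bilstm_2", "attention"],
    ["softmax"],
)
for i, trainable in enumerate(stages, start=1):
    print(f"stage {i}: trainable = {trainable}")
```

In a real framework each stage would freeze every layer not in the list and train until validation performance stops improving.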

I am not really sure how upsampling works, and the authors do not discuss the upsampling technique used in the paper. Also, it would have been interesting to know if the authors experimented with different network architectures before settling on the one presented here. How did they arrive at this particular architecture? Additionally, two questions that can be explored are whether increasing the number of BiLSTM layers will improve performance on the benchmark datasets, and whether such a change in architecture would be comparable with the gains from the chain-thaw transfer learning technique. Moreover, since tweet length is limited to 280 characters, it is not possible to analyze longer texts with high confidence using this technique, unless the study is repeated on a dataset with longer texts mapped to specific emojis. It might be difficult to replicate this study for languages other than English and Mandarin. This is because large Twitter/Weibo-like data sources that contain distant supervision labels in the form of emojis may not exist for other languages. Therefore, it will be interesting to see what other distant supervision techniques can be used to predict emotional labels for texts on social media in other languages.

In Table 2, we see that most of the emojis on Twitter are positive (laughter, sarcasm, love) rather than negative (sad face, crying face, heartbreak). I wonder if the same trend would be observed on other social media websites. Nevertheless, given the proliferation of emojis in computer-mediated communication, it would be interesting to repeat this study with data like Facebook posts and comments, or posts and comments on any social website. Additionally, as one can use this approach to determine the various emotions associated with any text at a very granular level, it can be used to filter content/news for a user. For example, if a user only wants to read content that is optimistic and cheerful, this approach can filter out all the content that does not fall in that bucket. One can also think of using this approach to detect the psychological state of an author. It might be interesting to see whether consistently pessimistic emotional content in an author’s posts predicts clinical conditions like depression, anxiety or self-harm events.

This brings us to the second paper, which analyzes 38K posts in 24 Live Journal communities to discover the psycholinguistic features and content topics present in online communities discussing Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement and Suicide. The study generated 68 psycholinguistic features using LIWC and 50 topics using LDA for text from 5K posts (title and content text). The authors subsequently use these topics and LIWC features as predictors with LASSO regression for the 5 subgroups of communities interested in these 5 disorders/conditions. The authors find that latent topics had greater predictive power than linguistic features for the bipolar disorder, grief/bereavement and self-harm subgroups. The most interesting fact for me was that help-seeking was not a topic in any of the subgroups and only the Bipolar Disorder subgroup discussed treatments. This seems very strange for communities dedicated to the discussion of psychological illness.

It would be interesting to repeat this experiment and see if the results remain consistent. I say this because the authors randomly sample 5K posts, and this may have missed certain topics and LIWC categories. It would also be interesting to know the statistics about the lengths of these posts and whether length was taken into consideration when sampling the posts. Another aspect to point out is that the Bipolar Disorder subgroup had a larger number of communities (7 out of 24). Did this somehow affect the diversity of topics extracted? Perhaps it might be a good idea to use all the posts from the 24 communities. We also see that Lasso outperformed the other three classifiers, and it would be interesting to see if ensemble classifiers would outperform Lasso. Overall, the second paper was an excellent read and presented some very interesting results.

Reflection #11 – [03-27] – [Meghendra Singh]

March 27, 2018 meghs

King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.
Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

The first paper presents a large-scale experimental study of Chinese social media censorship. The authors created accounts on multiple social media sites and submitted various texts, while observing which texts get posted and which get censored. The authors also interviewed employees of a bulletin board software company and other anonymous sources to get a first hand account of the various strategies used by social media websites to censor certain content. This approach is analogous to reverse engineering the censorship system; hence the title of the paper is appropriate. The key hypothesis that this study tries to prove is that of collective action potential, i.e. the target of censorship is people who join together to express themselves collectively, stimulated by someone other than the government, and seem to have the potential to generate collective action in the real-world [How censorship in China allows government criticism but silences collective expression.].

Overall, I find the paper to be an interesting read, and Figure 1 gives a nice overview of the various paths a social media post can take on Chinese discussion forums. The authors find that most social media websites used hand-curated keyword matching for automatic review of user-posted content. The most interesting fact was that large Chinese social media firms will be hiring 50,000 to 75,000 human censors, and that the Chinese Communist Party’s propaganda department, major Chinese news websites and commercial corporations had collectively employed two million “public opinion analysts” (professionals policing public opinion online) as early as 2013 [1]. Since there were approximately 618 million Internet users in China in 2013 [2], this implies one human censor for every 309 Internet users. With regards to the histogram presented in Figure 4, other than the reasons presented in the paper for the high number of automated reviews on government websites, it may be the case that these websites get a lot more posts than private websites. I believe a larger number of posts would lead to a greater number of posts being selected for automatic review. Additionally, if a person has an issue with a government policy or law, publishing their disagreement on a government forum might seem more appropriate to them. Now, given that phrases like change the law (变法) and disagree (不同意) are blocked from being posted, even on Chinese social media sites, I believe any post showing concern or disagreement with a government policy or law on a government website is highly likely to be reviewed. Moreover, given the long-tailed (power-law-like) nature of Chinese social media (as shown in the pie chart below from [King et al. 2013]), I feel the majority of the small private social media websites would be acting as niche communities (e.g., food enthusiasts, fashion, technology, games), and it is unlikely that individuals would post politically sensitive content on such communities.
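The censor-to-user ratio quoted above is simple arithmetic on the two figures cited in the text. A quick check, using only those two numbers (the variable names are mine):

```python
# Figures as quoted in the text above; variable names are illustrative.
internet_users_2013 = 618_000_000      # approximate Internet users in China, 2013
public_opinion_analysts = 2_000_000    # "public opinion analysts" employed by 2013

users_per_analyst = internet_users_2013 / public_opinion_analysts
print(users_per_analyst)  # 309.0
```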

The second paper discusses an interesting approach to evade censorship mechanisms on Sina Weibo (a popular Chinese microblogging website). The authors cite the decision tree of Chinese censorship from the first paper and highlight the fact that homophone substitution can be used to evade keyword-based automatic review and censorship mechanisms. The paper details a non-deterministic algorithm that can generate homophones for sensitive keywords that may be used to filter microblogs (weibos) for review by censors. The authors show, through Human Intelligence Task experiments on Mechanical Turk, that the homophone transformation does not significantly change the interpretability of the post. The key idea here is that if the censors try to counter the homophone transformation by adding all homophones of all blocked keywords to the blocked keyword list, they would end up censoring as much as 20% of the daily posts on Sina Weibo. This would be detrimental for the website, as it implies losing a significant number of daily posts and users (if the users are banned for posting the content). The authors suggest that the only approach that would censor homophone-transformed posts without sabotaging the website’s daily traffic would be to employ human censors. This would impose 15 additional human-hours per day of effort on the censors for each banned word, which is substantial as there are thousands of banned words.
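The core idea of non-deterministic homophone substitution can be sketched in a few lines. This is a toy illustration, not the authors' algorithm: the homophone table is a tiny hand-written assumption (characters sharing the syllables zhèng and fǔ), the function names are hypothetical, and the choice is uniformly random, whereas a real system would need a full pronunciation dictionary and a principled way to pick candidates.

```python
import random

# Tiny hand-written homophone table (an assumption for illustration):
# each key maps to characters pronounced with the same syllable and tone.
HOMOPHONES = {
    "政": ["正", "证", "症"],  # zhèng
    "府": ["腐", "斧", "俯"],  # fǔ
}

def transform(keyword, rng):
    """Swap each character that has known homophones for a randomly chosen one."""
    return "".join(
        rng.choice(HOMOPHONES[ch]) if ch in HOMOPHONES else ch
        for ch in keyword
    )

def evade(post, blocked_keywords, rng=None):
    """Replace every blocked keyword found in the post with a homophone variant."""
    rng = rng or random.Random()
    for keyword in blocked_keywords:
        if keyword in post:
            post = post.replace(keyword, transform(keyword, rng))
    return post

original = "政府的新闻"
rewritten = evade(original, ["政府"], random.Random(0))
print(original, "->", rewritten)
```

Because the replacement is chosen at random, the same keyword can map to many different surface forms (nine for this two-character example), which is what makes an exhaustive blocklist expensive for the censor.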

In Experiment 1, the authors stopped checking the status of posts after 48 hours. A question I have is whether all posts ultimately get read by some human censor. If this is the case, is there a justification for using the 48-hour threshold to consider a post uncensored? As the authors suggest in the study limitations, posts by established accounts (especially those with a lot of followers) might be scrutinized (or prioritized for review/censorship) more. It would be interesting to see if there exists a correlation between the number of followers an account has and the time at which its sensitive posts get deleted.

Furthermore, in the results for Experiment 1, the authors specify that there is a statistically significant difference between the publishing rates of the original and transformed posts. In terms of raw numbers, however, we don’t see a huge difference between the number of original (552) and transformed (576) posts that got published. It would be interesting to repeat Experiment 1 a couple of times to see if these results remain consistent. Additionally, I feel we might be able to apply a generative adversarial network (GAN) here. The generator would produce different transformations of an original “sensitive” weibo that remain highly interpretable yet can fool the discriminator, while the discriminator would act like a censor and decide whether or not the generated weibo should be deleted. However, I am not sure about the exact architecture of the networks or the availability of sufficient training data for this approach.

Addendum: An interesting list of terms blocked from being posted on Weibo.

Reflection #10 – [03/22] – [Meghendra Singh]

March 22, 2018 meghs

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

In the first paper, Kumar et al. study “sockpuppets” in nine discussion communities (a majority of these are news websites). The study shows that sockpuppets behave differently from ordinary users, and that these differences can be used to identify them. Sockpuppet accounts have specific posting behavior: they don’t usually start discussions, and their posts are generally short and contain certain linguistic markers (e.g., greater use of personal pronouns such as “I”). The authors begin by automatically labeling 3665 users as sockpuppets in the 9 discussion communities using their IP addresses and user session data. The authors identify two types of sockpuppets: pretenders (pretending to be legitimate users) and non-pretenders (easily identifiable as fake users, i.e. sockpuppets, by the discussion community). The two main results of the paper are distinguishing sockpuppet pairs from ordinary user pairs (ROC AUC=0.91) and predicting whether an individual user is a sockpuppet (ROC AUC=0.68), using activity, community and linguistic (post) features. The paper does a good job of explaining the behavior of sockpuppets in the comments sections of articles on typical news websites, and how these behaviors can be used to detect sockpuppets and thereby help maintain healthy and unbiased online discussion communities. The paper references a lot of prior work, and I really appreciate the fact that most of the decisions about features, parameters and other assumptions made in the study are grounded in past literature. While reading the paper, a fundamental question that came to my mind was: if we can already identify sockpuppets using IP addresses and temporal features of their comments, what is the point of using predictive modeling to differentiate sockpuppets from ordinary users? In essence, if we already have a high-precision, rule-based approach to detect sockpuppets, why rely on predictive modeling that performs only a little better than random chance (ROC AUC=0.68)?

I found the sockpuppet-ordinary user conversation example at the end of section 2 really funny, and I feel that the first comment itself is rather suspicious. This example also seems to indicate that the puppetmaster (S2) is the author of the article on which these comments are being posted. This leads to a question: given that a puppetmaster has multiple sockpuppet accounts, will their main account be considered an ordinary user? If not, does this mean that some of the articles themselves are being written by sockpuppets? A research question in this context can be: “detecting news articles written by sockpuppets in popular news websites”. Another question I had was why the authors used cosine similarity between feature vectors of users, and what the statistics for this metric are (mean and standard deviation of cosine similarities between sockpuppet and ordinary user feature vectors). Additionally, is there a possibility of using a bag-of-words model here, instead of numeric features like LIWC and ARI computed from users’ posts? Moreover, there is potential to experiment with other classification techniques here and see if they can perform better than Random Forest.
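For reference, the statistic I am asking about could be computed along these lines. This is a minimal sketch with made-up feature vectors (the three features and their values are assumptions, not data from the paper); it computes cosine similarity for every sockpuppet/ordinary pair and then the mean and standard deviation.

```python
import math
from statistics import mean, stdev

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical feature vectors (say: post length, pronoun rate, reply rate).
sockpuppets = [[12.0, 0.30, 0.9], [10.0, 0.28, 0.8]]
ordinary = [[40.0, 0.10, 0.4], [55.0, 0.12, 0.5]]

similarities = [cosine_similarity(s, o) for s in sockpuppets for o in ordinary]
summary = (mean(similarities), stdev(similarities))
print("mean = %.4f, std = %.4f" % summary)
```

One caveat this toy example exposes: when one feature has a much larger scale than the others (post length here), it dominates the similarity, so the features would need to be normalized before the comparison means much.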

Lastly, as the authors suggest in discussion and conclusion, it would be interesting to repeat this experiment on big social platforms like Facebook and Twitter. This becomes really important in today’s world, where online social communities are rife with armies of sockpuppets, spambots and astroturfers, hell-bent on manipulating public opinion, scamming innocent users and enforcing censorship.

The second paper by Lee et al. addresses a related problem of detecting spammers on MySpace and Twitter using social honeypots and classifiers. The study presents an elegant infrastructure for capturing potential spammer profiles, extracting features from these profiles and training popular classifiers to detect spammers with high accuracy and a low false positive rate (FPR). The most interesting findings for me were the most discriminative features (i.e., About Me text and number of URLs per tweet) for separating spammers from legitimate users, and the fact that ensemble classifiers (Decorate, etc.) performed the best. Given that deep learning was not really popular in 2010, it would be interesting to apply state-of-the-art deep learning techniques to the classification problem discussed in this paper. As we have already seen that the discriminative features that separate spammers from regular users vary from one platform/domain to another, it would be interesting to see if there exist common cross-platform, cross-domain (universal) features that are equally discriminative. Although MySpace may not be dead, it would be interesting to redo this study on Instagram, which is a lot more popular now and has a very real spammer problem. Based on personal experience, I have observed legitimate users on Instagram becoming spammers once they have enough followers. Will a social honeypot based approach work for detecting such users? Another challenge with detecting spam (or spammers) on a platform like Instagram is that most of the spam is in the form of stories (posts which automatically disappear in 24 hours), while the profiles may look completely ordinary.

Reflection #8 – [02/20] – [Meghendra Singh]

February 19, 2018 meghs

Bond, Robert M., et al. “A 61-million-person experiment in social influence and political mobilization.” Nature 489.7415 (2012): 295.
Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. “Experimental evidence of massive-scale emotional contagion through social networks.” Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.

Both papers provide interesting analyses of user-generated data on Facebook. As far as I remember, the key idea behind the first paper was briefly discussed in one of the early lectures. While there might be some ethical concerns regarding the data collection, usage and human subject consent in both studies, I find the papers to be very relevant and thought-provoking in today’s world, where social media is more or less an indispensable part of everyone’s lives. The first paper by Bond et al. discusses a randomized controlled trial of political mobilization messages on 61 million Facebook users during the 2010 U.S. congressional elections. The experiment showed an ‘informational’ or ‘social’ message at the top of the news feed of Facebook users in the U.S. (18 years of age and above) as shown in the image below.

Approximately 60 million users were shown the social message, 600 K users were shown the informational message and 600 K users were not shown any message, adding up to the ‘61-million-person’ sample advertised in the title. The key finding of this experiment was that messages on social media directly influenced political self-expression, information seeking and real-world voting behavior of people (at least those on Facebook). Additionally, ‘close friends’ in a social network (a.k.a. strong ties) are responsible for the transmission of self-expression, information seeking and real-world voting behavior. In essence, strong ties play a more significant role than ‘weak ties’ in spreading online and real-world behavior in online social networks. Next, I summarize my thoughts on this article.
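A quick tally of the approximate group sizes quoted above shows just how lopsided the design is. The numbers are the rounded figures from the text, not the exact counts reported in the paper, and the variable names are mine.

```python
# Rounded group sizes as quoted above (not the paper's exact counts).
social_group = 60_000_000         # shown the social message
informational_group = 600_000     # shown the informational message
control_group = 600_000           # shown no message

total = social_group + informational_group + control_group
social_share = social_group / total
size_ratio = social_group / informational_group

print(f"total users: {total:,}")
print(f"share in social group: {social_share:.1%}")
print(f"social vs informational group size: {size_ratio:.0f}x")
```

So roughly 98% of the sample received the social message, and that group is about a hundred times larger than either of the other two.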

The authors find that users who received the social message (instead of the plain informational message) were 2.08% more likely to click on the ‘I Voted’ button. This seems to suggest a causal link between the presence of images of friends who pushed the ‘I Voted’ button and the user’s decision to push the ‘I Voted’ button. I am not convinced by this suggestion because of the huge difference in the sample sizes of the social and informational message groups. I believe online social networks are complex systems, and the spread of behaviors (contagions) in such systems is a non-linear and emergent phenomenon. I feel that ignoring the differences between the two samples (in terms of network size and structure) is a little unreasonable when making such comparisons at the gross level. I feel this particular result would be more convincing if the two samples were relatively similar and the findings were consistent across repeated experiments. Another interesting analysis could be to look at which demographic segments are influenced more by the social messages than by the informational messages. Is the effect reversed for certain segments of the user population? Lastly, approximately 12.8% of the 2.1 billion user accounts on Facebook are either fake or duplicate. It would be interesting to see how these accounts would affect the results published in this article.

The second article by Kramer et al. suggests that emotions can spread like contagions from one user to another in online social networks. The article presents an experiment wherein the amount of positive and negative posts in the News Feed of Facebook users was artificially reduced by 10%. The key observation was that when positive posts were reduced, the amount of positive words in the affected users’ status updates decreased. Similarly, when negative posts were reduced, the amount of negative words in the affected users’ status updates decreased. I think this result suggests that people innately reciprocate the emotions they experience (even in the absence of nonverbal cues), acting like feedback loops. I feel that the weeklong study described in the article is somewhat insufficient to support the results. It might also be more convincing if the experiment was repeated and the observations remained consistent each time. Another thing that I feel is missing in the article is statistics about the affected users’ status updates, i.e. the mean and standard deviation of the number of status updates posted by the users. Additionally, it is important to know whether the users posted status updates only ‘after’ reading their News Feeds, and whether this ‘temporal’ information is captured in the data at all. Based on my limited observations of Facebook status updates, I feel most of the time they relate to the daily experiences of the user, for example a visit to a restaurant, a promotion, a successful defense, holidays or trips. I feel it’s very important that we avoid ‘apophenia’ when it comes to this kind of research. Also, it is unclear to me why the authors have used Poisson regression here and what the response variable is.

Reflection #7 – [02/12] – [Meghendra Singh]

February 12, 2018 meghs

Niculae, Vlad, et al. “Linguistic harbingers of betrayal: A case study on an online strategy game.” arXiv preprint arXiv:1506.04744 (2015).

The paper discusses a very interesting research question, that of friendships, alliances and betrayals. The key idea here is that between a pair of allies, conversational attributes like positive sentiment, politeness and focus on future planning can foretell the fate of the alliance (i.e. whether one of the allies will betray the other). Niculae et al. analyze 145K messages between players from 249 online games of “Diplomacy” (a war-themed strategy game) and train two classifiers: one to distinguish betrayals from lasting friendships, and one to distinguish the seasons preceding the last friendly interaction from older seasons.

Niculae et al. do a good job of defining the problem in the context of Diplomacy, specifically the “in-game” aspects of movement, support, diplomacy, orders, battles, and acts of friendship and hostility. I feel that unlike the real world, a game environment leads to a very clear and unambiguous definition of betrayal and alliance. While this makes it easier to apply computational tools like machine learning for making predictions in such environments, the developed approach might not be readily applicable to real-world scenarios. While talking about relationship stability in “Diplomacy”, the authors point out that the probability of a friendship dissolving into enmity is about five times greater than that of hostile players becoming friends. I feel this statistic is very much context-dependent and might not carry over to similar real-world scenarios. Additionally, there seems to be an implicit “in-game” advantage for deception and betrayal (“solo victories” being more prestigious than “team victories”). The technique described in the paper only uses linguistic cues within dyads to predict betrayal; however, there might be many other aspects leading to a betrayal. Although difficult, it might be interesting to see if the deceiving player is actually being influenced by another player outside the dyad (maybe by observing the betrayer’s communication with other players?). Also, there might be other reasons to betray “in-game”, for example one of the allies becoming too powerful (the fear of a powerful ally taking over a weak ally’s territory might make the weak ally betray). The point is that player communication alone might not be a sufficient signal for detecting betrayal, more so in the real world.

Also, there can be many other aspects associated with communication in the physical world, like body language, facial expressions, gestures, eye contact and tone of voice. These verbal and non-verbal cues are seldom captured in computer-mediated textual communication, although they might play a big role in decision making and in acts of friendship as well as betrayal. I feel it would be really interesting if the study could be repeated for some cooperative game that supports audio/video communication between players instead of only text. Also, I believe the “clock” of the game, i.e. the time taken to finish one season and to make decisions, is very different from the real world. The game might afford the players a lot of time to deliberate and choose their actions. In the real world, one may not have this privilege.

Additionally, the accuracy of the logistic regression based classifier discussed in section 4.3 is only 57% (5% higher than chance), and I feel this might be because of under-fitting. Hence, it might be interesting to explore other machine learning techniques for classifying betrayals using linguistic features. While the study tries to address a very important and appealing research question, I feel it is quite difficult to predict lasting friendships, eventual separations and unforeseen betrayals (even in a controlled virtual game), principally because of inherent human irrationality and strokes of serendipity.

Reflection #6 – [02/08] – Meghendra Singh

February 8, 2018 meghs

Danescu-Niculescu-Mizil, Cristian, et al. “A computational approach to politeness with application to social factors.” arXiv preprint arXiv:1306.6078 (2013).

In this paper, the authors explore the existence of a relationship between politeness and social power. To establish this relationship, the paper uses data from two online communities (Wikipedia and Stack Exchange). The data consists of requests directed at owners of talk-pages on Wikipedia and requests directed at authors of posts on Stack Exchange. The authors began by labeling 10,957 requests from the two data sources using Amazon Mechanical Turk, thereby creating the largest corpus of politeness annotations. Next, the authors detail 20 domain-independent lexical and syntactic features (or politeness strategies), grounded in the politeness literature. The authors subsequently develop two SVM-based classifiers, ‘BOW’ and ‘Ling.’, for classifying polite and impolite requests (“Did the authors forget to write about the ‘Alley’ classifier? 🙂”). The bag-of-words (BOW) classifier used unigram features from the training data (i.e. labeled Wikipedia and Stack Exchange requests) and served as a baseline. On the other hand, the linguistically informed (Ling.) classifier used the 20 linguistic features along with the unigrams and improved the baseline accuracy by 3-4%. In order to address the main research question of how politeness changes with social power, the authors compare the politeness levels of requests made by Wikipedia editors before and after they become administrators (i.e. before and after elections). The key finding is that the politeness score of requests by editors who successfully became administrators after public elections dropped, whereas it increased for unsuccessful editors, as shown in the figure below.

Additionally, the authors present other interesting findings: question-askers are politer than answer-givers, politeness of requests reduces as reputation increases on Stack Exchange, Wikipedians from the U.S. Midwest are the politest, female Wikipedians are generally more polite, and there is significant variance in the politeness of requests across programming language communities (0.47 for Python to 0.59 for Ruby).
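To make the BOW baseline described above a little more concrete, here is a minimal sketch of what unigram (bag-of-words) features look like for a request. This is only an illustration: the tokenization rule and the example requests are my own assumptions, not the paper's preprocessing or data, and the SVM training step is not reproduced here.

```python
import re
from collections import Counter


def unigram_features(request: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count each unigram."""
    # Assumed tokenization: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", request.lower())
    return Counter(tokens)


# Hypothetical requests, invented for illustration (not from the paper's corpus).
polite_request = "Could you please take a look at this? Thanks!"
blunt_request = "Fix this now."

print(unigram_features(polite_request))
print(unigram_features(blunt_request))
```

A classifier such as an SVM would then be trained on these counts (one dimension per vocabulary word), which is why the baseline cannot see strategies that depend on word order or syntax.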

The first question that came to my mind while reading this paper was whether there are other behaviors and traits that may be associated with requests and responses (or writing in general), for example compassion, persuasiveness, verbosity/succinctness and quality of language. A nice follow-up to this work might be to re-run the study for these other qualities, maybe even using other datasets (say Q&A sites like Quora, Answers.com or Yahoo Answers). I do feel that there isn’t a huge difference between the politeness scores for successful versus failed Wikipedia administrator candidates, especially after the election. I encountered this old (but interesting) paper which investigates politeness in written persuasion by examining a set of letters written by academics at different ranks in support of a colleague who had been denied promotion and tenure at a major state university in the U.S. One of the key findings of the study was that “the formulation of a request is conditioned by the relative power of the participators”. The following plot from the paper shows the relative politeness in letters of request by academics at different ranks.

This seems to suggest a different result from the one presented in Danescu-Niculescu-Mizil et al. We can clearly see that in this particular case the politeness in written requests generally increases with academic rank. Maybe politeness has more contextual underpinnings that need to be researched. Also, this more recent paper in social psychology links politeness with conservatism and compassion with political liberalism. As Danescu-Niculescu-Mizil et al. specify that “Wikipedians from U.S. Midwest are the most polite”, it would be interesting to validate such relationships between behaviors (like politeness) and political attitudes established by prior social science literature. Some more questions one might ask here are: How polite are the responses of Wikipedia editors versus administrators? Do polite requests generally get polite responses? Are people more likely to respond to a polite request than to a not so polite one? On the technical side, it might be interesting to experiment with other models for classification, say Random Forests, Neural Nets or Logistic Regression. Also, techniques like bagging, boosting and k-fold cross validation might be interesting avenues of exploration for improving model performance. It may also be interesting to determine the politeness/impoliteness of general written text (say news articles, editorials, reviews, critiques, social media posts) and examine how this affects the responses to these articles (shares, likes and comments).
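Of the techniques listed above, k-fold cross validation is the simplest to sketch. It is mainly a way to get a more reliable estimate of how a classifier performs, by training and testing on k different splits of the labeled requests. The helper below is a hypothetical, standard-library-only illustration (the function name and the contiguous, unshuffled folds are my assumptions), not something taken from the paper.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 10 labeled requests split into 3 folds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

In practice the data would be shuffled (and ideally stratified by label) before splitting, and the accuracy would be averaged over the k held-out folds.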

Reflection #5 – [02/06] – [Meghendra Singh]

February 4, 2018 meghs

Garrett, R. Kelly. “Echo chambers online?: Politically motivated selective exposure among Internet news users.” Journal of Computer-Mediated Communication 14.2 (2009): 265-285.
Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook.” Science 348.6239 (2015): 1130-1132.

Both papers discuss online “echo chambers”, i.e. communities/groups/forums on the Internet that are devoid of differing viewpoints, where individuals are exposed only to information from like-minded people. The second paper also talks about “filter bubbles”, or the tendency of content-delivery services/algorithms to deliver or recommend content to users based only on their viewing history. Both of these issues are important as they can give rise to a fragmented, opinionated and polarized citizenry. While Garrett’s paper mostly focused on the analysis of behavior-tracking data collected from the readers of 2 partisan online news websites, Bakshy et al. analyzed de-identified social news sharing data of 10.1 million Facebook users in the U.S.

The results presented in Garrett’s paper suggest that individuals are more likely to read news stories containing “high” opinion-reinforcing information as compared to “high” opinion-challenging information. Additionally, people generally tend to spend more time reading news stories containing “high” opinion-challenging information as compared to those containing “high” opinion-reinforcing information. While reading the paper I felt that it would be interesting to study how reading opinion-reinforcing news affects the reader’s opinion/attitude versus reading news that conflicts with the reader’s attitude. While both studies focused on political news, which in my opinion can have a wide range of debatable topics, I feel it would be interesting to redo the study on groups/communities that are based on fanatical, unscientific beliefs, like anti-vaccination, religious extremism and flat Earth, to name a few. We can also think of repeating this study in other geographies (instead of just the U.S.), and also compare the medium of news delivery. For example, are people more likely to read a news story with opinion-challenging information if it is presented to them in a physical newspaper rather than on an online news website? This points to a deeper question: is the Internet making us more opinionated, insular, and trapped in our idiosyncratic beliefs and ideologies?

If I have understood it correctly, the participants in Garrett’s study complete a post-reading assessment after reading every news story. Given that the participants only have 15 minutes to read the stories, it is unclear if the time spent finishing the post-assessment questionnaire was included in these 15 minutes. If the post-assessment was indeed included in the 15 minute reading window, I feel it might bias the post-assessment or the choice of the second news story selected. Moreover, it would have been useful to have some statistics about the length of the news stories, say the mean and standard deviation of the word counts. Other than this, I feel it would have been useful to know more about the distribution of age and income in the two subject populations (the author reports the average age and some information about the income). It may also be interesting to analyze the role played by gender, age and income on political opinion as a whole. Overall, I feel the paper presented a very interesting qualitative study for its time, a time when users had a lot more control over what they read.

The Science article by Bakshy et al. presents the quantitative analysis really well and does a good job of explaining the process of media exposure in friendship networks on Facebook. An interesting research question would be to study how likely people are to share a news story that conflicts with their affiliations/ideology/opinions as compared to one that aligns with their opinions. Another thought/concern is whether the presented results would hold across geographies.

Reflection #4 – [1/30] – [Meghendra Singh]

January 30, 2018 meghs

Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.
Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

Both papers focus on issues surrounding the credibility of information available on the world wide web and provide directions for future research on the subject. Garrett and Weeks focus on the implications of correcting inaccurate information in real-time versus presenting the corrected information after a delay (after a distractor task). Here, the authors used a between-participants experiment to compare participant beliefs on the issue of electronic health records (EHRs) when a news article about EHRs is presented with immediate corrections as opposed to when it is presented with delayed corrections. The study was conducted on 574 demographically diverse U.S. based participants. On the other hand, Mitra et al. present a model for assessing the credibility of social media events, which was trained using linguistic and control features present in 1377 event streams (66M Twitter posts) of the CREDBANK corpus. In this case the authors first use Mechanical Turkers to score the credibility of individual event streams and subsequently train a penalized logistic regression (using LASSO regularization) to predict the ordinal credibility level (Low, Medium or High) of the event streams.

In their paper, Garrett and Weeks explore a subtle yet interesting issue: real-time correction of information leading individuals to reject carefully documented evidence and to distrust the source. The paper seems to suggest that people who are predisposed to a certain ideology are more likely to question and distrust any real-time corrections to information on the internet (articles, blogposts, etc.) that go against their ideology, whereas people who already have doubts about the information are more likely to agree with the corrections. Upon reading the initial sections of the paper I felt that delayed corrections to information available online might not be really useful. I say this because people rarely revisit an article which they have read in the past. If the corrections are not presented as the readers are going through the information, how will they ultimately receive the corrected information? It is highly unlikely that they will revisit the same article in the future. I also feel that the study might be prone to sample bias since the attitudes, biases and predispositions of people in the U.S. may not reflect those of another geography. Additionally, as the authors also mention in the limitations of the study, we might get different results if the particular issue being analyzed is changed (e.g. if the issue was anti-vaccination).

In the second paper, Mitra et al. focused on predicting the “perceived” credibility of social media event reportage using linguistic and non-linguistic features. Although the approach is interesting, I feel that there can be a difference between the perceived and the actual credibility of an event. For example, given that Mitra et al. have published that Subjectivity in the original tweets is a good predictor of credibility, malicious Twitter users wanting to spread misinformation might artificially incorporate language features that improve the Subjectivity of their tweets so that they seem more credible. A system based on the model presented in the paper would then likely assign high perceived credibility to the tweets spreading misinformation. A research question might be to come up with a model that can detect and compensate for such malicious cases. Another interesting question might be to devise a system that can measure and present users with an event’s “actual” credibility (maybe using crowdsourcing or dependable journalistic channels?) instead of the “perceived” credibility based on language markers in the tweets about the event.

Another question I have is why the authors use the specific form of P_ca (i.e. why were the +1 or “Maybe Accurate” ratings not used for computing P_ca?). Also, there are 66M tweets in CREDBANK; given that these are clustered into 1377 event streams, there should be roughly 48K tweets in each event stream (assuming an even distribution). Did each of the 30 Turkers (who were rating an event) read through all 48K tweets, or were these divided between the Turkers? Although I do agree with the authors that this study circumvents the problem of sampling bias, as it analyzes a comprehensive collection of a large set of social media events, I feel there is a fair chance of “Turker bias” creeping into the model (in Table 2, we generally see a majority of Turkers rating the events as [+2], i.e. Certainly Accurate. I am curious, was there a group of Turkers who always rated any event stream presented to them as “Certainly Accurate”?)
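To make the two quantities in the paragraph above concrete, here is a small sketch. It assumes, as the question above does, that P_ca is the fraction of an event's ratings that are +2 (“Certainly Accurate”). The variant that also counts +1 ratings and the 30 example ratings are hypothetical and invented purely for illustration; they are not taken from CREDBANK.

```python
def p_ca(ratings):
    """Fraction of ratings equal to +2 ("Certainly Accurate")."""
    return sum(1 for r in ratings if r == 2) / len(ratings)


def p_ca_with_plus_one(ratings):
    """Hypothetical variant that also counts +1 ratings as agreement."""
    return sum(1 for r in ratings if r >= 1) / len(ratings)


# Invented ratings from 30 Turkers for one event (scale: -2 to +2).
example_ratings = [2] * 20 + [1] * 6 + [0] * 3 + [-1] * 1

print("P_ca:", p_ca(example_ratings))
print("P_ca counting +1 as well:", p_ca_with_plus_one(example_ratings))

# Back-of-the-envelope average stream size, assuming an even distribution.
total_tweets = 66_000_000
event_streams = 1377
tweets_per_stream = total_tweets / event_streams
print("Average tweets per event stream:", round(tweets_per_stream))
```

For the same set of ratings the +2-only form can never be higher than the variant that also counts +1, so an event may fall in a lower credibility class under the stricter definition, which is why the choice seems worth questioning.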

Reflection #3 – [1/25] – [Meghendra Singh]

January 25, 2018 meghs

Cheng, J., Danescu-Niculescu-Mizil, C., & Leskovec, J. (2015, April). Antisocial Behavior in Online Discussion Communities. In ICWSM (pp. 61-70).

The paper presents an interesting analysis of users on news communities. The objective here is to identify users who engage in antisocial behavior such as trolling, flaming, bullying and harassment on such communities. Through this paper, the authors reveal compelling insights into the behavior of users who were banned: banned users post irrelevantly, garner more replies, focus on a small number of discussion threads and post heavily on these threads. Additionally, posts by such users are less readable, lack positive emotion, and more than half of these posts are deleted. Further, the text quality of their posts decreases and the probability of their posts being deleted increases over time. Furthermore, the authors suggest that certain user features can be used to detect users that will potentially be banned. To this end, a few techniques to identify “bannable” users are discussed towards the end of the paper.

First, I would like to quote from the Wikipedia article about Breitbart News:

Breitbart News Network (known commonly as Breitbart News, Breitbart or Breitbart.com) is a far-right American news, opinion and commentary website founded in 2007 by conservative commentator Andrew Breitbart. The site has published a number of falsehoods and conspiracy theories, as well as intentionally misleading stories. Its journalists are ideologically driven, and some of its content has been called misogynist, xenophobic and racist.

My thought after looking through Breitbart.com was, isn’t this community itself somewhat antisocial? One can easily imagine a lot of liberals getting banned in this forum for contesting the posted articles. And this is what the homepage of Breitbart.com looked like in the morning:

While the paper itself presents a stimulating discussion about antisocial behavior in online discussion forums, I feel that there is a presumption that a user’s antisocial behavior always results in them being banned. The authors discuss that communities are initially tolerant of antisocial posts and users, and this bias can easily be used to evade getting banned. For example, a troll may initially post antisocial content, switch to the usual positive discussions for a substantial period of time and then return to posting antisocial content. Also, what’s to stop a banned user from creating a new account and returning to the community? I mean, all you need is a new e-mail account for Disqus. This is important because most of these news communities don’t require any notion of reputation for posting comments on their articles. On the other hand, I feel that the “gamified” reputation system on communities like Stack Exchange would act as a deterrent against antisocial behavior. Hence, it would be interesting to find out who gets banned in such “better designed” communities, and whether the markers of antisocial behavior are similar to those in news communities. An interesting post here.

Another question to ask is whether there are deeper tie-ins of antisocial behavior on online discussion forums. Are these behaviors predictors of some pathological condition in the human posting the content? The authors briefly mention these issues in the related work. Also, it would be interesting to discover whether a troll on one community is also a troll on another community. The authors mention that this research can lead to new methods for identifying undesirable users in online communities. I feel that detecting undesirable users beforehand is a bit like finding criminals before they have committed the crime, and there may be some ethical issues involved here. A better approach might be to look for linguistic markers that suggest antisocial themes in the content of a post and warn the user of the consequences of submitting it, instead of recommending users to the moderator for banning after the damage has already been done. This also leads to the question of which events/news/articles generally lead to antisocial behavior. Are there certain contentious topics that lead regular users to bully and troll others? Another question to ask here is: can we detect debates in the comments to a post? This might be a relevant feature that can predict antisocial behavior. Additionally, establishing a causal link between the pattern of replies in a thread and the content of the replies may help to identify “potential” antisocial posts. A naïve approach to handle this might be to simply restrict the maximum number of comments a user can submit to a thread. Another interesting question might be to find out if FBUs start contentious debates, i.e. do they generally start a thread or do they prefer replying to existing threads? The authors provide some indication towards this question in the section “How do FBUs generate activity around themselves?”.

Lastly, I feel that a classifier precision of 0.8 is not good enough for detecting FBUs. I say this because the objective here is to recommend potential antisocial users to human moderators for banning, so as to reduce their manual labor, and having a lot of false positives will defeat this purpose in some sense. Also, I don’t quite agree with the claim that the classifiers are cross-domain. I feel that there will be a huge overlap between CNN and Breitbart.com in the area of political news. Also, the dataset is derived primarily from news websites where people discuss and comment on articles written by journalists and editors. The findings might not apply to Q&A websites (e.g. Quora, StackOverflow), places where users can submit articles (e.g. Medium) or more technically inclined communities (e.g. TechCrunch).
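To put the false-positive concern in numbers, here is a small sketch that takes the 0.8 figure quoted above at face value as precision. The function names and the count of 1,000 flagged users are hypothetical, chosen only to illustrate the arithmetic.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged users who really are antisocial."""
    return true_positives / (true_positives + false_positives)


def expected_false_flags(n_flagged: int, precision_value: float) -> float:
    """Expected number of flagged users who are not actually antisocial."""
    return n_flagged * (1 - precision_value)


# Hypothetical scenario: the classifier recommends 1,000 users to moderators.
n_flagged = 1000
wrongly_flagged = expected_false_flags(n_flagged, 0.8)
print(f"Of {n_flagged} flagged users, about {wrongly_flagged:.0f} would be false positives")
```

In other words, at this precision roughly one in five recommendations would send a moderator after a user who did nothing wrong, so the moderator still has to review every flagged account by hand.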