Reflection #13 – [04/10] – [Jamal A. Khan]

  1. “Predicting Sales from the Language of Product Descriptions”

Since I’ll be presenting one of the papers for tomorrow’s (4/10) class, I’ll write a reflection for the other paper only.

I thoroughly enjoyed reading this paper for two reasons:

  • The idea of using a neural network for feature selection.
  • The coverage of different aspects of the results.

To start off, I’ll argue against my first reason for liking the paper, because I think the comment needs to be made. Neural networks (NNs) are by design feature generators, not feature selectors. Using one for feature selection therefore seems counter-intuitive: Yann LeCun’s Convolutional Neural Networks, for example, have been so wildly successful because they automatically learn filters that detect features like vertical edges, shapes, or contours without being told what to extract. Thinking along these lines, the paper seems ill-motivated, because one should be able to use the same gradient reversal technique and attention mechanism in a sequence or convolutional model to abstract out the implicit effects of confounds like brand loyalty and pricing strategies. So why didn’t they do it?

The answer, or at least one very straightforward reason, is interpretability. While there’s a good chance that an NN will generate features better than any of the handmade ones, those features won’t make much sense to us. This is why I like the authors’ idea of leveraging the NN’s power to select features instead of having it engineer them.
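On the gradient reversal point above: the trick is just an identity on the forward pass whose gradient is negated (and optionally scaled) on the backward pass, so the shared features become useless for predicting the confound. A minimal numpy sketch of one update step (the shapes, names, and learning rate are my own illustrative choices, not the paper’s):

```python
import numpy as np

def grad_reversal_backward(grad, lam=1.0):
    # Identity on the forward pass; flips (and scales) the gradient on the way back.
    return -lam * grad

# Toy setup: a shared linear feature extractor, a task head, and a confound head.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))          # shared feature extractor
x = rng.normal(size=(5, 4))          # 5 examples, 4 raw inputs
h = x @ W                            # shared features

# Pretend these are gradients of each head's loss w.r.t. the shared features.
g_task = rng.normal(size=h.shape)
g_conf = rng.normal(size=h.shape)

# The shared weights see the task gradient plus the *reversed* confound gradient,
# so they learn features that predict the task but NOT the confound.
g_shared = g_task + grad_reversal_backward(g_conf, lam=0.5)
W -= 0.01 * (x.T @ g_shared)
```

Nothing about this step is specific to a bag-of-features model, which is why the same trick should drop into a convolutional or recurrent extractor unchanged.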

Coming onto the model, I believe the authors could have done a better job of explaining the architecture. It is conventional to state the input and output shapes that each layer takes in and spits out; a very good example is one of the most popular architectures, Inception v3. It took me a while to figure out the dimensionality of the attention layer.

Also, the authors do little to comment on the extensibility of the study to different types of products and languages. How applicable is the same or a similar model to, say, English, which has a very different grammatical structure? And since the topic is feature selection, can a similar technique be used to rank other features, i.e. something not textual? As a tangential thought, I think the application is limited to text.

While it’s all well and good that the authors want to remove the effects of confounds, the only thing the paper has illustrated is the model’s ability to select good tokens. I think the authors themselves have underestimated the model. Models with LSTM layers followed by attention layers to generate summary encodings are able to perform language translation (a very difficult learning task for machines), so my intuition is that this model would have been able to detect what sort of writing style attracts the most customers. So my question is: when the whole idea is to mine features of product descriptions to help sell a product better, why was language style completely ignored?

Just as food for thought for people who might be into deep learning (read with a pinch of salt though): I think the model is overkill, and the k-nearby-words method of training skip-gram embeddings (the method used for Word2Vec generation) would have been able to do the same, perhaps more efficiently. The only thing that would need to be modified is the loss function: instead of finding vector representations that capture only the similarity of words, we would introduce the notion of log(sales). This way the model would capture words that are similar in both meaning and sale power. Random ideas like the one I’ve proposed need to be tested though, so you can probably ignore it.
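To make the hand-waving above slightly more concrete, here is a rough numpy sketch of the modified objective I have in mind: the standard skip-gram negative-sampling terms, plus a squared-error term tying the center word’s vector to log(sales) through a linear readout. The readout `w_sales`, the weight `lam`, and all the toy vectors are my own assumptions, not anything from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_loss(v_center, v_context, v_negative, w_sales, log_sales, lam=0.1):
    """Skip-gram negative-sampling loss plus a hypothetical sales term.

    The first two terms are the usual Word2Vec objective; the third ties the
    center word's vector to log(sales), so words end up close together when
    they are similar in both meaning and 'sale power'.
    """
    pos = -np.log(sigmoid(v_center @ v_context))          # predict true context
    neg = -np.log(sigmoid(-v_center @ v_negative))        # push away sampled noise
    sales = lam * (v_center @ w_sales - log_sales) ** 2   # my invented sales term
    return pos + neg + sales

rng = np.random.default_rng(1)
d = 8
loss = combined_loss(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=d), rng.normal(size=d),
                     log_sales=np.log(250.0))
```

In a real run, one negative-sampling pair per (center, context) window would be summed over the corpus, exactly as in standard Word2Vec training, with only the extra term added.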

Finally, sections like the neural network layer reviews add nothing and break the flow. Perhaps these were included to increase the length, because the actual work could be concisely delivered in 6 pages. I agree with John’s comment that this seems more like a workshop paper (a good one though).



One last thing that I forgot to put into the reflection (and am too lazy to restructure now) is that this line of work isn’t actually novel either. Interested readers should check out the following paper from Google. Be warned though: it’s a tough read, but if you’re good with feed-forward NN math, you should be fine.



Reflection #12 – [04/05] – [Jamal A. Khan]

  1. Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.”
  2. Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.”

The first paper, regarding emojis, is an intriguing one. There are things I really like about the paper and things I don’t. Starting with the things I like: the dataset size is massive, which is rare to see these days. The way the authors pre-process the text is somewhat questionable and might introduce artifacts, but given the massiveness of the dataset that shouldn’t be the case. I would dare say that the actual novelty of the paper is the processed dataset, and it is probably also the reason why the model performs so well.

Coming onto the next logical step, i.e. the model architecture, I feel there is nothing novel here; bi-directional LSTMs with an attention layer or two on top aren’t new. Furthermore, the explanation of the model isn’t clear. Are the LSTMs stacked, or are they in sequence? If they are in sequence, then why (this isn’t a seq-to-seq, i.e. encoder-decoder, model)? It also doesn’t make sense to have two LSTMs in sequence, because the same could be achieved by replacing them with a single LSTM whose recurrence combines both of the previous ones. I realize this might be an involved question, but I would like to know if someone else in the class understood this part, because I most certainly didn’t.

Now, the authors claim that this “chain-thaw” transfer learning is novel, and my opinion of this claim may be polarizing. The ethics of the claim are something I would like to discuss in class as well. To me, chain-thaw is not anything new or innovative; it’s something I’ve already done in the ML course I took at VT. The reason I say it’s not novel isn’t that I was able to come up with it on my own, but that it is so commonplace that people consider it trivial. The ability to freeze layers and re-train the layer of choice has been present in Keras (a deep learning API) since its inception, which dates back to mid-2015. Has anyone else claimed this “chain-thawing” as their own? Probably not. Does that make the authors the first ones to claim it? Probably yes. Is it a contribution to the scientific community in any way? Probably not 🙁. Hence, this long rant brings me to my actual question: is claiming something as novel, when the technique/information is common sense or trivial, academically misleading or a false claim? To me it seems the claim was made to give the paper more selling power, which it didn’t need, because it was a good paper to begin with.
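For reference, the chain-thaw schedule itself is nothing more than a loop over which layers are allowed to update: thaw one layer at a time, then thaw everything. A toy stand-in (the layer names and the fake train step are purely illustrative; in Keras the same effect is achieved by toggling each layer’s `trainable` flag and recompiling):

```python
import numpy as np

# A toy three-"layer" model; chain-thaw fine-tunes one layer at a time with
# everything else frozen, then finishes with all layers trainable.
rng = np.random.default_rng(0)
layers = {"embed": rng.normal(size=(4, 4)),
          "lstm":  rng.normal(size=(4, 4)),
          "dense": rng.normal(size=(4, 1))}

def train_step(layers, trainable, lr=0.1):
    # Stand-in for one training epoch: nudge only the unfrozen layers.
    for name in layers:
        if name in trainable:
            layers[name] = layers[name] - lr * np.sign(layers[name])

snapshot = {k: v.copy() for k, v in layers.items()}

# Chain-thaw schedule: thaw each layer in turn, then thaw the whole network.
for phase in [{"dense"}, {"embed"}, {"lstm"}, {"embed", "lstm", "dense"}]:
    train_step(layers, trainable=phase)

# After the schedule, every layer has been updated at least once.
changed = {k: not np.allclose(layers[k], snapshot[k]) for k in layers}
```

That the whole “technique” fits in a for-loop is exactly my point about triviality.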

So, while reading this paper I ran into this random reddit-like snippet. Might be a tad bit NSFW.

Since the words themselves are so close to each other (talking about the vector-space embeddings from the first layer of the network), would the network be able to extract these “connotations”? These connotations might exist in emojis as well; from personal usage and experience, I believe they do.

A direct follow-up question this paper raises: can sentences written purely in emoji (no text) be translated to normal text, or can meaning be inferred from them instead of just emotion? I think an encoder could readily be built from the pretrained model, but the decoder may be a whole different animal, primarily because of the lack of translated sentences, i.e. a curated translation dataset. For folks who are new to seq-to-seq/encoder-decoder models, I would recommend reading up on NMT; this repo is a good primer and has practical examples and code.


The second paper focuses on a different topic than the first. Outright, I would like to raise the question: why does the paper use such an old dataset? Reddit should have depression-related subreddits; in fact, a quick Google search shows some already.

A crawler (using the results from Google as seeds) and a scraper to pull the posts could have proved a very effective approach to building a much more recent and perhaps more representative dataset. Another confusing aspect was the choice of the 5 categories: depression, self-harm, grief, bipolar disorder, and suicide. Why these 5? Wasn’t the original goal to study depression? So why didn’t they focus on types of depression instead of the categories listed above?

The classification methodology chosen for the paper is questionable. Instead of a multi-class classifier able to classify into depression, self-harm, grief, bipolar disorder, and suicide, the authors chose to build 4 binary classifiers. Why? It’s very counter-intuitive; perhaps I’m missing something. Also, since I’m not knowledgeable about mental health problems, how would one go about labeling examples in the dataset? Compared to physical diseases/problems, e.g. bone fractures, is there a universally agreed-upon classification, or would different practitioners label differently? The labeling may completely change the models developed.

Another weakness is that the paper implicitly assumes that depressed people post to forums. Do they post, or do they disappear? I guess that could be a research topic on its own. Overall, the paper’s idea was pretty good but poorly explained and executed. I feel the paper had much more potential.



Reflection #11 – [03/27] – [Jamal A. Khan]

  • Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.”
  • King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.”

Both of the papers assigned for the next class are about Chinese censorship and, in a sense, have a heroic tone in how the idea is put forward. I didn’t quite like the way the ideas were staged, but that is irrelevant and subjective.

Regardless of the tone of the writing, or my likes and dislikes for that matter, I like the first paper’s idea of using the semantics of the Chinese language itself as a deterrent against censorship. The complexity of the language has come as a blessing in disguise. Before I get into the critical details, I would say that the authors’ approach is sound and has been well demonstrated. Therefore, this reflection will focus on what can be done (or undone, in my case) using the paper as a base.

Since the title itself states that the purpose of the paper is to “bypass” censorship, a natural question is: “Does this method still work?” A naive approach to breaking this scheme, or at the very least majorly cutting down the human cost the authors talk about, would be to build a homophone replacement detector. This is very much possible with recent advances in word embedding schemes (referring to the works in [1], [2] and especially [3]) and their ability to detect similarity in the usage of words. These embeddings per se do not look at what a word is, but rather at how it occurs, to deduce importance, similarity and, in the case of [3], hierarchy as well, when mapping to an arbitrary-dimensional vector space (meaning they could deduce what a random smiley means, too). Hence, to these embeddings, homophones are very similar words IF they are used in the same context (which they are!). Since the solution proposed in the paper relies on the reader being able to deduce the meaning of sentences from the context of the article or the situation/news trends, embeddings will be able to do so as well, if not better, and hence the system would censor the posts. So I guess my point is that this method might be outdated now; the only overhead the censoring system would have to bear is training a new embedding model every day or so.
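To illustrate why distributional methods would catch the homophones, here is a tiny sketch using raw co-occurrence counts and cosine similarity (the words, contexts, and counts are entirely made up for illustration):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two context-count vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy co-occurrence counts over four hypothetical context words.
contexts  = ["protest", "square", "river", "crab"]
banned    = np.array([9.0, 7.0, 1.0, 0.0])   # the censored term
homophone = np.array([8.0, 6.0, 0.0, 2.0])   # its homophone substitute
unrelated = np.array([0.0, 1.0, 9.0, 8.0])   # an innocuous word

# Because the homophone is used where the banned word would be, distributional
# methods place the two close together -- exactly what a censor's filter needs.
sim_pair      = cosine(banned, homophone)
sim_unrelated = cosine(banned, unrelated)
```

A real embedding model (skip-gram, GloVe, etc.) learns a compressed version of exactly these co-occurrence statistics, so the same closeness would show up there.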

The other question is: “Do the censorship practices still function the same way?” Now that NLP tasks are being dominated by sequence models (deep learning models based on bi-directional RNNs, for example), it might be possible to detect such posts automatically even better now. I feel this question needs further exploration, and there is no direct answer.

Another natural question: does this homophonic approach extend to other languages as well? For Urdu (Punjabi as well), English, and to some extent Arabic, the languages I know myself, I’m not too sure such a variety of homophones exists. If it doesn’t, then a straight follow-up question is: can we develop language-invariant censorship-avoidance schemes? I feel this could be some very exciting work. Maybe some inspiration can be drawn from schemes such as [4].

The second paper, by King et al., I must say, is pretty impressive. The amount of detail in the experiment design, the considerations undertaken, and the way results are presented are pretty much on point. Now, I’m not too familiar with Chinese censorship and its effects, so I can’t make much of the results. The thing that surprises me is that posts with collective action potential are banned while those critiquing the government are not. Why? Another surprising finding was the absence of a centralized method of censorship, and this leads me back to my original question: with newer NLP techniques powered by deep learning emerging, will the censor’s hammer come down harder? Will these digital terminators be more efficient at their job? In the unfortunate case that this dystopian scenario were to come true, how are we to deal with it?

I guess, with both papers combined, an ethical question needs to be discussed: Is censorship ethical? If no, then why? If yes, then under what circumstances and to what extent? It would be nice to hear other people’s opinions on this in class.


[1] Efficient Estimation of Word Representations in Vector Space:

[2] GloVe: Global Vectors for Word Representation:

[3] Poincaré Embeddings for Learning Hierarchical Representations:

[4] Unobservable communication over fully untrusted infrastructure:


Reflection #10 – [03/22] – [Jamal A. Khan]

Both of the papers revolve around the theme of fake profiles, albeit of different types.

  1. Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
  2. Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

The first paper, about sockpuppets, is well written and pretty well explained throughout. However, the motivation of the paper seems weak! I see that there are ~3,300 sockpuppets out of a total of ~2.3 million users! That brings me to my question: is that even a problem worthy enough to tackle? Do we need an automated classification model? Why do sockpuppets need to be studied? Do they have harmful effects that need to be mitigated?
Moving forward, the entirety of the paper builds up to a classifier, and though that’s not a bad thing, I get the feeling that the work was conducted top-down (idea of a classifier for sockpuppets -> features needed to build it) but the writing is bottom-up (need to study sockpuppets and then use the generated material to make a classifier). Regardless, the study does raise some follow-up questions, some of which seem pretty interesting to check out:

  • Why do people make sockpuppets? What purpose are the puppets used for? Are they created for a targeted objective, or are they more troll-like (just for fun, or just because someone can)?
  • How do puppeteers differ from ordinary users?
  • Can a community influence the creation of sockpuppets? I realize the paper already partially answers this question, but I think much more focused attention is needed on the temporal effects of a community on, and the behavior of, a puppeteer before the creation of the puppets.

Coming onto the classifier, I have a few grievances. Like many other papers we have discussed in class, this one lacks details of the classifier model used, e.g. the number of trees in the ensemble, the max tree depth, and the voting strategy: do all trees get the same vote count? However, I will give the authors credit, because this is the first paper among the ones we’ve read that has used a powerful classifier as opposed to simple logistic regression. Still, the model has poor predictive power. Since I’m working on an NLP classification problem, I’m wondering if sequential models might work better.


Moving onto the second paper: it’s a great idea but executed poorly, and so I apologize for the harsh critique in advance. The idea of honeypot profiles is intriguing, but just as social features can be used to sniff out spammer profiles, they can be used to sniff out honeypots, and hence the trap can be avoided. So I think the paper’s approach is naive, in the sense that it needed more work on why and how the social honeypots are robust to changes in strategy by the spammers.

Regardless, the good thing about the project is the actual deployment of the honeypots. However, the main promise, being able to classify spammers and non-spammers, has not been delivered. The scale of the training dataset is minuscule and not representative, i.e. there are only 627 deceptive and 388 legitimate profiles for the MySpace classification task. Hence, the validity of the following table becomes questionable.

With a dataset of the same scale as the one used here, we could have also fitted a multinomial regression and perhaps gotten similar results. The choice of classifiers has not been motivated, and why have so many been tested? It seems the paper has fallen victim to the approach that “when you have a hammer, everything looks like a nail”. The same story repeats with the Twitter classification task.

Regardless of my critique, the paper presents the classifier results in more detail than most papers, so that’s a plus. It was quite interesting to see that age in Figure 2 (shown below) had a large ROC area. So, my question is: are spammer profiles younger than legitimate user profiles?

Another question regards the test of time for the study: would a similar classifier perform well on the MySpace of today (i.e. Facebook)? Since the user base is probably much more diverse now, the traits of legitimate users have changed.

Finally, I would like other people’s opinions on the last portion of the paper, i.e. the “in-the-wild” testing. I think this last section is just plain wrong, and misleading at best. The authors say that

“… the traditional classification metrics presented in the previous section would be infeasible to apply in this case. Rather than hand label millions of profiles, we adopted the spam precision metric to evaluate the quality of spam predictions. For spam precision, we evaluate only the predicted spammers (i.e., the profiles that the classifier labels as spam). …”

Correct me if I’m wrong, please, but the spam precision metric proposed measures the true positive rate only among the profiles that were classified as spam, not among the ones that were actually spam. This is misleading because it ignores the spam profiles that weren’t detected in the first place, and so, for all we know, the false negatives may have been orders of magnitude more numerous than the detected positives. For example, suppose in actuality we had 100,000 spam profiles among 500,000 overall, of which only 5,000 were detected. The authors are only reporting how many of the 5,000 were actually true positives, and not how many of the 100,000 were found. There is no shortcut to research, and I think the reason cited above in italics is simply a poor excuse to avoid a repetitive and time-consuming task. In the past few years, good data is what has driven machine learning to its current popularity, and so to make a claim using ML, the data needs to be unquestionable to a great degree. It’s for the same reason that most companies don’t mind releasing their deep learning models’ architectures: they know that without the data the company had, no one will be able to reproduce similar results. Therefore, to me, all the results in section 5 are bogus and irrelevant at best. Again, I apologize for the harsh critique.
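The arithmetic behind the objection is worth spelling out. Using my hypothetical numbers (and further assuming, say, that 4,500 of the 5,000 flagged profiles really were spam; that last figure is my own invention):

```python
# Hypothetical numbers: 100,000 real spam profiles among 500,000 total,
# 5,000 flagged by the classifier, of which 4,500 are correct.
actual_spam, flagged, true_pos = 100_000, 5_000, 4_500

spam_precision = true_pos / flagged       # the only metric the paper reports
recall         = true_pos / actual_spam   # the metric the paper never measures

# A 90% "spam precision" coexists with more than 95% of spam going undetected.
```

A reported precision of 0.9 thus says nothing about the 95,500 spam profiles the classifier never saw, which is exactly why recall (or hand-labeling a random sample) was needed.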


Reflection #9 – [02/22] – Jamal A. Khan

  • Mitra, Tanushree, Scott Counts, and James W. Pennebaker. “Understanding Anti-Vaccination Attitudes in Social Media.”
  • De Choudhury, Munmun, et al. “Predicting depression via social media.”

While both papers target serious and important issues, the first is the more interesting of the two, perhaps due to the nature of the question. The fact that anti-vaxxers are prone to believing in conspiracy theories, and in general exhibit distrust and a phobia of sorts, seems highly logical. I was surprised to see that while the paper highlighted people who joined the anti-vax group, it ignored the people who left! What are the linguistic and topical cues that these transitions (anti-vax to pro-vax) exhibit? I believe this is important to understand in order to fight the “self-sealing” quality of conspiracies, or in this case anti-vaccination theories. Overall, though, the paper was very convincing and thorough.

A follow-up question would be to find the trends and growth patterns of these theories: how contagious they are and how long they take to die out. Another interesting thing that could be mined is the source of the claims they make, and the validity thereof; this would provide insight into the processes involved in the birth of these conspiracies.

Coming onto the second paper, I feel the main motive of the paper is to predict depression; however, the choice of classifier or model, as I always complain, is weak again. Since the features were hand-designed, interpretability wouldn’t have been an issue. Therefore, ensemble techniques should have been opted for; in particular, gradient boosting should have been used here.

Regardless of the choice of classifiers, a direct follow-up question is whether the same techniques can be applied to new forms of social media: e.g. Facebook posts aren’t limited to short sentences and dyads are mostly in personal messages, while Instagram’s content leans heavily towards pictures and videos. Can image analysis provide better insights, i.e. do other forms of media, such as pictures and videos, contain a better signal than text? Another interesting follow-up question is whether these episodes of depression are isolated cases or mass depression across the crowd; a graph/network analysis might provide good insight.

Coming from a more ethical perspective, an important question is whether social media platforms even have the right to monitor depression or behavioral traits. If they do find a person who is highly vulnerable, what sort of action can they take? I’m interested to know what other people think of this.

Finally, I would like to mention that most of the social media posts I come across these days are either sarcastic or play on being super busy, stressed, or sad in an attempt to be funny. I believe a lot of these posts would pollute the dataset, and it doesn’t seem like the authors have catered for this. Simply relying on the law of large numbers isn’t going to get rid of this issue, because it’s a prevailing trend rather than an outlying one.


Reflection #8 – [02/20] – [Jamal A. Khan]

  1. Bond, Robert M., et al. “A 61-million-person experiment in social influence and political mobilization.”
  2. Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. “Experimental evidence of massive-scale emotional contagion through social networks.”

Both of the papers assigned, though short, are pretty interesting. Both of them have to do with social contagion, showing how the behavior of people can propagate outwards.

In the first paper, the observation/behavior that bugs me is that people who were shown the informational message behaved in a manner unsettlingly similar to people who saw no message at all. The desire to attain social credibility alone cannot be the cause, because the difference in validated voting records between the control group and the informational message group is, practically speaking, non-existent. This leads to what I think might be a pretty interesting question: “Do people generally lie on social media platforms to fit the norm, in order to achieve acceptance from the community? Monkey see, monkey do?” A slight intuition, though controversial, might be that elections in the USA are highly celebritized, which might affect how voters behave on social media. Another important factor that I think was not controlled for by the authors was fake accounts, which may have a significant impact on the results. We’ve seen recently, in the US presidential election, how these bogus accounts can be used to influence elections.

The second paper was the more interesting of the two, and slightly worrying in a sense too. Taking the result just at face value: “Is it possible to program the sentiments of crowds through targeted and doctored posts? If yes, how much can this impact important events such as presidential elections?”

Nevertheless, moving on to the content of the paper itself, I disagree with the authors’ methodology of using just LIWC for analysis. While it may be a good tool, the results should have been cross-tested with other similar tools. Another thing to note is the division of posts into binary categories, with the threshold being just a single positive or negative word. I feel this choice of threshold is flawed and will fail to capture sarcasm or jokes. My suggested approach would have been three categories: negative, neutral/ambiguous, and positive. The authors’ choice of Poisson regression is not well motivated and implicitly assumes that posting times (gaps between posts) follow a Poisson distribution, of which no proof has been provided; this leads me to believe that the results might be artifacts of the fit and not actual observations. Finally, a single trial, in my opinion, is insufficient; multiple trials should have been conducted with sufficient gaps in between.

Regardless of the approach adopted, building on the paper’s result that when people see more positive posts their subsequent posts are positive, and vice versa for negative ones, my question is: “People who share positive or negative posts after being exposed to the stimuli, are they actually more positive or more negative, respectively? Or is it again monkey see, monkey do, i.e. do they share posts similar to their entourage’s to stay relevant?” I might be going off on a tangent, but it might be interesting to observe the impact of age, i.e. “Are younger people more susceptible to being influenced by online content, and does age act as a deterrent against that gullibility?”


Reflection #6 – [02/08] – [Jamal A. Khan]

Both of the papers assigned for today deal with computational linguistics revolving around politeness and respectfulness:

  1. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.
  2. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.

I’m going to talk about the second one first, because it tackles a more interesting topic and asks a hard question with serious social consequences. The authors question whether black people are systematically treated differently (with less respect) by police officers, and the answer, alarmingly, is yes.

However, I don’t think the authors catered for the drastic difference in crime rate in Oakland as compared to the national average, and this makes the generalizability of the study a major concern.

So my question then is: “Does the Oakland Police Department have a problem of racial profiling, or do police departments in general exhibit this trend?” It’s very important to realize the difference, and it’s an easy one to overlook. Another important aspect that is missing, as pointed out by the authors as well, is that while the trends may be obvious, the reasons behind them are not, and I believe those are quite important.

Finally, I think that without taking into account the language, body language, and facial expressions of the people being questioned, the study might paint an incomplete picture. With modern deep-learning techniques and out-of-the-box solutions for object detection, facial segmentation, etc., such analysis is now possible. Hence, while I feel this study is a step in the right direction, it’s one that needs much more work.

Coming back to the first paper: before I get into the paper itself, just from reading the abstract, the question that comes to mind is whether models such as the one proposed by the authors have the potential to be used for language policing, and how intrusive the effect could be on freedom of speech. The reason I raise this question is that while we are heavily concerned about whether people face opinion-challenging information online, so as not to create “echo chambers”, we tend to have the opposite stance on language use, which possibly has the potential to create a “snowflake culture”. In my opinion, negative experiences are necessary for growth.

Nevertheless, going back to the paper itself, the most interesting trends were the Stack Overflow results showing that people become less polite (haughty?) after gaining non-monetary affluence, and that they become more polite when they lose it. I guess the former goes along the lines of “humility is a hard attribute to find”. Though I think the analysis is too one-dimensional, in the sense that the formality, or necessary strictness, required of someone in a position of power might be perceived as less polite but no less respectful. I think a separate set of experiments is needed to confirm this observation, so I would treat this particular result as a really interesting hypothesis and nothing more. Perhaps the popularity of users and politeness could also be studied in a different set of experiments.

Another subtle point the paper missed is that English is now becoming (probably has already become) a universal language, and the way it’s used differs quite widely among different cultures and geographies. This seems to be a repeating trend among the papers we’ve read so far in class. The question then becomes: what kind of effects do local languages have on the perception of respect in (translated?) English? It may be the case that phrases, when translated from a local language to English, become less polite or maybe even rude.

Finally, I was wondering how well the study would fare with more modern NLP techniques, which capture not only sentence structure but also inter-sentence relationships (the proposed classifier doesn’t do that right now), and whether the findings would still hold or get augmented.



Reflection #5 – [02/06] – Jamal A. Khan

Both papers ask questions along similar lines; however, the question raised by Garrett is more subtle and, in a sense, a follow-up to the one posed by Bakshy et al.

  1. Garrett, R. Kelly. “Echo chambers online?: Politically motivated selective exposure among Internet news users.” Journal of Computer-Mediated Communication 14.2 (2009): 265-285.
  2. Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. “Exposure to ideologically diverse news and opinion on Facebook.” Science 348.6239 (2015): 1130-1132.

The first paper asks quite an interesting question about the consequences of technology, i.e. “Will selective news exposure, or the ability to limit one’s exposure to articles or sources of a certain kind, lead to a dystopian future where democracy collapses?” The answer to which was, unsurprisingly, no.

The results of the paper suggest that prospective opinion-reinforcing information greatly influences article selection, which immediately raises a follow-up question: “how does fake news framed as opinion-reinforcing content come into play in shaping the perception of truth among people?” If people are more likely to consume articles that support their ideological point of view, then such articles can serve as effective tools for derailing the truth. Since there is potential for the truth to be manipulated, “does this lead people to extreme, one-sided, tunneled vision, i.e. can people be indirectly programmed to adopt a certain point of view? If so, how devastating can the effects be?”

The second question has more to do with the design of the study: “does voluntariness exaggerate behaviors?” i.e. does passive vs. active instrumentation of people lead to different results? Would a person’s choice of articles be different if they knew they were being monitored?

The third question has to do with the choice of population. People drawn from well-known left- or right-leaning sites are generally more enthusiastic and partisan than the general public, so the results reported here might be an artifact of the choice of population. I guess the question I’m asking is “are people who read news from partisan sources democrats for the sake of being democrats, or republicans for the sake of being republicans?” The generalization of the results to the wider public is therefore uncertain.

The second paper asks a different question: whether people become isolated from opinion-challenging information even without choosing to be. Given the amount of data available to the authors, I’m surprised the paper isn’t as fleshed out as it could be, but it does a good job of quantitatively showing that the fear is unfounded. An interesting thing I noted was that the neutral group tended to have a smaller proportion of neutral friends than of liberal or conservative friends; the difference is almost ~20%.

So do neutral people have these proportions because they are neutral in leaning, or is their being neutral an effect of having equal proportions of conservative and liberal friends?

Another aspect that I think could easily have been studied in the paper but was overlooked: given hard news that is opinion-challenging, how likely are people to share it? This statistic would give deeper insight into how open each of the three groups is to change, how likely they are to pursue the truth of the matter, and whether they value truth over ego.

Perhaps this study could also look into different but specific topics, e.g. climate change, and examine similar statistics to determine whether “micro echo chambers” (topic-based echo chambers) are formed. This certainly seems like an interesting direction to me!
Reflection #4 – [1/30] – Jamal A. Khan

In this paper the questions posed by the authors are quite thought-provoking and are ones which, I believe, most of us in computer science would simply overlook, perhaps due to a lack of knowledge of psychology. The main question is what sort of corrective approach would lead people to let go of political misconceptions: an immediate correction? A correction provided after some delay? Or are no corrections the way to go? Given that this is a computational social science course, I found this paper, though interesting, a bit out of place. The methodologies, the questions, and the way the results and discussion are built up make it read like a psychology paper.

Nevertheless, coming back to the paper, I felt that two aspects were completely ignored during the analysis: gender and age.

  • How do men and women react to the same corrective stimuli?
  • Are younger people more open to differing opinions, and/or are they more objective at judging evidence countering their misperceptions?

It would’ve been great to see how the results extrapolate across different racial groups, though I guess that’s a bit unreasonable to expect given that ~87% of the sample population comprises one racial group. This highlighted snippet from the paper made me chuckle:

There’s nothing wrong with making claims, but one should have the numbers to back them up, which doesn’t seem to be the case here.

The second question that arose in my mind, which comes from personal experience, is whether the results of this study would hold true in, say, Pakistan or India. The reason I ask is that politics there is driven by different factors, such as religion, so people’s behavior and their tendency to stick to certain views regardless of evidence refuting them would be different.

The third point concerns the relationship between the aforementioned issues and the subject’s level of education:

  • Do more educated people have the ability to push aside their preconceived notions and points of view when presented with corrective evidence?
  • How is level of education correlated with the notion of whether a media/news outlet is biased, and with the ability to set that notion aside?

Before moving on to the results and discussion, I have a few concerns about how some of the data was collected from the participants. In particular:

  • They have a 1-7 scale for some questions, 1 being the worst case and 7 the best. How do people even place themselves on a scale that has no reference point? Given that there was no reference point, or at least none that the authors mention, any answer given by participants will be ambiguous and heavily biased by what they personally consider a 1 or a 7. Results drawn from such questions would be misleading at best.
  • The second concern has to do with the timing assigned to the reading. Why 1 minute, or even 2? Why was it not mandatory for participants to read the entire document/piece of text? What motivated this method and what merits does it have, if any?
  • MTurk was launched publicly on November 2, 2005, and the paper was published in 2013. Was it not possible to gather more data using remote participants?

Now, the results section managed to pull all sorts of triggers for me, so I’m not going to get into their details and will just pose three questions:

  • Graphs with unlabelled y-axes? I don’t doubt the authenticity or intentions of the authors, but this makes the results much less credible for me.
  • Supposing the y-axes are in the 0-1 range, why are all the thresholds at 0.5?
  • Why linear regression? Won’t that force some results to be artifacts of the fit rather than actual trends? Logistic regression or regression trees, I believe, would have been a better choice without sacrificing interpretability.
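To make the linear regression concern concrete, here is a minimal sketch on toy, made-up data (the 1-7 attitude scale and the binary accept/reject outcome are hypothetical, not taken from the paper): an ordinary least-squares line fit to a 0/1 outcome happily predicts “probabilities” outside [0, 1], whereas a logistic curve stays inside it by construction.

```python
import numpy as np

# Hypothetical data: attitude score (1-7) vs. whether the subject
# accepted a correction (1) or rejected it (0).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1, 1, 1, 0, 0, 0, 0], dtype=float)

# Ordinary least-squares line: its predictions are unbounded.
slope, intercept = np.polyfit(x, y, 1)
linear_pred = slope * x + intercept

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A hand-picked logistic curve for illustration (not a fitted model):
# predictions are squeezed into (0, 1) no matter the input.
logistic_pred = sigmoid(-2.0 * (x - 3.5))

print(linear_pred.min(), linear_pred.max())    # strays outside [0, 1]
print(logistic_pred.min(), logistic_pred.max())  # always inside (0, 1)
```

The point is not that the logistic curve above is the right model, only that a straight-line fit to a bounded outcome bakes its own artifacts into the trend.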

Now, the results drawn are quite interesting. One of the findings I didn’t expect was that real-time corrections don’t actually provoke heightened counterarguments; rather, the problem comes into play via biases stemming from prior attitudes when comparing the credibility of the claims. So the question arises: how do we correct people when they’re wrong about ideas they feel strongly about, and when the strength of their belief might dominate their ability to reason? In this regard, I like the authors’ first recommendation of presenting news from sources that users trust, which these days can easily be extracted from their browsing history or even from input by the users themselves. Given that extraction of such information from users is already commonplace (Google and Facebook use it to place ads), I think we needn’t worry about privacy. What we do need to worry about is, as the authors mention, its tendency to backfire and reinforce the misperceptions. So the question transforms into: how do we keep this customization tool from becoming a double-edged sword? One idea is that we could show users a scale of how left- or right-leaning a source is when presenting information, and tailor the list of sources to include more neutral ones, or tailor the ranking to make sources seem neutral, occasionally sprinkling in opposite-leaning sources as well to spice things up.

I would like to close by saying that we as computer scientists have the ability, far above many, to build tools that shape society, and it falls upon us to understand the populace, human behavior, and people’s psychology much more deeply than we already do. If we don’t, we run the danger of our tools producing results contrary to what they were designed for. As Uncle Ben put it, “With great power comes great responsibility”.
Reflection #3 -[01/25]- [Jamal A. Khan]

This paper studies anti-social behavior on three different platforms, which I believe is quite relevant given the ever-increasing consumption of social media. First off, in my opinion what the authors have studied is not anti-social behavior but rather negative, unpopular, and/or inflammatory behavior (which might not even be the case, as I’ll highlight a bit later). Nonetheless, the findings are interesting.

Referring to Table 1 in the paper (also shown above), I’m surprised to see so few posts deleted. I was expecting something in the vicinity of 9-10%, but that might just be me! Maybe I have a tendency to run into more trolls online 🙁 . What are other people’s experiences; do these numbers reflect the number of trolls you find online?

Now, a fundamental problem that I have with the paper is the use of moderators’ actions of “banning or not banning” as the ground truth. This approach fails to address a few things. First, what of the moderators’ biases? One moderator might consider certain comments on a certain topic acceptable while another might not, and this varies with how the person in question feels about the topic at hand. For example, I very rarely talk or care about politics, so most comments seem innocuous to me, even ones I see other people react very strongly to. That being the case, if I were a moderator who saw some politically charged comments, I would most probably ignore them.

Second, unpopular opinions expressed by people most certainly don’t count as anti-social behavior or troll remarks, or even as attempts to derail or inflame the discussion. For example, one such topic that could pop up on IGN is the strict gender binary enforced by most video game studios, which will, in my experience, get down-voted pretty quickly because people are resistant to such changes. This raises a few questions about what metric is used to deal with unpopular posts. Are down-votes used by the moderators as a basis for removing posts?

Third, varying use of English across demographics would throw off the language similarity between posts of FBUs and NBUs by a fair margin, and the authors don’t seem to have accounted for it. The paper relies quite heavily on this metric for many of its observations. So, if we were conducting a follow-up study, how would we take cultural differences in the use of English into account? Do we even have to, i.e. will demographically more diverse platforms automatically have a normalizing effect?
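To make the concern concrete, consider a rough sketch of the kind of similarity metric in question. The toy version below (my own approximation, not the paper’s exact method; all example posts are made up) scores a post by its cross-entropy under a smoothed unigram model of the community’s language. A writer whose dialect or register simply differs from the community’s would score as “dissimilar” even when their post is perfectly civil.

```python
from collections import Counter
import math

def unigram_cross_entropy(post, community_posts):
    """Bits per word of `post` under a unigram model of the community.

    Higher values mean the post's wording diverges more from how the
    community usually writes (a rough proxy for post-language similarity).
    """
    counts = Counter(w for p in community_posts for w in p.lower().split())
    total = sum(counts.values())
    vocab = len(counts)

    def prob(w):
        # Add-one smoothing so unseen words don't get zero probability.
        return (counts[w] + 1) / (total + vocab + 1)

    words = post.lower().split()
    return -sum(math.log2(prob(w)) for w in words) / len(words)

community = ["great discussion everyone",
             "thanks for the helpful answer",
             "great answer thanks"]

in_register = unigram_cross_entropy("thanks for the great answer", community)
off_register = unigram_cross_entropy("you are all clueless idiots", community)
print(in_register < off_register)  # off-register text costs more bits per word
```

Under such a metric, a polite post written in an unfamiliar dialect behaves like the second example: every out-of-community word inflates the score, which is exactly why demographic variation in English could confound the FBU/NBU comparison.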

Finally, the idea of detecting problematic people beforehand seems like a good idea at first but on second thought i think it might not be so! but that depends on how the tool is used. The reason why I say this is because, suppose we had an omnipotent classifier that could with 100% accuracy, what would we do once we have the predictions? Ban the users beforehand? wouldn’t that be a violation of the right to opinion and freedom of speech? wouldn’t the classifier just reflect what the people like to see and hear and end up tailoring their content to their point of views? and in a dystopian scenario wouldn’t it just lead to snowflake culture?

As a closing note, how would the results look if the study were repeated on Facebook pages? Would the results from this study generalize?