Reflection #13 – 04/10 – Pratik Anand

Paper : Do online reviews affect product sales? The role of reviewer characteristics and temporal effects

The paper provides an interesting take on the impact of online reviews on product sales.
The authors posit that apart from the overall review rating of a product, additional information like the reviewer's standing, review quality, and product coverage plays a big role in the sale of a product. They use the seminal work on TCE and Uncertainty Reduction Theory to support their hypothesis. The latter suggests that if a user doesn't have enough information about a product, they will try to reduce the uncertainty by looking at product reviews, popularity, the reputation of reviewers, etc. Which factors play a role in reducing uncertainty? Social features can also play a big role, especially on e-commerce sites.
If a friend has recommended a product, that recommendation may outweigh the impact of other factors like review score and quality.
Why didn't the authors treat review score, quality, and reviewer standing as part of a hierarchical process for reducing uncertainty? A typical user may first look for higher review scores to reduce the sample space. Then she may use quality, reviewer standing, and other factors, and repeat this process until she is satisfied with the final results.
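To make this hierarchical idea concrete, here is a minimal sketch in Python of such a staged filter. The product fields and thresholds are hypothetical, purely for illustration, and are not drawn from the paper's dataset.

```python
# Minimal sketch of a staged, uncertainty-reducing filter over products.
# The field names (rating, review_quality, reviewer_rank) and thresholds are
# hypothetical; the paper does not describe such a pipeline.

def hierarchical_filter(products, min_rating=4.0, min_quality=0.6, min_reviewer_rank=0.5):
    """Narrow the candidate set stage by stage: first by rating, then by
    review quality, then by reviewer standing."""
    stage1 = [p for p in products if p["rating"] >= min_rating]
    stage2 = [p for p in stage1 if p["review_quality"] >= min_quality]
    stage3 = [p for p in stage2 if p["reviewer_rank"] >= min_reviewer_rank]
    # Fall back to the last non-empty stage so the user always has options.
    for stage in (stage3, stage2, stage1, products):
        if stage:
            return stage
    return []

products = [
    {"name": "A", "rating": 4.5, "review_quality": 0.8, "reviewer_rank": 0.9},
    {"name": "B", "rating": 4.2, "review_quality": 0.4, "reviewer_rank": 0.7},
    {"name": "C", "rating": 3.1, "review_quality": 0.9, "reviewer_rank": 0.8},
]
print([p["name"] for p in hierarchical_filter(products)])  # ['A']
```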
The paper doesn't take into account the language of the review text for determining its quality. How is quality determined otherwise? Amazon has a review helpfulness score. Are the authors using that?
Even though it is intuitive that product coverage plays a big role, the rise of discovery services in online e-commerce as well as food review portals shows that users do not always go for the most featured products. Many people actively seek out new products, and thus coverage may have some negative impact in such cases too.
Lastly, the authors mention that the impact of negative reviews on product sales diminishes with time. What could be the reasoning behind this? Is it tied to the type of recommendation algorithm used by the e-commerce portal? If such an algorithm takes into account all the reviews of an item since the beginning, would the argument still hold?

It is a good paper which establishes various hypotheses about online product reviews and tests them against a given dataset. A future direction of research may be to try it in different markets and to treat the language of the review as a factor in its quality.


Reflection #12 – 04/05 – Pratik Anand

1) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

The paper aims to use emojis to detect a diverse set of emotions in text. The authors acknowledge the previous work in this field that used positive and negative emojis to find the emotion of a text. They extend the previous work with a more diverse set of classifications. A good example is the emotional distinction between "this is shit" and "this is the shit". The former has a negative connotation, whereas the latter has a strongly positive connotation. Their model, DeepMoji, is a variant of the LSTM model. The authors used a Twitter dataset with emojis to train their model. For tweets with multiple emojis, the authors add the same tweet to the dataset separately for each emoji. Sometimes, a group of emojis collectively conveys a message. Won't such messaging be lost if the emojis are analyzed separately rather than as a group? Admittedly, it is hard to distinguish such cases from cases where the emojis have no relation to each other. There is an attempt later in the paper to cluster emojis into different groups. I believe that work can be extended to address the point mentioned above. Can a similar approach be applied to memes?
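As a rough sketch of how a tweet with several emojis might be turned into separate single-label training examples (my reading of the preprocessing step; the emoji range and the example tweet below are illustrative, not the paper's actual data):

```python
import re

# Expand a multi-emoji tweet into one (text, emoji) pair per distinct emoji.
# The regex covers only a small emoji range, for illustration.
EMOJI_PATTERN = re.compile("[\U0001F600-\U0001F64F]")

def expand_tweet(tweet):
    emojis = set(EMOJI_PATTERN.findall(tweet))
    text = EMOJI_PATTERN.sub("", tweet).strip()
    return [(text, e) for e in sorted(emojis)]

print(expand_tweet("this is the shit 😂😍"))
# [('this is the shit', '😂'), ('this is the shit', '😍')]
```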

Overall, it is a unique technical paper which even has a live demo available at deepmoji.mit.edu that one can play around with. It is good to see work on emojis, as they are the future of human communication.

2) Using linguistic and topic analysis to classify sub-groups of online depression communities

The paper uses linguistic features to identify sub-communities within online depression communities. The paper, despite being fairly recent, uses a LiveJournal dataset for its studies. The authors could have gotten much better results with datasets from more popular communities like Reddit or Facebook groups; LiveJournal, as a blogging website, has been in decline since the early 2010s. Nevertheless, if the dataset is representative, it is good enough. The authors identified various communities for depressed people, which they grouped into categories like Depression, Self-Harm, Grief/Bereavement, Bipolar Disorder, and Suicide. What is the reasoning behind such a categorisation?
The authors use linguistic features as well as topic modelling to extract feature sets. The word clouds from the topics provided some interesting keywords. An interesting fact is that no unique identifiers were found for the Depression community except for filler words. What could be the reason for such behavior? Could it be changed by a different classification methodology? What should be the ground truth in cases related to mental health?
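For reference, a minimal sketch (not the authors' exact pipeline) of turning community posts into topic-distribution features with LDA could look like this; the toy posts below are invented, and a real study would use the full LiveJournal corpus and many more topics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy posts standing in for community text.
posts = [
    "feeling low and tired all the time",
    "lost someone close and cannot stop grieving",
    "mood swings from very high to very low this week",
    "thinking about hurting myself again",
]

# Bag-of-words counts, then a small LDA model; each post's topic
# distribution becomes its feature vector.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)
print(topic_features.shape)  # (4, 2): one topic-probability vector per post
```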
The authors posit that the latent features are representative of the sub-groups and can be used to identify them. These features could also serve as a starting point to be correlated with data from other social networks to create a mental health profile of an individual. Different linguistic profiles can help in understanding such communities better.


Reflection #11 – 03/27 – Pratik Anand

Paper 1 : Reverse-engineering censorship in China: Randomized experimentation and participant observation

Paper 2 : Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions

Both papers play the two parts of a larger process: observation and counter-action.
The first paper deals with researchers trying to understand how Chinese censorship works. In the absence of any official documentation, they use experimentally controlled anecdotal evidence to build a hypothesis of how "The Great Firewall of China" functions. They take some well-known topics with a history of censorship and post discussions on various Chinese forums. They make some interesting observations, for example that corruption cases and criticism as well as praise of the government are more heavily censored than sensitive topics like Tibet and border disputes. This could reflect the government's censorship priorities and could help in bypassing censorship for the majority of sensitive cases.
Unfamiliarity with the language makes it difficult to understand the nuances of the language used in surviving posts versus banned posts. Though it mostly seems that the censorship depends primarily on keyword matching, with other techniques as subsidiaries, might global advances in NLP research be more harmful than useful in this case? With the help of advanced NLP, censorship tools could go beyond mere keywords and infer context from the statements.
This brings us to the second paper, on bypassing censorship. The authors make use of homophone substitutions to fool automatic censorship tools (a toy sketch of the idea is given below). Language is again a barrier here in fully grasping the effects of homophone substitutions. However, it can be inferred that only a limited number of substitutions are possible for every word to be replaced. This creates a problem if those substitutions become popular: the censorship tools can easily ban them. A recent example is the removal of the two-term restriction on the presidency in China. People started criticising it using the mathematical convention of 1, 2, ..., N to represent an arbitrarily large number, with messages like "Congratulating Xi Jinping on getting selected as President for the Nth time". The censorship tools not only recognised the context of this very subtle joke but also blocked the letter N for some time. This shows that no matter how robust and covert a scheme is, if it gains enough traction, it will come into focus and get banned. There is a need to find methods that cannot be countered even after they have been exposed.
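Here is a toy sketch of the nondeterministic substitution idea. The substitution table is entirely made up as a placeholder; the actual system generates Chinese homophones, which I cannot reproduce faithfully here.

```python
import random

# Toy, made-up substitution table: each sensitive term maps to several
# homophone-like stand-ins. Only the nondeterministic choice is the point.
SUBSTITUTIONS = {
    "censor": ["sensor", "censer"],
    "protest": ["pro test", "proh-test"],
}

def obfuscate(message, table=SUBSTITUTIONS, rng=random):
    out = []
    for word in message.split():
        key = word.lower()
        # Pick a substitute at random so repeated messages differ from each other.
        out.append(rng.choice(table[key]) if key in table else word)
    return " ".join(out)

print(obfuscate("join the protest before the censor notices"))
```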


Reflection #10 – 03/15 – Pratik Anand

Paper 1 : An Army of Me: Sockpuppets in Online Discussion Communities

Paper 2 : Uncovering Social Spammers: Social Honeypots + Machine Learning

The first paper is of special interest to me. It deals with sockpuppets and fake accounts on social media and forums. Being an active user, I see a lot of this in action. The paper identifies sockpuppet accounts as those maintained by a single person, referred to as the puppetmaster, who uses them to promote or denounce a certain viewpoint, or to sow general dissent, without suffering the consequences of account bans by moderators.
The authors acknowledge that it is very hard to get ground truth data for this problem, so they use observational studies to gain insight into sockpuppet behavior. Could the techniques used to establish ground truth for spam messages be applied to sockpuppets? In my opinion, a list of banned users in a social forum is a good way to get started.
Some of the traits observed are:

1. Sockpuppet accounts are created quite early on by their users.
2. They usually post at the same time and on the same topics.
3. They rarely start a discussion but usually participate in existing discussions.
4. The discussion topics are always very controversial.
5. They write very similar content.

The authors also observed that sockpuppets are treated harshly by the community. Is this due to their behavior, or just a side effect of the fact that they only participate in posts about controversial topics? Not all sockpuppets are malicious. The question of pretenders vs. non-pretenders was very intriguing. Some people keep a sockpuppet for entirely comical or other purposes, and I don't believe the authors' method of classifying them based on username is effective enough. Many non-pretenders may keep multiple sockpuppet accounts based around a joke, and these would fail to be classified as non-pretending accounts by the authors' method. A rough sketch of such a username check is given below.
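The following is my own crude simplification of a username-similarity check, with an arbitrary threshold; it is not the authors' exact measure.

```python
from difflib import SequenceMatcher

# Crude sketch: treat a pair of accounts as openly related ("non-pretending")
# if their usernames are very similar. The 0.6 threshold is arbitrary and the
# similarity metric is a simplification, not the paper's exact measure.
def username_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_non_pretending(a, b, threshold=0.6):
    return username_similarity(a, b) >= threshold

print(looks_non_pretending("dark_knight_99", "dark_knight_alt"))   # similar names
print(looks_non_pretending("dark_knight_99", "captain_obvious"))   # unrelated names
```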

The authors provide a case where two sockpuppets, run by the same puppetmaster, argue against each other. They explain this behavior as a means of increasing traffic to the given post. I am not sure that is the reason. They do not provide a way to verify whether those sockpuppets are indeed handled by the same person. There is also the possibility of a group of people maintaining a set of sockpuppet accounts. This would make their patterns ever-changing and would also provide an alternative explanation for the argument raised above.

The second paper deals with creating honeypots to learn about the traits of spam accounts and using those traits to build a spam classifier. The authors do a good job of explaining how social spam differs from email spam: it carries a touch of personalized messaging, which is a more effective strategy for luring users. Though the paper doesn't go into the details of how the honeypots were set up, the authors share their observations from analysing the spammers who fell into them. The honeypots were created on MySpace and Twitter, and spammer behavior varies between the two. The authors note that MySpace is more of a long-form social communication platform. Thus, they identify the "About Me" section as the most important part of a spammer profile for use in classification. They assume that it won't change radically, since it is like a sales pitch, and thus that spam classifiers will be able to detect it. I believe this is a limitation of the technique: "About Me" can be changed as easily as any other section. It is indeed important, but replacing it is just replacing one sales pitch with another, so the justification doesn't hold up.
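A minimal sketch of the kind of text classifier that could be trained on harvested "About Me" sections is shown below; the snippets and the generic TF-IDF plus Naive Bayes pipeline are my own illustration, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy "About Me" snippets; in the paper, the spam examples would come from
# profiles caught by the social honeypots.
about_me = [
    "click my link for free ringtones and hot deals",       # spam
    "work from home and earn cash fast, add me now",        # spam
    "grad student who loves hiking and bad sci-fi movies",  # legitimate
    "dad of two, posting photos of my garden projects",     # legitimate
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(about_me, labels)
print(clf.predict(["free deals, click my link and add me"]))  # likely [1]
```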

The paper details that the authors created MySpace profiles with geographic locations covering every state in the USA. What was the reasoning behind this? Do different geographic locations provide the level of genuineness that these honeypot profiles require?
Lastly, could a reverse technique be used by spammers to identify honeypot profiles and take safeguards against them?


Reflection #9 – 02-22 – Pratik Anand

Paper 1 : Predicting Depression via Social Media

Paper 2 : Understanding Anti-Vaccination Attitudes in Social Media

 

The two papers are more distinct from each other than those of previous reflections, even though they deal with issues within the same sphere.

Paper 1 takes up a very important topic that is relevant to everyone: predicting traits of depression from social media posts.
The authors observe depression through a user's tweets. A user who is not very active or expressive on Twitter cannot be of much help in predicting depression. On the other hand, active Twitter users show many signs of slipping into depression as well as of using anti-depressants. The authors observed that the traits included negative language in tweets and fewer interactions through replies and DMs. Are these results generalizable to other social platforms? Can a YouTuber's depression be predicted from the facial expressions and language in his/her videos? Extending this to YouTube would require taking more parameters into consideration.
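As a rough illustration of the kind of per-user signals discussed here (negative language, reduced interaction), a sketch with made-up tweets and a tiny negative-word list follows; a real system would use validated lexicons and many more signals.

```python
# Rough sketch of per-user features in the spirit of the paper. The word list
# and example tweets are invented; a real system would use validated lexicons
# (e.g. LIWC categories) and many more behavioral signals.
NEGATIVE_WORDS = {"sad", "tired", "alone", "hopeless", "worthless"}

def user_features(tweets):
    tokens = [w.strip(".,!?").lower() for t in tweets for w in t.split()]
    negative_ratio = sum(w in NEGATIVE_WORDS for w in tokens) / max(len(tokens), 1)
    reply_ratio = sum(t.startswith("@") for t in tweets) / max(len(tweets), 1)
    return {"negative_ratio": negative_ratio, "reply_ratio": reply_ratio}

tweets = [
    "feeling so tired and alone today",
    "@friend thanks for checking in",
    "everything feels hopeless lately",
]
print(user_features(tweets))
```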

Paper 2 analyses the behavior of anti-vaccine users. It observes how the earlier notion of herd immunity through vaccination is being undermined by online information sharing. The authors focus on three groups: pro-vaccine, anti-vaccine, and those who recently switched to being anti-vaccine, and what triggers that switch. One interesting note is that the paper does not consider people who switch to being pro-vaccine and their triggers. I believe this would shed some light on what makes someone realise they were part of a conspiracist group, which could be used to design methods that reverse the effects of brainwashing by such anti-vaccine groups.

The paper uses the MEM topic model to categorise user tweets into the themes that anti-vaccine users tweet and care about. It does not take into account the virality of news topics. For example, during the Syrian revolution, a lot of people were tweeting about government, war, violence, etc. The authors do not mention whether they took care to minimize the effect of viral news on the tweet topics.
The anti-vaccine group shows close-knit group characteristics. This is equally true in real life: people generally stay with others who hold similar, or at least compatible, viewpoints. This leads to the paper's conclusion that small triggers are enough for a person to join the anti-vaccine group. In my opinion, those people have had long exposure to such thinking outside of Twitter and only become vocal once they join a certain group.

 


Reflection #8 – 2/20 – Pratik Anand

Paper 1 : A 61-million-person experiment in social influence and political mobilization

Paper 2 : Experimental evidence of massive-scale emotional contagion through social networks

The two papers discuss two aspects of the same phenomenon, one specific to politics and the other general: the influence of online social ties on user behavior.

The first paper shows that social messages can cause a small-scale effect on political mobilization. They are more effective than purely informational messages.
The paper describes an experiment on influencing people to vote on Election Day. The informational message pops up in a user's feed, stating that it is Election Day and asking whether they have voted. It also shows a link for finding the nearest polling booth, as well as the number of people who self-reported voting in that election.
Another variation of this message was a social message in which the user is shown photos of friends with whom the user has "close ties" and who have self-reported voting in that election.
The study finds that the social message has a much higher impact on political mobilization: people are more likely to click on the link to find the nearest polling booth and to self-report having voted. The authors also mention that this effect is only visible at a macroscopic level, and only when pictures of close friends are shown in the message.

The paper raises many questions. First of all, there is no way to verify whether a self-reporting user has actually voted. Also, there are plenty of external factors, other than these Facebook messages, that can make a user go and vote, so it cannot be determined whether the messages were the deciding factor behind a user's decision to vote.

The second paper takes a more general approach. It tries to identify whether people are positively or negatively influenced by the posts of their online friends. It finds that people post more positive content if they are shown more positive posts from their friends, and the same holds for negative posts. I liked its hypothesis a lot because it opens the door to new kinds of questions, beyond the general questions of generalizability, diversity, and validity. Assuming the hypothesis is verified by other experiments and identified as genuine human behavior online, the new question is: what causes such behavior? Are the people who start posting more positive things under the influence of their friends really happier, or are they just posting happy posts to stay relevant in their social circle?
Another interesting question concerns negative posts. If depressing posts make other people depressed and suicidal, will social platforms like Facebook enforce some kind of negativity censorship? Would that be in alignment with freedom of speech and expression? These are very complex questions with no correct answers.


Reflection #7 – [02/13] – Pratik Anand

This paper is about decoding real-world interactions between people using thought experiments and gaming scenarios. A game of Diplomacy, with its interactions, provides a perfect opportunity to learn about changes in player communication in relation to an oncoming betrayal.
Can betrayal really be predicted automatically from tonal changes in communication? Is this generalizable to other real-world scenarios, or even to artificial scenarios like other games?
The paper shows that the game supports long-term alliances but also offers lucrative solo victories, which leads to betrayals. Alliances get broken over time, which is intuitive. The paper provides defined structures for friendship, betrayal, and the parties involved (victim and betrayer), derived from the communications as well as the game commands. The structure of the discourse provides enough linguistic clues to determine whether a friendship will last for a considerable period of time. The authors also develop a logistic regression model for predicting betrayal (a sketch of this kind of model is given below). A few questions arise: are the linguistic cues general enough, given that people talk differently even within a strictly English-speaking nation? A similar question applies to betrayal: the betrayer is usually more polite before the act, but this could be specific to the context of this game and may or may not apply elsewhere, even in other games.
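To make the modelling step concrete, here is a minimal logistic-regression sketch over the kind of cues the paper discusses; the feature values and labels below are invented for illustration, not derived from the Diplomacy data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented feature vectors, one per relationship:
# [betrayer politeness, victim planning-word rate, positive-sentiment rate]
X = np.array([
    [0.90, 0.08, 0.70],   # relationships that ended in betrayal
    [0.88, 0.09, 0.65],
    [0.92, 0.07, 0.72],
    [0.70, 0.03, 0.55],   # relationships that lasted
    [0.65, 0.02, 0.50],
    [0.72, 0.04, 0.58],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = ends in betrayal

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.91, 0.08, 0.69]])[0, 1])  # estimated betrayal probability
```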
The paper makes a point about sudden yet inevitable betrayal, where more markers are provided by the victim than by the betrayer: the victim uses more planning words and is less polite than usual. In the context of this game, long-term planning is a measure of trust, so can this be generalized to the conclusion that more trust results in inevitable betrayal? That seems far-fetched, even with plenty of anecdotal evidence.

Lastly, I believe the premise is highly unrealistic and not at all comparable to real-world scenarios. The proposition that betrayal can be predicted is highly doubtful and cannot be relied upon for real-world communications. Moreover, since the system is based on linguistic evaluation, it can be gamed, which makes the prediction pointless.


Reflection #6 – [2/8] – Pratik Anand

Paper 1 : A computational approach to politeness with application to social factors

Paper 2 : Language from police body camera footage shows racial disparities in officer respect

Both papers provide interesting ideas related to respect and politeness in communication. While the second paper deals with the very hot topic of unfair treatment of minority races by the police in the real world, the first paper takes a more academic approach towards behavior in the online space.
The authors of paper 1 develop a framework to measure politeness in online conversation, which is a very valuable resource. It contains words with their politeness scores. They painstakingly comb through interactions between Wikipedia editors and annotate them as polite or impolite. I really liked the fact that they acknowledged that this filtering might be biased, since politeness is subjective; they took the extra measure of choosing only those words that were marked unanimously by all the annotators. Once completed, they applied the framework to StackExchange conversations and found that people who were polite earlier became less polite once they gained a certain status or power after admin elections. It is a fascinating result. Similarly, people who lost those elections became more polite. What could be causing this? Is this the general "with power comes arrogance" argument, or something else? Did those people start getting more involved and hence get tired of being careful with words? The paper also says nothing about people who received popular votes but did not stand in admin elections. Does their behavior change too? The paper also mentions that people from certain parts of the US are more polite than others. This raises an interesting point about culture. Some people are culturally polite in conversation by US and Western standards; others are not fluent enough in English and may be more direct, and hence appear impolite. Sarcasm also plays a big role in conversations: "Yeah, right" is a less polite phrase made up of more polite words. The paper does not discuss such limitations of its study.
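A toy version of lexicon-based politeness scoring in the spirit of that framework is sketched below; the word weights are invented, and the real classifier uses much richer cues (gratitude, hedges, indirect requests, position within the sentence, and so on).

```python
# Toy politeness scorer: sum small word-level weights over a request.
# The weights are invented for illustration and are not the paper's values.
POLITENESS_WEIGHTS = {
    "please": 0.8, "thanks": 0.7, "could": 0.4, "would": 0.4,
    "sorry": 0.5, "must": -0.4, "wrong": -0.5, "why": -0.3,
}

def politeness_score(sentence):
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return sum(POLITENESS_WEIGHTS.get(w, 0.0) for w in words)

print(politeness_score("Could you please revert this edit? Thanks!"))
print(politeness_score("Why is this wrong edit still here? You must fix it."))
```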

The second paper is short but provides a crucial insight into how race-based biases can influence a conversation. It analyses transcripts from the video feeds of police officers who stopped someone and uses them to determine how polite the officers were during the conversation. The test subjects are not told the race or sex of the person stopped or of the police officer, yet the conclusion shows that non-Caucasians are spoken to less politely by the police, irrespective of the officer's own race. The paper measures this in terms of Respect and Formality. Formality is maintained for white as well as black people, but respect is drastically lower for black people. An interesting observation from Fig. 5 is that formality goes down as the interaction goes on, while respect has a minimum in the middle but increases towards both ends of the interaction. This trend can be explained by the fact that greetings are given at the start and the end of a conversation. A further direction for this work could be the facial expressions and body gestures of the police officer as well as of the person stopped.

Overall, both papers bring attention to changes in politeness, whether due to power and influence or to racial difference, which opens up new fields of study.


Reflection #4 – [1/30] – Pratik Anand

Paper 1 : The Promise and Peril of Real-time Corrections to Political Misconceptions
Paper 2 : A Parsimonious Language Model of Social Media Credibility Across Disparate Events

Both papers deal with user opinions of information accessed on the Internet, especially news articles. The first paper shows that individuals exposed to real-time corrections to the news articles they are reading are less likely to be influenced by the corrections if they go against their belief system. On the other hand, corrections are effective if they go along with their beliefs. The paper explains in detail that the more emotionally invested a user is in the topic of an article, the less likely he/she is to be swayed by the correction, whether it is delivered in real time or after a delay. Users get more defensive in such cases, especially if the counterargument is provided in real time. The acceptance rate of counterarguments is a little better if they are presented to the user after a delay. I can understand that a delay gives the reader time to introspect about his/her beliefs on the topic and may raise curiosity about counterarguments. However, will all kinds of delays have that effect? Human attention spans are short, and it may happen that when the correction is finally shown, the user no longer remembers the original article but only a summarized judgment of it, so the correction may not have a strong effect. I liked the part of the paper discussing how users keep only a summary judgment of attitudes and beliefs and discard the evidence. When presented with counterarguments, they decide whether the new idea is persuasive enough to update old beliefs, paying little attention to the facts provided by the article or its source. I think readers will trust a counterargument to deeply held beliefs more if it comes from sources they already trust. A similar observation is made in the paper, which adds that the framing of the counterargument is also very important: a counterargument that is respectful of the reader's beliefs will have a greater impact.

The second paper introduces the language of an article as a factor in the credibility of news: certain words reduce credibility while others increase it. Such research is quite important; however, it does not take into account external factors like the reader's personal beliefs and attitudes, his/her political and social context, etc. Even though the paper makes the case that certain assertive words increase credibility, this cannot be generalized. For some people, assertive opinions that go against their beliefs might appear less credible than opinions that show ambiguity and reason about both sides of the argument. A good direction for research following the second paper would be the inclusion of diversity factors in the credibility model; factors like gender, race, economic and social status, age, etc. could vary the results.

An example of a different result comes from research on a similar topic done on the Reddit community changemyview, where people ask to be presented with views that challenge their deeply held beliefs. That paper concluded that in most cases the language and tone of the counterargument played a bigger role than the quantity or quality of the facts. A counterargument that causes the reader to introspect and arrive at the same conclusion on their own is highly influential in changing the reader's view. It also makes the point that only people who are ready to listen to counterarguments participate in that community; hence, people who decide to stay in their echo chamber are not evaluated in the study.


Reflection #3 – [1/25] – [Pratik Anand]

The paper deals with a very relevant topic for social media: antisocial behavior, including trolling and cyberbullying.
The authors aim to understand the patterns of trolls through their online posts, the effects of the community on them, and whether they can be predicted. It is understandable that the anonymity of the internet can cause regular, normal users to act differently online. A popular cartoon caption says, "On the Internet, nobody knows you're a dog." Anonymity is a two-way street: you can act any way you want, but so can someone else.

The community's response to trolling behavior is also interesting, as it shows that strict censorship results in more drastic bad behavior. Hence, some communities use shadowbans, where users do not get to know they have been banned; their posts are visible only to themselves and not to others. Are those kinds of bans included among the FBUs? The bias of community moderators should also be brought into question: some moderators are sensitive about certain topics and ban users for even a small offense. Thus, moderator behavior can also inflate post deletions and bans, which makes the use of post deletion as ground truth questionable. One funny observation is that IGN has more deleted posts than reported posts. What could be the reason?
The paper does not cover all the ground related to trolling and abuse. A large amount of banning happens when trolling users abuse others over private messages, which the paper does not seem to take into account. The paper also does not include temporarily banned users; I believe including them would provide crucial insight into corrective behavior by some users and their self-control. I do not think deleted/reported posts should be the metric for measuring anti-social behavior. Some people post on controversial topics or go off-topic and their posts get reported; this does not constitute anti-social behavior, yet it would be counted as such by a metric based on deleted posts. The bias of moderators has already been mentioned above. Cultural differences play a role too: in my experience, many times a legitimate post has been branded as troll behavior because the user was not very comfortable with English or with American sentence structure. For example, the phrase "having a doubt" in Indian English communicates something different than it does in American English. A better solution would be an analysis of discussions and debates on a community forum and of how users react to them.
Given the issues discussed above, the prospect of predicting anti-social behavior from only 10 posts is problematic. Users can be banned based on such decisions. In communities like Steam (a gaming marketplace), getting banned means losing access to one's account and purchased video games, so banning users has real consequences. Banning users on the basis of 10 posts could be over-punishment; a single bad day could make someone lose their online account.

In conclusion, the paper is a good step towards understanding trolling behavior, but such a multi-faceted problem cannot be captured by simple metrics. It requires social context and a more sophisticated approach to identify such behavior. The application of such identification also requires some thought so that it is fair and not heavy-handed.
