Reflection #7 – [02/13] – Pratik Anand

This paper is about decoding real-world interpersonal interaction through thought experiments and gaming scenarios. The game of Diplomacy, with its negotiation-heavy interactions, provides a perfect opportunity to learn how player communication changes in relation to an oncoming betrayal.
Can betrayal really be predicted automatically from tonal changes in communication? Is the result generalizable to other real-world scenarios, or even to artificial scenarios like other games?
The paper shows that while the game supports long-term alliances, it also offers lucrative solo victories, which leads to betrayals. Alliances break down over time, which is intuitive. The paper provides clear structures for defining friendship, betrayal, and the parties involved (victim and betrayer) from the communications as well as the game commands. The structure of the discourse provides enough linguistic clues to determine whether a friendship will last for a considerable period of time. The authors also develop a logistic regression model for predicting betrayal. A few questions arise: are the linguistic cues general enough, given that people talk differently even within a strictly English-speaking nation? A similar question applies to betrayal itself: betrayers are usually more polite before the act, but that could be an artifact of this game and may or may not apply elsewhere, even in other games.
The paper makes a point about sudden yet inevitable betrayal, where more markers are provided by the victim than by the betrayer: the victim uses more planning words and is less polite than usual. In the context of this game, long-term planning is a measure of trust, so can this be generalized to the conclusion that more trust leads to inevitable betrayal? That could be far-fetched, even with plenty of anecdotal evidence.
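To make the idea concrete, here is a minimal sketch (not the authors' actual pipeline) of a logistic regression over per-relationship linguistic features; the three feature names and the randomly generated data are placeholder assumptions for illustration only.

```python
# Hedged sketch of a betrayal predictor: logistic regression over
# aggregate linguistic features of a dyad's messages. Feature names
# and data are hypothetical placeholders, not the paper's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# One row per friendship: hypothetical per-season averages of
# [positive_sentiment, politeness, planning_word_rate].
X = rng.normal(size=(200, 3))
y = rng.integers(0, 2, size=200)  # 1 = friendship ended in betrayal

model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validated accuracy: %.2f" % scores.mean())  # ~chance on random data
```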

Lastly, I believe the premise is highly unrealistic and not comparable to real-world scenarios. The proposition that betrayal can be predicted is doubtful and cannot be relied upon for real-world communications. Moreover, since the system is based on linguistic evaluation, it can be gamed, which would render its predictions useless.


Reflection #7 – [02/13] – [John Wenskovitch]

This paper describes a study that uses the online game Diplomacy to learn whether betrayal can be detected in advance through the choice of wording used in interactions between the betrayer and the victim.  After explaining the game and its interactions, the authors describe their methodology and begin listing their findings.  Of most interest to me were the findings that the betrayer was more likely to express positive sentiment before the betrayal, that an imbalance in the number of exchanged messages also plays a role, that future betrayers don’t plan as far ahead as their victims (based on linguistic analysis of in-game future plans), and that it was possible to computationally predict a betrayal in advance more accurately than humans could.

Much like some of the earlier papers in this course, I appreciated that the authors included descriptions of the tools that they used, including the Stanford Sentiment Analyzer and the Stanford Politeness classifier.  I don’t anticipate using either of those in our course project, but it is still nice to know that they exist for potential future projects.

The authors don’t argue that their findings are fully generalizable, but they do make a claim that their framework can be extended to a broad range of social interaction.  I didn’t find that claim well substantiated.  In Diplomacy, a betrayal is a single obvious action in which a pair of allies is suddenly no longer allied.  However, betrayals in many human relationships are often more nuanced than a single action, and often take place over longer timescales.  I’m not certain how well this framework will apply to such circumstances when much more than a few lines of text precede the betrayal.

I appreciated the note in the conclusion that the problem of identifying a betrayal is not a task that the authors expect to be solvable with high accuracy, as that would necessitate the existence of a “recipe” for avoiding betrayal in relationships.  I hadn’t thought about it that way when reading through their results, but it makes sense.  I wonder how fully that logic could be extended to other problems in the computational social science realm – how many problems are computationally unsolvable simply because solving them would violate some common aspect of human behavior?


Reflection #7 – [02/12] – Meghendra Singh

Niculae, Vlad, et al. “Linguistic harbingers of betrayal: A case study on an online strategy game.” arXiv preprint arXiv:1506.04744 (2015).

The paper discusses a very interesting research question: that of friendships, alliances, and betrayals. The key idea is that, between a pair of allies, conversational attributes like positive sentiment, politeness, and focus on future planning can foretell the fate of the alliance (i.e., whether one of the allies will betray the other). Niculae et al. analyze 145K messages between players from 249 online games of “Diplomacy” (a war-themed strategy game) and train two classifiers: one to distinguish betrayals from lasting friendships, and one to distinguish the seasons preceding the last friendly interaction from earlier seasons.

Niculae et al. do a good job of defining the problem in the context of Diplomacy, specifically the “in-game” aspects of movement, support, diplomacy, orders, battles, and acts of friendship and hostility. I feel that, unlike the real world, a game environment leads to a very clear and unambiguous definition of betrayal and alliance. While this makes it easier to apply computational tools like machine learning for making predictions in such environments, the developed approach might not be readily applicable to real-world scenarios. While talking about relationship stability in “Diplomacy,” the authors point out that the probability of a friendship dissolving into enmity is about five times greater than that of hostile players becoming friends. I feel this statistic is very much context dependent and might not transfer to similar real-world scenarios. Additionally, there seems to be an implicit “in-game” incentive for deception and betrayal (“solo victories” being more prestigious than “team victories”). The technique described in the paper uses only linguistic cues within dyads to predict betrayal; however, there might be many other factors leading to a betrayal. Although difficult, it might be interesting to see whether the deceiving player is actually being influenced by another player outside the dyad (maybe by observing the betrayer’s communication with other players?). There might also be other reasons to betray “in-game,” for example, one of the allies becoming too powerful (the fear of a powerful ally taking over a weak ally’s territory might make the weak ally betray first). The point being, player communication alone might not be a sufficient signal for detecting betrayal, more so in the real world.

Also, there are many other aspects of communication in the physical world, like body language, facial expressions, gestures, eye contact, and tone of voice. These verbal and non-verbal cues are seldom captured in computer-mediated textual communication, although they might play a big role in decision making and in acts of friendship as well as betrayal. I feel it would be really interesting if the study could be repeated for a cooperative game that supports audio/video communication between players instead of only text. Also, I believe the “clock” of the game, i.e., the time available to finish one season and make decisions, is very different from the real world. The game might afford the players a lot of time to deliberate and choose their actions; in the real world, one may not have this privilege.

Additionally, the accuracy of the logistic-regression-based classifier discussed in Section 4.3 is only 57% (5% higher than chance), and I feel this might be because of under-fitting; hence it might be interesting to explore other machine learning techniques for classifying betrayals using linguistic features. While the study tries to address a very important and appealing research question, I feel it is quite difficult to predict lasting friendships, eventual separations, and unforeseen betrayals (even in a controlled virtual game), principally because of inherent human irrationality and strokes of serendipity.
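To make that suggestion concrete, here is a hedged sketch of comparing a few alternative classifiers under cross-validation; the features and labels are random placeholders, so the printed scores only illustrate the setup, not real performance.

```python
# Hypothetical comparison of alternative classifiers on placeholder
# betrayal features; the paper's 57% would be the logistic-regression
# baseline in this kind of setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))      # placeholder linguistic features
y = rng.integers(0, 2, size=500)   # placeholder betrayal labels

for name, clf in [("logistic regression", LogisticRegression()),
                  ("random forest", RandomForestClassifier(n_estimators=200)),
                  ("gradient boosting", GradientBoostingClassifier())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```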


Reflection 6

Summary 1:

The authors try to identify politeness by analyzing text, with the goal of understanding how politeness in language can affect various social factors as well as people. They analyze politeness in requests and comments along several dimensions using Wikipedia and Stack Exchange data. They also build an automated classifier to assign politeness scores to text. Their observations reveal variation in politeness scores based on gender, geography, and status. The politeness data collected would be insightful for studying various social and political factors.

Reflection 1:

The paper first presents a linguistic analysis of how politeness varies across different words used in different sentences; the same word can also have different politeness scores depending on the sentence it is used in. The authors collected data from Wikipedia and Stack Exchange and used human annotators to label the data with politeness scores. These data were used to build two SVM classifiers: one using a unigram (bag-of-words) feature representation, and another adding linguistically informed politeness features on top of the unigrams. Although the classifiers have a high accuracy rate in predicting whether a request is ‘polite’ or ‘impolite’, my question is whether this will work for domains other than Wikipedia and Stack Exchange. A word might have a high politeness score on these sites but be used satirically or for other negative expressions on social networks. Also, the unigram features used here are very weak; they could instead bring in context words or n-gram features to analyze and build the classifier.
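As a rough sketch of the n-gram suggestion, assuming a scikit-learn style pipeline; the four-request corpus and its labels are invented for illustration, not the paper's data.

```python
# Sketch: unigram vs. unigram+bigram SVM politeness classifiers, in the
# spirit of the paper's bag-of-words baseline. Toy corpus and labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

requests = ["Could you please take a look when you get a chance?",
            "Fix this now.",
            "Would you mind clarifying this edit?",
            "This is wrong, revert it."]
labels = [1, 0, 1, 0]  # 1 = polite, 0 = impolite (toy annotations)

unigram = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), LinearSVC())
bigram = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

for name, clf in [("unigram", unigram), ("unigram+bigram", bigram)]:
    clf.fit(requests, labels)
    print(name, clf.predict(["Please revert this when you can."]))
```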

Summary 2:

In this paper the authors study racial disparities shown by police officers by analyzing their linguistic interactions with community members. The study was conducted in three steps: first, measuring how officer behavior is perceived from their language; second, identifying which words and phrases correlate with respect; and finally, testing for the presence of racial disparity in officer language across the full dataset. The data for this study were collected by transcribing video footage of officers at vehicle stops into text. The experiment finds that officers are likely to treat white community members with more respect than black community members and people of other races.

Reflection 2:

The idea of the paper was to characterize the behavior of each individual police officer from the ratings participants gave to their utterances, and then to compute features of each utterance. Figure 2 shows the usage of each linguistic feature for white and black community members, along with the respect coefficient of each feature. The authors claim that their analysis correlates with the human ratings obtained from the participants. However, the study does not account for officer mood, workload, and other factors which might influence an officer’s behavior.


Reflection #6 – [02-08] – [Nuo Ma]

Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.

This paper focuses on linguistic aspects of politeness on social platforms like Wikipedia and Stack Exchange. The authors conduct a linguistic analysis of politeness using two classifiers: a bag-of-words classifier and a linguistically informed classifier. They report algorithm results close to human performance. Their analysis of the relationship between politeness and social power also shows a negative correlation between the two on both Stack Exchange and Wikipedia.

This paper is convincing to me. The illustration of common phrases (strategies) relative to politeness matches how we normally make this judgment intuitively from words alone. However, a bag of words may not entirely reflect politeness. Maybe using a bag of phrases, and analyzing the grammatical structure of the sentences, would help, because the use of formal and complete grammar may indicate politeness. Another possible drawback: when building the two classifiers to predict politeness, the authors tested them both in-domain (training and testing data from the same source) and cross-domain (training on Wikipedia and testing on Stack Exchange, and vice versa). To me these are communities with distinctive characteristics: Wikipedia is more formal and its use of words is more precise, while Stack Exchange is more casual in its use of language. A short yet precise answer on Stack Exchange would be viewed by a human as polite, but not by the classifier. The domain transfer here is worth some discussion. Or what about combining these two data sources and performing a leave-one-out cross-validation (LOOCV)?
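A minimal sketch of that last suggestion, assuming pooled feature vectors from the two communities; everything here is a random placeholder, and true LOOCV over the roughly 11K labeled requests would be slow, so k-fold may be the practical stand-in.

```python
# Sketch of the suggested setup: pool Wikipedia and Stack Exchange
# requests and run leave-one-out cross-validation. All data below is
# a random placeholder for illustration.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X_wiki = rng.normal(size=(30, 10))   # placeholder per-request features
X_se = rng.normal(size=(30, 10))
y = rng.integers(0, 2, size=60)      # placeholder polite/impolite labels

X = np.vstack([X_wiki, X_se])        # pooled corpus
acc = cross_val_score(LinearSVC(), X, y, cv=LeaveOneOut()).mean()
print(f"LOOCV accuracy on pooled data: {acc:.3f}")
```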


Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 201702413.

This paper tries to study the respect level shown toward different races, genders, nationalities, etc., using transcribed data from Oakland Police Department body camera videos. The authors extracted the respectfulness of police officer language by applying computational linguistic methods to the transcripts. Disparities are shown in speech toward black versus white community members. The first thing I think about is the prior of such events. In the case of traffic stops, whether the driver complies with the officer’s instructions and whether the car is clean are small cues that will affect the officer’s actions. I’m sure officers won’t be so polite after a highway chase. So audio transcription alone is insufficient to prove this correlation.



Reflection #6 – [02-08] – [Patrick Sullivan]

“Language from Police Body Camera Footage shows Racial Disparities in Officer Respect” by Voigt et al. investigates almost exactly what the title describes.

I am surprised they mention cases of conflict between communities and their police forces only in the Midwest and East Coast states, but then go on to study the police force in Oakland, California. If the study was looking to impact public perception of the conflicts between police forces and their communities, then I would think the best approach would be to study the areas where the conflicts take place. I am not sure what the authors would have concluded if they hadn’t found evidence supporting their argument. Would they claim that police forces in general do not treat black drivers differently? Or would they claim that the Oakland police force is more respectful than its counterparts in other areas? Applying this same analysis to the cities mentioned as conflicted and comparing the results could answer these questions readily.  It would also provide a more impactful conclusion, since it could rule out alternative explanations.

An extension of the study would be very helpful to see if this racial disparity is persistent or changeable. If the same analysis was used on data that came before major news stories on the behavior of police officers, maybe these ideas could be explored. Future studies and follow-ups with this analysis could also show how police respond following a news event or change when adopting new tactics. High profile police cases likely have an effect on police behavior far from the incident, and this effect could be measured.


Reflection #6 – [02/08] – Hamza Manzoor

[1]. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.

[2]. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.


The Danescu et al. paper proposes a framework for identifying linguistic aspects of politeness in requests. They analyze requests from two online communities, Stack Exchange and Wikipedia, exploring politeness on these social platforms. They annotated over 10,000 utterances drawn from over 400,000 requests using Amazon Mechanical Turk. Using this corpus, they conducted a linguistic analysis and constructed a politeness classifier. Their study shows that politeness and power are negatively correlated, that is, the politeness level decreases as power increases. They also show a relationship between politeness and gender.

In the second paper, Voigt et al. investigate whether language from police body camera footage shows racial disparities in officer respect. They analyzed footage from police body-worn cameras and conducted three studies to identify how respectful police officers are toward white and black community members, applying computational linguistic methods to the transcripts. Their results show that police officers show less respect toward black versus white community members, even after controlling for various factors such as the race of the officer or the location of the stop.

I particularly liked how the data were prepared in both papers, especially the second. In the first paper, the authors tried to mitigate the effects of the subjectivity of politeness and explained how people are more polite before becoming admins. But does it matter? I am not sure about Wikipedia, but on Stack Exchange the elections are largely independent of a candidate’s past. I vote for people just by reading their story and plans for the community, and I never go to their profile and look at each of their questions to see how polite they were. Therefore, can we claim that people show politeness to gain power? I don’t think so. Secondly, we know that politeness and power have an inverse relationship in the real world as well. Therefore, can we generalize this? Can we claim that online communities are similar to real-world communities? Because the repercussions of being impolite are very different in the two.

The second paper has a very thorough analysis, and there is hardly anything wrong with the study the authors performed. It was really interesting to see how formality generally decreases over time, but increases in higher-crime areas. In the first study, they randomly sampled 414 unique officer utterances and asked participants to rate them. Is an out-of-context utterance from the middle of a conversation a true predictor of politeness? Also, I believe that using Oakland for the study without examining black versus white crime rates in Oakland is a somewhat naïve approach. It might be possible that the crime rate among black community members in Oakland is higher, and that as a result police officers are less polite toward them. The paper also reports no correlation between politeness and the race of the police officer. Does this mean that even a black officer is less polite toward black community members? If so, then crime rates must be looked at. Otherwise, the analysis and modeling in the paper were very well presented.


Reflection #5 – [02-08] – [Md Momen Bhuiyan]

Paper #1: A computational approach to politeness with application to social factors
Paper #2: Language from police body camera footage shows racial disparities in officer respect

Summary #1:
This paper does a qualitative analysis of the linguistic features that relate to politeness. From there, the authors create a machine learning model that can be used to measure politeness automatically. They use two different websites to test the generalizability of the model. Based on the results, the authors do a quantitative analysis of the relationship between politeness and social outcomes, and of the relationship between politeness and power. From the results, it appears that users who are more polite are more likely to be elected as admins, and once elected they become less polite.

Summary #2:
This paper looks into police interactions with drivers as seen from police body cameras, which have been adopted recently due to controversy regarding police interactions with the black community. The paper introduces a computational model using only transcriptions of the speech between an officer and a driver. In the first part of the study, a group of 60 participants rated utterances on two criteria, respect and formality. The study finds that there is no significant difference in the formality police officers use toward white and black community members, but that respect differs significantly. Based on these results, the authors created a computational model to automatically predict these scores from the data.

Reflection #1:
This paper tried to infer the relationship between politeness and authority. Their analysis is in some sense lacking. This becomes more evident after reading the second paper, which does test for other factors that could affect the inference. For example, in this case, the authors don’t check for factors like the responsibilities of admins, age differences, etc. Although different levels of moderation capability are given to users with different reputations, it is common on Stack Overflow for the admins to do a lot of moderation. If you look into the number of duplicate questions, it becomes quite clear that strict moderation of questions from new users is necessary to keep the site useful to all types of users. Another factor that might affect the politeness of users is age. It is common on Stack Overflow that older users are ruder; at the same time, they also have very high ratings, which correlates with their chance of being a moderator. In the last election, I think one of the main questions was about candidates’ attitudes toward strictness in moderation (full disclosure: I voted for Cody Gray in the last election). So these factors might have some effect on the analysis of politeness.

Reflection #2:
This study was done in an intuitive and simple manner. The authors created a model and tried to find out whether it was affected by control variables like the severity of the offense, formality, outliers in the data, etc. The first thing that comes to mind about the method is that the model focuses only on utterances in textual form rather than speech. The ratings don’t appear to be strong ground truth, as the RMSE of the average rater is about .842. The authors use this partial ground truth from the human raters to build a computational model for predicting respectfulness in utterances. Another problem with the computational model is that the transcription of the data is fully manual; in that sense this is a semi-automatic model. A more ambitious approach would be to use the speech directly, which would solve the previous problem.
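For intuition, here is one plausible reading of that .842 figure (an assumption on my part, not necessarily the paper's exact computation): each rater's RMSE against the mean rating of the remaining raters, on an invented ratings matrix.

```python
# Sketch of what an "average rater RMSE" might mean: each rater's
# root-mean-squared error against the mean of the other raters.
# The ratings matrix here is invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
ratings = rng.uniform(1, 5, size=(10, 50))  # 10 raters x 50 utterances

rmses = []
for i in range(ratings.shape[0]):
    others = np.delete(ratings, i, axis=0).mean(axis=0)
    rmses.append(np.sqrt(np.mean((ratings[i] - others) ** 2)))
print(f"average rater RMSE: {np.mean(rmses):.3f}")
```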


Reflection #6 – [02/08] – Aparna Gupta

  1. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.
  2. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.

Danescu et al. have proposed a computational approach to politeness with application to social factors. They build a computational framework to study the relationship between politeness and social power, showing how people behave once elevated. The authors built a new corpus with data from two online communities, Wikipedia and Stack Exchange. To label the data they used Amazon Mechanical Turk, annotating over 10,000 utterances. The Wikipedia data was used to train the politeness classifier, whereas the Stack Exchange data was used to test it. The authors constructed a politeness classifier with a wide range of domain-independent lexical, sentiment, and dependency features, and present a comparison between two classifiers: a bag-of-words classifier and a linguistically informed classifier. The classifiers were evaluated in both in-domain and cross-domain settings. Looking at the cross-domain results, I wonder whether the politeness classifier would give the same or better results for a corpus from yet another domain. Their results show a significant relationship between politeness and social power: polite Wikipedia editors, once elevated, become less polite, and Stack Exchange users at the top of the reputation scale are less polite than those at the bottom. However, it would be interesting to identify a common, domain-independent feature list that, given any corpus, can classify polite and impolite requests, posts, or replies.
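A minimal sketch of the in-domain versus cross-domain setup, with random placeholder matrices standing in for the Wikipedia and Stack Exchange feature sets.

```python
# Sketch of the cross-domain evaluation described above: train on one
# community's requests and test on the other's. Features and labels
# are random placeholders for illustration.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(4)
X_wiki, y_wiki = rng.normal(size=(100, 20)), rng.integers(0, 2, size=100)
X_se, y_se = rng.normal(size=(100, 20)), rng.integers(0, 2, size=100)

clf = LinearSVC().fit(X_wiki, y_wiki)  # train on the Wikipedia corpus
print("cross-domain accuracy:", accuracy_score(y_se, clf.predict(X_se)))
```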

Voigt et al. have also proposed computational linguistic methods to automatically extract levels of respect and politeness from transcripts. The authors of this paper examine racial disparity in how officers treat black and white community members during traffic stops. The data came from transcribed body camera footage of vehicle stops of white and black community members conducted by the Oakland Police Department during April 2014. Since the officers were made to wear the cameras and record their own footage, would they still show racial disparity? Could there be other factors behind it? I really like the approach and the three studies conducted by the authors: Perceptions of Officer Treatment from Language, Linguistic Correlates of Respect, and Racial Disparities in Respect. However, I wonder if the results would be the same if a similar study were conducted in different cities (ones that report low or high racial disparities).


Reflection #6 – [02/08] – Meghendra Singh

Danescu-Niculescu-Mizil, Cristian, et al. “A computational approach to politeness with application to social factors.” arXiv preprint arXiv:1306.6078 (2013).

In this paper, the authors explore the existence of a relationship between politeness and social power. To establish this relationship, the paper uses data from two online communities (Wikipedia and Stack Exchange): requests directed at owners of talk pages on Wikipedia and requests directed at authors of posts on Stack Exchange. The authors began by labeling 10,957 requests from the two data sources using Amazon Mechanical Turk, thereby creating the largest corpus of politeness annotations. Next, the authors detail 20 domain-independent lexical and syntactic features (or politeness strategies), grounded in the politeness literature. The authors subsequently develop two SVM-based classifiers, ‘BOW’ and ‘Ling.’, for classifying polite and impolite requests (“Did the authors forget to write about the ‘Alley’ classifier? 🙂”). The bag-of-words (BOW) classifier uses unigram features from the training data (i.e., the labeled Wikipedia and Stack Exchange requests) and served as a baseline. The linguistically informed (Ling.) classifier, on the other hand, used the 20 linguistic features along with the unigrams and improved the baseline accuracy by 3-4%. To address the main research question of how politeness changes with social power, the authors compare the politeness levels of requests made by Wikipedia editors before and after they become administrators (i.e., before and after elections). The key finding is that the politeness score of requests by editors who successfully became administrators after public elections dropped, whereas the same score increased for unsuccessful editors.
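As a toy sketch of how the ‘Ling.’ classifier augments the BOW baseline, with two invented indicator features standing in for the paper’s 20 politeness strategies (corpus and labels are also invented):

```python
# Toy sketch of combining unigram features with hand-crafted
# "politeness strategy" indicators; the two indicator functions below
# are invented stand-ins for the paper's 20 features.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

requests = ["Could you please help?", "Do it now.",
            "Would you mind checking?", "You are wrong."]
labels = [1, 0, 1, 0]  # 1 = polite, 0 = impolite (toy annotations)

bow = CountVectorizer().fit(requests)

def strategies(texts):
    # Placeholder strategy features: a deference marker and a direct start.
    return np.array([[int("please" in t.lower() or "would you" in t.lower()),
                      int(t.split()[0].lower() in {"do", "you"})]
                     for t in texts])

X = hstack([bow.transform(requests), strategies(requests)])
clf = LinearSVC().fit(X, labels)

test = ["Please check this."]
print(clf.predict(hstack([bow.transform(test), strategies(test)])))
```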

Additionally, the authors present other interesting findings: question-askers are politer than answer-givers; the politeness of requests decreases as reputation increases on Stack Exchange; Wikipedians from the U.S. Midwest are the politest; female Wikipedians are generally more polite; and there is significant variance in the politeness of requests across programming-language communities (from 0.47 for Python to 0.59 for Ruby).

The first question that came to my mind while reading this paper was that there may be other behaviors and traits associated with requests and responses (or writing in general), for example compassion, persuasiveness, verbosity/succinctness, and quality of language. A nice follow-up to this work might be to re-run the study for these other qualities, maybe even using other datasets (say Q&A sites like Quora, Answers.com, or Yahoo Answers). I do feel that there isn’t a huge difference between the politeness scores for successful versus failed Wikipedia administrator candidates, especially after the election. I encountered an old (but interesting) paper which investigates politeness in written persuasion by examining a set of letters written by academics at different ranks in support of a colleague who had been denied promotion and tenure at a major state university in the U.S. One of the key findings of that study was that “the formulation of a request is conditioned by the relative power of the participators”: in that particular case, politeness in written requests generally increased with academic rank.

This seems to suggest a different result from those presented in Danescu-Niculescu-Mizil et al. Maybe politeness has more contextual underpinnings that need to be researched. Also, a more recent paper in social psychology links politeness with conservatism, and compassion with political liberalism. As Danescu-Niculescu-Mizil et al. report that “Wikipedians from U.S. Midwest are the most polite,” it would be interesting to validate such relationships between behaviors (like politeness) and political attitudes against those established by prior social science literature. Some more questions one might ask here: How polite are the responses of Wikipedia editors versus administrators? Do polite requests generally get polite responses? Are people more likely to respond to a polite request than to a not-so-polite one? On the technical side, it might be interesting to experiment with other models for classification, say random forests, neural nets, or logistic regression, and with techniques for improving model performance like bagging, boosting, and k-fold cross-validation (a rough sketch follows). It may also be interesting to determine the politeness/impoliteness of general written text (say news articles, editorials, reviews, critiques, or social media posts) and examine how this affects the responses to these articles (shares, likes, and comments).
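As a rough sketch of that last technical suggestion (placeholder features and labels, so the scores are meaningless beyond illustrating the setup):

```python
# Sketch of the suggested model exploration: k-fold cross-validation
# over bagging and boosting ensembles on placeholder politeness features.
import numpy as np
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 15))    # placeholder per-request features
y = rng.integers(0, 2, size=300)  # placeholder polite/impolite labels

for name, clf in [("bagging", BaggingClassifier(n_estimators=100)),
                  ("boosting", AdaBoostClassifier(n_estimators=100))]:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: {acc:.3f}")
```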
