Reflection #7 – 02/12 Meghendra Singh

Niculae, Vlad, et al. “Linguistic harbingers of betrayal: A case study on an online strategy game.” arXiv preprint arXiv:1506.04744 (2015).

The paper discusses a very interesting research question: that of friendships, alliances and betrayals. The key idea here is that between a pair of allies, conversational attributes like positive sentiment, politeness and focus on future planning can foretell the fate of the alliance (i.e., whether one of the allies will betray the other). Niculae et al. analyze 145K messages between players from 249 online games of “Diplomacy” (a war-themed strategy game) and train two classifiers: one to distinguish betrayals from lasting friendships, and another to distinguish seasons preceding the last friendly interaction from older seasons.
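A minimal sketch of the kind of linguistic-cue features the paper relies on (positive sentiment, politeness, future planning). The word lists here are illustrative stand-ins, not the authors’ actual lexicons:

```python
# Hypothetical cue lexicons -- illustrative only, not the paper's.
POSITIVE = {"great", "glad", "happy", "excellent"}
POLITENESS = {"please", "thanks", "thank", "sorry", "would"}
PLANNING = {"will", "next", "plan", "season", "then"}

def cue_counts(message):
    """Count occurrences of each cue class in a lowercased message."""
    tokens = [t.strip(".,!?") for t in message.lower().split()]
    return {
        "positive": sum(t in POSITIVE for t in tokens),
        "politeness": sum(t in POLITENESS for t in tokens),
        "planning": sum(t in PLANNING for t in tokens),
    }

counts = cue_counts("Thanks! Great move. Next season we will plan together")
```

Imbalances in counts like these between the two allies over time are the kind of signal the paper’s classifiers pick up on.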

Niculae et al. do a good job of defining the problem in the context of Diplomacy, specifically the “in-game” aspects of movement, support, diplomacy, orders, battles, and acts of friendship and hostility. I feel that unlike the real world, a game environment leads to a very clear and unambiguous definition of betrayal and alliance. While this makes it easier to apply computational tools like machine learning for making predictions in such environments, the developed approach might not be readily applicable to real-world scenarios. While discussing relationship stability in “Diplomacy”, the authors point out that the probability of a friendship dissolving into enmity is about five times greater than that of hostile players becoming friends. I feel this statistic is very much context dependent and might not carry over to similar real-world scenarios. Additionally, there seems to be an implicit “in-game” advantage to deception and betrayal (“solo victories” being more prestigious than “team victories”). The technique described in the paper only uses linguistic cues within dyads to predict betrayal; however, there might be many other aspects leading to a betrayal. Although difficult, it might be interesting to see if the deceiving player is actually being influenced by another player outside the dyad (maybe by observing the betrayer’s communication with other players?). There might also be other “in-game” reasons to betray. For example, one of the allies may become too powerful (the fear of a powerful ally taking over a weak ally’s territory might make the weak ally betray first). The point being, only looking at player communication might not be a sufficient signal for detecting betrayal, more so in the real world.

Also, there can be many other aspects associated with communication in the physical world, like body language, facial expressions, gestures, eye contact, and tone of voice. These verbal and non-verbal cues are seldom captured in computer-mediated textual communication, although they might play a big role in decision making and in acts of friendship as well as betrayal. I feel it would be really interesting if the study could be repeated for some cooperative game that supports audio/video communication between players instead of only text. Also, I believe the “clock” of the game, i.e. the time taken to finish one season and make decisions, is very different from the real world. The game might afford the players a lot of time to deliberate and choose their actions; in the real world, one may not have this privilege.

Additionally, the accuracy of the logistic regression based classifier discussed in section 4.3 is only 57% (5% higher than chance), and I feel this might be because of under-fitting; hence it might be interesting to explore other machine learning techniques for classifying betrayals using linguistic features. While the study tries to address a very important and appealing research question, I feel it is quite difficult to predict lasting friendships, eventual separations and unforeseen betrayals (even in a controlled virtual game), principally because of inherent human irrationality and strokes of serendipity.
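To gauge that “5% above chance” gap, here is a back-of-the-envelope significance check (my own arithmetic, not from the paper): under a normal approximation to the binomial, whether the gap is meaningful depends heavily on the number of test examples.

```python
import math

def z_score(acc, chance, n):
    """How many standard errors the observed accuracy sits above the
    chance baseline, given n test cases (normal approximation)."""
    se = math.sqrt(chance * (1 - chance) / n)  # std. error under the null
    return (acc - chance) / se

# With only 100 test dyads, 57% vs. a 52% baseline is within noise (z < 1.96)...
z_small = z_score(0.57, 0.52, 100)
# ...while with 1,000 test dyads it would be clearly significant.
z_large = z_score(0.57, 0.52, 1000)
```

So before swapping in fancier models, it is worth checking how tight the confidence interval around that 57% actually is.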


Reflection 6

Summary 1:

The authors try to identify politeness by analyzing text, with the aim of understanding how politeness in language can affect various social factors as well as people. They analyze politeness in texts and comments from different angles using Wikipedia and Stack Exchange data. They also build an automated classifier to extract politeness scores from text. Their observations reveal variation in politeness scores based on gender, demography, and status. The politeness data collected would be insightful for studying different aspects of social and political factors.

Reflection 1:

The paper first presents a linguistic analysis of how politeness varies across different words used in different sentences; the same word can also have different politeness scores depending on the sentence it is used in. The authors collected data from Wikipedia and Stack Exchange and used human annotators to label those data with politeness scores. This data was used to build two SVM classifiers: one using a unigram feature representation, and another using a threshold on the politeness scores of unigram features. Although the classifier has a high accuracy at predicting whether a word is ‘polite’ or ‘impolite’, my question is whether this data will work in settings other than Wikipedia and Stack Exchange. A word might have a high politeness score on these sites but be used satirically or for other negative expression on social networks. Also, the unigram features they use are very weak; instead, they could bring in context words or n-gram features to analyze and build the classifier.
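To make the unigram-vs-n-gram point concrete, here is a toy feature extractor (my own sketch, not the authors’ code): unigrams discard exactly the context that bigrams retain.

```python
def ngrams(text, n):
    """Return the list of n-grams (as space-joined strings) of a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "could you not do that"
unigrams = ngrams(sentence, 1)  # "not" in isolation looks impolite
bigrams = ngrams(sentence, 2)   # "could you" preserves the polite framing
```

A classifier fed the bigram features at least has a chance of learning that “could you” softens the request, which the bag of unigrams cannot represent.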

Summary 2:

In this paper the authors study the racial disparities shown by police officers by analyzing their linguistic interactions with people. The study was conducted in three steps: first, perceiving the behavior of police officers from their language; second, identifying correlations between sentences or words and respect; and finally, finding the presence of racial disparity in an officer’s language by correlating with other studies. The data for this study were collected by transcribing video footage of officers at stop points into text. The experiment finds that most officers are likely to treat white people with more respect than black people and people of other races.

Reflection 2:

The idea of the paper was to characterize the behavior of each individual police officer from the ratings given by participants, and then to compute features of each of an officer’s utterances. Figure 2 shows a statistical graph of the use of each feature for white and black people, along with the respect coefficient of each feature. The authors claim that their analysis correlates with the human study obtained from the participants. However, the study did not consider officer mood, workload, and other factors which might influence an officer’s behavior.


Reflection #6 – [02-08] – [Nuo Ma]

Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.

This paper focuses on linguistic aspects of politeness on social platforms like Wikipedia and Stack Exchange. The authors conducted a linguistic analysis of politeness using two classifiers: a bag-of-words classifier and a linguistically informed classifier. They were able to report algorithm results close to human performance. Also, the analysis of the relationship between politeness levels and social power shows a negative correlation between politeness and power on Stack Exchange and Wikipedia.

This paper works well for me. The illustration of the use of common phrases (strategies) relative to politeness mirrors how we normally make this judgement intuitively from verbal cues alone. However, a bag of words may not entirely reflect politeness. Maybe a bag of phrases, together with an analysis of the grammatical structure of the sentences, would help, because the use of formal and complete grammar may indicate politeness. Another possible drawback: when building the two classifiers to predict politeness, the authors tested them both in-domain (training and testing data from the same source) and cross-domain (training on Wikipedia and testing on Stack Exchange, and vice versa). To me these are communities with distinctive characteristics: Wikipedia is more formal and its word use more precise, while Stack Exchange is more casual in its use of language. A short yet precise answer on Stack Exchange would be viewed by a human as polite, but not by the classifier. The domain transfer here is worth some discussion. Or what about combining the two data sources and performing leave-one-out cross-validation (LOOCV)?


Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 201702413.

This paper studies the respect level shown toward different races, genders, nationalities, etc., using transcribed data from Oakland Police Department body camera videos. The authors extracted the respectfulness of police officer language by applying computational linguistic methods to the transcripts. Disparities show up in how officers speak toward black and white community members. The first thing I think about is the prior of such events: in the case of traffic stops, whether the target complies with the officer’s instructions, whether the car is clean. These are small cues that will affect an officer’s actions. I’m sure officers won’t be so polite after a highway chase. So, audio transcription alone is insufficient to establish this correlation.



Reflection #6 – [02-08] – [Patrick Sullivan]

“Language from Police Body Camera Footage shows Racial Disparities in Officer Respect” by Voigt et al. investigates almost exactly what the title describes.

I am surprised they mention cases of conflict between communities and their police forces only in the Midwest and East Coast states, but then go on to study the police force in Oakland, California. If the study was looking to impact public perception of the conflicts between police forces and their communities, then I would think the best approach would be to study the areas where the conflicts take place. I am not sure what the authors would conclude if they didn’t find supporting evidence for their argument. Would they claim that police forces in general do not treat black drivers differently? Or would they then claim that Oakland police are more respectful than their counterparts in other areas? Applying this same analysis to the cities mentioned as conflicted and comparing the results could answer these questions readily. It would also provide a more impactful conclusion, since it can rule out alternative explanations.

An extension of the study would be very helpful to see if this racial disparity is persistent or changeable. If the same analysis was used on data that came before major news stories on the behavior of police officers, maybe these ideas could be explored. Future studies and follow-ups with this analysis could also show how police respond following a news event or change when adopting new tactics. High profile police cases likely have an effect on police behavior far from the incident, and this effect could be measured.


Reflection #6 – [02/08] – Hamza Manzoor

[1]. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.

[2]. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.


The Danescu et al. paper proposes a framework for identifying linguistic aspects of politeness in requests. They analyze requests in two online communities, Stack Exchange and Wikipedia, exploring the politeness of content on these platforms. They annotated over 10,000 utterances drawn from over 400,000 requests using Amazon Mechanical Turk. Using this corpus, they conducted a linguistic analysis by constructing a politeness classifier. Their study shows that politeness and power are negatively correlated; that is, the politeness level decreases as power increases. They also show a relationship between politeness and gender.

In the second paper, Voigt et al. investigate whether the language from police body camera footage shows racial disparities in officer respect. They analyzed footage from police body-worn cameras and conducted three studies to identify how respectful police officers are toward white and black community members, applying computational linguistic methods to the transcripts. Their results show that police officers show less respect toward black versus white community members, even after controlling for various factors such as the race of the officer or the location of the stop.

I particularly liked how the data was prepared in both papers, especially the second. In the first paper, the authors tried to mitigate the effects of the subjectivity of politeness and explained how people are more polite before becoming admins. But does it matter? I am not sure about Wikipedia, but on Stack Exchange the elections are largely independent of a candidate’s past. I vote for people just by reading their story and plans for the community; I never go to their profile and look at each of their questions to see how polite they were. Therefore, can we claim that people show politeness to gain power? I don’t think so. Secondly, we know that politeness and power have an inverse relationship in the real world as well. Therefore, can we generalize this? Can we claim that online communities are similar to real-world communities? Because the repercussions of being impolite are very different in the two.

The second paper has a very thorough analysis, and there is hardly anything wrong with the study performed. It was really interesting to see how formality decreases over the course of an interaction in general, but increases in higher-crime areas. In the first study, the authors randomly sampled 414 unique officer utterances and asked participants to rate them. Is an out-of-context utterance from the middle of a conversation a true predictor of politeness? Also, I believe that using Oakland for the study without explaining black vs. white crime rates in Oakland is a somewhat naïve approach. It might be possible that the crime rate among black community members in Oakland is higher, and as a result police officers are less polite toward them. The paper also explains that there is no correlation of politeness with the race of the police officer. Does this mean that even a black officer is less polite toward black community members? If so, then crime rates must be looked at. Otherwise, the analysis and modeling in the paper were very well presented.


Reflection #5 – [02-08] – [Md Momen Bhuiyan]

Paper #1: A computational approach to politeness with application to social factors
Paper #2: Language from police body camera footage shows racial disparities in officer respect

Summary #1:
This paper does a qualitative analysis of the linguistic features that relate to politeness. From there, the authors create a machine learning model that can be used to automatically measure politeness. The authors use two different websites to test the generalizability of the model. Based on the results, they do a quantitative analysis of the relationship between politeness and social outcomes, and between politeness and power. From the results, it appears that users who are more polite are more likely to be elected as admins, and once elected they become less polite.

Summary #2:
This paper looks into police interactions with drivers as seen from police body cameras, which have been adopted recently due to controversy regarding police interactions with the black community. The paper introduces a computational model using only the transcription of the speech between an officer and a driver. In the first part of the study, a group of 60 participants rated utterances on two criteria, respect and formality. The study finds that there is no significant difference in the formality police officers use with white and black community members; respect, however, differs significantly. Based on these results, the authors created a computational model to automatically predict scores from the data.

Reflection #1:
This paper tried to infer the relationship between politeness and authority. Their analysis is in some sense lacking. This becomes more evident after reading paper 2, which does test for other factors that can affect the inference. For example, in this case the authors don’t check for factors like the responsibilities of admins, age differences, etc. Although different levels of moderation capability are given to users with different reputations, it is common on StackOverflow for the admins to do a lot of moderation. If you look at the number of duplicate questions, it becomes quite clear that strict moderation of questions from new users is necessary to keep the site useful to all types of users. Another factor that might have an effect on the politeness of users is age. It is common on StackOverflow that older users are ruder; at the same time, they also have very high ratings, which correlates with their chance of being a moderator. In the last election, I think one of the main questions was about the candidates’ attitudes toward strictness in moderation (full disclosure: I voted for Cody Gray in the last election). So these factors might have some effect on the analysis of politeness.

Reflection #2:
This study was done in an intuitive and simple manner. The authors created a model and tried to find out whether it was affected by control variables like severity of the offence, formality, outliers in the data, etc. The first thing that comes to mind about the method is that the model only focuses on utterances in textual format rather than speech. The ratings don’t appear to be a strong ground truth, as the RMSE of the average rater is about 0.842. The authors use this partial ground truth from the human raters to build a computational model for predicting respectfulness in utterances. Another limitation of the computational model is that the transcription of the data is fully manual; from that perspective, this is a semi-automatic model. A more complex approach would use speech directly, which would solve that problem.
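For context on that 0.842 figure, RMSE is just the root of the mean squared rating error; a toy computation (illustrative numbers, not the paper’s data):

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error between two equal-length rating lists."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
    )

# A rater who is off by 0.5 on every item has an RMSE of exactly 0.5, so an
# RMSE near 0.842 on a short rating scale indicates fairly noisy judgments.
error = rmse([3.0, 2.0, 4.0], [3.5, 2.5, 3.5])
```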


Reflection #6 – [02/08] – Aparna Gupta

  1. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.
  2. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.

Danescu et al. have proposed a computational approach to politeness with application to social factors. They build a computational framework to study the relationship between politeness and social power, showing how people behave once elevated. The authors built a new corpus from two online communities, Wikipedia and Stack Exchange. To label the data they used Amazon Mechanical Turk, labeling over 10,000 utterances. The Wikipedia data was used to train the politeness classifier, whereas the Stack Exchange data was used to test it. The authors constructed a politeness classifier with a wide range of domain-independent lexical, sentiment, and dependency features, and present a comparison between two classifiers: a bag-of-words classifier and a linguistically informed classifier. The classifiers were evaluated in both in-domain and cross-domain settings. Looking at the results of the cross-domain setting, I wonder if the politeness classifier would give the same or better results for a corpus from a different domain. Their results show a significant relationship between politeness and social power: polite Wikipedia editors, once elevated, become less polite, and Stack Exchange users at the top of the reputation scale are less polite than those at the bottom. However, it would be interesting to identify a common feature list, irrespective of the domain, that can classify polite and impolite requests, posts, or replies in any corpus.

Voigt et al. have also proposed computational linguistic methods to automatically extract the level of respect and politeness from transcripts. The authors talk about racial disparity in officers’ treatment of black and white community members during traffic stops. The data came from transcribed body camera footage of vehicle stops of white and black community members conducted by the Oakland Police Department during April 2014. Since the officers were made to wear the cameras and record their own footage, would they still show racial disparity? Could there be other factors behind it? I really like the approach and the three studies conducted by the authors: Perceptions of Officer Treatment from Language, Linguistic Correlates of Respect, and Racial Disparities in Respect. However, I wonder if the results would be the same if a similar study were conducted in different cities (which report low or high racial disparities).


Reflection #6 – [02/08] – Meghendra Singh

Danescu-Niculescu-Mizil, Cristian, et al. “A computational approach to politeness with application to social factors.” arXiv preprint arXiv:1306.6078 (2013).

In this paper, the authors explore the existence of a relationship between politeness and social power. To establish this relationship, the paper uses data from two online communities (Wikipedia and Stack Exchange): requests directed at owners of talk pages on Wikipedia, and requests directed toward authors of posts on Stack Exchange. The authors began by labeling 10,957 requests from the two data sources using Amazon Mechanical Turk, thereby creating the largest corpus of politeness annotations. Next, the authors detail 20 domain-independent lexical and syntactic features (or politeness strategies) grounded in the politeness literature. The authors subsequently develop two SVM-based classifiers, ‘BOW’ and ‘Ling.’, for classifying polite and impolite requests (“Did the authors forget to write about the ‘Alley’ classifier? 🙂”). The bag-of-words (BOW) classifier used unigram features from the training data (i.e. the labeled Wikipedia and Stack Exchange requests) and served as a baseline. The linguistically informed (Ling.) classifier used the 20 linguistic features along with the unigrams and improved on the baseline accuracy by 3-4%. To address the main research question of how politeness changes with social power, the authors compare the politeness levels of requests made by Wikipedia editors before and after they become administrators (i.e. before and after elections). The key finding is that the politeness score of requests by editors who successfully became administrators after public elections dropped, whereas the same increased for unsuccessful editors, as shown in the figure below.

Additionally, the authors present other interesting findings: question-askers are politer than answer-givers, the politeness of requests decreases as reputation increases on Stack Exchange, Wikipedians from the U.S. Midwest are the politest, female Wikipedians are generally more polite, and there is significant variance in the politeness of requests across programming language communities (0.47 for Python to 0.59 for Ruby).

The first question that came to my mind while reading this paper was whether there may be other behaviors and traits associated with requests and responses (or writing in general), for example compassion, persuasiveness, verbosity/succinctness, or quality of language. A nice follow-up to this work might be to re-run the study for these other qualities, maybe even using other datasets (say Q&A sites like Quora, Answers.com, or Yahoo Answers?). I do feel that there isn’t a huge difference between the politeness scores for successful versus failed Wikipedia administrator candidates, especially after the election. I encountered an old (but interesting) paper which investigates politeness in written persuasion by examining a set of letters written by academics at different ranks in support of a colleague who had been denied promotion and tenure at a major state university in the U.S. One of the key findings of that study was that “the formulation of a request is conditioned by the relative power of the participators”. The following plot from the paper shows the relative politeness in letters of request by academics at different ranks.

This seems to suggest a different result when compared with those presented in Danescu-Niculescu-Mizil et al.: we can clearly see that in this particular case the politeness of written requests generally increases with academic rank. Maybe politeness has more contextual underpinnings that need to be researched. Also, a more recent paper in social psychology links politeness with conservatism and compassion with political liberalism. As Danescu-Niculescu-Mizil et al. specify that “Wikipedians from U.S. Midwest are the most polite”, it would be interesting to validate such relationships between behaviors (like politeness) and political attitudes established by prior social science literature. Some more questions one might ask here are: How polite are the responses of Wikipedia editors versus administrators? Do polite requests generally get polite responses? Are people more likely to respond to a polite request than to a not-so-polite request? On the technical side, it might be interesting to experiment with other models for classification, say random forests, neural nets, or logistic regression. Also, techniques for improving model performance like bagging, boosting, and k-fold cross-validation might be interesting avenues of exploration. It may also be interesting to determine the politeness/impoliteness of general written text (say news articles, editorials, reviews, critiques, social media posts) and examine how this affects the responses to these articles (shares, likes and comments).
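The k-fold cross-validation mentioned above can be sketched as a simple index-splitting helper (a hypothetical utility, not tied to any particular model): each item lands in exactly one test fold, and the model would be refit k times.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    fold = n // k
    for i in range(k):
        stop = (i + 1) * fold if i < k - 1 else n  # last fold takes the remainder
        test = list(range(i * fold, stop))
        train = [j for j in range(n) if j not in test]
        yield train, test

# 10 requests split into 5 folds: each appears in exactly one test fold.
splits = list(kfold_indices(10, 5))
```

Averaging a classifier’s accuracy over such folds would give a more stable estimate than the single Wikipedia-train / Stack Exchange-test split.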


Reflection #6 – [02/08] – [Jamal A. Khan]

Both of the papers assigned for today deal with computational linguistics revolving around politeness and respectfulness:

  1. Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013) “A computational approach to politeness with application to social factors”.
  2. Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … & Eberhardt, J. L. (2017) “Language from police body camera footage shows racial disparities in officer respect”.

I’m going to talk about the second one first because it seems to tackle a more interesting topic and asks a hard question with serious social consequences. The authors ask whether black people are systematically treated differently (with less respect) by police officers, and the answer, alarmingly, is yes.

However, I don’t think the authors accounted for the drastic difference in crime rate in Oakland compared to the national average, and this makes the generalizability of the study a major concern.

So my question then is: “does the Oakland police department have a problem of racial profiling, or do police departments in general exhibit this trend?” It’s very important to realize the difference, and it’s an easy one to overlook. Another important aspect that is missing, and is pointed out by the authors as well, is that while the trends may be obvious, the reasons for them are not, and I believe those are quite important.

Finally, I think that without taking into account the language, body language and facial expressions of the people being questioned, the study might paint an incomplete picture. With modern deep-learning techniques and out-of-the-box solutions for object detection, facial segmentation, etc., such analysis is now possible. Hence, while I feel that this study is a step in the right direction, it is one which needs much more work.

Coming back to the first paper: before I get into the paper itself, just from reading the abstract, the question that comes to mind is whether models such as the one proposed by the authors have the potential to be used for language policing, and how intrusive the effect could be on freedom of speech. The reason I raise this question is that while we are heavily concerned about whether people face opinion-challenging information online, so as not to create “echo chambers”, we tend to have the opposite stance on the language used, which possibly has the potential to create a “snowflake culture”. In my opinion, negative experiences are necessary for growth.

Nevertheless, going back to the paper itself, the most interesting trends were the Stack Exchange results showing that people become less polite (haughty?) after gaining non-monetary affluence, and that they become more polite when they lose. I guess the former goes along the lines of “humility is a hard attribute to find”. Though I think the analysis is too one-dimensional, in the sense that the formality or necessary strictness required of someone in a position of power might be perceived as less polite but no less respectful. I think a separate set of experiments needs to be run to confirm this observation, so I would treat this particular result as a really interesting hypothesis and nothing more. Perhaps the popularity of users and politeness could also be studied in a different set of experiments.

Another subtle point that the paper missed is that English is now becoming (or probably has already become) a universal language, and the way it’s used differs quite widely among different cultures and geographies. This seems to be a repeating trend among the papers we’ve read so far in the class. The question here then becomes: what kind of effects do local languages have on the perception of respect in (translated?) English? It may be the case that phrases, when translated from a local language to English, become less polite or maybe even rude.

Finally, I was wondering how well the study would fare with more modern NLP techniques which capture not only sentence structure but also inter-sentence relationships (the proposed classifier doesn’t do that right now), and whether the findings would still hold or be augmented.



Reflection #6 – [02/08] – Vartan Kesiz-Abnousi

[1] Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors. arXiv preprint arXiv:1306.6078.

[2] Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M.,  & Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 201702413.

The Danescu et al. paper proposes a computational framework for identifying and characterizing aspects of politeness marking in requests. They start with a corpus of requests annotated for politeness, drawn from two large communities, Wikipedia and Stack Exchange. They use this to construct a politeness classifier. The classifier achieves near human-level accuracy across domains, which highlights the consistent nature of politeness strategies.

The reason Danescu et al. use requests is that requests involve the speaker imposing on the addressee, making them ideal for exploring the social value of politeness strategies, and because they stimulate negative politeness. I believe there should be a temporal aspect. There is surely a qualitative difference between Wikipedia and Stack Exchange; the types of requests on those two communities have a different nature. This might explain the result.

Second, I believe there is a problem with generalizing this theory to the “real world”. An online community is quite different from a real-life community, for instance a university or a corporation. In online communities people are not only geographically separated; also, what truly is the worst thing that can happen to someone who is not polite on Wikipedia or Stack Exchange, compared to an office environment? There, the consequences would go beyond a digital reputation. I would also be interested in conducting an experiment in those communities: what if we “artificially” established fake users with extraordinarily high popularity making the same requests as users who have extremely low popularity? How politely would people respond?

Technically, there is a big difference in the number of requests in the two domains, WIKI and SE: the sample of requests from SE is ten times larger. Therefore, what puzzles me is why they used Wikipedia as their training data instead of Stack Exchange. Why did they not use Stack Exchange?

In addition, the annotators were told that the sentences were from emails between co-workers. I wonder what kind of effect that has on the results. Perhaps the annotators have specific expectations of “politeness” from co-workers that would not be the same if they knew they were examining requests from Wikipedia and SE. Second, I see that the authors are doing a “z-score normalization” on an ordinal variable (a Likert scale), which is statistically wrong: you cannot take the average of an ordinal variable, and that includes the standard deviation. And nothing in Figure 1 indicates an average of 0. Instead, they could either simply report the median or use an IRT (Item Response Theory) model with polytomous outcomes, which is appropriate for Likert scales. In addition, while the inter-annotator agreement is not random based on the test they perform, the mean correlation is not particularly high either. Just because it is not random does not mean that there is a consensus.

And why is the inter-annotator pairwise correlation coefficient only around 0.6? The answer is that different people have different notions of what they deem “polite”. If the authors had collected the demographics of the annotators, I believe we would see some interesting results. First, it might have improved the accuracy of the classifiers drastically. Demographics such as income, education, and the industry people work in could have an impact. For instance, does someone who works in a Wall Street pit in Manhattan have the same notion of “politeness” as a nun?
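The earlier point about z-scoring ordinal Likert ratings can be illustrated with toy numbers: z-scoring treats the 1-5 scale as interval data, while the median is a defensible ordinal summary.

```python
import statistics

ratings = [1, 2, 2, 5, 5]  # toy Likert responses on a 1-5 scale

# z-scoring assumes the distances 1->2 and 4->5 are equal, which is exactly
# the interval-scale assumption being questioned.
mean = statistics.mean(ratings)
stdev = statistics.pstdev(ratings)
z_scores = [(r - mean) / stdev for r in ratings]

# The median uses only the ordering of the responses.
median = statistics.median(ratings)
```

Two raters with the same median can end up with quite different z-scores depending on how their extreme responses are spaced, which is the core of the objection.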

In the second paper, henceforth Voigt et al., the authors investigate, as the title suggests, whether language from police body camera footage shows racial disparities in officer respect. They do this by analyzing the respectfulness of police officer language toward white and black community members during routine traffic stops.

I believe this paper is closely related to the previous paper on many levels. Basically, the language displays the perceived power differential between the two (or more) agents who are interacting. Most importantly, it is the fact that there is no punishment, or that there are no stakes, that further bolsters such behaviors; for instance, once people lose their elections, they become politer. The strength of this paper is that it uses real camera footage, not an online platform. Based on the full regression model in the Appendix, apologizing makes a big difference in the “Respect” and “Formal” models: the coefficients are both statistically significant and the signs are reversed, with apologizing positively associated with respect, as expected.
