[1] Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors. arXiv preprint arXiv:1306.6078.
[2] Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., & Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 201702413.
The Danescu et al. paper proposes a computational framework for identifying and characterizing aspects of politeness marking in requests. The authors start with a corpus of requests annotated for politeness, drawn from two large online communities, Wikipedia and Stack Exchange, and use it to construct a politeness classifier. The classifier achieves near human-level accuracy across domains, which highlights the consistent nature of politeness strategies.
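To make the setup concrete, here is a minimal sketch of such a request-politeness classifier. The authors use an SVM over unigrams plus hand-crafted politeness-strategy features; I am assuming plain bag-of-words features, scikit-learn, and toy annotated requests, so this only illustrates the idea rather than reproducing their pipeline.

```python
# Minimal sketch of a request-politeness classifier in the spirit of
# Danescu-Niculescu-Mizil et al. (2013). Features and data are stand-ins:
# plain tf-idf n-grams instead of their politeness-strategy features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical annotated requests: 1 = polite, 0 = impolite.
requests = [
    "Could you please take a look at this edit when you have a moment?",
    "Fix this now, it is obviously broken.",
    "Would you mind clarifying what the error message means? Thanks!",
    "Why did you revert my change without asking?",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(requests, labels)

print(clf.predict(["Could you possibly explain your reasoning?"]))
```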
The reason Danescu et al. use requests is that requests involve the speaker imposing on the addressee, which makes them ideal for exploring the social value of politeness strategies, and because they stimulate negative politeness. I believe there should also be a temporal aspect. Moreover, there is surely a qualitative difference between Wikipedia and Stack Exchange: the requests on those two communities differ in nature, which might explain the result.
Second, I believe there is a problem with respect to generalizing this theory to the “real world”. An online community is quite different from a real-life community, for instance a university or a corporation. In online communities people are not only geographically separated; moreover, is losing a digital reputation truly the worst thing that can happen to someone who is impolite on Wikipedia or Stack Exchange? In an office environment the consequences would go beyond a digital reputation. I would also be interested in conducting an experiment in those communities: what if we “artificially” established fake users with extraordinarily high popularity who posted the same requests as users with extremely low popularity? How politely would people respond?
Technically, there is a big difference in the number of requests in the two domains, WIKI and SE: the sample of requests from SE is ten times larger. Therefore, what puzzles me is why they used Wikipedia as their training data instead of Stack Exchange.
In addition, the annotators were told that the sentences were from emails between co-workers. I wonder what kind of effect that has on the results: perhaps annotators have specific expectations of “politeness” from co-workers that would not apply if they knew they were examining requests from Wikipedia and SE. Second, I see that the authors perform a “z-score normalization” on an ordinal variable (a Likert scale), which is statistically wrong: you cannot take the average of an ordinal variable, and the same goes for the standard deviation. Nothing in Figure 1 indicates an average of 0 either. Instead, they could simply report the median, or use an IRT (Item Response Theory) model with polytomous outcomes, which is appropriate for Likert scales. Furthermore, while the inter-annotator agreement is not random according to the test they perform, the mean correlation is not particularly high either; just because it is not random does not mean there is a consensus.
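To show concretely what I mean, here is a small sketch contrasting the per-annotator z-score normalization the authors describe with the median-based summary I suggest above. The ratings and annotator IDs are hypothetical.

```python
# Contrast the paper's per-annotator z-scoring of Likert ratings (treating an
# ordinal scale as interval) with a per-request median of the raw ratings.
import pandas as pd

ratings = pd.DataFrame({
    "annotator": ["a1", "a1", "a1", "a2", "a2", "a2"],
    "request":   ["r1", "r2", "r3", "r1", "r2", "r3"],
    "likert":    [5, 2, 4, 3, 1, 3],  # 1 = very impolite ... 5 = very polite
})

# What the authors do: z-score each annotator's ratings, then average per request.
ratings["z"] = ratings.groupby("annotator")["likert"].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0)
)
score_z = ratings.groupby("request")["z"].mean()

# Ordinal-respecting alternative: per-request median of the raw ratings.
score_median = ratings.groupby("request")["likert"].median()

print(score_z)
print(score_median)
```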
And why is the pairwise inter-annotator correlation coefficient only around 0.6? The answer is that different people have different notions of what they deem “polite”. If the authors had collected the demographics of the annotators, I believe we would see some interesting results. First, it might have improved the accuracy of the classifiers drastically. Demographics such as income, education, and the industry in which they work could have an impact. For instance, does someone who works in a Wall Street trading pit in Manhattan have the same notion of “politeness” as a nun?
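For reference, a mean pairwise inter-annotator correlation of this kind can be computed as below. I am assuming Spearman rank correlation over a hypothetical rating matrix (the paper's exact procedure may differ), so the printed number is meaningless in itself; with random ratings it sits near zero, whereas the real annotators land around 0.6.

```python
# Mean pairwise inter-annotator correlation on a hypothetical rating matrix.
# Rows are requests, columns are annotators.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 5))  # 50 requests, 5 annotators, Likert 1-5

corrs = []
for i, j in combinations(range(ratings.shape[1]), 2):
    rho, _ = spearmanr(ratings[:, i], ratings[:, j])
    corrs.append(rho)

# Random ratings give a value near 0; real annotators agree more strongly.
print(f"mean pairwise Spearman correlation: {np.mean(corrs):.2f}")
```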
In the second paper, henceforth Voigt et al., the authors investigate, as the title suggests, whether language from police body camera footage shows racial disparities in officer respect. They do this by analyzing the respectfulness of police officer language toward white and black community members during routine traffic stops.
I believe this paper is closely related to the previous one on many levels. Basically, the language displays the perceived power differential between the two (or more) agents who are interacting. Most importantly, it is the fact that there is no punishment, or that there are no stakes, that further bolsters such behaviors; for instance, once people lose their elections, they become more polite. The strength of this paper is that it uses real camera footage, not an online platform. Based on the full regression model in the Appendix, apologizing makes a big difference in the “Respect” and “Formal” models: the coefficients are both statistically significant and their signs are reversed, with apologizing having a positive coefficient in the Respect model, as expected.
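As a rough illustration of how I read those coefficients, here is a minimal sketch using a flat OLS instead of their hierarchical specification, with statsmodels and entirely hypothetical data and column names; it only reproduces the sign pattern described above, not their actual estimates.

```python
# Toy regressions of respect and formality on an apologizing indicator.
# Data are fabricated so that apologizing raises Respect and lowers Formality,
# mirroring the sign pattern discussed above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "respect":     [0.8, 0.1, 0.9, 0.2, 0.7, 0.3],
    "formality":   [0.4, 0.6, 0.5, 0.7, 0.3, 0.8],
    "apologizing": [1, 0, 1, 0, 1, 0],
})

respect_model = smf.ols("respect ~ apologizing", data=df).fit()
formal_model = smf.ols("formality ~ apologizing", data=df).fit()

# Opposite signs: positive coefficient for Respect, negative for Formality.
print(respect_model.params["apologizing"], formal_model.params["apologizing"])
```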