Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky,
Jure Leskovec, and Christopher Potts, “A computational approach to politeness with application to social factors”
Summary:
In this paper, the authors introduce a computational framework for detecting and describing politeness. They built a politeness corpus by annotating requests (which embed politeness) from two platforms, Wikipedia and Stack Exchange. They then trained a couple of classifiers on the Wikipedia requests in that corpus and tested them on the annotated Stack Exchange requests. The classifiers achieved near-human accuracy. Finally, they used those models to classify the rest of the requests and discussed the findings from the classified data in light of social theory relating social outcomes, power, and politeness.
Reflection:
- I appreciate all the statistical analysis and validation the authors did in this work. I think this paper gives a lot of statistical guidance to those who need to perform similar linguistic analyses of related problems.
- I wish the real world followed these theories, where being polite is appreciated and rewarded.
- I also suggest that the authors study the correlation, on Stack Exchange, between the politeness of a question and the number of answers/responses it receives. I expect we would find a strong positive correlation between the two.
- Although the authors worked hard gathering, annotating, and analyzing these requests, I think there is a big shortcoming in their work: the number of annotated requests. About 11,000 of the 409,000 gathered requests were annotated, which is only about 2.7%. They used that 2.7% of the data to train and test their models, then used those models to classify the rest (about 97.3%), which is what their analysis is based on. They mention in the Human Validation paragraph that they turned to human annotation to validate their methodology, but they do not say how many requests were validated. I am skeptical about the amount of annotated data, and I think they should increase the annotated set to a reasonable percentage and then redo their analysis.
- I do not find Table 8 useful in this paper, as I cannot see any association between programming languages and politeness. I wish the authors had given more explanation of that table.
- I admire how the authors employed politeness theory to explain the findings of their analysis. I believe readings and courses in sociology and psychology are crucial for the Social Computing course; otherwise it would be just data analytics without any social insight.
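
To make the suggestion about politeness and the number of answers more concrete, here is a minimal sketch of how that correlation could be checked. It is not from the paper: the politeness scores and answer counts are made-up placeholder values, and using Spearman rank correlation (rather than Pearson) is my own assumption, chosen because answer counts on Q&A sites are likely to be skewed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one politeness score and one answer count per question.
politeness_scores = [0.12, 0.45, 0.51, 0.78, 0.83, 0.95]
answer_counts = [0, 1, 1, 3, 2, 5]

rho = spearman(politeness_scores, answer_counts)
print(f"Spearman correlation (placeholder data): {rho:.2f}")
```

With real data, the politeness scores would come from the authors' classifier and the answer counts from the Stack Exchange dump. A value near +1 would support the positive correlation I expect, but it would not by itself show that politeness causes more answers.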