Reflection #6 – [02/08] – [John Wenskovitch]

This pair of papers discussed computational mechanisms and studies for determining politeness, both in online communities (Danescu-Niculescu-Mizil et al.) and in interactions with law enforcement (Voigt et al.). In the DNM et al. paper, the authors build a politeness corpus from requests posted on Wikipedia and Stack Exchange, with Mechanical Turk workers annotating each request on a spectrum of politeness. They then build two classifiers to predict politeness, testing them both in-domain (training and testing data from the same source) and cross-domain (training on Wikipedia and testing on Stack Exchange, and vice versa). The Voigt et al. paper uses transcribed audio from Oakland Police Department body cameras for several studies of racial disparity in officer behavior, including measuring perceptions of respectfulness and politeness, identifying linguistic features to model respect, and measuring racial disparities in respect.
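To make the in-domain versus cross-domain distinction concrete, here is a minimal sketch of that evaluation setup using scikit-learn. The `load_requests` helper, the split size, and the binary polite/impolite labels are hypothetical stand-ins of my own, not the authors' released pipeline; the paper itself trains SVM classifiers, one over bag-of-words features and one over hand-crafted politeness-strategy features.

```python
# Sketch of in-domain vs. cross-domain politeness classification,
# assuming scikit-learn. load_requests() is a hypothetical loader
# returning request texts and binary polite/impolite labels derived
# from the Mechanical Turk scores (an assumption, not the paper's code).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

wiki_texts, wiki_labels = load_requests("wikipedia")      # hypothetical
se_texts, se_labels = load_requests("stackexchange")      # hypothetical

# Bag-of-words classifier, analogous to the paper's simpler baseline.
bow_clf = make_pipeline(CountVectorizer(), LinearSVC())

# In-domain: train and test on held-out data from the same source.
X_tr, X_te, y_tr, y_te = train_test_split(wiki_texts, wiki_labels,
                                          test_size=0.2)
bow_clf.fit(X_tr, y_tr)
in_domain_acc = bow_clf.score(X_te, y_te)

# Cross-domain: train on all of Wikipedia, test on Stack Exchange.
bow_clf.fit(wiki_texts, wiki_labels)
cross_domain_acc = bow_clf.score(se_texts, se_labels)

print(f"in-domain: {in_domain_acc:.3f}, "
      f"cross-domain: {cross_domain_acc:.3f}")
```

The gap between those two accuracy numbers is exactly the degradation at issue in my generalizability concern below.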

I have generalizability concerns about both papers because of choices made in data collection. In the DNM paper, both the bag-of-words classifier and the linguistically informed classifier performed worse than the human annotation accuracy they were benchmarked against. This was true both in-domain and cross-domain, though the cross-domain results are more relevant to this concern. As a result, I suspect that any new corpus added as a source will see classification rates closer to the cross-domain accuracy. Their focus on requests introduces a further bias: I suspect that a new corpus not centered on requests would perform similarly, if not worse. I also wonder whether classifier accuracy might improve if more than two sentences of text accompanied each request, providing additional contextual information.

The Voigt paper used only transcribed body camera audio from a single city police department, in a city well known for its crime rate. As a result, the results may not generalize to interactions with law enforcement in rural communities (where crime rates differ), near national borders (where demographics differ), or in safer communities (where criminal behavior is less prevalent). Further, officers' behavior may change when they know they are wearing body cameras. I'm curious whether patterns found in transcribed audio from police cruiser dashboard cameras (in situations when the officers aren't wearing body cameras) would look any better or worse than the results shown in this study.

In general, I also felt that the discussion sections were the most interesting parts of both papers. The DNM discussion looks at specific cases within the corpus, such as Wikipedia editors who become less polite once elected administrators and no longer need to be, despite being particularly polite in the period leading up to their election. The Voigt discussion notes that while the work demonstrates that racial disparities in levels of officer respect exist, the root causes of those disparities are less clear, making them an ideal target for a follow-up study on a broader range of interaction transcripts.

Another potential follow-up study to the Voigt paper could consider the effect of seasons on officer politeness. All of the Oakland Police Department data came from interactions that occurred in April. Are officers more polite when the weather is nicer, and less polite in the depths of winter? And if there are seasonal or weather-related changes, does the racial disparity grow or shrink?

I found the distributions in Figure 1 of the DNM paper intriguing. I'm curious why the Stack Exchange politeness scores seem to follow the Gaussian distribution you would expect, while the Wikipedia scores seem to plateau just above the mean. Understanding the difference between these distributions would be yet another interesting follow-up study. Is the difference a result of an inflated frequency of semi-polite interactions from editors trying to become administrators? Is it because language on Wikipedia is more formal than on the informal Stack Exchange? Or is there some other reason entirely? I'm curious to hear the thoughts of anyone else in the class.
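One quick way to probe that difference in shape would be to test each corpus's score distribution against a fitted normal. Below is a sketch assuming SciPy; the two score files are hypothetical arrays of per-request politeness scores, not artifacts released with the paper.

```python
# Sketch of a normality check on the two politeness-score distributions;
# the input files are hypothetical stand-ins, not data from the paper.
import numpy as np
from scipy import stats

wiki_scores = np.loadtxt("wiki_politeness_scores.txt")  # hypothetical file
se_scores = np.loadtxt("se_politeness_scores.txt")      # hypothetical file

for name, scores in [("Wikipedia", wiki_scores),
                     ("Stack Exchange", se_scores)]:
    # D'Agostino-Pearson test: a low p-value indicates departure
    # from normality.
    stat, p = stats.normaltest(scores)
    # Skew and excess kurtosis hint at *how* the shape departs: a
    # flat-topped plateau near the mean shows up as negative kurtosis.
    print(f"{name}: p={p:.3g}, skew={stats.skew(scores):.2f}, "
          f"kurtosis={stats.kurtosis(scores):.2f}")
```

If the Wikipedia plateau is real, I would expect it to surface as negative excess kurtosis even when the mean and variance look unremarkable.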

