2/26/20 – Jooyoung Whang – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

The paper presents research on fairness, Explainable Artificial Intelligence (XAI), and how explanations change people's judgments. The authors introduce a preprocessing method that reduces a dataset's bias with respect to known bias-inducing attributes. They also present four methods of explaining classification results: Sensitivity, Input-Influence, Case, and Demographic. Using different combinations of these configurations, AI classifications of the COMPAS data were presented to MTurk workers for feedback. As a result, the paper reports that case-based explanations were often seen as less fair than the other explanation methods. The authors also found that sensitivity explanations are the most effective at addressing unfairness. Finally, the paper shows that an evaluator's position on machine learning heavily impacts his or her reaction to a classifier's output and explanations.
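To make the sensitivity explanation concrete, below is a minimal sketch of how such a counterfactual-style explanation might be generated. The toy classifier, feature names, and wording are my own illustration, not the model or phrasing used in the paper.

```python
# Minimal sketch of a sensitivity-style explanation: change one attribute
# and report whether the classifier's prediction would change.
# The classifier below is a hypothetical stand-in, not the paper's model.

def toy_classifier(profile):
    """Hypothetical recidivism classifier; returns 'high risk' or 'low risk'."""
    score = 2 * profile["priors_count"] + (1 if profile["race"] == "African-American" else 0)
    return "high risk" if score >= 3 else "low risk"

def sensitivity_explanation(model, profile, attribute, alternative):
    """Explain a prediction by flipping a single attribute to an alternative value."""
    original = model(profile)
    counterfactual = model(dict(profile, **{attribute: alternative}))
    if counterfactual != original:
        return (f"If {attribute} were '{alternative}' instead of '{profile[attribute]}', "
                f"the prediction would change from '{original}' to '{counterfactual}'.")
    return f"Changing {attribute} to '{alternative}' would not change the prediction."

profile = {"priors_count": 1, "race": "African-American"}
print(sensitivity_explanation(toy_classifier, profile, "race", "Caucasian"))
# -> If race were 'Caucasian' instead of 'African-American', the prediction
#    would change from 'high risk' to 'low risk'.
```

An explanation of exactly this counterfactual form is what gave me the strong impression of racial bias discussed below.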

When I looked at the paper’s sample sensitivity explanation, it gave me a strong impression that the system was racist. I think many others would have had a similar thought, especially those without much knowledge of machine learning and regression. Because of this, I was concerned that some people might be pushed toward making the opposite decision from the one the AI made, as a knee-jerk reaction. This clearly adds another bias in the opposite direction. I believe an explanation should only give helpful information about the model rather than introduce bias of its own. Thinking of a possible solution, the authors could have phrased the same information differently. For example, instead of bluntly saying that the classifier would have made a different decision, the system could have reported the predicted probability of each label. This provides the same information while adding less obvious bias. Another solution would be to preprocess the data so that the bias is not there in the first place, as the authors suggested.
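As a rough illustration of that rephrasing, the sketch below formats per-label probabilities instead of stating a counterfactual outright. The probabilities and helper function are hypothetical, not output from the paper's system.

```python
# Hedged sketch: report the model's predicted probability for each label
# instead of a blunt counterfactual statement. Values are made up.

def probability_explanation(probabilities):
    """Format per-label probabilities as a neutral explanation string."""
    parts = [f"{p:.0%} {label}" for label, p in
             sorted(probabilities.items(), key=lambda kv: -kv[1])]
    return "Predicted probability of each outcome: " + ", ".join(parts)

# Example for a hypothetical profile:
print(probability_explanation({"high risk": 0.58, "low risk": 0.42}))
# -> Predicted probability of each outcome: 58% high risk, 42% low risk
```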

I liked the idea of comparing the subjects’ prior positions on using ML with their judgments of the classifier. This relates to a reflection I made last week, where I noted the possibility that people put more weight on the cases where the model makes a wrong decision. As I expected, the paper reported that prior positions do in fact make a large difference in a user’s judgment. Addressing this issue would require either building more trust with the users or designing the software to effectively serve both kinds of users.

The following are the questions I had while reading the paper:

1. Is there a possibility that preprocessing the data would add bias instead of removing it? What if an attribute that was thought to be unneeded for classification was actually crucial to the judgment?

2. The authors state that one of the limitations of their study is conducting it with MTurk workers and not the actual users of the software. Do you think this was really a limitation? The attributes used for the classifier and explanations in their experiment seemed general enough for non-professionals to make a meaningful judgment.

3. If you were to design a classifier with an explanation model, which explanation method would you pick? (Out of Sensitivity, Input-Influence, Case, and Demographic) What do you like about the chosen method?

2 thoughts on “2/26/20 – Jooyoung Whang – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment”

  1. Interesting reflections! I particularly like the concept of unintentionally introducing a bias in the opposite direction. It is highly possible that the user might choose the exact opposite of what the model suggests purely because of the obvious bias. Regarding the third question, I would probably use a combination of Sensitivity and Input-Influence explanations, since these give users a clearer picture of which attributes specifically influenced the result, thereby exposing the implementation details to a certain extent, and also help bring out the inherent biases in the system.

  2. You’ve pointed out an interesting thing: the output of the sensitivity explanation could bias people in the other direction. I agree with your point that rephrasing may be necessary. At the same time, I think the sensitivity explanation uses that phrasing only in situations where the African American profile differs from a Caucasian profile in nothing but race, yet the system classified the person as more likely to re-offend. It may ultimately have been a bad example to use. Maybe it would have been useful to show different examples for different profiles as a comparison.
