Summary:
“Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment” by Dodge et al. examines explainable machine learning and how explanations shape judgments of fairness. The authors conduct an empirical study with roughly 160 Amazon Mechanical Turk workers. They demonstrate that certain explanation styles are considered “inherently less fair,” while others may help enhance people’s confidence in the fairness of an algorithm. They also discuss how different kinds of explanations expose different fairness issues: (i) model-wide fairness and (ii) case-specific fairness discrepancies. Further, they show that people react differently to different styles of explanation depending on individual differences, and they conclude with a discussion of how to provide personalized and adaptive explanations. The paper notes that there are at least 21 different definitions of fairness; in general, “…discrimination is considered to be present if for two individuals that have the same characteristic relevant to the decision making and differ only in the sensitive attribute (e.g., gender/race) a model results in different decisions”. Disparate impact is a consequence of deploying unfair models, in which one protected group is affected negatively compared to the unprotected group. Overall, the paper studies the explanations given for machine learning models and how such explanations can make a model appear inherently fair or unfair.
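To make the disparate-impact notion concrete, here is a minimal sketch (not taken from the paper) that computes the commonly used disparate impact ratio, i.e., the favorable-outcome rate of the unprivileged group divided by that of the privileged group. The toy data, variable names, and the 80% threshold mentioned in the comment are assumptions for illustration only.

```python
# Illustrative sketch of the disparate-impact idea; data and names are hypothetical.
import numpy as np

def disparate_impact_ratio(decisions: np.ndarray, group: np.ndarray) -> float:
    """Ratio of favorable-outcome rates: unprivileged group vs. privileged group.

    decisions: 1 = favorable outcome, 0 = unfavorable outcome.
    group:     0 = unprivileged group, 1 = privileged group (hypothetical labels).
    """
    rate_unprivileged = decisions[group == 0].mean()
    rate_privileged = decisions[group == 1].mean()
    return rate_unprivileged / rate_privileged

# Toy data: 10 decisions for each group.
decisions = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0,   # unprivileged group
                      1, 1, 0, 1, 1, 0, 1, 1, 0, 1])  # privileged group
group = np.array([0] * 10 + [1] * 10)

ratio = disparate_impact_ratio(decisions, group)
print(f"Disparate impact ratio: {ratio:.2f}")  # values below ~0.8 are often flagged as disparate impact
```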
Reflections:
The researchers attempt to answer three primary research questions. First, how do different styles of explanation impact fairness judgment of an ML system? They study in depth whether certain explanation styles are more effective at surfacing unfairness in a model, and whether some explanations are perceived as inherently fairer. Second, how do individual factors in cognitive style and prior position on algorithmic fairness impact fairness judgment with regard to different explanations? Lastly, what are the benefits and drawbacks of different explanations in supporting fairness judgment of ML systems? The researchers compare explanations based on input influence, demographics, and sensitive features, as well as case-based explanations. The authors conduct an online survey and ask participants different questions; however, an individual’s background might also influence the answers given by the Mechanical Turk workers. The authors perform both a qualitative and a quantitative analysis. One major limitation of this work is that the judgments were made by crowd workers with limited experience, whereas in real life such decisions are made by lawyers and judges. Additionally, the authors could have used LIME (Local Interpretable Model-Agnostic Explanations) or SHAP (SHapley Additive exPlanations) to offer post-hoc explanations, as sketched below. The authors also did not study an important element, participants’ confidence, as they did not control for it.
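As a concrete illustration of the post-hoc explanations suggested above, the sketch below shows how LIME could produce a case-specific, input-influence style explanation. The dataset and classifier (scikit-learn’s breast-cancer data and a random forest) are stand-ins chosen to keep the example self-contained; they are not the recidivism-prediction setting studied in the paper.

```python
# Hedged sketch: post-hoc, case-specific explanation with LIME on a stand-in dataset.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# LIME fits a simple local surrogate model around one instance, yielding a
# case-specific, input-influence style explanation of the black-box prediction.
explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    discretize_continuous=True,
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())  # top features and their local weights for this one case
```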
Questions:
1. Which other models are unfair? Can you give some examples?
2. Are race and gender the only sensitive attributes? Can models discriminate based on some other attribute? If yes, which ones?
3. Who is responsible for building unfair ML models?
4. Are explanations of unfair models enough? Does that build enough confidence in the model?
5. Can you think of any adverse effects of providing model explanations?