SUMMARY
In this paper, the authors discuss the impact that updates to an AI model can have on overall human-machine team performance. They describe the mental model that a human develops over the course of interacting with an AI system and how that mental model is disrupted when the AI system is updated. They introduce the notion of ‘compatible’ AI updates and propose a new training objective that penalizes new errors (errors made by the updated model that the original model did not make). The authors introduce terms such as ‘locally-compatible updates’, ‘compatibility score’, and ‘globally-compatible updates’. They perform experiments in high-stakes domains such as recidivism prediction, in-hospital mortality prediction, and credit risk assessment. They also develop a platform for studying human-AI teams called CAJA, a web-based game which, according to the authors, is designed so that no human is a task expert. CAJA enables designers to vary different parameters, including the number of human-visible features, AI accuracy, reward function, etc.
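To make the idea of ‘new errors’ concrete, here is a minimal sketch of how a compatibility score could be computed. It assumes the score is the fraction of the old model's correct predictions that the updated model also gets right; the paper's exact definition may differ, and the function names and toy labels below are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def new_errors(y_true: Sequence[int],
               old_pred: Sequence[int],
               new_pred: Sequence[int]) -> List[int]:
    """Indices where the old model was right but the updated model is wrong."""
    return [
        i
        for i, (y, old, new) in enumerate(zip(y_true, old_pred, new_pred))
        if old == y and new != y
    ]


def compatibility_score(y_true: Sequence[int],
                        old_pred: Sequence[int],
                        new_pred: Sequence[int]) -> float:
    """Assumed definition: share of the old model's correct predictions
    that the updated model also predicts correctly."""
    old_correct = sum(1 for y, old in zip(y_true, old_pred) if old == y)
    if old_correct == 0:
        # Assumption: with nothing to preserve, treat the update as compatible.
        return 1.0
    broken = len(new_errors(y_true, old_pred, new_pred))
    return (old_correct - broken) / old_correct


# Toy data (made up): the update is more accurate overall (4/5 vs 3/5)
# but breaks one case the human had learned to trust.
y_true = [1, 0, 1, 1, 0]
old_pred = [1, 0, 0, 1, 1]
new_pred = [1, 1, 1, 1, 0]

print(new_errors(y_true, old_pred, new_pred))
print(compatibility_score(y_true, old_pred, new_pred))
```

In this toy case the updated model has higher accuracy yet a compatibility score below 1, which is the kind of update the summary above describes as potentially hurting team performance.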
REFLECTION
I found this paper very interesting, as I had never considered how updates to an AI system affect team performance. The idea of a mental model, as described by the authors, was new to me, since I had never thought about the human side of using AI systems that make recommendations. This paper reminded me of the affordances discussed in the paper ‘An Affordance-Based Framework for Human Computation and Human-Computer Collaboration’, in which humans and machines pursue a common goal and the collaboration leverages the strengths of both.
I thought it was good that they defined the notion of compatibility to include the human’s mental model. I agree that developers retraining AI models are prone to focusing on improving the model’s accuracy while ignoring the details of human-AI teaming.
I was also happy to read that the workers used as part of the study performed in this paper were paid on average $20/hour as per the ethical guidelines for requesters.
QUESTIONS
- The paper mentions the use of Logistic Regression and multi-layer perceptron. Would a more detailed study on the types of classifiers that are used in these systems help?
- Would ML models whose decisions are more interpretable have given better initial results and prevented the dip in team performance? In such cases, would providing a simple ‘change log’ (as is done in the case of other software applications) have helped prevent this dip in team performance, or would it still have been confusing to the humans interacting with the system?
- How were the workers selected for the studies performed on the CAJA platform? Were any specific criteria used to select them? Would the qualifications of the workers have affected the results in any way?
Hi Palakh!
Great reflections and discussion points. Regarding the second discussion point, “Would ML models that have better interpretability for the decisions made have given better initial results and prevented the dip in team performance? In such cases, would providing a simple ‘changelog’ (as is done in a case of other software applications), have aided in preventing this dip in team performance or would it have still been confusing to the humans interacting with the system?”, I had similar thoughts in mind while reading the paper.
I feel that describing the reason behind a particular decision taken by the AI would definitely help prevent the dip in team performance, since the human involved can then shape her/his mental model based on the reason provided and make an informed decision. I am not sure this would be conveyed by a simple changelog alone, since the human might not make much sense of it and, more often than not, changelogs are disregarded. Some form of explanation that conveys the reason in a fairly understandable form would do more to keep team performance from decreasing.