Summary
In this paper, the authors study the effect of updating an AI system on human-AI team performance. The study focuses on decision-making settings where a user chooses between accepting the AI system's recommendation and making the decision manually. The authors refer to the expectations a user builds over the course of using the system as a mental model. Updating the AI system to improve its accuracy can disturb this mental model and decrease the overall performance of the team. The paper gives two examples: a readmission system used by doctors to predict whether a patient will be readmitted, and another system used by judges, and it shows the negative impact of system updates in both cases. The authors also propose a platform in which users perform an object-recognition task with AI assistance, build a mental model of the AI, and receive rewards; the feedback collected is used to improve the overall system, taking into account both the AI system's performance and its compatibility with the user's mental model.
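To make the performance/compatibility tradeoff concrete for myself, here is a minimal sketch of one plausible way to quantify compatibility between two model versions, assuming it is measured as the fraction of examples the previous model classified correctly that the updated model also classifies correctly. The function names and the toy data are my own illustration, not taken from the paper.

```python
import numpy as np

def compatibility_score(y_true, old_pred, new_pred):
    """Fraction of examples the old model got right that the new model also gets right.

    A score of 1.0 means the update never introduces an error on cases the
    user has (presumably) learned to trust.
    """
    old_correct = old_pred == y_true
    if old_correct.sum() == 0:
        return 1.0  # vacuously compatible: the old model was never correct
    both_correct = old_correct & (new_pred == y_true)
    return both_correct.sum() / old_correct.sum()

def accuracy(y_true, pred):
    return (pred == y_true).mean()

# Toy example: the update raises accuracy (5/8 -> 7/8) but breaks one case
# the old model handled correctly, so compatibility drops to 0.8.
y_true   = np.array([1, 0, 1, 1, 0, 1, 0, 0])
old_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1])
new_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])
print(accuracy(y_true, new_pred), compatibility_score(y_true, old_pred, new_pred))
```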
Reflection
I found the idea of compatibility very interesting. I always thought that the AI model's performance on the validation set was the only factor that should be taken into consideration, and I never considered the negative effect an update might have on the user's experience or mental model. Now I can see that the performance/compatibility tradeoff is key to deploying a successful AI agent.
At first, I thought the word compatibility was not the right term for this subject. My understanding was that compatibility in software systems refers to making sure a newer version of the system still works across different versions of the operating system; now I think the user plays a role similar to the operating system when dealing with the AI agent.
Updating the AI system is similar to updating the user interface of an application, where users might dislike a newly added feature or the new way the system handles a task.
Questions
- The authors mention the patient readmission and judge examples to demonstrate how an AI update might affect users. Are there any other examples?
- The authors propose a platform that collects user feedback, but not in a real-world setting. Could we build a platform that collects feedback at run time using reinforcement learning, where a reward is computed from each user action and the system adjusts whether to serve the current model or the previous one? (See the sketch after these questions.)
- If we want to use crowd-sourcing to improve the performance/compatibility of the AI system, the challenge will be building a mental model for each user, since different users will take on different tasks and we have no control over assigning the same worker every time. Is there any idea that can help in using crowd-sourcing to improve the AI agent?
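To make the run-time feedback question above more concrete, here is a minimal sketch of what I have in mind, framed as a simple epsilon-greedy bandit where each "arm" is a model version and the reward reflects whether the human-AI decision made with that version's recommendation turned out to be correct. All names, the reward definition, and the simulated usage loop are hypothetical and not taken from the paper.

```python
import random

class ModelVersionSelector:
    """Epsilon-greedy bandit over model versions (e.g. 'previous' vs 'current')."""

    def __init__(self, versions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {v: 0 for v in versions}
        self.values = {v: 0.0 for v in versions}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, version, reward):
        # Incremental mean update of the chosen version's estimated reward.
        self.counts[version] += 1
        n = self.counts[version]
        self.values[version] += (reward - self.values[version]) / n

# Hypothetical usage inside the deployed system's decision loop:
selector = ModelVersionSelector(["previous_model", "current_model"])
for _ in range(1000):
    version = selector.select()
    # ... serve this version's recommendation and observe the user's decision;
    # here the observed reward is simulated for illustration only.
    reward = 1.0 if random.random() < (0.7 if version == "current_model" else 0.6) else 0.0
    selector.update(version, reward)
print(selector.values)
```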