Summary
In this paper, the authors discuss the acceptance of imperfect AI systems by human users. They, specifically, consider the case of email-scheduling assistants for their experiments. Three features are adopted to interface between users and the assistant: Accuracy Indicator (indicates expected accuracy of the system), Example-based Explanation design (explanation of sample test-cases for pre-usage development of human mental models) and Control Slider Design (to let users control the system aggressiveness, False Positive vs False Negative Rate).
The participants of the study completed a 6 step procedure. The procedures analyzed their initial expectations of the system. The evaluation showed that the features helped in setting the right expectation for the participants. Additionally, it concluded that systems with High Recall gave a pseudo-sense of higher accuracy than High Precision. The study shows that user expectations and acceptance can be satisfied through not only intelligible explanations but also tweaking model evaluation metrics to emphasize on one over another (Recall over Precision in this case).
Reflection
The paper explores user expectations of AI and their corresponding perceptions and acceptance. An interesting idea is tweaking evaluation metrics. In previous classes and a lot of current research, we discuss utilizing interpretable or explainable AI as the fix for the problem of user acceptance. However, the research shows that even simple measures such as tweaking evaluation metrics to prefer recall over precision can create a pseudo-sense of higher accuracy for users. This makes me question the validity of current evaluation techniques. Current metrics are statistical measures for deterministic models. The statistical measures directly correlated with user acceptance because of human comprehensibility of their behaviour. However, given the indeterministic nature of AI and its incomprehensible nature, old statistical measures may not be the right way to validate AI models. For our specific research/problems, we need to study end-user more closely and design metrics that correlate to the user demands.
Another important aspect is the human comprehensibility of AI systems. We notice from the paper that addition of critical features significantly increased user acceptance and perception. I believe there is a necessity for more such systems across the field. The features help the user expectation of the system and also help adoption of AI systems in real-world scenarios. The slider is a great example of manual control that could be provided to users to enable them to set their expectations from the system. Explanation of the system also helps develop human mental models so they can understand and adapt to AI changes faster. Example, search engines or recommender systems record information on users. If the users understand they store and utilize for their recommendation, they would modify their usage accordingly to fit the system requirements. This will improve system performance and also user experience. Also, it will lead to a sense of pseudo-deterministic nature in AI systems.
Questions
- Can such studies help us in finding relevance of evaluation metric in the problem’s context?
- Evaluation metrics have been designed as statistical measures. Has this requirement changed? Should we design metrics based on user experience?
- Should AI be developed according to human expectations or independently?
- Can this also be applied to processes that generally do not directly involve humans such as recommender systems or search engines?
Word Count: 557