Reading: Rafal Kocielnik, Saleema Amershi, and Paul N. Bennett. 2019. Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–14. https://doi.org/10.1145/3290605.3300641
Different parts of our lives are being infused with AI magic. With this infusion, however, come problems, because the AI systems being deployed aren't always accurate. Users are accustomed to software systems being precise and doing exactly the right thing, but they can't extend that expectation to AI systems, which are often inaccurate and make mistakes. It is therefore necessary for developers to set users' expectations ahead of time so that users are not disappointed. This paper proposes three different visual methods for setting users' expectations of how well an AI system will work: an indicator depicting accuracy, a set of examples demonstrating how the system works, and a slider that controls how aggressively the system should act. The system under evaluation is a detector that identifies and suggests potential meetings based on the language in an email. The goal of the paper isn't to improve the AI system itself, but rather to evaluate how well the different expectation-setting methods work given an imprecise AI system.
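Since the slider design essentially lets the user trade precision against recall, here is a minimal sketch of how such a control might be wired up. This is my own illustration, not the paper's implementation; the threshold mapping, detector scores, and labels below are all made up for demonstration.

```python
# A minimal sketch (not from the paper) of how a control slider could map to a
# detection threshold, trading precision for recall in a meeting detector.

def precision_recall(scores, labels, threshold):
    """Compute precision and recall when flagging emails whose score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def slider_to_threshold(slider):
    """Map a slider position in [0, 1] (0 = favor recall, 1 = favor precision)
    to a confidence threshold; the linear mapping is an arbitrary choice."""
    return 0.2 + 0.6 * slider

# Hypothetical detector confidence scores and ground-truth labels
# (True = the email really does contain a meeting request).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30]
labels = [True, True, False, True, False, False]

for slider in (0.0, 0.5, 1.0):
    t = slider_to_threshold(slider)
    p, r = precision_recall(scores, labels, t)
    print(f"slider={slider:.1f} threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Sliding toward the recall end lowers the threshold, so more emails get flagged (fewer missed meetings but more false suggestions); sliding toward the precision end raises it and does the opposite.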
I want to note that I really wanted to see an evaluation of the effects of mixed techniques. I hope it will be covered in future work they do, but I am also afraid that such work might never get published because it would be classified as incremental (unless they come up with more expectation-setting methods beyond the three mentioned in this paper and do a larger evaluation). It is useful to see that we now have numbers to back up the idea that, in certain scenarios, high-recall systems are perceived as more accurate. It makes intuitive sense that it would be more convenient to deal with false positives (just close the dialog box) than false negatives (having to manually create a calendar event). Also, seeing the control slider brings to mind the trick that some offices play where the climate control box is within easy reach of the employees but actually doesn't do anything. It's a placebo to make people think it got warmer or colder when nothing has changed. I realize that the slider in the paper is actually supposed to do what it advertises, but it makes me think of other places where a placebo slider could be given to a user to make them think they have control when in fact the AI system remains completely unchanged.
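To make the false-positive vs. false-negative asymmetry concrete, here is a back-of-the-envelope sketch. The effort costs are purely invented (one click to dismiss a bad suggestion vs. several steps to create a missed event by hand) and are not taken from the paper.

```python
# A back-of-the-envelope sketch (my own numbers, not from the paper) of why a
# high-recall setup can feel better when false negatives are costlier to fix.

COST_FALSE_POSITIVE = 1   # dismiss an unwanted meeting suggestion (one click)
COST_FALSE_NEGATIVE = 5   # create the missed calendar event by hand

def user_effort(false_positives, false_negatives):
    """Total effort the user spends correcting the detector's mistakes."""
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)

# Two detectors with the same number of mistakes, split differently.
print("high recall:   ", user_effort(false_positives=8, false_negatives=2))  # 18
print("high precision:", user_effort(false_positives=2, false_negatives=8))  # 42
```

With the same total number of mistakes, the high-recall split costs the user far less correction effort, which is consistent with the intuition above.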
- What other kinds of designs can be useful for expectation setting in AI systems?
- How would these designs look different for a more active AI system like medical prediction, rather than a passive AI system like the meeting detector?
- The paper claims that the results are generalizable for other passive AI systems, but are there examples of such systems where it is not generalizable?
Hi, Subil.
It is interesting that you bring up the placebo effect. It also brings me to a question. The placebo effect requires that the human mind strongly believe something will have an effect, and I am sure there is a certain threshold that breaks this "belief." For example, if the system always delivered 50% precision and 50% recall and the fake interface claimed to allow a 10% adjustment in either direction, I'm sure the placebo effect would work. But what if it were 50%? Wouldn't the user easily notice that the interface is fake? I am curious about what this threshold might be.
Related to the placebo effect, I also wonder if intentionally reporting a lower accuracy than the system actually achieves would affect the user experience. Since the user will have lower expectations, will they have a better experience?
In regard to your question 2, I think these designs would serve as a kind of warning to the user. Since users will know how the AI performs, they will adjust the trust they place in the AI and make appropriate decisions.