02/05/2020 – Bipasha Banerjee – Guidelines for Human-AI Interaction

Summary

The paper was published at the ACM CHI Conference on Human Factors in Computing Systems in 2019. The main objective of the paper was to propose 18 general design guidelines for human-AI interaction. The authors consolidated more than 150 design recommendations from multiple sources into a set of 20 guidelines and then revised them down to 18. They also performed a user study with 49 participants to evaluate the clarity and relevance of the guidelines. This entire process was done in four phases, namely, consolidating the guidelines, a modified heuristic evaluation, a user study, and an expert evaluation of the revisions. For the user study portion, they recruited people with an HCI background and at least a year of experience in the HCI domain. The participants evaluated all the guidelines based on their relevance and clarity and suggested clarifications. The authors then had experts review the revisions, which helped in detecting problems related to wording and clarity. The experts were people with work experience in UX or HCI who were familiar with heuristic evaluations. Eleven experts were recruited, and they preferred the revised versions of most of the guidelines. The paper highlights that there is a tradeoff between specialization and generalization.

Reflection

The paper did an extensive survey of existing AI-related design guidance and proposed 18 applicable guidelines. Reducing more than 150 existing design recommendations to 18 general principles is an exciting approach. I liked the way they evaluated the guidelines based on clarity and relevance. It was interesting to see how this paper referenced the “Principles of Mixed-Initiative User Interfaces”, published in 1999 by Eric Horvitz. The only thing I was not too fond of was that walking through all the guidelines made the paper a somewhat monotonous read. Nonetheless, the guidelines are extremely useful for developing a system that aims to support human-AI interaction effectively. I liked how they used both users and experts to evaluate the guidelines, which suggests the evaluation process is dependable. I do agree with the tradeoff aspect: to make a guideline more broadly usable, the specialization aspect is bound to suffer. It was interesting to learn that the most up-to-date guidance on AI design is found predominantly in industry sources. However, the paper produced no concrete evidence to support this claim.

Questions

  1. They mentioned that “errors are common in AI systems”. What kind of errors are they referring to? What percentage of errors do these systems encounter on average?
  2. Was there a way to incorporate a ranking of the guidelines during both the user evaluation and the expert evaluation phases?
  3. The paper indicates that “the most up-to-date guidance about AI design were found in industry sources”. Is this the authors’ own bias, or do they have concrete evidence to support this statement?
