02/05/20 – Mohannad Al Ameedi – Guidelines for Human-AI Interaction

Summary

In this paper, the authors suggest 18 design guidelines for building human-AI infused systems. These guidelines try to resolve the issues with many human-AI interaction systems that either don't follow guidelines or follow guidelines that have not been tested or evaluated. These issues include producing offensive, unpredictable, or dangerous results, which may cause users to stop using these systems, so proper guidelines are necessary. In addition, advances in the AI field have introduced new user demands in areas like sound recognition, pattern recognition, and translation. The 18 guidelines help users understand what AI systems can and cannot do and how well they can do it; show information relevant to the task at hand with the appropriate timing and in a way that fits the user's social and cultural context; make sure the user can request services when needed or dismiss unnecessary ones; offer explanations for why the system does certain things; maintain a memory of the user's recent actions and try to learn from them; and more. The guidelines went through different phases: consolidating existing guidelines, heuristic evaluation, a user study, and expert evaluation of the revisions. The authors hope that these guidelines will help build better AI-infused systems that can scale and work well with the increasing number of users and advances in AI algorithms and systems.

Reflection

I found the idea of putting together design guidelines very interesting, as it creates a standard that can help in building human-AI systems, help in evaluating or testing these systems, and serve as a baseline when building large-scale AI-infused systems to avoid the well-known issues associated with previous systems.

I also found the collection of academic and industrial guidelines interesting, since it was compiled from over 20 years of human-computer interaction work, which can be regarded as very valuable and rich information that can be used across different domains and fields.

I agree with the authors that some AI-infused systems that do not follow certain guidelines are confusing, ineffective, and sometimes counterproductive when their suggestions or recommendations are irrelevant. That explains why some AI-enabled or AI-infused systems were popular at certain times but could not satisfy user demands and eventually stopped being used.

Questions

  • Are these guidelines followed in Amazon Mechanical Turk?
  • The authors mention that there is a tradeoff between generality and specialization; what tradeoff factors do we need to consider?

02/05/2020 – Donghan Hu – Guidelines for Human-AI Interaction

Guidelines for Human-AI Interaction

In this paper, the authors focus on the problem that human-AI interaction research needs to be revisited in light of advances in this growing technology. Accordingly, the authors propose 18 generally applicable design guidelines for the design and study of human-AI interactions and test the validity of these guidelines through a user study involving 49 participants. The 18 guidelines are: 1) make clear what the system can do, 2) make clear how well the system can do what it can do, 3) time services based on context, 4) show contextually relevant information, 5) match relevant social norms, 6) mitigate social biases, 7) support efficient invocation, 8) support efficient dismissal, 9) support efficient correction, 10) scope services when in doubt, 11) make clear why the system did what it did, 12) remember recent interactions, 13) learn from user behavior, 14) update and adapt cautiously, 15) encourage granular feedback, 16) convey the consequences of user actions, 17) provide global controls, and 18) notify users about changes. After the user study, the authors revised the guidelines and had experts evaluate the revisions.

After reading this paper, I am somewhat surprised that the authors could propose 18 guidelines for human-AI interaction design. I am most interested in the category of "During interaction". This discussion focuses on factors such as time, context, personal data, and social norms. In my opinion, providing users with specific services that can assist their interactions should also be considered in this part, for example, accessibility and assistive features. In addition, considering social norms is a great idea. Individuals who use an AI system come from many kinds of backgrounds, with different abilities and ethics. We cannot treat every person with the same design of applications and systems. Allowing users to design their preferred user interfaces, features, and functions within one general system is a promising but challenging research question, and I think it is a promising topic for the future. At present, many applications and systems allow users to customize their own features on top of the provided default settings. Players can design their own models for games on the Steam platform, and in Google Chrome, users can design their own theme based on their motivations and goals. I believe this feature can be achieved by multiple human-AI interaction systems later.

Among these 18 different guidelines, I notice that an AI application does not have to satisfy all of them. Hence, do some of the guidelines have higher priority than others? Or, in the design process, should researchers treat each of them equally?

In your opinion, which guidelines do you consider more important and will focus on in the future? Or which guidelines might you have overlooked in previous research?

In this paper, the authors mention the tradeoff between generality and specialization. How do you think this problem should be addressed?

Will these guidelines become useless due to the increase of specialization in various kinds of applications and systems in the future?

2/5/20 – Lee Lisle – Guidelines for Human-AI Interaction

Summary

The authors (of which there are many) go over the various HCI-related findings for human-AI interaction and categorize them into eighteen different guidelines over four categories (based on when the user encounters the AI assistance). The work makes sure the reader knows it drew from the past twenty years of research: a review of industry guidelines, articles and editorials in the public domain, and a (non-exhaustive) survey of scholarly papers on AI design. In all, they found 168 guidelines, on which they then performed affinity diagramming (filtering out concepts that were too "vague"), resulting in twenty concepts. Eleven members of their team at Microsoft then performed a modified discount heuristic evaluation (where they identified an application and its issues) and refined their guidelines with that data, resulting in 18 rules. Next, they performed a user study with 49 HCI experts, where each was given an AI tool and asked to evaluate it. Lastly, they had experts validate the revisions made in the previous phase.

Personal Reflection

These guidelines are actually quite helpful in evaluating an interface. As someone who has performed several heuristic evaluations in a non-class setting, having defined rules whose violations can be easily determined makes the process significantly quicker. Nielsen's heuristics have been the gold standard for perhaps too long, so revisiting the creation of guidelines is ideal. It also speaks to how new this paper is, being from 2019's CHI conference.

Various things surprised me in this work. First, I was surprised that they stated contractions weren't allowed in their guidelines because they weren't clear. I haven't heard that complaint before, and it seemed somewhat arbitrary. A contraction doesn't change a sentence much ("doesn't" in this sentence is clearly "does not"), but I may be mistaken here. I also found the tables in Figure 1 hard to read, as if they were a bit too information-dense to clearly impart the findings. I was also surprised by their example for guideline 6, as suggesting personal pronouns while kind of stating there are only two is murky at best (I would have used a different example entirely). Lastly, the authors completely ignored the suggestion of keeping the old guideline 15, stating their own reasons despite the experts' preferences.

I also think this paper will be a valuable resource for future AI development. In particular, it can give a lot of ideas for our semester project. Furthermore, these guidelines can help early in the process of designing future interactions, as they can refine and correct interaction mistakes before many of these features are implemented.

Lastly, I thought it was amusing that the "newest" member of the team got a shout-out in the acknowledgements.

Questions

  1. The authors bring up trade-offs as being a common occurrence in balancing these (and past) guidelines. Which of these guidelines do you think is easier or harder to bend?
  2. The authors ignored the suggestion of their own panel of experts in revising one of their guidelines. Do you think this is appropriate for this kind of evaluation, and why or why not?
  3. Can you think of an example of one of these guidelines not being followed in an app you use? What is it, and how could it be improved?

02/05/2020 – Guidelines for Human-AI Interaction – Subil Abraham

Reading: Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–13. https://doi.org/10.1145/3290605.3300233

With AI and ML making their way into every aspect of our electronic lives, it has become pertinent to examine how well they function when faced with users. To do that, we need a set of rules or guidelines we can use as a reference to identify whether the interaction between a human and an AI-powered feature is actually functioning the best way it should. This paper aims to fill that gap, collating the knowledge of 150 recommendations for human-AI interfaces and distilling it down into 18 distinct guidelines that can be checked for compliance. The authors also go through a process of refining and tailoring these guidelines to remove ambiguity through heuristic evaluations, where experts try to match the guidelines to sample interactions and identify whether the interaction adheres to or violates a guideline, or whether the guideline is relevant to that particular interaction at all.

  • Though it’s only mentioned in a small sentence in the Discussion section, I’m glad that they point out and acknowledge that there is a tradeoff between being very general (at which point the vocabulary you devise is useless and you have to start defining subcategories), and being very specific (at which point you need to start adding addendums and special cases willy-nilly). I think the set of guidelines in this paper does a good job of trying to strike that balance.
  • I do find it unfortunate that they anonymized the products they used to test interactions on. Maybe this is just standard practice in this kind of HCI work, to avoid dating the paper by naming the exact products evaluated. It probably makes sense: this way they control the narrative and can simply talk about each application in terms of the feature and interaction tested. It also avoids having to grapple over which version of the application they used on which day, because applications get updated all the time, and violations might get patched and fixed, so the application would no longer be a good example of a guideline adherence or violation noted earlier.
  • It is kind of interesting that a majority of the experts in phase 4 preferred the original version of guideline 15 (encourage feedback) as opposed to the revised version (provide granular feedback) that was successful in the user study. I wish they had explained or speculated why that was.
  1. Why do you think the experts in phase 4 preferred the original version of guideline 15 over the revised version, even though the revised version was demonstrated to cause less confusion with guideline 17?
  2. Are we going to see even more guidelines, or a revision of these guidelines, 10 years down the line when AI-assisted applications become even more ubiquitous?
  3. As the authors pointed out, the current ethics related guidelines (5 and 6) may not be sufficient to cover all the ethical concerns. What other guidelines should there be?

02/05/2020 – Bipasha Banerjee – Guidelines for Human-AI Interaction

Summary

The paper was published at the ACM CHI Conference on Human Factors in Computing Systems in 2019. The main objective of the paper was to propose 18 general design guidelines for human-AI interaction. The authors consolidated more than 150 design recommendations from multiple sources into a set of 20 guidelines and then revised them down to 18. They also performed a user study with 49 participants to evaluate the clarity and relevance of the guidelines. The entire process was done in four phases: consolidating the guidelines, a modified heuristic evaluation, a user study, and an expert evaluation of the revisions. For the user study, they recruited people from an HCI background with at least a year of experience in the HCI domain, who evaluated all the guidelines based on their relevance, clarity, and needed clarifications. They then had experts review the revisions, which helped in detecting problems related to wording and clarity. The experts were people with work experience in UX or HCI who were familiar with heuristic evaluations; eleven were recruited, and they preferred the revised versions in most cases. The paper highlights that there is a tradeoff between specialization and generalization.

Reflection

The paper did an extensive survey of existing AI-related designs and proposed 18 applicable guidelines. This is an exciting way to reduce 150 current research ideas to 18 general principles. I liked the way they approached the guidelines based on clarity and relevance. It was interesting to see this paper reference "Principles of Mixed-Initiative User Interfaces", published in 1999 by Eric Horvitz. The only thing I was not too fond of is that the paper was a bit of a monotonous read across all the guidelines. Nonetheless, the guidelines are extremely useful for developing a system that aims to use human-AI interaction effectively. I liked that they used both users and experts to evaluate the guidelines, which suggests the evaluation process is dependable. I do agree with the tradeoff aspect: to make a guideline more generally usable, the specialization aspect is bound to suffer. It was interesting to learn that the latest AI design guidance is more dominantly found in industry, as industry sources have the most up-to-date guidelines about AI design. However, no concrete evidence was produced in the paper to support this claim.

Questions

  1. They mention that "errors are common in AI systems". What kind of errors are they referring to? What percentage of errors do these systems encounter on average?
  2. Was there a way to incorporate ranking of guidelines? (During both the user evaluation as well as the expert evaluation phase)
  3. The paper indicates that "the most up-to-date guidance about AI design were found in industry sources". Is this because the authors are biased in their opinion, or do they have concrete evidence for this statement?

02/05/2020 – Palakh Mignonne Jude – Guidelines for Human-AI Interaction

SUMMARY

In this paper, the authors propose 18 design guidelines for human-AI interaction with the aim that these guidelines will serve as a resource for practitioners. The authors codified over 150 AI-related design recommendations and then, through multiple phases of refinement, modified this list into 18 generally applicable principles. As part of the first phase, the authors reviewed AI products, public articles, and relevant scholarly papers. They obtained a total of 168 potential guidelines, which were then clustered to form 35 concepts. This was followed by a filtering process that reduced the number of concepts to 20 guidelines. In phase 2, the authors conducted a modified heuristic evaluation attempting to identify both applications and violations of the proposed guidelines, using 13 AI-infused products/features as part of this evaluation study. This phase helped to merge, split, and rephrase different guidelines and reduced the total number of guidelines to 18. In the third phase, the authors conducted a user study with 49 HCI practitioners to understand whether the guidelines were applicable across multiple products and to obtain feedback about the clarity of the guidelines. The authors ensured that the participants had experience in HCI and were familiar with discount usability testing methods. Modifications were made to the guidelines based on the feedback obtained from the user study regarding the clarity and relevance of the guidelines. In the fourth phase, the authors conducted an expert evaluation of the revisions. These experts comprised people who had work experience in UX/HCI and were well-versed in discount usability methods. With the help of these experts, the authors assessed whether the 18 guidelines were easy to understand. After this phase, they published a final set of 18 guidelines.

REFLECTION

After reading the 1999 paper on 'Principles of Mixed-Initiative User Interfaces', I found the study performed in this paper to be much more extensive as well as more relatable, since the AI-infused systems considered were systems I had some knowledge about, as compared to the LookOut system, which I have never used. I felt that the authors performed a thorough comparison and included various important phases in order to formulate the best set of guidelines. I found it interesting that this study was performed by researchers from Microsoft 20 years after the original 1999 paper (also done at Microsoft). I believe the authors provided a detailed analysis of each of the guidelines, and it was good that they included identifying applications of the guidelines as part of the user study.

I felt that some of the violations reported by people were very well thought out; for example, a reported violation for a navigation product where an explanation was provided but inadequate: the 'best route' was suggested, but no criteria were given for why the route was the best. I feel that such notes provided by the users were definitely useful in helping the authors assemble good and generalizable guidelines.

QUESTION

  1. Which, in your experience, among the 18 guidelines did you find to be most important? Was there any guideline that appeared to be ambiguous to you? For those that have limited experience in the field of HCI, were there any guidelines that seemed unclear or difficult to understand?
  2. The authors mention that they do not explicitly include broad principles such as ‘build trust’, but instead made use of indirect methods by focusing on specific and observable guidelines that are likely to contribute to building trust. Is there a more direct evaluation that can be performed in order to measure building trust?
  3. The authors mention that it is essential that designers evaluate the influences of AI technologies on people and society. What methods can be implemented in order to ensure that this evaluation is performed? What are the long-term impacts of not having designers perform this evaluation?
  4. For the user study (as part of phase 3), 49 HCI practitioners were contacted. How was this done and what environment was used for the study?

02/05/20 – Fanglan Chen – Guidelines for Human-AI Interaction

The variability of current AI designs, as well as failures of automated inference – ranging from the disruptive or confusing to the more serious – calls for creating more effective and intuitive user experiences with AI. The paper "Guidelines for Human-AI Interaction" enriches the ongoing conversation on heuristics and guidelines toward human-centered design for AI systems. In this paper, Amershi et al. identified more than 160 potential recommendations for human-AI interaction from respected sources ranging from scholarly research papers to blog posts and internal documents. Through a four-phase framework, the research team systematically distilled and validated the candidate guidelines into a unified set of 18. This work empowers the community by providing a resource for designers working with AI and facilitates future research into the refinement and development of principles for human-AI interaction.

The proposed 18 guidelines are grouped into four sections that prescribe how an AI system should behave upon initial interaction, as the user interacts with the system, when the system is wrong, and over time. As far as I can see, the major research question is how to control automated inferences when they are performing under uncertainty. We can imagine that it would be extremely dangerous in scenarios where humans are unable to intervene when AI makes incorrect decisions. Take autonomous vehicles, for example: AI may behave abnormally in real-world situations it has not faced in training. How to integrate efficient dismissal or correction is an important question to consider in the initial design of an autonomous system.

Also, we need to be aware that while the guidelines for human-AI interaction are developed to support design decisions, they are not intended to be used as a simple checklist. One of the important intentions is to support and stimulate conversations between user experience and engineering practitioners that lead to better AI design. Another takeaway from this paper is that there will always be numerous situations where AI designers must consider trade-offs among guidelines and weigh the importance of one or more over others. Beyond the four-phase framework presented in the paper, I think there are at least two points worth discussing. Firstly, the four-phase framework is more of a narrowing-down process, and no open-ended questions are raised in the feedback loop. The functioning and goals of apps in different categories may vary, and rising capabilities and use cases may suggest a need for additional guidelines. As AI design advances, we may need more innovative ideas about future AI design rather than being constrained to the existing guidelines. Secondly, it seems all the evaluators who participated in the user study are in the HCI domain, and a number of them have years of experience in the field. I wonder whether the opinions of end users without HCI experience should be considered as well, and how wider involvement would impact the final results. I think the following questions are worthy of further discussion.

  • Which of the 18 proposed design guidelines are comparatively difficult to employ in AI designs? Why?
  • Besides the proposed guidelines, are there any design guidelines worthy of attention but not discussed in the paper?
  • Some of the guidelines seem to be of greater importance than others in user experience of specific domains. Do you think the guidelines need to be tailored to the specific categories of applications?
  • In the user study, do you think it would be important to include end users who actually use the app but have no experience studying HCI?

2/5/2020 – Jooyoung Whang – Guidelines for Human-AI Interaction

The paper is a good distillation of the various design recommendations for human-AI interaction systems that have been collected over more than 20 years since the rise of AI. The authors ran four iterations of filtering to end up with a final set of 18 guidelines that have been thoroughly reviewed and used. Their data comes from commercial AI products, user reviews, and related literature. Across these iterations, the authors:

1. extracted the initial set of guidelines,
2. reduced their number via internal evaluation,
3. performed a user study to verify relevance and clarity, and
4. tested the guidelines with experts in the field.

The authors provide a nicely summarized table containing all the guidelines and their examples. Rather than going in-depth about the resulting guidelines themselves, the authors focus more on the process and feedback that they received. The authors conclude by stating that the provided guidelines are mostly for general design cases and not specific ones.

When I was examining the guideline table, I liked how it was divided into four cases in the design iteration. In a usability engineering class that I took, I learned that a product's design lifecycle consists of Analyze, Design, Prototype, and Evaluate, in that order (and can repeat). I could see that the guidelines focus a lot on Analyze, Design, and Evaluate. It was interesting that prototyping wasn't strongly implied in the guidelines. I assume this may be because the whole design iteration was considered a pass of prototyping, or because it is too hard to create a low-fidelity prototype of a system involving artificial intelligence. The reason for going through a prototyping process is to quickly filter out what works and what doesn't; since the nature of artificial intelligence requires extensive training and complicated reasoning, a pass of prototyping will accordingly take longer than for other kinds of products.

It is very interesting that the guidelines (for the long term) instruct that the AI system must inform users of its actions. In my experience using AI systems such as voice recognition before I knew about machine learning techniques, the system mostly appeared as a black box. I have also observed many people who intentionally avoided these kinds of systems out of suspicion. I think revealing portions of information and giving control to the users is a very good idea; it will allow more people to adjust to the system quickly.

The following are the questions that came up for me while reading the paper:

1. As in my reflection, it is expensive to go through an entire design process for human-AI systems. Would there be a good workaround for this problem?

2. How much control do you think is appropriate to give to the users of the system? The paper mentions informing how the system will react to certain user actions and allowing the user to choose whether or not to use the system. But can we and should we allow further control?

3. The paper focuses on general cases of designing human-AI systems. They note that they’ve intentionally left out special cases. What kinds of special systems do you think will not need to follow the guidelines?

02/05/20 – Myles Frantz – Guidelines for Human-AI Interaction

Through this paper, the various Microsoft authors created and survey-tested a set of guidelines (or best practices) for designing and creating human-AI interactions. In their study, they went through 150 AI design recommendations, ran their initial set of guidelines through a strict set of heuristics, and finally conducted multiple rounds of a user study consisting of 49 moderately experienced HCI practitioners (with at least 1 year of self-reported experience). The resulting 18 guidelines fell into the categories "Initially" (at the start of the interaction), "During interaction", "When wrong" (the AI system), and "Over time". These categories include guidelines such as (but not limited to): "Make clear what the system can do", "Support efficient invocation", and "Remember recent interactions". Throughout the user study, these guidelines were tested for how relevant they would be in specific avenues of technology (such as navigation and social networks). In these ratings, at least 50% of the respondents thought the guidelines were clear, while approximately 75% of the respondents thought the guidelines were at least neutral (or all right to understand). Finally, a set of HCI experts was asked to ensure that further revisions of the guidelines were accurate and better reflected the area.

I agree with and really appreciate the insight into the relevancy testing of each guideline in each section of industry. Not only does this help avoid misappropriation of guidelines into unintended sections, it also helps create a guideline for the guidelines. This will help ensure that people implementing this set of guidelines have a better idea of where they can best be used.

I also agree with and like the thorough testing that went into the vetting process for these guidelines. In last week's readings, it seemed the surveys were mostly or solely based on reviews of papers and were subjective to the authors. Having various rounds of testing with people who have a generally high average of experience within the field lends great support to the guidelines.

  • One of my questions for the authors would be a post-mortem of the results and their impact upon the industry. Regardless of the citations, it would be interesting to see how many platforms integrate these guidelines into their systems and to what extent.
  • Following up on the previous question, I would like to see another paper (possibly a survey) exploring the different methods of implementation used throughout the different platforms. A comparison between the different platforms would help to better showcase and exemplify each guideline.
  • I would also like to see each of these guidelines run against a sample of expert psychologists to determine their effects in the long run. Along with what was described in the paper "Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research" as algorithm aversion ("a phenomenon in which people fail to trust an algorithm once they have seen the algorithm make a mistake"), I would like to see whether these guidelines would create an environment that makes the interaction so immersive that human subjects either completely reject it or completely accept it.

02/05/2020 – Nurendra Choudhary – Guidelines for Human-AI Interaction

Summary

In this paper, the authors propose a set of 18 guidelines for human-AI interaction design. The guidelines codify 150 AI-related design recommendations collected from diverse sources. Additionally, the authors validate the design from both the users' and the experts' perspectives.

For the users, the principles were evaluated by 49 HCI practitioners testing a familiar AI-driven feature of a product. The goal was to estimate the number of guidelines followed or not followed by the feature. The feedback form also had a "does not apply" field with a corresponding explanation field, and the review included a clarity component to surface ambiguity in the guidelines. From the empirical study, the authors were able to conclude that the guidelines were largely clear and hence could be applied to human-AI interactions. The authors revised the guidelines according to the feedback and conducted an expert review.

The guidelines are really suitable when deploying ML systems in the real world. Generally, in the AI community, researchers do not see any immediate concrete benefits from developing user-friendly systems. However, when such systems need to be deployed for real-world users, the user experience or human-AI interaction becomes a crucial part of the overall mechanism.

For the experts, the old and new guidelines were presented, and the experts agreed with the revised guidelines for all but one (G15). From this, the authors conclude that the review process was effective.

Reflection

Studying their applicability is really important (as the authors did in the paper), because I do not feel all of them are necessary for the diverse range of applications. It is interesting to notice that for photo organizers, most of the guidelines are already being followed, and that they received the most "does not apply" responses. Also, e-commerce seems to be plagued with issues. I think this is because of a gap in transparency: the AI systems in photo organizers are advertised to users and directly affect their decisions, whereas in e-commerce, the AI systems work in the background to influence user choices.

AI systems steadily learn new things, and what they learn is often not interpretable even by the researchers who built them, so I believe this is an unfair ask at present. However, as the AI research community pushes for increased interpretability, I believe it is possible and will definitely help users. Imagine if you could explicitly set the features attached to your profile to improve your search recommendations.

Similarly, "match relevant social norms" and "mitigate social biases" are not currently a focus, but I believe these will grow over time to form a dominant area of ML research.

I think we can use these guidelines as tools to diversify AI research into more avenues, focusing on building systems that inherently uphold these principles.

Questions

  1. Can we study the feasibility and cost-to-benefit ratio of making changes to present AI systems based on these guidelines?
  2. Can such principles be evaluated from the other perspective? Can we give better data guidelines for AI to help it learn?
  3. How frequently does the list need to evolve with the evolution of ML systems?
  4. Do users always need to know about changes in the AI? Consider interactive systems, where the AI learns in real time: wouldn't there be too many notifications for a human user to track? Would it become something like spam?
