In this paper, the Microsoft authors created and survey-tested a set of guidelines (or best practices) for designing human-AI interactions. In their study, they reviewed over 150 AI design recommendations, ran their initial set of guidelines through a strict heuristic evaluation, and then through multiple rounds of a user study consisting of 49 moderately experienced HCI practitioners (with at least one year of self-reported experience). The resulting 18 guidelines fall into four categories: “Initially” (at the start of interaction), “During interaction”, “When wrong” (i.e., when the AI system errs), and “Over time”. These categories include guidelines such as (but not limited to) “Make clear what the system can do”, “Support efficient invocation”, and “Remember recent interactions”. In the user study, the guidelines were rated for how relevant they would be to specific product areas (such as navigation and social networks). Across these ratings, at least 50% of respondents found the guidelines clear, while approximately 75% of respondents rated them at least neutral (i.e., reasonably easy to understand). Finally, a set of HCI experts reviewed further revisions of the guidelines to ensure they were accurate and better reflected the area.
I agree with and really appreciate the insight gained from testing each guideline's relevance to each sector of the industry. Not only does this help avoid misapplying guidelines to unintended domains, it also effectively creates a guideline for the guidelines. This will help ensure that people implementing this set of guidelines have a better idea of where each one can best be used.
I also appreciate the thorough testing that went into the vetting process for these guidelines. In last week's readings, the evaluations seemed to be based largely or solely on literature surveys and the authors' own subjective judgment. Having multiple rounds of testing with practitioners who have substantial experience in the field lends strong support to the guidelines.
- One of my questions for the authors would be about a post-mortem of the results and their impact on the industry. Beyond citation counts, it would be interesting to see how many platforms integrate these guidelines into their systems, and to what extent.
- Following up on the previous question, I would like to see another paper (possibly a survey) exploring the different implementation methods used across different platforms. A comparison between platforms would help to better showcase and exemplify each guideline.
- I would also like to see each of these guidelines evaluated with a sample of expert psychologists to determine their long-run effects. In light of what was described in the paper “Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research” as algorithm aversion (“a phenomenon in which people fail to trust an algorithm once they have seen the algorithm make a mistake”), I would like to see whether these guidelines could create an environment so immersive that human subjects either reject the system entirely or accept it completely.