02/05/2020 – Donghan Hu – Guidelines for Human-AI Interaction

Guidelines for Human-AI Interaction

In this paper, the authors address the problem that human-AI interaction design needs updated guidance in light of advances in this growing technology. Accordingly, they propose 18 generally applicable design guidelines for the design and study of human-AI interactions. Based on a user study involving 49 participants, the authors test the validity of these guidelines. The 18 guidelines are: 1) make clear what the system can do, 2) make clear how well the system can do what it can do, 3) time services based on context, 4) show contextually relevant information, 5) match relevant social norms, 6) mitigate social biases, 7) support efficient invocation, 8) support efficient dismissal, 9) support efficient correction, 10) scope services when in doubt, 11) make clear why the system did what it did, 12) remember recent interactions, 13) learn from user behavior, 14) update and adapt cautiously, 15) encourage granular feedback, 16) convey the consequences of user actions, 17) provide global controls, and 18) notify users about changes. After the user study, the authors revised the guidelines and validated the revisions with experts.

After reading this paper, I am somewhat surprised that the authors could propose 18 guidelines for human-AI interaction design. I am most interested in the category of “During interaction”. This discussion focuses on factors such as time, context, personal data, and social norms. In my opinion, providing users with specific services that can assist their interactions, for example accessibility and assistive features, should also be considered in this part. In addition, considering social norms is a great idea. Individuals who use an AI system come from many kinds of backgrounds and differ in abilities and values. We cannot treat every person with the same design of applications and systems. Allowing users to design their preferred user interfaces, features, and functions within one general system is a promising but challenging research question, and I think it is a promising topic for the future. At present, many applications and systems allow users to customize their own features on top of the provided default settings. Players can design their own mods for games, for instance on the Steam platform. In Google Chrome, users can design their own theme based on their motivations and goals. I believe this kind of feature can be achieved by many human-AI interaction systems later.

Among these 18 different guidelines, I notice that an AI application does not have to satisfy all of them. Hence, do some of the guidelines have higher priority than others? Or should researchers treat each of them equally during the design process?

In your opinion, which guidelines do you consider more important and plan to focus on in the future? And which guidelines might you have overlooked in your previous research?

In this paper, the authors mention the tradeoff between generality and specialization. How do you think this problem should be solved?

Will these guidelines become obsolete as applications and systems grow increasingly specialized in the future?


2/5/20 – Lee Lisle – Guidelines for Human-AI Interaction

Summary

The authors (of which there are many) survey HCI-related findings for human-AI interaction and organize them into eighteen guidelines across four categories, based on when the user encounters the AI assistance. The work draws on the past twenty years of research: a review of industry guidelines, articles and editorials in the public domain, and a (non-exhaustive) survey of scholarly papers on AI design. In all, they found 168 guidelines, on which they performed affinity diagramming (filtering out concepts that were too “vague”), resulting in twenty concepts. Eleven members of their team at Microsoft then performed a modified discount heuristic evaluation (in which they identified an application and its issues) and refined the guidelines with that data, resulting in 18 rules. Next, they ran a user study with 49 HCI experts, where each was given an AI-powered product and asked to evaluate it. Lastly, they had experts validate the revisions from the previous phase.

Personal Reflection

These guidelines are actually quite helpful in evaluating an interface. As someone who has performed several heuristic evaluations in a non-class setting, having well-defined rules whose violations can be easily determined makes the process significantly quicker. Nielsen’s heuristics have been the gold standard for perhaps too long, so revisiting the creation of guidelines is ideal. It also speaks to how new this paper is, being from 2019’s CHI conference.

Various things surprised me in this work. First, I was surprised that they stated that contractions weren’t allowed in their guidelines because they weren’t clear. I haven’t heard that complaint before, and it seemed somewhat arbitrary. A contraction doesn’t change a sentence much (“doesn’t” in this sentence is clearly “does not”), but I may be mistaken here. I also found the tables in figure 1 hard to read, as if they were a bit too information-dense to clearly impart the findings. I was also surprised by their example for guideline 6: suggesting personal pronouns while more or less implying there are only two is murky at best (I would have used a different example entirely). Lastly, the authors completely ignored the suggestion of keeping the old guideline 15, stating their own reasons despite the experts’ preferences.

I also think this paper in particular will be a valuable resource for future AI development. In particular, it can give us a lot of ideas for our semester project. Furthermore, these guidelines can help early in the process of designing future interactions, as they can refine and correct interaction mistakes before many of these features are implemented.

Lastly, I thought it was amusing that the “newest” member of the team got a shout-out in the acknowledgements.

Questions

  1. The authors bring up trade-offs as a common occurrence in balancing these (and past) guidelines. Which of these guidelines do you think is easiest or hardest to bend?
  2. The authors ignored the suggestion of their own panel of experts in revising one of their guidelines. Do you think this is appropriate for this kind of evaluation, and why or why not?
  3. Can you think of an example of one of these guidelines not being followed in an app you use? What is it, and how could it be improved?


02/05/2020 – Guidelines for Human AI Interaction – Subil Abraham

Reading: Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–13. https://doi.org/10.1145/3290605.3300233

With AI and ML making their way into every aspect of our electronic lives, it has become pertinent to examine how well they function when faced with users. To do that, we need a set of rules or guidelines to use as a reference for identifying whether the interaction between a human and an AI-powered feature is actually functioning the way it should. This paper aims to fill that gap, collating the knowledge of over 150 recommendations for human-AI interfaces and distilling them down into 18 distinct guidelines that can be checked for compliance. The authors also go through the process of refining these guidelines to remove ambiguity through heuristic evaluations, in which experts try to match the guidelines to sample interactions and identify whether the interaction adheres to or violates a guideline, or whether the guideline is relevant to that particular interaction at all.
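
As a toy illustration of what “checked for compliance” could look like in practice, here is a minimal sketch, entirely my own construction rather than anything from the paper, of recording the three-way judgment the evaluations ask for: for each feature, a guideline is applied, violated, or simply not relevant. The feature names and notes are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# Three-way judgment used in the paper's evaluations: a guideline can be
# applied, violated, or not relevant to the feature under review.
class Judgment(Enum):
    APPLIED = "applied"
    VIOLATED = "violated"
    NOT_RELEVANT = "does not apply"

@dataclass
class Finding:
    guideline: int        # 1..18, indexing the paper's guidelines
    feature: str          # the AI-powered feature being evaluated
    judgment: Judgment
    note: str = ""        # free-text evidence recorded by the evaluator

# Hypothetical findings for two made-up features:
findings = [
    Finding(1, "email autocomplete", Judgment.APPLIED,
            "onboarding states what suggestions can and cannot do"),
    Finding(11, "music recommendations", Judgment.VIOLATED,
            "no explanation of why a track was suggested"),
]

violations = [f for f in findings if f.judgment is Judgment.VIOLATED]
```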

  • Though it’s only mentioned in a short sentence in the Discussion section, I’m glad that they point out and acknowledge that there is a tradeoff between being very general (at which point the vocabulary you devise is useless and you have to start defining subcategories) and being very specific (at which point you need to start adding addendums and special cases willy-nilly). I think the set of guidelines in this paper does a good job of striking that balance.
  • I do find it unfortunate that they anonymized the products they used to test interactions on. Maybe it is standard practice in this kind of HCI work not to specify the exact products evaluated, to avoid dating the work. It probably also makes sense because this way they control the narrative and can talk about each application simply in terms of the feature and interaction tested. This avoids having to grapple with which version of the application they used on which day: applications get updated all the time, and violations might get patched and fixed, so an application might no longer be a good example of a guideline adherence or violation that was noted earlier.
  • It is kind of interesting that a majority of the experts in phase 4 preferred the original version of guideline 15 (encourage feedback) over the revised version (encourage granular feedback) that was successful in the user study. I wish they had explained or speculated on why that was.
  1. Why do you think the experts in phase 4 preferred the original version of guideline 15 over the revised version, even though the revised version was demonstrated to cause less confusion with guideline 17?
  2. Are we going to see even more guidelines, or a revision of these guidelines, 10 years down the line when AI-assisted applications become even more ubiquitous?
  3. As the authors pointed out, the current ethics related guidelines (5 and 6) may not be sufficient to cover all the ethical concerns. What other guidelines should there be?


02/05/2020 – Sushmethaa Muhundan – Power to the People: The Role of Humans in Interactive Machine Learning

The paper promotes the importance of studying users and having ML systems learn interactively from them. The effectiveness of systems that take their users into account and learn from them is often better than that of traditional systems, and this is illustrated using multiple examples. The authors argue that involving users leads to better user experiences and more robust learning systems. Interactive ML systems offer more rapid, focused, and incremental model updates than traditional ML systems by having the end-user interact with and drive the system toward the intended behavior. In traditional ML systems, this was often restricted to skilled practitioners, which led to delays in incorporating end-user feedback. The benefits of interactive ML systems are two-fold: not only do they help validate the system’s performance with real users, but they also help in gaining insights for future improvement. User interaction with interactive ML was studied in detail, and common themes were presented in this paper. Novel interfaces for interactive ML were also discussed, aimed at leveraging human knowledge more effectively and efficiently. These involved new methods for receiving input as well as providing output, which in turn gave the user more control over the learning system and made the system more transparent.

Active learning is an ML paradigm in which the learner chooses the examples from which it learns. It was interesting to learn about the negative impact of this paradigm in the setting of interactive learning, where it led to frustration among users: it was uncovered that users found the stream of questions annoying. On one hand, users want to get involved in such studies to better understand the ecosystem, while on the other hand, certain models receive negative feedback. Another aspect I found interesting was that users were open to learning about the internal workings of the system and how their feedback affected it. The direct impact of their feedback on subsequent iterations of the model motivated them to get more involved. It was also good to note that users were willing to give detailed feedback, if given the choice, rather than just helping with classification.
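
To make the source of that stream of questions concrete, below is a minimal sketch of pool-based active learning with uncertainty sampling; the synthetic data, the model choice, and the 20-question budget are my own illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pool-based active learning with uncertainty sampling (illustrative).
# Each round, the learner asks the "oracle" -- in deployment, the user --
# to label the single example it is least sure about, which is exactly
# what produces the repeated questions users found annoying.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 2))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # hidden ground truth

# Seed with one example of each class so the model can be fit.
labeled = [int(np.flatnonzero(y_pool == 0)[0]),
           int(np.flatnonzero(y_pool == 1)[0])]
model = LogisticRegression()

for _ in range(20):                                 # 20 questions to the user
    model.fit(X_pool[labeled], y_pool[labeled])
    p = model.predict_proba(X_pool)[:, 1]
    uncertainty = 1.0 - np.abs(p - 0.5) * 2.0       # 1 at p=0.5, 0 at p=0 or 1
    uncertainty[labeled] = -1.0                     # never re-ask an answered question
    labeled.append(int(np.argmax(uncertainty)))     # query the most ambiguous example
```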

Regarding future work, I agree with the authors that standardization of the work done so far on interactive ML across different domains is required in order to avoid duplication of effort by researchers. Converging on and adopting a common language is the need of the hour to help accelerate research in this space. Also, given the subjective nature of the studies described in this paper, I feel that a comprehensive study with a thorough round of testing involving a diverse set of people is necessary before adopting any new interface, since we do not want the new interface to be counter-productive, as it was in several of the cases cited here.

  • The paper discusses the trade-off between accuracy and speed in research on user interactions with interactive machine learning, given the requirement for rapid model updates. What are some ways to handle this trade-off?
  • While interactive ML systems involve interaction with end-users, how can the expertise of skilled practitioners be leveraged and combined with these systems to make the process more effective?
  • What are some innovative methods that can be used to experiment with crowd-powered systems to investigate how crowds of people might collaboratively drive such systems?


02/05/2020 – Yuhang Liu – Power to the People: The Role of Humans in Interactive Machine Learning

This paper promotes a machine learning paradigm: interactive machine learning. The ability to build such learning systems has been largely driven by advances in machine learning; however, more and more researchers are becoming aware of the importance of studying the users of these systems. In this paper, the authors promote this approach and demonstrate how it can lead to a better user experience and a more effective learning system. After exploring many examples, the authors reach the following conclusions:

  1. This machine learning mode differs from the traditional machine learning mode. Because the user participates, the interaction cycle is faster than the traditional machine learning cycle, which increases the opportunities for interaction between the user and the machine.
  2. Studying users is the key to advancing research in this area. Knowing the user makes it possible to design better systems and respond better to people.
  3. It is unnecessary to restrict the interaction between the learning system and the user; making the interaction process more transparent produces better results.

First of all, from the text we know that models in interactive machine learning are updated faster and in a more focused way. This is because users interactively examine the results and adjust their subsequent inputs. Thanks to these fast interaction cycles, even users with little or no machine learning expertise can guide machine learning through low-cost trial and error or targeted experiments on inputs and outputs. This also shows that the foundation of interactive machine learning is fast, focused, and incremental interaction cycles. These cycles help users participate in the process of machine learning, and they also lead to tight coupling between users and the system, making it impossible to study the system in isolation. Therefore, in the new kind of system, the machine and the user influence each other. In my opinion, there will be more and more research on users in the future, and people will eventually pay more attention to them, because the user experience ultimately determines the quality of a product; in such a system, the user influences the machine learning, and the feedback from the machine to the user ultimately determines the quality of the learning process.
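
A rough sketch of such a cycle, assuming a recent scikit-learn and a toy task of my own invention: each piece of user feedback immediately nudges an online model via partial_fit, rather than waiting for a slow batch retrain.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Each call is one interaction cycle: the user judges one example and the
# model is nudged immediately via partial_fit (online gradient descent).
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

def feedback_step(x, user_label):
    model.partial_fit(x.reshape(1, -1), [user_label], classes=classes)

rng = np.random.default_rng(1)
for _ in range(100):                     # 100 cheap trial-and-error cycles
    x = rng.normal(size=2)
    feedback_step(x, int(x.sum() > 0))   # stand-in for the user's judgment
```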

Secondly, the paper mentions that a common language across diverse fields should be developed. This coincides with last week’s paper, “Affordance-based framework for human-computer collaboration”; although the domains mentioned are different, and this paper was published later, I think they reflect the same idea: we should establish a common language. For example, in the process of interactive machine learning, there are many ways to analyze and describe the various interactions between humans and machine learners. Therefore, there is an important opportunity to come together and adopt a common language in these areas, to help accelerate research and development here and in other areas as well. In this way, the process of cross-disciplinary integration will also yield new discoveries and new impacts.

Questions:

1. Do you think that frequent interactions necessarily have a positive impact on machine learning?

2. For beginners in machine learning, do you think this kind of interactive machine learning is beneficial?

3. In machine learning, which has a more significant impact on the learning result: the human or the model’s efficiency?


02/05/20 – Nan LI – Power to the People: The Role of Humans in Interactive Machine Learning

Summary:

The authors of this paper indicate that interactive machine learning can promote the democratization of applied machine learning, enabling users to make use of machine-learning-based systems to satisfy their own requirements. However, achieving effective end-user interaction through interactive machine learning brings new challenges. To address these challenges and highlight the role and importance of users in the interactive machine learning process, the authors present case studies and a discussion based on the results. The first section of case studies presented in the paper indicates that end-users always expect richer involvement in the interactive machine learning process than just labeling instances or acting as an oracle. Besides, transparency about how the system works can improve both the user experience and the accuracy of the resulting models. The case studies in the following sections indicate that richer user interactions are beneficial within limited boundaries and may not be appropriate for all scenarios. Finally, the authors discuss the challenges and opportunities for interactive machine learning systems, such as the desire to develop a common language across diverse fields.

Reflection:

Personally, I am not very familiar with machine learning. However, after reading this paper, I think interactive machine learning systems could amplify the effects of machine learning on our daily lives to a great extent. In particular, involving users with little or no machine learning knowledge in the learning process could not only improve the accuracy of learning outcomes but also enrich the interaction between users and products.

One typical example of interactive machine learning I have experienced is a feature of the NetEase Cloud Music player: Private Radio. The private radio recommends music you may like based on your playlist and then asks for your feedback, namely like or dislike. The more feedback you provide, the more likely you are to like the next recommendation. Thus, the user study result presented in the paper, that end-users would like richer interaction, is reasonable. I would also like to tag the recommended music with more than like or dislike, perhaps including reasons, such as liking a song for its melody or lyrics.
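
Purely as a guess at the kind of mechanism behind such a feature, and certainly not NetEase’s actual algorithm, here is a sketch in which each like or dislike nudges a taste vector and the next recommendation is the unplayed song most similar to it:

```python
import numpy as np

rng = np.random.default_rng(2)
songs = rng.normal(size=(200, 8))      # 8 made-up audio/lyric features per song
songs /= np.linalg.norm(songs, axis=1, keepdims=True)

taste = np.zeros(8)                    # the listener's inferred preferences
played = set()

def recommend():
    scores = songs @ taste             # dot-product similarity (song rows are unit length)
    if played:
        scores[list(played)] = -np.inf # never repeat a song
    return int(np.argmax(scores))

def feedback(song_id, liked, lr=0.3):
    """One interaction: a like pulls the taste vector toward the song,
    a dislike pushes it away."""
    global taste
    sign = 1.0 if liked else -1.0
    taste = taste + lr * sign * songs[song_id]
    played.add(song_id)

# Simulated session: this pretend user likes songs with high mean feature value.
for _ in range(30):
    s = recommend()
    feedback(s, liked=bool(songs[s].mean() > 0))
```

Extending feedback() to accept a reason such as melody or lyrics would amount to updating only the corresponding feature dimensions, which is one way the richer tagging suggested above could work.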

I also agree with the scenario in which transparency helps people provide better labels. In my opinion, transparency about how the system works has the same effect as giving users feedback on how their operations influenced the system. A good understanding of the impact of their actions allows users to proactively give more accurate feedback. Returning to the music player example, if my private radio keeps recommending music I like, then in order to hear more good music I will be more willing to provide feedback. Conversely, if my feedback has no influence on the radio’s recommendations, I will just give up on the feature.

Questions:

  • Do you have similar experience with interactive machine learning systems?
  • What is your expectation of these systems?
  • What do you think of the tradeoff between machine learning and human-computer interaction in this interactive learning system?
  • Discuss any of the challenges faced by interactive learning systems that are presented at the end of the paper.


02/05/20 – Dylan Finch – Power to the People: The Role of Humans in Interactive Machine Learning

Summary of the Reading

Interactive machine learning is a form of machine learning that allows for much more precise and continuous changes to the model, rather than large updates that drastically change it. In interactive machine learning models, domain experts can continuously update the model as it produces results, reacting to its predictions in almost real time. Examples of this type of machine learning system include online recommender systems like those of Amazon and Netflix.

For this type of system to work, there needs to be an oracle who can correctly label data. Usually this is a person. However, people do not like being an oracle, and in some cases they can be quite bad at it. Humans would also like richer, more rewarding interactions with the machine learning algorithms. The paper suggests some ways these interactions could be made richer for the person training the model.
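
A tiny simulation of the “quite bad at it” point, with entirely made-up accuracy numbers: a labeler who is right 85% of the time, and repeated-query majority voting as one costly way to compensate.

```python
import random

# Toy illustration of "people are not oracles": a human labeler who is
# right only 85% of the time, and majority voting over repeated queries
# as a (costly) fix. All numbers are invented for illustration.
random.seed(3)

def human_oracle(true_label, accuracy=0.85):
    return true_label if random.random() < accuracy else 1 - true_label

def majority_label(true_label, asks=5):
    votes = [human_oracle(true_label) for _ in range(asks)]
    return int(sum(votes) > asks / 2)

trials = 10_000
single = sum(human_oracle(1) == 1 for _ in range(trials)) / trials   # ~0.85
voted = sum(majority_label(1) == 1 for _ in range(trials)) / trials  # ~0.97
```

Of course, asking the same question five times is exactly the kind of oracle treatment the paper says people resent, which is part of why it argues for richer interactions instead.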

Reflections and Connections

At the end of the paper, the authors say that these new types of interaction with interactive machine learning are a potentially powerful tool that needs to be applied in the right circumstances. I completely agree. This technology, like all technologies, will be useful in some places and not in others. In the case of a simple recommender system, most people are happy to just give a rating or answer a survey question every now and then. In cases like this, richer interactions would take away from the simplicity and usefulness of the system. But in other cases, it would be nice to be able to work with the machine learning model to generate better answers in the future.

I also think that in some fields, technologies like the ones presented in this paper will be extremely valuable. In life, it is very easy to get stuck in a rut and be unable to think beyond the ways we have always done things, but it is important to do so to push technology forward. We have always thought of machine learning as an algorithm asking an oracle about specific examples. When we created interactive machine learning, we replaced the oracle with a person and applied the same ideas. But, as this paper points out, people are not oracles, and they don’t like being treated like them. So the ideas in this paper could be very important for unlocking new ways of using machine learning in conjunction with people. And the more we play to the strengths of people, the better the machine learning algorithms we will be able to create to take advantage of those strengths.

Questions

  1. What is one place you think could use interactive machine learning besides recommender systems?
  2. Which of the presented models for new ways for people to interact with machine learning algorithms do you think has the most promise?
  3. Can you think of any other new interfaces for interactive machine learning not mentioned in the paper?


02/05/20 – Dylan Finch – Principles of mixed-initiative user interfaces

Summary of the Reading

This paper seeks to help solve some of the issues present in software automation. Oftentimes, when a user tries to automate an action using an agent or tool, they may not get the result they were expecting. The paper lists many of the key issues with then-current implementations of such systems.

The paper points out many of the issues that can plague systems that try to take action on behalf of the user. These include things like not adding value for the user, not considering the agent’s uncertainty about the user’s goals, not considering the status of the user’s attention when suggesting an action, not inferring the ideal action in light of costs and benefits, not employing dialog to resolve key uncertainties, and many others. After listing these key problems, the author goes on to describe a system that tries to solve many of them.

Reflections and Connections

I think this paper does a great job of listing the obstacles that exist for systems that try to automate tasks for a user. It can be very hard for a system to do some tasks for the user automatically. Many times, the intentions of the user are unknown. For example, an automatic calendar agent may try to create a calendar hold for the birthday party of a person the user does not care about, so the user would not want to go to the party. There are many times when a user’s actions depend on much more than simply what is in an email or what is on the screen. That is why it is so important to take into account the fact that the automated system could be wrong.

I think the author of this paper did a great job of planning for and correcting errors when the automated system is wrong about something. Many of the key issues identified have to do with the agent trying to correctly guess when the user actually needs the system, and what to do when that guess is wrong. I think the most important issues listed are the ones that deal with error recovery. No system will be perfect, so there should at least be a plan for what happens when the system is wrong. The system described is excellent in this department: it automatically goes away if the user does not need it, and it uses dialogs to get missing information and correct mistakes. This is exactly what a system like this should do when it encounters an error or does something wrong. There should be a way out and a way for the user to correct the error.

Questions

  1. Which of the critical factors listed in the paper do you think is the most important? The least?
  2. Do you think that the system the author developed does a good job of addressing all of the issues brought up?
  3. Agents are not as popular as they used to be, and this article is quite old. Do you think these ideas still hold relevance today?


02/05/2020 – Nurendra Choudhary – Power to the People: The Role of Humans in Interactive Machine Learning (Amershi et al.)

Summary

The authors discuss the relatively new area of interactive machine learning systems. The previous ML development workflow relied on a laborious cycle of development by ML researchers, critique and feedback by domain experts, and then back to fine-tuning and development. Interactive ML enables faster feedback and its direct integration into the learning architecture, making the process much faster. The paper describes case studies of the effect these systems have from both the human’s and the algorithm’s perspectives.

For the system, the instant feedback provides a more robust learning method in which the model can fine-tune itself in real time, leading to a much better user experience.

For humans, labelling data is a very mundane task, and interactivity makes it more engaging, albeit a little more complex. This increases important factors like attention and thought, making the entire process more efficient and precise.

Reflection

The part that I liked the most was “humans are not oracles”. This calls into question the assumed reliability of labeled datasets. ML systems treat datasets as the ground truth, but this cannot be taken for granted anymore. We need to apply statistical measures like confidence intervals even to human annotation. Furthermore, this means ML systems are going to mimic all the potential limitations and problems that plague human society (discrimination and hate speech are such examples). In my opinion, the field of Fairness will rise to significance as more complex ML systems reveal the clear biases they learn from human annotations.
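
As one concrete way to do that, here is a short sketch computing a Wilson confidence interval for an annotator’s agreement rate; the formula is standard, but the 88-of-100 example is my own made-up number.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson confidence interval for a proportion, e.g. the rate at
    which a human annotator agrees with a gold-standard label."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# e.g. an annotator matched the gold label on 88 of 100 items:
low, high = wilson_interval(88, 100)   # roughly (0.80, 0.93)
```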

Another important aspect is the change in human behaviour caused by machines, which I think is not emphasized enough. Once we got to know the inner mechanics of search, we modified our queries to match something the machine can understand. This signals our inherent tendency to adapt to machines, which can be observed throughout the development of human civilization (technology changing our routines, politics, entertainment, and even conflicts). Interactive ML draws on this adaptive tendency in the context of AI.

Another interesting point: “People Tend to Give More Positive Than Negative Feedback to Learners”. This means people give feedback according to their nature; it comes naturally to us. For example, people have different methods of teaching and understanding. However, AI does not differentiate its handling of feedback based on the nature of its trainers. I think we need to study this more closely and model our AI to account for human nature. The interesting part to study is the triviality or complexity of modeling human behavior in conjunction with the primary problem.

Regarding the transparency of ML systems, the area has seen a recent push toward interpretability, a field of study focused on understanding the architecture and function of models in a deterministic way. I believe transparency will bring more confidence to the field. Popular questions like “Is AI going to end the world?” and “Are the robots coming?” tend to arise from the lack of transparency in these non-deterministic architectures.

Questions

  1. Can we use existing games/interactive systems to learn more complex data for the machine learning algorithms?
  2. Can we model the attention of humans to understand how it might have affected the previous annotations?
  3. Can we trust datasets if human beings lose attention over a period of time?
  4. From an AI perspective, how can we improve AI systems to account for human error, rather than believing human labels to be ground truth?

Word Count: 574


02/05/2020 – Palakh Mignonne Jude – Principles of Mixed-initiative User Interfaces

SUMMARY

This paper, published in 1999, reviews principles that can be used when coupling automated services with direct manipulation. Multiple principles for mixed-initiative UI are listed in the paper, such as developing significant value-added automation; inferring the ideal action in light of costs, benefits, and uncertainties; and continuing to learn by observing. The author focuses on the LookOut project, an automated scheduling service for Microsoft Outlook, which was an attempt to aid users by automatically adding appointments to their calendar based on the message currently viewed by the user. He then discusses the decision-making capabilities of this system under uncertainty: LookOut was designed to parse the header, subject, and body of a message and employ a probabilistic classification system in order to identify the intent of the user. The LookOut system also offered multiple interaction modalities, including direct manipulation, basic automated assistance, and a social-agent modality. The author also discusses inferring beliefs about user goals, as well as mapping these beliefs to actions. Finally, he discusses the importance of timing these automated services such that they are not invoked before the user is ready for the service.
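
To make the mapping from beliefs to actions concrete, here is a hedged sketch of the expected-utility comparison the paper describes: the three actions mirror LookOut’s options (stay quiet, open a dialog, or act automatically), but the utility values are invented for illustration.

```python
# Expected-utility action selection under uncertainty about the user's
# goal. p_goal is the classifier's probability that the message implies
# a scheduling intent; the utility table below is a made-up assumption.
UTILITIES = {
    # (action, user actually has the goal): utility
    ("do_nothing", True): 0.0,   ("do_nothing", False): 1.0,
    ("dialog",     True): 0.7,   ("dialog",     False): 0.6,
    ("automate",   True): 1.0,   ("automate",   False): 0.0,
}

def expected_utility(action, p_goal):
    return (p_goal * UTILITIES[(action, True)]
            + (1 - p_goal) * UTILITIES[(action, False)])

def choose_action(p_goal):
    return max(("do_nothing", "dialog", "automate"),
               key=lambda a: expected_utility(a, p_goal))

# Low confidence -> stay quiet; middling -> ask the user; high -> act.
for p in (0.1, 0.5, 0.9):
    print(p, choose_action(p))
```

The invocation and dismissal thresholds the paper derives fall out of exactly this comparison: wherever two actions’ expected utilities cross as the goal probability varies, the system switches behavior.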

REFLECTION

I found it very interesting to read about these principles of mixed-initiative UI considering that they were published in 1999, which, incidentally, was when I first learnt to use a computer! I found that the principles being considered were fairly wide-ranging considering the year of publication. However, principles such as ‘considering uncertainty about a user’s goals’ and ‘employing dialog to resolve key uncertainties’ could perhaps have been addressed by performing behavior modeling. I was happy to learn that the LookOut system had multiple interaction modalities that could be configured by the user, and I was surprised to learn that the system employed automated speech recognition that was able to understand human speech. It did, however, make me wonder how this system performed with different accents; even though the words under consideration were basic ones such as ‘yes’, ‘yeah’, and ‘sure’, I wondered about the system’s performance. I also liked that the system was able to identify when a user seemed disinterested and would wait to obtain a response. I also felt it was a good design strategy to implement a continued training mechanism, with users able to dictate a training schedule. However, if the user dictates a training schedule, I wonder whether it would cause a difference in the user’s behavior compared with acting without knowing that their data would be monitored at a given point in time (consent would be needed, but perhaps randomly observing user behavior would ensure that the user is not made too conscious of their actions).

QUESTIONS

  1. Not having explored the AI systems of the 90s, I am unaware of how these systems worked. The paper mentions that the LookOut system was designed to continue to learn from users; how was this feedback loop implemented? Was the model re-trained periodically?
  2. Since the data, and any bias present in the data used to train a model, is very important, how were the messages used in this study obtained? The paper mentions that the version of LookOut under consideration was trained on 500 relevant and 500 irrelevant messages; how was this data obtained and labeled?
  3. With respect to monitoring the length of time between the review of a message and the manual invocation of the scheduling service, the author studied the relationship between the size of the message and the time users dwell on it. What was the demographic of the people who took part in this study? Would there be a difference in the time taken for native versus non-native English speakers?
