02/05/20 – Mohannad Al Ameedi – Guidelines for Human-AI Interaction

Summary

In this paper, the authors propose 18 design guidelines for building human-AI infused systems. These guidelines try to resolve the issues with many human-AI interaction systems that either do not follow guidelines or follow guidelines that have not been tested or evaluated. These issues include producing offensive, unpredictable, or dangerous results, which may lead users to stop using these systems; proper guidelines are therefore necessary. In addition, advances in the AI field have introduced new user demands in areas like sound recognition, pattern recognition, and translation. These 18 guidelines help users understand what AI systems can and cannot do and how well they can do it; show information relevant to the task at hand with appropriate timing, in a way that fits the user's social and cultural context; make sure that the user can request services when needed or dismiss unnecessary ones; offer explanations of why the system does certain things; maintain a memory of the user's recent actions and try to learn from them; and so on. The guidelines went through different phases: consolidating existing guidelines, a heuristic evaluation, a user study, and expert evaluation and revision. The authors hope that these guidelines will help build better AI-infused systems that can scale and work well with growing numbers of users and advancing AI algorithms and systems.

Reflection

I found the idea of putting together design guidelines very interesting, as it establishes a standard that can help in building human-AI systems, helps in evaluating or testing these systems, and can be used as a baseline when building large-scale AI-infused systems to avoid the well-known issues associated with previous systems.

I also found the collection of academic and industry guidelines interesting, since it is based on over 20 years of human-computer interaction work, which can be regarded as very valuable and rich information that can be used in different domains and fields.

I agree with the authors that AI-infused systems that do not follow certain guidelines are confusing, ineffective, and sometimes counterproductive when their suggestions or recommendations are irrelevant. That explains why some AI-enabled or AI-infused systems were popular at certain times but could not satisfy user demands and eventually stopped being used.

Questions

  • Are these guidelines followed in Amazon Mechanical Turk?
  • The authors mention that there is a tradeoff between generality and specialization; what tradeoff factors need to be considered?


02/05/20 – Ziyao Wang – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

The author surveyed how crowdsourcing has been applied in machine learning research. Firstly, previous research was reviewed to derive categories for the application of crowdsourcing in the machine learning area. The applications were broken into four categories: data generation, evaluating and debugging models, hybrid intelligence systems, and behavioral studies to inform machine learning research. In each of the categories, the author discussed several specific areas; for each area, the author summarized several related lines of research and introduced each of them. Finally, the author analyzed how to understand crowd workers. Though crowdsourcing seemed to greatly help machine learning research, the author did not ignore the problems in this system, such as dishonesty among workers. The survey closes by giving researchers who focus on machine learning and apply crowdsourcing several pieces of advice: maintain a good relationship with crowd workers, care about good task design, and use pilot studies.

Reflection:

From this survey, readers can get a thorough view of the applications of crowdsourcing in machine learning research. It covers most of the state of the art in machine learning areas related to crowdsourcing. Traditional machine learning often faces problems like lack of data, models that cannot be judged, lack of user feedback, or systems that cannot be trusted. With the help of crowdsourcing workers, all of these problems can be addressed. Though this is only a survey of previous research, it really lets readers get a comprehensive view of this combination of technologies.

This survey reminds us of the importance of reviewing previous work. When we want to do research on a topic, there will be thousands of papers that may help; however, it is impossible to read them all. Instead, if there is a survey that summarizes all previous work and sorts it into several more specific categories, we can easily get a comprehensive view of the topic, and new ideas may occur to us. In this paper, through examining the four categories of applications of crowdsourcing in machine learning, the author arrives at the idea of doing research to understand the crowd and finally makes suggestions for future researchers. Similarly, if we survey the area we want to work in for our projects, we may find out what is needed and what is novel in that field, which will lead to the success of the projects and the development of the field.

Also, it is important to think critically. In this survey, though the author described numerous contributions of crowdsourcing to machine learning research, the author still discussed the potential risks of this application, for example, dishonesty among workers. This is important for future research and should not be ignored. In our projects, we should also think critically, so that the drawbacks of the ideas we propose can be judged fairly and the projects can be practical and valuable.

Problems:

Which factors can contribute to a good task design?

Is there any solution that can solve the problem of dishonesty among workers instead of mitigating it?

In experiments that aim to find out users' reactions toward something, can the reactions of paid workers be considered similar to those of real users?


02/05/2020 – The Role of Humans in Interactive Machine Learning – Subil Abraham

Reading: Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the People: The Role of Humans in Interactive Machine Learning. AI Magazine 35, 4: 105–120. https://doi.org/10.1609/aimag.v35i4.2513

Machine learning systems typically are built by collaboration between the domain experts and the ML experts. The domain experts provide data to the ML experts, who will carefully configure and tune the ML model which is then sent back to the domain experts for review, who will then recommend further changes and the cycle continues until the model reaches an acceptable accuracy level. However, this tends to be a slow and frustrating process and there exists a need to get the actual users involved in a more active manner. Hence, the study of interactive machine learning arose to identify how users can best interact with and improve the ML models through faster, interactive feedback loops. This paper surveys the field, looking at what users like and don’t like when teaching machines, what kind of interfaces are best suited for these interaction cycles and what unique interfaces can exist beyond the simple labelling-learning feedback loop.

When reading about the novel interfaces that exist for interactive machine learning, I find there is an interesting parallel between the development of the “Supporting Experimentation of Inputs” type of interface and that of text editors. The earliest text editor was the typewriter, where an input once entered could never be taken back; a correction would require starting over or the use of an ugly whiteout. With electronics came text editors where you could edit only one line at a time. And today we have advanced, feature-rich editors and IDEs with autocomplete suggestions, inline linting, and automatic type checking and error feedback. It would be interesting to see what the next stage of ML model editing would look like if it continued on this trajectory, where we go from simple “backspace key” style experimentation to features parallel to what modern text editors have for words. The idea of allowing “Combining Models” as a way to create models draws another interesting parallel to car manufacturing, where cars went from being handcrafted to being built on an assembly line with standardized parts.

I also think their proposal for creating a universal language to connect the different ML fields might end up creating a language that is too general and the different fields, though initially unified, might end up splitting off again due to using only subsets of the language that don’t overlap with each other or by creating new words because the language does not have anything specific enough.

  1. Is the task of creating a “universal language” a good thing? Or would we end up with something too general to be useful and cause fields to create their own subsets?
  2. What other kinds of parallels can we see in the development of machine learning interfaces, like the parallels to text editor development and car manufacturing?
  3. Where is the “Goldilocks zone” for ML systems that give context to the user for the sake of transparency? There is a spectrum between “label this photo with no context” and “here is every minute detail: number of pixels, exact GPS location, and all sorts of other useless info”. How do we decide which information the ML system should provide as context?


02/05/2020 – Donghan Hu – Guidelines for Human-AI Interaction

Guidelines for Human-AI Interaction

In this paper, the authors focus on the problem that human-AI interaction design needs to be revisited in light of advances in this growing technology. Accordingly, the authors propose 18 generally applicable design guidelines for the design and study of human-AI interactions, and they test the validity of these guidelines through a user study involving 49 participants. The 18 guidelines are: 1) make clear what the system can do, 2) make clear how well the system can do what it can do, 3) time services based on context, 4) show contextually relevant information, 5) match relevant social norms, 6) mitigate social biases, 7) support efficient invocation, 8) support efficient dismissal, 9) support efficient correction, 10) scope services when in doubt, 11) make clear why the system did what it did, 12) remember recent interactions, 13) learn from user behavior, 14) update and adapt cautiously, 15) encourage granular feedback, 16) convey the consequences of user actions, 17) provide global controls, and 18) notify users about changes. After the user study, the guidelines were further revised and validated by experts.

After reading this paper, I am somewhat surprised that the authors could propose 18 guidelines for designing human-AI interaction. I am most interested in the category “During interaction”. This discussion focuses on factors such as timing, context, personal data, and social norms. In my opinion, providing users with specific services that can assist their interactions, for example accessibility and assistance features, should also be considered in this part. In addition, considering social norms is a great idea. Individuals who use an AI system come from many kinds of backgrounds and differ in abilities and values; we cannot treat every person with the same design of applications and systems. Allowing users to design their preferred user interfaces, features, and functions within one general system is a promising but challenging research question, and I think it is a promising topic for the future. At present, many applications and systems allow users to customize their own features beyond the provided default settings: players can design their own mods for games on platforms like Steam, and in Google Chrome, users can design their own theme based on their motivations and goals. I believe this kind of feature can be achieved by many human-AI interaction systems later.

Among these 18 different guidelines, I notice that an AI application does not have to satisfy all of them. Hence, do some of the guidelines have higher priority than others? Or, in the design process, should researchers treat each of them equally?

In your opinion, which guidelines do you consider more important and will focus on in the future? Or which guidelines might you have ignored in your previous research?

In this paper, the authors mention the tradeoff between generality and specialization. How do you think this problem should be solved?

Will these guidelines become useless as various kinds of applications and systems grow increasingly specialized in the future?


2/5/20 – Lee Lisle – Guidelines for Human-AI Interaction

Summary

The authors (of which there are many) go over the various HCI-related findings for human-AI interaction and categorize them into eighteen different types across four categories, based on when the user encounters the AI assistance. The work makes sure the reader knows it draws on the past twenty years of research: a review of industry guidelines, articles and editorials in the public domain, and a (non-exhaustive) survey of scholarly papers on AI design. In all, they found 168 guidelines, on which they performed affinity diagramming (filtering out concepts that were too “vague”), resulting in twenty concepts. Eleven members of their team at Microsoft then performed a modified discount heuristic evaluation (where they identified an application and its issue) and refined the guidelines with that data, resulting in 18 rules. Next, they performed a user study with 49 HCI experts, where each was given an AI tool and asked to evaluate it. Lastly, they had experts validate the revisions from the previous phase.

Personal Reflection

These guidelines are actually quite helpful in evaluating an interface. As someone who has performed several heuristic evaluations in a non-class setting, I can say that having well-defined rules whose violations can easily be determined makes the process significantly quicker. Nielsen's heuristics have been the gold standard for perhaps too long, so revisiting the creation of guidelines is ideal. It also speaks to how new this paper is, being from 2019's CHI conference.

Various things surprised me in this work. First, I was surprised that they stated that contractions weren't allowed in their guidelines because they weren't clear. I haven't heard that complaint before, and it seemed somewhat arbitrary; a contraction doesn't change a sentence much (“doesn't” in this sentence is clearly “does not”), but I may be mistaken here. I also found the tables in figure 1 hard to read, as if they were a bit too information-dense to clearly impart the findings. I was also surprised by their example for guideline 6, as suggesting personal pronouns while more or less stating there are only two is murky at best (I would have used a different example entirely). Lastly, the authors completely ignored the suggestion of keeping the old guideline 15, stating their own reasons despite the experts' preferences.

I also think this paper in particular will be a valuable resource for future AI development; in particular, it can give a lot of ideas for our semester project. Furthermore, these guidelines can help early in the process of designing future interactions, as they can refine and correct interaction mistakes before many of these features are implemented.

Lastly, I thought it was amusing that the “newest” member of the team got a shout-out in the acknowledgements.

Questions

  1. The authors bring up trade-offs as a common occurrence in balancing these (and past) guidelines. Which of these guidelines do you think are easier or harder to bend?
  2. The authors ignored the suggestion of their own panel of experts in revising one of their guidelines. Do you think this is appropriate for this kind of evaluation, and why or why not?
  3. Can you think of an example of one of these guidelines not being followed in an app you use? What is it, and how could it be improved?


02/05/2020 – Guidelines for Human AI Interaction – Subil Abraham

Reading: Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–13. https://doi.org/10.1145/3290605.3300233

With AI and ML making their way into every aspect of our electronic lives, it has become pertinent to examine how well they function when faced with users. To do that, we need a set of rules or guidelines that we can use as a reference to identify whether the interaction between a human and an AI-powered feature is actually functioning the way it should. This paper aims to fill that gap, collating the knowledge of 150 recommendations for human-AI interfaces and distilling them down into 18 distinct guidelines that can be checked for compliance. The authors also go through the process of refining and tailoring these guidelines to remove ambiguity through heuristic evaluations, where experts try to match the guidelines to sample interactions and identify whether the interaction adheres to or violates a guideline, or whether the guideline is relevant to that particular interaction at all.

  • Though it’s only mentioned in a small sentence in the Discussion section, I’m glad that they point out and acknowledge that there is a tradeoff between being very general (at which point the vocabulary you devise is useless and you have to start defining subcategories), and being very specific (at which point you need to start adding addendums and special cases willy-nilly). I think the set of guidelines in this paper does a good job of trying to strike that balance.
  • I do find it unfortunate that they anonymized the products they used to test interactions on. Maybe this is just standard practice in this kind of HCI work, to avoid dating the paper. It probably also makes sense because this way they have control of the narrative and can simply talk about each application in terms of the feature and interaction tested. This avoids having to grapple over which version of the application they used on which day: applications get updated all the time, and violations might get patched and fixed, so that the application is no longer a good example of a guideline adherence or violation that was noted earlier.
  • It is kind of interesting that a majority of the experts in phase 4 preferred the original version of guideline 15 (encourage feedback) as opposed to the revised version (provide granular feedback) that was successful in the user study. I wish they had explained or speculated why that was.
  1. Why do you think the experts in phase 4 preferred the original version of guideline 15 over the revised version, even though the revised version was demonstrated to cause less confusion with guideline 17?
  2. Are we going to see even more guidelines, or a revision of these guidelines, 10 years down the line when AI-assisted applications become even more ubiquitous?
  3. As the authors pointed out, the current ethics related guidelines (5 and 6) may not be sufficient to cover all the ethical concerns. What other guidelines should there be?


02/05/2020 – Sushmethaa Muhundan – Power to the People: The Role of Humans in Interactive Machine Learning

The paper promotes the importance of studying users and having ML systems learn interactively from them. The effectiveness of systems that take their users into account and learn from them is often better than that of traditional systems, and this is illustrated using multiple examples. The authors argue that the involvement of users leads to better user experiences and more robust learning systems. Interactive ML systems offer more rapid, focused, and incremental model updates than traditional ML systems by involving the end user, who interacts with and drives the system toward the intended behavior. In traditional ML systems this was often restricted to skilled practitioners, which led to delays in incorporating end-user feedback. The benefits of interactive ML systems are two-fold: not only do they help validate the system's performance with real users, but they also help in gaining insights for future improvement. User interaction with interactive ML was studied in detail and common themes were presented in the paper. Novel interfaces for interactive ML were also discussed that aimed at leveraging human knowledge more effectively and efficiently. These involved new methods for receiving input as well as providing output, which in turn gave the user more control over the learning system and made the system more transparent.

Active learning is an ML paradigm in which the learner chooses the examples from which it learns. It was interesting to learn about the negative impacts of this paradigm, which led to frustration among users in the setting of interactive learning: it was uncovered that users found the stream of questions annoying. On one hand, users want to get involved in such studies to better understand the ecosystem, while on the other hand, certain models are getting negative feedback. Another aspect that I found interesting was that users were open to learning about the internal workings of the system and how their feedback affected it. The direct impact of their feedback on subsequent iterations of the model motivated them to get more involved. It was also good to note that users were willing to give detailed feedback if given the choice, as opposed to just helping with classification.
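To make the paradigm concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, in which the model repeatedly asks a human “oracle” to label the example it is least confident about. The dataset, model, and query budget are illustrative assumptions on my part, not the setup from the paper, but the loop makes it easy to see where the annoying “stream of questions” comes from: every iteration is another question posed to the user.

```python
# Pool-based active learning with least-confident uncertainty sampling.
# Each loop iteration is one "question" to the human oracle; here the
# ground-truth label stands in for the person's answer.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=10, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # query budget: 20 questions to the oracle
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])
    # The most uncertain example has the lowest top-class probability.
    query = pool[int(np.argmin(probs.max(axis=1)))]
    labeled.append(query)  # in a real system, a person supplies this label
    pool.remove(query)

print("accuracy after 30 labels:", model.score(X, y))
```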

Regarding future work, I agree with the authors that the work done so far on interactive ML in different domains needs to be standardized in order to avoid duplication of effort by researchers across domains. Converging on and adopting a common language is the need of the hour to help accelerate research in this space. Also, given the subjective nature of the studies described in this paper, I feel that a comprehensive study with a thorough round of testing involving a diverse set of people is necessary before adopting any new interface, since we do not want the new interface to be counterproductive, as it was in several cases cited here.

  • The paper talks about the trade-off between accuracy and speed while dealing with research on user interactions with interactive machine learning due to the requirement for rapid model updates. What are some ways to handle this trade-off?
  • While interactive ML systems involve interaction with end-users, how can the expertise of skilled practitioners be leveraged and combined with these systems to make the process more effective?
  • What are some innovative methods that can be used to experiment with crowd-powered systems to investigate how crowds of people might collaboratively drive such systems?


02/05/2020 – Yuhang Liu – Power to the People: The Role of Humans in Interactive Machine Learning

This paper promotes interactive machine learning, a new machine learning model. The ability to build this kind of learning system is largely driven by advances in machine learning, but more and more researchers are aware of the importance of studying the users of these systems. In this paper, the authors promote this method and demonstrate how it can lead to a better user experience and a more effective learning system. After exploring many examples, the authors reach the following conclusions:

  1. This machine learning mode is different from the traditional one. Because the user participates, the interaction cycle is faster than the traditional machine learning cycle, which increases the opportunity for interaction between the user and the machine.
  2. Studying users is the key to advancing research in this area. Knowing the user makes it possible to design better systems that respond better to people.
  3. It is unnecessary to restrict interaction between the learning system and the user, because richer interaction makes the process more transparent and produces better results.

First of all, from the text we know that models in interactive machine learning are updated faster and in a more focused way. This is because the user checks results interactively and adjusts subsequent inputs. Due to these fast interaction cycles, even users with little or no machine learning expertise can guide machine learning through low-cost trial and error or focused experiments on inputs and outputs. This also shows that the foundation of interactive machine learning is rapid, focused, and incremental interaction cycles. These cycles help users participate in the process of machine learning, and they also lead to tight coupling between the user and the system, making it impossible to study the system in isolation. In the new paradigm, then, the machine and the user interact with each other. In my opinion, there will be more and more research on users in the future, and people will eventually pay more attention to them, because the user experience ultimately determines the quality of a product; in this kind of system, the user can influence the machine learning, and the feedback from the machine to the user can ultimately determine the quality of the learning process.
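To ground that idea, here is a minimal sketch of such a fast, incremental interaction cycle: the model predicts, the user corrects a few predictions, and the model updates immediately on those corrections. The synthetic data and the use of scikit-learn's SGDClassifier.partial_fit as the update step are my own illustrative assumptions, not details from the paper.

```python
# A fast, incremental interaction cycle: predict -> user corrects -> update.
# SGDClassifier.partial_fit stands in for the low-cost model update.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
classes = np.unique(y)

model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X[:20], y[:20], classes=classes)  # tiny initial model

for step in range(5):  # five quick cycles instead of one big retrain
    batch = slice(20 + step * 40, 60 + step * 40)
    preds = model.predict(X[batch])
    wrong = preds != y[batch]  # the user spots and corrects these errors
    if wrong.any():
        model.partial_fit(X[batch][wrong], y[batch][wrong])
    print(f"cycle {step}: user corrected {wrong.sum()} predictions")
```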

Secondly, the paper mentions that a common language across diverse fields should be developed. This coincides with last week's paper, “Affordance-based framework for human-computer collaboration”: although the domains are different and this paper came later, I think both reflect the same idea, that we should establish a common language. For example, in the process of interactive machine learning, there are many ways to analyze and describe the various interactions between humans and machine learners. There is therefore an important opportunity to come together and adopt a common language in these areas, to help accelerate research and development here and in other areas as well. In the process of cross-disciplinary integration, we will also make new discoveries and create new impacts.

Questions:

1. Do you think that frequent interactions must have a positive impact on machine learning?

2. For beginners in machine learning, do you think this kind of interactive machine learning is beneficial?

3. In machine learning, which has a more significant impact on the learning result: the human or the model's efficiency?


02/05/20 – Nan LI – Power to the People: The Role of Humans in Interactive Machine Learning

Summary:

The author of this paper indicates that interactive machine learning can promote the democratization of applied machine learning, enabling users to make use of machine-learning-based systems to satisfy their own requirements. However, achieving effective end-user interaction through interactive machine learning brings new challenges. To address these challenges and highlight the role and importance of users in the interactive machine learning process, the author presents case studies and a discussion based on their results. The first section of the case studies indicates that end users often expect richer involvement in the interactive machine learning process than just labeling instances or acting as an oracle. Besides, transparency about how the system works can improve the user experience and the accuracy of the resulting models. The case studies in the next sections indicate that richer user interactions are beneficial within limited boundaries and may not be appropriate for all scenarios. Finally, the author discusses the challenges and opportunities for interactive machine learning systems, such as the desire to develop a common language across diverse fields.

Reflection:

Personally, I am not very familiar with machine learning. However, after reading this paper, I think interactive machine learning systems could amplify the effects of machine learning on our daily life to a great extent. In particular, involving users with little or no machine learning knowledge in the learning process could not only improve the accuracy of the learning outcomes but also enrich the interaction between users and products.

One typical example of interactive machine learning that I have experienced is one of the features of the Netease Cloud Music player: Private Radio. The private radio recommends music you may like based on your playlist and then asks for your feedback, namely like or not. The more feedback you provide, the more likely you are to like the next recommendation. Thus, the user study result presented in the paper, that end users would like richer interaction, is reasonable. I would also like to tag the recommended music with more than just like or not; the tags could include the reason, such as liking the song for its melody or its lyrics.
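Below is a toy sketch of that like/dislike loop: each track gets a feature vector (e.g., melody, lyrics, tempo), and a preference vector is nudged toward liked tracks and away from disliked ones. Every name and number here is a made-up illustration, not Netease's actual algorithm; note how the feature vector would also naturally support the richer “I liked the melody” style of feedback mentioned above.

```python
# A toy like/dislike feedback loop for music recommendation. Tracks are
# described by hand-made feature vectors (melody, lyrics, tempo), and the
# user's preference vector is nudged by each piece of feedback.
import numpy as np

tracks = {
    "track_a": np.array([0.9, 0.1, 0.5]),  # melody-heavy
    "track_b": np.array([0.2, 0.8, 0.4]),  # lyrics-heavy
    "track_c": np.array([0.7, 0.6, 0.9]),  # fast and melodic
}
prefs = np.zeros(3)  # learned user preference vector

def feedback(track, liked, lr=0.5):
    """Nudge preferences toward liked tracks, away from disliked ones."""
    global prefs
    prefs += lr * (1.0 if liked else -1.0) * tracks[track]

def recommend():
    """Pick the track whose features best match current preferences."""
    return max(tracks, key=lambda t: prefs @ tracks[t])

feedback("track_a", liked=True)   # user likes the melody-heavy track
feedback("track_b", liked=False)  # and dislikes the lyrics-heavy one
print(recommend())                # now favors melody-heavy tracks
```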

I also agree with the finding that transparency can help people provide better labels. In my opinion, transparency about how the system works has the same effect as giving users feedback on how their actions influenced the system: a good understanding of the impact of their actions allows users to proactively give more accurate feedback. Returning to the music player example, if my private radio keeps recommending music I like, then in order to hear more good music I will be more willing to provide feedback. Conversely, if my feedback has no influence on the radio's recommendations, I will just give up on the feature.

Questions:

  • Do you have similar experiences with interactive machine learning systems?
  • What are your expectations of these systems?
  • What do you think of the tradeoff between machine learning and human-computer interaction in this kind of interactive learning system?
  • Discuss any of the challenges faced by interactive learning systems that are presented at the end of the paper.


02/05/20 – Dylan Finch – Power to the People: The Role of Humans in Interactive Machine Learning

Summary of the Reading

Interactive machine learning is another form of machine learning that allows for much more precise and continuous changes to the model, rather than large updates that drastically change the model. In interactive machine learning models, domain experts are able to continuously update the model as it produces results, reacting to the predictions it makes in almost real time. Examples of this type of machine learning system include online recommender systems like those on Amazon and Netflix.

In order for this type of system to work, there needs to be an oracle who can correctly label data. Usually this is a person. However, people do not like being an oracle, and in some cases they can be quite bad at it. Humans would also like richer, more rewarding interactions with the machine learning algorithms. The paper suggests some ways these interactions could be made richer for the person training the model.

Reflections and Connections

At the end of the paper, the authors say that these new types of interaction with interactive machine learning are a potentially powerful tool that needs to be applied in the right circumstances. I completely agree. I think that this technology, like all technologies, will be useful in some places and not in others. In the case of a simple recommender system, most people are happy to just give a rating or answer a survey question every now and then; richer interactions would take away from the simplicity and usefulness of the system. But in other cases, it would be nice to be able to work with the machine learning model to generate better answers in the future.

I also think that in some fields, technologies like the ones presented in this paper will be extremely valuable. In life, it is very easy to get stuck in a rut and be unable to think outside of the ways we have always done things, but it is important to do that to push technology forward. We have always thought of machine learning as an algorithm asking an oracle about specific examples. When we created interactive machine learning, we replaced the oracle with a person and applied the same ideas. But, as this paper points out, people are not oracles and they don't like to be treated like them. So the ideas in this paper could be very important for unlocking new ways of using machine learning in conjunction with people. And the more we play to the strengths of people, the better the machine learning algorithms we will be able to create that take advantage of those strengths.

Questions

  1. What is one place you think could use interactive machine learning besides recommender systems?
  2. Which of the presented models for new ways for people to interact with machine learning algorithms do you think has the most promise?
  3. Can you think of any other new interfaces for interactive machine learning not mentioned in the paper?
