02/05/20 – Vikram Mohanty – Power to the People: The Role of Humans in Interactive Machine Learning

Paper Authors: Saleema Amershi, Maya Cakmak, W. Bradley Knox, Todd Kulesza

Summary

This paper highlights the usefulness of intelligent user interfaces and the power of human-in-the-loop workflows for improving machine learning models, and makes the case for moving from traditional machine learning workflows to interactive machine learning platforms. Implicitly, domain experts, the potential users of such applications, can provide high-quality data points. To facilitate that, the role of user interfaces and user experience is illustrated through numerous examples. The paper closes by outlining challenges and future directions of research for better understanding how user interfaces interact with learning algorithms and vice versa.

Reflections

  1. The case study with proteins and biochemists illustrates a classic case of the frustration associated with iterative design while striving to align with user needs. In this example, however, the problem space was focused on getting an ML model right for the users. As the case study showed, interactive machine learning applications seemed to be the right fit for solving this problem, as opposed to having experts iteratively tune the model by hand. The research community is rightfully moving in the direction of producing smarter applications, and in order to make these applications more intelligible, building user interfaces/applications for interactive machine learning seems to be an effective and cost-efficient route.
  2. In the realm of intelligent user interfaces, human users are not merely a source of quality training data; they provide a lot of value beyond that. Still, my reflection will center around the “human-in-the-loop” aspect to keep the discussion aligned with the paper’s narrative. The paper, without explicitly saying so, also shows how we can get good-quality training labels without relying solely on crowdsourcing platforms like AMT or Figure Eight, but rather by focusing on the potential users of such applications, who are often domain experts. The trade-off between collecting data from novice workers on AMT and from domain experts is fairly obvious: quality vs. cost.
  3. The authors, through multiple examples, also make an effective argument about the indispensable role of user interfaces in ensuring a stream of good-quality data. The paper further stresses the importance of user experience in generating rich and meaningful datasets.
  4. “Users are People, Not Oracles” is the first point, and it seems to be an important one. If applications are built with the sole intention of collecting training data, there is a risk of the user experience being sacrificed, which in turn compromises data quality, and the cycle breaks down.
  5. Because it is difficult to decouple the contributions of the interface design and the chosen algorithm, coming up with an effective evaluation workflow seems like a challenge. Evaluation appears to be very context-dependent, and following recent guidelines such as https://pair.withgoogle.com/ or https://www.microsoft.com/en-us/research/project/guidelines-for-human-ai-interaction/ can go a long way in improving these interfaces.

Questions

  1. For researchers working on crowdsourcing platforms, even if it’s for a simple labeling task, how did you handle poor-quality data? Did you ever re-evaluate your task design (interface/user experience)?
  2. Let’s say you work in a team with domain experts. The domain experts use an intelligent application in their everyday work to accomplish a complex task A (the main goal of the team), and as a result, you get data points (let’s call them A-data). As a researcher, you see the value of collecting data points B-data from the domain experts, which may improve the efficiency of task A. However, in order to collect B-data, domain experts have to perform task B, which is an extra task and deviates from A (their main objective and what they are paid for). How would you handle this situation? [This is pretty open-ended]
  3. Can you think of any examples where collecting negative user feedback (which can significantly improve the learning algorithm) also fits the natural usage of the application?


02/04/2020 – Akshita Jha – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

Summary:
“Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research” by Vaughan is a survey paper that provides an informative overview of crowdsourcing research for the machine learning community. There are four main application areas:
(i) Data generation: This is made up of two types of work. The first is data aggregation, where several crowdworkers are assigned the same data point and asked to annotate it; the machine learning algorithm then aggregates these annotations into a final response (a minimal aggregation sketch follows this list). The second line of research involves modifying the system to elicit quality responses from crowdworkers.
(ii) Evaluation and debugging of the model: Crowdworkers can help debug and evaluate unsupervised machine learning methods such as topic models (e.g., LDA) and generative models.
(iii) Hybrid systems that utilize both machines and humans to expand their capabilities: Humans and machines have complementary strengths which, if made proper use of, can result in effective systems that help humans as well as improve the machine’s understanding.
(iv) Crowdsourced behavioral experiments that gather data and improve our understanding of how humans would like to interact with machine learning systems: Behavioral experiments can help us understand how humans would like to interact with the system and the changes that can be made to improve end-user satisfaction.
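
To make the aggregation idea in (i) concrete, here is a minimal sketch (my own illustration, not code from the survey) of the simplest aggregation scheme, majority voting over redundant labels; real systems often use more sophisticated models that also estimate worker reliability.

```python
# Minimal sketch (illustration only): majority-vote aggregation of redundant
# crowd labels. Each item is labeled by several workers; the most common
# label wins. Worker-reliability models (e.g., Dawid-Skene) would refine this.
from collections import Counter, defaultdict

def aggregate_labels(annotations):
    """annotations: list of (item_id, worker_id, label) tuples."""
    votes = defaultdict(list)
    for item, _worker, label in annotations:
        votes[item].append(label)
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in votes.items()}

annotations = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "dog"), ("img2", "w3", "dog"),
]
print(aggregate_labels(annotations))  # {'img1': 'cat', 'img2': 'dog'}
```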

Reflections:
In my limited knowledge about crowdworkers, I was aware of their importance for data aggregation. The author does a good job highlighting other areas where machine learning researchers might benefit from utilizing the power of crowdworkers. What I found particularly interesting were the case studies making use of crowdworkers to debug models and evaluate their interpretability. When we think of “debugging” models and finding flaws in the system, we mostly view things from the developer’s point of view and rely on them completely to debug and evaluate the model’s performance. Using crowdworkers for this task seems like a useful application area that more machine learning researchers should be aware of. These tasks might also be of greater interest to the crowdworkers because they are not repetitive and involve the crowdworkers’ active participation. “Human debugging” can help the system by taking crowdworkers’ feedback into account to uncover bottlenecks in machine learning models. Hybrid techniques that involve human feedback also seem like a promising application area where the system relies extensively on human judgement to make the right decisions. This also puts more responsibility on machine learning researchers to be creative and come up with unique ways to involve humans. Setting up pilot studies can help on this front. Pilot studies can prove useful as they demonstrate how a layperson interacts with a system and reveal the gaps that researchers should fill in order to ensure a cohesive experience for the end user. However, care should be taken to ensure that the effort put in by the crowdworkers in building these systems does not go unappreciated.

Questions:
1. Did you agree with the applications of crowdworkers presented in this survey?
2. What steps can be taken to make machine learning researchers aware of these potential applications?
3. Apart from fairly compensating the workers, what steps can be taken to value their contributions?


02-05-2020 – Ziyao Wang – Principles of Mixed-Initiative User Interfaces

The author proposes combining automated services and direct manipulation in the development of user interfaces. Taking into consideration twelve factors, including developing significant value-added automation, uncertainty about a user’s goals, the user’s attention, costs, benefits, dialog, and so on, the author studied the Lookout system, a mixed-initiative user interface that enables users and intelligent agents to collaborate efficiently. The factors of a value-added calendaring and scheduling service, decision making under uncertainty, multiple interaction modalities, and handling invocation failures are evaluated, and future expectations are set. After the discussion of costs and benefits and the research on the Lookout system, the combination of reasoning machinery and direct manipulation is shown to hold real promise for improving human-computer interaction.
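
The decision-theoretic core of this kind of mixed-initiative interaction, acting only when the expected utility of action exceeds that of dialog or of doing nothing given uncertainty about the user’s goal, can be sketched as below. This is a minimal illustration with made-up utility values, not Lookout’s actual model.

```python
# Minimal sketch (illustrative utilities, not Lookout's actual model):
# choose between acting autonomously, engaging the user in dialog, or doing
# nothing, based on expected utility given P(user actually wants help).

def expected_utilities(p_goal):
    # Assumed utilities: (utility if the user wanted help, utility if not).
    u = {
        "act":     (1.0, -1.0),   # acting correctly adds value; acting wrongly annoys
        "dialog":  (0.7, -0.3),   # asking costs some attention either way
        "nothing": (-0.2, 0.0),   # doing nothing misses an opportunity to help
    }
    return {a: p_goal * u_yes + (1 - p_goal) * u_no
            for a, (u_yes, u_no) in u.items()}

def choose(p_goal):
    eu = expected_utilities(p_goal)
    return max(eu, key=eu.get)

for p in (0.2, 0.6, 0.95):
    print(p, choose(p))   # low -> nothing, medium -> dialog, high -> act
```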

Reflection:

Though this paper was written in 1999, the idea behind it is still valuable now. The combination of automated services and direct manipulation has been widely applied to current user interfaces. For example, in designing the user interface of Taobao, the developers built numerous modules. The modules can be arranged by users according to their preferences, and at the same time an AI system arranges the modules according to search history and user actions. With the combination of these two arrangements, the user experience is improved significantly. Apart from Taobao, most currently popular applications and websites have similar systems along the lines of what this 1999 paper recommended. Beyond this paper, there must be other older papers that contain ideas still valuable today. For this reason, current researchers should revisit older papers regularly.

We certainly need to read up-to-date papers, which represent the current state of the art. However, some of the ideas proposed in older papers still work now. Some of those ideas were impossible to implement at the time, and as a result the papers were ignored by other researchers. With the development of technology, though, we should revisit such papers from time to time; someday, these once-impractical ideas may become feasible.

Apart from the idea proposed in the paper, I have another thought about how the author may have arrived at this idea. At that time, researchers focused either on tools for users to directly manipulate user interfaces or on automated services that can sense user activity and take automated actions. However, research focusing on the combination of the two was limited. The author considered both sides and pointed out a new direction for improving human-computer interaction. Similarly, if we can combine two related, up-to-date research topics, novel solutions to some of the current challenges may emerge, and this may be applied in our course projects.

Question:

Which applications have applied the proposed approach of combining automated services and direct manipulation?

What should we do if the agent’s decision conflicts with the user’s decision?

Is it ethical for agents to track user activities? If not, how can agents provide services automatically?


02/04/2020 – Akshita Jha – Power to the People: The Role of Humans in Interactive Machine Learning

Summary:
“Power to the People: The Role of Humans in Interactive Machine Learning” by Amershi et al. talks about the tightly coupled interactivity between systems and end users and how to improve user experiences while improving system performance. The workflow for conventional machine learning algorithms involves a long, drawn-out process of training/pre-training, fine-tuning, iteratively tuning hyper-parameters, etc. to improve the target metrics. In comparison, the feedback in the interactive machine learning workflow is rapid, focused, and incremental. Prominent real-world examples of interactive machine learning systems include recommender systems like Amazon and Netflix. Interactive machine learning has also been used for image segmentation, where users were asked to mark the foreground and the background of an image; the system took this feedback into consideration and improved its performance. Similarly, interactive music composition helps improve the system but has also been shown to train the students. The authors also present case studies that explore novel interfaces for interactive machine learning: for example, experiments giving the end user the ability to modify the input and observe the effect on the final output, studies attempting to understand the efficacy of active vs. passive learning, enabling users to query the learner as opposed to only answering its questions, and enabling users to provide active feedback and critique the learner’s output. In all the above examples, the user and the system are tightly coupled and form a cohesive unit which is difficult to study in isolation.
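
The image segmentation example above maps naturally onto a simple interactive loop: user scribbles become labels, a classifier fills in the rest, and the user corrects the result. The sketch below is my own generic illustration under those assumptions, not the specific algorithm from the case study; the UI hooks (collect_new_scribbles, display) are hypothetical.

```python
# Minimal sketch (illustrative, not the paper's algorithm): user scribbles on
# foreground/background pixels become labels for a per-pixel classifier, the
# predicted mask is shown back, and further scribbles refine it.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pixel_features(image):
    """image: H x W x 3 float array; features are color plus normalized (x, y)."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.concatenate(
        [image.reshape(-1, 3), xs.reshape(-1, 1) / w, ys.reshape(-1, 1) / h],
        axis=1,
    )

def update_mask(image, scribbles):
    """scribbles: H x W array with 1 = foreground, 0 = background, -1 = unlabeled."""
    feats = pixel_features(image)
    labeled = scribbles.reshape(-1) >= 0
    clf = RandomForestClassifier(n_estimators=50).fit(
        feats[labeled], scribbles.reshape(-1)[labeled]
    )
    return clf.predict(feats).reshape(scribbles.shape)

# Interactive loop: each round, the user adds scribbles where the mask is wrong.
# image, scribbles = load_image(), np.full(image.shape[:2], -1)   # hypothetical
# while user_not_satisfied():                                     # hypothetical UI hook
#     scribbles = collect_new_scribbles(scribbles)                # hypothetical UI hook
#     display(update_mask(image, scribbles))
```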

Reflections:
The paper presents several case studies that highlight the differences between machines and humans. One case study I found particularly interesting was where the researchers tried to use human feedback for training a reinforcement-learning-based model. In conventional reinforcement learning, the agent works in a simulated task environment and receives rewards based on each of its actions. The agent then tries to find ideal policies to best complete the task at hand, which it does by maximizing the rewards. Unlike a conventional reward function, which often penalizes the agent, humans in the loop gave far more positive feedback than negative feedback, which the agent exploited greedily. This led to an undesired effect: the agent actively avoided getting to the goal. This result is fascinating for several reasons: (i) it effectively demonstrates the difference between the way computers learn and the manner in which human psychology operates, and (ii) it shows what can be changed in the system to incorporate human feedback and make it more effective and user friendly. Another unexpected insight was that people value transparency. It was surprising to find out that knowing more about the “black box” model helped in getting better labels. In order to design effective systems, it is critical to understand what humans expect while interacting with a system.
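
A back-of-the-envelope calculation makes the goal-avoidance effect concrete. Assuming (my illustration, not the actual study's numbers) a positively biased human reward of +0.5 for every intermediate step and +1.0 for reaching the goal, which ends the episode, a return-maximizing agent prefers to loop forever rather than finish:

```python
# Minimal sketch (illustrative numbers, not from the paper): why positively
# biased human reward can make a return-maximizing agent avoid the goal.
gamma = 0.9      # discount factor
r_step = 0.5     # biased human reward for any "reasonable" non-goal step
r_goal = 1.0     # reward for reaching the goal, which terminates the episode

finish_now = r_goal                     # go straight to the goal: return = 1.0
loop_forever = r_step / (1 - gamma)     # geometric series 0.5 + 0.5*gamma + ... = 5.0

print(finish_now, loop_forever)         # looping looks 5x better, so the agent stalls
```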

Questions:
1. Which systems do we interact with most on a daily basis? Are they interactive?
2. Can we develop metrics to appropriately evaluate a model’s ability to interact?
3. Apart from reinforcement learning, are there any other specific machine learning algorithms that might benefit from having humans in the loop?


02/05/20 – Lulwah AlKulaib – Making Better Use of the Crowd

Summary

The survey provides an overview of how machine learning projects can utilize crowdsourcing research. The author focuses on four application areas where crowdsourcing can be used in machine learning research: data generation, model evaluation and debugging, hybrid intelligence systems, and behavioral studies to inform ML research. She argues that crowdsourced studies of human behavior can be valuable for understanding how end users interact with machine learning systems, and that these studies are also useful for understanding crowdworkers themselves. She explains that understanding crowdworkers helps in defining recommendations and best practices for working with the crowd. The case studies she presents show how to effectively run a crowdwork study and provide additional sources of motivation for workers. The case studies also answer how common dishonesty is on crowdsourcing platforms and how to mitigate it when encountered. They also reveal the hidden social network of crowdworkers and dispel the misconception that crowdworkers are independent and isolated. The author concludes with new best practices and tips for projects that use crowdsourcing, and emphasizes the importance of pilots to a project’s success.

Reflection

This paper focuses on answering the question: how can crowdsourcing advance machine learning research? It asks readers to consider how machine learning researchers think about crowdsourcing, offering an analysis of the multiple ways in which crowdsourcing can benefit, and sometimes benefit from, machine learning research. The author focuses her attention on four categories:

  • Data generation: She analyzes case studies that aim to improve the quality of crowdsourced labels.
  • Evaluating and debugging models: She discusses papers that used crowdsourcing to evaluate unsupervised machine learning models.
  • Hybrid intelligence systems: She shows examples of utilizing the “human in the loop” and how these systems are able to achieve more than would be possible with state-of-the-art machine learning or AI systems alone, because they make use of people’s skills and knowledge.
  • Behavioral studies to inform machine learning research: This category discusses the design of interpretable machine learning models, the impact of algorithmic decisions on people’s lives, and questions that are interdisciplinary in nature and require a better understanding of how humans interact with machine learning systems and AI.

The remainder of her survey provides best practices for crowdsourcing by analyzing multiple case studies. She addresses dishonest and spam-like behavior, how to set payments for tasks, what the incentives for crowdworkers are, how crowdworkers can motivate each other, and the communication and collaboration among crowdworkers.
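
One common way to operationalize the point about dishonest and spam-like behavior is to mix known-answer “gold” questions into a task and flag workers who miss too many of them. The sketch below is an illustrative approach of my own, not a procedure prescribed by the survey; the threshold value is an assumption.

```python
# Minimal sketch (illustrative quality control, not prescribed by the survey):
# flag workers whose accuracy on known-answer "gold" questions falls below a threshold.
from collections import defaultdict

def flag_workers(responses, gold, threshold=0.7):
    """responses: list of (worker_id, item_id, answer); gold: {item_id: answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, item, answer in responses:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(answer == gold[item])
    return {w for w in total if correct[w] / total[w] < threshold}

responses = [
    ("w1", "q1", "cat"), ("w1", "q2", "dog"),
    ("w2", "q1", "dog"), ("w2", "q2", "dog"),
]
print(flag_workers(responses, gold={"q1": "cat", "q2": "dog"}))  # {'w2'}
```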

I found the section on the community of crowdworkers the most interesting to read. We have always thought of them as isolated, independent workers. Finding out about the forums, how they promote good jobs, and how they encourage one another was surprising.

I also find the suggested tips and best practices beneficial for crowdsourcing task posters, especially those who are new to the environment.

Discussion

  • What was something unexpected that you learned from this reading?
  • What are your tips for new crowdsource platform users?
  • What would you apply from this reading to your project planning/work?


02/05/20 – Lulwah AlKulaib – Power to the People

Summary

The paper argues that users have little to do with application development nowadays. The authors mention that developers apply machine learning techniques to solve problems but limit their interaction with end users to mediation by practitioners. This results in a long process with multiple iterations and limits users’ ability to affect the models. They shed light on the importance of studying users in these systems and present case studies as examples of how such systems could result in better user experiences and more effective learning systems. The authors bring to our attention the advantages of studying user interaction with interactive machine learning systems and some flaws that developers must watch out for. They also present case studies of novel interfaces for interactive machine learning, clarify the different ways that could create richer interactions with users, and emphasize the importance of evaluating them with end users. The authors conclude their paper by underlining that any approach should be appropriately evaluated and tested before deployment, since permitting user interaction is often, but not always, beneficial. They believe that by acknowledging the challenges in this approach, they can produce better machine learning systems as well as better experiences for end users.

Reflection

This paper focuses on the importance of the end user’s role in interactive machine learning systems. It raises questions about how users can effectively influence machine learning systems and how machine learning systems can appropriately influence users. The paper also presents case studies that explain how people interact with machine learning systems. In those cases some unexpected results were found, such as people violating assumptions of the machine learning algorithm or being unwilling to comply with them. Other cases showed that studies can lead to insights about the input and output types that interactive machine learning systems should support. The paper discusses case studies of novel interfaces for interactive machine learning, whether the novelty comes from new ways of receiving input or of presenting output. The authors mention that new input techniques can give users more control over the system, while output techniques can make the system more transparent or understandable. The paper does mention, though, that not all novel interfaces were beneficial, and certain input and output types led to obstacles for the user, which reduced the accuracy of the learned model. The paper raises a good point about how different end users have different needs and expectations of these systems, and therefore rich interaction techniques must be designed accordingly. I agree with the authors that conducting studies of novel interactive machine learning systems is critical, and that those studies could form the basis of guidelines for future interactive learning systems.

Discussion

  • How would you apply interactive machine learning in your project?
  • Have you encountered such systems in other research papers you have read?
  • What are applications that could benefit from utilizing interactive machine learning systems?
  • How would you apply some of the case-study suggestions from the paper to the machine learning model rather than the user experience?


02/05/20 – Nan LI – Guidelines for Human-AI Interaction

Summary

In this paper, the author proposed and evaluated 18 guidelines for Human-AI Interaction. These guidelines were summarized and distilled through four main stages; the author explains these four phases and presents partial results by listing several representative examples. First, the author conducted an exhaustive review of AI design guidelines from different companies, industries, public articles, and papers. Then, they conducted a modified heuristic evaluation of these guidelines and reflected on the results. In the third phase, the author conducted a user study with 49 HCI practitioners to evaluate the guidelines along two main aspects: 1) the broad applicability of the guidelines, and 2) the semantic intelligibility of the guidelines. Finally, the author evaluated and revised the guidelines with experts who have work experience in UX/HCI and are familiar with discount usability methods such as heuristic evaluation (as stated in the paper). The guidelines were analyzed, adjusted, and summarized after each stage based on that stage’s results, and the paper presents the results of each stage through tables and figures. The paper closes by discussing the scope of these guidelines, as well as issues found during the evaluation phases.

Reflection:

The main content of this article is an evaluation of the author’s summarized guidelines. The evaluation process is divided into three phases. There are many times when we need to evaluate our own hypotheses or conclusions in daily study and research; thus, the evaluation process presented by the author in this paper has many valuable points that are worth learning.

In the first phase, the author’s original version of the guidelines was collected from various sources. The collection is very comprehensive: it is not limited to published papers or journals but also draws on existing products and applications.

In the next three phases, each stage of the assessment is very detailed and thorough. For example, when the author evaluated whether these guidelines are applicable to AI-infused products, only 13 products were inspected; the number is not large, but the functions of these products are very representative.

In addition, the personnel involved in the inspection in each phase are professionals with experience in the HCI area, which also ensures the professionalism of the evaluation.

During the evaluation, the author not only focused on the applicability and accuracy of the guidelines but also emphasized the quality of semantic expression. This has a great positive effect on the use and dissemination of the guidelines.

In the final discussion of the article, the author also pointed out that the development of AI-infused products should always consider ethical issues instead of just adhering to the design guidelines. I don’t have much to comment on this; it just made me realize that no matter the area or the kind of product being designed, it is always linked to ethical issues and bias issues. This is always the most complicated topic.

Questions:

  • This paper gives a very detailed user-study process and results. Have you ever conducted a standard HCI user study? What can you learn from the user study in this paper?
  • The original version of the guidelines proposed in this article is based on a summary of existing papers and product designs. However, this summary is more about AI design than HCI design. What do you think about this? Do you think they should have collected more information about HCI design principles, or is the information collected by the author adequate?
  • Do you think the inspection process should include more ordinary AI product users?
