02/05/20 – Vikram Mohanty – Power to the People: The Role of Humans in Interactive Machine Learning

Paper Authors: Saleema Amershi, Maya Cakmak, W. Bradley Knox, Todd Kulesza

Summary

This paper highlights the usefulness of intelligent user interfaces and the power of human-in-the-loop workflows for improving machine learning models, and makes the case for moving from traditional machine learning workflows to interactive machine learning platforms. The implicit argument is that domain experts, the potential users of such applications, can provide high-quality data points. To facilitate that, the role of user interfaces and user experience is illustrated through numerous examples. The paper closes by outlining challenges and future directions of research for better understanding how user interfaces and learning algorithms affect one another.

Reflections

  1. The case study with proteins and biochemists illustrates a classic case of the frustration associated with iterative design while striving to align with user needs. In this example, however, the problem space was focused on getting an ML model right for the users. As the case study showed, an interactive machine learning application seemed to be the right fit for this problem, as opposed to having the experts manually tune the model over repeated iterations. The research community is rightfully moving in the direction of producing smarter applications, and to ensure better intelligibility of these applications, building user interfaces/applications for interactive machine learning seems to be an effective and cost-efficient route (a minimal sketch of such an interactive loop follows this list).  
  2. In the realm of intelligent user interfaces, human users do far more than supply quality training data, but my reflection will center around the “human-in-the-loop” aspect to keep the discussion aligned with the paper’s narrative. The paper, without explicitly saying so, also shows how we can get good-quality training labels without relying solely on crowdsourcing platforms like AMT or Figure Eight, but rather by focusing on the potential users of such applications, who are often domain experts. The trade-off between collecting data from novice workers on AMT and from domain experts is pretty obvious: quality vs. cost.
  3. Through multiple examples, the authors also make an effective argument for the indispensable role of user interfaces in ensuring a stream of good-quality data. The paper further stresses the importance of user experience in generating rich and meaningful datasets.
  4. “Users are People, Not Oracles” is the first point, and seems to be a pretty important one. If applications are built with the sole intention of collecting training data, the user experience risks being sacrificed, which in turn degrades data quality and breaks the cycle.
  5. Because it is difficult to decouple the contributions of the interface design from those of the chosen algorithm, coming up with an effective evaluation workflow seems like a challenge. Evaluation also appears to be very context-dependent, but following recent guidelines such as https://pair.withgoogle.com/ or https://www.microsoft.com/en-us/research/project/guidelines-for-human-ai-interaction/ can go a long way toward improving these interfaces.
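
To make the contrast in point 1 concrete, here is a minimal sketch (my own, not from the paper) of what an interactive machine learning loop might look like: the model predicts on an unlabeled pool, the interface asks the user about the items the model is least sure of, and the model is updated incrementally with the user's answers. The synthetic dataset, the uncertainty-sampling heuristic, and the simulated user (pool_y standing in for labels given through a UI) are all illustrative assumptions.

```python
# A minimal, illustrative interactive ML loop (not from the paper).
# Assumes scikit-learn is available; the "user" here is simulated by
# ground-truth labels that stand in for feedback given through a UI.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Start with a small seed of labeled examples, as a domain expert might provide.
seed = 20
model = SGDClassifier(loss="log_loss", random_state=0)  # "log" in older scikit-learn releases
model.partial_fit(X[:seed], y[:seed], classes=np.unique(y))

pool_X, pool_y = X[seed:], y[seed:]

for round_id in range(5):
    # Model predicts on the unlabeled pool; pick the most uncertain items
    # to show to the user (a simple uncertainty-sampling heuristic).
    probs = model.predict_proba(pool_X)[:, 1]
    uncertainty = np.abs(probs - 0.5)
    ask = np.argsort(uncertainty)[:10]

    # In a real interactive application, these labels would come from the
    # user interface; here pool_y stands in for the user's corrections.
    user_labels = pool_y[ask]

    # Incrementally update the model with the new feedback.
    model.partial_fit(pool_X[ask], user_labels)

    # Remove the newly labeled items from the pool.
    keep = np.ones(len(pool_X), dtype=bool)
    keep[ask] = False
    pool_X, pool_y = pool_X[keep], pool_y[keep]

    print(f"round {round_id + 1}: accuracy on all data = {model.score(X, y):.2f}")
```

The particulars matter less than the shape of the loop: the interface decides what to ask and how to ask it, and, as the paper argues, that design choice shapes the quality of every label the model receives.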

Questions

  1. For researchers working with crowdsourcing platforms, even if it's for a simple labeling task, how did you handle poor-quality data? Did you ever re-evaluate your task design (interface/user experience)?
  2. Let’s say you work in a team with domain experts. The domain experts use an intelligent application in their everyday work to accomplish a complex task A (the main goal of the team), and as a result, you get data points (let’s call them A-data). As a researcher, you see the value of collecting data points B-data from the domain experts, which may improve the efficiency of task A. However, in order to collect B-data, the domain experts have to perform task B, an extra task that deviates from A (their main objective and what they are paid for). How would you handle this situation? [This is pretty open-ended]
  3. Can you think of any examples where collecting negative user feedback (which can significantly improve the learning algorithm) also fits the natural usage of the application?

Vikram Mohanty

I am a 3rd year PhD student in the Department of Computer Science at Virginia Tech. I work at the Crowd Intelligence Lab, where I am advised by Dr. Kurt Luther. My research focuses on developing novel tools that leverage the complementary strengths of Artificial Intelligence (AI) and collective human intelligence for solving complex, open-ended problems.
