Reading: Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the People: The Role of Humans in Interactive Machine Learning. AI Magazine 35, 4: 105–120. https://doi.org/10.1609/aimag.v35i4.2513
Machine learning systems are typically built through collaboration between domain experts and ML experts. The domain experts provide data to the ML experts, who carefully configure and tune a model and send it back to the domain experts for review. The domain experts then recommend further changes, and the cycle continues until the model reaches an acceptable level of accuracy. This tends to be a slow and frustrating process, so there is a need to involve the actual users more actively. The study of interactive machine learning arose to identify how users can best interact with and improve ML models through faster, interactive feedback loops. This paper surveys the field, looking at what users like and don’t like when teaching machines, what kinds of interfaces are best suited to these interaction cycles, and what novel interfaces can exist beyond the simple labelling-learning feedback loop.
When reading about the novel interfaces that exist for interactive machine learning, I found an interesting parallel between the development of the “Supporting Experimentation of Inputs” type of interface and that of text editors. The earliest text editor was the typewriter, where an input, once entered, could never be taken back. A correction required starting over or using ugly whiteout. With electronics came text editors where you could edit only one line at a time. Finally, today we have advanced, feature-rich editors and IDEs with autocomplete suggestions, inline linting, automatic type checking, and error feedback. It would be interesting to see what the next stage of ML model editing would look like if it continued on this trajectory, moving from simple “backspace key” style experimentation to features comparable to what modern text editors offer for words. The idea of “Combining Models” as a way to create models draws another interesting parallel, this time to car manufacturing, where cars went from being handcrafted to being built on an assembly line with standardized parts.
I also think their proposal for creating a universal language to connect the different ML fields might end up producing a language that is too general. The different fields, though initially unified, might split off again, either because each uses only a subset of the language that does not overlap with the others, or because each invents new words when the language has nothing specific enough.
- Is creating a “universal language” a good thing? Or would we end up with something too general to be useful, causing fields to create their own subsets?
- What other kinds of parallels can we see in the development of machine learning interfaces, like the parallels to text editor development and car manufacturing?
- Where is the “Goldilocks zone” for ML systems that give context to the user for the sake of transparency? There is a spectrum from “Label this photo with no context” to “Here is every minute detail: number of pixels, exact GPS location, and all sorts of other useless info”. How do we decide which information the ML system should provide as context?