Summary:
“Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research” by Vaughan is a survey paper that provides an informative overview of crowdsourcing research for the machine learning community. It covers four main application areas:
(i) Data generation: This is made up of two types of work. The first type, data aggregation, is where several crowdworkers are assigned the same data point and asked to annotate it. An algorithm then aggregates these annotations into a final response. The second type of research involves modifying the system to elicit higher-quality responses from crowdworkers.
(ii) Evaluation and debugging of models: Crowdworkers can help debug and evaluate unsupervised machine learning models such as topic models (e.g., LDA) and generative models.
(iii) Hybrid systems that combine machines and humans to expand their capabilities: Humans and machines have complementary strengths which, if used properly, can result in effective systems that help humans as well as improve the machine’s understanding.
(iv) Crowdsourced behavioral experiments that gather data and improve our understanding of how humans would like to interact with machine learning systems: Behavioral experiments can help us understand how humans would like to interact with the system and what changes can be made to improve end-user satisfaction.
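To make the data aggregation idea in (i) concrete, here is a minimal sketch of the simplest baseline I can think of, majority voting over redundant annotations. This is my own illustration, not a method taken from the survey; the function name `aggregate_majority` and the toy data are hypothetical, and real aggregation methods typically also model worker reliability.

```python
from collections import Counter


def aggregate_majority(annotations):
    """Aggregate redundant crowd labels by majority vote.

    annotations: dict mapping an item id to a non-empty list of labels,
    one label per crowdworker (assumed format, for illustration only).
    Returns a dict mapping item id -> (winning label, agreement fraction).
    """
    results = {}
    for item_id, labels in annotations.items():
        counts = Counter(labels)
        # Take the most frequent label; ties go to the label seen first.
        label, votes = counts.most_common(1)[0]
        results[item_id] = (label, votes / len(labels))
    return results


# Hypothetical toy data: three workers annotate each item.
toy = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}

for item_id, (label, agreement) in aggregate_majority(toy).items():
    print(item_id, label, round(agreement, 2))
```

The agreement fraction is a rough signal of which items the workers found ambiguous; a low value could flag a data point for additional annotation.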
Reflections:
With my limited knowledge of crowdsourcing, I was aware of crowdworkers’ importance for data aggregation. The author does a good job of highlighting other areas where machine learning researchers might benefit from the power of crowdworkers. What I found particularly interesting were the case studies that use crowdworkers to debug models and evaluate their interpretability. When we think of “debugging” models and finding flaws in the system, we mostly view things from the developer’s point of view and rely on developers completely to debug and evaluate the model’s performance. Using crowdworkers for this task seems like a useful application area that more machine learning researchers should be aware of. These tasks might also be of greater interest to crowdworkers because they are not repetitive and involve active participation. “Human debugging” can help the system by using crowdworkers’ feedback to uncover bottlenecks in machine learning models. Hybrid techniques that incorporate human feedback also seem like a promising application area, in which the system relies extensively on human judgement to make the right decisions. This also puts more responsibility on machine learning researchers to be creative and come up with unique ways to involve humans. Setting up pilot studies can help on this front. Pilot studies are useful because they show how a layperson interacts with a system and reveal the gaps that researchers should fill to ensure a cohesive experience for the end user. However, care should be taken to ensure that the effort crowdworkers put into building these systems does not go unappreciated.
Questions:
1. Did you agree with the applications of crowdworkers presented in this survey?
2. What steps can be taken to make machine learning researchers aware of these potential applications?
3. Apart from fairly compensating the workers, what steps can be taken to value their contributions?