The author surveys how crowdsourcing has been applied in machine learning research. First, prior work is reviewed and the applications of crowdsourcing in machine learning are grouped into four categories: data generation, evaluating and debugging models, hybrid intelligence systems, and behavioral studies that inform machine learning research. Within each category, the author discusses several specific areas and, for each area, summarizes the related lines of research and introduces each of them. The author then analyzes what is known about the crowd workers themselves. Although crowdsourcing appears to greatly help machine learning research, the author does not ignore its problems, such as dishonesty among workers. Finally, the survey offers practical advice to machine learning researchers who use crowdsourcing: maintain a good relationship with crowd workers, pay attention to good task design, and use pilots.
Reflection:
From this survey, readers can get a thorough view of the applications of crowdsourcing in machine learning research. It summarizes most of the state of the art in the areas of machine learning that relate to crowdsourcing. Traditional machine learning often faces problems such as a lack of data, models that cannot be properly evaluated, a lack of user feedback, or systems that users cannot trust. With crowdsourcing, many of these problems can be addressed with the help of crowd workers. Although this is only a survey of previous research, it gives readers a comprehensive view of how the two technologies combine.
This survey reminds us of the importance of reviewing previous work. When we want to do research on a topic, there may be thousands of studies that could help, but it is impossible to read all of them. If a survey summarizes the previous work and sorts it into more specific categories, we can easily get a comprehensive view of the topic, and new ideas may emerge. In this paper, after examining the four categories of crowdsourcing applications in machine learning, the author goes on to study the crowd itself and finally makes suggestions for future researchers. Similarly, if we survey the area in which we want to do our projects, we may find out what is needed and what is novel in that field, which will help the projects succeed and the field develop.
It is also important to think critically. In this survey, although the author summarizes numerous contributions of crowdsourcing to machine learning research, the author still discusses the potential risks of applying it, for example dishonesty among workers. This matters for future research and should not be ignored. In our own projects, we should also think critically so that the drawbacks of the ideas we propose are judged fairly and the projects are practical and valuable.
Questions:
Which factors contribute to good task design?
Is there any approach that eliminates dishonesty among workers rather than merely mitigating it?
In experiments that aim to measure how users react to something, can the reactions of paid workers be considered similar to the reactions of real users?
I would like to comment on your first question. I think good task design requires considering many issues. For example, a well-designed interface can enable a highly efficient sorting activity, and well-designed interaction between users and tasks can let users feel in control of the system. Another important step is to figure out the users' needs: what they expect from the task, what motivates them, and where they are not satisfied. User feedback is always the best tool for designing the task.
As for the third question, it is difficult to capture exact user behavior through paid workers. However, the differences can be anticipated and assessed if you have a good understanding of who your target audience is and what skills they bring to the table.