Beyond evaluating the extent of AMT’s influence on research, Vakharia’s paper Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms characterizes existing crowd work platforms based on prior work, comparing their operation and workflows. It also opens a discussion of the extent to which the characteristics of other platforms may have positive or negative impacts on research. Building on an analysis of prior white papers, the paper identifies inadequate quality control, inadequate management tools, missing support for fraud prevention, and a lack of automated tools as key limitations. It then defines twelve criteria for comprehensively evaluating platform performance. Finally, the authors present comparative results based on these proposed criteria.
While reading this paper, I kept thinking that the private-crowds problem is also an ethics problem, because protecting sensitive or confidential data matters to all parties involved. More broadly, the ethics problems of crowd work platforms can be viewed from three perspectives: data, humans, and interfaces. From the data perspective, requesters should provide representative data and mitigate biases such as gender and geo-location bias. From the human perspective, platforms should assign tasks to workers randomly. From the interface perspective, requesters should provide interfaces that are symmetric across all categories of data, and the data should be independently and identically distributed (IID).
Though I have not used a platform like AMT, I have performed data labeling tasks before, and automated tools are efficient and important for this kind of work. I was once part of a team labeling cars for an object tracking project. I used a tool to automatically generate object detection results; although the results were highly biased, they still helped us considerably in labeling the data.
The paper provides a number of criteria to characterize various platforms. The criterion “Demographics & Worker Identities” calls for further analysis along another dimension: is it ethical to release the demographics of workers and requesters, and what are the potential hazards of making personal information available? The two criteria “Incentive Mechanisms” and “Qualifications & Reputation” appear to conflict with each other, since incentivizing workers to complete tasks faster may degrade the quality of their work. As for quantification, the paper does not provide quantitative metrics for comparing the performance of different crowd work platforms, so it remains difficult for workers and requesters to judge which platform best suits their needs. The following questions are worth further discussion.
- What are the main reasons crowd workers or requesters pick one particular platform over others as their primary platform?
- Knowing the characteristics of these platforms, is it possible to design a platform that combines all of their merits?
- The paper provides many criteria for comparing crowd work platforms; is it possible to develop standardized, quantitative metrics to evaluate their characteristics and performance?
- Is it necessary, or even ethical, for requesters to know basic information about workers, and vice versa?