Beyond the Turk: An empirical comparison of alternative platforms for crowdsourcing online behavioral research

Eyal Peer, Sonam Samat, Laura Brandimarte & Alessandro Acquisti

Discussion Leader: Divit Singh

Summary

This paper focused on finding alternatives to MTurk and evaluating their results.  MTurk is considered the front-runner among similar crowdsourcing platforms since it tends to produce high-quality data.  However, because MTurk's worker growth is starting to stagnate, the workers who use MTurk have become extremely efficient at completing the tasks that are typically published there.  The reason is that these tasks tend to be very similar (surveys, transcription, etc.).  Familiarity with tasks has been shown to reduce the effect sizes of research findings (completing the same survey multiple times with the same answers skews the data being collected).  For this reason, the paper explores other crowdsourcing platforms and evaluates their performance, results, similarities, and differences in an effort to find a viable alternative for researchers to publish their tasks on.

In order to evaluate the performance of these various crowdsourcing platforms, the authors tried to run the same survey on all 6 of the platforms being tested.  Of these 6, only 3 successfully published the survey.  Some platforms simply rejected the survey without giving a reason, while others either required a considerable amount of money or had errors in their platforms that prevented the study from exploring those alternatives.  Among the platforms on which the survey could be published, it appeared that the only viable alternative to MTurk turned out to be CrowdFlower.  The surveys included attention-check questions, decision-making questions, as well as a question that measured the honesty of the worker.  This paper provides an excellent overview of the various properties of each platform and includes many tables outlining when one platform may be more effective than another.

Reflection

This paper does present a considerable amount of information on the various platforms described.  However, reading through it really revealed the lack of any actual competition to MTurk.  While the paper does argue that CrowdFlower is a good alternative for reaching a different population of workers, CrowdFlower is still considered less than equal to MTurk in many instances.  The main argument for using these other platforms is that MTurk workers have become extremely efficient at completing tasks, which may skew results.  I believe it is only a matter of time before workers on these other platforms lose their “naivety” as those platforms mature.

The results of this paper may be invaluable to a researcher who wants to precisely target their audience.  For example, the paper revealed that CBDR is managed by CMU and is composed of both students and non-students.  Although not guaranteed, it might be the most appealing option for a researcher who wants to target college students, since it likely contains a considerable university student population.  Another excellent bit of information the authors provided is the failure rate of the attention-check questions included in their survey.  This outlines two things: how inattentive workers are during surveys, and how experienced MTurk's workers really are (they have most likely seen questions like these in the past, which prevents them from making the same mistake again).  However, keep in mind that these results are a snapshot at a given time.  There is nothing preventing the workers of CrowdFlower (who are apparently disjoint from MTurk's workers and form a massive worker base) from learning from these surveys and becoming savvier workers.

Questions

  1. Is there any other test that you believe the study missed?
  2. Based on the tables and information provided, how would you rank the different crowdsourcing platforms?  Why?
  3. This paper outlined different approaches taken by these platforms (e.g., a review committee that determines whether a survey is valid).  Which method do you agree with, or how would you design your own platform to optimize quality and reduce noise?

