Conducting behavioral research on Amazon’s Mechanical Turk

  1. Winter Mason and Siddharth Suri. 2012. Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods 44, 1: 1–23. http://doi.org/10.3758/s13428-011-0124-6

Discussion Leader: Anamary Leal

Summary:

This guide both motivates researchers to explore conducting research on Amazon Mechanical Turk, citing its numerous benefits (a faster theory-to-experiment turnaround time; a large pool of diverse, qualified participants for low pay; suitability for tasks that humans do more easily than computers; data quality comparable to laboratory testing; and other benefits we covered in class), and serves as a how-to guide on strategies for conducting behavioral research on Mechanical Turk.

The authors collected data from multiple studies looking at worker demographics, requester information, and what a HIT (Human Intelligence Task) looks like. A HIT can be hosted internally or externally, and is made up of multiple assignments, the units of work an individual worker can do; a single worker can complete only one assignment per HIT. The paper also covers how to create a HIT, collect data, and close out a HIT (a minimal code sketch of this lifecycle follows below). Additionally, the paper offers insight into worker-requester relationships, such as how workers search for HITs, and recommendations for requesters on designing successful HITs and engaging with workers.
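To make the create/collect/close lifecycle concrete, here is a minimal sketch using Python’s boto3 MTurk client, a newer API than the tooling available when the paper was written. The sandbox endpoint is real, but the study URL, title, reward, and timing values are placeholder assumptions, not anything prescribed by the paper.

```python
import boto3

# Connect to the requester sandbox so you can pilot without paying workers.
# (Drop endpoint_url to post real HITs; assumes AWS credentials are configured.)
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An externally hosted HIT: the task lives on the requester's own server and
# is shown to workers inside an iframe. The URL below is a placeholder.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-study</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

# Create the HIT: one HIT, many assignments; each worker may take only one.
hit = mturk.create_hit(
    Title="Short decision-making study (about 5 minutes)",
    Description="Answer a series of short questions.",
    Keywords="survey, research, study",
    Reward="0.50",                      # US dollars, passed as a string
    MaxAssignments=30,                  # number of distinct workers
    LifetimeInSeconds=3 * 24 * 3600,    # how long the HIT stays visible
    AssignmentDurationInSeconds=15 * 60,
    Question=question_xml,
)
hit_id = hit["HIT"]["HITId"]

# Collect data: poll for submitted assignments, then approve (pay) each one.
submitted = mturk.list_assignments_for_hit(
    HITId=hit_id, AssignmentStatuses=["Submitted"]
)
for assignment in submitted["Assignments"]:
    mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])
```

Piloting in the sandbox first costs nothing, which fits the paper’s general advice to test a HIT before putting it in front of real workers.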

The paper also covers in depth how to implement various kinds of experiments. For synchronous experiments, for example, it recommends first building a group of reliable, high-quality workers by running preliminary studies. Another informative aspect is how to perform random assignment of workers to conditions; a sketch of one way to do this follows below.
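As an illustration of random assignment in an externally hosted HIT, here is one common approach, a sketch of my own rather than the authors’ code: derive the condition deterministically from the worker’s ID, so a worker who reloads the page cannot resample conditions. The condition names are hypothetical.

```python
import hashlib

CONDITIONS = ["control", "treatment"]  # hypothetical condition names


def assign_condition(worker_id: str) -> str:
    """Assign a worker to an experimental condition.

    Hashing the worker ID makes the assignment deterministic: the same
    worker always lands in the same condition, even if they reload the
    external HIT page.
    """
    digest = hashlib.sha256(worker_id.encode("utf-8")).hexdigest()
    return CONDITIONS[int(digest, 16) % len(CONDITIONS)]


# Example: MTurk appends a workerId query parameter to an ExternalQuestion
# URL; that value would be passed in here.
print(assign_condition("A1EXAMPLEWORKERID"))
```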


Reflection:

In addition to being a how-to guide, this paper serves as a good literature survey, covering prior work alongside the practical advice. All kinds of useful data are scattered throughout the document, which I found especially helpful.

I finished using MTurk for the homework just before writing this reflection, and according to the review, looking at the most recently created HITs was the most popular way workers found HITs; the second most popular was looking for HIT groups with many HITs on offer, so that a worker can learn to perform a task well and then do similar tasks faster. I wonder what these strategies reveal about how workers use the system. We know that, today, a small group of professional workers completes the majority of HITs. Do workers hunt for new HITs in order to snatch up the best-paying ones? Is that a hint about how they use the system?

In my short experience doing HITs on MTurk, including IRB-approved research, I still felt that the pay was atrocious compared to the work done, but I also find reasonable the authors’ arguments about the convenience for workers of not needing to schedule a time to come into a laboratory, and about quality not being affected by pay (though one of the cited studies compares quality between paying one cent and ten cents, which is not much of a range).

The paper goes into detail about pay on MTurk, from practical considerations to quality issues to ethics. There is a negative relationship between a task offering the highest wages and the probability of the HIT being taken: I found that the highest-wage tasks call for such a ridiculous amount of time and effort that they are not worth doing.

This paper has been cited at least 808 times (according to Google Scholar) and advocates for positive, professional relationships between workers and requesters. It can still inform requester-researchers now, in 2015.

Questions:

  1. The paper cites work showing that workers find HITs by looking at the most recently created HITs first. What does this strategy say about Mechanical Turk and its workers?
    1. What were your strategies for finding appropriate HITs to take?
  2. How did this paper affect how you completed the assignment? Or, if you did the homework first and then read the paper, how would you have done the homework (as a worker and as a requester) differently?
  3. The authors address one of the biggest questions potential researchers may have: compensation. They present a case that quality for certain tasks (not judgment- or decision-type tasks) generally remains the same regardless of pay, and that in-lab participants are paid more. They recommend starting below the reservation wage ($1.38/hr) and increasing pay based on reception. Given our discussions about workers and pay, what do you think of this rate?
  4. The authors encourage new requesters to “introduce” themselves “to the Mechanical Turk community by first posting to Turker Nation before putting up HITs” and to “keep a professional rapport with their workers as if they were company employees.” This paper was published in 2012 and has been widely cited. How do you see the influence of this attitude (or its absence) among requesters and workers today?
  5. How applicable are these techniques on MTurk with respect to some of the issues we discussed before, such as Turkopticon’s rating system, the workers’ bill of rights, and other ethical and quality issues?

