TurKit: human computation algorithms on mechanical turk

Greg Little, Lydia B. Chilton, Max Goldman, and Robert C. Miller. 2010. TurKit: human computation algorithms on mechanical turk. In Proceedings of the 23rd annual ACM symposium on User interface software and technology (UIST ’10). ACM, New York, NY, USA, 57-66. DOI=10.1145/1866029.1866040 http://doi.acm.org/10.1145/1866029.1866040

Discussion Leader: Adam Binford

Summary

TurKit is a toolkit the authors built for writing algorithms that incorporate human computation through Amazon’s Mechanical Turk. Mechanical Turk already provides an API for requesters to integrate into their own programs, but it is mostly suited to highly independent, parallelizable tasks. TurKit provides a way to write sequential and iterative algorithms that involve multiple steps of human computation. It does this through a few extensions to JavaScript and a new crash-and-rerun programming model.

The crash-and-rerun model allows a TurKit script to be re-executed without rerunning its calls to Mechanical Turk, which are expensive in both time and money. During the initial run of a script, calls to the Mechanical Turk API go through and their results, whenever they are returned, are saved, so that subsequent executions simply reuse the saved results. This lets you develop, test, and modify a TurKit script without paying someone to complete a HIT each time. This saving of state is achieved through a few primitives added on top of regular JavaScript. Wrapping a call in once means it is actually executed only on the first run; subsequent executions reuse the recorded result. In addition, fork and join support parallelism, letting multiple HITs run at the same time and their results be combined in later steps of the algorithm. The authors discuss several iterative applications that fit this paradigm, such as iterative writing and iterative blurry-text recognition. Finally, they describe two experiments by other researchers that used TurKit, as well as the performance cost of the crash-and-rerun paradigm.
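
To make the model more concrete, here is a minimal sketch of the crash-and-rerun idea, written in TypeScript rather than TurKit’s actual JavaScript API; `once`, `askWorkerToImprove`, and the `state.json` file are my own illustrative names and not code from the paper.

```typescript
// A minimal sketch of the crash-and-rerun idea (my own illustration, not
// TurKit's actual API). Results of expensive, non-deterministic calls are
// appended to a persistent file, so re-running the script replays the saved
// results instead of re-posting HITs.
import * as fs from "fs";

const DB_PATH = "state.json"; // hypothetical persistent store
const db: unknown[] = fs.existsSync(DB_PATH)
  ? JSON.parse(fs.readFileSync(DB_PATH, "utf8"))
  : [];
let cursor = 0;

// Run `action` only the first time this point in the program is reached;
// on later executions, return the recorded result instead.
async function once<T>(action: () => Promise<T>): Promise<T> {
  if (cursor < db.length) {
    return db[cursor++] as T; // replay a previously recorded result
  }
  const result = await action(); // e.g. post a HIT and wait for the answer
  db.push(result);
  cursor++;
  fs.writeFileSync(DB_PATH, JSON.stringify(db)); // persist before continuing
  return result;
}

// Stand-in for a real Mechanical Turk round trip (an assumption, not the MTurk API).
async function askWorkerToImprove(text: string): Promise<string> {
  return text + " (revised by a worker)";
}

// Hypothetical iterative-improvement loop: after the first paid run, the
// script can be re-executed for free because every `once` result is replayed.
async function main(): Promise<void> {
  let text = "initial draft";
  for (let i = 0; i < 3; i++) {
    text = await once(() => askWorkerToImprove(text));
  }
  console.log(text);
}

main();
```

In the real TurKit model the script actually crashes when it reaches a HIT whose result is not ready yet and is simply re-executed from the top later; the sketch above just awaits, but the idea of replaying a recorded trace of non-deterministic results is the same.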

Reflection

I believe the key contribution of this paper is the crash-and-rerun programming model. It is what enables TurKit’s main benefit: developing and modifying a human-computation-enabled algorithm without paying someone to complete a HIT each time you run it. I think the crash-and-rerun approach is really clever, and I wonder whether it could be useful outside human computation as well. The idea of retroactive print-line debugging would be valuable in many different scenarios. Clearly the usability of this model depends on how much non-deterministic code your algorithm contains, and many real-world programs would be too large or too concurrent for it to be feasible.

Crowdsourcing does have some unique properties that make it especially suited to this model. As the paper notes, code execution time is almost negligible compared to the time it takes to complete a HIT, so the overhead of re-executing the program from the start is acceptable here in a way it might not be elsewhere. Additionally, most human computation tasks tend to be fairly straightforward: if a task is not completely parallelizable, it is usually a simple iterative process. It will be interesting to see whether toolkits like this enable people to come up with more complex human computation algorithms that stretch the limits of the crash-and-rerun model.

Questions

– Do you think most human computation implementations could make use of TurKit, or are the majority based on single, parallelizable tasks?
– What human computation algorithms might be too complex for TurKit?
– What other applications would there be for the crash-and-rerun model?
– Do you think the example of writing an article would be possible through a pool of generally unskilled and unknowledgeable workers?

Read More

Turkopticon: Interrupting Worker Invisibility in Amazon Mechanical Turk 

Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). ACM, New York, NY, USA, 611-620. DOI=10.1145/2470654.2470742 http://dl.acm.org/citation.cfm?id=2470742

Discussion leader: Mauricio De La Barra

Summary
This paper provides an analysis of Amazon Mechanical Turk, a human computation system, as a site of technically mediated worker-employer relationships. The authors argue that human computation currently relies on worker invisibility, which in turn leads HCI researchers to pay less attention to crowdwork’s ethics and values. To bring to light the relations between requesters and Turkers, they conducted a case study in which they asked 67 Turkers what they would want in a “Workers’ Bill of Rights.” The points of agreement in the survey became the basis for the design of Turkopticon, an activist system that allows workers to do two main things: publicize and evaluate their relations with employers, and engage one another in mutual aid. Turkopticon is a browser extension for Chrome and Firefox. When a worker browses Mechanical Turk for HITs, the extension displays a button next to each requester’s name; on mouse-over, the worker sees the requester’s ratings on four qualities (Communicativity, Generosity, Fairness, and Promptness) along with a link to a website showing all the written reviews for that requester. Workers can also leave their own reviews. The paper goes on to explain that the design of Mechanical Turk favors requesters over workers, which enables the unjust treatment of workers. Turkopticon attempts to make the relationship between requesters and Turkers fairer by holding requesters accountable and enabling mutual help among workers. The software has become an essential tool for many Turkers: it has been installed over 7,000 times, and the Turkopticon website receives 100,000 page views a month. The authors conclude by highlighting the lessons they have learned from intervening in a large-scale socio-technical system such as Mechanical Turk.
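
As a rough illustration of the mechanics described above, here is a sketch of how a content script for an extension like this might annotate a HIT listing. This is my own TypeScript sketch: the DOM selectors, the `https://example.org` endpoint, the response shape, and the 5-point scale shown are assumptions, not Turkopticon’s actual code or API.

```typescript
// Sketch of a Turkopticon-style content script (illustrative only).
interface RequesterRatings {
  communicativity: number;
  generosity: number;
  fairness: number;
  promptness: number;
  reviewUrl: string;
}

// Placeholder lookup; a real extension would query its reviews service here.
async function fetchRatings(requesterId: string): Promise<RequesterRatings | null> {
  const resp = await fetch(`https://example.org/api/ratings?id=${requesterId}`); // hypothetical endpoint
  return resp.ok ? resp.json() : null;
}

// Attach a rating tooltip next to each requester name on a HIT listing page.
async function annotateRequesters(): Promise<void> {
  const links = document.querySelectorAll<HTMLAnchorElement>("a[data-requester-id]"); // hypothetical markup
  for (const link of links) {
    const ratings = await fetchRatings(link.dataset.requesterId ?? "");
    if (!ratings) continue;
    const badge = document.createElement("span");
    badge.textContent = " [TO]";
    badge.title =
      `Communicativity: ${ratings.communicativity}/5, Generosity: ${ratings.generosity}/5, ` +
      `Fairness: ${ratings.fairness}/5, Promptness: ${ratings.promptness}/5`;
    link.insertAdjacentElement("afterend", badge);
  }
}

annotateRequesters();
```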

Reflection
As someone who doesn’t use Mechanical Turk often, I found this paper to be a great overview of some of the ethical issues surrounding the platform. While some of them are somewhat obvious (such as whether or not to have a minimum wage for HITs), I didn’t really internalize these issues while using the system. One of the most interesting points the paper makes is that in designing Mechanical Turk, Amazon has prioritized the needs of employers over those of workers. For example, by hiding workers behind APIs, Mechanical Turk lets employers see themselves as builders rather than as employers, and so remain unconcerned with Turkers’ working conditions. Also, because Mechanical Turk’s participation agreement gives requesters intellectual property rights over submissions regardless of rejection, workers have no legal recourse against employers who reject work and then use it anyway. One would think that a system like Mechanical Turk, which is built on relations between requesters and workers, would be designed so that both parties have their concerns addressed. But since Amazon’s goal (like any other company’s) is to make money, it treats workers as interchangeable, and because there are so many workers, Mechanical Turk can sustain the loss of those who don’t abide by the terms of the agreement; since Amazon collects money based on task volume, it has little reason to prioritize worker needs. I feel that systems such as Turkopticon are a step in the right direction toward making workers’ relationships with employers visible to other workers on Mechanical Turk, but I also feel that change needs to happen at the infrastructure level: Amazon itself should consider the ethical issues that arise through the use of Mechanical Turk.

Questions
• Do you think that if the Turkopticon extension gets widely adopted among workers on Mechanical Turk, requesters would move to a different human computation/crowdsourcing platform? Why?
• Do you agree or disagree with the criteria used to review requesters (do they accurately frame the interaction with a requester)? What other criteria could have been used instead?
• Turkopticon’s developers hope that Amazon would change its system design to include worker safeguards in Mechanical Turk. This has not happened yet. If Amazon becomes aware of Turkopticon, and how useful it is for workers, do you think that it might consider changing its design? In what ways?
• What are some policies that Mechanical Turk should adopt in order to show that it doesn’t just care about the needs of requesters, but also about the needs of workers (e.g., requiring requesters to justify rejections, having a minimum wage for HITs, etc.)? Or do you think that the system is fine as it is currently?

Read More

Who are the crowdworkers?: shifting demographics in mechanical turk

Ross, Joel, et al. “Who are the crowdworkers?: shifting demographics in mechanical turk.” CHI’10 Extended Abstracts on Human Factors in Computing Systems. ACM, 2010.

Discussion Leader: Ananya Choudhury

Summary

This paper focuses on how the demographics of Turkers have gradually shifted over time. While previous research by Ipeirotis (2008) suggested a worker population based primarily in the United States, the study conducted in this paper reveals that the AMT marketplace is becoming significantly international, with Indians making up more than one-third of the Turker population. The paper compares Indian and US Turkers on criteria such as age, annual income, gender, and education. The study shows an increase in the number of highly educated, young, male Indian Turkers compared to the US. The results also show that for most Turkers (both in India and the US) turking is an extra source of income, while for a significant number of Indian Turkers it is sometimes or always necessary to meet basic needs. Finally, based on these results, the paper raises a few open-ended questions about the ethics and authenticity of the data collected.

Reflection

This paper paints a pretty clear picture of who these Turkers are. As mentioned in the paper, knowing the Turkers will help researchers analyze survey results better. However, I feel that when data is collected in exchange for money, a respondent’s goal may shift from providing honest opinions to providing whatever responses will maximize their earnings. The cultural background of Turkers also plays an important role. As mentioned in the paper, Indians are culturally more reluctant to present themselves as unemployed. If there is a tendency among Indians not to reveal their actual employment status, then the statistics in Figures 5 and 6 may not be accurate. This calls into question the credibility of Turkers, of data collected through platforms like AMT, and of analyses performed on these datasets. So knowing the Turkers may help us analyze data better, but if that knowledge is itself inaccurate, how much does it really help the analysis?

Questions

Do you think knowing the backgrounds of the workers will help researchers analyze data better?
Do you think collecting data in exchange for monetary benefits is the best way? What other ways could we devise to gather more genuine responses?
Are AMT and similar crowdsourcing platforms the right channel for collecting survey data that cannot be validated? Could micro-volunteering be a better way to collect such information?
Why does the worker demographic consist mostly of Indians and Americans? Why is the rest of the world still under-represented?

Read More

Human computation: a survey and taxonomy of a growing field

Alexander J. Quinn and Benjamin B. Bederson. 2011. Human computation: a survey and taxonomy of a growing field. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 1403-1412. DOI=10.1145/1978942.1979148 http://doi.acm.org/10.1145/1978942.1979148

Summary

This paper presents a classification of “human computation” and related concepts, including crowdsourcing, social computing, collective intelligence, and data mining. The work is motivated by a growing body of research that uses many of these terms interchangeably, resulting in potential confusion. By surveying a large body of literature, it proposes definitions for each of these areas and describes how they overlap and differ from one another; a Venn diagram helps illustrate these relationships. The authors suggest that human computation was popularized by Luis von Ahn’s 2005 dissertation and that its defining trait is a computational process that uses human effort to solve problems that computers cannot. Crowdsourcing was coined by Jeff Howe and is defined as taking a job originally meant for a designated worker and opening it up to a large group of people. The authors identify six key dimensions of human computation, citing examples for each. Three of these (motivation, human skill, and aggregation) are based on analysis of clusters of related projects with a defining attribute. The others (quality control, process order, and task-request cardinality) cut across project clusters. Finally, the authors suggest some opportunities for future work revealed by underexplored areas in their taxonomy, including new pairings of dimensions, new values for dimensions, and new kinds of work.
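
To keep the six dimensions straight, one could encode them as a simple record type. The sketch below is my own illustration: the dimension names come from the paper, but the field comments and the ESP Game classification are illustrative guesses on my part, not the authors’ values.

```typescript
// Rough encoding of the taxonomy's six dimensions as a data structure
// (illustrative only; not taken from the paper's tables).
interface HumanComputationProject {
  name: string;
  motivation: string;             // e.g. pay, enjoyment, altruism
  humanSkill: string;             // e.g. visual perception, language understanding
  aggregation: string;            // how individual contributions are combined
  qualityControl: string;         // e.g. redundancy, review, agreement checks
  processOrder: string;           // ordering of computer, worker, and requester steps
  taskRequestCardinality: string; // how many tasks map to how many requests
}

// Hypothetical classification of a familiar example, for discussion only.
const espGame: HumanComputationProject = {
  name: "ESP Game",
  motivation: "enjoyment",
  humanSkill: "visual perception",
  aggregation: "agreed-upon labels collected into a dataset",
  qualityControl: "output agreement between paired players",
  processOrder: "computer assigns images, workers label, requester consumes labels",
  taskRequestCardinality: "many small tasks serving one large labeling request",
};
```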

Reflection

This strikes me as a great overview of crowdsourcing and human computation research as it stood in 2011, when the paper was published. The definitions of crowdsourcing, human computation, social computing, and so on were not necessarily what I intuitively expected, but they seem reasonable and helpful. With these terms used and abused to mean so many different things, I appreciated a serious attempt to provide some clarity and common ground for researchers working in these areas. In my mind, social computing is the broadest category and encompasses most of the other concepts, so the authors’ more limited scope (humans interacting naturally, mediated through technology) was interesting. I thought the point that human computation need not be collaborative was interesting; I hadn’t considered it before. I also note that the authors don’t cite Daren Brabham, who is well known for his writing on crowdsourcing and what it means (and doesn’t mean). He would have been an interesting point of contrast, though the authors may have skipped him, deliberately or not, because he doesn’t often publish in CS/HCI venues. There have also been a ton of new human computation and crowdsourcing projects in the four years since this was published; I wonder how well they fit into this taxonomy, or whether they would require revisions to it. Certainly, the values for the “human skill” dimension have expanded to include more complex and creative tasks like visual design, writing, and even scientific research. I also note that “learning” and “feedback” are quality control mechanisms that aren’t listed but have become increasingly important.

Questions

  • Do we agree or disagree with the definitions provided here? Have they become more focused or broader in recent years?
  • What are some other crowdsourcing or human computation examples you can think of that aren’t listed here, and where would they fall in each of the dimensions? Can you think of some that don’t fit the existing values or dimensions?
  • What other fields might be included in the Venn diagram and how are they related?
  • Taxonomies like this are supposed to help researchers, in the authors’ own words, identify opportunities in unexplored or underexplored areas. What opportunities do you see here? Any the authors didn’t mention?
  • Amy Bruckman writes about the usefulness of category theory in describing different kinds of online communities as being more or less like a set of “prototypes”. How might this argument extend to human computation?
  • How does a paper like this help you (or not help you) understand the fields of crowdsourcing and human computation research?

Read More

The Future of Crowd Work

Aniket Kittur, Jeffrey V. Nickerson, Michael Bernstein, Elizabeth Gerber, Aaron Shaw, John Zimmerman, Matt Lease, and John Horton. 2013. The future of crowd work. In Proceedings of the 2013 conference on Computer supported cooperative work (CSCW ’13). ACM, New York, NY, USA, 1301-1318. DOI=10.1145/2441776.2441923 http://doi.acm.org/10.1145/2441776.2441923

Summary

In this paper, the authors ask the provocative question, “can we foresee a future crowd workplace in which we would want our children to participate?” To address it, they review a large body of literature on crowdsourcing (over 100 papers) and incorporate commentary from a survey of 104 US and Indian crowd workers. The authors start by considering the tradeoffs between crowd work and traditional work, and then synthesize 12 foci, or challenges, that are especially important to the future of crowd work. For each focus, they describe the goals, review related work, and offer a proposal for what the future of crowd work should entail. The foci fall under crowd processes (workflow, task assignment, hierarchy, real-time and synchronous work, and quality control), crowd computation (crowds guiding AIs, AIs guiding crowds, and platforms), and crowd workers (job design, reputation and credentials, and motivations and rewards). The authors conclude with three design goals that span multiple foci and provide clear steps forward. The first is to create career ladders that allow workers to advance to more complex and rewarding roles. The second is to help requesters design better tasks that workers will understand and enjoy more. The third is to facilitate learning, which offers the dual benefit of providing workers with new skills and requesters with the talent needed to complete their tasks. The authors close by emphasizing that both system design and careful study of its effects are needed, and that crowd work provides an exciting new opportunity: to explore radically new kinds of organizations in a controlled experimental setting.

Reflection

This is an ambitious paper that covers a lot of ground (over quite a few pages), but it’s all valuable, important stuff. The challenge to imagine a future where we’d want ourselves or our children to be crowd workers is a wonderful provocation. At first it seems hard to imagine, maybe because I’ve seen so many unpleasant crowd tasks. But on further reflection, it’s an exciting vision of the future: one where people can, from the comfort of their homes, find any kind of work they want to do and engage with it in a way that is financially rewarding and personally satisfying. I also appreciated the authors’ effort at synthesizing such a large and diverse range of crowdsourcing papers. Just summarizing what’s been done is helpful in itself, but the authors go much further by pointing out drawbacks and opportunities to do better. I’m simultaneously amazed at the amount of crowdsourcing research that has been conducted in just a few short years and surprised at how much is still left to do. For example, the authors note that we have almost no idea whether “algorithmic management” is better than traditional management techniques; what an interesting question. I find this inspiring as a researcher in this area. Finally, I appreciated that the authors raised some of the ethical concerns of working in this space, such as fair compensation, privacy, and power dynamics. I think they could have gone even further here. While I agree that all the foci are important, I think the ethical concerns may supersede all of them, or at least need to be embedded in each of them. Everything from quality control to hierarchies to AI-guided crowds raises serious questions about ethics and morality that need to be considered seriously from day one.

Questions

  • Would you want to be a crowd worker? Your children? Why or why not? What do you think is needed to make that vision a reality?
  • What are some of the potential advantages of crowd work over traditional work? Disadvantages?
  • What was a focus/challenge that you found particularly exciting or interesting? One that seemed especially difficult or hard to realize?
  • How could we make crowd labor more appealing than current traditional jobs?
  • Some of the proposals in the paper make crowd work look more like traditional work. If we follow this line of thinking, will we end up with something that looks just like today’s traditional jobs? Why or why not?
  • Did any of the proposals in the paper strike you as being particularly ethically worrisome? Why?
  • As you begin thinking about your project idea, which of these foci do you think you might contribute to?
