To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing

Krause, M., Kizilcec, R.: To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing. Conference on Human Computation and Crowdsourcing, San Diego, USA (2015)

Pro Discussion Leader: Adam

Summary

This paper examines how the quality of paid crowdsourcing compares to the quality of crowdsourcing games. There has been a lot of research comparing expert work to non-expert crowdsourcing quality. The choice is a trade-off between price and quality, with expert work costing more but yielding higher quality; you may need to pay several non-expert crowdworkers to reach a level of quality comparable to a single paid expert. The same trade-off may exist between paid and game-based crowd work. There has been research on the cost of gamifying a crowdsourcing task, but nothing comparing the quality of game crowd work to paid crowd work.

The authors investigate this by creating two tasks, a simple one and a complex one, and building a game version and a paid version of each for the crowd to complete. The simple task is image labeling. Images were found by querying Google image search with 160 common nouns. Then, for each image, the nouns on the webpage where the image was found are tallied, and more frequently occurring nouns are considered more relevant to the image. The game version is modeled after Family Feud, so that more relevant labels earn more points. The paid version simply asked workers to provide a keyword for an image.
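
The relevance scoring described here amounts to simple frequency counting. Below is a minimal sketch of that idea in Python, assuming the nouns have already been extracted from the source page; the names and example values are hypothetical, not taken from the paper.

    from collections import Counter

    def label_relevance(page_nouns):
        # Tally how often each noun appears on the page the image came from.
        # Noun extraction itself (e.g. with a POS tagger) is assumed to have
        # happened already; higher counts are treated as more relevant labels.
        return Counter(page_nouns)

    # Score a worker's label against the page's noun frequencies.
    relevance = label_relevance(["dog", "park", "dog", "leash", "dog"])
    points = relevance.get("dog", 0)  # more frequent nouns earn more points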

In addition to the simple task, the authors wanted to see how quality compared on a more complicated task. To do this, they created a second task that asked participants to look at a website and write a question that could be answered by the website's content. The game version is modeled after Jeopardy, with higher point values assigned to more complex websites. The paid version presented the websites in a predetermined order, but in both conditions participants completed the same set of tasks.

Overall, the quality of responses from the game versions was significantly higher than from the paid versions. However, when broken down by task, only the complex task showed significantly higher quality for the game version; there, quality was about 18% higher for the game condition, as rated by the selected judges. The authors suggest one reason for this is the self-selection of game players. Paid workers will take on any task as long as it pays well, but game players only play games that actually appeal to them, so only people genuinely interested in the complex task played the game, leading to higher engagement and quality.

Reflections

Gamification has the potential to generate a lot of useful data from crowd work. While one of the benefits of gamification is that you don’t have to pay workers, it still has a cost: creating a sufficiently interesting game is neither easy nor cheap, and the authors take that into consideration when framing game crowdsourcing. They are essentially comparing game participants to expert crowd work: it has the potential to generate higher-quality work, but at a higher cost than paid non-expert crowd work. The difference is that with a game, the cost is largely fixed. So if you need to collect massive amounts of data, the cost of creating the game may be amortized to the point where it’s cheaper than paid non-expert work, with at least as high, if not higher, quality.
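
To make the amortization argument concrete, here is a rough break-even calculation; the figures are illustrative assumptions, not values reported in the paper.

    # Hypothetical break-even point: at what volume does a game become
    # cheaper than paid crowd work? All numbers are illustrative assumptions.
    game_fixed_cost = 20000.0      # one-time cost of designing and building the game
    paid_cost_per_response = 0.10  # payment per paid crowdsourced response

    break_even = game_fixed_cost / paid_cost_per_response
    print(f"The game is cheaper once you need more than {break_even:.0f} responses")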

I really like how the authors try to account for possible confounding variables between the game and paid crowdsourcing conditions. We’ve seen from some of our previous papers and discussions that feedback can play a large role in the quality of crowd work. It was important that the authors accounted for this by incorporating feedback into the paid tasks as well, which lends much more legitimacy to the paper’s results.

There is also a simplicity to this paper that makes it easy to understand and buy into. The authors don’t introduce too many variables to study, they use a between-subjects design to avoid any cross-over effects, and their analysis is definitive. There were enough participants to give them statistically significant results and meaningful findings. The paper isn’t weighed down with statistical calculations like some papers are: the most complicated statistical discussion is kept to two short paragraphs, to satisfy any statisticians who might question the results, while the comparison of quality between the two conditions is very straightforward.

Questions

  • Games have a fixed cost for creation, but are there any other costs that should be considered when deciding whether to go the route of game crowdsourcing versus paid crowdsourcing?
  • Other than instantaneous feedback, are there any other variables that could affect the quality between paid and game crowd work?
  • Was there any other analysis the authors should have performed or any other variables that should have been considered, or were the results convincing enough?
