Labeling Images with a Computer Game

Paper: Luis von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’04). ACM, New York, NY, USA, 319-326. DOI=http://dx.doi.org/10.1145/985692.985733

Discussion Leader: Adam Binford

Summary

The authors of this paper try to tackle the issue of image labeling. Image labeling has many purposes, such as enabling better text-based image search and creating training data for machine learning algorithms. The typical approaches to image labeling at the time were computer vision algorithms and/or manual labor. This was before crowdsourcing really became a thing, and before Amazon Mechanical Turk had even launched, so the labor required to produce these labels was likely expensive and hard to obtain quickly.

The paper presents a new way to obtain image labels: a game called The ESP Game. The idea behind the game is that image labels can be obtained from players who don’t realize they’re actually providing this data; they just find the game fun and want to play. The game works by matching up two players and showing them a common image. Players are told to try to figure out what word the other player is thinking of; they are not told anything about describing the image presented to them. Each player then enters words until a match is found between the two players. Players also have the option to vote to skip the image if it is too difficult to come up with a word for.
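To make the round mechanics concrete, here is a minimal Python sketch of the matching logic as I understand it from the paper. The function name, the simplification of real-time typing into ordered guess lists, and the case-folding are my own assumptions, not the authors’ implementation.

    def play_round(image_id, guesses_a, guesses_b, skip_votes=(False, False)):
        """Return the word the two partners agree on, or None if skipped/unmatched."""
        if all(skip_votes):
            return None  # the image is passed only when both players vote to skip

        seen_a, seen_b = set(), set()
        # Interleave the two players' guesses; a match happens as soon as one
        # player types a word the other has already typed.
        for a, b in zip(guesses_a, guesses_b):
            seen_a.add(a.lower())
            seen_b.add(b.lower())
            common = seen_a & seen_b
            if common:
                return common.pop()  # the agreed-upon word becomes a label
        return None  # time ran out without agreement

    # Example: both players eventually type "car", so "car" becomes a label.
    print(play_round("img_0042",
                     ["vehicle", "car", "road"],
                     ["sedan", "street", "car"]))  # -> car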

The game also includes the idea of taboo words, which are words players are not allowed to match on for a given image. These words come from previous iterations of the game on the same image, so that multiple labels get generated for each image instead of the same obvious one over and over again. When an image starts to get skipped frequently, it is removed from the pool of possible images. The authors estimate that with 5,000 players playing the game constantly, all of the roughly 425,000,000 images indexed by Google could be given a label in about a month, and that each image could reach their threshold of six labels within six months.
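Here is a rough sketch of the per-image bookkeeping I imagine this requires. The six-label threshold comes from the paper; the skip-rate cutoff and the minimum number of showings are purely illustrative assumptions of mine.

    class ImageRecord:
        """Tracks one image's taboo words and skip history (illustrative only)."""

        LABELS_NEEDED = 6      # the paper's threshold of six labels per image
        SKIP_CUTOFF = 0.5      # assumed skip rate at which an image is retired
        MIN_SHOWINGS = 10      # assumed minimum showings before the cutoff applies

        def __init__(self, image_id):
            self.image_id = image_id
            self.taboo_words = set()   # shown to players; matches on these don't count
            self.times_shown = 0
            self.times_skipped = 0

        def record_game(self, agreed_word=None, skipped=False):
            self.times_shown += 1
            if skipped:
                self.times_skipped += 1
            elif agreed_word:
                # Each agreed-upon word becomes both a label and a taboo word for
                # future pairs, so repeated plays yield new labels for the image.
                self.taboo_words.add(agreed_word.lower())

        def still_in_pool(self):
            fully_labeled = len(self.taboo_words) >= self.LABELS_NEEDED
            skipped_out = (self.times_shown >= self.MIN_SHOWINGS and
                           self.times_skipped / self.times_shown > self.SKIP_CUTOFF)
            return not (fully_labeled or skipped_out)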

The authors were able to show that their game was indeed fun and that quality labels were generated by its players. They supported the level of fun of their game with usage statistics, indicating that over 80% of the game’s 13,670 players played it on multiple days. Additionally, 33 people played for more than 50 hours total. These statistics indicate that the game provides sufficient enjoyment for players to keep coming back to play.

Reflection

This paper is one of the first we’ve read this year that looks at alternatives to paid crowd work. What’s even more impressive is that this paper was published before crowdsourcing really became a thing, and a year before Amazon Mechanical Turk was even launched. The ESP Game really started the idea of gamifying meaningful work, which many people have tried to emulate since, including Google, which basically made its own version of this game. While not specifically mentioned by the authors, I imagine few, if any, of this game’s players knew that it was intended to be used for image labeling. This means the players truly played it just for fun, and not for any other reason.

Crowdsourcing through games has many advantages over what we would consider traditional crowdsourcing through AMT. First, and most obviously, it provides free labor. You attract workers through the fun of the game, not through any monetary incentive. This provides additional benefits. With paid work, you have to worry about workers trying to perform the least amount of work for the most amount of money, which can result in poor quality. With a game, there is less incentive to try to game the system, albeit still some. With paid work, there isn’t much satisfaction lost by cheating your way to more money. But with a game, it is much less satisfying to get a high score by cheating or gaming the system than it would be to earn one legitimately, at least for most people. The authors of this paper also found some good ways to combat possible cheating or collusion between players. While they discussed their strategies for this, it would be interesting to hear whether they ever had to use them and how rampant, if at all, cheating became in the game.

Obviously the issue with this approach is making your game fun. The authors were able to achieve this, but not every task that can benefit from crowdsourcing can easily be turned into a fun game. Image labeling just happens to lend itself to many possible game designs. All of the Metadata Games linked to on the course schedule involve image (or audio) labeling, and they don’t hide the true purpose of the work nearly as well: their game descriptions specifically mention tagging images, unlike The ESP Game, which said nothing about describing the images presented. The fact that Mechanical Turk has become so popular, with so many kinds of tasks available on it, goes to show how difficult it is to turn these problems into an interesting game.

I do wonder how useful this game would be today. One thing mentioned several times by the authors is that with 5,000 people playing the game constantly, they could label all images indexed by Google within a month. But that was back in 2004, when they said there were about 425,000,000 images indexed by Google. In the past ten years, the internet has been expanding at an incredible scale. I was unable to find any specific numbers on images, but Google has indexed over 40 billion web pages, and I would imagine the number of images indexed by Google could be nearly as high. A rough scaling check is sketched below, and it leads to some questions…
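As a back-of-envelope check (my own arithmetic, not the authors’): the only figures taken from the paper are the 5,000 constant players, the 425 million images, and the roughly one-month estimate; the 40-billion image count is just my guess at a modern corpus size.

    players = 5_000
    pairs = players // 2
    images_2004 = 425_000_000
    days_2004 = 31

    # Labeling rate implied by the paper's 2004 estimate (labels per pair per hour).
    labels_per_pair_hour = images_2004 / (pairs * days_2004 * 24)
    print(f"~{labels_per_pair_hour:.0f} labels per pair per hour")    # ~228

    # Scaling that same rate to a hypothetical modern corpus of 40 billion images.
    images_today = 40_000_000_000
    days_today = images_today / (pairs * labels_per_pair_hour * 24)
    print(f"~{days_today / 365:.1f} years for one label per image")   # ~8.0 years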

Questions

  • Do you think the ESP Game would be as useful today as it was 11 years ago, with respect to the number of images on the internet? What about with respect to the improvements in computer vision?
  • What might be some other benefits of game crowd work over paid crowd work that I didn’t mention? Are there any possible downsides?
  • Can you think of any other kinds of work that might be gamifiable, other than Foldit-style games? Do you know of any examples?
  • Do you think it’s ok to hide the fact that your game is providing useful data for someone, or should the game have to disclose that fact up front?
