Combining crowdsourcing and learning to improve engagement and performance.

Dontcheva, Mira, et al. “Combining Crowdsourcing and Learning to Improve Engagement and Performance.” Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2014.

Discussion Leader (con): Ananya

Useful Link:  http://tv.adobe.com/watch/prototype/creative-technologies-lab-photoshop-games/

Summary

This paper discusses how crowdsourcing and learning can be combined to create an environment that benefits both workers and requesters. The learning should be designed so that the skills developed help workers not only in the crowdsourcing context but are also marketable elsewhere.

The authors developed a learning interface, “LevelUp”, on top of Adobe Photoshop. The interface presents interactive step-by-step photo-editing tutorials as “missions”, ordered by increasing difficulty. The tutorials provide sample images for users to work on, or users can use their own images. Users are shown one step at a time and must finish that step to move on to the next. Each mission is worth points, and the number of points increases with the difficulty of the mission. Users can also earn badges for successfully completing a mission, which they can share on social networking sites. The system also gives users instant feedback on their progress. It has 12 tutorials, divided into three levels. At the end of each level, users test their skills in a challenge round. Images in the challenge rounds are supplied by requester organizations.
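To make that structure a bit more concrete, here is a rough sketch (my own, not from the paper) of how the missions, levels, points, badges, and challenge rounds described above might be represented as data. All names and point values are invented purely for illustration.

```python
# A rough sketch (my own, not from the paper) of the mission/level/challenge
# structure described above. Names and point values are invented for illustration.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mission:
    name: str
    level: int                    # 1-3; difficulty increases with level
    points: int                   # points increase with mission difficulty
    steps: List[str]              # one instruction shown to the user at a time
    badge: Optional[str] = None   # awarded on completion, shareable on social media


@dataclass
class Challenge:
    level: int
    requester: str                       # organization supplying real images
    suggested_improvements: List[str]    # hints only; users are free to improvise


# Twelve tutorials split across three levels, each level ending in a challenge round.
missions = [
    Mission(name=f"Mission {i + 1}", level=i // 4 + 1, points=10 * (i // 4 + 1), steps=[])
    for i in range(12)
]
challenges = [
    Challenge(level=lvl, requester="(requester organization)", suggested_improvements=[])
    for lvl in (1, 2, 3)
]
```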

The interface has two parts. The first part is the interactive tutorial; the challenge rounds come in the second. The challenge rounds were created to support crowdsourcing. Unlike the interactive part, which presents a set of steps for improving an image, the challenge part only suggests improvements and lets users improvise.

The implementation and results are divided across three deployments. Deployment 1 consisted only of the interactive tutorial. For evaluation, the authors measured the number of missions completed by players, collected player feedback monthly, interviewed six players, and compared user behavior before and after the game. Overall, this deployment received positive feedback, with more than 60% of players completing at least level 1.

Deployment 2 tested whether the skills learnt in deployment 1 could be used in real-world tasks. It included both the interactive tutorial and the challenge rounds. The authors performed three types of evaluation: (1) behavioral logs recording the number of images edited and the types of edits performed, (2) MTurk workers comparing the original images to the edited ones, and (3) experts examining the quality of the edited images and rating them from 1 to 3 on the basis of usefulness and novelty. The results were mixed. However, images in challenge 3 received higher “more useful” ratings than those in challenge 1. The authors conclude that the users who made it to level 3 were more motivated to learn and do a good job.

In deployment 3, the authors added real images from requesters at four different organizations to the challenge rounds to find out whether certain types of institutions would receive better results than others. They deployed this in two versions: one that included detailed information about the requester organization, and one that just listed the organization’s name. They analyzed behavioral data that included details about the images edited, qualitative assessments from MTurkers, experts, and two requesters, and survey data on user experience. One of the requesters assessed 76 images and rated 60% of the edited images as better than the original; the other assessed 470 images and rated 20% of them better than the original.

 

Reflection

When I read this paper, the first thing that came to my mind was, “Where is the crowdsourcing aspect?” The only crowdsourcing part was the assessment done by MTurkers, which required no specific skill. Even that part was redundant, since the authors were also getting assessments from experts. I think the title of the paper and the claim of combining crowdsourcing and learning are misleading.

The participants were not crowd workers paid to do the job but rather people who wanted to prettify their images. Since Photoshop on its own is a bit overwhelming, LevelUp seemed to be an easy way to learn the basics of photo editing. This is an anecdotal view. However, it raises the same question that Dr. Luther (sorry, Dr. Luther, I might not have quoted you accurately) raised yesterday: “Would the results differ if we randomly selected people to either play the game or do the task?” Does the cause of an action influence the results? It would have been interesting (and more relevant to the title of the paper) to see MTurkers (who may or may not have an interest in photo editing) chosen as participants and asked to do the learning and take the challenges. If they were not paid for the learning part, would they first sit through the tutorials because the skills developed might help them somewhere else, or would they jump directly to the challenge rounds because that’s where the money is? Even the authors mention this point in ‘Limitations’.

The results presented were convoluted: too many correlations drawn with no clear explanation. I wish they had presented their analysis in some visual format. Keeping track of so many percentages was hard, at least for me.

It is normally interesting and easy to learn basic editing techniques such as adjusting brightness, contrast, saturation, etc. But getting novices interested in learning advanced techniques is the real test of a learning platform. The stats provided did not answer the question “What percentage of novices actually learnt advanced techniques?” One of the results in deployment 2 says only 74% of users completed challenge 1, 57% challenge 2, and 39% challenge 3, with no explanation of why so few people continued to challenge 3 or what percentage of novices completed each challenge.

I am also not convinced by their measure of “usefulness”. Any image, even with basic editing, usually looks better than the original, so by the definition such work will get the highest “usefulness” rating. I wish they had a fourth point on their scale, say 4, that depended on what kinds of techniques were used. The definition of “novelty” looked flawed too. It works well in this scenario, but on a crowdsourcing platform like Amazon Mechanical Turk, where workers are used to getting paid for following instructions as closely as possible, we may not see much novelty.

Despite all these issues, there were a few things I liked. I liked the idea of offering students the chance to practice their skills not through sample use cases but through real scenarios where their effort may benefit someone or something. I also like the LevelUp interface; it is slick. And as I said earlier, Photoshop can be overwhelming, so an interactive step-by-step tutorial definitely helps.

Finally, I think the skills gained through such tutorials are good only for limited use or, as we have seen in previous discussions, for the task at hand. Without further knowledge or standard recognition, I doubt how marketable these skills would be outside.

 

Questions

  • Do you also think there was no crowdsourcing aspect in the paper apart from a few guidelines mentioned in ‘Future work’?
  • Do you think the skills developed on similar platforms can be marketed as advanced skills? How would you change the platform so that the learning here could be used as a professional skill?
  • Do you think the results would have been different if the users were not self-motivated participants but rather MTurkers who were paid to participate?
