Combining crowdsourcing and learning to improve engagement and performance.

Dontcheva, Mira, et al. "Combining Crowdsourcing and Learning to Improve Engagement and Performance." Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2014.

Discussion Leader (con): Ananya

Useful Link:  http://tv.adobe.com/watch/prototype/creative-technologies-lab-photoshop-games/

Summary

This paper discusses how crowdsourcing and learning can be combined to create an environment that benefits both worker and requester. The learning is meant to be such that the skills developed help workers not only in the crowdsourcing context but are also marketable in other contexts.

The authors developed a learning interface, "LevelUp", on top of Adobe Photoshop. This interface presents interactive step-by-step tutorials for photo editing as "missions", ordered by increasing difficulty. The tutorial provides sample images for users to work on, or users can use their own images. Users are presented with one step at a time and must finish that step before moving on to the next. Each mission is associated with points, and the number of points increases with the difficulty of the mission. Users can also earn badges on successfully completing a mission, which they can share on social networking sites. The system also gives users instant feedback on their progress. It has 12 tutorials, divided into three levels. At the end of each level, users test their skills in a challenge round. Images in the challenge round are supplied by a requester organization.

The interface has two parts: the first is the interactive tutorial, and the challenge round comes in the second. The challenge round was created to support crowdsourcing. Unlike the interactive part, which presents a set of steps for improving an image, the challenge part only suggests improvements and lets the user improvise.
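As I understand it, the game structure is essentially missions grouped into levels, with points, badges, and a challenge round gating the end of each level. The sketch below is purely my own illustration of that structure; the class and field names are assumptions, not LevelUp's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Mission:
    name: str
    steps: list        # ordered tutorial steps; a player finishes one step to unlock the next
    points: int        # harder missions are worth more points

@dataclass
class Level:
    missions: list
    challenge_images: list   # in later deployments, supplied by a requester organization

@dataclass
class PlayerProgress:
    score: int = 0
    badges: list = field(default_factory=list)
    completed: set = field(default_factory=set)

    def complete_mission(self, mission: Mission):
        # Instant feedback: points plus a shareable badge on completing a mission.
        self.score += mission.points
        self.badges.append(f"badge:{mission.name}")
        self.completed.add(mission.name)

    def challenge_unlocked(self, level: Level) -> bool:
        # The challenge round at the end of a level opens once all its missions are done.
        return all(m.name in self.completed for m in level.missions)
```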

The implementation and results are divided across three deployments. Deployment 1 consisted only of the interactive tutorial. For evaluation, the authors measured the number of missions completed by players, collected players' feedback monthly, interviewed 6 players, and finally compared user behavior before and after the game. Overall, this deployment received positive feedback, with more than 60% of players completing at least level 1.

Deployment 2 tested whether the skills learned in deployment 1 could be used in real-world tasks. It included both the interactive tutorial and the challenge rounds. The authors performed three types of evaluation: (1) behavioral logs, covering the number of images edited and the types of edits performed; (2) MTurk workers comparing the original images to the edited ones; and (3) experts examining the quality of the edited images and rating them from 1 to 3 on usefulness and novelty. The results were mixed; however, images in challenge 3 received a higher "more useful" rating than those in challenge 1. The authors infer that users who made it to level 3 were more motivated to learn and do a good job.

In deployment 3, the authors added real images from requesters at 4 different organizations to the challenge round to find out whether certain types of institutions would receive better results than others. They deployed two different versions: one that included detailed information about the requester organization and another that just listed the organization's name. They analyzed behavioral data (details about the images edited), qualitative assessments from MTurkers, experts, and two of the requesters, and survey data on user experience. One requester assessed 76 images and rated 60% of the edited images better than the originals; the other assessed 470 images and rated 20% of them better than the originals.

 

Reflection

When I read this paper, the first thing that came to my mind was "Where is the crowdsourcing aspect?" The only crowdsourcing part was the assessment done by MTurkers, who needed no specific skill to do the task. Even that part was redundant, since the authors were also getting assessments done by experts. I think the title of the paper and the claim of combining crowdsourcing and learning are misleading.

The participants were not crowd workers who were paid to do the job but rather people who wanted to prettify their images. Since Photoshop on its own is a bit overwhelming, LevelUp seemed like an easy way to learn the basics of photo editing; this is an anecdotal view. However, it raises the same question that Dr. Luther (sorry Dr. Luther, I might not have quoted you accurately) raised yesterday: "Would the results differ if we randomly selected people to either play the game or do the task?" Does the motivation behind an action influence the results? It would have been interesting (and more relevant to the title of the paper) to see MTurkers (who may or may not have an interest in photo editing) chosen as participants and asked to do the learning and take the challenges. If they were not paid for the learning part, would they first sit through the tutorials because the skills developed might help them somewhere else, or would they jump straight to the challenge rounds because that's where the money is? Even the authors mention this point in 'Limitations'.

The results presented were convoluted: too many correlations drawn with no clear explanation. I wish they had presented their analysis in some sort of visual format. Keeping track of so many percentages was hard, at least for me.

It is normally interesting and easy to learn basic editing techniques such as adjusting brightness, contrast, and saturation. But making novices interested in learning advanced techniques is the real test of a learning platform. The stats provided did not answer the question "What percentage of novices actually learned advanced techniques?" One of the results in deployment 2 says only 74% of users completed challenge 1, 57% challenge 2, and 39% challenge 3, with no explanation of why so few people continued through challenge 3 or what percentage of novices completed each challenge.

I am also not convinced by their measure of "usefulness". Any image, even with basic editing, usually looks better than the original, and by the definition given, such work will get the highest "usefulness" rating. I wish they had a fourth point on their scale, say 4, that depended on the kinds of techniques used. The definition of "novelty" looked flawed too. It works well in this scenario, but on a crowdsourcing platform like Amazon Mechanical Turk, where workers are used to getting paid for following instructions as closely as possible, we may not see much novelty.

For all these issues, there were still a few things I liked. I liked the idea of offering students the chance to practice their skills not through sample use cases but through real scenarios where their effort may benefit someone or something. I also liked the LevelUp interface; it is slick. And as I said earlier, Photoshop can be overwhelming, so an interactive step-by-step tutorial definitely helps.

Finally, I thought that the skills gained through such tutorials are good only for limited use or, as we have seen in previous discussions, for the task at hand. Without further knowledge or standard credentials, I doubt how marketable these skills would be elsewhere.

 

Questions

  • Do you also think there was no crowdsourcing aspect in the paper apart from a few guidelines mentioned in ‘Future work’?
  • Do you think the skills developed on similar platforms can be marketed as advanced skills? How would you change the platform so that the learning here could be used as a professional skill?
  • Do you think the results would have been different if the users were not self-motivated participants but rather MTurkers who were paid to participate?

Read More

Success & Scale in a Data-Producing Organization: The Socio-Technical Evolution of OpenStreetMap in Response to Humanitarian Events

Palen, Leysia, et al. “Success & Scale in a Data-Producing Organization: The Socio-Technical Evolution of OpenStreetMap in Response to Humanitarian Events.” Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2015.

Discussion Leader: Ananya

Summary:

OpenStreetMap, often called 'the Wikipedia of maps', is a collaborative effort to create a common digital map of the world. This paper analyzes the socio-technical evolution of this large, distributed, volunteer-driven organization by examining its mapping activities during two major disaster events: the 2010 Haiti earthquake and 2013 Typhoon Yolanda.

The Haiti earthquake was the first major event in which OSM was used for relief efforts. The repercussions of this sudden influx of usage hindered those efforts and subsequently gave rise to multiple socio-technical changes within the organization.

The Humanitarian OpenStreetMap Team (HOT), which assists humanitarian organizations with mapping needs, was formalized and registered as a non-profit organization just 7 months later.

During the Haiti earthquake, several issues such as mapping conflicts and map duplication arose. To address this congestion, HOT created the OSM Task Manager, which helps mappers coordinate efficiently. An administrator creates jobs for large geographical areas, and the Task Manager divides each job into smaller tasks, each of which is in one of three states: 'yellow' (taken), 'red' (awaiting validation), and 'green' (completed).
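To make the coordination mechanism concrete, here is a minimal sketch of how such task-state tracking might work. This is my own illustration based on the paper's description, not the actual Task Manager implementation, and the class and method names are assumptions:

```python
from enum import Enum

class TaskState(Enum):
    AVAILABLE = "unclaimed"          # assumed default state; the paper only names the three colors below
    TAKEN = "yellow"                 # a mapper has claimed the task
    AWAITING_VALIDATION = "red"      # mapping done, waiting for review
    COMPLETED = "green"              # validated and finished

class Task:
    """One tile of a larger mapping job, worked on by at most one mapper at a time."""
    def __init__(self, task_id, bbox):
        self.task_id = task_id
        self.bbox = bbox             # geographic bounding box of this tile
        self.state = TaskState.AVAILABLE
        self.assignee = None

    def claim(self, mapper):
        # Exclusive claims are what prevent the mapping collisions seen during Haiti.
        if self.state is not TaskState.AVAILABLE:
            raise ValueError("task already claimed or finished")
        self.assignee = mapper
        self.state = TaskState.TAKEN

    def mark_done(self):
        self.state = TaskState.AWAITING_VALIDATION

    def validate(self):
        self.state = TaskState.COMPLETED
```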

Other changes included getting OSM licensed under the Open Database License (ODbL). To attract and retain new participants, OSM upgraded its openstreetmap.org website and interface, making it easier for newcomers to map. 'Notes', a drop-pin annotation feature that lets users point out improper tagging or suggest additional information, was also added.

Unlike Wikipedia, which laid down governance policies and strategies as its community grew, OSM still maintains a non-bureaucratic approach to governance in order to promote and cope with growth. Furthermore, to avert low diversity among contributors, LearnOSM materials were translated into 9 languages.

Typhoon Yolanda, another disaster on the scale of the Haiti earthquake, struck 4 years later and tested OSM's organizing efforts. The response to the event was significant, with a 3x increase in the number of contributors. The now well-established HOT coordinated with volunteers using emails and chatrooms. The Task Manager was widely used by mappers, which helped prevent mapping collisions.

However, since all jobs are put into a single instance of the Task Manager, there is a possibility of significant traffic congestion as the mapping population grows. OSM is considering multiple solutions to mitigate this problem. It has also developed a range of socio-technical interventions aimed at promoting a supportive environment for new contributors while managing community growth. This is in stark contrast to Wikipedia's policy-driven strategies.

 
Reflection:

This paper presents a good description of how a community matured within the bounds of two major events that shaped its growth. I really appreciate how the authors tracked changes within the OSM community after the Haiti earthquake and analyzed the effects of those changes with respect to another major event (Typhoon Yolanda) 4 years later. The fact that one of the authors is a founding member of HOT definitely helped.

However, I am skeptical about the comparisons made between Wikipedia's and OSM's ways of operating because, despite many commonalities, they work with very distinct types of data. The non-bureaucratic collaborative environment that OSM maintains may not work for Wikipedia, which also has to deal with a completely different set of challenges associated with creative works, such as plagiarism and editorial disputes.

One of the problems the authors mention Wikipedia faces is diversity, which the OSM community has made notable efforts to alleviate. Still, the gender disparity that plagued Wikipedia was also prevalent in OpenStreetMap: studies from 2011 showed Wikipedia had 7% women contributors, while OSM was worse at only 3%. I wish the authors had gone into more detail about the new email list that OSM launched in 2013 to promote the inclusion of more women and about how effective this step was at motivating a currently inactive group.

Although not extensively used, the Notes feature did show some potential use by both new and experienced users. However, the authors conjectured that guests may use it for noting transient information such as a 'temporary disaster shelter'. I wonder why this is an issue. In case of a disaster, many important landmarks, such as a makeshift emergency shelter or a food distribution center, will be temporary and still be part of a relief team's data needs. Granted, an additional step would be needed to update the map once these temporary landmarks are gone.

Overall, this paper provides a good understanding of some of OSM's management techniques and is also one of the first papers to study OpenStreetMap in such detail.

 
Questions:
– Do you think the comparison made in the paper between Wikipedia's and OSM's governance strategies is fair? Would OSM's collaborative governance style work for Wikipedia?
– How can gender imbalance or other diversity issues be resolved in a voluntary crowdsourcing environment?
– As the authors mention, guests can see Notes as an opportunity for personalization. How do you think OSM can reduce noise in Notes? Could the Task Manager work here, labeling each note as a yellow, red, or green task?
– I think a volunteer-driven platform like OSM is particularly useful in a disaster situation when the landscape is changing rapidly. Do you feel the same? Can you think of any other volunteer-driven application that would help with real-time situational awareness?

Read More

Ensemble: Exploring Complementary Strengths of Leaders and Crowds in Creative Collaboration

Kim, Joy, Justin Cheng, and Michael S. Bernstein. "Ensemble: Exploring Complementary Strengths of Leaders and Crowds in Creative Collaboration." Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2014.

Discussion Leader : Ananya

Summary:

Ensemble is a collaborative story-writing platform where the leader maintains a high-level vision of the story and articulates creative constraints, while the contributors contribute new ideas, comment on, or upvote existing ones.

Scenes are the basic collaborative unit of each story. A scene may correspond to a turning point in the story that reveals character development, new information, and a goal for the next scene. The lead author creates a scene with a prompt and a short description that suggests what problem the lead author wants to solve in that scene. The scene directs contributors toward the specific sections the author has chosen to be completed.

The contributors can participate via drafts, comments, or votes. They can communicate with the author or discuss specific scenes using scene comments. Each scene might have multiple drafts from different contributors. The lead author maintains creative control by choosing a winning draft for each scene. He can optionally appoint a moderator to edit drafts. He can add the winning draft directly to the original story or take inspiration from the contributions and write his own.
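A minimal sketch of the scene/draft relationship as I understand it from the paper appears below; the class and method names are my own illustration, not Ensemble's actual implementation:

```python
class Draft:
    """A contributor's candidate text for one scene."""
    def __init__(self, author, text):
        self.author = author
        self.text = text
        self.votes = 0
        self.comments = []

class Scene:
    """One collaborative unit of the story: a prompt plus competing drafts."""
    def __init__(self, prompt, description):
        self.prompt = prompt            # what the lead author wants from this scene
        self.description = description  # short note on the problem to be solved
        self.drafts = []
        self.winning_draft = None

    def submit_draft(self, contributor, text):
        draft = Draft(contributor, text)
        self.drafts.append(draft)
        return draft

    def choose_winner(self, draft):
        # Creative control stays with the lead author: only they pick the winner,
        # and they may still rewrite it before folding it into the story.
        assert draft in self.drafts
        self.winning_draft = draft
```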

The authors evaluated their platform by running a short-story writing competition on it, monitoring participant activity during the competition, and conducting interviews with seven users. The results suggested that the lead authors spent a significant amount of time revising drafts, the moderators mainly spent time editing drafts created by the lead author, and the contributors contributed somewhat more by creating comments.

 

Reflection:

The idea presented in this paper is not new. Several TV series have incorporated similar techniques for many years now, where the series creator defines the story outline and each episode is written by a different member of the team. To me, the novel contribution of this paper was using this concept to create an online platform for creative collaboration among people who may not know each other. In fact, one of the results analyzed in the paper was whether lead authors knew their contributors previously: 4 out of 20 stories were written by teams made up of strangers. Although out of the scope of this paper, I would still like to know how these 4 stories performed qualitatively in comparison to the stories written by teams of friends.

The authors mention that 55 Ensemble stories were started but only 20 of them were later submitted as entries. Again, some analysis of why more than 50% of the stories were never completed would have been good. The team size of submitted stories ranged from 1 to 7 people. Compared to any crowdsourcing platform this number is minuscule, which makes me wonder: can this platform successfully cater to a larger user base where hundreds of people collaborate to write a story (the authors also raise this question in the paper), as we see in crowdsourced videos these days?

It would be interesting to see how this approach compares to traditional story-writing methods, how quality varies when multiple people from different parts of the world collaborate on a story, how their diverse backgrounds affect the flow of the story, and how lead authors maneuver through all this variety to create the perfect story.

In the end, I feel that Ensemble in its current state is not a platform where a crowd collaborates to write a story but rather a platform where a crowd collaborates to improve someone else's story.

 

Questions:

  • In this paper, the authors advertised the competition on several writing forums. Would this strategy work on a more generic, paid platform like MTurk? If yes, do you think only MTurkers with writing expertise should be allowed to participate? And how should MTurkers be paid?
  • How would Ensemble handle ownership issues? Could these hamper productivity in a collaborative environment?
  • The lead author has the uphill task of collecting all drafts/comments/suggestions and incorporating them into the story. Do you think it is worth spending extra hours compiling someone else's ideas? How would English literature (assuming only English for now), per se, benefit from a crowdsourced story?

Read More