‘This Is Not a Game’: Immersive Aesthetics and Collective Play – Pro

McGonigal, ‘This Is Not a Game’: Immersive Aesthetics and Collective Play

Discussion leader (positive discussant): Anamary

Summary:

This paper examines one instance of an immersive game moving many people toward collective action, and analyzes how such games map in-game strategies onto real-world challenges.

 

The first portion of the paper discusses the Cloudmakers, an online forum for the immersive puzzle game “The Beast”. In an “immersive game”, media such as movie trailers drop digital clues (for example, an unusual credit attribution) that lead into a rich, complex puzzle game with no clear goal or reward. The Beast had three core mysteries and roughly 150 characters, with both digital and in-person clues, such as clues broadcast to players’ TVs.

 

Gamers played timed puzzles with a massive number of clues, and solving them could require anything from programming to the arts, which made the game a natural crowdsourced endeavor. Puzzles meant to last three months were solved in a single day. Players played all the time, mixing game elements into the real world, while the game itself declared that “this is not a game” (TING).

 

What is curious about the community’s 7,132 members is their initial reaction to the 9/11 attacks. By the day’s end, members felt empowered to help solve the mysteries surrounding the attacks, posting threads like “The Darkest Puzzle” that asked the crowd to investigate. Many gamers mentioned that the game had shaped their perception of the attacks and had given them skills to investigate them. But the moderators noted that connecting real life to a game is dangerous, and stopped the activity.

 

This example raises two key questions for the piece:

  • What about “The Beast” encouraged gamers to be confident that they could solve the attacks?
  • What qualities of the Cloudmakers forum helped gamers forget the reality of the situation and debate whether the game was virtual or real?

The second part answers these two questions. One key aspect of these TING games is that gamers are unsure which parts of real life are part of the game and which are not; this effect was so prevalent that gamers’ relationships, careers, and social lives were hampered by The Beast. Another similar TING game, Push, had a final solution, but many gamers were not satisfied and believed the game had continued. Acting is believing, and these players kept acting in, and believing in, the game.

 

The gamers also developed strategies in these detective-style TING games that could be applied to crime-solving as well, such as researching sources, vetting the sources themselves, and analyzing whether secondary information connects to hypotheses. Additionally, gamers felt they were part of a collective intelligence, mentioning a sense of belonging to a giant think tank.

These key features (immersion, uncertainty about being in or out of the game, training in related strategies, and a sense of belonging) helped motivate and move a crowd toward problem-solving, with consequences both promising and troubling. The paper shows the promise of crowds in problem-solving through game design, and how to design games that motivate and retain crowds who keep solving puzzles for free.

 

Reflection:

McGonigal’s core message, that games can bring crowds together to help solve real-world problems, seems incredibly influential. The paper was published in 2003, and to my memory, games back then were seen as children’s toys that might train kids to be violent, or as a useless escapist hobby. But the paper’s core message can be seen in crowdsourcing endeavors that promote the public good and awareness, like Foldit, various other examples seen in class, and other fields focused on solving complex problems, like visual analytics.

The features that motivated TING games may apply to other crowdsourcing endeavors as well. One of our earliest papers, “Online Economies”, discussed how a sense of belonging helped nurture these communities, and that feature can be seen in the Cloudmakers. It would be very interesting to see the more immersive features of TING games applied to crowdsourcing.

The Cloudmakers did not solve the crime, and there are many good reasons for that (protecting gamers on an unprotected site, the blurring of fiction and reality, the risk of false accusations). This reminds me of the Boston Marathon bomber Reddit incident, where redditors made their own subreddit and collectively tried to identify the bombers. I hope the anti-paper presenter talks about this, but even the author cannot help discussing the negatives associated with such a crowd solving crimes.

 

Initially, the subreddit was praised for publicizing key evidence, but it ended up accusing and defaming several innocent people. I wonder whether there are ways for law enforcement to collaborate better with crowds (which I’m sure is being researched now!). The capabilities of these crowds are still fascinating: a huge collection of puzzles meant to span three months was solved in a single day.

 

I loved the philosophical and psychological aspects these games employ to summon crowds. The sunk cost fallacy is one in which you keep sinking money into a failing project because of what you have already invested. Similarly, these players were obsessed with the game for so long that they kept seeing everything as a game.

 

Could we map-reduce this kind of massive puzzle? Maybe some parts of these crimes could be broken down into smaller ones, but I imagine many aspects are interrelated.
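As a thought experiment, here is a minimal map-reduce-style sketch of that decomposition, assuming the puzzle can be split into clues that are analyzed independently before a reduce step groups the findings; the clue records and topics below are hypothetical, not taken from the paper or the actual game.

```python
from collections import defaultdict

# Hypothetical clue records; in practice these would come from the crowd.
clues = [
    {"id": 1, "topic": "credits", "text": "Unusual credit buried in the film's cast list"},
    {"id": 2, "topic": "phone", "text": "Phone number hidden in trailer frames"},
    {"id": 3, "topic": "credits", "text": "Website referenced next to a character name"},
]

def map_clue(clue):
    """Map step: a small, independent analysis a single solver could do alone."""
    return (clue["topic"], f"clue {clue['id']}: {clue['text']}")

def reduce_findings(pairs):
    """Reduce step: merge per-topic findings so the group can form hypotheses."""
    grouped = defaultdict(list)
    for topic, finding in pairs:
        grouped[topic].append(finding)
    return dict(grouped)

hypotheses = reduce_findings(map(map_clue, clues))
for topic, findings in hypotheses.items():
    print(topic, "->", findings)
```

Even in this toy form, the interrelatedness worry shows up in the reduce step: the merging, not the per-clue mapping, is where human judgment would still be needed.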

 

I also wonder whether augmented or pervasive games might better help gamers distinguish between reality and games and use their power for good. What if Foldit were combined with an always-on game? Most of the games discussed in the paper are detective-style, but I wonder whether this style could be employed to solve historical puzzles, public-awareness challenges, or even puzzles posed by the crowd, like “how much of the $X I paid in taxes went to which parts of my government?”.

 

Questions:

 

  1. In the cases of both 9/11 and the Boston bombing, the police are usually selective about what evidence they publicize, since some evidence is uncertain. What are your thoughts on designing systems that help crowds collaborate better with law enforcement to harness the crowd’s problem-solving skills, in ways that protect the crowd and prevent false accusations?
  2. Are there strategies for breaking this problem-solving task down into smaller ones that crowds can do? Or does the entire crowd need to see the whole task at once to tackle it effectively, as with The Beast?
  3. Are there real-life puzzles that are important but not as life-threatening as crimes? I’m casting “puzzles” broadly in these questions: many complex challenges with multiple possible solutions, like wicked problems, could be framed as a huge puzzle.
    1. How can these crowd-based games help or not help solve such puzzles?
    2. Can these puzzles be both given to the crowds and solved by the crowds? That is, can the crowd both supply and solve the puzzle?

 


Shepherding the crowd yields better work

Paper: Steven Dow, Anand Kulkarni, Scott Klemmer, and Björn Hartmann. 2012. Shepherding the crowd yields better work. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW ’12). ACM, New York, NY, USA, 1013-1022. DOI=http://dx.doi.org/10.1145/2145204.2145355

 

Discussion Leader: Anamary Leal

 

Summary

The research goals of this work are: how can crowdsourcing support learning, and how do we get better, multi-faceted, creative, complex work from unskilled crowds?

Their solution is shepherding: providing meaningful real-time feedback, with the worker iterating on their work alongside an assessor who reviews it.

For the task of writing a product review, a worker writes the review, and the shepherd receives a notification of the submission. Then, in real time, the shepherd gives structured feedback: a rating of the review’s quality, a checklist of things the review should cover, and an open-ended question. The worker then has a chance to improve the work.
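To make the workflow concrete, here is a minimal sketch of one round of that loop; the field names, the 7-point rating, and the stand-in heuristic shepherd are my assumptions, since the paper’s actual form and interface are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Feedback:
    """Structured shepherd feedback: a quality rating, a checklist, and an open-ended question."""
    quality_rating: int                                        # assumed 1 (poor) to 7 (excellent)
    checklist_unmet: List[str] = field(default_factory=list)   # review topics not yet covered
    open_question: str = ""                                    # free-form prompt back to the worker

def shepherd_review(review: str) -> Feedback:
    # Stand-in for the human shepherd: flag missing checklist items.
    unmet = [item for item in ("pros", "cons", "value") if item not in review.lower()]
    return Feedback(quality_rating=7 - 2 * len(unmet),
                    checklist_unmet=unmet,
                    open_question="Who would you recommend this product to?")

def revise(review: str, fb: Feedback) -> str:
    # Stand-in for the worker's revision: address the unmet checklist items.
    return review + "".join(f"\n{item.title()}: ..." for item in fb.checklist_unmet)

draft = "Great headphones, the sound is crisp."
improved = revise(draft, shepherd_review(draft))
```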

The authors conducted a comparative study around product reviews with a control condition (no feedback), self-assessment, and this external assessment (shepherding). They measured task performance, learning, and perseverance (number of revisions and string edit distance). The workers came from the crowd, while the shepherding feedback came from a single reviewer hired from oDesk. Self-assessment did slightly better than shepherding, and both feedback conditions did significantly better than the no-feedback condition. Shepherding resulted in more worker edits, along with more and better revisions.
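Since the paper does not spell out the edit-distance computation, here is a standard Levenshtein implementation of the kind presumably behind that perseverance measure; treat it as my illustration rather than the authors’ code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                    # delete ca
                            curr[j - 1] + 1,                # insert cb
                            prev[j - 1] + (ca != cb)))      # substitute (free if equal)
        prev = curr
    return prev[-1]

# How much a worker changed their review between draft and revision:
print(edit_distance("Great headphones.", "Great headphones, but the cable is short."))
```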

 

Reflection

This paper is a great stepping stone toward more questions to explore. In addition to the attrition question and different learning strategies, I wonder how well these crowds learn in the longer term. Short bursts of learning are one thing (like cramming), but I wonder whether the same workers, through more feedback, get better at writing reviews than others. How well do these lessons stick? The role of feedback could help bring about the dream of crowdsourcing work we would want our kids to do.

Another stepping stone is to measure gains with respect to iterations, even in the short term. How much does quality improve if the worker gets two or more iterations with the assessor, or even with self-assessment?

Feedback, especially external feedback, helped motivate workers to do more and better work. I’m not well versed in educational research, but engagement with the material being taught and with assessment are clearly important.

The authors took care to mention the attrition rate and what composed it. I wonder what can be done about that population. Most of the attrition came from workers dropping out early, but a decent portion was due to plagiarism. I wonder what those participants saw in the task that discouraged them from completing it.

The external condition probably would not have been as successful if the feedback form had not been appropriately structured, with a helpful checklist of items to cover. I can imagine that a ton of design work went into that form to guide shepherds toward prompt, constructive feedback that the worker could act upon.
In their studies, it looks like crowd workers cannot substitute for expert shepherds and provide quality feedback. But I wonder whether that, too, can be learned? It is harder to teach something than simply to be good at it.

Discussion

  1. How do we get the non-trivial group who dropped out to participate? Feedback encouraged more work, so in a general sense, would more feedback lure them in?
  2. If the task were assessing reviews rather than writing them, which do you think would require more iterations to get better at? Which would be easier to learn: writing, or critiquing others?
  3. How much feedback do you think an unskilled worker needs to get better at these creative, multi-faceted, complex tasks? Are there examples, like writing a story, that may need more cycles of writing and review to improve?
  4. How do you see (or not see) real-time external assessment fitting into your projects, and after reading this paper, what do you think the gains would be?


The Economies of Online Cooperation: Gifts and Public Goods in Cyberspace

Paper: Kollock, The Economies of Online Cooperation: Gifts and Public Goods in Cyberspace

Discussion leader: Anamary Leal

Summary

This paper discusses features of online communities that support cooperation and gift giving (sometimes of very expensive things, like hundred-dollar consultations). The author compares gift and commodity economies: acquiring a commodity does not obligate you to anything further, while receiving a gift carries a feeling of obligation to reciprocate. Gifts are “the thing that so-and-so gave me”, while commodities are just “a” thing. On the internet, if you give the gift of free advice to some huge group, no single recipient feels obliged to reciprocate, but there may be a sense of reciprocity within the group as a whole.

Online goods are public goods: they are indivisible (one person viewing an answer does not hinder another), non-excludable (others cannot be kept from the good), and easily duplicated. Everyone would benefit from the good, but that does not mean it will be produced. The temptation arises for online users to contribute little and still reap the benefits, known as free-riding. In some cases only one person needs to pay the cost of contributing for everyone to benefit (what the paper calls a privileged group). How do you motivate people to produce the good and coordinate with others?

One motivation is anticipated reciprocity: help from the group itself at some future point. A good contributor to a forum may feel entitled to receive help from the forum later, and one study found that such contributors do indeed get help more quickly than others. Another is maintaining an online reputation (which implies a persistent identity attached to contributions so they can be tracked). Self-efficacy is also a well-studied motivator; the logic is that one helps the group in order to feel that one’s own impact is wider.

The paper discusses two case studies in online cooperation. The first is the making of Linux: while it had many of the markings of a failure, it succeeded because one person did a large amount of work up front to make it usable and compelling to contributors. Programmers then contributed drivers to get Linux working on their own devices.

The second is NetDay, which connected elementary schools to the internet by organizing an online rally to recruit and coordinate volunteers and accomplish the wiring in a single day. A committee also did much of the work, holding face-to-face meetings with school officials and the like. The online system let people sign up based on schools’ needs.

The author cautions that while online communities can rally together to do great things, it is interest, not necessarily importance, that rallies people. A massive plumbing repair job would likely be less successful than the massive internet-wiring job. Additionally, many digital goods are produced and managed by a small group or even a single person, at least initially.

Reflection:

The paper shows a few hints of its age, such as stressing the benefits of instant online communication compared to a mail, TV, or newspaper campaign. But it remains a compelling start at outlining the features that shape how these communities interact (ongoing interaction, identity persistence, and knowledge of prior interactions).

In discussing motivation, it was an interesting choice to first leave altruism and group attachment out of the equation, assuming everyone is in it for themselves, and then ease into more altruistic motivations like group need. It was a good way to keep the discussion focused. But while the paper suggests such motives are rare, I wonder how much altruism, group need, or attachment affect how much people contribute.

The author stresses that many of these efforts are started by a small group or one person. In the Linux example, Linus put an enormous amount of work into getting Linux to a usable state, then released it so that programmers could contribute and check each other’s contributions. There was no SVN, Git, or other version control system back then to support this (or at least, from what I checked). I can only imagine how hard it was to keep and manage the code repository back then.

Additionally, how big was the core committee that managed NetDay? It moved 20,000 volunteers, but how many people built the online site and held the face-to-face meetings? I would not be surprised if it was one person or a handful who met regularly and coordinated. I also surmise that the project took a large chunk of their time, compared to a regular volunteer who spent a day wiring.

Fast-forward to now: we have systems that make such endeavors much easier to facilitate. Yet I do not see multiple reasonable OSes or multiple reasonable alternatives to common software. Most of the commonly used software I see among the majority of people (not just technologists) comes from Apple, Microsoft, or Google. I wonder how much quality remains a limiting factor. One would think the earliest crowdsourcing efforts would have had the most time to mature and be the most successful now, rather than potentially less interesting efforts like those on Amazon Mechanical Turk.

Discussion:

  1. The discussion of reciprocity is set in terms of accountability and credit, in 1999. What mechanisms have you seen online that are designed to keep track of a user’s contributions to a community? How well do they work or not work?
  2. One would assume that the earliest crowdsourcing efforts would have had the most time to mature and be the most successful (public events to benefit others, and making software). But Mechanical Turk, with its boring tasks, is the most successful, and may not be widely motivating or interesting. Why aren’t these online communities the most successful? Are there still unsolved challenges?
  3. What is the relationship between the efforts of one individual or a small group and the efforts of the crowd? Torvalds built an OS, and surely some core set of people met and worked on NetDay for countless hours. In my experience, the most successful massive efforts are led by a dedicated core group meeting live. In other words, how much effort does an individual or group need to put in to get these online communities to successfully complete such projects?
  4. Could such individuals, in the present day, delegate some of the core tasks (developing an OS, organizing a NetDay of 20,000 volunteers) to others? If so, how, and which parts could be crowdsourced? Are there any technologies or techniques that come to mind? If not, why not?


Conducting behavioral research on Amazon’s Mechanical Turk

Paper: Winter Mason and Siddharth Suri. 2012. Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods 44(1): 1–23. http://doi.org/10.3758/s13428-011-0124-6

Discussion Leader: Anamary Leal

Summary:

This guide both motivates researchers to conduct research on Amazon Mechanical Turk, citing numerous benefits (a faster theory-to-experiment turnaround time, a large pool of diverse and qualified participants at low pay, suitability for tasks that humans do more easily than computers, data quality comparable to laboratory testing, and other benefits we covered in class), and serves as a how-to guide on strategies for conducting behavioral research on the platform.

The authors collected data from multiple studies on worker demographics, requester information, and what a HIT looks like. A HIT can be hosted internally or externally and is made up of multiple assignments, the units of work a worker can do; one worker can complete only one assignment per HIT. The paper also covers how to create a HIT, collect data, and close a HIT. Additionally, it offers insight into worker-requester relationships, such as how workers search for HITs, and recommendations for requesters on creating HITs and engaging with workers successfully.
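A minimal sketch of that HIT/assignment structure, using invented class and field names rather than the real MTurk API, just to make the one-assignment-per-worker-per-HIT constraint concrete:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class HIT:
    """A posted task, split into identical assignments for distinct workers."""
    hit_id: str
    max_assignments: int                       # how many distinct workers may complete it
    submissions: Dict[str, str] = field(default_factory=dict)   # worker_id -> submitted answer

    def submit(self, worker_id: str, answer: str) -> None:
        if worker_id in self.submissions:
            raise ValueError("a worker may complete only one assignment per HIT")
        if len(self.submissions) >= self.max_assignments:
            raise ValueError("all assignments for this HIT are taken")
        self.submissions[worker_id] = answer

hit = HIT(hit_id="demo-001", max_assignments=3)
hit.submit("workerA", "label: cat")
hit.submit("workerB", "label: dog")
```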

The paper also covers in depth how to implement various kinds of experiments, such as synchronous experiments, which involve first building a panel of reliable, high-quality workers through preliminary studies. Another informative aspect is how to perform random assignment of workers to conditions.
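One common way to do such random assignment (my own illustration, not a recipe taken from the paper) is to hash each worker ID into a condition, so that a returning worker always lands in the same condition.

```python
import hashlib

CONDITIONS = ["control", "treatment_a", "treatment_b"]   # placeholder condition names

def assign_condition(worker_id: str) -> str:
    """Deterministically map a worker ID to an experimental condition."""
    digest = hashlib.sha256(worker_id.encode("utf-8")).hexdigest()
    return CONDITIONS[int(digest, 16) % len(CONDITIONS)]

print(assign_condition("A1EXAMPLEWORKERID"))   # the same worker always gets the same condition
```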

 

Reflection:

In addition to being a how-to guide, this paper serves as a good literature survey, covering prior work alongside the practical guidance. All kinds of useful data are scattered throughout the document, which was especially helpful.

I finished using AT for the homework just before writing this reflection. According to the review, sorting by the most recently created HITs was the most popular way workers found HITs, with the second most popular being sorting by the number of HITs offered, so that a worker can learn to perform a task well and then do similar ones faster. I wonder what these strategies reveal about how workers use the system. We know that now a small group of professional workers completes the majority of HITs. Do workers hunt for new HITs to snatch up the best-paying ones? Is that a hint of how they use the system?

In my short experience doing HITs on AT, including IRB-approved research, I still felt the pay was atrocious compared to the work done, but I also find reasonable the authors’ arguments about the convenience for workers of not needing to schedule a laboratory visit, and about quality not being affected by pay (though one of the cited studies compares quality between paying a penny and ten cents, which is not much of a range).

The paper goes into detail about pay on AT, from practical considerations to quality issues to ethics. There is a negative relationship between the highest-wage tasks and the probability of a HIT being taken; I found that the highest-wage tasks called for so much time and effort that they were not worth it.

This paper has been cited at least 808 times (according to Google Scholar) and advocates for positive, professional relationships between workers and requesters. It can still inform requester-researchers now, in 2015.

Questions:

  1. The paper cites work showing that workers find HITs by looking at the most recently created ones first. What does this strategy say about Turk and its workers?
    1. What were your strategies for finding appropriate HITs to take?
  2. How did this paper affect how you completed the assignment? Or, if you did the homework first and then read the paper, how would you have done your homework (as a worker and as a requester) differently?
  3. The authors address one of the biggest questions potential researchers may have: compensation. They present the case that quality generally stays the same for certain tasks (though not judgment- or decision-type tasks), and that in-lab participants should be paid more. They recommend starting below the reservation wage ($1.38/hr) and increasing pay based on reception. Given our discussions about workers and pay, what do you think of this rate?
  4. The authors encourage new requesters to “introduce” themselves to the Mechanical Turk community by first posting to Turker Nation before putting up HITs, and to keep a professional rapport with their workers as if they were company employees. This paper was published in 2012 and is widely cited. How do you see the influence of this attitude (or not) among requesters and workers?
  5. How applicable are these MTurk techniques with respect to issues we discussed before, such as Turkopticon’s rating system, the workers’ bill of rights, and other ethical and quality concerns?

 
