‘This Is Not a Game’: Immersive Aesthetics and Collective Play – Pro

McGonigal, ‘This Is Not a Game’: Immersive Aesthetics and Collective Play

Discussion leader (positive discussant): Anamary

Summary:

This paper discusses one instance of how an immersive game helped many people take collective action, and provides an analysis of immersive games and how in-game strategies can map onto challenges in the real world.

 

The first portion of the paper discusses the Cloudmakers, an online gaming forum built around the immersive puzzle game “The Beast”. In this “immersive game”, media like movie trailers dropped digital clues, such as an unusual credit attribution, that led into a rich, complex puzzle game with no clear goal or reward. The game had three core mysteries and 150 characters, with digital and in-person clues, such as clues broadcast randomly to players’ TVs.

 

Gamers played timed puzzles with a massive number of clues, and solving them could involve anything from programming to the arts, which lent itself to a crowdsourced effort. Puzzles meant to be solved in three months were solved in a day. Players were playing all the time, mixing game elements into the real world, as the game itself declared that “this is not a game” (TING).

 

What is curious about the community’s 7132 members is their initial reaction to the 9/11 attacks. By the day’s end, members felt empowered to help solve the mysteries surrounding the attacks, posting threads like “The Darkest Puzzle” that asked the crowd to help solve them. Many gamers mentioned that the game had shaped their perception of the attacks and that they had gained skills that could be applied to solving them. But the moderators noted that it is dangerous to connect real life to a game, and stopped the initial activity.

 

This example raises two key questions for the piece:

  • What about “The Beast” made gamers confident that they could solve the attacks?
  • What qualities of the Cloudmakers forum helped gamers forget the reality of the situation and debate whether the game was virtual or real?

The second part answers these two questions. One key aspect of these TING games is that gamers are unsure which parts of real life are part of the game and which are not, and this effect was so prevalent that gamers’ relationships, careers, and social lives were hampered by The Beast. Another similar TING game, Push, had a final solution, but many gamers were not satisfied and believed the game had continued. Acting is believing, and these players kept on acting and believing in a game.

 

These gamers also developed strategies in these detective TING games that may apply to crime-solving as well, such as researching sources, vetting the sources themselves, and analyzing whether secondary information connects to their hypotheses. Additionally, gamers felt like they were part of a collective intelligence, mentioning a sense of belonging to a giant think tank.

These key features (immersion, uncertainty about whether one is in or out of the game, training in related strategies, and a sense of belonging) helped motivate and move a crowd toward problem-solving, which carries both promising and troubling consequences. The paper shows the promise of crowds in problem-solving through game design, and how to design games that motivate and retain these crowds so they keep puzzle solving for free.

 

Reflection:

McGonigal’s core message, that games can bring crowds together to help solve real-world problems, seems incredibly influential. This paper was published in 2003, and to my memory, games back then were seen as a child’s toy that maybe trained kids to be violent; in the public’s mind, games were just a useless escapist hobby. But this paper’s core message can be seen in crowdsourcing endeavors that promote public good and awareness, like FoldIt, various other examples seen in class, and in fields that focus on solving complex problems, like visual analytics.

Features that made TING games motivating may be applied to other crowdsourcing endeavors as well. I remember one of our earliest papers, “Online Economies”, discussed how a sense of belonging helped nurture these communities, and this feature can be seen in the Cloudmakers. It would be very interesting to see the more immersive features of TING games applied to crowdsourcing.

The gamers in the Cloudmakers did not solve the crime, and there are many good reasons for this (protection of the gamers on an unprotected site, the blurring of fiction and reality, false accusations afoot). This reminds me of the Boston bomber Reddit incident, where redditors made their own subreddit and collectively tried to work out the identity of the bombers. I hope the anti-paper presenter talks about this, but even the author can’t help but discuss the negatives associated with such a crowd solving crimes.

 

Initially, the subreddit was praised for publicizing key evidence, but it ended up accusing and defaming many innocent people. I wonder if there are ways for law enforcement to collaborate better with crowds (which I’m sure is being researched now!). The capabilities of these crowds are still fascinating: a huge collection of puzzles meant to span three months was solved in a single day.

 

I loved the philosophical and psychological aspects employed in these games to summon crowds. The sunk cost fallacy is one where you keep sinking money into a failing project because you have already invested in it. Similarly, these players were obsessed with the game for so long that they still saw everything as a game.

 

Could we map-reduce this kind of massive puzzle? Maybe some parts of these crimes could be broken down into smaller ones, but I imagine many aspects are interrelated.
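To make the map-reduce idea concrete, here is a toy sketch with entirely hypothetical clue data and a keyword lookup standing in for real analysis: each clue is “mapped” independently, and findings that mention the same entity are “reduced” together.

```python
from collections import defaultdict

# Toy illustration of the map-reduce framing: "map" analyzes each clue on its
# own, "reduce" merges findings that refer to the same entity. The clue data
# and keyword list below are entirely made up.

clues = [
    {"id": 1, "text": "phone records mention a warehouse on 5th"},
    {"id": 2, "text": "receipt found near the warehouse on 5th"},
    {"id": 3, "text": "witness reports a blue sedan"},
]

KEYWORDS = ["warehouse on 5th", "blue sedan"]  # stand-in for real analysis

def map_clue(clue):
    """Map step: examine one clue in isolation and emit (entity, clue id) pairs."""
    return [(kw, clue["id"]) for kw in KEYWORDS if kw in clue["text"]]

def reduce_findings(pairs):
    """Reduce step: group clue ids by the entity they mention."""
    grouped = defaultdict(list)
    for entity, clue_id in pairs:
        grouped[entity].append(clue_id)
    return dict(grouped)

mapped = [pair for clue in clues for pair in map_clue(clue)]
print(reduce_findings(mapped))
# {'warehouse on 5th': [1, 2], 'blue sedan': [3]}
```

Of course, the hard part is exactly the interrelatedness: the “reduce” step above is trivial, while real investigative synthesis is not.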

 

I also wonder if augmented or pervasive games may better help gamers distinguish between reality and games and use their power for good. What if FoldIt were combined with an always-on game? Many of the games discussed in the paper were geared toward detective-style play, but I wonder if this style could be employed to solve historical puzzles, public awareness challenges, or even puzzles posed by the crowd, like “how much of the $X I paid in taxes went to which parts of my government?”.

 

Questions:

 

  1. In the cases of both 9/11 and the Boston bombing, the police are usually selective about which evidence is publicized, since some evidence is uncertain. What are your thoughts on designing systems that help crowds collaborate better with law enforcement to harness the crowd’s problem-solving skills, in ways that help protect the crowd and prevent false accusations?
  2. Are there strategies for breaking down this problem-solving task into smaller ones that crowds can do? Or does the entire crowd need to see the whole task at once to tackle it effectively, like The Beast?
  3. Are there real-life puzzles that are important, but not as life-threatening as crimes? I’m casting puzzles broadly in these questions. There are many complex challenges and issues that have multiple solutions, like wicked problems, that may be framed as a huge puzzle.
    1. How can these crowd-based games help or not help solve such puzzles?
    2. Can these puzzles be both given to the crowds and solved by the crowds? That is, can the crowd both supply and solve the puzzle?

 


ʻThis Is Not a Gameʼ: Immersive Aesthetics and Collective Play

Jane McGonigal

Summary

This paper describes the concept of immersive gaming. To convey this concept, the author gives the example of an online group of gamers known as the Cloudmakers, a group of people who enjoy games that involve solving puzzles. As described in the paper, this group proudly adopted the identity of a collective detective, employing all of the resources at their disposal to solve any mystery or puzzle presented to them (no matter how obscure). It was around this time that a massive immersive game was created that catered to groups of people with similar interests: The Beast. This game created an effective means of virtual immersion; the entire point was to make it as close to reality as possible. The creators went as far as denying the existence of the game itself in order to promote its underlying theme of a conspiracy. The game’s popularity stemmed from the fact that it went beyond strictly online gaming and into the offline lives of its players in order to promote its augmented reality. It also necessitated collaboration among all of its players, because it created such a complex network of puzzles that one person alone could not possibly solve all the problems.

Next, the paper gives a brief section on the difference between immersive and pervasive gaming. Although they share many characteristics, the two differ in one fundamental way: immersive games attempt to disguise the existence of the game to create a more realistic sense of a conspiracy, whereas a pervasive game is promoted and openly marketed to gain attention. In addition, immersive games encourage collaboration, whereas the pervasive Nokia Game provided incentives to individual solvers (which implicitly limited collaboration). The Beast was a very complex network of puzzles, whereas the Nokia Game was simple enough that a single player could solve it.

The paper then states some of the side effects of creating such immersive games. It briefly notes that these games can become too addictive and could potentially harm people’s lives. However, it also emphasizes the players’ burning desire to keep the game play going, consistently trying to find a conspiracy where one simply does not exist. It makes the case that if these players have such a burning desire to solve complex puzzles, why not utilize their expertise and intelligence on real-world problems? Instead of fabricating conspiracies, why not apply them to the problems that governments currently face in order to come up with solutions?

Reflections

This paper was very interesting because I didn’t know that communities such as this existed. I have heard of clans and groups forming in MMORPGs such as World of Warcraft, but never a game whose sole purpose was to be disguised so well that players question whether it is “not just a game”. I appreciate that the author pointed out some of the downfalls of this type of gaming. These games can become highly addictive and cause massive amounts of personal damage to gamers’ lives. In addition, they can add a sense of paranoia to the already flustered society we currently live in. The fact that players are so willing to jump into the flames to solve any problem thrown at them means that they could be manipulated at any point into solving real-world problems without ever knowing it. However, this seems to be a double-edged sword. If communities such as the Cloudmakers were put to a real-world task, they might stumble upon something that was not meant for the public, causing mass hysteria and/or, as Rheingold stated, creating a mob mentality. I know this example is a bit of a stretch, but one could almost consider Anonymous roughly similar to the Cloudmakers. They are a group of hackers/activists who are actively working on solving a problem and/or uncovering some truth that is meant to stay hidden. I believe that if we were to employ these types of games, they could quickly turn into a form of attack. For example, if there was a task published to hack into company X’s website (as part of the game), and the players succeeded, this could cause much harm to the company. But who would get blamed? Would it be the person who was tricked into hacking the website in the first place, or the pseudo game designer who left a vague clue that may or may not have been interpreted as intended? The paper stated that the online community is very intelligent and that it greatly surpassed the game-maker’s expectations; if this intelligence were put to malicious use, it could have some potentially disastrous results. A great example of this could be the users of Reddit who falsely accused someone of being behind the Boston bombing.

Questions

  1. How do you draw a line to distinguish the game from reality?
  2. Should such an addictive type of game be banned?
  3. Is it wise to employ online intelligence to solve sensitive problems?
  4. Wouldn’t this create a constant sense of paranoia and eventually erode faith in the government?

 


To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing

Krause, M., Kizilcec, R.: To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing. Conference on Human Computation and Crowdsourcing, San Diego, USA (2015)

Pro Discussion Leader: Adam

Summary

This paper looks at how the quality of paid crowdsourcing compares to the quality of crowdsourcing games. There has been a lot of research comparing expert work to non-expert crowdsourcing quality. The choice is a trade-off between price and quality, with expert work costing more but having higher quality. You may need to pay more non-expert crowdworkers to get a level of quality comparable to a single paid expert worker. This same trade-off may exist between paid and game crowd work. There has been research on the cost of making a crowdsourcing task gamified, but nothing comparing the quality of game versus paid crowd work.

The authors investigate this by creating two tasks, a simple one and a complex one, with a game version and a paid version of each for the crowd to complete. The simple task is an image labeling task. Images were found by searching Google image search for 160 common nouns. Then, for each image, the nouns on the webpage where the image was found are tallied; more frequently occurring nouns are considered more relevant to the image. The game version was modeled after Family Feud, so that more relevant labels were awarded more points. The paid version simply asked workers to provide a keyword for an image.
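A minimal sketch of how I read this scoring setup (this is not the authors’ code; the function names, page text, and noun list are my own inventions):

```python
from collections import Counter
import re

# Sketch of the described pipeline: tally candidate nouns on the page an image
# came from, then award Family Feud-style points in proportion to how often a
# guessed label occurs. The page text and noun list are invented examples.

def tally_page_nouns(page_text, candidate_nouns):
    """Count how often each candidate noun appears in the source page text."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(words)
    return {noun: counts[noun] for noun in candidate_nouns}

def score_guess(guess, noun_counts, max_points=100):
    """More frequent (i.e. more relevant) labels earn more points."""
    total = sum(noun_counts.values())
    if total == 0 or guess not in noun_counts:
        return 0
    return round(max_points * noun_counts[guess] / total)

page = "A dog chases a ball in the park. The dog barks. A child watches the dog."
counts = tally_page_nouns(page, ["dog", "ball", "park", "child"])
print(counts)                       # {'dog': 3, 'ball': 1, 'park': 1, 'child': 1}
print(score_guess("dog", counts))   # 50 -> most relevant label, most points
print(score_guess("ball", counts))  # 17
```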

In addition to the simple task, the authors wanted to see how quality compared on a more complicated task. To do this, they created a second task that had participants look at a website and provide a question that could be answered by its content. The game version is modeled after Jeopardy, with higher point values assigned to more complex websites. The paid version had a predetermined order in which the tasks were completed, but in both cases participants completed all of the same tasks.

Overall, the quality of the game tasks was significantly higher than that of the paid tasks. However, when broken down by simple and complex tasks, only the complex task showed significantly higher quality for the game version. The quality for the complex task was about 18% higher in the game condition, as rated by the selected judges. The authors suggest one reason for this is the selectiveness of game players: paid workers will do any task as long as it pays well, but game players will only play games that actually appeal to them. So only people really interested in the complex task played the game, leading to higher engagement and quality.

Reflections

Gamification has the potential to generate a lot of useful data from crowd work. While one of the benefits of gamification is that you don’t have to pay workers, it still has a cost. Creating an interesting enough game is not an easy or cheap process, and the authors take that into consideration when framing game crowdsourcing: they are essentially comparing game participants to expert crowd work. It has the potential to generate higher quality work, but at a higher cost than paid non-expert crowd work. The difference, however, is that with the game it’s more of a fixed cost. So if you need to collect massive amounts of data, the cost of creating the game may be amortized to the point where it’s cheaper than paid non-expert work, with at least as high if not higher quality.

I really like how the authors try to account for any possible confounding variables between game and paid crowdsourcing tasks. We’ve seen from some of our previous papers and discussions that feedback can play a large role in the quality of crowd work. It was very important that the authors considered this by incorporating feedback into the paid tasks. This provides much more legitimacy to the results found in the paper.

There is also a simplicity to this paper that makes it very easy to understand and buy into. The authors don’t create too many different variables to study. They use a between-subjects design to avoid any cross-over effects, and their analysis is definitive. There were enough participants to give them statistically significant results and meaningful findings. The paper wasn’t weighed down with statistical calculations like some papers are; they keep the most complicated statistical discussion to two short paragraphs to appease any statisticians who might question their results, but their calculations for the comparison of quality between the two conditions are very straightforward.

Questions

  • Games have a fixed cost for creation, but are there any other costs that should be considered when deciding whether to go the route of game crowdsourcing versus paid crowdsourcing?
  • Other than instantaneous feedback, are there any other variables that could affect the quality between paid and game crowd work?
  • Was there any other analysis the authors should have performed or any other variables that should have been considered, or were the results convincing enough?


A Critique of: “To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing”

Krause, M., Kizilcec, R.: To Play or not to Play: Interactions between Response Quality and Task Complexity in Games and Paid Crowdsourcing. Conference on Human Computation and Crowdsourcing (2015)

Devil’s advocate: Will Ellis

Summary

In this paper, Krause and Kizilcec ask the research questions, “Is the response quality higher in games or in paid crowdsourcing?” and “How does task complexity influence the difference in response quality between games and paid crowdsourcing?” To answer these questions, the authors devise and carry out an experiment that tests four experimental treatments among 1,262 study participants. Each experimental group has either a simple or a complex task set to perform, and either performs the task set as a web browser game or as paid crowdwork. As participants self-selected into each treatment and were sourced from online sites—Newgrounds and Kongregate in the case of players and Crowdflower in the case of workers—rather than recruited from a general population and assigned an experimental treatment, the number of participants in each group varies widely. However, for each group, 50 participants were selected at random for analysis.

The authors employed human judges to analyze the quality of responses from the selected participants and used this data to form their conclusions. The simple task consisted of labeling images. The authors employed the ESP game as the gamified version of this task, having participants earn points by guessing the most-submitted labels for a particular image. Paid crowdworkers were simply given instructions to label each image and were given feedback on their performance. The complex task consisted of participants generating “questions” for given text excerpts, which was meant to mimic the game show Jeopardy. In fact, the authors employed a Jeopardy-like interface in the gamified version of the task. Players selected text excerpts with a particular category and difficulty from a table and attempted to generate questions, which were automatically graded for quality (though not against “ground truth”). Paid crowdworkers, on the other hand, were given each text in turn and asked to create a question for each. Answers were evaluated in the same automated way as in the gamified task, and workers were given feedback with the opportunity to revise their answers.

In their analysis of the data, the authors found that while there was no statistically significant difference in quality between players and workers on the simple task, there was a statistically significant 18% increase in response quality for players over workers on the complex task. The authors posit that the reason for this difference is that, since players choose to play the game, they are interested in the task itself for its entertainment value. Workers, on the other hand, choose to do the task for monetary reward and are less interested in the quality of their answers. While it is easy to produce quality work on simple tasks with little engagement, higher quality work on complex tasks can be achieved by gamifying such tasks and recruiting interested players.

Critique

The authors’ conclusions rest in large part on data gathered from the two complex-task experiments, which ask participants to form Jeopardy-style “questions” as “answers” to small article excerpts. This is supposed to contrast with the simple-task experiments using the ESP game, which was developed as a method for doing the useful work of labeling pictures. However, the authors do not justify that the Jeopardy game, serving as the complex-task experimental condition, is an appropriate contrast to the ESP game.

The ESP game employs as its central mechanic an adaptation of Family Feud-style word guessing. It is a tried-and-true game mechanic with the benefit that it can be harnessed for the useful work of labeling images with keywords, as discussed in [Ahn and Dabbish, 2004]. On the surface, the authors’ use of the Jeopardy game mechanic seems similar, but I believe they’ve failed to use it appropriately in two ways that ultimately weaken their conclusions. Firstly, the mechanic itself seems poorly adapted to the work. A text excerpt from an article is not a Jeopardy-style “answer”, and one need only read the examples in the paper to see that the “questions” participants produce based on those answers make no sense in the Jeopardy context. Such gameplay did induce engagement in self-selected players, producing quality answers in the process, but it should not be surprising that, in the absence of the game, this tortured game mechanic failed to induce engagement in workers and thus failed to produce answers of quality equal to that of the entertainment-incentive experimental condition.

This leads into what I believe is the second shortcoming of the experiment, which is that the complex task, as paid work, is unclear and produces nothing of clear value, both of which likely erode worker engagement. Put yourself in the position of someone playing the game version of this task, and assume that, after a few questions, you find it fun enough to keep playing. You figure out the strategies that allow you to achieve higher scores, you perform better, and your engagement is reinforced. Now put yourself in the position of a worker. You’re asked to, in the style of Jeopardy, “Please write down a question for which you think the shown article is a good response for.” From the paper, it’s clear you’re not then presented with a “Jeopardy”-style answer but instead the first sentence of a news article. This is not analogous to answering a Jeopardy question, and what you may write has no clear or even deducible purpose. It is little wonder that, in an effort to complete the task, bewildered workers would try to only do what is necessary to get their work approved. Compare this to coming up with a keyword for an image, as in the simple paid experimental condition. In that task, what is expected is much clearer, and even a modestly computer-literate worker could suppose the benefit of their work is improving the labeling of images. In short, while it may indeed be the simplicity of a task that induces paid workers to produce higher quality work and the difficulty of a task that causes them to produce lower quality work, this experiment may only show that workers produce lower quality work for confusing and seemingly pointless tasks. A better approach may be to, as with the ESP game, turn complex work into a game instead of trying to turn a game into complex work.


Show Me the Money! An Analysis of Project Updates during Crowdfunding Campaigns

Xu, A., Yang, X., Rao, H., Huang, S.W., Fu, W.-T., Bailey, B.P.: Show Me the Money! An Analysis of Project Updates during Crowdfunding Campaigns. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Toronto, Canada (2014)

Discussion Leader: Mauricio

Summary

This paper presents an analysis of project updates for crowdfunding campaigns and the role they play in the outcome of a campaign. Project updates are originally intended as a form of communication from project creators to keep funders aware of the progress of the campaign. The authors analyzed the content and usage patterns of updates on Kickstarter campaigns and developed a taxonomy of seven types of updates. Furthermore, they found that specific uses of updates had stronger associations with campaign success than the project’s actual description. They conclude the paper by discussing design implications for designers of crowdfunding systems in order to better support the use of updates.

The authors sampled 8,529 campaigns and found that the chance of success of a project without updates was 32.6% vs. 58.7% when the project had updates. By analyzing how creators use updates, they identified the following themes in updates: Progress Report, Social Promotion, New Content, New Reward, Answer Questions, Reminders, and Appreciation. In their study, they collected 21,234 publicly available updates, and then proceeded to assign themes to these updates.

They also divided campaign duration into three phases (initial, middle, and final) and assigned each update to one of them. Taking into account the theme of an update and when it was posted, they arrived at very interesting findings. Reminder updates had the most significant influence on campaign success, and Answer Questions updates had the least. New Reward updates were more likely to increase the chance of success than New Content updates. Both kinds of updates indicate that the project creators have revised the project in some way, so this shows that offering new rewards is more effective than changing the project itself. Looking into the representation of the project, they found that update representation is more predictive of success than the representation of the project page. In terms of timing, they found that a high number of Social Promotion updates in the initial phase, a high number of Progress Report updates in the middle phase, and a high number of New Reward updates in the final phase are all positively correlated with success.
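A minimal sketch of how I picture the phase assignment, using hypothetical update records (the authors’ actual coding scheme is more involved):

```python
from collections import Counter

# Hypothetical update records for a 30-day campaign; each update gets a theme
# and is assigned to the initial, middle, or final third of the campaign.

def assign_phase(posted_day, campaign_days):
    """Split a campaign's duration into thirds: initial, middle, final."""
    third = campaign_days / 3
    if posted_day < third:
        return "initial"
    if posted_day < 2 * third:
        return "middle"
    return "final"

updates = [
    ("Social Promotion", 2),
    ("Progress Report", 14),
    ("New Reward", 27),
    ("Reminder", 29),
]

by_phase = Counter((assign_phase(day, 30), theme) for theme, day in updates)
print(by_phase)
# Counter({('initial', 'Social Promotion'): 1, ('middle', 'Progress Report'): 1,
#          ('final', 'New Reward'): 1, ('final', 'Reminder'): 1})
```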

Finally, the authors discuss design implications for crowdfunding systems to better support campaigns. They suggest that these systems should provide templates for each of the types of updates available. They also mention that these platforms should offer guidance to project creators so that they better elaborate their updates, e.g., provide update guidelines, allow creators to learn from prior successful examples, help creators develop strategies for advertising their campaigns, and guide creators as to when to post what type of update.

Reflections

This paper offers a very interesting take on crowdfunding campaigns. Though prior work shows that project representation is important, the authors point out that more emphasis should be put on the representation, themes, and timing of updates for campaign success. I think this is very interesting because crowdfunding platforms don’t put much emphasis on updates. For example, Kickstarter’s top rule for success is to create a project video on the project page. Though the authors found that the number of updates was higher in successful campaigns than in unsuccessful ones, I do wonder if there can be “too many” updates and whether this can lead to campaign failure. It would be interesting to see if a very high number of updates can become annoying to funders to the point of causing a negative correlation with campaign success, and if so, what types of updates annoy people the most. I imagine it can be very difficult to design an experiment around this, since researchers would have to complete the proposed project if they got the funding.

One of the most interesting findings the authors arrived at is the difference between posting New Reward and New Content updates. New Content updates are for changes in the project itself; this can be viewed as improving the product to attract customers. New Reward updates are for new rewards to attract funders; this can be viewed as offering discounts to attract customers. When the authors first posed the question of which one would be more effective (before arriving at their results), I thought that New Content updates would be more effective, as I saw New Reward updates as a form of desperation by project creators trying to reach their funding goal, which would suggest that the project is not going well. But I was proven wrong, as New Reward updates were shown to be more likely to increase the chance of success. This seems to indicate that, since people have already pledged to the content of the project, they are not really interested in more new content, but in new rewards. However, according to the findings, there were more New Content than New Reward updates. Project creators, therefore, would need to focus more on revising reward levels to improve their chances of success.

In addition, for New Reward updates, a high number of updates in the final phase was positively correlated with campaign success. One reason could be that the initial reward offered served as a reference point, and additional rewards change funders’ perceptions and affect their pledge decisions. I think this is related to the “anchoring effect”, which refers to the human tendency to rely heavily on the first piece of information when making subsequent judgments.

I also like the design implications that they elaborated, but I wonder if they might become too much of a burden for crowdfunding platforms to implement. They could also become an annoyance for project creators, as being prompted about when to post which kind of update, being given guidelines about what to say on social media and when, etc., can feel too intrusive.

Questions

  • Do you think that a crowdfunding campaign can provide “too many” updates? If so, what type of updates should creators avoid posting in high numbers and high frequency?
  • If you were to start a crowdfunding campaign to fund the project related to your research or your project for this class, what types of updates and rewards would you give your funders and potential funders?
  • From the perspective of crowdfunding platforms such as Kickstarter, do you think it is worth it to implement all the design implications mentioned in this paper?
  • If you have contributed to a crowdfunding campaign in the past, what were the reasons that you contributed? And did the updates the creators provided influence you one way or the other?


Understanding the Role of Community in Crowdfunding Work

Hui, Greenberg, and Gerber. “Understanding the Role of Community in Crowdfunding Work.” Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2014.

Discussion Leader: Sanchit

Crowdsourcing example: Ushahidi – Website

Summary:

This paper discusses several popular crowdfunding platforms and the common interactions that project creators have with the crowd in order to get properly funded and supported. The authors describe crowdfunding as a practice designed to solicit financial support from a distributed network of several hundred to thousands of supporters on the internet. The practice is a type of entrepreneurial work in that both require “discovery, evaluation, and exploitation of opportunities to introduce novel products, services, and organizations”. With crowdfunding, a niche has to be discovered and evaluated so the target crowd can be convinced to provide financial support in return for a novel product or service that would benefit both the supporters and the project initiator.

Crowdfunding these days is almost entirely dependent on online communities like Facebook, Twitter, and Reddit. The authors talk about the importance of having a large online presence, because word of mouth travels much faster through the internet than through any other medium. By personally reaching out to people on social media, project creators allow a trustworthy relationship to develop between themselves and the crowd, which can lead to more people funding the project.

The authors conducted a survey of 47 crowdfunding project creators with a variety of different project ideas and backgrounds. Some creators ended up with successful crowdfunding projects and made a good enough margin to continue developing and distributing their proposed product. Others weren’t as lucky, since they lacked a strong online presence, which turns out to be one of the most important aspects of running a successful crowdfunding project.

According to the authors, a crowdfunding project requires five tasks over the project’s lifespan: (1) preparing the initial campaign design and ideas, (2) testing the campaign material, (3) publicizing the project to the public through social media, (4) following through with project promises and goals, and (5) giving back to the crowdfunding community. It turns out that coming up with a novel idea or product is a very small portion of the entire story of crowdfunding. The process of designing an appealing campaign was very daunting for several creators because they had never worked with video editing or design software before. Ideas for design and promotion mostly came from inspiration blogs and even paid mentors. Testing campaign ideas was done through an internal network of supporters, and some creators even skipped this step and instead gathered feedback once they eventually got supporters. Publicizing depended largely on whether or not the product got picked up by a popular news source or social media account. If creators got lucky, they would have enough funding to support their project and be able to deliver the product to the supporters. However, even this task was difficult for the majority of creators, who were working alone on the project and didn’t have enough resources to bring in additional people for assistance. Lastly, almost all creators wished to give back to the crowdfunding community by funding projects that their supporters create in the future or by providing advice to future crowdfunding creators.

 

Reflection:

Overall, I thought the paper was a fairly straightforward summary and overview of what happens behind the scenes in a crowdfunding project. I have personally seen several Kickstarter campaigns for cool and nifty gadgets, primarily through Reddit or Facebook. This shows that unless someone actively looks for crowdfunding projects, a majority of these projects are stumbled upon through social media websites. Popularity plays a huge part in the success of a crowdfunding project, and it makes perfect sense that it does. A product that is popular among a majority of people will get funded quicker, so creating a convincing campaign around the product is just as important as the product itself. These social engineering tasks aren’t everyone’s cup of tea, though. I can totally relate to the authors’ comments about artistic people having a better fundraising background than scientific researchers, which allows them to create a much more convincing campaign and take a very forward approach to recruiting support on social media platforms. Researchers aren’t really drilled in these skills of convincing peers that their work is important, since the work is supposed to speak for itself.

While reading the paper I also noticed how much additional baggage one has to take responsibility for in order to get a project funded. Creating videos, posters, graphics, t-shirts, and gifts, and eventually (hopefully) delivering the final product to customers, is a very demanding process. It’s no wonder that some of these people spend part-time-job hours just maintaining their online presence. I personally don’t see this being used as a primary source of income, because there is way too much overhead and risk involved to expect any sort of reasonable payback. This is especially true when most of the funded money is used for creating and delivering the product and then eventually giving back to other community projects. With crowdsourcing platforms such as Amazon MTurk, there is at least a guarantee that some amount of money will be made, no matter how small; if you play the game smart, then at the very least it’s easy beer money. With crowdfunding, a project gaining enormous traction, let alone reaching its goal, is a big gamble that depends on many variable factors beyond pure objective work skill.

The tools and websites designed to aid crowdfunding campaigns are definitely helpful and are honestly expected to exist at this point. Whenever there is a crowd-based technology, Reddit seems to immediately form a subreddit dedicated to it, with constant chatter, suggestions, and ideas for success. Similarly, people who want to help themselves and others develop tools to make project development easier and stress-free. These tools and forums are great places for general advice, but I agree with the authors that they are not personal. The idea of having an MTurk-based feedback system for crowdfunding campaigns is a brilliant and easy-to-implement one: just linking the project page and asking for feedback at a higher-than-average cost would provide a lot of detailed suggestions to help convince future supporters to fund a project.

Overall, the idea of crowdfunding is great, but I wish the paper had touched on the fees that Kickstarter and some other crowdfunding platforms charge to provide this service. It is a cost that people should consider when deciding whether to start a crowdfunding project, no matter how big or small.

Discussion:

  1. Have you guys contributed to crowdfunding projects? Or ever created a project? Any interesting project ideas that you found?
  2. Do you agree with the occupational gap the authors hinted at, i.e., that artistic project creators have an easier time with crowdfunding than scientific project creators?
  3. Thoughts on having incentives for donating or funding a larger amount than other people? Good idea or will people be skeptical of the success of the project regardless and still donate the minimum amount?
  4. Would you use Kickstarter to donate to poverty- or disaster-stricken areas rather than donate to a reputable charity? There are several donation-based projects, and I wonder why people would trust those more than charities.


Success & Scale in a Data-Producing Organization: The Socio-Technical Evolution of OpenStreetMap in Response to Humanitarian Events

Palen, Leysia, et al. “Success & Scale in a Data-Producing Organization: The Socio-Technical Evolution of OpenStreetMap in Response to Humanitarian Events.” Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2015.

Discussion Leader: Ananya

Summary:

OpenStreetMap (OSM), often called ‘the Wikipedia of maps’, is a collaborative effort to create a common digital map of the world. This paper analyzes the socio-technical evolution of this large, distributed, volunteer-driven organization by examining its mapping activities during two major disaster events: the 2010 Haiti earthquake and 2013 Typhoon Yolanda.

The Haiti earthquake was the first major event in which OSM was used during relief efforts. The repercussions of this sudden influx of usage hindered relief efforts and subsequently gave rise to multiple socio-technical changes within the organization.

The Humanitarian OpenStreetMap Team (HOT), which assists humanitarian organizations with mapping needs, was formalized and registered as a non-profit organization just seven months later.

During the Haiti earthquake, several issues such as mapping conflicts and map duplications arose. To address this congestion, HOT created the OSM Task Manager, which helped mappers coordinate efficiently. An administrator creates jobs for large geographical areas, and the Task Manager divides each job into smaller tasks, each of which is marked ‘yellow’ (taken), ‘red’ (awaiting validation), or ‘green’ (completed).
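A toy model of how I read this workflow (the real Task Manager is a web application; the ‘available’ state and the area name below are my own additions for illustration):

```python
from enum import Enum

# Toy model of the job/task split: an administrator's job over a large area is
# divided into small tasks, each of which moves from available -> taken (yellow)
# -> awaiting validation (red) -> completed (green).

class TaskState(Enum):
    AVAILABLE = "available"
    TAKEN = "yellow"      # a mapper has locked the task
    AWAITING = "red"      # mapped, awaiting validation
    COMPLETED = "green"   # validated and done

class Job:
    def __init__(self, area, num_tasks):
        self.area = area
        self.tasks = {i: TaskState.AVAILABLE for i in range(num_tasks)}

    def take(self, task_id):
        # Locking a task prevents two mappers from working on the same square.
        if self.tasks[task_id] is TaskState.AVAILABLE:
            self.tasks[task_id] = TaskState.TAKEN

    def submit(self, task_id):
        self.tasks[task_id] = TaskState.AWAITING

    def validate(self, task_id):
        self.tasks[task_id] = TaskState.COMPLETED

job = Job("example coastal city", num_tasks=4)
job.take(0); job.submit(0); job.validate(0)
job.take(1)
print({i: state.value for i, state in job.tasks.items()})
# {0: 'green', 1: 'yellow', 2: 'available', 3: 'available'}
```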

Other changes included getting OSM licensed under the ODbL (Open Database License). To attract and retain new participants, OSM upgraded its openstreetmap.org website and interface, making it easier for new participants to map. ‘Notes’, a drop-pin annotation feature for users to point out improper tagging or suggest additional information, was also added.

Unlike Wikipedia, which laid down governance policies and strategies as the community grew, OSM’s governance structure still maintains a non-bureaucratic approach to promote and cope with growth. Furthermore, to avert low diversity among contributors, the LearnOSM materials were translated into nine languages.

Typhoon Yolanda, another disaster on the scale of the Haiti earthquake, struck four years later and tested OSM’s organizing efforts. The response to the event was significant, with a 3x increase in the number of contributors. The now well-established HOT coordinated with volunteers using emails and chatrooms. The Task Manager was widely used by mappers, which helped prevent mapping collisions.

However, since all jobs are put into a single instance of the Task Manager, there is a possibility of significant traffic congestion as the mapping population grows. OSM is considering multiple solutions to mitigate this problem. It has also developed a range of socio-technical interventions aimed at promoting a supportive environment for new contributors while managing community growth. This is in stark contrast to Wikipedia’s policy-driven strategies.

 
Reflection:

This paper presents a good description of how a community matured within the bounds of two major events that shaped its growth. I really appreciate how the authors tracked changes within the OSM community after the Haiti earthquake and analyzed the effects of those changes with regard to another major event (Typhoon Yolanda) four years later. One of the authors being a founding member of HOT definitely helped.

However, I am skeptical about the comparisons made between Wikipedia’s and OSM’s ways of operating because, despite many commonalities, they work with very distinct types of data. So the non-bureaucratic collaborative environment that OSM maintains may not work for Wikipedia, which also has to deal with a completely different set of challenges associated with creative works, such as plagiarism, editorial disputes, etc.

One of the problems the authors mention Wikipedia faces is with respect to diversity, which the OSM community has made notable efforts to alleviate. Still, the gender disparity that plagued Wikipedia was also prevalent in OpenStreetMap: studies done in 2011 showed Wikipedia had 7% women contributors, while in OSM it was worse, only 3%. I wish the authors had said more about the new email list that OSM launched in 2013 to promote the inclusion of more women, and how effective this step was at motivating a currently inactive group.

Although not extensively used, the Notes feature did show some potential for use by both new and experienced users. However, the authors conjectured that guests may use it for noting transient information such as a ‘temporary disaster shelter’. I wonder why this is an issue. In case of a disaster, many important landmarks such as a makeshift emergency shelter or a food distribution center will be temporary and still be part of a relief team’s data needs. Of course, an additional step would need to be developed to update the map once these temporary landmarks are gone.

Overall, this paper provides a good understanding of some of OSM’s management techniques and is also one of the first papers to study OpenStreetMap in such intricate detail.

 
Questions:
– Do you think the comparison made in the paper between Wikipedia and OSM about their governance strategy is fair? Will OSM’s collaborative governance style work for Wikipedia?
– How can gender imbalance or other diversity issues be resolved in a voluntary crowdsourcing environment?
– As the authors mention, guests can see Notes as an opportunity for personalization. How do you think OSM can reduce noise in notes? Can the Task Manager work here and label each note as a yellow, red or green task?
– I think a volunteer driven platform like OSM is particularly useful in a disaster situation when the landscape is changing rapidly. Do you feel the same? Can you think of any other volunteer driven application that will help in situational awareness at real time?


CommunitySourcing: engaging local crowds to perform expert work via physical kiosks

Heimerl, Kurtis, et al. “CommunitySourcing: engaging local crowds to perform expert work via physical kiosks.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2012.

Discussion Leader: Shiwani

Summary:

In this paper, the authors introduce a new mechanism, called community-sourcing, which is intended to facilitate crowd-sourcing when domain experts are required. Community-sourcing differs from other platforms in that it involves placing physical kiosks in locations likely to attract the right people, and it aims to engage these people when they have surplus time (e.g., when they are waiting).

The authors defined three cornerstones for the design of a community sourcing system, viz. task selection, location selection and reward selection.

To evaluate the concept, the authors created a system called Umati. Umati consisted of a vending machine interfaced with a touch screen. Users could earn “vending credit” by completing tasks on the touch screen, and once they had earned enough credit, they could exchange it for items from the vending machine. Although Umati was programmed to accept a number of different tasks, the authors selected the tasks of exam grading and filling out a survey for their evaluation (task selection). Research evidence suggests that redundant peer grades are strongly correlated with expert scores, which made grading an interesting task to choose, while the survey task helped the authors capture demographic information about the users. Umati was placed in front of the major lecture hall in the Computer Science building, which mainly supported computer science classes (location selection). The authors chose snacks (candies) as the reward, as food is a commonly used incentive for students to participate in campus events (reward selection).

For their evaluation, the authors generated 105 sample answers to 13 questions taken from prior mid-term exams for the second-semester undergraduate course CS2. These answers were then evaluated on two systems, Umati and Amazon Mechanical Turk, as well as by experts. A simple spam detection scheme was implemented by adding gold standard questions.

The results showed a strong correlation between Umati evaluations and expert evaluations, whereas Amazon Mechanical Turk evaluations differed greatly. Additionally, and more interestingly, Umati was able to grade exams more accurately, or at a lower cost, than traditional single-expert grading. The authors mention several limitations of their study, such as its duration, privacy concerns, restriction to a particular domain, etc. Overall, Umati looks promising but requires more evaluation.

 

Reflection:

The title of the paper gives the perfect summary of what community-sourcing is about, viz. local crowds, expert work, and physical kiosks. It is a novel and pretty interesting approach to crowd-sourcing.

I really like the case the authors make for this new approach. They talk about the limitations and challenges of accessing experts. For example, some successful domain-driven platforms work by creating competitions, which is not the best approach for high-volume work. Others seek to identify the “best answer” (StackOverflow), which is not great for use cases such as grading. Secondly, there are many natural locations where there is cognitive surplus (e.g., academic buildings, airport lounges, etc.), and the individuals in these locations can serve as “local experts” under certain conditions. Thirdly, having a physical kiosk as a reward system, thereby giving out tangible rewards, seems like a great idea.

I also like that the authors situate community-sourcing very well and state where it would be applicable, that is, specifically for “short-duration, high volume tasks that require specialized knowledge or skills which are specific to a community but widely available within the community”. This is perhaps a very niche aspect, but an interesting niche, and the authors give some examples of where they could see it being applied (grading, tech support triage, market research, etc.).

The design of Umati, the evaluation system, was quite thorough and clearly based on the three chief design considerations put forward by the authors (location, reward, task). Every aspect seems to have been thought through and reviewed. One example is the fact that the survey task was worth more credit (5 credits) than the grading tasks (1 credit each), which I assume was to encourage users to provide their demographic information.

The spam detection approach was the use of gold standard questions, and the exclusion of participants who failed more than one such question. Interestingly, while for Umati this meant that the user was blacklisted (based on ID), the data up to the point of blacklisting was still used in the analysis. For AMT, on the other hand, two sets of data were presented: one including all responses and one filtered based on the spam detection criteria.
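A hypothetical sketch of this rule as I understand it (the field names and helper function are my own, not the authors’ implementation): a participant is flagged after failing a second gold question; Umati-style filtering keeps the work done up to that point, while stricter filtering drops all of the flagged worker’s responses.

```python
# Hypothetical response records; 'worker', 'is_gold', and 'passed_gold' are my
# own field names. A worker is blacklisted once they fail a second gold question.

def filter_responses(responses, keep_up_to_failure=True):
    """Return the non-gold responses that survive filtering, plus blacklisted workers."""
    kept, failures, blacklisted = [], {}, set()
    for r in responses:                           # assumed to be in time order
        w = r["worker"]
        if w in blacklisted:
            continue                              # nothing accepted after blacklisting
        if r["is_gold"]:
            if not r["passed_gold"]:
                failures[w] = failures.get(w, 0) + 1
                if failures[w] > 1:               # failed more than one gold question
                    blacklisted.add(w)
                    if not keep_up_to_failure:    # stricter mode: drop earlier work too
                        kept = [k for k in kept if k["worker"] != w]
            continue                              # gold items themselves are not analyzed
        kept.append(r)
    return kept, blacklisted

stream = [
    {"worker": "a", "is_gold": False, "passed_gold": None},
    {"worker": "a", "is_gold": True,  "passed_gold": False},
    {"worker": "a", "is_gold": True,  "passed_gold": False},
    {"worker": "a", "is_gold": False, "passed_gold": None},
]
print(filter_responses(stream, keep_up_to_failure=True)[0])   # keeps a's first response
print(filter_responses(stream, keep_up_to_failure=False)[0])  # drops all of a's work
```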

Another interesting point is that about 19% of users were blacklisted. The authors explain that this happened in some cases because users forgot to log out, and in other cases because users were merely exploring and did not realize that there would be repercussions. I wonder if the authors performed any pilot tests to catch this?

The paper presents a few more interesting ideas such as the double-edged effects of group participation, as well as the possibility that the results may not be generalizable due to the nature of the study (specific domain and tasks). I did not find any further work performed by the authors to extend this, which was a little unfortunate. There was some work related to community sourcing, but along very different lines.

Last, but not least, Umati had a hidden benefit for the users: grading tasks could potentially improve their understanding of the material, especially when the tasks were performed through discussions as a group. This opens up the potential for instructors to encourage their students to participate, perhaps in exchange for some class credit.

Discussion:

  1. The authors decided to include the data up to the failing of the second gold standard question for Umati users. Why do you think they chose to do that?
  2. Do you think community sourcing would be as effective if it was more volunteering-based, or if the reward was less tangible (virtual points, raffle ticket, etc)?
  3. 80% of the users had never participated in a crowdsourcing platform. Could this have been a reason for its popularity? Do you think interest may go down over a period of time?
  4. The paper mentions issues such as the vending machine running out of snacks, and people getting blacklisted because they did not realize there would be repercussions. Do you think having some controlled pilot tests would have remediated these issues?
  5. None of the AMT workers passed the CS qualification exam (5 MCQs on computational complexity and Java), but only 35% failed the spam detection. The pay difference between the normal HIT and the HIT with the qualification exam was $0.02. Do you think the financial incentive was not enough, or was the gold standard not as effective?
  6. The authors mentioned an alternative possibility of separating the work and reward interfaces, in order to scale the interface both in terms of users and tasks. Do you think this would still be effective?


Bringing semantics into focus using visual abstraction

Zitnick, C. Lawrence, and Devi Parikh. “Bringing semantics into focus using visual abstraction.” Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.

Discussion Leader: Nai-Ching Wang

Summary

To address the problem of relating visual information to the linguistic semantics of an image, the paper proposes to start studying abstract images instead of real images, to avoid the complexity and low-level noise of real images. Using abstract images makes it possible to generate and reproduce the same or similar images as a study requires, while it is nearly impossible to do so with real images. The paper demonstrates this strength by recruiting crowd users on Amazon Mechanical Turk to 1) create 1002 abstract images, 2) describe the created abstract images, and 3) generate 10 images (from different crowd users) for each description. With this process, images with similar linguistic semantic meaning are produced because they are created from the same description. Because the parameters used to create the abstract images are known (or can be detected easily), the paper is able to determine the semantic importance of visual features derived from occurrence, person attributes, co-occurrence, spatial location, and depth ordering of the objects in the images. The results also show that the suggested important features have better recall than low-level image features such as GIST and SPM. The paper also shows that visual features are highly related to the text used to describe the images.

Reflections

Even though crowdsourcing is not the main focus of the paper, it is very interesting to see how crowdsourcing can be used and be helpful in other research fields. I really like the idea of generating different images with similar linguistic semantic meaning to find important features that determine the similarity of linguistic semantic meaning. It might be interesting to see the opposite way of study, that is, generating different descriptions with similar/same images.

As for the crowdsourcing part, quality control is not discussed in the paper, probably due to its focus, but it would be surprising if there were no quality control of the results from crowd workers during the study. As we discussed in class, maximizing compensation within a certain amount of time is an important goal for workers on crowdsourcing markets such as Amazon Mechanical Turk, and we can imagine workers pursuing that goal by submitting very short descriptions and random placements of clip art. In addition, if multiple descriptions are required for one image, how is the final description selected?

I can also see other crowdsourcing topics related to the study in this paper. It would be interesting to see how different workflows might affect the results; for example, asking the same crowd worker to do all three stages vs. different crowd workers for different stages vs. different crowd workers working collaboratively. With such a setting, we might be able to find individual differences and/or social consensus in linguistic semantic meaning. Section 6 seems somewhat similar to the ESP game, with the words possibly constrained to certain types based on the needs of the research.

Overall, I think this paper is a very good example of how we can leverage human computation along with algorithmic computation to understand human cognition.

Questions

  • Do you think that in real images, the viewer will be distracted by other complex features such that the importance of some features will decrease?
  • As for the workflow, what are the strengths and drawbacks of using the same crowd users for all three stages vs. using different crowd users for different stages?
  • How do you do the quality control of the produced images with descriptions? For example, how do you make sure the description is legitimate for the given image?
  • If we want to turn the crowdsourcing part into a game, how would you do it?


VQA: Visual Question Answering

Paper: S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, “VQA: Visual Question Answering,” arXiv preprint arXiv:1505.00468, 2015.

Summary

This paper presents a dataset for evaluating AI algorithms involving computer vision, natural language processing, and knowledge representation and reasoning. Many tasks that we once thought would be difficult for an AI algorithm, such as image labeling, have now become fairly commonplace. Thus, the authors hope to help push the boundaries of AI algorithms by providing a dataset for a more complex problem that combines multiple disciplines, which they name Visual Question Answering. Others can then use this dataset to test algorithms that try to solve this highly complex puzzle.

The authors use around 200,000 images from the Microsoft Common Objects in Context dataset and 50,000 abstract scene images created by the authors. For each of these images, they collected three open-ended questions from crowdworkers, and then for each question they collected ten answers from unique crowdworkers. An accuracy measure was then applied to each answer: if at least three responses to a question were identical, that answer was deemed to be 100% accurate. This concluded the data collection for generating the dataset, but they then used crowdworkers to evaluate the complexity of the questions received.
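A small sketch of the accuracy measure as I understand it from this description (the paper’s exact formulation may differ in details):

```python
# Sketch of the accuracy measure described above: an answer matching at least
# three of the ten human answers counts as fully correct, with partial credit
# below that, i.e. min(#matches / 3, 1). Example answers are invented.

def vqa_accuracy(candidate, human_answers):
    """Score a candidate answer against the ten crowdsourced answers."""
    matches = sum(1 for a in human_answers if a == candidate)
    return min(matches / 3.0, 1.0)

humans = ["red", "red", "red", "dark red", "red", "maroon",
          "red", "red", "crimson", "red"]
print(vqa_accuracy("red", humans))       # 1.0     (7 exact matches)
print(vqa_accuracy("dark red", humans))  # 0.33... (1 exact match)
print(vqa_accuracy("blue", humans))      # 0.0
```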

There were three criteria examined to justify the complexity of these questions: whether the image is necessary to answer the question, whether the question requires any common-sense knowledge not available in the image, and how well the question can be answered using the caption alone rather than the actual image. The studies conducted to support this showed that the generated questions generally required the image to answer, that a fair number require some form of common sense, and that the questions are answered significantly better with access to the image than with just access to the captions. Finally, the authors used various algorithms to test their effectiveness against this dataset and found that current algorithms still significantly underperform compared to humans. This means that the dataset can successfully test the abilities of a new, more complex set of AI algorithms.

Reflection

While the purpose of this paper is focused on artificial intelligence algorithms, a large portion of it involves crowd work. It is not specifically mentioned in the body of the paper, but from the description, the acknowledgements, and the figure descriptions you can tell that the question and answer data was collected on Amazon Mechanical Turk. This isn’t surprising given the vast amount of data they collected (nearly 10 million question answers). It would be interesting to learn more about how the tasks were set up and the compensation, but the crowdsourcing aspects are not the focus of the paper.

One part of the paper that I thought was most relevant to our studies of crowd work was the discussion of how to get the best complex, open-ended questions about the pictures. The authors used three different prompts to try to elicit the best questions from the crowdworkers: ask a question that either a toddler, an alien, or a smart robot would not understand. I thought it was very interesting that the smart robot prompt produced the best questions. This prompt is actually fairly close to reality, as the smart robot could just be considered modern AI algorithms. Good questions are ones that can stump these algorithms, or the smart robot.

I was surprised that the authors chose to go with exact text matches for all of their metrics, especially given the discussion regarding my project last week on image comparison labeling. The paper mentions a couple of reasons for this, such as not wanting things like “left” and “right” to be grouped together, and because current algorithms don’t do a good enough job of synonym matching for this type of task. It would be interesting to see if the results would differ at all if synonym matching were used. Exact matching was used in all scenarios, however, so adding synonym matching would theoretically not change the relative results.

That being said, this was a very interesting article that aimed to find human tasks that computers still have difficulty with. Every year that passes, this set of tasks gets smaller and smaller, and this paper is actually trying to help the set shrink more quickly by helping test new AI algorithms for effectiveness. The workers may not know it, but for the tasks in this paper they were actually working toward making their own jobs obsolete.

Questions

  • How would you have set up these question and answer gathering tasks, regarding the number that each worker performs per HIT? How do you find the right number of tasks per HIT before the worker should just finish the HIT and accept another one?
  • Is it just a coincidence that the “smart robot” prompt performed the best, or do you think there’s a reason that the closest to the truth version produced the best results (are crowdworkers smart enough to understand what questions are difficult for AI)?
  • What do you think about the decision to use exact text matching (after some text cleaning) instead of any kind of synonym matching?
  • How much longer are humans going to be able to come up with questions that are more difficult for computers to answer?
