Heimerl, Kurtis, et al. “CommunitySourcing: Engaging Local Crowds to Perform Expert Work via Physical Kiosks.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2012.
Discussion Leader: Shiwani
Summary:
In this paper, the authors introduce a new mechanism, called community-sourcing, which is intended to facilitate crowd-sourcing when domain experts are required. Community-sourcing differs from other platforms in that it places physical kiosks in locations likely to attract the right people and aims to engage those people when they have surplus time (e.g. while they are waiting).
The authors defined three cornerstones for the design of a community-sourcing system, viz. task selection, location selection, and reward selection.
To evaluate the concept, the authors created a system called Umati. Umati consisted of a vending machine interfaced with a touch screen. Users could earn “vending credit” by completing tasks on the touch screen and, once they had earned enough credit, exchange it for items from the vending machine. Although Umati was programmed to accept a number of different tasks, the authors selected the tasks of exam grading and filling out a survey for their evaluation (task selection). Prior research suggests that redundant peer grades correlate strongly with expert scores, which made grading an interesting task to choose, while the survey task helped the authors capture demographic information about the users. Umati was placed in front of the major lecture hall in the Computer Science building, which mainly supported computer science classes (location selection). The authors chose snacks (candies) as the reward, as food is a commonly used incentive for students to participate in campus events (reward selection).
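As a mental model of this earn-and-spend mechanic, here is a minimal sketch of kiosk credit accounting in Python. The class and function names, and the 3-credit item price, are my own hypothetical choices; only the task credit values (5 for the survey, 1 per grading task, mentioned later in the reflection) come from the paper.

```python
# Illustrative sketch of Umati-style credit accounting; the names and the
# item cost are my own assumptions, not the authors' implementation.
# Task credit values (survey = 5, grading = 1) are taken from the paper.

TASK_CREDITS = {"survey": 5, "grading": 1}


class KioskAccount:
    def __init__(self, user_id):
        self.user_id = user_id
        self.credits = 0

    def complete_task(self, task_type):
        """Award vending credit for a completed task."""
        self.credits += TASK_CREDITS[task_type]

    def redeem(self, item_cost):
        """Exchange credit for a vending item, if the balance allows."""
        if self.credits < item_cost:
            return False  # not enough credit yet
        self.credits -= item_cost
        return True


# Example: grade three answers, then redeem a (hypothetical) 3-credit snack.
account = KioskAccount("student-42")
for _ in range(3):
    account.complete_task("grading")
print(account.redeem(item_cost=3))  # True: 3 credits earned, 3 spent
```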
For their evaluation, the authors generated 105 sample answers to 13 questions taken from prior mid-term exams for CS2, a second-semester undergraduate course. These answers were then graded on two systems, Umati and Amazon Mechanical Turk, as well as by experts. Spam detection logic was implemented by adding gold standard questions.
The results showed a strong correlation between Umati evaluations and expert evaluations, whereas Amazon Mechanical Turk evaluations differed greatly. Additionally, and more interestingly, Umati was able to grade exams more accurately or at a lower cost than traditional single-expert grading. The authors mention several limitations of their study, such as the short duration of the study, privacy concerns, and the restriction to a particular domain. Overall, Umati looks promising but requires more evaluation.
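To make the “strong correlation” claim concrete, here is a minimal sketch of how agreement between crowd grades and expert grades could be measured. Averaging the redundant grades per answer and using Pearson’s r are my assumptions about a plausible analysis, not necessarily the authors’ exact procedure, and the grade values below are made-up placeholders rather than data from the study.

```python
# Minimal sketch of comparing aggregated crowd grades with expert grades.
# Averaging redundant grades and using Pearson's r are assumptions about a
# reasonable analysis, not necessarily the paper's exact procedure.
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Placeholder data: several redundant crowd grades per answer (0-10 scale).
crowd_grades = {"a1": [8, 7, 9], "a2": [3, 4, 4], "a3": [10, 9, 10]}
expert_grades = {"a1": 8, "a2": 3, "a3": 10}

answers = sorted(expert_grades)
aggregated = [mean(crowd_grades[a]) for a in answers]
expert = [expert_grades[a] for a in answers]
print(round(pearson_r(aggregated, expert), 3))
```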
Reflection:
The title of the paper gives the perfect summary of what community-sourcing is about, viz. local crowds, expert work, and physical kiosks. It is a novel and pretty interesting approach to crowd-sourcing.
I really like the case the authors make for this new approach. First, they discuss the limitations and challenges in accessing experts: some successful domain-driven platforms reach experts by creating competitions, which is not the best approach for high-volume work, while others seek to identify the “best answer” (e.g. StackOverflow), which does not work well for use cases such as grading. Second, there are many natural locations where there is cognitive surplus (e.g. academic buildings, airport lounges), and the individuals in these locations can serve as “local experts” under certain conditions. Third, having a physical kiosk as the reward system, and thereby giving out tangible rewards, seems like a great idea.
I also like that the authors situate community-sourcing very well and state where it would be applicable, specifically for “short-duration, high volume tasks that require specialized knowledge or skills which are specific to a community but widely available within the community”. This is a very narrow niche, but an interesting one, and the authors give some examples of where they could see it being applied (grading, tech support triage, market research, etc.).
The design of Umati, the system built for the evaluation, was quite thorough and clearly based on the three chief design considerations put forward by the authors (location, reward, task). Every aspect seems to have been thought through and reviewed. One example is that the survey task was worth more credit (5 credits) than the grading tasks (1 credit each), which I assume was to encourage users to provide their demographic information.
Spam detection relied on gold standard questions, with exclusion of participants who failed more than one such question. Interestingly, while for Umati this meant that the user was blacklisted (based on ID), the data up to the point of blacklisting was still used in the analysis. For AMT, on the other hand, two sets of data were presented: one including all responses and one filtered using the spam detection criteria.
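For concreteness, here is how I read the blacklisting rule, as a hedged sketch: a user ID is blacklisted after failing a second gold standard question, but responses submitted before that point are retained, mirroring the Umati treatment described above. All names, thresholds, and data structures here are my own and purely illustrative.

```python
# Sketch of the gold-standard spam rule as I read it: a user is blacklisted
# after failing a second gold standard question, but responses submitted up
# to that point are kept (the Umati treatment). Names and structures are
# my own, hypothetical choices.

MAX_GOLD_FAILURES = 1  # failing more than one gold question => blacklist


class GoldStandardFilter:
    def __init__(self, gold_answers):
        self.gold_answers = gold_answers  # question_id -> correct grade
        self.failures = {}                # user_id -> gold failure count
        self.blacklisted = set()
        self.kept_responses = []          # responses retained for analysis

    def submit(self, user_id, question_id, grade):
        if user_id in self.blacklisted:
            return  # blacklisted users contribute nothing further
        if question_id in self.gold_answers and grade != self.gold_answers[question_id]:
            self.failures[user_id] = self.failures.get(user_id, 0) + 1
            if self.failures[user_id] > MAX_GOLD_FAILURES:
                self.blacklisted.add(user_id)
                return  # earlier responses stay in kept_responses
        self.kept_responses.append((user_id, question_id, grade))


# Example: a user fails two seeded gold questions and is blacklisted,
# but the earlier responses remain available for analysis.
f = GoldStandardFilter(gold_answers={"g1": 10, "g2": 0})
f.submit("u1", "q7", 6)   # ordinary question, kept
f.submit("u1", "g1", 4)   # first gold failure, still kept
f.submit("u1", "g2", 5)   # second gold failure -> blacklisted
print(f.blacklisted, len(f.kept_responses))  # {'u1'} 2
```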
Another interesting point is that about 19% of users were blacklisted. The authors explain that this happened in some cases because users forgot to log out, and in others because users were merely exploring and did not realize that there would be repercussions. I wonder whether the authors performed any pilot tests to catch this.
The paper raises a few more interesting points, such as the double-edged effects of group participation, as well as the possibility that the results may not be generalizable due to the nature of the study (specific domain and tasks). I did not find any further work by the authors extending this, which was a little unfortunate. There has been some work related to community-sourcing, but along very different lines.
Last, but not least, Umati had a hidden benefit for the users: grading tasks could potentially improve their understanding of the material, especially when the tasks were performed through group discussions. This opens up the possibility of instructors encouraging their students to participate, perhaps in exchange for some class credit.
Discussion:
- For Umati users, the authors decided to include the data collected up to the point where the second gold standard question was failed. Why do you think they chose to do that?
- Do you think community-sourcing would be as effective if it were more volunteer-based, or if the reward were less tangible (virtual points, a raffle ticket, etc.)?
- 80% of the users had never participated in a crowdsourcing platform before. Could this novelty have been a reason for Umati's popularity? Do you think interest might go down over time?
- The paper mentions issues such as the vending machine running out of snacks and people getting blacklisted because they did not realize there would be repercussions. Do you think some controlled pilot tests would have remediated these issues?
- None of the AMT workers passed the CS qualification exam (5 MCQs on computational complexity and Java), yet only 35% failed the spam detection. The pay difference between the normal HIT and the HIT with the qualification exam was only $0.02. Do you think the financial incentive was not enough, or was the gold standard not as effective?
- The authors mentioned the alternative possibility of separating the work and reward interfaces in order to scale the system in terms of both users and tasks. Do you think this would still be effective?