Mohannad Al Ameedi – Human Computation: A Survey and Taxonomy of a Growing Field

Summary

In this paper, the authors aim to classify human computation systems and to compare and contrast the term with related terms like crowdsourcing, social computing, and data mining. The paper starts by presenting several definitions of human computation, all of which refer to it as harnessing human effort to solve problems that computers cannot yet solve.

The paper presents different computational systems that share some properties with human computation and yet are distinct from it. The authors highlight systems like social computing, crowdsourcing, and data mining and show their similarities to and differences from human computation systems. All of these systems are grouped together under collective intelligence, where the intelligence of many humans is combined to solve a big problem.

The paper presents a classification system for human computation systems that is based on six dimensions. The dimensions include motivation, quality control, aggregation, human skills, process order, and task-request cardinality.
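
To make the six dimensions concrete, here is a minimal sketch of how a system could be recorded along them; this is my own illustration rather than anything from the paper, and the class name, field names, and example values are all assumptions.

```python
from dataclasses import dataclass

# Illustrative record of where a system falls on the six dimensions.
# Field values are free-form strings; real systems may sit on several values at once.
@dataclass
class HumanComputationSystem:
    name: str
    motivation: str                 # e.g. pay, altruism, enjoyment, reputation, implicit work
    quality_control: str            # e.g. output agreement, expert review, ground truth seeding
    aggregation: str                # how individual answers are combined
    human_skill: str                # e.g. visual perception, language understanding
    process_order: str              # ordering of requester, worker, and computer
    task_request_cardinality: str   # e.g. one task given to many workers

# A hypothetical image-labeling game classified along the dimensions (values are guesses).
example = HumanComputationSystem(
    name="image-labeling game",
    motivation="enjoyment",
    quality_control="output agreement",
    aggregation="keep labels that many player pairs agree on",
    human_skill="visual perception",
    process_order="computer -> worker -> requester",
    task_request_cardinality="one task, many workers",
)
print(example)
```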

The authors present different ways to motivate people to participate in these systems, such as pay, altruism, enjoyment, and others, and discuss the pros and cons of each approach.

The authors also present different approaches for improving the quality of these systems, such as output or input agreement, expert review, multilevel review, and ground truth seeding. All of these approaches aim to raise output quality and to provide ways to measure the performance of the system.
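
Ground truth seeding in particular maps nicely onto a small script: mix a few tasks with known answers into each worker's batch and score the worker on just those. This is my own sketch of the idea, not the paper's implementation, and all names and values are hypothetical.

```python
import random

# Ground truth seeding: interleave "gold" questions (known answers) into a batch,
# then estimate a worker's reliability from the gold questions alone.
def build_batch(unlabeled_task_ids, gold_answers, gold_fraction=0.2):
    """Interleave some gold task ids into a batch of unlabeled task ids."""
    n_gold = min(len(gold_answers), max(1, int(len(unlabeled_task_ids) * gold_fraction)))
    batch = list(unlabeled_task_ids) + random.sample(list(gold_answers), n_gold)
    random.shuffle(batch)
    return batch

def score_worker(worker_answers, gold_answers):
    """Accuracy on the seeded gold questions; both arguments map task id -> answer."""
    seen_gold = [t for t in worker_answers if t in gold_answers]
    if not seen_gold:
        return None
    correct = sum(worker_answers[t] == gold_answers[t] for t in seen_gold)
    return correct / len(seen_gold)

# Tiny example with hypothetical ids and answers.
gold = {"g1": "cat", "g2": "dog"}
batch = build_batch(["u1", "u2", "u3", "u4"], gold)
print(score_worker({"u1": "cat", "g1": "cat", "g2": "bird"}, gold))  # 0.5
```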

There are different aggregation approaches that collect the results of the completed tasks and combine them into a solution to the global problem.
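
One common aggregation strategy is redundant assignment plus majority voting; the sketch below is my own illustration of that idea, with hypothetical names and data.

```python
from collections import Counter, defaultdict

# Give the same small task to several workers and keep the majority answer per task.
def majority_vote(responses):
    """responses: list of (task_id, answer) pairs collected from many workers."""
    by_task = defaultdict(list)
    for task_id, answer in responses:
        by_task[task_id].append(answer)
    return {task_id: Counter(answers).most_common(1)[0][0]
            for task_id, answers in by_task.items()}

# Example: three workers label two images.
responses = [("img1", "cat"), ("img1", "cat"), ("img1", "dog"),
             ("img2", "dog"), ("img2", "dog"), ("img2", "dog")]
print(majority_vote(responses))  # {'img1': 'cat', 'img2': 'dog'}
```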

The remaining dimensions, human skills, process order, and task-request cardinality, cover the skills required of workers, the order in which requesters, workers, and computers act, and the pipeline a request can go through.

Reflection

I found one definition of human computation particularly interesting. It describes human computation as "systems of computers and large numbers of humans that work together in order to solve problems that can't be solved by either computers or humans" alone. It is true that if humans alone could solve these problems there would be no need for computers, and if systems could solve and automate them there would be no need for humans, so both need to work together to solve bigger problems.

I also found the comparison between the different systems, including human computation, interesting. I had assumed that a system like crowdsourcing is a human computation system, but it appears it is not.

I agree with the dimensions used to define and classify human computation systems, as they are accurate measures that help researchers build new systems and evaluate them.

To connect to other ideas, I found this work similar to dynamic programming, where we solve small problems in order to eventually solve a global problem. Small tasks are distributed to workers, and aggregation methods then take those solutions and assemble the solution to the global problem, as in the sketch below.
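
As a toy version of that decompose/solve/aggregate pattern (my own analogy, not anything from the paper; all names are hypothetical):

```python
# Split a large transcription job into worker-sized chunks, hand each chunk out,
# then stitch the partial results back into the global answer.
def decompose(document, chunk_size=200):
    """Split a long text into small subtasks."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

def solve_subtask(chunk):
    # Stand-in for one worker's contribution to a single subtask.
    return chunk.upper()

def aggregate(partial_results):
    """Recombine the subtask solutions into the global solution."""
    return "".join(partial_results)

document = "a long scanned document that no single worker should transcribe alone " * 10
global_result = aggregate(solve_subtask(c) for c in decompose(document))
```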

I also found the ground truth seeding approach to quality control similar to the use of training and testing data in machine learning algorithms.

Questions

  • What other dimensions can we define to classify a human computation system?
  • There are different approaches for measuring the quality of a human computation system. Which one is the best?
  • Can we combine two motivation methods to get better results, for example pay and enjoyment, to solve a global problem?


01/29/20 – Runge Yan – Beyond Mechanical Turk

An analysis of Amazon Mechanical Turk and other paid crowdsourcing platforms

Ever since the rise of AMT, research and applications have focused on this prominent crowdsourcing platform. With the development of similar platforms, some concerns about the use of AMT have been addressed in various ways. This paper reviews AMT's limitations and compares the solutions offered by seven other popular platforms: ClickWorker, CloudFactory, CrowdComputing Systems, CrowdFlower, CrowdSource, LeadGenius, and oDesk.

AMT's limitations are presented in four categories: it falls short on quality control, management tools, support for fraud prevention, and automated tools. These limitations are then mapped to assessment criteria that focus on how each platform addresses them.

These criteria include the identity and skills of contributors; workload management, complex task support, and quality control by requesters; and generalized qualifications, task collaboration, and task recommendation by platforms. By comparing the platforms against AMT, the paper identifies where future research should focus. The method could also be improved by choosing an alternative platform as the baseline.

I tried to work on a platform…

I've been thinking about this since I tried to work on Amazon Mechanical Turk and CrowdFlower (I believe they changed the name to Figure Eight). How does a requester post a task? The input and output format the platform expects may not match the interface the requester has in mind. Most requesters seem to have to transform the data or write code themselves, although the platforms are starting to help here.
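
As a rough illustration of that transform step, here is my own sketch of the kind of reshaping a requester might write by hand; the column names are hypothetical and do not correspond to any platform's real upload schema.

```python
import csv

# Reshape local records into a tabular format a platform might expect for task upload.
local_records = [
    {"person": "Jane Doe", "employer": "Acme Corp", "title": "Data Analyst"},
    {"person": "John Roe", "employer": "Globex", "title": "Engineer"},
]

with open("tasks_upload.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "current_company", "position"])
    writer.writeheader()
    for rec in local_records:
        writer.writerow({"name": rec["person"],
                         "current_company": rec["employer"],
                         "position": rec["title"]})
```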

Both platforms require identification through a credit card, and AMT also requires an SSN. I was able to sign up for Figure Eight, but AMT refused my signup; I have a relatively new SSN and credit record, which is probably the reason. Although CrowdFlower has come into people's view and is mentioned more than before, the difference in scale and functionality between the two is easy to spot in their websites' layout and structure.

Figure Eight gave me a basic task to start with: given a person's name, current company, and position, my job was to go to their LinkedIn profile and verify two things: are they still working at the same company, and if so, have they moved to a different position? How many positions does this person currently hold?

This should be a simple task, even for someone unfamiliar with LinkedIn. The reward is relatively low, though: for 10 correct answers I got 1 cent (I think I'm still in an evaluation period). More pay comes if I work in the recommended manner, that is, try out several simple tasks in different categories, then move on to more complex tasks, and so on.

Still, I found myself quitting after I made 10 cents. I'm not sure if it's because I was too casual on the sample quiz, getting only 8 out of 10 correct, that they decided to give me a longer trial. Compared to several fun-inspired tasks I tried, the experience on Figure Eight was not as welcoming.

Back to the analysis. One example represents many dilemmas in this tripartite workspace: AMT doesn't care about workers' profiles, while oDesk offers public worker profiles showing their identities, skills, ratings, etc. It's hard to maintain a platform where workers can switch between these two options when identification is preferred for some tasks and not for others, and this may deter requesters from posting their needs.

Questions

Do platforms cooperate? How can existing good solutions be combined to improve a platform or build a better one?

How has AMT dominated crowdsourcing for so long without any other platform catching up? What are the most significant improvements to AMT in recent years?


01/29/20 – Fanglan Chen – Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms

Beyond evaluating the extent of AMT's influence on research, Vakharia's paper "Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms" assesses existing crowd work platforms based on prior work, comparing their operation and workflows. In addition, it opens a discussion about the extent to which the characteristics of other platforms may have positive or negative impacts on research. Building on an analysis of prior white papers, the paper identifies AMT's limitations as inadequate quality control, inadequate management tools, missing support for fraud prevention, and a lack of automated tools. Further, it defines twelve criteria for comprehensively evaluating the platforms. Finally, the researchers present comparative results based on their proposed criteria.

While reading this paper, I kept thinking that the private crowds problem is also an ethics problem, because protecting sensitive or confidential data matters to all parties. The ethics problems of crowd work platforms can be seen from three perspectives: data, humans, and interfaces. From the data perspective, we need requesters to provide reasonable data and mitigate bias, such as bias in gender and geo-location. From the human perspective, we need the platforms to assign tasks to workers randomly. From the interface perspective, requesters need to provide interfaces that are symmetric across all categories of data, and the data should follow an IID distribution.

Though I have not used a platform like AMT before, I have performed data labeling tasks. Automated tools are really efficient and important to workers doing this kind of job. I was part of a team labeling cars for an object tracking project. I used a tool to automatically generate object detection results, and although those results were highly biased, they still helped us a lot in labeling the data.

The paper provides a number of criteria to qualify the characteristics of various platforms. The criterion "Demographics & Worker Identities" also calls for analysis under another criterion: whether it is ethical to release the demographics of workers and requesters at all. What would be the potential hazards of making this personal information available? The two criteria "Incentive Mechanisms" and "Qualifications & Reputation" also seem to conflict with each other, since if workers are incentivized to work faster, that could affect the quality of the work. As for quantification, the paper does not provide quantitative metrics for measuring the performance of different crowd work platforms, so it remains difficult for workers and requesters to judge which crowdsourcing platform suits them. The following questions are worthy of further discussion.

  • What are the main reasons crowd workers or requesters pick one particular platform over others as their primary platform?
  • Knowing the characteristics of these platforms, is it possible to design a platform that combines all of their merits?
  • Many criteria are provided in the paper to compare crowd work platforms; is it possible to develop standard quantitative metrics to evaluate their characteristics and performance?
  • Is it necessary, or even ethical, for requesters to know basic information about the workers, and vice versa?


01/29/20 – Fanglan Chen –The Future of Crowd Work

This week's readings on crowdsourcing continue our discussion of ghost work from last week. With a vision of what future crowd work will look like, Kittur et al.'s paper "The Future of Crowd Work" discusses the benefits and drawbacks of crowd work and addresses the major challenges current crowdsourcing faces. The researchers call for a new framework that can potentially support more complex, collaborative, and sustainable crowd work. The framework lays out major research challenges in 1) crowd work processes, including designing workflows, assigning tasks, supporting hierarchical structure, enabling real-time response, supporting synchronous collaboration, and controlling quality; 2) crowd computation, including crowds guiding AIs, AIs guiding crowds, and platforms; and 3) crowd workers, including job design, reputation, and motivation.

I feel this paper opens more questions than it answers. The vision for the future of crowd work is promising; however, with only the high-level ideas provided by the researchers, how to achieve that goal is still unclear. I think there are two key questions worth discussing. First, is complex crowd work really needed at the current stage of AI development, and what types of complex, collaborative crowd work are needed, and to what extent? This question reminds me of a recent talk by Yoshua Bengio, one of the "Godfathers of AI," at NeurIPS 2019. Entitled "From System 1 Deep Learning to System 2 Deep Learning," his talk addressed some problems of current AI development, which he calls System 1 deep learning, including but not limited to 1) requiring large volumes of training data to complete naive tasks and 2) generalizing poorly across different datasets. Current AI development seems to be in System 1, and there is still a long way to go to reach System 2, which requires higher-level cognition, out-of-distribution generalization, and transfer ability. I think this partially explains why a large portion of crowd work tasks are labeling or pattern recognition; for simple tasks like these, there is no need to decompose the work. It is difficult for us to foresee how fast AI will develop and how complex the required crowdsourcing tasks will become. In my opinion, a quantitative study of what portion of current tasks are considered complex, together with an analysis of the trend, would be useful for better understanding crowd work at the current stage.

Second, complex, collaborative, and sustainable crowd work depends heavily on the platforms. How to modify existing crowd work platforms to support the future of crowd work remains unclear. The organization and coordination of crowd workers across varying task types and complexity still receives little consideration in the design and operation of existing platforms, even large ones such as AMT, ClickWorker, CloudFactory, and so forth. Based on these observations, the following questions are worthy of further discussion.

  • When do we need more complex, collaborative, and sustainable crowd work?
  • How can existing crowd work platforms support the future of crowd work?
  • What organizational and coordination structures can facilitate the crowd work across varying task types and complexity?
  • How can existing platforms boost effective communication and collaboration on crowd work?
  • How can existing platforms support effective decomposition and recombination of tasks, or provide interfaces and tools for efficient workflows on complex work?


01/29/20 – Myles Frantz – Human Computation: A Survey and Taxonomy of a Growing Field

With the recent emergence of Human Computation, there have been many advancements that have pushed the field out of research and into industry. This growth has been sporadic, and much of the terminology and nomenclature has not been well defined within the scientific community. Though all of these "ideas" are considered to fall under the umbrella term Human Computation, the common explanation of Human Computation is not strictly defined, as it has frequently been used in loosely tied papers and ideas. This work characterizes Human Computation by two conditions: the problems are ones that could eventually be migrated to computers, and "human participation is directed by the computational system". Furthermore, the study defines related terms that are equally loosely defined. These terms, under the collective idea of Human Computing, include common technological terms such as Crowdsourcing, Social Computing, Data Mining, and Collective Intelligence. Following these definitions, various crowdsourcing platforms are compared within an inclusive classification system. Within the system, aspects of the crowdsourcing platforms are categorized using labels drawn from common usage in industry and the literature. These labels include the following terms, which apply (to some extent) to each of the crowdsourcing platforms: motivation, human skill, aggregation, quality control, process order, and task-request cardinality. Underneath each of these top-level categories are sub-categories that further characterize each platform; for example, a label like Motivation has the following sub-labels underneath it: pay, altruism (people's inherent will to do good), enjoyment, reputation (e.g., working with a big company), and implicit work (underlying work done through the system). By tying this vocabulary down to clearer definitions, the authors hope to better understand each platform and to better see how to make sure each system serves humans well.

I disagree with how the labeling is created in this system. With any classification there is likely to be more "gray area" for some of the platforms placed under a label. In addition, this may stifle new creative ideas, since these labels could become the "broad" buckets people use to standardize their ideas. This is similar to a standardized test that misses the broader learning goals while strictly enforcing a single path.

While I do broadly agree with the upper-level labeling system itself, I believe the secondary labeling should be left more open-ended. This is again because a new discovery could be limited or even overshadowed by the temptation to relate new ideas to a fixed set of commonly collected categories rather than taking a more distinctive approach.

  • I would like to see how many of the crowdsourcing examples are cross-listed under multiple dimensions. It seems that the examples listed in the current system are relatively easy to classify, while other, unlisted systems might only fit into categories that would be dropped from the table.
  • Since this is a common classification system, I would like to see a user survey (among people actively using the technology) to check whether these labels accurately represent the research area.
  • My final question about this system is how much it has been used actively in industry, for example in advertisements or at the core of new platforms.


01/29/20 – Myles Frantz – An Affordance-Based Framework for Human Computation and Human-Computer Collaboration

Within the endeavor of visual analytics research, there has been much work on problems that require a close, interactive relationship between humans and machines. Under standard research practice, each paper usually creates a new standard that improves the fundamentals of the area or builds on a previously created standard. To this extent there have been many projects that excel in their particular areas of expertise; this paper, however, endeavors to create a framework that enables comparability between the various features of these projects in order to further the research. Previous frameworks each created models to the best of their abilities, including features such as the maturity of the crowdsourcing platform, the model presentation, or the integration types. While these are acknowledged as furthering the field, they are limited to their subsections and "corner" their frameworks relative to the framework presented in this paper. While the relationship between humans and computers was first described and discussed in the early 1950s, it was stabilized in the late 1970s by J.J. Gibson's notion that "an organism and its environment complement each other". These affordances are used as core concepts of the human-machine relationship, since in visual analytics that relationship is itself at the core. Across the multitude of papers covered by this survey, the human affordances (human "features" required by machines) include visual perception, visuospatial thinking, creativity, and domain knowledge. The machine affordances (machine attributes used or "exploited" for research purposes) include large-scale data manipulation, efficient data movement, and bias-free analysis. There can also be hybrid relationships in which human and machine affordances are combined.
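
One way to picture the framework is as a simple tagging of systems with the affordances they rely on; the sketch below is my own illustration, not the paper's framework, and the system names and tags are hypothetical.

```python
# Tag each system with the affordances it leans on, then split those tags into
# human vs. machine affordances to compare coverage across systems.
HUMAN_AFFORDANCES = {"visual perception", "visuospatial thinking",
                     "creativity", "domain knowledge"}
MACHINE_AFFORDANCES = {"large-scale data manipulation",
                       "efficient data movement", "bias-free analysis"}

systems = {
    "system_a": {"visual perception", "large-scale data manipulation"},
    "system_b": {"creativity", "domain knowledge", "efficient data movement"},
}

for name, affordances in systems.items():
    human = affordances & HUMAN_AFFORDANCES
    machine = affordances & MACHINE_AFFORDANCES
    print(f"{name}: human={sorted(human)}, machine={sorted(machine)}")
```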

In comparison to the other reading for the week, I agree with and like the framework created to relate crowdsourcing tools and humans. Not only does it foreground the human aspects (suggesting a better future relationship), it also describes the current co-dependency with a relatively greater emphasis on human-centric interactions.

I also agree that this framework seems to be a good representation of standard visual analytics applications. While the framework acknowledges the merging of human and machine affordances, the human affordances it lists seem sufficient for the framework, and the machine affordances seem sufficient as well, though this may simply reflect the current direction of research in the area.

  • Like the other reading of the week, I would like to see a user study (of researchers or industry practitioners in the area) showing how this comparison lines up with practical usage.
  • In the future, could there be a more granular weighting of which affordances are used by which platform? This is a more practical application, but it could help companies (or researchers) choose the platform that best fits their target audience.
  • Comparing the affordances (or qualities) of projects at a high level may not be entirely fair to each of them in the eyes of potential consumers. Though potentially game-able (inflating the numbers through malicious means) and prone to exaggeration, impact scores and depth could help compare projects.


01/29/20 – Runge Yan – Human computation: a survey and taxonomy of a growing field

What is human computation?

Human computation is a process that makes use of human abilities to solve problems that currently cannot be solved (well) by computers. As several related fields have emerged in recent years, understanding the similarities and differences between them and human computation will contribute to research and applications.

Collective intelligence encompasses crowdsourcing, social computing, a large portion of human computation, and a small portion of data mining. Crowdsourcing comes in when a designated job is outsourced to a (large) group of people; social computing refers to communication in online communities mediated by technology; data mining is the process of finding patterns in huge amounts of data; and collective intelligence is the broad field in which the products of many people's actions prove to be wiser.

Human computation can be classified along six dimensions: motivation, quality control, aggregation, human skill, process order, and task-request cardinality. Current research and projects on human computation each occupy a different combination of values along these dimensions.

Reflection

This is a comprehensive paper, and it covers many dimensions of human computation. I found myself stuck for a long time and then came up with many tiny passing thoughts. It's hard to say something about the similarities and differences as a whole, or to analyze whether the number of dimensions should really be six.

I believe a human computation system requires a lot of effort, just like making progress on machine intelligence: design, implementation, improvement, extra budget, and so on. How do people decide the trade-off between investing in research toward a better model and relying on human computation assistance? I think for now the preference is to find help from human computation, but that is changing gradually. As the paradox of automation evolves, the tasks also change: they require higher-level human skills. As machine intelligence approaches full reliability, the role of humans in human computation will shift toward "verifying the decision" rather than "filling the gap".

If someone earns their entire paycheck from different human computation tasks, their career will be totally different from someone sitting in an office focused on a fixed set of responsibilities for a long time. The goals of the tasks vary, so they have to switch quickly between fields, actions, and routines. Precision on the first few items will be lower and then rise gradually. All of this requires careful observation and adjustment compared to collecting results from traditional employees.

All the implicit work is probably disclosed in privacy policies, but I don't read them :P For example, situations where identity confirmation is needed always annoy me. As a CS student I should understand how important it is to secure my information, but "Select all squares with traffic lights" really bothers me. As I learned more about machine learning and realized that I'm helping to train Google's AI, I became even more annoyed (I shouldn't, since I have actually participated in several human computation projects out of altruism). I don't know if it's just me or if it bothers other people, too.

Questions

How do we find a good combination of motivations in human computation? An urgent task may pay more to collect the results it needs; how do we prevent contributors from rushing for the money? How do we make sure different motivations have less effect on the outcome of the tasks? Will a paycheck really have no influence on contributors? If not, how do we create a balanced incentive that avoids this bias?

What is the best practice for revealing, step by step, what is behind seemingly smart machine moves? Contributors have been around for more than 10 years; if this course had not asked me to understand and use crowdsourcing, I might never have realized that the underlying non-machine intelligence is all around. Why haven't I heard much about this group of people and their jobs? Does that come naturally with the nature of their work, or is something holding their voices back from the public?


01/29/20 – Rohit Kumar Chandaluri – An Affordance-Based Framework for Human Computation and Human-Computer Collaboration

Summary:

The author explains the research going on in the visual analytics area on collaboration between humans and computers. While there have been multiple promising examples of human-computer collaboration, there are no proper answers to the following questions:

  1. How can we tell if a problem will benefit from collaboration?
  2. How do we decide which tasks to delegate to which party and when?
  3. How can one system compare to others trying to solve the same problem?

The author tries to answer these questions in the paper. The author first explains the uses of visual analytics in the present world, then describes the different kinds of affordances that exist in visual analytic human-computer collaboration and explains each affordance in detail.

Reflections:

It was interesting to learn that visual analytics can be seen as a form of human-computer collaboration. For analytics on large data we need greater computational power to visualize the analytical results. It was also interesting to learn about white-box versus black-box human-computer collaboration. Visual analytics helps people with no expertise in the area provide inputs to the problem.

Questions:

  1. How can we be sure that one visualization is the correct solution for a particular problem?
  2. The people developing the visualization tools are themselves humans; will the developers' areas of expertise affect the results?
  3. Is a visual analytics solution better than a standard machine learning solution for solving the problem?


Mohannad Al Ameedi – Beyond Mechanical Turk

Summary

In this paper, the authors aim to highlight and explore the features of online crowd work platforms other than Amazon Mechanical Turk (AMT) that have not been investigated by many researchers. They recognize AMT as a system that revolutionized data processing and collection, but one that also lacks crucial features like quality control, automation, and integration.

The paper poses many questions about human computation and presents some solutions. The questions relate to the current problems with AMT, the features of other platforms, and a way to assess each system.

The authors discuss the limitations of the AMT system, such as the lack of good quality control, the lack of good management tools, the absence of a process to prevent fraud, and the lack of a way to automate repeated tasks.

The paper defines criteria for evaluating or assessing a crowd work platform. The assessment uses categories like incentives, the quality measures used in the system, worker demographics and identity information, worker skills or qualifications, and other categories used to compare and contrast different systems, including AMT.

The authors also reviewed seven AMT alternatives: ClickWorker, CloudFactory, CrowdComputing Systems, CrowdFlower, CrowdSource, MobileWorks, and oDesk. They show the benefits of each system over AMT using the criteria mentioned above, and show that these systems offer significant improvements that make the overall process better and enable workers and requesters to interact in better ways.

The crowd-platform analysis done by the authors was, at the time the paper was written, the only work to compare different systems against a defined set of criteria, and they hope this work offers a good foundation for other researchers to build on.

All platforms are still missing features like analytics data about each worker to provide visibility into the work each worker has done. There is also a lack of security measures to ensure that the system is robust and can respond to adversarial attacks.

Reflection

I found the features missing from Amazon Mechanical Turk interesting, given the volume of the system's usage and given the work Amazon does today in cloud computing, its marketplace, and other areas where the quality of its work is well known.

I also found the technical details mentioned in the paper interesting. It seems to me that the authors got lots of feedback from everyone involved in the system.

I agree with the authors on the criteria mentioned in the paper for assessing crowdsourcing systems, such as pay motivation, quality control, automation, and system integration.

The authors didn't specify which system is, in their opinion, the best, or which system meets all of their criteria.

Questions

  • Are there any other criteria that we can use to assess the crowdsourcing systems?
  • The author didn’t mention which system is the best. Is there a system that can outperform others?
  • Is there a reason why Amazon didn’t address these findings?


1/28/2020 – Nurendra Choudhary – Human Computation

Summary:

In this paper, the authors detail the emergence of Human Computation as a new area of research. They provide a solid definition of the area's constitution as:

  • The problems fit the general paradigm of computation, and as such might someday be solvable by computers.  
  • The human participation is directed by the computational system or process.

They further solidify the definition by comparing it with other similar fields and finding the niche differences between them. Additionally, the paper provides a classification system for human computation using six dimensions: 

  1. Motivation: Speaks about different reward mechanisms for humans in the framework. 
  2. Quality Control: Discusses various measures used to maintain quality of work and control humans that cheat the system’s reward.
  3. Aggregation: Strategies for combining work done in parallel by independent human labor into a block of usable data.
  4. Human Skill: The extent of humans' ability to facilitate machines; defines the areas where general human computational abilities are still better.
  5. Process Order: Different frameworks of defining workflows with humans (requesters/workers) and computers.
  6. Task-Request Cardinality: This defines the task assignment combinations such as if one task can be given to multiple humans or multiple tasks to the same human.

Reflection:

The authors define human computation by relying on previous definitions. But those previous definitions show the dynamic nature of the field, and hence the definition should evolve over time. This also affects the derived system classification and its nomenclature. Still, the paper is interesting in its effort to provide clear differences between human computation and related fields.

I was very interested in learning about the different motivations (beyond monetary benefit) for the human workforce. Similar motivations can be seen in early software development, where developers worked on open-source software without monetary reward, simply out of a desire to contribute to society. I believe that in any field, monetary reward can never be the only motivation; human beings are social animals with a strong instinct to contribute to society, while monetary benefit is just perceptual. The classification nomenclature given by the authors is tied closely to current worker platforms and, I believe, is subject to change over time, and hence has limited long-term advantage.

Questions:

  1. Can this definition of human computation be static, or is it subject to change over time?
  2. How can the task-request process itself change over time?
  3. To what extent should humans support machines before the challenges outweigh the rewards?
  4. What exactly is meant by a growing field? Does it mean the field will grow in research, or that it will spin off more sub-fields?
