04/08/20 – Fanglan Chen – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

Rzeszotarski and Kittur’s paper “CrowdScape: Interactively Visualizing User Behavior and Output” explores the research question of how to unify different quality control approaches to improve the quality of work conducted by crowd workers. With the emergence of crowdsourcing platforms, many tasks can be accomplished quickly by recruiting crowd workers to collaborate in a parallel manner. However, quality control is among the challenges faced by the crowdsourcing paradigm. Previous work focuses on designing algorithmic quality control approaches based on either worker outputs or worker behavior, but neither approach is effective for complex or creative tasks. To fill that research gap, the authors develop CrowdScape, a system that leverages interactive visualization and mixed-initiative machine learning to support human evaluation of complex crowd work. Through experiments on a variety of tasks, the authors show that combining information about worker behavior with worker outputs has the potential to help users better understand the crowd and, in turn, identify reliable workers and outputs.

Reflection

This paper conducts an interesting study by exploring the relationship between the outputs and behavior patterns of crowd workers to achieve better quality control in complex and creative tasks. The proposed method offers a unique angle on quality control and has broad potential use in crowdsourced tasks. Although there is a strong relationship between final outputs and behavior patterns, I feel the design of CrowdScape relies too heavily on behavior analysis of crowd workers. In many situations, good behavior can lead to good results, but that cannot be guaranteed. From my understanding, behavior analysis is better suited as a screening mechanism for certain tasks. For example, in the video tagging task, workers who have watched the whole video are more likely to provide accurate tags. In this case, watching the full video is a necessary condition, not a sufficient one. Workers who finish watching the videos may still disagree on the tagging output, so an additional quality control mechanism is needed at that stage. In creative and open-ended tasks, behavior patterns are even more difficult to capture. Metrics such as time spent on the task cannot be directly connected to a measure of creativity.

In addition, I think the notion of quality discussed in the paper is comparatively narrow. We need to be aware that quality control on crowdsourcing platforms is multifaceted: it depends on the workers’ knowledge of the specific task, the quality of the processes that govern task creation, the worker recruitment process, and the coordination of subtasks such as reviewing intermediate outputs and aggregating individual contributions. A more comprehensive quality control cycle needs to take the following aspects into consideration: (1) a quality model that clearly defines the dimensions and attributes used to control quality in crowdsourcing tasks; (2) assessment metrics that can be used to measure the values of the attributes identified by the quality model; and (3) quality assurance, a set of actions that aim to achieve the expected levels of quality. To prevent low quality, it is important to understand how to design for quality and how to intervene when quality drops below expectations on crowdsourcing platforms.

Discussion

I think the following questions are worthy of further discussion.

  • Can you think of other tasks that might benefit from the proposed quality control method?
  • Do you think the proposed method can provide good quality control for complex and creative tasks, as the paper suggests? Why or why not?
  • Do you think an analysis based on worker behavior can help determine the quality of the work conducted by crowd workers? Why or why not?
  • In what scenarios do you think it would be more useful to trace worker behavior: informing workers beforehand or tracing without advance notice? Can you think of potential ethical issues?

4 thoughts on “04/08/20 – Fanglan Chen – CrowdScape: Interactively Visualizing User Behavior and Output”

  1. I think this method can benefit advertising. Websites can trace what their users focus on to improve ad placement and content. I think this system can perform good quality control for some complex tasks, as it combines behavioral traces with outputs. This can help in evaluating the crowd by building a model of their minds. However, for more complex tasks, the system can still hardly give accurate judgments about whether the outputs should be accepted. In most cases, behavioral traces are good for evaluating crowd workers. However, in some specific cases, for example, when a worker composes text in a separate editor and pastes it into the page, the method becomes useless. I think it is better to inform workers beforehand, as this would let them focus more on the tasks and produce better results. If we traced their behavior without informing them, there could be ethical issues.

  2. First, I really like your reflection. This is a completely different way to think about this system that I hadn’t considered. I agree with your point about the imbalanced design of the proposed method and that a more comprehensive quality control process is needed. Nevertheless, I do think the proposed method performs better quality control than before; even if it is just a small improvement, it helps. This is because the method targets the most common situations and typical behavior instead of trying to solve every case or digging into a more specific special case. I think it is true that high-quality outputs do not necessarily come from the same process, but similar processes usually lead to similar outcomes.

  3. To answer your second question, I think the proposed method can achieve good quality control because it can easily filter out poor outputs and let the requester focus on high-quality ones; the machine learning component can then compare workers’ behaviors with the optimal behavior and surface the best combination of output and behavior.

  4. Hi Fanglan. To answer your first question, I think there are many tasks that could benefit from the approach proposed in the paper. Apart from labeling and annotation tasks, it could perhaps be used for measuring or assessing creativity in AI, rating the quality of writing, speech recognition, etc.
