04/08/2020 – Nan LI – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This paper demonstrates a system called CrowdScape that support human to evaluate the quality of crowd work outputs by presenting an interactive visualization about the information of worker behavior and worker outputs through mixed-initiative machine learning(ML). This paper made the point that quality control for complex and creative crowd work based on either the post-hoc output or behavioral traces alone is insufficient. Therefore, the author proposed that we could gain new insight into crowd worker performance by combining both behavioral observations and knowledge of worker output. CrowdScape system presents the visualization of each individual traces that include mouse movement, keypresses, visual scrolling, focus shifts, and clicking through an abstract visual timeline. The system also aggregates these features use a combination of 1-D and 2-D matrix scatter plots to show the distribution of the features to enable dynamic exploration. The system also explores the worker output by recognizing the worker submissions pattern and providing means. Finally, CrwodScape enables users to analyze the mental models of tasks and worker behaviors, and use these models for the verification of worker output and majority or gold standards. The author also demonstrates four experiment results to illustrate the practical operation and also prove the effectiveness of the system.

Reflection:

I think the author made a great point regarding address the quality control issue of crowdsourcing. Quality control approaches are limited and even not guaranteed for most of the system, which using crowdsourcing as part of the component. The most frequently used approach I have seen so far based on the policy that worker’s salary determined by the quality of their work. It is the most reasonable approach to encourage workers to provide high-quality results. Besides, another straightforward approach is to choose the similar or the same work(such as tag, count numbers) provided by most workers.

Nevertheless, the author proposed that we should consider the quality control for more complex and creative work because these type of tasks has appeared more often. However, there is no appropriate quality control mechanism exists. I think this mechanism is essential in order to utilize crowdsourcing better.

I believe the most significant advantage of this CrowdScape is that the system can be used very flexibly based on the type of task. From the scenario and case studies presented in the paper, the user could evaluate the worker’s output using different attributes as well as interactive visualization method based on the type of the task. Further, the types of visualization are varied, and each of them can interpret and detect differences and patterns in workers’ behavior and their work. The system design is impressed because the interface of the system is userfriendly based on the figures in the paper combine with the explanation.

The only concern is that as the increase in the number of workers, the points and lines on the visualization interface will become so dense that no pattern can be detected. Therefore, the system might need data filter tools or interactive systems to deal with this problem.

Questions:

  1. What are the most commonly used quality control approaches? What is the quality control approach that you will apply to your project?
  2. There are many kinds of HITS on MTurk, do you think what type of work requires quality control and what kinds of work do not.
  3. For information visualization, one of the challenges is to deal with a significant amount of data. How we should deal with this problem regarding the CrowdScape system?

Word Count: 588

One thought on “04/08/2020 – Nan LI – CrowdScape: Interactively Visualizing User Behavior and Output

  1. I like the point you raise about how the CrowdScape system is task-agnostic and can be used for a varied number of diverse tasks. I definitely agree that this a significant advantage of the system.

    I also agree with your concern that with an increase in the number of workers, the lines and points would be very dense and recognizing patters would become increasingly difficult. I wonder if performing a clustering step before displaying the visualization would be useful in this case. It would help group similar users (based on a set of pre-defined factors) and then would reduce the overall number of points/lines being represented in the visualization. An interactive system, like you mention, might also show good performance.

Leave a Reply