04/08/2020 – Jooyoung Whang – CrowdScape: Interactively Visualizing User Behavior and Output

In this paper, the authors aim to help MTurk requesters by providing an analysis tool called CrowdScape. CrowdScape combines machine learning with interactive visualization to let requesters view and filter MTurk worker submissions based on the workers’ behaviors. Users of the application can threshold on behavioral attributes such as time spent on the task or typing delay. The application takes two inputs: worker behavior and worker results. The behavior input is time-series data of user activity; the result is what the worker submitted for the MTurk task. The authors also compute similarities among the answers and graph them on parallel coordinates. They conducted a user study by launching four different tasks and recording worker behavior and results, and they conclude that their approach helps requesters identify high-quality work.
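As a concrete illustration of the thresholding interaction described above, here is a minimal Python sketch. The data model, feature names, and thresholds are invented for the example; this is not CrowdScape's actual code.

```python
# Minimal sketch (illustrative only, not CrowdScape's implementation) of
# filtering MTurk submissions by thresholding behavioral attributes.
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    worker_id: str
    answer: str
    time_on_task_s: float        # total seconds spent on the HIT (assumed feature)
    mean_typing_delay_ms: float  # average gap between keypresses (assumed feature)


def filter_submissions(subs: List[Submission],
                       min_time_s: float = 60.0,
                       max_typing_delay_ms: float = 800.0) -> List[Submission]:
    """Keep submissions whose behavior passes simple, requester-chosen thresholds."""
    return [s for s in subs
            if s.time_on_task_s >= min_time_s
            and s.mean_typing_delay_ms <= max_typing_delay_ms]


if __name__ == "__main__":
    pool = [
        Submission("w1", "a thoughtful answer", 240.0, 350.0),
        Submission("w2", "asdf", 12.0, 1500.0),
    ]
    print([s.worker_id for s in filter_submissions(pool)])  # -> ['w1']
```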

This paper’s approach of integrating worker behavior and output to filter for good results was interesting. However, I think the system must overcome one problem to be effective, and that problem lies in the area of ethics. The authors explicitly state that they obtained consent from their pool of workers to collect behavioral data. Some MTurk requesters, however, may decide not to do so, with ill intentions. This could lead to intrusion into private information and even theft. On the other hand, once consent is obtained, the worker becomes aware of being monitored, which could produce unnatural behavior that is undesirable for system testing.

I thought the individual visualized graphs and figures were effective for understanding and filtering by user behavior. However, the overall CrowdScape interface looked a bit overpacked with information. A small feature to show or hide some of the graphs would be desirable. The same problem existed in another information exploration system from a project I have worked on; in my experience, an effective solution was a set of menus that sorted the attributes hierarchically.

These are the questions that I had while reading the paper:

1. A big purpose of CrowdScape is that it can be used to filter and retrieve a subset of results that are thought to be high quality. What other ways could this system be used? For example, I think it could be used for rejecting undesired results. Suppose you needed 1,000 results and launched 1,000 HITs. You know you will get some low-quality results, but with so many submissions it would take forever to filter them by eye. CrowdScape would help accelerate that process.

2. Do you think you can use CrowdScape for your project? If so, how would you use it? CrowdScape is useful if you, the researcher, are the endpoint of the MTurk task (that is, the results are ultimately used by you). My project uses the results from MTurk in a systematic way without them ever reaching me, so I don’t think I’ll use CrowdScape.

3. Do you think the graphs available in CrowdScape are enough? What other features would you want? For one, I’d love to have a boxplot for the user behavior attributes.


04/08/2020 – Ziyao Wang – CrowdScape: Interactively Visualizing User Behavior and Output

The authors present CrowdScape, a system for supporting human evaluation of increasingly complex crowd work. The system uses interactive visualization and mixed-initiative machine learning to combine information about worker behavior with worker outputs, helping users better understand crowd workers and leverage their strengths. The authors developed the system around three requirements for quality control in crowdsourcing: output evaluation, behavioral traces, and integrated quality control. They visualize workers’ behavior and the quality of their outputs, and they combine findings about behavior with the outputs to evaluate the workers’ work. The system has some limitations; for example, it cannot work if a worker completes the task in a separate text editor, and the behavioral traces are not always detailed enough. Even so, the system provides good support for quality control.

Reflections:

How do we evaluate the quality of outputs produced by crowdsourcing workers? For complex tasks there is no single correct answer, so we can hardly evaluate the workers’ work directly. Previously, researchers proposed methods that traced the behavior of workers to evaluate their work, but such methods are still not accurate enough, as workers may arrive at the same output while completing tasks in very different ways. The authors provide a novel approach that evaluates workers based on their outputs, their behavioral traces, and the combination of the two. This combination increases the accuracy of the system and enables analysis of some complex tasks.

This system is valuable for crowdsourcing requesters. They can better understand workers by building a mental model of them and, as a result, distinguish good results from poor ones. In crowdsourcing projects, developers sometimes receive poor responses from inactive workers. With this system, they can keep only the valuable results for their research, which may increase the accuracy of their models, give a better view of their systems’ performance, and yield more detailed feedback.

Also, for system designers, the visualization tool for behavioral traces is quite useful for obtaining detailed user feedback and user interactions. If they can analyze these data, they can learn what kinds of interactions their users need and provide a better user experience.

However, I think there may be ethical issues with this system. Using it, HIT publishers can observe workers’ behavior while they complete the HITs, collecting mouse movements, scrolling, keypresses, focus events, and clicks. This may raise privacy issues, and such information could be misused; workers’ computers would be at risk if their habits were collected by malicious actors.

Questions:

Can this system be applied to some more complex tasks other than purely generative tasks?

How can the designers use this system to design interfaces which can provide a better user experience?

How can we prevent malicious actors from using this system to collect users’ habits and attack their computers?


04/08/2020 – Myles Frantz – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

Crowdsourcing provides a quick and easily scalable way to request help from people, but how do you ensure workers are actually paying attention instead of cheating in some way? Since tasks are handed off through a platform that abstracts away the assignment of work to workers, requesters cannot guarantee the participants’ full attention. This is why the team created CrowdScape: to better track the attention and focus of participants. Using various JavaScript libraries, CrowdScape keeps track of participants through their interactions, or lack thereof, recording mouse clicks, keystrokes, and browser focus changes. Since Amazon Mechanical Turk is a web-based platform, JavaScript libraries are well suited to capturing this information. Through several visualization libraries, the team builds views that give requesters extra insight and demonstrate how worker behavior can be characterized, for instance whether a worker only clicks rapidly and shifts windows frequently, or stays on the same window and remains focused.
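To make the idea of turning raw interaction logs into behavioral summaries concrete, here is a rough Python sketch under an assumed event schema. The event names and log format are hypothetical, not CrowdScape's actual format; the point is only to show how clicks, keypresses, and focus changes could be aggregated into per-worker features.

```python
# Rough sketch of aggregating a logged browser event stream into per-worker
# behavior features. The event names and log format are assumptions for the
# example, not CrowdScape's actual schema.
from collections import defaultdict

# Each logged event: (worker_id, timestamp_in_seconds, event_type)
events = [
    ("w1", 0.0, "focus"), ("w1", 5.2, "keypress"), ("w1", 90.0, "click"),
    ("w2", 0.0, "focus"), ("w2", 1.0, "blur"), ("w2", 2.0, "focus"),
    ("w2", 3.0, "blur"), ("w2", 4.0, "focus"), ("w2", 10.0, "click"),
]


def behavior_features(event_log):
    per_worker = defaultdict(list)
    for worker, ts, kind in event_log:
        per_worker[worker].append((ts, kind))

    features = {}
    for worker, evs in per_worker.items():
        evs.sort()  # order by timestamp
        features[worker] = {
            "time_on_task_s": evs[-1][0] - evs[0][0],
            "focus_changes": sum(1 for _, k in evs if k in ("focus", "blur")),
            "keypresses": sum(1 for _, k in evs if k == "keypress"),
            "clicks": sum(1 for _, k in evs if k == "click"),
        }
    return features


# w2's many focus/blur events mark the rapid window-shifting behavior
# described above; w1 stays focused on the task window.
print(behavior_features(events))
```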

Reflection

I do appreciate the kind of insight this provides when delegating work. I have mentored various workers in some of my past internships, and it caused a fair amount of stress. The more professional workers are easier to manage, but with others it often takes more time to manage and teach them than to do the work myself. Being able to do this automatically and discard the work of poor participants provides a lot of freedom, since creators cannot necessarily oversee the participants while they work.

I do, however, strongly disagree with how much information is being tracked and requested. As a strong proponent of privacy, I think the browser is not the best domain in which to inject programs that watch a participant’s session and information. Though this is limited to the browser, other session information, such as cookies, IDs, or UIDs, could potentially be accessed. Even if the app itself cannot track that information, other live JavaScript could track it via the CrowdScape program.

Questions

  • One of my first concerns with this type of project is the degree of privacy invasion. Though it makes sense to ensure the worker is working, there is always the potential for confidential information to be leaked. Even if key tracking is limited to the time when the participant is focused on the browser window, do you think this would be a major concern for participants?
  • In the case studies from the team’s experiments, it seemed that many participants’ work was discarded because they were using some other tool or external help. Do you think as many people would be discarded in real experiments for similar reasons?
  • Alongside the previous question, is it overreaching, in a sense, to potentially discredit workers just because they have different working habits than expected?


04/08/2020 – Nan LI – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This paper demonstrates a system called CrowdScape that supports human evaluation of the quality of crowd work outputs by presenting interactive visualizations of worker behavior and worker outputs, aided by mixed-initiative machine learning (ML). The paper argues that quality control for complex and creative crowd work based on either post-hoc output or behavioral traces alone is insufficient; instead, we can gain new insight into crowd worker performance by combining behavioral observations with knowledge of worker output. CrowdScape visualizes each worker’s individual trace, including mouse movements, keypresses, scrolling, focus shifts, and clicks, on an abstract visual timeline. It also aggregates these features and uses a combination of 1-D and 2-D matrix scatter plots to show their distributions and enable dynamic exploration. In addition, the system supports exploration of worker output by recognizing patterns in worker submissions. Finally, CrowdScape lets users build mental models of tasks and worker behaviors and use these models, together with gold standards or majority agreement, to verify worker output. The authors present four case studies that illustrate the system in practice and demonstrate its effectiveness.
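As a small illustration of the 1-D and 2-D views mentioned above, the following Python sketch (assuming pandas and matplotlib are available, with made-up feature names and values) draws a scatter-plot matrix over per-worker behavior features; the diagonal histograms give the 1-D distributions and the off-diagonal panels the pairwise 2-D scatter plots.

```python
# Small illustration (assuming pandas and matplotlib are installed) of the
# 1-D / 2-D view idea: a scatter-plot matrix over per-worker behavior
# features. Feature names and values are invented for the sketch.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

features = pd.DataFrame(
    {
        "time_on_task_s": [240, 12, 180, 300, 45],
        "keypresses": [410, 8, 350, 500, 60],
        "focus_changes": [2, 14, 3, 1, 9],
    },
    index=["w1", "w2", "w3", "w4", "w5"],
)

# Diagonal histograms give the 1-D distributions of each feature;
# off-diagonal panels give the pairwise 2-D scatter plots used to spot
# clusters of workers and outliers.
scatter_matrix(features, diagonal="hist", figsize=(6, 6))
plt.show()
```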

Reflection:

I think the author makes a great point in addressing the quality control issue in crowdsourcing. Quality control approaches are limited, and quality is not even guaranteed, for most systems that use crowdsourcing as a component. The most frequently used approach I have seen so far is based on the policy that a worker’s pay is determined by the quality of their work, which is a reasonable way to encourage workers to provide high-quality results. Another straightforward approach is to accept the answer (such as a tag or a count) that most workers agree on.

Nevertheless, the author argues that we should consider quality control for more complex and creative work, because these types of tasks appear more and more often, yet no appropriate quality control mechanism exists for them. I think such a mechanism is essential in order to make better use of crowdsourcing.

I believe the most significant advantage of CrowdScape is that the system can be used very flexibly depending on the type of task. From the scenario and case studies presented in the paper, the user can evaluate workers’ output using different attributes and interactive visualization methods suited to the task. Further, the types of visualization are varied, and each of them can reveal differences and patterns in workers’ behavior and their work. The system design is impressive; judging from the figures in the paper and the accompanying explanation, the interface is user-friendly.

My only concern is that, as the number of workers increases, the points and lines on the visualization interface will become so dense that no pattern can be detected. The system might therefore need data filtering tools or additional interaction techniques to deal with this problem.

Questions:

  1. What are the most commonly used quality control approaches? What quality control approach will you apply to your project?
  2. There are many kinds of HITs on MTurk. What types of work do you think require quality control, and what kinds do not?
  3. For information visualization, one of the challenges is dealing with a significant amount of data. How should we deal with this problem in the CrowdScape system?

Word Count: 588


04/08/2020 – Dylan Finch – CrowdScape: Interactively Visualizing User Behavior and Output

Word count: 561

Summary of the Reading

This paper describes a system for dealing with crowdsourced work that needs to be evaluated by humans. For complex or especially creative tasks, it can be hard to evaluate the work of crowd workers, because there is so much of it and most of it needs to be evaluated by another human. If the evaluation takes too long, you lose the benefits of using crowd workers in the first place.

To help with these issues, the researchers developed a system that helps an evaluator deal with all of the data from the tasks. The system leans heavily on data visualization: its interface shows the user a wide range of metrics about the crowd work and the workers to help the user judge quality. Specifically, the system lets the user see information about worker output and behavior at the same time, giving a better indication of performance.

Reflections and Connections

I think that this paper tries to tackle a very important issue of crowd work: evaluation. Evaluation of tasks is not an easy process and for complicated tasks, it can be extremely difficult and, worst of all, hard to automate. If you need humans to review and evaluate work done by crowd workers, and it takes the reviewer a non-insignificant amount of time, then you are not really saving any effort by using the crowd in the first place. 

This paper is so important because it provides a way to make it easier for people to evaluate work done by crowd workers, making the use of crowd workers much more efficient, on the whole. If evaluation can be done more quickly, the data from the tasks can be used more quickly, and the whole process of using crowd workers has been made much faster than it was before. 

I also think this paper is important because it gives reviewers a new way to look at the work done by crowds: it shows the reviewer both worker output and worker behavior. This would make it much easier for reviewers to decide if a task was completed satisfactorily or not. If we can see that a worker did not spend a lot of time on a task and that their work was significantly different from other workers assigned to the same task, we may be able to tell that that worker did a bad job, and their data should be thrown out.

From the pictures of the system, it does look a little complicated, and I would be concerned that it is hard to use or overly complex. Having a system that saves time but takes a long time to fully understand can be just as bad as not having the time-saving system at all. So, I do think some effort should go into making the system look less intimidating and easier to use.

Questions

  1. What are some other possible applications for this type of software, besides the extra one mentioned in the paper?
  2. Do you think there is any way we could fully automate the evaluation of the creative and complex tasks focused on in this research?
  3. Do you think that the large amount of information given to users of the system might overwhelm them?


04/08/2020 – Nurendra Choudhary – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors address the problem of large-scale human evaluation through CrowdScape, a system built on interactive visualizations and mixed-initiative machine learning. They build on the two major previous approaches to quality control: worker behavior and worker output.

The contributions of the paper include an interactive interface for crowd worker results, visualizations of crowd behavior, techniques for exploring crowd worker products, and mixed-initiative machine learning for bootstrapping user intuitions. Previous work analyzed crowd worker behavior and output independently, whereas CrowdScape provides an interface for analyzing them together. CrowdScape uses mouse movements, scrolling, keypresses, focus events, and clicks to build worker profiles. The paper also points out limitations, such as neglected user behaviors like the focus of the fovea, and it discusses the potential of CrowdScape in other experimental setups that are primarily offline or cognitive and do not involve user interaction with the system that could be analyzed for behavior.
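To ground the "mixed-initiative machine learning for bootstrapping user intuitions" idea, here is a hedged sketch using scikit-learn and invented behavior features (not the paper's actual pipeline): the requester labels a few submissions as good or bad, a simple classifier is fit on those examples, and the remaining submissions are scored for review.

```python
# Hedged sketch of bootstrapping the requester's intuition with ML: a few
# hand-labeled submissions train a classifier over behavior features, which
# then scores the rest for review. Uses scikit-learn with invented features;
# this is not the paper's actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature columns: time_on_task_s, keypresses, focus_changes
labeled_X = np.array([[240, 410, 2], [12, 8, 14], [300, 500, 1], [45, 60, 9]])
labeled_y = np.array([1, 0, 1, 0])  # 1 = requester marked "good", 0 = "bad"
unlabeled_X = np.array([[180, 350, 3], [20, 15, 11]])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(labeled_X, labeled_y)

# Probability that each unlabeled submission resembles the "good" examples;
# the requester can accept the confident ones and review the borderline ones.
print(model.predict_proba(unlabeled_X)[:, 1])
```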

Reflection

CrowdScape is a very necessary initiative as the number of workers to evaluate increases. Another interesting aspect is that it also expands developers’ creativity and possibilities, since they can now evaluate more complex and focus-based algorithms. However, I feel the need for additional compensation here. The crowd workers are being tracked, and this is an intrusion into their privacy. I understand that this is necessary for the process to function, but given that it makes focus an essential aspect of worker compensation, workers should be rewarded fairly for it.

Also, the user behaviors tracked here fairly cover the most significant problems in the AI community. However, more inputs would cover a better range of problems. Adding more features would not only increase problem coverage but also spur more development effort. There are instances when a developer does not build something due to a lack of evaluation techniques or accepted measures; increasing the tracked features would help remove this concern. For example, if we were able to track the user’s fovea, developers could study the effect of different advertising techniques or build algorithms to predict and track interest in different varieties of videos (the business of YouTube).

Also, I am not sure of the effectiveness of tracking the movements described in the paper. The paper treats effectiveness as a combination of worker behavior and output, but several tasks rely on mental models that do not produce the movements tracked in the paper. In such cases, the output needs to carry more weight. I think the evaluator should be given the option to change the weights of the different parameters, so the platform could be adapted to different problems and become more broadly applicable.

Questions

  1. What kinds of privacy concerns could be a problem here? Should the analyzer have access to such behavior? Is it fair to ask the user for this information? Should the user be additionally compensated for such an intrusion into privacy?
  2. What other kinds of user behaviors are traceable? The paper mentions the fovea’s focus. Can we also track listening focus or mental focus in other ways? Where would this information be useful in our problems?
  3. CrowdScape uses the platform’s interactive nature and visualization to improve the user experience. Should there be an overall focus on improving UX at the development level, or should these remain separate processes?
  4. CrowdScape considers worker behavior and output to analyze human evaluation. What other aspects could be used to analyze the results?

Word Count: 582


04/08/2020 – Yuhang Liu – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This article proposes that the crowdsourcing platform is a very good tool that can help people solve quite a few problems: it helps people quickly allocate work and complete tasks at a large scale. The quality of the workers’ work on the crowdsourcing platform is therefore very important. In previous research, other researchers developed algorithms that inspect workers’ behavior to assess the quality of their work, but these algorithms all have limitations, especially for complex tasks. Manual assessment of the quality of tasks completed by workers can solve this problem, but when many workers are involved it does not scale well. Against this background, the authors created CrowdScape, which supports inspection of crowdsourced workers’ work quality through interactive visualization and mixed-initiative machine learning. The system’s approach is to combine information about worker behavior and worker output to better explain the workers’ work. It explores the quality of workers’ work mainly through the following functions: (1) an interface for interactively browsing crowd workers’ results that explains worker performance by combining behavior and output information; (2) visualization of crowd workers’ behavior; (3) exploration of crowd workers’ products; (4) tools for grouping and classifying workers; and (5) machine learning that guides users’ intuitions about the crowd.

Reflection:

The system introduced in this article has many innovations, the most important of which is the ability to combine workers’ behavior with workers’ output, which makes it possible to study the behavior of workers with different levels of performance. This reflects the idea that the result is determined by the process, and the author incorporates it into the research, which I think is a very innovative point. Through this research, we can learn the working behavior of workers with different results, and in subsequent research we need not focus only on workers with good output. More importantly, behavioral guidance can be distilled from the behavior of workers with good output and then used to guide workers to complete tasks better. Another innovation, I think, is visualizing the interaction process. As we all know, people take in visual information more easily, and I think this also holds when evaluating the work of crowdsourced workers. Visualizing the interaction process of crowdsourced workers can help people better study worker behavior, improve their understanding of how crowdsourced workers perform at work, and help us design crowdsourcing tasks. At the same time, the system’s dynamic query facility can quickly analyze large data sets by providing users with immediate feedback. I think CrowdScape builds on these points in order to better discover people’s work patterns, understand the essence of crowdsourcing, and constantly adapt to more complex and innovative crowdsourcing tasks.

Question:

  1. Must the quality of workers’ work be related to their behavior, and how does the system prevent some workers from achieving good results through inappropriate behavior?
  2. When workers know their behavior will be recorded, will it affect their work?
  3. Is there any other method to lead workers to produce better output?
