Subil Abraham – 04/08/2020 – Rzeszotarski and Kittur, “CrowdScape”

Quality control in crowdwork is straightforward for straightforward tasks. A task like transcribing the text in an image is fairly easy to evaluate because there is only one right answer. Requesters can use gold standard tests to evaluate the crowdworkers' output directly and determine whether they have done a good job, or use task fingerprinting to determine whether worker behavior indicates that they are making an effort. The authors propose CrowdScape as a way to combine both types of quality analysis, worker output and worker behavior, through a mix of machine learning and innovative visualization methods. CrowdScape includes a dashboard that provides a bird's-eye view of the different aspects of worker behavior in the form of graphs. These graphs show both aggregate behaviors across all the crowdworkers and the timeline of the individual actions a crowdworker takes on a particular task (scrolling, clicking, typing, and so on). The authors conduct multiple case studies on different kinds of tasks to show that their visualizations help separate the workers who make an effort to produce quality output from those who are just phoning it in. The behavioral traces identify where a crowdworker spends their time by recording their actions and how long each action takes.
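As a rough illustration of what capturing such behavioral traces might look like in practice, here is a minimal sketch of client-side event logging with standard DOM listeners. The event names and the `trace`/`log` helpers are my own illustrative assumptions, not code from the paper.

```typescript
// Minimal sketch of client-side behavioral trace capture (not the paper's code).
// Each user action is logged with a timestamp so it can later be laid out on a timeline.

interface TraceEvent {
  kind: "click" | "keypress" | "scroll" | "focus" | "blur" | "mousemove";
  time: number; // milliseconds since the task page loaded
}

const trace: TraceEvent[] = [];

function log(kind: TraceEvent["kind"]): void {
  trace.push({ kind, time: performance.now() });
}

// Standard DOM listeners; a real tracker would throttle mousemove and
// batch-upload the trace along with the worker's submission.
window.addEventListener("click", () => log("click"));
window.addEventListener("keydown", () => log("keypress"));
window.addEventListener("scroll", () => log("scroll"));
window.addEventListener("focus", () => log("focus"));
window.addEventListener("blur", () => log("blur"));
window.addEventListener("mousemove", () => log("mousemove"));
```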

CrowdScape provides an interesting visual solution to the problem of how to evaluate whether workers are being sincere in completing complex tasks. Creative work especially, where you ask the crowd worker to write something of their own, is notoriously hard to evaluate because there is no gold standard test you can apply. So I find the behavior-tracking visualizer, where different colored lines along a timeline represent different actions, potentially quite useful. Someone who makes an effort at typing out an answer will show long blocks of typing with pauses for thinking. I can see how different behavioral heuristics could be applied to different tasks in order to determine whether the workers are actually doing the work. I have to admit, though, that I find the scatter plots somewhat obtuse and hard to parse. I'm not entirely sure how we're supposed to read them or what information they convey, so I feel the interface itself could do a better job of communicating exactly what the graphs are showing. There is promise in releasing this as a commercial or open source product (if it isn't one already) once the interface has been polished. One last thing is the ability for the requester to group "good" submissions, after which CrowdScape uses machine learning to find other, similar "good" submissions. However, the paper only mentions this feature and does not describe how it fits in with the interface as a whole, which I felt was another shortcoming of the design.
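Since the paper leaves the mechanics of that grouping feature vague, here is one hedged guess at how "find more like these" could work: a plain bag-of-words cosine similarity against the requester-labeled "good" submissions. The functions `bagOfWords`, `cosine`, and `rankBySimilarity` are hypothetical, not CrowdScape's actual model.

```typescript
// Hypothetical sketch: rank remaining submissions by similarity to ones the
// requester marked "good", using bag-of-words cosine similarity.

function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, v) => s + v * v, 0));
  return dot === 0 ? 0 : dot / (norm(a) * norm(b));
}

// Score each unlabeled submission by its best similarity to any "good" example
// (assumes at least one "good" example has been marked).
function rankBySimilarity(
  good: string[],
  unlabeled: string[]
): { text: string; score: number }[] {
  const goodVecs = good.map(bagOfWords);
  return unlabeled
    .map((text) => {
      const vec = bagOfWords(text);
      const score = Math.max(...goodVecs.map((g) => cosine(g, vec)));
      return { text, score };
    })
    .sort((a, b) => b.score - a.score);
}
```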

  1. What would a good interface for the grouping of the “good” output and subsequent listing of other related “good” output look like?
  2. In what kind of crowd work would CrowdScape not be useful (assuming you were able to get all the data that CrowdScape needs)?
  3. Did you find all the elements of the interface intuitive and understandable? Were there parts of it that were hard to parse?


04/08/2020 – Mohannad Al Ameedi – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors propose a system that can evaluate complex tasks based on both worker output and worker behavior. Other available systems focus on only one aspect of evaluation, either the worker output or the behavior, which can give poor results, especially for complex or creative work. The proposed system, CrowdScape, combines the two through interactive visualization and mixed-initiative machine learning. It offers visualizations that allow requesters to filter out poor output and focus on a limited number of responses, and it uses machine learning to measure the similarity of each response to the best submissions; that way the requester can consider the best output and the best behavior at the same time. The system collects time series data about user actions such as mouse movement and scrolling up and down, and generates a visual timeline for tracing user behavior. The system only works with web pages and has some limitations, but the value it can give to the requester is high, and it can let users navigate through workers' results easily and efficiently.
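A minimal sketch of how that time series data might be condensed into per-worker summary features for filtering and plotting; the `TraceEvent` and `WorkerFeatures` shapes are illustrative assumptions, not the paper's data model.

```typescript
// Sketch: condense a worker's raw event trace into summary features that can be
// plotted or filtered. Field and event names are illustrative, not from the paper.

interface TraceEvent {
  kind: "click" | "keypress" | "scroll" | "focus" | "blur";
  time: number; // ms since task start
}

interface WorkerFeatures {
  totalTimeMs: number;
  keypresses: number;
  scrollEvents: number;
  focusShifts: number; // how often the worker left the task window
}

function summarize(trace: TraceEvent[]): WorkerFeatures {
  const last = trace.length ? trace[trace.length - 1].time : 0;
  return {
    totalTimeMs: last,
    keypresses: trace.filter((e) => e.kind === "keypress").length,
    scrollEvents: trace.filter((e) => e.kind === "scroll").length,
    focusShifts: trace.filter((e) => e.kind === "blur").length,
  };
}
```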

Reflection

I found the method used by the authors to be very interesting. Requesters receive a great deal of information about the workers, and visualizing that data can help them understand it better, while the use of machine learning can help a lot in classifying or clustering the best worker output and behavior. The other approaches mentioned in the paper are also interesting, especially for simple tasks that don't need complex evaluation.

I also didn't know that we could get such detailed information about workers' output and behavior, and I found the YouTube example mentioned in the paper very interesting. The example shows that, with the help of JavaScript, MTurk can return everything related to the user's actions while they work on the YouTube video task, which could be useful in many scenarios. I agree with the authors about combining the best of the two approaches. I think it would be interesting to know how many worker responses are filtered out in the first phase of the process, because that can tell us whether sending the request is even worthwhile. If too many responses are discarded, then the task may need to be reevaluated.

Questions

  • The authors mentioned that their proposed system can help filter out poor outputs in the first phase. Do you think that if too many responses are filtered out, it means the guidelines or the selection criteria need to be reevaluated?
  • The authors depend on JavaScript to track information about the workers' behavior. Do you think MTurk needs to approve that, or is it not necessary? And do you think the workers should be notified before accepting the task?
  • The authors mention that CrowdScape can be used to evaluate complex and creative tasks. Do you think they need to add some process to make sure a task really needs to be evaluated by their system, or do you think the system can also work with simple tasks?


04/08/2020 – Myles Frantz – CrowdScape: Interactively visualizing user behavior and output

Summary

Crowdsourcing provides a quick and easily scalable way to request help from people, but how do you ensure workers are actually paying attention instead of cheating in some way? Since tasks are handed off through a platform that abstracts away the assignment of work to workers, requesters cannot guarantee the participants' full attention. This is why the team created CrowdScape: to keep better track of participants' attention and focus. Using various JavaScript libraries, CrowdScape tracks participants through their interactions, or lack thereof, recording mouse clicks, keystrokes, and browser focus changes. Since Amazon Mechanical Turk is a web-based platform, JavaScript libraries are perfectly able to capture this information. Through the visualizations built from the retrieved data, the team demonstrates how this extra insight can be provided to requesters, for example showing whether a worker only clicks rapidly and switches windows frequently, or stays on the same window and remains focused on the task.
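To make the focus idea concrete, here is a small sketch, under my own assumptions, of how focus and blur events could separate a worker who keeps switching windows from one who stays on the task. The `WindowFocusEvent` shape and the summary heuristic are mine, not the paper's.

```typescript
// Sketch of a simple focus heuristic: how much of the task time was the HIT
// window actually in focus, and how often did the worker switch away?

interface WindowFocusEvent { kind: "focus" | "blur"; time: number } // ms since task start

function focusSummary(events: WindowFocusEvent[], taskEndMs: number) {
  let inFocusMs = 0;
  let switches = 0;
  let lastFocusStart: number | null = 0; // assume the window starts focused

  for (const e of events) {
    if (e.kind === "blur" && lastFocusStart !== null) {
      inFocusMs += e.time - lastFocusStart; // close the current in-focus interval
      lastFocusStart = null;
      switches += 1;
    } else if (e.kind === "focus" && lastFocusStart === null) {
      lastFocusStart = e.time; // a new in-focus interval begins
    }
  }
  if (lastFocusStart !== null) inFocusMs += taskEndMs - lastFocusStart;

  return { inFocusFraction: inFocusMs / taskEndMs, switches };
}
```

A low in-focus fraction with many switches would match the "rapid clicks and window shifts" pattern described above, though the thresholds a requester would apply are task-dependent.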

Reflection

I do appreciate the kind of insight this provides when delegating work. I have mentored various workers in some of my past internships, and it has caused a fair amount of stress. The more professional workers are easier to manage, but with others it often takes more time to manage and teach them than to do the work yourself. Being able to do this automatically gives requesters a lot of freedom to discard the work of low-effort participants, since they cannot directly oversee the participants as they work.

I do, however, strongly disagree with how much information is being tracked and requested. As a strong proponent of privacy, I think the browser is not the best domain in which to inject programs that watch a participant's session and information. Though the tracking is limited to the browser, other session information, such as cookies, IDs, or UIDs, could potentially be accessed. Even if the CrowdScape app itself does not track that information, other live JavaScript could obtain it alongside the CrowdScape program.

Questions

  • One of my first concerns with this type of project is the degree of privacy invasion. Though it makes sense to ensure the worker is working, there is always the potential for leaks of confidential information. Even if the keystroke tracking is limited to the time when the participant is focused on the browser window, do you think this would be a major concern for participants?
  • Throughout the case studies in the team's experiments, it seemed that many participants could be discarded because they were using some other tool or external help. Do you think as many people would be discarded in real experiments for similar reasons?
  • Alongside the previous question, is it overreaching, in a sense, to potentially discredit workers simply because they have different working habits than expected?


04/08/2020 – Nan LI – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This paper demonstrates a system called CrowdScape that supports humans in evaluating the quality of crowd work output by presenting interactive visualizations of worker behavior and worker output, combined with mixed-initiative machine learning (ML). The paper makes the point that quality control for complex and creative crowd work based on either post-hoc output or behavioral traces alone is insufficient. Therefore, the authors propose that we can gain new insight into crowd worker performance by combining behavioral observations with knowledge of worker output. CrowdScape presents a visualization of each individual worker's traces, including mouse movement, keypresses, scrolling, focus shifts, and clicking, in an abstract visual timeline. The system also aggregates these features using a combination of 1-D and 2-D scatter plots that show the distribution of the features and enable dynamic exploration. It also supports exploring worker output by recognizing patterns in worker submissions. Finally, CrowdScape enables users to build mental models of tasks and worker behaviors, and to use these models to verify worker output against the majority or gold standards. The authors also present four case studies that illustrate the system in practice and demonstrate its effectiveness.
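As a hedged sketch of how a raw trace could be turned into the segments of such an abstract visual timeline, the snippet below groups consecutive events of the same kind into contiguous blocks. The 2-second gap rule and the `Segment` shape are assumptions of mine, not details from the paper.

```typescript
// Sketch: group consecutive events of the same kind into contiguous segments
// suitable for rendering as colored bars on a timeline.

interface TraceEvent { kind: string; time: number } // ms since task start

interface Segment { kind: string; start: number; end: number }

function toSegments(trace: TraceEvent[], maxGapMs = 2000): Segment[] {
  const segments: Segment[] = [];
  for (const e of trace) {
    const last = segments[segments.length - 1];
    if (last && last.kind === e.kind && e.time - last.end <= maxGapMs) {
      last.end = e.time; // extend the current segment
    } else {
      segments.push({ kind: e.kind, start: e.time, end: e.time });
    }
  }
  return segments;
}
```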

Reflection:

I think the authors make a great point in addressing the quality control issue in crowdsourcing. Quality control approaches are limited, and often not even guaranteed, for most systems that use crowdsourcing as a component. The most frequently used approach I have seen so far is based on the policy that a worker's pay is determined by the quality of their work, which is a reasonable way to encourage workers to provide high-quality results. Another straightforward approach is to accept the answer (such as a tag or a count) that is provided by most workers.

Nevertheless, the authors propose that we should also consider quality control for more complex and creative work, because these types of tasks are appearing more and more often, yet no appropriate quality control mechanism exists for them. I think such a mechanism is essential in order to make better use of crowdsourcing.

I believe the most significant advantage of CrowdScape is that the system can be used very flexibly depending on the type of task. In the scenario and case studies presented in the paper, the user can evaluate workers' output using different attributes and different interactive visualization methods based on the type of task. Further, the types of visualization are varied, and each of them can reveal differences and patterns in workers' behavior and their work. The system design is impressive because, judging from the figures in the paper combined with the explanation, the interface appears user-friendly.

My only concern is that, as the number of workers increases, the points and lines on the visualization interface will become so dense that no pattern can be detected. Therefore, the system might need data filtering tools or additional interaction techniques to deal with this problem.

Questions:

  1. What are the most commonly used quality control approaches? What quality control approach will you apply in your project?
  2. There are many kinds of HITs on MTurk. What types of work do you think require quality control, and what kinds do not?
  3. For information visualization, one of the challenges is dealing with very large amounts of data. How should we deal with this problem in the CrowdScape system?

Word Count: 588


04/08/2020 – Dylan Finch – CrowdScape: interactively visualizing user behavior and output

Word count: 561

Summary of the Reading

This paper describes a system for dealing with crowdsourced work that needs to be evaluated by humans. For complex or especially creative tasks, it can be hard to evaluate the work of crowd workers, because there is so much of it and most of it needs to be evaluated by another human. If the evaluation takes too long, you lose the benefits of using crowd workers in the first place.

To help with these issues, the researchers have developed a system that helps an evaluator deal with all of the data from the tasks. The system leans heavily on data visualization. The interface shows the user a wide array of metrics about the crowd work and the workers to help them judge quality. Specifically, the system helps the user see information about worker output and worker behavior at the same time, giving a better indication of performance.

Reflections and Connections

I think that this paper tackles a very important issue in crowd work: evaluation. Evaluating tasks is not an easy process, and for complicated tasks it can be extremely difficult and, worst of all, hard to automate. If you need humans to review and evaluate work done by crowd workers, and the review takes a non-trivial amount of time, then you are not really saving any effort by using the crowd in the first place.

This paper is so important because it provides a way to make it easier for people to evaluate work done by crowd workers, making the use of crowd workers much more efficient, on the whole. If evaluation can be done more quickly, the data from the tasks can be used more quickly, and the whole process of using crowd workers has been made much faster than it was before. 

I also think this paper is important because it gives reviewers a new way to look at the work done by crowds: it shows the reviewer both worker output and worker behavior. This would make it much easier for reviewers to decide if a task was completed satisfactorily or not. If we can see that a worker did not spend a lot of time on a task and that their work was significantly different from other workers assigned to the same task, we may be able to tell that that worker did a bad job, and their data should be thrown out.

From the pictures of the system, it does look a little complicated, and I would be concerned that the system is hard to use or overly complex. Having a system that saves time but takes a long time to fully understand can be just as bad as not having the time-saving system at all. So, I do think that some effort should go into making the system look less intimidating and easier to use.

Questions

  1. What are some other possible applications for this type of software, besides the extra one mentioned in the paper?
  2. Do you think there is any way we could fully automate the evaluation of the creative and complex tasks focused on in this research?
  3. Do you think that the large amount of information given to users of the system might overwhelm them?


04/08/2020 – Nurendra Choudhary – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors address the problem of large-scale human evaluation through CrowdScape, a system based on interactive visualizations and mixed-initiative machine learning. They build on the two major previous approaches to quality control: worker behavior and worker output.

The contributions of the paper include an interactive interface for crowd worker results, visualizations of crowd behavior, techniques for exploring crowd worker products, and mixed-initiative machine learning for bootstrapping user intuitions. Previous work analyzed crowd worker behavior and output independently, whereas CrowdScape provides an interface for analyzing them together. CrowdScape uses mouse movement, scrolling, keypresses, focus events, and clicks to build worker profiles. Additionally, the paper points out its limitations, such as neglected user behaviors like the focus of the fovea. Furthermore, it discusses CrowdScape's potential in other experimental setups that are primarily offline or cognitive and do not involve user interaction with the system from which behavior could be analyzed.

Reflection

CrowdScape is a very necessary initiative as the number of users to evaluate increases. Another interesting aspect is that it also increases developers' creativity and possibilities, as they can now evaluate more complex and focus-based algorithms. However, I feel the need for additional compensation here. The crowd workers are being tracked, and this is an intrusion on their privacy. I understand that this is necessary for the process to function, but given that it makes focus an essential aspect of worker compensation, workers should be compensated fairly for it.

Also, the user behaviors tracked here cover most of the significant problems in the AI community fairly well, but more inputs would cover a better range of problems. Adding more features would not only increase problem coverage but also encourage more development. There could be several instances where a developer does not build something due to a lack of evaluation techniques or popular measures, and increasing the tracked features would help get rid of this concern. For example, if we were able to track the user's fovea, developers could study the effect of different advertising techniques or build algorithms to predict and track interest in different varieties of videos (the business of YouTube).

Also, I am not sure about the effectiveness of tracking the movements given in the paper. The paper treats effectiveness as a combination of worker behavior and output, but several tasks rely on mental models that do not produce the movements tracked in the paper. In such cases, the output needs to carry more weight. I think the evaluator should be given the option to change the weights of the different parameters, so that they could adapt the platform to different problems and make it more broadly applicable.
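A minimal sketch of the adjustable weighting suggested here, assuming each submission already has a normalized behavior score and output score in [0, 1]; the names and the clamping are illustrative only, not part of CrowdScape.

```typescript
// Sketch of a requester-tunable weight between behavior and output.

interface ScoredSubmission {
  behaviorScore: number; // e.g. normalized time-on-task or focus fraction, in [0, 1]
  outputScore: number;   // e.g. similarity to known-good submissions, in [0, 1]
}

function combinedScore(s: ScoredSubmission, behaviorWeight: number): number {
  const w = Math.min(1, Math.max(0, behaviorWeight)); // clamp to [0, 1]
  return w * s.behaviorScore + (1 - w) * s.outputScore;
}

// For a mostly mental task, a requester might set behaviorWeight close to 0
// so that the output dominates the ranking.
```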

Questions

  1. What kinds of privacy concerns could be a problem here? Should the analyzer have access to such behavior? Is it fair to ask users for this information? Should users be additionally compensated for such an intrusion on their privacy?
  2. What other kinds of user behavior are traceable? The paper mentions the fovea's focus. Can we also track listening focus or mental focus in other ways? Where would this information be useful in our problems?
  3. CrowdScape uses the platform's interactive nature and visualization to improve the user experience. Should there be an overall focus on improving UX at the development level, or should we let them be separate processes?
  4. CrowdScape considers worker behavior and output to analyze human evaluation. What other aspects could be used to analyze the results?

Word Count: 582


04/08/2020 – Yuhang Liu – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This article proposes that crowdsourcing platforms are a very useful tool that can help people solve many problems, letting them quickly distribute work and complete tasks at large scale. Therefore, the quality of the workers' work on a crowdsourcing platform is very important. In previous research, other researchers developed algorithms that inspect workers' output or workers' behavior to detect work quality, but these algorithms all have limitations, especially for complex tasks. Manually assessing the quality of tasks completed by workers can solve this problem, but it does not scale well when many workers are needed. Against this background, the authors created CrowdScape, which supports inspection of crowdworkers' work quality through interactive visualization and mixed-initiative machine learning. The system's approach is to combine information about worker behavior and worker output to better explain the work of the workers. CrowdScape supports this exploration of work quality through the following functions:

  1. An interface for interactively browsing crowd workers' results, explaining worker performance by combining information about worker behavior and output.
  2. Visualization of crowd worker behavior.
  3. Exploration of crowd worker products.
  4. Tools for grouping and classifying workers.
  5. Machine learning that bootstraps the requester's intuitions about the crowd (a sketch of this idea follows below).
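As a hedged sketch of function 5, the snippet below ranks submissions by their distance to the centroid of behavior features from requester-marked "good" examples. This nearest-centroid scorer is a stand-in of my own, not the model CrowdScape actually uses.

```typescript
// Sketch: take submissions the requester marked "good", compute the centroid of
// their behavior features, and rank the rest by distance to that centroid.

type Features = number[]; // e.g. [timeOnTask, keypresses, focusShifts]

function centroid(examples: Features[]): Features {
  // assumes at least one "good" example has been marked
  const dims = examples[0].length;
  const c = new Array(dims).fill(0);
  for (const ex of examples) ex.forEach((v, i) => (c[i] += v / examples.length));
  return c;
}

function distance(a: Features, b: Features): number {
  return Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
}

// Smaller distance to the "good" centroid suggests a similar working style.
function rankByCloseness(
  good: Features[],
  rest: Features[]
): { features: Features; dist: number }[] {
  const c = centroid(good);
  return rest
    .map((features) => ({ features, dist: distance(features, c) }))
    .sort((a, b) => a.dist - b.dist);
}
```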

Reflection:

The system introduced in this article has many innovations, the most important of which is the ability to combine workers' behavior with workers' output, which makes it possible to study the behavior of workers with different levels of performance. This reflects the idea that the result is determined by the process, and the authors incorporate that idea into the research. I think it is a very innovative point: through this research we can learn the working behavior of workers who produce different results, and in subsequent research we do not have to focus only on workers with good output. The more important implication is that behavior guidelines can be distilled from the behavior of workers with good output, and those guidelines can then be used to help other workers complete the task better. Another innovative point, I think, is visualizing the interaction process. As we all know, people take in visual information better, and I think this applies when evaluating the work of crowdsourced workers. Visualizing the interaction process of crowdsourced workers can help people better study worker behavior, improve our understanding of how crowdworkers perform while working, and also help us design crowdsourcing tasks. At the same time, the system's dynamic query facility can quickly analyze large data sets by giving users immediate feedback. I think CrowdScape builds on these points in order to better discover people's work patterns, understand the essence of crowdsourcing, and continually adapt to more complex and innovative crowdsourcing tasks.

Question:

  1. Must the quality of workers' work be related to their behavior, and how does the system prevent some workers from achieving good results with inappropriate behavior?
  2. When workers know their behavior will be recorded, will it affect their work?
  3. Are there other methods to lead workers to produce better output?
