04/08/2020 – Ziyao Wang – CrowdScape: Interactively Visualizing User Behavior and Output

The authors present CrowdScape, a system that supports human evaluation of complex crowd work at scale. The system uses interactive visualization and mixed-initiative machine learning to combine information about worker behavior with worker outputs, helping requesters better understand crowd workers and leverage their strengths. The authors built the system around three threads of quality control in crowdsourcing: output evaluation, behavioral traces, and integrated quality control. They visualize workers' behavior and the quality of their outputs, and combine findings about behavior with findings about output to evaluate crowd work. The system has limitations; for example, it cannot capture behavior if a worker completes the task in a separate text editor, and the behavioral traces are not always detailed enough. Even so, it is a solid support for quality control.

Reflections:

How do we evaluate the quality of the outputs produced by crowd workers? For complex tasks there is no single correct answer, so the work can hardly be evaluated directly. Previously, researchers proposed methods that traced workers' behavior to evaluate their work. However, this kind of method alone is not accurate enough, because workers may produce the same output while completing the task in very different ways. The authors provide a novel approach that evaluates workers from their outputs, their behavioral traces, and the combination of the two. This combination increases the accuracy of the system and makes it possible to analyze some complex tasks.

This system is valuable for crowdsourcing requesters. They can better understand workers by building a mental model of them and, as a result, distinguish good results from poor ones. In crowdsourcing projects, developers will sometimes receive poor responses from inattentive workers. With this system, they can keep only the valuable results for their research, which may increase the accuracy of their models, give them a better view of their systems' performance, and yield more detailed feedback.

Also, for system designers, the visualization tool for behavioral traces is quite useful when they want detailed feedback on user interactions. If they can analyze these data, they can learn what kinds of interactions their users need and provide a better user experience.

However, I think there may be ethical issues with this system. Using it, HIT publishers can observe workers' behavior while the HITs are being completed: mouse movements, scrolling, keypresses, focus events, and clicks. This may raise privacy issues, and this kind of information could be misused. Workers' computers would be at risk if their habits were collected by malicious actors.

Questions:

Can this system be applied to more complex tasks beyond purely generative ones?

How can designers use this system to design interfaces that provide a better user experience?

How can we prevent attackers from using this kind of system to collect users' habits and attack their computers?


Subil Abraham – 04/08/2020 – Rzeszotarski and Kittur, “CrowdScape”

Quality control in crowd work is straightforward for straightforward tasks. Tasks like transcribing the text in an image are fairly easy to evaluate because there is only one right answer. Requesters can use gold standard tests to evaluate crowd workers' output directly and determine whether they have done a good job, or use task fingerprinting to determine whether worker behavior indicates that they are making an effort. The authors propose CrowdScape as a way to combine both types of quality analysis, worker output and worker behavior, through a mix of machine learning and innovative visualization methods. CrowdScape includes a dashboard that provides a bird's-eye view of the different aspects of worker behavior in the form of graphs. These graphs showcase both aggregate behaviors of all the crowd workers and the timeline of the individual actions a crowd worker takes on a particular task (scrolling, clicking, typing, and so on). Behavioral traces identify where a crowd worker spends their time by recording their actions and how long each action takes. The authors conduct multiple case studies on different kinds of tasks to show that their visualizations help separate the workers who make an effort to produce quality output from those who are just phoning it in.
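As a minimal sketch of what such client-side behavioral logging might look like (the event names and the TraceEvent shape below are my own assumptions, not CrowdScape's actual implementation), the browser can timestamp low-level interaction events as they happen:

```typescript
// Hypothetical sketch of client-side behavioral logging; not CrowdScape's actual code.
// Each low-level browser event is recorded with a timestamp so a per-worker
// timeline can be reconstructed later.

interface TraceEvent {
  kind: "mousemove" | "scroll" | "keypress" | "click" | "focus" | "blur";
  time: number; // milliseconds since the task page loaded
}

const trace: TraceEvent[] = [];

function record(kind: TraceEvent["kind"]): () => void {
  return () => trace.push({ kind, time: performance.now() });
}

// Attach listeners for the behaviors the summary mentions: mouse, scroll, keys, focus, clicks.
window.addEventListener("mousemove", record("mousemove"));
window.addEventListener("scroll", record("scroll"));
window.addEventListener("keypress", record("keypress"));
window.addEventListener("click", record("click"));
window.addEventListener("focus", record("focus"));
window.addEventListener("blur", record("blur"));

// On submission, the trace would be serialized and returned alongside the worker's answer.
function serializeTrace(): string {
  return JSON.stringify(trace);
}
```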

CrowdScape provides an interesting visual solution to the problem of evaluating whether workers are being sincere in completing complex tasks. Creative work especially, where you ask the crowd worker to write something on their own, is notoriously hard to evaluate because there is no gold standard test you can apply. So I find the behavior-tracking visualizer, where different colored lines along a timeline represent different actions, genuinely useful. Someone who makes an effort at writing will show long blocks of typing with pauses for thinking. I can see how different behavioral heuristics could be applied to different tasks to determine whether the workers are actually doing the work. I have to admit, though, that I find the scatter plots somewhat obtuse and hard to parse. I'm not entirely sure how we are supposed to read them or what information they convey, so I feel the interface could do better at communicating exactly what the graphs are showing. There is promise for releasing this as a commercial or open source product (if it isn't already one) once the interface is polished. One last point is the ability for the requester to group "good" submissions, after which CrowdScape uses machine learning to find other similar "good" submissions. However, the paper only mentions this feature and does not describe how it fits into the interface as a whole; I felt this was another shortcoming of the design.
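The paper does not spell out how that matching works, but as a rough sketch under my own assumptions (a per-submission behavioral feature vector and cosine similarity as the measure), ranking submissions by their closeness to the requester-marked "good" ones might look like this:

```typescript
// Hypothetical sketch: rank submissions by similarity to requester-marked "good" ones.
// The feature vector and the similarity measure are assumptions for illustration;
// the paper does not specify how its mixed-initiative matching works.

type Features = number[]; // e.g. [typing time, scroll count, focus changes, time on task]

function cosine(a: Features, b: Features): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: Features) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

// Score every candidate submission by its best match against the "good" set.
function rankBySimilarity(
  goodExamples: Features[],
  candidates: { id: string; features: Features }[]
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({
      id: c.id,
      score: Math.max(...goodExamples.map((g) => cosine(g, c.features))),
    }))
    .sort((a, b) => b.score - a.score);
}
```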

  1. What would a good interface for the grouping of the “good” output and subsequent listing of other related “good” output look like?
  2. In what kind of crowd work would CrowdScape not be useful (assuming you were able to get all the data that CrowdScape needs)?
  3. Did you find all the elements of the interface intuitive and understandable? Were there parts of it that were hard to parse?


Subil Abraham – 04/08/2020 – Heer, “Agency plus automation”

A lot of work has been done independently on improving computers so that humans can use them better, and separately on helping machines do work by themselves. The paper makes the case that in the quest for automation, research on augmenting humans by improving the intelligence of their tools has fallen by the wayside. This leaves a rich area of exploration. The paper explores three tools in this space that work with users in a specific domain and predict what they might need or want next based on context clues from the user. Two of the three tools, Data Wrangler and Voyager, use domain-specific languages to present the possible operations to the user, thus providing a shared representation of data transformations for the user and the machine. The last tool, for language translation, does not provide a shared representation but presents suggestions directly, because there is no real way to use a DSL there short of exposing the parse tree, which doesn't make sense for an ordinary end user. The paper also makes several suggestions for future work. These include better monitoring and introspection tools for these human-AI systems, letting the AI design shared representations based on the domain instead of having a human pre-design them, and finding techniques to identify the right balance between human control and automation for a given domain.
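To make the shared-representation idea concrete, here is a toy illustration of what a declarative, Wrangler-style transform language might look like. The operation names and types are invented for illustration and are not the actual Wrangler or Voyager DSL; the point is that because the transform is plain, readable data, both the tool (which suggests it) and the user (who inspects, edits, or rejects it) work with the same representation:

```typescript
// Toy illustration of a shared representation for data transformations,
// loosely in the spirit of Wrangler-style DSLs. Operation names are invented.

type Row = Record<string, string>;

type Transform =
  | { op: "split"; column: string; delimiter: string; into: [string, string] }
  | { op: "dropEmpty"; column: string }
  | { op: "rename"; from: string; to: string };

// Applying a transform is mechanical once the declarative description exists.
function apply(rows: Row[], t: Transform): Row[] {
  switch (t.op) {
    case "split":
      return rows.map((r) => {
        const [left, right] = (r[t.column] ?? "").split(t.delimiter);
        return { ...r, [t.into[0]]: left ?? "", [t.into[1]]: right ?? "" };
      });
    case "dropEmpty":
      return rows.filter((r) => (r[t.column] ?? "").trim() !== "");
    case "rename":
      return rows.map(({ [t.from]: value, ...rest }) => ({ ...rest, [t.to]: value }));
  }
}
```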

The paper uses these three projects as a framing device to discuss the idea of developing better shared representations and their importance in human-AI collaboration. I think it's an interesting take, especially the idea of using DSLs as a means of communicating ideas between the human user and the AI underneath. The author backs away from discussing what a DSL would look like for the translation software, since anything beyond autocomplete suggestions doesn't really make sense in that domain, but I would be interested in further exploration there. I also find it interesting, and it makes sense, that people might not like machine predictions being thrust upon them, either because it influences their thinking or because it is just annoying. I think the tools discussed strike a good balance by staying out of the user's way. Yes, the user will be influenced, but that is inevitable; the alternative is to give no predictions at all, and then you get no benefit.

Although I see the point that the article is trying to make about shared representations (at least, I think I do), I really don’t see the reason for the article existing besides just the author saying “Hey look at my research, this research is very important and I’ve done things with it including making a startup”. The article doesn’t contribute any new knowledge. I don’t mean for that to sound harsh, and I can understand how reading this article is useful from a meta perspective (saves us the trouble of reading the individual pieces of research that are summarized in this article and trying to connect the dots between them).

  1. In the translation task, why wouldn’t a parse tree work? Are there other kinds of structured representations that would aid a user in the translation task?
  2. Kind of a meta question, but do you think this paper was useful on its own? Did it provide anything outside of summarizing the three pieces of research the author was involved in?
  3. Is there any way for the kind of software discussed here, where it makes suggestions to the user, to avoid influencing the user and interfering with their thought process?


04/08/2020 – Mohannad Al Ameedi – CrowdScape Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors propose a system that can evaluate complex tasks based on both worker output and worker behavior. Other available systems focus on only one aspect of evaluation, either output or behavior, which can give poor results, especially for complex or creative work. The proposed system, CrowdScape, combines the two through interactive visualization and mixed-initiative machine learning. It offers visualizations that allow users to filter out poor output and focus on a limited number of responses, and it uses machine learning to measure the similarity of each response to the best submissions; in that way the requester can consider the best output and the best behavior at the same time. The system captures time-series data for user actions, like mouse movement or scrolling, and generates a visual timeline for tracing user behavior. It works only with web pages and has some limitations, but the value it can give requesters is high, and it enables them to navigate through workers' results easily and efficiently.
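As a hypothetical sketch of how such a visual timeline could be derived from the raw time series (the data shapes and the gap threshold are my own assumptions, not CrowdScape's internal format), consecutive events of the same kind can be merged into blocks, so a long run of keypresses renders as one "typing" bar:

```typescript
// Hypothetical sketch of turning a raw event stream into timeline segments.

interface TraceEvent { kind: string; time: number }
interface Segment { kind: string; start: number; end: number }

// Merge consecutive events of the same kind (within a small gap) into one block.
function toSegments(events: TraceEvent[], gapMs = 1000): Segment[] {
  const segments: Segment[] = [];
  for (const e of events) {
    const last = segments[segments.length - 1];
    if (last && last.kind === e.kind && e.time - last.end <= gapMs) {
      last.end = e.time; // extend the current block
    } else {
      segments.push({ kind: e.kind, start: e.time, end: e.time });
    }
  }
  return segments;
}
```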

Reflection

I found the method used by the authors very interesting. Requesters receive a great deal of information about workers, and visualizing that data helps them make sense of it; the use of machine learning can also help a lot in classifying or clustering the best worker outputs and behaviors. The other approaches mentioned in the paper are also interesting, especially for simple tasks that don't need complex evaluation.

I also didn't know that we can get such detailed information about workers' output and behavior, and I found the YouTube example mentioned in the paper very interesting. The example shows that, with the help of JavaScript, MTurk tasks can return everything related to the user's actions while they work on the YouTube video, which can be used in many scenarios. I agree with the authors' approach, which combines the best of both evaluation strategies. It would also be interesting to know how many worker responses are filtered out in the first phase of the process, because that can tell us whether issuing the request was even worthwhile. If too many responses are discarded, the task itself may need to be re-evaluated.

Questions

  • The authors mention that their proposed system can help filter out poor outputs in the first phase. Do you think that if too many responses are filtered out, it means the guidelines or the selection criteria need to be re-evaluated?
  • The authors depend on JavaScript to track information about workers' behavior. Do you think MTurk needs to approve that, or is it not necessary? And do you think workers need to be notified before accepting the task?
  • The authors mention that CrowdScape can be used to evaluate complex and creative tasks. Do you think they need to add a step to make sure a task really needs to be evaluated by their system, or do you think the system can also work with simple tasks?


04/08/2020 – Dylan Finch – Agency plus automation: Designing artificial intelligence into interactive systems

Word count: 667

Summary of the Reading

This paper focuses on the problem of how humans interact with AI systems. The paper starts with a discussion of automation and how fully automated systems are a long way off and are currently not the best way to do things. We do not have the capabilities to make fully automated systems now, so we should not be trying to. Instead, we should make it easier for humans and machines to interact and work together.

The paper then describes three systems that apply these principles by having humans and machines work together. All of these systems keep the human in control: the human always has the final say on what to do. The machine gives suggestions, but the human decides whether or not to accept them. The systems include a tool for data wrangling, a tool for data visualization, and a tool for natural language translation.

Reflections and Connections

This paper starts with a heavy dose of skepticism about automation. I think many people are too optimistic about automation. This class has shown me how people are needed to make these "automated" systems work. Crowd workers often fill in the gaps for systems that pretend to be fully automated but are not. Rather than pretend that people aren't needed, we should embrace their role and build tools to help the people who make these systems possible. We should be working to help people rather than replace them. It will take a long time to fully replace many human jobs; we should build tools for the present, not the future.

The paper also argues that a good AI should be easily accessed and easy to dismiss. I completely agree. If AI tools are going to be commonplace, we need easy ways to get information from them and dismiss them when we don’t need them. In this way, they are much like any other tool. For example, suggesting software should give suggestions, but should get out of the way when you don’t like any of them. 

This paper brings up the idea that users prefer to be in control when using many systems. I think this is something many researchers miss. People like to have control. I often do a little more work so that I don't have to use suggested actions, so that I know exactly what is being done. Or I will go back through and check the automated work to make sure it was done correctly. For example, I would much rather make my own graph in Excel than use the suggested ones.

Questions

  1. Is the public too optimistic about the state of automation? What about different fields of research? Should we focus less on fully automating systems and instead on improving the systems we have with small doses of automation?
  2. Do companies like Tesla need to be held responsible for misleading consumers about the abilities of their AI technologies? How can we, as computer scientists, help people to better understand the limitations of AI technologies?
  3. When you use software that has options for automation, are you ever skeptical? Do you ever do things yourself because you think the system might not do it right? When we are eventually trying to transition to fully automated systems, how can we get people to trust the systems?
  4. The natural language translation experiment showed that the automated system made the translators produce more homogenous translations. Is this a good thing? When would having more similar results be good? When would it be bad?
  5. What are some other possible applications for this type of system, where an AI suggests actions and a user decides whether to accept those actions or not? What are some applications where this kind of system might not work? What limitations cause it not to work there?


04/08/2020 – Dylan Finch – CrowdScape: interactively visualizing user behavior and output

Word count: 561

Summary of the Reading

This paper describes a system for dealing with crowdsourced work that needs to be evaluated by humans. For complex or especially creative tasks, it can be hard to evaluate the work of crowd workers, because there is so much of it and most of it needs to be evaluated by another human. If the evaluation takes too long, you lose the benefits of using crowd workers in the first place.

To help with these issues, the researchers have developed a system that helps an evaluator deal with all of the data from the tasks. The system leans heavily on data visualization. Its interface shows the user a wide array of metrics about the crowd work and the workers to help the user judge quality. Specifically, the system helps the user see information about worker output and behavior at the same time, giving a better indication of performance.

Reflections and Connections

I think that this paper tries to tackle a very important issue of crowd work: evaluation. Evaluation of tasks is not an easy process and for complicated tasks, it can be extremely difficult and, worst of all, hard to automate. If you need humans to review and evaluate work done by crowd workers, and it takes the reviewer a non-insignificant amount of time, then you are not really saving any effort by using the crowd in the first place. 

This paper is so important because it provides a way to make it easier for people to evaluate work done by crowd workers, making the use of crowd workers much more efficient, on the whole. If evaluation can be done more quickly, the data from the tasks can be used more quickly, and the whole process of using crowd workers has been made much faster than it was before. 

I also think this paper is important because it gives reviewers a new way to look at the work done by crowds: it shows the reviewer both worker output and worker behavior. This would make it much easier for reviewers to decide if a task was completed satisfactorily or not. If we can see that a worker did not spend a lot of time on a task and that their work was significantly different from other workers assigned to the same task, we may be able to tell that that worker did a bad job, and their data should be thrown out.

From the pictures of the system, it does look a little complicated, and I would be concerned that it is hard to use or overly complex. Having a system that saves time but takes a long time to fully understand can be just as bad as not having the time-saving system at all. So I do think some effort should go into making the system look less intimidating and easier to use.

Questions

  1. What are some other possible applications for this type of software, besides the extra one mentioned in the paper?
  2. Do you think there is any way we could fully automate the evaluation of the creative and complex tasks focused on in this research?
  3. Do you think that the large amount of information given to users of the system might overwhelm them?


04/08/2020 – Myles Frantz – CrowdScape: Interactively visualizing user behavior and output

Summary

Crowdsourcing provides a quick and easily scalable way to request help from people, but how do you ensure that workers are actually paying attention instead of cheating in some way? Since tasks are handed off through a platform that abstracts away the assignment of work to workers, requesters cannot guarantee the participants' full attention. This is why the team created CrowdScape: to keep better track of participants' attention and focus. Utilizing various JavaScript libraries, CrowdScape keeps track of participants through their interactions, or lack thereof. The program can track participants' mouse clicks, keystrokes, and browser focus changes; since Amazon Mechanical Turk is a web-based platform, JavaScript is well suited to capturing this information. Through various visualization libraries, the team demonstrates views that give requesters extra insight. These visualizations show how worker behavior can be characterized, for example whether a worker just clicks rapidly and switches windows quickly, or stays on the same window and remains focused.
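A rough sketch of how such a trace might be rolled up into per-worker features is below; the field names and event kinds are my own illustration rather than the paper's, but aggregates like these are what would distinguish rapid window-switching from sustained focus:

```typescript
// Hypothetical sketch of aggregating a worker's raw trace into summary features.

interface TraceEvent { kind: string; time: number }

interface WorkerFeatures {
  clicks: number;
  keypresses: number;
  focusChanges: number;
  totalFocusedMs: number;
}

function summarize(events: TraceEvent[]): WorkerFeatures {
  const features: WorkerFeatures = { clicks: 0, keypresses: 0, focusChanges: 0, totalFocusedMs: 0 };
  let focusedSince: number | null = null;
  for (const e of events) {
    if (e.kind === "click") features.clicks++;
    if (e.kind === "keypress") features.keypresses++;
    if (e.kind === "focus") {
      features.focusChanges++;
      focusedSince = e.time;
    }
    if (e.kind === "blur" && focusedSince !== null) {
      features.totalFocusedMs += e.time - focusedSince; // time spent with the task window focused
      focusedSince = null;
    }
  }
  return features;
}
```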

Reflection

I do appreciate the kind of insight this provides when delegating work. I have mentored various workers in some of my past internships, and it has caused its share of stress. The more professional workers are easier to manage; with others, it usually takes more time to manage and teach them than to do the work myself. Being able to do this evaluation automatically gives requesters a lot of freedom to discard the work of lacking participants, since they cannot directly oversee participants while they work.

I do, however, strongly disagree with how much information is being tracked and requested. As a strong proponent of privacy, I think the browser is not the best place to inject programs that watch a participant's session and information. Though the tracking is limited to the browser, other session information, such as cookies, IDs, or UIDs, could potentially be accessed. Even if CrowdScape itself does not track it, other live JavaScript running alongside it could collect that information.

Questions

  • One of my first concerns with this type of project is the amount of privacy invasion. Though it makes sense to ensure the worker is working, there is always the potential for leaks of confidential information. Even if key tracking were limited to the time when the participant is focused on the browser window, do you think this would be a major concern for participants?
  • Throughout the case studies in the team's experiments, it seemed many of the participants could be discarded because they were using some other tool or external help. Do you think as many people would be discarded in real experiments for similar reasons?
  • Following on from the previous question, is it overreaching, in a sense, to potentially discredit workers just because they have different working habits than expected?


04/08/2020 – Myles Frantz – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

Throughout the field of artificial intelligence, many recent research efforts have aimed at fully automating tasks, ignoring the jobs that would be automated away. To keep the two sides progressing together, this team built on their previous work across three different, popular technologies that visualize and aid the collaboration between users and machine learning. For data analysts, where there have been efforts to automate the cleaning of raw data, one of the team's projects visualizes the data in a loose, Excel-like table and suggests transformations across the cells. Going further into the data analyst's workflow, another of their tools takes the data and automatically suggests visualizations and tables to better graph the information. For predictive suggestions, the tools produce multiple candidates from which the user can choose the one they believe is correct, which further improves the algorithm.

Reflection

Being a proponent of simplicity in design, I do appreciate how simple and connected their applications can be. In most modularized programs, unless you go through application programming interfaces, connecting applications has to be done through standardized outputs that a person or another external application must edit or adapt. Here, suggestions for data validation can be enabled directly and connected to an advanced graphing utility that itself suggests new graphing rules and tools, which is a nice level of integration.

I also appreciate how applicable their research is. Though not completely unique, creating usable applications greatly expands how far a project will stretch and be used. If it can be used directly and easily, the likelihood increases that it will be extended and adopted in public projects.

Questions

  • For the data analyst role, these tools may have helped but probably have not alleviated all of the tasks an analyst has to do throughout an agile cycle, let alone a full feature. What other positions could be helped by these sorts of tools?
  • Among all the tool sets available, this may be one of many on GitHub. Having a published paper may improve the program's odds of being used in the future, but it does not necessarily translate into a well-used, widely adopted project. Ignoring the technical factors (such as technical expertise, documentation, and programming style), will this program become a popular project on GitHub?
  • By using standardized languages, the teams were able to build a higher abstraction that allows direct communication between the applications. Though this makes things easier for the development team, it may be more restrictive for any tools looking to extend or communicate with the team's set of tools. Do you think their domain-specific languages were required for their set of tools, or were the languages created only to help the developers connect their own applications?


04/08/2020 – Nan LI – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:

This paper demonstrates a system called CrowdScape that supports humans in evaluating the quality of crowd work by presenting interactive visualizations of worker behavior and worker outputs, combined through mixed-initiative machine learning (ML). The paper makes the point that quality control for complex and creative crowd work based on either post-hoc output or behavioral traces alone is insufficient. Therefore, the authors propose that we can gain new insight into crowd worker performance by combining behavioral observations with knowledge of worker output. CrowdScape presents each worker's individual trace, including mouse movements, keypresses, scrolling, focus shifts, and clicks, on an abstract visual timeline. The system also aggregates these features and uses a combination of 1-D and 2-D scatter plot matrices to show their distributions and enable dynamic exploration. It also explores worker output by recognizing patterns in worker submissions. Finally, CrowdScape enables users to build mental models of tasks and worker behavior, and to use these models together with majority agreement or gold standards to verify worker output. The authors also present four case studies to illustrate the system's practical operation and demonstrate its effectiveness.

Reflection:

I think the authors make a great point in addressing the quality control issue in crowdsourcing. Quality control approaches are limited, and not even guaranteed, for most systems that use crowdsourcing as a component. The most frequently used approach I have seen so far is based on the policy that a worker's pay is determined by the quality of their work, which is a reasonable way to encourage workers to provide high-quality results. Another straightforward approach is to accept the answer (such as a tag or a count) that most workers agree on.

Nevertheless, the authors propose that we should also consider quality control for more complex and creative work, because these types of tasks appear more and more often, yet no appropriate quality control mechanism exists for them. I think such a mechanism is essential to making better use of crowdsourcing.

I believe the most significant advantage of CrowdScape is that it can be used very flexibly depending on the type of task. From the scenario and case studies presented in the paper, a user can evaluate workers' output using different attributes and different interactive visualization methods based on the task type. Further, the types of visualization are varied, and each of them can reveal differences and patterns in workers' behavior and their work. The system design is impressive; judging from the figures in the paper combined with the explanation, the interface is user-friendly.

My only concern is that as the number of workers increases, the points and lines on the visualization interface will become so dense that no pattern can be detected. Therefore, the system might need data filtering tools or additional interaction techniques to deal with this problem.

Questions:

  1. What are the most commonly used quality control approaches? What quality control approach will you apply in your project?
  2. There are many kinds of HITs on MTurk. What types of work do you think require quality control, and what kinds do not?
  3. For information visualization, one of the challenges is dealing with a significant amount of data. How should we deal with this problem in the CrowdScape system?

Word Count: 588


04/08/2020 – Nurendra Choudhary – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

In this paper, the authors study system designs that include different kinds of interaction between human agency and automation. They leverage human control and the complementary strengths of humans and algorithms to build a more robust architecture that draws on both. They share case studies of interactive systems in three different problems: data wrangling, exploratory analysis, and natural language translation.

To achieve synchronization between automation and human agency, they propose designing shared representations of augmented tasks, with predictive models of human capabilities and actions. The authors criticize the AI community's push toward complete automation and argue that the focus should instead be on systems augmented with human intelligence. In their results, they show that such models are more usable in current settings. They show how interactive user interfaces can integrate human feedback to improve the AI systems while also producing correct results for the problem instance at hand. They use shared representations that humans can edit to remove inconsistencies, thereby integrating human capability into those tasks.

Reflection

This is a problem we have discussed in class several times. However, the outlook of this paper is really interesting: it presents shared representations as a method for integrating human agency. Several papers we have studied utilize human feedback as part of augmenting the learning process, whereas this paper discusses auditing the output of the AI system. Representation is a very critical attribute of AI. Its richness determines the effectiveness of the system, and its lack of interpretability is generally the reason many AI applications are considered black-box models. I think shared representations, in a broader sense, also suggest a broader understanding of AI, akin to unifying human and AI capabilities in the most optimal way.

However, such representations might limit the capability of the AI mechanisms behind them. AI models are optimized with respect to a task, and that basic metric determines the representations the models learn. The models are effective because they are able to detect patterns in multi-dimensional spaces that humans cannot comprehend. The paper aims to make that space comprehensible, thus eliminating the very complication that makes an AI effective. Hence, I am not sure it is the best idea for long-term development. I believe we should stick to current feedback loops and only accept interpretable representations when the difference in results is statistically insignificant.

Questions

  1. How do we optimize for quality of shared representations versus quality of system’s results?
  2. Humans that are needed to optimize shared representations may be fewer when compared to the number of people who can complete the task. What would be the cost-benefit ratio for shared representations? Do you think the approach will be worth it in the long-term?
  3. Do we want our AI systems to be fully automatic at some point? If so, how does this approach benefit or limit the move towards the long-term goal?
  4. Should there be separate workflows or research communities that work on independent AI and AI systems with human agency? What can these communities learn from each other? How can they integrate and utilize each other’s capabilities? Will they remain independent and lead to other sub-areas of research?

Word Count: 545
