04/08/20 – Fanglan Chen – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

Rzeszotarski and Kittur’s paper “CrowdScape: Interactively Visualizing User Behavior and Output” explores the research question of how to unify different quality control approaches to better evaluate the work conducted by crowd workers. With the emergence of crowdsourcing platforms, many tasks can be accomplished quickly by recruiting crowd workers to collaborate in a parallel manner. However, quality control is among the challenges faced by the crowdsourcing paradigm. Previous works focus on designing algorithmic quality control approaches based on either worker outputs or worker behavior, but neither approach is effective for complex or creative tasks. To fill that research gap, the authors develop CrowdScape, a system that leverages interactive visualization and mixed-initiative machine learning to support the human evaluation of complex crowd work. Through experimentation on a variety of tasks, the authors show that combining information about worker behavior with worker outputs has the potential to help users better understand the crowd and further identify reliable workers and outputs.

Reflection

This paper conducts an interesting study by exploring the relationship between the outputs and behavior patterns of crowd workers to achieve better quality control in complex and creative tasks. The proposed method provides a unique angle on quality control and has wide potential use in crowdsourced tasks. Although there is a strong relationship between final outputs and behavior patterns, I feel the design of CrowdScape relies too heavily on the behavior analysis of crowd workers. In many situations, good behavior can lead to good results, but that cannot be guaranteed. From my understanding, behavior analysis is better suited as a screening mechanism in certain tasks. For example, in the video tagging task, workers who have gone through the whole video are more likely to provide accurate tags. In this case, a behavior such as watching the full video is a necessary condition, not a sufficient one. The group of workers who finish watching the videos may still disagree on the tagging output, so a different quality control mechanism is still needed; a rough sketch of this two-stage idea follows below. In creative and open-ended tasks, behavior patterns are even more difficult to capture. Analyzing behavior through metrics such as time spent on the task does not directly connect to a measurement of creativity.
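To make the screening-then-agreement point concrete, here is a minimal, hypothetical sketch (my own illustration, not anything from the paper) that first filters workers by a behavioral signal and then still applies a separate output-based check:

```python
# Hypothetical two-stage quality control: a behavioral screen followed by
# an output-based mechanism (a simple majority vote over tags).
from collections import Counter

# Illustrative worker records: (worker_id, fraction_of_video_watched, tag)
responses = [
    ("w1", 1.00, "photosynthesis"),
    ("w2", 0.98, "photosynthesis"),
    ("w3", 0.15, "cooking"),        # barely watched: screened out
    ("w4", 1.00, "cell biology"),   # watched everything, still disagrees
]

# Stage 1: behavioral screen (necessary, not sufficient).
screened = [(worker, tag) for worker, watched, tag in responses if watched >= 0.9]

# Stage 2: output-based quality control on the remaining workers.
tag_counts = Counter(tag for _, tag in screened)
best_tag, votes = tag_counts.most_common(1)[0]
print(best_tag, votes)  # photosynthesis 2
```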

In addition, I think the notion of quality discussed in the paper is comparatively narrow. We need to be aware that quality control on crowdsourcing platforms is multifaceted: it depends on the workers' knowledge of the specific task, the quality of the processes that govern the creation of tasks, the process for recruiting workers, the coordination of subtasks such as reviewing intermediate outputs and aggregating individual contributions, and so forth. A more comprehensive quality control cycle needs to take the following aspects into consideration: (1) a quality model that clearly defines the dimensions and attributes used to control quality in crowdsourcing tasks; (2) assessment metrics that can be utilized to measure the values of the attributes identified by the quality model; and (3) quality assurance, a set of actions that aim to achieve the expected levels of quality. To prevent low quality, it is important to understand how to design for quality and how to intervene if quality drops below expectations on crowdsourcing platforms.

Discussion

I think the following questions are worthy of further discussion.

  • Can you think of other tasks that may benefit from the proposed quality control method?
  • Do you think the proposed method can perform good quality control on complex and creative tasks as the paper suggests? Why or why not?
  • Do you think an analysis based on worker behavior can help determine the quality of the work conducted by crowd workers? Why or why not?
  • In which scenarios do you think it would be more useful to trace worker behavior with advance notice, and in which without it? Can you think of some potential ethical issues?

04/08/20 – Fanglan Chen – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

Endert et al.’s paper “The State of the Art in Integrating Machine Learning into Visual Analytics” surveys recent state-of-the-art models that integrate machine learning into visual analytics and highlights the advances achieved at the intersection of the two fields. In the data-driven era, how to make sense of data and how to facilitate a wider understanding of data attract the interest of researchers in various domains. It is challenging to discover knowledge from data while delivering reliable and interpretable results. Previous studies suggest that machine learning and visual analytics have complementary strengths and weaknesses, and many works explore the possibility of combining the two to develop interactive data visualizations that promote sensemaking and analytical reasoning. This paper presents a survey of the achievements made by recent state-of-the-art models. It also provides a summary of opportunities and challenges for boosting the synergy between machine learning and visual analytics as future research directions.

Reflection

Overall, this paper presents a thorough survey of the progress in the field by highlighting and synthesizing select research advances. Recent advances in deep learning bring new challenges and opportunities at the intersection of machine learning and visual analytics. We need to be aware that the design of a highly accurate and efficient deep learning model is an iterative and progressive process of training, evaluation, and refinement, which typically relies on a time-consuming trial-and-error procedure where the parameters and model structures are adjusted based on user expertise. Visualization researchers are making initial attempts to visually illustrate intuitive model behaviors and to debug the training processes of widely used deep learning models such as CNNs and RNNs. However, little effort has been devoted to tightly integrating state-of-the-art deep learning models with interactive visualizations to maximize the value of both. There is great potential in integrating deep learning into visual analytics for a better understanding of current practices.

As we know, training deep learning models requires a lot of data, but well-labeled data is often very expensive to obtain. Injecting a small number of user inputs into the models through a visual analytics system can potentially alleviate this problem. In real-world applications, a method is impractical if each specific task requires its own separate large-scale collection of training examples. To close the gap between academic research outputs and real-world requirements, it is necessary to reduce the size of the required training sets by leveraging prior knowledge obtained from previously trained models in similar categories, as well as from domain experts. Models trained on given data and labels can usually solve only the pre-defined problems for which they were originally trained. Few-shot learning and zero-shot learning, two of the unsolved problems in current deep learning practice, offer a possibility to incorporate prior knowledge about objects into a “prior” probability density function.

Discussion

I think the following questions are worthy of further discussion.

  • What other challenges or opportunities can you think of for a framework that incorporates machine learning and visual analytics?
  • How can we best leverage the advantages of machine learning and visual analytics in a complementary way?
  • Do you plan to utilize a framework that incorporates machine learning and visual analytics in your course project? If yes, how do you plan to approach it?
  • Can you think of any applications we use in daily life that are good examples of integrating machine learning into visual analytics?

04/08/20 – Lulwah AlKulaib – CrowdScape

Summary

The paper presents a system that supports the evaluation of complex crowd work through mixed-initiative machine learning and interactive visualization. The system proposes a solution for quality control challenges that occur on crowdsourcing platforms. Previous work based quality control on either worker output or worker behavior, which was not effective for evaluating complex tasks. The suggested system combines a worker's behavior and output to support the evaluation of complex crowd work. Its features allow users to develop hypotheses about their crowd, test them, and refine selections based on machine learning and visual feedback. The authors use MTurk and Rzeszotarski and Kittur’s Task Fingerprinting system to create an interactive data visualization of the crowd workers. They posted four varieties of complex tasks: translating text from Japanese to English, picking a favorite color using an HSV color picker and writing its name, writing about a favorite place, and tagging science tutorial videos from YouTube. They conclude that the information gathered from crowd workers' behavior is beneficial in reinforcing or contradicting the conception of the cognitive process that crowd workers use to complete tasks, and in developing and testing mental models of the behavior of crowd workers who have good or bad outputs. This helps users identify more good workers and outputs in a sort of positive feedback loop.

Reflection

This paper presents an interesting approach to discovering low-quality responses from crowd workers. Combining these two methods is an interesting idea, and it makes me think of our project and what limitations might arise from following their approach to logging crowd worker behavior. I had not thought of disclosing to crowd workers that their behavior is being recorded while they respond, and now it makes me want to look at previous work to see whether that disclosure affects workers' responses. I found it interesting that crowd workers used machine translation in the Japanese-to-English translation task even when they knew their behavior was being recorded. I assume that since there was no requirement of speaking Japanese, or the requirements were relaxed, crowd workers were able to perform the task using tools like Google Translate. If such requirements had been in place, the workers would not have been paid for the task. This has also alerted me to the importance of task requirements and explanations for crowd workers, since some Turkers could abuse the system and give us low-quality results simply because the rules were not clear.

Having the authors list their limitations was useful for me. It gave me another perspective to think about how to evaluate the responses that we get in our project and what we can do to make our feedback approach better.

Discussion

  • Would you use behavioral traces as an element in your project? If yes, would you tell the crowd workers that you are collecting that data? Why or why not?
  • Do you think that implicit feedback and behavioral traces can help determine the quality of a crowd worker’s answer? Why or why not?
  • Do you think that collecting such feedback is a privacy issue? Why or why not?

4/8/2020 – Lee Lisle – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

               Endert et al.’s focus in this paper is on how machine learning and visual analytics have blended together to create tools for sensemaking with large, complex datasets. They first explain various models of sensemaking and how they can impact learning and understanding, as well as many models of interactivity in visual analytics that complement sensemaking. Then the authors lightly describe some machine learning models and frameworks to establish a baseline knowledge for the paper. They then create four categories for machine learning techniques currently used in visual analytics: dimension reduction, clustering, classification, and regression/correlation models. They then discuss papers that fit into each of these categories, organized by whether the user modifies parameters and the computational domain or defines analytical expectations, with the machine learning model assisting the user in each. The authors then point out several new ways of blending machine learning and visual analytics, such as steerable machine learning, creating training models from user interaction data, and automated report generation.
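To make the first two categories concrete, here is a minimal sketch (my own illustration, not code from the survey; the dataset and parameter choices are arbitrary) of how dimension reduction and clustering typically feed a visual analytics view:

```python
# Dimension reduction plus clustering driving a simple scatter-plot view.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)            # 64-dimensional image features

# Dimension reduction: project to 2-D so the data can be drawn on screen.
coords = PCA(n_components=2).fit_transform(X)

# Clustering: group similar items; a visual analytics tool might let the
# user steer k or merge/split clusters interactively.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=8, cmap="tab10")
plt.title("PCA projection colored by k-means cluster")
plt.show()
```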

Personal Reflection

               This paper was an excellent summary of the field of visual analytics and the various ways machine learning has been blended into it. Furthermore, several of the included papers have informed my own research into visual analytics and sensemaking. I was somewhat surprised that, though the authors mention virtual reality, they don’t cover some of the tools that have been developed for immersive analytics. As a side note, the authors used many acronyms and did not explain all of them; for example, virtual reality was referenced once and only by its acronym. When they used it for dimension reduction, I was initially confused because they hadn’t defined that acronym, while they defined the acronyms for visual analytics and machine learning twice in the same paragraph in the introduction.

               Their related works section was impressive and really covered a lot of angles for sensemaking and visual analytics. While I do not have the background in machine learning, I assume it covered that area equally well.

               I also thought the directions they suggested for future development were a good selection of ideas. I could identify ways that many of them could be applied to my work on the Immersive Space to Think (IST): automated report generation would be a great way to start out in IST, and a way to synthesize and perform topic analysis on any notations made while in IST could lead to further analytical goals.

Questions

  1. What observations do you have on their suggestions for future development in visual analytics? What would you want to tackle first and why?
  2. In what ways do the human and the machine work together in each category of machine learning (dimension reduction, clustering, classification, and regression/correlation)? What affordances does each use?
  3. Which training method do you think leads to higher-quality outputs: unmodified training sets or user-interaction-steered machine learning?

4/8/20 – Lee Lisle – Agency Plus Automation: Designing Artificial Intelligence into Interactive Systems

Summary

Heer’s focus in this paper is on refocusing AI and machine learning into everyday interactions that assist users in their work rather than trying to replace users. He reiterates many times in the introduction that humans should remain in control while the AI assists them in completing the task, and even brings up the recent Boeing automation mishaps as an example of why human-in-the-loop is so essential to future developments. The author then describes several tools in data formatting, data visualization, and natural language translation that use AI to assist the user by suggesting actions based on their interactions with data, as well as domain-specific languages (DSLs) that can quickly perform actions through code. The review of his work shows that users want more control, not less, and that these tools increase productivity while allowing the user to ultimately make all of the decisions.

Personal Reflection

               I enjoyed this paper as an exploration of various ways people can employ semantic interaction in interfaces to boost productivity. Furthermore, the explorations of how users can do this without giving up control were remarkable. I hadn’t realized that the basic idea behind autocorrect/autocomplete could apply in so many different ways in these domains. However, I did notice that the author mentioned that in certain cases there were too many options for what to do next. I wonder how much ethnographic research needs to go into determining each action that’s likely (or even possible) in each case, and what overhead the AI puts on the system.

               I also wonder how these interfaces will shape work in the future. Will humans adapt to these interfaces and essentially create new routines and processes in their work? As autocomplete/correct often creates errors, will we have to adapt to new kinds of errors in these interfaces? At what point does this kind of interaction become a hindrance? I know that, despite the number of times I have to correct it, I wouldn’t give up autocomplete in today’s world.

Questions

  1. What are some everyday interactions, beyond just autocorrect, that you encounter in specialized programs and applications? Do you always utilize these features?
  2. The author took three fairly everyday activities and created new user interfaces with accompanying AI with which to create better tools for human-AI collaboration. What other everyday activities can you think of that you could create a similar tool for?
  3. How would you gather data to create these programs with AI suggestions? What would you do to infer possible routes?
  4. The author mentions expanding these interfaces to (human/human) collaboration. What would have to change in order to support this? Would anything?
  5. DSLs seem to be a somewhat complicated addition to these tools. Why would you want to use them, and is it worth learning the DSL?
  6. Is ceding control to AI always a bad idea? In what areas do you think users should cede more control, and in what areas should they gain back more control?

04/08/20 – Lulwah AlKulaib – Agency

Summary

The paper considers the design of systems that enable rich and adaptive interaction between people and algorithms. The authors attempt to balance the complementary strengths and weaknesses of humans and algorithms while promoting human control and skillful action. They aim to employ AI methods while ensuring that people remain in control, supporting the view that people should be unconstrained in pursuing complex goals and exercising domain expertise. They share case studies of interactive systems that they developed in three fields: data wrangling, exploratory analysis, and natural language translation, integrating proactive computational support into interactive systems. For each case study, they examine the strategy of designing shared representations that augment interactive systems with predictive models of users’ capabilities and potential actions, surfaced via interaction mechanisms that enable user review and revision. These models enable automated reasoning about tasks in a human-centered fashion and can adapt over time by observing and learning from user behavior. To improve outcomes and support learning by both people and machines, they describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions. They conclude with how systems that integrate agency and automation via shared representations could be better constructed and deployed. They also mention finding that neither automated suggestions nor direct manipulation plays a strictly dominant role, but that a fluent interleaving of both modalities can enable more productive, yet flexible, work.

Reflection

The paper was very interesting to read, and the case studies presented were thought provoking. They are all based on research that I have read and gone through while learning about natural language processing, and the thought of those systems being suggestive makes me wonder about such work and how user-interface toolkits might affect the design and development of models.

I also wonder, as presented in the future work, how to evaluate systems across varied levels of agency and automation. What would the goal be in that evaluation process? Would it differ across machine learning disciplines? The case studies presented in the paper used specific evaluation metrics, and I wonder how those generalize to other models. What other methods could be used for evaluation in the future, and how does one compare two systems when comparing their results is no longer enough?

I believe that this paper sheds some light on how evaluation criteria can be topic-specific, and those criteria will be shared across applications that are relevant to human experience in learning. It is important to pay attention to how these systems promote interpretability, learning, and skill acquisition instead of deskilling workers. Also, it is essential that we think of appropriate designs that would optimize trade-offs between automated support and human engagement.

Discussion

  • What is your takeaway from this paper?
  • Do you agree that we need better design tools that aid the creation of effective AI-infused interactive systems? Why or why not?
  • What determines a balanced AI – Human interaction?
  • When is AI agency/control harmful? When is it useful?
  • Is ensuring that humans remain in control of AI models important? If models were trained by domain experts with domain expertise, then why do we mistrust them?

04/07/20 – Sukrit Venkatagiri – CrowdScape: Interactively Visualizing User Behavior and Output

Paper: Jeffrey Rzeszotarski and Aniket Kittur. 2012. CrowdScape: interactively visualizing user behavior and output. In Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST ’12), 55–62. https://doi.org/10.1145/2380116.2380125

Summary:

Crowdsourcing has been used to do intelligent tasks/knowledge work at scale and for a lower price, all online. However, there are many challenges with controlling quality in crowdsourcing. This paper discusses how, in prior approaches, quality control was done through algorithms evaluated against gold standards or by looking at worker agreement and behavior. Yet these approaches have many limitations, especially for creative tasks or other tasks that are highly complex in nature. This paper presents a system, called CrowdScape, to support manual or human evaluation of complex crowdsourcing task results through an interactive visualization with a mixed-initiative machine learning back-end. The paper describes features of the system as well as its uses through four very different case studies: first, a translation task from Japanese to English; next, a somewhat unique task asking workers to pick their favorite color; third, writing about their favorite place; and finally, tagging a video. The paper concludes with a discussion of the findings.

Reflection:

Overall, I really liked the paper and the CrowdScape system, and I found the multiple case studies really interesting. I especially liked the fact that the case studies varied in terms of complexity, creativity, and open-endedness. However, I found the color-picker task a little off-beat and wonder why the authors chose that task. 

I also appreciate that the system is built on top of existing work, e.g., Amazon Mechanical Turk (a necessity), as well as Rzeszotarski and Kittur’s Task Fingerprinting system to capture workers' behavioral traces. The scenario describing the more general use case was also very clear and concise. The fact that the system, CrowdScape, utilizes two diverse data sources, as opposed to just one, is interesting. This makes it easier to triangulate the findings as well as to observe any discrepancies in the data. More specifically, the CrowdScape system looks at workers' behavioral traces as well as their output. This allows one to differentiate between workers in terms of their “laziness/eagerness” as well as the actual quality of the output. The system also provides an aggregation of the two features, and all of these are displayed as visualizations, which makes it easy for a requester to view tasks and easily discard or include work.

However, I wonder how useful these visualizations might be for tasks such as surveys, or tasks that are less open-ended. Further, although the visualizations are useful, I wonder whether they should be used in conjunction with gold standard datasets, and how useful that combination would be. Although the paper demonstrates the potential uses of the system via case studies, it does not demonstrate whether real users find it useful. Thus, an evaluation by real-world users might help.

Questions:

  1. What do you think about the case study evaluation? Are there ways to improve it? How?
  2. What features of the system would you use as a requester?
  3. What are some drawbacks to the system?

04/08/2020 – Vikram Mohanty – CrowdScape: Interactively Visualizing User Behavior and Output

Authors: Jeffrey M Rzeszotarski, Aniket Kittur

Summary

This paper proposes CrowdScape, a system that supports human evaluation of crowd work through interactive visualization of behavioral traces and worker output, combined with mixed-initiative machine learning. Different case studies are discussed to showcase the utility of CrowdScape.

Reflection

The paper addresses the issue of quality control, a long-standing problem in crowdsourcing, by combining two existing standalone approaches that researchers currently adopt: a) inferring quality from worker behavior and b) analyzing worker output. Combining these factors is advantageous as it provides a more complete picture, either by offering corroborating evidence about ideal workers or, in some cases, complementary evidence that can help identify "good" workers; a rough sketch of this combined-signal idea follows below. Analyzing worker output alone might not be enough, as there is an underlying chance that it is no better than a random coin toss.
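Here is a rough, hypothetical illustration of combining the two signals (made-up features and numbers, not CrowdScape's actual method): cluster workers on behavioral features plus an output-agreement score, then inspect which cluster the trusted workers fall into.

```python
# Hypothetical combined-signal grouping of crowd workers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative per-worker records: (seconds on task, scroll events,
# fraction of video watched, agreement with other workers' output).
workers = np.array([
    [310, 42, 1.00, 0.9],   # thorough worker, high agreement
    [290, 38, 0.95, 0.8],
    [ 45,  3, 0.10, 0.2],   # barely engaged, low agreement
    [ 60,  5, 0.15, 0.1],
    [300, 40, 1.00, 0.3],   # engaged behavior but disagreeing output
])

features = StandardScaler().fit_transform(workers)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # e.g., [0 0 1 1 0]: behavior and output jointly separate groups
```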

Even though it was only a short note in parentheses, I really liked the fact that the authors explicitly sought permission to record the worker interaction logs.

Extrapolating to other similar or dissimilar behavior using machine learning seems intuitive here, as the data and the features used (i.e., the building blocks of the model) are very meaningful and perfectly relevant to the task, and the model is not a black box. As a result, it’s not surprising to see it work almost everywhere. In the one case where it didn’t work, the system made up for it by showing that the complementary case works. This sets a great example for designing predictive models on top of behavioral traces that actually work.

Moreover, the whole system was built agnostic of the task, and the evaluations justified it. However, I am not sure whether the system's best use case is recruiting multiple workers for a single task or identifying a set of good workers to subsequently retain for other tasks in the pipeline. I am guessing it is the latter, as the former might seem like an expensive approach for getting high-quality responses.

On the other hand, I feel the implications of this paper go beyond just crowdsourcing quality control. CrowdScape, or a similar system, can provide assistance for studying user behavior/experience in any interface (web for now), which is important for evaluating interfaces.

Questions

  1. Does your evaluation include collecting behavioral trace logs? If so, what are some of your hypotheses regarding user behavior?
  2. How do you plan on assessing quality control?
  3. What kind of tasks do you see CrowdScape being best applicable for? (e.g. single task, multiple workers)

04/08/2020 – Vikram Mohanty – Agency plus automation: Designing artificial intelligence into interactive systems

Authors: Jeffrey Heer

Summary

The paper discusses interactive systems in three different areas — data wrangling, exploratory analysis, and natural language translation — to showcase the use of a “shared representation” of tasks, where machines can augment human capabilities instead of replacing them. All the systems highlight how the complementary strengths and weaknesses of humans and machines can be balanced while promoting human control.

Reflection

This paper makes the case for intelligence augmentation, i.e., augmenting human capabilities with the strengths of AI rather than striving to replace them. Developers of intelligent user interfaces can come up with effective collaborative systems by carefully designing the interface to ensure that the AI component “reshapes” the shared representations that users contribute to, rather than “replaces” them. This is always a complex task and therefore requires scoping down from the notion that AI can automate everything, focusing instead on these editable shared representations. This has other benefits: it helps exploit the strengths of AI in a sum-of-parts manner rather than as an end-to-end mechanism, where an AI is more likely to be erroneous. The paper discusses three different case studies where a mixed-initiative deployment was successful in catering to user expectations in terms of experience and output.

It was particularly interesting to see participants complaining that the Voyager system, despite being good, spoiled them because it made them think less. This can hamper the adoption of such systems. A reasonable design implication here would be to allow users to choose the features they want, or to give them the agency to adjust the degree of automation/suggestions. This also suggests the importance of conducting longitudinal studies to understand how users use the different features of an interface, i.e., whether they use one but not another.

According to some prior work, machine-suggested recommendations have been known to perpetuate filter bubbles; in other words, users are exposed to a similar set of items and miss out on other content. Here, the Voyager recommendations work in contrast to that prior work by allowing users to explore the space, analyze different charts and data points they wouldn’t otherwise notice, and combat confirmation bias. In short, the system does what it claims to do, i.e., augment human capabilities in a positive sense using the strengths of the machine.

Questions

  1. In the projects you are proposing for the class, does the AI component augment human capabilities or strive to replace them (eventually)? If so, how?
  2. How do you think developers should cater to cases where users are less likely to adopt a system because it impedes their creativity?
  3. Do you think AI components (should) allow users to explore the space more than they normally would? Are there any possible pitfalls (information overload, unnatural tasks/interactions, etc.)?

4/8/20 – Akshita Jha – Agency plus automation: Designing artificial intelligence into interactive systems

Summary:
“Agency plus automation: Designing artificial intelligence into interactive systems” by Heer talks about the drawbacks of using artificial intelligence techniques for automating tasks, especially ones that are considered repetitive and monotonous. However, this presents a monumentally optimistic point of view by completely ignoring the ghost work, or invisible labor, that goes into ‘automating’ these tasks. This gap between crowd work and machine automation highlights the need for design and engineering interventions. The authors of this paper try to make use of the complementary strengths and weaknesses of the two: the creativity, intelligence, and world knowledge of crowd workers, and the low cost and lack of cognitive overload provided by automated systems. The authors describe in detail the case studies of interactive systems in three different areas: data wrangling, exploratory analysis, and natural language translation. These systems combine computational support with interactive systems. The authors also talk about shared representations of tasks that include both human intelligence and automated support in the design itself. The authors conclude that “neither automated suggestions nor direct manipulation plays a strictly dominant role” and “a fluent interleaving of both modalities can enable more productive, yet flexible, work.”

Reflections:
There is a lot of invisible work that goes into automating a task. Most automated tasks require hundreds, if not thousands, of annotations. Machine learning researchers turn a blind eye to all the effort that goes into annotations by calling their systems ‘fully automated’. This view is exclusionary and does not do justice to the vital but seemingly trivial work done by crowd workers. One area to focus on is the open question of shared representation: is it possible to integrate data representation with human intelligence? If yes, is that useful? Data representation often involves the construction of a latent space to reduce the dimensionality of input data and obtain concise, meaningful information. Such representations may or may not exist for human intelligence; borrowing from social psychology might help in such a scenario. There can be other ways to go about this. For example, the authors focus on building interactive systems with ‘collaborative’ interfaces. The three interactive systems (Wrangler, Voyager, and PTM) do not distribute the tasks equally between humans and automated systems. The automated methods prompt users with different suggestions, which the end user reviews, and the final decision-making power lies with the end user. It would be interesting to see what the results would look like if the roles were reversed and the system were turned on its head. An interesting case study could be one where the suggestions were given by the end user and the ultimate decision-making capability rested with the system. Would the system still be as collaborative? What would the drawbacks of such systems be?

Questions:

1. What are your general thoughts on the paper?
2. What did you think about the case studies? Which other case studies would you include?
3. What are your thoughts on evaluating systems with shared representations? Which evaluation criteria can we use?
