04/08/20 – Lulwah AlKulaib – CrowdScape

Summary

The paper presents a system that supports the evaluation of complex crowd work through mixed-initiative machine learning and interactive visualization. The system addresses quality control challenges that occur on crowdsourcing platforms. Previous work relied on quality control based on either worker output or worker behavior alone, which was not effective for evaluating complex tasks. The proposed system combines a worker's behavior and output to support the evaluation of complex crowd work. Its features allow users to develop hypotheses about their crowd, test them, and refine selections based on machine learning and visual feedback. The authors use MTurk and Rzeszotarski and Kittur's Task Fingerprinting system to create an interactive data visualization of the crowd workers. They posted four varieties of complex tasks: translating text from Japanese to English, picking a favorite color using an HSV color picker and writing its name, writing about a favorite place, and tagging science tutorial videos from YouTube. They conclude that information gathered from crowd workers' behavior is beneficial both in reinforcing or contradicting the requester's conception of the cognitive process that workers use to complete tasks and in developing and testing mental models of the behavior of workers who produce good or bad outputs. This helps users identify further good workers and outputs in a sort of positive feedback loop.

Reflection

This paper presents an interesting approach to discovering low-quality responses from crowd workers. Combining the two methods is an interesting idea, and it makes me think of our project and what limitations might arise from logging the behavior of crowd workers. I had not considered disclosing to crowd workers that their behavior is being recorded while they respond, and it makes me want to look at previous work to see whether such disclosure affects workers' responses. I found it interesting that crowd workers used machine translation in the Japanese-to-English translation task even when they knew their behavior was being recorded. I assume that since there was no requirement to speak Japanese, or the requirements were relaxed, workers were able to perform the task using tools like Google Translate. If that requirement had been enforced, those workers would not have been paid for the task. This also alerted me to the importance of clear task requirements and explanations for crowd workers, since some Turkers could abuse the system and give us low-quality results simply because the rules were not clear.

Having the authors list their limitations was useful for me. It gave me another perspective on how to evaluate the responses we get in our project and what we can do to make our feedback approach better.

Discussion

  • Would you use behavioral traces as an element in your project? If yes, would you tell the crowd workers that you are collecting that data? Why? Why not?
  • Do you think that implicit feedback and behavioral traces can help determine the quality of a crowd worker’s answer? Why? Or why not?
  • Do you think that collecting such feedback is a privacy issue? Why? Or Why not?


4/8/2020 – Lee Lisle – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

               Endert et al.'s focus in this paper is on how machine learning and visual analytics have blended together to create tools for sensemaking with large, complex datasets. They first explain various models of sensemaking and how these can impact learning and understanding, as well as several models of interactivity in visual analytics that complement sensemaking. The authors then lightly describe some machine learning models and frameworks to establish baseline knowledge for the paper. They define four categories of machine learning techniques currently used in visual analytics: dimension reduction, clustering, classification, and regression/correlation models. They then discuss papers that fit into each of these categories, further organized by whether the user modifies parameters and the computational domain or defines analytical expectations, with the machine learning model assisting the user in each case. The authors close by pointing out several new ways of blending machine learning and visual analytics, such as steerable machine learning, creating training models from user interaction data, and automated report generation.
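
To make the dimension-reduction category a bit more concrete, here is a minimal Python sketch (my own illustration, not code from the survey) of the kind of projection a visual analytics tool might compute before drawing a 2D scatterplot; the synthetic data and the choice of PCA are assumptions.

    # Minimal sketch: reduce a high-dimensional table to 2D coordinates for a
    # scatterplot, the kind of computation a steerable VA tool might expose.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    records = rng.normal(size=(500, 20))      # 500 synthetic items, 20 features

    coords = PCA(n_components=2).fit_transform(records)   # (500, 2) layout
    print(coords[:3])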

Personal Reflection

               This paper was an excellent summary of the field of visual analytics and the various ways machine learning has been blended into it. Furthermore, several of the included papers have informed my own research into visual analytics and sensemaking. I was somewhat surprised that, though the authors mention virtual reality, they don't cover some of the tools that have been developed for immersive analytics. As a side note, the authors used many acronyms and did not explain all of them; for example, virtual reality was referenced once and only by its acronym. When it appeared in the discussion of dimension reduction, I was initially confused because that acronym had never been defined, while the acronyms for visual analytics and machine learning were defined twice in the same paragraph of the introduction.

               Their related works section was impressive and covered a lot of angles for sensemaking and visual analytics. While I do not have the background in machine learning, I assume that area was covered equally well.

               I also thought the directions they suggested for future development were a good selection of ideas. I could identify ways that many of them could be applied to my work on the Immersive Space to Think (IST): automated report generation would be a great way to start out in IST, and a way to synthesize and perform topic analysis on any notations made while in IST could support further analytical goals.

Questions

  1. What observations do you have on their suggestions for future development in visual analytics? What would you want to tackle first and why?
  2. In what ways do the human and the machine work together in each category of machine learning (dimension reduction, clustering, classification, and regression/correlation)? What affordances does each use?
  3. Which training method do you think leads to higher quality outputs? Unmodified training sets or user interaction steerable machine learning?


4/8/20 – Lee Lisle – Agency Plus Automation: Designing Artificial Intelligence into Interactive Systems

Summary

Heer’s focus in this paper is on refocusing AI and machine learning into everyday interactions that assist users in their work rather than trying to replace users. He reiterates many times in the introduction that humans should remain in control while the AI assists them in completing the task, and even brings up the recent Boeing automation mishaps as an example of why human-in-the-loop is so essential to future developments. The author then describes several tools in data formatting, data visualization, and natural language translation that use AI to assist the user by suggesting actions based on their interactions with data, as well as domain-specific languages (DSLs) that can quickly perform actions through code. The review of his work shows that users want more control, not less, and that these tools increase productivity while allowing the user to ultimately make all of the decisions.

Personal Reflection

               I enjoyed this paper as an exploration of the various ways people can employ semantic interaction in interfaces to boost productivity. Furthermore, the explorations of how users can do this without giving up control were remarkable. I hadn't realized that the basic idea behind autocorrect/autocomplete could apply in so many different ways across these domains. However, I did notice that the author mentioned that in certain cases there were too many options for what to do next. I wonder how much ethnographic research needs to go into determining each action that's likely (or even possible) in each case, and what overhead the AI puts on the system.

               I also wonder how these interfaces will shape work in the future. Will humans adapt to these interfaces and essentially create new routines and processes in their work? As autocomplete/correct often creates errors, will we have to adapt to new kinds of errors in these interfaces? At what point does this kind of interaction become a hindrance? I know that, despite the number of times I have to correct it, I wouldn’t give up autocomplete in today’s world.

Questions

  1. What are some AI-assisted everyday interactions that you encounter in specialized programs and applications, i.e., beyond just autocorrect? Do you always utilize these features?
  2. The author took three fairly everyday activities and created new user interfaces with accompanying AI with which to create better tools for human-AI collaboration. What other everyday activities can you think of that you could create a similar tool for?
  3. How would you gather data to create these programs with AI suggestions? What would you do to infer possible routes?
  4. The author mentions expanding these interfaces to (human/human) collaboration. What would have to change in order to support this? Would anything?
  5. DSLs seem to be a somewhat complicated addition to these tools. Why would you want to use these and is it worth learning about the DSL?
  6. Is ceding control to AI always a bad idea? In what areas do you think users should cede more control, and in what areas should they gain back more control?


04/08/20 – Lulwah AlKulaib – Agency

Summary

The paper considers the design of systems that enable rich and adaptive interaction between people and algorithms. The authors attempt to balance the complementary strengths and weaknesses of humans and algorithms while promoting human control and skillful action. They aim to employ AI methods while ensuring that people remain in control, arguing that people should be unconstrained in pursuing complex goals and exercising domain expertise. They share case studies of interactive systems they developed in three fields: data wrangling, exploratory analysis, and natural language translation. Each case study integrates proactive computational support into an interactive system. For each one, they examine the strategy of designing shared representations that augment interactive systems with predictive models of users' capabilities and potential actions, surfaced via interaction mechanisms that enable user review and revision. These models enable automated reasoning about tasks in a human-centered fashion and can adapt over time by observing and learning from user behavior. To improve outcomes and support learning by both people and machines, the authors describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions. They conclude with thoughts on how to better construct and deploy systems that integrate agency and automation via shared representations. They also found that neither automated suggestions nor direct manipulation plays a strictly dominant role; rather, a fluent interleaving of both modalities can enable more productive, yet flexible, work.

Reflection

The paper was very interesting to read, and the case studies presented were thought provoking. They are all based on research that I have read and worked through while learning about natural language processing, and the thought of these systems being suggestive makes me wonder about such work, and about how user-interface toolkits might affect the design and development of models.

I also wonder, as raised in the future work, how to evaluate systems across varied levels of agency and automation. What would the goal be in that evaluation process? Would it differ across machine learning disciplines? The case studies presented in the paper used specific evaluation metrics, and I wonder how those generalize to other models. What other methods could be used for evaluation in the future, and how does one compare two systems when comparing their results is no longer enough?

I believe that this paper sheds some light on how evaluation criteria can be topic specific, and that those criteria will be shared across applications that are relevant to human experience and learning. It is important to pay attention to how such systems promote interpretability, learning, and skill acquisition instead of deskilling workers. It is also essential that we think of appropriate designs that optimize the trade-offs between automated support and human engagement.

Discussion

  • What is your takeaway from this paper?
  • Do you agree that we need better design tools that aid the creation of effective AI-infused interactive systems? Why? Or Why not?
  • What determines a balanced AI-human interaction?
  • When is AI agency/control harmful? When is it useful?
  • Is ensuring that humans stay in control of AI models important? If models were trained by domain experts with domain expertise, why do we mistrust them?


04/07/20 – Sukrit Venkatagiri – CrowdScape: Interactively Visualizing User Behavior and Output

Paper: Jeffrey Rzeszotarski and Aniket Kittur. 2012. CrowdScape: interactively visualizing user behavior and output. In Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST ’12), 55–62. https://doi.org/10.1145/2380116.2380125

Summary:

Crowdsourcing has been used to do intelligent tasks/knowledge work at scale and for a lower price, all online. However, there are many challenges with controlling quality in crowdsourcing. This paper describes how, in prior approaches, quality control was done through algorithms evaluated against gold standards or by looking at worker agreement and behavior. Yet these approaches have many limitations, especially for creative tasks or other tasks that are highly complex in nature. This paper presents a system, called CrowdScape, to support human evaluation of complex crowdsourcing task results through an interactive visualization with a mixed-initiative machine learning back-end. The paper describes features of the system as well as its uses through four very different case studies: first, a translation task from Japanese to English; next, a somewhat unusual task asking workers to pick their favorite color; third, writing about their favorite place; and finally, tagging a video. The paper concludes with a discussion of the findings.

Reflection:

Overall, I really liked the paper and the CrowdScape system, and I found the multiple case studies really interesting. I especially liked the fact that the case studies varied in terms of complexity, creativity, and open-endedness. However, I found the color-picker task a little off-beat and wonder why the authors chose that task. 

I also appreciate that the system is built on top of existing work, e.g., Amazon Mechanical Turk (a necessity) as well as Rzeszotarski and Kittur's Task Fingerprinting system to capture worker behavioral traces. The scenario describing the more general use case was also very clear and concise. The fact that CrowdScape utilizes two diverse data sources, as opposed to just one, is interesting. This makes triangulating the findings easier, as well as observing any discrepancies in the data. More specifically, CrowdScape looks at workers' behavioral traces as well as their output. This allows one to differentiate between workers in terms of their "laziness/eagerness" as well as the actual quality of their output. The system also provides an aggregation of the two feature sets, and all of these are displayed as visualizations, which makes it easy for a requester to view tasks and easily discard or include work.
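
As a rough sketch of what that triangulation could look like on the requester's side (my own illustration with invented column names, not CrowdScape's actual code), one could join the MTurk outputs with the behavioral features and look for mismatches:

    # Hypothetical sketch: join worker outputs with behavioral-trace features
    # and surface workers whose traces suggest near-zero effort.
    import pandas as pd

    outputs = pd.DataFrame({"worker_id": [1, 2, 3],
                            "answer": ["good text", "ok text", "spam"]})
    behavior = pd.DataFrame({"worker_id": [1, 2, 3],
                             "time_spent": [300, 120, 15],
                             "keypresses": [450, 200, 5]})

    merged = outputs.merge(behavior, on="worker_id")
    print(merged[merged["keypresses"] < 10])   # candidates for closer review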

However, I wonder how useful these visualizations might be for tasks such as surveys, or tasks that are less open-ended. Further, although the visualizations are useful, I wonder whether they should be used in conjunction with gold standard datasets, and how useful that combination would be. Although the paper demonstrates the potential uses of the system via case studies, it does not demonstrate whether real users find it useful. Thus, an evaluation with real-world users might help.

Questions:

  1. What do you think about the case study evaluation? Are there ways to improve it? How?
  2. What features of the system would you use as a requester?
  3. What are some drawbacks to the system?


04/09/2020 – Mohannad Al Ameedi – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

In this paper, the author proposes multiple systems that combine the power of both artificial intelligence and human computation to overcome each one's weaknesses. The author argues that automating all tasks can lead to poor results, as a human component is needed to review and revise results to get the best outcome. The author uses autocomplete and spell checkers as examples of how artificial intelligence can offer suggestions that a human can then review, revise, or dismiss. The author proposes different systems that use predictive interaction to help users with tasks that can be partially automated, letting them focus more on the things they care about. One of these systems, Data Wrangler, can be used by data analysts during data preprocessing to help clean up data and save more than 80% of their work. Users set up some data mappings and can accept or reject the system's suggestions. The author also presents a project called Voyager that helps with data visualization for exploratory analysis by suggesting visualization elements. The author suggests using AI to automate repetitive tasks and offer the best suggestions and recommendations, while letting the human decide whether to accept or reject them. This kind of interaction can improve both machine learning results and human interaction.
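
As a loose sketch of the accept-or-reject interaction described above (my own approximation, not the actual Wrangler implementation; the transforms and triggering heuristics are invented), a predictive-interaction tool might propose cleanup steps and let the analyst confirm each one:

    # Hypothetical sketch of a Wrangler-style suggest/accept loop for data cleanup.
    import pandas as pd

    df = pd.DataFrame({"name": [" Ada ", "Grace", None],
                       "year": ["1843", "1952", "n/a"]})

    def suggest_transforms(frame):
        """Propose simple cleanup steps based on what the data looks like."""
        suggestions = []
        if frame["name"].isna().any():
            suggestions.append(("drop rows with a missing name",
                                lambda f: f.dropna(subset=["name"])))
        if frame["name"].str.startswith(" ").fillna(False).any():
            suggestions.append(("strip whitespace in name",
                                lambda f: f.assign(name=f["name"].str.strip())))
        return suggestions

    for label, transform in suggest_transforms(df):
        if input(f"Apply '{label}'? [y/n] ").lower() == "y":   # human stays in control
            df = transform(df)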

Reflection

I found the material presented in the paper to be very interesting. Many ongoing discussions about whether machines can replace humans are addressed in this paper. The author argues that machines can do well with the help of humans, and that the human in the loop will always be necessary.

I also like the idea of the Data Wrangler system, as many data analysts and developers spend considerable time cleaning up data, and most of the steps are repeated regardless of the type of data. Automating these steps will help a lot of people do more effective work and focus on the problem they are trying to solve rather than spending time on things that are not directly related to it.

I agree with the author that humans will always be in the loop, especially for systems that will be used by humans. Advances in AI need humans to annotate or label data in order to work effectively, and also to measure and evaluate the results.

Questions

  • The author mentioned that the Data Wrangler system can be used by data analysts to help with data preprocessing. Do you think this system can also be used by data scientists, since most machine learning and deep learning projects require data cleanup?
  • Can you give other examples of AI-infused interactive systems that can help in different domains, be deployed into a production environment, be used by a large number of users, and scale well with increased load and demand?


04/08/2020 – Sushmethaa Muhundan – Agency plus automation: Designing artificial intelligence into interactive systems

This work explores strategies to balance the roles of agency and automation by designing user interfaces that enable shared representations between AI and humans. The goal is to productively employ AI methods while also ensuring that humans remain in control. Three case studies are discussed: data wrangling, data visualization for exploratory analysis, and natural language translation. Across each, strategies for integrating agency and automation by incorporating predictive models and feedback into interactive applications are explored. In the first case study, an interactive system is proposed that aims to reduce human effort by recommending potential transformations, gaining feedback from the user, and performing the transformations as necessary. This enables the user to focus on tasks that require the application of their domain knowledge and expertise rather than spending time and effort manually performing transformations. A similar interactive system was developed to aid visualization efforts; the aim was to encourage more systematic consideration of the data and also reveal potential quality issues. In the case of natural language translation, a mixed-initiative translation approach was explored.

The paper has a pragmatic view of current AI systems and makes the realistic observation that they are not capable of completely replacing humans. Throughout the paper, there is an emphasis on leveraging the complementary strengths of both the human and the AI, which is a practical stance.

Interesting observations were made in the Data Wrangler project with respect to proactive suggestions. If suggestions were presented initially, before the user had a chance to interact with the system, the feature received negative feedback and was ignored. But if the same suggestions were presented while the user was engaging with the system, even when the suggestions were not related to the user's current task, they were received positively. Users viewed themselves as initiators in the latter scenario and hence felt that they were controlling the system. This observation was fascinating, since it shows that designers of such user interfaces should ensure that their users feel in control and do not feel insecure while using AI systems.

With respect to the second case study, it was reassuring to learn that the inclusion of automated support from the interactive system was able to shift user behavior for the better and helped broaden their understanding of the data. Another positive effect was that the system helped humans combat confirmation bias. This shows that if the interface is designed well, the benefits of AI amplify the results gained when humans apply their domain expertise.

  • The paper deals with designing interactive systems where the complementary strengths of agents and automation systems are leveraged. What could be the potential drawbacks of such systems, if any?
  • How would the findings of this paper be translated in the context of your class project? Is there potential to develop similar interactive systems to improve the user experience of the end-users?
  • Apart from the three case studies presented, what are some other domains where such systems can be developed and deployed?


04/08/2020 – Sushmethaa Muhundan – CrowdScape: Interactively Visualizing User Behavior and Output

This work aims to address quality issues in the context of crowdsourcing and explores strategies to involve humans in the evaluation process via interactive visualizations and mixed-initiative machine learning. CrowdScape is the proposed tool, which aims to ensure quality even in complex or creative settings by leveraging both the end output and workers' behavior patterns to develop insights about performance. CrowdScape is built on top of Mechanical Turk and obtains data from two sources: the MTurk API, for the products of the work done, and Rzeszotarski and Kittur's Task Fingerprinting system, for worker behavioral traces. The tool combines these two data sources into an interactive data visualization platform. With respect to worker behavior, both raw event logs and aggregate worker features are incorporated to provide diverse interactive visualizations. Four specific case studies were discussed, covering tasks related to translation, a color preference survey, writing, and video tagging.
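
As a loose illustration of what turning raw event logs into aggregate worker features might look like (my own sketch; the event names and the actual Task Fingerprinting feature set are assumptions, not taken from the paper):

    # Hypothetical sketch: derive aggregate behavioral features from one
    # worker's raw event log. Event types and fields are invented.
    from collections import Counter

    events = [  # (timestamp in seconds, event type)
        (0.0, "focus"), (2.1, "keypress"), (2.4, "keypress"),
        (9.8, "scroll"), (30.5, "paste"), (41.0, "submit"),
    ]

    counts = Counter(kind for _, kind in events)
    features = {
        "total_time": events[-1][0] - events[0][0],
        "keypresses": counts["keypress"],
        "scrolls": counts["scroll"],
        "pastes": counts["paste"],
    }
    print(features)   # one such row per worker feeds the visualizations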

In the context of creative works and complex tasks where it is extremely difficult to evaluate the task results objectively, I feel that mixed-initiative approaches like the one described in the paper can be effective to gauge the worker’s performance.

I specifically liked the feature mentioned with respect to aggregating features of worker behavioral traces, wherein the user can dynamically query the visualization system to support data analysis. This gives users control over which features are important to them and allows them to focus on those specific behavioral traces, as opposed to being presented with static visualizations, which would have limited impact.

Another interesting feature provided by the system is that it enables users to cluster submissions based on aggregate event features. I feel this would definitely help save time and effort on the user's side and thereby speed up the process. A minimal sketch of that clustering idea is shown below.
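
The sketch is my own, with made-up feature values and k-means standing in for whatever CrowdScape actually uses:

    # Hypothetical sketch: cluster submissions by aggregate event features so a
    # requester can review groups of similar workers instead of one at a time.
    import numpy as np
    from sklearn.cluster import KMeans

    # columns: total_time (s), keypresses, pastes; values are invented
    features = np.array([
        [40.0, 120, 0],
        [35.0, 110, 0],
        [12.0,   4, 3],
        [10.0,   2, 2],
    ])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print(labels)   # e.g. fast, paste-heavy workers land in their own cluster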

In the translation case study, it was interesting to note that one of the signals used to detect a lack of focus was copy-paste keyboard usage. Intuitively, this would suggest that the worker used third-party software for the translation. However, this alone might not be sufficient proof, since it is possible that the worker translated the text locally and was copy-pasting his/her own work. This shows that while user behavior tracking can provide insights, it might not be sufficient to draw conclusions. Hence, coupling it with the output data and comparing and visualizing both would definitely help draw concrete conclusions.
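
A tiny sketch of that coupling (entirely my own; the field names and the 10% threshold are arbitrary assumptions) might flag a translation only when a paste occurred and almost nothing was typed relative to the submitted text:

    # Hypothetical sketch: combine a paste signal with the output itself before
    # judging a translation worker. Pasting alone is not proof of machine
    # translation, so only flag submissions for manual review, never auto-reject.
    def worth_reviewing(events, output_text):
        pasted = any(e["type"] == "paste" for e in events)
        typed_chars = sum(1 for e in events if e["type"] == "keypress")
        return pasted and typed_chars < 0.1 * len(output_text)

    events = [{"type": "paste"}, {"type": "keypress"}]
    print(worth_reviewing(events, "some submitted translation text" * 5))  # True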

  • Apart from the techniques mentioned in the paper, what are some alternate techniques to gauge the quality of crowd workers in the context of complex or creative tasks?
  • Apart from the case studies presented, what are some other domains where such systems can be developed and deployed?
  • Given that the tool relies on workers' behavior patterns, and that these may vary greatly from worker to worker, are there situations in which the proposed tool would fail to produce reliable results with respect to performance and quality?


04/08/20 – Jooyoung Whang – CrowdScape: Interactively Visualizing User Behavior and Output

In this paper, the authors try to help MTurk requesters by providing them with an analysis tool called "CrowdScape." CrowdScape is an ML + visualization tool for viewing and filtering MTurk worker submissions based on the workers' behaviors. The user of the application can threshold on certain behavioral attributes such as time spent or typing delay. The application takes two inputs: worker behavior and results. The behavior input is time-series data of user activity, and the result is what the worker submitted for the MTurk work. The authors also compute similarities between answers and graph them on parallel coordinates. The authors conducted a study by launching four different tasks and recording worker behavior and results, and they conclude that their approach is useful.
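
As a rough sketch of those two ideas, thresholding on a behavioral attribute and plotting workers on parallel coordinates (my own illustration with invented values, using pandas' built-in parallel-coordinates plot rather than CrowdScape's actual front-end):

    # Hypothetical sketch: filter submissions by a behavioral threshold, then
    # draw the surviving workers on a parallel-coordinates plot.
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    workers = pd.DataFrame({
        "time_spent":   [310, 45, 280, 30],        # seconds, invented
        "typing_delay": [0.40, 0.10, 0.50, 0.05],
        "rating":       ["good", "bad", "good", "bad"],
    })
    kept = workers[workers["time_spent"] > 60]     # simple behavioral threshold
    parallel_coordinates(kept, class_column="rating",
                         cols=["time_spent", "typing_delay"])
    plt.show()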

This paper's approach of integrating user behavior and results to filter good output was interesting, although I think the system needs to overcome one problem to be effective. The problem lies in the area of ethics. The authors explicitly stated that they obtained consent from their pool of workers to collect behavioral data. However, some MTurk requesters may decide not to do so, with ill intentions. This could result in intrusion into private information and could even lead to theft. On the other hand, upon giving consent, the worker becomes aware of being monitored. This could result in unnatural behavior, which is undesirable for system testing.

I thought the individual visualized graphs and figures were effective for understanding and filtering by user behavior. However, the entire CrowdScape interface looked a bit overpacked with information. I think a small feature to show or hide some of the graphs would be desirable. The same problem existed in another information exploration system from a project that I worked on. In my experience, an effective solution was to provide a set of menus that hierarchically sorted the attributes.

These are the questions that I had while reading the paper:

1. A big purpose of CrowdScape is that it can be used to filter and retrieve a subset of the results (those thought to be high quality). What other ways could this system be used? For example, I think it could be used for rejecting undesired results. Suppose you needed 1000 results and you launched 1000 HITs. You know you will get some ill-quality results, but since there are so many submissions, it would take forever to filter them by eye. CrowdScape could help accelerate that process.

2. Do you think you can use CrowdScape for your project? If so, how would you use it? CrowdScape is useful if you, the researcher, are the endpoint of the MTurk task (i.e., the result is ultimately used by you). My project uses the results from MTurk in a systematic way without them ever reaching me, so I don't think I'll use CrowdScape.

3. Do you think the graphs available in CrowdScape are enough? What other features would you want? For one, I'd love to have a boxplot for the user behavior attributes.


04/08/20 – Jooyoung Whang – Agency plus automation: Designing artificial intelligence into interactive systems

This paper seeks to investigate methods to achieve AI + IA, that is, enhancing human performance using automated methods rather than completely replacing it. The author notes that effective automation should, first, bring significant value; second, be unobtrusive; third, not require precise user input; and finally, adapt. The author takes these points into account and introduces three interactive systems that he built. All of these systems use machine computation to handle the initial or small repetitive tasks and rely on human computation to make corrections and improve quality. They are all collaborative systems where AI and humans work together to boost each other's performance: the AI part of the system tries to predict user intentions while the human part drives the work.

This paper reminded me of Smart-Built Environments (SBE), a term I learned in a Virtual Environments class. SBE is an environment where computing is seamlessly integrated into the environment and interaction with it is very natural. It is capable of “smartly” providing appropriate services to humans in a non-intrusive way. For example, a system where the light automatically lights up upon a person entering a room is a smart feature. I felt that this paper was trying to build something similar in a desktop environment. One core difference with SBEs is that SBE also tries to tackle immersion and presence (which are terms frequently used for evaluating virtual environments). I wonder if the author knows about SBEs or got his project ideas from SBEs.

While reading the paper, I wasn't sure if the author handled the "unobtrusive" part effectively. One of the introduced systems, Wrangler, is an assistive tool for preprocessing data. It tries to predict user intentions upon observing certain user behavior and recommends available data transformations in a side panel. I believe this approach was meant to mimic Google's query auto-completion feature. However, I don't think it works as well as Google's auto-completion: Google's suggestions appear right below where the user is typing, whereas Wrangler presents its suggestions off in a side panel. This requires users to avert their eyes from the point of the previous interaction, and that is obtrusive.

These are the questions that I had while reading the paper:

1. Do you know any other systems that try to seamlessly integrate AI and human tasks? Is that system effective? How so?

2. The author of this paper mostly uses AI to predict user intentions and process repetitive tasks. What other capabilities of AI would be available for natural integration with human tasks? What other tasks that are hard for humans but that machines excel at could be integrated?

3. Do you agree that "the best kind of system is one where the user does not even know he or she is using it"? Would there ever be a case where it is crucial that the user feels the presence of the system as a separate entity? This thought came to me because systems can (and ultimately do) fail at some point. If none of the users understand how the system works, wouldn't that be a problem?
