04/15/20 – Akshita Jha – Believe It or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:
This paper discusses fact-checking, i.e., estimating the credibility of a given statement, which is extremely pertinent in today’s climate. As generating and finding information becomes ever simpler, judging the trustworthiness of that information becomes ever more challenging. Researchers have attempted to address this issue with a plethora of AI-based fact-checking tools. However, a majority of these are based on neural network models that provide no description whatsoever of how they arrived at a particular verdict. How this lack of transparency, combined with a poor user interface, affects the usability or even the perceived authenticity of such tools has not yet been addressed. Opaque tools tend to breed mistrust in the minds of users, which ends up being counter-productive to the original goal. This paper tackles this important issue head-on by designing a transparent system that combines the scope and scalability of an AI system with inputs from the user. While this transparency increases the user’s trust in the system, the user’s input also improves the predictive accuracy of the system. An important side benefit is that this user interaction addresses the deeper issue of information illiteracy in our society by cultivating the skill of questioning the veracity of a claim and the reliability of a source. The researchers build a model that uses NLP tools to aggregate and assess various articles related to a given statement and assign a stance to each article. The user can modify the weights associated with each of these stances and influence the verdict the model generates on the veracity of the claim. They further conduct three studies in which they judge the usability, the effectiveness, and the flaws of this model.
They compare participants’ assessments of multiple statements before and after exposure to the model’s prediction. They also verify whether interaction with the model provides additional support compared to simply displaying the result. A third task estimates whether gamification of this task has any effect. While the third task seems inconclusive, the first two tasks lead the researchers to conclude that interaction with the system increases the user’s trust in the model’s results, even when the model’s prediction is false. However, this interaction also helps improve the prediction of the model for the claims tested herein.

Reflections:
The paper brings up an interesting point about the transparency of the model. When people talk about an AI system, what normally comes to mind is a black box that takes in a statement, assesses it, and returns a binary answer. What is interesting is that the ability to interact with the model’s predictions enables users to improve their own judgment and even compensate for the shortcomings of the model. The user-interaction aspect of AI systems has been grossly overlooked. While this model has some clear, undeniable benefits, human-modifiable fact-checking could also lead to a very dangerous issue. Using the slider to modify the reputation of a given source can lead users to introduce their own biases into the system, effectively creating echo chambers of their own views. This could skew the verdict of the ML system and thus reinforce the user’s possible prejudices. I would suggest that the model assign x% weightage to its own assessment of the source and (100-x)% to the user’s assessment. This would be a step toward ensuring that the user’s prejudices do not suppress the model’s judgment completely. However, the fact that this interaction inadvertently helps a user learn how to tackle misinformation and check the reputation of sources is, without doubt, highly laudable. This is definitely worth considering in future models along these lines. From the human perspective, the Bayesian or linear approaches adopted by these models make them very intuitive to understand. However, one must not underestimate how much more powerful neural networks can be at aggregating relevant information and assessing its quality. A simple linear approach is bound to have its shortcomings, and hence it would be interesting to see a model that uses the power of neural networks in addition to these techniques, which help with the transparency aspect.
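To make the x% / (100-x)% suggestion concrete, here is a minimal sketch of how such a blend might look. The function name, the argument names, and the assumption that reputations are plain numeric scores are all hypothetical; none of this comes from the paper’s implementation.

```python
def blend_reputation(model_score: float, user_score: float, x: float) -> float:
    """Blend the model's and the user's reputation scores for one source.

    x is the percentage (0 to 100) of the weight kept by the model;
    the remaining (100 - x) percent goes to the user's slider value.
    All names and the scoring scale are illustrative assumptions.
    """
    if not 0.0 <= x <= 100.0:
        raise ValueError("x must be between 0 and 100")
    return (x * model_score + (100.0 - x) * user_score) / 100.0
```

With x = 100 the slider has no effect at all, and with x = 0 the source’s reputation is handed entirely to the user, so an intermediate x keeps the user involved without letting the slider fully override the model.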
On a side note, it would have been useful to have more information on the NLP and ML methods used. The related work on these models is also insufficient to provide a clear background on the existing techniques. One glaring issue with the paper is how the authors sidestep the statistical insignificance of their result in task #2. They mention that the p-value is just above the threshold. However, statistics teaches us that the exact value is not what matters; what matters is the threshold set before conducting the experiment. Thus, the statement “..slightly larger than the 0.05..” is simply careless.

Questions:
1. Why is there no control group in task 2 and task 3 as well?
2. What are your general thoughts on the paper? Do you approve of the methodology?

4/15/2020 – Nurendra Choudhary – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

Summary

In this paper, the authors study human risk perception of AI systems and the mental models behind it. To analyze risk perception, they study 175 individuals, both individually and comparatively, while also accounting for psychological factors. They also analyze the factors that lead to people’s conceptions or misconceptions in risk assessment. Their analysis shows that technologists or AI experts consider the studied risks as posing more threat to society than non-experts do. Such differences, according to the authors, can be used to inform system design and decision-making.

However, most of the subjects agree that such system risks (identity theft, personal filter bubbles) were not voluntarily introduced into the system but were a consequence or side effect of integrating some valuable tools or services. The paper also discusses risk-sensitive designs that need to be applied when the difference between public and expert opinion on risk is high. They emphasize integrating risk sensitivity earlier in the design process, rather than the current practice where it is an afterthought for a deployed system.

Reflection

Given the recent use of AI technologies in everyday life (Tesla cars, Google Search, Amazon Marketplace, etc.), this study is very necessary. The risks do not just involve test subjects but a much larger populace that is unable to comprehend these technologies that intrude on their daily lives. This leaves them vulnerable to exploitation. Several cases of identity theft or spam fraud have already claimed victims due to lack of awareness. Hence, it is crucial to analyze how much information can reduce such cases. Additionally, a system should provide a comprehensive analysis of its limitations and possible misuse.

Google Assistant records all conversations to detect its initiation phrase, “OK Google”. It relies on the fact that the recording is a stream and no data is stored except a segment. However, a listener could extract the streams and use another program to integrate them into comprehensible knowledge that can be exploited. Users are confident in the system because of the speech segmentation. However, an expert can see through this ruse and imagine the listener scenario just from knowing that such systems exist. This knowledge is not entirely expert-oriented and can be transferred to users, thus preventing exploitation.

Questions

  1. Think about systems that do not rely on or have access to user information (e.g. Google Translate, Duckduckgo). What information can they still get from users? Can this be utilized in an unfair manner? Would these be risk-sensitive features? If so, how should the system design change?
  2. Unethical hackers generally work in networks and are able to adapt to security reinforcements. Can security reinforcements utilize risk-sensitive designs to overcome hacker adaptability? What such changes could be thought of in the current system?
  3. Experts tend to show more caution towards technologies. What amount of knowledge introduces such caution? Can this amount be conveyed to all the users of a particular product? Would this knowledge help risk-sensitivity?
  4. Do you think the individuals selected for the task are a representative set? They utilized MTurk for their study. Isn’t there an inherent presumption of being comfortable with computers? How could this bias the study? Is the bias significant?

Word Count: 542

4/15/2020 – Nurendra Choudhary – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors study the human-side or human trust in automated fact checking systems. In their experiments, they show that human beings were able to improve their accuracy when subjected to correct model predictions. However, the human judgement is also shown to degrade in case of incorrect model predictions. This establishes the trustful relationship between humans and their fact-checking models. Additionally, the authors find that humans interacting with the AI system improve their predictions significantly, suggesting transparency of models as a key aspect to human-AI interaction.

The authors provide a novel mixed-initiative framework to integrate human intelligence with fact checking models. Also, they analyze the benefits and drawbacks of such integrated systems.  

The authors also point out several limitations in their approach, such as non-American representation among MTurk workers and bias towards AI predictions. Furthermore, they point out the system’s potential effectiveness in mediating debates and conveying real-time fact checks in an argument setting. Also, interaction with the tool could serve as a platform for identifying the reasons for differences in opinion.

Reflection

The paper is very timely in the sense that fake news has become a widely used tool for political and social gain. People unfamiliar with the power of the internet tend to believe unreliable sources and form very strong opinions based on them. Such a tool can be extremely powerful in eliminating such controversies. The idea of analyzing the human role in AI fact checkers is also extremely important. AI fact checkers lack perfect accuracy and, given the problem, perfect accuracy is a requirement. Hence, the role of human beings in the system should not be underestimated. However, human mental models tend to trust the system after correct predictions and do not efficiently correct themselves after incorrect predictions. This becomes an inherent limitation for these AI systems. Thus, the paper’s idea of introducing transparency is extremely appropriate and necessary. Given more insight into the mechanism of fact checkers, human beings would be able to better optimize their mental models, thus improving the performance of the collaborative human-AI team.

AI systems can analyze huge repositories of information and humans can perform more detailed analysis. In that sense, fact-checking human-AI team utilizes the most important capabilities of both human and AI. However, as pointed out in the paper, humans tend to overlook their own capabilities and rely on model prediction. This could be due to human trust after some correct predictions. Given the plethora of existing information, it would be really inconvenient for humans to assess it all. Hence, I believe these initial trials are extremely important to build the correct amount of trust and expectations.

Questions

  1. Can a fact-checker learn from its mistakes pointed out by humans? How would that work? Would that make the fact-checker dynamic? If so, what is the extent of this change and how would the human mental models adapt effectively to such changes?
  2. Can you suggest a better way for humans and models to interact? Also, for what other tasks can such interaction be effective?
  3. As pointed out in the paper, humans tend to overlook their own capabilities and rely on model prediction. What is a reason for this? Is there a way to make the collaboration more effective?
  4. Here, the assumption is that human beings are the veto authority. Can there be a case when this is not true? Is it always right to trust the judgement of humans (in this case underpaid crowd workers)?

Word Count: 588

04/15/2020 – Nan LI – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

This paper introduces a mixed-initiative model that allows humans and machines to check the authenticity of a claim cooperatively. The fact-checking model also follows existing interface design principles that support the understandability and actionability of the interface. The main objective of the design is to provide transparency, support for integrating user knowledge, and an explanation of system uncertainty. To prioritize transparency over raw predictive performance, the authors use a more transparent prediction model, a linear model, instead of deep neural networks. Further, users are allowed to change the reputations and stances behind the system’s prediction. To evaluate how well the system helps users assess the factuality of claims, the authors conducted three user studies with MTurk workers. The study results indicate that users might over-trust the system. The system’s prediction helps the user when the claim is predicted correctly. At the same time, it degrades human performance when the system’s prediction is wrong due to biases implicit in the training data.

Reflection

I think the design of this approach is valuable because it does not blindly pursue predictive accuracy but also considers the transparency, understandability, and actionability of the interface. These efforts improve the user experience, since users know more about how the system works and thus trust it more. On the other hand, this might be why users over-trust the system, as indicated by the paper’s experimental results. Still, I think the design of the approach is a nice try.

However, I don’t see how this system can help users. Although the design is very user friendly, it does not leverage human ability; instead, it just allows humans to participate in the fact-checking process. Even though the design of the fact-checking process is reasonable and understandable for users, it demands too much mental work from them, such as reading a lot of information, thinking, and reasoning. This is a reasonable process, but it is too burdensome.

Moreover, based on the figures in the paper, I don’t think the system helps the user determine the authenticity of a claim, and I believe the experimental results also show this. Further, I found that the accuracy of the user’s judgment depends more on the type of claim. Accuracy differs significantly across claims; this effect is even larger than the effect of the system.

It is also interesting to see the users’ feedback after they complete the task. It seems one of the users shares my opinion about the amount of information that needs to be read. The most striking feedback is that users would be confused if they had more options. I think this only happens when they are not sure about the correctness and they have the right to change the system output. Finally, we can also see from the comments that when the system agrees with users, users are more sure of their answer. However, if the system’s prediction differs from the user’s judgment, this seriously affects the accuracy of the user’s judgment. This is understandable because when someone questions your decision, no matter how confident you are, you will waver a little, let alone a machine that has 70% accuracy.

Questions:

  1. Do you think the system could really help humans to detect the factuality of claims? Why or why not?
  2. When the authors designed the model, to achieve the goal of transparency, they gave up a higher-accuracy prediction model in favor of linear models. What do you think of this? Which one is more important for your design: transparency or accuracy?
  3. What do you think of the interface design? Does it provide too much information to users? Do you like the design or not?

Word Count: 638

04/15/2020 – Nan LI – ALGORITHMIC ACCOUNTABILITY Journalistic investigation of computational power structures

Summary:

In this paper, the author first presents the forms of algorithmic power: prioritization, classification, association, and filtering. Based on this description, the author concludes that a significant number of people are influenced by the outcomes of algorithms. Thus, the author makes the point that it is important to interpret the output of algorithms when making higher-level decisions. Next, the author examines the possibilities and weaknesses of requiring algorithmic transparency. The paper then introduces an alternative method, reverse engineering. In this work, journalists combined interviews, document reviews, and reverse engineering analysis to shed light on how algorithms function. The author presents five case studies of journalistic investigations, along with the challenges and opportunities in doing algorithmic accountability work. The primary process of the inquiry includes identifying a newsworthy algorithm, sampling the input-output relationships to study the correlations, and finally seeking a story. Finally, the author provides a series of suggestions regarding transparency policy for algorithms.

Reflection:

First, algorithmic accountability is not a new topic. Algorithms have penetrated our lives and are applied in all walks of life, not only for entertainment, learning, and daily tools, but also with significant impact on security, privacy, and even the distribution of social resources. People are starting to ask: can algorithms be trusted? To what extent are they trustworthy? I have also seen many examples of guessing and analyzing the internal structure of such a black box. I want to share one of them.

The reverse engineering approach, especially the process of sampling the input-output relationships of algorithms to study the correlations, reminds me of a news report that identified bias in an algorithm. That algorithm was designed for individual risk assessment, that is, predicting the likelihood of each defendant committing a future crime. It has become increasingly common in courtrooms across the nation, but in 2014 concerns were raised that the risk scores might be injecting bias into the courts. The way people found the bias in the algorithm is the same as reverse engineering. Here are their findings from that report (I also put the link below):

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.

Based on this outcome, it seems that reverse engineering is essential and efficient. I think this is a better way to examine algorithmic accountability than transparency. As mentioned in the article, leaving aside the trade secrets problem, disclosing the source code of algorithms might be helpful for specialists but does not improve the user experience, since users may not be able to make meaningful choices given their lack of expertise. Thus, identifying the issues with an algorithm, instead of focusing on its implementation, is more efficient in encouraging designers to perfect the algorithm.

Questions:

  1. Do you think the algorithm is trustworthy? How much confidence do you have in an algorithm?
  2. What do you think about transparency? How transparent do you think the algorithm should be?
  3. What do you think of reverse engineering? Does it work? Do you have any other examples regarding this approach?

Link: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Word Count: 544

04/08/20 – Fanglan Chen – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

Endert et al.’s paper “The State of the Art in Integrating Machine Learning into Visual Analytics” surveys recent state-of-the-art models that integrate machine learning into visual analytics and highlights the advances achieved at the intersection of the two fields. In the data-driven era, how to make sense of data and how to facilitate a wider understanding of data attract the interest of researchers in various domains. It is challenging to discover knowledge from data while delivering reliable and interpretable results. Previous studies suggest that machine learning and visual analytics have complementary strengths and weaknesses, and many works explore the possibility of combining the two to develop interactive data visualizations that promote sensemaking and analytical reasoning. This paper presents a survey of the achievements made by recent state-of-the-art models. It also summarizes opportunities and challenges in boosting the synergy between machine learning and visual analytics as future research directions.

Reflection

Overall, this paper presents a thorough survey of the progress that has been made by highlighting and synthesizing select research advances. Recent advances in deep learning bring new challenges and opportunities at the intersection of machine learning and visual analytics. We need to be aware that designing a highly accurate and efficient deep learning model is an iterative and progressive process of training, evaluation, and refinement, which typically relies on a time-consuming trial-and-error procedure where the parameters and the model structures are adjusted based on user expertise. Visualization researchers are making initial attempts to visually illustrate model behaviors intuitively and debug the training processes of widely used deep learning models such as CNNs and RNNs. However, little effort has been made to tightly integrate state-of-the-art deep learning models with interactive visualizations to maximize the value of both. There is great potential in integrating deep learning into visual analytics for a better understanding of current practices.

As we know, the training of deep learning models requires a lot of data. However, sometimes well-labeled data is very expensive to obtain. The injection of a small number of user inputs into the models can potentially solve these problems through a visual analytics system. In real-world applications, a method is impractical if each specific task requires its own separate large-scale collection of training examples. To close the gap between academic research outputs and real-world requirements, it is necessary to reduce the sizes of required training sets by leveraging prior knowledge obtained from previously trained models in similar categories, as well as domain experts. Few-shot learning and zero-shot learning are two of the unsolved problems in the current practice of training deep learning models, which provide a possibility to incorporate prior knowledge on objects into a “prior” probability density function. That is, those models trained using given data and their labels can usually solve only pre-defined problems for which they were originally trained.

Discussion

I think the following questions are worthy of further discussion.

  • What other challenges or opportunities can you think of for a framework that incorporates machine learning and visual analytics?
  • How can we best leverage the advantages of machine learning and visual analytics in a complementary way?
  • Do you plan to utilize a framework to incorporate machine learning and visual analytics in your course project? If yes, how do you plan to approach it?
  • Are there any applications that we access in daily life you can think of as good examples that integrate machine learning into visual analytics?

04/08/20 – Fanglan Chen – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

Rzeszotarski and Kittur’s paper “CrowdScape: Interactively Visualizing User Behavior and Output” explores the research question of how to unify different quality control approaches to improve the quality of work done by crowd workers. With the emergence of crowdsourcing platforms, many tasks can be accomplished quickly by recruiting crowd workers to collaborate in parallel. However, quality control is among the challenges faced by the crowdsourcing paradigm. Previous works focus on designing algorithmic quality control approaches based on either worker outputs or worker behavior, but neither approach is effective for complex or creative tasks. To fill that research gap, the authors develop CrowdScape, a system that leverages interactive visualization and mixed-initiative machine learning to support the human evaluation of complex crowd work. Through experiments on a variety of tasks, the authors show that combining information about worker behavior with worker outputs has the potential to help users better understand the crowd and identify reliable workers and outputs.

Reflection

This paper conducts an interesting study by exploring the relationship between the outputs and behavior patterns of crowd workers to achieve better quality control in complex and creative tasks. The proposed method provides a unique angle on quality control and has wide potential usage in crowdsourced tasks. Although there is a strong relationship between the final outputs and the behavior patterns, I feel the design of CrowdScape relies too heavily on the behavior analysis of crowd workers. In many situations, good behavior can lead to good results, but that cannot be guaranteed. From my understanding, behavior analysis is more suitable as a screening mechanism in certain tasks. For example, in the video tagging task, workers who have gone through the whole video are more likely to provide accurate tagging. In this case, behavior such as watching is a necessary condition, not a sufficient one. The group of workers who finish watching the videos may still disagree on the tagging output. A different quality control mechanism is still needed in this round. In creative and open tasks, the behavior patterns are even more difficult to capture. By analyzing behavior through metrics such as time spent on the task, we cannot directly connect the analysis with a measurement of creativity.

In addition, I think the notion of quality of a crowdsourced task discussed in the paper is comparatively narrow. We need to be aware that quality control on crowdsourcing platforms is multifaceted: it depends on the workers’ knowledge of the specific task, the quality of the processes that govern the creation of tasks, the recruiting process for workers, the coordination of subtasks such as reviewing intermediate outputs, aggregating individual contributions, and so forth. A more comprehensive quality control cycle needs to take the following aspects into consideration: (1) a quality model that clearly defines the dimensions and attributes used to control quality in crowdsourcing tasks; (2) assessment metrics that can be used to measure the values of the attributes identified by the quality model; and (3) assurance of quality, which requires a set of actions that aim to achieve expected levels of quality. To prevent low quality, it is important to understand how to design for quality and how to intervene if quality drops below expectations on crowdsourcing platforms.

Discussion

I think the following questions are worthy of further discussion.

  • Can you think of some other tasks that may benefit from the proposed quality control method?
  • Do you think the proposed method can perform good quality control on complex and creative tasks, as the paper suggests? Why or why not?
  • Do you think that an analysis based on worker behavior can help determine the quality of the work done by crowd workers? Why or why not?
  • Which do you think is more useful when tracing worker behavior: informing workers beforehand or tracing without advance notice? Can you think of some potential ethical issues?

04/07/20 – Sukrit Venkatagiri – CrowdScape: Interactively Visualizing User Behavior and Output

Paper: Jeffrey Rzeszotarski and Aniket Kittur. 2012. CrowdScape: interactively visualizing user behavior and output. In Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST ’12), 55–62. https://doi.org/10.1145/2380116.2380125

Summary:

Crowdsourcing has been used to do intelligent tasks and knowledge work at scale and at a lower price, all online. However, there are many challenges with controlling quality in crowdsourcing. This paper talks about how, in prior approaches, quality control was done through algorithms evaluated against a gold standard or by looking at worker agreement and behavior. Yet these approaches have many limitations, especially for creative tasks or other tasks that are highly complex in nature. This paper presents a system, called CrowdScape, to support manual or human evaluation of complex crowdsourcing task results through a visualization that is interactive and has a mixed-initiative machine learning back-end. The paper describes features of the system as well as its uses through four very different case studies. The first was a translation task from Japanese to English. The next one was a little unique, asking workers to pick their favorite color. The third asked workers to write about their favorite place, and the last was tagging a video. Finally, the paper concludes with a discussion of the findings.

Reflection:

Overall, I really liked the paper and the CrowdScape system, and I found the multiple case studies really interesting. I especially liked the fact that the case studies varied in terms of complexity, creativity, and open-endedness. However, I found the color-picker task a little off-beat and wonder why the authors chose that task. 

I also appreciate that the system is built on top of existing work, e.g. Amazon Mechanical Turk (a necessity), as well as Rzeszotarski and Kittur’s Task Fingerprinting system to capture worker behavioral traces. The scenario describing the more general use case was also very clear and concise. The fact that the system, CrowdScape, also utilizes two diverse data sources, as opposed to just one, is interesting. This makes it easier to triangulate the findings and to observe any discrepancies in the data. More specifically, the CrowdScape system looks at workers’ behavioral traces as well as their output. This allows one to differentiate between workers in terms of their “laziness/eagerness” as well as the actual quality of the output. The system also provides an aggregation of the two features, and all of these are displayed as visualizations, which makes it easy for a requester to view tasks and discard or include work.

However, I wonder how useful these visualizations might be for tasks such as surveys, or tasks that are less open-ended. Further, although the visualizations are useful, I wonder if they should be used in conjunction with gold standard datasets or not, and how useful that combination would be. Although the paper demonstrates the potential uses of the system via case studies, it does not demonstrate whether real users say it is useful. Thus, an evaluation by real-world users might help.

Questions:

  1. What do you think about the case study evaluation? Are there ways to improve it? How?
  2. What features of the system would you use as a requester?
  3. What are some drawbacks to the system?


04/08/20 – Lulwah AlKulaib-Agency

Summary

The paper considers the design of systems that enable rich and adaptive interaction between people and algorithms. The authors attempt to balance the complementary strengths and weaknesses of humans and algorithms while promoting human control and skillful action. They aim to employ AI methods while ensuring that people remain in control, holding that people should be unconstrained in pursuing complex goals and exercising domain expertise. They share case studies of interactive systems that they developed in three fields (data wrangling, exploratory analysis, and natural language translation), each of which integrates proactive computational support into an interactive system. For each case study, they examine the strategy of designing shared representations that augment interactive systems with predictive models of users’ capabilities and potential actions, surfaced via interaction mechanisms that enable user review and revision. These models enable automated reasoning about tasks in a human-centered fashion and can adapt over time by observing and learning from user behavior. To improve outcomes and support learning by both people and machines, they describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions. They conclude with how they could better construct and deploy systems that integrate agency and automation via shared representations. They also report that neither automated suggestions nor direct manipulation plays a strictly dominant role, but that a fluent interleaving of both modalities can enable more productive, yet flexible, work.

Reflection

The paper was very interesting to read. The case studies presented were thought-provoking. They are all based on research that I have read and gone through while learning about natural language processing, and the thought of them being suggestive makes me wonder about such work and about how user-interface toolkits might affect the design and development of models.

I also wonder, as raised in the future work, how to evaluate systems across varied levels of agency and automation. What would the goal be in that evaluation process? Would it differ across machine learning disciplines? The case studies presented in the paper used specific evaluation metrics, and I wonder how those generalize to other models. What other methods could be used for evaluation in the future, and how does one compare two systems when comparing their results is no longer enough?

I believe that this paper sheds some light on how evaluation criteria can be topic-specific, and on how those criteria will be shared across applications that are relevant to human experience in learning. It is important to pay attention to how systems promote interpretability, learning, and skill acquisition instead of deskilling workers. It is also essential that we think of appropriate designs that would optimize trade-offs between automated support and human engagement.

Discussion

  • What is your takeaway from this paper?
  • Do you agree that we need better design tools that aid the creation of effective AI-infused interactive systems? Why or why not?
  • What determines a balanced AI – Human interaction?
  • When is AI agency/control harmful? When is it useful?
  • Is ensuring that humans remain in control of AI models important? If models were trained by domain experts and with domain expertise, then why do we mistrust them?


4/8/20 – Lee Lisle – Agency Plus Automation: Designing Artificial Intelligence into Interactive Systems

Summary

In this paper, Heer argues for refocusing AI and machine learning on everyday interactions that assist users in their work rather than trying to replace them. He reiterates many times in the introduction that humans should remain in control while the AI assists them in completing the task, and even brings up the recent Boeing automation mishaps as an example of why human-in-the-loop design is so essential to future developments. The author then describes several tools in data formatting, data visualization, and natural language translation that use AI to assist the user by suggesting actions based on their interactions with data, as well as domain-specific languages (DSLs) that can quickly perform actions through code. The review of his work shows that users want more control, not less, and that these tools increase productivity while allowing the user to ultimately make all of the decisions.

Personal Reflection

I enjoyed this paper as an exploration of various ways people can employ semantic interaction in interfaces to boost productivity. Furthermore, the explorations of how users can do this without giving up control were remarkable. I hadn’t realized that the basic idea behind autocorrect/autocomplete could apply in so many different ways in these domains. However, I did notice that the author mentioned that in certain cases there were too many options for what to do next. I wonder how much ethnographic research needs to go into determining each action that’s likely (or even possible) in each case and what overhead the AI puts on the system.

I also wonder how these interfaces will shape work in the future. Will humans adapt to these interfaces and essentially create new routines and processes in their work? As autocomplete/correct often creates errors, will we have to adapt to new kinds of errors in these interfaces? At what point does this kind of interaction become a hindrance? I know that, despite the number of times I have to correct it, I wouldn’t give up autocomplete in today’s world.

Questions

  1. What are some everyday AI-assisted interactions, beyond just autocorrect, that you encounter in specialized programs and applications? Do you always utilize these features?
  2. The author took three fairly everyday activities and created new user interfaces with accompanying AI with which to create better tools for human-AI collaboration. What other everyday activities can you think of that you could create a similar tool for?
  3. How would you gather data to create these programs with AI suggestions? What would you do to infer possible routes?
  4. The author mentions expanding these interfaces to (human/human) collaboration. What would have to change in order to support this? Would anything?
  5. DSLs seem to be a somewhat complicated addition to these tools. Why would you want to use these and is it worth learning about the DSL?
  6. Is ceding control to AI always a bad idea? In what areas do you think users should cede more control, and in what areas should they gain control back?
