02/26/20 – Vikram Mohanty – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

February 25, 2020 Vikram Mohanty 1 Comment

Authors: Jonathan Dodge, Q. Vera Liao, Yunfeng Zhang, Rachel K. E. Bellamy, Casey Dugan

Summary

This paper discusses how different types of programmatically generated explanations can impact people’s fairness judgments of ML systems. The authors conduct studies with Mechanical Turk workers by showing them profiles from a recidivism dataset and the explanations for a classifier’s decision. Findings from the paper show that certain explanations can enhance people’s confidence in the fairness of the algorithm, and individual differences, including prior positions and judgment criteria of algorithmic fairness, impact how people react to different styles of explanation.

Reflection

For the sake of the study, the participants were shown only one type of explanation. While that worked for the purpose of this study, there is value in seeing the global and local explanations together. For e.g. the input-influence explanations can highlight the features that is more/less likely to re-offend, and allowing the user to dig deeper into the features by showing a local explanation can help in forming more clarity. There is some scope of building interactive platforms with the “overview first, details on demand” philosophy. It is, therefore, interesting to see the paper discuss about the potentials of a human-in-the-loop workflow.

I agree with the paper that a focus on data oriented explanation has the unintended consequence of shifting blame from the algorithms, which can slow down the “healing process” from the biases we interact with when we use these systems. Re-assessing the “how” explanations i.e. how the decisions were made is the right approach. The Effect of Population and “Structural” Biases on Social Media-based Algorithms – A Case Study in Geolocation Inference Across the Urban-Rural Spectrum by Johnson et al. illustrates how bias can be attributed to the design of algorithms themselves rather than population biases in the underlying data sources.

The paper makes an interesting contribution regarding the participants’ prior beliefs and positions and how that impacts the way they perceive these judgments. In my opinion, as a system developer, it seems like a good option to take a position (obviously, being informed and depends on the task) and advocate for normative explanations, rather than appeasing everyone and reinforcing meaningless biases which could have been avoided otherwise.

Questions

Based on Figure 1, what other explanations would you suggest? If you were to pick 2 explanations, which 2 would you pick and why?
If you were to design a human-in-the-loop workflow, what sort of input would you seek from the user? Can you outline some high-level feedback data points for a dummy case?
Would normative explanations frustrate you if your beliefs didn’t align with the explanations (even though the explanations make perfect sense)? Would you adapt to the explanations? (PS Read about the backfire offer here: https://youarenotsosmart.com/2011/06/10/the-backfire-effect/)

02/26/2020 – Sukrit Venkatagiri – Interpreting Interpretability

February 25, 2020 Sukrit Venkatagiri 1 Comment

Paper: Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning. In CHI 2020, 13.

Summary: There have been a number of tools developed to aid in increasing the interpretability of ML models, which are used in a wide variety of applications today. However, very few of these tools have been studied with a consideration of the context of use and evaluated by actual users. This paper presents a user-centered evaluation of two ML interpretability tools using a combination of interviews, contextual inquiry, and a large-scale survey with data scientists.

From the interviews, they found six key themes: missing values, temporal changes in the data, duplicate data masked as unique, correlated features, adhoc categorization, and difficulty of trying to debug or identify potential improvements. From the contextual inquiry with a glass-box model (GAM) and a post-hoc explanation technique (SHAP), they found a misalignment between data scientists’ understanding of the tools and their intended use. And finally, from the surveys, they found that participants’ mental models differed greatly, and that their interpretations of these interpretability tools also varied on multiple axes. The paper concludes with a discussion on bridging HCI and ML communities and designing more interactive interpretability tools.

Reflection:

Overall, I really liked the paper and it provided a nuanced as well as broad overview of data scientists’ expectations and interpretations of interpretability tools. I especially appreciate the multi-stage, mixed-methods approach that is used in the paper. In addition, I commend the authors for providing access to their semi-structured interview guide, as well as other study materials, and that they had pre-registered their study. I believe other researchers should strive to be this transparent in their research as well.

More specifically, it is interesting that the paper first leveraged a small pilot study to inform the design of a more in-depth “contextual inquiry” and a large-scale study. However, I do not believe the methods that are used for the “contextual inquiry” to be a true contextual inquiry, rather, it is more like a user study involving semi-structured interview. This is especially true since many of the participants were not familiar with the interpretability tools used in the study, which means that it was not their actual context of use/work.

I am also unsure how realistic the survey is, in terms of mimicking what someone would actually do, and appreciate that the authors acknowledge the same in the limitations section. A minor concern is also the 7-point scale that is used in the survey that ranges from “not at all” to “extremely,” which does not follow standard survey science practices.

I wonder what would happen if the participants were a) nudged to not take the visualizations at face value or to employ “system 2”-type thinking, and/or b) asked to use the tool for a longer. Indeed, they do notice some emergent behavior in the findings, such as a participant questioning whether the tool was actually an interpretability tool. I also wonder what would have happened if two people had used the tool side-by-side, as a “pair programming” exercise.

It’s also interesting how varied participants’ backgrounds, skills, baseline expectations, and interpretations were. Certainly, this problem has been studied elsewhere, and I wonder whether the findings in this paper are a result of not only the failure of these tools to be designed in a user-centered manner, but also the broad range in technical skills of the users themselves. What would it mean to develop a tool for users with such a range in skillsets, especially statistical and mathematical skills? This certainly calls for increased certification—at the behest of increased demand for data scientists—within the ML software industry.

I appreciate the point surrounding Kahneman’s system 1 and system 2 work in the discussion, but I believe this section is possibly too short. I acknowledge that there are page restrictions, which meant that the results could not have been discussed in as much depth as is warranted for such a formative study.

Overall, this was a very valuable study that was conducted in a methodical manner and I believe the findings to be interesting to present and future developers of ML interpretability tools, as well as the HCI community that is increasingly interested in improving the process of designing such tools.

Questions:

Is interpretability only something to be checked off a list, and not inspected at depth?
How do you inspect the interpretability of your models, if at all? When do you know you’ve succeeded?
Why is there a disconnect between the way these tools are intended to be used and how they are actually used? How can this be fixed?
Do you think there needs to be greater requirements in terms of certification/baseline understanding and skills for ML engineers?

02/26/2020 – Dylan Finch – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Dylan Finch 1 Comment

Word count: 556

Summary of the Reading

This paper examines the role of expectations and the role of focusing on certain types of errors to see how this impacts perceptions of AI. The aim of the paper is to figure out how the setting of expectations can help users better see the benefits of an AI system. Users will feel worse about a system that they think can do a lot and then fails to live up to those expectations rather than a system that they think can do less and then succeeds at accomplishing those smaller goals.

Specifically, this paper lays some ways to better set user expectations: an Accuracy Indicator which allows users to better expect what the accuracy of a system should be, an explanation method based on examples to help increase user understanding, and the ability for users to adjust the performance of the system. They also show the usefulness of these 3 techniques and that systems tuned to avoid false positives are generally worse than those tuned to avoid false negatives.

Reflections and Connections

This paper highlights a key problem with AI systems: people expect them to be almost perfect and companies market them as such. Many companies that have deployed AI systems have not done a good job managing expectations for their own AI systems. For example, Apple markets Siri as an assistant that can do almost anything on your iPhone. Then, once you buy one, you find out that it can really only do a few very specialized tasks that you will rarely use. You are unhappy because the company sold you a much more capable product. With so many companies doing this, it is understandable that many people have very high expectations for AI. Many companies seem to market AI as the magic bullet that can solve any problem. But, the reality is often much more underwhelming. I think that companies that develop AI systems need to play a bigger role in managing expectations. They should not sell their products as a system that can do anything. They should be honest and say that their product can do some things but not others and that it will make a lot of mistakes, that is just how these things work.

I think that the most useful tool this team developed was the slider that allows users to choose between more false positives and more false negatives. I think that this system does a great job of incorporating many of the things they were trying to accomplish into one slick feature. The slider shows people that the AI will make mistakes, so it better sets user expectations. But, it also gives users more control over the system which makes them feel better about it and allows them to tailor the system to their needs. I would love to see more AI systems give users this option. It would make them more functional and understandable.

Questions

Will AI ever become so accurate that these systems are no longer needed? How long will that take?
Which of the 3 developed features do you think is most influential/most helpful?
What are some other ways that AI developers could temper the expectations of users?

02/26/2020 – Bipasha Banerjee – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

February 25, 2020 bipashab 1 Comment

Summary

The paper highlights one of the major problems that the current digital world faces, algorithmic bias, and fairness in AI. They point out that often ML models are trained on data that in itself is bias and, therefore, may result in amplification of the existing bias. This often results in people not trusting AI models. This is a good step towards explainable AI and making models more transparent to the user. The authors used a dataset that is used for predicting the risk of re-offending, and the dataset is known to have a racial bias. Global and local explanations were taken into account across four types of explanations styles, namely, demographic-based, sensitivity based, influence based, and case-based. For measuring fairness, they considered racial discrimination and tried to measure case-specific impact. Cognition and an individuals’ prior perception of fairness of algorithms were considered as measures of individual difference factors. Both qualitative and quantitative methods were taken into account during the evaluation. They concluded that a general solution is not possible but depends on the user profile and fairness issues.

Reflection

The paper by Dodge et al. is a commendable effort towards making algorithms and their processing more clear to humans. They take into account not only the algorithmic fairness but also the humans’ perception of the algorithm, various fairness problems, and individual differences in their experiments. The paper was an interesting read, but a better display of results would make it easier for the readers to comprehend.

In the model fairness section, they are considering fairness in terms of racial discrimination. Later in the paper, they do mention that the re-offending prediction classifier has features such as age included. Additionally, features like gender might play an important role too. It would be interesting to see how age and other features as a fairness issue perform on maybe other datasets where such biases are dominant.

The authors mentioned that a general solution is not possible to be developed. However, is it possible for the solution to be domain-specific? For example, if we change the dataset to include other features for fairness, we should be able to plug in the new data without having to change the model.

The study was done using crowd workers and not domain experts who are well knowledgeable with the jargon and are used to being unbiased. Humans are prone to be biased with/without intentions. However, people who are in the legal paradigm like judges, attorneys, paralegals, court reporters, law enforcement officers are more likely to be impartial because either they are under oath or years’ of practice and training in an unbiased setup. So, including them in the evaluation and utilizing them as expert crowd workers might yield better results.

Questions

A general solution for a domain rather than one size fits all?
Only racial discrimination is considered as a fairness issue. Are other factors only used as a feature to the classifier? How would the model perform on a varied dataset with other features like gender as a fairness issue?
The authors have used the dataset for the judicial system, and they mentioned their goal was not to study the users. I am curious to know how they anonymize the data, and how was the privacy and security of individuals handled here?

02/26/20 – Fanglan Chen – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

February 25, 2020 Fanglan Chen 1 Comment

Summary

Dodge et al.’s paper “Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment” presents an empirical study on how people make fairness judgments of machine learning systems and how different styles of explanation impact their judgments. Fairness issues of ML systems attract research interests during recent years. Mitigating the unfairness in ML systems is challenging, which requires the good cooperation of developers, users, and the general public. The researchers state that how explanations are constructed have an impact on users’ confidence in the systems. To further examine the potential impacts on people’s fairness judgments of ML systems, they conduct empirical experiments involving crowdsourcing workers on four types of programmatically generated explanations (influence, demographic-based, sensitivity, and case-based). Their key findings include: 1) some explanations are considered more fair, while others have negative impact on users’ trust of the algorithm in regards of fairness; 2) varied fairness issues (model-wide fairness and case-specific fairness) can be detected more effectively through an examination of different explanation styles; 3) individual differences (prior positions and judgment criteria of algorithmic fairness) lead to how users react to different styles of explanation.

Reflection

This paper shines light on a very important fact that bias in ML systems can be detected and mitigated. There is a growing attention to the fairness issues in AI-powered technologies in the machine learning research community. Since ML algorithms are widely used to speed up the decision making process in a variety of domains, beyond achieving good performance, they are expected to produce neutral results. There is no denying the fact that algorithms rely on data, “garbage in, garbage out.” Hence, it is incumbent to feed the unbiased data to these systems upon developers in the first place. In many real-world cases, race is actually not used as an input, however, it correlates to other factors that make predictions biased. That case is not as easy as the cases presented in the paper to detect but still requires effort to be corrected. A question here would be in order to counteract this implicit bias, should race be considered and used to calibrate the relative importance of other factors?

Besides the bias introduced by data input, there are other factors that need to be taken into consideration to deal with the fairness issues in ML systems. Firstly, machine bias can never be neglected. The term bias in the context of the high-stakes tasks (e.g. future criminal prediction) is very important because a false positive decision could have a destructive impact on a person’s life. This is why when an AI system deals with the human subject (in this case human life), the system must be highly precise and accurate and ideally provide reasonable explanation. Making a person’s life harder to live in a society or impacting badly a person’s life due to a flawed computer model is never acceptable. Secondly, the proprietary model is another concern. One thing should be kept in mind that many high-stacks tasks such as future criminal prediction is a matter of public matter and should be transparent and fair. That does not mean that the ML systems used for those tasks need to be completely public and open. However, I believe there should be a regulatory board of experts who can verify and validate the ML systems. More specifically, the experts can verify and validate the risk factors used in a system so that the factors could be widely accepted. They can also verify and validate the algorithmic techniques used in a system so that the system incorporates less bias.

Discussion

I think the following questions are worthy of further discussion.

Besides model unfairness and case-specific disparate impact, are there any other fairness issues?
What are the benefits and drawbacks of global and local explanations in supporting fairness judgment of AI systems?
Are there any other style or element of explanations that may impact fairness judgement you can think about?
If an AI system is not any better than untrained users at predicting recidivism in a fair and accurate way, why do we need the system?

02/26/2020 – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment – Yuhang Liu

February 25, 2020 yuhang Liu 1 Comment

This paper mainly explores the injustice of the results of machine learning. These injustices are usually reflected in gender and race, so in order to make the results of machine learning better serve people, the author of the paper conducted an empirical study with four types of programmatically generated explanations to understand how they impact people’s fairness judgments of ML systems. In the experiment, these four interpretations have different characteristics, and after the experiment, the author has the following findings:

Some interpretations are inherently considered unfair, while others can increase people’s confidence in the fairness of the algorithm;
Different interpretations can more effectively expose different fairness issues, such as the model-wide fairness issue and the fairness difference of specific cases.
There are differences between people, different people have different positions, and the perspective of understanding things will affect people’s response to different interpretation styles.

In the end, the authors obtained that in order to make the results of machine learning generally fair, in different situations, different corrections are needed and differences between people must be taken into account.

Reflection：

In another class this semester, the teacher gave three reading materials on the results of machine learning and increased discrimination. In the discussion of those three articles, I remember that most students thought that the reason for discrimination should not be Is the inaccuracy of the algorithm or model, and I even think that machine learning is to objectively analyze things and display the results, and the main reason that people feel uncomfortable and even feel immoral in the face of the results is that people are not willing to face these results. It is often difficult for people to have a clear understanding of the whole picture of things, and when these unnoticed places are moved to the table, people will be shocked or even condemn others, but it is difficult to really think about the cause of things. But after reading this paper, I think my previous understanding was narrow: First, the results of the algorithm and the interpretation of the results must be wrong and discriminatory in some cases. So only if we resolve this discrimination can the results of machine learning be able to better serve people. At the same time, I also agree with the ideas and conclusions in the article. Different interpretation methods and different emphasis will indeed affect the fairness of interpretation. All the prerequisites to eliminate injustices are to understand the causes of these injustices. At the same time, I think the main solution to eliminate injustice is still on the researcher. Reason why I think computer is fascinating is it can always keep things rational and objective to deal with problems. People’s response to different results and the influence of different people on different model predictions are the key to eliminating this injustice. Of course, I think people will think that part of the cause of injustice is also the injustice of our own society. When people think that the results of machine learning carry discrimination based on race, sex, religion, etc., should we think about this discrimination itself, should we pay more attention to gender equality, ethnic equality and how to make the results look better.

Question:

Do you think that this unfairness is more because the results of machine learning mislead people or it is existed in people’s society for a long time.
The article proposes that in order to get more fair results, more people need to be considered, what changes should users make.
How to combine the points of different machine learning explanations to create a fairer explanation.

2/26/20 – Jooyoung Whang – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Jooyoung Whang 1 Comment

This paper seeks to study what an AI system could do to get more approved by users even if it is not perfect. The paper focuses on the concept of “Expectation” and the discrepancy between an AI’s ability and a user’s expectation for the system. To explore this problem, the authors implemented an AI-powered scheduling assistant that mimics the look of MS Outlook. The agent detects in an E-mail if there exists an appointment request and asks the user if he or she wants to add a schedule to the calendar. The system was intentionally made to perform worse than the originally trained model to explore mitigation techniques to boost user satisfaction given an imperfect system. After trying out various methods, the authors conclude: Users prefer AI systems focusing on high precision and users like systems that gives direct information about the system, shows explanations, and supports certain measure of control.

This paper was a fresh approach that appropriately addresses the limitations that AI systems would likely have. While many researchers have looked into methods of maximizing the system accuracy, the authors of this paper studied ways to improve user satisfaction even without a high performing AI model.

I did get the feeling that the designs for adjusting end-user expectations were a bit too static. Aside from the controllable slider, the other two designs were basically texts and images with either an indication of the accuracy or a step-by-step guide on how the system works. I wonder if having a more dynamic version where the system reports for a specific instance would be more useful. For example, for every new E-mail, the system could additionally report to the user how confident it is or why it thought that the E-mail included a meeting request.

This research reminded me of one of the UX design techniques: think-aloud testing. In all of their designs, the authors’ common approach was to close the gap between user expectation and system performance. Think-aloud testing is also used to close that gap by analyzing how a user would interact with a system and adjusting from the results. I think this research approached it in the opposite way. Instead of adjusting the system, the authors’ designs try to adjust the user’s mental model.

The followings are the questions that I had while reading the paper:

1. As I’ve written in my reflection portion, do you think the system will be approved more if it reported some information about the system for each instance (E-mail)? Do you think the system may appear to be making excuses for when it is wrong? In what way would this dynamic version be more helpful than the static design from the paper?

2. In the generalizability section, the authors state that they think some parts of their study are scalable to other kinds of AI systems. What other types of AI could benefit from this study? Which one would benefit the most?

3. Many AI applications today are deployed after satisfying a certain accuracy threshold which is pretty high. This can lead to more funds and time needed for development. Do you think this research will allow the stakeholders to lower the threshold? In the end, the stakeholders just want to achieve high user satisfaction.

02/26/2020 – Bipasha Banerjee – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 bipashab 1 Comment

Summary

The paper talks about user-expectation when it comes to end-user applications. It is essential to make sure that the user-expectations are set to an optimal level so that the user does not find the end product underwhelming. Most of the related work done in this area highlights the fact that user disappointment occurs if the initial expectation is set to “too high”. Initial expectations can originate from advertisements, product reviews, brands, word of mouth, etc. They tested their hypothesis on an AI-powered scheduling assistant. They created an interface similar to the Microsoft Outlook email system. The main purpose of the interface was to detect if an email was sent with the intention of scheduling a meeting. If so, the AI would automatically highlight the meeting request sentence and then allow the user to schedule the meeting. The authors designed three techniques, namely, accuracy indicator, example-based explanations, and control slider, to design for adjusting end-user expectations. Most of their hypotheses were proved to be true. Yet, it was found that an AI system based on high recall had better user acceptance than high precision.

Reflection

The paper was an interesting read on adjusting the end-user expectation. The AI scheduling assistant was used as a UI-tool to evaluate the users’ reactions and expectations of the system. The authors conducted various experiments based on three design techniques. I was intrigued to find out that the high precision version did not result in a high perception of accuracy. An ML background practitioner always looks at precision (false positive). From this, we can infer that the task at hand should be the judge of what metric we should focus on. It is certainly true that here, displaying a wrongly highlighted sentence would annoy the user less than completely missing out on the meeting details in an email. Hence, I would say this kind of high recall priority should be kept in mind and adjusted according to the end goal of the system.

It would also be interesting to see how such expectation oriented experiments performed in the case of other complex tasks. This AI scheduling task was straight-forward, where there can be only one correct answer. It is necessary to see how the expectation based approach fairs when the task is subjective. By subjective, I mean, the success of the task would vary from user to user. For example, in the case of text summarization, the judgment of the quality of the end product would be highly dependent on the user reading it.

Another critical thing to note is the expectation can also stem from a user’s personal skill level and subsequent expectation from a system. As a crowd-worker, having a wrongly highlighted line might not affect as much when the number of emails and tasks are less. How likely is this to annoy busy professionals who might have to deal with a lot of emails and messages with meeting requests. Having multiple incorrect highlights a day is undoubtedly bound to disappoint the user.

Questions

How does this extend to other complex user-interactive systems?
Certain tasks are relative, like text summarization. How would the system evaluate success and gauge expectations in such cases where the task at hand is extremely subjective?
How would the expectation vary with the skill level of the user?

02/26/2020 – Subil Abraham – Explaining models

February 25, 2020 Subil Abraham 1 Comment

A big concern with the usage of current ML systems is the issue of fairness and bias when making their decisions. Bias can creep into ML decisions through either the design of the algorithm or through training datasets that are labeled in a way to bias against certain kinds of things. The example used in this paper is the bias against African Americans in an ML system used by judges to predict the probability of a person re-offending after committing a crime. Fairness is hard to judge when ML systems are black boxes so this paper proposes that if ML systems expose reasons behind the decisions (i.e. the idea of explainable AI), a better judgement of the fairness of the decision can be made by the user. To this end, this paper examines the effect of four of different kinds of explanations of the ML decisions on people’s judgements of the fairness of that decision.

I believe this is a very timely and necessary paper in these times, with ML systems being used more and more for sensitive and life changing decisions. It is probably impossible to stop people from adopting these systems so the next best thing is making explainability of the ML decisions mandatory, so people can see and judge if there was potentially bias in the ML system’s decisions. It is interesting that people were mostly able to perceive that there were fairness issues in the raw data. You would think that that would be hard but the generated explanations may have worked well enough to help with that (though I do wish they could’ve shown an example comparing a raw data point and a processed data point that showed how their pre-processing cleaned things). I did wonder why they didn’t show confidence levels to the users in the evaluation, but their explanation that it was something they could not control for makes sense. People could have different reactions to confidence levels, some thinking that anything less than a 100% is insufficient, while others thinking that 51% is good enough. So keeping it out is a limiting but is logical.

What other kinds of generated explanations could be beneficial, outside of the ones used in the paper?
Checking for racial bias is an important case for fair AI. In what other areas is fairness and bias correction in AI critical?
What would be ways that you could mitigate any inherent racial bias of the users who are using explainable AI, when they are making their decisions?

02/26/2020 – Sushmethaa Muhundan – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Sushmethaa Muhundan 1 Comment

The perception and acceptance of AI systems are impacted by the expectations that the users have on that system as well as their prior experiences working with AI systems. Hence, expectation setting before interacting with the system is extremely pivotal to avoid any inflated expectations which in turn could lead to disappointment if they are not met. A Scheduling Assistant system has been used as an example in this paper and expectation adjustment techniques are discussed. The paper focusses on exploring methods to shape the user’s expectation before they use the system and study the impacts on the acceptance of the system by the user. Apart from this, the impact of different AI imperfections is also studied, specifically the cases of false positives vs false negatives. Accuracy indicator, example-based explanation and performance control are the three techniques proposed and evaluated. Via the studies conducted, it is concluded that a better expectation setting done before using a system decreases the chances of disappointment by highlighting the flaws of the system beforehand.

The study conducted assumes that the users are new to the environment and dedicates time explaining the interface at the initial stage of the experiment. I felt that this was helpful since the people involved in the survey can now follow along. I found this missing in some of the earlier papers read where it was assumed that all the readers had sufficient prior knowledge to follow along. Also, despite the fact that the initial performance of the system was ninety-three percent on the test dataset, in order to gauge the sentiments of the users and evaluate their expectation setting hypothesis, the authors decided to set the accuracy to fifty percent. I felt that this greatly improved the scope for disappointment, thereby helping them efficiently validate their expectation setting system and its effects. I felt that the decision to use visualizations as well as a short summary of the intent in their explanation was helpful since this eradicated the need for the users to read lengthy summaries and would offer better support for user decision. It was also good to note the authors take on deception and marketing as a means to set false expectations. This study went beyond such techniques and focused on shaping the expectations of the people via explaining the accuracy of the system. I felt that this perspective was more ethical compared to the other means adopted in this area.

Apart from the expectations that users have, what other factors influence the perception and acceptance of AI systems by the users?
What are some other techniques, visual or otherwise that can be adopted to set expectations of AI systems?
How can the AI system developers tackle trust issues and acceptance issues? Given that perceptions and individual experiences are extremely diverse, is it possible for an AI system to be capable of completely satisfying all its users?