02/26/20 – Fanglan Chen – Will You Accept an Imperfect AI? Exploring Designs For Adjusting End-user Expectations of AI Systems

February 25, 2020 Fanglan Chen 4 Comments

Summary

Kocielnik et al.’s paper “Will You Accept an Imperfect AI?” explores approaches for shaping expectations of end-users before their initial working with an AI system and studies how appropriate expectations impact users’ acceptance of the system. Prior study has presented that end-user expectations of AI-powered technologies are influenced by various factors, such as external information, knowledge and understanding, and first hand experience. The researchers indicate that expectations vary among users and users perception/acceptance of AI systems may be negatively impacted when their expectations are set too high. To fill in the gap of understanding how end-user expectations can be directly and explicitly impacted, the researchers use a Scheduling Assistant – an AI system for automatic meeting schedule detection in email – to study the impact of several methods of expectation shaping. Specifically, they explore two system versions with the same accuracy level of the classifier but each is intended to focus on mitigating different types of errors(False Positives and False Negatives). Based on their study, error types highly relate to users’ subjective perceptions of accuracy and acceptance. Expectation adjustment techniques are proposed to make users fully aware of AI imperfections and enhance their acceptance of AI systems.

Reflection

We need to be aware that AI-based technologies cannot be perfect, just like nobody is perfect. Hence, there is no point setting a goal that involves AI systems making no mistake. Realistically defining what success and failure look like associated with working with AI-powered technologies is of great importance in adopting AI to improve the imperfection of nowadays solutions. That calls for an accurate positioning of where AI sits in the bigger picture. I feel the paper mainly focuses on how to set appropriate expectations but lacks a discussion on different scenarios associated with the users expectations to AI. For example, users expectation greatly vary to the same AI system in different decision making frameworks: in human-centric decision making process, the expectation of AI component is comparatively low as AI’s role is more like a counselor who is allowed to make some mistakes; in machine-centric system, all the decisions are made by algorithms which render users’ low tolerance of errors, simply put, some AIs will require more attention than others, because the impact of errors or cost of failures will be higher. Expectations of AI systems vary not only among different users but also under various usage scenarios.

To generate positive user experiences, AI needs to exceed expectations. One simple way to achieve this is to not over-promise the performance of AI in the beginning. That relates with the intention of the researchers on designing the Accuracy Indicator component in the Scheduling Assistant. In the study case, they set the accuracy to 50%. This accuracy is actually very low in AI-based applications. I’m interested in whether the evaluation results would change with AI systems of higher performance (e.g. 70% or 90% in accuracy). I think it is worthwhile to conduct a survey about users’ general expectations of AI-based systems.

Interpretability of AI is another key component that shapes user experiences. If people cannot understand how AI works or how it comes up with its solutions, and in turn do not trust it, they would probably not choose to use it. As people accumulate more positive experiences, they build trust with AI. In this way, easy-to-interpret models seem to be more promising to deliver success compared with complex black-box models.

To sum up, by being fully aware of AI’s potential but also its limitations, and developing strategies to set appropriate expectations, users can create positive AI experiences and build trust in an algorithmic approach in decision making processes.

Discussion

I think the following questions are worthy of further discussion.

What is your expectation of AI systems in general?
How would users expectations of the same AI system vary in different usage scenarios?
What are the negative impacts brought by the inflated expectations? Please give some examples.
How can we determine which type of errors is more severe in an AI system?

02/26/20 – Sukrit Venkatagiri – Will You Accept an Imperfect AI?

February 25, 2020 Sukrit Venkatagiri 2 Comments

Paper: Rafal Kocielnik, Saleema Amershi, and Paul N. Bennett. 2019. Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–14.

Summary:

This paper explores people’s perceptions and expectations of an intelligent scheduling assistant. The paper specifically considers three broad research questions: the impact of AI’s focus on error avoidance versus user perception, ways to set appropriate expectations, and impact of expectation setting on user satisfaction and acceptance. The paper explores this through an experimental setup, whose design process is explored in detail.

The authors find that expectation adjustment designs significantly affected the desired aspects of expectations, similar to what was hypothesized. They also find that high recall resulted in significantly higher perceptions of accuracy and acceptance compared to high precision, and that expectation adjustment worked by intelligible explanations and tweaking model evaluation metrics to emphasize one over the other. The paper concludes with a discussion of the findings.

Reflection:

This paper presents some interesting findings using a relatively simple, yet powerful “technology probe.” I appreciate the thorough exploration of the design space, taking into consideration design principles and how they were modified to meet the required goals. I also appreciate the varied and nuanced research questions. However, I feel like the setup may have been too simple to explore in more depth. Certainly, this is valuable as a formative study, but more work needs to be done.

It was interesting that people valued high recall over high precision. I wonder if the results would differ among people with varied expertise, from different countries, and from different socioeconomic backgrounds. I also wonder how this might differ based on the application scenario, e.g. AI scheduling assistant versus a movie recommendation system. In the latter, a user would not be aware of what movies they were not recommended but that they would actually like, while with an email scheduling assistant, it is easy to see false negatives.

I wonder how these techniques, such as expectation setting, might apply not only to users’ expectations of AI systems, but also to exploring the interpretability or explainability of more complex ML models.

At what point do explanations tend to result in the opposite effect? I.e. reduced user acceptance and preference? It may be interesting to experimentally study how different levels of explanations and expectation settings affect user perceptions versus a binary value. I also wonder how it might change with people of different backgrounds.

In addition, this experiment was relatively short in duration. I wonder how the findings would change over time. Perhaps users would form inaccurate expectations, or their mental models might be better steered through expectation-setting. More work is needed in this regard.

Questions:

Will you accept an imperfect AI?
How do you determine how much explanation is enough? How would this work for more complex models?
What other evaluation metrics can be used?
When is high precision valued over high recall, and vice versa?

02/26/2020 – Dylan Finch – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Dylan Finch 1 Comment

Word count: 556

Summary of the Reading

This paper examines the role of expectations and the role of focusing on certain types of errors to see how this impacts perceptions of AI. The aim of the paper is to figure out how the setting of expectations can help users better see the benefits of an AI system. Users will feel worse about a system that they think can do a lot and then fails to live up to those expectations rather than a system that they think can do less and then succeeds at accomplishing those smaller goals.

Specifically, this paper lays some ways to better set user expectations: an Accuracy Indicator which allows users to better expect what the accuracy of a system should be, an explanation method based on examples to help increase user understanding, and the ability for users to adjust the performance of the system. They also show the usefulness of these 3 techniques and that systems tuned to avoid false positives are generally worse than those tuned to avoid false negatives.

Reflections and Connections

This paper highlights a key problem with AI systems: people expect them to be almost perfect and companies market them as such. Many companies that have deployed AI systems have not done a good job managing expectations for their own AI systems. For example, Apple markets Siri as an assistant that can do almost anything on your iPhone. Then, once you buy one, you find out that it can really only do a few very specialized tasks that you will rarely use. You are unhappy because the company sold you a much more capable product. With so many companies doing this, it is understandable that many people have very high expectations for AI. Many companies seem to market AI as the magic bullet that can solve any problem. But, the reality is often much more underwhelming. I think that companies that develop AI systems need to play a bigger role in managing expectations. They should not sell their products as a system that can do anything. They should be honest and say that their product can do some things but not others and that it will make a lot of mistakes, that is just how these things work.

I think that the most useful tool this team developed was the slider that allows users to choose between more false positives and more false negatives. I think that this system does a great job of incorporating many of the things they were trying to accomplish into one slick feature. The slider shows people that the AI will make mistakes, so it better sets user expectations. But, it also gives users more control over the system which makes them feel better about it and allows them to tailor the system to their needs. I would love to see more AI systems give users this option. It would make them more functional and understandable.

Questions

Will AI ever become so accurate that these systems are no longer needed? How long will that take?
Which of the 3 developed features do you think is most influential/most helpful?
What are some other ways that AI developers could temper the expectations of users?

02/26/2020 – Vikram Mohanty – Will you accept an imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Vikram Mohanty 2 Comments

Authors: Rafal Kocielnik, Saleema Amershi, Paul Bennett

Summary

This paper discusses the impact of end-user expectations on the subjective perception of AI-based systems. The authors conduct studies to better understand how different types of errors (i.e. False Positives and False Negatives) are perceived differently by users, even though accuracy remains the same. The paper uses the context of an AI-based scheduling assistant (in an email client) to demonstrate 3 different design interventions for helping end-users adjust their expectations of the system. The studies in this paper showed that these 3 techniques were effective in preserving user satisfaction and acceptance of an imperfect AI-based system.

Reflection

This paper is basically an evaluation of the first 2 guidelines from the “Guidelines of Human-AI Interaction” paper i.e. making clear what the system can do, and how well it can do what it does.

Even though the task in the study was artificial (i.e. using workers from an internal crowdsourcing platform instead of real users of a system and subjecting to an artificial task instead of a real one), the study design, the research questions and the inference from the data initiates the conversation on giving special attention to the user experience in AI-infused systems. Because the tasks were artificial, we could not assess scenarios where users actually have a dog in the fight e.g. they miss an important event by over-relying on the AI assistant and start to depend less on the AI suggestions.

The task here was scheduling events from emails, which is somewhat simple in the sense that users can almost immediately assess how good or bad the system is at. Furthermore, the authors manipulated the dataset for preparing the High Precision and High Recall versions of the system. For conducting this study in a real-world scenario, this would require a better understanding of user mental models with respect to AI imperfections. It becomes slightly trickier when these AI imperfections can not be accurately assessed in a real-world context e.g. search engines may retrieve pages containing the keywords, but may not account context into the results, and thus may not always give users what they want.

The paper makes an excellent case of digging deeper into error recovery costs and correlating that with why participants in this study preferred a system with high false positive rates. This is critical for system designers to keep in mind while dealing with uncertain agents like an AI core. This gets further escalated when it’s a high-stakes scenario.

Questions

The paper starts off with the hypothesis that avoiding false positives is considered better for user experience, and therefore systems are optimized for high precision. The findings however contradicted it. Can you think about scenarios where you’d prefer a system with a higher likelihood of false positives? Can you think about scenarios where you’d prefer a system with a higher likelihood of false negatives?
Did you think the design interventions were exhaustive? How would you have added on to the ones suggested in the paper? If you were to adopt something for your own research, what would it be?
The paper discusses factoring in other aspects, such as workload, both mental and physical, and the criticality of consequences. How would you leverage these aspects in design interventions?
If you used an AI-infused system every day (to the extent it’s subconsciously a part of your life)
1. Would you be able to assess the AI imperfections purely on the basis of usage? How long would it take for you to assess the nature of the AI?
2. Would you be aware if the AI model suddenly changed underneath? How long would it take for you to notice the changes? Would your behavior (within the context of the system) be affected in the long term?

2/26/20 – Jooyoung Whang – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Jooyoung Whang 1 Comment

This paper seeks to study what an AI system could do to get more approved by users even if it is not perfect. The paper focuses on the concept of “Expectation” and the discrepancy between an AI’s ability and a user’s expectation for the system. To explore this problem, the authors implemented an AI-powered scheduling assistant that mimics the look of MS Outlook. The agent detects in an E-mail if there exists an appointment request and asks the user if he or she wants to add a schedule to the calendar. The system was intentionally made to perform worse than the originally trained model to explore mitigation techniques to boost user satisfaction given an imperfect system. After trying out various methods, the authors conclude: Users prefer AI systems focusing on high precision and users like systems that gives direct information about the system, shows explanations, and supports certain measure of control.

This paper was a fresh approach that appropriately addresses the limitations that AI systems would likely have. While many researchers have looked into methods of maximizing the system accuracy, the authors of this paper studied ways to improve user satisfaction even without a high performing AI model.

I did get the feeling that the designs for adjusting end-user expectations were a bit too static. Aside from the controllable slider, the other two designs were basically texts and images with either an indication of the accuracy or a step-by-step guide on how the system works. I wonder if having a more dynamic version where the system reports for a specific instance would be more useful. For example, for every new E-mail, the system could additionally report to the user how confident it is or why it thought that the E-mail included a meeting request.

This research reminded me of one of the UX design techniques: think-aloud testing. In all of their designs, the authors’ common approach was to close the gap between user expectation and system performance. Think-aloud testing is also used to close that gap by analyzing how a user would interact with a system and adjusting from the results. I think this research approached it in the opposite way. Instead of adjusting the system, the authors’ designs try to adjust the user’s mental model.

The followings are the questions that I had while reading the paper:

1. As I’ve written in my reflection portion, do you think the system will be approved more if it reported some information about the system for each instance (E-mail)? Do you think the system may appear to be making excuses for when it is wrong? In what way would this dynamic version be more helpful than the static design from the paper?

2. In the generalizability section, the authors state that they think some parts of their study are scalable to other kinds of AI systems. What other types of AI could benefit from this study? Which one would benefit the most?

3. Many AI applications today are deployed after satisfying a certain accuracy threshold which is pretty high. This can lead to more funds and time needed for development. Do you think this research will allow the stakeholders to lower the threshold? In the end, the stakeholders just want to achieve high user satisfaction.

02/26/2020 – Bipasha Banerjee – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 bipashab 1 Comment

Summary

The paper talks about user-expectation when it comes to end-user applications. It is essential to make sure that the user-expectations are set to an optimal level so that the user does not find the end product underwhelming. Most of the related work done in this area highlights the fact that user disappointment occurs if the initial expectation is set to “too high”. Initial expectations can originate from advertisements, product reviews, brands, word of mouth, etc. They tested their hypothesis on an AI-powered scheduling assistant. They created an interface similar to the Microsoft Outlook email system. The main purpose of the interface was to detect if an email was sent with the intention of scheduling a meeting. If so, the AI would automatically highlight the meeting request sentence and then allow the user to schedule the meeting. The authors designed three techniques, namely, accuracy indicator, example-based explanations, and control slider, to design for adjusting end-user expectations. Most of their hypotheses were proved to be true. Yet, it was found that an AI system based on high recall had better user acceptance than high precision.

Reflection

The paper was an interesting read on adjusting the end-user expectation. The AI scheduling assistant was used as a UI-tool to evaluate the users’ reactions and expectations of the system. The authors conducted various experiments based on three design techniques. I was intrigued to find out that the high precision version did not result in a high perception of accuracy. An ML background practitioner always looks at precision (false positive). From this, we can infer that the task at hand should be the judge of what metric we should focus on. It is certainly true that here, displaying a wrongly highlighted sentence would annoy the user less than completely missing out on the meeting details in an email. Hence, I would say this kind of high recall priority should be kept in mind and adjusted according to the end goal of the system.

It would also be interesting to see how such expectation oriented experiments performed in the case of other complex tasks. This AI scheduling task was straight-forward, where there can be only one correct answer. It is necessary to see how the expectation based approach fairs when the task is subjective. By subjective, I mean, the success of the task would vary from user to user. For example, in the case of text summarization, the judgment of the quality of the end product would be highly dependent on the user reading it.

Another critical thing to note is the expectation can also stem from a user’s personal skill level and subsequent expectation from a system. As a crowd-worker, having a wrongly highlighted line might not affect as much when the number of emails and tasks are less. How likely is this to annoy busy professionals who might have to deal with a lot of emails and messages with meeting requests. Having multiple incorrect highlights a day is undoubtedly bound to disappoint the user.

Questions

How does this extend to other complex user-interactive systems?
Certain tasks are relative, like text summarization. How would the system evaluate success and gauge expectations in such cases where the task at hand is extremely subjective?
How would the expectation vary with the skill level of the user?

02/26/2020 – Sushmethaa Muhundan – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020 Sushmethaa Muhundan 1 Comment

The perception and acceptance of AI systems are impacted by the expectations that the users have on that system as well as their prior experiences working with AI systems. Hence, expectation setting before interacting with the system is extremely pivotal to avoid any inflated expectations which in turn could lead to disappointment if they are not met. A Scheduling Assistant system has been used as an example in this paper and expectation adjustment techniques are discussed. The paper focusses on exploring methods to shape the user’s expectation before they use the system and study the impacts on the acceptance of the system by the user. Apart from this, the impact of different AI imperfections is also studied, specifically the cases of false positives vs false negatives. Accuracy indicator, example-based explanation and performance control are the three techniques proposed and evaluated. Via the studies conducted, it is concluded that a better expectation setting done before using a system decreases the chances of disappointment by highlighting the flaws of the system beforehand.

The study conducted assumes that the users are new to the environment and dedicates time explaining the interface at the initial stage of the experiment. I felt that this was helpful since the people involved in the survey can now follow along. I found this missing in some of the earlier papers read where it was assumed that all the readers had sufficient prior knowledge to follow along. Also, despite the fact that the initial performance of the system was ninety-three percent on the test dataset, in order to gauge the sentiments of the users and evaluate their expectation setting hypothesis, the authors decided to set the accuracy to fifty percent. I felt that this greatly improved the scope for disappointment, thereby helping them efficiently validate their expectation setting system and its effects. I felt that the decision to use visualizations as well as a short summary of the intent in their explanation was helpful since this eradicated the need for the users to read lengthy summaries and would offer better support for user decision. It was also good to note the authors take on deception and marketing as a means to set false expectations. This study went beyond such techniques and focused on shaping the expectations of the people via explaining the accuracy of the system. I felt that this perspective was more ethical compared to the other means adopted in this area.

Apart from the expectations that users have, what other factors influence the perception and acceptance of AI systems by the users?
What are some other techniques, visual or otherwise that can be adopted to set expectations of AI systems?
How can the AI system developers tackle trust issues and acceptance issues? Given that perceptions and individual experiences are extremely diverse, is it possible for an AI system to be capable of completely satisfying all its users?

02/26/2020 – Subil Abraham – Will you accept an imperfect AI?

February 25, 2020 Subil Abraham 1 Comment

Reading: Rafal Kocielnik, Saleema Amershi, and Paul N. Bennett. 2019. Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 1–14. https://doi.org/10.1145/3290605.3300641

Different parts of our lives are being infused with AI magic. With this infusion, however, comes problems, because the AI systems deployed aren’t always accurate. Users are used to software systems being precise and doing exactly the right thing. But unfortunately they can’t extend that expectation for AI systems because they are often inaccurate and make mistakes. Thus it is necessary for developers to set expectations of the users ahead of time so that the users are not disappointed. This paper proposes three different visual methods of setting the user’s expectations on how well the AI system will work: an indicator depicting accuracy, a set of examples demonstrating how the sytem works, and a slider that controls how aggressively the system should work. The system under evaluation is a detector that will identify and suggest potential meetings based on the language in an email. The goal of the paper isn’t to improve the AI system itself, but rather to evaluate how well the different expectation setting methods work given an imprecise AI system.

I want to note that I really wanted to see an evaluation on the effects of mixed techniques. I hope that it will be covered in possible future work they do but am also afraid that such work might never get published because it would be classified as incremental (unless they come up with more expectation setting methods beyond the three mentioned in this paper, and do a larger evaluation). It is useful to see that we now have numbers to back up that high-recall applications under certain scenarios are perceived as more accurate. It makes intuitive sense that it would be more convenient to deal with false positives (just close the dialog box) than false negatives (having to manually create a calendar event). Also, seeing the control slider brings to mind the trick that some offices play where they have the climate control box within easy reach of the employees but it actually doesn’t do anything. It’s a placebo to make people think it got warmer/colder when nothing has changed. I realize that the slider in the paper is actually supposed to do what it advertised, but it makes me think of other places where a placebo slider can be given to a user to make them think they have control when in fact the AI system remains completely unchanged.

What other kinds of designs can be useful for expectation setting in AI systems?
How would these designs look different for a more active AI system like medical prediction, rather than a passive AI system like the meeting detector?
The paper claims that the results are generalizable for other passive AI systems, but are there examples of such systems where it is not generalizable?

02/26/2020 – Nurendra Choudhary -Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 25, 2020February 25, 2020 Nurendra Choudhary Leave a comment

Summary

In this paper, the authors discuss the acceptance of imperfect AI systems by human users. They, specifically, consider the case of email-scheduling assistants for their experiments. Three features are adopted to interface between users and the assistant: Accuracy Indicator (indicates expected accuracy of the system), Example-based Explanation design (explanation of sample test-cases for pre-usage development of human mental models) and Control Slider Design (to let users control the system aggressiveness, False Positive vs False Negative Rate).

The participants of the study completed a 6 step procedure. The procedures analyzed their initial expectations of the system. The evaluation showed that the features helped in setting the right expectation for the participants. Additionally, it concluded that systems with High Recall gave a pseudo-sense of higher accuracy than High Precision. The study shows that user expectations and acceptance can be satisfied through not only intelligible explanations but also tweaking model evaluation metrics to emphasize on one over another (Recall over Precision in this case).

Reflection

The paper explores user expectations of AI and their corresponding perceptions and acceptance. An interesting idea is tweaking evaluation metrics. In previous classes and a lot of current research, we discuss utilizing interpretable or explainable AI as the fix for the problem of user acceptance. However, the research shows that even simple measures such as tweaking evaluation metrics to prefer recall over precision can create a pseudo-sense of higher accuracy for users. This makes me question the validity of current evaluation techniques. Current metrics are statistical measures for deterministic models. The statistical measures directly correlated with user acceptance because of human comprehensibility of their behaviour. However, given the indeterministic nature of AI and its incomprehensible nature, old statistical measures may not be the right way to validate AI models. For our specific research/problems, we need to study end-user more closely and design metrics that correlate to the user demands.

Another important aspect is the human comprehensibility of AI systems. We notice from the paper that addition of critical features significantly increased user acceptance and perception. I believe there is a necessity for more such systems across the field. The features help the user expectation of the system and also help adoption of AI systems in real-world scenarios. The slider is a great example of manual control that could be provided to users to enable them to set their expectations from the system. Explanation of the system also helps develop human mental models so they can understand and adapt to AI changes faster. Example, search engines or recommender systems record information on users. If the users understand they store and utilize for their recommendation, they would modify their usage accordingly to fit the system requirements. This will improve system performance and also user experience. Also, it will lead to a sense of pseudo-deterministic nature in AI systems.

Questions

Can such studies help us in finding relevance of evaluation metric in the problem’s context?
Evaluation metrics have been designed as statistical measures. Has this requirement changed? Should we design metrics based on user experience?
Should AI be developed according to human expectations or independently?
Can this also be applied to processes that generally do not directly involve humans such as recommender systems or search engines?

Word Count: 557

02/26/20 – Myles Frantz – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

February 22, 2020February 25, 2020 Miles Frantz Leave a comment

Summation

Though Machine Learning techniques have advanced greatly within the past few years, human perception on the technology may severely limit the adoption and usage of the technology. To further study how to better encourage and describe the process of using this technology, this team has created a Scheduling Assistant to better monitor and elicit how users would tune and react to from different expectations. Tuning the expectations of the AI (Scheduling Assistant) via a slider (from “Fewer detections” on the left to “More detections” on the right) directly altering the false positive and false negative settings in the AI. This direct feedback loop gave users (mechanical turk workers) more confidence and a better understanding of how the AI works. Though given the various variety of users, having an AI focused on High Precision was not the best general setting for the scheduling assistant.

Response

I like this kind of raw collaboration between the user and the Machine Learning system. This tailors the algorithm explicitly to the tendencies and mannerisms of each user, allowing easier customization and thus a higher likelihood of usage. This is supported due to the team’s Research hypothesis: “H1.1 An AI system focused on High Precision … will result in higher perceptions of accuracy …”. In this example each user (mechanical turk worker) was only using the subset of Enron emails to confirm or deny the meeting suggestions. Speculating further, this type of system being used in an expansive system across many users, being able to tune the AI would greatly encourage use.

I also strongly agree with the slider bar for ease of use tuning by the individual. In this format the user does not neat to have great technological skill to be able to use it, and it is relatively fast to use. Having it within the same system easily reachable also ensures a better connection between the user and the AI.

Questions

I would like to see a greater (and or a beta) study done with a giant email provider. Each email provider likely has their own homegrown Machine Learning model, however providing the capabilities to further tune their own AI for their preferences and tendencies would be a great improvement. The only issue would be with the scalability and providing enough services to make this work for all the users.
In tuning the ease of access and usability, I would like to see a study done comparing the different types of interaction tools (sliders, buttons, likert scale settings, etc.…). There likely is a study done about the effectiveness of each type of interaction tool upon a system, however in the context of AI settings it is imperative to have the best tool. This would hopefully be an adopted standard that would be an easy to use tool accessible by everyone.
Following along with the first question, I would like to see this kind of study provided to an internal mailing system, potentially at an academic level. Though this was studied with 150 mechanical turk workers and 400 internally provided workers, this was based on a sub-sample on the Enron email dataset. Providing this as a live-beta test in a widely and actively used email system with live emails would be a true test that I would like to see.