Paper: Ewa Luger and Abigail Sellen. 2016. “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI ’16). Association for Computing Machinery, New York, NY, USA, 5286–5297.
Summary: This paper presents findings from 14 semi-structured interviews with users of existing conversational agent (CA) systems and highlights four key areas where current systems fail to support user interaction. It notes that conversational agents are increasingly deployed within online services, such as banking systems, as well as by large companies like Google, Facebook, and Apple. The paper seeks to understand end-users' experiences of interacting with conversational agents and the challenges that arise on both the user's and the agent's side. The findings show that end-users turn to conversational agents for play, for hands-free use when they are unable to type, for specific and formally phrased queries, and for simple tasks such as checking the weather. The paper also observes that, in most instances, conversational agents fail to bridge the gap between users' expectations and the agents' actual behavior, and suggests that incorporating playfulness may help. Finally, the paper draws on Norman's gulfs of execution and evaluation to offer implications for designing future systems.
Reflection:
This paper is very interesting, and I have had similar thoughts when using conversational agents in day-to-day life. I also appreciate the use of semi-structured interviews to capture users' actual experiences of using conversational agents and how those experiences differed from their expectations prior to using these CAs.
This work also builds on prior work, confirming the existence of a gulf between expectation and reality: users consistently expect more from CAs than CAs are capable of providing. The paper also speaks to the importance of designing conversational agents that set user expectations explicitly, rather than leaving users to form their own, as we saw in some papers from previous weeks. The authors further suggest emphasizing ongoing interaction and regular updates to the CA to better calibrate end-user expectations.
The paper also suggests ways to hold researchers and developers accountable for the promises they make when designing such systems, and to overhaul systems based on user feedback.
However, rather than focusing only on where conversational agents fail to support user interaction, I wish the paper had also examined where these systems succeed. Further, I wish the authors had sampled not only novice users but also experts, who might have had different expectations. It might be interesting to scale this work up as a survey to see how users' expectations differ depending on which conversational agent is being used.
Questions:
- How would you work to reduce the gulf between expectation and reality?
- What are the challenges to building useful and usable conversational AIs?
- Why are conversational AIs sometimes so limited? What affects their performance?
- Where do you think humans can play a role (e.g., as humans-in-the-loop)?