03/25/20 – Lulwah AlKulaib – VQAGames

Summary

The paper presents a cooperative game between humans and AI called GuessWhich. The game is a live, conversational interaction in which the human is given multiple photos as choices while the AI holds only one photo; the human asks the AI, ALICE, questions to identify which photo is the correct choice. ALICE was trained with both supervised learning and reinforcement learning on a publicly available visual dialog dataset and was then used to evaluate human-AI team performance. The authors find no significant difference in performance between ALICE's supervised learning and reinforcement learning versions when paired with human partners. Their findings suggest that while self-talk and reinforcement learning are interesting directions to pursue for building better visual conversational agents, there appears to be a disconnect between AI-AI and human-AI evaluations: progress in the former does not seem to be predictive of progress in the latter. It is important to note that measuring AI progress in isolation is not as useful for systems that require human-AI interaction.

Reflection

The concept presented in this paper is interesting. As someone who doesn't work in the HCI field, it opened my eyes to the different ways the models I have worked on shouldn't be measured in isolation: the authors showed that evaluating visual conversational agents through a human computation game gives results that differ from our conventional AI-AI evaluation. Thinking about this, I wonder how such methods would apply to tasks in which automated metrics correlate poorly with human judgement, such as natural language generation in image captioning, and how a method inspired by the one given in this paper would differ from the methods suggested in last week's papers. Given the difficulties these tasks present and their interactive nature, it is clear that the most appropriate way to evaluate them is with a human in the loop. But how would a large-scale human-in-the-loop evaluation happen, especially when there are limited financial and infrastructure resources?

This paper made me think of the challenges that come with human in the loop evaluations:

1- In order to have it done properly, we must have a set of clear and simple instructions for crowdworkers.

2- There should be a way to ensure the quality of the crowdworkers. 

3- For the evaluation’s sake, we need uninterrupted communication.

My takeaway from the paper is that while traditional platforms were adequate for evaluation tasks using automatic metrics, there is a critical need to support human-in-the-loop evaluation for free-form multimodal tasks.

Discussion

  • What are the ways that we could use this paper to evaluate tasks like image captioning?
  • What are other challenges that come with human in the loop evaluations?
  • Is there a benchmark for human-AI evaluation in the field of your project? How would you ensure that your results are comparable?
  • How would you utilize the knowledge about human-AI evaluation in your project?
  • Have you worked with measuring evaluations with human in the loop? What was your experience there?


03/24/2020 – Akshita Jha – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Summary:
“All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild” by Liao et al. talks about conversational agents and their interactions with end-users. The end-user of a conversational agent might want something more than just information from these chatbots; some of these wants can be playful conversations. The authors study a field deployment of a human resources chatbot and discuss the users' areas of interest with respect to the chatbot. The authors also present a methodology involving statistical modeling to infer user satisfaction from the conversations. This feedback from the user can be used to enrich conversational agents and make them interact better with the end-user in order to improve user satisfaction. The authors primarily discuss two research questions: (i) What kind of conversational interactions did users have with the conversational agent in the wild? (ii) What kind of signals given by users to the conversational agent can be used to study human satisfaction and engagement? The findings show that the main areas of conversations include “feedback-giving, playful chit-chat, system inquiry, and habitual communicative utterances.” The authors also discuss various functions of conversational agents, design implications, and the need for adaptive conversational agents.
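
The paper's statistical modeling is only summarized above, so as a rough illustration of the idea (and not the authors' actual model), one could regress a survey-based satisfaction label on counts of conversational signals such as playful chit-chat or feedback-giving. The feature names and data below are invented for the sketch.

```python
# Hypothetical sketch: infer user satisfaction from counts of conversational
# signals. Feature names and data are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [playful chit-chat turns, feedback-giving turns, rephrased questions]
X = np.array([[5, 2, 0], [0, 0, 4], [3, 1, 1], [1, 0, 5]])
y = np.array([1, 0, 1, 0])  # 1 = satisfied, 0 = not satisfied (e.g., from a survey)

model = LogisticRegression().fit(X, y)
print(model.predict([[2, 1, 1]]))  # predicted satisfaction for a new user's conversation
```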

Reflection:
This is a very interesting paper because it talks about the surprising dearth of research on the gap between user interactions in the lab and those in the wild. It highlights the differences between the two scenarios and the varying degree of expectations that the end-user might have while interacting with a conversational agent. The authors also mention how the conversation is almost always initiated by the conversational agent, which might not be the best scenario depending upon the situation. The authors also raise an interesting point that the conversational agent mostly functions as a question answering system. This is far from ideal and prevents the user from having an organic conversation. To drive home this point further, the authors compare and contrast the signals of an informal playful conversation with those of a functional conversation in order to provide a meaningful and nuanced understanding of user behavior that can be incorporated by the chatbot. The authors also mention that the results were based on survey data collected in a workplace environment and do not claim generalization. The authors also study only working professionals, and the results might not hold for a population from a different age group. An interesting point here is that users strive for human-like conversations. This got me thinking: is this a realistic goal to strive for? What would the research direction look like if we modified our expectations and treated the conversational agent as an independent entity? It might help not to evaluate conversational agents against human-level conversation skills.

Questions:
1. Have you interacted with a chatbot? What has your experience been like?
2. Which feature do you think is a must-have and should be incorporated into a chatbot?
3. Is it a realistic goal to strive for human-like conversations? Why is that so important?


03/25/2020 – Palakh Mignonne Jude – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

SUMMARY

In this paper, the authors design a cooperative game called GuessWhich (inspired by the 20-Questions game) to measure the performance of human-AI teams in the context of visual conversational agents. The AI system, ALICE, is based on the ABOT developed by Das et al. in a prior study conducted to measure the performance of AI-AI systems. Two variants of ALICE have been considered for this study – ALICE_SL (trained in a supervised manner on the Visual Dialog dataset) and ALICE_RL (pre-trained with supervised learning and fine-tuned using reinforcement learning). The GuessWhich game was designed such that the human is the ‘questioner’ and the AI (ALICE) is the ‘answerer’. Both are given a caption that describes an image. While ALICE is shown this image, the human can ask the AI multiple questions (9 rounds of dialog) to better understand the image. After these rounds, the human is asked to select the correct image from a set of distractor images that are semantically similar to the image to be identified. The authors found that, contrary to expectation, improvements in AI-AI performance do not translate to improvements in human-AI performance.
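
To make the game structure concrete, here is a minimal sketch of one GuessWhich game as described in the summary above; `human_ask`, `alice_answer`, and `human_guess` are placeholder callables standing in for the human questioner, ALICE, and the final image selection, and are not taken from the authors' code.

```python
# Simplified sketch of the GuessWhich protocol (placeholder functions, not the
# authors' implementation): 9 rounds of question-answer, then a final guess.
NUM_ROUNDS = 9

def play_guesswhich(caption, secret_image, image_pool, human_ask, alice_answer, human_guess):
    dialog = []
    for _ in range(NUM_ROUNDS):
        question = human_ask(caption, dialog)          # human sees only the caption and dialog
        answer = alice_answer(secret_image, question)  # ALICE sees the secret image
        dialog.append((question, answer))
    guess = human_guess(caption, dialog, image_pool)   # human picks from distractors + target
    return guess == secret_image                       # success of the human-AI team
```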

REFLECTION

I like the gamification approach that the authors adopted for this study and I believe that the design of the game works well in the context of visual conversational agents. The authors mention how they aimed to ensure that the game was ‘challenging and engaging’. This reminded me of the discussion we had in class about the paper ‘Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research’, of how researchers often put in extra effort to make tasks for crowd workers more engaging and meaningful. I also liked the approach used to identify ‘distractor’ images and felt that this was useful in making the game challenging for the crowd workers.

I thought that it was interesting to learn that the AI-ALICE teams outperformed the human-ALICE teams. I wonder if this is impacted by the fact that ALICE could get some answers wrong and how that might affect the mental model generated by the human. I thought that it was good that the authors took into account knowledge leak and ensured that the human workers could only play a fixed number (10) of games.

I also liked that the authors gave performance-based incentives to the workers that were tied to the success of the human-AI team. I thought that it was good that the authors published the code of their design as well as provided a link to an example game.

QUESTIONS

  1. As part of the study conducted in this paper, the authors design an interactive image-guessing game. Can similar games be designed to evaluate human-AI team performance in other applications? What other applications could be included?
  2. Have any follow-up studies been performed to evaluate the QBOT in a ‘QBOT-human team’? In this scenario, would the QBOT_RL outperform the QBOT_SL?
  3. The authors found that some of the human workers adopted a single word querying strategy with ALICE. Is there any specific reason that could have caused the humans to do so? Would they have questioned a human in a similar fashion? Would their style of querying have changed if they were unaware if the other party was a human or an AI system?


03/25/2020 – Palakh Mignonne Jude – “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents

SUMMARY

The authors of this paper aim to understand the interactional factors that affect conversational agents (CAs) such as Apple’s Siri, Google Now, Amazon’s Alexa, and Microsoft’s Cortana. They conducted interviews with 14 participants (they continued to find participants until theoretical saturation had been reached). They identified the motivations of these users, their type of use, the effort involved in learning to use a CA, user evaluation of the CAs, as well as issues that affect user engagement. Through their study, the authors found that user expectations were not met by the CAs. They also found that the primary use case for these CAs was to perform tasks ‘hands free’, and that users were more likely to trust the CA with tasks that needed less precision (such as setting an alarm or asking about the weather) as compared to tasks that required more precision (such as drafting an email). For the tasks that needed more precision, the users were likely to utilize visual confirmation to ensure that the CA had not made any mistakes. The authors identified that it would help if the CA enabled users to learn about the system’s capabilities in a better manner, and if the goals of the system were more clearly defined.

REFLECTION

I found this paper to be very interesting; however, given that it was written in 2016, I wonder if a follow-up study has been performed to evaluate CAs and their improvement over the past few years. I liked the description given in the paper about ‘structural patterns’ and how, as humans, we often use non-verbal tools to ascertain the mood of another person – which would be challenging to achieve in the context of current conversational agents. I also found it interesting to learn that humans found excess politeness repulsive when they knew that their interaction was with a machine, while they expected politeness in interactions with humans. I agree that these CAs must be designed in such a way that naïve, uninformed humans would be able to use them with ease in everyday situations.

I thought it was interesting that the authors mention the CAs’ inability to retain contextual understanding between interactions, especially in the case of subsequent questions that might be asked. If the CA is intended to conduct conversations in a more human-like manner, I believe that this is an important factor that must be considered. As someone who isn’t an avid user of CAs, I am unaware of the current progress that has been made towards improving this aspect of CAs.

As indicated by the paper, I remember having used ‘play’ as a point of entry when I first started using my Google Home – wherein I used the ‘Pikachu Talk’ feature. I also found it interesting to learn how, in this case as well, humans form mental models regarding the capabilities of CA systems.

QUESTIONS

  1. How have conversational agents evolved over the past few years since this paper was published?
  2. Which CA among Cortana, Siri, Google Now, and Alexa has made the most progress and has the best performance? Are any of these systems capable of maintaining context when communicating with users? Which of these conversations seem most human-like?
  3. Considering that this study was conducted with users that mainly used Siri, has a follow-up comparative study been performed that evaluates the performance of each of the available CAs and illustrates the strengths and weaknesses of each of these systems?


03/25/2020 – Nurendra Choudhary – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Summary

In this paper, the authors measure the performance of human-AI teams instead of AI in isolation. They employ a GuessWhich game in the context of a visual conversational agent system. The game works through interaction between humans and the AI system called ALICE.

The game includes two agents: the questioner and the answerer. The answerer has access to an image, and the questioner asks the answerer questions about it. The answerer replies to the questions, and the questioner tries to guess the correct image from an exhaustive set of images. For the human-AI team, the answerer is ALICE and the questioner is the human. Performance is measured in terms of the number of questions needed for the correct guess. Also, the authors utilize a QBot (Questioner Bot) instead of humans for a comparative analysis between ALICE-human and ALICE-QBot teams.

The authors discuss various challenges with the approaches, such as robustness to incorrect question-answer pairs and humans learning about the AI. They conclude that ALICE_RL, the state of the art in the AI literature, does not perform better than ALICE_SL when paired with humans. This highlights the disconnect between isolated AI development and development in teams with humans.

Reflection

The paper discusses an important problem: the disconnect between isolated AI development and real-world usage with humans in the loop. However, I feel there are some drawbacks in the experimental setup. In the QBot part, I do not agree with the temporally dynamic nature of the question sequence. I think the QBot should get access to the perfect set of questions (from humans) when generating each new question. This would make the comparison fair and less dependent on the QBot's own earlier performance.

An interesting point is the dependence of AI on humans. A perfect AI system should not rely on humans; however, current AI systems rely on humans to be useful in the real world. This leads to a paradox where we need to make AI systems human-compliant while moving towards the larger goal of building independent AI.

To achieve the larger goal, I believe isolated development of AI is crucial. However, the systems also need to contribute to human society. For this, I believe we can utilize variants of the underlying system to support human behavior. This approach supports isolated development and additionally collects auxiliary data about human behavior, which can further improve the AI's performance. This approach is already being applied effectively. For example, in the case of Google Translate, the underlying neural network model was developed in isolation. Human corrections to its translations provide auxiliary information and also improve the human-AI team's overall performance. This leads to a significant overall improvement in the translator's performance over time.

Questions

  1. Is it fair to use the GuessWhich game as an indicator of AI’s success? Shouldn’t we rely on the final goal of an AI to better appreciate the disconnect?
  2. Should the human-AI teams just be part of evaluation or also the development part? How would we include them in the development phase for this problem?
  3. The performance of ALICE relies on the QBot mechanism. Could we use human input to improve QBot's question generation mechanism and make it more robust?
  4. The experiment provides a lot of auxiliary data such as correct questions, relevance of questions to the images and robustness of bots with respect to their own answers. Can we integrate this data into the main architecture in a dynamic manner?

Word Count: 564


03/25/2020 – Nurendra Choudhary – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Summary

In this paper, the authors study a Human Resources chatbot to analyze the interactions between the bot and its users. Their primary aim is to utilize the study to enhance the interactions of conversational agents in terms of behavioral aspects such as playfulness and information content. Additionally, the authors emphasize the adaptivity of the systems based on a particular user's conversational data.

For the experiments, they adopted an agent called Chip (Cognitive Human Interface Personality). Chip has access to all the company-related assistance information. The human subjects for this experiment are new employees who need constant assistance to orient themselves in the company. Chip is integrated into the IM services of the company to provide real-time support.

Although Chip is primarily a question-answer agent, the authors are more interested in the behavioral tics in the interaction, such as playful chit-chat, system inquiry, feedback, and habitual communicative utterances. They utilize the information from such tics to further enhance the conversational agent and improve its human-like behavior (rather than focusing solely on answer-retrieval efficiency).

Reflection

All Work and No Play is a very appropriate title for the paper. Chip is primarily applied in a formal context where social interactions are considered unnecessary (if not inappropriate). However, human interactions always include a playful element that improves the quality of communication. No matter the context, human conversation is hardly ever devoid of behavioral features. The features exhibit emotions and significant subtext. Given the setting, it is a good study to analyze the effectiveness of conversational agents with human behavioral features. However, one limitation of the study is selection bias (as indicated in the paper too): the authors pick conversation subparts that are subjectively considered to include the human behavioral features. Still, I do not see a better contemporary method in the literature to efficiently avoid this selection bias.

Additionally, I see this study as part of a wider move of the community towards adding human-like behavior to AI systems. If we look at the current popular AI conversational agents like Alexa, Siri, Google Assistant, and others, we find a common aim to add human-specific features with limited utilitarian value, such as jokes and playful tics. I believe this type of learning also reduces the amount of adaptation humans need before being comfortable with the system. In previous classes, we have seen the adaptation of mental models to a given AI tool. If AI systems behave more like humans and learn accordingly, humans would not need significant learning to adopt these AI tools in their daily lives. For example, when voice assistants did not include these features, they were significantly less prevalent than they are in current society, and their market is only projected to widen.

Questions

  1. How appropriate is it to have playful systems in an office environment? Is it sufficient to have efficient conversational agents or do we require human-like behavior in a formal setting?
  2. The features seem even more relevant for regular conversational agents. How will the application and modeling differ in those cases?
  3. The authors select the phrases or conversational interactions as playful or informal based on their own algorithm. How does this affect the overall analysis setup? Is it fair? Can it be improved?
  4. We are trying to make the AI more human-like and not using it simply as a tool. Is this part of a wider move as the future area of growth in AI? 

Word Count: 590


03/25/2020 – Bipasha Banerjee – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Summary

The paper by Liao et al. talks about conversational agents (CAs) and uses a field deployment to answer two research questions. The first was to see how users interact with CAs, and the second was to see what kinds of conversational interactions the CAs could use to gauge user satisfaction. For this task, the authors developed a conversational agent called Cognitive Human Interface Personality (CHIP). The primary function of the agent is to provide HR assistance to new hires at a company. For this research, 377 new employees were the users, and the agent provided support to them for six weeks. CHIP would answer questions related to the company, which newly employed individuals quite naturally have. The IBM Watson Dialog package was utilized to incorporate the conversations collected from past CA usage. They made the process iterative, where 20-30 user interactions were taken into account in the development process. The CA was meant to be conversational and social, and users were assisted with regular reminders to that end. Participants in the study were asked to use a specific hashtag, namely #fail, to provide feedback and consent to the study. The analysis was done using classifiers to characterize user input. It was concluded that signals in conversational interactions can be used to infer user satisfaction and to further develop chat platforms that utilize such information.

Reflection 

The paper does a decent job of investigating conversational agents and finding out the different forms of interaction users have with the system. This work gave an insight into how these conversations could be used to identify user satisfaction. I was particularly interested to see the kinds of interaction the users had with the system. It was also noted in the paper that the frequency of usage of the CA declined within two weeks. This is natural for an HR system. However, industries like banking, where 24-hour assistance is needed and desired, would have consistent user traffic. Additionally, it is essential to note how the security of users is maintained while such systems use human data. For example, HR data is sensitive. The paper did not mention anything about how we actually make sure that personal data is not transferred to or used by any unauthorized application or person.

One more important factor, in my opinion, is the domain. I do understand why the HR domain was selected. New hires are bound to have questions, and a CA is a perfect solution to answer all such frequently asked questions. However, how would the feasibility of using such an agent change with other potential uses of the CA? I believe that the performance of the model would decrease if the system were more complex. Here the CA mostly had to anticipate or answer questions from a finite range of available questions; a more open-ended application could have endless questioning opportunities. Being able to handle such applications would be challenging.

The paper also uses only basic machine learning classifiers to answer the first research question. However, I think some deep learning techniques, like those mentioned in [1], would help classify the questions better.
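
As a rough illustration of the kind of technique referenced in [1] (not the paper's method), a bidirectional LSTM classifier over user utterances might look roughly like the following; the label set, vocabulary size, and hyperparameters are assumptions made for the sketch.

```python
# Minimal sketch of a bidirectional LSTM utterance classifier (PyTorch).
# Labels, vocabulary size, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

LABELS = ["work-related inquiry", "playful chit-chat", "system inquiry", "feedback-giving"]

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=len(LABELS)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded utterances
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.lstm(embedded)
        # Concatenate the final forward and backward hidden states
        features = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.classifier(features)  # raw logits over the label set

# Usage sketch: logits = BiLSTMClassifier(vocab_size=5000)(torch.randint(1, 5000, (8, 20)))
```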

Questions

  1. How would the model perform in domains where continuous usage is necessary? An example is the banking sector.
  2. How was the security taken care of in their CA setup? 
  3. Would the performance and feasibility change according to the domain? 
  4. Could deep learning techniques improve the performance of the model?

References 

[1] https://medium.com/@BhashkarKunal/conversational-ai-chatbot-using-deep-learning-how-bi-directional-lstm-machine-reading-38dc5cf5a5a3


03/25/20 – Lee Lisle – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Summary

            Often, humans have to be trained on how to talk to AIs so that the AI understands the human and what it is supposed to do. However, this puts the onus on the human to adjust rather than having the AI adjust to the human. One way of addressing that issue was to crowdsource responses so that the human and AI could understand each other through what was essentially a middle-man approach. Huang et al.'s work created a hybrid crowd- and AI-powered conversational assistant that aims to be fast enough for a human to interact with naturally while retaining the higher-quality, natural responses that the crowd can create. It accomplishes this by reusing responses previously generated by the crowd as high-quality responses are identified over time. They also deployed Evorus over a period of 5 months with 281 conversations that gradually moved from crowd responses to chatbot responses.

Personal Reflection

I liked that the authors took an odd stand in this line of research early on in the paper – that, while it has been suggested that AI will eventually take over for the crowd implementations of a lot of systems, this hasn't happened despite a lot of research. This stand highlights that the crowd has long been performing tasks that it was supposed to stop performing at some point.

Also, I found that I could possibly adapt what I'm working on with automatic speech recognition (ASR) so that it improves itself with a similar approach. If I took, for example, several outputs from different ASR algorithms along with crowd responses and had a ranked vote for the best transcription, perhaps it could eventually wean itself off the crowd as well, as in the sketch below.
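
The sketch below illustrates that voting idea under my own assumptions (it is not from the paper): each ASR hypothesis gets one vote, a crowd transcription gets a slightly larger weight, and the transcription with the most support wins.

```python
# Hypothetical voting aggregator over ASR hypotheses plus an optional crowd
# transcription; the crowd_weight value is an arbitrary assumption.
from collections import defaultdict

def pick_transcription(candidates, crowd_weight=1.5):
    """candidates: list of (source, transcription) pairs; 'crowd' sources get extra weight."""
    scores = defaultdict(float)
    for source, text in candidates:
        normalized = " ".join(text.lower().split())
        scores[normalized] += crowd_weight if source == "crowd" else 1.0
    # Return the transcription with the highest total weight
    return max(scores.items(), key=lambda item: item[1])[0]

print(pick_transcription([
    ("asr_a", "set a timer for ten minutes"),
    ("asr_b", "set a timer for ten minutes"),
    ("asr_c", "set a time for ten minutes"),
    ("crowd", "set a timer for ten minutes"),
]))  # -> "set a timer for ten minutes"
```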

It was also interesting that they took a reddit or other social website approach with the upvote/downvote system of determination. This approach seems to have long legs in fielding appropriate responses via the crowd.

The last observation I would like to make is that they had an interesting and diverse set of bots, though I question the usefulness of some of them. I don't really understand how the filler bot can be useful except in situations where the AI doesn't really understand what is happening, for example. I had also thought the interview bot would be low-performing, as the types of interviews it pulled its training data from would be particular to certain types of people.

Questions

  1. Considering they said that the authors felt that Evorus wasn’t a complete system but a stepping point to a complete system, what do you think they could do to improve it? I.E., what more can be done?
  2. What other domains within human-AI collaboration could use this approach of the crowd being a scaffold that the AI develops upon until the scaffold is no longer needed? Is the lack of these deployments evidence that the developers don’t want to leave this crutch or is it due to the crowd still being needed?
  3. Does the weighting behind the upvotes and downvotes make sense? Should the votes have equal (yet opposite) weighting or should they be manipulated as the authors did? Why or why not?
  4. Should the workers be incentivized for upvotes or downvotes? What does this do to the middle-of-the-road responses that could be right or wrong?


03/25/20 – Lee Lisle – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Summary

            Chattopadhyay et al.’s work details the problems with the current (pre-2018) methods of evaluating visual conversational agents. These agents, which are AIs designed to discuss what is in pictures, were typically evaluated through one AI (the primary visual conversational agent) describing a picture while another asked questions about it. However, the authors show how this kind of interaction does not adequately reflect how humans would converse with the agent. They use 2 visual conversation agents, dubbed ALICE_SL and ALICE_RL (for supervised and reinforcement learning, respectively) to play 20 questions with AMT workers. They found that there was no significant difference in the performance of the two versions of ALICE. This stood in contrast to the work done previously which found that ALICE_RL was significantly better than ALICE_SL when tested by AI-AI teams. Both ALICEs perform better than random chance, however. Furthermore, AI-AI teams require fewer guesses than the humans in Human-AI teams.

Personal Reflection

I first noticed that their name for 20 Questions was GuessWhat or GuessWhich. This has relatively little to do with the paper, but it was jarring to me at first.

The first thing that struck me was their discussion of the previous methods. If the first few rounds of AI-AI evaluation were monitored, why didn’t they pick up that the interactions weren’t reflective of human usage? If the abnormality didn’t present until later on, could they have monitored late-stage rounds, too? Or was it generally undetectable? I feel like there’s a line of questioning here that wasn’t looked at that might benefit AI as well.

I was amused that, with all the paper being on AI and interactions with humans, they chose the image set to be of medium difficulty based on “manual inspection.” Does this indicate that the AIs don't really understand difficulty in these datasets?

Another minor quibble is that they say each HIT was 10 games, but then state that they published HITs until they got 28 games completed on each version of ALICE and specify this meant 560 games. They overload the word ‘game’ without describing the actual meaning behind it.

An interesting question that they didn’t discuss investigating further is whether question strategy evolved over time for the humans. Did they change up their style of questions as time went on with ALICE? This might provide some insight as to why there was no significant difference.

Lastly, their discussion on the knowledge leak of evaluating AIs on AMT was quite interesting. I would not have thought that limiting the interaction each turker could have with an AI would improve the AI.

Questions

  1. Of all of the participants who started a HIT on AMT, only 76.7% actually completed the HIT. What does this mean for HITs like this? Did the turkers just get bored, or did the task annoy them in some way?
  2. The authors pose an interesting question in 6.1 about QBot’s performance. What do you think would happen if the turkers played the role of the answerer instead of the guesser?
  3. While they didn’t find any statistical differences, figure 4(b) shows that ALICE_SL outperformed ALICE_RL in every round of dialogue. While this wasn’t significant, what can be made of this difference?
  4. How would you investigate the strategies that humans used in formulating questions? What would you hope to find?


03/25/2020 – Yuhang Liu – “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents

Summary:

The research background of this paper is that many conversational agents are currently emerging; for example, every major technology company has its own conversational agent. As a key mode of human-computer interaction, conversational agents have a lot of research significance, so this paper reports the results of interviews with 14 users and finds that user expectations are very different from the way the systems actually operate. Based on the feedback from these 14 users, the authors reach the following conclusions:

(a) Change the ways to reveal system intelligence

(b) Reconsidering the interactional promise made by humorous engagement

(c) Considering how best to indicate capability through interaction

(d) Rethinking system feedback and design goals in light of the dominant use case

In general, the functions that a conversational agent can achieve, and its impact on human life, are still far from people's expectations. So conversational agents need to be improved in terms of how they work and their design goals, based on people's needs.

Reflection:

Below are my thoughts on these suggestions:

First of all, I very much agree with the author's suggestion about reconsidering the interactional promise made by humorous engagement. Based on my limited interaction experience, I think that humorous interaction methods are very effective in improving the user experience and making interactional promises. I rarely use Siri, but I remember that Siri has a lot of humorous replies, and when asked certain questions, it will give relevant answers. Although this does not help much in solving the actual problem, it can improve the user experience, and I think that making interactional promises in this way can also help users better understand the conversational agent, add fun to its use, and give users confidence in the conversational agent.

Secondly, I think the other suggestions are mainly about better demonstrating the capabilities of conversational agents to users, which is in line with the central idea of this paper: people's expectations are far from what the system can actually achieve. Users do not know the true abilities of the system, which leads to a continuous accumulation of disappointment from unfinished tasks, and so they gradually abandon the conversational agent. I think I gradually reduced my use of similar products because the operations I wanted to perform were difficult to accomplish, and in retrospect, I did not know what functions the system could really carry out. So I think it is imperative that users truly understand the system's capabilities and use it in a correct and efficient way, so that their satisfaction improves, interactions succeed, and their use of conversational agents increases. At the same time, this problem cannot be attributed entirely to users' incorrect usage. Technology companies also need to better understand users' needs, innovate on interaction methods, move beyond previous styles of conversational communication, broaden the ways of interacting, and add new features to keep meeting people's needs.

Questions:

  1. What is the role of conversational agents in your life?
  2. What functions do you think could be added?
  3. Do you feel that you do not clearly know the abilities of these conversational agents?
  4. Do you think it might be useful to design a conversational agent following the suggestions mentioned in this paper?
