03/25/20 – Fanglan Chen – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Summary

Huang et al.’s paper “Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time” explores a novel crowd-powered system architecture that supports gradual automation. The motivation of the research is that crowd-powered conversational assistants have been shown to achieve better performance than automated systems, but they cannot be widely adopted due to their monetary cost and response latency. Inspired by the idea of combining crowd-powered and automatic approaches, the researchers develop Evorus, a crowd-powered conversational assistant. Powered by three major components (learning to choose chatbots, reusing prior responses, and automatic voting), the system can introduce automation to more scenarios over time. The experimental results show that Evorus can evolve without compromising conversational quality. The proposed framework contributes to the research direction of how automation can be introduced efficiently in a deployed system.

Reflection

Overall, this paper proposes an interesting gradual-automation approach for empowering conversational assistants. One selling point of the paper is that users can converse with the proposed Evorus in open domains instead of limited ones. To achieve that goal, the researchers design the learning framework so that it assigns a slightly higher selection likelihood to newly-added chatbots, allowing them to collect more data. I would imagine this requires a large amount of time for domain-specific data collection. Sufficient data collection in each domain seems important to ensure the quality of open-domain conversation. Similar to the cold-start problem in recommender systems, the data collected for different domains is likely imbalanced; for example, certain domains may gain little or no data during the general data collection process. It is unclear how the proposed framework deals with this problem. One direction I can think of is to utilize machine learning techniques such as zero-shot learning (for domains that do not appear in prior conversations) and few-shot learning (for domains rarely discussed in prior conversations) to deal with the imbalanced data collected by the chatbot selector.
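To make this concrete, below is a minimal sketch of a chatbot selector that gives under-observed bots a small exploration bonus so they get chances to collect data. The class name, the acceptance-rate scoring, and the bonus scheme are my own illustrative assumptions, not the paper’s actual likelihood model.

```python
import random
from collections import defaultdict

class ChatbotSelector:
    """Hypothetical selector favoring newly-added chatbots.

    Each bot is scored by its historical acceptance rate, and bots
    with few observations receive an exploration bonus so they are
    sampled often enough to collect domain data.
    """

    def __init__(self, new_bot_bonus=0.3, min_observations=20):
        self.accepted = defaultdict(int)  # accepted responses per bot
        self.shown = defaultdict(int)     # suggested responses per bot
        self.new_bot_bonus = new_bot_bonus
        self.min_observations = min_observations

    def score(self, bot):
        shown = self.shown[bot]
        rate = self.accepted[bot] / shown if shown else 0.0
        # Slightly higher likelihood for bots we know little about.
        bonus = self.new_bot_bonus if shown < self.min_observations else 0.0
        return rate + bonus

    def choose(self, bots):
        # Sample proportionally to score so weaker bots still get chances.
        weights = [self.score(bot) + 1e-6 for bot in bots]
        return random.choices(bots, weights=weights, k=1)[0]

    def record(self, bot, was_accepted):
        self.shown[bot] += 1
        if was_accepted:
            self.accepted[bot] += 1
```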

For the second component, reusing prior answers seems a good way to reduce the system’s computational cost. However, text retrieval can be very challenging. Take lexical ambiguity as an example: polysemous words can hinder the accuracy of retrieval because the instances collected from the corpus mix the different contexts in which a polysemous word occurs. If the retrieval component cannot handle lexical ambiguity well, the reuse of prior answers may surface irrelevant responses to user conversations, which could introduce errors into the results.
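A toy retrieval sketch makes the ambiguity problem visible. I use TF-IDF cosine similarity here purely for illustration (the paper’s retrieval component may work differently), and the stored pairs are made up. Because a bag-of-words model cannot tell the financial “bank” from the river “bank”, the new query below retrieves the fishing answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up corpus of prior (query, response) pairs.
prior_pairs = [
    ("where can i open a bank account", "Most banks let you apply online."),
    ("best spot on the river bank for fishing", "Try the shallows near the bridge."),
]

queries = [query for query, _ in prior_pairs]
vectorizer = TfidfVectorizer()
query_matrix = vectorizer.fit_transform(queries)

def suggest_prior_response(new_query, threshold=0.4):
    """Return the stored response whose query is most similar, or None."""
    sims = cosine_similarity(vectorizer.transform([new_query]), query_matrix)[0]
    best = sims.argmax()
    return prior_pairs[best][1] if sims[best] >= threshold else None

# Retrieves the fishing answer even if the user meant a financial bank.
print(suggest_prior_response("which bank is near the river"))
```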

In the design of the third component, both workers and the vote bot can upvote suggested responses. A response candidate must accumulate sufficient vote weight before Evorus accepts it and sends it to the user. Depending on the threshold for sufficient vote weight, the latency could be very long. In the design of user-centric applications, it is important to keep latency/runtime in mind. I feel the paper would be more persuasive if it provided supporting experiments on the latency of the proposed system.
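The latency trade-off is easy to see in a minimal sketch of threshold-based acceptance. The weights and threshold below are illustrative assumptions, not the paper’s tuned values.

```python
# Illustrative vote weights; the idea is that automatic votes count for
# less than human votes, but these exact numbers are assumptions.
HUMAN_VOTE_WEIGHT = 1.0
BOT_VOTE_WEIGHT = 0.5
ACCEPT_THRESHOLD = 2.0  # weight required before sending to the user

def should_accept(votes):
    """votes: list of 'human' or 'bot' upvotes on one candidate."""
    total = sum(HUMAN_VOTE_WEIGHT if vote == "human" else BOT_VOTE_WEIGHT
                for vote in votes)
    return total >= ACCEPT_THRESHOLD

# Two human upvotes clear the bar; a lone bot vote does not, so the
# candidate waits for more votes, which is exactly the latency concern.
print(should_accept(["human", "human"]))  # True
print(should_accept(["bot"]))             # False
```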

Discussion

I think the following questions are worthy of further discussion.

  • Do you think it is important to ensure users can converse with the conversational agents in open domains in all scenarios? Why or why not?
  • What improvements do you think the researchers could make to the Evorus learning framework?
  • At what point do you think the conversational agent could become purely automatic, or is it better to always have a human-in-the-loop component?
  • Would you consider utilizing the gradual automation framework in your project? If yes, how would you implement it?


03/25/20 – Lee Lisle – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Summary

Often, humans have to be trained on how to talk to AIs so that the AI understands the human and what it is supposed to do. However, this puts the onus on the human to adjust rather than having the AI adjust to the human. One way of solving that issue was to crowdsource responses so that the human and AI could understand each other through what was essentially a middle-man approach. Huang et al.’s work created a hybrid crowd- and AI-powered conversational assistant that aims to be fast enough for a human to interact with naturally while still producing the higher-quality natural responses that the crowd can create. It accomplishes this by reusing responses previously generated by the crowd as high-quality responses are identified over time. They also deployed Evorus over a period of 5 months with 281 conversations that gradually moved from crowd responses to chatbot responses.

Personal Reflection

I liked that the authors took an odd stand in this line of research early on in the paper – that, while it has been suggested that AI will eventually take over for the crowd implementations of a lot of systems, this hasn’t happened despite a lot of research. This stand highlights that the crowd has long been performing tasks it was supposed to stop doing at some point.

Also, I found that I could possibly adapt what I’m working on with automatic speech recognition (ASR) to improve itself with a similar approach. If I took, for example, several outputs from different ASR algorithms along with crowd responses and had a ranked voting for which was the best transcription, perhaps it could eventually wean itself off the crowd as well.
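As a rough illustration of that idea, here is a minimal ranked-voting sketch using a Borda count over competing transcriptions. The voters, hypotheses, and scoring scheme are all hypothetical, not something from the paper.

```python
from collections import defaultdict

def borda_winner(rankings):
    """Pick the transcription with the highest Borda score.

    rankings: one list per voter, best hypothesis first. Voters could
    be crowd workers or individual ASR systems.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        size = len(ranking)
        for place, hypothesis in enumerate(ranking):
            scores[hypothesis] += size - 1 - place  # best gets size-1 points
    return max(scores, key=scores.get)

rankings = [
    ["recognize speech", "wreck a nice beach"],  # crowd worker
    ["recognize speech", "wreck a nice beach"],  # ASR system A
    ["wreck a nice beach", "recognize speech"],  # ASR system B
]
print(borda_winner(rankings))  # "recognize speech"
```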

It was also interesting that they took a Reddit-style (or other social website) approach with the upvote/downvote system of determination. This approach seems to have long legs in fielding appropriate responses via the crowd.

The last observation I would like to make is that they had an interesting and diverse set of bots, though I question the usefulness of some of them. I don’t really understand how the filler bot can be useful except in situations where the AI doesn’t really understand what is happening, for example. I had also thought the interview bot would perform poorly, as the types of interviews it pulled its training data from would be particular to certain types of people.

Questions

  1. Considering the authors felt that Evorus wasn’t a complete system but a stepping stone toward one, what do you think they could do to improve it? I.e., what more can be done?
  2. What other domains within human-AI collaboration could use this approach of the crowd being a scaffold that the AI develops upon until the scaffold is no longer needed? Is the lack of these deployments evidence that developers don’t want to leave this crutch, or is it due to the crowd still being needed?
  3. Does the weighting behind the upvotes and downvotes make sense? Should the votes have equal (yet opposite) weighting, or should they be manipulated as the authors did? Why or why not?
  4. Should the workers be incentivized for upvotes or downvotes? What does this do to the middle-of-the-road responses that could be right or wrong?


03/25/2020 – Yuhang Liu – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Summary:

This paper proposes a new system that combines crowd workers with machine learning to build better chatbots. The motivation behind this idea is that fully automatic chatbots usually do not respond as well as crowd-powered conversational assistants, which is also evident in our daily lives, but crowd-powered assistants have higher costs and longer response times. So the authors built the Evorus system, a crowd-powered conversational assistant. It achieves greater efficiency in three main ways:

  1. It can integrate other chatbots;
  2. It can reuse previous answers;
  3. It can learn to automatically approve responses.

In short, when users chat with the system, they can evaluate the responses. When a response is not ideal, the system can draw on answers from crowd workers, and at the same time it learns from these question-answer pairs so it can answer similar questions the next time. In domains where it has already practiced, it can answer questions quickly. This speeds up Q&A and also improves accuracy.

Reflection:

I believe that in daily life people regularly encounter automated question-answering systems. For example, on the official UPS website a chatbot pops up to ask your purpose, but it usually handles only a few directions. When people’s requirements become complicated, the conversation becomes complicated, and the automatic answering robot cannot handle it, forcing users to turn to phone consultation or other pages. I think these responses mainly come from pre-scripted questions and answers, so the system proposed by the authors has very important practical value.

At the same time, I think the biggest advantage of this self-learning crowd-powered system is that it can be updated continuously and promptly. Frequent updates usually consume a lot of manpower and resources, and timely updates are especially important in communication tools. On the internet, terminology and emerging vocabulary change very quickly. If the system can be updated frequently by learning from the answers that crowd workers help accept in each response, it will have a very positive impact on system maintenance and on users.

Finally, the system carries question answering into a wider field, not only by updating, revising, and answering questions, but more importantly by combining humans and machines and opening up a sealed system so that it can be continuously updated. More and more innovative projects can be added to it, which I think is more meaningful than the system itself.

Question:

  1. Are there any other fields this system could be applied to?
  2. How should we evaluate crowd workers’ responses; in other words, how can we make sure a crowd worker’s response is better than a machine’s?
  3. What is the difference between the system presented in this paper and other Q&A systems in use today?


03/25/2020 – Vikram Mohanty – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Authors: Ting-Hao (Kenneth) Huang, Joseph Chee Chang, and Jeffrey P. Bigham

Summary

This paper discusses Evorus, a crowd-powered intelligent conversation agent that is targeted towards automation over time. It allows new chatbots to be integrated, reuses prior crowd responses, and learns to automatically approve responses. It demonstrates how automation can efficiently be deployed by augmenting an existing system. Users used Evorus through Google Hangouts.  

Reflection

There’s a lot happening in this paper, but it’s perfectly justified given the eventual target: a fully automated system. This paper is a great example of how to carefully plan the path to an automated system from manual origins. It is realistic in terms of feasibility, and the transition from a crowd-based system to a crowd-AI collaborative system aimed at a fully automated one seems organic and efficient, as seen from the results.
In terms of their workflow, they break it down into different elements, i.e., chatbots and vote bots, and essentially scope the problem down to just selecting a chatbot and voting on responses. A far-fetched approach would have been to build (or aim for) an end-to-end (God-mode) chatbot that can give the perfect response. Because the problem is scoped down and depends on interpretable crowd worker actions, designing a learning framework around these actions and scoped-down goals seems feasible. This is a great takeaway from the paper: how to break down a complex goal into smaller goals. Instead of attempting to automate an end-to-end complex task, crafting ways to automate smaller, realizable elements along the path seems like a smarter alternative.
The voting classifier was carefully designed, considering many interpretable and relevant features at the message, turn, and conversation levels. Again, this was evaluated with a real purpose, i.e., reducing the human effort in voting.
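To picture what such a classifier could look like, here is a toy sketch with hypothetical features for each level; the paper’s actual features, data, and model details differ, and the tiny training set below is fabricated purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per candidate response, loosely echoing the
# paper's three levels (the real feature set is richer):
#   message level:      response length, similarity to the query
#   turn level:         seconds since the last message
#   conversation level: fraction of this bot's past responses accepted
X = np.array([
    [12, 0.8,  3.0, 0.9],  # short, relevant, quick, reliable bot
    [45, 0.1, 20.0, 0.2],  # long, off-topic, slow, unreliable bot
    [ 8, 0.7,  5.0, 0.6],
    [30, 0.2, 15.0, 0.3],
])
y = np.array([1, 0, 1, 0])  # 1 = crowd upvoted, 0 = crowd downvoted

clf = LogisticRegression().fit(X, y)

# An automatic vote bot might only upvote when confident, leaving
# borderline candidates to human voters.
candidate = np.array([[10, 0.75, 4.0, 0.8]])
if clf.predict_proba(candidate)[0, 1] > 0.8:
    print("auto-upvote")
else:
    print("defer to the crowd")
```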
This paper also shows how we can still build intelligent systems that improve over time on top of AI engines that we cannot (or, actually, do not have to) modify, i.e., third-party chatbots and off-the-shelf AI APIs. Crowd-AI collaboration can be useful here, and therefore designing the user interaction(s) remains critical for a learning framework to be augmented onto the fixed AI engine, e.g., the vote bot or the select bot in this paper’s case.

Questions

  1. If you are working with an off-the-shelf AI engine that cannot be modified, how do you plan on building a system that improves over time? 
  2. What other (interaction) areas in the Evorus system do you see for a potential learning framework that would improve the performance of the system (according to the existing metrics)?
  3. If you were working on a complex task, would you prefer an end-to-end God-mode solution, or would you adopt a slower approach by carefully breaking it down and automating each element?


Subil Abraham – 03/25/2020 – Huang et al., “Evorus”

This paper introduces Evorus, a conversational assistant framework/interface that can serve as a middleman to curate and choose the appropriate responses for a client’s query. The goal of Evorus is to serve as a middleman between a user and many integrated chatbots, while also using crowd workers to vote on which responses are best given the query and the context. This allows Evorus to be a general-purpose chatbot, because it is powered by many domain-specific chatbots and (initially) crowd workers. Evorus learns over time from the crowd workers’ votes on which responses to send for a query, based on its historical knowledge of previous conversations, and also learns which chatbot to direct a query to based on what it knows of which chatbots responded to similar queries in the past. It also prevents bias against newer chatbots by giving them higher initial probabilities when they first start, allowing them to be selected even though Evorus does not have any historical data or prior familiarity with that chatbot. The ultimate ideal of Evorus is to eventually minimize the number of crowd worker interventions necessary by learning which responses to vote on and pass through to the user, and thus save crowd work costs over time.

This paper seems to follow on the theme of last week’s reading, “Pull the Plug? Predicting If Computers or Humans Should Segment Images”. In that paper, the application tries to judge the quality of an algorithm’s image segmentation and pass the task to a human in case it is not up to par. The goals of this paper seem similar, but for chatbots instead of image segmentation algorithms. I’m starting to think the idea of curation and quality checking is a common refrain that will pop up in other crowd work based applications, if I keep reading in this area. I also find it an interesting choice that Evorus seems to allow multiple responses (either from bots or from crowd workers) to be voted in and displayed to the client. I suppose the idea here is that, as long as the responses make sense and add more information for the client, it’s beneficial to allow multiple responses instead of trying to force a single, canonical response. Though I like this paper and the application it presents, one issue I have is that they don’t show a proper user study. Maybe they felt it was unnecessary because user studies on automatic and crowd-based chatbots have been done before and the results would be no different. But I still think they should’ve done some client-side interviews or observations, or at least shown a graph of the Likert-scale responses they collected for the two phases.

  1. Do you see a similarity between this work and the Pull the Plug paper? Is the idea of curation and quality control and teaching AI how to do quality control a common refrain in crowd work research?
  2. Do you find the integration of Filler bot, Interview bot, and Cleverbot, which are not actually contributing anything useful to the conversation, of any use? Were they just there to add conversational noise? Did they serve a training purpose?
  3. Would a user study have shown anything interesting or surprising compared to a standard AI-based or crowd-based chatbot?


03/25/2020 – Sushmethaa Muhundan – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

The paper explores the feasibility of a crowd-powered conversational assistant that is capable of automating itself over time. The main intent of building such a system is to dynamically support a vast set of domains by exploiting the capabilities of numerous chatbots and providing a universal portal to help answer users’ questions. The system, Evorus, is capable of supporting multiple bots and, given a query, predicts which bot’s response is most relevant to the current conversation. This prediction is validated using crowd workers from MTurk, and the response with the most upvotes is sent to the user. The feedback gained from the workers is then used to develop a learning algorithm that helps improve the system. As part of this study, the Evorus chatbot was integrated with Google Hangouts and users’ queries were presented to MTurk workers via an interface. The workers are presented with multiple possible answers that come from various bots for each query. The workers can then choose to upvote or downvote the answers presented or respond to the query by typing in an appropriate answer. An automatic voting system was also devised with the aim of reducing workers’ involvement in the process. The results of the study showed that Evorus was able to automate itself over time without compromising conversation quality.

I feel that the problem that this paper is trying to solve is very real: the current landscape of conversational assistants like Apple’s Siri and Amazon’s Echo is limited to specific commands and the users need to be aware of the commands supported in order to maximize the benefit of using them. This oftentimes becomes a roadblock as the AI bots are constrained to specific, pre-defined domains. Evorus tries to solve this problem by creating a platform that is capable of integrating multiple bots and leveraging their skill-set to answer a myriad of questions from different domains. 

The paper’s focus on reducing manual intervention via automation while maintaining quality throughout was good. I found the voting bot particularly interesting: a learning algorithm was developed that used the upvotes and downvotes workers provided on previous conversations to learn from the workers’ voting patterns, so it could make similar decisions itself. The upvotes and downvotes were also used to gauge the quality of candidate responses, and this served as further input for predicting the most suitable bots in the future.

Fact boards were another interesting feature: they included chat logs and recorded facts, and were part of the interface provided to the workers to give context about the conversation. This ensures that the workers are brought up to speed and are capable of making informed decisions while responding to the users.

  1. Given the scale at which information generation is growing, is the solution proposed in the paper feasible? Can this truly handle diverse domain queries while reducing human efforts drastically and also maintaining quality?
  2. Given the complexity of natural languages, would the proposed AI system be able to completely understand the user’s need and respond with relevant replies without human intervention? Would the role of a human ever become dispensable?
  3. How long do you think it would take for the training to be sufficient to entirely remove the role of a human in the loop in the Evorus system?


03/18/2020 – Nan LI – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Summary:

The main objective of this paper is to solve the monetary cost and response latency problems of crowd-powered conversational assistants. The approach developed in this paper combines crowd workers and automated systems to achieve a high-quality, low-latency, and low-cost solution. Based on this idea, the paper designs a crowd-powered conversational assistant, Evorus, which can gradually automate itself over time by incorporating responses from chosen chatbots, learning to reuse prior responses, and reducing crowd oversight via an automatic voting system. To design and refine a flexible framework for open-domain dialog, the authors conducted two phases of public field deployment and testing with real users. The final goal of this system is to let the automatic components within it gradually take over from the crowd.

Reflection:

I think this paper presented several excellent points. First, crowd-powered systems have been widely employed because of their low monetary cost and high convenience. However, as the number of required crowd workers and tasks increases, the expenditure on those workers grows to a non-negligible amount. Besides, even though platforms that enable hiring crowd workers quickly are available, the response latency is still non-negligible. The authors recognized these deficiencies and tried to develop an approach to solve these problems.

Second, combining crowd workers and automated systems is a prevalent idea. The novelty of this paper lies in adding an automatic voting system to decide which response to send to the end user. This machine learning model enables high-quality responses while reducing crowd oversight. The increased error tolerance allows even an imperfect automation component to contribute to the conversation without impacting quality. Thus, the system can integrate more types of chatbots and explore a wider range of actions. Besides, thanks to the balance between “upvotes” and “downvotes,” Evorus enables flexible and fluid collaboration between humans and chatbots.

Third, another novel attribute of this system is the reuse of prior responses. I think enabling Evorus to find answers to similar queries in prior conversations, and to suggest them as responses to new queries, is a key approach that could eventually turn the partially crowd-powered system into a completely automatic one. This simulates how people learn from the past, which is also what we do in daily conversation. Thus, as the system participates in more conversations and memorizes more query-response pairs from old conversations, it might be able to build a comprehensive database that stores all types of conversational queries and responses. At that point, the system might ultimately become fully automatic. However, this database would likely need to be updated, becoming partially crowd-powered once in a while, given the constant change of information and the way people communicate.
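A toy sketch of such a query-response memory might look like the following; the class, the fuzzy-matching cutoff, and the examples are my own assumptions rather than the paper’s actual retrieval method.

```python
import difflib

class ConversationMemory:
    """Hypothetical memory of accepted query-response pairs."""

    def __init__(self, cutoff=0.75):
        self.pairs = {}       # past query -> accepted response
        self.cutoff = cutoff  # minimum string similarity to reuse

    def remember(self, query, response):
        self.pairs[query.lower()] = response

    def recall(self, query):
        """Reuse a prior answer if a similar query was seen before."""
        match = difflib.get_close_matches(
            query.lower(), list(self.pairs), n=1, cutoff=self.cutoff)
        return self.pairs[match[0]] if match else None

memory = ConversationMemory()
memory.remember("What time does the library close?", "It closes at 9 pm.")
print(memory.recall("what time does the library close"))  # reused answer
print(memory.recall("how do i renew a book"))  # None -> fall back to the crowd
```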

Question:

  • What do you think of this system? Do you think it is possible that the system will rely only on automation one day?
  • What do you think about the voting system? Do you think it is a critical factor in enabling high-quality responses? What do you think about the design of different weights for “upvotes” and “downvotes”?
  • It is a prevalent idea nowadays to combine crowd workers and AI systems to achieve high accuracy or quality. However, the authors expect the system to rely increasingly on automation. Can you see the benefit if this expectation is achieved?

