Summary
Huang et al.’s paper “Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time” explores a novel crowd-powered system architecture that supports gradual automation. The motivation for the research is that crowd-powered conversational assistants have been shown to outperform automated systems, but their monetary cost and response latency prevent wide adoption. Inspired by the idea of combining the crowd-powered and automatic approaches, the researchers develop Evorus, a crowd-powered conversational assistant. Powered by three major components (learning to choose chatbots, reusing prior responses, and automatic voting), the system can automate more of its behavior over time. The experimental results show that Evorus can evolve without compromising conversational quality. The proposed framework contributes to the research direction of how automation can be introduced efficiently into a deployed system.
Reflection
Overall, this paper proposes an interesting gradual-automation approach for empowering conversational assistants. One selling point of the paper is that users can converse with Evorus in open domains rather than limited ones. To achieve that goal, the researchers design the learning framework so that it assigns a slightly higher selection likelihood to newly added chatbots in order to collect more data for them. I would imagine that domain-specific data collection requires a large amount of time, and sufficient data in each domain seems important to ensure the quality of open-domain conversation. Similar to the cold-start problem in recommender systems, the data collected across domains is likely imbalanced; for example, certain domains may gain little or no data during the general data collection process. It is unclear how the proposed framework can deal with this problem. One direction I can think of is to apply machine learning techniques such as zero-shot learning (for domains that never appear in prior conversations) and few-shot learning (for domains rarely discussed in prior conversations) to handle the imbalanced data collected by the chatbot selector.
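The selection mechanism described above can be sketched as weighted random sampling with an additive boost for newly added chatbots. This is a minimal illustration of the idea, not the paper's actual algorithm; the function name, the boost value, and the score representation are my assumptions.

```python
import random

def select_chatbot(quality_scores, new_bots, boost=0.2):
    """Sample a chatbot in proportion to its learned quality score,
    giving newly added bots an additive boost (illustrative value)
    so they are queried often enough to accumulate training data."""
    weights = {
        bot: score + (boost if bot in new_bots else 0.0)
        for bot, score in quality_scores.items()
    }
    total = sum(weights.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for bot, w in weights.items():
        cumulative += w
        if r <= cumulative:
            return bot
    return bot  # fallback for floating-point edge cases
```

Under this sketch, a new chatbot with a zero quality score would still be selected occasionally, which is how it could gather the conversations needed to earn a score of its own.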
For the second component, the reuse of prior answers seems to be a good way to reduce the system's computational cost. However, text retrieval can be very challenging. Take lexical ambiguity as an example: polysemous words can hinder the accuracy of the retrieved results, because the instances collected from the corpus mix the different contexts in which a polysemous word occurred. If the retrieval component cannot handle lexical ambiguity well, the reuse of prior answers may surface responses that are irrelevant to the user's conversation, potentially introducing errors into the results.
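A minimal sketch of answer reuse as bag-of-words retrieval illustrates the concern: the function and threshold below are my assumptions, not the paper's retrieval method, and such surface-level matching has no way to distinguish the senses of a polysemous word like "bank".

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between two bag-of-words Counter vectors
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve_prior_answer(query, prior_pairs, threshold=0.5):
    """Return the stored answer whose question is most similar to the
    query, or None if nothing clears the similarity threshold."""
    qv = Counter(query.lower().split())
    best, best_sim = None, 0.0
    for question, answer in prior_pairs:
        sim = cosine(qv, Counter(question.lower().split()))
        if sim > best_sim:
            best, best_sim = answer, sim
    return best if best_sim >= threshold else None
```

Because only word overlap is measured, a query about a river bank and a query about a financial bank look similar to this retriever, which is exactly the failure mode discussed above.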
In the design of the third component, both workers and the vote bot can upvote suggested responses. Evorus only accepts a response candidate and sends it to the user after sufficient vote weight has accumulated. Depending on the threshold for sufficient vote weight, the latency could be very long. In the design of user-centric applications, it is important to keep latency/runtime in mind. I feel the paper would be more persuasive if it provided supporting experiments on the latency of the proposed system.
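The acceptance rule can be sketched as threshold-based vote-weight accumulation. The weights and threshold below are illustrative assumptions rather than the paper's calibrated values; the sketch only shows why the threshold choice directly trades quality against latency.

```python
def should_accept(votes, bot_vote_weight=0.5, threshold=1.0):
    """Accumulate vote weight for a candidate response in arrival order.
    A human worker's upvote counts fully; the automatic vote bot counts
    at a reduced weight (both values are illustrative assumptions).
    The candidate is accepted once the total reaches the threshold."""
    total = 0.0
    for voter in votes:
        total += bot_vote_weight if voter == "votebot" else 1.0
        if total >= threshold:
            return True
    return False
```

With these numbers, one worker upvote suffices, while the vote bot alone needs two upvotes; raising the threshold requires more votes (and thus more waiting) before the user sees a reply, which is the latency concern raised above.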
Discussion
I think the following questions are worthy of further discussion.
- Do you think it is important to ensure users can converse with the conversational agents in open domains in all scenarios? Why or why not?
- What improvements do you think the researchers could make to the Evorus learning framework?
- At what point do you think the conversational agent can become purely automatic, or is it better to always keep a human-in-the-loop component?
- Would you consider utilizing the gradual-automation learning framework in your project? If so, how would you implement it?