Summary
Often, humans have to be trained in how to talk to AIs so that the AI understands what the human wants and what it is supposed to do. This puts the onus on the human to adjust rather than having the AI adjust to the human. One way of addressing that issue has been to use crowdsourced responses so that the human and AI can understand each other through what is essentially a middle-man approach. Huang et al.'s work created Evorus, a hybrid crowd- and AI-powered conversational assistant that aims to respond quickly enough for a human to interact with it naturally while still producing the higher-quality, natural responses the crowd can provide. It accomplishes this by reusing responses previously generated by the crowd, as high-quality responses are identified over time. The authors deployed Evorus over a period of 5 months across 281 conversations, during which the system gradually shifted from crowd responses to chatbot responses.
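To make the reuse mechanism concrete, here is a minimal Python sketch of how weighted upvotes and downvotes might promote a candidate response into a reusable pool. The weights and threshold are assumptions for illustration, not the parameters from the Evorus paper:

```python
# Minimal sketch of vote-weighted response reuse. The weights and the reuse
# threshold below are hypothetical, not the values used by Evorus.

from collections import defaultdict

UPVOTE_WEIGHT = 1.0     # assumed weight for an upvote
DOWNVOTE_WEIGHT = -1.5  # assumed: downvotes count more, as a quality safeguard
REUSE_THRESHOLD = 3.0   # assumed score needed before a response is reused

scores = defaultdict(float)  # candidate response -> accumulated vote score

def record_vote(response: str, upvote: bool) -> None:
    """Accumulate a weighted vote for a crowd- or bot-generated response."""
    scores[response] += UPVOTE_WEIGHT if upvote else DOWNVOTE_WEIGHT

def reusable_responses() -> list[str]:
    """Return responses whose accumulated score clears the reuse threshold."""
    return [r for r, s in scores.items() if s >= REUSE_THRESHOLD]
```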
Personal Reflection
I liked that the authors took an unusual stance in this line of research early in the paper: although it has long been suggested that AI will eventually take over for the crowd in many of these systems, that handoff hasn't happened despite a great deal of research. This stance highlights that the crowd has continued, for a long time, to perform tasks it was supposed to have stopped doing at some point.
Also, I found that I could possibly adapt what I'm working on with automatic speech recognition (ASR) to improve itself with a similar approach. If I took, for example, the outputs of several different ASR systems along with crowd-written transcriptions and held a ranked vote on which transcription was best, perhaps the system could eventually wean itself off the crowd as well.
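As a rough sketch of that idea, the selection step might look like the following Python snippet, which uses a simple Borda count over ranked ballots. All names are hypothetical, and a real system would also need to track per-source reliability:

```python
# Hypothetical sketch: pick the best transcription from several ASR systems
# plus crowd candidates using a Borda count over ranked ballots.

from collections import defaultdict

def best_transcription(candidates: dict[str, str],
                       ballots: list[list[str]]) -> str:
    """
    candidates: source name -> transcription text, e.g. {"asr_a": "...", "crowd_1": "..."}
    ballots: each ballot is a list of source names, best first
    Returns the transcription from the source with the highest Borda score.
    """
    scores = defaultdict(int)
    n = len(candidates)
    for ballot in ballots:
        for rank, source in enumerate(ballot):
            scores[source] += n - rank  # higher rank earns more points
    best_source = max(candidates, key=lambda s: scores[s])
    return candidates[best_source]
```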
It was also interesting that they took a Reddit-style approach, borrowed from social websites, with an upvote/downvote system for deciding which responses to keep. This approach seems to have long legs for surfacing appropriate responses via the crowd.
The last observation I would like to make is that they had an interesting and diverse set of bots, though I question the usefulness of some of them. I don't really understand how the filler bot can be useful except in situations where the AI doesn't really understand what is happening, for example. I had also thought the interview bot would perform poorly, since the types of interviews it drew its training data from would be particular to certain kinds of people.
Questions
- Considering the authors felt that Evorus wasn't a complete system but a stepping stone toward one, what do you think they could do to improve it? That is, what more can be done?
- What other domains within human-AI collaboration could use this approach of the crowd acting as a scaffold that the AI develops upon until the scaffold is no longer needed? Is the lack of such deployments evidence that developers don't want to give up this crutch, or is it because the crowd is still needed?
- Does the weighting behind the upvotes and downvotes make sense? Should the votes have equal (yet opposite) weight, or should they be weighted asymmetrically as the authors did? Why or why not?
- Should the workers be incentivized for upvotes or downvotes? What does this do to middle-of-the-road responses that could be right or wrong?
As far as other domains go where this kind of setup could be useful, I think image recognition and transcription could definitely benefit. For a real-time use case, consider a translator application where people visiting foreign countries take pictures of signs to get them translated on the spot. OCR is quite good, as is machine translation, but sometimes the picture has bad lighting, is taken at a bad angle, or has half the words cut off, in which case the machine will give an incorrect translation. In those situations, a human might always need to be on hand to help.
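A minimal sketch of that fallback logic might look like the Python below, where the OCR, translation, and crowd-query functions are placeholders passed in by the caller (none of them refer to a real API), and the confidence cutoff is an assumed value that would need tuning:

```python
# Sketch of a confidence-gated pipeline: try OCR + machine translation first,
# and hand the image to a human only when machine confidence is too low.

from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; would need tuning in practice

def translate_sign(
    image: bytes,
    ocr: Callable[[bytes], Tuple[str, float]],       # placeholder: returns (text, confidence)
    translate: Callable[[str], Tuple[str, float]],    # placeholder: returns (translation, confidence)
    ask_human: Callable[[bytes], str],                # placeholder: crowd fallback
) -> str:
    """Try OCR + machine translation; fall back to a human when confidence is low."""
    text, ocr_conf = ocr(image)
    if ocr_conf < CONFIDENCE_THRESHOLD or not text.strip():
        # Bad lighting, odd angles, or cut-off words: route to a person.
        return ask_human(image)
    translation, mt_conf = translate(text)
    if mt_conf < CONFIDENCE_THRESHOLD:
        return ask_human(image)
    return translation
```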