03/04/2020 – Nurendra Choudhary – Real-time captioning by groups of non-experts

Summary

In this paper, the authors discuss a collaborative real-time captioning framework called LEGION:SCRIBE. They compare their system against the previous approach called CART and Automated Speech Recognition (ASR) system. The authors initiate the discussion with the benefits of captioning. They proceed to explain the expensive cost of hiring stenographers. Stenographers are the fastest and most accurate captioners with access to specialized keyboards and expertise in the area. However, they are prohibitively expensive (100-120$ an hour). ASR is much cheaper but their low accuracy deems them inapplicable in most real-world scenarios.

To alleviate the issues, the authors introduce SCRIBE framework. In SCRIBE, crowd-workers caption smaller parts of the speech. The parts are merged using an independent framework to form the final sentence. The latency of the system is 2.89s, emphasizing its real-time nature, which is a significant improvement over ~5s of CART.

Reflection

The paper introduces an interesting approach to collate data from multiple crowd workers for sequence learning tasks. The method has been applied before in cases such as Google Translate (translating small phrases) and ASR (voice recognition of speech segments). However, SCRIBE distinguishes itself by bringing in real-time improvement in the system. But, the system relies on the availability of crowd workers. This may lead to unreliable behaviour in the system. Additionally, the hired workers are not professionals. Hence, the quality is affected by human behavioral features such as mindset, emotions or mental stamina. I believe a study on the evolution of SCRIBE overtime and its dependence on such features needs to be analyzed.

Furthermore, I question the crowd management system. Amazon MT cannot guarantee real-time labourers. Currently, given the supply of workers with respect to the tasks, workers are always available. However, as more users adopt the system, this need not always hold true. So, crowd management systems should provide alternatives that guarantee such requirements. Also, the work provider needs to find alternatives to maintain real-time interaction, in case the crowd fails. In case of SCRIBE, the authors can append an ASR module in a situation of crowd failure. ASR may not give the best results but would be able to ensure smoother user experience.

The current development system does not consider the volatility of crowd management systems. This makes them an external single point of failure. I think there should be a push in the direction of simultaneously adopting multiple management systems for the framework to increase their reliability. This will also improve system efficiency because it has a more diverse set of results as choice. Thus benefiting the overall model structure and user adoption.

Questions

Google Translate uses a similar strategy by asking its users to translate parts of sentences. Can this technique be globally applied to any sequential learning framework? Is there a way we can divide sequences into independent segments? In case of dependent segments, can we just use a similar merging module or is it always problem-dependent?
The system depends on the availability of crowd workers. Should there be a study on the availability aspect? What kind of systems would be benefitted from this?
Should there be a new crowd work management system with a sole focus on providing real-time data provisions?
Should the responsibility of ensuring real-time nature be on the management system or the work provider? How will it impact the current development framework?

Word Count: 567

One thought on “03/04/2020 – Nurendra Choudhary – Real-time captioning by groups of non-experts”

Dylan Finch says:

March 4, 2020 at 6:12 pm

In response to your first question, I think that answer is yes. The kind of technique described in the paper seems extremely powerful and I don’t think that there is any reason that these ideas could not be applied to many other similar problems. I think that the most important thing that we need to think about when creating crowdsourcing tasks is how we break up the tasks. I think that the best contribution of this paper is the very interesting way that they came up with to break up the task. If we get smarter about how we break up tasks, I think that a lot more problems can be solved via crowdsourcing.

Log in to Reply

Summary

Reflection

Questions

Nurendra Choudhary

One thought on “03/04/2020 – Nurendra Choudhary – Real-time captioning by groups of non-experts”

Leave a Reply Cancel reply