03/04/2020 – Mohannad Al Ameedi – Real-Time Captioning by Groups of Non-Experts

Summary

In this paper, the authors propose a low-latency captioning solution for deaf and hard of hearing people that works in a real-time setting. Solutions already exist, but they are either very expensive or low quality. The proposed system lets people with hearing disabilities request captioning at any time and get the result within a few seconds. It relies on a combination of non-expert crowdsourcing workers and local staff to provide the captions. Each request is handled by multiple people, and the result is a combination of all the participants’ input. The request is submitted as an audio stream, and the result is returned as text. A crowdsourcing platform is used to submit the request, and the result is retrieved in seconds. The system uses a streaming algorithm: input is processed as it is received, and the results are aggregated at the end. According to the authors, the system outperforms all other available options on both coverage and accuracy, and the solution is feasible to apply in a production setting.

Reflection

I found the idea of real-time captioning very interesting. My understanding was that crowdsourcing always involves latency and therefore cannot be applied in real-world scenarios like this one. It will be interesting to see how the system performs as the number of users increases.

I also found the concept of multiple people working on the same audio stream and combining the results very interesting. Collecting captions from multiple people, figuring out what is unique and what is duplicated, and producing a final sentence, paragraph, or script is a challenging task.

This work is like multiple people working on one task, or multiple developers writing code to implement a single feature. Normally a supervisor or development lead merges the results, but in this case the algorithm takes care of the merge.
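
To make the merge idea concrete, here is a minimal sketch of how overlapping partial captions might be combined. This is not the paper's algorithm; it is a simplified illustration that assumes each worker's words arrive with timestamps, and the names `merge_captions` and `window` are hypothetical.

```python
from typing import List, Tuple

# Assumed input format: (seconds from stream start, typed word).
Word = Tuple[float, str]


def merge_captions(partials: List[List[Word]], window: float = 1.0) -> str:
    """Merge partial captions from several workers into one transcript."""
    # Pool every worker's words and order them by timestamp.
    pooled = sorted((w for p in partials for w in p), key=lambda w: w[0])
    merged: List[Word] = []
    for t, word in pooled:
        norm = word.lower()
        # A word is a duplicate if the same word was already kept
        # within `window` seconds before it.
        is_dup = any(
            norm == kept.lower() and t - kept_t <= window
            for kept_t, kept in merged
        )
        if not is_dup:
            merged.append((t, word))
    return " ".join(word for _, word in merged)


# Two workers who each caught only part of the same sentence.
worker_a = [(0.0, "the"), (0.5, "quick"), (2.0, "fox")]
worker_b = [(0.6, "quick"), (1.2, "brown"), (2.1, "fox"), (2.6, "jumps")]
transcript = merge_captions([worker_a, worker_b])
```

Neither worker typed the full sentence, but the merged transcript covers all five words. A real system would also have to handle typos, disagreements between workers, and timestamps that drift, which this sketch ignores.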

Questions

  • The authors evaluated the system with a limited number of users. Do you think it will continue to outperform other methods if it is deployed in a real-world setting?
  • Since live streaming is increasingly common at work, school, and other places, can we use the same concept to pass in a URL and get instant captioning? What are the limitations of this approach?
  • What are the privacy concerns with this approach, especially if it is used in the medical field? Normally a limited number of people are hired to help with such tasks, while crowdsourcing is open to a wide range of people.

One thought on “03/04/2020 – Mohannad Al Ameedi – Real-Time Captioning by Groups of Non-Experts”

  1. I also found the concept of multiple people working on a single audio stream to be interesting and I like the analogy you draw to multiple developers working on code for a single feature.
    With respect to your question on privacy concerns, especially in the medical field, perhaps the implementers of the system could use private clouds whose operators sign an NDA, and attempt some kind of obfuscation (at a very high level) before the data is sent out to crowdworkers.
