03/04/2020 – Dylan Finch – Real-Time Captioning by Groups of Non-Experts

Word count: 564

Summary of the Reading

This paper aims to improve the accessibility of audio streams by making it easier to create captions for deaf listeners. The typical solution is to hire stenographers: expensive, highly trained professionals who require specialized keyboards. The alternative is to hire people with less training, but they take longer to write captions, which creates a delay between when something is said in the audio and when its caption appears. This latency is undesirable because it makes it harder for a deaf person to connect the captions with any accompanying video. This paper aims to combine the low cost of non-expert captioning with real-time, low-latency output. The solution is to use many workers who do not require specialized training. Working together, a group of crowd workers can achieve high caption coverage of the audio with a latency of only 2.9 seconds.

Reflections and Connections

I think that this paper highlights one of the coolest things that crowdsourcing can do: it can take big, complicated tasks that used to require highly trained individuals and make them accomplishable by ordinary people. This is extremely powerful, because it makes all kinds of technologies and techniques much more accessible. It is hard to hire one highly trained stenographer, but it is easy to hire a few ordinary people. This is the same idea that powers Wikipedia. Many people make small edits, each drawing on their own specialized knowledge, and together they create a highly accessible and complete collection of knowledge. This same principle can and should be applied to many more fields. I would love to see what other professions could be democratized by having many ordinary people replace one highly trained person.

This research also shows that it is possible to break up tasks that may traditionally have been thought of as atomic. Transcribing audio is a very hard task to solve with crowd workers because it has no natural discrete units that could be sent to them: the stream of audio is continuous and always changing. However, this paper shows that the activity can be broken into manageable chunks that crowd workers can handle; the researchers just needed to think outside the box. I think that this kind of thinking will become increasingly important as more and more work is crowdsourced. As we learn to solve more problems with crowdsourcing, the question becomes less about whether we can solve a problem with the crowd and more about how we can break it into manageable pieces that the crowd can do. This kind of research has applications well beyond captioning.

Questions

  1. What are some similar tasks that could be crowdsourced using a method similar to the one described in the paper?
  2. How do you think that crowdsourcing will impact the accessibility of our world? Are there other ways that crowdsourcing could make our world more accessible?
  3. Do you think there will come a time when most professions can be accomplished by crowd workers? What do you think the extent of crowd expertise will be?

3 thoughts on “03/04/2020 – Dylan Finch – Real-Time Captioning by Groups of Non-Experts”

  1. Hi Dylan,

    Great comment. Your observation that this system uses the same idea that powers Wikipedia is really interesting. I would like to comment on your third question. I don’t think that most professions will ever be accomplished by crowd workers, because the approach in this paper only works when ordinary people can do part of the job, as they can with captioning. High-workload jobs like this one can be accomplished by crowd workers easily, but what about jobs that require a high level of professional expertise? For example, designing the internal structure of a building. These types of jobs can only be accomplished by people who have been professionally trained, not by a group of ordinary crowd workers.

  2. Great point, Dylan. I agree that crowdsourcing can be used for a variety of tasks. Ones similar to audio captioning that I can think of include translation, photo tagging, and even writing. There is also potential to extend crowdsourcing to more creative tasks, such as website design, flyer design, and video editing. Crowdsourcing has many advantages, including lower cost, faster turnaround, and more diverse ideas. However, I feel that not every task is well suited to collective intelligence, for example, tasks that deal with sensitive problems. Such tasks may benefit from crowdsourcing, but turning them over to crowds may raise ethical issues. Rather than the limits of crowd expertise, my concern is more about efficiency. The task presented in the paper can be done by one individual with proper training, whereas merging the work of several non-expert individuals may take several times as much total effort.

  3. I would like to answer the first question. I think Google Translate works similarly: users correct the translations of sentence segments (phrases), and the AI improves from these corrections. In my experience, Google Translate was really bad at launch but now provides much better translations for certain language pairs.
