{"id":723,"date":"2020-03-04T14:17:54","date_gmt":"2020-03-04T14:17:54","guid":{"rendered":"http:\/\/wordpress.cs.vt.edu\/cs6724s20\/?p=723"},"modified":"2020-03-04T14:17:54","modified_gmt":"2020-03-04T14:17:54","slug":"03-04-2020-ziyao-wang-real-time-captioning-by-groups-of-non-experts","status":"publish","type":"post","link":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/2020\/03\/04\/03-04-2020-ziyao-wang-real-time-captioning-by-groups-of-non-experts\/","title":{"rendered":"03\/04\/2020- Ziyao Wang \u2013 Real-time captioning by groups of non-experts"},"content":{"rendered":"\n<p>Traditionally, real-time captioning is done by professional captionists, who are expensive to hire. Automatic speech recognition systems have been developed as an alternative, but they still perform poorly when the audio quality is low or when multiple people are talking. In this paper, the authors developed a system that hires several non-expert workers to do the captioning task and merges their work to produce a high-accuracy caption. Because these workers are paid significantly less than experts, the cost is reduced even when multiple workers are hired. The system also performs well at collecting the workers\u2019 contributions and merging them into a high-accuracy output with low latency.<\/p>\n\n\n\n<p>Reflections:<\/p>\n\n\n\n<p>For problems that require both high accuracy and low latency, I had always held the view that only AI or experts could complete the task. However, in this paper, the authors showed that non-experts can also complete this kind of task if a group of them works together.<\/p>\n\n\n\n<p>Compared with professionals, hiring non-experts costs much less. Compared with AI, people can handle some complicated situations better. 
This system combines these two advantages and provides a cheap real-time captioning system with high accuracy.<\/p>\n\n\n\n<p>This system certainly has many advantages, but we should still consider it critically. On cost, it is true that hiring non-experts costs much less than hiring professional captionists. However, the system needs to hire 10 workers to reach 80 to 90 percent accuracy. Even if the workers are paid a low wage, for example 10 dollars per hour, the total cost will reach 100 dollars per hour. Hiring experts costs only around 120 dollars per hour, so the saving from using the system is relatively small.<\/p>\n\n\n\n<p>On accuracy, it is possible that all 10 workers miss the same part of the audio. In that case, even after merging all the workers\u2019 results, the system will still lack a caption for that part. In contrast, although an AI system may produce captions with errors, it can at least provide something for every word in the audio.<\/p>\n\n\n\n<p>For these two reasons, I think hiring fewer workers, for example three to five, to fix the errors in a system-generated caption would save more money while still maintaining high accuracy. With the caption already provided, the workers\u2019 task would be easier, and they might produce more accurate results. Also, in circumstances where the AI system performs well, the workers would not need to spend time typing, and the latency of the system would be reduced.<\/p>\n\n\n\n<p>Questions:<\/p>\n\n\n\n<p>What are the advantages of hiring non-expert humans to do the captioning compared with experts or AI systems?<\/p>\n\n\n\n<p>Will a system that hires fewer workers to fix the errors in the AI-generated caption be cheaper? 
Will this system perform better?<\/p>\n\n\n\n<p>Does the system mentioned in the second question have any limitations or drawbacks?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Traditionally, real-time captioning is done by professional captionists, who are expensive to hire. Automatic speech recognition systems have been developed as an alternative, but they still perform poorly when the audio quality is low or when multiple people are talking. In this paper, the authors developed a [&hellip;]<\/p>\n","protected":false},"author":297,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[75,77],"class_list":["post-723","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-captioning","tag-class7"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/723","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/users\/297"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/comments?post=723"}],"version-history":[{"count":2,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/723\/revisions"}],"predecessor-version":[{"id":727,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/723\/revisions\/727"}],"wp:attachment":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/media?parent=723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/categories?post=723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/tags?post=723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}