{"id":794,"date":"2020-03-24T23:34:59","date_gmt":"2020-03-24T23:34:59","guid":{"rendered":"http:\/\/wordpress.cs.vt.edu\/cs6724s20\/?p=794"},"modified":"2020-03-24T23:42:04","modified_gmt":"2020-03-24T23:42:04","slug":"3-25-20-jooyoung-whang-evaluating-visual-conversational-agents-via-cooperative-human-ai-games","status":"publish","type":"post","link":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/2020\/03\/24\/3-25-20-jooyoung-whang-evaluating-visual-conversational-agents-via-cooperative-human-ai-games\/","title":{"rendered":"3\/25\/20 &#8211; Jooyoung Whang &#8211; Evaluating Visual Conversational Agents via Cooperative Human-AI Games"},"content":{"rendered":"\n<p>This paper measures the performance of a visual conversational bot called ALICE when it is paired with a human teammate, as opposed to the common practice in AI research of evaluating it against a counterpart AI. The authors design and deploy two versions of ALICE: one trained with supervised learning and the other with reinforcement learning. MTurk workers held a Q&amp;A session with ALICE to identify a hidden image, shown only to ALICE, within a pool of similar images. After a fixed number of questions, the workers were asked to guess which image was the hidden one. The authors evaluated performance using the workers\u2019 resulting mental rankings of the hidden image after their conversations with the AI. Previous work had found that bots trained with reinforcement learning performed better than those trained with supervised learning. However, the authors find no significant difference when the bots are evaluated in a human-AI team.<\/p>\n\n\n\n<p>This paper was a good reminder that the ultimate user is a human. It\u2019s easy to forget that what computers prefer does not automatically carry over to humans. It was especially interesting to see that a significant performance difference in an AI-AI setting became minimal with humans in the loop. 
It made me wonder what it was about the reinforcement-learned ALICE that QBOT preferred over the supervised version. Once we find that distinguishing factor, we might be able to help humans learn and adapt to the AI, leading to improved team performance.<\/p>\n\n\n\n<p>It was a little disappointing that the same study with QBOT as the subject was left for future work. I would have loved to see the full picture. It could also have provided insight into the question I raised above: what was it about the reinforcement-learned ALICE that QBOT preferred?<\/p>\n\n\n\n<p>This paper showed that there is still a considerable gap between human cognition and AI cognition. If further studies find ways to narrow this gap, the AI design process could become quicker, because the resulting AI would be effective with both human and AI partners without needing extra adjustments for the human side. It would be interesting to see whether it is possible to train an AI to think like a human in the first place.<\/p>\n\n\n\n<p>These are the questions I had while reading this paper:<\/p>\n\n\n\n<p>1. This paper was presented in 2017. Do you know of any later studies that measured human-AI team performance? Do they agree that there\u2019s a gap between humans and AIs?<\/p>\n\n\n\n<p>2. If you have experience training visual conversational bots, do you know whether a bot prefers some kinds of information over others? What is the most significant difference between a bot trained with supervised learning and one trained with reinforcement learning?<\/p>\n\n\n\n<p>3. In this study, the MTurk workers were asked to make a guess after a fixed number of questions. The study does not measure the minimum or maximum number of questions needed, on average, to make an accurate guess. Do you think the accuracy of the guesses will increase proportionally with the number of questions? 
If not, what kind of trend do you think it will follow?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper measures the performance of a visual conversational bot called ALICE when it is paired with a human teammate, as opposed to the common practice in AI research of evaluating it against a counterpart AI. The authors design and deploy two versions of ALICE: one trained with supervised learning and the other with reinforcement learning. The authors [&hellip;]<\/p>\n","protected":false},"author":281,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[87,81],"class_list":["post-794","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-class10","tag-vqagames"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/users\/281"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/comments?post=794"}],"version-history":[{"count":1,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/794\/revisions"}],"predecessor-version":[{"id":795,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/794\/revisions\/795"}],"wp:attachment":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/media?parent=794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/categories?post=794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2
\/tags?post=794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}