{"id":1144,"date":"2020-04-21T03:06:10","date_gmt":"2020-04-21T07:06:10","guid":{"rendered":"http:\/\/wordpress.cs.vt.edu\/cs6724s20\/?p=1144"},"modified":"2020-04-21T03:06:10","modified_gmt":"2020-04-21T07:06:10","slug":"04-22-2020-bipasha-banerjee-solvent-a-mixed-initiative-system-for-finding-analogies-between-research-papers","status":"publish","type":"post","link":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/2020\/04\/21\/04-22-2020-bipasha-banerjee-solvent-a-mixed-initiative-system-for-finding-analogies-between-research-papers\/","title":{"rendered":"04\/22\/2020 &#8211; Bipasha Banerjee &#8211; SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>Summary&nbsp;<\/strong><br><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The paper by Chan et al. is an interesting read on finding analogies between research papers, with scientific papers as the main domain considered. The annotation scheme is divided into four categories, namely background, purpose, mechanism, and findings. This paper&#8217;s goal is to make it easier for researchers to find related work in their field. The authors conducted three studies to test their approach and its feasibility. The first study consisted of domain experts annotating a particular abstract in their research area. The second study focused mainly on how the model could tackle the real-world problem where a researcher needs to find relevant papers in their area to act as inspiration, related work, or even baselines for their experiments. The third study was very different from the first two, in which experienced researchers annotated the data or used the system to solve their research problem; it instead used crowdworkers to annotate abstracts. 
The platforms used by the authors were Upwork and Amazon Mechanical Turk.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Reflection<\/strong><br><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The mixed-initiative model developed by the authors is an excellent step toward finding analogies in scientific papers. There are traditional approaches in natural language processing that help find similarities in textual data. The article in [1] gives good insight into the steps involved in finding similarities in texts. However, when it comes to scientific data, these steps alone are not enough. Additionally, most of the models involved are trained on generic websites and news data (like CNN or DailyMail). As a result, most scientific jargon is \u201cout of vocabulary\u201d (OOV) for such models. Hence, I appreciate that the authors used human annotations along with traditional information retrieval methods (like TF-IDF) to tackle the problem at hand.<br><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Additionally, when computing similarity, they took combinations of categories into account, like Purpose+Mechanism. This is definitely useful when finding similarities in the text data. I also liked that, for the studies, they considered ordinary crowdworkers in addition to people with domain knowledge. I was intrigued to find that 75% of the time, the annotations of crowdworkers matched those of the researchers. Hence the conclusion that \u201ccrowd annotations still improve analogy-mining\u201d is valuable. Moreover, recruiting a large number of researchers from one domain just to annotate data is difficult; sometimes there are very few people in a given domain of research. 
Rather than having to find researchers who are available to annotate such data, it is good that we can use methods that are already available.<br><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Lastly, I would like to mention that I liked how the paper identified its limitations well and clearly laid out the scope for future work.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Questions<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Would you agree that the level of expertise of the human annotators would not affect the results for your course project? If yes, could you please explain why?<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">(For my class project, I think I would agree with the paper\u2019s findings. I work on reference string parsing, and I don\u2019t think we need experts just to label the tokens.)<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Could we have more complex categories or sub-categories rather than just the four identified?<\/li><li>How would this extend to longer pieces of text, like chapters of book-length documents?&nbsp;<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>References&nbsp;<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/medium.com\/@Intellica.AI\/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c\">https:\/\/medium.com\/@Intellica.AI\/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c<\/a>&nbsp;<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary&nbsp; The paper by Chan et al. is an interesting read on finding analogies from research papers. The main domain considered here is scientific papers. The annotation scheme has been divided into four categories, namely, background, purpose, mechanism, and findings. 
This paper&#8217;s goal is to make it easier for researchers to find related work in [&hellip;]<\/p>\n","protected":false},"author":186,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[106,109],"class_list":["post-1144","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-class14","tag-solvent"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/1144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/users\/186"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/comments?post=1144"}],"version-history":[{"count":1,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/1144\/revisions"}],"predecessor-version":[{"id":1145,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/posts\/1144\/revisions\/1145"}],"wp:attachment":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/media?parent=1144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/categories?post=1144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724s20\/wp-json\/wp\/v2\/tags?post=1144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}