This work aims to explore a mixed-initiative approach to help find analogies between research papers. The study uses research experts as well as crowd-workers to annotate research papers by marking the purpose, mechanism, and findings of the paper. Using these annotations, a semantic vector representation is constructed that can be used to compare different research papers and identify analogies within domains as well as across domains. The paper aims to leverage the inherent causal relationship between purpose and mechanism to build “soft” relational schemas that can be compared to determine analogies. Three studies were conducted as part of this paper. The first study was to test the system’s quality and feasibility by asking domain expert researchers to annotate 50 research papers. The second study was to explore whether the system would be beneficial to actual researchers looking for analogical inspiration. The third study involved using crowd workers as annotators to explore scalability. The results from the studies showed that annotating the purpose and mechanism aspects of research papers is scalable in terms of cost, not critically dependent on annotator expertise, and is generalizable across domains.
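To make the comparison step concrete, here is a minimal sketch of how per-aspect vectors could be built from annotated tokens and compared. It assumes simple bag-of-words vectors and cosine similarity; the paper's actual representation may differ, and the names `aspect_vector` and `aspect_similarities` are hypothetical ones chosen for illustration.

```python
import math
from collections import Counter


def aspect_vector(annotated_tokens, aspect):
    """Bag-of-words vector over the tokens tagged with one aspect.

    annotated_tokens is a list of (token, label) pairs, where label is
    e.g. "purpose", "mechanism" or "findings" (assumed label names).
    """
    return Counter(tok.lower() for tok, label in annotated_tokens if label == aspect)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty aspect has no meaningful similarity
    return dot / (norm_a * norm_b)


def aspect_similarities(paper_a, paper_b):
    """Return (purpose similarity, mechanism similarity) for two papers."""
    purpose_sim = cosine(aspect_vector(paper_a, "purpose"),
                         aspect_vector(paper_b, "purpose"))
    mechanism_sim = cosine(aspect_vector(paper_a, "mechanism"),
                           aspect_vector(paper_b, "mechanism"))
    return purpose_sim, mechanism_sim


# Toy example: same purpose, different mechanism.
paper_a = [("reduce", "purpose"), ("noise", "purpose"), ("filter", "mechanism")]
paper_b = [("reduce", "purpose"), ("noise", "purpose"), ("crowd", "mechanism")]
purpose_sim, mechanism_sim = aspect_similarities(paper_a, paper_b)
```

Keeping the two similarities separate lets a search ask for papers that share a purpose but differ in mechanism, which is the kind of match I would count as an analogy.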
I feel that the problem this system is trying to solve is real. While working on research papers, there is often a need to find analogies for inspiration and/or competitive analysis. I have also faced difficulty finding relevant research papers while working on my thesis. If scaled and deployed properly, SOLVENT would definitely be helpful to researchers and could potentially save a lot of time that would otherwise be spent on searching for related papers.
The paper claims that the system quality is not critically dependent on annotator expertise and that the system can be scaled using crowd workers as annotators. However, the results showed that the annotations of Upwork workers matched expert annotations 78% of the time and those of MTurk workers matched only 59% of the time. Agreement also varied considerably from paper to paper: a few papers had 96% agreement while a few had only 4%. I am a little skeptical of these numbers, and I am not convinced that expert annotations are dispensable. I feel that using crowd workers might help the system scale, but it might have a negative impact on quality.
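For readers who want to see what an agreement figure like this could mean, here is a minimal sketch that scores a worker against an expert as the fraction of tokens given the same label. This is an assumption about the metric made for illustration, not necessarily how the paper computed it, and the function name `token_agreement` is hypothetical.

```python
def token_agreement(expert_labels, worker_labels):
    """Fraction of tokens on which the worker's label matches the expert's."""
    if len(expert_labels) != len(worker_labels):
        raise ValueError("both annotators must label the same tokens")
    if not expert_labels:
        return 0.0  # nothing to compare
    matches = sum(e == w for e, w in zip(expert_labels, worker_labels))
    return matches / len(expert_labels)


# Toy example with four tokens; the worker disagrees on one of them.
expert = ["purpose", "purpose", "mechanism", "findings"]
worker = ["purpose", "mechanism", "mechanism", "findings"]
score = token_agreement(expert, worker)  # 3 of 4 labels match
```

An average over many papers can hide exactly the kind of spread reported above, so per-paper scores are worth inspecting alongside the mean.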
I found one possible future extension extremely interesting: the possibility of authors themselves annotating their work. I feel that if each author spent a little extra effort annotating their own work, a large corpus with high-quality annotations could be created fairly easily, and SOLVENT would likely produce much better results using this corpus.
- What are your thoughts about the system proposed? Would you want to use this system to aid your research work?
- The study indicated that the system needs to be vetted with large datasets and the usefulness of the system is yet to be truly tested in real-world settings. Given these limitations, do you think the usage of this system is feasible? Why or why not?
- One potential extension mentioned in the paper is to combine the content-based approach with graph-based approaches like citation graphs. What are other possible extensions that would enhance the current system?
Hi Sushmethaa, I agree with you that the system is helpful when we are trying to find analogies for inspiration. To address your first question, I think it is a neat idea and has a lot of potential for extensions. I would like to browse the papers retrieved by the system to see if there are any relevant papers, especially those in other research areas. I’m not sure whether the current version of the proposed system can save me time because, like other trained graduate students, I tend to read the abstract first when I get a paper, and most abstracts follow a writing format that makes them easy to read even without the colorful annotations presented by the SOLVENT system. I would expect more from the system beyond that.