Academic research strives for innovation in every field. But in order to innovate, researchers need to scour the length and breadth of their field, as well as adjacent fields, to get new ideas, understand current work, and make sure they are not repeating someone else’s work. An automated system that scours the published research landscape for similar existing work would be a huge boost to productivity. SOLVENT aims to solve this problem by having humans annotate research abstracts to identify the background, purpose, mechanism, and findings, and then using those annotations as a schema to index papers and retrieve similar papers (which have also been annotated and indexed) that closely match on one or a combination of those categories. The authors conduct three case studies to validate their method: finding analogies within a single domain, finding analogies across different domains, and testing whether annotations produced by crowd workers would be useful. They find that their method consistently outperforms baselines in finding analogous papers.
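To make the matching idea concrete, here is a minimal sketch of how aspect-based retrieval could work, assuming per-aspect TF-IDF vectors and cosine similarity; this is not the authors' actual pipeline, and the paper IDs and example texts are hypothetical.

```python
# Sketch of aspect-based analogy search: each annotated abstract is split into
# purpose and mechanism spans, each span is vectorized separately, and
# candidates are ranked by similarity on the chosen combination of aspects.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical annotated corpus: aspect spans extracted from abstracts.
corpus = [
    {"id": "paper_a",
     "purpose": "reduce crowd worker fatigue in long labeling tasks",
     "mechanism": "break tasks into micro-units with adaptive breaks"},
    {"id": "paper_b",
     "purpose": "keep drivers alert on long highway stretches",
     "mechanism": "schedule short rest prompts based on fatigue signals"},
]

query = {"purpose": "sustain attention during long repetitive work",
         "mechanism": "insert short structured breaks at adaptive intervals"}

aspects = ["purpose", "mechanism"]  # the purpose+mechanism combination

# One vectorizer per aspect, so purpose text is only compared to purpose text.
vectorizers = {a: TfidfVectorizer().fit([d[a] for d in corpus] + [query[a]])
               for a in aspects}

def score(doc):
    # Average the per-aspect cosine similarities between query and document.
    sims = []
    for a in aspects:
        vecs = vectorizers[a].transform([query[a], doc[a]])
        sims.append(cosine_similarity(vecs[0], vecs[1])[0, 0])
    return sum(sims) / len(sims)

for doc in sorted(corpus, key=score, reverse=True):
    print(doc["id"], round(score(doc), 3))
```

Swapping the `aspects` list to include a background field would reproduce the background+purpose+mechanism combination discussed below.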
I think that something like this would be incredibly helpful for researchers and would significantly streamline the research process. I would gladly use something like this in my own research because it would save so much time. Of course, as the authors point out, the issue is scale: for it to be useful, a very large chunk (or ideally all) of currently published research needs to be annotated according to the presented schema. This could be integrated as a gamification mechanism in Google Scholar, where the user is occasionally asked to annotate an abstract; that way, the annotation could be done at scale. I also find it interesting that purpose+mechanism produced better results than background+purpose+mechanism. I would have figured that the extra context the background provides would lead to better matches. But given the goal of finding analogies even in different fields, perhaps background+purpose+mechanism rightly does not perform well because the added background makes the query too specific.
- Would you find use for this? Or would you prefer seeking out papers on your own?
- Do you think that the annotation categories are appropriate? Would other categories work? Maybe more or fewer categories?
- Is it feasible to expand on this work to cover annotating the content of whole papers? Would that be asking too much?
To answer your first question, such a system does seem helpful, and I think it would prove really useful for me since it would save a lot of time and effort.