4/29/2020 – Nurendra Choudhary – Accelerating Innovation Through Analogy Mining

Summary

In this paper, the authors argue for an automated tool that finds analogies in large research repositories such as US patent databases. Previous approaches fall into two groups: manually constructed large structured corpora, and automated methods that can find semantically relevant research but cannot identify the underlying structure of the documents. Manual corpora are expensive to construct and maintain, whereas automated methods are ineffective at finding analogies because they do not identify structure.

The authors propose an architecture that defines a structured purpose-mechanism schema for identifying analogies between two research papers. Crowd workers identify the purpose and the mechanism in each document, and word vectors are used to represent these parts as vectors. Similarity is measured as the cosine similarity between the query vectors. The experiments use three kinds of query vectors: purpose only, mechanism only, and the concatenation of purpose and mechanism. The authors evaluate with the Precision@K metric and conclude that mechanism-only and concatenated purpose-mechanism queries are more effective than the other types.
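
To make the retrieval and evaluation steps above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the dictionary layout, and the two-dimensional toy vectors are all my own assumptions for illustration, and real purpose and mechanism vectors would come from the word-vector representation described in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_vector(doc, mode):
    """Pick the representation for a document: purpose, mechanism, or both concatenated."""
    if mode == "purpose":
        return doc["purpose"]
    if mode == "mechanism":
        return doc["mechanism"]
    if mode == "concat":
        return doc["purpose"] + doc["mechanism"]  # list concatenation
    raise ValueError(f"unknown mode: {mode}")


def rank(query, corpus, mode):
    """Return corpus document ids sorted by decreasing cosine similarity to the query."""
    q = query_vector(query, mode)
    scored = [(cosine(q, query_vector(d, mode)), d["id"]) for d in corpus]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [doc_id for _, doc_id in scored]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked ids that are in the relevant set."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


# Hypothetical toy data: vectors are made up, not taken from the paper.
query = {"id": "q", "purpose": [1.0, 0.0], "mechanism": [0.0, 1.0]}
corpus = [
    {"id": "a", "purpose": [1.0, 0.0], "mechanism": [1.0, 0.0]},  # same purpose only
    {"id": "b", "purpose": [0.0, 1.0], "mechanism": [0.0, 1.0]},  # same mechanism only
    {"id": "c", "purpose": [0.0, 1.0], "mechanism": [1.0, 0.0]},  # shares neither
]

for mode in ("purpose", "mechanism", "concat"):
    ranking = rank(query, corpus, mode)
    print(mode, ranking, precision_at_k(ranking, {"b"}, k=1))
```

In this toy setup, document `b` is treated as the only relevant analogy, so the choice of query type decides whether it is ranked first. That is the kind of comparison the Precision@K evaluation summarizes across many queries.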

Reflection

The paper is very similar to the SOLVENT paper discussed in the previous class. I believe both were developed by the same research group and share some of the same authors. I believe SOLVENT addresses several problems left open in this paper. For example, the purpose-mechanism schema cannot be generalized to all research fields, and additional fields need to be added to the schema to make it work well across a wider range of disciplines.

The baselines do not utilize the entire paper. I do not think it is fair to compare only the abstracts of papers from different domains to find analogies. Abstracts do not always describe the problem or the solution in the necessary depth. In my opinion, we should add more sections such as the Introduction, Methodology and Conclusion. I am not sure if they would perform better, but I would like to see these metrics reported. Also, the diversity of fields used in the experiments is limited to engineering backgrounds. I think this should be expanded to include other fields such as medicine, business and humanities (a lot of early scientists were philosophers :P).

Questions

  1. What problems in this paper does SOLVENT solve? Do you think the improvement in performance is worth the additional memory utilized?
  2. How do you think this framework will help in inspiring new work? Can we put our ideas into a purpose-mechanism schema to get a set of relevant analogies that may inspire further research?
  3. The authors only utilize the abstract to find analogies. Do you think this is a good enough baseline? Should we utilize the entire paper as a baseline? What are the advantages and disadvantages of such an approach? Would it not be fairer?
  4. Can we currently learn purpose-mechanism schemas for different fields independently and map between them? Is there a limit to how much variation between fields this framework can handle? For example, is it fair to compare abstracts of medical documents with abstracts of CS papers, given the stark differences between them?

Word Count: 509
