Summary
The paper by Hope et al. addresses analogy mining from unstructured text. The authors use a dataset of product descriptions from Quirky.com, a product innovation website, to find analogous products. Specifically, they use the “purpose” and the “mechanism” of each product to find analogies between items. They compare the proposed approach against traditional similarity techniques, namely TF-IDF, LSA, GloVe, and LDA. Amazon Mechanical Turk crowd workers annotated the training dataset, and a recurrent neural network then learned representations of purpose and mechanism from these human annotations. The authors also wanted to see whether their approach enhanced creativity in idea generation. To measure this, they asked graduate students to judge the generated ideas on three criteria: novelty, quality, and feasibility. They concluded that creativity increased among the participants of the study.
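To make the purpose/mechanism idea concrete, here is a minimal sketch of how two separate representations could be combined to rank analogies. It is not the authors' implementation: the vectors are hand-made stand-ins for the learned RNN representations, the product names are hypothetical, and the scoring rule (purpose similarity minus mechanism similarity) is an assumption chosen only to illustrate "same purpose, different mechanism".

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def analogy_score(query, candidate):
    """Assumed scoring rule: high when purposes match and mechanisms differ."""
    purpose_sim = cosine(query["purpose"], candidate["purpose"])
    mechanism_sim = cosine(query["mechanism"], candidate["mechanism"])
    return purpose_sim - mechanism_sim

def rank_analogies(query_name, products):
    """Return the other product names, best analogy candidate first."""
    query = products[query_name]
    scored = [
        (analogy_score(query, product), name)
        for name, product in products.items()
        if name != query_name
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical products with hand-made purpose and mechanism vectors.
products = {
    "solar phone charger": {"purpose": [1.0, 0.0, 0.1], "mechanism": [1.0, 0.0]},
    "hand-crank phone charger": {"purpose": [0.9, 0.1, 0.0], "mechanism": [0.0, 1.0]},
    "solar garden lamp": {"purpose": [0.0, 1.0, 0.2], "mechanism": [0.9, 0.1]},
}

ranking = rank_analogies("solar phone charger", products)
print(ranking)
```

In this toy data, the hand-crank charger shares the query's purpose but uses a different mechanism, so it ranks above the garden lamp, which shares only the mechanism.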
Reflection
The paper is an interesting read on the topic of finding analogies in text. I found it especially interesting that the authors defined similarity in terms of the purpose of a product and the mechanism by which it works. The authors noted that the purpose-mechanism structure worked for finding analogies because the dataset consisted of product descriptions, and they suggested a more complex or hierarchical structure for more complex datasets such as scientific papers. My only concern with this suggestion is: wouldn’t increasing the complexity of the training data further complicate the process of finding analogies? Instead of hierarchical levels, I think it would be better to add other labels to the text to find similarities. What I am suggesting is similar to what was done in the paper by Chan et al. [1], where background and findings were included alongside the labels used here.
The paper lays good groundwork for finding similarities while using crowd workers to create the training data. This methodology, in my opinion, truly forms a mixed-initiative structure. The authors also did extensive evaluation and experimentation on the AI side. I really liked the way they compared their approach against traditional information retrieval techniques for finding analogies.
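For readers unfamiliar with the traditional baselines, the following is a minimal sketch of a TF-IDF plus cosine similarity comparison in plain Python. The example descriptions are made up, the tokenizer is a simple whitespace split, and the smoothed IDF formula is one common variant; none of these choices are taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors

def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up product descriptions, for illustration only.
descriptions = [
    "solar panel charges a phone battery",
    "hand crank charges a phone battery",
    "solar panel lights a garden path",
]

vectors = tfidf_vectors(descriptions)
sim_same_purpose = cosine_sparse(vectors[0], vectors[1])
sim_same_mechanism = cosine_sparse(vectors[0], vectors[2])
print(round(sim_same_purpose, 3), round(sim_same_mechanism, 3))
```

A baseline like this scores documents by overlapping words, so it cannot tell whether two descriptions share a purpose or a mechanism. That limitation is what separate purpose and mechanism representations are meant to address.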
I liked that the paper also aimed to find out whether creativity increased. My only concern is that “creativity,” although defined in the paper, is subjective. The authors said that they used graduate students as judges but did not mention the judges’ backgrounds. A graduate student with a relatively creative background, say a minor in a creative field, may view things differently.
In conclusion, I found this research to be strong as it included verification and validation of the results from all angles and not only the AI or the human side.
Questions
- Are you using a similarity metric in your course project? If so, which algorithms are you using? (In our project, we are not using any similarity metric, but I have used all the traditional metrics mentioned in the paper in my previous research work.)
- Other than scientific data, what other kinds of complex datasets would need additional labels or hierarchical labeling?
- Do you agree with the authors’ way of measuring whether their approach enhanced creativity?
References
- [1] Chan, Joel, et al. “Solvent: A mixed initiative system for finding analogies between research papers.” Proceedings of the ACM on Human-Computer Interaction 2.CSCW (2018): 1-21.