Paper: Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, and Aniket Kittur. 2018. SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers. Proc. ACM Hum.-Comput. Interact. 2, CSCW: 31:1–31:21
Summary: In this paper, the authors aim to support interdisciplinary research by mining analogies between research papers from different domains. The paper proposes an annotation schema that extends prior work by Hope et al. and consists of four key facets: background, purpose, mechanism, and findings. The authors report three studies. In the first, a member of the research team annotated a dataset of CSCW papers; in the second, the authors worked with an interdisciplinary team in bioengineering and mechanical engineering to determine whether Solvent could surface analogies with real-world usefulness that are not easily found through citation tree search; and in the third, they scaled Solvent up by recruiting crowd workers from Upwork and Amazon Mechanical Turk to generate annotations, finding that workers had difficulty with the purpose and mechanism facets. In all three studies, semantic vector representations of the annotations were used to find analogies. On the whole, the Solvent system was found to help researchers and to generate analogies effectively.
Reflection: Overall, I think this paper is well-motivated, and the three studies that form the basis for the results are impressive. It was also interesting that there was substantial agreement between crowd workers' and researchers' annotations. This suggests a broader finding: novices may be able to contribute to science not necessarily by doing science themselves (especially as science becomes harder for "normal" people to do and is conducted in larger and larger teams), but by finding analogies between different disciplines' literatures.
For their second study, the authors trained a word2vec model on a curated dataset of over 3,000 papers from three domains. This is a strength: they did not limit their work to a single domain and strove to generalize their findings. However, the domains are still largely engineering disciplines, although CSCW has a somewhat social-science component. I wonder how well the approach would work in other disciplines, such as between the pure sciences; that might be an interesting follow-up study.
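To make the semantic-vector idea concrete, the sketch below shows one common way facet annotations can be compared: average the word vectors in each annotation and take the cosine similarity. This is a minimal illustration, not Solvent's actual code; the tiny embedding table and the example annotation phrases are invented for demonstration, whereas Solvent trains real word2vec vectors on its paper corpus.

```python
import math

# Hypothetical 3-dimensional word vectors (in practice, learned by word2vec
# on thousands of papers; these toy values are assumptions for illustration).
EMBED = {
    "reduce":   [0.9, 0.1, 0.0],
    "friction": [0.8, 0.2, 0.1],
    "drag":     [0.85, 0.15, 0.05],
    "surface":  [0.1, 0.9, 0.0],
    "texture":  [0.2, 0.8, 0.1],
}

def avg_vector(words):
    """Represent an annotation as the average of its known word vectors."""
    vecs = [EMBED[w] for w in words if w in EMBED]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Two papers with similar purposes phrased differently: to find analogies,
# one can look for a high purpose similarity across different domains.
purpose_a = avg_vector("reduce friction".split())
purpose_b = avg_vector("reduce drag".split())
similarity = cosine(purpose_a, purpose_b)
print(similarity)
```

Because the purpose facet is represented separately from the mechanism facet, a system built this way can rank candidate papers by purpose similarity while still surfacing papers whose mechanisms differ, which is the essence of an analogy.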
I wonder how such a system might be deployed more broadly, compared to the limited deployment in this paper. I also wonder how long, in total, it would have taken crowd workers to complete the annotation tasks.
Questions:
- What other domains do you think Solvent would be useful in? Would it easily generalize?
- Is majority vote an appropriate mechanism? What else could be used?
- What are the challenges to creating analogies?