04/22/2020 – Palakh Mignonne Jude – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

SUMMARY

The authors aim to help researchers find analogies in other domains in order to aid interdisciplinary research. They propose a modified annotation scheme that extends the work of Hope et al. [1] and contains four elements: Background, Purpose, Mechanism, and Findings. The authors conduct three studies: the first sources annotations from domain-expert researchers, the second uses SOLVENT to find analogies with real-world value, and the third scales up SOLVENT through crowdsourcing. In each study, semantic vector representations were created from the annotations. In the first study, the dataset consisted of papers from the CSCW conference and was annotated by members of the research team. In the second study, the researchers worked with an interdisciplinary team spanning bioengineering and mechanical engineering to determine whether SOLVENT can help identify analogies that are not easily found through keyword or citation-tree searches. In the third study, the authors used crowd workers from Upwork and AMT to perform the annotations. The authors found that these crowd annotations had substantial agreement with researcher annotations, but that the workers struggled with the purpose and mechanism annotations. Overall, the authors found that SOLVENT helped researchers find analogies more effectively.

REFLECTION

I liked the motivation for this paper, especially Study 3, which used crowdworkers for the annotations, and I was glad to learn that the authors found substantial agreement between crowdworker and researcher annotations. This was an especially good finding because the corpus I work with also contains scientific work, and scaling its annotation has been a concern in the past.

As part of the second study, the authors mention that they trained a word2vec model on a dataset of 3,000 papers curated from the three domains under consideration. This made me wonder about the generalizability of their approach. Would it be possible to generate scientific word vectors that span multiple domains? I think it would be interesting to see how such a system would perform against the existing one. In addition, word2vec is known to struggle with out-of-vocabulary words, which made me wonder whether the authors had made any provisions for handling them.
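To make the out-of-vocabulary concern concrete, here is a minimal sketch of what happens when an annotated span is embedded by averaging word vectors. This is not the authors' pipeline: the toy vectors, the `embed` and `cosine` helpers, and the example tokens are all my own assumptions for illustration. Any token missing from the vocabulary simply contributes nothing to the span's representation.

```python
import math

# Toy 3-dimensional vectors; the values are made up for illustration only
# and do not come from any trained model.
TOY_VECTORS = {
    "polymer": [0.9, 0.1, 0.0],
    "stretch": [0.7, 0.3, 0.1],
    "crowd": [0.0, 0.2, 0.9],
    "workers": [0.1, 0.1, 0.8],
}


def embed(tokens, vectors):
    """Average the vectors of known tokens and report the OOV tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    oov = [t for t in tokens if t not in vectors]
    if not known:
        # Every token was out of vocabulary, so there is nothing to average.
        return None, oov
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return mean, oov


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# "kirigami" is not in the toy vocabulary, so it is silently dropped.
purpose_a, oov_a = embed(["stretch", "polymer", "kirigami"], TOY_VECTORS)
purpose_b, oov_b = embed(["crowd", "workers"], TOY_VECTORS)
similarity = cosine(purpose_a, purpose_b)
```

In this sketch the rare term, which may well be the most informative word in the span, is exactly the one that gets lost. Subword-based embeddings are one commonly used alternative that can still produce a vector for unseen words.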

QUESTIONS

  1. In addition to the domains mentioned by the authors in the discussion section, what other domains can SOLVENT be applied to and how useful do you think it would be in those domains?
  2. The authors used majority vote as the quality control mechanism for Study 3. What more sophisticated measures could be used instead of majority vote? Would any of the methods proposed in the paper ‘CrowdScape: Interactively Visualizing User Behavior and Output’ be applicable in this setting?
  3. How well would SOLVENT extend to the abstracts of Electronic Theses and Dissertations, which would contain a mix of STEM and non-STEM research? Would any modifications be required to the annotation scheme presented in this paper?
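As a baseline for thinking about question 2, majority voting over crowd annotations can be sketched as below. This is only an illustration under my own assumptions: I assume each worker assigns one label per word, and the `majority_label` and `aggregate` helpers, the tie handling, and the example labels are hypothetical rather than taken from the paper.

```python
from collections import Counter


def majority_label(votes, tie_label=None):
    """Return the most common label, or tie_label when there is no clear winner."""
    counts = Counter(votes).most_common()
    if not counts:
        return tie_label
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        # The top two labels have the same count, so there is no majority.
        return tie_label
    return counts[0][0]


def aggregate(annotations):
    """Merge per-worker label sequences (all the same length) word by word."""
    return [majority_label(column) for column in zip(*annotations)]


# Three hypothetical workers labelling the same four words.
workers = [
    ["background", "purpose", "mechanism", "findings"],
    ["background", "purpose", "purpose", "findings"],
    ["background", "mechanism", "mechanism", "findings"],
]
merged = aggregate(workers)
```

Majority voting treats every worker as equally reliable, which is one reason to ask whether weighting workers by their behavior or past accuracy would do better on the harder purpose and mechanism labels.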

REFERENCES

  1. Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating Innovation Through Analogy Mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 235–243.

2 thoughts on “04/22/2020 – Palakh Mignonne Jude – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers”

  1. In response to your last question, I think that SOLVENT would be hard to extend to ETDs. These documents are much longer than most papers, some with as many as a hundred pages, compared to fewer than 10 for most papers. It may be possible to break up these longer documents and have people process parts of the document instead of the whole thing. However, I don’t think that the fact that ETDs are mixed-domain would make it harder for this system to work. I would also like to see this system extended to ETDs, since they are much harder to read and get the main ideas from because they are so much longer and more complicated.

  2. In response to your first question, I think this can be used in a variety of other domains, such as creative fields (e.g., architecture) and the medical field, to find new ways to solve problems. Medical journal papers are especially long and hard to parse, and I think researchers in that field would definitely benefit from a system like this.
