04/22/2020 – Sukrit Venkatagiri – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Paper: Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, and Aniket Kittur. 2018. SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers. Proc. ACM Hum.-Comput. Interact. 2, CSCW: 31:1–31:21

Summary: In this paper, the authors aim to help researchers by mining analogies from other domains to support interdisciplinary research. The paper proposes an annotation scheme that extends prior work by Hope et al. and has four facets: background, purpose, mechanism, and findings. The paper also reports three studies. In the first, annotations were collected from domain-expert researchers; in the second, the SOLVENT system was used to find analogies with real-world usefulness; in the third, the authors scaled up SOLVENT through crowdsourcing workflows. In each of the three studies, semantic vector representations were built from the annotations. The first study used a dataset of papers from CSCW annotated by members of the research team, while the second involved working with an interdisciplinary team in bioengineering and mechanical engineering to determine whether SOLVENT could help identify analogies not easily found with citation tree search. Finally, in the third study, the authors had crowd workers from Upwork and Amazon Mechanical Turk generate annotations and found that the workers had difficulty with the purpose and mechanism annotations. On the whole, the SOLVENT system was found to help researchers find analogies effectively.
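To make the four-facet idea concrete for myself, here is a rough sketch (my own toy example, with invented annotations and a plain bag-of-words cosine standing in for the paper's actual vector representations) of comparing two papers facet by facet:

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class FacetAnnotation:
    """One abstract split into SOLVENT's four facets (hypothetical text)."""
    background: str
    purpose: str
    mechanism: str
    findings: str

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def facet_similarity(p1, p2, facets=("purpose", "mechanism")) -> float:
    # Compare only the requested facets, mirroring facet-restricted matching.
    return sum(cosine(bag_of_words(getattr(p1, f)), bag_of_words(getattr(p2, f)))
               for f in facets) / len(facets)

paper_a = FacetAnnotation(
    background="crowdsourcing complex creative work",
    purpose="help researchers find inspirations for design problems",
    mechanism="crowd annotation combined with vector similarity search",
    findings="annotated queries surface more useful analogies",
)
paper_b = FacetAnnotation(
    background="biological signal processing",
    purpose="help researchers find inspirations across domains",
    mechanism="citation network traversal",
    findings="cross-domain links are rarely cited directly",
)

print(facet_similarity(paper_a, paper_b))                 # purpose + mechanism
print(facet_similarity(paper_a, paper_b, ("purpose",)))   # purpose only
```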

Reflection: Overall, I think this paper is well-motivated, and the three studies that form the basis for the results are impressive. It was also interesting that there was substantial agreement between crowd workers' and researchers' annotations. This suggests a useful finding more broadly: novices may be able to contribute to science not necessarily by doing science (especially as science gets harder for "normal" people to do and is done in larger and larger teams), but by finding analogies between different disciplines' literatures.

For their second study, the authors trained a word2vec model on a curated dataset of over 3,000 papers from three domains. I appreciated that they did not limit their work to a single domain and strove to generalize their findings. However, the domains are still largely engineering disciplines, although CSCW has a social science component to it. I wonder how the approach would work in other disciplines, such as between the pure sciences? That might be an interesting follow-up study.
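A minimal version of that kind of domain-specific embedding step could look like the gensim sketch below; the abstracts, parameters, and the facet-averaging helper are placeholders of my own, not the authors' setup:

```python
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Stand-in corpus; the paper's corpus had on the order of 3,000 abstracts.
abstracts = [
    "We present a crowdsourcing workflow for annotating research paper abstracts.",
    "A mechanical metamaterial that dissipates vibration energy is proposed.",
]

sentences = [simple_preprocess(a) for a in abstracts]   # lowercase + tokenize
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=10)

def facet_vector(text: str) -> np.ndarray:
    """Average the word vectors of an annotated facet into one facet-level vector."""
    tokens = [t for t in simple_preprocess(text) if t in model.wv]
    if not tokens:
        return np.zeros(model.wv.vector_size)
    return np.mean([model.wv[t] for t in tokens], axis=0)

print(facet_vector("crowdsourcing workflow for annotating abstracts")[:5])
```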

I wonder how such a system might be deployed more broadly, as compared to the limited way it was done in this paper. I also wonder how long in total it would have taken the crowd workers to go through the tasks and generate the annotations.

Questions:

  1. What other domains do you think Solvent would be useful in? Would it easily generalize?
  2. Is majority vote an appropriate mechanism? What else could be used?
  3. What are the challenges to creating analogies?


04/22/2020 – Yuhang Liu – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary:

This paper mainly describes a new paper-search system, though I think of it as an idea-initiation system rather than just a paper-search system. The article first notes that scientific discovery is often driven by finding analogies in distant fields, but as more and more papers are published, it is difficult to find papers with relevant ideas even within one field, let alone across fields. To address this, the authors introduce a hybrid system. In this system, crowd workers are mainly responsible for reading and understanding an article, analyzing it along four aspects: Background, Purpose, Mechanism, and Findings. The computer then analyzes the article based on these semantic scaffolds, for example through TF-IDF or a combination of different representations, and finds papers with similar structure. Through their evaluation, the authors found that these annotations are effective and can help experts.
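To see what the TF-IDF variant might look like in practice, here is a small scikit-learn sketch of my own (with made-up purpose and mechanism annotations), which only approximates the system described above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical facet annotations for three papers.
purposes = [
    "support designers searching for analogical inspirations",
    "help biologists find prior work on signal filtering",
    "enable crowd workers to summarize technical abstracts",
]
mechanisms = [
    "crowdsourced facet annotation combined with vector search",
    "wavelet transform applied to noisy measurements",
    "guided templates with worked examples for non-experts",
]

# One TF-IDF space per facet, so papers can be compared facet by facet.
P = TfidfVectorizer(stop_words="english").fit_transform(purposes)
M = TfidfVectorizer(stop_words="english").fit_transform(mechanisms)

# "Similar purpose" candidates for paper 0 (index 0 is the paper itself).
print(cosine_similarity(P[0], P).ravel())
print(cosine_similarity(M[0], M).ravel())
```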

Reflection:

First of all, I agree with the goal of the paper: helping more researchers obtain new ideas by analogy from outside their field, and then using these ideas to drive innovation and the development of science and technology. I also believe analogies help technology in general; the article cites quite a few examples of the effectiveness of analogy and of research breakthroughs that followed from one. In my reading over the past few weeks, I have often noticed articles using analogies, and since then I have been thinking about how to more effectively help people learn from other subject areas through analogy. Secondly, I think bringing people in to assist with the task is a very effective method, and this has been my biggest takeaway from the course: when a problem is encountered, consider using human power to solve it. In the article, having crowd workers annotate an article along the four facets decomposes it so that the computer can better understand it. It is still difficult for a computer to directly understand an article on its own, but given these scaffolds, finding connections between articles becomes a comparatively simple task. At the same time, I have some doubts about the effectiveness of the proposed system, and the article itself spends a considerable amount of space describing its limitations. My doubts mainly concern the third point, usefulness in the real world. Many factors affect this practicality: for example, as the data grows, there will be more similar analogies, and the quality of these analogies is difficult to control. As we know, not all lessons are useful; some ideas may bring other problems, such as reduced efficiency or wasted resources. My final point is that getting a good idea takes a lot: we need to control the quality of the workers' output and ask whether the algorithm can really be that inspiring. In my opinion, a good innovation usually arrives in a flash of insight. Although analogy may make this easier to achieve, it still requires good collaboration between human and machine.

Question:

  1. Do you think finding analogies by analyzing papers according to this framework is a useful approach?
  2. Are there other factors that might influence this system, such as an increase in similar articles or differences in understanding between workers?
  3. Besides the methods mentioned in the article, as computer scientists, what can we do to inspire people?


04/22/20 – Fanglan Chen – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Summary

Chan et al.’s paper “SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers” explores the feasibility of using a mixed-initiative system in which a collaborative human-AI team categorizes research papers into relational schemas, which can then be used to identify analogous research papers and potentially lead to innovative knowledge discoveries. The researchers are motivated by the boom in research papers in recent decades, which makes searching for relevant papers within one domain or across domains more and more difficult. To facilitate paper retrieval and interdisciplinary analogy search, the researchers develop a mixed-initiative system called SOLVENT in which humans mark the key aspects of research papers (their background, purpose, mechanism, and findings) and a machine learning model extracts semantic representations from these key aspects, which facilitates identifying analogies across different research domains.

Reflection

I think this paper presents an innovative study of how the proposed system can support knowledge sharing and discovery within one domain and across different research communities. In an era of research explosion, researchers would greatly benefit from using such a system for their own work and for exploring interdisciplinary possibilities. That made me think about why the system achieves good performance by annotating only the abstracts in the domains where they conducted experiments. As we know, a paper's abstract usually summarizes its most important points at a high level, so it is intuitive and sensible to use that part for annotation and downstream tasks. The researchers adopt pre-trained word embedding models to generate semantic vector representations for each component, which perform quite well in the tasks presented in the paper. I would imagine the framework works especially well for experimentation-driven domains such as computer science, civil engineering, and biology, in which research papers follow a specific writing structure. Can the proposed framework scale up to less structured text, such as essays or novels, by extending it to full content instead of focusing on abstracts? I think that would be an interesting future direction to explore.

In addition, one potential future work discussed in the paper is to extend the content-based approach with graph-based approaches like citation networks. I feel this is a novel idea and there is a lot of potential in this direction. Since the proposed system has the ability to find analogies across various research areas, I would be curious to see if it is possible to generate a knowledge graph based on the analogy pairs that can create something similar to a research road map, which indicates how the ideas from different papers in various research areas relate in a larger scope. I would imagine researchers would benefit from a systematized collection of research ideas. 
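As a purely speculative illustration of that road-map idea, one could treat high-similarity analogy pairs as edges of a graph and look for papers that bridge clusters; the paper names and scores below are invented:

```python
import networkx as nx

# Hypothetical analogy pairs: (paper A, paper B, similarity score).
analogy_pairs = [
    ("crowd ideation paper", "protein folding game paper", 0.81),
    ("protein folding game paper", "citizen science birding paper", 0.74),
    ("crowd ideation paper", "design fixation paper", 0.69),
]

G = nx.Graph()
for a, b, score in analogy_pairs:
    G.add_edge(a, b, weight=score)

# Papers that bridge otherwise separate clusters would be candidate "road map" hubs.
print(sorted(nx.betweenness_centrality(G).items(), key=lambda kv: -kv[1]))
```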

Discussion

I think the following questions are worthy of further discussion.

  • Would you use this system to support your own research? Why or why not?
  • Do you think that the annotation categories capture the majority of the research papers? Can you think about other categories the paper did not mention? 
  • What do you think of the researchers’ approach to annotating the abstracts? Would it be helpful to expand on this work to annotate the full content of the papers?
  • Do you think the domains involved in cross-domain research share the same purpose and mechanism? Can you think about some possible examples?


04/22/2020 – Palakh Mignonne Jude – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

SUMMARY

The authors attempt to assist researchers in finding analogies in other domains in order to aid interdisciplinary research. They propose a modified annotation scheme that extends the work described by Hope et al. [1] and contains 4 elements – Background, Purpose, Mechanism, and Findings. The authors conduct 3 studies – the first involving the sourcing of annotations from domain-expert researchers, the second using SOLVENT to find analogies with real-world value, and the third scaling up SOLVENT through crowdsourcing. In each study, semantic vector representations were created from the annotations. In the first study, the dataset used focused on papers from the CSCW conference and was annotated by members of the research team. In the second study, the researchers worked with an interdisciplinary team spanning bioengineering and mechanical engineering in an attempt to identify whether SOLVENT can aid in finding analogies not easily found through keyword or citation tree searches. In the third study, the authors used crowd workers from Upwork and AMT to perform the annotations. The authors found that these crowd annotations had substantial agreement with researcher annotations, but the workers struggled with the purpose and mechanism annotations. Overall, the authors found that SOLVENT helped researchers find analogies more effectively.

REFLECTION

I liked the motivation for this paper – especially Study 3, which used crowdworkers for the annotations – and was glad to learn that the authors found substantial agreement between crowdworker annotations and researcher annotations. This was an especially welcome finding, as the corpus that I deal with also contains scientific work, and scaling its annotation has been a concern in the past.

As part of the second study, the authors mention that they trained a word2vec model on the 3,000 papers in the dataset curated from the 3 domains under consideration. This made me wonder about the generalizability of their approach. Would it be possible to generate scientific word vectors that span multiple domains? I think it would be interesting to see how the performance of such a system would measure against the existing one. In addition, word2vec is known to face issues with out-of-vocabulary words, so I wondered whether the authors made any provisions to deal with this.
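One possible provision, which I did not see discussed in the paper, would be subword embeddings such as fastText, which build vectors from character n-grams so that unseen scientific jargon still gets an approximate representation. A toy gensim sketch with a stand-in corpus:

```python
from gensim.models import FastText
from gensim.utils import simple_preprocess

abstracts = [
    "microfluidic droplet sorting with dielectrophoresis",
    "crowdsourced annotation of research paper abstracts",
]
model = FastText([simple_preprocess(a) for a in abstracts],
                 vector_size=100, window=5, min_count=1, epochs=10)

# A word never seen in training still resolves through its character n-grams.
print(model.wv["dielectrophoretic"][:5])
```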

QUESTIONS

  1. In addition to the domains mentioned by the authors in the discussion section, what other domains can SOLVENT be applied to and how useful do you think it would be in those domains?
  2. The authors used majority vote as the quality control mechanism for Study 3. What more sophisticated measures could be used instead of majority vote? Would any of the methods proposed in the paper ‘CrowdScape: Interactively Visualizing User Behavior and Output’ be applicable in this setting? (See the toy sketch after this list.)
  3. How well would SOLVENT extend to the abstracts of Electronic Theses and Dissertations, which contain a mix of STEM as well as non-STEM research? Would any modifications be required to the annotation scheme presented in this paper?
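Regarding question 2, here is a toy illustration of what majority-vote aggregation over crowd facet labels might look like (the per-token labels are invented, and this is not the authors' pipeline):

```python
from collections import Counter

# worker -> per-token facet labels for one abstract (hypothetical data)
labels = {
    "worker_1": ["background", "background", "purpose", "mechanism", "mechanism"],
    "worker_2": ["background", "purpose",    "purpose", "mechanism", "findings"],
    "worker_3": ["background", "background", "purpose", "mechanism", "mechanism"],
}

def majority_vote(per_worker_labels):
    n_tokens = len(next(iter(per_worker_labels.values())))
    merged = []
    for i in range(n_tokens):
        votes = Counter(worker[i] for worker in per_worker_labels.values())
        merged.append(votes.most_common(1)[0][0])   # ties break arbitrarily
    return merged

print(majority_vote(labels))
# ['background', 'background', 'purpose', 'mechanism', 'mechanism']
```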

REFERENCES

  1. Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating Innovation Through Analogy Mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 235–243.


4/22/20 – Lee Lisle – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary

Chan et al.’s paper discusses a way to find similarities in research papers through the use of mixed initiative analysis. They use a combination of humans to identify sections of abstracts and machine learning algorithms to identify key words in those sections in order to distill the research down into a base analogy. They then compare across abstracts to find papers with the same or similar characteristics. This enables researchers to find similar research as well as potentially apply new methods to different problems. They evaluated these techniques through three studies. The first study used grad students reading and annotating abstracts from their own domain as a “best-case” scenario. Their tool worked very well with the annotated data as compared to using all words. The second study looked at helping find analogies to fix similar problems, using out-of-domain experts to annotate abstracts. Their tool found more possible new directions than the all words baseline tool. Lastly, the third study sought to scale up using crowdsourcing. While the annotations were of a lesser quality with mTurkers, they still outperformed the all-words baseline.

Personal Reflection

I liked this tool quite a bit, as it seems like a good way to get oneself “unstuck” from the research black hole and find new ways of solving problems. I also appreciated that the annotations didn’t necessarily require domain-specific or even researcher-specific knowledge, even with the various jargon that is used. Furthermore, though it confused me initially, I liked how they used their own abstract as an extra figure of sorts – applying their own approach to annotating their abstract was a good idea. It cleverly showed how their approach works without requiring the reader to go through the entire paper.

I did find a few things confusing about their paper, however. They state that the GloVe model doesn’t work very well in one section, but then use it in another. Why go back to using it if it had already disappointed the researchers in one phase? Another complication I noticed was that they didn’t define the dataset in the third study. Where did the papers come from? I can glean from reading that it was from one of the prior two studies, but I think it’s relevant to ask whether it was the domain-specific or the domain-agnostic dataset (or both).

I was curious about the total deployment time for this kind of thing. Did they get all of the papers analyzed by the crowd in 10 minutes? 60 minutes? A day? Given how parallelizable the task is, I can imagine the analysis could be performed very quickly. While this task doesn’t need to be performed quickly, speed could be an excellent bonus of the approach.

Questions

  1. This tool seems extremely useful. When would you use it? What would you hope to find using this tool?
  2. Is the annotation of 10,000 research papers worth $4000? Why or why not?
  3. Based on their future work, what do you think is the best direction to go with this approach? Considering the cost of the crowdworkers, would you pay for a tool like this, and how much would be reasonable?


04/22/2020 – Vikram Mohanty – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Authors: Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, Aniket Kittur.

Summary

This paper addresses the problem of finding analogies between research problems across similar/different domains by providing computational support. The paper proposes SOLVENT, a mixed-initiative system where humans annotate aspects of research papers that denote their background (the high-level problems being addressed), purpose (the specific problems being addressed), mechanism (how they achieved their purpose), and findings (what they learned/achieved), and a computational model constructs a semantic representation from these annotations that can be used to find analogies among the research papers. The authors evaluated this system against baseline information retrieval approaches and also with potential target users i.e. researchers. The findings showed that SOLVENT performed significantly better than baseline approaches, and the analogies were useful for the users. The paper also discusses implications for scaling up.

Reflection

This paper demonstrates how human-interpretable feature engineering can improve existing information retrieval approaches. SOLVENT addresses an important problem faced by researchers, i.e., drawing analogies to other research papers. Drawing from my own personal experience, this problem has presented itself at multiple stages, be it while conceptualizing a new problem, figuring out how to implement the solution, trying to validate a new idea, or, eventually, writing the Related Work section of a paper. It goes without saying that SOLVENT, if commercialized, would be a boon for the thousands of researchers out there. It was also nice to see the evaluation include real graduate students, as their validation seemed the most applicable for such a system.

SOLVENT demonstrates the principles of mixed-initiative interfaces effectively by leveraging the complementary strengths of humans and AI. Humans are better at understanding context, and in this case, it’s that of a research paper. AI can help in quickly scanning through a database to find other articles with similar “context”. I really like the simple idea behind SOLVENT, i.e., how would we, as humans, find analogical ideas? We would look for a similar purpose and/or similar or different mechanisms. So, how about we do just that? It’s a great case of how human-interpretable intuitions translate into intelligent system design, and it also scores over end-to-end automation. As I have reflected on previous papers, it always helps to look for answers by beginning from the problem and understanding it better, and that is reflected in what SOLVENT ultimately achieves, i.e., outperforming an end-to-end automation approach.
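To spell that intuition out, here is a toy scoring sketch of my own (with random placeholder vectors) that rewards a similar purpose while penalizing a similar mechanism, i.e., “same problem, different approach”:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def creative_matches(query, candidates, alpha=1.0, beta=0.5):
    """Score = alpha * purpose_similarity - beta * mechanism_similarity."""
    scores = [(name,
               alpha * cosine(query["purpose"], facets["purpose"])
               - beta * cosine(query["mechanism"], facets["mechanism"]))
              for name, facets in candidates.items()]
    return sorted(scores, key=lambda x: -x[1])

# Placeholder facet embeddings; in practice these would come from the annotated text.
rng = np.random.default_rng(0)
query = {"purpose": rng.normal(size=50), "mechanism": rng.normal(size=50)}
candidates = {f"paper_{i}": {"purpose": rng.normal(size=50),
                             "mechanism": rng.normal(size=50)} for i in range(5)}
print(creative_matches(query, candidates))
```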

The findings are definitely interesting, particularly the drive for scaling up. Turkers certainly provided an improvement over the baseline, even though their annotations fared worse than those of the experts and the Upwork crowd. I am not sure what the longer-term implications are, though. Should Turkers be used to annotate larger datasets? Or should the researchers figure out a way to improve Turker annotations? Or train the annotators? These are all interesting questions. One long-term implication here is to re-format abstracts into a background + purpose + mechanism + findings structure right at the initial stage. This still does not address the thousands of prior papers, though. Overall, this paper certainly opens doors for future analogy-mining approaches.

Questions

  1. Should conferences and journals re-format the abstract template into a background + purpose + mechanism + findings to support richer interaction between domains and eventually, accelerate scientific progress?
  2. How would you address annotating larger datasets?
  3. How did you find the feature engineering approach used in the paper? Was it intuitive? How would you have done it differently?


04/22/20 – Jooyoung Whang – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

This paper proposes a novel mixed-initiative method called SOLVENT that has the crowd annotate relevant parts of a document based on purpose and mechanism and represents the documents in a vector space. The authors identify that representing technical documents using the purpose-mechanism concept with crowd workers faces obstacles such as technical jargon, multiple sub-problems in one document, and the presence of understanding-oriented papers. Therefore, the authors modify the structure to hold background, purpose, mechanism, and findings instead. With each document represented by this structure, the authors were able to apply natural language processing techniques to perform analogical queries. They found better query results than with baseline all-words representations. To scale the approach, the authors had workers on Upwork and Mturk annotate technical documents. They found that the workers struggled with the concepts of purpose and mechanism, but still provided improvements for analogy mining.

I think this study would go nicely together with document summarization studies. It would especially help since the annotations are organized into specific categories. I remember one of our class's projects involved ETDs and required summaries; I think this study could have benefited that project, given enough time.

This study could also have benefited my own project. One of the sample use cases that the paper introduced was improving creative collaboration between users. This is similar to my project, which is about providing creative references for a creative writer. However, if I wanted to apply this study to my project, I would need to additionally have the Mturk workers label each of the references they provide by purpose and mechanism, which would cost extra funds per creative reference. This study would have been very useful if I had a larger budget and wanted higher-quality content rankings in terms of analogy.

It was interesting that the authors mentioned that papers from different domains could still share the same purpose-mechanism structure. It made me wonder whether researchers would really want similar purpose-mechanism papers from a different domain. I understand multi-disciplinary work is being highlighted these days, but would each of the disciplines involved in a study address the same purpose and mechanism? Wouldn't they address different components of the project?

The following are the questions that I had while reading the paper.

1. The paper notes that many technical documents are understanding-oriented papers that have no purpose-mechanism mappings. The authors resolved this problem by defining a larger mapping that is able to include these documents. Do you think the query results would have had higher quality if the mapping was kept compact instead of increasing the size? For example, would it have helped if the system separated purpose-mechanism and purpose-findings?

2. As mentioned in my reflection, do you think the disciplines involved in a multi-disciplinary project all have the same purpose and mechanism? If not, why?

3. Would you use this paper for your project? To put it another way, does your project require users or the system to locate analogies inside a text document? How would you use the system? What kinds of queries would you need out of the possible combinations (background, purpose, mechanism, findings)?


04/22/2020 – Sushmethaa Muhundan – SOLVENT: A Mixed-Initiative System for Finding Analogies between Research Papers

This work aims to explore a mixed-initiative approach to help find analogies between research papers. The study uses research experts as well as crowd-workers to annotate research papers by marking the purpose, mechanism, and findings of the paper. Using these annotations, a semantic vector representation is constructed that can be used to compare different research papers and identify analogies within domains as well as across domains. The paper aims to leverage the inherent causal relationship between purpose and mechanism to build “soft” relational schemas that can be compared to determine analogies. Three studies were conducted as part of this paper. The first study was to test the system’s quality and feasibility by asking domain expert researchers to annotate 50 research papers. The second study was to explore whether the system would be beneficial to actual researchers looking for analogical inspiration. The third study involved using crowd workers as annotators to explore scalability. The results from the studies showed that annotating the purpose and mechanism aspects of research papers is scalable in terms of cost, not critically dependent on annotator expertise, and is generalizable across domains.

I feel that the problem this system is trying to solve is real. While working on research papers, there is often a need to find analogies for inspiration and/or competitive analysis. I have also faced difficulty finding relevant research papers while working on my thesis. If scaled and deployed properly, SOLVENT would definitely be helpful to researchers and could potentially save a lot of time that would otherwise be spent on searching for related papers.

The paper claims that the system quality is not critically dependent on annotator expertise and that the system can be scaled using crowd workers as annotators. However, the results showed that the annotations of Upwork workers matched expert annotations 78% of the time and those of Mturk workers only 59% of the time. The agreement also varied considerably across papers: a few had 96% agreement while a few had only 4%. I am a little skeptical regarding these numbers and I am not convinced that expert annotations are dispensable. I feel that using crowd workers might help the system scale, but it might have a negative impact on quality.
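One way to probe such numbers, beyond raw percent agreement, is a chance-corrected statistic such as Cohen's kappa; the labels in this small illustration are invented:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical facet labels from an expert and a crowd worker for the same tokens.
expert = ["purpose", "mechanism", "mechanism", "findings", "background", "purpose"]
crowd  = ["purpose", "mechanism", "purpose",   "findings", "background", "mechanism"]

raw_agreement = sum(e == c for e, c in zip(expert, crowd)) / len(expert)
kappa = cohen_kappa_score(expert, crowd)
print(f"raw agreement = {raw_agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```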

I found one possible future extension extremely interesting: the possibility of authors annotating their own work. I feel that if each author spent a little extra effort to annotate their own work, a large corpus with high-quality annotations could easily be created, and SOLVENT could produce great results using this corpus.

  • What are your thoughts about the system proposed? Would you want to use this system to aid your research work?
  • The study indicated that the system needs to be vetted with large datasets and the usefulness of the system is yet to be truly tested in real-world settings. Given these limitations, do you think the usage of this system is feasible? Why or why not?
  • One potential extension mentioned in the paper is to combine the content-based approach with graph-based approaches like citation graphs. What are other possible extensions that would enhance the current system?


04/22/2020 – Bipasha Banerjee – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary 

The paper by Chan et al. is an interesting read on finding analogies between research papers. The main domain considered here is scientific papers. The annotation scheme is divided into four categories, namely background, purpose, mechanism, and findings. The paper's goal is to make it easier for researchers to find related work in their field. The authors conducted three studies to test their approach and its feasibility. The first study consisted of domain experts annotating abstracts in their own research area. The second study focused mainly on how the model could tackle the real-world problem where a researcher needs to find relevant papers in their area to act as inspiration, related work, or even baselines for their experiments. The third study was very different from the first two, in which an experienced researcher annotated the data or used the system to solve their research problem; instead, it used crowdworkers to annotate abstracts. The platforms utilized by the authors were Upwork and Amazon Mechanical Turk.

Reflection

The mixed-initiative model developed by the authors is an excellent step in the right direction for finding analogies in scientific papers. There are traditional approaches in natural language processing that help find similarities in textual data; the website [1] gives an excellent overview of the steps involved in finding textual similarity. However, when it comes to scientific text, these steps alone are not enough. Additionally, most of the models involved are trained on generic web and news data (like CNN or DailyMail), so much of the scientific jargon is “out of vocabulary” (OOV) for such models. I therefore appreciate that the authors used human annotations along with traditional information retrieval methods (like TF-IDF) to tackle the problem at hand.

Additionally, for the similarity metric, they took multiple facet combinations into account, like Purpose+Mechanism, which is definitely useful when finding similarities in text data. I also liked that, in addition to people with domain knowledge, they considered ordinary crowdworkers in the studies. I was intrigued to find that 75% of the time the crowdworkers' annotations matched the researchers', so the conclusion that “crowd annotations still improve analogy-mining” is valuable. Moreover, getting a large number of researchers in one domain just to annotate data is difficult; sometimes there are very few people in a given research area. Rather than having to find researchers who are available to annotate such data, it is good that we can rely on the crowdsourcing methods available.

Lastly, I would like to mention that I liked that the paper identified its limitations very well and clearly laid out the scope for future work.

Questions

  1. Would you agree that the level of expertise of the human annotators would not affect the results for your course project? If so, could you please clarify?

(For my class project, I think I would agree with the paper’s findings. I work on reference string parsing, and I don’t think we need experts just to label the tokens.)

  2. Could we have more complex categories or sub-categories rather than just the four identified?
  3. How would this extend to longer pieces of text, like chapters of book-length documents?

References 

  1. https://medium.com/@Intellica.AI/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c 


04/22/20 – Lulwah AlKulaib – SOLVENT

Summary

The paper argues that scientific discoveries are often based on analogies from distant domains. Nowadays, it is difficult for researchers to keep finding such analogies due to the rapidly growing number of papers in each discipline and the difficulty of finding useful analogies in unfamiliar domains. The authors propose a system to address this issue: SOLVENT, a mixed-initiative system for finding analogies between research papers. They hire human annotators who structure academic paper abstracts into different aspects, and then a model constructs semantic representations from the provided annotations. The resulting semantic representations are used to find analogies among research papers within that domain and across different domains. In their studies, the authors show that the proposed system finds more analogies than existing baseline approaches from the information retrieval field. They outperform the state of the art and show that the annotations generalize beyond the annotators' domain and that the analogies found by the semantic model are judged useful by experts. Their system is a step in a new direction towards computationally augmented knowledge sharing between different fields.

Reflection

This was a very interesting paper to read. The authors' use of scientific ontologies and scholarly discourse structures like those in the Core Information about Scientific Papers (CISP) ontology makes me think about how relevant their work is, even when their goal differs from that of the corpora paper. I found the section where they explain adapting the annotation methods for research papers very useful for a different project.

One thing that I had in mind while reading the paper was how scalable this is to larger datasets. As shown in the studies, the datasets are relatively small. The authors explain in the limitations that part of the bottleneck is having a good set of gold-standard matches that they can use to evaluate their approach. I think that is a valid reason, but it still does not answer the questions of what scaling up would require and how well it would work.

When going over their results and seeing how they outperformed existing state-of-the-art models and approaches, I also thought about real-world applications and how useful this model is. I had never thought of using analogies for discovery in different scientific domains; I always thought it would be more reliable to have a co-author from that domain weigh in. Especially nowadays, with the vast communities of academics and researchers on social media, it is no longer that hard to find a collaborator from a domain that is not yours. Also, looking at their results, their high precision appeared only when recommending the top k% of the most similar analogy pairs. I wonder whether automating this has a greater impact than using the knowledge of a domain expert.
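For reference, a precision-at-top-k% check of that kind could be computed roughly as follows; the similarity scores and gold labels here are made up:

```python
import numpy as np

def precision_at_top_k_percent(scores, gold, k_percent):
    order = np.argsort(scores)[::-1]                     # most similar pairs first
    k = max(1, int(len(scores) * k_percent / 100))
    return float(np.mean(np.asarray(gold)[order[:k]]))   # fraction of true analogies

scores = [0.91, 0.85, 0.40, 0.77, 0.30, 0.65]   # model similarity per candidate pair
gold   = [1,    1,    0,    0,    0,    1]      # 1 = judged a true analogy by experts
print(precision_at_top_k_percent(scores, gold, 50))      # precision among the top 50%
```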

Discussion

  • Would you use a similar model in your research?
  • What would be an application that you can think of where this would help you while working on a new domain?
  • Do you think that this system could be outperformed by using a domain expert instead of the automated process?
