04/22/2020 – Subil Abraham – Chan et al., “SOLVENT”

Academic research strives for innovation in every field. But in order to innovate, researchers need to scour the length and breadth of their field, as well as adjacent fields, to get new ideas, understand current work, and make sure they are not repeating someone else’s work. An automated system that scours the published research landscape for similar existing work would be a huge boost to productivity. SOLVENT aims to solve this problem: humans annotate research abstracts to identify their background, purpose, mechanism, and findings, and that schema is used to index papers and surface similar papers (which have also been annotated and indexed) that match closely on one or a combination of those categories. The authors conduct three case studies to validate their method: finding analogies within a single domain, finding analogies across different domains, and testing whether annotations from crowd workers would be useful. They find that their method consistently outperforms other baselines in finding analogous papers.

I think that something like this would be incredibly helpful for researchers and would significantly streamline the research process. I would gladly use it in my own work because it would save so much time. Of course, as the authors point out, the issue is scale: for it to be useful, a very large chunk (or ideally all) of published research needs to be annotated according to the presented schema. This could be integrated as a gamification mechanism in Google Scholar, where the user is occasionally asked to annotate an abstract; that way, the annotation could be done at scale. I also find it interesting that purpose+mechanism produced better matches than background+purpose+mechanism. I would have figured that the added context from background would help find better matches. But given the goal of finding analogies even in different fields, perhaps background+purpose+mechanism rightly does not give great results because it gets too specific by providing too much information.
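To make that intuition concrete, here is a toy illustration. The numbers are invented (not from the paper), and averaging per-facet scores is just one plausible combination rule, not necessarily SOLVENT’s:

```python
# Hypothetical per-facet cosine similarities between a query paper and a
# cross-domain candidate (invented numbers, for illustration only).
sims = {"background": 0.15, "purpose": 0.80, "mechanism": 0.70}

def combined(facets):
    # One simple combination rule (an assumption, not SOLVENT's exact
    # method): average the per-facet similarity scores.
    return sum(sims[f] for f in facets) / len(facets)

print(combined(["purpose", "mechanism"]))                # 0.75
print(combined(["background", "purpose", "mechanism"]))  # 0.55: background dilutes the match
```

The cross-domain candidate shares the query’s purpose and mechanism but, being from a different field, has a very different background, so including the background facet drags the combined score down.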

  1. Would you find use for this? Or would you prefer seeking out papers on your own?
  2. Do you think that the annotation categories are appropriate? Would other categories work? Maybe more or fewer categories?
  3. Is it feasible to expand on this work to cover annotating the content of whole papers? Would that be asking too much?


04/22/2020 – Nurendra Choudhary – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Summary

SOLVENT aims to find analogies between research papers in different fields; e.g., simulated annealing in AI optimization is derived from annealing in metallurgy, and information foraging from animal foraging. It extracts idea features from a research paper according to a purpose-mechanism schema; Purpose: what the paper is trying to achieve, and Mechanism: how it achieves that purpose.

Research papers cannot always be put into a purpose-mechanism schema due to complex language, hierarchies of problems, and the blurred line between mechanisms and findings. Hence, the authors propose a modified annotation scheme that includes: Background: the context of the problem; Purpose: the main problem being solved by the paper; Mechanism: the method developed to solve the problem; and Findings: the conclusions of the research work (for understanding-type papers, this section carries more of the information). Queries against the schema use cosine similarity over TF-IDF-weighted averages of word vectors. The authors scale up with crowd workers because expert annotation is prohibitively expensive; however, this reveals significant disagreement between crowd workers and experts.
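A minimal sketch of that querying step, under stated assumptions: the snippet below embeds each annotated span as the TF-IDF-weighted average of its word vectors and ranks spans by cosine similarity. The seeded random per-word vectors are a self-contained stand-in for the pretrained word embeddings the paper relies on, so only the mechanics, not the resulting scores, are meaningful:

```python
import math
from collections import Counter
import numpy as np

def word_vec(word, dim=50):
    # Stand-in for a pretrained embedding lookup: a fixed random vector
    # per word, deterministically seeded by the word itself (assumption;
    # the paper uses real pretrained word vectors).
    seed = int.from_bytes(word.encode(), "little") % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def idf(spans):
    # Inverse document frequency over the collection of annotated spans.
    df = Counter(w for span in spans for w in set(span))
    return {w: math.log(len(spans) / df[w]) for w in df}

def span_vector(tokens, idf_table):
    # TF-IDF-weighted average of the span's word vectors.
    tf = Counter(tokens)
    weights = {w: tf[w] * idf_table.get(w, 0.0) for w in tf}
    total = sum(weights.values()) or 1.0
    return sum(weights[w] * word_vec(w) for w in weights) / total

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical "purpose" annotations from three abstracts.
purposes = [
    "find a low cost route through a large network".split(),
    "cool metal slowly so atoms settle into a low energy state".split(),
    "help people forage for useful information on the web".split(),
]
table = idf(purposes)
query = span_vector(purposes[0], table)
for span in purposes:
    print(f"{cosine(query, span_vector(span, table)):+.3f}  {' '.join(span)}")
```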

Reflection

An amazing (nothing’s perfect, right :P) balance between human annotation capabilities and AI’s ability to analyze huge information sources, applied to the problem of analogy retrieval. Additionally, the paper shows a probable future of crowd work in which tasks grow increasingly complex for ordinary workers. We have discussed this evolution in several classes before, and the paper is a contemporary example of this move towards complexity. The study showing its application in a real-world research team provides a great example that other research teams can borrow.

I would like to see the performance of Background+Purpose+Mechanism+Findings reported; I don’t understand the reason for its omission (probably space constraints). The comparison baseline uses abstract-only queries, but a system could potentially have access to the entire paper, and I think that would strengthen the comparative study. The researchers test expert researchers and crowd workers; an analysis of undergraduate/graduate students as annotators should also be done. As pointed out by the authors, the study set is limited due to the need for expert annotations. However, there is also little diversity in the fields of study.

“Inspiration: find relevant research that can suggest new properties of polymers to leverage, or new techniques for stretching/folding or exploring 2D/3D structure designs”

Fields searched: materials science, civil engineering, and aerospace engineering.

Questions

  1. The paper shows the complexity of future crowd-work. Is the solution limited to area experts? Is there a way to simplify or better define the tasks for a regular crowd? If not, what is the limit of a regular crowd-worker’s potential?
  2. How important is finding such analogies in your research fields? Would you apply this framework in your research projects?
  3. Analogies are meant for easier understanding and inspiration. SOLVENT has a limited application in inspiring novel work. Do you agree? What could be a possible scenario where it inspires novel work?
  4. Compared to MTurk, Upwork workers show better agreement with researchers across the board. What do you think is the reason? When would you use MTurk vs Upwork? Do you think higher pay would proportionally improve work quality as well?

Word Count: 532


04/22/2020 – Nan LI – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary

This paper introduces a mixed-initiative system called SOLVENT, which aims to find analogies between research papers in different fields by combining human annotation of a paper’s critical features with a computational model that constructs a semantic vector from those annotations. The authors conducted three studies to demonstrate the system’s performance and efficiency. In the first study, they had people with specialized domain knowledge do the annotation work and showed that the system can find known analogies. To prove the effectiveness of SOLVENT at finding analogies in a real-world case, they demonstrated a real-world scenario, explained the primary process, and had professionals without domain knowledge do the annotation; the relevant matches found by the system were judged to be both useful and novel. The third study showed that they could scale up the system by having crowd workers annotate papers.

Reflection

I think this paper brings up a novel and much-needed idea. One limitation of searching for related work online is that the scope of the results is too narrow when we search by keyword or citation. The results usually only include the related paper, papers that cite it, or the same paper published in a different venue. However, you can often find inspiration in a paper that is not directly relevant to what you are looking for; unfortunately, that kind of serendipity is usually unattainable with current search tools. Take our project as an example. It was inspired by one of the papers we read before, and we would like to further improve the work in that paper. Finding related work should be straightforward because we already have prior work as a starting point. Disappointingly, we could not find many useful papers or even relevant techniques; the thing that appeared most frequently was the very paper that inspired us, published in a different venue. Thus, from my perspective, this system is designed for finding inspiration through analogies, and if it can achieve this, it would be significant.

On the other hand, this seems like a costly approach because it requires a large number of workers to annotate a large body of papers in order to guarantee the system’s performance. Besides, based on the results provided in the paper, the system’s performance can only be described as “useful” rather than “efficient.” If I urgently needed inspiration, I might try such a system, but I would not count on it.

Question

  1. What do you think of the premise presented in the paper that “scientific discoveries are often driven by finding analogies in distant domains”? Do you think this theory applies to the majority of people? Do you think finding analogies would inspire your work?
  2. What do you think regarding the system “usefulness” and “efficiency”? 
  3. Can you think about any other way to utilize the SOLVENT system?
  4. The authors mention that crowd workers can do the annotation work with no domain knowledge, or even no experience reading papers. Do you think this influences the system’s performance? What would your criteria be for recruiting workers?

Word Count: 538


04/22/2020 – Ziyao Wang – SOLVENT: A Mixed-Initiative System for Finding Analogies Between Research Papers

The authors introduce a mixed-initiative system named SOLVENT in this paper. In the system, humans annotate the aspects of research papers that denote their background, purpose, mechanism, and findings, and a computational model constructs a semantic representation from these annotations that is valuable for finding analogies among papers. They tested the system through three experiments: the first used annotations from domain-expert researchers, the second used annotations from experts outside the papers’ domain, and the last used crowdsourcing. From the experiments’ results, the authors found that the system can detect analogies across different domains and that even crowd workers with limited domain knowledge can do the annotations. Additionally, the system performs better than similar systems.

Reflections

I used to search only for papers within the computer science area when I did my projects or tried to solve a problem. Of course, we can sometimes get inspired by ideas in papers from other domains, but it is quite difficult to find such papers. There are countless papers across various domains, and there has been no efficient method for finding the needed papers from other fields. Additionally, even when we find a paper that is valuable to the problem we are solving, a lack of background knowledge may make it difficult for us to understand the ideas behind it.

This system is a great help in the above situation. It lets people find related papers from other domains even when they have limited knowledge of those domains. Even though the number of papers is increasing sharply, we can still find the papers we need efficiently with this system. Before, we could only search for keywords related to a specific area; with this system, we can search for ideas instead of specific words. This is beneficial when papers use abbreviations or analogies in their titles: keyword search may miss these papers, but idea search will still surface them as valuable. Also, the human annotations in the system can help us understand the ideas of the papers easily, so we can exclude unrelated papers with high efficiency.

One more point is that cross-domain projects and research are increasing significantly nowadays. For these studies, researchers need to read a huge number of papers in both domains before they can arrive at a novel idea for the cross-domain problem. With this system, they could find similarities between the domains easily and get at-a-glance summaries of the background, purpose, mechanism, and findings of each paper. The efficiency of such research could be improved, and researchers could pursue more novel interdisciplinary studies with the help of the system.

Questions:

Will the performance of the system decrease when dealing with a larger database and more difficult papers?

Is it possible to update the system results regularly when newly published papers are added into the database?

Can this system be applied in industry? For example, could it find similarities between the mechanisms used in the production of two products and use the findings to improve those mechanisms?


04/22/2020 – Dylan Finch – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Word count: 566

Summary of the Reading

This paper describes a system called SOLVENT, which uses humans to annotate parts of academic papers: the high-level problems being addressed, the specific lower-level problems being addressed, how the paper achieved its goal, and what was learned or achieved. Machines are then used to detect similarities between papers so that it is easier for future researchers to find articles related to their work.

The researchers conducted three studies showing that their system greatly improves results over similar systems. They found that the system was able to detect near analogies between papers as well as analogies across domains. One interesting finding was that even crowd workers without extensive knowledge of the papers they are annotating can produce helpful annotations. They also found that annotations could be created relatively quickly.

Reflections and Connections

I think that this paper addresses a real and growing problem in the scientific community. With more people doing research than ever, it is increasingly hard to find the papers you are looking for. I know that when I was writing my thesis, it took me a long time to find other papers relevant to my work. I think this is mainly because we currently have poor ways of indexing papers. Really, the only ways we can index papers now are by title and by the keywords embedded in the paper, if they exist. These methods can help find results, but they are terrible when they are the only way to find relevant papers. A title may be about 20 words long, with keywords being equally short; 40 words is not enough to fully represent a paper. We lose even more space for information when half the title is a clever pun or phrase. These primitive ways of indexing papers also lose much of their nuance: it is hard to explain the results, or even the specific problem a paper addresses, in 40 words. So we lose that information, and we cannot index on it.

A system like the one described in this paper would be a great help to researchers because it would let them find similar papers much more easily. On top of that, it lets researchers find papers outside their own disciplines, which opens up a whole new world of potential collaboration and might help eliminate duplicated research across separate domains. Right now, it is possible that mathematicians and computer scientists, for example, experiment on the same algorithm, each not knowing about the team from the other discipline. This wastes time because two groups are researching the same thing. A system like this could help mitigate that.

Questions

  1. How would a system like this affect your life as a researcher?
  2. Do you currently have trouble trying to find papers or similar ideas from outside your domain of research?
  3. What are some limitations of this system? Is there any way that we could produce even better annotations of research papers?
  4. Is there some way we could get the authors of each paper to produce data like this by themselves?
