04/29/20 – Fanglan Chen – Accelerating Innovation Through Analogy Mining

Summary

Hope et al.’s paper “Accelerating Innovation Through Analogy Mining” studies how to boost knowledge discovery by searching for analogies in massive, unstructured real-world datasets. The research is motivated by the availability of large idea repositories that can serve as databases in which to search for analogous problems. However, finding useful analogies in these massive, noisy repositories is very challenging. Manual and automated methods each have advantages and disadvantages: hand-created databases provide the rich relational structure that is central to analogy search but are expensive to build, while naive machine learning or information retrieval scales easily to large datasets using similarity metrics but fails to capture structural similarity. To address these challenges, the researchers explore the potential of learning “problem schemas,” weaker structural representations that specify the purpose and mechanism of a product. Their proposed approach leverages crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. The experimental results indicate that the learned vectors support analogy search with higher precision and recall than traditional information retrieval methods.
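
To make the approach concrete, here is a minimal sketch in PyTorch of the kind of bidirectional RNN encoder that could pool a description’s tokens into separate purpose and mechanism vectors. The layer sizes, mean-pooling, and two linear heads are my assumptions for illustration, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class PurposeMechanismEncoder(nn.Module):
    """BiRNN (GRU) that pools token states into purpose/mechanism vectors."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # could init from GloVe
        self.rnn = nn.GRU(embed_dim, hidden_dim, bidirectional=True,
                          batch_first=True)
        # Separate heads project the same pooled states into two vectors.
        self.purpose_head = nn.Linear(2 * hidden_dim, hidden_dim)
        self.mechanism_head = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))  # (batch, tokens, 2*hidden)
        pooled = states.mean(dim=1)                  # mean over tokens
        return self.purpose_head(pooled), self.mechanism_head(pooled)

enc = PurposeMechanismEncoder(vocab_size=10_000)
toy_batch = torch.randint(0, 10_000, (2, 12))  # two 12-token toy descriptions
purpose_vec, mechanism_vec = enc(toy_batch)
```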

Reflection

This paper introduces an innovative approach for analogy search, a task very similar to that of the “SOLVENT” paper we discussed last week. The “weaker structural representations” idea is very interesting, and it allows more flexibility in automatic algorithm design compared with relying on fully structured analogical reasoning. My impression is that the human component in the model design is comparatively weaker than in the approaches presented in other readings we discussed before. Crowd work in this paper is leveraged to generate training data and to evaluate the experimental results. I have been thinking about whether there is any place with the potential to incorporate human interaction into the model design. As we know, recurrent neural networks have certain limitations. The first weakness concerns variable-length input: RNN models struggle with long sequences, which largely constrains the usage scenarios. The second weakness is sliding-window processing, which ignores the continuity between the sliced subsequences; this keeps the representation closer to the surface and not as deep as the paper claims. I wonder if there is a possibility of leveraging human interaction to overcome the shortcomings of the model itself.

The performance of machine learning models is highly driven by the quality of the training data. With the crowdsourced product purpose and mechanism annotations, I feel there is a need to incorporate some quality control components into the proposed framework. A few papers we discussed before touched upon this point. Also, though very powerful in numerous complex tasks, deep learning models are usually criticized for their lack of interpretability. Although the RNN’s recall and precision reported in the paper are better than those of traditional information retrieval methods, the similarity-based methods have their own merits: their mechanisms and decision boundaries are more transparent, so it is possible to detect where a problem lies and why the results are undesirable. In this case, there is a trade-off between model performance (accuracy, precision, and recall) and interpretability, and it is worth thinking about which one to choose over the other.

Discussion

I think the following questions are worthy of further discussion.

  • How could more human components be incorporated into the model design?
  • Compared with last week’s approach, SOLVENT, which one do you think works better? Why?
  • What are some other potential applications for this system outside of knowledge discovery?
  • Do you think the recurrent neural network method is better than traditional similarity-based methods such as TF-IDF in the analogy search task and other NLP tasks? Why or why not?


04/29/2020 – Palakh Mignonne Jude – Accelerating Innovation Through Analogy Mining

SUMMARY

In this paper, the authors attempt to facilitate the process of finding analogies, with a view to boosting creative innovation, by exploring the value added by incorporating weak structural representations. They leverage the vast body of online information available (for the purpose of this study, product descriptions from Quirky.com). They generate microtasks in which crowdworkers label the ‘purpose’ and ‘mechanism’ parts of a product description. The authors use GloVe word vectors to represent the purpose and mechanism words and use a BiRNN to learn the purpose and mechanism representations. To collect analogies, they use AMT crowd workers to find analogies among 8000 product descriptions. In the evaluation stage, the authors gauge the usefulness of their algorithm by having participants redesign a product: 38 AMT workers were recruited, and the task was to design a cell phone charger case. 5 graduate students were recruited to evaluate the ideas generated by the workers. Based on a predefined criterion for ‘good’ ideas, 208 of 749 total ideas were rated good by at least 2 judges, and 154 of 749 by at least 3 judges. In both cases, the analogy approach proposed by the authors outperformed the TF-IDF baseline model and the random model.
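
As an illustration of the GloVe-based representation step, here is a minimal sketch that averages pre-trained word vectors over an annotated span. The file path and the simple averaging are my assumptions for illustration; the authors feed GloVe vectors into a BiRNN rather than averaging them directly.

```python
import numpy as np

# Load pre-trained GloVe vectors (assumed standard format: one word per
# line followed by its values, e.g. glove.6B.100d.txt).
def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.split()
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def span_vector(words, vectors, dim=100):
    """Average the embeddings of in-vocabulary words in an annotated span."""
    hits = [vectors[w] for w in words if w in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim, dtype=np.float32)

glove = load_glove("glove.6B.100d.txt")  # path is an assumption
purpose_vec = span_vector("keep drinks cold on hikes".split(), glove)
```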

REFLECTION

I found the motivation of this study to be very good, especially the ‘bacteria-slot machine’ analogy example highlighted in the introduction of the paper. I agree that, given the vast amount of data available, a system that accelerates the process of finding analogies could very well aid in quicker innovation and discovery.

I like that the authors chose to present their approach by using product descriptions. I also like the use of ‘purpose’ and ‘mechanism’ annotations and feel that given the more general domain of this study, the quality of annotations by the crowdworkers would be better than in the case of the paper on ‘SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers’.

I also liked that the authors presented the results given by the TF-IDF baseline, as it indicated the kind of near-domain results such a method would have surfaced. I also felt it was good that the authors included a feasibility criterion for judging ideas, i.e., whether an idea could be implemented using existing technologies.

Additionally, while I found that the study and methods proposed by this paper were good, I did not like the organization of the paper.

QUESTIONS

  1. How would you rate the design of the interface used to collect ‘purpose’ and ‘mechanism’ annotations? What changes might you propose to make this better?
  2. The authors do not mention the details or experience levels of the AMT workers. How much would workers’ prior experience influence their ability to find ideas using this approach? Could this approach aid more experienced people working on product innovations?
  3. The authors of SOLVENT leverage the mixed-initiative system proposed by this paper to find analogies between research papers. In which domains would this approach largely fail (even if modifications were made)?


04/29/20 – Sukrit Venkatagiri – Accelerating Innovation Through Analogy Mining

Paper: Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating Innovation Through Analogy Mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17), 235–243. https://doi.org/10.1145/3097983.3098038

Summary: This paper talks about the challenge of mining analogies from large, real-world repositories, such as patent databases. Such databases pose challenges because they are highly relational but sparse in nature. This is one reason why machine learning approaches do not fare well when applied to these types of databases, especially since they cannot capture the underlying structure, which is important for analogy mining. The corpora are also expensive to build, store, and update, while automation cannot be easily applied. The authors overcome these limitations by leveraging the creativity of the crowd and the affordable computational capabilities of RNNs. The approach is a structured purpose-mechanism schema for identifying analogies between product descriptions. Finally, the authors evaluate the ideas generated by crowd workers by asking graduate students to rate them on three main criteria: quality, novelty, and feasibility. They find that their approach increased the feasibility of the ideas generated by participants in the study.

Reflection:
Overall, I really liked how the paper attempts to solve a hard problem with a scalable approach, crowds plus RNNs, and tests it on a real-world dataset. I also liked how the paper defines similarity between different ideas (i.e., analogies) based on purpose and the mechanisms through which products work. Further, the paper suggests more complex representations for research papers. This raises the question: how much more difficult is it to mine analogies for complex or more abstract ideas compared to simple ones? Perhaps structured labels could help in that regard.

The approach itself is commendable since it is a great example of a mixed-initiative user interface that combines the creativity of the crowd and the affordable computation of RNNs. Further, this approach does not needlessly waste human computation. The authors also completed a thorough evaluation of the machine intelligence portion.

Second, I appreciate the approach taken towards making something subjective—in this case, creativity—into something more objective, by breaking it down into different rate-able metrics.

Finally, the idea of using randomly generated analogies to “spark creativity,” and the results of that experiment, show that creativity really does need diverse ideas. I wonder why this may be, and how to introduce such randomness into real-world work practice.

Questions:
1. How scalable do you think the system is? What other limitations does it have?
2. Can this approach be used to generate analogies in other fields? What would be different?
3. Do you think creativity is subjective? Can it be made into something objective?



4/29/2020 – Akshita Jha – Accelerating Innovation Through Analogy Mining

Summary:
“Accelerating Innovation Through Analogy Mining” by Hope et al. addresses the problem of analogy mining from messy and chaotic real-world datasets. Hand-created databases have high relational structure but are sparse in nature. On the other hand, machine learning and information retrieval techniques can scale, but they lack an understanding of the underlying structure, which is crucial for analogy-related tasks. The authors leverage the strengths of both crowdsourcing and machine learning to learn analogies from these real-world datasets, combining the creativity of the crowds with the cheap computing power of recurrent neural networks. The authors extract meaningful vector representations from product descriptions. They observe that this methodology achieves greater precision and recall than traditional information-retrieval methods. The authors also demonstrate that the models significantly helped in generating more creative ideas compared to analogies retrieved by traditional methods.

Reflections:
This is a really interesting paper that describes a scalable approach to finding analogies in large, real-world, messy datasets. The authors use a bi-directional Recurrent Neural Network (RNN) with Gated Recurrent Units (GRU) to learn the purpose and mechanism vectors for product descriptions. However, since the paper came out there have been great advances in natural language processing because of BERT: Bidirectional Encoder Representations from Transformers. BERT has achieved state-of-the-art results on many natural language tasks like question answering, natural language understanding, search, and retrieval. I’m curious to know how BERT would affect the results of the current system. Would we still need crowd workers for analogy detection, or would using BERT alone for analogy computation suffice? One of the limitations of an RNN is that it is directional, i.e., it reads from left to right, right to left, or both. BERT is essentially non-directional, i.e., it takes all the words as input at once and hence can compute the complex non-linear relationships between them. This would definitely prove helpful for detecting analogies. The approach taken by the authors using TF-IDF did result in diversity but did not take relevance into account. Also, the purpose and mechanism vectors learned by the authors do not distinguish between high- and low-level features. These learned vectors also do not take into account the intra-dependencies among different purposes and mechanisms or the inter-dependencies between purposes and mechanisms. It would be interesting to observe how these dependencies could be encoded and whether they would benefit the final task of analogy computation. Another aspect that could be looked into is the trade-off between generating useful vectors and the creativity of the crowd workers. Does creativity increase, decrease, or remain the same?
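
For reference, here is a minimal sketch of how BERT embeddings could stand in for the RNN-learned vectors, using the Hugging Face transformers library. The model name, mean-pooling choice, and toy texts are my assumptions; this is a sketch of the idea, not the authors’ method.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pool BERT's final hidden states into one vector per text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Toy purpose/mechanism spans (made up for illustration).
purpose_vec = embed("keeps drinks cold during long hikes")
mechanism_vec = embed("vacuum-insulated double wall with copper lining")
```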

Questions:
1. What other creative tasks can benefit from human-AI interaction?
2. Why is the task of analogy computation important?
3. How are you incorporating and leveraging the strength of crowd-workers and machine learning in your project?


4/29/2020 – Nurendra Choudhary – Accelerating Innovation Through Analogy Mining

Summary

In this paper, the authors argue the need for an automated tool for finding analogies in large research repositories such as US patent databases. Previous approaches in the area include manually constructed large structured corpora, and automated approaches that can find semantically relevant research but cannot identify proper structure in the documents. The manual corpora are expensive to construct and maintain, whereas automated detection is inefficient due to the lack of structure identification.

The authors propose an architecture that defines a structured purpose-mechanism schema for analogy identification between two documents. The purpose and mechanism are identified by crowd workers, and word vectorization is used to represent the different sections as vectors. The similarity metric is cosine similarity between the query vectors; the query vectors in the experiments are purpose only, mechanism only, and the concatenation of purpose and mechanism. The authors use the Precision@K metric to evaluate and compare, concluding that mechanism-only and concatenated purpose-mechanism queries are more effective than the other types.
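
To make the evaluation concrete, here is a minimal, self-contained sketch of cosine-similarity ranking and the Precision@K metric; all vectors and relevance labels below are made up for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

rng = np.random.default_rng(0)
corpus = {i: rng.normal(size=64) for i in range(100)}  # toy document vectors
query = rng.normal(size=64)  # e.g. concatenated purpose + mechanism vector

ranked = sorted(corpus, key=lambda i: cosine_sim(query, corpus[i]), reverse=True)
relevant = {3, 17, 42}  # toy set of labeled analogies
print(precision_at_k(ranked, relevant, k=10))
```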

Reflection

The paper is very similar to SOLVENT, which we discussed in the previous class. I believe they were developed by the same research group and share several authors. I also believe SOLVENT solves a range of problems present in this paper; for example, the purpose-mechanism schema cannot be generalized to all research fields, and additional fields are needed to make it work better for a wider range of fields.

The baselines do not utilize the entire paper. I do not think it is fair to compare abstracts across different domains to find analogies; abstracts do not always describe the problem or solution in the necessary depth. In my view, we should add more sections, such as the Introduction, Methodology, and Conclusion. I am not sure whether they would perform better, but I would like to see these metrics reported. Also, the diversity of fields used in the experiments is limited to engineering backgrounds. I think this should be expanded to include other fields such as medicine, business, and the humanities (a lot of early scientists were philosophers :P).

Questions

  1. What problems in this paper does SOLVENT solve? Do you think the improvement in performance is worth the additional memory used?
  2. How do you think this framework will help inspire new work? Can we put our ideas into a purpose-mechanism schema to get a set of relevant analogies that may inspire further research?
  3. The authors only utilize abstracts to find analogies. Do you think this is a good enough baseline? Should we utilize the entire paper instead? What are the advantages and disadvantages of such an approach? Would it not be more fair?
  4. Can we currently learn purpose-mechanism schemas for different fields independently and map between them? Is there a limit to the amount of variation this framework can handle? For example, is it fair to compare medical abstracts to the abstracts of CS papers, given the stark difference between them?

Word Count: 509


04/29/2020 – Bipasha Banerjee – Accelerating Innovation Through Analogy Mining

Summary 

The paper by Hope et al. talks about analogy mining from text, primarily unstructured data. They used a product description dataset from Quirky.com, described as a product innovation website, to find products that are similar. Specifically, they used the “purpose” and the “mechanism” of products to find analogies between items. They also considered traditional similarity metrics and techniques, namely TF-IDF, LSA, GloVe, and LDA, as points of comparison for the proposed approach. Amazon Mechanical Turk crowd workers were used to create the training dataset. A recurrent neural network was then used to learn representations of purpose and mechanism from the human-annotated training data. The paper mentions that the authors wanted to see if their approach enhanced creativity in idea generation. They measured this by having graduate students judge the generated ideas on three main criteria: novelty, quality, and feasibility. It was concluded that there was an increase in creativity among the participants of the study.
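
For comparison, here is a minimal sketch of the kind of TF-IDF similarity baseline the paper compares against, using scikit-learn; the toy product descriptions and query are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "a phone case that charges the phone using solar panels",  # toy
    "a backpack with a built-in solar charger for devices",    # toy
    "a mug that keeps coffee hot using a heated coaster",      # toy
]
query = ["charge a phone without a wall outlet"]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vecs = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform(query)

scores = cosine_similarity(query_vec, doc_vecs)[0]
print(sorted(zip(scores, docs), reverse=True)[0])  # nearest near-domain match
```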

Reflection

The paper is an interesting read on the topic of finding analogies in text. I really found it interesting how they defined similarities based on the purpose and the mechanism by which the products work. I know the authors mentioned that since the dataset consisted of product descriptions, the purpose-mechanism structure worked well for finding analogies; however, they suggested more complex or hierarchical levels for more complex datasets like scientific papers. My only concern with this comment is: wouldn’t increasing the complexity of the training data further complicate the process of finding analogies? Instead of hierarchical levels, I think it is best to add other labels to the text to find similarities. What I am suggesting is similar to what was done in the paper by Chan et al. [1], where background and findings were also included along with the labels used here.

The paper lays good groundwork for finding similarities while using crowd workers to create the training data. This methodology, in my opinion, truly forms a mixed-initiative structure. Here, the authors did extensive evaluation and experimentation on the AI side of things. I really liked the way they compared against traditional information retrieval mechanisms for finding analogies.

I liked that the paper also aimed to determine whether creativity was increased. My only concern is that “creativity,” although defined, is subjective. The authors said they used graduate students but did not mention their backgrounds; a graduate student with a relatively creative background, say a minor in a creative field, may view things differently.

In conclusion, I found this research to be strong as it included verification and validation of the results from all angles and not only the AI or the human side. 

Questions

  1. Are you using a similarity metric in your course project? If yes, what algorithms are you using? (In our project, we are not using any similarity metric, but I have used all of the traditional metrics mentioned in the paper in my research work before.)
  2. Other than scientific data, what other kinds of complex datasets would need additional labels or hierarchical labeling?
  3. Do you agree with the authors’ way of finding if the study had enhanced creativity? 

References

  1. Chan, Joel, et al. “SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers.” Proceedings of the ACM on Human-Computer Interaction 2.CSCW (2018): 1–21.


04/29/20 – Jooyoung Whang – Accelerating Innovation Through Analogy Mining

This paper sought to find analogies in big, messy, real-world natural language data by applying the structure of purpose and mechanism. The authors created binarized purpose and mechanism vectors for each document by setting a word’s entry to 1 if the purpose (or mechanism) could be represented by that word and 0 if not. The authors could then evaluate distances between these vectors to let users generate creative ideas that have a similar purpose but different mechanisms. The authors utilized Mturk both to generate training sets and for evaluation. They measured creativity in terms of novelty, quality, and feasibility. The authors report significantly improved performance over the baselines of plain TF-IDF and random retrieval.
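
Here is a minimal sketch of the near-purpose, far-mechanism ranking idea over toy binary vectors; the Jaccard similarity and the subtraction-based scoring rule are my illustrative choices, not the paper’s exact formula.

```python
import numpy as np

def jaccard(a, b):
    """Similarity between two binary vectors."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Toy binary purpose/mechanism vectors over a small vocabulary.
rng = np.random.default_rng(1)
corpus = [(rng.integers(0, 2, 32), rng.integers(0, 2, 32)) for _ in range(50)]
q_purpose, q_mechanism = rng.integers(0, 2, 32), rng.integers(0, 2, 32)

# Rank: similar purpose, dissimilar mechanism (illustrative scoring rule).
scores = [jaccard(q_purpose, p) - jaccard(q_mechanism, m) for p, m in corpus]
best = int(np.argmax(scores))
print(f"best candidate: doc {best}, score {scores[best]:.2f}")
```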

This paper appeared to be similar to the SOLVENT paper from last week, except that this one worked with product descriptions, only used the purpose-mechanism structure, and evaluated based on creativity. This was actually a more interesting read for me because it was more relevant to my project. I was especially inspired by the authors’ method of evaluating creativity. I think I may be able to do something similar for my project.

I took special note of the amount of compensation the authors paid to Mturk workers and tried to reverse-calculate the time they allotted for each worker. The authors paid $1.5 for a task that required redesigning an existing product, using 12 near-purpose far-mechanism solutions found by the authors’ approach. This must be a lot of reading (assuming 150 words per solution, that’s 1800 words of reading, leaving out the instructions!) and creative thinking. Based on the amount paid, the authors expected about 10 minutes for participants to finish their work. I am unsure if this amount was appropriate, but based on the authors’ results, it seems successful. It was difficult for me to gauge how much I should pay for my project’s tasks, but I think this study gave me a good anchor point. My biggest dilemma was balancing the number of creative references provided by my workers against the quality (more time needed to generate, thus more expensive) of each reference.

These are the questions that I had while reading the paper:

1. One of the reasons why the SOLVENT paper expanded its analogy structure to purpose-background-mechanism-findings was that not all papers had a “mechanism” or a “solution” (i.e., some papers simply reported findings about a problem or domain). Do you think the same applies to this study?

2. Do you think the amount of compensation the authors paid was appropriate? If not, how much do you think would have been appropriate? I would personally really like to read some answers about this question to apply to my project.

3. What other ways could be used to measure “creativity”? The authors did a great job by breaking down creativity into smaller measurable components (although still being qualitative ones) like novelty, quality, and feasibility. Would there be a different method? Would there be more measurable components? Do you think the authors’ method captures the entirety of creativity?


04/29/2020 – Yuhang Liu – Accelerating Innovation Through Analogy Mining

Summary:

This article is similar to a previous paper, “SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers,” in that it focuses on how to search a vast corpus of papers and continually draw ideas from it. The research method builds on previous work with some changes. In earlier systems, discovering new ideas by analogy usually requires an understanding of the deep similarity between two entities before comparison or research. However, finding analogies is challenging for machines, as it depends on understanding the deep relational similarity between two entities that may be very different in terms of surface attributes. Previous studies used similarity-based methods such as TF-IDF, LSA, LDA, and GloVe, but the authors of this paper investigate a weaker structural representation; the goal is to come up with a representation that can be learned while still being expressive enough to support analogical mining. The method proposed in this article therefore asks crowdsourced workers to annotate the purpose and mechanism of each product description. Then, through learning and systematic search, the system surfaces new ideas with similar purposes but different mechanisms. In their tests, the authors found it has a better effect than the other methods.

Reflection:

First of all, having read another paper on a similar subject, I want to compare the two articles. In the other article, people analyze an article along four aspects: Background, Purpose, Mechanism, and Findings. The machine then generates analogies by comparing and learning from these four structural aspects. In this article, a document is divided into two parts, the purpose and the mechanism, which the authors call a relatively weak structured representation. Separating an idea into purpose and mechanism enables core analogical innovation processes such as repurposing, so the final experiment in this article is also based on the same purpose with a different mechanism. There are thus only two dimensions representing a document, which are more abstract and broad, and they are learned directly with a supervised method. The benefit of doing this is that these representations can potentially be extracted automatically from product descriptions, giving the approach wide applicability. Identifying key components and functions can also improve the system’s search function and better capture the needs of users.

Secondly, I think the most important property of a system that finds analogies between papers or products is its feasibility, which is the reason I think the method in this paper is better. In terms of the feasibility of the system, having a crowdsourced worker or machine identify the purpose and mechanism described in a document is far simpler than analyzing its full structure (background, purpose, mechanism, findings), and there will be fewer errors in the simpler task, so a better dataset will be available for learning. At the same time, in terms of the feasibility of the ideas that are formed, the system introduces graduate students to comprehensively judge the feasibility of ideas. From my perspective, this is particularly important: if an idea is unrealizable or cannot withstand testing, then no matter how novel it is, it is useless. So in my opinion this is the advantage of the system.

But at the same time, I have some doubts about the results of the system, because the system seems to be more inclined to find different mechanisms for the same purpose. For different purposes, ideas using the same mechanism seem difficult to obtain. And I think the most difficult part of innovation is applying an idea to a new field; the examples of bionics we think of most readily are the fish’s swim bladder and the submarine, or the bat and radar. In my opinion, those inventions that have a significant impact on people’s lives are usually applications of mechanisms from other fields to achieve new goals. So for an unresearched field or an unrealized purpose, I think these can serve as directions for future research on this method.

Question:

  1. What do you think are the limitations of the method proposed in this article?
  2. What methods do you think can be used to evaluate the usefulness of analogy ideas?
  3. What do you think is more important for the idea of finding an analogy, surface similarity, structural similarity, or some other factors?


04/29/2020 – Sushmethaa Muhundan – Accelerating Innovation Through Analogy Mining

This paper aims to explore a mixed-initiative approach to finding analogies in the context of product ideas to help people innovate. The system focuses on a corpus of product innovations from Quirky.com and explores the feasibility of learning simple, structural representations of problem schemas to find analogies. The purpose and mechanism of each product idea are extracted and a vector representation is constructed, which is used to find analogies that inspire people to generate creative ideas. Three experiments were conducted as part of this study. The first involved AMT crowd workers annotating the product ideas to separate purpose from mechanism; these annotations were used to construct the vectors for comparing and computing analogies. In the second experiment, 8000 Quirky products were chosen and crowd workers were asked to use the search interface to find analogies for 200 seed documents. Finally, a within-subjects experiment was conducted in which participants were asked to redesign an existing product. For the given product idea, participants received 12 product inspirations retrieved by the system developed, 12 by TF-IDF, and 12 retrieved randomly. The system aimed to retrieve near-purpose, far-mechanism analogies to help users come up with innovative ideas. The results showed that ideas generated from the system’s results were more creative than those from the other two conditions.

Analogies often prove to be a source of inspiration and/or competitive analysis. For inspiration, cross-domain analogies are especially helpful and can be difficult to find. Also, given the rate at which information is growing, it has become all the more difficult to explore and find analogies. Such systems would definitely help save time by predicting potential analogies from a large dataset. I feel that the system would definitely help users come up with creative, alternative solutions compared to traditional information retrieval systems.

With respect to the ideation evaluation experiment, the results showed that the randomly generated analogies were actually more successful in helping users come up with creative ideas than the TF-IDF condition. This shows that traditional information retrieval technologies do not work well as-is in the setting of analogy prediction and need to be tweaked to serve the purpose of inspiring users. I feel that there is potential to expand the core concepts of the proposed system and use it in different applications. For instance, finding similar content could be built into a recommendation system that recommends content similar to the user’s browsing history.

  • What are your thoughts about the system proposed? Do you think this system is scalable?
  • The study uses only purpose and mechanism to compare and predict analogies. Do you think these parameters are sufficient? Are there any other features that can be extracted to improve the system?
  • The paper mentions the need for extensions to generalize the solution to apply the system to other domains. Apart from the suggestions in the paper, what are some potential extensions?


04/29/2020 – Mohannad Al Ameedi – Accelerating Innovation Through Analogy Mining

Summary

In this paper, the authors aim to improve the search and discovery of ideas using analogies in massive, unstructured datasets. Their approach combines crowd workers and a recurrent neural network that learns from a weak structural representation expressed as vectors. The authors used a dataset of product descriptions (from the innovation website Quirky) and used crowd workers to extract purpose and mechanism to help with finding ideas across different domains. They used Amazon Mechanical Turk to hire workers to perform a dual annotation on each product description, labeling the parts of the text related to the purpose of the product and the parts related to the mechanism, or the way the product works. The authors then used a bidirectional recurrent neural network and information retrieval techniques to find a deeper and more accurate similarity between the searched idea and the available innovation and research around it. The authors’ approach has high precision and recall and can improve retrieval accuracy by 25%.
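
To illustrate the dual-annotation step, here is a minimal sketch that converts crowd-labeled purpose/mechanism spans into per-token binary training targets for a sequence model; the span format and the example sentence are assumptions for illustration, not the authors’ exact pipeline.

```python
# Toy product description with crowd-labeled spans (token index ranges).
description = "a water bottle that purifies water using a built-in UV lamp"
tokens = description.split()

purpose_spans = [(4, 6)]     # tokens 4-5: "purifies water"
mechanism_spans = [(8, 11)]  # tokens 8-10: "built-in UV lamp"

def spans_to_targets(n_tokens, spans):
    """1 for tokens inside an annotated span, 0 elsewhere."""
    targets = [0] * n_tokens
    for start, end in spans:
        for i in range(start, end):
            targets[i] = 1
    return targets

purpose_y = spans_to_targets(len(tokens), purpose_spans)
mechanism_y = spans_to_targets(len(tokens), mechanism_spans)
print(list(zip(tokens, purpose_y, mechanism_y)))
```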

Reflection

I think the approach used by the authors is very interesting. Extracting the purpose and mechanism from a product description is like looking at the data from two different angles. Calculating the similarity based on two vectors is a nice implementation and can help in finding a close relationship between two subjects in different domains that share a common attribute.

I also like the idea of using deep learning instead of TF-IDF to calculate the similarity between two products’ descriptions, as it can improve the quality of the search.

I personally use Google Scholar to search for similar ideas but hadn’t used the websites mentioned in the paper, so that is something I learned while reading the paper.

This approach could be used as a verification tool when reviewing a copyright application. An idea might be the same as another idea but in a different domain, and the application built by the authors can help find this out.

This approach is like mapping vocabulary into a concept space to improve information retrieval, performing latent semantic indexing rather than just similarity matching on keywords. Different words might have the same meaning, and one word might have different meanings; searching based on keywords might retrieve incorrect results, while searching based on concepts can lead to much more accurate results.

Questions

  • The authors asked the crowd workers to extract two pieces of information, the purpose and mechanism, from the product description. Can we use this approach to solve a different problem?
  • Do you agree with the authors that the recurrent neural network is better than traditional TF-IDF in calculating the similarity for the two vectors? Why or why not?
  • Could you use a similar approach in your project, asking the crowd workers to annotate your data from two different perspectives, i.e., looking at your data from two different angles?
  • The authors mentioned more than two websites that store information about patents; have you used these websites?
