4/22/2020 – Nurendra Choudhary – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Summary

SOLVENT aims to find analogies between research papers in different fields; for example, simulated annealing in AI optimization is derived from metallurgy, and information foraging from animal foraging. It extracts idea features from a research paper according to a purpose-mechanism schema: Purpose, what the authors are trying to achieve, and Mechanism, how they achieve that purpose.

Research papers cannot always be fit into a purpose-mechanism schema, due to complex language, hierarchies of problems, and the blurred line between mechanisms and findings. Hence, the authors propose a modified annotation scheme: Background, the context of the problem; Purpose, the main problem the paper solves; Mechanism, the method developed to solve the problem; and Findings, the conclusions of the work (for understanding-type papers, this section carries the most information). The schema is queried using cosine similarity over tf-idf-weighted averages of word vectors. The authors scale up with crowd workers because expert annotation is prohibitively expensive; however, this introduces significant disagreement between crowd workers and experts.
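
To make the querying step concrete, here is a minimal sketch of matching two annotated spans (e.g., two papers' Purpose texts), assuming a pre-trained word-embedding table (word_vectors) and a tf-idf weighting function (tfidf_weight) fit on the corpus; both names are stand-ins, not SOLVENT's actual implementation.

```python
import numpy as np

def span_vector(tokens, word_vectors, tfidf_weight):
    """tf-idf-weighted average of the word vectors in an annotated span."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:          # skip out-of-vocabulary tokens
            vecs.append(word_vectors[tok])
            weights.append(tfidf_weight(tok))
    if not vecs:
        return None
    return np.average(np.array(vecs), axis=0, weights=weights)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Analogy-style query: rank candidates by Purpose similarity, e.g. to
# surface papers with a similar purpose but a different mechanism.
# score = cosine(span_vector(query_purpose_tokens, wv, w),
#                span_vector(candidate_purpose_tokens, wv, w))
```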

Reflection

The paper strikes an amazing (nothing's perfect, right? :P) balance between human annotation capabilities and AI's capacity to analyze huge information sources, in order to solve the problem of analogy retrieval. Additionally, the paper points to a probable future of crowd work in which tasks become increasingly complex for regular human beings. We have discussed this evolution in several classes before, and the paper is a contemporary example of this move toward complexity. The study showing its application in a real-world research team provides a great example that other research teams can borrow.

I would like to see a report of the performance of the Background+Purpose+Mechanism+Findings combination; I do not understand the reason for its omission (probably space constraints). The comparison baseline uses abstract-only queries, but a system could potentially have access to the entire paper, and I think that would strengthen the comparative study. The researchers test expert researchers and crowd workers; an analysis of undergraduate/graduate students as annotators should also be done. As the authors point out, the study set is limited by the need for expert annotations. However, there is also little diversity in the fields of study.

“Inspiration: find relevant research that can suggest new properties of polymers to leverage, or new techniques for stretching/folding or exploring 2D/3D structure designs”

Fields searched: materials science, civil engineering, and aerospace engineering.

Questions

  1. The paper shows the complexity of future crowd-work. Is the solution limited to area experts? Is there a way to simplify or better define the tasks for a regular crowd? If not, what is the limit of a regular crowd-worker’s potential?
  2. How important is finding such analogies in your research fields? Would you apply this framework in your research projects?
  3. Analogies are meant for easier understanding and inspiration. SOLVENT has a limited application in inspiring novel work. Do you agree? What could be a possible scenario where it inspires novel work?
  4. Compared to MTurk, Upwork workers show better agreement with researchers across the board. What do you think is the reason? When would you use MTurk vs. Upwork? Do you think higher pay would proportionally improve work quality as well?

Word Count: 532


04/22/2020 – Nan LI – Opportunities for Automating Email Processing: A Need-Finding Study

Summary:

The main objective of this paper is to investigate user needs regarding email management automation. To achieve this, the authors conducted a mixed-methods need-finding study through three probes. First, they identified categories of email automation needs through a workshop, then conducted a larger survey to deepen their understanding of the identified needs; the paper lists the primary needs identified in the workshop. Second, they investigated existing email automation software to see which demands have already been addressed, listing eight significant functions implemented by email scripts on GitHub. Finally, they deployed a programmable email system, YouPS, which allows users to customize email management automation in a simple programmatic language; the experiment ran for a week so they could observe users' interactions with the system. The authors close by discussing the limitations of current email clients and future work.
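
As a rough illustration of the kind of rule a programmable inbox like YouPS enables, here is a minimal sketch; the Message class and on_new_message hook below are illustrative stand-ins, not YouPS's actual API.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    subject: str
    folder: str = "INBOX"
    flagged: bool = False

def on_new_message(msg: Message) -> None:
    # Route automated notifications out of the main inbox.
    if "noreply" in msg.sender:
        msg.folder = "Notifications"
    # Flag anything from my advisor's domain so it stands out.
    elif msg.sender.endswith("@cs.example.edu"):
        msg.flagged = True

msg = Message(sender="noreply@github.com", subject="Build finished")
on_new_message(msg)
print(msg.folder)  # -> Notifications
```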

Reflection:

I think this is an essential topic, given the critical proportion of daily study and life that email occupies. Actually, I did not realize that email could play such an essential role in daily life before I came to America, because where I come from, people prefer instant messaging software, especially for private chats and group discussions. This year, I have gradually become accustomed to using email, and I have developed many habits that I did not notice myself but that were identified by this article. For example, I mark a read email as "unread" if it contains important information. Even though email has a function called "flag," I still ignore the emails that I flag; in contrast, marking an email as unread is the best way to remind me that there is something important I need to deal with ASAP. Therefore, while reading this article, most of my reactions were along the lines of: yes, this is just what I want; or, it would be wonderful if this demand could be met.

On the other hand, several of the identified demands have already been met. For example, we can reference or quote another email when sending one, and aggregate responses into a poll based on the same sender. Besides, I think email modes are also already implemented (I have received automatic replies from faculty at our university when they are on vacation). These features make email considerably more robust.

Regarding the third probe, there is an obvious limitation, which is also mentioned in the paper: the lack of existing non-programmer tools for automating email. However, before creating GUIs, I think the more important question is whether people would actually use these email rules if we implemented them. For example, email already has the "flag" function, which makes vital emails stand out, and users can even choose different colors to distinguish emails. Nevertheless, I still prefer to mark an email as "unread" when I really need to deal with it soon. Therefore, it is worth thinking about how to implement these email rules so as to maximize utilization and convenience.

Question:

  1. What are your particular needs for email automation? Which of the needs identified in the paper best matches yours? Do you have any other needs not mentioned in the paper?
  2. What do you think of the approach used to investigate users' needs for email automation in the first probe? It seems this method only allows users to brainstorm, and the 13 participants had an uneven gender distribution; do you think it works well?
  3. Actually, many of the functions identified in the paper have already been achieved, for example referencing or quoting prior emails and summarizing a group of responses together with the initial ones. Do you use these features? What do you think of them?

Word count: 630


04/22/2020 – Nan LI – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary

This paper introduces a mixed-initiative system called SOLVENT, which aims to find analogies between research papers in different fields by combining human annotation of a paper's critical features with a computational model that constructs a semantic vector from those annotations. The authors conducted three studies to demonstrate the system's performance and efficiency. In the first study, people with specialized domain knowledge did the annotation work, and the system proved able to find known analogies. To demonstrate SOLVENT's effectiveness at finding analogies in a real-world case, the second study walked through a real-world scenario, explained the primary process, and had professionals without domain knowledge do the annotation; the results indicate that the relevant matches found by the system were judged to be both useful and novel. The third study proved that the system can be scaled up by having crowd workers annotate papers.

Reflection

I think this paper brings up a novel and much-needed idea. One limitation of searching for related work online is that the scope of the results is too narrow when we search by keyword or citation: the results usually include only directly related papers, papers that cite them, or the same paper published in a different venue. However, you can often find inspiration in a paper that is not relevant to what you are looking for, or even in an entirely irrelevant paper; nevertheless, such encounters are usually unattainable through ordinary search. Take our project as an example. It was inspired by one of the papers we read before, and we would like to further improve the work in that paper. It should have been straightforward to find related work, since we already had previous work to start from; disappointingly, we could not find many useful papers or even latent techniques. What we found most frequently was the same paper that inspired us, published in a different venue. Thus, from my perspective, this system is designed for finding inspiration through analogies, and if it can achieve this, it would be significant.

On the other hand, this seems like a costly approach, because it requires a large number of workers to annotate papers at scale in order to guarantee the system's performance. Besides, based on the results provided in the paper, the system's performance can only be described as "useful" rather than "efficient." If I urgently needed inspiration, I might try such a system, but I would not count on it.

Question

  1. What do you think of the idea presented in the paper that "scientific discoveries are often driven by finding analogies in distant domains"? Do you think this theory applies to the majority of people? Do you think finding analogies would inspire your work?
  2. What do you think of the system's "usefulness" versus its "efficiency"?
  3. Can you think of any other ways to utilize the SOLVENT system?
  4. The paper mentions that crowd workers can do the annotation work with no domain knowledge, or even no experience reading papers. Do you think this influences the system's performance? What are your criteria for recruiting workers?

Word Count: 538


4/22/2020 – Nurendra Choudhary – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary

In this paper, the authors aim to provide a framework for deconstructing complex systems into smaller tasks that can be easily managed and completed by crowd workers without the need for supervision. Currently, crowdsourcing is predominantly used for small tasks within a larger system that depends on expert reviewers/content managers. Knowledge Accelerator provides a framework for building complex systems solely from small crowdsourcing tasks.

The authors argue that major websites like Wikipedia depend on minor contributors but require an expensive network of dedicated moderators and reviewers to maintain the system. They eliminate these bottlenecks with a two-phase approach: inducing structure and achieving information cohesion. Structure is induced by collecting relevant web pages, extracting relevant text, and creating a topic structure that maps the clips to categories. Information cohesion is achieved by crowd workers gathering information and improving sections of the overall article without global knowledge, and by adding relevant multimedia images.
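
A schematic sketch of that pipeline as a chain of small crowd tasks, none of which requires global context, might look as follows; post_task stands in for whatever platform posts a microtask and collects its output, and all stage names are illustrative rather than the paper's implementation.

```python
# Each stage consumes only the previous stage's output, so no single
# worker ever needs the big picture of the final article.
def knowledge_accelerator(question, post_task):
    pages = post_task("find_sources", question)     # relevant web pages
    clips = post_task("extract_clips", pages)       # relevant text snippets
    topics = post_task("induce_structure", clips)   # cluster clips into categories
    draft = post_task("integrate", topics)          # draft each section from its clips
    article = post_task("edit_locally", draft)      # improve sections without global view
    return post_task("add_media", article)          # attach relevant images
```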

Reflection

The paper introduces a strategy for knowledge collection that completely removes the need for any intermediate moderator/reviewer. KA shows the potential of unstructured discussion forums as sources of information; interestingly, this is exactly the end goal of my team's course project. The idea of collecting small-scale structure from multiple crowd workers, without any of them having context on the global article, generalizes to several areas, such as annotating segments of large geographical images, annotating segments of movies/speech, and fake-news detection through the construction of an event timeline.

The paper introduces itself as a strategy for breaking down any complex system into simpler tasks that can be crowdsourced. However, it settles into the narrower problem of structuring and collecting information; for example, information structure and collection are not enough for jobs that involve original creation, such as software or network architectures.

The system relies heavily on crowdsourced tasks, yet some modules have effective AI counterparts, e.g., inducing topical structure, searching for relevant sources of information, and selecting multimedia components. I think a comparative study would help me understand the reasons for this design decision.

The fact that Knowledge Accelerator works better than search sites opens up new avenues of exploration for collecting data by inducing structure in various domains.

Questions

  1. The paper discusses the framework's application to question answering. What other applications are possible in other domains? Do you see an example application in a non-AI domain?
  2. I see that the proposed framework is only applicable to the collection of existing information. Is there another possible application? Is there a way we could create new information through logical reasoning processes such as deduction, induction, and abduction?
  3. The paper mentions that some crowd-work platforms allow complex tasks but require a vetting period between workers and task providers. Do you think a change in these platforms would help? Also, in traditional jobs, interviews enable similar vetting. Is it a waste of time if the quality of work improves?
  4. I found my project similar to the framework in terms of task distribution. Are you using a similar framework in your projects? How are you using the given ideas? Will you be able to integrate this into your project?

Word Count: 524


04/22/2020 – Ziyao Wang – SOLVENT: A Mixed-Initiative System for Finding Analogies Between Research Papers

The authors introduce a mixed-initiative system named SOLVENT in this paper. In the system, humans annotate aspects of research papers that denote their background, purpose, mechanism, and findings, and a computational model constructs a semantic representation from these annotations that is valuable for finding analogies among papers. They tested the system through three experiments: the first used annotations from domain-expert researchers, the second used annotations from experts outside the papers' domain, and the last used crowdsourcing. From the experimental results, the authors found that the system was able to detect analogies across domains and that even crowdsourcing workers, who have limited domain knowledge, are able to do the annotations. Additionally, the system performs better than similar systems.

Reflections

I used to search for papers only within the computer science area when I did my projects or tried to solve a problem. Of course, we can sometimes be inspired by ideas in papers from other domains, but it is quite difficult to find such papers: there are countless papers across various domains, and there has been no efficient method for finding the ones we need from other fields. Additionally, even when we find a paper that is valuable for the problem we are solving, our lack of background knowledge may make it difficult to understand the ideas behind it.

This system is a great help in the above situation. It lets people find related papers from other domains even when they have limited knowledge of those domains. Even though the number of papers is increasing sharply, we can still find the papers we need efficiently with this system. Previously, we could only search for keywords related to a specific area; with this system, we can search for ideas instead of specific words. This is beneficial when papers use abbreviations or analogies in their titles: keyword search may miss these papers, while idea search will mark them as valuable. Also, the human annotations in the system can help us understand the ideas of the papers easily, so we can exclude unrelated papers with high efficiency.

One more point is that cross-domain projects and research are increasing significantly nowadays. For such studies, researchers need to read a huge number of papers in both domains before they can form a novel idea for solving the cross-domain problem. With this system, they could easily find similarities between the two domains and get headlines about the background, purpose, mechanism, and findings of the papers. The efficiency of such research could be improved, and researchers could discover more novel interdisciplinary directions with the help of the system.

Questions:

Will the performance of the system decrease when dealing with a larger database and more difficult papers?

Is it possible to update the system's results regularly as newly published papers are added to the database?

Can this system be applied in industry? For example, could it find similarities between the mechanisms used in the production of two products and use the findings to improve those mechanisms?


04/22/2020 – Ziyao Wang – Opportunities for Automating Email Processing: A Need-Finding Study

The authors conducted a series of studies regarding email automation. First, they held a workshop with 13 computer science students who are able to program; the students were asked to write email rules in natural language or pseudocode so the authors could identify categories of needed email automation. The authors then analyzed the source code of scripts on GitHub to see what programmers need and have already developed. Finally, they deployed a programmable system, YouPS, which enables users to write custom email automation rules, and surveyed users after they had used the system for one week. They found that current, limited email automation cannot meet users' requirements: about 40% of the rules cannot be implemented in existing systems. They also summarized these unmet user requirements to guide future development.

Reflections

The topic of this paper is really interesting. We use email every day and are sometimes annoyed by it. Though email platforms have already deployed some automation and allow users to customize their own scripts, some annoying emails still reach users' inboxes while some important emails are classified as spam. For my part, I used to adjust myself to the automation mechanism: I check my spam folder and delete advertisements from my inbox every day. But it would be great if the automation were more user-friendly and provided more labels and rules for users to customize. This paper focuses on this problem and conducts a thorough series of studies to understand users' requirements. All the example scripts shown in the results seem useful to me, and I really hope the system can be deployed in practice.

We can also learn from the methods the authors used in their studies. First, they recruited computer science students to find general requirements; these students served as pilots, giving the researchers an overview of what users need. They then did background research based on the findings from the pilots. Finally, they combined the findings from both the pilots and the background research to implement a system, and tested it with crowd workers, who can stand in for the public. This series of steps is a good example for our projects; in future work, we might follow the same workflow.

From my point of view, a significant limitation of the paper is that the system was tested on only a small group of people. Neither computer science students nor programmers who upload their code to GitHub can represent the public; even the crowd workers cannot. Most of the public knows little about programming and does not complete HITs on MTurk, so their requirements are not considered. If conditions allow, the studies should be done with more people.

Questions:

What are your preferences in email automation? Do you have any preferences that are not supported by current automation?

Can the crowd workers represent the public?

What should we do if we want to test systems with the public?


04/22/2020 – Dylan Finch – Opportunities for Automating Email Processing: A Need-Finding Study

Word count: 586

Summary of the Reading

This paper investigates automation with regard to email. A large portion of many people's days is devoted to sifting through the hundreds of emails they receive, and many of the tasks involved might be automatable. This paper not only looks at how different email-handling tasks can be automated, but also investigates the opportunities for automation in popular email clients.

The paper found that many people wanted to automate tasks that required more data from emails. Users wanted access to things like the status of the email (pending, done, etc.), the deadline, the topic, the priority, and many other data points. The paper also noted that people would like to aggregate responses to emails, to more easily see things like group responses to an event. Access to these features would let users better manage their inboxes. Some solutions to these issues already exist, but some automation is held back by limitations in email clients.
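
As a toy sketch of the response-aggregation need, the snippet below collapses the replies in a thread into a yes/no tally; the keyword matching is deliberately naive, and the helper is hypothetical rather than a feature of any real client.

```python
from collections import Counter

def tally_replies(replies):
    """replies: list of (sender, body) pairs from one thread."""
    votes = Counter()
    for _sender, body in replies:
        text = body.lower()
        # Naive keyword matching; a real client would need something smarter.
        if "yes" in text:
            votes["yes"] += 1
        elif "no" in text:
            votes["no"] += 1
        else:
            votes["unclear"] += 1
    return votes

print(tally_replies([("a@x.com", "Yes, count me in"), ("b@x.com", "No, sorry")]))
# Counter({'yes': 1, 'no': 1})
```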

Reflections and Connections

I love the idea of this paper. I know that ever since I got my email account, I have loved playing around with the automation features. When I was a kid it was more because it was just fun to do, but now that I’m an adult and receive many more emails than back then (and many more than I would like), I need automation to be able to deal with all of the emails that I get on a daily basis. 

I use Gmail and I think that it offers many good features for automating my inbox. Most importantly, Gmail will automatically sort mail into a few major categories, like Primary, Social, and Promotions. This by itself is extremely helpful. Most of the important emails get sent to the Primary tab so I can see them and deal with them more easily. The Promotions tab is also great at aggregating a lot of the emails I get from companies about products or sales or whatever that I don't care about most of the time. Gmail also allows users to make filters that will automatically do some action based on certain criteria about the email. I think both of these features are great. But it could all be so much more useful.

As the paper mentions, many people want to be able to see more data about emails. I agree. The filter feature in Gmail is great, but you can only filter based on very simple things like the subject of the email, the date it was sent, or the sender. You can’t create filters for more useful things like tasks that are listed in the email, whether or not the email is an update to a project that you got other emails about, or the due date of tasks in the email. Like the paper says, these would be useful features. I would love a system that allowed me to create filters based on deeper data about my emails. Hopefully Gmail can take some notes from this paper and implement new ways to filter emails.

Questions

  1. What piece of data would you like to be able to sort emails by?
  2. What is your biggest problem with your current email client? Does it lack automation features? 
  3. What parts of email management can we not automate? Why? Could we see automatic replies to long emails in the future?


04/22/2020 – Dylan Finch – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Word count: 566

Summary of the Reading

This paper describes a system called SOLVENT, which uses humans to annotate parts of academic papers: the high-level problems being addressed, the specific lower-level problems, how the paper achieved its goal, and what was learned or achieved. Machines are then used to detect similarities between papers so that future researchers can more easily find articles related to their work.

The researchers conducted three studies where they showed that their system greatly improves results over similar systems. They found that the system was able to detect near analogies between papers and that it was able to detect analogies across domains. One interesting finding was that even crowd workers without extensive knowledge about the paper they are annotating can produce helpful annotations. They also found that annotations could be created relatively quickly.

Reflections and Connections

I think that this paper addresses a real and growing problem in the scientific community. With more people doing research than ever, it is increasingly hard to find papers that you are looking for. I know that when I was writing my thesis, it took me a long time to find other papers relevant to my work. I think this is mainly because we have poor ways of indexing papers as of now. Really the only current ways that we can index papers are by the title of the paper and by the keywords embedded in the paper, if they exist. These methods can help find results, but they are terrible when they are the only way to find relevant papers. A title may be about 20 words long, with keywords being equally short. 40 words does not allow us to store enough information to fully represent a paper. We lose even more space for information when half of the title is a clever pun or phrase. These primitive ways of indexing papers also lose much of the nuance of papers. It is hard to explain results or even the specific problem that a paper is addressing in 40 words. So, we lose that information and we cannot index on it. 

A system like the one described in this paper would be a great help to researchers because it would allow them to find similar papers much more easily. This doesn’t even mention the fact that it lets researchers find papers outside of their disciplines. That opens up a whole new world of potential collaboration. This might help to eliminate the duplication of research in separate domains. Right now, it is possible that mathematicians and computer scientists, for example, try to experiment on the same algorithm, not knowing about the team from the other discipline. This wastes time, because we have two groups researching the same thing. A system like this could help mitigate that.

Questions

  1. How would a system like this affect your life as a researcher?
  2. Do you currently have trouble trying to find papers or similar ideas from outside your domain of research?
  3. What are some limitations of this system? Is there any way that we could produce even better annotations of research papers?
  4. Is there some way we could get the authors of each paper to produce data like this by themselves?


04/22/2020 – Mohannad Al Ameedi – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary

In this paper, the authors try to give crowd workers the big picture of the system that their small assigned tasks are meant to accomplish, which can help them execute each task more efficiently and contribute better to the larger goal. The work also tries to help companies remove the bottleneck caused by the small number of people who normally hold and maintain the big picture, and whose departure can pose serious risks. The authors designed and developed a system known as Knowledge Accelerator that crowd workers can use to answer a given question, drawing on relevant resources in a big-picture context without the need for a moderator. The system starts by asking workers to choose different web pages related to the topic, then to extract the relevant information, then to cluster that information into categories. The system then integrates the information by drafting an article, allows the article to be edited, and finally adds supporting images or videos. In this way, the system helps crowd workers see the big picture and complete their tasks in a way that serves the overall goal.

Reflection

I found the study mentioned in the paper to be very interesting. I agree with the authors that most tasks done by crowd workers are simple, and that it is hard to divide up complex tasks that require knowledge of the big picture. Knowing the big picture is very important, and it is often held by very few people, normally those in technical leadership positions; losing them can cause serious issues.

I like the way the system is designed to provide high-level information about the overall goal while workers handle small tasks. The pipeline of multi-stage operations used to generate a cohesive article can help workers achieve the goal while also learning more about the topic.

This approach could also be used when building large-scale systems in which many components need to be built by developers, who often do not know what the overall system is trying to accomplish or solve. Normally, a developer works on a specific task, such as adding an employee table or building a web-service endpoint that receives a request and sends back a response, without knowing who will use the system or what impact the task will have on the overall system. I think a system like this could help developers understand the big picture, enabling them to solve problems in ways that make a greater impact on the larger problem the system is trying to solve.

Questions

  • The system developed by the authors can help generate articles about a specific topic; can we use the system to solve different kinds of problems?
  • Can we use this approach in software development to help developers understand the big picture of the system they are building, especially for large systems?
  • Can you think of a way to use a similar approach in your course project?


04/22/2020 – Mohannad Al Ameedi – Opportunities for Automating Email Processing: A Need-Finding Study

Summary

In this paper, the authors aim to study users' needs for email automation and the resources required to achieve it; their goal is to inform the design of a good email automation system. They led a workshop to group the requirements into categories, and they also conducted a survey using human computation to help understand users' needs. After collecting all the requirements, the authors performed another study, reviewing open-source code available on GitHub to see which requirements had already been met. After building and running a programmable system, they asked users to interact with it to find out what works well and what does not. They found that there are limitations in current implementations, especially for complex requirements, and that many requirements are not being met. The authors hope their findings will help future research focus on the needs that are not yet satisfied.

Reflection

I found the method used by the authors to be very interesting. Conducting a survey and leading a workshop to find user requirements, and then cross-referencing those requirements against what current implementations do and do not provide, is a nice approach for discovering what has not been implemented yet.

I also like the idea of performing code analysis on an open-source project and linking the analysis to user requirements. Software companies could use this approach to search GitHub for existing implementations of a given requirement, rather than just searching for code that uses a specific library or tool.

I like the idea of email automation, and I have used rules before to automatically move certain emails to special folders. Nowadays most systems send automatic notifications; these notifications are necessary, but they sometimes make it hard to distinguish emails that need an immediate response from emails that can be reviewed later. I also like that Gmail automatically moves advertising emails to a different folder or view, letting the user focus on the important emails.

I agree with the authors that there is much room for improvement in current email automation, but it would be interesting to know what the results would be if email systems like Outlook, Gmail, and Yahoo were investigated in depth, to learn what they already implement that was missing from the system the authors studied.

Questions

  • The authors studied the current implementation using one system over one week. Do you think using more than one system, or studying user interactions over multiple weeks or months, might lead to different results?
  • Do you think email automation could be used to send critical business emails that might accidentally include information that shouldn't be sent? How can such systems avoid these issues?
  • Have you used rules to automate email operations? Were they useful?  
