04/22/2020 – Bipasha Banerjee – Opportunities for Automating Email Processing: A Need-Finding Study

Summary

The primary goal of the paper is to provide automation support to users for email handling. The authors first tried to determine which automatic features users want in their email service and what informational and computational needs arise when implementing such a system. They conducted three studies. The first was to gauge what kinds of features users wanted automated; in this study, there was no boundary as to what can or can’t be implemented, so it effectively captured the full range of tasks and features users would “wish” their email interface provided. The second study aimed to find the automated implementations that already exist. This involved sifting through GitHub repositories to find projects aiming to close the current gap in automation of email processing. Finally, the last study involved asking users to code their “ideal” features using the YouPS interface. This study consisted mainly of students from engineering backgrounds familiar with Python programming.

Reflection

The paper provided an interesting perspective on how users want their email clients to perform. For this, it was important to understand the needs of the people. The authors do this by conducting the first study to find the ideal features that users look for. I liked the way the task of discovering customer needs was approached. However, I wanted to point out that the median age range of participants was 20-29, and all had a university affiliation. It would be interesting to see what older people from both university and industry backgrounds want in such email clients. Getting the perspective of a senior researcher or a senior manager is important. I feel that these people are the ones who receive far more emails and would want and need automated email processing.

I resonated with most of the needs that users pointed out and recognized some of the existing features that my current email client provides. For example, Google generally offers a “follow up” nudge if a sent email didn’t get a response, or a “reply” nudge if a received email hasn’t been replied to for n days. I am particularly interested in the different modes that could be set up. This would prove useful where the user could focus on work and periodically check a particular label like “to-do” or “important.” Additionally, only being notified about important emails is also a priceless feature, in my opinion.

Having loved all the proposed features in this paper, I would also like to point out some flaws, in my opinion. First, some of the applied rules might cause disruptions in the case of important emails. One of the features mentioned was to automatically mark an email “read” when consecutive emails come from the same sender. This would work for “social” or “promotions” email. However, it might end up creating more work for the user, i.e., digging through the read emails to find the ones they actually never read. Additionally, I was also curious to know how security is handled here. Email is already not known to be a secure medium of communication, and using this tool on top of it might make it even less secure, especially when research-related topics discussed in the emails might be prone to a breach.

Questions

  1. What are the features you look for when it comes to email management? I would want to be only notified about emails that are important. 
  2. What other systems could benefit from such studies other than email processing? Could this be used to improve recommender systems, other file organizing software? 
  3. Would it be useful to take the input of senior researchers and managers? They are people who receive a lot of emails, and knowing their needs would be useful.
  4. How is security handled in the YouPS system? 


04/22/2020 – Akshita Jha – Opportunities for Automating Email Processing: A Need-Finding Study

Summary:
“Opportunities for Automating Email Processing: A Need-Finding Study” by Park et al. is an interesting paper, as it talks about the need to manage emails. Managing email is a time-consuming task that takes significant effort from both the sender and the recipient. The authors find that some of this work can be automated. They performed a mixed-methods need-finding study in order to understand and answer two important questions: (i) What kind of automatic email handling do users want? (ii) What kind of information and computation is needed to support that automation? The authors conduct an investigation including a design workshop and a survey to identify the categories of needs and thoroughly understand them. They also surveyed the existing automated email classification systems to understand which needs have been addressed and where the gaps are in fulfilling these needs. The work highlights the need for: “(i) a richer data model for rules, (ii) more ways to manage attention, (iii) leveraging internal and external email context, (iv) complex processing such as response aggregation, and affordances for senders.” The authors also ran a small authorized script over a user’s inbox, which demonstrated that the above needs cannot be fulfilled by existing email clients. This can be used as motivation for new design interventions in email clients.
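
To make “a richer data model for rules” concrete, here is a minimal sketch written for this reflection (not taken from the paper or any existing client) of what such a rule could look like if conditions could reference thread history and time rather than just static header fields:

    from dataclasses import dataclass, field
    from datetime import datetime, timedelta
    from typing import Callable, List

    @dataclass
    class Thread:
        participants: List[str]
        last_outgoing: datetime              # when I last wrote on this thread
        last_incoming: datetime              # when the other party last wrote
        labels: List[str] = field(default_factory=list)

    @dataclass
    class Rule:
        name: str
        condition: Callable[[Thread], bool]
        action: Callable[[Thread], None]

    def needs_follow_up(t: Thread) -> bool:
        # Richer condition: I wrote last and have heard nothing for 3+ days.
        return t.last_outgoing > t.last_incoming and \
            datetime.now() - t.last_outgoing > timedelta(days=3)

    rules = [Rule("follow-up reminder", needs_follow_up,
                  lambda t: t.labels.append("follow-up"))]

    thread = Thread(participants=["me@example.com", "pi@example.edu"],
                    last_outgoing=datetime.now() - timedelta(days=5),
                    last_incoming=datetime.now() - timedelta(days=9))
    for rule in rules:
        if rule.condition(thread):
            rule.action(thread)
    print(thread.labels)  # -> ['follow-up']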

Reflections:
This is an interesting work that has the potential to pave the way for new design interventions in email processing and email management. However, there are certain limitations to this work. Out of the three studies that the authors conducted, two were explicitly focused on programmers, and the third focused on engineers. This brings into question the generalizability of the experiments. The needs of diverse users may vary, and the results might not hold. Also, the questions the authors asked in the survey were influenced by the design workshop they had conducted, which in turn influenced the analysis of the needs. The results might not hold true for all kinds of participants. The authors also could not quantify the percentage of needs that are not being met. Also, recruiting programmers for the study did not help much, as they already have the skills to write their own code and fulfill their own needs. The GUI needed by non-programmers might differ from the one needed by programmers. The authors could also seek insight from prior tools to build on and improve their system.

Questions:
1. What are your thoughts on the paper? How do you plan to use the concepts present in the paper in your project?
2. Would you want an email client that manages your attention? Do you think that would be helpful?
3. How difficult is it for machine learning models to understand the internal and external context of an email?
4. Which email client do you use? What are the limitations of that email client?
5. Do you think writing simple rules for email management is too simplistic?


04/22/2020 – Akshita Jha – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary:
“The Knowledge Accelerator: Big Picture Thinking in Small Pieces” by Hahn et al. talks about interdependent pieces of crowdsourcing. Most big-picture work today relies on a small number of people to complete all the tasks involved; for example, most of the work on Wikipedia is done by a small number of highly invested editors. The authors bring up the idea of using a computational system in which each individual sees only a small part of the whole. This is difficult because many real-world tasks cannot be broken down into small, independent units, and hence Amazon Mechanical Turk (AMT) cannot be used for them as efficiently as it is used for prototyping. It is also challenging to maintain the coherency of the overall output while breaking the big task down into small, mutually independent chunks for crowd workers to work on. Moreover, the quality of the work is also dependent on the division of the task into coherent chunks. The authors present their idea, which removes the need for a big-picture view held by a small number of workers by relying on small contributions from individuals who each see only a small chunk of the whole.

Reflections:
This is an interesting work, as it talks about the advantages and the limitations of breaking down big tasks into small pieces. The authors built a prototype system called “Knowledge Accelerator” which was constrained such that no single task would pay more than $1. Although the authors used this constraint for task division, I’m not sure it is a good enough metric to judge the independence and the quality of the small tasks. Also, the authors mention that the system should not be seen as a replacement for expert creation and curation of content. I disagree with this, as I feel that with some modifications the system has the potential to completely replace expert curation for this task in the future. As is, the system has some gaping issues. The absence of a nuanced structure in the digests is problematic. It might also help to include iterations in the system for cases where workers have completed part of their tasks and require more information. Finally, the authors would benefit from taking into account the cost of producing these answers on a large scale. They could use a computational model to dynamically decide how many workers and products to use at each stage such that the overall cost is minimized. The authors could also check whether some of the answers could be reused across questions and across users. Incorporating contextual information could also help improve the system significantly.
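
The cost suggestion above could start as something very simple. The sketch below is entirely hypothetical (the stage names and quality-gain estimates are made up); it keeps the roughly $1-per-task constraint and greedily funds whichever pipeline stage is estimated to buy the most quality per dollar. A real version would need calibrated gain estimates, but even this toy form makes the trade-off explicit:

    COST_PER_TASK = 1.00   # KA's constraint: no single task pays more than $1
    BUDGET = 20.00

    # Estimated quality gain of the k-th extra task at each stage (made up).
    stages = {
        "source collection": [0.30, 0.20, 0.10, 0.05],
        "clip extraction":   [0.25, 0.18, 0.12, 0.06],
        "clustering":        [0.20, 0.10, 0.05, 0.02],
        "article writing":   [0.35, 0.25, 0.15, 0.08],
    }

    allocation = {s: 0 for s in stages}
    spent = 0.0
    while spent + COST_PER_TASK <= BUDGET:
        # Fund the stage whose next task is estimated to buy the most quality.
        best = max(stages, key=lambda s: stages[s][allocation[s]]
                   if allocation[s] < len(stages[s]) else 0.0)
        if allocation[best] >= len(stages[best]) or stages[best][allocation[best]] <= 0:
            break
        allocation[best] += 1
        spent += COST_PER_TASK

    print(allocation, f"spent ${spent:.2f}")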

Questions:
1. What are your thoughts on the paper?
2. How do you plan to use the concepts present in the paper in your project?
3. Are you dividing your tasks into small chunks such that crowd workers only see a part of the whole?


04/22/2020 – Subil Abraham – Chan et al., “SOLVENT”

Academic research strives for innovation in every field. But in order to innovate, researchers need to scour the length and breadth of their field as well as adjacent fields in order to get new ideas, understand the current work, and make sure they are not repeating someone else’s work. Being able to use an automated system that scours the published research landscape for similar existing work would be a huge boost to productivity. SOLVENT aims to solve this problem by having humans annotate research abstracts to identify the background, purpose, mechanism, and findings, and using that as a schema to index the papers and identify similar papers (which have also been annotated and indexed) that match closely in one or a combination of those categories. The authors conduct three case studies to validate their method: finding analogies within a single domain, finding analogies across different domains, and validating whether annotation by crowd workers would be useful. They find that their method consistently outperforms other baselines in finding analogous papers.

I think that something like this would be incredibly helpful for researchers and would significantly streamline the research process. I would gladly use something like this in my own research because it would save so much time. Of course, as the authors point out, the issue is scale, because for it to be useful, a very large chunk (or ideally all) of current published research needs to be annotated according to the presented schema. This could be integrated as a gamification mechanism in Google Scholar, where you occasionally ask the user to annotate an abstract. This way, you’re able to do it at scale. I also find it interesting that purpose+mechanism produced better results than background+purpose+mechanism. I would’ve figured that the extra context that background provides would serve to find better matches. But given the goal of finding analogies even in different fields, perhaps background+purpose+mechanism rightly does not give great results because it gets too specific by providing too much information.

  1. Would you find use for this? Or would you prefer seeking out papers on your own?
  2. Do you think that the annotation categories are appropriate? Would other categories work? Maybe more or fewer categories?
  3. Is it feasible to expand on this work to cover annotating the content of whole papers? Would that be asking too much?


04/22/2020 – Subil Abraham – Park et al., “Opportunities for automating email processing”

Despite many different innovations from many different directions trying to revolutionize text communication, the humble email has lived on. The use of emails and email clients has adapted to current demands. The authors of this paper investigate the current needs of email users through a workshop and a survey, and they analyze open source code repositories that operate on email, in order to identify what current users need, what is not being solved by existing clients, and what tasks people are taking the initiative to solve programmatically because their email clients don’t solve them. They identify several high-level categories of needs: the need for additional metadata on email, the ability to leverage internal and external context, managing attention, not overburdening the receiver, and automated content processing and aggregation. They create a Python tool called YouPS that provides an API with which a user can write scripts to perform email automation tasks. They study the users of their tool for a week and note the kind of work they automate with it. They found that about 40% of the rules the users created with YouPS could not have been implemented in their ordinary email client.

It’s fascinating that there is so much efficiency to be gained by allowing people to manage their email programmatically. I feel like something like this should’ve been a solved problem, but apparently there is still room for innovation. It’s also possible that what YouPS provides is something that couldn’t really be done in an existing client, either because existing clients try to be as user friendly as possible to the widest variety of people (how many people actually know what IMAP does?), or because email clients have accumulated so much cruft that adding a programmable layer would be incredibly hard. I get why their survey participants skew towards computer science students, and why their solution gravitates towards solving the email problem in a way that is better suited for people in computer science. But I also think that, in the long term, keeping YouPS the way it is is the right way to go. With every additional layer of abstraction you add, you lose flexibility. GUIs are not the way to go; rather, people will adapt to using the programming API as programming becomes more prevalent in daily life. I also find the idea of modes really interesting and useful, and it is definitely something I would like to have in my email client.
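
Since modes come up in the questions below, here is a rough sketch of how mode-based rules might look. This is hypothetical code in the spirit of YouPS, not its actual API, and the inbox object is a stand-in so the snippet runs without an IMAP account:

    from dataclasses import dataclass

    @dataclass
    class Message:
        sender: str
        subject: str

    def research_mode(msg, inbox):
        # Only surface mail from collaborators; file everything else for later.
        if msg.sender.endswith("@lab.example.edu"):
            inbox.notify(msg)
        else:
            inbox.add_label(msg, "to-do")

    def weekend_mode(msg, inbox):
        # Silence everything except messages explicitly marked urgent.
        if "urgent" in msg.subject.lower():
            inbox.notify(msg)
        else:
            inbox.mark_read(msg)

    MODES = {"research": research_mode, "weekend": weekend_mode}

    class FakeInbox:
        """Stand-in inbox so the sketch runs without a real mail account."""
        def notify(self, msg): print("notify:", msg.subject)
        def add_label(self, msg, label): print(f"label '{label}':", msg.subject)
        def mark_read(self, msg): print("mark read:", msg.subject)

    current_mode = "research"
    MODES[current_mode](Message("news@promo.example.com", "Big sale"), FakeInbox())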

  1. What kind of modes would you set up with different sets of rules e.g. a research mode, a weekend mode?
  2. Do you think that changing YouPS to be more GUI based would be beneficial because it would reach a wider audience? Or should it keep its flexibility at the cost of maybe not having wide spread adoption?
  3. How would you think about training an ML model that can capture internal and external context in your email?


4/22/2020 – Nurendra Choudhary – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Summary

SOLVENT aims to find analogies between research papers in different fields, e.g., simulated annealing in AI optimization is derived from metallurgy, and information foraging from animal foraging. It aims to extract idea features from a research paper according to a purpose-mechanism schema; purpose: what the authors are trying to achieve, and mechanism: how they achieve that purpose.

Research papers cannot always be put into a purpose-mechanism schema due to complex language, a hierarchy of problems, and the distinction between mechanism and findings. Hence, the authors propose a modified annotation scheme that includes background: the context of the problem; purpose: the main problem being solved by the paper; mechanism: the method developed to solve the problem; and findings: the conclusions of the research work (for “understanding”-type papers, this section carries more of the information). The schema is queried using cosine similarity over tf-idf-weighted averages of word vectors. The authors scale up with crowd workers because expert annotation is prohibitively expensive. However, this shows significant disagreement between crowd workers and experts.
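
The querying step can be sketched in a few lines. The snippet below is my own toy reconstruction of the idea, not the authors’ code: it builds tf-idf weights from a tiny corpus of “purpose” annotations, averages word vectors with those weights, and ranks candidates by cosine similarity. The random embedding lookup is a placeholder for the pretrained word vectors a real system would use, so the scores are illustrative only.

    import numpy as np

    def tfidf_weights(tokenized_docs):
        """Per-document tf-idf weights computed from the corpus itself."""
        n = len(tokenized_docs)
        df = {}
        for doc in tokenized_docs:
            for w in set(doc):
                df[w] = df.get(w, 0) + 1
        out = []
        for doc in tokenized_docs:
            tf = {}
            for w in doc:
                tf[w] = tf.get(w, 0) + 1
            out.append({w: c * np.log(n / df[w]) for w, c in tf.items()})
        return out

    def weighted_average(weights, embed, dim=50):
        """tf-idf-weighted average of word vectors for one document."""
        vec, total = np.zeros(dim), 0.0
        for word, wt in weights.items():
            vec += wt * embed(word)
            total += wt
        return vec / total if total else vec

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    # Toy "purpose" annotations: the query paper first, then two candidates.
    docs = ["find analogies between research papers".split(),
            "retrieve analogous ideas from distant domains".split(),
            "increase the storage capacity of batteries".split()]
    rng, cache = np.random.default_rng(0), {}
    embed = lambda w: cache.setdefault(w, rng.normal(size=50))   # placeholder
    vectors = [weighted_average(w, embed) for w in tfidf_weights(docs)]
    print([cosine(vectors[0], v) for v in vectors[1:]])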

Reflection

The paper strikes an amazing (nothing’s perfect, right :P) balance between human annotation capabilities and AI’s ability to analyze huge information sources to solve the problem of analogy retrieval. Additionally, the paper shows a probable future of crowd work where tasks are increasingly complex for regular human beings. We have discussed this evolution in several classes before this, and the paper is a contemporary example of this move towards complexity. The study showing its application in a real-world research team provides a great example that other research teams can borrow.

I would like to see a report of the performance of Background+Purpose+Mechanism+Findings. I don’t understand the reason for its omission (probably space issues). The comparison baseline uses abstract-based queries, but a system could potentially have access to the entire paper; I think that would help with the comparative study. The researchers test expert researchers and crowd workers; an analysis should also be done on utilizing undergraduate/graduate students. As pointed out by the authors, the study set is limited due to the need for expert annotations. However, no diversity is seen in the fields of study.

“Inspiration: find relevant research that can suggest new properties of polymers to leverage, or new techniques for stretching/folding or exploring 2D/3D structure designs”

Fields searched: materials science, civil engineering, and aerospace engineering.

Questions

  1. The paper shows the complexity of future crowd-work. Is the solution limited to area experts? Is there a way to simplify or better define the tasks for a regular crowd? If not, what is the limit of a regular crowd-worker’s potential?
  2. How important is finding such analogies in your research fields? Would you apply this framework in your research projects?
  3. Analogies are meant for easier understanding and inspiration. SOLVENT has a limited application in inspiring novel work. Do you agree? What could be a possible scenario where it inspires novel work?
  4. Compared to MTurk, Upwork workers show better agreement with researchers across the board. What do you think is the reason? When would you use MTurk vs UpWork? Do you think higher pay would proportionally improve the work quality as well?

Word Count: 532


04/22/2020 – Nan LI – Opportunities for Automating Email Processing: A Need-Finding Study

Summary:

The main objective of this paper is to investigate user needs regarding email management automation. To achieve this, the authors conducted a mixed-methods need-finding study through three probes. First, they determined the categories of email automation requirements through a workshop and then conducted a more extensive survey to deepen the understanding of the identified needs; the paper lists the primary needs identified in the workshop. Then, they investigated existing email automation software to detect which demands have already been addressed, and list eight major functions of email scripts on GitHub. Finally, they experimented with a programmable email system, YouPS, which allows users to customize email management automation using a simple programmatic interface. This experiment lasted a week to observe the users’ interactions with the system. The authors close by discussing the limitations and future work regarding current email clients.

Reflection:

I think this is an important topic given the critical proportion of daily study and life that happens over email. Actually, I did not realize that email could play such an essential role in daily life before I came to America, because where I came from, people prefer to use instant messaging software, especially for private chats or group discussions. Over this year, I have gradually become accustomed to using mail, and I have developed many habits that I had not noticed but that were identified by this article. For example, I would mark a read email as “unread” if that email contains important information. Even though email has a function called “flag,” I still ignore the emails that I flag. In contrast, marking as unread is the best way to remind me there is an important thing that I need to deal with ASAP. Therefore, when reading this article, most of my reactions were: yes, this is just what I want; or, it would be wonderful if this demand could be met.

On the other hand, several of the identified demands have already been achieved. For example, we can reference or quote from another email when sending emails, as well as aggregate responses into a poll based on the same sender. Besides, I think email modes are also partially implemented already (I have received automatic replies from faculty at our university when they are on vacation). These features make email more robust.

Regarding the third probe, there is an obvious limitation, also mentioned in the paper, which is the lack of tools that let non-programmers automate email. However, before creating GUIs, I think the more significant thing is to figure out whether people would actually utilize those email rules if we implemented them. For example, email has a “flag” function, which makes vital emails “stand out.” Clients even offer the choice of different colors to distinguish emails. Nevertheless, I still prefer to mark an email as “unread” when I really need to deal with it soon. Therefore, it is worth thinking about how to implement these email rules to maximize utilization and convenience.

Question:

  1. What is your particular need for email automation? Which needs identified in the paper best suit your needs? Do you have any other needs that are not mentioned in the paper?
  2. What do you think about the approach used to investigate users’ needs for email automation in the first probe? It seems that this method only allows users to brainstorm, and the 13 participants had an uneven gender distribution; do you think it works well?
  3. Actually, a lot of the functions identified in the paper have already been achieved nowadays, for example, referencing or quoting prior emails and summarizing a group of responses together with the initial responses. Do you use these features? What do you think of them?

Word count: 630


04/22/2020 – Nan LI – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary

This paper introduced a mixed-initiative system called SOLVENT, which aims to find analogies between research papers in different fields by combining human annotation of the critical features of a research paper with a computational model that constructs a semantic vector from those annotations. The authors conducted three studies to demonstrate the system’s performance and efficiency. In the first study, they had people with specialized domain knowledge do the annotation work and showed that the system was able to find known analogies. In order to prove the effectiveness of using SOLVENT to find analogies in a real-world case, they demonstrated a real-world scenario, explained the primary process, and had professionals without domain knowledge do the annotation. The results indicate that the relevant matches found by the system were judged to be both useful and novel. The third study showed that they could scale up the system by allowing crowd workers to annotate papers.

Reflection

I think this paper brought up a novel and much-needed idea. One limitation of searching for related work online is that the scope of the search results is too narrow if we search by keyword or citation. The results usually only include the related paper, papers that cite the related paper, or the same paper published in a different venue. However, you can often find inspiration in a paper that is not directly relevant to what you are looking for, or even in a seemingly irrelevant paper; nevertheless, finding such papers is usually unattainable. Take our project as an example. Our project was inspired by one of the papers we read before, and we would like to further improve the work in that paper. It should be straightforward to find related work because we have previous work already. Disappointingly, we could not find many useful papers or even relevant techniques. The thing we found appearing most frequently was the same paper that inspired us, published in a different venue. Thus, from my perspective, this system is designed for finding inspiration through analogies. If the system can achieve this, it would be significant.

On the other hand, this seems like a costly approach because it requires a large number of workers to annotate a large number of papers in order to guarantee the system’s performance. Besides, based on the results provided in the paper, the system’s performance can only be described as “useful” rather than “efficient.” If I urgently need inspiration, I may try such a system, but I would not count on it.

Question

  1. What do you think of the original idea presented in the paper, that “scientific discoveries are often driven by finding analogies in distant domains”? Do you think this theory applies to the majority of people? Do you think finding analogies would inspire your work?
  2. What do you think regarding the system “usefulness” and “efficiency”? 
  3. Can you think about any other way to utilize the SOLVENT system?
  4. The authors mentioned in the paper that crowd workers can do the annotation work with no domain knowledge or even no experience of reading a paper. Do you think this will influence the system’s performance? What are your criteria for recruiting workers?

Word Count: 538


4/22/2020 – Nurendra Choudhary – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary

In this paper, the authors aim to provide a framework to deconstruct complex systems into smaller tasks that can be easily managed and completed by crowd workers without the need for supervision. Currently, crowdsourcing is predominantly used for small tasks within a larger system that depends on expert reviewers/content managers. Knowledge Accelerator provides a framework to build complex systems based solely on small crowdsourcing tasks.

The authors argue that major websites like Wikipedia depend on minor contributors but require an expensive network of dedicated moderators and reviewers to maintain the system. They eliminate these dependencies through a two-phase approach: inducing structure and information cohesion. Inducing structure is done by collecting relevant web pages, extracting relevant text, and creating a topic structure that encodes the clips into categories. Information cohesion is achieved by crowd workers gathering information, improving sections of the overall article without global knowledge, and adding relevant multimedia images.

Reflection

The paper introduces a strategy for knowledge collection that completely removes the necessity of any intermediate moderator/reviewer. KA shows the potential of unstructured discussion forums as sources of information. Interestingly, this is exactly the end goal of my team’s course project. The idea of small-scale structure collection from multiple crowd workers, without any of them having the context of the global article, is generalizable to several areas such as annotating segments of large geographical images, annotating segments of movies/speech, and fake-news detection through the construction of an event timeline.

The paper introduces itself as a break-down strategy for all complex systems into simpler tasks that can be crowdsourced. However, it settles on the narrower problem of structuring and collecting information. For example, information structures and collection are not enough for jobs that involve original creation, such as software, network architectures, etc.

The system relies heavily on crowdsourcing tasks, even though some modules have effective AI counterparts, e.g., inducing the topical structure and searching for relevant sources of information and multimedia components. I think a comparative study would help me understand the reasons for this decision.
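
As one illustration of what such an AI counterpart could look like, a fully automated stand-in for the topic-structuring step might cluster extracted clips with off-the-shelf text features. The clips and cluster count below are invented, and Knowledge Accelerator itself assigns clips to categories through crowd tasks rather than this way:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Invented text clips that a crowd might have extracted from web pages.
    clips = [
        "restart the router and check the cable connection",
        "unplug the modem for thirty seconds before rebooting",
        "update the network driver from the device manager",
        "download the latest driver package from the vendor site",
    ]

    # Cluster clips into a small number of topics (an automated stand-in for
    # the crowd clustering stage described above).
    features = TfidfVectorizer(stop_words="english").fit_transform(clips)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

    for topic, clip in sorted(zip(labels, clips)):
        print(topic, "-", clip)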

The fact that Knowledge Accelerator works better than search sites opens up new avenues of exploration for collecting data by inducing structure in various domains.

Questions

  1. The paper discusses the framework’s application in Question-Answering. What are the other possible applications in other domains? Do you see an example application in a non-AI domain?
  2. I see that the proposed framework is only applicable to collection of existing information. Is there another possible application? Is there a way we can create new information through logical reasoning processes such as deduction, induction and abduction?
  3. The paper mentions that some crowd-work platforms allow complex tasks but require a vetting period between workers and task providers. Do you think a change in these platforms would help? Also, in traditional jobs, interviews enable similar vetting. Is it a waste of time if the quality of work improves?
  4. I found my project similar to the framework in terms of task distribution. Are you using a similar framework in your projects? How are you using the given ideas? Will you be able to integrate this in your project?

Word Count: 524


04/22/2020 – Ziyao Wang – SOLVENT: A Mixed-Initiative System for Finding Analogies Between Research Papers

The authors introduced a mixed-initiative system named SOLVENT in this paper. In the system, humans annotate aspects of research papers that denote their background, purpose, mechanism, and findings. A computational model is used to construct a semantic representation from these annotations, which is valuable for finding analogies among papers. They tested their system through three experiments: the first used annotations from domain-expert researchers, the second used annotations from experts outside the papers’ domain, and the last used crowdsourcing. From the experiments’ results, the authors found that the system was able to detect analogies across different domains, and that even crowdsourcing workers, who have limited domain knowledge, are able to do the annotations. Additionally, the system performs better than similar systems.

Reflections

I used to search only for papers within the computer science area when I did my projects or tried to solve a problem. Of course, sometimes we can get inspired by the ideas in papers from other domains, but it is quite difficult to find such a paper. There are countless papers in various domains, and there has not been an efficient method to find the needed papers from other domains. Additionally, even if we find a paper that is valuable to the problem we are solving, the lack of background knowledge may make it difficult for us to understand the ideas behind the paper.

This system is a great help in the above situation. It lets people find related papers from other domains even if they have limited knowledge of that domain. Even though the number of papers is increasing sharply, we can still find the papers we need efficiently with this system. Before, we could only search for keywords related to a specific area. With this system, we can search for ideas instead of specific words. This is beneficial if some papers use abbreviations or analogies in their titles: with keyword searching alone we might miss these papers, but with idea searching they will be marked as valuable. Also, the human annotations in the system can help us understand the ideas of the papers easily, and we can exclude unrelated papers with high efficiency.

One more point is that cross-domain projects and research are increasing significantly nowadays. For these studies, researchers need to read a huge number of papers in both domains before they can arrive at a novel idea to solve the cross-domain problem. If these researchers had this system, they could find similarities across both domains easily and get an overview of the background, purpose, mechanism, and findings of the papers. The efficiency of such research can be improved, and researchers can find more novel interdisciplinary studies with the help of the system.

Questions:

Will the performance of the system decrease when dealing with a larger database and more difficult papers?

Is it possible to update the system results regularly when newly published papers are added into the database?

Can this system be applied in industry? For example, finding the similarity of the mechanisms in the production of two products and using the findings to improve those mechanisms.
