04/22/2020 – Vikram Mohanty – Opportunities for Automating Email Processing: A Need-Finding Study

Authors: Soya Park, Amy X. Zhang, Luke S. Murray, David R. Karger

Summary

This paper addresses the problem of automating email processing. Through an elaborate need-finding exercise with different participants, the paper synthesizes the different aspects of email management that users would want to automate, and the types of information and computation needed to achieve that. The paper also surveys existing email automation software to understand what has already been achieved. The findings show the need for a richer data model for rules, more ways to manage attention, leveraging internal and external email context, complex processing such as response aggregation, and affordances for senders.

Reflection

This paper demonstrates why need-finding exercises are useful, particularly when the scope of automation is endless and one needs to figure out what deserves attention. This approach also helps developers and companies avoid proposing one-size-fits-all solutions and, when it comes to automation, avoid end-to-end automated solutions that often fall short (in the case of email, it is certainly debatable what qualifies as an end-to-end solution). Despite the limitations mentioned in the paper, I feel the paper took steps in the right direction by gathering multiple opinions to help scope the email-processing automation problem down into meaningful categories. Probing developers who have shared code on GitHub certainly added great value by showing how experts think about the problem.

One of the findings was that users re-purposed existing affordances in email clients to fit their personal needs. Does that mean the original design did not factor in user needs? Or does it mean that email clients need to evolve along with these new user needs?

NLP can support building richer data models for emails by learning the latent structures over time. I am sure there's enough data out there for training models. Of course, there will be biases and inaccuracies, but that's where design can help mitigate the consequences.

Most of the needs were filter/rule-based, and therefore it made sense to deploy YouPS and see how participants used it. Going forward, it will be really interesting to see how non-computer scientists use a GUI plus natural-language version of YouPS to fit their needs. The findings there will make it clearer which automation aspects should be prioritized for development first.

As an end-user of an email client, some, if not most, of my actions happen at a sub-conscious level. For example, there are certain types of emails I mark as read without thinking for even a second. I wonder if a need-finding exercise, as described in this paper, would be able to capture those thoughts. Or, in addition to all the categories proposed in this paper, there could also be one where an AI attempts to make sense of your actions and shows you a summary of what it thinks. The user can then reflect on whether the AI's "sensemaking" holds up or needs tweaking, and eventually have it automated. This is a mixed-initiative solution that can, over a period of time, effectively adapt to the user's needs. It certainly depends on the AI being good enough to interpret the patterns in the user's actions.

Questions

  1. Keeping the scope/deadlines of the semester class project aside, would you consider a need-finding exercise for your class project? How would you do it? Who would be the participants?
  2. Did you find the different categories for automated email processing exhaustive? Or would you have added something else?
  3. Do you employ any special rules/patterns in handling your email?


4/22/20 – Lee Lisle – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary

Chan et al.'s paper discusses a way to find similarities in research papers through mixed-initiative analysis. They use a combination of humans, who identify sections of abstracts, and machine learning algorithms, which identify keywords in those sections, in order to distill the research down into a base analogy. They then compare across abstracts to find papers with the same or similar characteristics. This enables researchers to find similar research as well as potentially apply new methods to different problems. They evaluated these techniques through three studies. The first study used grad students reading and annotating abstracts from their own domain as a "best-case" scenario. Their tool worked very well with the annotated data as compared to using all words. The second study looked at helping find analogies to fix similar problems, using out-of-domain experts to annotate abstracts. Their tool found more possible new directions than the all-words baseline tool. Lastly, the third study sought to scale up using crowdsourcing. While the annotations were of lesser quality with the MTurk workers, they still outperformed the all-words baseline.

Personal Reflection

I liked this tool quite a bit, as it seems like a good way to get oneself "unstuck" from the research black hole and find new ways of solving problems. I also enjoyed that the annotations didn't necessarily require domain-specific or even researcher-specific knowledge, even with the various jargon that is used. Furthermore, though it confused me initially, I liked how they used their own abstract as an extra figure of sorts: annotating their own abstract with their own approach was a clever way to show how the approach works without requiring the reader to go through the entire paper.

I did find a few things confusing about their paper, however. They state that the GloVe model doesn't work very well in one section, but then use it in another. Why go back to using it if it had already disappointed the researchers in one phase? Another complication I noticed was that they didn't define the dataset in the third study. Where did the papers come from? I can glean from reading it that it was from one of the prior two studies, but I think it's relevant to ask whether it was the domain-specific or the domain-agnostic dataset (or both).

I was curious about total deployment time for this kind of thing. Did they get all of the papers analyzed by the crowd in 10 minutes? 60 minutes? A day? Given how parallelizable the task is, I can imagine the analysis could be completed very quickly. While this task doesn't need to be performed quickly, fast turnaround could be an excellent bonus of the approach.

Questions

  1. This tool seems extremely useful. When would you use it? What would you hope to find using this tool?
  2.  Is the annotation of 10,000 research papers worth $4000? Why or why not?
  3. Based on their future work, what do you think is the best direction to go with this approach? Considering the cost of the crowdworkers, would you pay for a tool like this, and how much would be reasonable?


04/22/2020 – Vikram Mohanty – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Authors: Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, Aniket Kittur.

Summary

This paper addresses the problem of finding analogies between research problems across similar/different domains by providing computational support. The paper proposes SOLVENT, a mixed-initiative system where humans annotate aspects of research papers that denote their background (the high-level problems being addressed), purpose (the specific problems being addressed), mechanism (how they achieved their purpose), and findings (what they learned/achieved), and a computational model constructs a semantic representation from these annotations that can be used to find analogies among the research papers. The authors evaluated this system against baseline information retrieval approaches and also with potential target users, i.e., researchers. The findings showed that SOLVENT performed significantly better than baseline approaches, and the analogies were useful for the users. The paper also discusses implications for scaling up.

Reflection

This paper demonstrates how human-interpretable feature engineering can improve existing information retrieval approaches. SOLVENT addresses an important problem faced by researchers, i.e., drawing analogies to other research papers. Drawing from my own personal experience, this problem has presented itself at multiple stages, be it while conceptualizing a new problem, figuring out how to implement a solution, trying to validate a new idea, or eventually writing the Related Work section of a paper. It goes without saying that SOLVENT, if commercialized, would be a boon for the thousands of researchers out there. It was nice to see the evaluation include real graduate students, as their validation seemed the most applicable for such a system.

SOLVENT demonstrates the principles of mixed-initiative interfaces effectively by leveraging the complementary strengths of humans and AI. Humans are better at understanding context, in this case that of a research paper. AI can help quickly scan through a database to find other articles with a similar "context". I really like the simple idea behind SOLVENT: how would we, as humans, find analogical ideas? We would look for a similar purpose and/or similar or different mechanisms. So, how about we do just that? It's a great case of human-interpretable intuitions translating into intelligent system design, and it also scores over end-to-end automation. As I have reflected on previous papers, it always helps to look for answers by beginning from the problem and understanding it better, and that is reflected in what SOLVENT ultimately achieves.
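
To make that intuition concrete, here is a minimal sketch of a purpose/mechanism analogy query. This is not the authors' actual pipeline: it assumes scikit-learn is available, uses plain TF-IDF instead of SOLVENT's trained representations, and the three toy papers and their annotations are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: each paper reduced to its annotated purpose and mechanism spans (invented examples).
papers = [
    {"id": "A", "purpose": "reduce crowd worker idle time", "mechanism": "retainer pool with alerts"},
    {"id": "B", "purpose": "reduce server idle time", "mechanism": "work-stealing scheduler"},
    {"id": "C", "purpose": "improve image labeling accuracy", "mechanism": "majority-vote aggregation"},
]

# Separate TF-IDF spaces per facet, so purpose and mechanism similarity can be weighted independently.
purpose_vecs = TfidfVectorizer().fit_transform([p["purpose"] for p in papers])
mechanism_vecs = TfidfVectorizer().fit_transform([p["mechanism"] for p in papers])

# "Similar purpose, different mechanism" query for paper A.
purpose_sim = cosine_similarity(purpose_vecs[0], purpose_vecs).ravel()
mechanism_sim = cosine_similarity(mechanism_vecs[0], mechanism_vecs).ravel()
scores = purpose_sim - mechanism_sim  # reward analogous goals, penalize identical methods

for paper, score in sorted(zip(papers, scores), key=lambda pair: -pair[1]):
    if paper["id"] != "A":
        print(paper["id"], round(float(score), 2))
```

Here, paper B would rank above paper C because it shares A's goal of reducing idle time while using a different mechanism, which is exactly the kind of cross-domain analogy the reflection above describes.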

The findings are definitely interesting, particularly the drive for scaling up. Turkers certainly provided an improvement over the baseline, even though their annotations fared worse than those of the experts and the Upwork crowd. I am not sure what the longer-term implications here are, though. Should Turkers be used to annotate larger datasets? Or should the researchers figure out a way to improve Turker annotations? Or train the annotators? These are all interesting questions. One long-term implication is to re-format abstracts into a background + purpose + mechanism + findings structure right at the writing stage. This still does not solve the problem of the thousands of prior papers that lack such structure. Overall, this paper certainly opens doors for future analogy-mining approaches.

Questions

  1. Should conferences and journals re-format the abstract template into a background + purpose + mechanism + findings to support richer interaction between domains and eventually, accelerate scientific progress?
  2. How would you address annotating larger datasets?
  3. How did you find the feature engineering approach used in the paper? Was it intuitive? How would you have done it differently?


04/22/20 – Jooyoung Whang – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

This paper proposes a novel mixed-initiative method called SOLVENT that has the crowd annotate relevant parts of a document based on purpose and mechanism and then represents the documents in a vector space. The authors identify that representing technical documents using the purpose-mechanism concept with crowd workers faces obstacles such as technical jargon, multiple sub-problems in one document, and the presence of understanding-oriented papers. Therefore, the authors modify the structure to hold background, purpose, mechanism, and findings instead. With each document represented by this structure, the authors were able to apply natural language processing techniques to perform analogical queries. The authors found better query results than the baseline all-words representations. To scale the approach, the authors had workers on Upwork and MTurk annotate technical documents. The authors found that the workers struggled with the concept of purpose and mechanism, but still provided improvements for analogy-mining.

I think this study would go nicely together with document summarization studies. It would especially help since the annotations are done by specific categories. I remember one of our class's projects involved ETDs and required summaries. I think this study could have benefited that project given enough time.

This study could also have benefited my project. One of the sample use-cases that the paper introduced was improving creative collaboration between users. This is similar to my project, which is about providing creative references for a creative writer. However, if I wanted to apply this study to my project, I would need to additionally have the MTurk workers label each of the provided references by purpose and mechanism. This would cost me additional funds for each creative reference. This study would have been very useful if I had enough money and wanted higher-quality content rankings in terms of analogy.

It was interesting that the authors mentioned papers from different domains could still share the same purpose-mechanism pairing. It made me wonder if researchers would really want similar purpose-mechanism papers from a different domain. I understand multi-disciplinary work is being highlighted these days, but would each of the disciplines involved in a study try to address the same purpose and mechanism? Wouldn't they address different components of the project?

The following are the questions that I had while reading the paper.

1. The paper notes that many technical documents are understanding-oriented papers that have no purpose-mechanism mappings. The authors resolved this problem by defining a larger mapping that is able to include these documents. Do you think the query results would have had higher quality if the mapping had been kept compact instead of enlarged? For example, would it have helped if the system separated purpose-mechanism and purpose-findings?

2. As mentioned in my reflection, do you think the disciplines involved in a multi-disciplinary project all have the same purpose and mechanism? If not, why?

3. Would you use this paper for your project? In other words, does your project require users or the system to locate analogies inside a text document? How would you use the system? What kind of queries would you need out of the possible combinations (background, purpose, mechanism, findings)?


04/22/20 – Jooyoung Whang – Opportunities for Automating Email Processing: A Need-Finding Study

In this paper, the authors explore the kinds of automated functionality that users would want from e-mail interfaces. The authors held workshops with technical and non-technical people to learn about these needs. They found needs such as richer e-mail data models involving latent information, internal or external context, using mark-as-read to control notifications, self-destructing event e-mails, different representations of e-mail threads, and content processing. Afterward, the authors mined GitHub repositories that actually held implementations of e-mail automation and labeled them. They found that the prevalent implementations were about automating repetitive processing tasks. Beyond the needs identified in their first probe, the authors also found needs such as using the e-mail inbox as middleware and analyzing e-mail statistics. The authors did a final study by providing users with their own programmable e-mail inbox interface called YouPS.

I really enjoyed reading the section about probes 2 and 3, where actual implementations were done using IMAP libraries. I especially like the one about notifying the respondent using flashing visuals on a Raspberry Pi. It looks like a very creative and fun project. I also noticed that many of the automations were for processing repetitive tasks. This again confirms the machine affordance of being able to handle many repetitive tasks.

I personally thought YouPS was a very useful tool. I also frequently have trouble organizing my tens of thousands of unread e-mails, consisting mainly of advertisements. I think YouPS could serve me nicely in fixing this. I found that YouPS is public and accessible online (https://youps.csail.mit.edu/editor). I will definitely return to this interface once time permits and start dealing with my monstrosity of an inbox. YouPS nicely addresses the complexity of developing a custom inbox management system. I am not familiar with IMAP, which hinders me from implementing e-mail related functionality in my personal projects. A library like YouPS that simplifies the protocol would be very valuable to me.
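
For a sense of the plumbing that YouPS abstracts away, here is a minimal sketch using Python's standard imaplib to mark unread advertising e-mail as read. The server address, credentials, and sender filter are placeholders, and this is raw IMAP shown for comparison, not YouPS's actual API.

```python
import imaplib

# Placeholder host and credentials; a real setup would use an app password or OAuth token.
mail = imaplib.IMAP4_SSL("imap.example.com")
mail.login("me@example.com", "app-password")
mail.select("INBOX")

# Find unread messages from a hypothetical advertising sender.
status, data = mail.search(None, '(UNSEEN FROM "ads@example.com")')
for num in data[0].split():
    mail.store(num, "+FLAGS", "\\Seen")  # mark as read without opening

mail.close()
mail.logout()
```

The appeal of YouPS, as described above, is that a rule writer never has to touch this connection and flag-handling code and can focus on the rule logic itself.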

The following are the questions that I had while reading this paper.

1. What kind of e-mail automation would you want to build, given the ability to implement any automation functionality?

2. The authors mentioned in their limitations that their study's participants were mostly technical programmers. What differences would there be between programmers and non-programmers? If the study had been done with only non-programmers, do you think the authors would have seen a different result? Is there something specifically relevant to programmers that resulted in the existing implementations of e-mail automation? For example, maybe programmers usually deal with more technical e-mails?

3. What interface is desirable for non-programmers to meet their needs? The paper mentions that one participant did not like that current interfaces required many clicks and a lot of typing to create an automation rule, and even then the rules didn't work properly. What would be a good way for non-programmers to develop an automation rule? The creation of a rule requires a lot of logical thinking comprising many if-statements. What would be a minimum requirement or qualification for non-programmers to create an automation rule?


04/22/2020 – Sushmethaa Muhundan – SOLVENT: A Mixed-Initiative System for Finding Analogies between Research Papers

This work aims to explore a mixed-initiative approach to help find analogies between research papers. The study uses research experts as well as crowd-workers to annotate research papers by marking the purpose, mechanism, and findings of the paper. Using these annotations, a semantic vector representation is constructed that can be used to compare different research papers and identify analogies within domains as well as across domains. The paper aims to leverage the inherent causal relationship between purpose and mechanism to build “soft” relational schemas that can be compared to determine analogies. Three studies were conducted as part of this paper. The first study was to test the system’s quality and feasibility by asking domain expert researchers to annotate 50 research papers. The second study was to explore whether the system would be beneficial to actual researchers looking for analogical inspiration. The third study involved using crowd workers as annotators to explore scalability. The results from the studies showed that annotating the purpose and mechanism aspects of research papers is scalable in terms of cost, not critically dependent on annotator expertise, and is generalizable across domains.

I feel that the problem this system is trying to solve is real. While working on research papers, there is often a need to find analogies for inspiration and/or competitive analysis. I have also faced difficulty finding relevant research papers while working on my thesis. If scaled and deployed properly, SOLVENT would definitely be helpful to researchers and could potentially save a lot of time that would otherwise be spent on searching for related papers.

The paper claims that the system quality is not critically dependent on annotator expertise and that the system can be scaled using crowd workers as annotators. However, the results showed that the annotations of Upwork workers matched expert annotations 78% of the time and those of MTurk workers 59% of the time. The agreement also varied considerably: a few papers had 96% agreement while a few had only 4%. I am a little skeptical regarding these numbers, and I am not convinced that expert annotations are dispensable. I feel that using crowd workers might help the system scale, but it might have a negative impact on quality.

I found one possible future extension extremely interesting: the possibility of authors themselves annotating their work. I feel that if each author spent a little extra effort to annotate their own work, a large corpus could easily be created with high-quality annotations. SOLVENT could easily produce great results using this corpus.

  • What are your thoughts about the system proposed? Would you want to use this system to aid your research work?
  • The study indicated that the system needs to be vetted with large datasets and the usefulness of the system is yet to be truly tested in real-world settings. Given these limitations, do you think the usage of this system is feasible? Why or why not?
  • One potential extension mentioned in the paper is to combine the content-based approach with graph-based approaches like citation graphs. What are other possible extensions that would enhance the current system?


04/22/2020 – Sushmethaa Muhundan – Opportunities for Automating Email Processing: A Need-Finding Study

This work aims to reduce the effort of senders and receivers in the email management space by designing a useful, general-purpose automation system. It is a need-finding study that aims to explore the potential scope for automation along with the information and computation required to support that automation. The study also explores existing email automation systems in an attempt to determine which needs have already been addressed. The study employs open-ended surveys to gather needs and categorize them. A need for a richer data model for rules, more ways to manage attention, leveraging internal and external email context, complex processing such as response aggregation, and affordances for senders emerged as common themes from the study. The study also developed a platform, YouPS, that enables programmers to develop automation scripts using Python while abstracting away the complexity of IMAP API integration. Participants were asked to use the YouPS platform to write scripts that would automate tasks to make email management easier. The results showed that the platform was able to solve problems that were not straightforward to solve in the existing email clients' ecosystem. The study concluded by listing limitations and also highlighted prospective future work.

I found it really interesting that this study provided the platform, YouPS, to understand what automation scripts would have been developed if it were easy to integrate with the existing APIs. After scraping public GitHub repositories for potential automation solutions, the study found that there were few solutions that were generally accessible. I feel that providing a GUI that would enable programmers as well as non-programmers to furnish rules to structure their inbox, as well as schedule outgoing emails using context, would definitely be useful. This GUI would be an extension of YouPS that abstracts the API integration layer away so that end-users can focus on fulfilling their needs to enhance productivity.

While it is intuitive that receivers of emails would want automation to help them organize incoming emails, it was interesting that senders also wanted to leverage context and reduce the load on recipients by scheduling their emails to be sent when the receiver is not busy. The study mentioned leveraging internal and external context to process emails, and I feel that this would definitely be helpful. Filtering emails based on past interactions and the creation of "modes" to handle incoming emails would be practical. Another need that I was able to relate to was the aggregation example the study talks about. When an invite is sent to a group of people, an individual email for each response is often unnecessary. Aggregating the responses and presenting a single email with all the details would definitely be convenient.
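
As a toy illustration of that aggregation idea (a hypothetical sketch, not something from the paper or YouPS), a rule could collect the parsed RSVP replies to a single invite and deliver one digest instead of a separate notification per response:

```python
from collections import defaultdict

# Hypothetical parsed replies to one invitation thread: (sender, response).
replies = [
    ("alice@example.com", "yes"),
    ("bob@example.com", "no"),
    ("carol@example.com", "yes"),
]

# Group the responses and build a single digest message instead of three notifications.
by_response = defaultdict(list)
for sender, response in replies:
    by_response[response].append(sender)

digest = "RSVP summary\n" + "\n".join(
    f"{response}: {', '.join(senders)}" for response, senders in by_response.items()
)
print(digest)
```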

  • The study covered areas where automation would help in the email management space. Which need did you identify with the most? 
  • Apart from the needs identified in the study, what are some other scenarios that you would personally prefer to be automated?
  • The study indicated that participants preferred to develop scripts using YouPS to help organize their emails as opposed to using the rule-authoring interfaces in their mail clients. Would you agree? Why or why not?


04/22/2020 – Bipasha Banerjee – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary 

The paper by Chan et al. is an interesting read on finding analogies between research papers. The main domain considered here is scientific papers. The annotation scheme has been divided into four categories, namely background, purpose, mechanism, and findings. The paper's goal is to make it easier for researchers to find related work in their field. The authors conducted three studies to test their approach and its feasibility. The first study consisted of domain experts annotating abstracts in their research area. The second study focused mainly on how the model could tackle the real-world problem where a researcher needs to find relevant papers in their area to act as inspiration, related work, or even baselines for their experiments. The last study, however, was very different from the first two, in which experienced researchers annotated the data or used the system to solve their research problems; the third study instead used crowdworkers to annotate abstracts. The platforms utilized by the authors were Upwork and Amazon Mechanical Turk.

Reflection

The mixed-initiative model developed by the authors is an excellent step in the right direction for finding analogies in scientific papers. There are traditional approaches in natural language processing that help find similarities in textual data. The website [1] gives excellent insight into the steps involved in finding similarities in text. However, when it comes to scientific data, just using these steps is not enough. Additionally, most of the models involved are trained on generic websites and news data (like CNN or DailyMail). Hence, much of the scientific jargon is "out of vocabulary" (OOV) for such models. I therefore appreciate that the authors used human annotations along with traditional methods from information retrieval (like TF-IDF) to tackle the problem at hand.

Additionally, for computing the similarity metric, they took multiple categories into account, like Purpose+Mechanism. This is definitely useful when finding similarities in text data. I also liked the fact that, for the studies, they considered regular crowdworkers in addition to people with domain knowledge. I was intrigued to find that 75% of the time, the annotations of crowdworkers matched those of the researchers. Hence the conclusion that "crowd annotations still improve analogy-mining" is valuable. Not only that, getting researchers in large numbers in one domain just to annotate the data is difficult; sometimes there are very few people in a given domain of research. Rather than having to find researchers who are available to annotate such data, it is good that we can use the crowdsourcing methods already available.

Lastly, I would like to mention that I liked that the paper identified its limitations very well and clearly laid out the scope for future work.

Questions

  1. Would you agree that the level of expertise of the human annotators would not affect the results for your course project? If yes, could you please clarify?

(For my class project, I think I would agree with the paper’s findings. I work on reference string parsing, and I don’t think we need experts just to label the tokens.)

  2. Could we have more complex categories or sub-categories rather than just the four identified?
  3. How would this extend to longer pieces of text, like chapters of book-length documents?

References 

  1. https://medium.com/@Intellica.AI/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c 


04/22/20 – Lulwah AlKulaib – Accelerator

Summary

Most crowdsourcing tasks in the real world are submitted to platforms as one big task due to the difficulty of decomposing tasks into small, independent units. The authors argue that by decomposing and distributing tasks, we could utilize more of the resources provided by crowdsourcing platforms at a lower cost than the traditional method. They propose a computational system that can frame interdependent, small tasks so that together they represent one big-picture system. Realizing this proposal is difficult, so to investigate its viability, the authors prototype the system, test how the distributed information is combined after all tasks are done, and evaluate the output across multiple topics. The system compared well to existing top information sources on the web, and it exceeded or approached the quality ratings of highly curated, reputable sources. The authors also suggest some design patterns that should help other researchers and systems when thinking of breaking big-picture projects into smaller pieces.

Reflection

This was an interesting paper. I hadn't thought about breaking a project down into smaller pieces to save on costs, or that doing so would yield better-quality results. I agree that some existing tasks are too big, complex, and time-consuming, and maybe those need to be broken down into smaller tasks. I still can't imagine how breaking tasks down so small that each costs no more than $1 generalizes well to all the existing projects that we have on Amazon MTurk.

The authors mention that their system, even though it has strong performance, was generated by non-expert workers who did not see the big picture, and that it should not be thought of as a replacement for expert creation and curation of content. I agree with that. No matter how good the crowd is, if they're non-experts and don't have access to the full picture, some information will be missing, which could lead to mistakes and imperfections. That shouldn't be compared to a domain expert who would do a better job even if it costs more. Cost should not be the reason we favor the results of this system.

The design patterns suggested were a useful touch, and the way they were explained helped in understanding the proposed system as well. I think that we should adopt some of these design patterns as best we can in our projects. Learning about this paper this late in our experiment design would make it hard to break our tasks down into simpler ones and test that theory on different topics. I would have loved to see how each of us would have reported on it, since we have an array of different experiments and simplifying some tasks could be impossible.

Discussion

  • What are your thoughts on breaking down tasks to such a small size?
  • Do you think that this could be applicable to all fields and generate similarly good quality? If not, where do you think this might not perform well?
  • Do you think that this system could replace domain experts? Why? Why not?
  • What applications is this system best suited for?


04/22/20 – Lulwah AlKulaib – SOLVENT

Summary

The paper argues that scientific discoveries are often driven by analogies from distant domains. Nowadays, it is difficult for researchers to keep up in finding analogies due to the rapidly growing number of papers in each discipline and the difficulty of finding useful analogies from unfamiliar domains. The authors propose a system to solve this issue: SOLVENT, a mixed-initiative system for finding analogies between research papers. They hire human annotators who structure academic paper abstracts into different aspects, and then a model constructs semantic representations from the provided annotations. The resulting semantic representations are then used to find analogies among research papers within that domain and across different domains. In their studies, they show that the proposed system finds more analogies than existing baseline approaches from the information retrieval field. They outperform the state of the art and show that the annotations can generalize beyond the domain and that the analogies the semantic model found were judged useful by experts. Their system is a step in a new direction towards computationally augmented knowledge sharing between different fields.

Reflection

This was a very interesting paper to read. The authors' use of scientific ontologies and scholarly discourse, like those in the Core Information about Scientific Papers (CISP) ontology, makes me think of how relevant their work is, even though their goal differs from that of the corpora paper. I found the section where they explain adapting the annotation methods for research papers very useful for a different project.

One thing that I had in mind while reading the paper was how scalable this is to larger datasets. As they have shown us in the studies, the datasets are relatively small. The authors explain in the limitations that part of the bottleneck is having a good set of gold-standard matches that they can use to evaluate their approach. I think that's a valid reason, but it still doesn't eliminate the questions of what it would require and how well it would work.

When going over their results and seeing how they outperformed existing state-of-the-art models and approaches, I also thought about real-world applications and how useful this model is. I had never thought of using analogies to drive discovery across scientific domains. I always thought it would be more reliable to have a co-author from that domain who would weigh in. Especially nowadays, with the vast communities of academics and researchers on social media, it's no longer that hard to find someone who could be a collaborator in a domain that isn't yours. Also, looking at their results, their high precision was only in recommending the top k% of most similar analogy pairs. I wonder if automating that has a greater impact than using the knowledge of a domain expert.

Discussion

  • Would you use a similar model in your research?
  • What would be an application that you can think of where this would help you while working on a new domain?
  • Do you think that this system could be outperformed by using a domain expert instead of the automated process?
