04/29/2020 – Sushmethaa Muhundan – Accelerating Innovation Through Analogy Mining

This paper explores a mixed-initiative approach to finding analogies in the context of product ideas in order to help people innovate. The system focuses on a corpus of product innovation ideas from Quirky.com and explores the feasibility of learning simple, structural representations of problem schemas to find analogies. The purpose and mechanism of each product idea are extracted and a vector representation is constructed, which is then used to find analogies that inspire people to generate creative ideas. Three experiments were conducted as part of this study. The first experiment involved AMT crowd workers annotating the product ideas to separate the purpose from the mechanism; these annotations were used to construct the vectors used to compare products and compute analogies. As part of the second experiment, 8000 Quirky products were chosen and crowd workers were asked to use the search interface to find analogies for 200 seed documents. Finally, a within-subjects experiment was conducted wherein participants were asked to redesign an existing product. For the given product idea, participants received 12 product inspirations retrieved by the proposed system, 12 retrieved using TF-IDF, and 12 retrieved at random. The proposed system aimed to retrieve near-purpose, far-mechanism analogies to help users come up with innovative ideas. The results showed that the ideas generated using the system's results were more creative than those generated under the other two conditions.
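
To make the near-purpose, far-mechanism idea concrete, here is a minimal Python sketch of how such a retrieval score could be computed once purpose and mechanism vectors exist for each product. The toy vectors, corpus structure, and simple subtraction-based score are my own illustration under those assumptions, not the paper's actual model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def near_purpose_far_mechanism(seed, corpus, top_k=3):
    """Rank candidate products: similar purpose, dissimilar mechanism.

    `seed` and each corpus entry are dicts with 'purpose' and 'mechanism'
    vectors (here: toy arrays; in the paper these come from annotated text).
    """
    scored = []
    for doc_id, doc in corpus.items():
        score = cosine(seed["purpose"], doc["purpose"]) \
                - cosine(seed["mechanism"], doc["mechanism"])
        scored.append((doc_id, score))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

# Toy usage with random vectors standing in for learned representations.
rng = np.random.default_rng(0)
corpus = {f"product_{i}": {"purpose": rng.normal(size=16),
                           "mechanism": rng.normal(size=16)} for i in range(100)}
seed = {"purpose": rng.normal(size=16), "mechanism": rng.normal(size=16)}
print(near_purpose_far_mechanism(seed, corpus))
```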

Analogies often prove to be a source of inspiration and/or competitive analysis. For inspiration in particular, cross-domain analogies are extremely helpful yet difficult to find. Given the rate at which information is being generated, it has become all the more difficult to explore and discover analogies. Systems like this would save time by surfacing potential analogies from a large dataset, and I feel the proposed system would help users come up with creative, alternative solutions compared to traditional information retrieval systems.

With respect to the ideation evaluation experiment, the results showed that the randomly retrieved analogies were actually more successful in helping users come up with creative ideas than the TF-IDF condition. This shows that traditional information retrieval techniques would not work well as-is for analogy prediction and need to be adapted to serve the purpose of inspiring users. I feel there is potential to expand the core concepts of the proposed system and apply them elsewhere. For instance, the analogy-finding approach could be folded into a recommendation system that suggests content related to the user's browsing history.

  • What are your thoughts about the system proposed? Do you think this system is scalable?
  • The study uses only purpose and mechanism to compare and predict analogies. Do you think these parameters are sufficient? Are there any other features that can be extracted to improve the system?
  • The paper mentions the need for extensions to generalize the solution to apply the system to other domains. Apart from the suggestions in the paper, what are some potential extensions?


04/29/2020 – Sushmethaa Muhundan – VisiBlends: A Flexible Workflow for Visual Blends

This work presents VisiBlends, a system for creating visual blends through a flexible workflow that follows an iterative design process. The paper explores the feasibility of decomposing the creative task of designing a visual blend into microtasks that can be completed collaboratively, and asks whether such a system can enable novices to create visual blends easily. The proposed workflow decomposes the process of creating a visual blend into computational techniques and human microtasks. The main tasks in the workflow are brainstorming, finding images for each concept, annotating images for shape and coverage, synthesis, evaluation, and iteration. Three studies were conducted as part of this paper. The first tested the feasibility of decentralized collaboration and involved seven individuals, each taking part in only one of the above-mentioned tasks. The results demonstrated that although they were working only on parts of the workflow, together they were able to successfully create visual blends. The second study tested group collaboration and involved groups of 2-3 people working together to create visual blends. The third study tested the ability of VisiBlends to help novice users create visual blends. Overall, the results showed that due to the flexible, iterative nature of the system, the complex task of creating visual blends was successfully decomposed into microtasks.

I feel that human cognition and AI capabilities have been leveraged innovatively here: VisiBlends decomposes a complex, creative problem into distributable microtasks that draw on both human judgment and artificial intelligence. The paper addresses the fundamental elements involved in the creative task of designing visual blends and introduces the concept of the Single Shape Mapping design pattern.

I feel that VisiBlends is a great tool that allows novice users to create exciting visual content to promote their idea or product. It provides a platform that abstracts away the complexities of creating a visual blend. The user is tasked with finding relevant images and annotating them for shape and coverage; the rest happens automatically, as VisiBlends runs a matching algorithm and synthesizes various potential blends from the user's images. The iterative design flow allows users to modify the suggestions as required. I also really liked the reuse functionality of VisiBlends, which would help with the brainstorming phase and save a lot of time and effort.
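
As a rough illustration of what the shape-based matching step might look like, here is a small Python sketch that pairs images whose annotated shapes agree, so that one object can be swapped into the silhouette of the other. The annotation labels, data structure, and matching rule are simplified assumptions on my part, not VisiBlends' actual implementation.

```python
from itertools import product
from dataclasses import dataclass

@dataclass
class ImageAnnotation:
    concept: str   # which concept the image depicts
    shape: str     # annotated basic shape, e.g. "sphere" or "cylinder"
    coverage: str  # "fills shape" or "pattern only" (illustrative labels)

def candidate_blends(images_a, images_b):
    """Pair images whose annotated shapes match, so one object can be
    swapped into the silhouette of the other. A deliberate simplification
    of the matching step, not the system's actual algorithm."""
    pairs = []
    for a, b in product(images_a, images_b):
        if a.shape == b.shape and {a.coverage, b.coverage} == {"fills shape", "pattern only"}:
            pairs.append((a, b))
    return pairs

orange = [ImageAnnotation("orange", "sphere", "fills shape")]
globe = [ImageAnnotation("globe", "sphere", "pattern only")]
print(candidate_blends(orange, globe))
```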

  • It was interesting to note that each participant was trained on the entire workflow although they were made to contribute only to one step. This helped them understand the big picture and they were able to function more efficiently. Can this finding be applied to other systems to improve their performance?
  • Can similar techniques of decomposition be applied to other creative domains?
  • One challenge mentioned in the paper was regarding the difficulty faced during the search for relevant images where users were not able to find appropriate images. What are some techniques that could be employed to overcome this?


04/22/2020 – Sushmethaa Muhundan – SOLVENT: A Mixed-Initiative System for Finding Analogies between Research Papers

This work aims to explore a mixed-initiative approach to help find analogies between research papers. The study uses research experts as well as crowd-workers to annotate research papers by marking the purpose, mechanism, and findings of the paper. Using these annotations, a semantic vector representation is constructed that can be used to compare different research papers and identify analogies within domains as well as across domains. The paper aims to leverage the inherent causal relationship between purpose and mechanism to build “soft” relational schemas that can be compared to determine analogies. Three studies were conducted as part of this paper. The first study was to test the system’s quality and feasibility by asking domain expert researchers to annotate 50 research papers. The second study was to explore whether the system would be beneficial to actual researchers looking for analogical inspiration. The third study involved using crowd workers as annotators to explore scalability. The results from the studies showed that annotating the purpose and mechanism aspects of research papers is scalable in terms of cost, not critically dependent on annotator expertise, and is generalizable across domains.

I feel that the problem this system is trying to solve is real. While working on research papers, there is often a need to find analogies for inspiration and/or competitive analysis. I have also faced difficulty finding relevant research papers while working on my thesis. If scaled and deployed properly, SOLVENT would definitely be helpful to researchers and could potentially save a lot of time that would otherwise be spent on searching for related papers.

The paper claims that the system's quality is not critically dependent on annotator expertise and that the system can be scaled using crowd workers as annotators. However, the results showed that the annotations of Upwork workers matched expert annotations 78% of the time and those of MTurk workers only 59% of the time. Agreement also varied considerably across papers: a few papers had 96% agreement while a few had only 4%. I am a little skeptical regarding these numbers and I am not convinced that expert annotations are dispensable. I feel that using crowd workers might help the system scale but could have a negative impact on quality.

I found one possible future extension extremely interesting: the possibility of authors annotating their own work. I feel that if each author spends a little extra effort to annotate their own work, a large corpus with high-quality annotations could easily be created, and SOLVENT could produce great results using this corpus.

  • What are your thoughts about the system proposed? Would you want to use this system to aid your research work?
  • The study indicated that the system needs to be vetted with large datasets and the usefulness of the system is yet to be truly tested in real-world settings. Given these limitations, do you think the usage of this system is feasible? Why or why not?
  • One potential extension mentioned in the paper is to combine the content-based approach with graph-based approaches like citation graphs. What are other possible extensions that would enhance the current system?


04/22/2020 – Sushmethaa Muhundan – Opportunities for Automating Email Processing: A Need-Finding Study

This work aims to reduce the effort of senders and receivers in the email management space by designing a useful, general-purpose automation system. It is a need-finding study that explores the potential scope for automation along with the information and computation required to support it. The study also examines existing email automation systems in an attempt to determine which needs have already been addressed. The study employs open-ended surveys to gather needs and categorize them. Common themes that emerged included the need for a richer data model for rules, more ways to manage attention, leveraging internal and external email context, complex processing such as response aggregation, and affordances for senders. The study also developed a platform, YouPS, that enables programmers to write automation scripts in Python while abstracting away the complexity of IMAP API integration. Participants were asked to use YouPS to write scripts that would automate tasks to make email management easier. The results showed that the platform was able to solve problems that were not straightforward to solve in the existing email client ecosystem. The study concluded by listing limitations and highlighting prospective future work.
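
To give a flavor of the kind of rule a user might script on such a platform, here is a minimal, self-contained Python sketch of an inbox rule. The Message class, the folder names, and the busy_mode flag are hypothetical stand-ins of my own; this is not the actual YouPS API.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    subject: str
    folders: list = field(default_factory=list)

def apply_rules(msg, busy_mode=False):
    """Toy inbox rule of the kind YouPS lets users script in Python.
    The Message class and rule logic here are hypothetical, not YouPS's API."""
    if "unsubscribe" in msg.subject.lower():
        msg.folders.append("Newsletters")
    if busy_mode and not msg.sender.endswith("@myteam.example.com"):
        msg.folders.append("Later")   # defer non-team mail while in a "mode"
    return msg

print(apply_rules(Message("news@vendor.example.com",
                          "Weekly digest - unsubscribe anytime"), busy_mode=True))
```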

I found it really interesting that this study provided the YouPS platform to understand what automation scripts would be developed if it were easy to integrate with the existing APIs. After scraping public GitHub repositories for potential automation solutions, the study found that there were few generally accessible solutions. I feel that providing a GUI that enables programmers as well as non-programmers to define rules to structure their inbox, and to schedule outgoing emails using context, would definitely be useful. Such a GUI would be an extension to YouPS that abstracts the API integration layer away so that end users can focus on fulfilling their needs and enhancing productivity.

While it is intuitive that receivers of emails would want automation to help them organize incoming messages, it was interesting that senders also wanted to leverage context and reduce the load on recipients, for instance by scheduling their emails to be delivered when the receiver is not busy. The study mentioned leveraging internal and external context to process emails, and I feel this would definitely be helpful. Filtering emails based on past interactions and creating "modes" to handle incoming emails would be practical. Another need I related to was the aggregation example the study discusses: when an invite is sent to a group of people, an individual email for each response is often unnecessary, and aggregating the responses into a single email with all the details would be far more convenient.

  • The study covered areas where automation would help in the email management space. Which need did you identify with the most? 
  • Apart from the needs identified in the study, what are some other scenarios that you would personally prefer to be automated?
  • The study indicated that participants preferred to develop scripts using YouPS to help organize their emails as opposed to using the rule-authoring interfaces in their mail clients. Would you agree? Why or why not?


04/15/2020 – Sushmethaa Muhundan – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

This work aims to design and evaluate a mixed-initiative approach to fact-checking that blends human knowledge and experience with the efficiency and scalability of automated information retrieval and machine learning. The paper positions automatic fact-checking systems as assistive technology that augments human decision making. The proposed system fulfills three key properties, namely model transparency, support for integrating user knowledge, and quantification and communication of model uncertainty. Three experiments were conducted using MTurk workers to measure participants' performance in predicting the veracity of given claims using the system. The first experiment compared users who performed the task with and without seeing ML predictions. The second compared a static interface with an interactive interface where users could mend or override the predictions of the AI system; results showed that users were generally able to use the interactive controls, although these were of little use when the predictions were already accurate. The last experiment compared a gamified task design with a non-gamified one, but no significant differences in performance were found. The paper also discusses the limitations of the proposed system and explores further research opportunities.

I liked the fact that the focus of the paper was on designing automated systems that are user-friendly rather than on improving prediction accuracy. The paper takes into consideration the human element of human-AI interaction and focuses on making the system better and more meaningful. The proposed system aims to learn from the user and provide a personalized prediction based on the user's opinions and inputs.

I liked the focus on transparency and communication. Transparent models help users better understand the internal workings of the system and hence help build trust. Regarding communication, I feel that conveying the confidence of a prediction helps users make an informed decision. This is much better than a system that might have high precision but does not communicate confidence scores: in cases where such a system makes an error, the consequences are likely to be severe, since the user might blindly follow its prediction.
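
As a minimal sketch of what communicating uncertainty alongside a verdict could look like, the snippet below renders a claim's predicted veracity together with a confidence statement instead of a bare label. The thresholds, wording, and function name are illustrative assumptions, not the paper's interface.

```python
def present_prediction(claim, prob_true):
    """Render a claim-veracity prediction with its confidence instead of a
    bare true/false label. The thresholds and wording are illustrative only."""
    label = "likely true" if prob_true >= 0.5 else "likely false"
    confidence = max(prob_true, 1 - prob_true)
    if confidence < 0.65:
        note = "low confidence - please weigh the sources yourself"
    else:
        note = "moderate to high confidence"
    return f"Claim: {claim}\nModel says: {label} ({confidence:.0%} confidence, {note})"

print(present_prediction("Vitamin C prevents the common cold", prob_true=0.38))
```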

The side effect of making the system transparent was interesting: not only does transparency lead to higher trust, it can also help teach and structure the user's own information literacy skills regarding the logical process to follow when assessing a claim's validity. In this way, the proposed system truly leverages the complementary strengths of the human and the AI.

  • Apart from the three properties incorporated in the study (transparency, support for integrating user knowledge, and communication of model uncertainty), what are some other properties that can be incorporated to better the AI systems?
  • The study aims to leverage the complementary strengths of humans and AI but certain results were inconclusive as noted in the paper. Besides the limitations enumerated in the paper, what are other potential drawbacks of the proposed system?
  • Given that the study presented is in the context of automated fact-checking systems, what other AI systems can these principles be applied to?


04/15/2020 – Sushmethaa Muhundan – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

This work explores the impact of perceived risk on the choice to use a technology. A survey was conducted to assess the mental models of users and technologists regarding the risks of using emerging, data-driven technologies. Guidelines for risk-sensitive design were then explored in order to address and mitigate the perceived risk; this model aims to identify when misaligned risk perceptions may warrant reconsideration of a design. Fifteen technology-related risks were devised, and 175 participants were recruited to assess the perceived risk of each; the participants comprised 26 experts and 149 non-experts. Results showed that technologists were more skeptical than non-experts about using data-driven technologies. The authors therefore urge designers to strive harder to make end users aware of the potential risks involved in their systems. The study recommends that design decisions regarding risk mitigation features for a particular technology be sensitive to the difference between the public's perceived risk and the acceptable marginal perceived risk at that risk level.

Throughout the paper, there is a focus on creating design guidelines that reduce risk exposure and increase public awareness of potential risks, and I feel this is the need of the hour. The paper focuses on identifying remedies that set appropriate expectations in order to help the public make informed decisions. This effort is valuable since it strives to bridge the gap and keep users informed about the reality of the situation.

It is concerning that the results found technologists to be more skeptical than non-experts about using data-driven technologies. This is perturbing because it shows that the risks of the latest technologies are perceived as greater by the group involved in creating them than by the people who use them.

Although the counts of experts and non-experts were skewed, it was interesting that when the results were aggregated, the top three highest perceived risks were the same for both groups; the only difference was the order of ranking.

It was interesting to note that the majority of both groups rated nearly all risks related to emerging technologies as characteristically involuntary. This strongly suggests that the consent procedures in place are not effective: either the information is not being conveyed to users transparently, or it is presented in such a complex manner that it is not understood by end users.

  • In the context of the current technologies we use on a daily basis, which factor is more important from your point of view: personal benefits (personalized content) or privacy?
  • The study involved a total of 175 participants comprised of 26 experts and 149 non-experts. Given that there is a huge difference in these numbers and the divide is not even close to being equal, was it feasible to analyze and draw conclusions from the study conducted?
  • Apart from the suggestions in the study, what are some concrete measures that could be adopted to bridge the gap and keep the users informed about the potential risks associated with technology?


04/08/2020 – Sushmethaa Muhundan – Agency plus automation: Designing artificial intelligence into interactive systems

This work explores strategies for balancing agency and automation by designing user interfaces that enable shared representations between AI and humans. The goal is to productively employ AI methods while ensuring that humans remain in control. Three case studies are discussed: data wrangling, data visualization for exploratory analysis, and natural language translation. Across all three, strategies for integrating agency and automation by incorporating predictive models and feedback into interactive applications are explored. In the first case study, an interactive system is proposed that reduces human effort by recommending potential transformations, gathering feedback from the user, and performing the transformations as necessary. This lets users focus on tasks that require their domain knowledge and expertise rather than spending time and effort performing transformations manually. A similar interactive system was developed to aid visualization efforts, with the aim of encouraging more systematic consideration of the data and revealing potential quality issues. In the case of natural language translation, a mixed-initiative translation approach was explored.
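
The suggest-then-confirm pattern described for the data wrangling case can be sketched in a few lines of Python: the system proposes transformations, but only the ones the user accepts are applied. The heuristics, function names, and data below are illustrative assumptions of mine, not Wrangler's actual predictive model.

```python
def suggest_transforms(column):
    """Propose cleanup transforms for a column of strings; the heuristics
    here are illustrative, not Wrangler's actual suggestion model."""
    suggestions = []
    if any(v.strip() != v for v in column):
        suggestions.append(("strip whitespace", lambda vs: [v.strip() for v in vs]))
    if any(v == "" for v in column):
        suggestions.append(("drop empty rows", lambda vs: [v for v in vs if v != ""]))
    return suggestions

def interactive_wrangle(column, accept):
    """Apply only the suggestions the user accepts, keeping the human in control."""
    for name, fn in suggest_transforms(column):
        if accept(name):          # user feedback decides what actually runs
            column = fn(column)
    return column

data = ["  alice", "", "bob  "]
print(interactive_wrangle(data, accept=lambda name: name == "strip whitespace"))
```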

The paper takes a pragmatic view of current AI systems and makes the realistic observation that they are not capable of completely replacing humans. Throughout the paper there is an emphasis on leveraging the complementary strengths of both the human and the AI, which is practical.

Interesting observations were made in the Data Wrangler project with respect to proactive suggestions. When suggestions were presented upfront, before the user had a chance to interact with the system, the feature received negative feedback and was ignored. However, when the same suggestions were presented while the user was engaging with the system, they were received positively even when they were unrelated to the user's current task. Users viewed themselves as initiators in the latter scenario and hence felt that they were controlling the system. This observation was fascinating since it shows that designers of such user interfaces should ensure that users feel in control and do not feel insecure while using AI systems.

With respect to the second case study, it was reassuring to learn that automated support from the interactive system shifted user behavior for the better and helped broaden users' understanding of the data. Another positive effect was that the system helped humans combat confirmation bias. This shows that if the interface is designed well, AI amplifies the results humans achieve when applying their domain expertise.

  • The paper deals with designing interactive systems where the complementary strengths of agents and automation systems are leveraged. What could be the potential drawbacks of such systems, if any?
  • How would the findings of this paper be translated in the context of your class project? Is there potential to develop similar interactive systems to improve the user experience of the end-users?
  • Apart from the three case studies presented, what are some other domains where such systems can be developed and deployed?


04/08/2020 – Sushmethaa Muhundan – CrowdScape: Interactively Visualizing User Behavior and Output

This work addresses quality issues in the context of crowdsourcing and explores strategies for involving humans in the evaluation process via interactive visualizations and mixed-initiative machine learning. The proposed tool, CrowdScape, aims to ensure quality even in complex or creative settings by leveraging both the end output and workers' behavior patterns to develop insights about performance. CrowdScape is built on top of Mechanical Turk and obtains data from two sources: the MTurk API, to obtain the products of the work done, and Rzeszotarski and Kittur's Task Fingerprinting system, to capture worker behavioral traces. The tool combines these two data sources into an interactive data visualization platform. With respect to worker behavior, raw event logs and aggregate worker features are incorporated to provide diverse interactive visualizations. Four case studies were discussed, covering tasks relating to translation, a color preference survey, writing, and video tagging.

In the context of creative works and complex tasks where it is extremely difficult to evaluate the task results objectively, I feel that mixed-initiative approaches like the one described in the paper can be effective to gauge the worker’s performance.

I particularly liked the ability to aggregate features of worker behavioral traces, wherein the user can dynamically query the visualization system to support data analysis. This gives users control over which features matter to them and lets them focus on those specific behavioral traces, as opposed to being presented with static visualizations, which would have limited impact.

Another interesting feature of the system is that it enables users to cluster submissions based on aggregate event features, which I feel would save the user time and effort and thereby speed up the process.
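
A minimal sketch of clustering submissions by aggregate behavioral features is shown below, assuming numpy and scikit-learn are available. The specific features and toy numbers are my own illustration; CrowdScape derives its own aggregates from the Task Fingerprinting logs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one submission's aggregate behavioral trace:
# [seconds on task, keypress count, scroll events, focus changes].
# The feature set is illustrative, not CrowdScape's actual feature list.
features = np.array([
    [310, 420, 25, 2],   # long, typing-heavy sessions
    [290, 390, 30, 1],
    [45,   12,  2, 9],   # fast, low-effort sessions with many tab switches
    [50,    8,  1, 11],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for trace, label in zip(features, labels):
    print(label, trace)
```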

In the translation case study, it was interesting that one of the behavioral signals tracked was copy-paste keyboard usage, which would intuitively suggest that the worker used third-party translation software. However, this alone might not be sufficient proof, since the worker could have translated the text locally and pasted in his/her own work. This shows that while user behavior tracking can provide insights, it may not be enough to draw conclusions on its own; coupling it with the output data and visualizing both together helps draw more concrete conclusions.

  • Apart from the techniques mentioned in the paper, what are some alternate techniques to gauge the quality of crowd workers in the context of complex or creative tasks?
  • Apart from the case studies presented, what are some other domains where such systems can be developed and deployed?
  • Given that the tool relies on workers’ behavior patterns, and given that these may vary widely from worker to worker, are there situations in which the proposed tool would fail to produce reliable results with respect to performance and quality?


03/25/2020 – Sushmethaa Muhundan – Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

The paper explores the feasibility of a crowd-powered conversational assistant that is capable of automating itself over time. The main intent of building such a system is to dynamically support a vast set of domains by exploiting the capabilities of numerous chatbots and providing a universal portal to help answer users’ questions. The system, Evorus, supports multiple bots and, given a query, predicts which bot’s response is most relevant to the current conversation. This prediction is validated using crowd workers from MTurk, and the response with the most upvotes is sent to the user. The feedback gained from the workers then feeds a learning algorithm that helps improve the system. As part of this study, the Evorus chatbot was integrated with Google Hangouts and users’ queries were presented to MTurk workers via an interface. The workers are shown multiple candidate answers, coming from various bots, for each query; they can upvote or downvote the candidates or respond to the query by typing an appropriate answer themselves. An automatic voting system was also devised with the aim of reducing workers’ involvement in the process. The results of the study showed that Evorus was able to automate itself over time without compromising conversation quality.

I feel that the problem this paper is trying to solve is very real: the current landscape of conversational assistants like Apple’s Siri and Amazon’s Echo is limited to specific commands, and users need to be aware of the supported commands in order to get the most out of them. This often becomes a roadblock, as the AI bots are constrained to specific, pre-defined domains. Evorus tries to solve this problem by creating a platform that integrates multiple bots and leverages their combined skill sets to answer a myriad of questions from different domains.

The paper’s focus on reducing manual intervention through automation while maintaining quality was good. I found the voting bot particularly interesting: a learning algorithm uses the upvotes and downvotes workers provided on previous conversations to learn their voting patterns and make similar decisions automatically. The votes are also used to gauge the quality of each candidate’s responses, which serves as further input for predicting the most suitable bots in the future.
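
A toy Python sketch of that idea is shown below: per-bot vote tallies are turned into a smoothed acceptance score that is used to rank candidate responses. The class, scoring formula, and bot names are my own simplification, not Evorus’s actual learned components.

```python
from collections import defaultdict

class BotSelector:
    """Toy selector that learns which chatbots tend to get upvoted.
    A simplification of Evorus's voting/selection idea, not its implementation."""
    def __init__(self):
        self.upvotes = defaultdict(int)
        self.downvotes = defaultdict(int)

    def record_vote(self, bot, upvote):
        if upvote:
            self.upvotes[bot] += 1
        else:
            self.downvotes[bot] += 1

    def score(self, bot):
        # Laplace-smoothed acceptance rate so unseen bots still get tried.
        up, down = self.upvotes[bot], self.downvotes[bot]
        return (up + 1) / (up + down + 2)

    def rank_candidates(self, candidates):
        """candidates: list of (bot_name, response) pairs for one user query."""
        return sorted(candidates, key=lambda c: self.score(c[0]), reverse=True)

selector = BotSelector()
selector.record_vote("weather_bot", upvote=True)
selector.record_vote("trivia_bot", upvote=False)
print(selector.rank_candidates([("trivia_bot", "..."), ("weather_bot", "...")]))
```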

Fact boards were another interesting feature: they included chat logs and recorded facts, and were part of the interface provided to workers to give context about the conversation. This ensures that workers are brought up to speed and can make informed decisions while responding to users.

  1. Given the scale at which information generation is growing, is the solution proposed in the paper feasible? Can this truly handle diverse domain queries while reducing human efforts drastically and also maintaining quality?
  2. Given the complexity of natural languages, would the proposed AI system be able to completely understand the user’s need and respond with relevant replies without human intervention? Would the role of a human ever become dispensable?
  3. How long do you think would it take for the training to be sufficient to entirely remove the role of a human in the loop in the Evorus system?


03/25/2020 – Sushmethaa Muhundan – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

The primary intent of this paper is to measure the performance of human-AI teams, here in the context of a visual conversational agent, using a cooperative game. Oftentimes the performance of AI systems is evaluated in isolation or through interaction with other AI systems. This paper attempts to understand whether such AI-AI performance evaluation can be extended to predict the performance of the AI system while it interacts with humans, i.e., the human-AI team performance. To measure the effectiveness of the AI in this context, a game-with-a-purpose (GWAP) called GuessWhich is used. The game involves a human player interacting with an AI component that generates clues about a secret image the human cannot see. Through this question-answer exchange, the human asks questions regarding the image and attempts to identify the secret image from a pool of images. Two versions of the AI component are used in the experiment: one trained in a supervised manner, and another pre-trained with supervised learning and fine-tuned via reinforcement learning. The experiment results show that there is no significant performance difference between the two versions when interacting with humans.

The trend of humans interacting with AI, directly or indirectly, has increased exponentially, and therefore it was interesting that the focus of this paper is on the performance of the human-AI team and not the AI in isolation. Since it is becoming increasingly common to use AI alongside humans, a dependency is created that cannot be captured by measuring the performance of the AI component alone.

While the results show that there is no significant performance difference between the two versions of the AI when paired with humans, they also show that improvement under AI-AI performance evaluation does not directly translate into better human-AI team performance. This was an interesting insight that challenges existing AI evaluation norms.

Also, the cooperative game used in the experiments was complicated from a development point of view, and it was interesting to understand how the AI was developed and how the pool of images was selected. The paper also explores the possibility of the MTurk workers discovering the strengths of the AI and framing subsequent questions accordingly in order to leverage those strengths. This is a fascinating possibility, as it ties back to the mental models humans create while interacting with AI systems.

  1. Given that the study was conducted in the context of visual conversational agents, are these results generalizable outside of this context?
  2. It is observed that human-AI team performances are not significantly different for SL when compared to RL. What are some reasons you can think of that explain this observed anomaly?
  3. Given that ALICE is imperfect, what would be the recovery cost of an incorrect answer? Would this substantially impact the performance observed?
