04/29/2020 – Mohannad Al Ameedi – DiscoverySpace Suggesting Actions in Complex Software

Summary

In this paper, the authors aim to help beginner users of complex software execute complex tasks in a simple way, to help them build confidence and not lose interest in the software. The approach is implemented as an extension to Photoshop and collects instructions available in the community to build macros that can execute multiple steps to achieve an action or goal. Users might use information available online, but they can get lost in the overload of available information or choose a solution that is not efficient.

The approach offers suggestions to users in the context of their current action to help them execute the next desired action.
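As a rough illustration of what such context-sensitive suggestions might look like, here is my own sketch, not the DiscoverySpace implementation; the contexts, goals, and steps are made-up examples standing in for community-sourced macros:

```python
# Hypothetical sketch: each "macro" bundles the steps a community tutorial
# describes for one goal, keyed by the kind of image the user is editing.
MACROS = {
    "portrait": [
        {"goal": "whiten teeth", "steps": ["select teeth", "hue/saturation", "reduce yellow"]},
        {"goal": "smooth skin", "steps": ["duplicate layer", "gaussian blur", "mask face"]},
    ],
    "landscape": [
        {"goal": "boost sky", "steps": ["select sky", "increase saturation", "add gradient"]},
    ],
}

def suggest(image_context):
    """Return goal-level suggestions that match the current editing context."""
    return [m["goal"] for m in MACROS.get(image_context, [])]
```

The point of the sketch is only that suggestions are filtered by the user's current context instead of dumping every tutorial on them at once.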

The authors asked users to participate in two surveys to measure their confidence level in using the Photoshop application. The participants' experience with photo-editing software varied from beginner to expert. The first survey showed that beginner users may lose confidence when there is no suggestion feature to help them execute tasks. The second survey was about the DiscoverySpace system and showed that beginner users gained confidence when they used a tool that helped them execute complex tasks to achieve their goals.

Reflection

I found the approach used by the authors to be very interesting. Providing suggestions that walk new users through the tasks needed to achieve a goal can help beginners succeed in their job or on the project they are working on.

Using information and instructions available online to build the list of tasks required to achieve a certain action is a nice implementation and can be used in many domains.

The idea of making recommendations based on the context of the current action is more effective than the help options available in the application, which can give too much information and too many links, making it difficult to find a solution, especially for beginner users.

I think this approach can also be used for new hires or new students by offering templates for certain actions. Often new hires and new students need to execute multiple steps to achieve a goal, and failing at these tasks can cause issues that prevent them from having a smooth onboarding experience. I also think this approach can be used in software development to help new programmers use templates to design new systems or solve complex problems. Stack Overflow has solutions to thousands of programming issues and could help in building macros or templates that solve a specific issue.

Questions

  • The approach used by the authors can help beginners execute a set of tasks to accomplish a goal in Photoshop. Can we use a similar approach in a different application or domain?
  • Can you use a similar approach in your project?
  • Do you think this approach can also help experts use these systems, especially when a major new feature is released?

04/29/2020 – Mohannad Al Ameedi – Accelerating Innovation Through Analogy Mining

Summary

In this paper, the authors aim to improve the search for and discovery of ideas by finding analogies in massive, unstructured datasets. Their approach combines crowd workers with a recurrent neural network that learns from a weak structural representation of the data. The authors used a patent dataset of product descriptions and asked crowd workers to extract each product's purpose and mechanism to help find ideas across different domains. They used Amazon Mechanical Turk to hire workers to perform a dual annotation on each product description, labeling the parts of the text related to the purpose of the product and the parts related to the mechanism, or the way the product works. The authors then used a bidirectional recurrent neural network and information retrieval techniques to find deeper and more accurate similarity between the queried idea and the available innovations and research. The authors' approach achieves high precision and recall and can improve retrieval accuracy by 25%.
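A minimal sketch of the dual-vector idea, under my own assumptions rather than the paper's actual model: each product gets a separate purpose vector and mechanism vector, and similarity is a weighted combination of the two. Here the vectors are tiny hand-made embeddings; the paper learns them with a bidirectional RNN over the crowd-annotated spans, and the weighting scheme below is illustrative.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def analogy_score(query, candidate, w_purpose=0.7, w_mechanism=0.3):
    """Weight purpose similarity higher to surface same-purpose,
    different-mechanism matches, i.e. cross-domain analogies."""
    return (w_purpose * cosine(query["purpose"], candidate["purpose"])
            + w_mechanism * cosine(query["mechanism"], candidate["mechanism"]))
```

With this weighting, a candidate that shares the query's purpose but achieves it through a different mechanism still scores highly, which is exactly the "looking at the data from two angles" effect discussed below.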

Reflection

I think the approach used by the authors is very interesting. Extracting the purpose and mechanism from a product description is like looking at the data from two different angles. Calculating similarity based on two vectors is a nice implementation and can help find a close relationship between two subjects in different domains that share a common attribute.

I also like the idea of using deep learning instead of TF-IDF to calculate the similarity between two product descriptions, as it can improve the quality of the search.

I personally use Google Scholar to search for similar ideas, but I had not used the websites mentioned in the paper, which is something I learned while reading it.

This approach could be used as a verification tool when reviewing a copyright application. An idea might be the same as another idea but in a different domain, and the application built by the authors can help find that out.

This approach is like mapping vocabulary to a concept space to improve information retrieval, performing latent semantic indexing rather than just matching keywords. Different words might have the same meaning, and one word might have different meanings. Searching based on keywords might retrieve incorrect results, while searching based on concepts can lead to much more accurate results.

Questions

  • The authors asked the crowd workers to extract two pieces of information, the purpose and mechanism, from the product description. Can we use this approach to solve a different problem?
  • Do you agree with the authors that the recurrent neural network is better than traditional TF-IDF in calculating the similarity for the two vectors? Why or why not?
  • Can you use a similar approach in your project, asking crowd workers to annotate your data from two different perspectives?
  • The authors mentioned more than two websites that store information about patents, have you used these websites?

04/22/2020 – Mohannad Al Ameedi – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary

In this paper, the authors try to give crowd workers a big picture of the system that their small assigned tasks contribute to, which can help them execute each task more efficiently and contribute better to the overall goal. The work also tries to help companies remove the bottleneck caused by the small number of people who normally hold and maintain the big picture, and who pose a serious risk if they leave the company. The authors designed and developed a system, called the Knowledge Accelerator, that crowd workers can use to answer a given question, drawing on relevant resources to answer it in a big-picture context without the need for a moderator. The system starts by asking workers to choose different web pages related to the topic, then extracts the relevant information and clusters it into categories. The system then integrates the information by drafting an article, allows editing of the article, and finally adds supporting images or videos. In this way, the system helps crowd workers see the big picture while completing tasks that serve the overall goal.
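The multi-stage pipeline above can be sketched as a chain of small functions. This is a toy reduction of the idea, not the Knowledge Accelerator's actual implementation; the data shapes (pages with pre-labeled snippets) and function names are my assumptions, and real stages are performed by different crowd workers:

```python
def extract(pages):
    # Stage 1-2: pull the relevant snippets out of the chosen web pages.
    return [snippet for page in pages for snippet in page["snippets"]]

def cluster(clips):
    # Stage 3: group snippets into categories (here, by a given topic label;
    # the real system clusters with worker judgments).
    groups = {}
    for clip in clips:
        groups.setdefault(clip["topic"], []).append(clip["text"])
    return groups

def integrate(clusters):
    # Stage 4-5: draft one article section per category, ready for editing.
    return {topic: " ".join(texts) for topic, texts in clusters.items()}

def run_pipeline(pages):
    return integrate(cluster(extract(pages)))
```

Each stage consumes the previous stage's output, which is how small, independent contributions accumulate into one coherent article.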

Reflection

I found the study in the paper to be very interesting. I agree with the authors that most tasks done by crowd workers are simple, and that it is hard to divide complex tasks that require knowledge of the big picture. Knowing the big picture is very important and is often held by very few people, usually in technical leadership positions, and losing them can cause serious issues.

I like the way the system is designed to provide high-level information about the system while workers are working on small tasks. The pipeline of multi-stage operations used to generate a cohesive article can help workers achieve the goal and also learn more about the topic.

This approach can also be used when building large-scale systems, where many components need to be built by developers who often don't know what the system is trying to accomplish or solve. Normally a developer works on a specific task, such as adding an employee table or building a web service endpoint that receives a request and sends back a response, without knowing who will use the system or what impact their task will have on the whole. I think we could use such a system to help developers understand the big picture, which would help them solve problems in a way that makes a greater impact on the larger problem the system is trying to solve.

Questions

  • The system developed by the authors can help with generating articles about a specific topic, can we use the system to solve different problems?
  • Can we use this approach in software development to help developers understand the big picture of the system that they are trying to build especially when building large systems?
  • Can you think of a way to use a similar approach in your course project?

04/22/2020 – Mohannad Al Ameedi – Opportunities for Automating Email Processing: A Need-Finding Study

Summary

In this paper, the authors aim to study users' needs for email automation and the resources required to achieve that automation, with the goal of informing the design of a good email automation system. They led a workshop to group the requirements into different categories, and they also conducted a survey using human computation to help understand users' needs. After collecting the requirements, the authors performed another study, reviewing an open-source codebase available on GitHub to see which requirements had already been met. After building and running the source code, they asked users to interact with the system to find out what worked well and what did not. They found that there are limitations in the current implementation, especially for complex requirements, and that many requirements are not being met. The authors hope their findings can help future research focus on the needs that are not yet satisfied.

Reflection

I found the method used by the authors to be very interesting. Conducting a survey and leading a workshop to find user requirements, then cross-referencing them with what current implementations do and do not provide, is a nice approach to finding out what has not been implemented yet.

I also like the idea of performing a code analysis on an open-source project and linking the analysis to user requirements. This approach could be used by software companies to search GitHub for existing implementations of certain requirements, rather than just searching for a code implementation of a specific library or tool.

I like the idea of email automation, and I have used rules before to automatically move certain emails to special folders. Nowadays most systems send automatic notifications; these notifications are necessary, but sometimes they make it hard to distinguish emails that need an immediate response from emails that can be reviewed later. I also like that Gmail automatically moves emails containing advertisements to a different folder or view to let the user focus on the important emails.

I agree with the authors that there is a lot of room for improvement in current email automation, but it would be interesting to know what the results would be if email systems like Outlook, Gmail, and Yahoo were investigated in depth, to learn what they have already implemented that was missing from the system the authors studied.

Questions

  • The authors studied the current implementation using one system over a week. Do you think using more than one system, or studying user interactions over multiple weeks or months, might lead to different results?
  • Do you think email automation could be used to send critical business emails that might accidentally include information that shouldn't be sent? How can such systems overcome this issue?
  • Have you used rules to automate email operations? Were they useful?  

04/15/2020 – Mohannad Al Ameedi – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors propose a mixed-initiative approach to fact checking that combines human knowledge and experience with automated information retrieval and machine learning. The paper discusses the challenge of the massive amount of information available on the internet today, some of which may not be accurate, which introduces a risk to information consumers. The proposed system retrieves relevant information about a topic, uses machine learning and natural language processing to assess the factuality of the information, and presents a confidence level to users, letting each user decide whether to rely on the information or do manual research to validate the claims. This partnership between the artificial intelligence system and human interaction can offer effective fact checking that supports human decisions in a scalable way.

Reflection

I found the approach used by the authors to be very interesting. I recently had a discussion with a friend about a topic covered on Wikipedia; I thought the numbers and facts mentioned there were accurate, but it turned out the information was wrong, and he asked me to check an accredited source. If I had been able to use the system proposed in the paper, the accredited source could have been ranked higher than Wikipedia.

The proposed system is very important in our digital age, where so much information is generated on a daily basis. We are not only searching for information; we also receive a great deal of information through social media about current events, some of which have a high impact on our lives. We need to assess the factuality of this information, and the proposed system can help a lot with that.

The proposed system is like a search engine that ranks documents not only by relevance to the search query but also by a fact-checking assessment of the information. The human interaction is like relevance feedback in a search engine, which can improve retrieval and lead to better ranking.
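One way to picture this two-signal ranking is a score that blends relevance with the model's veracity confidence. The linear combination and equal weights below are my own assumptions for illustration, not the paper's actual formula:

```python
def rank(articles, w_relevance=0.5, w_veracity=0.5):
    """Sort articles by a blend of query relevance and veracity confidence,
    both assumed to be scores in [0, 1]."""
    def score(a):
        return w_relevance * a["relevance"] + w_veracity * a["veracity"]
    return sorted(articles, key=score, reverse=True)
```

With such a blend, a highly relevant but dubious article can fall below a slightly less relevant article from a credible source, which matches the Wikipedia anecdote above.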

Questions

  • The AI systems can be biased because the training data can be biased. How can we make the system unbiased?
  • The proposed system uses information retrieval to retrieve relevant articles about a topic, then uses machine learning to validate the source of the information and presents a confidence level for each article. Do you think the system should filter out articles with poor accuracy, since they might confuse the user? Or might they still be valuable?
  • With the increased use of social networking, many individuals write or share fake news, intentionally or unintentionally, and millions of people post information every day. Can we use the proposed system to assess fake news? If so, can we scale the system to assess millions or billions of tweets and posts?

04/09/2020 – Mohannad Al Ameedi – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

In this paper, the author proposes multiple systems that combine the power of artificial intelligence and human computation and overcome each one's weaknesses. The author argues that automating all tasks can lead to poor results, as a human is needed to review and revise the output to get the best results. The author uses autocomplete and spell checkers as examples of artificial intelligence offering suggestions that a human can then review, revise, or dismiss. The author proposes different systems that use predictive interaction to partially automate users' tasks, letting users focus more on the things they care about. One of these systems, Data Wrangler, can be used by data analysts during data preprocessing to help clean up data, saving more than 80% of their work; users set up some data mappings and can accept or reject the suggestions. The author also presents a project called Voyager that helps with data visualization for exploratory analysis by suggesting visualization elements. The author suggests using AI to automate repetitive tasks and offer the best suggestions and recommendations, letting the human decide whether to accept or reject them. This kind of interaction can improve both machine learning results and human interaction.
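The accept/reject loop of predictive interaction can be reduced to a very small sketch. This is my own simplification under stated assumptions, not Wrangler's actual behavior: the only transform here is whitespace stripping, whereas the real tool infers far richer transforms from the user's example interactions.

```python
def suggest_transform(column):
    """Propose one cleanup transform for a column of strings, or None."""
    if any(v.strip() != v for v in column):
        return ("strip_whitespace", [v.strip() for v in column])
    return None

def interactive_clean(column, accept):
    """accept is the human: a callback that approves or rejects a suggestion."""
    suggestion = suggest_transform(column)
    if suggestion and accept(suggestion[0]):
        return suggestion[1]   # human accepted: apply the transform
    return column              # rejected, or nothing to suggest
```

The division of labor is the point: the machine proposes repetitive work, and the human keeps the final say.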

Reflection

I found the material presented in the paper to be very interesting. The long-running discussion about whether machines can replace humans is addressed in this paper: the author argues that machines can do well with the help of humans, and that a human in the loop will always be necessary.

I also like the idea of the Data Wrangler system, as many data analysts and developers spend considerable time cleaning up data, and most of the steps are repeated regardless of the type of data. Automating these steps will help a lot of people do more effective work and focus on the problem they are trying to solve, rather than spending time on things that are not directly related to it.

I agree with the author that humans will always be in the loop, especially for systems that will be used by humans. Advances in AI need humans to annotate and label data in order to work effectively, and also to measure and evaluate the results.

Questions

  • The author mentioned that the Data Wrangler system can be used by data analysts to help with data preprocessing. Do you think this system could also be used by data scientists, since most machine learning and deep learning projects require data cleanup?
  • Can you give other examples of AI-infused interactive systems that help in different domains, can be deployed into production for a large number of users, and can scale well with increased load and demand?

04/08/2020 – Mohannad Al Ameedi – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors propose a system that evaluates complex tasks based on both workers' outputs and their behaviors. Other available systems focus on only one aspect of evaluation, either the worker's output or their behavior, which can give poor results, especially for complex or creative work. The proposed system, CrowdScape, combines the two through interactive visualization and mixed-initiative machine learning. It offers visualizations that allow requesters to filter out poor output, focus on a limited number of responses, and use machine learning to measure the similarity of each response to the best submissions; in this way the requester gets the best output and the best behavior at the same time. The system provides time-series data about user actions, such as mouse movement and scrolling, to generate a visual timeline for tracing user behavior. The system only works with web pages and has some limitations, but the value it provides is high, and it enables requesters to navigate workers' results easily and efficiently.
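The two-signal idea can be illustrated with a crude sketch, my simplification rather than CrowdScape itself: filter submissions by one behavioral signal (time on task), then rank the survivors by word-overlap similarity to a known-good submission. CrowdScape uses much richer behavioral traces and interactive, mixed-initiative selection instead of these fixed thresholds.

```python
def overlap(a, b):
    # Jaccard similarity over word sets, as a stand-in for real text similarity.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def best_outputs(submissions, gold, min_seconds=30):
    """Drop rushed submissions (behavior), then rank by similarity to a
    known-good answer (output)."""
    kept = [s for s in submissions if s["seconds"] >= min_seconds]
    return sorted(kept, key=lambda s: overlap(s["text"], gold), reverse=True)
```

Neither signal alone suffices: a slow worker can still produce junk, and a fast copy-paste can look superficially similar, which is why the paper combines them.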

Reflection

I found the method used by the authors to be very interesting. Requesters receive a great deal of information about workers, and visualizing that data can help requesters understand it, while machine learning can help a lot in classifying or clustering the optimal worker outputs and behaviors. The other approaches mentioned in the paper are also interesting, especially for simple tasks that don't need complex evaluation.

I also didn’t know that we can get such detailed information about workers’ output and behavior, and I found the YouTube example mentioned in the paper to be very interesting. The example shows that, with the help of JavaScript, MTurk can return everything related to the user’s actions while working on the YouTube video, which could be used in many scenarios. I agree with the authors about combining the best of the two approaches. I think it would be interesting to know how many worker responses are filtered out in the first phase of the process, because that can tell us whether posting the request was even worthwhile; if too many responses are discarded, the task may need to be reevaluated.

Questions

  • The authors mentioned that their proposed system can help filter out poor outputs in the first phase. Do you think that if too many responses are filtered out, it means the guidelines or the selection criteria need to be reevaluated?
  • The authors depend on JavaScript to track information about workers’ behaviors. Do you think MTurk needs to approve that, or is it not necessary? And do you think workers should be notified before accepting the task?
  • The authors mention that CrowdScape can be used to evaluate complex and creative tasks. Do you think they need to add a process to make sure a task really needs to be evaluated by their system, or do you think the system can also work with simple tasks?

03/25/2020 – Mohannad Al Ameedi – “Like Having a Really bad PA”: The Gulf between User Expectation and Experience of Conversational Agents

Summary

In this paper, the authors try to understand the user experience of conversational agents by examining the factors that motivate users to work with these agents, and they propose design considerations to overcome current limitations and improve human interaction. During their study, they found a huge gap between user expectations and conversational agents’ actual operation.

They also found that there are limited studies on how agents are used on a daily basis, and most existing studies were focused not on user experience but on technical architecture, language learning, and other areas.

The authors conducted interviews with 14 individuals who use conversational agents regularly, with ages varying from 25 to 60 years. Some of these individuals had in-depth technical knowledge, while the others were regular users of technology.

They found that the key motivation for using conversational agents was saving time: users ask the CA to execute simple tasks that normally require multiple steps, like checking the weather, setting reminders, setting alarms, or getting directions. They also found that users started their engagement through playful interaction, like asking the CA to tell a joke or play music. Only a few users, those with technical knowledge, reported using these systems for basic work-related tasks.

Users’ interactions were mainly on non-critical tasks, and they reported that the agents were not very successful when asked to execute complex tasks. The study shows that users don’t trust conversational agents for critical tasks like sending emails or making phone calls, and they want a visual confirmation to complete these kinds of tasks. They also mentioned that these systems don’t accept feedback and offer no transparency about how things work internally.

The authors suggest, as areas for future investigation and development: considering ways to reveal system intelligence, reconsidering the interactional promise made by humorous engagement, considering how best to indicate capability through interaction, and rethinking system feedback and design goals in light of the dominant use case.

Reflection

I found the results reported by the study to be very interesting. Most users learned to use these CA systems as they went, trying different words and keywords until something worked, and the conversational agents failed to have a natural interaction with humans.

I had also thought that companies like Google, Amazon, Microsoft, and Facebook had developed conversational systems that could do much more than answer simple questions, but it appears that is not the case. These companies have developed very sophisticated AI systems and services, so it seems to me that some limitations, such as computational power or latency considerations, are preventing these systems from performing well.

I agree with the authors that providing feedback can improve human interaction with CA systems, and that communicating capabilities can lower expectations, which reduces the gap between expectation and operation.

Questions

  • The authors mention that most users felt unsure as to whether their conversational agents had a capacity to learn. Can we use reinforcement learning to help the CA adapt and learn while engaging with users in a single session?
  • The authors mentioned that CA systems are generally good with simple tasks but struggle with complex tasks and with understanding human requests. Do you think there are technical limitations or other factors preventing these systems from performing well with humans? What are those factors?
  • The authors mentioned that in most instances, CA systems failed to bridge the gap between user expectation and system operation. If that is the case for conversational agents, do you think we are far away from deploying autonomous cars, which are far more complicated than CAs and interact directly with their environment, in real-world settings?

03/25/2020 – Mohannad Al Ameedi – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Summary

Improvements in artificial intelligence systems are normally measured in isolation, without taking the human element into consideration. In this paper, the authors measure and evaluate human-AI team performance by designing an interactive visual conversational agent that involves both a human and an AI solving a specific problem. The agent assigns the AI system a secret image with a caption that is not known to the human, and the human asks rounds of questions to guess the correct image from a pool of images. The agent maintains an internal memory of questions and answers to help maintain the conversation.

The authors use two versions of the AI system: the first is trained using supervised learning and the second using reinforcement learning. The second system outperforms the first in isolation, but the improvement doesn’t translate well when interacting with a human, which shows that advances in an AI system don’t necessarily mean advances in human-AI team performance.

Reflection

I found the idea of running two AI systems with the same humans to be very interesting. Normally we assume that advances in an AI system lead to better use by humans, but the study shows that this is not the case. Putting the human in the loop while improving the AI system gives us the real performance of the system.

I also found the use of reinforcement learning in conversational agents to be very interesting. Using online learning with positive and negative rewards can help improve the conversation between the human and the AI system, and can prevent the system from getting stuck on the same answer when the human asks the same question.

The work is somewhat like the concept of compatibility, where the human builds a mental model of the AI system. Advances in the AI system might not translate into better use by the human, and this is what the authors demonstrated when they used two AI systems, one better than the other, but found that the improvement did not necessarily translate into better performance for the users.

Questions

  • The authors showed that improvement in the AI system alone doesn’t necessarily lead to better performance when humans use the system. Can we involve humans in the process of improving the AI system so that its improvements do lead to better team performance?
  • The authors use a single secret image known to the AI system but not the human. Could we make the image unknown to the AI system too, by providing a pool of images and having the AI system select the appropriate image? And could we do that with acceptable response latency?
  • If we had to use conversational agents like bots in a production setting, do you think an AI system trained with supervised learning would respond faster than one trained with reinforcement learning, given that reinforcement learning needs to adjust its behavior based on rewards or feedback?

02/25/2020 – Mohannad Al Ameedi – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Summary

In this paper, the authors study the effect of updating an AI system on human-AI team performance. The study focuses on decision-making systems, where users decide whether to accept the AI system’s recommendation or make the decision through a manual process. The authors call the experience users build up over the course of using the system a mental model. Improving the accuracy of the AI system might disturb users’ mental models and decrease the overall performance of the team. The paper mentions two examples, a readmission system used by doctors to predict whether a patient will be readmitted and another system used by judges, and shows the negative impact of updates on both. The authors propose a platform in which users recognize objects, which can build up users’ mental models, give users rewards, and gather feedback to improve overall performance, encompassing both the AI system’s accuracy and its compatibility.
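One way to make the compatibility idea concrete is a sketch of a score I am assuming for illustration (it may not match the paper's exact definition): an update is compatible to the extent that the new model stays correct on the examples the old model already got right, since those are the cases the user has learned to trust.

```python
def compatibility(old_correct, new_correct):
    """old_correct / new_correct: per-example booleans for the two model
    versions. Returns the fraction of previously-correct examples on which
    the updated model is still correct."""
    trusted = [i for i, ok in enumerate(old_correct) if ok]
    if not trusted:
        return 1.0  # nothing was trusted before, so nothing can be broken
    return sum(new_correct[i] for i in trusted) / len(trusted)
```

An update can raise overall accuracy yet score poorly here, which is exactly the tradeoff the paper describes: new mistakes on previously trusted cases are what break the user's mental model.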

Reflection

I found the idea of compatibility very interesting. I always thought that the performance of an AI model on the validation set was the only factor to take into consideration, and I never thought about the negative effect on the user’s experience or mental model. Now I can see that the compatibility/performance tradeoff is key to deploying a successful AI agent.

At the beginning, I thought the word compatibility was not the right term for the subject. My understanding was that compatibility in software systems refers to making sure a newer version of a system still works across different versions of the operating system, but now I think the user plays a role similar to the operating system when dealing with the AI agent.

Updating the AI system looks similar to updating the user interface of an application, where users might not like a newly added feature or the new way the system handles a task.

Questions

  • The authors mention the patient readmission and judge examples to demonstrate how an AI update might affect users. Are there any other examples?
  • The authors propose a platform that can collect user feedback, but not in a real-world setting. Could we build a platform that collects feedback at run time using reinforcement learning, where a reward is calculated on each user action and the system adjusts whether to use the current or the previous model?
  • If we want to use crowdsourcing to improve the performance/compatibility of an AI system, the challenge will be building a mental model for the user, since different users will take different tasks and we have no control over choosing the same worker every time. Is there any idea that can help in using crowdsourcing to improve the AI agent?
