4/29/2020 – Nurendra Choudhary – Accelerating Innovation Through Analogy Mining

Summary

In this paper, the authors argue for an automated tool that finds analogies in large research repositories such as the US patent database. Previous approaches in the area include manually constructed structured corpora and automated methods that can find semantically relevant research but cannot identify the underlying structure of documents. Manual corpora are expensive to construct and maintain, whereas automated detection is inefficient because it lacks structure identification.

The authors propose an architecture that defines a structured purpose-mechanism schema for identifying analogies between two research papers. The purpose and mechanism are annotated by crowd workers, and word vectorization represents each part of the schema as a vector. Similarity is measured as the cosine similarity between query vectors. The query types in the experiments are purpose only, mechanism only, and the concatenation of purpose and mechanism. Using the Precision@K metric, the authors conclude that mechanism-only and concatenated purpose-mechanism queries are more effective than the other query types.
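To make the retrieval and evaluation steps concrete, here is a minimal, hypothetical sketch (not the authors' code) that scores candidate papers against a concatenated purpose-mechanism query with cosine similarity and then reports Precision@K; the vectors and relevance labels are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved papers that are true analogies."""
    top_k = ranked_ids[:k]
    return sum(1 for pid in top_k if pid in relevant_ids) / k

# Hypothetical corpus: paper id -> (purpose vector, mechanism vector).
corpus = {
    "paper_a": (np.array([0.9, 0.1, 0.0]), np.array([0.2, 0.7, 0.1])),
    "paper_b": (np.array([0.1, 0.8, 0.1]), np.array([0.6, 0.3, 0.1])),
    "paper_c": (np.array([0.8, 0.2, 0.0]), np.array([0.1, 0.8, 0.1])),
}

query_purpose = np.array([0.85, 0.15, 0.0])
query_mechanism = np.array([0.15, 0.75, 0.1])
# "Concat" query: purpose and mechanism vectors joined into one vector.
query_concat = np.concatenate([query_purpose, query_mechanism])

scores = {
    pid: cosine_similarity(query_concat, np.concatenate([p, m]))
    for pid, (p, m) in corpus.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

relevant = {"paper_a", "paper_c"}  # made-up gold analogies
print(ranked, precision_at_k(ranked, relevant, k=2))
```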

Reflection

The paper is very similar to SOLVENT, discussed in the previous class. I believe they were both developed by the same research group and share some of the same authors. I also believe SOLVENT addresses several of the problems raised in this paper, for example, that the purpose-mechanism schema cannot be generalized to all research fields and that additional schema parts are needed to make it work for a wider range of fields.

The baselines do not utilize the entire paper. I do not think it is fair to compare only abstracts across different domains to find analogies. Abstracts do not always describe the problem or solution in the necessary depth. In my opinion, we should add more sections such as the Introduction, Methodology, and Conclusion. I am not sure whether they would perform better, but I would like to see these metrics reported. Also, the diversity of fields used in the experiments is limited to engineering backgrounds. I think this should be expanded to include other fields such as medicine, business, and the humanities (a lot of early scientists were philosophers :P).

Questions

  1. What problems in this paper does SOLVENT solve? Do you think the improvement in performance is worth the additional memory utilized?
  2. How do you think this framework will help in inspiring new work? Can we put our ideas into a purpose-mechanism schema to get a set of relevant analogies that may inspire further research?
  3. The authors only utilize the abstract to find analogies. Do you think this is a good enough baseline? Should we utilize the entire paper as a baseline? What are the advantages and disadvantages of such an approach? Would it not be fairer?
  4. Can we currently learn purpose-mechanism schemas for different fields independently and map between them? Is there a limit to the amount of variation this framework can handle? For example, is it fair to compare medical papers’ abstracts to CS papers’ abstracts given the stark differences between them?

Word Count: 509


4/29/2020 – Nurendra Choudhary – DiscoverySpace: Suggesting Actions in Complex Software

Summary

In this paper, the authors introduce DiscoverySpace, an extension to Adobe Photoshop that suggests high-level macro actions based on visual features. Complex platforms such as Photoshop are great tools for aiding creativity. However, their features are complex for beginners, making for a steep learning curve. DiscoverySpace surfaces one-click actions shared by the online community and makes these macro actions available to new users, thus softening their introduction to the software.

For the experiment, the authors maintain two independent groups. One group has access to the DiscoverySpace panel in Photoshop, and the other only has the basic tool. The experiments show that DiscoverySpace helps beginners by suggesting initial macro actions. Subjects in the no-tool group were frustrated with the tool’s complex features and produced worse results than subjects in the DiscoverySpace group. The authors also suggest that some steps in the process can be replaced by future advances in AI algorithms, which would speed up the workflow.

Reflection

The paper is really interesting in its approach of reducing the system’s complexity by integrating macro-action suggestions. The framework is very generalizable and could work in many complex software tools, such as Excel (to help with common macro functions), PowerPoint (to apply popular transitions or slide formats), and AI frameworks (to pre-build popular networks). Another important aspect is that such technologies are already being applied in several places. Voice assistants offer specific suggestions to introduce users to common tasks such as setting alarms, checking the weather, etc.

However, the study group is very small, and I do not understand the reason for this. The tasks could be put into an MTurk-style format and given to many more users. Given the length of the task (~30 min), the authors could also use train-and-work platforms such as Upwork. As it stands, I believe the conclusions of the paper are very specific to the subjects. Also, the authors suggest the potential of integrating AI systems into their framework; it would help if more examples of such integrations were given.

Also, utilizing DiscoverySpace-like mechanisms draws in more users. This provides a monetary incentive for businesses to invest in more such ideas. One example is the paper-clip assistant (Clippy) in early versions of Microsoft Office that introduced users to the software.

Questions

  1. I believe machine learning frameworks like TensorFlow and PyTorch ship examples to introduce themselves to beginners, and they could benefit from a DiscoverySpace-like action-suggestion mechanism. Can you give some examples of software in your research area that could benefit from such frameworks?
  2. I believe the limited number of subjects is a huge drawback to trust in the conclusions of the paper. Can you provide some suggestions on how the experiments could be scaled to utilize more workers at a limited cost?
  3. The authors provide the example of using advances in image analysis to replace a part of DiscoverySpace. Can you think of some other frameworks that have replaceable parts? Should we develop more architectures based on this idea that they can be replaced by advances in AI?
  4. Can you give some examples of systems that already utilize a DiscoverySpace-like framework to draw in more users?

Word Count: 538


4/22/2020 – Nurendra Choudhary – SOLVENT: A Mixed Initiative System for Finding Analogies Between Research Papers

Summary

SOLVENT aims to find analogies between research papers in different fields; e.g., simulated annealing in AI optimization is derived from annealing in metallurgy, and information foraging from animal foraging. It extracts the core ideas of a research paper according to a purpose-mechanism schema: the purpose (what the paper is trying to achieve) and the mechanism (how it achieves that purpose).

Research papers cannot always be put into a purpose-mechanism schema due to complex language, hierarchies of problems, and the distinction between mechanisms and findings. Hence, the authors propose a modified annotation scheme that includes background (the context of the problem), purpose (the main problem being solved by the paper), mechanism (the method developed to solve the problem), and findings (the conclusions of the work; for understanding-type papers, this part carries the most information). Queries against the schema use cosine similarity over tf-idf-weighted averages of word vectors. The authors scale up with crowd workers because expert annotation is prohibitively expensive; however, this introduces significant disagreement between crowd workers and experts.
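To make the querying step concrete, here is a rough sketch under my own assumptions (not the authors' code) of building a tf-idf-weighted average of word vectors for one schema part and comparing two parts with cosine similarity; the tiny corpus and the random stand-in "word vectors" take the place of real annotations and pretrained embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical "purpose" annotations from two papers.
purposes = [
    "reduce drag on aircraft wings using surface textures",
    "reduce friction in pipelines using textured coatings",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(purposes)            # (docs x vocab) tf-idf weights
vocab = vectorizer.get_feature_names_out()

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=50) for w in vocab}  # stand-in embeddings

def doc_vector(doc_idx):
    """tf-idf-weighted average of the word vectors in one document."""
    weights = tfidf[doc_idx].toarray().ravel()
    vecs = np.array([word_vectors[w] for w in vocab])
    return weights @ vecs / (weights.sum() + 1e-9)

v0, v1 = doc_vector(0), doc_vector(1)
cosine = float(v0 @ v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
print(f"purpose similarity: {cosine:.3f}")
```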

Reflection

The paper strikes an amazing (nothing’s perfect, right :P) balance between human annotation capabilities and AI’s ability to analyze huge information sources to solve the problem of analogy retrieval. Additionally, the paper hints at a probable future of crowd work in which tasks become increasingly complex for regular workers. We have discussed this evolution in several classes before, and the paper is a contemporary example of this move towards complexity. The study showing its application in a real-world research team provides a great example that other research teams can borrow.

I would like to see a report of the performance of the Background+Purpose+Mechanism+Findings query. I do not understand the reason for its omission (probably space issues). The comparison baseline uses abstract-based queries, but a system could potentially have access to the entire paper; I think that would strengthen the comparative study. The researchers evaluate expert researchers and crowd workers as annotators; an analysis of utilizing undergraduate/graduate students should also be done. As the authors point out, the study set is limited due to the need for expert annotations. However, little diversity is seen in the fields of study:

“Inspiration: find relevant research that can suggest new properties of polymers to leverage, or new techniques for stretching/folding or exploring 2D/3D structure designs”

Fields searched: materials science, civil engineering, and aerospace engineering.

Questions

  1. The paper shows the complexity of future crowd-work. Is the solution limited to area experts? Is there a way to simplify or better define the tasks for a regular crowd? If not, what is the limit of a regular crowd-worker’s potential?
  2. How important is finding such analogies in your research fields? Would you apply this framework in your research projects?
  3. Analogies are meant for easier understanding and inspiration. SOLVENT has a limited application in inspiring novel work. Do you agree? What could be a possible scenario where it inspires novel work?
  4. Compared to MTurk, Upwork workers show better agreement with researchers across the board. What do you think is the reason? When would you use MTurk vs. Upwork? Do you think higher pay would proportionally improve the work quality as well?

Word Count: 532


4/22/2020 – Nurendra Choudhary – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary

In this paper, the authors aim to provide a framework that deconstructs complex work into smaller tasks that can be easily managed and completed by crowd workers without the need for supervision. Currently, crowdsourcing is predominantly used for small tasks within a larger system that depends on expert reviewers or content managers. Knowledge Accelerator provides a framework to build complex artifacts solely from small crowdsourcing tasks.

The authors argue that major websites like Wikipedia depend on minor contributors but require an expensive network of dedicated moderators and reviewers to maintain the system. Knowledge Accelerator eliminates these roles through a two-phase approach: inducing structure and achieving information cohesion. Inducing structure is done by collecting relevant web pages, extracting relevant text clips, and creating a topic structure that assigns the clips to categories. Information cohesion is achieved by crowd workers improving sections of the overall article, without any global knowledge, and adding relevant multimedia such as images.

Reflection

The paper introduces a strategy for knowledge collection that completely removes the need for any intermediate moderator or reviewer. KA shows the potential of unstructured discussion forums as sources of information; interestingly, this is exactly the end goal of my team’s course project. The idea of collecting structure in small pieces from multiple crowd workers, none of whom has context of the global article, generalizes to several areas, such as annotating segments of large geographical images, annotating segments of movies/speech, and fake-news detection through the construction of an event timeline.

The paper presents itself as a strategy for breaking down any complex system into simpler tasks that can be crowdsourced. However, it settles on the narrower problem of structuring and collecting information. For example, information structuring and collection are not enough for jobs that involve original creation, such as software or network architectures.

The system relies heavily on crowdsourcing tasks, yet some modules have effective AI counterparts, e.g., inducing topical structure, searching for relevant sources of information, and selecting multimedia components. I think a comparative study would help me understand the reasons for this design decision; a rough sketch of one such counterpart follows below.
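As a concrete example of what such an AI counterpart could look like (purely my own sketch, not anything from the paper, which induces structure through crowd tasks), topical structure could be approximated by clustering the collected text clips; the clips and cluster count here are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical text clips extracted from web pages on a question topic.
clips = [
    "Flush the radiator before replacing the coolant.",
    "Use a 50/50 mix of antifreeze and distilled water.",
    "Check the thermostat if the engine keeps overheating.",
    "A stuck thermostat is a common cause of overheating.",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(clips)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

# Group clips into an induced "topic structure" for downstream crowd tasks.
topics = {}
for clip, label in zip(clips, labels):
    topics.setdefault(label, []).append(clip)
for label, members in topics.items():
    print(f"topic {label}: {members}")
```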

The fact that Knowledge Accelerator works better than search sites opens up new avenues of exploration for collecting data by inducing structure in various domains.

Questions

  1. The paper discusses the framework’s application in Question-Answering. What are the other possible applications in other domains? Do you see an example application in a non-AI domain?
  2. I see that the proposed framework is only applicable to collection of existing information. Is there another possible application? Is there a way we can create new information through logical reasoning processes such as deduction, induction and abduction?
  3. The paper mentions that some crowd-work platforms allow complex tasks but require a vetting period between workers and task providers. Do you think a change in these platforms would help? Also, in traditional jobs, interviews enable similar vetting. Is it a waste of time if the quality of work improves?
  4. I found my project similar to the framework in terms of task distribution. Are you using a similar framework in your projects? How are you using the given ideas? Will you be able to integrate this in your project?

Word Count: 524


4/15/2020 – Nurendra Choudhary – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

Summary

In this paper, the authors study how human mental models perceive the risks associated with AI systems. For analyzing risk perception, they study 175 individuals, both individually and comparatively, while also accounting for psychological factors. Additionally, they analyze the factors that lead to people’s conceptions or misconceptions in risk assessment. Their analysis shows that technologists and AI experts consider the studied risks to pose a greater threat to society than non-experts do. Such differences, according to the authors, can be used to inform system design and decision-making.

However, most of the subjects agree that such risks (identity theft, personal filter bubbles) were not introduced into the systems deliberately but are consequences or side effects of integrating valuable tools or services. The paper also discusses risk-sensitive designs that should be applied when the gap between public and expert opinion on a risk is large. The authors emphasize integrating risk sensitivity earlier in the design process rather than the current practice, where it is an afterthought of a deployed system.

Reflection

Given the recent ubiquity of AI technologies in everyday life (Tesla cars, Google Search, Amazon Marketplace, etc.), this study is very necessary. The risks involve not just test subjects but a much larger populace that is unable to comprehend the technologies intruding into their daily lives, which leaves them vulnerable to exploitation. Several cases of identity theft or spam scams have already claimed victims due to a lack of awareness. Hence, it is crucial to analyze how much information can reduce such cases. Additionally, a system should provide a comprehensive analysis of its limitations and possible misuse.

Google Assistant records all conversations to detect its initiation phrase, “OK Google.” The design depends on the fact that the recording is a stream and no data is stored except a short segment. However, a possible listener could extract the streams and use another program to integrate them into comprehensible knowledge that can be exploited. Users are confident in the system because of this speech segmentation, but an expert can see through the assurance and imagine the listener scenario just from knowing that such systems exist. This knowledge is not exclusively expert-oriented and can be transferred to users, thus preventing exploitation.

Questions

  1. Think about systems that do not rely on or have access to user information (e.g., Google Translate, DuckDuckGo). What information can they still get from users? Can this be used in an unfair manner? Would these be risk-sensitive features? If so, how should the system design change?
  2. Unethical hackers generally work in networks and are able to adapt to security reinforcements. Can security reinforcements utilize risk-sensitive designs to overcome hacker adaptability? What such changes could be thought of in the current system?
  3. Experts tend to show more caution towards technologies. What amount of knowledge introduces such caution? Can this amount be conveyed to all the users of a particular product? Would this knowledge help risk-sensitivity?
  4. Do you think the individuals selected for the task are a representative set? They utilized MTurk for their study. Isn’t there an inherent presumption of being comfortable with computers? How could this bias the study? Is the bias significant?

Word Count: 542


4/15/2020 – Nurendra Choudhary – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors study the human side of automated fact-checking systems, namely human trust in them. In their experiments, they show that humans improve their accuracy when shown correct model predictions; however, human judgement also degrades in the case of incorrect model predictions. This demonstrates the trust relationship between humans and their fact-checking models. Additionally, the authors find that humans who interact with the AI system improve their predictions significantly, suggesting model transparency as a key aspect of human-AI interaction.

The authors provide a novel mixed-initiative framework for integrating human intelligence with fact-checking models, and they analyze the benefits and drawbacks of such integrated systems.

The authors also point out several limitations of their approach, such as the limited non-American representation among MTurk workers and bias towards AI predictions. Furthermore, they point out the system’s potential for mediating debates and conveying real-time fact checks in an argument setting. Interaction with the tool could also serve as a platform for identifying the reasons behind differences of opinion.

Reflection

The paper is very timely in the sense that fake news has become a widely used tool for political and social gain. People unfamiliar with the power of the internet tend to believe unreliable sources and form very strong opinions based on them. Such a tool can be extremely powerful in defusing such controversies. The idea of analyzing the human role in AI fact-checking is also extremely important: AI fact checkers lack perfect accuracy, and for this problem, perfect accuracy is a requirement. Hence, the role of humans in the system cannot be ignored. However, human mental models tend to trust the system after correct predictions and do not efficiently correct themselves after incorrect predictions, which becomes an inherent limitation of these AI systems. Thus, the paper’s idea of introducing transparency is appropriate and necessary. Given more insight into the mechanism of the fact checkers, humans would be able to better optimize their mental models, improving the performance of the collaborative human-AI team.

AI systems can analyze huge repositories of information, while humans can perform more detailed analysis; in that sense, a fact-checking human-AI team draws on the most important capabilities of both. However, as pointed out in the paper, humans tend to overlook their own capabilities and rely on the model’s prediction, possibly because of trust built up after a few correct predictions. Given the plethora of existing information, it would be very inconvenient for humans to assess it all themselves. Hence, I believe these initial trials are extremely important for building the correct amount of trust and expectation.

Questions

  1. Can a fact-checker learn from its mistakes pointed out by humans? How would that work? Would that make the fact-checker dynamic? If so, what is the extent of this change and how would the human mental models adapt effectively to such changes?
  2. Can you suggest a better way of structuring the interaction between humans and models? Also, for what other tasks could such interaction be effective?
  3. As pointed out in the paper, humans tend to overlook their own capabilities and rely on the model’s prediction. What is the reason for this? Is there a way to make the collaboration more effective?
  4. Here, the assumption is that human beings are the veto authority. Can there be a case when this is not true? Is it always right to trust the judgement of humans (in this case underpaid crowd workers)?

Word Count: 588


04/08/2020 – Nurendra Choudhary – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

In this paper, the authors study system designs that include different kinds of interaction between human agency and automation. They leverage human control and the complementary strengths of humans and algorithms to build a more robust architecture that draws on both. They share case studies of interactive systems in three different problems: data wrangling, exploratory analysis, and natural language translation.

To achieve synchronization between automation and human agency, they propose a design based on shared representations of augmented tasks, together with predictive models of human capabilities and actions. The authors criticize the AI community’s push towards complete automation and argue that the focus should instead be on systems augmented with human intelligence. In their results, they show that such models are more usable in current situations. They demonstrate how interactive user interfaces can integrate human feedback to improve the AI system while also providing correct results for the problem instance at hand. The shared representations can be edited by humans to remove inconsistencies, thus integrating human capability into those tasks.

Reflection

This is a problem we have discussed in class several times, but the outlook of this paper is really interesting: it proposes shared representations as a method for integrating human agency. Several papers we have studied use human feedback to augment the learning process, whereas this paper discusses auditing the output of the AI system. Representation is a very critical attribute of AI models: its richness determines the efficiency of the system, and its lack of interpretability is generally the reason several AI applications are considered black-box models. I think shared representations, in a broader sense, also suggest a broader understanding of AI, akin to unifying human and AI capabilities in the most optimal way.

However, such representations might limit the capability of the AI mechanisms behind them. AI models are optimized with respect to a task, and that basic objective determines the representations they learn. The models are effective because they can detect patterns in multi-dimensional spaces that humans cannot comprehend. The paper aims to make that space comprehensible, thus removing the very property that makes the AI effective. Hence, I am not sure this is the best idea for long-term development. I believe we should stick to current feedback loops and only accept interpretable representations when the difference in results is statistically insignificant.

Questions

  1. How do we optimize for quality of shared representations versus quality of system’s results?
  2. The humans needed to optimize shared representations may be fewer than the number of people who can simply complete the task. What would be the cost-benefit ratio for shared representations? Do you think the approach will be worth it in the long term?
  3. Do we want our AI systems to be fully automatic at some point? If so, how does this approach benefit or limit the move towards the long-term goal?
  4. Should there be separate workflows or research communities that work on independent AI and AI systems with human agency? What can these communities learn from each other? How can they integrate and utilize each other’s capabilities? Will they remain independent and lead to other sub-areas of research?

Word Count: 545


04/08/2020 – Nurendra Choudhary – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

In this paper, the authors address the problem of large-scale human evaluation through CrowdScape, a system based on interactive visualizations and mixed-initiative machine learning. They build on the two major previous approaches to quality control: analyzing worker behaviour and analyzing worker output.

The contributions of the paper include an interactive interface for crowd-worker results, visualizations of crowd behavior, techniques for exploring crowd workers’ products, and mixed-initiative machine learning for bootstrapping user intuitions. Previous work analyzed crowd-worker behavior and output independently, whereas CrowdScape provides an interface for analyzing them together. CrowdScape uses mouse movements, scrolling, keypresses, focus events, and clicks to build worker profiles. Additionally, the paper points out its limitations, such as neglected user behaviors like the focus of the fovea. Furthermore, it shows the potential of CrowdScape in other experimental setups that are primarily offline or cognitive and do not produce the on-system interaction traces needed to analyze behavior.
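For illustration, a worker behavior profile of the kind described above might be aggregated from raw interaction events roughly like this (a hypothetical sketch with made-up event names and fields, not CrowdScape's implementation):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Event:
    worker_id: str
    kind: str         # e.g. "mousemove", "scroll", "keypress", "focus", "click"
    timestamp: float  # seconds since task start

def behavior_profile(events):
    """Aggregate raw interaction events into per-worker behavioral features."""
    profiles = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        p = profiles.setdefault(
            e.worker_id,
            {"counts": Counter(), "first": e.timestamp, "last": e.timestamp},
        )
        p["counts"][e.kind] += 1
        p["last"] = e.timestamp
    return {
        wid: {"time_on_task": p["last"] - p["first"], **p["counts"]}
        for wid, p in profiles.items()
    }

events = [Event("w1", "keypress", 1.2), Event("w1", "scroll", 3.4),
          Event("w2", "click", 0.5), Event("w1", "keypress", 7.9)]
print(behavior_profile(events))
```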

Reflection

CrowdScape is a very necessary initiative as the number of workers involved in evaluation increases. Another interesting aspect is that it also expands developers’ creativity and possibilities, as they can now evaluate more complex, focus-dependent tasks. However, I feel the need for additional compensation here: the crowd workers are being tracked, which is an intrusion into their privacy. I understand that this is necessary for the process to function, but given that it makes focus an essential aspect of the work, workers should be rewarded fairly for it.

Also, the user behaviors tracked here cover most significant problems in the AI community fairly well, but more inputs would cover a better range of problems. Adding more tracked features would not only increase problem coverage but also encourage more development effort: there could be several instances where a developer does not build something due to a lack of evaluation techniques or accepted measures, and additional features would remove this concern. For example, if we were able to track the user’s fovea, developers could study the effect of different advertising techniques or build algorithms to predict and track interest in different varieties of videos (the business of YouTube).

Also, I am not sure about the effectiveness of tracking the movements given in the paper. The paper treats effectiveness as a combination of the worker’s behavior and output, but several tasks rely on mental models that do not produce the movements tracked in the paper. In such cases, the output needs to carry more weight. I think the evaluator should be given the option to change the weights of the different parameters so that the platform can be adapted to different problems, making it more broadly applicable.

Questions

  1. What kinds of privacy concerns could be a problem here? Should the analyzer have access to such behavior? Is it fair to ask workers for this information? Should workers be additionally compensated for such an intrusion into their privacy?
  2. What other kinds of user behaviors are traceable? The paper mentions fovea’s focus. Can we also track listening focus or mental focus in other ways? Where will this information be useful in our problems?
  3. CrowdScape uses the platform’s interactive nature and visualization to improve the user experience. Should there be an overall focus on improving UX at the development level? Or should we let them be separate processes?
  4. CrowdScape considers worker behavior and output to analyze human evaluation. What other aspects could be used to analyze the results?

Word Count: 582


03/25/2019 – Nurendra Choudhary – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Summary

In this paper, the authors measure the performance of human-AI teams instead of AI in isolation. They employ a GuessWhich game in the context of a visual conversational agent system. The game is built around interaction between humans and an AI system called ALICE.

The game includes two agents: a questioner and an answerer. The answerer has access to a secret image and replies to questions about it, while the questioner asks the questions and tries to guess the correct image from an exhaustive pool. In the human-AI team, the answerer is ALICE and the questioner is a human. Performance is measured by the number of questions needed for a correct guess. The authors also use a QBot (questioner bot) in place of the human for a comparative analysis between ALICE-human and ALICE-QBot teams.
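To make the evaluation protocol concrete, a minimal skeleton of the game loop might look like the following (my own hypothetical sketch, not the paper's code; the stub agents stand in for a human or QBot questioner and for ALICE as answerer, and the metric is simply the round at which the guess succeeds).

```python
import random

def play_guesswhich(questioner, answerer, image_pool, secret, max_rounds=10):
    """Run one game; return the round at which the questioner guessed correctly (or None)."""
    dialog = []
    for round_no in range(1, max_rounds + 1):
        question = questioner.ask(dialog)
        answer = answerer.answer(secret, question)
        dialog.append((question, answer))
        guess = questioner.guess(image_pool, dialog)
        if guess == secret:
            return round_no
    return None

# Stub agents: a real setup would use a human or QBot questioner and ALICE as answerer.
class RandomQuestioner:
    def ask(self, dialog):
        return "is it outdoors?"
    def guess(self, image_pool, dialog):
        return random.choice(image_pool)

class StubAnswerer:
    def answer(self, secret, question):
        return "yes"

pool = [f"img_{i}" for i in range(10)]
rounds = play_guesswhich(RandomQuestioner(), StubAnswerer(), pool, secret="img_3")
print("guessed correctly in round:", rounds)
```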

The authors discuss various challenges with these approaches, such as robustness to incorrect question-answer pairs and humans learning the AI’s quirks. They conclude that ALICE-RL, the state of the art in the AI literature under ALICE-QBot evaluation, does not perform better when paired with humans. This highlights the disconnect between isolated AI development and development in teams with humans.

Reflection

The paper discusses the important problem of the disconnect between isolated AI development and real-world usage with humans in the loop. However, I feel there are some drawbacks in the experimental setup. In the QBot setup, I do not agree with the temporally dynamic nature of the questioning: I think QBot should get access to the ideal set of prior questions (from humans) when generating each new question. This would make the comparison fairer and less dependent on QBot’s own earlier performance.

An interesting point is the dependence of AI on humans. A perfect AI system should not rely on humans, yet current AI systems rely on humans to be useful in the real world. This leads to a paradox where we need to make AI systems human-compliant while moving towards the larger goal of independent AI.

To achieve the larger goal, I believe isolated development of AI is crucial. However, the systems also need to contribute to human society. For this, I believe we can deploy variants of the underlying system to support human tasks. This approach preserves isolated development while additionally collecting auxiliary signals from human behavior that can further improve the AI’s performance. The approach is already being applied effectively: in the case of Google Translate, the underlying neural network model was developed in isolation, and human corrections to its translations provide auxiliary information while also improving the human-AI team’s overall performance. This leads to a significant improvement in the translator’s performance over time.

Questions

  1. Is it fair to use the GuessWhich game as an indicator of AI’s success? Shouldn’t we rely on the final goal of an AI to better appreciate the disconnect?
  2. Should the human-AI teams just be part of evaluation or also the development part? How would we include them in the development phase for this problem?
  3. The performance of ALICE relies on the QBot mechanism. Could we use human input to improve QBot’s question generation mechanism to improve its robustness?
  4. The experiment provides a lot of auxiliary data such as correct questions, relevance of questions to the images and robustness of bots with respect to their own answers. Can we integrate this data into the main architecture in a dynamic manner?

Word Count: 564


03/25/2019 – Nurendra Choudhary – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Summary

In this paper, the authors study a human resources chatbot to analyze the interactions between the bot and its users. Their primary aim is to use the study to enhance conversational agents’ interactions in terms of behavioral aspects such as playfulness and information content. Additionally, the authors emphasize adapting the system based on a particular user’s conversational data.

For the experiments, they adopted an agent called Chip (Cognitive Human Interface Personality). Chip has access to all the company related assistance information. The human subjects for this experiment are new employees that need constant assistance to orient themselves in the company. Chip is integrated into the IM services of the company to provide real-time support.

Although Chip is primarily a question-answering agent, the authors are more interested in the behavioral elements of the interaction, such as playful chit-chat, system inquiry, feedback, and habitual communicative utterances. They use the information from these signals to further enhance the conversational agent and improve its human-like behavior (rather than focusing solely on answer-retrieval efficiency).

Reflection

“All Work and No Play” is a very appropriate title for the paper. Chip is primarily applied in a formal context where social interactions are considered unnecessary (if not inappropriate). However, human interactions always include a playful element that improves the quality of communication; no matter the context, human conversation is hardly ever devoid of behavioral features, which carry emotion and significant subtext. Given this setting, it is a good study for analyzing the effectiveness of conversational agents with human behavioral features. However, the limitations of the study include selection bias (as indicated in the paper, too): the authors pick conversation subparts that are subjectively considered to contain human behavioral features. That said, I do not see a better contemporary method in the literature for efficiently avoiding this selection bias.

Additionally, I see this study as part of a wider move in the community towards adding human-like behavior to AI systems. If we look at popular AI conversational agents like Alexa, Siri, and Google Assistant, we find a common aim of adding human-specific features with limited utilitarian value, such as jokes and playful quips. I believe this type of learning also reduces the amount of adaptation humans need before becoming comfortable with a system. In previous classes, we have seen how mental models adapt to a given AI tool; if AI systems behave more like humans and learn accordingly, humans would not need significant learning to adopt these tools in their daily lives. For example, before voice assistants included these features, they were significantly less prevalent than they are in today’s society, and their market is only projected to widen.

Questions

  1. How appropriate is it to have playful systems in an office environment? Is it sufficient to have efficient conversational agents or do we require human-like behavior in a formal setting?
  2. The features seem even more relevant for regular conversational agents. How will the application and modeling differ in those cases?
  3. The authors select the phrases or conversational interactions as playful or informal based on their own algorithm. How does this affect the overall analysis setup? Is it fair? Can it be improved?
  4. We are trying to make the AI more human-like and not using it simply as a tool. Is this part of a wider move as the future area of growth in AI? 

Word Count: 590
