4/29/2020 – Akshita Jha – Accelerating Innovation Through Analogy Mining

Summary:
“Accelerating Innovation Through Analogy Mining” by Hope et al. addresses the problem of mining analogies from messy, chaotic real-world datasets. Hand-created databases have rich relational structure but are sparse. Machine learning and information-retrieval techniques, on the other hand, scale well but lack the understanding of underlying structure that is crucial to analogy-related tasks. The authors leverage the strengths of both crowdsourcing and machine learning to learn analogies from these real-world datasets, combining the creativity of crowds with the cheap computing power of recurrent neural networks. The authors extract meaningful vector representations from product descriptions and observe that this methodology achieves greater precision and recall than traditional information-retrieval methods. They also demonstrate that the learned representations helped participants generate significantly more creative ideas than analogies retrieved by traditional methods.

Reflections:
This is a really interesting paper that describes a scalable approach to finding analogies in large, messy, real-world datasets. The authors use a bidirectional Recurrent Neural Network (RNN) with Gated Recurrent Units (GRUs) to learn purpose and mechanism vectors for product descriptions. Since the paper came out, however, there have been great advances in natural language processing because of BERT (Bidirectional Encoder Representations from Transformers). BERT has achieved state-of-the-art results on many natural language tasks like question answering, natural language understanding, search, and retrieval. I'm curious to know how BERT would affect the results of this system. Would we still need crowd workers for analogy detection, or would using BERT alone for analogy computation suffice? One limitation of an RNN is that it is directional, i.e., it reads from left to right, right to left, or both. BERT, in contrast, takes all the words as input at once and can therefore model complex non-linear relationships between them, which should prove helpful for detecting analogies. The authors' use of TF-IDF did produce diversity but did not take relevance into account. Also, the purpose and mechanism vectors do not distinguish between high- and low-level features, and they ignore the intra-dependencies among different purposes (or among different mechanisms) as well as the inter-dependencies between purposes and mechanisms. It would be interesting to see how these dependencies could be encoded and whether they would benefit the final task of analogy computation. Another aspect worth examining is the trade-off between generating useful vectors and the creativity of the crowd workers: does creativity increase, decrease, or remain the same?
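
As a quick illustration of the difference I have in mind (this is my own sketch, not the authors' pipeline; the sentence-transformers package and the 'all-MiniLM-L6-v2' model are assumptions), contextual embeddings can score two paraphrased purposes as similar even when they share few words, whereas TF-IDF similarity stays low:

# Illustrative sketch: lexical (TF-IDF) vs. contextual similarity of "purpose" texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer  # assumed dependency

purposes = [
    "keeps drinks cold during outdoor trips",
    "maintains a low temperature for beverages while hiking",
]

# Sparse lexical representation: similarity is driven by shared tokens.
tfidf = TfidfVectorizer().fit_transform(purposes)
print("TF-IDF similarity:", cosine_similarity(tfidf[0], tfidf[1])[0][0])

# Contextual sentence embeddings: semantically close purposes score high
# even with little lexical overlap.
model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption
emb = model.encode(purposes)
print("Embedding similarity:", cosine_similarity([emb[0]], [emb[1]])[0][0])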

Questions:
1. What other creative tasks can benefit from human-AI interaction?
2. Why is the task of analogy computation important?
3. How are you incorporating and leveraging the strength of crowd-workers and machine learning in your project?

04/29/2020 – Akshita Jha – DiscoverySpace: Suggesting Actions in Complex Software

Summary:
“DiscoverySpace: Suggesting Actions in Complex Software” by Fraser et al. talks about complex software and the ways novice users can learn to navigate it. “DiscoverySpace is a prototype extension panel for Adobe Photoshop that suggests task-level action macros to apply to photographs based on visual features.” The authors find that the actions suggested by DiscoverySpace help novices maintain confidence, accomplish tasks, and discover new features. The work highlights how user-generated content can be leveraged by interface designers to help new and inexperienced users navigate a complex system. A beginner faces several problems when trying to use a new, complex system: (i) the novice user might not be familiar with the technical jargon used by the software, (ii) online tutorials might be difficult to follow and assume a certain amount of background knowledge, and (iii) there can be several ways to accomplish the same task, and the user might get overwhelmed and confused when learning about all of them. The paper presents DiscoverySpace, a prototype action-suggestion panel that helps beginners get started without feeling overwhelmed.

Reflections:
This is an interesting paper as it describes a methodology that software designers can adopt to build systems that aid novice users instead of overwhelming them. It reminds me of the popular ‘cold start’ problem in computational modeling: the situation in which a computer does not have enough information to model the desired human behavior because of a lack of initial user interactions. The authors try to mitigate this problem in DiscoverySpace by conducting a survey to identify and narrow down the kinds of help novices need. It was an interesting finding that participants who used the web to look for help achieved their desired results less often; I would have expected it to be the other way around. The authors suggest that these participants failed to find the best way to accomplish the task and that Google does not always surface the best results. One limitation of the study is that the task was open-ended; a more directed task might have produced clearer findings. Also, self-reported expertise might not be the most reliable way to classify a user as a novice or an expert. Another thing to note is that all the participants had some domain knowledge, either of the basic principles of photography or of simpler photo-editing software. It would be interesting to see how the results pan out for users from a different field. I also wonder whether the design goals presented by the authors are too generic. This can be a good thing, as it allows other systems to take these goals into consideration, but it might also limit DiscoverySpace by ignoring the specific design goals that this particular system could benefit from.

Questions:
1. Did you agree with the methodology of the paper?
2. Which design goal do you think would apply to you?
3. Can you think of any other software that is complex enough to require design interventions?
4. How are you incorporating creativity into your project?

04/22/2020 – Akshita Jha – Opportunities for Automating Email Processing: A Need-Finding Study

Summary:
“Opportunities for Automating Email Processing: A Need-Finding Study” by Park et al. is an interesting paper that talks about the need to manage email. Managing email is a time-consuming task that takes significant effort from both the sender and the recipient, and the authors find that some of this work can be automated. They performed a mixed-methods need-finding study to understand and answer two important questions: (i) What kinds of automatic email handling do users want? (ii) What kinds of information and computation are needed to support that automation? The authors conduct an investigation, including a design workshop and a survey, to identify categories of needs and understand them thoroughly. They also survey existing automated email-classification systems to understand which needs have been addressed and where the gaps are. The work highlights the need for “(i) a richer data model for rules, (ii) more ways to manage attention, (iii) leveraging internal and external email context, (iv) complex processing such as response aggregation, and affordances for senders.” The authors also ran a small, authorized script over a user’s inbox, which demonstrated that the above needs cannot be fulfilled by existing email clients. This can serve as motivation for new design interventions in email clients.

Reflections:
This is interesting work that has the potential to pave the way for new design interventions in email processing and email management. However, it has certain limitations. Of the three studies that the authors conducted, two explicitly focused on programmers, and the third focused on an engineer. This calls into question the generalizability of the experiments: the needs of more diverse users may vary, and the results might not hold. Also, the questions the authors asked in the survey were influenced by their own design workshop, which in turn influenced the analysis of the needs, so the results might not hold for all kinds of participants. The authors also could not quantify the percentage of needs that are not being met. Moreover, recruiting programmers did not help much, since programmers have the skills to write their own code and fulfill their own needs; the GUIs needed by non-programmers might differ from those needed by programmers. The authors could also seek insight from prior tools to build on and improve their system.
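
To make the “richer data model for rules” point concrete, here is a minimal sketch of my own (not the paper's system; the Message class and folder names are hypothetical) contrasting the keyword rules most clients support today with a rule that uses thread context and time to manage attention:

# Hypothetical illustration of simple vs. richer email rules.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Message:
    sender: str
    subject: str
    received: datetime
    thread_replied: bool  # has anyone on the thread already responded?

def simple_rule(msg: Message) -> str:
    # Plain keyword matching, roughly what existing filters offer.
    return "urgent" if "deadline" in msg.subject.lower() else "inbox"

def richer_rule(msg: Message) -> str:
    # Uses internal context (thread state) and elapsed time to manage attention.
    stale = datetime.now() - msg.received > timedelta(days=2)
    if "deadline" in msg.subject.lower() and not msg.thread_replied:
        return "needs-response"
    if stale and msg.thread_replied:
        return "archive"
    return "inbox"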

Questions:
1. What are your thoughts on the paper? How do you plan to use the concepts present in the paper in your project?
2. Would you want an email client that manages your attention? Do you think that would be helpful?
3. How difficult is it for machine learning models to understand the internal and external context of an email?
4. Which email client do you use? What are the limitations of that email client?
5. Do you think writing simple rules for email management is too simplistic?

04/22/2020 – Akshita Jha – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary:
“The Knowledge Accelerator: Big Picture Thinking in Small Pieces” by Hahn et al. talks about accomplishing large, interdependent tasks through small crowdsourced contributions. Most big-picture work today relies on a small number of people who each hold the whole picture in their heads; for example, most of the work on Wikipedia is done by a small number of highly invested editors. The authors explore the idea of a computational system in which each individual sees only a small part of the whole. This is difficult because many real-world tasks cannot be broken down into small, independent units, so Amazon Mechanical Turk (AMT) cannot be used as efficiently as it is for prototyping. It is also challenging to maintain the coherence of the overall product while breaking the big task into small, mutually independent chunks for crowd workers, and the quality of the resulting work depends on how coherently the task is divided. The authors present a system that removes the need for a small number of workers with a big-picture view by stitching together small contributions from individuals who each see only a small chunk of the whole.

Reflections:
This is interesting work, as it discusses both the advantages and the limitations of breaking big tasks down into small pieces. The authors built a prototype system called “Knowledge Accelerator,” constrained so that no single task would pay more than $1. Although the authors used this constraint to drive task division, I'm not sure it is a good enough metric to judge the independence and quality of the small tasks. The authors also state that the system should not be seen as a replacement for expert creation and curation of content. I disagree: with some modifications, I feel the system has the potential to replace humans for this task in the future. As is, though, the system has some glaring issues. The absence of a nuanced structure in the digests is problematic, and it might help to include iteration in the system, so workers can request more information after completing part of their tasks. Finally, the authors would benefit from taking into account the cost of producing these answers at scale. They could use a computational model to dynamically decide how many workers and products to use at each stage so that the overall cost is minimized, and they could also check whether some answers can be reused across questions and across users. Incorporating contextual information could also improve the system significantly.
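
As a rough sketch of the kind of computational model I have in mind (the stage names, prices, and quality model below are made-up placeholders, not values from the paper), the system could keep adding workers to a stage until an estimated quality target is met and report the resulting cost:

# Toy cost model: add workers per stage until an estimated quality target is reached.
STAGES = {          # stage -> (pay per worker in $, quality gain per worker)
    "clustering":  (0.50, 0.15),
    "integration": (1.00, 0.20),
    "editing":     (0.75, 0.10),
}
QUALITY_TARGET = 0.8

def plan(stages, target):
    allocation, total_cost = {}, 0.0
    for name, (pay, gain) in stages.items():
        workers, quality = 0, 0.0
        while quality < target:
            workers += 1
            quality += gain * (1 - quality)  # diminishing returns per extra worker
        allocation[name] = workers
        total_cost += workers * pay
    return allocation, total_cost

print(plan(STAGES, QUALITY_TARGET))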

Questions:
1. What are your thoughts on the paper?
2. How do you plan to use the concepts present in the paper in your project?
3. Are you dividing your tasks into small chunks such that crowd workers only see a part of the whole?

04/15/20 – Akshita Jha – Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations

Summary:
“Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations” by Starbird et al. talks about strategic information operations such as disinformation, political propaganda, and conspiracy theories. The authors gather valuable insights about how these operations function by studying online discourse in depth, both qualitatively and quantitatively. They present three case studies: (i) Trolling Operations by the Internet Research Agency Targeting U.S. Political Discourse (2015-2016), (ii) The Disinformation Campaign Targeting the White Helmets, and (iii) The Online Ecosystem Supporting Conspiracy Theorizing about Crisis Events. These case studies highlight the coordinated effort of several organizations to spread misinformation and influence the political discourse of a nation. Through them, the authors attempt to go beyond cataloguing online bots and trolls and move towards a more nuanced, descriptive perspective on these coordinated, destructive online operations. The work also highlights a challenging problem for “researchers, platform designers, and policy-makers — distinguishing between orchestrated, explicitly coordinated, information operations and the emergent, organic behaviors of an online crowd.”

Reflections:
This is an interesting work that talks about misinformation and the orchestrated effort that goes into spreading it. I found the overall methodology particularly interesting: the authors use qualitative, quantitative, and visual techniques to effectively demonstrate the spread of misinformation from the actors (the Twitter accounts and websites that initiate the discussion) to the target audience (the accounts that retweet and are connected to these actors either directly or indirectly). For example, the case study on the Internet Research Agency's targeting of U.S. political discourse, which greatly influenced the 2016 elections, used network analysis and visual techniques to highlight the pervasiveness of Russian IRA agents. The authors note that the “fake” accounts influenced both sides: left-leaning accounts criticized and demotivated support for the U.S. presidential candidate Hillary Clinton, while accounts on the right promoted the now-president, Donald Trump. Similarly, these fake Russian accounts were active on both sides of the discourse around the #BlackLivesMatter movement. It is commendable that the authors were able to uncover the hidden objectives of these misinformation campaigns and observe how the accounts presented themselves as both individuals and organizations in order to embed themselves in the narrative. The authors also mention that they use trace ethnography to track the activities of the fake accounts. I was reminded of another work, “The Work of Sustaining Order in Wikipedia: The Banning of a Vandal,” which also made use of trace ethnography to narrow down a rogue user; it would be interesting to read about work where trace ethnography was used to track down a “good” user. I would have liked it if the paper had gone into the details of the quantitative analysis and the exact methodology used for the network analysis. I'm also curious whether the accounts shown were cherry-picked for their destructive influence or whether the graph in the paper covers all the relevant accounts. It would also have helped if the authors had discussed the limitations of their work and their own biases that might have influenced the results.

Questions:
1. What are your general thoughts on the paper?
2. Do you think machine learning algorithms can help in such a scenario? If yes, what role will they play?
3. Have you ever interacted with an online social media bot? What has that been like?

04/15/20 – Akshita Jha – Believe It or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:
This paper discusses the issue of fact-checking, i.e., estimating the credibility of a given statement, which is extremely pertinent in today's climate. With the generation and discovery of information becoming ever simpler, judging the trustworthiness of the information found is becoming increasingly challenging. Researchers have attempted to address this issue with a plethora of AI-based fact-checking tools. However, a majority of these are based on neural network models that provide no explanation whatsoever of how they arrived at a particular verdict. How this lack of transparency, in addition to a poor user interface, affects the usability or even the perceived authenticity of such tools has not yet been addressed. It tends to breed mistrust in users towards the tool, which ends up being counter-productive to the original goal. This paper tackles this important issue head-on by designing a transparent system that combines the scope and scalability of an AI system with inputs from the user. While this transparency increases the user's trust in the system, the user's input also improves the predictive accuracy of the system. As an important side benefit, this user interaction addresses the deeper issue of information illiteracy in our society by cultivating the skill of questioning the veracity of a claim and the reliability of a source. The researchers build a model that uses NLP tools to aggregate and assess various articles related to a given statement and assign a stance to each article. The user can modify the weights associated with these stances and thereby influence the verdict the model generates about the veracity of the claim. The researchers further conduct three studies to judge the usability, effectiveness, and flaws of this model. They compare participants' assessments of multiple statements before and after being exposed to the model's prediction, and they verify whether interacting with the model provides additional support compared to simply displaying the result. A third study estimates whether gamifying the task has any effect. While the third study is inconclusive, the first two lead the researchers to conclude that interaction with the system increases the user's trust in the model's results, even when the model's prediction is wrong. However, this interaction also helps improve the model's predictions for the claims tested.

Reflections:
The paper brings up an interesting point about the transparency of the model. When people talk about an AI system, what normally comes to mind is a binary system that takes in a statement, assesses it in a black box, and returns a yes-or-no answer. What is interesting here is that the ability to interact with the model's predictions enables users to improve their own judgment and even compensate for the model's shortcomings. The user-interaction aspect of AI systems has been grossly overlooked. While there are clear, undeniable benefits to this model, human-modifiable fact-checking could also lead to a very dangerous issue: using the slider to modify the reputation of a given source can let users inject their own biases into the system, effectively creating echo chambers of their own views. This could skew the verdict of the ML system and thus reinforce the user's existing prejudices. I would suggest that the model assign x% weight to its own assessment of a source and (100-x)% to the user's assessment; this would help ensure that the user's prejudices do not completely suppress the model's judgment. That said, the way this interaction inadvertently helps users learn how to tackle misinformation and check the reputation of sources is highly laudable and worth carrying into future models along these lines. From the human perspective, the Bayesian or linear approaches adopted by these models make them very intuitive to understand. However, one should not underestimate how much more powerful neural networks are at aggregating relevant information and assessing its quality. A simple linear approach is bound to have its weaknesses, so it would be interesting to see a model that uses the power of neural networks alongside these techniques while preserving the transparency aspect. On a side note, it would have been useful to have more information on the NLP and ML methods used; the related-work discussion is also insufficient to give a clear background on existing techniques. One glaring issue with the paper is how it glosses over the insignificance of the result in task 2. The authors mention that the p-value is just above the threshold, but statistics teaches us that the exact value is not what matters; it is the threshold set before conducting the experiment that matters. Thus, the statement “..slightly larger than the 0.05..” is simply careless.
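
A minimal sketch of the weighting scheme I am suggesting (the numbers and the fixed 60/40 split are purely illustrative, not from the paper): blending the model's source-reputation estimate with the user's slider value means user adjustments can shift the verdict but cannot fully override the model.

# Blend model and user source-reputation estimates before aggregating stances.
def blended_verdict(articles, model_weight=0.6):
    """articles: list of (stance, model_reputation, user_reputation);
    stance is +1 (supports the claim) or -1 (refutes it), reputations in [0, 1]."""
    score = 0.0
    for stance, model_rep, user_rep in articles:
        reputation = model_weight * model_rep + (1 - model_weight) * user_rep
        score += stance * reputation
    return "likely true" if score > 0 else "likely false"

articles = [(+1, 0.9, 0.2),   # reputable supporting source, downrated by the user
            (-1, 0.3, 0.9)]   # dubious refuting source, boosted by the user
print(blended_verdict(articles))  # model weight keeps the verdict "likely true"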

Questions:
1. Why is there no control group in tasks 2 and 3?
2. What are your general thoughts on the paper? Do you approve of the methodology?

4/8/20 – Akshita Jha – Agency plus automation: Designing artificial intelligence into interactive systems

Summary:
“Agency plus automation: Designing artificial intelligence into interactive systems” by Heer talks about the drawbacks of using artificial intelligence techniques to automate tasks, especially ones considered repetitive and monotonous. Framing such tasks as fully automated presents a monumentally optimistic point of view that ignores the ghost work, or invisible labor, that goes into ‘automating’ them. This gap between crowd work and machine automation highlights the need for design and engineering interventions. The author tries to make use of the complementary strengths and weaknesses of the two: the creativity, intelligence, and world knowledge of crowd workers, and the cheap computation with low cognitive overhead provided by automated systems. The paper describes in detail case studies of interactive systems in three areas: data wrangling, exploratory analysis, and natural language translation. These systems combine computational support with interactive interfaces. The author also discusses shared representations of tasks that build both human intelligence and automated support into the design itself, and concludes that “neither automated suggestions nor direct manipulation plays a strictly dominant role” and that “a fluent interleaving of both modalities can enable more productive, yet flexible, work.”

Reflections:
There is a lot of invisible work that goes into automating a task. Most automated tasks require hundreds, if not thousands, of annotations, yet machine learning researchers turn a blind eye to all the effort that goes into those annotations by calling their systems ‘fully automated.’ This view is exclusionary and does not do justice to the vital but seemingly trivial work done by crowd workers. One area to focus on is the open question of shared representations: is it possible to integrate data representations with human intelligence, and if so, is that useful? Data representation often involves constructing a latent space that reduces the dimensionality of the input data to obtain concise, meaningful information. Comparable representations may or may not exist for human intelligence; borrowing from social psychology might help here. There are also other ways to approach this. For example, the author focuses on building interactive systems with ‘collaborative’ interfaces. The three systems, Wrangler, Voyager, and PTM, do not distribute the work equally between humans and automated components: the automated methods prompt users with suggestions, which the end user reviews, and the final decision-making power lies with the end user. It would be interesting to see what the results would look like if the roles were reversed and the system was turned on its head. An interesting case study could be one where the suggestions are given by the end user and the ultimate decision-making capability rests with the system. Would the system still be as collaborative? What would the drawbacks of such a system be?
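
As a small, purely illustrative example of what a shared latent representation could look like (my own sketch, not anything from the paper), both the automated suggester and the human analyst could reason over the same low-dimensional projection of the data:

# Project high-dimensional records into a low-dimensional space that both the
# system and the analyst can work with.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
records = rng.normal(size=(200, 50))    # 200 records, 50 raw attributes

pca = PCA(n_components=3)
latent = pca.fit_transform(records)     # 200 x 3 shared representation

# The system could rank suggestions by distance in this latent space,
# while the analyst inspects the same three axes visually.
print(latent.shape, pca.explained_variance_ratio_)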

Questions:

1. What are your general thoughts on the paper?
2. What did you think about the case studies? Which other case studies would you include?
3. What are your thoughts on evaluating systems with shared representations? Which evaluation criteria can we use?

4/8/20 – Akshita Jha – CrowdScape: Interactively Visualizing User Behavior and Output

Summary:
“CrowdScape: Interactively Visualizing User Behavior and Output” by Rzeszotarski and Kittur talks about crowdsourcing and the importance of interactive visualization that draws on the complementary strengths and weaknesses of crowd workers and machine intelligence. Crowdsourcing helps distribute work, but quality-control approaches for it often do not scale. Crowd-organizing algorithms like Partition-Map-Reduce, Find-Fix-Verify, and Price-Divide-Solve make it easier to distribute, merge, and check crowd work, but they are not very accurate or useful for complex, subjective tasks. CrowdScape combines worker behavior with worker output using interaction, visualization, and machine learning to support human evaluation of crowd work. It enables the user to form hypotheses about the crowd, test them, and refine their selections through a sensemaking loop. The paper proposes novel techniques for exploring crowd workers' products, visualizations of crowd worker behavior, tools for classifying crowd workers, and an interface for interactively exploring these results using mixed-method machine learning.

Reflections:
Prior work has studied worker behavior or worker output in isolation, but combining the two is very fruitful for generating mental models of the workers and building a feedback loop. Visualizing the workers' process helps us understand their cognitive process and thus assess the end product better. CrowdScape can only be used on web pages that allow the injection of JavaScript; it is not useful when this is blocked or for non-web, offline interfaces. The set of aggregate features used might not always provide useful feedback, and existing quality-control measures are not very different from CrowdScape when a clear, consensus ground truth exists, such as identifying a spelling error; in such cases, the effort of learning and using CrowdScape may not be worthwhile. In some cases, the behavioral traces of the worker may not be very indicative, such as when they work in a different editor and then copy and paste the result into the task interface. Tasks that are heavily cognitive or entirely offline are also not well suited to the methods CrowdScape supports. The system relies heavily on detailed behavioral traces such as mouse movements, scrolling, keypresses, focus events, and clicks. This intrusiveness, and the decrease in efficiency it implies, should be justified by the accuracy of the behavioral measurements. An interesting point to note is that the tool can become privacy-intrusive if care is not taken. We should ensure that the tool evolves as crowd work becomes increasingly relevant and the tool becomes vital for understanding the underlying data and crowd behavior. Apart from these reflections, I would just like to point out that the graphs the authors use convey their results really well; this is a detail that is vital but easily overlooked in most papers.
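
For a sense of how such low-level traces turn into something analyzable (this is my own toy example; the event names, features, and clustering step are assumptions, not CrowdScape's actual implementation):

# Aggregate per-worker behavioral traces into features and group similar workers.
from collections import Counter
from sklearn.cluster import KMeans

def aggregate_features(trace):
    """trace: list of (event_type, timestamp_in_seconds) tuples for one worker."""
    counts = Counter(event for event, _ in trace)
    duration = max(t for _, t in trace) - min(t for _, t in trace)
    return [counts["keypress"], counts["mousemove"],
            counts["scroll"], counts["focus"], duration]

traces = [
    [("focus", 0), ("keypress", 5), ("keypress", 6), ("scroll", 9), ("mousemove", 12)],
    [("focus", 0), ("mousemove", 1), ("mousemove", 2), ("scroll", 3)],
    [("focus", 0), ("keypress", 2), ("keypress", 3), ("keypress", 4), ("mousemove", 30)],
]
features = [aggregate_features(t) for t in traces]

# Cluster workers with similar behavior; an analyst would then inspect each cluster.
print(KMeans(n_clusters=2, n_init=10).fit_predict(features))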

Questions:
1. What are your general thoughts about this paper?
2. Do you agree with the methodology followed?
3. Do you approve of the interface? Would you make any changes to the interface?

03/24/2020 – Akshita Jha – “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents

Summary:
“Like Having a Really Bad PA: The Gulf between User Expectation and Experience of Conversational Agents” by Luger and Sellen talks about conversational agents and the gap between what users expect and what the agents deliver. Conversational agents have been on the rise for quite some time now, and all the big, well-known companies like Apple, Microsoft, Google, and IBM have their own proprietary agents. The authors report findings from interviews with 14 end-users in order to understand the interactional factors affecting everyday use. The findings show that end-users use conversational agents (i) as a form of play, (ii) for a hands-free approach, (iii) for formal queries, and (iv) for simple tasks. The authors apply Norman's ‘gulfs of execution and evaluation’ to infer the possible implications of their findings for the design of future systems. They found that in the majority of instances the conversational agent was unable to bridge the gap between user expectation and how the agent actually operates, and that incorporating playful triggers and trigger responses into the systems increased user engagement.

Reflection:
This is an interesting work, as it talks about the gap between user expectation and system behavior in the context of conversational agents. The researchers confirm that there is a “gulf” between expectation and reality, and that end-users continually overestimate the amount of demonstrable intelligence the system possesses. The authors also emphasize the importance of the design and interactability of these conversational agents in making them better at engaging users. Users expect the chatbot to converse like a human, but in reality AI is far from that. The authors suggest considering ways to (a) reveal system intelligence, (b) change the interactability to reflect the system's capability, (c) rein in the promises made by the scientists, and (d) revamp the system feedback given. A limitation of the study is that the sample consisted of male users from the UK, so the findings presented in the paper might be skewed. The primary use case for a conversational agent, not surprisingly, was ‘hands-free’ usage for saving time. However, if the conversational agent makes an error, the process becomes more cumbersome and time-consuming than typing the query in the first place; user tolerance in such cases might be low and lead to distrust, which can negatively affect the feedback the conversational agents receive. The authors also talk about the different approaches end-users take to interacting with Google Now versus Siri. It would be interesting to see how user behavior changes across different conversational agents.

Questions:
1. What are your views about conversational agents?
2. Which conversational agent do you think performs the best? Why?
3. As a computer scientist, what can you do to make the end-users more aware of the limitations of conversational agents?
4. How can we best incorporate feedback into the system?
5. Do you think using multimodal representations of intelligence is the way forward? What challenges do you see in using such a form of representation?

03/24/2020 – Akshita Jha – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

Summary:
“All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild” by Liao et al. talks about conversational agents and their interactions with end-users. The end-user of a conversational agent might want something more than just information from these chatbots, such as playful conversation. The authors study a field deployment of a human-resources chatbot and discuss the areas users were interested in, and they present a methodology based on statistical modeling to infer user satisfaction from the conversations. This feedback from the user can be used to enrich conversational agents and improve how they interact with the end-user in order to increase user satisfaction. The authors primarily discuss two research questions: (i) What kinds of conversational interactions do users have with the conversational agent in the wild? (ii) What signals given by the user to the conversational agent can be used to study satisfaction and engagement? The findings show that the main areas of conversation include “feedback-giving, playful chit-chat, system inquiry, and habitual communicative utterances.” The authors also discuss various functions of conversational agents, design implications, and the need for adaptive conversational agents.

Reflection:
This is a very interesting paper because it addresses the surprising dearth of research on the gap between user interactions in the lab and those in the wild. It highlights the differences between the two scenarios and the varying expectations the end-user might have while interacting with a conversational agent. The authors also mention how the conversation is almost always initiated by the conversational agent, which might not be the best approach depending on the situation. They further raise the interesting point that the conversational agent mostly functions as a question-answering system, which is far from ideal and prevents the user from having an organic conversation. To drive this point home, the authors compare and contrast the signals of informal, playful conversation with those of functional conversation in order to provide a meaningful and nuanced understanding of user behavior that the chatbot can incorporate. The authors note that the results are based on survey data collected in a workplace environment and do not claim generalizability; they also study only working professionals, so the results might not hold for other age groups. An interesting point here is that users strive for human-like conversations. This got me thinking: is that a realistic goal to strive for? What would the research direction look like if we modified our expectations and treated the conversational agent as an independent entity? It might help not to evaluate conversational agents against human-level conversational skills.
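
To make the satisfaction-modeling idea concrete, here is a toy sketch in the spirit of what the authors describe (the features, data, and the use of logistic regression are my own invention for illustration):

# Predict self-reported satisfaction from simple conversational signals.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features per conversation: [playful_turns, feedback_turns, unanswered_queries]
X = np.array([[5, 2, 0],
              [0, 0, 4],
              [3, 1, 1],
              [1, 0, 3],
              [4, 3, 0],
              [0, 1, 5]])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = user reported being satisfied

model = LogisticRegression().fit(X, y)
print(model.predict([[2, 1, 1]]))  # predicted satisfaction for a new conversation
print(model.coef_)                 # which signals carry the most weight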

Questions:
1. Have you interacted with a chatbot? What has your experience been like?
2. Which feature do you think should be a must and should be incorporated in the chatbot?
3. Is it a realistic goal to strive for human-like conversations? Why is that so important?
