03/04/2020 – Sushmethaa Muhundan – Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind

The popularity of social media has grown rapidly over the past two decades, bringing with it a flood of image content. Amidst this growth, people who are blind or visually impaired (BIV) often find it extremely difficult to understand such content. Although existing solutions offer limited capabilities to caption images and provide alternative text, these are often insufficient and, when inaccurate, negatively impact the experience of BIV users. This paper aims to improve the experience of BIV users by combining crowd input with existing automated captioning approaches. As part of the experiments, workflows with varying degrees of human and automated involvement were designed and evaluated. The four workflows introduced in this study are a fully-automated captioning workflow, a human-corrected captioning workflow, a conversational assistant workflow, and a structured Q&A workflow. It was observed that although the workflows involving humans in the loop were time-consuming, they increased user satisfaction by providing accurate descriptions of the images.
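
To make the tiering concrete, here is a minimal sketch (my own illustration, not the authors' implementation) of how escalation from the fully-automated workflow to the human-corrected workflow could be wired; the function names, the confidence threshold, and the stub behaviors are all assumptions.

```python
# Sketch: try the automated captioner first, escalate to a human only when
# machine confidence is low. Every name and number here is hypothetical.

AUTO_THRESHOLD = 0.8  # assumed confidence cutoff


def auto_caption(image_path: str) -> tuple[str, float]:
    """Stand-in for a vision-to-language model; returns (caption, confidence)."""
    return "a person standing outdoors", 0.55


def crowd_correct(image_path: str, draft: str) -> str:
    """Stand-in for the human-corrected workflow: a worker edits the draft."""
    return draft + " at a graduation ceremony, smiling"


def describe_image(image_path: str) -> str:
    caption, confidence = auto_caption(image_path)
    if confidence >= AUTO_THRESHOLD:
        return caption  # the fully-automated workflow suffices
    return crowd_correct(image_path, caption)  # human-in-the-loop fallback


print(describe_image("tweet_photo.jpg"))
```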

Throughout the paper, I really liked the focus on improving the experience of blind or visually impaired users on social media and on ensuring that accurate image descriptions are provided so that BIV users understand the context. The paper explores innovative ways of leveraging humans in the loop to solve this pervasive issue.

The particular platform targeted here, social media, comes with its own challenges. It is a setting where the context and emotion of an image matter as much as the literal description in giving BIV users enough information to understand a post. Another aspect I found interesting was the focus on scalability, which is extremely important in a social media setting.

I found the TweetTalk conversational workflow and the structured Q&A workflow interesting because they take a mixed approach, involving humans in the loop only when necessary. The intent of the conversational workflow is to understand the aspects that make a caption valuable to a BIV user. I felt that this fundamental understanding is essential for building further systems that ensure user satisfaction.

It was good to see that the sample tweets were chosen from broad topic areas representing the various interests reported by blind users. An interesting insight from the study was that users preferred no caption to an inaccurate one, since recovering from a misinterpretation based on an inaccurate caption is costly.

  1. Despite being validated by 7 BIV people, the study largely involved simulating a BIV user's behavior. Do the observations hold for scenarios with actual BIV users, or do these simulations fail to capture the problem?
  2. Apart from the two new workflows used in this paper, what are some other techniques that can be used to improve the captioning of the images on social media that captures the essence of the post?
  3. Besides social media, what other applications or platforms have similar drawbacks from the perspective of BIV users? Can the workflows that were introduced in this paper be used to solve those problems as well?


03/04/2020 – Sushmethaa Muhundan – Pull the Plug? Predicting If Computers or Humans Should Segment Images

The paper proposes a resource allocation framework that intelligently distributes work between a human and an AI system in the context of foreground object segmentation. The study demonstrates the advantages of using a mix of humans and AI rather than either alone. The goal is to produce high-quality object segmentations while requiring considerably less human effort. Two systems are implemented that automatically decide when to transfer control between the human and the AI component, depending on the quality of segmentation at each phase. The first system eliminates initial human annotation effort by having computers generate a coarse object segmentation, which is then refined by segmentation tools. The second system predicts the quality of the resulting annotations and automatically identifies the subset that needs to be re-annotated by humans. Three diverse datasets were used to train and validate the system, representing visible, phase contrast microscopy, and fluorescence microscopy images.
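
To illustrate the triage idea behind the second system, here is a rough sketch under my own assumptions (a hard-coded toy quality predictor and invented scores; the paper's predictor is a learned model): score every machine-made segmentation and spend the limited human budget on the lowest-scoring ones.

```python
# Sketch: route only the predicted-worst segmentations to human annotators.

def allocate_human_effort(segmentations, predict_quality, human_budget):
    """Split masks into (keep_machine, send_to_human) using predicted quality."""
    worst_first = sorted(segmentations, key=predict_quality)
    return worst_first[human_budget:], worst_first[:human_budget]

# Toy predictor: in the paper's spirit this would estimate segmentation
# quality (e.g., expected overlap with the true foreground); here it is a
# lookup table purely for illustration.
toy_quality = {"mask_a": 0.91, "mask_b": 0.42, "mask_c": 0.77, "mask_d": 0.30}
keep, redo = allocate_human_effort(list(toy_quality), toy_quality.get,
                                   human_budget=2)
print("machine keeps:", keep)  # ['mask_c', 'mask_a']
print("humans redo:", redo)    # ['mask_d', 'mask_b']
```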

The paper explores leveraging the complementary strengths of humans and AI, allocating resources so as to reduce human involvement. I particularly liked the focus on quality throughout the paper. The mixed approach matches the quality of traditional systems that relied heavily on human involvement. The resultant system was able to save significant hours of human effort while maintaining the quality of the foreground object segmentations, which is great.

Another aspect of the paper that I found impressive was the conscious effort to develop a single prediction model applicable across different domains; three diverse datasets were employed to this end. The paper discusses the disadvantages of other systems that do not work well on multiple datasets: in such cases, only a domain expert or computer vision expert can predict when the system will succeed. This paper claims that its system avoids this altogether. I also liked the decision to involve humans only once per image, as opposed to existing systems where human effort is required multiple times during the initial segmentation phase of each image.

  1. This paper primarily focuses on reducing human involvement in the context of foreground object segmentation. What other applications can extend the principles of this system to achieve reduced involvement of humans in the loop while ensuring that quality is not affected?
  2. The system deals with predicting the quality of image segmentation outputs and involves the human to re-annotate only the lowest quality ones. What other ideas can be employed to ensure reduced human efforts in such a system?
  3. The paper implies that the system proposed can be applied across images from multiple domains. Were the three datasets described varied enough to ensure that this is a generalized solution?


02/26/2020 – Sushmethaa Muhundan – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

The paper explores how people make fairness judgments of ML systems and the impact that different explanations have on these judgments. It also explores how personalized and adaptive explanations can support such fairness judgments. It is extremely important to ensure algorithmic fairness, and there is a need to consciously avoid amplifying existing biases. In this context, explanations are beneficial in two ways: not only do they expose implementation details that would otherwise be a “black box” to the user, but they also facilitate better human-in-the-loop experiences by enabling people to identify fairness issues. The COMPAS recidivism data was utilized for the study, and four explanation styles were examined: input-influence based, demographic-based, sensitivity-based, and case-based. The study highlights that there is no one-size-fits-all solution for an effective explanation: datasets, contexts, kinds of fairness issues, and user profiles vary and need to be addressed individually. The paper proposes hybrid explanations as a solution, providing both an overview of the ML model and information about specific cases to aid accurate fairness judgments.

While there has been a lot of research on developing non-discriminatory ML algorithms, this paper specifically deals with the human aspect, which is necessary to identify and remedy fairness issues. I feel that this is equally important and often overlooked. It was also interesting to note that, unlike previous studies, the explanations here were auto-generated.

With respect to the different explanation styles, I found the sensitivity-based explanation particularly interesting since it clearly shows how the prediction would differ if certain attributes were modified. In my view, this form of explanation, out of the four proposed, is the most effective at surfacing any bias that may be present in the ML system.
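
As a concrete illustration of the style, here is a toy sketch (my own, not the study's materials; the model and feature values are invented) of how a sensitivity-based explanation can be generated by perturbing one attribute and reporting the change in prediction:

```python
# Sketch: a sensitivity-style explanation from a single attribute perturbation.

def sensitivity_explanation(model, case, attribute, alternative):
    original = model(case)
    changed = model({**case, attribute: alternative})  # perturbed copy
    return (f"Changing {attribute} from {case[attribute]!r} to {alternative!r} "
            f"moves the prediction from {original!r} to {changed!r}.")

# Toy model: predicts risk purely from the number of prior offenses.
toy_model = lambda c: "high risk" if c["priors"] > 3 else "low risk"
case = {"age": 24, "priors": 5, "charge": "misdemeanor"}
print(sensitivity_explanation(toy_model, case, "priors", 1))
# Changing priors from 5 to 1 moves the prediction from 'high risk' to 'low risk'.
```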

I felt that the input-influence based explanation was also effective: its +/- markers on the features matching the particular case give users a clearer picture of which attributes specifically influenced the result, thereby exposing the implementation details to a certain extent.

The study results document various insights from participants, some of which I found fascinating. While some participants believed certain predictions were biased, others found the same verdicts unremarkable. This truly captured the diversity of opinions and perspectives on the same ML system, depending on the explanation provided.

  1. Through this study, it is revealed that the perception of bias is not uniform and is extremely subjective. Given this lack of agreement on the definition of moral concepts, how can a truly unbiased ML system be achieved?
  2. What are some practices that can be followed by ML model developers to ensure that the bias in the input dataset is identified and removed?
  3. Apart from gender bias and ethnic bias, what are some other prevalent biases in existing ML systems that need to be eradicated?


02/26/2020 – Sushmethaa Muhundan – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

The perception and acceptance of an AI system are shaped by users' expectations of that system as well as their prior experiences with AI systems. Hence, setting expectations before users interact with the system is pivotal to avoiding inflated expectations, which could lead to disappointment if unmet. Using a Scheduling Assistant as the example system, the paper discusses expectation adjustment techniques. It focuses on methods that shape the user's expectations before they use the system and studies the impact on the system's acceptance. The paper also studies the impact of different kinds of AI imperfection, specifically false positives versus false negatives. Three techniques are proposed and evaluated: an accuracy indicator, example-based explanations, and performance control. The studies conclude that setting expectations well before use decreases the chance of disappointment by highlighting the system's flaws beforehand.
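
The false-positive/false-negative contrast maps naturally onto a decision threshold, which is presumably the kind of knob a performance control exposes. A toy sketch (invented scores, not the Scheduling Assistant's data) of how moving the threshold trades one error type for the other:

```python
# Sketch: one threshold trades false positives against false negatives.

scored_emails = [  # (model score that the email implies a meeting, truth)
    (0.95, True), (0.80, True), (0.65, False),
    (0.55, True), (0.30, False), (0.10, False),
]

def confusion(threshold):
    fp = sum(s >= threshold and not t for s, t in scored_emails)
    fn = sum(s < threshold and t for s, t in scored_emails)
    return fp, fn

for threshold in (0.2, 0.5, 0.9):
    fp, fn = confusion(threshold)
    print(f"threshold={threshold}: {fp} false positives, {fn} false negatives")
# threshold=0.2: 2 FP, 0 FN; threshold=0.5: 1 FP, 0 FN; threshold=0.9: 0 FP, 2 FN
```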

The study assumes that users are new to the environment and dedicates time at the start of the experiment to explaining the interface. I felt this was helpful since participants could then follow along; it was something I found missing in some earlier papers, which assumed all readers had sufficient prior knowledge. Also, despite the system initially achieving ninety-three percent accuracy on the test dataset, the authors deliberately set the accuracy to fifty percent in order to gauge users' sentiments and evaluate their expectation-setting hypothesis. I felt this greatly widened the scope for disappointment, thereby helping them validate the expectation-setting techniques and their effects. The decision to use visualizations along with a short summary of intent in the explanations was also helpful, since it removed the need for users to read lengthy summaries and better supported their decisions. It was also good to note the authors' take on deception and marketing as means of setting false expectations: this study went beyond such techniques and focused on shaping expectations by explaining the system's accuracy. I felt this perspective was more ethical than other approaches adopted in this area.

  1. Apart from the expectations that users have, what other factors influence the perception and acceptance of AI systems by the users?
  2. What are some other techniques, visual or otherwise, that can be adopted to set expectations for AI systems?
  3. How can the AI system developers tackle trust issues and acceptance issues? Given that perceptions and individual experiences are extremely diverse, is it possible for an AI system to be capable of completely satisfying all its users?


02/19/2020 – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff – Sushmethaa Muhundan

The paper studies human-AI teams in decision-making settings, focusing specifically on updates made to the AI component and their influence on the human's decision-making. In AI-advised human decision making, the AI system recommends actions to the human, who makes an informed decision based on the recommendation, past experience, and domain knowledge; the human can follow the AI's recommendation or disregard it. Over the course of interacting with an AI system, humans develop a mental model of it, built by mapping scenarios in which the AI was correct against those in which it was incorrect, via the rewards and feedback the system provides. As part of the experiment, studies were conducted to establish relationships between updates to AI systems and team performance. User behavior was monitored using CAJA, a custom platform built to study how updates to AI models influence the user's mental model and, consequently, team performance. Compatibility metrics were introduced, and several real-world domains were analyzed, including recidivism prediction, in-hospital mortality prediction, and credit risk assessment.

It was extremely surprising to learn that an update that improves the AI's performance may actually hurt team performance. My initial instinct was that team performance would increase in proportion to the AI's performance, but this is not always the case: even when the AI improves, its new behavior may be inconsistent with the human's mental model, so decisions based on past interactions with the AI go wrong and overall team performance decreases. An interesting and relatable parallel is drawn to backward compatibility in software engineering: by analogy, an update to the AI is compatible if it does not introduce new errors on cases the human has learned to trust.
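
One natural way to quantify that analogy (my formulation for illustration, with made-up predictions) is the fraction of cases the old model got right that the update still gets right, which makes a more accurate yet less compatible update visible:

```python
# Sketch: compatibility = share of previously-correct cases still correct.

def compatibility(old_preds, new_preds, labels):
    old_right = [i for i, p in enumerate(old_preds) if p == labels[i]]
    if not old_right:
        return 1.0
    return sum(new_preds[i] == labels[i] for i in old_right) / len(old_right)

labels    = [1, 0, 1, 1, 0, 1]
old_preds = [1, 0, 1, 0, 0, 0]  # 4/6 correct
new_preds = [1, 0, 0, 1, 0, 1]  # 5/6 correct, yet it breaks a trusted case
print(compatibility(old_preds, new_preds, labels))  # 0.75
```

The update here is strictly more accurate, but a human who had learned to trust the AI on the third case would now be misled, which is exactly the tradeoff in the paper's title.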

The platform developed to conduct the studies, CAJA, was an innovative way to overcome the challenges of testing in real-world settings. It abstracts away the specifics of problem-solving by presenting a range of problems that distill the essence of mental models and trust in one's AI teammate. It was very interesting that the problems were designed so that no human could be a task expert, thereby maximizing the importance of mental models and their influence on decision making.

  • What are some efficient means to share the summary of AI updates in a concise, human-readable form that captures the essence of the update along with the reason for the change?
  • What are some innovative ideas that can be used to reduce the cost incurred by re-training humans in an AI-advised human decision-making ecosystem?
  • How can developers be made more aware of the consequences of the updates made to the AI model on team performance? Would increasing awareness help improve team performance?


02/19/2020 – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal – Sushmethaa Muhundan

The paper talks about the counter-vandalism process on Wikipedia, focusing on both the visible human effort and the silent non-human effort involved. Fully-automated anti-vandalism bots are a key part of this process and play a critical role in managing Wikipedia's content. The actors involved range from fully autonomous software to semi-automated programs to user interfaces operated by humans. A case study is presented recounting the detection and banning of a vandal, highlighting the importance and impact of bots and assisted editing programs. Vandalism-reverting software uses queuing algorithms together with a ranking mechanism based on vandalism-identification algorithms. The queuing algorithm takes into account multiple factors, such as the kind of user who made the edit, the user's revert history, and the type of edit made. The software proves extremely effective at surfacing prospective vandals to reviewers. User talk pages are the forums used to take action after an offense has been reverted. This largely invisible infrastructure has been critical in insulating Wikipedia from vandals, spammers, and other malevolent editors.
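
As an illustration of how such a queue might rank edits, here is a hypothetical scoring function; the weights and feature names are invented, but the factors mirror the ones listed above (kind of user, revert history, type of edit):

```python
# Sketch: rank pending edits so the most suspicious surface for review first.

def edit_priority(edit):
    score = 0.0
    if edit["anonymous"]:
        score += 2.0                      # unregistered editors draw scrutiny
    score += 1.5 * edit["prior_reverts"]  # repeat offenders bubble up
    if edit["type"] == "large_removal":
        score += 3.0                      # page blanking is a classic tell
    return score

edits = [
    {"id": 1, "anonymous": True,  "prior_reverts": 2, "type": "large_removal"},
    {"id": 2, "anonymous": False, "prior_reverts": 0, "type": "copyedit"},
    {"id": 3, "anonymous": True,  "prior_reverts": 0, "type": "copyedit"},
]
review_queue = sorted(edits, key=edit_priority, reverse=True)
print([e["id"] for e in review_queue])  # [1, 3, 2]
```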

I feel that the case study helps one understand the internal workings of vandalism-reverting software, and it is a great example of handling a problem by leveraging the complementary strengths of AI and humans through technology. It is interesting that the cognitive work of identifying a vandal is distributed across a heterogeneous network and unified through technology; this lends speed and efficiency and makes the entire system robust. I found it particularly interesting that ClueBot, after identifying a vandal, reverted the edit within seconds: the edit did not have to wait in a queue for a human or another bot to review it but was resolved immediately.

A pivotal feature of this ecosystem that I found very fascinating was the fact that domain expertise or skill is not required to handle such vandal cases. The only expertise required of vandal fighters is in the use of the assisted editing tools themselves, and the kinds of commonsensical judgment those tools enable. This widens the eligibility criteria for prospective workers since specialized domain experts are not required.

  • The queuing algorithm takes into account factors like the kind of user who made the edit, the user's revert history, and the type of edit made. Apart from the factors mentioned in the paper, what others could be incorporated into the queuing algorithm to improve its efficiency?
  • What are some innovative ideas that can be used to further minimize the turnaround reaction time to a vandal in this ecosystem?
  • What other tools can be used to leverage the complementary strengths of humans and AI using technology to detect and handle vandals in an efficient manner?


02/05/2020 – Sushmethaa Muhundan – Power to the People: The Role of Humans in Interactive Machine Learning

The paper promotes the importance of studying users and having ML systems learn from them interactively. Systems that take their users into account and learn from them are often more effective than traditional systems, as the paper illustrates with multiple examples. The authors argue that involving users leads to better user experiences and more robust learning systems. Compared to traditional ML systems, interactive ML systems offer more rapid, focused, and incremental model updates by letting the end user interact with and drive the system toward the intended behavior; traditionally this was restricted to skilled practitioners, which delayed the incorporation of end-user feedback. The benefits of interactive ML systems are two-fold: they help validate the system's performance with real users, and they yield insights for future improvement. User interaction with interactive ML was studied in detail, and common themes are presented in the paper. Novel interfaces for interactive ML were also discussed, aimed at leveraging human knowledge more effectively and efficiently. These involved new methods for receiving input as well as providing output, which in turn gave the user more control over the learning system and made it more transparent.

Active learning is an ML paradigm in which the learner chooses the examples from which it learns. It was interesting to read about the negative impacts of this paradigm in interactive settings, where it led to frustration: the study found that users found the stream of questions annoying. On one hand, users want to get involved in such studies to better understand the ecosystem; on the other hand, certain models draw negative feedback. Another aspect I found interesting was that users were open to learning about the internal workings of the system and how their feedback affected it; seeing the direct impact of their feedback on subsequent iterations of the model motivated them to get more involved. It was also good to note that, given the choice, users were willing to give detailed feedback rather than just help with classification.
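
For context, the questioning behavior that frustrated users is the query step of a standard active-learning loop. Here is a generic uncertainty-sampling sketch (a textbook technique, not any specific system from the paper; the toy model and pool are assumptions):

```python
# Sketch: the learner repeatedly queries the example it is least sure about.

def least_confident(pool, predict_proba):
    """Pick the unlabeled example whose top-class probability is lowest."""
    return min(pool, key=lambda x: max(predict_proba(x)))

# Toy setup: 1-D points where the model believes p(positive) = x, so points
# near 0.5 are the most uncertain; the "user" is the labeling oracle.
pool = [0.05, 0.48, 0.93, 0.61, 0.12]
predict_proba = lambda x: (1 - x, x)  # (p_negative, p_positive)
oracle = lambda x: int(x > 0.5)       # the human's answer

labeled = []
for _ in range(3):  # three rounds of questioning the user
    query = least_confident(pool, predict_proba)
    pool.remove(query)
    labeled.append((query, oracle(query)))

print(labeled)  # [(0.48, 0), (0.61, 1), (0.12, 0)] -- most uncertain first
```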

Regarding future work, I agree with the authors that the work done so far on interactive ML across different domains needs to be standardized to avoid duplication of effort by researchers in different domains. Converging on and adopting a common language is the need of the hour to help accelerate research in this space. Also, given the subjective nature of the studies described in this paper, I feel that a comprehensive study, with thorough testing involving a diverse set of people, is necessary before adopting any new interface, since we do not want a new interface to be counter-productive, as it was in several of the cases cited here.

  • The paper discusses the trade-off between accuracy and speed in research on user interactions with interactive machine learning, owing to the requirement for rapid model updates. What are some ways to handle this trade-off?
  • While interactive ML systems involve interaction with end-users, how can the expertise of skilled practitioners be leveraged and combined with these systems to make the process more effective?
  • What are some innovative methods that can be used to experiment with crowd-powered systems to investigate how crowds of people might collaboratively drive such systems?


02/05/2020 – Sushmethaa Muhundan – Principles of Mixed-Initiative User Interfaces

There has been a long-standing debate in user-interface research over which area to focus on: building fully automated interface agents or building tools that enhance users' direct manipulation. This paper reviews key challenges and opportunities in building mixed-initiative user interfaces that let users and intelligent agents collaborate efficiently, combining both areas into one hybrid system that leverages the advantages of each. The expected costs and benefits of the agent taking action are weighed against the costs and benefits of its inaction. A large portion of the paper deals with managing the agent's uncertainty about the user's needs. Three categories of action are adopted according to the probability the agent infers about the user's intent: the agent takes no action if the inferred probability that the user wants the service is low, engages the user in a dialog if the intent is unclear, and goes ahead and performs the action if the inferred probability is high. The LookOut system for scheduling and meeting management, an automation service integrated with Microsoft Outlook, is used as the running example.
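
The three-way rule is essentially an expected-utility comparison. Here is a compact sketch with invented utilities (acting helps if wanted, annoys if not; asking always costs a small interruption but prevents unwanted action; none of these numbers come from the paper):

```python
# Sketch: pick no action, dialog, or autonomous action from inferred intent.

def choose(p):  # p = inferred probability that the user wants the service
    eu_act = p * 1.0 + (1 - p) * -1.0  # act now: helpful or annoying
    eu_ask = p * 1.0 - 0.2             # ask first, then act only if wanted
    eu_nothing = 0.0
    options = {"act": eu_act, "engage in dialog": eu_ask, "no action": eu_nothing}
    return max(options, key=options.get)

for p in (0.1, 0.5, 0.9):
    print(f"p={p}: {choose(p)}")
# p=0.1: no action   p=0.5: engage in dialog   p=0.9: act
```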

The paper describes multiple interaction modalities and their attributes, which I found interesting; these range from a manual operation mode to a basic automated-assistance mode to a social-agent modality. An alerting feature in the manual mode makes the user aware that the agent would have taken some action at that point had it been in automatic mode. I particularly liked this, since it gives users a sense of how the system would behave; if the action would have been helpful, their trust would naturally increase, and with it the likelihood of using the system next time. I also liked engaging the user in a dialog as one of the possible actions. In times of uncertainty, rather than guessing the user's intent and risking damage to their trust in the system, I feel it is better to clarify the intent by asking. It was also good to learn that the automated system analyzes the user's behavior and learns from it, taking past interactions into account and adjusting its parameters to make a more informed decision the next time around.

While the agent learns from the relationship between the length of an email and the time the user takes to respond to it, I feel a key missing factor is the user's current attention span. While this is hard to gauge and learn from, I feel it plays an equally important role in determining the response time and thereby affects the agent's judgment.

  • Does the system account for scenarios where the user is absent immediately after opening an email? Would the lack of response negatively impact the system's feedback loop?
  • Apart from the current program context, the user’s sequence of actions and choice of words used in queries, what are some additional attributes that can be considered by the agent while inferring the goal of the user?
  • I found the concept of context-dependent changes to utilities extremely interesting. Have there been studies conducted that involve a comprehensive set of scenarios that can affect the utility of a system based on context?


01/29/20 – Sushmethaa Muhundan – The Future of Crowd Work

While crowd work has the potential to support a flexible workforce and tap expertise distributed geographically, the current landscape of crowd work is often associated with negative attributes such as meager pay and a lack of benefits. This paper proposes potential changes to improve the overall experience in this landscape. Drawing on organizational behavior, distributed computing, and feedback from workers, the paper creates a framework for future crowd work. The aim is a framework that helps build a culture of crowd work that is more attractive to requesters and workers alike and can support more complex, creative, and highly valued work. The platform should be capable of decomposing tasks, assigning them appropriately, and motivating workers, and it should have a structured workflow that enables a collaborative work environment; quality assurance also needs to be ensured. Creating career ladders, improving task design for better clarity, and facilitating learning are key themes that emerged from this study. Improvements along these themes, together with motivating workers, creating communication channels between requesters and workers, and providing feedback to workers, would help create a work environment conducive to both requesters and workers.

Since the authors were requesters themselves, it was nice to see that they sought the perspectives of current workers so that both parties' viewpoints informed the framework. An interesting comparison is made between the crowdsourcing market and a loosely coupled distributed computing system, which helped shape the framework by drawing on solutions developed for similar problems in distributed computing. I liked the importance given to feedback and learning as components of the framework. Feedback is extremely important for self-improvement, yet it is not prevalent in the current ecosystem. As for learning, personal growth is essential in any working environment, and a focus on learning would facilitate self-improvement, which in turn helps workers perform subsequent tasks better; as a result, requesters benefit from more proficient crowd workers. I particularly found the concept of intertwining AIs guiding crowds and crowds guiding AIs extremely interesting. The thought of leveraging the strengths of each to strengthen the other is intriguing and has great potential if utilized meaningfully.

  • How can we shift the mindset of current requesters, who get their work done for meager pay, so that they invest in workers by giving valuable feedback and spending time ensuring the requirements are well understood?
  • What are some interesting ways that can be employed to leverage AIs guiding crowds?
  • How can we prevent the disruption of quality by a handful of malicious users who collude to agree on wrong answers to cheat the system? How can we build a framework of trust that is resistant to malicious workers and requesters who can corrupt the system?


01/22/20 – Sushmethaa Muhundan – Ghost Work

Automation is taking over the world and a vast majority of people are losing their jobs! This might not necessarily be true. Behind the boom of automation and AI lies a huge, silent human effort making it possible. Termed ghost workers, these are the people who “help” machines become smarter and make the decisions a machine is not yet capable of making. The paradox of automation's last mile refers to humans training an AI that ultimately goes on to function entirely on its own, making the human redundant. It is in this phase, in which human input is required to make the AI smarter, that a hidden world of ghost workers toils day and night for meager pay. This human labor is often intentionally hidden from the outside world.

I agree with the author that current smart AI systems, which are expected to respond within seconds, often require human input to handle issues too complicated for their AI “mind”. Human discretion is almost always required to gauge sentiment correctly, identify patterns, or adapt to the latest slang; these are things computers cannot yet decipher on their own, and real-time problems require human intervention to solve.

Consumers, however, remain to this day unaware that a human is actually involved behind the scenes in these transactions. Worse, the conditions of work and pay are not widely known either. This came as a shock to me. The current conditions could lead to the isolation of ghost workers: treating them as nothing more than a means to get a job done strips the job of any protection whatsoever and dehumanizes the workers in the requester's eyes. I therefore resonate with the author's effort to bring this to light and feel that a call for transparency is the need of the hour.

While the author discusses the negative impacts of this form of ghost work, he also highlights why the workers prefer the job and return to the platform every single day to search for work. One reason is the anonymity the platform provides, stripping away attributes of the individual that might otherwise hinder their ability to find jobs elsewhere; remote access is another aspect that attracts people to this work. A surprising fact I learned from the extract is that the quality of ghost workers' output often surpasses that of full-time employees (who are paid far more and receive benefits). The fear of losing subsequent tasks on the platform motivates these workers to raise the bar and deliver extremely high-quality work.

The nature of the jobs posted is seasonal and depends on requesters' needs: as mentioned in the extract, a few days bloom with requests while the rest pass without a single one. Is there anything that can be done to streamline work so that ghost workers are guaranteed work? Is there a way to provide alternate employment that guarantees regular income? This would be immensely helpful, since the vast majority of ghost workers (if not all) depend on this income to meet their living expenses.

Can task starvation be avoided through innovation?

While this platform indeed provides anonymity to the workers, why are ghost workers not paid fairly? What can be done to ensure that a fair amount reaches the workers?
