02/19/2020 – Sukrit Venkatagiri – In Search of the Dream Team

Paper:  Sharon Zhou, Melissa Valentine, and Michael S. Bernstein. 2018. In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18), 1–13. https://doi.org/10.1145/3173574.3173682

Summary: This paper introduces a system called DreamTeam that explores a search space for the optimal design of teams in an online setting. The system does this through multi-armed bandits with temporal constraints, a type of algorithm that manages the timing of exploration–exploitation trade-offs across multiple bandits simultaneously. This addresses a classic question in HCI and CSCW: when should teams favor one approach over another? The paper contributes a method for computationally identifying good team structures, a system that implements it, and an evaluation showing improvements of 46%. The paper concludes with a discussion of computational partners for improving group work, such as aiding us by pointing out our biases and inherent limitations, and helping us replan when the environment shifts.
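To make the bandit idea concrete, here is a minimal sketch. It is not DreamTeam's actual algorithm: it assumes Bernoulli (success/failure) rewards, uses plain Thompson sampling, and stands in for the temporal constraint with a hypothetical `explore_until` deadline after which a bandit only exploits. The dimension and arm names are placeholders chosen for illustration.

```python
import random


class BetaBandit:
    """One bandit over the options of a single team-structure dimension."""

    def __init__(self, arms, explore_until, rng):
        self.arms = list(arms)
        # Hypothetical temporal constraint: last round in which exploring is allowed.
        self.explore_until = explore_until
        self.rng = rng
        # Beta(1, 1) prior for every arm.
        self.successes = [1] * len(self.arms)
        self.failures = [1] * len(self.arms)

    def posterior_mean(self, i):
        return self.successes[i] / (self.successes[i] + self.failures[i])

    def select(self, round_no):
        indices = range(len(self.arms))
        if round_no <= self.explore_until:
            # Thompson sampling: draw one sample per arm from its posterior.
            scores = [
                self.rng.betavariate(self.successes[i], self.failures[i])
                for i in indices
            ]
        else:
            # Past the deadline: exploit the arm with the best posterior mean.
            scores = [self.posterior_mean(i) for i in indices]
        return max(indices, key=lambda i: scores[i])

    def update(self, i, reward):
        if reward:
            self.successes[i] += 1
        else:
            self.failures[i] += 1


# Several bandits run side by side, each with its own exploration deadline.
rng = random.Random(0)
bandits = {
    "hierarchy": BetaBandit(["flat", "centralized"], explore_until=3, rng=rng),
    "feedback": BetaBandit(["encouraging", "critical"], explore_until=6, rng=rng),
}
choice = {name: b.arms[b.select(round_no=1)] for name, b in bandits.items()}
```

Staggering the deadlines means a team is not asked to change every dimension of its structure at once, which is one plausible reading of why timing matters here.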

Reflection:

I appreciate the way they evaluated the system and conducted randomized controlled trials for each of their experimental conditions. The evaluation is done on a collaborative intellective task, and I wonder how different the findings would be if they had evaluated it using a creative task, instead of an intellective or analytic task. Perhaps there is a different optimal “dream team” based not only on the people but the task itself. 

I also appreciate the thorough system description and how the system was integrated within Slack as opposed to being its own standalone system. This increases the real-world generalizability of the system and also makes it easier for others to build on. In addition, hiring workers in real time would have been hard, and it’s unclear how synchronous/asynchronous the study was.

One interesting approach is managing exploration and exploitation across multiple bandits simultaneously. I wonder how the system might have fared if teams were given the choice to explore each on their own; probably worse.

Another interesting finding is the evaluation with strangers on MTurk. I wonder if the results would have differed if it was a) in a co-located setting and/or b) among coworkers who already knew each other. 

Finally, the paper is nearly two years old, and I don’t see any follow-up work evaluating this system in the wild. I wonder why not. Perhaps there is not much to gain through an in-the-wild evaluation, or perhaps an in-the-wild evaluation did not fare well. Either way, it would be interesting to read about the results, good or bad.

Questions:

  1. Have you thought about building a Slack integration for your project instead of a standalone system?
  2. How might this system function differently if it were for a creative task such as movie animation?
  3. How would you evaluate such a system in the wild?

2/19/2020 – Jooyoung Whang – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

This paper describes how bots and humans interact and collaborate to moderate thousands of wiki pages and ban vandal users. To study the use of moderator bots, the authors use a technique called trace ethnography. The technique traces the logs and records left by automated services to give insight into how moderation was carried out with various tools. The authors explain how the tools facilitate distributed cognition and enhance teamwork among rather isolated vandal fighters. According to the paper, the set of vandal warnings is logged on the potential vandal user’s talk page, which future vandal fighters then use to determine how severe a warning to give the user. Temporary bans are made in a similar fashion: a ban request is sent to the administrators’ ban request board, and the next time an administrator finds vandal activity by the same user, the ban is given. The paper uses a detailed use case to explain the process step by step.
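The escalation logic described above can be sketched in a few lines. This is only an illustration, not the actual tooling: the function name `next_action` and the action labels are hypothetical, and the four-level warning scale is an assumption taken from the "four warnings" mentioned in the questions below.

```python
MAX_WARNING_LEVEL = 4  # assumed number of warning levels before a ban request


def next_action(talk_page_warnings):
    """Decide what a vandal fighter does next, given the warning levels
    already logged on the user's talk page."""
    highest = max(talk_page_warnings, default=0)
    if highest >= MAX_WARNING_LEVEL:
        # All warning levels are used up: hand the case to administrators.
        return ("report_to_admins", None)
    # Otherwise issue the next, more severe warning.
    return ("warn", highest + 1)
```

For example, `next_action([1, 2])` yields a level 3 warning, while `next_action([1, 2, 3, 4])` yields a report to administrators. The talk page acts as the shared memory, so each vandal fighter only needs the log, not contact with earlier fighters.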

The paper was interesting in that it shed light on another benefit that automation can bring to collaborative work. The paper emphasizes that it was the automated bots and their efficient reporting system that created a decentralized network of human moderators, by pre-processing and analyzing the queued edits to form a ranked queue of potential vandal edits according to previous warnings. As many effective scheduling algorithms exist, automated scheduling is a great way of handling human teamwork. Wikipedia’s system reminded me of a thread pool in concurrent programming, except that each task is carried out by a human.

Wikipedia’s vandal fighting system makes perfect use of human and AI affordances. The human side uses linguistic and complex reasoning abilities to identify vandal edits. The AI side efficiently handles the many repetitive tasks like sorting edit queues and logging and retrieving warnings.

The following are the questions that I had while reading the paper:

1. At the end of the use case presented in the paper, an obsolete report made after a user’s ban was automatically removed by the system. This is an example of resolving a race condition. Could there be any other possible conflicts that may occur because of the order of edits? Would some of them be difficult to fix by a bot?

2. According to the paper, it seems that the time of a warning on a potential vandal user’s talk page is not considered when assigning a new warning. What if a user who had received four warnings decided to quit being a vandal, came back a few years later, and accidentally made an edit that was considered vandalism? The system would issue a temporary ban. Do you think this is fair?

3. According to the paper, vandal fighters are able to select from a range of helper bots in their activity. All these bots are compatible with each other because of the presence of a talk page provided by Wikipedia. Would there be any case where the different types of bots cause a problem or conflict with each other?

02/19/2020 – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal – Yuhang Liu

The authors of this paper examine the social role of software tools in Wikipedia, with a particular focus on automated editing programs and assisted editing tools. The authors show by example that Wikipedia content is sometimes modified maliciously, which may have a bad impact on society. Such damage can be repaired by administrators with the help of assisted software, and as technology develops, this software plays a growing role in the recovery. Using trace ethnography, the authors show how these unofficial technologies have fundamentally changed the nature of editing and management in Wikipedia. Specifically, “vandal fighting” can be analyzed as distributed cognition, emphasizing the role of non-human participants in decentralized activities that promote collective intelligence. Overall, this shows that software programs are not only used to enforce policies and standards. These tools can take coordinated but decentralized action, and can play a more widespread and effective role in subsequent applications.

I think this paper gives a large number of examples, including the impact of such changes on society, and some analogies that explain the meaning of these terms well; they are very effective illustrations of Wikipedia and the impact of these applications on people’s lives. Among them, I think the authors made two main points.

  1. Robots or software, such as assisted editing tools, play an increasingly important role in life and work. For example, the article mentions two editing tools, Huggle and Twinkle, and the authors describe the use of both in detail. After reading the description, my concept of assisted editing tools was completely overturned. These unofficial tools can greatly help administrators complete the maintenance work. This also leads to the concept of “trace ethnography”. In my opinion, trace ethnography is a way of generating rich accounts of interaction by combining a fine-grained analysis of the various “traces” that are automatically recorded by the project’s software alongside an ethnographically derived understanding of the tools, techniques, practices, and procedures that generate such traces. It can integrate the small traces left by people on the Internet. I think it can play a vital role in controlling and monitoring people’s behavior on the Internet. To maintain the network environment, I even think we can use this feature more widely.
  2. The authors describe vandal fighting as distributed cognition through an analogy to navigation. The user and the machine each make judgments, which are then integrated across the network so that intentional vandalism becomes visible. I think this kind of thinking will greatly change the way people work in the future. The navigation work described does not even require much professional knowledge; it only requires being able to read maps and use a magnifying glass. In future work, people may not need extensive professional knowledge either; they will only need to understand the information and exercise the right judgment. This will definitely change the way people work.

Question:

  1. In the future, will it be possible to complete inspection and maintenance by robots and computers (without people)?
  2. Is it possible to apply the ideas of trace ethnography in other fields, such as monitoring cybercrime?
  3. Assisted editing tools reduce administrators’ requirements for related expertise. Will this change benefit these people in the long run? Does the easier job completion mean easier replacement by machines?
  4. The article mentioned that we need to consider the social impact of such assisted editing tools. What kind of social impact do you think the various software in your current life have?

02/19/2020 – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal – Sushmethaa Muhundan

The paper talks about the counter-vandalism process in Wikipedia, focusing on both the human efforts and the silent non-human efforts put in. Fully-automated anti-vandalism bots are a key part of this process and play a critical role in managing the content on Wikipedia. The actors involved range from fully autonomous software to semi-automated programs to user interfaces used by humans. A case study is presented which is an account of detecting and banning a vandal. This aims to highlight the importance and impact of bots and assisted editing programs. Vandalism-reverting software uses queuing algorithms teamed with a ranking mechanism based on vandalism-identification algorithms. The queuing algorithm takes into account multiple factors like the kind of user who made the edit, the revert history of the user, and the type of edit made. The software proves to be extremely effective in presenting prospective vandals to the reviewers. User talk pages are forums used to take action after an offense has been reverted. This largely invisible infrastructure has been extremely critical in insulating Wikipedia from vandals, spammers, and other malevolent editors.
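A rough sketch of such a ranked queue follows. The three factors come from the summary above, but the `Edit` fields, the `suspicion` function, and its weights are invented for illustration and are not taken from the paper or from any actual tool.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Edit:
    editor: str
    anonymous: bool        # kind of user who made the edit
    prior_reverts: int     # revert history of the user
    removes_content: bool  # type of edit made


def suspicion(edit):
    """Hypothetical scoring rule; the weights are made up."""
    score = 0.0
    if edit.anonymous:
        score += 2.0
    score += 1.0 * edit.prior_reverts
    if edit.removes_content:
        score += 1.5
    return score


def ranked_queue(edits):
    """Yield edits from most to least suspicious."""
    # heapq is a min-heap, so negate the score; the index breaks ties
    # and keeps Edit objects from being compared directly.
    heap = [(-suspicion(e), i, e) for i, e in enumerate(edits)]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[2]


edits = [
    Edit("alice", anonymous=False, prior_reverts=0, removes_content=False),
    Edit("203.0.113.7", anonymous=True, prior_reverts=3, removes_content=True),
    Edit("bob", anonymous=False, prior_reverts=1, removes_content=True),
]
order = [e.editor for e in ranked_queue(edits)]
```

Here the anonymous editor with three prior reverts is shown to a reviewer first, and the registered editor with a clean history last.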

I feel that the case study presented helps in understanding the internal workings of vandalism-reverting software, and it is a great example of handling a problem by leveraging the complementary strengths of AI and humans via technology. It is interesting to note that the cognitive work of identifying a vandal is distributed across a heterogeneous network and is unified using technology! This lends speed and efficiency and makes the entire system robust. I found it particularly interesting that ClueBot, after identifying a vandal, reverted the edit within seconds. This edit did not have to wait in a queue for a human or a bot to review but was resolved immediately by a bot.

A pivotal feature of this ecosystem that I found very fascinating was the fact that domain expertise or skill is not required to handle such vandal cases. The only expertise required of vandal fighters is in the use of the assisted editing tools themselves, and the kinds of commonsensical judgment those tools enable. This widens the eligibility criteria for prospective workers since specialized domain experts are not required.

  • The queuing algorithm takes into account multiple factors like the kind of user who made the edit, revert history of the user as well as the type of edit made. Apart from the factors mentioned in the paper, what other factors can be incorporated into the queuing algorithm to improve its efficiency?
  • What are some innovative ideas that can be used to further minimize the turnaround reaction time to a vandal in this ecosystem?
  • What other tools can be used to leverage the complementary strengths of humans and AI using technology to detect and handle vandals in an efficient manner?

02/19/2020 – Updates in Human-AI teams: Understanding and Addressing the Performance/Compatibility Tradeoff – Sushmethaa Muhundan

The paper studies human-AI teams in decision-making settings specifically focusing on updates made to the AI component and its subsequent influence on the decision-making process of the human. In an AI-advised human decision-making interaction, the AI system recommends actions to the human. Based on this recommendation, their past experience as well as domain knowledge, the human takes an informed decision. They can choose to go ahead with the action recommended by the AI or they can choose to disregard the recommendation. During their course of interaction with AI systems, humans develop a mental model of the system. This is developed based on mapping scenarios where the AI’s decision was correct versus when they were incorrect by means of rewards and feedback provided to the humans by the system. As part of the experiment, studies were conducted to establish relationships between updates to AI systems and team performance. User behavior was monitored using a custom platform, CAJA, built to gain insights about the influence of updates to AI models on the user’s mental model and consequently team performance. Consistency metrics were introduced and several real-world domains were analyzed including recidivism prediction, in-hospital mortality prediction, and credit risk assessment. 

It was extremely surprising to note that updates that improve the AI’s performance may actually hurt team performance. My initial instinct was that with an increase in the AI’s performance, the team performance would increase proportionally, but this is not always the case. In certain cases, despite an increase in the AI’s performance, the new results might not be consistent with the human’s mental model; as a result, incorrect decisions are taken based on past interactions with the AI, and the overall team performance decreases. An interesting and relatable parallel is drawn to the concept of backward compatibility in software engineering with respect to updates. The concept of compatibility is introduced using this analogy to describe the ideal scenario where updates to the AI do not introduce further errors.
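One way to quantify this kind of compatibility is sketched below, under the assumption that it is measured as the fraction of examples the old model got right that the updated model also gets right. The function name and the toy data are hypothetical, and the paper's exact definition may differ.

```python
def compatibility_score(y_true, old_pred, new_pred):
    """Fraction of the old model's correct predictions that the new model keeps."""
    old_correct = [i for i, (y, o) in enumerate(zip(y_true, old_pred)) if y == o]
    if not old_correct:
        # Assumed convention: nothing to stay compatible with.
        return 1.0
    kept = sum(1 for i in old_correct if new_pred[i] == y_true[i])
    return kept / len(old_correct)


def accuracy(y_true, pred):
    return sum(1 for y, p in zip(y_true, pred) if y == p) / len(y_true)


# Toy data: the update is more accurate overall but breaks one case
# that the old model (and the user's mental model) relied on.
y_true = [1, 0, 1, 1, 0]
old_pred = [1, 0, 0, 0, 0]
new_pred = [1, 1, 1, 1, 0]
```

In this toy data, accuracy rises from 3/5 to 4/5, yet only two of the three cases the old model handled correctly are still correct after the update.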

The platform developed to conduct the studies, CAJA, was an innovative way to overcome the challenges of testing in real-world settings. This platform abstracts away the specifics of problem-solving by presenting a range of problems that distill the essence of mental models and trust in one’s AI teammate. It was very interesting to note that these problems were designed such that no human could be a task expert, thereby maximizing the importance of mental models and their influence on decision making.

  • What are some efficient means to share the summary of AI updates in a concise, human-readable form that captures the essence of the update along with the reason for the change?
  • What are some innovative ideas that can be used to reduce the cost incurred by re-training humans in an AI-advised human decision-making ecosystem?
  • How can developers be made more aware of the consequences of the updates made to the AI model on team performance? Would increasing awareness help improve team performance?

2/19/20 – Jooyoung Whang – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

According to the paper, most developers of classification or prediction systems focus on the quality of the predictions but not on the system’s team performance with the user. The authors of this paper introduce a problem that may arise from current model training loss criteria and provide new methods that address it. To get a clearer picture of users’ interactions with a classifier system, the authors develop a web-based game system called Caja and conduct a user study on Amazon Mechanical Turk. They conclude that an increase in the performance of the system does not necessarily mean that the team performance of the system with its users also increases. They also confirm that their proposed training method, which uses a new loss function built on a new concept called Dissonance, improves team performance.

I liked the authors’ new perspective on human-AI collaboration and model training. Now that I think of it, not considering the users of the system during development is contradictory to what the system is trying to achieve. One thing I was interested in and had thoughts about was their definition of Dissonance. The term links the old model of a system with the new, updated model in terms of user expectation. I saw that the term penalizes a system when the new model misclassifies inputs that the old model used to get right. However, what if the users of the old system made predictions according to how the system was wrong? This may be a weird concern and probably an edge case, but if a user made decisions based on the thought that the system was wrong all the time, the team performance of that person with the updated model will always be worse, even if the new system was trained with the suggested loss function.
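A minimal sketch of a loss with such a penalty, based only on the description above: it assumes binary labels, uses log loss as the base loss, and adds an extra weighted copy of the new model's loss on examples the old model got right. The names `compatibility_aware_loss` and `lam` are assumptions, and the paper's exact formulation may differ.

```python
import math


def log_loss(y, p):
    """Binary log loss for label y in {0, 1} and predicted probability p of class 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def compatibility_aware_loss(y_true, old_pred, new_prob, lam=1.0):
    """Mean of base loss plus lam times a dissonance-style penalty."""
    total = 0.0
    for y, old, p in zip(y_true, old_pred, new_prob):
        base = log_loss(y, p)
        # Penalty applies only where the old model was correct, so new
        # errors on those inputs cost more than errors elsewhere.
        dissonance = base if old == y else 0.0
        total += base + lam * dissonance
    return total / len(y_true)
```

With `lam=0` this reduces to ordinary log loss; larger values trade raw accuracy for staying consistent with what the old model got right. Note that the penalty looks only at where the old model was correct, which is why the edge case raised above (a user who relies on the old model being wrong) is not covered.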

The following are the questions that I had while reading the paper:

1. As I have written in my reflection, do you think the proposed training method will be effective if the users made decisions based on the idea that the system will always be wrong? Or is this too extreme and absurd a thought?

2. The design of Caja ensures that the user can never arrive at the solution by him or herself because too much about the problem domain is hidden from the user. However, this is often not the case in real-world scenarios. The user of the system is often also an expert in the related field. Does this reduce the quality and trustworthiness of the results of this research? Why or why not?

3. The research started from the idea that interaction with the users must be considered when making an update to an AI system. In this case, it was particularly for human-AI collaboration. What if it was the opposite? For example, there are AIs that are built to compete with humans like AlphaGo. These types of AIs are also developed with the goal of producing the most optimal solution to a given input without considering the interaction with the user. How can training be modified to include users for competing AIs?

02/18/20 – Akshita Jha – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary:
“The Work of Sustaining Order in Wikipedia: The Banning of a Vandal” by Geiger and Ribes examines the role of software tools in the English Wikipedia, specifically involving autonomous and assisted editing. Wikipedia is a “free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.” Bots are “fully-automated software agents that perform algorithmically-defined tasks involved with editing, maintenance, and administration in Wikipedia.” Different bots have different functions, which can range from simple tasks like correcting grammatical errors to more complicated tasks like detecting personal insults. The authors present a detailed case study: “The Banning of a Vandal”. The authors talk about “Huggle”, which is the most widely used editing tool across Wikipedia and queues all the edits. The user then has the option to perform a variety of actions like ‘revert’, ‘warn’, etc. on each of the edits that is displayed. The user does not have the option to select which edit they want to act on. An anonymous user had been vandalizing multiple Wikipedia pages and was not discouraged by the warnings and comments given by the moderators. Eventually, this rogue user was blocked through the network of moderators, or vandal fighters, and the bots, but it was more cumbersome than expected. In addition to the quantitative and the qualitative studies, the research also demonstrated the importance of trace ethnography for studying such sociotechnical systems.

Reflections:
This is an interesting work. It was particularly insightful as I was unaware of the role of multiple bots in Wikipedia editing. Bots and humans working cohesively have helped make Wikipedia the widely used resource it currently is. Making Wikipedia a free resource that allows editing by volunteers comes with a cost. This paper helped highlight the limitations of the Wikipedia bots and how a significant amount of effort is needed from multiple moderators to ban a vandal from Wikipedia. Each moderator makes a local judgement, but the Wikipedia talk pages help keep a record of all the warnings against a particular user. Certain kinds of vandalism, like inserting obscenities and profanities, are easy to detect. However, if a vandal deletes an important section from a Wikipedia page, that might require significant cognitive effort from moderators to identify and rectify. An interesting question is how Wikipedia would be affected if it made use of a completely automated bot instead of the hybrid system it currently uses. Would the bots be able to determine the significance of an edit or a change? How would that change the moderators’ behaviors and actions? Since automated tools help determine the kind of social activities that are possible on Wikipedia, will having a completely automated bot significantly alter Wikipedia and user involvement? It would also be interesting to see if we can use trace ethnography to study Reddit, which is another big sociotechnical system.

Questions:
1. How did such a network come into place?
2. Do you think certain kinds of Wikipedia pages are more susceptible than others to vandalism?
3. Will completely automated bots help?
4. Can we conduct such a case study for Reddit? Why? Why not?

02/19/2020 – Nurendra Choudhary – Updates in Human-AI Teams

Summary

In this paper, the authors study human-AI team performance, in contrast to the performance of the human or the AI individually, and explain why this is necessary. They explain the importance of how humans interpret AI tools. Humans develop mental models of the AI’s performance. Advances in the AI’s algorithm are evaluated only by the improvement in prediction. However, the improvements cause behavioral changes in the AI that do not fit the human’s mental model and reduce the overall performance of the team. To alleviate this, the authors propose a new logarithmic loss that considers the compatibility between human mental models and AI models when making updates to the AI model.

The authors construct user studies to show the development of human mental models across different conditions. Additionally, they illustrate the degradation in overall team performance that can accompany improvement in the AI’s prediction. Furthermore, they show that adding the proposed loss increases the overall team performance of the AI model while increasing the AI’s prediction efficiency.

Reflection

Humans and AI form formidable teams in multiple environments, and I think such a study is a necessity for further development of AI. Most state-of-the-art AI systems are not independently useful in the real world and rely on human intervention from time to time (as discussed in previous classes). As long as this situation persists, we cannot improve AI independently and have to consider the humans involved in the task. I believe the evaluation metrics currently used in AI research are completely focused on the AI’s prediction. However, this needs to change, and the paper is a great first step in that direction. I believe we should construct more such evaluation metrics for various other AI tasks. But if we develop our evaluation metrics around human-AI teams, we take the risk of making AI systems reliant on human input. Hence, there is a possibility that AI systems never independently solve our problems. I believe the solution lies in interpretability.

Current AI techniques rely on statistical spaces that are not human-interpretable. Focusing on making these spaces interpretable allows human comprehensibility. Interpretable AI is a rising research topic in several subareas of AI, and I believe it can solve the current dilemma. We can develop AI systems independently, and all the updates will be comprehensible to humans, who can update their mental models accordingly. But interpretability is not a trivial subject. Recent work has only shown incremental progress and still compromises on prediction ability for interpretability. The effectiveness of AI comes from its ability to recognize patterns in dimensions incomprehensible to human beings. The current paper and interpretability both require human understanding of the model, and I am not sure if this is possible.

Questions

  1. Can we have evaluation metrics for other tasks based on this? Will it involve human evaluation? If so, how do we maintain comparative fairness across such metrics?
  2. If we continue evaluating Human-AI teams together, will we ever be able to develop completely independent AI systems?
  3. Should we focus on making the AI systems interpretable or their performance?
  4. Is interpretable AI the future for real-world systems? Consider a setting where, for every search query made, the user is able to see all the features that aid the system’s decision-making process.

Word Count: 545

02/18/20 – Akshita Jha – Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator

Summary:
“Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator” by Jhaver et al. talks about the popular social media website Reddit and its unusual collaboration between unpaid human moderators and automated moderation. Reddit moderators make use of the heavily configurable automated program called ‘Automoderator’ to help make decisions about the content that should be removed from the website. The authors interview 16 Reddit moderators to understand how they benefit from the moderation tool ‘Automod’ and how they adapt and configure it to reflect the subreddit’s policies to help them moderate the subreddit effectively. The authors also offer valuable insights that may benefit the creators of the platforms, designers of automated regulation systems, scholars of platform governance, and content moderators. The authors conclude by pointing out that the moderation system on Reddit is a collaborative effort between humans and automated systems. This hybrid system works, but there is definitely scope for improvement in the development and deployment of these tools.

Reflections:
Online platforms can be a boon or a bane depending on how people choose to engage with them. Regulation might seem necessary to ensure that low-quality posts (these posts can be treated as noise) do not drown out informative and worthy posts on the site. However, this is a challenging task. Deciding whether a post is appropriate for the subreddit puts a lot of responsibility on the moderator. In some cases the moderator might be a bot, ‘Automod’, and in other cases the platform relies on paid or unpaid volunteers. Reddit moderators are unpaid. The authors in this work analysed 5 different subreddits: ‘r/photoshopbattles’, ‘r/space’, ‘r/oddlysatisfying’, ‘r/explainlikeimfive’ and ‘r/politics’. It’s interesting that some Reddit moderators prefer to implement moderation bots from scratch while others make use of tools made by others. It’s intriguing how making use of tools made by others forms a sense of community among moderators within the bigger community of Reddit. Most redditors use ‘Automod’, which was initially created by ‘Chad Birch’ using the Reddit API in January 2012. However, a major drawback of this study is that all the moderators that the authors interviewed were male. It would be helpful to get the perspective of female moderators, if there are any, since the user base for Reddit is disproportionately male. I feel the authors should have selected ‘r/AskHistorians’ as one of the subreddits for analysis since it’s widely known to be highly moderated and content driven. It would have also been interesting to deep dive into the comments that ‘Automod’ marked as offensive but were not. This would help improve the performance of the moderator while informing us of its limitations. One might also wonder about the consequences if the subreddit community grows larger. There might be a need to reflect on the existing tools and their scale.

Questions:
1. Do you agree that social media content should be moderated?
2. What about the mental health of the moderators?
3. What kind of resources should be made available to the moderators since they are dealing with sensitive content all the time?

02/19/2020 – Nurendra Choudhary – The Work of Sustaining Order in Wikipedia

Summary

In this paper, the authors discuss the problem of maintaining order in open-edit information corpora, specifically Wikipedia. They start by explaining the near-immunity of Wikipedia to vandalism that is achieved through a synergy between humans and AI. Wikipedia is open to all editors, and the team behind the system is highly technical. However, the authors study how this immunity depends on the community’s social behavior. They show that vandal fighters are networks of people that identify vandals based on a network of behavior. They are supported by AI tools, but banning a vandal is not yet a completely automated process. Banning a user requires individual editor judgements at a local level and a collective decision at a global level. This creates a heterogeneous network and emphasizes decision corroboration by different actors.

As given in the conclusion, “this research has shown the salience of trace ethnography for the study of distributed sociotechnical systems”.  Here, trace ethnography combines the ability of editors with data across their actions to analyze vandalism in Wikipedia.

Reflection

It is interesting to see that Wikipedia’s vandal fighting involves such seamless cooperation between humans and AI. I think this is another case where AI can leverage human networks for support. The more significant part is that the tasks are not trivial and require human specialization and not just plain effort. Also, collaboration is a significant part of AI’s capability. Human editors analyze the articles in the local context. AI can efficiently combine the results and target the source of these errors by building a heterogeneous network of such decisions. Further, human beings analyze these networks to ban vandals. This methodology applies the most important abilities of both humans and bots. The collaboration involves the best attributes of humans, i.e., judgement, and of AI, i.e., pattern recognition. Also, it effectively uses this collaboration against vandals who are independent or small networks of mal-practitioners without access to the bigger picture.

The methodology uses distributed work patterns for accomplishing different tasks of editing and moral agency. Distributing the work enables the involvement of human beings in trivial tasks. However, combining the results to reach logical inferences is not humanly possible, because the vast amount of data is incomprehensible to humans. But humans have the ability to develop algorithms that the machine can apply at a larger scale to get such inferences. However, the inferences do not have a fixed structure and require human intelligence to derive the desired actions against vandalism. Given that most cases of such vandalism are by independent humans, a collaborative effort with AI can greatly turn the odds in favor of vandal fighters. This is because AI aids humans by using the bigger picture that is incomprehensible to humans alone.

Questions

  1. If vandals have access to the network, will they be able to destroy the synergy?
  2. If there’s more motivation like political or monetary gain, will it give rise to a kind-of mafia network of such mal-practitioners? Will the current methodology still be valid in such a case?
  3. Do we need a trust-worthiness metric for each Wikipedia page? Can the page be utilized as reference for absolute information?
  4. Wikipedia is a great example of crowd-sourcing and this is a great article on crowd-control in these networks. Can this be extended to other crowd-sourcing platforms like Amazon MT or information blogs?
