02/19/2020 – Nan LI – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary:

The paper starts from the theory that there is no universally ideal structure for effective teamwork; the best structure depends on the team members, the task, the surroundings, and so on. The paper therefore presents a system that searches for the optimal team structure by adapting teams to different structures and evaluating each one based on team performance and teamwork feedback. However, the combinations of the diverse dimensions of team structure and the arms (the possible values) of each dimension form a large set. To avoid overwhelming teams with too many changes, the paper also leverages a model called multi-armed bandits with temporal constraints, which limits the number of arm selections based on several factors. The authors tested the platform with AMT workers and evaluated the system's performance on a designed task with performance measures. The results confirmed that no two teams had the same optimal structure, and that even the same team's optimal structure differed across tasks. The results also indicate that the DreamTeam platform can promote high-efficiency teamwork.
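
To make the bandit idea concrete, here is a minimal epsilon-greedy sketch in Python for a single team-structure dimension. It illustrates the general multi-armed-bandit mechanism rather than the authors' exact algorithm; the exploration rate and reward value are assumptions, while the hierarchy arm values (none, centralized, decentralized) come from the paper.

```python
import random

# Minimal epsilon-greedy bandit sketch for one team-structure dimension
# (e.g., hierarchy). Arm names follow the paper; everything else is illustrative.

class DimensionBandit:
    def __init__(self, arms, epsilon=0.2):
        self.arms = arms                      # candidate structure values
        self.epsilon = epsilon                # exploration probability
        self.counts = {a: 0 for a in arms}    # times each arm was tried
        self.values = {a: 0.0 for a in arms}  # running mean reward per arm

    def select(self):
        # Explore occasionally, otherwise exploit the best-known arm.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Reward could combine team performance and satisfaction feedback.
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n

hierarchy = DimensionBandit(["none", "centralized", "decentralized"])
arm = hierarchy.select()    # structure to try in the next round
hierarchy.update(arm, 0.7)  # feedback observed after the round
```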

Reflection:

First, I strongly agree with the view that there is no universally ideal structure for effective teamwork. Searching for the optimal structure by adapting different dimensions and evaluating each fit also seems reasonable. However, I think the experiment could have gathered more valuable information if the authors had tested the platform with a real group instead of a randomly formed one, because I believe the premise of becoming a group and completing a task together is that the group members are familiar with each other. For this reason, the platform should be most effective in the early stages of team formation. Before the team members get to know each other, they can use this system to find a provisional optimal team structure so that they can quickly cooperate and work as a team even though they are strangers. Nevertheless, as familiarity among the group members grows, using this method to determine the optimal structure may become inefficient, because they may already have found the best values for some of the dimensions through getting along and gaining experience of working together.

I also considered another situation that suits this system well: when a long-established team is assigned a new type of task. In that case, the team's working mode may need to change so that the new task can be completed most efficiently, and the system's support would be valuable for finding this new optimal structure.

Finally, I find the constraint method mentioned in the article very inspiring. Perhaps we could improve the effectiveness of the DreamTeam platform by allowing users to exclude in advance any dimensions they do not want to change, for example the hierarchy or the interaction pattern. Reducing the number of combinations would make exhaustive testing more feasible, and the adapted structure should fit the team better.

Questions:

  1. What do you think about using computational power to decide the optimal structure for teamwork?
  2. In this paper, the authors recruited random testers to form groups and complete the task; do you think this influenced the results?
  3. Under what conditions do you think this platform would be most beneficial?

Word Count: 544

02/19/2020 – Nan LI – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Summary:

In this paper, the authors note that it is now common for humans and AI to form a team to make decisions, with the AI providing recommendations and the human deciding whether to trust them. In such cases, successful collaboration largely depends on the human's knowledge of the AI system's past performance. The authors argue that although an update to the AI system may enhance its predictive accuracy, it can hurt team performance, because the updated system is often not compatible with the mental model humans developed from the previous version. To address this problem, the authors introduce the concept of the compatibility of an AI update with prior user experience. To examine the role of this compatibility in human-AI teams, they propose methods and design a platform called CAJA to measure the impact of updates on team performance. The results show that team performance can be harmed even when the updated system's predictive accuracy improves. Finally, the paper proposes a re-training objective that promotes the compatibility of updates. In conclusion, to avoid diminishing team performance, developers should build more compatible updates without sacrificing performance.
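
To illustrate the flavor of such a re-training objective, here is a rough sketch that adds a penalty on "new errors" (cases the old model got right but the updated model gets wrong) on top of an ordinary log loss. The loss form and the weight lam are my own assumptions, not the paper's exact formulation.

```python
import numpy as np

# Rough sketch of a compatibility-aware retraining objective: on top of the
# usual log loss, add an extra penalty on examples the *old* model classified
# correctly, discouraging "new" errors that would break the user's mental model.

def compatible_loss(y_true, p_new, p_old, lam=1.0, eps=1e-12):
    y_true, p_new, p_old = map(np.asarray, (y_true, p_new, p_old))
    p_new = np.clip(p_new, eps, 1 - eps)
    log_loss = -(y_true * np.log(p_new) + (1 - y_true) * np.log(1 - p_new))

    old_correct = (p_old >= 0.5).astype(int) == y_true    # old model was right
    new_wrong = (p_new >= 0.5).astype(int) != y_true      # new model is wrong
    penalty = lam * (old_correct & new_wrong) * log_loss  # penalize new errors

    return float(np.mean(log_loss + penalty))
```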

Reflection:

In this paper, the authors focus on a specific kind of interaction, AI-advised human decision making, such as the patient readmission system presented as an example in the paper. In such cases, an incompatible update of the AI system would indeed harm team performance. However, I think the extent of the impact largely depends on the relationship between the human and the AI system.

If the system and the user are highly interdependent, neither is a specialist on the task, and the system's prediction accuracy and the user's knowledge carry equal weight in the decision, then an incompatible update of the AI system will weaken team performance. Even though this effect can eventually be eliminated as the user and the system break in together, the cost of poor decisions in a high-stakes domain in the meantime can be very large.

However, if the system interacts with users frequently but its predictions are only one of several factors humans consider and cannot directly determine the decision, then the impact of incompatible updates on team performance will be limited.

Besides, if humans have more expertise on the task and can promptly validate the correctness of a recommendation, then neither the improvement in system performance nor the new errors introduced by the update will have much impact on the results. In that case, if the errors caused by an update do not affect team performance, we do not need to consider compatibility when updating the system and only need to consider improving its performance. In conclusion, if there is not enough interaction between the user and the system, the degree of interdependence is low, or the system serves only as an auxiliary or a double-check, then a system update will not have a great impact on team performance.

A compatible update indeed helps users adapt quickly to the new system, but I think the impact of an update largely depends on the relationship between the user and the system, or on how large a role the system plays in the teamwork.

Besides, designing a compatible update also incurs extra cost. Therefore, I think we should focus on minimizing the impact of system errors on the decision-making process when designing the system and establishing the human-AI interaction.

Questions:

  1. What do you think about the concept of compatibility of AI updates?
  2. Do you have any examples of human-AI systems to which the authors' theory applies?
  3. Under what circumstances do you think the authors' theory is most applicable, and when is it not applicable?
  4. When we need to update the system frequently, do you think it is better to build a compatible update or to use an alternative method to address the human adaptation cost?
  5. In my opinion, humans' capacity to adapt is very high, and the cost for humans to adapt is much smaller than the cost of developing a compatible update. What do you think?

Word Count: 682

02/19/20 – Fanglan Chen – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary

Zhou et al.'s paper "In Search of the Dream Team" introduces DreamTeam, a system that identifies effective team structures for each group of individuals by suggesting different structures and evaluating the fit of each. How a team works relates to its team structure, including roles, norms, and interaction patterns. Prior organizational behavior research doubts the existence of universally perfect structures. The rationale is simple: teams are highly diverse, so no single structure can serve every team equally well. The proposed DreamTeam explores values along five dimensions of team structure: hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. The system leverages feedback on metrics such as team performance and satisfaction to iteratively identify the team structures that best organize each team. The authors also design multi-armed bandits with temporal constraints, an algorithm that determines the timing of exploration and exploitation trade-offs across multiple dimensions to avoid overwhelming teams with too many changes. In the experiments, DreamTeam is integrated with the chat platform Slack and achieves better performance and more diverse team structures compared with baseline methods.

Reflection

The authors design a system to facilitate the organization of virtual teams. Along with the several limitations mentioned in the paper, I feel the proposed DreamTeam system is based on a comparatively narrow view of what makes a dream team, and it seems difficult to generalize the framework to a variety of domains or platforms.

In the first place, I do not agree that there is a universal approach to designing or evaluating a so-called dream team. The components that make a dream team vary across domains. In sports, for example, I would say personality and talent play important roles in forming a dream team. In fact, it goes beyond "forming": a group of talented individuals not only bring technical expertise to the team, they also contribute passion, a strong work ethic, and a drive for peak performance in the pursuit of excellence. To extend that point, working with people who have similar personalities, values, and pursuits brings a certain chemistry to the teamwork, which potentially enables challenging problem solving and strategic planning. None of this is captured by the dimensions, and it is nearly impossible to evaluate quantitatively.

Also, I think it is important that every team member understands their role, such as why they need to tackle the tasks and how that ties to a larger purpose beyond their own needs. This provides a clear purpose and direction for where a group of people needs to move forward as a team. I do not think the authors emphasize how such understanding influences each team member's level of commitment. In addition, this kind of unified purpose avoids duplication of member effort and prevents effort from being pulled in multiple directions.

Last but not least, in my opinion, maximizing rewards is not the ideal basis for determining the best team structure. Human society treasures the process as well as the results. Teamwork can be considered successful as long as the whole team is motivated and working on it. If too much emphasis is put on results, the joy will be drained out of the job for the team. As long as progressive steps are made towards achieving the goal within a reasonable time frame, the team will become better. Building an ambitious, driven, and passionate team is just the start; we need to ensure that the team members survive and are nurtured so that they can deliver on the targets.

Discussion

I think the following questions are worthy of further discussion.

  • If you are the CEO of a company or a university president, would you consider using the proposed DreamTeam system to help organize your own team? Why or why not?
  • Do you think the five bandits capture all dimensions to make a dream team?  
  • Do you think the proposed DreamTeam system can be generalized to various domains? Are there any domains you think the system would not contribute towards an efficient team structure?
  • Is there anything you can think about to improve the proposed DreamTeam system?

02/19/2020 – Vikram Mohanty – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary

This paper, through a case study, highlights the invisible distributed cognition process that goes on underneath a collaborative environment like Wikipedia, and how different actors, both human and non-human, come together to achieve a common goal: banning a vandal on Wikipedia. The authors show the usefulness of trace ethnography as a method for reconstructing user actions and better understanding the role each actor plays in the larger scheme of things. The paper advocates for not dismissing the role of bots as mere force multipliers, but seeing them through a different lens considering the wide impact they have.

Reflection

Similar to the "Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator" paper, this paper is a great example of how intelligent agents (AI-infused systems, bots, scripts, etc.) should not be studied in isolation only, but through a socio-technical lens. In my opinion, that provides a more comprehensive picture than performance/accuracy metrics alone: the goals these agents can and cannot achieve, the collaboration processes they may inevitably transform, the human roles they may affect, and other unintended consequences.

Trace ethnography is a powerful method for reconstructing user actions in a distributed environment, and using that to understand how multiple actors (human and non-humans) achieve a complex objective, by sub-consciously collaborating with each other. The paper advocates that bots/automation/intelligent agents should not be seen as just force multipliers or irrelevant users. This is important as a lot of current evaluation metrics focus only on quantitative measures such as performance or accuracy. This paints an incomplete, and sometimes, an irresponsible picture of intelligent agents, as they have now evolved to assume an irreplaceable role in the larger scheme of things (or goals).

The final decision-making privilege resides with the human administrator, and the whole socio-technical pipeline assists each step of decision-making with all available information so that checks and bounds (or order, as the paper puts it) are maintained at every stage. Automated decisions, whenever taken, are grounded in some confidence of certainty. In my opinion, while building AI models, researchers should think about the AI-infused system or the real-world setting of which these algorithms will be a part. This might motivate researchers to make these algorithms more transparent or interpretable. Taking the perspective of the user who will wield these models/algorithms might help further.

It’s interesting to see some of the principles of mixed-initiative systems being used here i.e. history of the vandal’s actions, templated messages, showing statuses, etc.

Questions

  1. Do you plan to use trace ethnography in your proposed project? If so, how? Why do you think it’s going to make a difference?
  2. What are some of the risks and benefits of employing a fully automated pipeline in this particular case study i.e. banning a Wikipedia vandal?
  3. A democratic online platform like Wikipedia supports the notion of anyone coming in and making changes, and thus necessitates deploying moderation workflows to curb bad actors. However, if a platform were restrictive to some degree, a post-hoc setup may not be necessary and the platform might be less toxic. This does not necessarily apply only to Wikipedia; it can also extend to SNSs like Twitter, Facebook, etc. Which would you prefer, a democratic platform or a restrictive one?

02/19/2020 – Palakh Mignonne Jude – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

SUMMARY

In this paper, the authors focus on the efforts (both human and non-human) taken to moderate content on the English-language Wikipedia. The authors use trace ethnography to show how these 'non-human' technologies have transformed the way editing and moderation are performed on Wikipedia. These tools not only increase the speed and efficiency of the moderators, but also aid them in identifying changes that might otherwise have gone unnoticed; for example, the 'diff' feature for identifying edits made by a user enables the 'vandal fighters' to easily view malicious changes made to Wikipedia pages. The authors mention editing tools such as Huggle and Twinkle, as well as a bot called ClueBot that can examine edits and revert them based on a set of criteria such as obscenity, patent nonsense, and mass removal of content by a user. This synergy between the tools and humans has helped monitor changes to Wikipedia in near real-time and has lowered the level of expertise required of reviewers, as an average volunteer with little to no knowledge of a domain can perform these moderation tasks with the help of the aforementioned tools.
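
As a toy illustration of the kind of rule-based criteria described here (obscenity, patent nonsense, mass removal of content), the sketch below scores an edit against a few hand-written rules. The word list, regular expression, and threshold are invented placeholders, not ClueBot's actual rules.

```python
import re

# Toy rule-based edit scoring in the spirit of the criteria mentioned above.
OBSCENE_WORDS = {"obscenity1", "obscenity2"}   # placeholder terms
NONSENSE_PATTERN = re.compile(r"(.)\1{9,}")    # e.g. "aaaaaaaaaa"

def score_edit(old_text: str, new_text: str) -> int:
    score = 0
    added = set(new_text.lower().split()) - set(old_text.lower().split())
    if added & OBSCENE_WORDS:
        score += 3                              # obscene additions
    if NONSENSE_PATTERN.search(new_text):
        score += 2                              # patent nonsense
    if len(new_text) < 0.2 * len(old_text):
        score += 3                              # mass removal of content
    return score                                # higher = more suspicious

# An edit might be queued for review (or auto-reverted) when, say, score >= 3.
```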

REFLECTION

I think it is interesting that the authors focus on the social effects of the various bots and assisted editing tools on the activities carried out on Wikipedia. I especially liked the analogy the authors draw from the work of Ed Hutchins, in which a navigator is able to know the ship's trajectory through the work of a dozen crew members, which the authors liken to blocking a vandal on Wikipedia through the combined effort of a complex network of interactions between software systems and human reviewers.

I thought it was interesting that the share of edits made by bots increased from 2-4% in 2006 to about 16.33% in just about four years, and this made me wonder what the current percentage of bot edits would be. The paper also mentions that the detection algorithms often discriminate against anonymous and newly registered users, which is why I found it interesting to learn that users were allowed to reconfigure their queues so that anonymous edits were not treated as more suspicious. The paper mentions that ClueBot is capable of automatically reverting edits that contain obscene content, which made me wonder whether efforts have been made to develop bots that could automatically revert edits containing hate speech and highly bigoted views.

QUESTIONS

  1. As indicated in the paper ‘Updates in Human-AI teams’, humans tend to form mental models when it comes to trusting machine recommendations. Considering that the editing tools in this paper are responsible for queuing the edits made as well as accurately keeping track of the number of warnings given to a user, do changes in the rules used by these tools affect human-machine team performance?
  2. Would restricting edits on Wikipedia to only users that are required to have non-anonymous login credentials (if not to the general public, non-anonymous to the moderators such as the implementation on Piazza wherein the professor can always view the true identity of the person posting the question) help lower the number of cases of vandalism?
  3. The study performed by this paper is now about 10 years old. What are the latest tools that are used by Wikipedia reviewers? How do they differ from the ones mentioned in this paper? Are more sophisticated detection methods employed by these newer tools? And which is the most popularly used assisted editing tool?

02/19/2020 – Bipasha Banerjee – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary:

The paper aims to find a dream team by adapting teams to different structures and evaluating each one. The authors identify the ideal team structure using a multi-armed bandit approach over time: the system selects the next structure to explore based on the reward from the previous round. They survey extensive background research on groups in HCI, structural contingency theory from organizational behavior, and multi-armed bandits. A network of five bandits was created, one per dimension, namely hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. Each dimension has several possible values; for hierarchy, for example, there are three: none, centralized (a leader is elected), and decentralized (majority vote). A global temporal constraint and dimensional temporal constraints are used to determine when teams are prepared to embrace changes and to prevent too many dimensions from changing at once. The authors used the popular game Codenames, played through a Slack interface, and employed 135 Amazon Mechanical Turk workers assigned to five conditions, namely control, collectively chosen, manager chosen, bandit chosen, and DreamTeam. There were 35 teams, with seven teams per condition. It was found that DreamTeam-based teams outperformed the other teams.
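
A small sketch of how the global and dimensional temporal constraints might gate changes is shown below; the cooldown length and the per-round cap are illustrative assumptions, not the paper's parameters.

```python
# Sketch of the temporal-constraint idea: a global constraint caps how many
# dimensions may change in a given round, and each dimension has its own
# cooldown before it can change again.

DIMENSIONS = ["hierarchy", "interaction", "engagement", "decision", "feedback"]

def allowed_changes(round_no, last_changed, global_cap=1, cooldown=2):
    """Return the dimensions eligible to change this round."""
    eligible = [
        d for d in DIMENSIONS
        if round_no - last_changed.get(d, -cooldown) >= cooldown  # dimensional constraint
    ]
    return eligible[:global_cap]                                  # global constraint

last_changed = {"hierarchy": 0}   # hierarchy changed in round 0
print(allowed_changes(round_no=1, last_changed=last_changed))
# -> ['interaction']: hierarchy is still cooling down, and at most one
#    dimension may change, avoiding too many changes at once.
```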

Reflection 

The paper was a nice read on selecting the ideal team structure to maximize productivity. The authors did extensive background research on team structures and included theories from HCI and organizational behavior. Coming from a CS background, I had no idea what team structure meant or what theory is involved in selecting an ideal structure. It was a very new concept for me, and the difference between the approaches taken in the HCI domain and in organizational behavior was intriguing. The authors describe their approach in detail and mathematically, which makes it easy to visualize the problem as well as the method.

The most interesting section was the integration with Slack, where a Slack bot was used to guide the team with broadcast messages. It was interesting to see how different teams reacted to the messages the Slack bot posted: DreamTeam teams mostly adhered to the bot's suggestions, whereas some teams under other conditions chose to ignore them. It would be good if the evaluation were also done on a different task; the game is relatively simple, and we do not know how the DreamTeam structure would perform on complicated tasks. It would be intriguing to see how this work could be extended.

The paper presents a probabilistic approach to proposing the ideal team structure. One thing that was not very clear to me is how the Slack bot generates its suggestions beyond taking into consideration the current score and the best-performing structure. Is it using NLP techniques to deduce the sentiment of a comment and then posting a response?

Questions

  1. The authors used Slack to test their hypothesis. How would DreamTeam perform for real-life software development teams?
  2. The test subjects were Amazon Mechanical Turk workers, and the project was reasonably simple (the Codenames game). Would DreamTeam perform better than the other structures when the task is domain-specific and experts are involved? Would it lead to more conflicts?
  3. Could we use better NLP techniques and sentiment analysis to guide DreamTeam better?

02/19/2020 – Vikram Mohanty – Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator

Summary

This paper thoroughly summarizes existing work on content regulation in online platforms, but focuses on the human-machine collaboration aspect of this domain, which hasn't been widely studied. While most of the work in automated content regulation has been about introducing or improving algorithms, this work adopts a socio-technical lens to understand how human moderators collaborate with automated moderator scripts. As most online platforms like Facebook and Google are quite secretive about their moderation activities, the authors focus on Reddit, which allows moderators to use the Reddit API and develop their own scripts for each subreddit. The paper paints a comprehensive picture of how human moderators collaborate around the Automoderator script, the different roles they assume, the other tools they use, and the challenges they face. Finally, the paper proposes design suggestions that can facilitate a better collaborative experience.
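
As a rough illustration of the rule-based filtering that moderators configure, the sketch below matches posts against a list of pattern/action rules. The rules and syntax are invented for illustration and do not reflect Automoderator's real configuration format.

```python
import re

# Generic sketch of rule-based content filtering: each rule pairs a pattern
# with an action. These rules are invented examples, not real Automod config.

RULES = [
    {"pattern": re.compile(r"buy .* cheap", re.I),      "action": "remove"},  # spam-like
    {"pattern": re.compile(r"^\W*$"),                   "action": "remove"},  # empty post
    {"pattern": re.compile(r"self[- ]promotion", re.I), "action": "flag"},    # needs review
]

def moderate(post_text: str) -> str:
    for rule in RULES:
        if rule["pattern"].search(post_text):
            return rule["action"]   # first matching rule decides
    return "approve"                # no rule matched: leave for humans

print(moderate("Buy followers cheap!"))    # -> remove
print(moderate("A question about rules"))  # -> approve
```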

Reflection

Even though the Reddit Automoderator cannot be classified as an AI/ML tool, this paper sets a great example for how researchers can better assess the impact of intelligent agents on users, their practices, and their behavior. In most instances, it is difficult for AI/ML model developers to foresee exactly how the algorithms/models they build are going to be used in the real world. This paper does a great job of highlighting how the moderators collaborate amongst themselves, how different levels of expertise and tech-savviness play a role in the collaboration process, how things like community guidelines are affected, and how the roles of humans changed because of the bots amidst them. Situating the evaluation of intelligent/automated agents in a real-world usage scenario can give us a lot of insight into where to direct our efforts for improvement, or how to redesign the overall system/platform in which the intelligent agent is deployed.

It's particularly interesting to see how users (or moderators) with different experience assume different roles with regards to how or who gets to modify the Automoderator scripts. It may be an empirical question, but is a quick transition from newcomer/novice to expert useful for the community's health, or are the roles reserved for these newcomers/novices essential? If it's the former, then ensuring a quick learning curve with the usage of these bots/intelligent agents should be a priority for developers. Simulating what content will be affected by a particular change in the algorithm/script, as suggested in the discussion, can foster a quick learning curve for users (in addition to achieving the goal of minimizing false positives).

While the paper comments on how these automated scripts support the moderators, it would have been interesting to see a comparative study of no Automoderator vs. Automoderator. Of course, that was not the goal of this paper, but it could have helped paint the picture that Automoderator adds to user satisfaction. Also, as the paper mentions, the moderators value their current level of control in the whole moderation process, and therefore would be uncomfortable in a fully automated setting, or one where they could not explain their decisions. This has major design implications not just for content regulation, but for pretty much any complex, collaborative task. The fact that end-users developed and used their own scripts, tailored to the community's needs, is promising and opens up possibilities for tools that users with little or no technical knowledge can use to easily build and test their own scripts/bots/ML models.

With the introduction of Automoderator, the role of the human moderators changed from their traditional job of simply moderating content to ensuring that the rules of Automoderator are updated, preventing users from gaming the system, and minimizing false positives. Automation creating new roles for humans, instead of replacing them, is pretty evident here. As the role of AI increases in AI-infused systems, it is also important to assess user satisfaction with the new roles.

Questions

  1. Do you see yourself conducting AI model evaluation with a wider socio-technical lens of how they can affect the target users, their practices and behaviors? Or do you think, evaluating in isolation is sufficient?
  2. Would you advocate for AI-infused systems where the roles of human users, in the process of being transformed, get reduced to tedious, monotonous, repetitive tasks? Do you think the moderators in this paper enjoyed their new roles?
  3. Would you push for fully automated systems or ones where the user enjoys some degree of control over the process?

02/19/2020 – Palakh Mignonne Jude – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

SUMMARY

In this paper, the authors discuss the impact that updates made to an AI model can have on overall human-machine team performance. They describe the mental model that a human develops over the course of interacting with an AI system and how it is affected when the AI system is updated. They introduce the notion of 'compatible' AI updates and propose a new objective that penalizes new errors (errors introduced by the new model that were not present in the original model). The authors introduce terms such as 'locally compatible updates', 'compatibility score', and 'globally compatible updates'. They performed experiments with high-stakes domains such as recidivism prediction, in-hospital mortality prediction, and credit risk assessment. They also developed CAJA, a web-based game platform for studying human-AI teams in which, the authors note, no human is a task expert. CAJA enables designers to vary different parameters, including the number of human-visible features, AI accuracy, the reward function, etc.
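
My reading of the compatibility score described here is the fraction of examples the old model classified correctly that the updated model also classifies correctly; the sketch below computes that quantity. The exact definition in the paper may differ.

```python
import numpy as np

# Sketch of a compatibility score: among the examples the old model got right,
# what fraction does the updated model also get right? A score of 1.0 means the
# update introduces no new errors on those examples.

def compatibility_score(y_true, pred_old, pred_new):
    y_true, pred_old, pred_new = map(np.asarray, (y_true, pred_old, pred_new))
    old_correct = pred_old == y_true
    if old_correct.sum() == 0:
        return 1.0  # vacuously compatible
    return float((pred_new[old_correct] == y_true[old_correct]).mean())

print(compatibility_score([1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 1]))  # -> ~0.67 (one new error)
```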

REFLECTION

I found this paper very interesting, as I had never considered the impact of updates to an AI system on team performance. The idea of a mental model, as introduced by the authors, was novel to me, as I had never thought about the human aspect of utilizing such AI systems that make recommendations. This paper reminded me of the affordances mentioned in the paper 'An Affordance-Based Framework for Human Computation and Human-Computer Collaboration', wherein humans and machines pursue a common goal by leveraging the strengths of each.

I thought it was good that they defined the notion of compatibility to include the human's mental model, and I agree that developers retraining AI models are prone to focusing on improving the model's accuracy while ignoring the details of human-AI teaming.

I was also happy to read that the workers used as part of the study performed in this paper were paid on average $20/hour as per the ethical guidelines for requesters.

QUESTIONS

  1. The paper mentions the use of Logistic Regression and multi-layer perceptron. Would a more detailed study on the types of classifiers that are used in these systems help?
  2. Would ML models that have better interpretability for the decisions made have given better initial results and prevented the dip in team performance? In such cases, would providing a simple ‘change log’ (as is done in a case of other software applications), have aided in preventing this dip in team performance or would it have still been confusing to the humans interacting with the system?
  3. How were the workers selected for the studies performed on the CAJA platform? Were there any specific criteria used to select these workers? Would the qualifications of the workers have affected the results in any way?

02/19/2020 – Bipasha Banerjee – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary: 

The paper discusses software tools that help moderate content posted or altered on the online encyclopedia popularly known as Wikipedia. Wikipedia was built on the premise that anyone with an internet connection and a device can edit pages on the platform. However, platforms with an "anyone can edit" mantra are prone to malicious users, a.k.a. vandals. Vandals are people who post inappropriate content in the form of text alteration, the introduction of offensive content, etc. Humans can act as moderators who scan for offensive content and remove it. However, this is a tedious task; it is impossible for humans to monitor huge amounts of content and look for small changes hidden in a large body of text. To aid humans, there are automated software tools responsible for monitoring, editing, and overall maintenance of the platform; examples include Huggle and Twinkle. These tools work with humans and help keep the platform free from vandals by flagging users and taking appropriate actions as deemed necessary.

Reflection:

This paper was an interesting read on how offensive content is dealt with on platforms like Wikipedia. It was interesting to learn about the different tools, how they interact with humans, and how they help keep the platform clean of bad content. These tools are extremely useful, and they make use of the machine affordance of handling large amounts of data. However, I feel we should also discuss the fact that machines need human intervention to evaluate their performance. The paper mentions "leveraging the skills of volunteers who may not be qualified to review an article formally"; this statement is bold and leads to a lot of open questions. Yes, this makes it easy to recruit people with less expertise, but at the same time it makes us aware that machines are taking up some jobs and undermining human expertise.

Most of the tools mentioned flag content based on words, phrases, or the deletion of large amounts of content; they can be described as rule-based rather than machine learning. Could we implement machine learning and deep learning algorithms so that the tool learns from user behavior, given that Wikipedia is data-rich and could provide a lot of training data? The paper mentions that "significant removal of content" is placed higher in the filter queue. My only concern is that sometimes a user might press enter by mistake. Take the case of git: users write code, and the difference from the previous commit is recorded and shown in the diff. If a coder writes a new line or two of code and erroneously presses enter before or after the entire piece, the whole block shows as "newly added" in the diff. This is easy for a human to understand, but a machine flags such content nonetheless. This may lead to extra work on items that normally would not have been in the queue, or would have been placed lower.
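
As a small sketch of how such a "significant removal" rule could be made robust to an accidental enter key, the check below ignores whitespace-only changes when comparing lengths; the 50% threshold is an arbitrary placeholder rather than any tool's actual rule.

```python
# Toy check for "significant removal of content" that ignores whitespace-only
# changes (like an accidentally inserted blank line).

def significant_removal(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    old_len = len("".join(old_text.split()))   # ignore whitespace/newlines
    new_len = len("".join(new_text.split()))
    if old_len == 0:
        return False
    return (old_len - new_len) / old_len >= threshold

print(significant_removal("long article text here", "long article text here\n\n"))  # False
print(significant_removal("long article text here", "short"))                       # True
```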

The paper talks about the "talk page" where the tools post warnings. This is a good step, as public shaming is needed to stop such baneful behavior. However, we could incorporate a harsher way to "shame" such users, for instance by posting their usernames on the main homepage for each category. This won't work for anonymous users, but blocking their IP addresses could be a temporary fix. Overall, I feel the human-computer interaction is well defined in the paper, and the concept of content-controlling bots makes our lives easier.

Questions:

  1. Are machines undermining human capabilities? Do we not need expertise any more?
  2. How can such tools utilize the vast amount of data better? E.g., for training deep learning models.
  3. How could such works be extended to other platforms like Twitter?

02/19/2020 – Sukrit Venkatagiri – The Case of Reddit Automoderator

Paper: Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. ACM Transactions on Computer-Human Interaction (TOCHI) 26, 5: 31:1–31:35. https://doi.org/10.1145/3338243

Summary: This paper studies Reddit's Automod, a rule-based moderator for Reddit that automatically filters content on subreddits and can be customized by the moderators to suit each subreddit. The paper sought to understand how moderators use Automod and what advantages and challenges it presents. The authors discuss their findings in detail: there was a need for audit tools to tune the performance of Automod, for a repository for sharing these tools, and for improving the division of labor between human and machine decision making. They conclude with a discussion of the sociotechnical practices that shape the use of the tools, how the tools help workers maintain their communities, and the challenges and limitations, as well as solutions that may help address them.

Reflection:

I appreciate that the authors were embedded within the Reddit community for over a year and provide concrete recommendations for creators of new and existing platforms, for designers and researchers interested in automated content moderation, for scholars of platform governance, and for content moderators themselves.

I also appreciate the deep and thorough qualitative nature of the study, along with the screenshots; however, the paper may be too long and too detailed in some aspects. I wish there were a "mini" version of this paper. The quotes themselves were exciting and exemplary of the problems the users faced.

The finding that different subreddits configured and used Automod differently was interesting, and I wonder how much a moderator's skills and background affect whether and in what ways they configure and use Automod. Lastly, the conclusion is very valuable, especially as it is targeted towards different groups within and outside of academia.

Two themes that emerged, "Becoming/continuing to be a moderator" and "recruiting new moderators", sound interesting, but I wonder why they were left out of the results. The paper does not provide any explanation in this regard.

Questions:

  1. How do you think subreddits might differ in their use of Automod based on their moderators' technical abilities?
  2. How can we teach people to use Automod better?
  3. What are the limitations of Automod? How can they be overcome through ML methods?
