02/19/2020 – Human-Machine Collaboration for Content Regulation – Myles Frantz

Since the dawn of the internet, it has surpassed many expectations and become prolific throughout everyday life. Though initially there was a lack of standards in website design and forum moderation, the space has relatively stabilized through varied and scientific approaches. A popular forum site, Reddit, uses a human-led human-AI collaboration to help moderate, both automatically and manually, its ever-growing comments and threads. Searching through the top 100 subreddits (at the time of writing), the authors surveyed moderators from 5 varied and highly active subreddits. Due to the easy-to-use API provided by Reddit, one of the most used moderation tools was a third-party bot later incorporated into Reddit as Automod, now one of the more popular and common tools used by Reddit moderators. Since it is very extensible, there is no common standard across subreddits: moderators within the 5 subreddits use the bot in similar but distinct ways. Automod is not the sole bot moderators use; other bots can be layered on to further interact with and streamline it in a similar fashion. However, due to the complexity of the bots (technological difficulty or lack of interest in learning the tool), some subreddits let only a few people manage them, sometimes with damning results. When issues happen, instead of reacting to various users' complaints, the paper argues for more transparency from the bot.
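As a concrete illustration of the kind of logic an Automod rule encodes: real AutoModerator rules are YAML documents in a subreddit's wiki, so the Python sketch below uses a simplified, invented rule schema purely for illustration, not Reddit's actual one.

```python
# Sketch of an AutoModerator-style keyword rule. The rule schema here
# is a simplified invention for illustration; real AutoModerator rules
# are YAML documents with a much richer set of checks and actions.

def apply_rule(item, rule):
    """Return the action a rule would take on a submitted item, or None."""
    if item.get("kind") != rule.get("type"):
        return None
    body = item.get("body", "").lower()
    # Fire if any listed keyword appears in the body.
    if any(keyword.lower() in body for keyword in rule.get("body_includes", [])):
        return rule.get("action")
    return None

# A hypothetical rule: remove comments containing spam phrases.
spam_rule = {
    "type": "comment",
    "body_includes": ["buy now", "free bitcoin"],
    "action": "remove",
}
```

In practice, moderators chain dozens of such rules per subreddit, which is exactly why the paper finds no common standard across communities.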

I agree with the original author of Automod, who started off making the bot purely to automate several steps. Extending this to the scale of Reddit today, I do believe it would be impossible for human moderators alone to keep up with the "trolls".

I do, however, disagree with how knowledge of the Automod rules is spread out. I believe decentralizing that knowledge would make the system more robust, especially since the moderators are volunteers. It is natural for people to avoid what they don't understand, whether out of fear of it in general or fear of the repercussions. That said, I don't think putting all of the work on one moderator is the right answer either.

  • One of my questions regards one of the proposed outcomes for Reddit: granting more visibility into Automod's actions. Notably, at Reddit's scale, extending this kind of functionality automatically could incur much more memory and storage overhead. Reddit already stores vast amounts of data, and potentially doubling the storage required (if every comment were reviewed by Automod) may be a drawback of this approach.
  • Instead of surveying the top 60%, I wonder if surveying lower-ranked subreddits (via RedditMetrics) with a smaller number of moderators would reveal the same pattern of Automod use. I would imagine they would be forced to use the Automod tool in more depth and breadth due to their lack of available resources; however, this is pure speculation.
  • A final question: to what extent are bots duplicated across subreddits? If the overlap is large, it may lead to a vastly different experience across subreddits, as seemingly happens now, potentially causing confusion among new or returning users.


02/19/2020 – Nan LI – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Summary:

In this paper, the author observes that it is now prevalent for humans and AI to form a team to make decisions, with the AI providing recommendations and the human deciding whether to trust them. In these cases, successful team collaboration is based mainly on the human's knowledge of the AI system's previous performance. The author therefore proposes that although an update to the AI system may enhance its predictive accuracy, it can hurt team performance, since the updated version is usually not compatible with the mental model humans developed from the previous system. To address this, the author introduces the concept of the compatibility of an AI update with prior user experience. To examine the role of this compatibility in human-AI teams, the author designed a platform called CAJA to measure the impact of updates on team performance. The outcomes show that team performance can be harmed even when the updated system's predictive accuracy improves. Finally, the paper proposes a re-training objective that promotes compatible updates. In conclusion, to avoid diminishing team performance, developers should build more compatible updates without surrendering performance.
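To make the compatibility idea concrete, here is a minimal sketch of such a score: of the examples the old model got right, what fraction does the updated model still get right? This is my own simplified formulation for illustration; the paper's formal definition and re-training objective are richer.

```python
# Simplified compatibility score: of the examples the old model
# predicted correctly, the fraction the updated model also predicts
# correctly. (An illustrative formulation, not the paper's exact one.)

def compatibility(y_true, old_pred, new_pred):
    old_correct = [o == y for o, y in zip(old_pred, y_true)]
    if not any(old_correct):
        return 1.0  # nothing to preserve
    preserved = sum(
        1 for oc, n, y in zip(old_correct, new_pred, y_true) if oc and n == y
    )
    return preserved / sum(old_correct)

y_true   = [1, 0, 1, 1, 0]
old_pred = [1, 0, 0, 0, 0]   # accuracy 0.6
new_pred = [1, 1, 1, 1, 0]   # accuracy 0.8, but a new error on example 1
score = compatibility(y_true, old_pred, new_pred)  # 2/3
```

A score of 1.0 means the update introduces no new errors on cases the user's mental model already treats as reliable; the paper's re-training objective pushes updates toward that regime without giving up accuracy.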

Reflection:

In this paper, the author discusses a specific interaction, AI-advised human decision making, as in the patient-readmission example presented in the paper. In such cases, an incompatible update of the AI system would indeed harm team performance. However, I think the extent of the impact largely depends on the correlation between the human and the AI system.

If the system and the user have a high degree of interdependence, where neither is a specialist on the task and the system's prediction accuracy and the user's knowledge have equal impact on the decision, then an incompatible update of the AI system will weaken team performance. Even though this effect can later be eliminated as the user and the system adjust to each other, the cost of decisions made in the meantime will be very large in high-stakes domains.

However, if the system interacts with users frequently but its predictions are only one of several inputs to the human's decision and cannot directly determine it, then the impact of incompatible updates on team performance will be limited.

Besides, if humans are more expert at the task and can promptly validate the correctness of a recommendation, then neither the improvement in system performance nor the new errors introduced by an update will have much impact on the results. On the other hand, if the errors caused by an update do not affect team performance, then when updating the system we need not consider compatibility but only the improvement of system performance. In conclusion, if there is not enough interaction between the user and the system, the degree of interdependence is not high, or the system serves only as an auxiliary or double check, then a system update will not have a great impact on team performance.

A compatible update is indeed helpful for users to quickly adapt to the new system, but I think the impact of an update largely depends on the correlation between the user and the system, or on the proportion of the system's role in the teamwork.

Besides, designing a compatible update also incurs extra cost. Therefore, I think we should consider minimizing the impact of system errors on the decision-making process when designing the system and establishing the human-AI interaction.

Question:

  1. What do you think about the concept of the compatibility of AI updates?
  2. Do you have any examples of human-AI systems to which the author's theory applies?
  3. Under what circumstances do you think the author's theory is most applicable, and when is it not?
  4. When we need to update the system frequently, do you think it is better to build a compatible update or to use an alternative method to address the human adaptation costs?
  5. In my opinion, humans' capacity to adapt is very high, and the cost required for humans to adapt is much smaller than the cost of developing a compatible update. What do you think?

Word Count: 682


02/19/2020 – Vikram Mohanty – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary

This paper, through a case study, highlights the invisible distributed cognition process that goes on underneath a collaborative environment like Wikipedia, and how different actors, both human and non-human, come together to achieve a common goal: banning a vandal on Wikipedia. The authors show the usefulness of trace ethnography as a method for reconstructing user actions and better understanding the role each actor plays in the larger scheme of things. The paper advocates for not dismissing the role of bots as mere force multipliers, but seeing them through a different lens, considering the wide impact they have.

Reflection

Similar to the “Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator” paper, this paper is a great example of why intelligent agents (AI-infused systems, bots, scripts, etc.) should not be studied in isolation, but through a socio-technical lens. In my opinion, that provides a more comprehensive picture of the goals these agents can and cannot achieve, the collaboration processes they may inevitably transform, the human roles they may affect, and other unintended consequences, than performance/accuracy metrics alone.

Trace ethnography is a powerful method for reconstructing user actions in a distributed environment, and using that to understand how multiple actors (human and non-humans) achieve a complex objective, by sub-consciously collaborating with each other. The paper advocates that bots/automation/intelligent agents should not be seen as just force multipliers or irrelevant users. This is important as a lot of current evaluation metrics focus only on quantitative measures such as performance or accuracy. This paints an incomplete, and sometimes, an irresponsible picture of intelligent agents, as they have now evolved to assume an irreplaceable role in the larger scheme of things (or goals).

The final decision-making privilege resides with the human administrator, and the whole socio-technical pipeline assists each step of decision-making with all available information, so that checks and bounds (or order, as the paper puts it) are maintained at every stage. Automated decisions, whenever taken, are grounded in some degree of certainty. In my opinion, while building AI models, researchers should think about the AI-infused system or the real-world setting these algorithms would become a part of. This might motivate researchers to make the algorithms more transparent or interpretable. Adopting the lens of the user who will wield these models/algorithms might help further.

It’s interesting to see some of the principles of mixed-initiative systems being used here i.e. history of the vandal’s actions, templated messages, showing statuses, etc.

Questions

  1. Do you plan to use trace ethnography in your proposed project? If so, how? Why do you think it’s going to make a difference?
  2. What are some of the risks and benefits of employing a fully automated pipeline in this particular case study i.e. banning a Wikipedia vandal?
  3. A democratic online platform like Wikipedia supports the notion of anyone coming in and making changes, and thus necessitates deploying moderation workflows to curb bad actors. However, if a platform were restrictive to some degree, a post-hoc moderation setup might not be necessary and the platform might be less toxic. This applies not only to Wikipedia but also extends to SNS like Twitter/Facebook, etc. Which would you prefer, a democratic platform or a restrictive one?


02/19/20 – Fanglan Chen – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary

Zhou et al.’s paper “In Search of the Dream Team” introduces DreamTeam, a system that identifies effective team structures for each group of individuals by suggesting different structures and evaluating the fit of each. How a team works relates to its team structure, including roles, norms, and interaction patterns. Prior organizational behavior research doubts the existence of universally perfect structures; the rationale is simple: teams boast great diversity, so no single structure can satisfy the full functioning of every team. The proposed DreamTeam explores values along five dimensions of team structure: hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. The system leverages feedback on metrics such as team performance or satisfaction to iteratively identify the team structures that best organize each team. The authors also design multi-armed bandits with temporal constraints, an algorithm that determines the timing of exploration-exploitation trade-offs across multiple dimensions to avoid overwhelming teams with too many changes. In the experiments, DreamTeam is integrated with the chat platform Slack and achieves better performance and more diverse team structures compared with baseline methods.
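As a rough illustration of the core idea, a temporally constrained bandit for a single team-structure dimension might look like the toy epsilon-greedy sketch below. This is my own simplified sketch under invented parameters (arm names, `min_rounds`, `epsilon`), not the authors' actual algorithm.

```python
import random

class ConstrainedBandit:
    """Toy epsilon-greedy bandit for one team-structure dimension
    (e.g. hierarchy: none / centralized / decentralized), with a
    temporal constraint: the chosen structure may only change after
    being held for `min_rounds` rounds, so teams are not overwhelmed
    by constant churn. An illustrative sketch, not the paper's system."""

    def __init__(self, arms, min_rounds=3, epsilon=0.2, seed=0):
        self.arms = list(arms)
        self.min_rounds = min_rounds
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {a: 0.0 for a in self.arms}   # summed rewards per arm
        self.counts = {a: 0 for a in self.arms}     # pulls per arm
        self.current = self.arms[0]
        self.rounds_held = 0

    def select(self):
        # Temporal constraint: hold the current structure until it has
        # been in place for at least min_rounds rounds.
        if self.rounds_held < self.min_rounds:
            self.rounds_held += 1
            return self.current
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(self.arms)     # explore
        else:                                       # exploit best mean reward
            choice = max(
                self.arms,
                key=lambda a: self.totals[a] / self.counts[a] if self.counts[a] else 0.0,
            )
        if choice != self.current:
            self.current, self.rounds_held = choice, 1
        else:
            self.rounds_held += 1
        return self.current

    def update(self, reward):
        self.totals[self.current] += reward
        self.counts[self.current] += 1
```

Each round, the team works under the selected structure and reports a reward (e.g. a task score); the full system runs one such bandit per dimension, with a global constraint on how many dimensions may change at once.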

Reflection

The authors design a system to facilitate the organization of virtual teams. Along with the several limitations mentioned in the paper, I feel the proposed DreamTeam system is based on a comparatively narrow scope of what makes a dream team, and it seems difficult to generalize the framework to a variety of domains or platforms.

In the first place, I do not agree that there is a universal approach to designing or evaluating a so-called dream team. The components that make a dream team vary across domains. For example, in sports, I would say personality and talent play important roles in forming a dream team. In fact, it goes beyond "forming": a group of talented individuals not only brings technical expertise to the team but also contributes passion, a strong work ethic, and a drive for peak performance in the pursuit of excellence. To extend that point, working with people who have similar personalities, values, and pursuits brings a certain chemistry to the teamwork, which potentially enables challenging problem solving and strategic planning. None of these are captured in the five dimensions, and they are nearly impossible to evaluate quantitatively.

Also, I think it is important to make every team member understand their role, such as why they need to tackle their tasks and how that ties to a larger purpose beyond their own needs. This provides a clear purpose and direction for where a group of people needs to move forward as a team. I do not think the authors emphasize how such understanding influences team members' level of commitment. In addition, this kind of unified purpose can avoid duplication of member efforts and prevent pulling the team in multiple directions.

Last but not least, in my opinion, basing the choice of team structure purely on maximizing rewards is not ideal. Human society treasures process as well as results. Teamwork can be considered successful as long as the whole team is motivated and working toward the goal. If too much emphasis is put on results, the joy will be drained out of the job for the team. As long as progressive steps are made toward achieving the goal within a reasonable time frame, the team will become better. Building an ambitious, driven, and passionate team is just the start; we need to ensure that team members are sustained and nurtured so that they can deliver on their targets.

Discussion

I think the following questions are worthy of further discussion.

  • If you are the CEO of a company or a university president, would you consider using the proposed DreamTeam system to help organize your own team? Why or why not?
  • Do you think the five bandits capture all the dimensions needed to make a dream team?
  • Do you think the proposed DreamTeam system can be generalized to various domains? Are there any domains in which you think the system would not contribute to an efficient team structure?
  • Is there anything you can think about to improve the proposed DreamTeam system?


02/19/2020 – Vikram Mohanty – Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator

Summary

This paper thoroughly summarizes existing work on content regulation in online platforms, but focuses on the human-machine collaboration aspect of this domain, which hasn't been widely studied. While most of the work in automated content regulation has been about introducing/improving algorithms, this work adopts a socio-technical lens to understand how human moderators collaborate with automated moderator scripts. As most online platforms like Facebook and Google are secretive about their moderation activities, the authors focus on Reddit, which allows moderators to use the Reddit API and develop their own scripts for each subreddit. The paper paints a comprehensive picture of how human moderators collaborate around the Automoderator script, the different roles they assume, the other tools they use, and the challenges they face. Finally, the paper proposes design suggestions that can facilitate a better collaborative experience.

Reflection

Even though the Reddit Automoderator cannot be classified as an AI/ML tool, this paper sets a great example of how researchers can better assess the impact of intelligent agents on users, their practices, and their behavior. In most instances, it is difficult for AI/ML model developers to foresee exactly how the algorithms/models they build will be used in the real world. This paper does a great job of highlighting how the moderators collaborate among themselves, how different levels of expertise and tech-savviness play a role in the collaboration process, how things like community guidelines are affected, and how the roles of humans changed due to the bots amidst them. Situating the evaluation of intelligent/automated agents in a real-world usage scenario can give us a lot of insight into where to direct our efforts for improvement, or how to redesign the overall system/platform in which the intelligent agent is served.

It’s particularly interesting to see how users (or moderators) with different experience assume different roles with regard to how, and by whom, the Automoderator scripts get modified. It may be an empirical question, but is a quick transition from newcomer/novice to expert useful for the community's health, or are the roles reserved for newcomers/novices essential in themselves? If it's the former, then ensuring a quick learning curve with these bots/intelligent agents should be a priority for developers. Simulating what content will be affected by a particular change in the algorithm/script, as suggested in the discussion, can foster a quick learning curve for users (in addition to achieving the goal of minimizing false positives).

While the paper comments on how these automated scripts support the moderators, it would have been interesting to see a comparative study of no Automoderator vs. Automoderator. Of course, that was not the goal of this paper, but it could have helped show that Automoderator adds to user satisfaction. Also, as the paper mentions, the moderators value their current level of control in the moderation process, and would therefore be uncomfortable in a fully automated setting, or one where they could not explain their decisions. This has major design implications not just for content regulation, but for pretty much any complex, collaborative task. The fact that end-users developed and used their own scripts, tailored to the community's needs, is promising and opens up possibilities for tools with which users with little or no technical knowledge can easily build and test their own scripts/bots/ML models.

With the introduction of Automoderator, the role of the human moderators changed from the traditional job of just moderating content to ensuring that the rules of Automoderator are updated, preventing users from gaming the system, and minimizing false positives. Automation creating new roles for humans, instead of replacing them, is pretty evident here. As the role of AI increases in AI-infused systems, it is also important to assess user satisfaction with these new roles.

Questions

  1. Do you see yourself conducting AI model evaluation with a wider socio-technical lens of how models can affect the target users, their practices, and behaviors? Or do you think evaluating in isolation is sufficient?
  2. Would you advocate for AI-infused systems where the roles of human users, in the process of being transformed, get reduced to tedious, monotonous, repetitive tasks? Do you think the moderators in this paper enjoyed their new roles?
  3. Would you push for fully automated systems or ones where the user enjoys some degree of control over the process?


02/19/2020-Bipasha Banerjee -In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary:

The paper aims to find a "dream team" by adapting teams to different structures and evaluating each. The authors try to identify the ideal team structure using a multi-armed bandit approach over time: the system selects the next structure to explore based on the reward from the previous task. They cover extensive background research on groups in HCI, structural contingency theory from organizational behavior, and multi-armed bandits. A network of five bandits was created, one per dimension: hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. Each dimension has several possible values; for hierarchy, for example, there are three: none, centralized (a leader is elected), and decentralized (majority vote). A global temporal constraint and dimensional temporal constraints determine at what stage teams are prepared to embrace changes, and prevent too many dimensions from changing at once. The authors used the popular game Codenames on a Slack interface. They employed 135 Amazon Mechanical Turk workers and assigned them to five conditions: control, collectively chosen, manager chosen, bandit chosen, and DreamTeam chosen, with 35 teams, seven per condition. It was found that DreamTeam-based teams outperformed the others.

Reflection 

The paper was a nice read on selecting the ideal team structure to maximize productivity. It did extensive background research on team structures and included theories from HCI and organizational behavior. Being from a CS background, I had no idea what team structure is or the theory involved in selecting the ideal structure. It was a very new concept for me, and the difference between the approaches taken by HCI and organizational behavior was intriguing. The authors described their approach in detail and mathematically, which makes it easy to visualize the problem as well as the method.

The most interesting section was the integration with Slack, where the Slack bot was used to guide the team with broadcast messages. It was interesting to see how different teams reacted to the messages the Slack bot posted: DreamTeam teams mostly adhered to its suggestions, whereas some of the other team structures chose to ignore them. It would be good if the evaluation were also done on a different task; the game is relatively simple, and we don't know how the DreamTeam structure would perform on complicated tasks. It would be intriguing to see how this work could be extended.

The paper highlights a probabilistic approach to proposing the ideal team structure. One thing that was not very clear to me is how the Slack bot forms its suggestions beyond taking into account the current score and the best-performing structure. Is it using NLP techniques to deduce the sentiment of a comment and then posting a response?

Question

  1. The authors used Slack to test their hypothesis. How would DreamTeam perform for real-life software development teams?
  2. The test subjects were Amazon Mechanical Turk workers, and the task was reasonably simple (the Codenames game). Would DreamTeam perform better than the other structures when the task is domain-specific and experts are involved? Would it lead to more conflicts?
  3. Could we use better NLP techniques and sentiment analysis to guide DreamTeam better?


02/19/2020 – Palakh Mignonne Jude – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

SUMMARY

In this paper, the authors focus on the efforts (both human and non-human) taken to moderate content on the English-language Wikipedia. The authors use trace ethnography to show how these 'non-human' technologies have transformed the way editing and moderation are performed on Wikipedia. These tools not only increase the speed and efficiency of the moderators, but also aid them in identifying changes that might otherwise go unnoticed; for example, the 'diff' feature for inspecting a user's edits enables the 'vandal fighters' to easily view malicious changes made to Wikipedia pages. The authors mention editing tools such as Huggle and Twinkle, as well as a bot called ClueBot that can examine edits and revert them based on criteria such as obscenity, patent nonsense, or mass removal of content by a user. This synergy between tools and humans has helped monitor changes to Wikipedia in near real-time and has lowered the level of expertise required of reviewers, as an average volunteer with little to no knowledge of a domain can perform these moderation tasks with the help of the aforementioned tools.
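To illustrate the kind of criteria-based check such a bot applies before auto-reverting, here is a minimal sketch; the word list and threshold below are invented placeholders for illustration, not ClueBot's actual rules.

```python
# Invented placeholders standing in for a revert bot's criteria:
# a tiny "obscenity" lexicon and a crude mass-removal threshold.
BLOCKLIST = {"badword1", "badword2"}
MASS_REMOVAL_RATIO = 0.2

def should_revert(old_text, new_text):
    """Return a reason string if an edit trips a rule, else None."""
    # Obscenity: does the edit introduce any blocklisted word?
    added = set(new_text.lower().split()) - set(old_text.lower().split())
    if added & BLOCKLIST:
        return "obscenity"
    # Mass removal: does the edit delete most of the page's content?
    if len(new_text) < MASS_REMOVAL_RATIO * len(old_text):
        return "mass removal of content"
    return None
```

In the paper's account, such a revert is only one link in the chain: the bot's action feeds warning templates and queues that human vandal fighters then act on.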

REFLECTION

I think it is interesting that the authors focus on the social effect on the activities done in Wikipedia due to various bots and assisted editing tools. I especially liked the analogy drawn from the work of Ed Hutchins of a navigator that is able to know the various trajectories through the work of a dozen crew members which the authors mention to be similar to blocking a vandal on Wikipedia through the combined effort of a complex network of interactions between software systems as well as human reviewers.

I thought it was interesting that the share of edits made by bots increased from 2-4% in 2006 to about 16.33% in just about 4 years, and this made me wonder what the current percentage of bot edits would be. The paper also mentions that the detection algorithms often discriminate against anonymous and newly registered users, which is why I found it interesting to learn that users were allowed to reconfigure their queues so that anonymous edits were not treated as more suspicious. The paper mentions that ClueBot is capable of automatically reverting edits that contain obscene content, which made me wonder whether efforts have been made to develop bots that automatically revert edits containing hate speech and highly bigoted views.

QUESTIONS

  1. As indicated in the paper ‘Updates in Human-AI teams’, humans tend to form mental models when it comes to trusting machine recommendations. Considering that the editing tools in this paper are responsible for queuing the edits made as well as accurately keeping track of the number of warnings given to a user, do changes in the rules used by these tools affect human-machine team performance?
  2. Would restricting edits on Wikipedia to only users that are required to have non-anonymous login credentials (if not to the general public, non-anonymous to the moderators such as the implementation on Piazza wherein the professor can always view the true identity of the person posting the question) help lower the number of cases of vandalism?
  3. The study performed by this paper is now about 10 years old. What are the latest tools that are used by Wikipedia reviewers? How do they differ from the ones mentioned in this paper? Are more sophisticated detection methods employed by these newer tools? And which is the most popularly used assisted editing tool?


02/19/2020 – Palakh Mignonne Jude – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

SUMMARY

In this paper, the authors discuss the impact that updates to an AI model can have on overall human-machine team performance. They describe the mental model that a human develops over the course of interacting with an AI system, and how it is disrupted when the AI system is updated. They introduce the notion of 'compatible' AI updates and propose a new training objective that penalizes new errors (errors introduced in the new model that were not present in the original model). The authors introduce terms such as 'locally compatible updates', 'compatibility score', and 'globally compatible updates'. They performed experiments in high-stakes domains such as recidivism prediction, in-hospital mortality prediction, and credit risk assessment. They also developed CAJA, a web-based game platform for studying human-AI teams, designed so that no human is a task expert. CAJA enables designers to vary different parameters, including the number of human-visible features, AI accuracy, the reward function, etc.

REFLECTION

I think this paper was very interesting, as I had never considered the impact of updates to an AI system on team performance. The idea of a mental model, as introduced by the authors, was novel to me, as I had never thought about the human aspect of utilizing AI systems that make recommendations. This paper reminded me of the multiple affordances mentioned in the paper 'An Affordance-Based Framework for Human Computation and Human-Computer Collaboration', wherein humans and machines pursue a common goal by leveraging the strengths of both.

I thought it was good that they defined the notion of compatibility to include the human's mental model, and I agree that developers retraining AI models are prone to focusing on improving a model's accuracy while ignoring the details of human-AI teaming.

I was also happy to read that the workers used as part of the study performed in this paper were paid on average $20/hour as per the ethical guidelines for requesters.

QUESTIONS

  1. The paper mentions the use of Logistic Regression and multi-layer perceptron. Would a more detailed study on the types of classifiers that are used in these systems help?
  2. Would ML models that have better interpretability for the decisions made have given better initial results and prevented the dip in team performance? In such cases, would providing a simple ‘change log’ (as is done in a case of other software applications), have aided in preventing this dip in team performance or would it have still been confusing to the humans interacting with the system?
  3. How were the workers selected for the studies performed on the CAJA platform? Were there any specific criteria used to select them? Would the qualifications of the workers have affected the results in any way?


02/19/2020 – Sukrit Venkatagiri – In Search of the Dream Team

Paper:  Sharon Zhou, Melissa Valentine, and Michael S. Bernstein. 2018. In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18), 1–13. https://doi.org/10.1145/3173574.3173682

Summary: This paper introduces a system called DreamTeam that explores a search space for the optimal design of teams in an online setting. The system does this through multi-armed bandits with temporal constraints, a type of algorithm that manages the timing of exploration-exploitation trade-offs across multiple bandits simultaneously. This answers a classic question in HCI and CSCW: when should teams favor one approach over another? The paper contributes a computational method for identifying good team structures, a system that embodies it, and an evaluation showing improvements of 46%. The paper concludes with a discussion of computational partners for improving group work, such as pointing out our biases and inherent limitations, and helping us replan when the environment shifts.

Reflection:

I appreciate the way they evaluated the system and conducted randomized controlled trials for each of their experimental conditions. The evaluation is done on a collaborative intellective task, and I wonder how different the findings would be if they had evaluated it using a creative task, instead of an intellective or analytic task. Perhaps there is a different optimal “dream team” based not only on the people but the task itself. 

I also appreciate the thorough system description and how the system was integrated within Slack as opposed to having it be its own standalone system. This increases the real world generalizability of the system and also means that it is easier for others to build on top of. In addition, hiring workers in real-time would have been hard, and it’s unclear how synchronous/asynchronous the study was.

One interesting aspect of the approach is handling exploration and exploitation simultaneously across bandits. I wonder how the system might have fared if teams were given the choice to explore each structure on their own; my guess is they would fare worse. 

Another interesting finding is the evaluation with strangers on MTurk. I wonder if the results would have differed if it was a) in a co-located setting and/or b) among coworkers who already knew each other. 

Finally, the paper is nearly two years old, and I don’t see any follow-up work evaluating this system in the wild. I wonder why. Perhaps there is not much to gain from an in-the-wild evaluation, or perhaps such an evaluation did not fare well. Either way, it would be interesting to read about the results, good or bad.

Questions:

  1. Have you thought about building a Slack integration for your project instead of a standalone system?
  2. How might this system function differently if it were for a creative task such as movie animation?
  3. How would you evaluate such a system in the wild?

Read More

02/19/2020 – Sukrit Venkatagiri – The Case of Reddit Automoderator

Paper: Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. ACM Transactions on Computer-Human Interaction (TOCHI) 26, 5: 31:1–31:35. https://doi.org/10.1145/3338243

Summary: This paper studies Reddit’s Automod, a rule-based moderation tool that automatically filters content on subreddits and can be customized by moderators to suit each subreddit. The paper sought to understand how moderators use Automod, and what advantages and challenges it presents. Discussing their findings in detail, the authors identify a need for audit tools to tune Automod’s performance, a repository for sharing moderation tools, and a better division of labor between human and machine decision making. They conclude with a discussion of the sociotechnical practices that shape the use of these tools, how the tools help workers maintain their communities, and the challenges and limitations involved, as well as solutions that may help address them.
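Automod rules are, in practice, written as YAML conditions that moderators customize per subreddit. The sketch below mimics that rule-based idea in Python; the rule names, patterns, and thresholds are hypothetical examples of my own, not actual Automod configuration.

```python
import re

# Hypothetical Automod-style rules: each pairs a trigger with an action.
RULES = [
    # Remove comments containing more than two links (possible spam).
    {"name": "link-spam", "pattern": r"https?://\S+",
     "max_hits": 2, "action": "remove"},
    # Report comments containing any flagged phrase.
    {"name": "flagged-phrases", "pattern": r"\b(buy now|free money)\b",
     "max_hits": 0, "action": "report"},
]

def moderate(comment: str) -> list[str]:
    """Return the list of actions a comment triggers, in rule order."""
    actions = []
    for rule in RULES:
        hits = re.findall(rule["pattern"], comment, flags=re.IGNORECASE)
        if len(hits) > rule["max_hits"]:
            actions.append(rule["action"])
    return actions
```

Even this toy version hints at the paper’s findings: each community would tune its own patterns and thresholds, so there is no single standard configuration, and misfiring rules are hard to audit without tooling.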

Reflection:

I appreciate that the authors were embedded within the Reddit community for over a year, and that they provide concrete recommendations for creators of new and existing platforms, for designers and researchers interested in automated content moderation, for scholars of platform governance, and for content moderators themselves.

I also appreciate the deep and thorough qualitative nature of the study, along with the screenshots; however, the paper may be too long and too detailed in some respects. I wish there were a “mini” version of this paper. The quotes themselves were compelling and exemplary of the problems users faced.

The finding that different subreddits configured and used Automod differently was interesting, and I wonder how much a moderator’s skills and background affect whether and in what ways they configure and use Automod. Lastly, the conclusion is very valuable, especially as it is targeted toward different groups within and outside of academia.

Two themes that emerged, “Becoming/continuing to be a moderator” and “recruiting new moderators,” sound interesting, but I wonder why they were left out of the results. The paper does not provide any explanation regarding this.

Questions:

  1. How do you think subreddits might differ in their use of Automod based on their moderators’ technical abilities?
  2. How can we teach people to use Automod better?
  3. What are the limitations of Automod? How can they be overcome through ML methods?

Read More