02/19/2020 – The Work of Sustaining Order in Wikipedia – Subil Abraham

This paper is a very interesting inside look at how the inner cogs of Wikipedia function, particularly relating to how vandalism is managed with the help of automated software tools. The tools, developed unofficially by Wikipedia contributors, were created out of necessity in order to a) make it easier to identify bad actors, b) automate and speed up reversions of vandalism, and c) give non-experts the power to police obvious vandalism, such as changing or deleting sections, without needing a subject matter expert to do a full review of the article. The paper uses trace ethnography to study the usage of these tools and puts forth an interesting case study of a vandal defacing various articles: through distributed actions by various volunteers, assisted by these tools, the vandal was identified, warned for their repeated offenses, and finally banned as their egregious actions continued, all within the span of 15 minutes and with no explicit coordination among the volunteers.

I find this to be a fascinating look into distributed cognition in action, wherein multiple independent actors take independent actions that produce a cohesive result (in the case study, multiple volunteers and automated tools identifying a vandal and issuing warnings, ultimately resulting in their ban). I find myself thinking of the work of these tools as roughly equivalent to the human body's unconscious activities. For example, the act of walking is incredibly complex, involving precise coordination of hundreds of muscles all moving at the right moments. However, we do not have to think any harder than "I want to get from here to there" and our body handles the rest. That is what these tools feel like: something that handles the complex busywork and leaves the big decisions to us. I am wondering, though, how things have changed since 2009. The paper mentions that the bots tend to ignore changes made by other bots, presumably because those other bots are being managed by other volunteers, but the bot configuration can be changed so that it explicitly monitors other bots. I wonder how much of that functionality is used now, because I am sure Wikipedia now has to deal with a lot more politically motivated vandalism, and much of it is being done by bots. Reddit is a big victim of this, so it is not hard to imagine Wikipedia facing the same problem. Of course, the adversarial bots would have to be a lot more clever than just pretending to be a friendly bot, because that might not cut it anymore. It is still an important thing to think about.

  1. How would the functionality of Huggle and its ilk fare in the space of Reddit’s automoderator, and vice versa? Are they dealing with fundamentally different things or is there overlap?
  2. How has dealing with vandalism changed on Wikipedia in the decade since this paper was published?
  3. Is there a place for a hierarchy of bots, where lower-level bots scan for vandalism and higher-level bots make the decisions about banning, all with minimal human intervention? Or will active human participation always be needed?


02/19/2020 – Nan LI – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary:

The paper points out that there is no universally ideal structure for effective teamwork; the best structure for a team depends on its members, tasks, surroundings, etc. Thus, this paper presents a system that searches for the optimal team structure by adapting different team structures to teams and evaluating the fit based on team performance and teamwork feedback. However, the combination of the diverse dimensions of team structure and the arms (the different values of each dimension) forms a large set. To avoid overwhelming the teams being tested with these values, the paper also leverages a model called multi-armed bandits with temporal constraints, which limits the number of arm selections based on several factors. The authors tested the platform with AMT workers and evaluated the system's performance with a designed task and performance evaluation. The experiments confirmed that no two teams had the same optimal team structure, and that the optimal structure even differed for the same group when completing different tasks. The results also indicate that the DreamTeam platform can promote high-efficiency teamwork.
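
The temporally constrained bandit idea is concrete enough to sketch in code. Below is a minimal, hypothetical Python illustration of the general pattern as I understand it (not the paper's actual algorithm): one UCB-style bandit per team-structure dimension, with a cap on how many dimensions may switch arms in any round, which plays the role of the temporal constraint that keeps a team from facing too many simultaneous changes. All class, parameter, and dimension names are invented for illustration.

```python
import math
import random

class TemporallyConstrainedBandits:
    """Toy sketch: one UCB1 bandit per team-structure dimension, with a cap
    on how many dimensions may switch arms in a given round so the team is
    not overwhelmed by simultaneous structural changes."""

    def __init__(self, dimensions, max_changes_per_round=1):
        # dimensions: dict mapping dimension name -> list of arm values
        self.dimensions = dimensions
        self.max_changes = max_changes_per_round
        self.counts = {d: [0] * len(arms) for d, arms in dimensions.items()}
        self.values = {d: [0.0] * len(arms) for d, arms in dimensions.items()}
        self.current = {d: 0 for d in dimensions}  # currently active arm index
        self.t = 0

    def _ucb(self, d, i):
        if self.counts[d][i] == 0:
            return 1e6  # untried arms get a large optimistic value
        bonus = math.sqrt(2 * math.log(self.t) / self.counts[d][i])
        return self.values[d][i] + bonus

    def select(self):
        """Pick this round's structure, switching at most `max_changes` dimensions."""
        self.t += 1
        # Rank dimensions by how much their best arm's UCB exceeds the current arm's.
        gains = []
        for d, arms in self.dimensions.items():
            best = max(range(len(arms)), key=lambda i: self._ucb(d, i))
            gains.append((self._ucb(d, best) - self._ucb(d, self.current[d]), d, best))
        gains.sort(reverse=True)
        for _, d, best in gains[: self.max_changes]:
            self.current[d] = best  # temporal constraint: only a few switches per round
        return {d: self.dimensions[d][i] for d, i in self.current.items()}

    def update(self, reward):
        """Credit the observed reward (e.g. a task score) to the active arms."""
        for d, i in self.current.items():
            self.counts[d][i] += 1
            self.values[d][i] += (reward - self.values[d][i]) / self.counts[d][i]

# Example usage: alternate structures round by round, feeding back a team score.
bandits = TemporallyConstrainedBandits(
    {"hierarchy": ["flat", "single manager"],
     "feedback": ["none", "midpoint check-in"]},
    max_changes_per_round=1)
for _ in range(10):
    config = bandits.select()      # structure to run the team under this round
    score = random.random()        # stand-in for an observed performance score
    bandits.update(score)
```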

Reflection:

First, I strongly agree with the view that there are no universally ideal structures for effective teamwork. Besides, searching for the optimal structure by adapting different dimensions and evaluating each fit also seems reasonable. However, I think the experiment could have gathered more valuable information if the platform had been tested with real groups instead of randomly formed groups, because I think a premise of becoming a group and completing a task together is that the group members are familiar with each other. Thus, this platform should be most effective in the early stages of team formation: before the team members are familiar with each other, they can use this system to find a temporarily optimal team structure, so that they can quickly cooperate and work as a team even though they do not know each other. Nevertheless, as familiarity among the group members grows, using this method to determine the optimal structure may become inefficient, because they may have already found the best structure for some of the dimensions as they get along and gain more experience working together.

I also considered another situation that is well suited for this system: when a long-established team is assigned a new type of task. In that case, the team's working mode may need to be switched so that they can complete the new task most efficiently, and at that point the system's support is needed to find this new optimal structure.

Finally, I find the constraints method mentioned in the article very inspiring. Maybe we can improve the effectiveness of the DreamTeam platform by allowing users to exclude in advance dimensions that they would not like to change, for example the hierarchy or the interaction pattern. In this case, reducing the number of combinations would be more conducive to exhaustive testing, and the adapted structure should be a better fit for the team.

Question:

  1. What do you think of using computational power to decide the optimal structure for teamwork?
  2. In this paper, the authors recruit random testers to form groups and complete the task; do you think this influences the results?
  3. Under what conditions do you think this platform would be most beneficial?

Word Count: 544


02/19/2020 – Human-Machine Collaboration for Content Regulation – Myles Frantz

Since the dawn of the internet, it has surpassed many expectations and become prolific throughout everyday life. Though initially there was a lack of standards in website design and forum moderation, these have relatively stabilized through various scientific approaches. A popular forum site, Reddit, uses a human-led human-AI collaboration to help automatically and manually moderate its ever-growing comments and threads. Searching through the top 100 subreddits (at the time of writing), the team surveyed moderators from 5 varied and highly active subreddits. These moderators are mostly volunteers. Due to the easy-to-use API provided by Reddit, one of the most used moderation tools was a third-party bot later incorporated into Reddit as Automod. This is one of the more popular and common tools used by moderators on Reddit. Since it is very extensible, there is no common standard across subreddits: moderators within the 5 subreddits use this bot in relatively similar but distinct ways. Automod is not the only bot used by moderators; other bots can be used alongside it to interact with and streamline one another in a similar fashion. However, due to the complexity of the bots (whether technological or from a lack of interest in learning the tool), some subreddits let only a few people manage them, sometimes with damning results. When issues happen, instead of reacting to users' complaints after the fact, the paper argues for more transparency about the bot's actions.

I agree with the original author of Automod, who started off making the bot purely to automate several steps. Carrying this forward with the scale of Reddit today, I do believe it would be impossible for human moderators alone to keep up with the "trolls".

Though I do disagree with how knowledge of the Automod rules is spread out. I believe decentralizing that knowledge would make the system more robust, especially since the moderators are volunteers. It is natural for people to avoid what they do not understand, out of fear of it in general or fear of what repercussions may happen, but I do not think putting all of the work on one moderator is necessarily the right answer.
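
To make the concern about rule knowledge concrete, Automod is essentially a rule engine that moderators configure per subreddit. The snippet below is a purely hypothetical Python sketch of that declarative "condition leads to action" pattern; real Automod rules are written as YAML configuration on Reddit, and every rule name, threshold, and phrase here is invented for illustration.

```python
import re

# Hypothetical Automod-style rules: each rule pairs a condition with an action.
RULES = [
    {"name": "banned-phrases",
     "check": lambda c: re.search(r"\b(buy followers|free crypto)\b", c["body"], re.I),
     "action": "remove"},
    {"name": "new-account-link",
     "check": lambda c: c["account_age_days"] < 2 and "http" in c["body"],
     "action": "filter"},   # hold for human moderator review
    {"name": "excessive-caps",
     "check": lambda c: len(c["body"]) > 20
         and sum(ch.isupper() for ch in c["body"]) / len(c["body"]) > 0.7,
     "action": "report"},
]

def moderate(comment):
    """Return the first matching rule's action, or 'approve' if none match."""
    for rule in RULES:
        if rule["check"](comment):
            return rule["action"], rule["name"]
    return "approve", None

# Example: a day-old account posting a link gets held for human review.
print(moderate({"body": "Check out http://example.com", "account_age_days": 1}))
```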

  • One of my questions regards one of the proposed outcomes for Reddit: granting more visibility into Automod's actions. Notably, due to the scale of Reddit, extending this kind of functionality automatically could incur much more memory and storage overhead. Reddit already stores vast amounts of data, and potentially doubling the storage required (if every comment reviewed by Automod were logged) may be a downfall of this approach.
  • Instead of surveying the top 60%, I wonder whether surveying lower-ranked subreddits (via RedditMetrics) with fewer moderators would show the same pattern of Automod use. I would imagine they would be forced to use the Automod tool in more depth and breadth due to the lack of available resources, though this is pure speculation.
  • A final question would be: to what extent is there duplication of bots across subreddits? If the percentage is large, it may lead to a vastly different experience across subreddits, as there seemingly is now, potentially causing confusion among new or returning users.


02/19/2020 – Updates in Human-AI Teams: Understanding and Addressing the Performance / Compatibility Trade off – Yuhang Liu

This paper first discusses the complementarity between humans and artificial intelligence: in many cases, humans and AI form a team in which people make decisions after checking the AI's inferences. This cooperation model has applications in many fields and has achieved significant results. Usually, this kind of achievement requires certain prerequisites: people must form their own judgments about the AI's conclusions, and the AI's results must be accurate. The tacit cooperation between the two can improve efficiency. However, with updates to AI systems and the expansion of data, this cooperation can break down: the updated AI may make new errors in places where it was previously reliable, and because the error boundary shifts, people's understanding of the AI is broken. So after a system update, efficiency can be reduced rather than improved. This paper mainly studies this situation. The authors want the updated system to remain compatible with the user's prior experience, so several methods are proposed to achieve more compatible and accurate updates.

It is also suggested that this idea was obtained by analogy: in software engineering, an updated system is considered compatible if it can still support legacy software. I greatly agree with this kind of analogy, which is similar to bionics; we can continuously bring new ideas into the computing field through this kind of thinking. The method mentioned in this paper is also very necessary. In the ordinary process of artificial intelligence or machine learning, we usually build data sets from scratch each time and lack the concept of inheritance, which is very inconvenient. After adopting the idea of compatibility, it will greatly save effort and serve people more smoothly.

This article introduces CAJA, a platform for measuring the impact of AI performance and updates on team performance. It also introduces a practical retraining objective to improve update compatibility; the main idea is to penalize new errors, that is, mistakes the updated model makes on cases the old model handled correctly. It can also be seen from the text that trust is the core of teamwork. Admittedly, trust is the essence of a team, but it is only the basis of the work, and I think more simulation and improvement are needed on the human side. When people learn new things, the new knowledge does not degrade their previous skills; instead, we gain more perspectives and methods for thinking about a problem. I therefore think that humans and machines should be mixed, that is, treated as a team as a whole, so that the results can be more compatible and human-machine interaction can be more successful.
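
To make the idea of "punishing new errors" concrete, the sketch below shows one simplified way such a retraining objective could look: a standard loss plus an extra penalty on examples the old model classified correctly but the updated model now gets wrong. This is my own rough rendering under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def compatible_update_loss(y_true, p_new, y_old_pred, lam=0.5):
    """Cross-entropy loss plus a penalty on 'new errors': examples the old
    model classified correctly that the updated model now gets wrong.
    A simplified sketch of a compatibility-aware retraining objective.

    y_true:     true binary labels, shape (n,)
    p_new:      updated model's predicted probability of class 1, shape (n,)
    y_old_pred: old model's hard predictions, shape (n,)
    lam:        weight of the compatibility penalty
    """
    eps = 1e-12
    ce = -(y_true * np.log(p_new + eps) + (1 - y_true) * np.log(1 - p_new + eps))
    old_correct = (y_old_pred == y_true).astype(float)
    # Extra weight on loss where the old model was right: mistakes there are
    # "new errors" that break the user's mental model of the AI.
    return np.mean(ce + lam * old_correct * ce)

# Example: the second instance is a new error (old model right, new model wrong),
# so its loss is weighted more heavily than an error the old model also made.
y_true = np.array([1, 1, 0])
p_new  = np.array([0.9, 0.2, 0.4])
y_old  = np.array([1, 1, 1])
print(compatible_update_loss(y_true, p_new, y_old))
```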

Questions:

  1. What are the implications of compatible AI updates?
  2. How can we better treat people and machines as a whole?
  3. Will compatible AI updates affect the final training results?


2/19/20 – Lee Lisle – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Summary

            Bansal et al. discuss how human-AI teams work in solving high-stakes problems such as hospital patient discharge scenarios or credit risk assessment. They point out that the humans in these teams often create a mental model of the AI's suggestions, where the mental model is an understanding of when the AI is likely to be wrong about the outcome. The authors then show that updates to the AI can produce worse performance if they are not compatible with the mental model the human user has already formed. They go on to define types of compatibility for AI updates, as well as a few other key terms relating to human/AI teams. They develop a platform to measure how compatibility can affect team performance, and then measure the effectiveness of compatible AI updates through a user study with 25 mTurk workers. In all, they show that incompatible updates reduce performance compared to no update at all.

Personal Reflection

            The paper was an interesting study in the effect of pushing updates without considering the user involved in the process. I hadn’t thought of the human as an exactly equal player in the team, where the AI likely has more information and could provide a better suggestion. However, it makes sense that the human leverages other sources of information and forms a better understanding of what choice to ultimately make.

            CAJA, the human/AI simulation platform, seems like a good way to test AI updates; however, I struggle to see how it can be used to test other theories, as the authors seem to suggest it can. It is, essentially, a simple user-learning game in which users figure out when to trust the machine and when to deviate. While this isn't exactly my field of expertise, I only see changing the information flows and the underlying AI as ways of learning new things about human/AI collaboration here, which makes terming this a platform feel a little excessive.

Questions

  1. The authors mention that, in order to defeat mTurk scammers who click through projects like these quickly, they drop the lowest quartile (in terms of performance) out of their results. Do you think this was an effective countermeasure, or could the authors be cutting good data?
  2. From other sources, such as Weapons of Math Destruction, we can read how some AI suggestions are inherently biased (even racist) due to input data. How might this change the authors results? Do you think this is taken into consideration at all?
  3. One suggestion near the end of the paper stated that, if pushing an incompatible update, the authors of the AI should make the change explicit so that the user could adjust accordingly. Do you think this is an acceptable tradeoff compared to creating a compatible update? Why or why not?
  4. The authors note that, as the error boundary f became more complex, errors increased, so they kept to relatively simple boundaries. Is this an effective choice for this system, considering real systems are extremely complex? Why or why not?
  5. The authors state that they wanted the “compute” cost to be net 0. Does this effectively simulate real-world experiences? Is the opportunity-cost the only net negative here?


2/19/20 – Lee Lisle – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary

            Geiger and Ribes cover the use of automated tools, or "bots," to prevent vandalism on the popular user-generated online encyclopedia Wikipedia. The authors detail how editors use popular distributed cognition coordination services such as "Huggle," and argue that these coordination applications affect the creation and maintenance of Wikipedia as much as the traditional social roles of editors. Teams of humans and AI work together to fight vandalism in the form of rogue edits. The authors note how bots went from assisting essentially 0% of edits in 2006 to 12% in 2009, while editors using assisted-editing tools account for even more. They then deep-dive into how the editors came to ban a single vandal who committed 20 false edits to Wikipedia in an hour, using a method they term "trace ethnography."

Personal Reflection

            This work was eye-opening in seeing exactly how Wikipedia editors leverage bots and other distributed cognition to maintain order in Wikipedia. Furthermore, after reading this, I am much more confident in the accuracy of articles contained on the website (possibly to the chagrin of teachers everywhere). I was surprised how easily attack edits were repelled by the Wikipedia editors, considering that hostile bot networks could be deployed against Wikipedia as well.

            I also generally enjoyed the analogy of how managing Wikipedia is like navigating a naval vessel in that both leverage significant amounts of distributed cognition in order to succeed. Showing how many roles are needed in order to understand various jobs and collaborate between people was quite effective.

            Lastly, their focus (trace ethnography) on a single vandal was an effective way of portraying what is essentially daily life for these maintainers. I was somewhat surprised that only four people were involved before banning a user; I had figured that each vandal took much longer to identify and remedy. How the process proceeded, where the vandal got repeated warnings before a (temporary) ban occurred, and how the bots and humans worked together in order to come to this conclusion, was a fascinating process that I hadn’t seen written in a paper before.

Questions

  1. One bot that this article didn’t look into is a twitter bot that tracked all changes on Wikipedia made by IP addresses used by congressional members (@CongressEdits). Its audience is not specifically intended to be the editors of Wikipedia, but how might this help them? How does this bot help the general public? (It has since been banned in 2018) How might a tool like this be abused?
  2. How might a trace ethnography be used in other applications for HCI? Does this approach make sense for domains other than global editors?
  3. How can Huggle (or the other tools) be changed in order to tackle a different application, such as version control? Would it be better than current tools?
  4. Is there a way to exploit this system for vandals? That is, are there any weaknesses to human/bot collaboration in this case?


02/19/2020 – Nan LI – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Summary:

In this paper, the author notes that it is now prevalent for humans and AI to form a team to make decisions, with the AI providing recommendations and the human deciding whether or not to trust them. In these cases, successful team collaboration is mainly based on the human's knowledge of the previous AI system's performance. The author therefore argues that although an update to the AI system may enhance its predictive precision, it can hurt team performance, since the updated version is usually not compatible with the mental model that humans developed with the previous AI system. To address this problem, the author introduces the concept of the compatibility of an AI update with prior user experience. To examine the role of this compatibility in human-AI teams, the author proposes methods and designs a platform called CAJA to measure the impact of updates on team performance. The outcomes show that team performance can be harmed even when the updated system's predictive accuracy improves. Finally, the paper proposes a re-training objective that can promote the compatibility of updates. In conclusion, to avoid diminishing team performance, developers should build more compatible updates without surrendering accuracy.
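
For concreteness, one natural way to quantify the compatibility described above is the fraction of examples the previous model handled correctly that the updated model also handles correctly. The sketch below is a minimal rendering of that measure; the paper's precise definition may differ in its details.

```python
import numpy as np

def compatibility_score(y_true, old_pred, new_pred):
    """Fraction of instances the old model got right that the new model also
    gets right. A score of 1.0 means the update introduces no new errors on
    previously correct cases; lower scores mean the update is more likely to
    violate the user's mental model."""
    old_correct = old_pred == y_true
    if not old_correct.any():
        return 1.0  # vacuously compatible if the old model was never right
    both_correct = old_correct & (new_pred == y_true)
    return both_correct.sum() / old_correct.sum()

y_true   = np.array([1, 0, 1, 1, 0])
old_pred = np.array([1, 0, 0, 1, 1])   # old model right on 3 of 5
new_pred = np.array([1, 1, 1, 1, 0])   # right on 4 of 5, but...
print(compatibility_score(y_true, old_pred, new_pred))  # ~0.67: one new error
```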

Reflection:

In this paper, the author talks about a specific interaction, AI-advised human decision making, as in the patient readmission example presented in the paper. In these cases, an incompatible update of the AI system would indeed harm team performance. However, I think the extent of the impact largely depends on the correlation between the human and the AI system.

If the system and the user have a high degree of interdependence, neither is a specialist on the task, and the system's prediction accuracy and the user's knowledge have the same impact on the decision, then an incompatible update of the AI system will weaken team performance. Even though this effect can eventually be eliminated as the user and the system run in together, the cost of the decisions made in the meantime in a high-stakes domain will be very large.

However, if the system interacts with users frequently but its predictions are only one of several factors humans consider and do not directly determine the decision, then the impact of incompatible updates on team performance will be limited.

Besides, if humans have more expertise on the task and can promptly validate the correctness of a recommendation, then neither the improvement in system performance nor the new errors caused by the system update will have much impact on the results. In that case, if the errors caused by the update do not affect team performance, then when updating the system we do not need to consider compatibility, only the improvement of system performance. In conclusion, if there is not enough interaction between the user and the system, if the degree of interdependence is not high, or if the system only serves as an auxiliary or a double-check, then a system update will not have a great impact on team performance.

A compatible update is indeed helpful for users to quickly adapt to the new system, but I think the impact of an update largely depends on the correlation between the user and the system, or on how large a role the system plays in the teamwork.

Besides, designing a compatible update also incurs extra cost. Therefore, I think we should focus on minimizing the impact of system errors on the decision-making process when designing the system and establishing the human-AI interaction.

Question:

  1. What do you think about the concept of the compatibility of AI updates?
  2. Do you have any examples of human-AI systems to which the authors' theory applies?
  3. Under what circumstances do you think the authors' theory is most applicable, and when is it not?
  4. When we need to update the system frequently, do you think it is better to build a compatible update or to use an alternative method to address the human adaptation cost?
  5. In my opinion, humans' capacity for adaptation is very high, and the cost required for humans to adapt is much smaller than the cost of developing a compatible update. What do you think?

Word Count: 682


02/19/2020 – Vikram Mohanty – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary

This paper, through a case study, highlights the invisible distributed cognition process that goes on underneath a collaborative environment like Wikipedia, and how different actors, both human and non-human, come together to achieve a common goal: banning a vandal on Wikipedia. The authors show the usefulness of trace ethnography as a method for reconstructing user actions and better understanding the role each actor plays in the larger scheme of things. The paper advocates for not dismissing the role of bots as mere force multipliers, but for seeing them through a different lens, considering the wide impact they have.

Reflection

Similar to the “Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator” paper, this paper is a great example of why intelligent agents (AI-infused systems, bots, scripts, etc.) should not be studied only in isolation, but also through a socio-technical lens. In my opinion, that lens provides a more comprehensive picture than performance/accuracy metrics alone of the goals these agents can and cannot achieve, the collaboration processes they may inevitably transform, the human roles they may affect, and other unintended consequences.

Trace ethnography is a powerful method for reconstructing user actions in a distributed environment, and using that to understand how multiple actors (human and non-humans) achieve a complex objective, by sub-consciously collaborating with each other. The paper advocates that bots/automation/intelligent agents should not be seen as just force multipliers or irrelevant users. This is important as a lot of current evaluation metrics focus only on quantitative measures such as performance or accuracy. This paints an incomplete, and sometimes, an irresponsible picture of intelligent agents, as they have now evolved to assume an irreplaceable role in the larger scheme of things (or goals).

The final decision-making privilege resides with the human administrator, and the whole socio-technical pipeline assists each step of decision-making with all possible information available, so that checks and bounds (or order, as the paper puts it) are maintained at every stage. Automated decisions, whenever taken, are grounded in some confidence of certainty. In my opinion, while building AI models, researchers should think about the AI-infused system or the real-world setting of which these algorithms would be a part. This might motivate researchers to make these algorithms more transparent or interpretable. Taking the perspective of the user who is going to wield these models/algorithms might help further.

It’s interesting to see some of the principles of mixed-initiative systems being used here i.e. history of the vandal’s actions, templated messages, showing statuses, etc.

Questions

  1. Do you plan to use trace ethnography in your proposed project? If so, how? Why do you think it’s going to make a difference?
  2. What are some of the risks and benefits of employing a fully automated pipeline in this particular case study i.e. banning a Wikipedia vandal?
  3. A democratic online platform like Wikipedia supports the notion of anyone coming in and making changes, and thus necessitates deploying moderation workflows to curb bad actors. However, if a platform were restrictive to some degree, a post-hoc setup might not be necessary and the platform might be less toxic. This does not only apply to Wikipedia and can also extend to SNS like Twitter/Facebook, etc. Which would you prefer, a democratic platform or a restrictive one?


02/19/20 – Fanglan Chen – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

Summary

Zhou et al.’s paper “In Search of the Dream Team” introduces DreamTeam — a system that identifies effective team structures for each group of individuals by suggesting different structures and evaluating the fit of each. How a team works relates to its team structure, including roles, norms, and interaction patterns. Prior organizational behavior research doubts the existence of universally perfect structures. The rationale is simple: teams exhibit great diversity, so no single structure can suit the full functioning of every team. The proposed DreamTeam explores values along five dimensions of team structure: hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. The system leverages feedback metrics such as team performance or satisfaction to iteratively identify the team structures that best organize each team. The authors also design multi-armed bandits with temporal constraints, an algorithm that determines the timing of exploration-exploitation trade-offs across multiple dimensions to avoid overwhelming teams with too many changes. In the experiments, DreamTeam is integrated with the chat platform Slack and achieves better performance and more diverse team structures compared with baseline methods.

Reflection

The authors design a system to facilitate the organization of virtual teams. Along with the several limitations mentioned in the paper, I feel the proposed DreamTeam system is based on a comparatively narrow view of what makes a dream team, and it seems difficult to generalize the framework to a variety of domains or platforms.

In the first place, I do not agree that there is a universal approach to designing or evaluating a so-called dream team. The components that make a dream team vary across domains. For example, in sports, I would say personality and talent play important roles in forming a dream team. Actually, it goes beyond "forming": a group of talented individuals not only bring technical expertise to the team, but also contribute passion, a strong work ethic, and a drive for peak performance in the pursuit of excellence. To extend that point, working with people who have similar personalities, values, and pursuits brings a chemistry to the teamwork that potentially enables challenging problem solving and strategic planning. None of these are captured by the dimensions, and they are nearly impossible to evaluate quantitatively.

Also, I think it is important to make every team member understand their role, such as why they need to tackle the tasks and how that ties to a larger purpose beyond their own needs. This provides a clear purpose and direction for where a group of people needs to move as a team. I do not think the authors emphasize how such understanding influences each team member's level of commitment. In addition, this kind of unified purpose avoids duplication of member effort and prevents the team from pulling in multiple directions.

Last but not least, in my opinion, maximizing rewards is not the ideal basis for determining the best team structure. Human society treasures process as well as results. Teamwork can be seen as successful as long as the whole team is motivated and working at it. If too much emphasis is put on results, the joy will be drained out of the job for the team. As long as progressive steps are made towards achieving the goal within a reasonable time frame, the team will become better. Building an ambitious, driven, and passionate team is just the start; we need to ensure that the team members survive and are nurtured so that they can deliver on the targets.

Discussion

I think the following questions are worthy of further discussion.

  • If you were the CEO of a company or a university president, would you consider using the proposed DreamTeam system to help organize your own team? Why or why not?
  • Do you think the five bandits capture all the dimensions that make a dream team?
  • Do you think the proposed DreamTeam system can be generalized to various domains? Are there any domains in which you think the system would not contribute to an efficient team structure?
  • Is there anything you can think of to improve the proposed DreamTeam system?


02/19/2020 – Palakh Mignonne Jude – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

SUMMARY

In this paper, the authors focus on the efforts (both human and non-human) taken to moderate content on the English-language Wikipedia. The authors use trace ethnography to show how these ‘non-human’ technologies have transformed the way editing and moderation are performed on Wikipedia. These tools not only increase the speed and efficiency of the moderators, but also aid them in identifying changes that might otherwise have gone unnoticed; for example, the ‘diff’ feature for viewing the edits made by a user enables the ‘vandal fighters’ to easily spot malicious changes made to Wikipedia pages. The authors mention editing tools such as Huggle and Twinkle, as well as a bot called ClueBot that can examine edits and revert them based on a set of criteria such as obscenity, patent nonsense, and mass removal of content by a user. This synergy between tools and humans has helped monitor changes to Wikipedia in near real time and has lowered the level of expertise required of reviewers, as an average volunteer with little to no knowledge of a domain is capable of performing these moderation tasks with the help of the aforementioned tools.
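
Given the criteria listed above (obscenity, patent nonsense, mass removal of content), one can imagine a ClueBot-like reverter as a simple scoring function over an edit. The sketch below is purely illustrative: ClueBot's real heuristics are far more sophisticated, and every threshold and word list here is invented.

```python
OBSCENE_WORDS = {"obscenity1", "obscenity2"}   # placeholder word list

def should_revert(old_text, new_text):
    """Toy ClueBot-style check on a single edit: flag obscenity, mass removal
    of content, or 'patent nonsense' (long additions with very few distinct
    characters). Returns (decision, reason)."""
    added = set(new_text.lower().split()) - set(old_text.lower().split())

    if added & OBSCENE_WORDS:
        return True, "obscenity"

    # Mass removal: the edit blanks most of the article.
    if len(old_text) > 500 and len(new_text) < 0.2 * len(old_text):
        return True, "mass content removal"

    # Patent nonsense: long additions made of very few distinct characters,
    # e.g. keyboard mashing like "aaaaaaabbbbbbb".
    added_chars = "".join(added)
    if len(added_chars) > 40 and len(set(added_chars)) < 5:
        return True, "patent nonsense"

    return False, "looks plausible; leave for human review"

# Example: an edit that blanks a long article gets flagged for mass removal.
print(should_revert("The mitochondria is the powerhouse of the cell. " * 20,
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"))
```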

REFLECTION

I think it is interesting that the authors focus on the social effects that various bots and assisted editing tools have had on the activities carried out on Wikipedia. I especially liked the analogy, drawn from the work of Ed Hutchins, of a ship's navigator who comes to know the vessel's trajectory through the work of a dozen crew members; the authors liken this to blocking a vandal on Wikipedia through the combined effort of a complex network of interactions between software systems and human reviewers.

I thought it was interesting that the share of edits made with bots increased from 2-4% in 2006 to about 16.33% in just about 4 years, and this made me wonder what the current percentage of edits made by bots would be. The paper also mentions that the detection algorithms often discriminate against anonymous and newly registered users, which is why I found it interesting to learn that users were allowed to reconfigure their queues so that they did not view anonymous edits as more suspicious. The paper mentions that ClueBot is capable of automatically reverting edits that contain obscene content, which made me wonder whether efforts have been made to develop bots that can automatically revert edits containing hate speech and highly bigoted views.

QUESTIONS

  1. As indicated in the paper ‘Updates in Human-AI teams’, humans tend to form mental models when it comes to trusting machine recommendations. Considering that the editing tools in this paper are responsible for queuing the edits made as well as accurately keeping track of the number of warnings given to a user, do changes in the rules used by these tools affect human-machine team performance?
  2. Would restricting edits on Wikipedia to only users that are required to have non-anonymous login credentials (if not to the general public, non-anonymous to the moderators such as the implementation on Piazza wherein the professor can always view the true identity of the person posting the question) help lower the number of cases of vandalism?
  3. The study performed by this paper is now about 10 years old. What are the latest tools that are used by Wikipedia reviewers? How do they differ from the ones mentioned in this paper? Are more sophisticated detection methods employed by these newer tools? And which is the most popularly used assisted editing tool?
