02/19/2020-Bipasha Banerjee – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary: 

The paper discusses the software tools that help moderate content posted or altered on the online encyclopedia Wikipedia. Wikipedia was built on the premise that anyone with an internet connection and a device can edit its pages. However, platforms with an “anyone can edit” mantra are prone to malicious users, also known as vandals. Vandals post inappropriate content in the form of text alterations, the introduction of offensive material, and so on. Human moderators can scan for offensive content and remove it, but this is a tedious task: it is impossible for people alone to monitor huge amounts of content and spot small changes hidden in a large body of text. To aid humans, there are automated and semi-automated tools responsible for monitoring, editing, and the overall maintenance of the platform; Huggle and Twinkle are examples of such assisted-editing tools. They work alongside human editors and help keep the platform free of vandalism by flagging suspicious edits and users and taking appropriate action as needed.
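To make this flag-and-review loop concrete, below is a minimal sketch of how an assisted-editing tool might score edits and queue the suspicious ones for a human patroller. The word list, scoring rules, and threshold are hypothetical illustrations of the general pattern, not the actual rules used by Huggle, Twinkle, or Wikipedia's edit filters.

    import re
    from queue import PriorityQueue

    # Hypothetical patterns and weights -- for illustration only.
    SUSPICIOUS_PATTERNS = [r"buy now", r"!!!+", r"click here"]

    def score_edit(old_text: str, new_text: str) -> int:
        """Crude suspicion score for an edit; higher means more suspicious."""
        score = 0
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, new_text, re.IGNORECASE):
                score += 5
        # Removing most of a page is treated as especially suspicious.
        if len(new_text) < 0.5 * len(old_text):
            score += 10
        return score

    def queue_for_review(edits: dict, threshold: int = 5) -> PriorityQueue:
        """Put edits scoring above the threshold in a queue for human review."""
        review_queue = PriorityQueue()
        for edit_id, (old, new) in edits.items():
            s = score_edit(old, new)
            if s >= threshold:
                review_queue.put((-s, edit_id))  # highest score reviewed first
        return review_queue

A human patroller would then pop edits off the queue, inspect the diff, and revert or warn as needed, which matches the human-in-the-loop workflow the paper describes.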

Reflection:

This paper was an interesting read on how offensive content is dealt with on platforms like Wikipedia. It was interesting to learn about the different tools, how they interact with humans, and how they help keep the platform clean of bad content. These tools are extremely useful, and they take advantage of machines' affordance for dealing with large amounts of data. However, I feel we should also discuss the fact that machines need human intervention to evaluate their performance. The paper mentions “leveraging the skills of volunteers who may not be qualified to review an article formally”; this is a bold statement that leads to a lot of open questions. Yes, it makes it easier to recruit people with less expertise, but at the same time it makes us aware that machines are taking up some jobs and undermining human expertise.

Most of the tools mentioned flag content based on words, phrases, or the deletion of large amounts of content. These approaches are rule-based rather than machine learning. Could we apply machine learning and deep learning algorithms so that the tool learns from user behavior? Wikipedia is data-rich and could provide plenty of data for such a model to train on. The paper mentions that “significant removal of content” is placed higher in the filter queue. My only concern is that a user might sometimes press enter by mistake. Take the case of git: developers write code, and the difference from the previous commit is recorded and shown as a diff. If a developer adds a line or two and erroneously presses enter before or after an existing block, the whole block can show up as “newly added” in the diff. A human sees through this easily, but a machine flags such content nonetheless. This may lead to extra work on edits that would normally not be in the queue, or would rank lower in it. A whitespace-insensitive comparison, sketched below, would avoid this.
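As a rough illustration of that concern, the snippet below compares two versions of a text after collapsing whitespace, so a stray newline does not register as a change. This is only a minimal sketch using Python's standard difflib module; the function names and the "change ratio" heuristic are my own assumptions, not how Huggle, Twinkle, or Wikipedia's filters actually compute diffs.

    import difflib

    def normalized(text: str) -> list[str]:
        # Collapse runs of whitespace and drop blank lines so that
        # whitespace-only edits do not count as changes.
        return [" ".join(line.split()) for line in text.splitlines() if line.strip()]

    def change_ratio(old_text: str, new_text: str) -> float:
        # Fraction of content that actually changed, ignoring whitespace-only edits.
        matcher = difflib.SequenceMatcher(None, normalized(old_text), normalized(new_text))
        return 1.0 - matcher.ratio()

    old = "The quick brown fox\njumps over the lazy dog.\n"
    new = "\nThe quick brown fox\njumps over the lazy dog.\n"  # only a stray newline added
    print(change_ratio(old, new))  # 0.0 -- the accidental enter is not flagged

Git itself offers something similar with its whitespace-ignoring diff options (for example, git diff -w), so a flagging tool could plausibly adopt the same convention.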

The paper talks about the “talk page” where the tools post warnings. This is a good step, as public shaming may help deter such baneful behavior. However, we could incorporate a harsher way to “shame” such users, for instance by posting usernames on the main homepage for every category. This won't work for anonymous users, but blocking their IP addresses could be a temporary fix. Overall, I feel that the human-computer interaction is well defined in the paper, and the concept of content-controlling bots makes our lives easier.

Questions:

  1. Are machines undermining human capabilities? Do we not need expertise anymore?
  2. How can such tools utilize the vast amount of data better? E.g., for training deep learning models.
  3. How could such works be extended to other platforms like Twitter?

One thought on “02/19/2020-Bipasha Banerjee – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal”

  1. I like the comparison you make between the ‘diff’ displayed by platforms such as Huggle and Twinkle and the diff in Git. I think it would be interesting to know whether these assisted-editing platforms have additional ways to ensure that meaningless whitespace is ignored and not placed higher up in the filter queue.
    I agree that publicly displaying identified vandals who are not anonymous might in fact deter future vandals from committing similar kinds of vandalism.
