02/26/2020 – Mohannad Al Ameedi – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary

In this paper, the authors study the social roles of editing tools in Wikipedia and the way vandalism fighting is addressed. They focus on the effects that automated tools, like bots, and assisted editing tools have on the distributed editing process used by the encyclopedia. Wikipedia allows anyone in the world to edit the content of its articles, which makes maintaining the quality of that content a difficult task. The platform depends on a distributed social network of volunteers to approve or reject changes. Wikipedia uses a version control system to help users see the changes; it shows both versions of the edited content side by side, which allows an editor to review the change history. The authors note that Wikipedia uses bots and automated scripts both to help edit some content and to fight vandalism, and they describe various tools used to assist the editing process. A combination of humans, automated tasks, and assisted editing tools makes Wikipedia able to handle such a massive number of edits and to fight vandalism attempts. Most research papers that studied the editing process are outdated because they did not pay close attention to these tools, while the authors highlight the importance of these tools in improving the overall quality of the content and allowing more edits to be performed. Technological tools like bots and assisted editing tools have changed the way humans interact with the system and have a significant social effect on the types of activities that are possible in Wikipedia.
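As a rough illustration of the side-by-side revision comparison described above (this is not Wikipedia's actual diff engine, and the revision strings are invented), Python's standard difflib module can report which spans of a sentence were kept, inserted, or deleted between two versions:

```python
import difflib

# Two hypothetical revisions of an article sentence, split into words.
old = "The Eiffel Tower is located in Paris France".split()
new = "The Eiffel Tower is a landmark located in Paris".split()

# SequenceMatcher labels each span as equal, insert, delete, or replace,
# similar in spirit to the revision diffs Wikipedia editors review.
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
    print(tag, old[i1:i2], new[j1:j2])
```

Tools like the ones the paper studies present exactly this kind of structured change information so a human can accept or revert an edit at a glance.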

Reflection

I found the idea of distributed editing and vandalism fighting in Wikipedia interesting. Given the massive amount of content in Wikipedia, it is very challenging to keep the content high quality when anyone in the world with internet access can make an edit. The internal version control and the assisted tools used to support the editing job at scale are amazing.

I also found the use of bots to automate edits for some content interesting. These automated scripts can help expedite content refreshes in Wikipedia, but they can also cause errors. Some tools mentioned in the paper don't even show the bots' changes, so I am not sure whether there is some method that can measure the accuracy of these bots.

The concept of distributed editing is similar to the concept of a pull request on GitHub, where anyone can submit a change to an open source project but only a group of project owners or administrators can accept or reject it.

Questions

  • Since millions or billions of people have smartphones nowadays, the number of anonymous edits might significantly increase. Are these tools still efficient at handling such an increased volume of edits?
  • Can we use deep learning or machine learning to fight vandalism or spam? The number of edits performed on articles can be treated as a rich training dataset.
  • Why doesn't Wikipedia combine all the assisted editing tools into one tool that has the best of each? Do you think this is a good idea, or do more tools mean more innovation and more choices?
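Following up on the machine-learning question above, here is a toy sketch (my own illustration, not from the paper) of how labeled edits could train a vandalism classifier. The training examples are invented, and a real system would use millions of revisions from Wikipedia's edit history; this hand-rolled multinomial Naive Bayes just shows the shape of the idea:

```python
import math
from collections import Counter

# Invented toy training data: (edit text, label).
train = [
    ("fixed citation and added reference to journal article", "good"),
    ("corrected spelling of the city name", "good"),
    ("updated population figures with census source", "good"),
    ("asdf asdf lol this page sucks", "vandal"),
    ("BUY CHEAP PILLS click here lol", "vandal"),
    ("deleted everything lol lol", "vandal"),
]

def train_nb(examples):
    """Fit per-class word counts and class priors for Naive Bayes."""
    counts = {"good": Counter(), "vandal": Counter()}
    priors = Counter()
    for text, label in examples:
        priors[label] += 1
        counts[label].update(text.lower().split())
    return counts, priors

def classify(text, counts, priors):
    """Return the class with the highest Laplace-smoothed log probability."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        score = math.log(priors[label] / sum(priors.values()))
        for word in text.lower().split():
            score += math.log((word_counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

counts, priors = train_nb(train)
print(classify("lol lol asdf", counts, priors))            # "vandal"
print(classify("added census reference", counts, priors))  # "good"
```

Even this crude model hints at why edit histories are such a rich dataset: every reverted edit is effectively a free training label.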


03/04/2020 – Mohannad Al Ameedi – Real-Time Captioning by Groups of Non-Experts

Summary

In this paper, the authors propose a low-latency captioning solution for deaf and hard-of-hearing people that works in a real-time setting. Solutions are available, but they are either very expensive or low quality. The proposed system allows people with a hearing disability to request captioning at any time and get the result within a few seconds. The system depends on a combination of non-expert crowd workers and local staff to provide the captioning. Each request is handled by multiple people, and the result is a combination of all the participants' input. The request is submitted as an audio stream and the result is returned as text. A crowdsourcing platform is used to submit the request, and the result is retrieved in seconds. The proposed system uses an algorithm that works in a streaming manner, processing the input as it is received and aggregating the results at the end. The system outperforms all other available options in both coverage and accuracy, and the proposed solution is feasible to apply in a production setting.
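The paper's actual algorithm performs real sequence alignment over the workers' partial input; as a much-simplified sketch of the core idea (all worker data below is invented), time-stamped words from several non-experts can be bucketed by time and merged with a majority vote:

```python
from collections import Counter, defaultdict

# Invented partial captions from three workers: (time in seconds, word).
# Each non-expert covers only part of the audio stream, as in the paper.
workers = [
    [(0.0, "welcome"), (0.5, "to"), (1.0, "the"), (1.5, "lecture")],
    [(0.1, "welcome"), (0.6, "two"), (1.1, "the")],
    [(0.5, "to"), (1.0, "the"), (1.6, "lecture"), (2.0, "today")],
]

def merge_captions(workers, bucket=0.5):
    """Bucket words by time, then keep the majority word in each bucket."""
    buckets = defaultdict(Counter)
    for stream in workers:
        for t, word in stream:
            buckets[round(t / bucket)][word] += 1
    return [votes.most_common(1)[0][0] for _, votes in sorted(buckets.items())]

print(" ".join(merge_captions(workers)))  # "welcome to the lecture today"
```

Note how the vote corrects one worker's "two" and how the final caption covers more of the stream than any single worker did, which is the coverage benefit the authors report.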

Reflection

I found the idea of real-time captioning very interesting. My understanding was that there is always latency when depending on crowdsourcing, so it cannot be applied in real-time scenarios; it will be interesting to see how the system works as the number of users increases.

I also found the concept of multiple people working on the same audio stream and combining their results very interesting. Collecting captions from multiple people, figuring out what is unique and what is duplicated, and producing a final sentence, paragraph, or script is a challenging task.

This work is like multiple people working on one task, or multiple developers writing code to implement a single feature. Normally a supervisor or development lead merges the results, but in this case the algorithm takes care of the merge.

Questions

  • The authors measured the system on a limited number of users. Do you think the system will continue to outperform other methods if it is deployed in a real-world setting?
  • Since there is an increasing amount of live streaming at work, at school, and elsewhere, can we use the same concept to pass a URL and get instant captioning? What are the limitations of this approach?
  • What are the privacy concerns with this approach, especially if it is used in the medical field? Normally a limited number of people are hired to help with such tasks, while crowdsourcing is open to a wide range of people.


02/05/20 – Mohannad Al Ameedi – Guidelines for Human-AI Interaction

Summary

In this paper, the authors propose 18 design guidelines for building human-AI infused systems. These guidelines try to resolve the issues with many human-AI interaction systems that either don't follow guidelines or follow guidelines that have not been tested or evaluated. These issues include producing offensive, unpredictable, or dangerous results, which may cause users to stop using these systems, so proper guidelines are necessary. In addition, advances in the AI field have introduced new user demands in areas like speech recognition, pattern recognition, and translation. The 18 guidelines help users understand what an AI system can and cannot do and how well it can do it; show information related to the task in focus with appropriate timing, in a way that fits the user's social and cultural context; make sure that the user can request services when needed or ignore unnecessary ones; offer explanations of why the system does certain things; maintain a memory of the user's recent actions and try to learn from them; and so on. The guidelines went through several phases: consolidating existing guidelines, heuristic evaluation, a user study, and expert evaluation and revision. The authors hope that these guidelines will help in building better AI-infused systems that can scale and work well with an increasing number of users and advancing AI algorithms and systems.

Reflection

I found the idea of putting together design guidelines very interesting, as it creates a standard that can help in building human-AI systems, helps in evaluating or testing these systems, and can be used as a baseline when building large-scale AI-infused systems to avoid well-known issues associated with previous systems.

I also found the collection of academic and industry guidelines interesting, since it is based on over 20 years of human-computer interaction work, which can be regarded as very valuable and rich information that can be used in different domains and fields.

I agree with the authors that some AI-infused systems that do not follow certain guidelines are confusing, ineffective, and sometimes counterproductive when their suggestions or recommendations are irrelevant. That explains why some AI-enabled or AI-infused systems were popular at certain times but could not satisfy user demands and eventually stopped being used.

Questions

  • Are these guidelines followed in Amazon Mechanical Turk?
  • The authors mention that there is a tradeoff between generality and specialization. What tradeoff factors do we need to consider?


02/04/20 – Mohannad Al Ameedi – Principles of mixed-initiative user interfaces

Summary

In this paper, the author first presents two research efforts related to human-computer interaction: one focused on direct manipulation by the user and one focused on automated services. The author suggests an approach that integrates both by allowing the user to directly manipulate user interface elements while also using automation to decrease the amount of interaction necessary to finish a task. The author presents factors that can make the integration of automation and direct manipulation more effective. These factors include developing automation that adds value, considering uncertainty about the user's goals, considering the user's attention in the timing of the automated service, weighing the costs and benefits of acting under uncertainty, involving direct dialog with the user, and others.

The author then proposes a system called LookOut, which helps users schedule meetings and appointments by reading the user's emails, extracting useful information related to meetings, and auto-populating some fields, like the meeting time and recipients, which saves the user mouse and keyboard interaction. The system uses a probabilistic classifier to help reduce the amount of interaction necessary to accomplish scheduling tasks. It also uses speech recognition technology developed by Microsoft Research to offer additional help to users during direct manipulation of the suggested information. LookOut combines automated services with the ability to directly manipulate the system. The system also assesses the user's interaction to decide when the automated service will not be helpful, to make sure that uncertainty is well considered. The LookOut system improves human-computer interaction through the combination of reasoning machinery and direct manipulation.
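The decision logic described above can be sketched in miniature. This is a toy illustration of the expected-utility idea, not Horvitz's actual model; the keyword list, probability estimates, and cost numbers are all invented:

```python
# Words loosely suggesting a scheduling intent (invented list).
SCHEDULING_WORDS = {"meeting", "schedule", "tomorrow", "calendar", "appointment"}

def p_scheduling(email: str) -> float:
    """Crude probability estimate of a scheduling goal from keyword hits."""
    hits = len(set(email.lower().split()) & SCHEDULING_WORDS)
    return min(1.0, 0.15 + 0.3 * hits)

def should_offer_help(email, benefit=1.0, interruption_cost=0.4):
    """Act only when the expected utility of helping beats doing nothing:
    the benefit of a correct offer must outweigh the cost of a needless
    interruption, exactly the cost-benefit tradeoff the paper emphasizes."""
    p = p_scheduling(email)
    return p * benefit - (1 - p) * interruption_cost > 0

print(should_offer_help("can we schedule a meeting tomorrow"))  # True
print(should_offer_help("here is the report you asked for"))    # False
```

The threshold structure is the interesting part: raising the interruption cost makes the agent quieter, which is how such a system can tune itself to the user's tolerance for automation.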

Reflection

I found the idea of maintaining a working memory of user interactions interesting. This approach learns from the user's own experience, as stated in the paper, but machine learning methods could also be used to predict the next required action for a new or existing user by learning from all other users.

I also found the LookOut system very interesting, since it integrates direct user input with an automated service. The speech recognition technology developed by Microsoft Research not only allows the user to interact beyond mouse and keyboard but also provides another interaction medium, which is extremely important for users with disabilities.

Cortana, Microsoft's intelligent agent, also parses email: it checks whether the user sent a message containing a phrase like "I will schedule a meeting" or "I will follow up on that later", then reminds the user to follow up and presents two buttons, letting the user either dismiss the alert or ask the system to send another reminder the next day.

Questions

  • Can we use the human interaction data collected by LookOut as labeled data and develop a machine learning algorithm that predicts the next user interaction?
  • Can we use the LookOut idea in a different domain?
  • The author suggests a dozen factors for integrating automated services with direct manipulation. Which factors do you think would be useful for crowdsourcing users?


Mohannad Al Ameedi – Human Computation: A Survey and Taxonomy of a Growing Field

Summary

In this paper, the authors aim to classify human computation systems and to compare and contrast the term with related terms like crowdsourcing, social computing, and data mining. The paper starts by presenting several definitions of human computation, all of which refer to it as utilizing human power to solve problems that computers cannot yet solve.

The paper presents different computational systems that share some properties with human computation and yet are different from it. The authors highlight systems like social computing, crowdsourcing, and data mining and show their similarities to and distinctions from human computation systems. All of these systems are grouped together under collective intelligence, where human intelligence is combined to solve a big problem.

The paper presents a classification system for human computation systems that is based on six dimensions. The dimensions include motivation, quality control, aggregation, human skills, process order, and task-request cardinality.

The authors present different ways to motivate people to participate in these systems, like pay, altruism, and enjoyment, and discuss the pros and cons of each approach.

The authors also present different approaches to improve the quality of these systems, like output or input agreement, expert review, multilevel review, and ground-truth seeding. All of these approaches try to achieve better quality and provide ways to measure the performance of the system.

There are different aggregation approaches that collect the results of the completed tasks and formulate the solution to the global problem.
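One simple aggregation strategy, as a toy sketch of my own rather than an example from the survey, is majority voting over redundant workers' answers; the task names and answers below are invented:

```python
from collections import Counter

# Invented worker answers: each task is assigned to three workers.
answers = {
    "task1": ["cat", "cat", "dog"],
    "task2": ["dog", "dog", "dog"],
    "task3": ["bird", "cat", "bird"],
}

def aggregate(answers):
    """Majority vote: keep the most common answer for each task."""
    return {task: Counter(votes).most_common(1)[0][0]
            for task, votes in answers.items()}

print(aggregate(answers))  # {'task1': 'cat', 'task2': 'dog', 'task3': 'bird'}
```

Redundancy plus a vote is the cheapest way to turn individually unreliable answers into a usable solution to the global problem, at the cost of paying for each task several times.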

The other dimensions (human skill, process order, and task-request cardinality) cover the skills required, the order in which the steps of the process happen, and the pipeline a request can go through.

Reflection

I found one definition of human computation especially interesting. It defines it as "systems of computers and large numbers of humans that work together in order to solve problems that can't be solved by either computers or humans". It is true that if humans alone could solve these problems there would be no need for computers, and if systems could solve and automate them there would be no need for humans, so both need to work together to solve bigger problems.

I also found the comparison between the different systems, including human computation, interesting. I personally thought that a system like crowdsourcing was a human computation system, but it appears it is not.

I agree with the dimensions that define or classify human computation systems, as they are accurate measures that help researchers build new systems and evaluate them.

To connect to other ideas, I found this work similar to dynamic programming, where we have to solve small problems to eventually solve the global problem. Small tasks are distributed to workers to solve small problems, and then aggregation methods take these solutions to solve the global problem.

I also found the ground-truth seeding quality control approach similar to the training and testing data in a machine learning algorithm.
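To make that analogy concrete, here is a minimal sketch (my own illustration, not from the paper) of ground-truth seeding: gold tasks with known answers are mixed invisibly into the job, and the worker's score on them estimates their accuracy, much like evaluating a model on held-out test data. All names and answers are invented:

```python
# Gold-standard "seed" tasks whose answers the requester already knows.
gold = {"seed1": "cat", "seed2": "dog"}

# A hypothetical worker's answers, with the seeds mixed into normal tasks.
worker_answers = {"seed1": "cat", "seed2": "cat", "task9": "bird"}

def worker_accuracy(worker_answers, gold):
    """Fraction of the seeded tasks the worker answered correctly."""
    graded = [worker_answers[t] == a for t, a in gold.items() if t in worker_answers]
    return sum(graded) / len(graded)

print(worker_accuracy(worker_answers, gold))  # 0.5
```

A requester can then weight or filter workers by this estimate, just as a held-out test set tells us how much to trust a trained model.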

Questions

  • What other dimensions can we define to classify a human computation system?
  • There are different approaches that can measure the quality of a human computation system. Which one is the best?
  • Can we combine two motivation methods to get better results, like combining both pay and enjoyment to solve a global problem?


Mohannad Al Ameedi – Beyond Mechanical Turk

Summary

In this paper, the authors aim to highlight and explore the features of online crowd work platforms other than Amazon Mechanical Turk (AMT), which have not been investigated by many researchers. They recognize AMT as a system that revolutionized data processing and collection, but one that also lacks crucial features like quality control, automation, and integration.

The paper poses many questions about human computation and presents some solutions. The questions relate to the current problems with AMT, the features of other platforms, and ways to measure each system.

The authors discuss the limitations of the AMT system, such as the lack of good quality control, the lack of good management tools, the absence of a process to prevent fraud, and the lack of a way to automate repeated tasks.

The paper defines criteria to evaluate or assess a crowd work platform. The assessment uses different categories, like the incentive program, the quality measures used in the system, worker demographic and identity information, and worker skills or qualifications, which the authors use to compare and contrast different systems, including AMT.

The authors also review seven AMT alternatives: ClickWorker, CloudFactory, CrowdComputing Systems, CrowdFlower, CrowdSource, MobileWorks, and oDesk. They show the benefits of each system over AMT using the criteria mentioned above, and they show that these systems offer significant improvements that make the entire process much better and enable workers and requesters to interact in a better way.

The cross-platform analysis done by the authors was the only work, at the time the paper was written, to compare different systems against a defined set of criteria, and they hope this work offers a good foundation for other researchers to build on.

All platforms are still missing features like analytics data about each worker to provide visibility into the work each worker has done. There is also a lack of security measures to make sure the system is robust and can respond to adversarial attacks.

Reflection

I found the features that are missing from Amazon Mechanical Turk interesting, given the volume of the system's usage and the work Amazon does today in cloud computing, marketplaces, and other areas where the quality of its work is well known.

I also found the technical details mentioned in the paper interesting. It seems to me that the authors got lots of feedback from everybody involved in the system.

I agree with the authors on the criteria mentioned in the paper for assessing crowdsourcing systems, like pay incentives, quality control, automation, and system integration.

The authors didn't specify which system is, in their opinion, the best, or which system meets all of their criteria.

Questions

  • Are there any other criteria that we can use to assess the crowdsourcing systems?
  • The authors didn't mention which system is the best. Is there a system that can outperform the others?
  • Is there a reason why Amazon didn’t address these findings?


01/27/20 – Mohannad Al Ameedi – Ghost Work

Summary of the Reading

The authors describe a hidden, invisible workforce, whose labor they call ghost work, that works side by side with software and AI systems to make sure data is accurate when the AI fails to do so. Users of systems like Google Search, YouTube, and Facebook don't know that there are people working behind the scenes to make sure the internet is safe. Thousands of workers work on demand on specific tasks, organized as projects, that eventually either build a product or help make sure the software is working correctly. The authors call these workers a shadow workforce and their work ghost work. They suggest that 38% of current jobs will turn into ghost work in the future because of advances in technology and automation. Companies like Amazon assign tasks to people to manually inspect content, like hate speech on Twitter or inappropriate pictures; such tasks can't be automatically inspected by AI systems. The human stays in the loop when the AI can't do a perfect job. The way workers get tasks is competitive: workers must claim their tasks from a request dashboard, and it is first come, first served. The mix of human and computer computation is called crowdsourcing, microwork, crowdwork, or human computation. Ghost work doesn't follow employment laws or any government regulation, and it crosses borders; the countries that do the most of this work are the US and India, because of English language fluency. The work can be micro tasks, like determining whether a post on Facebook is hate speech, or macro work, like the ImageNet project that labeled millions of pictures to be used in AI systems. Ghost work is also known as the on-demand gig economy, and there are 2.5 million adults in the United States who participate in this kind of work.
Amazon was one of the first companies to depend on ghost work; it built the MTurk platform to distribute its workload, giving work to thousands of people to correct spelling, typos, or other incorrect information. Ghost work is important for labeling the data that makes better predictions possible for algorithms in machine learning, machine translation, and pattern recognition. As automation advances, people might start losing their jobs, but ghost work will be in more demand, and that can balance out the need for the workforce.

Reflections and Connections

What I found interesting about the reading was the way people around the world work together on tasks that eventually produce a product or service and make the internet safe. My understanding was that companies use AI systems to flag content and then have employees or contractors review it, but I didn't know that people across the globe work together on these tasks. I also found it interesting that the interaction between humans and AI systems will continue, which can ease the concern that machines will take over people's jobs in the future.

I agree with what was mentioned in the reading regarding the fusion of code and humans and the future of the workforce, as it will increasingly turn into invisible or ghost work.

Now that GitHub has been acquired by Microsoft, I think software development will also follow the same pattern, with tasks assigned to software developers across the globe to build modules that contribute to large-scale applications.

Questions

  • What do we need to take into consideration when doing research in the AI area that might affect the human in the loop?
  • How do we build software systems that can automatically take the feedback from manual inspection or tagging and adjust their behavior for similar incidents?
  • What is our role as graduate students and researchers in shaping the future of ghost work?
