03/04/2020 – Nurendra Choudhary – Real-time captioning by groups of non-experts

Summary

In this paper, the authors discuss a collaborative real-time captioning framework called LEGION:SCRIBE. They compare their system against the previous approach, CART, and Automated Speech Recognition (ASR) systems. The authors open the discussion with the benefits of captioning and then explain the high cost of hiring stenographers. Stenographers are the fastest and most accurate captioners, with access to specialized keyboards and expertise in the area; however, they are prohibitively expensive ($100-120 an hour). ASR is much cheaper, but its low accuracy renders it inapplicable in most real-world scenarios.

To alleviate these issues, the authors introduce the SCRIBE framework. In SCRIBE, crowd workers caption smaller parts of the speech, and the parts are merged by an independent framework to form the final caption. The system's latency is 2.89 s, emphasizing its real-time nature and marking a significant improvement over CART's roughly 5 s.
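The merging step is where most of the technical work lies. As a rough illustration of the idea (my own simplification, not the paper's actual alignment algorithm), partial captions from several workers can be time-bucketed and combined by voting:

```python
from collections import Counter, defaultdict

def merge_captions(worker_captions, bucket_ms=500):
    """Merge partial captions from several crowd workers.

    worker_captions: one list per worker of (timestamp_ms, word) pairs.
    Words are grouped into coarse time buckets and the most common
    word in each bucket is kept (a crude stand-in for sequence alignment).
    """
    buckets = defaultdict(Counter)
    for caption in worker_captions:
        for ts, word in caption:
            buckets[ts // bucket_ms][word.lower()] += 1

    merged = [buckets[b].most_common(1)[0][0] for b in sorted(buckets)]
    return " ".join(merged)

# Three workers each caught different, overlapping fragments of the audio.
workers = [
    [(50, "the"), (620, "quick"), (1150, "fox")],
    [(80, "the"), (650, "quick"), (1180, "fox"), (1650, "jumps")],
    [(640, "quick"), (1170, "box"), (1630, "jumps")],
]
print(merge_captions(workers))  # -> "the quick fox jumps"
```

A real system would align word sequences rather than fixed time buckets, but the sketch shows why redundant partial coverage from non-experts can approximate a single expert's transcript.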

Reflection

The paper introduces an interesting approach to collating data from multiple crowd workers for sequence-learning tasks. The method has been applied before in cases such as Google Translate (translating small phrases) and ASR (voice recognition of speech segments). However, SCRIBE distinguishes itself by making the process real-time. The system, though, relies on the availability of crowd workers, which may lead to unreliable behaviour. Additionally, the hired workers are not professionals, so quality is affected by human behavioral factors such as mindset, emotions, or mental stamina. I believe the evolution of SCRIBE over time and its dependence on such factors needs to be studied.

Furthermore, I question the crowd management system. Amazon MT cannot guarantee workers in real time. Currently, given the supply of workers relative to the number of tasks, workers are always available; however, as more users adopt the system, this need not remain true. Crowd management systems should therefore provide mechanisms that guarantee such requirements. The work provider also needs fallbacks that maintain real-time interaction in case the crowd fails. In the case of SCRIBE, the authors could append an ASR module as a fallback for crowd failure: ASR may not give the best results, but it would ensure a smoother user experience.

The current development setup does not consider the volatility of crowd management systems, which makes them an external single point of failure. I think there should be a push toward adopting multiple management systems simultaneously to increase the framework's reliability. This would also improve system efficiency, because the framework could choose from a more diverse set of results, benefiting the overall model structure and user adoption.

Questions

  1. Google Translate uses a similar strategy by asking its users to translate parts of sentences. Can this technique be applied globally to any sequential learning framework? Is there a way we can divide sequences into independent segments? In the case of dependent segments, can we just use a similar merging module, or is it always problem-dependent?
  2. The system depends on the availability of crowd workers. Should there be a study on the availability aspect? What kinds of systems would benefit from this?
  3. Should there be a new crowd work management system with a sole focus on providing real-time data provisions?
  4. Should the responsibility of ensuring real-time nature be on the management system or the work provider? How will it impact the current development framework?

Word Count: 567


03/04/2020 – Nurendra Choudhary – Combining Crowdsourcing and Google Street View to Identify Street-Level Accessibility Problems

Summary

In this paper, the authors discuss a crowd-sourcing method that utilizes Amazon MT workers to identify accessibility issues in Google Street View images. They collect annotations at two levels: image-level and pixel-level. They evaluate intra- and inter-annotator agreement and conclude that an accuracy of 81% (increased to 93% with minor quality-control additions) is feasible for real-world scenarios.

The authors begin the paper with a discussion of the necessity of such approaches, which could lead to more accessibility-aware solutions. The paper uses precision, recall, and F1-score to consolidate and evaluate image-level annotations. For pixel-level annotations, the authors use two sets of evaluation metrics: overlap between annotated pixels and precision-recall scores. The experiments show a level of inter-annotator agreement that makes the system feasible in real-world scenarios. The authors also apply majority voting between annotators to improve accuracy further.
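As a rough sketch of this evaluation logic (the labels and images below are invented for illustration, not taken from the paper), majority voting over binary image-level labels and the resulting precision/recall/F1 could be computed as follows:

```python
from collections import Counter

def majority_vote(labels):
    """Collapse one image's binary labels from several workers into a single label."""
    return Counter(labels).most_common(1)[0][0]

def precision_recall_f1(predicted, truth):
    tp = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(predicted, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative data: 1 = "accessibility problem present", three workers per image.
worker_labels = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 1]]
ground_truth  = [1, 0, 1, 0]

predicted = [majority_vote(labels) for labels in worker_labels]
print(predicted)                                  # [1, 0, 1, 1]
print(precision_recall_f1(predicted, ground_truth))  # (0.667, 1.0, 0.8)
```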

Reflection

The paper introduces an interesting approach to utilizing crowd-sourced annotations for static image databases. This leads me to wonder about cheaper sources of images that could be used for this purpose. For example, Google Maps provides a more frequently updated set of images, and acquiring those images is more cost-effective. I think this would be a better alternative to the street-view images.

Additionally, the paper adopts majority voting to improve its results. In theory, this should push accuracy close to perfect, yet the method reaches 93% after the addition. I would like to see examples where the method fails; this would enable development of better collation strategies in the future. I understand that in some cases the image might be too unclear, but examples of such failures would give us more data to improve the strategies.

Also, the images contain much more data than is currently being collected. We could build an interpretable representation of such images that captures all the world information they contain, though the computational cost and validity of doing so are still questionable. If we are able to build better information systems, such representations might enable a huge leap forward in AI research (similar to ImageNet). We could also combine this data to build a profile of any place so that it helps any user who wants to access it in the future (e.g., the accessibility of restaurants or schools). Furthermore, given the time-sensitivity of accessibility, I think a dynamic model would be better than the proposed static approach. However, this would require a cheaper method of acquiring street-view data; hence, we need to look for alternative data sources that provide comparable performance while limiting expenses.

Questions

  1. What is the generalization of this method? Can this be applied to any static image database? The paper focuses on accessibility issues. Can this be extended to other issues such as road repairs and emergency lane systems?
  2. Street-view data collection requires significant effort and is also expensive. Could we utilize Google Maps to achieve reasonable results? What is a possible limitation of applying the same approach to Google satellite imagery?
  3. What about the time sensitivity of the approach? How will it track real-time changes to the system? Does this approach require constant monitoring?
  4. The images contain much more information. How can we exploit it? Can we use it to detect infrastructural issues with government services such as parks, schools, roads etc.? 

Word Count: 560


02/26/2020 – Nurendra Choudhary – Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems

Summary

In this paper, the authors discuss the acceptance of imperfect AI systems by human users. Specifically, they consider the case of an email scheduling assistant for their experiments. Three features are adopted to interface between users and the assistant: an Accuracy Indicator (indicates the expected accuracy of the system), an Example-based Explanation design (explanations of sample test cases so users can form mental models before use), and a Control Slider design (which lets users control the system's aggressiveness, i.e., the trade-off between false positives and false negatives).

The participants of the study completed a six-step procedure that analyzed their initial expectations of the system. The evaluation showed that the features helped set the right expectations for participants. Additionally, it concluded that a high-recall system gave a pseudo-sense of higher accuracy than a high-precision one. The study shows that user expectations and acceptance can be managed not only through intelligible explanations but also by tuning the model to emphasize one evaluation metric over another (recall over precision in this case).
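The control slider essentially moves a decision threshold. Below is a minimal sketch of that idea, assuming a classifier that outputs a probability per email; the scores and labels are hypothetical, and this is not the paper's scheduling-assistant implementation:

```python
def classify(scores, threshold):
    """Flag an email as containing a meeting request when its score exceeds the threshold."""
    return [s >= threshold for s in scores]

def fp_fn(predictions, truth):
    fp = sum(p and not t for p, t in zip(predictions, truth))
    fn = sum(t and not p for p, t in zip(predictions, truth))
    return fp, fn

# Hypothetical model scores and ground truth (1 = email really proposes a meeting).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
truth  = [1,    1,    0,    1,    0,    0]

for threshold in (0.2, 0.5, 0.9):   # slider positions: aggressive -> conservative
    fp, fn = fp_fn(classify(scores, threshold), truth)
    print(f"threshold={threshold:.1f}  false positives={fp}  false negatives={fn}")
```

Moving the slider toward a low threshold favors recall (few missed meetings, more false alarms); a high threshold favors precision (few false alarms, more missed meetings), which is exactly the trade-off participants were asked to reason about.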

Reflection

The paper explores user expectations of AI and the corresponding perceptions and acceptance. An interesting idea is tweaking evaluation metrics. In previous classes and much current research, we discuss interpretable or explainable AI as the fix for the problem of user acceptance. However, this research shows that even simple measures, such as tuning the model to prefer recall over precision, can create a pseudo-sense of higher accuracy for users. This makes me question the validity of current evaluation techniques. Current metrics are statistical measures designed for deterministic models; they correlated directly with user acceptance because users could comprehend those models' behaviour. Given the non-deterministic and often opaque behaviour of AI, however, the old statistical measures may not be the right way to validate AI models. For our specific research problems, we need to study end-users more closely and design metrics that correlate with user demands.

Another important aspect is the human comprehensibility of AI systems. We notice from the paper that the addition of these features significantly improved user acceptance and perception. I believe there is a need for more such mechanisms across the field: they help set user expectations of the system and also aid the adoption of AI systems in real-world scenarios. The slider is a great example of a manual control that lets users set their own expectations of the system. Explanations also help users develop mental models so they can understand and adapt to AI changes faster. For example, search engines and recommender systems record information about users; if users understood what is stored and how it is used for recommendations, they could modify their usage to fit the system's requirements. This would improve both system performance and user experience, and it would lend AI systems a sense of pseudo-determinism.

Questions

  1. Can such studies help us find the relevance of an evaluation metric in the problem's context?
  2. Evaluation metrics have been designed as statistical measures. Has this requirement changed? Should we design metrics based on user experience?
  3. Should AI be developed according to human expectations or independently?
  4. Can this also be applied to processes that generally do not directly involve humans such as recommender systems or search engines?

Word Count: 557


02/26/2020 – Nurendra Choudhary – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

Summary

In this paper, the authors design explanations for machine learning models to study how they affect fairness perception. In this case, they study COMPAS, a model that predicts a criminal defendant's chance of reoffending. They explain the drawbacks and fairness issues with COMPAS (it overestimates the chance for certain communities) and analyze the difference that Explainable AI (XAI) can make to this fairness issue. They generate automatic explanations for COMPAS using previously developed templates (Binns et al. 2018). The explanations are based on four templates: Sensitivity, Case, Demographic, and Input-Influence.
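To make the template idea concrete, here is a hedged sketch of how an Input-Influence-style explanation might be filled from a model's feature weights. The feature names and weights are made up for illustration and do not come from COMPAS or from Binns et al.'s exact templates:

```python
def input_influence_explanation(features, weights):
    """Fill a simple input-influence template: list each feature's
    contribution to the predicted risk, most influential first."""
    contributions = sorted(
        ((name, value * weights[name]) for name, value in features.items()),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    lines = [
        f"- {name} {'increased' if c > 0 else 'decreased'} the risk score by {abs(c):.2f}"
        for name, c in contributions
    ]
    return "The prediction was influenced by:\n" + "\n".join(lines)

# Hypothetical defendant profile and (made-up) linear-model weights.
features = {"prior_offenses": 3, "age": 24, "months_employed": 18}
weights  = {"prior_offenses": 0.40, "age": -0.02, "months_employed": -0.03}
print(input_influence_explanation(features, weights))
```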

The authors hire 160 MT workers screened on criteria such as US residence and prior MT experience. The workers are a diverse set, but this diversity shows no significant impact on the results' variance. The experimental setup is a questionnaire that probes each worker's criteria for making fairness judgements. The results show that the workers have heterogeneous criteria for making fairness judgements. Additionally, the experiment highlights two fairness issues: “unfair models (e.g., learned from biased data), and fairness discrepancy of different cases (e.g., in different regions of the feature space)”.

Reflection

AI works in a very opaque manner, because most algorithms detect patterns in a latent space that is incomprehensible to humans. The idea of using automatically generated standard templates to construct explanations of AI behaviour should be generalized to other AI research areas. The experiments show how human judgements change in response to explanations. I believe such explanations could not only help the general population's understanding but also help researchers narrow down the limitations of these systems.

The case of COMPAS makes me wonder about the future roles that interpretable AI makes possible. If AI can give explanations for its predictions, then I think it could play the role of an unbiased judge better than humans. Societal biases are embedded in humans and might subconsciously affect our choices, and interpreting those choices in ourselves is a complex exercise in self-criticism. For AI, by contrast, systems like the one in the paper can generate human-comprehensible explanations to validate their predictions, potentially making AI an objectively fairer judge than humans.

Additionally, I believe evaluation metrics for AI lean towards improving overall prediction quality, whereas comparable models that emphasize interpretability should be given more importance. A drawback of such metrics, however, is that interpretability requires human evaluation, which would impede the rate of progress in AI development. We need to develop better evaluation strategies for interpretability. In this paper, the authors hired 160 MT workers; because it is a one-time evaluation, such a study is possible. However, if this is to be included in the regular AI development pipeline, we need more scalable approaches to avoid prohibitively expensive evaluation costs. One method could be to rely on a less diverse test set during the development phase and increase diversity to match the real-world problem setting.

Questions

  1. How difficult is it to provide such explanations for all AI fields? Would it help in progressing AI understanding and development?
  2. How should we balance between explainability and effectiveness of AI models? Is it valid to lose effectiveness in return for interpretability?
  3. Would interpretability lead to adoption of AI systems in sensitive matters such as judiciary and politics?
  4. Can we develop evaluation metrics around suitability of AI systems for real-world scenarios? 

Word Count: 567


02/19/2020 – Nurendra Choudhary – Updates in Human-AI Teams

Summary

In this paper, the authors study human-AI team performance in contrast to the individual performance of each member and explain why the distinction matters. They emphasize how humans reason about AI tools: humans develop mental models of the AI's performance. Advances in an AI's algorithm are usually evaluated only by the improvement in its predictions; however, those improvements cause behavioral changes in the AI that no longer fit the human's mental model and can reduce the overall performance of the team. To alleviate this, the authors propose augmenting the logarithmic loss with a term that accounts for compatibility between human mental models and the updated AI model.
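A rough sketch of the idea, in my own paraphrase rather than the authors' exact formulation: the update loss can be augmented with a penalty that fires only on "new errors", i.e., examples the old model classified correctly but the updated model gets wrong, since those are the cases that break the human teammate's mental model.

```python
import math

def compatible_update_loss(p_new, y, old_correct, lam=0.5):
    """Log loss for the updated model plus a compatibility penalty.

    p_new:       new model's predicted probability of the positive class
    y:           true label (0 or 1)
    old_correct: True if the previous model classified this example correctly
    lam:         weight of the compatibility term (illustrative value)
    The penalty applies only to "new errors": examples the old model handled
    correctly but the new model now gets wrong.
    """
    eps = 1e-12
    log_loss = -(y * math.log(p_new + eps) + (1 - y) * math.log(1 - p_new + eps))
    new_wrong = (p_new >= 0.5) != (y == 1)
    penalty = lam * log_loss if (old_correct and new_wrong) else 0.0
    return log_loss + penalty

# Example: the old model was right on this positive example, the new one is not.
print(compatible_update_loss(p_new=0.3, y=1, old_correct=True))   # penalized
print(compatible_update_loss(p_new=0.3, y=1, old_correct=False))  # plain log loss
```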

The authors construct user studies to show how human mental models develop under different conditions. Additionally, they illustrate how overall team performance can degrade even as the AI's predictions improve. Furthermore, they show that adding the compatibility term to the loss increases the overall team performance while also improving the AI's prediction efficiency.

Reflection

Humans and AI form formidable teams in multiple environments, and I think such a study is a necessity for the further development of AI. Most state-of-the-art AI systems are not independently useful in the real world and rely on human intervention from time to time (as discussed in previous classes). As long as this situation holds, we cannot improve AI in isolation and have to consider the humans involved in the task. The evaluation metrics currently used in AI research are focused entirely on the AI's predictions. This needs to change, and the paper is a great first step in that direction; we should construct more such evaluation metrics for other AI tasks. But if we design our evaluation metrics around human-AI teams, we risk making AI systems permanently reliant on human input, so AI systems might never solve our problems independently. I believe the solution lies in interpretability.

Current AI techniques rely on statistical spaces that are not human-interpretable; making these spaces interpretable would allow human comprehension. Interpretable AI is a rising research topic in several subareas of AI, and I believe it can resolve the current dilemma: we could develop AI systems independently, and because all updates would be comprehensible to humans, users could adjust their mental models accordingly. But interpretability is not a trivial subject. Recent work has shown only incremental progress and still trades prediction ability for interpretability. The effectiveness of AI comes precisely from its ability to recognize patterns in dimensions incomprehensible to human beings. Both the current paper and interpretability research require human understanding of the model, and I am not sure that is always possible.

Questions

  1. Can we have evaluation metrics for other tasks based on this? Will it involve human evaluation? If so, how do we maintain comparative fairness across such metrics?
  2. If we continue evaluating Human-AI teams together, will we ever be able to develop completely independent AI systems?
  3. Should we focus on making the AI systems interpretable or their performance?
  4. Is interpretable AI the future for real-world systems? Imagine that, for every search query made, the user could see all the features that aid the system's decision-making process.

Word Count: 545


02/19/2020 – Nurendra Choudhary – The Work of Sustaining Order in Wikipedia

Summary

In this paper, the authors discuss the problem of maintaining order in open-edit information corpora, specifically Wikipedia. They start by explaining Wikipedia's near-immunity to vandalism, which is achieved through a synergy between humans and AI. Wikipedia is open to all editors and the team behind the system is highly technical; however, the authors study how its immunity depends on the community's social behavior. They show that vandal fighters form networks of people who identify vandals based on patterns of behavior. These editors are supported by AI tools, but banning a vandal is not yet a completely automated process: it requires individual editor judgements at the local level and a collective decision at the global level. This creates a heterogeneous network and emphasizes corroboration of decisions by different actors.

As stated in the conclusion, “this research has shown the salience of trace ethnography for the study of distributed sociotechnical systems”. Here, trace ethnography combines the abilities of editors with the data traces of their actions to analyze vandalism in Wikipedia.

Reflection

It is interesting to see such seamless cooperation between humans and AI among Wikipedia's vandal fighters. I think this is another case where AI can leverage human networks for support. The more significant part is that the tasks are not trivial; they require human specialization, not just plain effort. Collaboration is also a significant part of the AI's capability: human editors analyze articles in their local context, the AI efficiently combines the results and targets the source of errors by building a heterogeneous network of such decisions, and humans then analyze these networks to ban vandals. This methodology applies the most important abilities of both humans and bots, combining the best attribute of humans (judgement) with that of AI (pattern recognition). It also effectively turns this collaboration against vandals, who are independent actors or small networks of bad actors without access to the bigger picture.

The methodology uses distributed work patterns to accomplish the different tasks of editing and moral agency. Distributing the work lets human beings take on the smaller tasks; however, combining the results to draw logical inferences is not humanly possible, because the sheer amount of data is incomprehensible to humans. Humans can, though, develop algorithms that machines apply at a larger scale to obtain such inferences. The inferences themselves do not have a fixed structure and require human intelligence to translate into concrete actions against vandalism. Given that most cases of vandalism come from independent individuals, a collaborative effort aided by AI can greatly turn the odds in the vandal fighters' favour, because the AI gives humans access to a bigger picture they could not grasp on their own.

Questions

  1. If vandals have access to the network, will they be able to destroy the synergy?
  2. If there is stronger motivation, such as political or monetary gain, will it give rise to a mafia-like network of such bad actors? Will the current methodology still be valid in such a case?
  3. Do we need a trustworthiness metric for each Wikipedia page? Could a page then be used as a reference for authoritative information?
  4. Wikipedia is a great example of crowd-sourcing, and this is a great article about crowd control on such networks. Can this be extended to other crowd-sourcing platforms like Amazon MT or information blogs?


02/05/2020 – Nurendra Choudhary – Guidelines for Human-AI Interaction

Summary

In this paper, the authors propose a set of 18 guidelines for human-AI interaction design. The guidelines are codified from 150 AI-related design recommendations collected from diverse sources. Additionally, the authors validate the guidelines from both user and expert perspectives.

On the user side, the guidelines are evaluated by 49 HCI practitioners, each testing a familiar AI-driven feature of a product. The goal is to count which guidelines the feature follows or violates. The feedback form also had a "does not apply" option with a corresponding explanation field, and the review included a clarity component to uncover ambiguity in the guidelines. From this empirical study, the authors concluded that the guidelines were largely clear and hence could be applied to human-AI interactions. The authors then revised the guidelines according to the feedback and conducted an expert review.

The guidelines are especially suitable when deploying ML systems in the real world. Generally, researchers in the AI community see no immediate, concrete benefit to developing user-friendly systems. However, when such systems need to be deployed for real-world users, the user experience, or human-AI interaction, becomes a crucial part of the overall mechanism.

On the expert side, both the old and revised guidelines were presented, and the experts agreed with the revisions for all but one guideline (G15). From this, the authors conclude that the review process was effective.

Reflection

Studying the guidelines' applicability is really important (as the authors did in the paper), because I do not feel all of them are necessary across the diverse range of applications. It is interesting to notice that photo organizers already follow most of the guidelines and also receive the highest number of "does not apply" responses, while e-commerce seems to be plagued with issues. I think this is because of a gap in transparency: the AI in photo organizers has to be advertised to users and directly affects their decisions, whereas in e-commerce the AI works in the background to influence user choices.

AI systems steadily learn new things, and their behaviour is often not interpretable even by the researchers who built them, so I believe this is an unfair ask today. However, as the AI research community pushes for increased interpretability, I believe it will become possible and will definitely help users. Imagine if you could explicitly set the features attached to your profile to improve your search recommendations.

Similarly, the guidelines on "relevant social norms" and "mitigate social biases" are not a primary focus at present, but I believe they will grow over time into a dominant area of ML research.

I think we can use these guidelines as tools to diversify AI research into more avenues focusing on building systems that inherently maintain these principles. 

Questions

  1. Can we study the feasibility and cost-to-benefit ratio of making changes to present AI systems based on these guidelines?
  2. Can such principles be evaluated from the other perspective? Can we give better data guidelines for AI to help it learn?
  3. How frequently does the list need to evolve with the evolution of ML systems?
  4. Do the users always need to know about changes in the AI? Think about interactive systems where the AI learns in real time. Wouldn't there be too many notifications for a human user to track? Would it become something like spam?

Word Count: 569


1/28/2020 – Nurendra Choudhary – Human Computation

Summary:

In this paper, the authors detail the emergence of Human Computation as a new area of research. They provide a solid definition of what constitutes the area:

  • The problems fit the general paradigm of computation, and as such might someday be solvable by computers.  
  • The human participation is directed by the computational system or process.

They further solidify the definition by comparing it with other similar fields and identifying the subtle differences between them. Additionally, the paper provides a classification system for human computation along six dimensions (a small sketch encoding this taxonomy follows the list):

  1. Motivation: The different reward mechanisms for the humans in the framework.
  2. Quality Control: The measures used to maintain the quality of work and to handle workers who try to cheat the reward system.
  3. Aggregation: Strategies for combining work done in parallel by independent human labour into a block of usable data.
  4. Human Skill: The extent of humans' ability to facilitate machines; defines where general human computational abilities are superior.
  5. Process Order: The different ways of ordering workflows among requesters, workers, and computers.
  6. Task-Request Cardinality: The possible task-assignment combinations, such as whether one task can be given to multiple workers or multiple tasks to the same worker.
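To make the taxonomy concrete, here is a small sketch (my own encoding, not something from the paper) that records where a given system falls along the six dimensions; the values for a Mechanical Turk-style labeling task are illustrative:

```python
from dataclasses import dataclass

@dataclass
class HumanComputationSystem:
    """One system classified along the paper's six dimensions."""
    motivation: str                 # e.g. pay, altruism, enjoyment, reputation
    quality_control: str            # e.g. redundancy, gold questions, reputation scores
    aggregation: str                # how parallel contributions are combined
    human_skill: str                # what humans do better than machines here
    process_order: str              # ordering of requester, worker, and computer
    task_request_cardinality: str   # e.g. one task to many workers

# Illustrative classification of a Mechanical Turk image-labeling task.
mturk_labeling = HumanComputationSystem(
    motivation="small monetary payment per task",
    quality_control="redundant labels plus gold-standard questions",
    aggregation="majority vote over worker labels",
    human_skill="visual recognition",
    process_order="requester -> computer -> worker -> computer",
    task_request_cardinality="one task assigned to many workers",
)
print(mturk_labeling.aggregation)
```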

Reflection:

The authors define human computation by relying on previous definitions. However, those previous definitions show the dynamic nature of the field, so the definition should evolve over time. This also affects the derived system classification and its nomenclature. Still, the paper is interesting in its effort to draw clear distinctions between human computation and related fields.

I was very interested in learning about the different motivations (beyond monetary benefit) for the human workforce. Similar motivations can be seen in early software development, where developers worked on open-source software without monetary reward, simply out of a desire to contribute to society. I believe that in any field, monetary reward can never be the only motivation: human beings are social animals with a strong instinct to contribute to society, whereas monetary benefit is merely perceptual. The classification nomenclature given by the authors, however, is tied closely to current worker platforms and is likely to change over time; hence, I believe it offers limited long-term advantage.

Questions:

  1. Can this definition of human computation remain static, or is it subject to change over time?
  2. How might the task-request process itself change over time?
  3. To what extent should humans support machines before the challenges overcome the rewards?
  4. What exactly is defined as a growing field? Does it mean the field will grow in research or will it spring out more sub-fields? 


1/28/20 – Nurendra Choudhary – Beyond Mechanical Turk

Summary:

In this paper, the authors analyze and compare different crowd work platforms. They note that research into such platforms has been largely limited to Mechanical Turk, and their study aims to encompass more of them.

They compare seven AMT alternatives, namely ClickWorker, CloudFactory, CrowdComputing Systems, CrowdFlower, CrowdSource, MobileWorks, and oDesk. They evaluate the platforms on 12 different criteria to address the high-level concerns of quality control, poor worker-management tools, missing fraud-prevention measures, and a lack of automated tools. The paper also identifies a distinct need among requesters to employ their own specialized workers through such platforms and to apply their own management systems and workflows.

The analysis shows the diversity of these platforms, identifying commonalities such as “peer review, qualification tests, leaderboards, etc.” as well as differentiating features such as “automated methods, task availability on mobiles, ethical worker treatment, etc.”

Reflection:

The paper provides useful evaluation criteria for judging aspects of a crowd work platform. The suggested workflow interfaces and tools could greatly streamline the process for requesters and workers. However, these crowd work platforms are businesses, so they need an incentive to invest in such additional processes. In the case of MT, the competitors do not have enough market share to make additional streamlining viable. I think that as the work becomes more complex, requesters will be limited by the current framework, and a market opportunity will push the platforms to evolve by integrating the processes mentioned in the paper. This will be a natural progression following traditional development cycles.

I am sure a large company like Amazon has the resources and technical skill to lead such a maneuver for MT, and other platforms would follow suit. But the most important driver of change would be a market stimulus rooted in necessity, not just desire. Currently, the responsibility falls on the requester because the need for these processes is rare.

Also, the paper analyzes the platforms only from a requester's perspective. Currently, the worker is just a de-humanized number, but adding such workflows may lead to discrimination between geographical regions or distrust of a worker's declared skill set. This would bring real-world challenges into the "virtual workplace" and more often lead to difficult working conditions for remote workers. It might also lead to a worrisome exclusivity that the current platforms avoid rather well. However, I believe user verification and fraud networks in the system are areas the platforms should really focus on to improve the experience for requesters.

I think a separate version of the service should be offered to corporations that need workflow management and expert help. For quality control, I believe the research community should investigate globally applicable, efficient processes for these areas.

Questions:

  1. How big is the market share of Mechanical Turk compared to other competitors?
  2. Does Mechanical Turk need to take a lead in crowd work reforms?
  3. Is the difference between platforms due to the kind of crowd work they support? If so, which type of work has better worker conditions?
  4. How difficult would it be for MT to integrate the quality controls and address the other challenges mentioned in the paper?


01/22/20 – Nurendra Choudhary – Ghost Work

Summary

In the Introduction, the authors define Ghost Work as the massive amount of manual human labour that supports Artificial Intelligence (AI) systems' ability to provide a seamless user experience. The Introduction also describes a study the authors conducted to analyze the humans involved in ghost work, provides examples of such workers and their working situations and conditions, and discusses the lack of legal policies or definitions to incorporate such workers into employment law.

Additionally, the authors discuss the changing employment environment in which different companies need similar low-skilled work. However, work availability fluctuates from project to project, so platforms that pool readily available low-skilled workers form a necessary supplement to the workforce.

Chapter 1 discusses examples of platforms that act as rendezvous points between the workforce and companies that require instant labour. It starts with the example of Amazon's Mechanical Turk, the pioneering work-sharing platform, then moves on to a case study of ImageNet, a labour-intensive data-annotation project, explaining ghost workers' role in the progress of the entire AI community. It also gives examples of other platforms such as UHRS, LeadGenius, Amara, and UpWork. The examples display the contrasts and similarities between micro (pieces of projects) and macro (entire projects or sub-projects) ghost work, and also show the differences in the platforms' conduct towards their workers.

Reflection

Ghost work is a very important aspect of current AI methods. Human intuition has developed over many generations. Currently, AI solves tasks that are very basic for humans, but as it develops, the complexity of the tasks will increase exponentially. As mentioned, for sentiment analysis the system relies on human-provided emotion labels to determine the sentiment of a sentence. However, recent research needs sentiment analysis of human expressions, which is more complex even for humans. This suggests a trend of diminishing returns. Currently, AI problems need expertise that 99% of humans possess, but growing complexity will bring this number down to a point where only a few people in the world possess the required knowledge. Hence, the current architecture of recruiting ghost workers will no longer be viable.

Another point is the rate at which this complexity grows. If it grows quickly, only minor changes to employment policies are needed; however, if the current trend continues for a significant amount of time, we need immediate policies to avoid worker abuse. Ghost workers' contributions are momentous, yet because of their massive numbers and the relatively low skill requirement, their returns are significantly curtailed. Unionization is one solution to this problem; the concept was formed for this very purpose. Unions would have the power to fight the massive corporations for fair policies and compensation for this valuable labour. A short example is given in Chapter 1, where CrowdFlower's workers filed a suit against the company over its labor practices. The platforms need to be responsible for these workers, since the platforms are the source of their compensation (just as Uber is responsible for its drivers).

Questions

  1. Will there be a point when AI will not need any human input? Will there be a point when we exhaust all human intelligence and intuition?
  2. Can we put the workers in an existing employment classification like contract labourers?
  3. How can we quantify the exact profit of Ghost Work for the original contractors? Can we utilize this to appropriately compensate the workers?
  4. Where do the similarities between Ghost Work and contract labourers end?
  5. Can we determine the area of expertise that needs to be developed for more such work? Or is it based on necessity?
