03/04/20 – Akshita Jha – Pull the Plug? Predicting If Computers or Humans Should Segment Images

Summary:
“Pull the Plug? Predicting If Computers or Humans Should Segment Images” by Gurari et al. addresses image segmentation. The authors propose a resource allocation framework that predicts when it is best to use a computer to segment an image and when to switch to a human. Image segmentation is the process of “partitioning a single image into multiple segments” in order to simplify the image into something that is easier to analyze. The authors implement two systems that decide when computers can replace humans: one for creating coarse segmentations and one for refining them into fine-grained segmentations. They demonstrate through experiments that this mixed human-computer model beats state-of-the-art systems for image segmentation. The resource allocation framework, “Pull the Plug”, works by taking an image and predicting whether its annotation should come from a human or a computer. The authors evaluate the framework using Pearson’s correlation coefficient (CC) and mean absolute error (MAE). CC indicates how strongly the predicted quality scores correlate with the actual scores given by the Jaccard index on the ground truth; MAE is the average prediction error. The authors thoroughly experiment with using the framework both to initialize segmentation tools and to reduce the human effort needed for initialization.
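
To make the evaluation setup concrete, here is a minimal sketch (not the authors’ code) of how the Jaccard index of a predicted mask against ground truth, and the CC and MAE of a quality predictor’s scores, could be computed with NumPy and SciPy; the arrays and values are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

def jaccard(pred_mask, gt_mask):
    """Jaccard index (intersection over union) of two binary masks."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 1.0

# Illustrative numbers: predicted quality scores vs. actual Jaccard scores
# computed against ground-truth segmentations.
predicted_quality = np.array([0.9, 0.4, 0.75, 0.6])
actual_jaccard = np.array([0.85, 0.5, 0.7, 0.55])

cc, _ = pearsonr(predicted_quality, actual_jaccard)        # correlation strength
mae = np.mean(np.abs(predicted_quality - actual_jaccard))  # average prediction error
print(f"CC = {cc:.3f}, MAE = {mae:.3f}")
```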

Reflections:
This is an interesting work that successfully makes use of a mixed mode involving both humans and computers to improve the precision and accuracy of a task. The two methods that the authors design for segmenting an image were particularly thoughtful. First, given an image, the authors design a system that tries to predict whether the image requires fine-grained segmentation or coarse-grained segmentation. This is non-trivial, as the task requires the system to possess a certain level of “intelligence”. The authors use existing segmentation tools, but the motivation of the system design is to remain agnostic to any particular tool. The system ranks the outputs of several segmentation tools using a quality predictor designed by the authors, and then allocates the available human budget to creating coarse segmentations where they are needed. The second system tries to capture whether an image requires fine-grained segmentation or not. It does this by building on the coarse segmentation produced by the first system: it refines the segmentation and allocates the available human budget to creating fine-grained segmentations for the candidates with low predicted quality. Both systems rely on the authors’ method for predicting the quality of candidate segmentations.
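
The allocation logic described above can be sketched roughly as follows: score every candidate segmentation with a quality predictor and spend the human budget on the images with the lowest predicted quality. This is a simplified illustration under my own assumptions (predict_quality and human_segment are hypothetical stand-ins), not the authors’ implementation.

```python
def allocate_human_budget(images, machine_segmentations, predict_quality,
                          human_segment, budget):
    """Keep machine segmentations predicted to be good; send the rest,
    lowest predicted quality first, to humans until the budget runs out.
    `images` are treated as hashable IDs here."""
    scored = [(predict_quality(img, seg), img, seg)
              for img, seg in zip(images, machine_segmentations)]
    scored.sort(key=lambda t: t[0])  # lowest predicted quality first

    results = {}
    for quality, img, seg in scored:
        if budget > 0:
            results[img] = human_segment(img)  # redirect effort to a human
            budget -= 1
        else:
            results[img] = seg                 # trust the machine output
    return results
```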

Questions:
1. The authors rely on their proposed system of predicting the quality of candidate segmentations. What kind of errors do you expect?
2. Can you think of a way to improve this system?
3. Can we replace the segmentation quality prediction system with a human? Do you expect the system to improve or would the performance go down? How would it affect the overall experience of the system?
4. In most such systems, humans are needed only for annotation. Can we think of more creative ways to engage humans while improving the system performance?


03/04/20 – Akshita Jha – Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind

Summary:
“Toward Scalable Social Alt Text: Conversational Crowdsourcing as a Tool for Refining Vision-to-Language Technology for the Blind” by Salisbury et al. addresses the important problem of accessibility. The authors discuss the challenges that arise from automatic image captioning systems and how their imperfections may hinder a blind person’s understanding of social media posts with embedded imagery. The authors use mixed methods to evaluate, and subsequently modify, the captions generated by an automated system for images embedded in social media posts, and they study how crowdsourcing can enhance existing workflows to provide scalable and useful alt text for blind users. They also perform a detailed analysis of the conversations they collected in order to design user-friendly experiences that can effectively assist blind users. The authors focus on three research questions: (i) What value is provided by a state-of-the-art vision-to-language API in assisting BVI users, and what are the areas for improvement? (ii) What are the trade-offs between alternative workflows for the crowd assisting BVI users? (iii) Can human-in-the-loop workflows result in reusable content that can be shared with other BVI users? The authors study varying levels of human engagement alongside automated systems to arrive at a final system that better reflects the requirements for creating good-quality alt text for blind and visually impaired users.

Reflections:
This is an interesting work, as it addresses the often ignored problem of accessibility. The authors focus on images embedded in social media posts. Most of the time, the automatic captions produced by a machine-learning-based system are inadequate and non-descriptive. This might not be much of a problem for sighted day-to-day users, but it can be a huge challenge for blind people. The analysis is thoughtful and done with accessibility in mind. The authors validate their approach by running a follow-up study with seven blind and visually impaired users, who were asked to compare the uncorrected vision-to-language caption with the alt text provided by the authors’ system. The findings showed that blind and visually impaired users would prefer the conversational system designed by the authors to better understand the images. However, it would have been more helpful if the authors had gathered feedback from the target user group while developing the system, rather than only asking users to test it afterward. Also, the tweets used by the authors might not be representative of the kinds of tweets in the target users’ timelines.

Questions:
1. What do you think about the approach taken by the authors to generate the alt-text?
2. Would it have been helpful to conduct a survey to understand the needs of the blind and visually impaired users before developing the system?
3. Don’t you think using a conversational agent to understand the images embedded in tweets is too cumbersome and time-consuming?


02/26/2020 – Akshita Jha – Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning

Summary:
“Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning” by Kaur et al. examines the interpretability tools that are meant to help data scientists and machine learning researchers. Very few of these tools have been evaluated to understand whether or not they achieve their interpretability goals. The authors study two tools in detail: GAMs (glass-box generalized additive models) and SHAP (a post-hoc explanation technique). They conduct a contextual inquiry and a survey of data scientists to figure out how practitioners utilize the information provided by these tools. They highlight the qualitative themes from the study and conclude with implications for researchers and tool designers.
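
As a reference point for what the tools in the study do, here is a minimal, generic sketch of producing SHAP explanations for a tree-based classifier; the model and dataset are my own illustrative choices, not the setup used in the paper.

```python
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

# Train a simple model on a public dataset (illustrative only).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

# SHAP assigns each feature a contribution to each individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: which features matter most on average across the dataset.
shap.summary_plot(shap_values, X)
```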

Reflections:
There are two major aspects of interpretability: (i) building interpretable models and (ii) users’ understanding of these interpretable models. The paper does a good job of providing an in-depth analysis of users’ understanding of interpretability tools. However, the authors focus on a data scientist’s view of these tools; I feel the quality of interpretability should also be judged by non-expert end users. The authors discuss six issues: (i) missing values, (ii) changes in data, (iii) duplicate data, (iv) redundant features, (v) ad-hoc categorization, and (vi) debugging difficulties, which they incorporate into the “contextual inquiry”. More nuanced patterns might be revealed if a more in-depth study were conducted. Also, depending on the domain knowledge of the participants, the interpretability outputs might be interpreted differently; the authors should have tried to take this into account while surveying participants. In addition, deep learning models are now widely used, so it is important to focus on their interpretability as well. The authors focus on tabular data, which may limit how helpful the findings are in the real world; a detailed study needs to be conducted to understand interpretability in deep learning models. Something else I found interesting was the authors attributing the way these tools are used to System 1 and System 2 thinking as described by Kahneman: humans make quick, automatic decisions based on ‘System 1’ unless they are encouraged to engage their deliberate cognitive processes, which prompts ‘System 2’ thinking. Finally, the pilot interview was conducted with a very small group of users (N=6) to identify the common issues data scientists face in their work; a more representative survey across data scientists of different skill sets would have served them better.

Questions:
1. What is post-hoc interpretability? Is that enough?
2. Should the burden lie on the developer to explain the predictions of a model?
3. Can we incorporate interpretability while making decisions?
4. How can humans help in such a scenario apart from evaluating the quality of the interpretable model?


02/26/2020 – Akshita Jha – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

Summary:
“Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment” by Dodge et al. examines explainable machine learning and how explanations shape judgments of fairness. The authors conduct an empirical study involving around 160 Amazon Mechanical Turk workers. They demonstrate that certain explanations are considered “inherently less fair”, while others may help enhance people’s confidence in the fairness of an algorithm. They also distinguish between exposing (i) model-wide fairness issues and (ii) case-specific fairness discrepancies, and they show that people react differently to different styles of explanation based on individual differences. They conclude with a discussion on how to provide personalized and adaptive explanations. There are 21 different definitions of fairness. In general, fairness can be defined in terms of discrimination: “discrimination is considered to be present if for two individuals that have the same characteristic relevant to the decision making and differ only in the sensitive attribute (e.g., gender/race) a model results in different decisions”. Disparate impact is the consequence of deploying unfair models, where one protected group is affected negatively compared to other groups. This paper examines the explanations given for machine learning models and how such explanations can make a model appear inherently fair or unfair.
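
The quoted definition can be made concrete with a small sketch: flip only the sensitive attribute for an individual and check whether the model’s decision changes. This is my own simplified illustration of the definition, with a hypothetical model and column name, not code from the paper.

```python
import pandas as pd

def flags_discrimination(model, individual: pd.Series, sensitive_attr: str,
                         alternative_value) -> bool:
    """True if changing only the sensitive attribute changes the model's decision."""
    original = individual.to_frame().T
    counterfactual = original.copy()
    counterfactual[sensitive_attr] = alternative_value
    return model.predict(original)[0] != model.predict(counterfactual)[0]

# Hypothetical usage, assuming a fitted `model` and a row `applicant`:
# if flags_discrimination(model, applicant, sensitive_attr="gender",
#                         alternative_value="female"):
#     print("Decision differs only because of the sensitive attribute.")
```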

Reflections:
The researchers attempt to answer three primary research questions. (i) How do different styles of explanation impact fairness judgments of an ML system? They study in depth whether certain explanation styles are more effective at teasing out the unfairness of a model, and whether some explanations are perceived as inherently fairer. (ii) How do individual factors in cognitive style and prior position on algorithmic fairness impact fairness judgments with regard to different explanations? (iii) Lastly, what are the benefits and drawbacks of different explanations in supporting fairness judgments of ML systems? The explanations offered are based on input features, demographic statistics, sensitive attributes, and similar cases (case-based explanations). The authors conduct an online survey and ask participants different questions; however, an individual’s background might also influence the answers given by the Mechanical Turk workers. The authors perform both a qualitative and a quantitative analysis. One of the major limitations of this work is that the judgments were made by crowd workers with limited experience, whereas in real life such decisions are made by legal experts. Additionally, the authors could have used LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations) values for offering post-hoc explanations. The authors also did not study an important element, confidence, as they did not control for it.

Questions:
1. Which other models are unfair? Can you give some examples?
2. Are race and gender the only sensitive attributes? Can models discriminate based on some other attribute? If yes, which ones?
3. Who is responsible for building unfair ML models?
4. Are explanations of unfair models enough? Does that build enough confidence in the model?
5. Can you think of any adverse effects of providing model explanations?


02/18/20 – Akshita Jha – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Summary:
“The Work of Sustaining Order in Wikipedia: The Banning of a Vandal” by Geiger and Ribes examines the role of software tools in the English Wikipedia, specifically autonomous bots and assisted-editing tools. Wikipedia is a “free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.” Bots are “fully-automated software agents that perform algorithmically-defined tasks involved with editing, maintenance, and administration in Wikipedia.” Different bots have different functions, ranging from simple tasks like correcting grammatical errors to more complicated ones like detecting personal insults. The authors present a detailed case study, “The Banning of a Vandal”, and describe “Huggle”, a widely used assisted-editing tool that presents users with a queue of recent edits. The user can perform a variety of actions, like ‘revert’ or ‘warn’, on each edit that is displayed, but cannot choose which edit to review next. An anonymous user had been vandalizing multiple Wikipedia pages and was not discouraged by the warnings and comments from the moderators. Eventually, this rogue user was blocked through the combined efforts of a network of moderators (vandal fighters) and bots, but the process was more cumbersome than expected. In addition to the quantitative and qualitative findings, the research also demonstrates the value of trace ethnography for studying such sociotechnical systems.

Reflections:
This is an interesting work. It was particularly insightful for me, as I was unaware of the role that multiple bots play in Wikipedia editing. Bots and humans working cohesively have helped make Wikipedia the widely used resource it currently is. Making Wikipedia a free resource that allows editing by volunteers comes with a cost. This paper highlights the limitations of the Wikipedia bots and the significant amount of effort needed from multiple moderators to ban a vandal. Each moderator makes a local judgement, but the Wikipedia talk pages help keep a record of all the warnings against a particular user. Certain kinds of vandalism, like inserting obscenities and profanities, are easy to detect. However, if a vandal deletes an important section from a Wikipedia page, identifying and rectifying it may require significant cognitive effort from moderators. An interesting question is how Wikipedia would be affected if it used a completely automated bot instead of its current hybrid system. Would the bots be able to determine the significance of an edit or a change? How would that change the moderators’ behaviors and actions? Since automated tools help determine the kinds of social activity that are possible on Wikipedia, would a completely automated bot significantly alter Wikipedia and user involvement? It would also be interesting to see whether trace ethnography could be used to study Reddit, another large sociotechnical system.

Questions:
1. How did such a network of vandal fighters and bots come into being?
2. Do you think certain kinds of Wikipedia pages are more susceptible than others to vandalism?
3. Will completely automated bots help?
4. Can we conduct such a case study for Reddit? Why? Why not?


02/18/20 – Akshita Jha – Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator

Summary:
“Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator” by Jhaver et al. examines the popular social media website Reddit and the unusual collaboration between its unpaid human moderators and an automated moderation tool. Reddit moderators make use of a heavily configurable automated program called ‘Automoderator’ to help decide which content should be removed from the site. The authors interview 16 Reddit moderators to understand how they benefit from the moderating tool (‘Automod’) and how they adapt and configure it to reflect their subreddit’s policies so they can moderate effectively. The authors also offer valuable insights that may benefit the creators of platforms, designers of automated regulation systems, scholars of platform governance, and content moderators. They conclude by pointing out that moderation on Reddit is a collaborative effort between humans and automated systems. This hybrid system works, but there is definitely scope for improvement in the development and deployment of these tools.
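
As a rough illustration of the kind of rule-based filtering Automod supports (moderators write the real configuration in a YAML-like syntax; the rule and phrases below are hypothetical), a keyword rule might look like this:

```python
import re

# A hypothetical rule in the spirit of an Automod configuration:
# remove comments that match any banned phrase and record a reason.
rule = {
    "type": "comment",
    "banned_phrases": ["buy followers", "free giveaway click here"],
    "action": "remove",
    "action_reason": "Matched spam phrase",
}

def apply_rule(comment_text, rule):
    """Return (action, reason) if any banned phrase matches, else None."""
    for phrase in rule["banned_phrases"]:
        if re.search(re.escape(phrase), comment_text, flags=re.IGNORECASE):
            return rule["action"], rule["action_reason"]
    return None

print(apply_rule("FREE GIVEAWAY click here now!!", rule))
# ('remove', 'Matched spam phrase')
```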

Reflections:
Online platforms can be a boon or a bane depending on how people choose to engage with them. Regulation seems necessary to ensure that low-quality posts (which can be treated as noise) do not drown out informative and worthy posts on the site. However, this is a challenging task. Deciding whether a post is appropriate for a subreddit puts a lot of responsibility on the moderator. In some cases the moderator might be a bot (‘Automod’), and in other cases the platform relies on paid or unpaid volunteers; Reddit moderators are unpaid. The authors analysed five different subreddits: ‘r/photoshopbattles’, ‘r/space’, ‘r/oddlysatisfying’, ‘r/explainlikeimfive’ and ‘r/politics’. It is interesting that some Reddit moderators prefer to implement moderation bots from scratch while others make use of tools built by others, and it is intriguing how using shared tools forms a sense of community among moderators within the bigger community of Reddit. Most moderators use ‘Automod’, which was initially created by Chad Birch using the Reddit API in January 2012. However, a major drawback of this study is that all the moderators the authors interviewed were male. It would be helpful to get the perspective of female moderators, if there are any, since the user base for Reddit is disproportionately male. I also feel the authors should have selected ‘r/AskHistorians’ as one of the subreddits for analysis, since it is widely known to be heavily moderated and content-driven. It would also have been interesting to dig into the comments that ‘Automod’ marked as offensive but were not; this would help improve the tool’s performance while informing us of its limitations. One might also wonder about the consequences as subreddit communities grow larger: the existing tools may need to be re-examined for how well they scale.

Questions:
1. Do you agree that social media content should be moderated?
2. What about the mental health of the moderators?
3. What kind of resources should be made available to the moderators, since they are dealing with sensitive content all the time?


02/04/2020 – Akshita Jha – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

Summary:
“Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research” by Vaughan is a survey paper that provides an informative overview of crowdsourcing research for the machine learning community. There are four main application areas:
(i) Data generation: This is made up of two lines of work. The first, data aggregation, is where several crowdworkers are assigned the same data point and asked to annotate it; an algorithm then aggregates their responses into a final label (a minimal aggregation sketch appears after this list). The second line of research involves modifying the task or system design to elicit higher-quality responses from crowdworkers.
(ii) Evaluation and debugging of models: Crowdworkers can help debug and evaluate unsupervised machine learning methods like topic models (e.g., LDA) and generative models.
(iii) Hybrid systems that utilize both machines and humans to expand their capabilities: Humans and machines have complementary strengths which, if used properly, can result in effective systems that help humans as well as improve the machine’s understanding.
(iv) Crowdsourced behavioral experiments that gather data and improve our understanding of how humans interact with machine learning systems: Behavioral experiments can help us understand how humans would like to interact with a system and what changes can be made to improve end-user satisfaction.
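
As referenced in (i) above, the simplest form of data aggregation is a majority vote over the labels that different crowdworkers assign to the same item. A minimal sketch, with made-up worker responses, follows; real systems often use more sophisticated aggregation that weights workers by estimated reliability.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate redundant crowd labels by taking the most common answer per item."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Hypothetical responses: three workers annotated each image.
crowd_labels = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
print(majority_vote(crowd_labels))  # {'img_001': 'cat', 'img_002': 'dog'}
```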

Reflections:
Within my limited knowledge of crowd work, I was aware of its importance for data aggregation. The author does a good job of highlighting other areas where machine learning researchers might benefit from the power of crowdworkers. What I found particularly interesting were the case studies that use crowdworkers to debug models and evaluate their interpretability. When we think of “debugging” models and finding flaws in the system, we mostly view things from the developer’s point of view and rely on developers completely to debug and evaluate the model’s performance. Using crowdworkers for this task is a useful application area that more machine learning researchers should be aware of. These tasks might also be of greater interest to crowdworkers because they are not repetitive and involve active participation. “Human debugging” can help by taking crowdworkers’ feedback into account to uncover bottlenecks in machine learning models. Hybrid techniques that incorporate human feedback also seem like a promising application area, where the system relies extensively on human judgement to make the right decisions. This puts more responsibility on machine learning researchers to be creative and come up with unique ways to involve humans. Setting up pilot studies can help on this front: they demonstrate how a layperson interacts with a system and reveal the gaps researchers need to fill to ensure a cohesive experience for the end user. However, care should be taken to ensure that the effort crowdworkers put into building these systems does not go unappreciated.

Questions:
1. Did you agree with the applications of crowdworkers presented in this survey?
2. What steps can be taken to make machine learning researchers aware of these potential applications?
3. Apart from fairly compensating the workers, what steps can be taken to value their contributions?


02/04/2020 – Akshita Jha – Power to the People: The Role of Humans in Interactive Machine Learning

Summary:
“Power to the People: The Role of Humans in Interactive Machine Learning” by Amershi et al. examines the tightly coupled interaction between systems and end users and how to improve user experiences while also improving system performance. The workflow for conventional machine learning involves a long, drawn-out process of training or pre-training, fine-tuning, and iteratively tuning hyper-parameters to improve the target metrics. In comparison, the feedback cycles in an interactive machine learning workflow are rapid, focused, and incremental. Prominent real-world examples of interactive machine learning include recommender systems like those of Amazon and Netflix. Interactive machine learning has also been used for image segmentation, where users mark the foreground and background of an image and the system takes this feedback into account to improve its performance. Similarly, interactive music composition not only improves the system but has also been shown to help train students. The authors also present case studies that explore novel interfaces for interactive machine learning: for example, letting end users modify the input to observe the effect on the output, studying the efficacy of active versus passive learning, enabling users to query the learner rather than only answer its questions, and enabling users to provide feedback that critiques the learner’s output. In all these examples, the user and the system are tightly coupled and form a cohesive unit that is difficult to study in isolation.
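
A bare-bones version of the rapid, incremental loop described above can be sketched with an online learner that is updated after every piece of user feedback; the data and the stand-in labeling function are placeholders, not any system from the paper.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 2))      # unlabeled pool (placeholder data)

def user_label(x):
    # Stands in for the human's judgment in this sketch.
    return int(x[0] + x[1] > 0)

# loss="log_loss" assumes a recent scikit-learn (older versions call it "log").
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

# Interactive loop: show one example, get a label, update the model immediately,
# so the user sees the effect of each piece of feedback right away.
for i in range(20):
    x = X_pool[i]
    y = user_label(x)                   # in a real system: ask the user
    model.partial_fit(x.reshape(1, -1), [y], classes=classes)

print("Accuracy on the rest of the pool:",
      model.score(X_pool[20:], [user_label(x) for x in X_pool[20:]]))
```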

Reflections:
The paper presents several case studies that highlight the differences between machines and humans. One case study that I found particularly interesting was where researchers tried to use human feedback to train a reinforcement learning agent. In conventional reinforcement learning, the agent works in a simulated task environment and receives rewards based on its actions; it then tries to find policies that best complete the task at hand by maximizing those rewards. Unlike typical reward functions, which freely penalize the agent, the humans in the loop gave far more positive than negative feedback, which pushed the agent toward a greedy policy and produced the undesired effect of the agent actively avoiding the goal. This result is fascinating for several reasons: (i) it effectively demonstrates the difference between the way computers learn and the way human psychology operates, and (ii) it shows what can be changed in a system to incorporate human feedback and make it more effective and user-friendly. Another unexpected insight was that people value transparency: it was surprising to find that knowing more about the “black box” model helped in getting better labels. In order to design effective systems, it is critical to understand what humans expect while interacting with them.

Questions:
1. Which systems do we interact with most on a daily basis? Are they interactive?
2. Can we develop metrics to appropriately evaluate a model’s ability to interact?
3. Apart from reinforcement learning, are there any other specific machine learning algorithms that might benefit from having humans in the loop?


01/29/20 – Akshita Jha – Human Computation: A Survey and Taxonomy of a Growing Field

Summary:
“Human Computation: A Survey and Taxonomy of a Growing Field” by Quinn and Bederson classifies human computation systems along several dimensions. They also point out the subtle differences between human computation, crowdsourcing, social computing, data mining, and collective intelligence. Traditionally, human computation is defined as “a paradigm for utilizing human processing power to solve problems that computers cannot yet solve.” Although there is some overlap between human computation and crowdsourcing, the major idea behind crowdsourcing is that it works by employing members of the public in place of traditional human workers. Social computing, on the other hand, differs from human computation in that it studies natural human behavior mediated by technology. The third term, data mining, focuses on using technology to analyse data generated by humans. All of these partly fall under the umbrella of collective intelligence, and well-developed human computation systems are also examples of collective intelligence. For example, to create a gold standard dataset for machine translation, several humans have to provide translations of the same sentences; this draws on the collective intelligence of the crowd as well as human expertise to solve a computationally challenging task. The authors present a classification system based on six dimensions: (i) Motivation, (ii) Quality Control, (iii) Aggregation, (iv) Human Skill, (v) Process Order, and (vi) Task Request Cardinality. The paper further discusses the various subcategories within these dimensions and how they influence the output. This work presents a useful framework that can help researchers categorize their work.

Reflections:
This was an interesting read, as it presented an overview of the field of human computation. The paper does a commendable job of highlighting the subtle differences between the different sub-fields of HCI. The authors characterize human computation systems along six dimensions, each with a set of sample values. However, I think there might be some overlap between these sample values and subcategories. For example, the amount of ‘pay’ a person receives determines the motivation for the task, but it may also determine the quality of work: if workers feel adequately compensated, that might be an incentive to produce quality work. Similarly, ‘redundancy’ and ‘multi-level review’ are part of the ‘quality control’ dimension, but they could also fall under the ‘task request cardinality’ dimension, since multiple users are required to perform similar tasks. Another point to consider is that although the authors differentiate between crowdsourcing and human computation, several parallels can be drawn between them using the dimensions presented. For example, it would be interesting to observe whether the features employed by different crowd platforms can be categorized into the dimensions highlighted in this paper, and whether they fall into the same subclasses or vary depending on the kind of task at hand.

Questions:
1. Can this work extend to related fields like social computing and crowdsourcing?
2. How can we categorize ethics and labor standards based on these dimensions?
3. Is it possible to add new dimensions to human computation?


01/29/20 – Akshita Jha – Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms

Summary:
“Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms” by Donna Vakharia and Matthew Lease discusses the limitations of Amazon Mechanical Turk (AMT) and presents a qualitative analysis of newer vendors who offer different models for achieving quality crowd work. AMT was one of the first platforms to offer paid crowd work, yet nine years after its launch it is still in its beta stage: it fails to take into account the skill, experience, and ratings of workers and employers, and its minimal infrastructure does not support collecting analytics. Several platforms, such as ClickWorker, CloudFactory, CrowdComputing Systems (now WorkFusion), CrowdFlower, CrowdSource, MobileWorks (now LeadGenius), and oDesk, have tried to mitigate these limitations with more advanced workflows that aim to ensure quality work from crowd workers. The authors identify four major limitations of AMT: (i) inadequate quality control, (ii) inadequate management tools, (iii) missing support for fraud prevention, and (iv) lack of automated tools. The authors also list several criteria for qualitatively assessing other platforms: (i) key distinguishing features of the platform, (ii) source of the workforce, (iii) worker demographics, (iv) worker qualifications and reputation, (v) recommendations, (vi) worker collaboration, (vii) rewards and incentives, (viii) quality control, (ix) API offerings, (x) task support, and (xi) ethics and sustainability. These criteria prove useful for a thorough comparison of different platforms.

Reflections:
One of the major limitations of AMT is that there are no pre-defined tests to check the quality of a worker. In contrast, other platforms ensure that they test their workers in one way or another before assigning them tasks. However, these tests might not always reflect the ability of the workers: the tests need to be designed with the task in mind, which makes standardization a big challenge. Several platforms also offer their own workforce. This can have both positive and negative impacts: the positive is that the platforms can perform thorough vetting, while the negative is that this might limit the diversity of the workforce. Another drawback of AMT is that workers seem interchangeable, as there is no way to distinguish one from another. Other platforms use badges to display worker skills and leaderboards to rank their workers. This can lead to an unequal distribution of work which might be merit-based, but a deeper analysis of the ranking algorithms is needed to ensure that there is no unwanted bias in the system. Some of the platforms employ automated tools to perform tasks which are repetitive and monotonous; this comes with its own set of challenges. As machines become more “intelligent”, humans need to develop better and more specialized skills in order to remain useful. More research needs to be done to better understand the workings and limitations of such “hybrid” systems.

Questions:
1. With crowd platforms employing automated tools, it is interesting to discuss whether these platforms can still be categorised as crowd platforms.
2. This was more of a qualitative analysis of crowd platforms. Is there a way to quantitatively rank these platforms? Can we use the same metrics?
3. Are there certain minimum standards that every crowd platform should adhere to?
