04/29/2020 – Bipasha Banerjee – VisiBlends: A Flexible Workflow for Visual Blends

Summary

The authors present a method for creating graphics that draw attention to messages, specifically by blending two images drawn from two different concepts. The paper opens with the appealing example of Starbucks and summer and shows how the two ideas yield a single resulting image. The process is a hybrid blend of two images into one object that represents both concepts effectively: each concept yields an object, and the two objects are integrated into one while keeping both original objects recognizable. The VisiBlends workflow involves several steps. First, users brainstorm concepts associated with the message, then find images for both concepts, and then annotate them correctly. Finally, the algorithm chooses which images to blend and generates the blended images, which the user evaluates. The study found that the tool improved the quality of the resulting blends.
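To make the matching step concrete, here is a minimal sketch of how images annotated with a basic shape could be paired across the two concepts. This is my own illustration under the assumption that shape annotations drive the match; the file names, shape labels, and matching rule are hypothetical, not the authors' actual algorithm.

```python
from itertools import product

# Hypothetical annotation records: each image is tagged with the concept it
# came from and the basic shape of its main object (e.g., "sphere", "cylinder").
images_a = [{"file": "sun.png", "concept": "summer", "shape": "sphere"},
            {"file": "flipflop.png", "concept": "summer", "shape": "flat"}]
images_b = [{"file": "coffee_cup.png", "concept": "Starbucks", "shape": "cylinder"},
            {"file": "coffee_bean.png", "concept": "Starbucks", "shape": "sphere"}]

def candidate_blends(images_a, images_b):
    """Pair images from the two concepts whose annotated shapes match,
    so one object can plausibly be swapped into the other's silhouette."""
    return [(a, b) for a, b in product(images_a, images_b)
            if a["shape"] == b["shape"]]

for a, b in candidate_blends(images_a, images_b):
    print(f'Blend candidate: {a["file"]} + {b["file"]} (shared shape: {a["shape"]})')
```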

Reflection

This is an interesting study that can be applied in advertising and marketing, among other fields. The main attraction of the paper is that the system helps people who are novices at graphic design. The steps are straightforward to follow, and the results showed that using the VisiBlends system was genuinely helpful. In essence, this paper tries to find similarity between images, as opposed to similarity between texts.

One of the drawbacks the users pointed out was the time it takes to find images relevant to a concept. I think this problem could be mitigated if the system also considered full scenes rather than only single-object images. Computer vision algorithms provide tools to get bounding boxes and improve image quality; this would help users save time and pull components out of larger images too (a rough sketch of the bounding-box idea follows).
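As an illustration of how a scene could be decomposed into reusable components, here is a minimal sketch using torchvision's off-the-shelf Faster R-CNN; the model choice, score threshold, and file name are my assumptions and are not part of the paper.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load an off-the-shelf detector pre-trained on COCO.
model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

def crop_objects(path, score_threshold=0.8):
    """Detect objects in a scene image and return crops of confident detections."""
    image = Image.open(path).convert("RGB")
    tensor = transforms.ToTensor()(image)          # [C, H, W], values in [0, 1]
    with torch.no_grad():
        detections = model([tensor])[0]            # dict with boxes, labels, scores
    crops = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score >= score_threshold:
            x1, y1, x2, y2 = box.int().tolist()
            crops.append(image.crop((x1, y1, x2, y2)))
    return crops

# Example (hypothetical file): components = crop_objects("beach_scene.jpg")
```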

I also kept thinking about how this idea could be extended to other areas. One that comes to mind is combining images and text to find analogies between them. By combining them, I do not mean simply providing a keyword and getting an image back; that is what Google or any search engine does. Rather, if a user searches for a word and finds an image they like, could we link articles to that image? We could also try to find text related to the final blended image. I am not sure about the feasibility, but it would be an interesting experiment.

Additionally, I was intrigued to know whether participants with a strong technical (CS) or artistic (graphics) background are faster at finding images. I suspect that the better one's "internet searching" skills, the less time the task takes.

Questions

  1. What other fields can this idea be extended to? Could we find texts related to the blended image? For example, for New York and night, we could find articles that discuss both concepts. This may lead to finding more images too.
  2. Would using complex images like scenes, instead of only single-object images, help? Computer vision algorithms can extract bounding boxes from such images to isolate individual objects.
  3. Do people with stronger technical or graphical skills perform better?


04/29/2020 – Bipasha Banerjee – Accelerating Innovation Through Analogy Mining

Summary 

The paper by Hope et al. addresses analogy mining from unstructured text. The authors use a product-description dataset from Quirky.com, a product-innovation website, to find products that are similar. Specifically, they use the "purpose" and the "mechanism" of products to find analogies between items. They also compare the proposed approach against traditional similarity techniques, namely TF-IDF, LSA, GloVe, and LDA. Amazon Mechanical Turk crowd workers were used to create the training dataset, and a recurrent neural network was then trained to learn representations of purpose and mechanism from the human-annotated data. The authors also wanted to see whether their approach enhanced creativity in idea generation. They measured this by having graduate students judge the generated ideas on three criteria: novelty, quality, and feasibility. They concluded that there was an increase in creativity among the participants of the study.
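To make the baseline side concrete, here is a minimal sketch of a TF-IDF similarity search over hypothetical "purpose" texts using scikit-learn. It illustrates the kind of traditional baseline the authors compare against, not their trained purpose/mechanism model; the example descriptions and query are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical product "purpose" descriptions (the same idea applies to "mechanism").
purposes = [
    "keep drinks cold while traveling",
    "prevent headphone cables from tangling",
    "keep food warm during a commute",
]
query = "keep drinks cold on the go"

vectorizer = TfidfVectorizer(stop_words="english")
corpus_vectors = vectorizer.fit_transform(purposes)
query_vector = vectorizer.transform([query])

# Rank products by cosine similarity of their purpose text to the query.
scores = cosine_similarity(query_vector, corpus_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.2f}  {purposes[idx]}")
```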

Reflection

The paper is an interesting read on finding analogies in texts. I really liked how the authors defined similarity based on the purpose of a product and the mechanism by which it works. The authors note that because the dataset consists of product descriptions, the purpose-mechanism structure worked well for finding analogies, and they suggest richer or hierarchical schemes for more complex datasets like scientific papers. My only concern with this suggestion is: wouldn't increasing the complexity of the training data further complicate the process of finding analogies? Instead of hierarchical levels, I think it is better to add other labels to the text. What I am suggesting is similar to what was done in the paper by Chan et al. [1], where background and findings were included in addition to the labels used here.

The paper lays good groundwork for finding similarities while using crowd workers to create the training data. This methodology, in my opinion, truly forms a mixed-initiative structure. The authors did extensive evaluation and experimentation on the AI side of things, and I really liked the way they compared against traditional information retrieval mechanisms for finding analogies.

I liked that the paper also aimed to find out whether creativity increased. My only concern is that "creativity", although defined, is subjective. The authors say they used graduate students as judges but do not mention their backgrounds. A graduate student with a relatively creative background, say a minor in a creative field, may view things differently.

In conclusion, I found this research to be strong because it included verification and validation of the results from all angles, not only the AI side or the human side.

Questions

  1. Are you using a similarity metric in your course project? If yes, which algorithms are you using? (In our project, we are not using any similarity metric, but I have used all the traditional metrics mentioned in the paper in my research work before.)
  2. Other than scientific data, what other kinds of complex datasets would need additional labels or hierarchical labeling?
  3. Do you agree with the authors’ way of finding if the study had enhanced creativity? 

References

  1. Chan, Joel, et al. “Solvent: A mixed initiative system for finding analogies between research papers.” Proceedings of the ACM on Human-Computer Interaction 2.CSCW (2018): 1-21.


04/22/2020 – Bipasha Banerjee – The Knowledge Accelerator: Big Picture Thinking in Small Pieces

Summary 

The paper is about breaking larger tasks into smaller sub-tasks and evaluating the performance of such systems. The authors divide a large piece of work, mainly online work, into smaller chunks and use crowdworkers to perform the resulting tasks. They created a prototype system called "Knowledge Accelerator" whose main goal is to use crowdworkers to find answers to open-ended, complex questions, with each worker seeing only a small part of the overall problem. The maximum payment for any one task was $1, which gives an idea of how granular and simple the authors wanted each task to be. The experiment was divided into two phases. In the first phase, workers labeled categories that were later used in a classification task. In the second phase, workers cleaned the classifier's output by looking at the existing clusters and assigning new clips to an existing or a new cluster.

Reflection

I liked the way the authors approach the problem by dividing a huge task into smaller, manageable parts that are, in turn, easy for workers to annotate. For our course project, we initially wanted workers to read an entire chapter from an electronic thesis and dissertation and then label the department to which they think the document belongs. We were not considering the fact that such a task is huge and would take a person around 15-30 minutes to complete. Dr. Luther pointed us in the right direction and asked us to break the chapter into parts before presenting it to workers. The paper also mentions that too much context can be confusing for workers, so we can now better decide how to divide the chapters to provide just the right amount of context (a minimal chunking sketch follows).
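Below is a minimal sketch of what this kind of decomposition might look like for our project; the paragraph-based splitting and the 250-word chunk size are my own assumptions, not something either the paper or our final design prescribes.

```python
def chunk_chapter(text, max_words=250):
    """Split a chapter into paragraph-aligned chunks of roughly max_words words,
    so each crowdworker sees a short passage with just enough context."""
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if current and count + len(words) > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Each chunk would then be posted as a separate labeling task, e.g.,
# "Which department does this passage most likely belong to?"
```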

I liked how the paper described its methods for finding sources and its filtering and clustering techniques. It was interesting to see the challenges the authors encountered while designing the task; this portion helps future researchers understand the mistakes made and the decisions taken. I would view this paper as a guideline on how best to break a task into pieces that are easy yet detailed enough for Amazon Mechanical Turk workers.

Finally, I would like to point out that the paper mentions only workers from the US were considered. The reason, given in a footnote, is that because of currency conversion the value of a dollar is relative. I thought this was a thoughtful point to raise, and it helps maintain the quality of the work involved. That said, a live currency-conversion API could have been incorporated to adjust compensation accordingly. Since the paper deals with finding relevant answers to complex questions, involving workers from other countries might help improve the final answers.

Questions

  1. How are you breaking a task into sub-tasks for the course project? (We had to modify our task design for our course project and divide a larger piece of text into smaller chunks)
  2. Do you think that including workers from other countries would help improve the answers? (After considering the currency difference factor and compensating the same based on the current exchange rate.)
  3. How can we improve the travel-related questions? Would utilizing workers who are “travel-enthusiasts or bloggers” improve the situation?

Note: This is an extra submission for this week’s reading.


04/22/2020 – Bipasha Banerjee – SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers

Summary 

The paper by Chan et al. is an interesting read on finding analogies between research papers, with scientific papers as the main domain. The annotation scheme is divided into four categories: background, purpose, mechanism, and findings. The paper's goal is to make it easier for researchers to find related work in their field. The authors conducted three studies to test their approach and its feasibility. In the first study, domain experts annotated abstracts in their own research areas. The second study focused on a real-world problem where a researcher needs to find relevant papers to serve as inspiration, related work, or even baselines for their experiments. The third study was quite different from the first two, in which experienced researchers annotated the data or used the system for their own research problems: it used crowdworkers to annotate abstracts, recruited through Upwork and Amazon Mechanical Turk.

Reflection

The mixed-initiative model developed by the authors is an excellent step in the right direction for finding analogies in scientific papers. There are traditional approaches in natural language processing that help find similarities in textual data, and the website [1] gives a good overview of the steps involved. However, when it comes to scientific text, these steps alone are not enough. Most of the underlying models are trained on generic web and news data (like CNN or DailyMail), so much of the scientific jargon is "out of vocabulary" (OOV) for them. I therefore appreciate that the authors combined human annotations with traditional information retrieval methods (like TF-IDF) to tackle the problem at hand.
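To illustrate the OOV point, here is a quick check against a general-purpose GloVe vocabulary using gensim; the model name and the example tokens are my own choices, and the script simply reports whether each term is covered.

```python
import gensim.downloader as api

# General-purpose GloVe vectors trained on Wikipedia + Gigaword news text.
glove = api.load("glove-wiki-gigaword-100")

# Hypothetical mix of scientific jargon and everyday terms.
tokens = ["polymerase", "transcriptomic", "crispr", "attention", "keyboard"]
for token in tokens:
    status = "in vocabulary" if token in glove else "OOV"
    print(f"{token}: {status}")
```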

Additionally, for the similarity metric, they took multiple categories into account, like Purpose+Mechanism, which is definitely useful when finding similarities in text. I also liked that the studies included ordinary crowdworkers in addition to people with domain knowledge. I was intrigued to find that the crowdworkers' annotations matched the researchers' 75% of the time, so the conclusion that "crowd annotations still improve analogy-mining" is valuable. Moreover, recruiting researchers in large numbers within a single domain just to annotate data is difficult; sometimes very few people work in a given area. Rather than having to find researchers who are available to annotate such data, it is good that crowd annotation can be used instead.

Lastly, I would like to mention that I liked that the paper identified the limitations very well, and the scope for future work has also been clearly mentioned. 

Questions

  1. Would you agree that the level of expertise of the human annotators would not affect the results for your course project? If yes, could you please clarify?

(For my class project, I think I would agree with the paper’s findings. I work on reference string parsing, and I don’t think we need experts just to label the tokens.)

  2. Could we have more complex categories or sub-categories rather than just the four identified?
  3. How would this extend to longer pieces of text, like chapters of book-length documents?

References 

  1. https://medium.com/@Intellica.AI/comparison-of-different-word-embeddings-on-text-similarity-a-use-case-in-nlp-e83e08469c1c 


04/22/2020 – Bipasha Banerjee – Opportunities for Automating Email Processing: A Need-Finding Study

Summary

The primary goal of the paper is to provide automation support to users for email handling. The authors first tried to determine which automated features users want in their email service and what informational and computational needs arise when implementing such a system. They conducted three studies. The first gauged what kinds of features users wanted automated; there was no restriction on what could or could not be implemented, so it effectively surfaced the full range of tasks and features users wish their email interface provided. The second study surveyed existing automated implementations, which involved sifting through GitHub repositories for projects aimed at closing the gap in email-processing automation. Finally, the last study had users code their "ideal" features using the YouPS interface; the participants were mainly students from engineering backgrounds who were familiar with Python programming.
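To give a rough sense of the kind of rule participants could write, here is a minimal sketch using Python's standard imaplib rather than the actual YouPS API; the server, credentials, sender filter, and folder name are all hypothetical placeholders.

```python
import imaplib

# Connect to a hypothetical IMAP account (credentials are placeholders).
mail = imaplib.IMAP4_SSL("imap.example.com")
mail.login("user@example.com", "app-password")
mail.select("INBOX")

# Rule: mark every unread message from a newsletter sender as read and
# file it under a separate "Later" folder, keeping the inbox focused.
_, data = mail.search(None, '(UNSEEN FROM "newsletter@example.com")')
for msg_id in data[0].split():
    mail.store(msg_id, "+FLAGS", "\\Seen")      # mark as read
    mail.copy(msg_id, "Later")                   # hypothetical folder name
    mail.store(msg_id, "+FLAGS", "\\Deleted")    # remove the inbox copy
mail.expunge()
mail.logout()
```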

Reflection

The paper provided an interesting perspective on how users want their email clients to behave. For that, it was important to understand people's needs, and the authors do this by first finding the ideal features users look for. I liked the way this discovery of user needs was approached. However, I want to point out that the median age range of participants was 20-29, and all had a university affiliation. It would be interesting to see what older people, from both university and industry backgrounds, want in such email clients. Getting the perspective of a senior researcher or a senior manager is important; these are the people who receive far more email and would want and need automated email processing.

I resonated with most of the needs users pointed out and recognized some of them as features my current email client already provides. For example, Gmail generally offers a "follow up" nudge if a sent email didn't get a response, or a "reply" prompt if a received email hasn't been replied to in n days. I am particularly interested in the different modes that could be set up. These would be useful when the user wants to focus on work and only periodically check a particular label like "to-do" or "important." Additionally, being notified only about important email is, in my opinion, a priceless feature.

Having appreciated the proposed features in this paper, I would also like to point out some flaws, in my opinion. First, some of the applied rules might cause disruptions for important emails. One of the features mentioned was to automatically mark an email "read" when consecutive emails come from the same sender. This would work for "social" or "promotions" email, but it might end up making the user do more work, i.e., digging through the "read" emails to find the ones they never actually read. Additionally, I was curious to know how security was handled. Email is already not known to be a secure medium of communication, and layering this tool on top might make it even less secure; especially when research-related topics are discussed in emails, could they be prone to breach?

Questions

  1. What are the features you look for when it comes to email management? I would want to be notified only about emails that are important.
  2. What other systems could benefit from such studies other than email processing? Could this be used to improve recommender systems, other file organizing software? 
  3. Would it be useful to take the input of senior researchers and managers? They are people who receive a lot of emails, and knowing their needs would be useful.
  4. How was the security handled in the YouPS system? 


04/15/2020 – Bipasha Banerjee – Algorithmic Accountability

Summary 

The paper provides a perspective on algorithmic accountability through journalists' eyes. Its motivation is to detect how algorithms influence decisions in different settings. The author explicitly investigates computational journalism and how such journalists could use their power to "scrutinize" algorithms and uncover bias and other issues. He lists several kinds of decisions algorithms make that have the potential to introduce bias: classification, prioritization, association, and filtering. He also notes that transparency is a key factor in building trust in an algorithm. The author then discusses reverse engineering, illustrated with a few case studies; reverse engineering is described as the way computational journalists probe an algorithm's inputs and outputs to infer how it works. Finally, he points out the challenges this method faces in practice.

Reflection

The paper gives a unique perspective on algorithmic bias from a computational journalist's point of view. Most of the papers we read come either entirely from the computational domain or from the human-in-the-loop perspective. Having journalists, who are not directly involved in building these systems, examine them is, in my opinion, brilliant, because journalists are trained to be unbiased. From the CS perspective, we tend to be "AI" lovers who want to defend the machine's decision and consider it true. The humans using the system either blindly trust it or completely doubt it. Journalists, on the other hand, are always motivated to seek the truth, however unpleasant it might be. Having said that, I am intrigued to know the computational expertise level of these journalists; having in-depth knowledge about AI systems might introduce a separate kind of bias. Nonetheless, this would be a valid experiment to conduct.

The challenges the author mentions include ethics and legality, among others. These are challenges that are not normally discussed, and we, on the computational side, need to be aware of them. The legal ramifications could be enormous if we train a model on data we are not authorized to use and then publish the results.

I agree with the author that transparency helps bolster confidence in an algorithm. However, I also agree that it is difficult for companies to be transparent in the modern, competitive digital era; it would be risky for them to make all their decisions public. I believe there might be a middle ground: companies could publish part of the algorithmic decision process, like the features they use, and let users know what data is being used. This might help improve trust. For example, Facebook could publish the reasons why it recommends a particular post.

Questions

  1. Although the paper talks about using computational journalism, how in-depth is the computational knowledge of such people? 
  2. Is there a way for an algorithm to be transparent, yet the company not lose its competitive edge?
  3. Have you considered the "legal and ethical" aspects of your course project? I am curious about the data being used, the models, etc.


04/15/2020 – Bipasha Banerjee – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

The paper emphasizes the importance of a mixed-initiative model for fact-checking and points out the advantages of humans and machines working closely together to verify the veracity of claims. The main aim of the mixed-initiative approach is to make the system, especially the user interface, more transparent. The UI presents a claim to the user along with a list of articles related to it, and the paper also describes the prediction models used behind the UI. Finally, the authors conducted three experiments with crowd workers who had to judge the correctness of claims presented to them. In the first experiment, users were shown the results page without the system's prediction of the claim's truthfulness; they were divided into two subgroups, with one group given slightly more information. In the second experiment, crowdworkers were presented with the interactive UI; they, too, were split into two subgroups, with one group having the power to change the initial predictions. The third experiment was a gamified version of the second. The authors concluded that human-AI collaboration could be useful, although the experiments brought to light some contradictory findings.
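As a minimal sketch of one plausible way such a system might aggregate evidence, the snippet below weights each related article's stance toward the claim by its source's reputation; the numbers and the weighting scheme are my assumptions, not the authors' actual model.

```python
# Each related article carries a stance toward the claim (-1 = refutes, +1 = supports)
# and a reputation score for its source in [0, 1]. All values are made up.
articles = [
    {"source": "outlet_a", "stance": +0.9, "reputation": 0.8},
    {"source": "outlet_b", "stance": -0.4, "reputation": 0.3},
    {"source": "outlet_c", "stance": +0.6, "reputation": 0.6},
]

def predicted_veracity(articles):
    """Reputation-weighted average of article stances; >0 leans true, <0 leans false."""
    total_weight = sum(a["reputation"] for a in articles)
    return sum(a["stance"] * a["reputation"] for a in articles) / total_weight

print(f"Predicted veracity score: {predicted_veracity(articles):+.2f}")

# In a mixed-initiative UI, the user could override an article's stance or
# reputation slider and the score would be recomputed on the fly.
```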

Reflection

I agree with the author’s approach that the transparency of a system leads to the confidence of the user using a particular system. My favorite thing about the paper is that the authors describe the systems very well. They do a very good job of describing the AI models as well as the UI design and give a good explanation to their decisions. I also enjoyed reading about the experiments that they conducted with the crowdworkers. I had a slight doubt about how the project handled latency, especially when the related articles were presented to the workers in real-time.

I also liked how the experiments were conducted in subgroups, with one group having information not shown to the other. This shows that many use cases were thought through when the experiments were designed. I agree with most of the limitations the authors list, and I particularly agree that if the predicted veracity is shown to users, there is a high chance it will influence them. We, as humans, have a tendency to believe machines and their predictions blindly.

I would also like to see the work evaluated on another dataset. Additionally, if the crowdworkers have knowledge of the domain under discussion, how does that affect performance? Having domain knowledge would surely improve a worker's ability to assess a claim, but measuring by how much would be informative. A potential use case could be researchers reading claims from papers in their domain and assessing their correctness.

Questions

  1. How would you implement such systems in your course project?
  2. Can you think of other applications of such systems?
  3. Is there any latency when the user is presented with the associated articles?
  4. How would the veracity claim system extend to other domains (not news based)? How would it perform on other datasets? 
  5. Would crowdworkers with experience in a given domain perform better? The answer is likely yes, but by how much? And how can this help improve targeted systems (research paper acceptance, etc.)?


04/08/2020 – Bipasha Banerjee – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

The paper focuses on the problem of quality control for work done by crowdworkers. The authors created a system named CrowdScape to evaluate human work through mixed-initiative machine learning and interactive visualization. They review quality control in crowdsourcing, mentioning methods such as post-hoc output evaluation, behavioral traces, and integrated quality control. CrowdScape captures worker behavior and presents it through interactive data visualizations, which helps reveal whether the work was done diligently or in a rush. The output of the work is indeed a good indicator of its quality; however, an in-depth look at user behavior is needed to understand how the worker actually completed the task.

Reflection

To be very honest, I found this paper fascinating and extremely important for research work in this domain. Ensuring the work submitted is of good quality not only helps legitimize the output of the experiment but also increases trust in the platform as a whole. I was astonished to read that about one-third of all submissions are of low quality. The stats suggest that we are wasting a significant amount of resources. 

The paper mentions that the tool uses two sources of data: output and worker behavior. I was intrigued by how they took the worker's behavior into account, such as the time taken to complete the task and the way the work was completed, including scrolling, key presses, and other activities. I was curious to know whether the workers' consent was explicitly obtained. It would also be an interesting study to see whether knowing that behavior is being recorded affects performance. Additionally, dynamic feedback could be incorporated: if a task is expected to take x minutes, alerting workers whose time on task is far below that. This would prompt them to take the work more seriously and avoid unnecessary rejection of their submissions.

I have a comment on the collection of YouTube video tutorials. One of the features taken into account was "Total Time", which signified whether the worker had watched the video completely before summarizing its content. However, videos can be watched at increased playback speed; I often watch tutorial videos at 1.5x. Hence, if the total time taken is less than expected, it might simply mean the worker watched at a higher speed. A simple check could solve this problem: YouTube offers a fixed set of playback speeds, and taking them into account when evaluating the total time might be a viable option (a minimal sketch follows).
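A simple check along these lines might look as follows; the playback-speed list mirrors YouTube's standard options, while the slack factor and example numbers are illustrative assumptions.

```python
# YouTube offers a fixed set of playback speeds.
PLAYBACK_SPEEDS = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

def plausibly_watched(time_on_task_s, video_length_s, slack=0.9):
    """Return True if the worker's time on task is consistent with having watched
    the whole video at any supported playback speed (with some slack)."""
    return any(time_on_task_s >= slack * video_length_s / speed
               for speed in PLAYBACK_SPEEDS)

# A 10-minute tutorial watched at 2x could legitimately take about 5 minutes:
print(plausibly_watched(time_on_task_s=310, video_length_s=600))  # True
```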

Questions

  1. How are you ensuring the quality of the work completed by crowdworkers for your course project?
  2. Were the workers informed that their behavior was “watched”? Would the behavior and, subsequently, the performance change if they are aware of the situation?
  3. Workers might use different playback speeds to watch videos. How is that situation handled here?


04/08/2020 – Bipasha Banerjee – Agency plus automation: Designing artificial intelligence into interactive systems

Summary

The paper argues that computer-aided tools should be considered an enhancement of human work rather than a replacement for it. It emphasizes that technology on its own is not foolproof, yet humans at times tend to rely on it completely; AI can yield faulty results due to biases in the training data or a lack of sufficient data, among other factors. The author points out how human and machine efforts can be coupled successfully through examples such as Google search autocomplete and grammar/spelling correction. The paper aims to use AI techniques in a manner that ensures humans remain the primary controllers. Three case studies, namely data wrangling, data visualization for exploratory analysis, and natural language translation, demonstrate how shared representations perform. In each case, the models were designed to be human-centric while having automated reasoning enabled.

Reflection

I agree with the authors’ statement about data wrangling that most of the time is spent in cleaning and preparing the data than actually interpreting or applying the task one specializes in. I was amused by the idea that users’ work of transforming the data is cut short and aided by the system that suggests users the proper action to take. I believe this would indeed help the users of the system if they get the desired options directly recommended to them. If not, it will help improve the machine further. I particularly found it interesting to see that users preferred to maintain control. This makes sense because, as humans, we have an intense desire to control.

The paper never clearly explains who the participants were. Knowing exactly who the users were and how specialized they are in the field they work in would be essential; it would also give an in-depth picture of their experience interacting with the system, and the evaluation would then feel complete.

The paper’s overall concept is sound. It is indeed necessary to have a seamless interaction between man and the machine. They have mentioned three case studies. However, all of them are data-oriented. It would be interesting to see how the work can be extended to other forms – videos, images. Facebook picture tagging, for example, does this task to some extent. It suggests users with the “probable” name(s) of the person in the picture. This work can also be used to help detect fake vs. real images or if the video has been tampered.

Questions

  1. How are you incorporating the notion of intelligent augmentation in your class project?
  2. The case studies are varied but mainly data-oriented. How would this work differ if it were applied to images?
  3. The paper mentions "participants" and how they provided feedback. However, I am curious to know how they were selected, particularly the criteria used to choose users to test the system.


03/25/2020 – Bipasha Banerjee – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Summary 

This paper aims to measure AI performance using a human-in-the-loop approach rather than relying only on traditional benchmark scores. For this purpose, the authors evaluate a visual conversational agent interacting with humans, effectively forming a human-AI team. GuessWhich is a human-computation game designed to study these interactions. The visual conversational agent, named ALICE, comes in two variants: one trained in a supervised manner on a visual dialog dataset (ALICE_SL) and one further fine-tuned with reinforcement learning (ALICE_RL). Both the human and the AI members of the team need to be aware of each other's imperfections and must infer information as needed. The experiments were performed with Amazon Mechanical Turk workers, and the AI component was based on the ABOT from [1], which had proved the most effective choice. It was found that human-ALICE_SL teams outperformed human-ALICE_RL teams, contrary to the AI-only evaluation, which shows that AI benchmarks do not accurately represent performance when humans are in the loop.

Reflection

The paper proposes a novel way to include humans in the loop when evaluating AI conversational agents. From an AI perspective, we use standard evaluation metrics like F1, precision, and recall to gauge a model's performance. The paper builds on previous work that considered only AI models interacting with each other, where a reinforcement-learning model performed far better than a standard supervised one. However, when humans are in the loop, the supervised model performs better than its reinforcement-learning counterpart. This signifies that our current AI evaluation techniques do not effectively account for the human context.

The authors mention that, at times, people would discover the strengths of the AI system and adjust their interactions accordingly. Hence, we can conclude that the human and the AI are, to some extent, learning from each other, which is a good way to leverage the strengths of both.

It would also be interesting to see how this combination works in identifying images correctly when the task is complex. If the stakes are high and the image-identification task involves both humans and machines, would such combinations still work? It was noted that the AI system answered some questions incorrectly, which ultimately led humans to guess incorrectly. Hence, to make such combinations work seamlessly, more testing and training with vast amounts of data are necessary.

Questions

  1. How would we extend this work to other complex applications? Suppose the AI and humans were required to identify potential security threats, where the stakes are high?
  2. It was mentioned that the game was played for nine rounds. How was this threshold selected? Would it have worked better if the number was greater? Or would it rather confuse humans more?
  3. The paper mentions that “game-like setting is constrained by the number of unique workers” who accept their tasks. How can this constraint be mitigated? 

References

[1] https://arxiv.org/pdf/1703.06585.pdf
