03/04/2020 – Bipasha Banerjee – Pull the Plug? Predicting If Computers or Humans Should Segment Images

Summary

The paper by Gurari et al. discusses image segmentation and asks when segmentation should be done by humans and when a machine-only approach suffices. The work described in this paper is interdisciplinary, involving computer vision and human computation. The authors consider both fine-grained and coarse-grained segmentation to determine where humans or machines perform better. Their PTP framework decides whether to "pull the plug" on humans or machines: it predicts whether the labeling of an image should come from a human or a machine, along with the quality of the resulting labels. The prediction framework is a regression model that estimates segmentation quality. The training data was populated with masks reflecting varying segmentation quality, generated by three algorithms: the Hough transform with circles, Otsu thresholding, and adaptive thresholding. For labels, the Jaccard index was used to indicate the quality of each instance. Nine features derived from the binary segmentation mask were proposed to catch failures. The authors ultimately find that a mixed approach performs better than relying entirely on humans or on computers.
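To make the training setup concrete, the sketch below (Python with NumPy; my own illustration, not the authors' released code) shows the two pieces the summary mentions: the Jaccard index used as the quality label for each candidate mask, and a least-squares regression standing in for the paper's quality-prediction model. The `feature_fn` argument is a placeholder for the nine mask-derived features.

```python
import numpy as np

def jaccard(pred, gt):
    """Jaccard index (IoU) between two binary masks; used as the training label."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def fit_quality_regressor(feature_fn, masks, gt_masks):
    """Least-squares stand-in for the paper's regression model.

    feature_fn maps a binary mask to a 1-D feature vector (the paper derives
    nine features from the mask; any illustrative subset works here).
    """
    X = np.stack([feature_fn(m) for m in masks])
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column
    y = np.array([jaccard(m, g) for m, g in zip(masks, gt_masks)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_quality(w, feature_fn, mask):
    """Predicted Jaccard-like quality score for an unseen mask."""
    return float(np.append(feature_fn(mask), 1.0) @ w)
```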

Reflection 

The use of machines vs. humans is a complex debate. Leveraging both machine and human capabilities is necessary for efficiency and for dealing with "big data." The paper aims to find when to use computers to create coarse-grained segmentations and when to rely on humans for fine-grained ones. I liked that the authors published their code; this helps the advancement of research and reproducibility.

The authors used three datasets, but all are image-based. In my opinion, identifying bounding boxes in images is a relatively simple task. I work with text, and I have observed that segmenting large amounts of text is not simple: most of the available tools fail to segment long documents like ETDs effectively. Nonetheless, segmentation is an important task, and I am intrigued to see how this work could be extended to text.

Using crowd workers can be tricky. Amazon Mechanical Turk allows requesters to specify the experience, quality, etc. of workers, but the time taken by a worker can still vary depending on many factors. Would humans familiar with the dataset or the domain annotate faster? This needs to be thought through carefully, in my opinion, especially when we are trying to compete against machines. Machines are faster and good at handling vast amounts of data, whereas humans are good at accuracy. This paper highlights the old problem of accuracy vs. speed.

Questions

  1. The segmentation was done on image datasets. How does this extend to text?
  2. Would experts on the topic or people familiar with the datasets require less time to annotate?
  3. Although three datasets were used, I wonder whether the domain matters. Would complex images affect the accuracy of machines?


03/04/2020 – Dylan Finch – Pull the Plug?

Word count: 596

Summary of the Reading

The main goal of this paper is to make image segmentation more efficient. Image segmentation, as it stands now, requires humans to help with the process: there are some images that machines cannot segment on their own. However, there are many cases where a segmentation algorithm can do all of the work by itself. This presents a problem: we do not know when we can use an algorithm and when we have to use a human, so humans end up reviewing all of the segmentations, which is highly inefficient. This paper tries to solve the problem by introducing an algorithm that can decide when a human is required to segment an image. The process described in the paper involves scoring each machine-generated segmentation and then giving humans the task of redoing the lowest-scoring images. Overall, the process was very effective and saved a lot of human effort.
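The routing step this implies is simple to express. Below is a minimal sketch, where `predict_quality` is a hypothetical stand-in for the paper's learned scorer:

```python
def allocate_human_budget(machine_masks, predict_quality, budget):
    """Route the `budget` lowest-scoring machine segmentations to humans.

    predict_quality: callable taking a mask and returning a predicted quality
    score (a placeholder for the paper's learned regression model).
    Returns (indices to re-annotate by humans, indices kept as machine output).
    """
    order = sorted(range(len(machine_masks)),
                   key=lambda i: predict_quality(machine_masks[i]))  # worst first
    to_human = set(order[:budget])
    keep = [i for i in range(len(machine_masks)) if i not in to_human]
    return sorted(to_human), keep
```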

Reflections and Connections

I think this paper gives a great example of how humans and machines should interact, especially when it comes to humans and AIs. Often, we set out in research with the goal of creating a completely automated process that throws the human away and tries to create an AI or some other kind of machine to do all of the work. This is often a bad solution. AIs, as they currently are, are not good enough to do most complex tasks all by themselves. For tasks like image segmentation, this is an especially big issue: these tasks are very easy for humans to do and very hard for AIs. So, it is good to see researchers who are willing to use human strengths to make up for the weaknesses of machines. I think it is a good thing to have the two working together.

This paper also gives us some very important research, trying to answer the question of when we should use machines and when we should use humans. This is a tough question, and it comes up in a lot of different fields. Humans are expensive, but machines are often imperfect, and it can be very hard to decide when to use one or the other. This paper does a great job of answering the question for image segmentation, and I would love to see similar research in other fields explaining when it is best to use humans and when machines.

While I like this paper, I do also worry that it is simply moving the problem rather than actually solving it. Now, instead of needing to improve a segmentation algorithm, we need to improve the scoring algorithm for the segmentations. Have we really improved the solution, or have we just moved the area that now needs further improvement?

Questions

  1. How could this kind of technology be used in other fields? How can we more efficiently use human and machine strengths together?
  2. In general, when do you think it is appropriate to create a system like this? When should we not fully rely on AI or machines?
  3. Did this paper just move the problem, or do you think that this method is better than just creating a better image segmentation algorithm? 
  4. Does creating systems like this stifle innovation on the main problem?
  5. Do you think machines will one day be good enough to segment images with no human input? How far off do you think that is?


03/04/2020 – Pull the Plug? Predicting If Computers or Humans Should Segment Images – Yuhang Liu

Summary:

This paper examines a new image segmentation method. Image segmentation is a key step in many image analysis tasks. There have been many methods before, including low-efficiency manual methods and automated methods that can produce high-quality results, but each has certain disadvantages. The authors therefore propose a resource allocation framework that can predict how best to assign a fixed labor budget to collect higher-quality segmentations for a given image and automated method. Specifically, the authors implemented two systems, which process images as follows (a minimal sketch of this pipeline appears after the list):

  1. Use computers instead of humans to create the coarse segmentation needed to initialize a segmentation tool, and
  2. Use humans instead of computers to create the final fine-grained segmentation when the automated result is predicted to be of low quality. The final experiments proved that this hybrid, interactive segmentation system can achieve faster and more efficient segmentation.
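Here is the promised sketch of how the two systems could fit together; every callable (`coarse_algos`, `refine_tool`, `predict_quality`, `human_segment`) is a hypothetical stand-in for a component the paper describes:

```python
def hybrid_segmentation(images, coarse_algos, refine_tool,
                        predict_quality, human_segment, budget):
    """Two-stage hybrid pipeline: machine-initialized, human-corrected."""
    results = []
    for img in images:
        # Stage 1: replace human initialization with the automated coarse
        # segmentation predicted to be of the highest quality.
        coarse = max((algo(img) for algo in coarse_algos), key=predict_quality)
        results.append(refine_tool(img, coarse))
    # Stage 2: spend the fixed human budget re-annotating the results
    # predicted to be of the lowest quality.
    worst_first = sorted(range(len(images)),
                         key=lambda i: predict_quality(results[i]))
    for i in worst_first[:budget]:
        results[i] = human_segment(images[i])
    return results
```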

Reflection:

I once did a related image recognition project. Our topic was a railway turnout monitoring system based on computer vision, which detects railroad track turnouts from pictures; the most critical step is to separate the outline of the railroad track. At that time, we used only computer-based separation, and the main problem we encountered was that when the scene became complicated, we faced many complex line segments, which affected the detection results. As mentioned in this paper, a combined human-machine method can greatly improve the accuracy rate. I very much agree with this, and I hope one day to try it myself. At the same time, what I most agree with is that the system can automatically assign work instead of running every photo through the same process: for a given photo, either the machine alone can handle it, or human processing is required. This kind of interactive method is far more advantageous than any single method, because it can greatly save workers' time without affecting accuracy, and, most importantly, complex interaction methods can adapt to more diverse pictures. Finally, I think similar operations can be applied elsewhere. This method of assigning tasks through a system can coordinate the working relationship between humans and machines in other fields as well, for example, sound sentiment analysis and musical background separation. In these areas, humans have advantages machines cannot match and can achieve good results, but the work takes a long time and is very expensive. Therefore, if we apply this kind of thinking to the division of labor between humans and machines, giving complex situations to people and letting the machine do the rough pass first, the separation cost will be greatly reduced without affecting the accuracy rate. So I believe this method has great application prospects, not only because of the many applications of image separation, but also because we can borrow this idea to perform more detailed analysis in many more fields.

Question:

  1. Is this idea of cooperation between humans and machines worth adopting elsewhere?
  2. Because the system defines the working ranges of people and machines, could the machine's accuracy be reduced by the results of human work?
  3. Does human-machine cooperation pose new problems, such as increased costs?


3/4/20 – Jooyoung Whang – Pull the Plug? Predicting If Computers or Humans Should Segment Images

In this paper, the authors attempt to appropriately distribute human and computer resources for segmenting foreground objects in images, to achieve highly precise segmentations. They explain that the segmentation process consists of roughly segmenting the image (initialization) and then going through another fine-grained iteration to produce the final result; they repeat their study for both of these steps. To figure out where to allocate human resources, the authors propose an algorithm that scores the acquired segmentations by detecting highly jagged edges on the boundary, non-compact segmentations, near-edge segmentation locations, and the ratio of segmentation area to the full image. The authors find that a mix of humans and computers for image segmentation performs better than using one or the other alone.
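Out of curiosity, here are rough numeric proxies for those four cues; these are my own illustrative formulations, not the authors' exact features:

```python
import numpy as np

def failure_cues(mask):
    """Rough numeric proxies for the four failure cues listed above."""
    mask = mask.astype(bool)
    area = max(mask.sum(), 1)

    # Boundary length: foreground pixels missing at least one 4-neighbor.
    p = np.pad(mask, 1)
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    boundary = mask.sum() - interior.sum()

    jaggedness = boundary ** 2 / area          # high for ragged boundaries
    compactness = area / (boundary ** 2 + 1)   # low for non-compact masks
    near_edge = float(mask[0].any() or mask[-1].any()
                      or mask[:, 0].any() or mask[:, -1].any())
    area_ratio = mask.mean()                   # segment area / image area
    return np.array([jaggedness, compactness, near_edge, area_ratio])
```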

I liked the authors' proposed algorithm for detecting when a segmentation fails. It was interesting to see that they focused on visible features and qualities that humans can see, instead of relying on deep neural networks whose internal workings are often hard to interpret. At the same time, I am a little concerned about whether the proposed visual features for failed segmentations are enough to generalize and scale to all kinds of images. For example, the authors note that failed segmentations often have highly jagged edges. What if the foreground object (or an animal, in this case) were a porcupine? The score would be fairly low even when an algorithm correctly segments the creature from the background. Of course, the paper reports that the method generalized well to everyday images and biomedical images, so my concern may be a trivial one.

As I am not experienced in the field of image segmentation, I wondered whether there are cases where an image contains more than one foreground object and only one of them is of interest to a researcher. From my limited knowledge of foreground-background separation, a graph search is done by treating the image as a graph of connected pixels to find pixels that stand out; it does not care about "objects of interest." This made me curious whether it is possible to provide additional semantic information in the process.
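It turns out interactive tools do accept exactly this kind of hint. For instance, OpenCV's GrabCut (an interactive segmentation method; this snippet is my own illustration, not taken from the paper) is seeded with a user-drawn rectangle around the object of interest, so only that object is separated:

```python
import numpy as np
import cv2

def segment_object_of_interest(image_bgr, rect):
    """Separate out only the object inside the user-supplied rectangle.

    rect: (x, y, width, height) drawn around the object of interest
    (e.g., the cat rather than the car).
    """
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                5, cv2.GC_INIT_WITH_RECT)
    # Pixels marked (probably) foreground form the object mask.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)

# img = cv2.imread("scene.jpg")                       # hypothetical input
# cat_mask = segment_object_of_interest(img, rect=(50, 30, 200, 180))
```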

The following are the questions I had while reading the paper:

1. Do you think the qualities that PTP looks for are enough to measure the quality of segmented images? What other properties would a failed segmentation have? One quality I can think of is that failed segmentations often have disjoint parts.

2. Can you think of cases where PTP could fail? Would there be any case where a segmentation scores really low even though it was done correctly?

3. As I've written in my reflection, are there methods that allow segmentation algorithms to consider the "interest" in an object? For example, if an image contained a car and a cat both in the foreground and the researcher was interested in the cat, would the algorithm be able to separate out only the cat?


03/04/20 – Akshita Jha – Pull the Plug? Predicting If Computers or Humans Should Segment Images

Summary:
"Pull the Plug? Predicting If Computers or Humans Should Segment Images" by Gurari et al. talks about image segmentation. The authors propose a resource allocation framework that tries to predict when it is best to use a computer for segmenting images and when to switch to humans. Image segmentation is the process of "partitioning a single image into multiple segments" in order to simplify the image into something that is easier to analyze. The authors implement two systems that decide when to replace humans with computers to create coarse segments and when to replace computers with humans to get fine-grained segments. They demonstrate through experiments that this mixed model of humans and computers beats state-of-the-art systems for image segmentation. The authors apply the resource allocation framework, "Pull the Plug," to humans or computers by giving the system an image and trying to predict whether an annotation should come from a human or a computer. The authors evaluate the model using Pearson's correlation coefficient (CC) and mean absolute error (MAE): CC indicates the strength of the correlation between the predicted scores and the actual scores given by the Jaccard index on the ground truth, and MAE is the average prediction error. The authors thoroughly experiment with initializing segmentation tools and reducing human effort in initialization.
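For reference, both evaluation metrics are short to compute; here is a minimal sketch assuming the predicted scores and ground-truth Jaccard values are already in hand:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate_quality_predictor(predicted_scores, true_jaccard):
    """CC and MAE as described above: correlation of predicted quality scores
    with ground-truth Jaccard scores, and the average absolute prediction error."""
    predicted = np.asarray(predicted_scores, dtype=float)
    actual = np.asarray(true_jaccard, dtype=float)
    cc, _p_value = pearsonr(predicted, actual)  # Pearson's correlation coefficient
    mae = float(np.abs(predicted - actual).mean())
    return cc, mae
```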

Reflections:
This is an interesting work that successfully makes use of mixed modes involving both humans and computers to improve the precision and accuracy of a task. The two methods that the authors design for segmenting an image were particularly thoughtful. First, given an image, the authors design a system that tries to predict whether the image requires fine-grained or coarse-grained segmentation. This is non-trivial, as the task requires the system to possess a certain level of "intelligence." The authors use segmentation tools, but the motivation of the system design is to remain agnostic to the particular tools. The system ranks several segmentation tools using a quality predictor designed by the authors, and then allocates the available human budget to creating coarse segmentations. The second system tries to capture whether an image requires fine-grained segmentation or not. It does this by building on the coarse segmentation given by the first system: it refines the segmentation and allocates the available human budget to creating fine-grained segmentations for those with low predicted quality. Both tasks rely on the system proposed by the authors to predict the quality of a candidate segmentation.

Questions:
1. The authors rely on their proposed system of predicting the quality of candidate segmentations. What kind of errors do you expect?
2. Can you think of a way to improve this system?
3. Can we replace the segmentation quality prediction system with a human? Do you expect the system to improve or would the performance go down? How would it affect the overall experience of the system?
4. In most such systems, humans are needed only for annotation. Can we think of more creative ways to engage humans while improving the system performance?


Subil Abraham – 03/04/2020 – Pull the Plug

The paper proposes a way of deciding when a computer or a human should do the work of foreground segmentation of images. Foreground segmentation is a common task in computer vision where the idea is that there is an element in an image that is its focus, and that element is what is needed for actual processing. However, automatic foreground segmentation is not always reliable, so it is sometimes necessary to get humans to do it. The important question is deciding which images to send to humans for segmentation, because hiring humans is expensive. The paper proposes a machine learning method that estimates the quality of a given coarse or fine-grained segmentation and decides whether it is necessary to bring in a human to do the segmentation. They evaluate their framework by examining the quality of different segmentation algorithms and are able to achieve quality equivalent to 100% human work using only 32.5% human effort for Grab Cut segmentation, 65% for Chan Vese, and 70% for Lankton.

The authors have pursued a truly interesting idea in that they are not trying to create a better way of automatically segmenting images, but rather a way of determining whether the automatic segmentation is good enough. My initial thought was: couldn't something like this be used to just make a better automated image segmenter? I mean, if you can tell the quality, then you know how to make it better. But apparently that is a hard enough problem that it is far more helpful to simply defer to a human when you predict that your segmentation quality is not where you want it. It's interesting that they talk about pulling the plug on both computers and humans, but the paper seems focused on pulling the plug on computers; i.e., the human workers are the backup plan in case the computer can't do quality work, and not the other way around. This applies to both of their cases, coarse-grained and fine-grained segmentation. I would like to see future work where the primary work is done by humans first, testing how effective pulling the plug on the humans would be and where productivity would increase. This would have to be work that is purely in the human domain (i.e., not regular office work, because that is easily automatable).

  1. What are examples of work where we pull the plug on the human first, rather than pulling the plug on the computer?
  2. It's an interesting turnaround that we are using AI effort to determine quality and decide when to bring humans in, rather than improving the AI of the original task itself. What other tasks could you apply this to, where there are existing AI methods but an AI way of determining quality and deciding when to bring in humans would be useful?
  3. How would you set up a segmentation workflow (or another application's workflow) where, when you pull the plug on the computer or the human, you give the best result so far to the other for improvement, rather than starting over from scratch?


03/04/2020 – Palakh Mignonne Jude – Pull the Plug? Predicting If Computers or Humans Should Segment Images

SUMMARY

The authors of this paper aim to build a prediction system capable of determining whether the segmentation of images should be done by humans or computers, keeping in mind that there is a fixed budget of human annotation effort. They focus on the task of foreground object segmentation. They utilized image datasets from varied domains, such as the Biomedical Image Library with 271 grayscale microscopy image sets, Weizmann with 100 grayscale everyday object images, and Interactive Image Segmentation with 151 RGB everyday object images, with the aim of showcasing the generalizability of their technique. They developed a resource allocation framework, 'PTP', that predicts whether to 'Pull The Plug' on machines or humans. They conducted studies on both coarse and fine-grained segmentation. The 'machine' algorithms were selected from among the algorithms currently used for foreground segmentation, such as Otsu thresholding, the Hough transform, etc. The regression model was built using multiple linear regression. The 522 images from the three datasets mentioned earlier were given to crowd workers from AMT for coarse segmentation. The authors found that their proposed system was able to eliminate 30-60 minutes of human annotation time.
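For readers unfamiliar with those baseline algorithms, all three are standard and available in OpenCV. Here is a minimal sketch (the file name and all parameter values are illustrative, not taken from the paper):

```python
import cv2
import numpy as np

gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Otsu thresholding: automatically picks a single global threshold.
_, otsu_mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding: per-neighborhood thresholds, robust to uneven lighting.
adaptive_mask = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 2)

# Hough transform with circles: fits circles, suited to roundish objects
# such as cells in microscopy images.
circles = cv2.HoughCircles(cv2.medianBlur(gray, 5), cv2.HOUGH_GRADIENT,
                           dp=1, minDist=20, param1=100, param2=30,
                           minRadius=5, maxRadius=100)
hough_mask = np.zeros_like(gray)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(hough_mask, (int(x), int(y)), int(r), 255, thickness=-1)
```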

REFLECTION

I liked the idea of the proposed system that capitalizes on the strengths of both humans and machines and aims to identify when the skill of one or the other is better suited to the task at hand. It reminded me of reCAPTCHA (as highlighted by the paper 'An Affordance-Based Framework for Human Computation and Human-Computer Collaboration'), which also utilized multiple affordances (both human and machine) in order to achieve a common goal.

I found it interesting to learn that this system was able to eliminate 30-60 minutes of human annotation time. I believe that if such a system were used effectively, it would enable developers to build systems faster and ensure that human effort is not wasted. I thought it was good that the authors attempted to incorporate variety when selecting their datasets; however, it would have been interesting if they had combined these datasets with a few more containing more complex images (ones with many objects that could be in the foreground). I also liked that the authors published their code as an open-source repository for future extensions of their work.

QUESTIONS

  1. As part of this study, the authors focus on foreground segmentation. Would the proposed system extend well to other object segmentation tasks, or would the quality of the segmentation and the performance of the system be hampered in any way?
  2. While the authors have attempted to demonstrate the generalizability of their system by utilizing different datasets, the Weizmann and BU-BIL datasets were grayscale images with relatively clear foreground objects. If the images contained multiple objects, would the amount of time this system eliminates be as high? Is there any relation between the difficulty of the annotation task and the success of this system?
  3. Have there been any new systems (since this paper was published) that attempt to build on the methodology proposed by the authors? What modifications/improvements could be made to the proposed system (if any improvement is possible)?


03/04/20 – Sukrit Venkatagiri – Pull the Plug?

Paper: Danna Gurari, Suyog Jain, Margrit Betke, and Kristen Grauman. 2016. Pull the Plug? Predicting If Computers or Humans Should Segment Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 382–391.

Summary: 
This paper proposes a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher-quality segmentations for a given batch of images and methods. The framework uses a "pull-the-plug" model, predicting when to use human versus computer annotators. More specifically, the paper proposes a system that intelligently allocates computer effort to replace human effort for initial coarse segmentations. Second, it automatically identifies images for humans to re-annotate by predicting which images the automated methods did not segment well enough. This method could be used for a variety of use cases, and the paper tests it on three datasets and eight segmentation methods. The findings show that the method significantly outperformed prior work across a variety of metrics, ranging from quality prediction and initial segmentation to fine-grained segmentation and cost.

Reflection:
Overall, this was an interesting paper to read, one that is largely focused on performance and accuracy. The paper shows that the methods are superior to prior work and now represent the state of the art for image segmentation on these three datasets, and for saving costs.

I wonder what this paper might have looked like if it were more focused on creativity and innovation rather than performance and cost savings. For example, in HCI there are studies of using crowds to generate ideas, solve mysteries, and critique designs. Perhaps this approach could be used in a way where humans and machines provide suggestions and build off of each other.

More specifically, related to this paper, I wonder how the results would generalize to datasets other than the three used here, or to real-world applications such as self-driving cars. Certainly, a lot more work needs to be done, and the system would need to be real-time, meaning human computation might not be a feasible method for self-driving cars. Though it could certainly be used to generate training datasets for self-driving car algorithms.

This entire approach relies on the proposed prediction module, and it would be interesting to explore other edge cases where the predictions are better made by humans rather than through machine intelligence.

Finally, the finding that the computer segmented images more similarly to experts than crowd workers did was interesting, and I wonder why. Was it because the computer algorithms were trained on expert-generated training sets? Perhaps the crowd workers would perform better over time or with training; in that case, the results might have been better overall when combining the two.

Questions:

  1. How might you use this approach in your class project?
  2. Where does CV fail and where can humans augment it? What about the reverse?
  3. What are the limitations of a “pull-the-plug” approach, and how can they be overcome?
  4. Where else might this approach be used?


03/04/2020 – Sushmethaa Muhundan – Pull the Plug? Predicting If Computers or Humans Should Segment Images

The paper proposes a resource allocation framework that intelligently distributes work between humans and an AI system in the context of foreground object segmentation. The study demonstrates the advantages of using a mix of both humans and AI rather than either alone. The goal is to produce high-quality object segmentation results while using considerably less human effort. Two systems are implemented as part of this paper that automatically decide when to transfer control from the human to the AI component and vice versa, depending on the quality of segmentation encountered at each phase. The first system eliminates the need for human annotation effort by replacing humans with computers to generate coarse object segmentations, which are then refined by segmentation tools. The second system predicts the quality of the annotations and automatically identifies a subset that needs to be re-annotated by humans. Three diverse datasets were used to train and validate the system, including datasets of visible, phase contrast microscopy, and fluorescence microscopy images.

The paper explores leveraging the complementary strengths of humans and AI and allocates resources accordingly in order to reduce human involvement. I particularly liked the focus on quality throughout the paper. This mixed-approach system ensures that the quality of traditional systems, which relied heavily on human involvement, is still met. The resulting system was able to cut significant hours of human effort while maintaining the quality of the resulting foreground object segmentations, which is great.

Another aspect of the paper that I found impressive was the conscious effort to develop a single prediction model applicable across different domains; three diverse datasets were employed as part of this initiative. The paper talks about the disadvantages of other systems that do not work well on multiple datasets: in such cases, only a domain expert or computer vision expert can predict when the system will succeed, which this paper claims to avoid altogether. Also, the decision to intentionally involve humans only once per image is good, as opposed to existing systems where human effort is required multiple times during the initial segmentation phase of each image.

  1. This paper primarily focuses on reducing human involvement in the context of foreground object segmentation. What other applications can extend the principles of this system to achieve reduced involvement of humans in the loop while ensuring that quality is not affected?
  2. The system deals with predicting the quality of image segmentation outputs and involves the human to re-annotate only the lowest quality ones. What other ideas can be employed to ensure reduced human efforts in such a system?
  3. The paper implies that the system proposed can be applied across images from multiple domains. Were the three datasets described varied enough to ensure that this is a generalized solution?
