03/04/2020 – Palakh Mignonne Jude – Pull the Plug? Predicting If Computers or Humans Should Segment Images

SUMMARY

The authors of this paper aim to build a prediction system that determines whether images should be segmented by humans or by computers, given a fixed budget of human annotation effort. They focus on the task of foreground object segmentation. To showcase the generalizability of their technique, they used image datasets from varied domains: the Biomedical Image Library (BU-BIL) with 271 grayscale microscopy images, Weizmann with 100 grayscale everyday object images, and Interactive Image Segmentation with 151 RGB everyday object images. They developed a resource allocation framework, ‘PTP’, that predicts whether to ‘Pull The Plug’ on machines or humans. They conducted studies on both coarse and fine-grained segmentation. The ‘machine’ algorithms were selected from among those currently used for foreground segmentation, such as Otsu thresholding and the Hough transform. The prediction model was built using multiple linear regression. The 522 images from the three datasets mentioned earlier were given to crowd workers from AMT to perform coarse segmentation. The authors found that their proposed system was able to eliminate 30-60 minutes of human annotation time.
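
To make the budget idea concrete, here is a minimal sketch of how such an allocation step might look. It is not the authors' implementation: the function name `allocate`, the image ids, and the scores are all hypothetical, and I am assuming that a regression model has already produced a predicted quality score for each machine segmentation and that the images with the lowest predicted scores are the ones handed to humans.

```python
def allocate(predicted_quality, human_budget):
    """Split images between humans and machines under a fixed human budget.

    predicted_quality: dict mapping image id -> predicted quality score of
        the machine segmentation (higher is better). Assumed to come from a
        previously trained regression model, which is not shown here.
    human_budget: number of images humans can afford to annotate.
    """
    if human_budget < 0:
        raise ValueError("human_budget must be non-negative")
    # Rank images from lowest to highest predicted machine quality.
    ranked = sorted(predicted_quality, key=predicted_quality.get)
    # The images where the machine is expected to do worst go to humans.
    human_ids = ranked[:human_budget]
    machine_ids = ranked[human_budget:]
    return human_ids, machine_ids


# Hypothetical scores for four images, with budget for two human annotations.
scores = {"img_a": 0.91, "img_b": 0.35, "img_c": 0.78, "img_d": 0.12}
humans, machines = allocate(scores, human_budget=2)
print(humans)    # ['img_d', 'img_b']
print(machines)  # ['img_c', 'img_a']
```

With a budget of two, the two images with the weakest predicted machine results are sent to humans, and the machine output is kept for the rest.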

REFLECTION

I liked the idea of the proposed system, which capitalizes on the strengths of both humans and machines and aims to identify when the skill of one or the other is better suited to the task at hand. It reminded me of reCAPTCHA (as highlighted by the paper ‘An Affordance-Based Framework for Human Computation and Human-Computer Collaboration’), which also utilizes multiple affordances (both human and machine) in order to achieve a common goal.

I found it interesting to learn that this system was able to eliminate 30-60 minutes of human annotation time. I believe that if such a system were used effectively, it would enable developers to build systems faster and ensure that human effort is not wasted. I thought it was good that the authors attempted to incorporate variety when selecting their datasets. However, I believe it would have been interesting if the authors had also included a few datasets with more complex images (ones with many objects that could have been in the foreground). I also liked that the authors have published their code as an open source repository for future extensions of their work.

QUESTIONS

  1. As part of this study, the authors focus on foreground segmentation. Would the proposed system extend well to other kinds of object segmentation, or would the quality of the segmentation and the performance of the system be hampered in any way?
  2. While the authors have attempted to indicate the generalizability of their system by utilizing different datasets, the Weizmann and BU-BIL datasets consist of grayscale images with relatively clear foreground objects. If the images were to contain multiple objects, would the amount of time that this system eliminated be as high? Is there any relation between the difficulty of the annotation task and the success of this system?
  3. Have there been any new systems (since this paper was published) that attempt to build on top of the methodology proposed by the authors in this paper? What modifications could be made to the proposed system to improve it (if any improvement is possible)?
