03/04/2020- Bipasha Banerjee – Pull the Plug? Predicting If Computers or Humans Should Segment Images

Summary

The paper by Gurari et al. discusses image segmentation and asks when segmentation should be done by humans and when a machine-only approach is sufficient. The work described in this paper is interdisciplinary, involving computer vision and human computation. The authors consider both fine-grained and coarse-grained segmentation to determine where humans or machines perform better. Their "pull the plug" (PTP) framework predicts whether the labeling of an image should come from a human or a machine, along with the quality of the resulting label. The prediction framework is a regression model that captures segmentation quality. The training data was populated with masks reflecting segmentation quality, produced by three algorithms: the Hough transform with circles, Otsu thresholding, and adaptive thresholding. For labels, the Jaccard index was used to indicate the quality of each instance. Nine features derived from the binary segmentation mask were proposed to predict segmentation failure. The authors found that a mixed approach performed better than relying entirely on either humans or computers. 
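Two of the building blocks mentioned above are easy to sketch: the Jaccard index (intersection over union) used as the quality label, and Otsu thresholding, one of the three machine segmenters. The following is an illustrative NumPy reimplementation, not the authors' released code; the toy image and the "human" mask are invented for the example:

```python
import numpy as np

def jaccard_index(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union of two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)

def otsu_threshold(gray: np.ndarray) -> int:
    """Exhaustive Otsu: pick the threshold maximizing between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += int(hist[t])          # pixels with intensity <= t
        if w0 == 0:
            continue
        w1 = total - w0             # pixels with intensity > t
        if w1 == 0:
            break
        sum0 += t * int(hist[t])
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy bimodal "image": dark background on the left, bright object on the right.
img = np.array([[10, 12, 200, 210],
                [11, 13, 205, 220],
                [ 9, 14, 198, 215],
                [10, 12, 202, 208]], dtype=np.uint8)
t = otsu_threshold(img)
machine_mask = img > t              # machine segmentation
human_mask = np.zeros_like(machine_mask)
human_mask[:, 2:] = True            # hypothetical human "ground truth"
print(jaccard_index(machine_mask, human_mask))  # 1.0 on this easy toy image
```

On a clean bimodal image like this toy one, the machine mask matches the human mask perfectly (Jaccard index 1.0); the paper's point is that on harder images the machine score drops, and the regression model tries to predict that drop so humans are called in only where needed.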

Reflection 

The use of machines vs. humans is a complex debate. Leveraging both machine and human capabilities is necessary for efficiency when dealing with “big data.” The paper aims to determine when computers can be used to create coarse-grained segmentations and when humans should take over for fine-grained work. I liked that the authors published the code; this helps the advancement of research and reproducibility.

The authors have used three datasets, but all are image-based. In my opinion, identifying bounding boxes in images is a relatively simple task. I work with text, and I have observed that segmenting large amounts of text is not simple. Most of the available tools fail to segment long documents like ETDs effectively. Nonetheless, segmentation is an important task, and I am intrigued to see how this work could be extended to text. 

Using crowd workers can be tricky. Although Amazon Mechanical Turk allows requesters to specify workers’ experience, quality, etc., the time taken by a worker can vary depending on many factors. Would humans familiar with the dataset or the domain annotate faster? This needs careful thought, in my opinion, especially when we are trying to compete against machines. Machines are fast and good at handling vast amounts of data, whereas humans are good at accuracy. This paper highlights the old trade-off between accuracy and speed.

Questions

  1. The segmentation has been done on datasets with images. How does this extend to text? 
  2. Would experts on the topic or people familiar with databases require less time to annotate?
  3. Although three datasets have been used, I wonder if the domain matters. Would complex images affect the accuracy of machines?

4 thoughts on “03/04/2020- Bipasha Banerjee – Pull the Plug? Predicting If Computers or Humans Should Segment Images”

  1. Hello! I found the idea of extending image segmentation to text segmentation interesting. I had similar questions about how this can be extended to other areas when I was reading the paper. I feel that while this paper is a good start in the direction of reducing human efforts whenever feasible, it is still limited to the area of image segmentation. Future work is definitely needed to extend these concepts to other areas. For instance, the idea of using the AI component to produce results, sorting these based on quality and involving the humans only for the lower quality results could be incorporated in the field of text segmentation.

  2. Hi Bipasha, interesting questions! I’m not sure what you mean by doing segmentation on text. How would that work? Do you mean named entity recognition and tagging? I certainly think the approach could be applied to other domains, such as video and audio waveforms. This relates to your third question, in that the domain really does matter. What counts as accuracy and precision in one domain might be different in another.

    1. Hi Sukrit! Thanks for the comment. By segmentation on text, I mean breaking a large piece of text, such as a book, into smaller segments like chapters. There are tools available like Grobid, but we have found that their performance at detecting chapters is not great. Hence, for such tasks, it would be useful to decide when to “pull the plug” and use humans instead of machines.

      By domain, I was referring to images that are hard to detect. Medical images mostly come to mind, though I know that the authors had trained on biomedical images. I was wondering whether certain images are always difficult for machines to segment (be it fine-grained or coarse-grained), making humans the only good option.

      1. This might be interesting: CrowdForge: Crowdsourcing Complex Work (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/39980.pdf). Here, they actually do the opposite: write an article by combining separate chunks. Maybe adopting something similar could work. Seed the process with some high-level topics, assign different groups of workers to find content related to each topic, and then break the content down by topic. I wouldn’t be surprised if this has been tried before.

        Slightly unrelated, this is a famous example of crowdsourcing being used for shortening, proofreading, and editing parts of text documents: http://up.csail.mit.edu/other-pubs/soylent.pdf
