Summary:
The main objective of this paper is to investigate the feasibility of using crowd workers to locate and assess sidewalk accessibility problems in Google Street View imagery. To this end, the author conducted two studies on finding and labeling sidewalk accessibility problems. The first study demonstrates that the labeling task is feasible, defines what good labeling performance looks like, and provides verified ground truth labels that can be used to assess the performance of crowd workers. The paper then evaluates annotation correctness at two discrete levels of granularity: image level and pixel level. The former checks for the absence or presence of a label, while the latter examines the labels more precisely, in a way related to image segmentation work in computer vision. Finally, the paper discusses quality control mechanisms, which include statistical filtering, an approach for revealing effective performance thresholds for eliminating poor-quality turkers, and a verification interface, which is a subjective approach to validating labels.
Reflection:
The most impressive part of this paper is the feasibility study, Study 1. This study not only investigates the feasibility of the labeling work but also provides a standard for good labeling performance and produces validated ground truth labels, which can be used to evaluate the crowd workers’ performance. This pre-study provides the clues, directions, and even the evaluation metrics for the later experiment. It yields the most valuable information for the early stage of the research with very little workload and effort. I think a common problem in research is that we put a lot of effort into driving a project forward instead of first investigating its feasibility. As a result, we get stuck on problems that we could have foreseen had we conducted a pre-study.
However, I don’t think the pixel-level assessment is a good idea for this project. The labeling task does not require such high accuracy for the inaccessible area, and marking that area in units of pixels is too precise. As the paper’s table of pixel-level agreement results shows, the area overlap for both binary and multiclass classification is no more than 50%. Moreover, the author considers even a 10-15% overlap agreement at the pixel level sufficient to localize problems in images, which leaves me more confused about whether the author wants an accurate evaluation or not.
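To make the overlap numbers concrete, here is a minimal sketch of a pixel-level overlap computation. It assumes overlap is measured as the area of intersection divided by the area of union of two labeled regions; the paper’s exact measure may differ. The helper names `rect_pixels` and `overlap_ratio` and the two rectangles are hypothetical, chosen only for illustration.

```python
def rect_pixels(x0, y0, x1, y1):
    """Return the set of (x, y) pixels in the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def overlap_ratio(a, b):
    """Intersection over union of two pixel sets; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two hypothetical workers outline the same problem, offset by 5 pixels.
worker_a = rect_pixels(0, 0, 10, 10)   # 100 pixels
worker_b = rect_pixels(5, 0, 15, 10)   # 100 pixels, shifted right by 5

ratio = overlap_ratio(worker_a, worker_b)
print(f"overlap: {ratio:.2f}")
```

In this toy case, two outlines of equal size that are offset by half their width overlap by only about 33%, which hints at why pixel-level agreement can look low even when two workers mark roughly the same spot.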
Finally, for our final project, it is worth thinking about the number of crowd workers we need for the task and about the accuracy of turkers per job. The paper points out that performance improves with turker count, but these gains diminish in magnitude as group size grows. Thus, we should work out the trade-off between accuracy and cost so that we can make a better-informed choice about how many workers to hire.
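As a starting point for thinking about this trade-off, the sketch below computes the accuracy of a majority vote among n workers. It rests on assumptions that are not from the paper: the labels are aggregated by majority vote, every worker is independently correct with the same probability p, the decision is binary, and p = 0.8 is a made-up value. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(n, p):
    """Probability that a majority of n independent workers is correct.

    Assumes n is odd (so there are no ties) and that each worker is
    correct with the same probability p.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


P_WORKER = 0.8  # hypothetical per-worker accuracy, not a figure from the paper
group_sizes = [1, 3, 5, 7, 9]
accuracies = [majority_accuracy(n, P_WORKER) for n in group_sizes]
# Accuracy gained by each step up in group size (1 to 3, 3 to 5, ...).
gains = [b - a for a, b in zip(accuracies, accuracies[1:])]

for n, acc in zip(group_sizes, accuracies):
    print(f"{n} workers: {acc:.3f}")
```

Under these assumptions, accuracy rises from 0.800 with one worker to 0.896 with three and about 0.942 with five, so each additional pair of workers buys less. Real workers are neither independent nor equally skilled, so the numbers are only indicative.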
Questions:
- What do you think about the approach of this paper? Do you believe a pre-study is valuable? Would you apply one in your research?
- What do you think about the metric the author used for evaluating the labeling performance? What other metrics would you apply to assess the rate of overlap between labeled areas?
- Have you ever considered how many turkers you need to hire to meet the accuracy your task requires? How would you estimate this number?
Word Count: 578