03/04/20 – Lulwah AlKulaib – CrowdStreetView

Summary

The authors assess the accessibility of sidewalks by hiring AMT workers to analyze Google Street View images. Traditionally, sidewalk assessment is conducted in person via street audits, which are highly labor intensive and expensive, or through reporting calls from citizens. The authors propose their system as a proactive alternative to these approaches. They perform two studies:

  • A feasibility study (Study 1): examines the feasibility of the labeling task with six dedicated labelers, including three wheelchair users
  • A crowdsourcing study (Study 2): investigates the comparative performance of turkers

In study 1, since labeling sidewalk accessibility problems is subjective and potentially ambiguous, the authors investigate the viability of labeling across two groups:

  • Three members of the research team
  • Three wheelchair users – accessibility experts

They use the results of Study 1 to provide ground truth labels for evaluating crowd workers' performance and to get a baseline understanding of what labeling this dataset looks like. In Study 2, the authors investigate the potential of using crowd workers to perform the labeling task. They evaluate performance at two levels of labeling accuracy:

  • Image level: tests for the presence or absence of the correct label in an image 
  • Pixel level: examines the pixel level accuracies of the provided labels

They show that AMT workers are capable of finding accessibility problems with an accuracy of 80.6% and determining the correct problem type with an accuracy of 78.3%. With majority voting as a labeling technique, these numbers improve to 86.9% and 83.9%, respectively. In total, they collected 13,379 labels and 19,189 verification labels from 402 workers. Their findings suggest that crowdsourcing both the labeling task and the verification task leads to better quality results.
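As an aside, here is a minimal sketch of the majority-voting idea: aggregating binary image-level labels from several workers and scoring them against ground truth. The worker labels below are hypothetical, and this is my own illustration rather than the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical worker labels: image_id -> list of binary labels
# (1 = "accessibility problem present", 0 = "no problem") from different turkers.
worker_labels = {
    "img_001": [1, 1, 0, 1, 1],
    "img_002": [0, 0, 1, 0, 0],
    "img_003": [1, 0, 1, 1, 0],
}
ground_truth = {"img_001": 1, "img_002": 0, "img_003": 1}

def majority_vote(labels):
    """Return the most common label; ties are broken toward the smaller label."""
    counts = Counter(labels)
    return max(sorted(counts), key=counts.get)

predictions = {img: majority_vote(votes) for img, votes in worker_labels.items()}
accuracy = sum(predictions[i] == ground_truth[i] for i in ground_truth) / len(ground_truth)
print(predictions, accuracy)
```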

Reflection

The authors selected wheelchair users as the experts in this paper, when in real life such audits are performed by civil engineers. I wonder how that would have changed their labels and results, since street accessibility is not only a concern for wheelchair users. It would be worth investigating with a pool of experts from multiple backgrounds.

I also think that curating the dataset of photos was a requirement for this labeling system; otherwise it would have meant a tedious amount of work on “bad” images. I can’t imagine how this would scale to Google Street View as a whole, since the dataset requires refinement before it can be labeled.

In addition, the focal point of the camera was not considered, which reduces the scalability of the project. Even though the authors suggest installing a camera angled toward sidewalks, until that is implemented I don’t see how this model could work well in the real world rather than in a controlled experiment.

Discussion

  • What improvements could the authors have made to their analysis?
  • How would their labeling system work for random Google Street View photos?
  • How would the focal point of the GSV camera affect the labeling? 
  • If cameras were angled towards sidewalks, and we were able to get a huge amount of photos for analysis, what would be a good way to implement this project?


03/04/20 – Jooyoung Whang – Pull the Plug? Predicting If Computers or Humans Should Segment Images

In this paper, the authors attempt to appropriately distribute human and computer effort for segmenting foreground objects in images in order to achieve highly precise segmentations. They explain that the segmentation process consists of roughly segmenting the image (initialization) and then going through another fine-grained iteration to come up with the final result. They repeat their study for both steps. To figure out where to allocate human effort, the authors propose an algorithm that scores the acquired segmentations by detecting highly jagged boundary edges, non-compact segmentations, segmentations located near the image edge, and the segmentation’s area ratio relative to the full image. The authors find that a mix of humans and computers performs better at image segmentation than using either one exclusively.
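To make the scoring idea concrete, here is a rough sketch of computing mask-based cues like boundary jaggedness, compactness, proximity to the image edge, and area ratio from a binary segmentation mask. This is my own simplification; the exact feature definitions and weights in the paper differ.

```python
import numpy as np

def segmentation_cues(mask: np.ndarray) -> dict:
    """Compute simple quality cues from a binary foreground mask (1 = foreground).

    These roughly mirror the kinds of visible features the paper describes
    (jagged boundaries, non-compact shapes, near-edge placement, area ratio),
    but the definitions here are illustrative only.
    """
    mask = mask.astype(bool)
    area = mask.sum()
    if area == 0:
        return {"area_ratio": 0.0, "perimeter": 0, "compactness": 0.0, "touches_edge": False}

    # Perimeter proxy: foreground pixels with at least one background 4-neighbor.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())

    # Isoperimetric-style compactness: higher for blob-like shapes,
    # lower for jagged or spread-out masks.
    compactness = 4 * np.pi * area / (perimeter ** 2) if perimeter else 0.0

    return {
        "area_ratio": float(area / mask.size),      # fraction of image covered
        "perimeter": perimeter,                     # proxy for boundary jaggedness
        "compactness": float(compactness),
        "touches_edge": bool(mask[0, :].any() or mask[-1, :].any()
                             or mask[:, 0].any() or mask[:, -1].any()),
    }

# Example: a small square blob in a 20x20 image.
demo = np.zeros((20, 20), dtype=int)
demo[5:12, 5:12] = 1
print(segmentation_cues(demo))
```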

I liked the authors’ proposed algorithm for detecting when a segmentation fails. It was interesting to see that they focused on visible features and qualities that humans can see instead of relying on deep neural networks, whose internal workings are often hard to interpret. At the same time, I am a little concerned about whether the proposed visual features of failed segmentations are enough to generalize and scale to all kinds of images. For example, the authors note that failed segmentations often have highly jagged edges. What if the foreground object (an animal, in this case) were a porcupine? The score would be fairly low even when an algorithm correctly segments the creature from the background. Of course, the paper reports that the method generalized well for everyday images and biomedical images, so my concern may be a trivial one.

As I am not experienced in the field of image segmentation, I wondered whether there are cases where an image contains more than one foreground object and only one of them is of interest to a researcher. From my limited knowledge of foreground and background separation, a graph search is done by treating the image as a graph of connected pixels to find pixels that stand out; it does not care about “objects of interest.” It made me curious whether it is possible to inject additional semantic information into the process.
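Out of curiosity, here is a small toy sketch of how the “graph of connected pixels” idea can encode an object of interest through user-chosen seeds. This is my own illustrative example of seeded min-cut segmentation with networkx, not the paper’s method; the image and seed positions are made up.

```python
import numpy as np
import networkx as nx

def seeded_graph_cut(image, fg_seeds, bg_seeds, sigma=0.1):
    """Toy seeded segmentation: pixels are nodes, 4-neighbor edges are weighted by
    intensity similarity, and seed pixels are tied to source/sink terminals.
    The min cut separates the seeded object of interest from everything else."""
    h, w = image.shape
    G = nx.DiGraph()
    big = 1e6  # effectively infinite capacity for seed links

    def weight(a, b):
        return float(np.exp(-((image[a] - image[b]) ** 2) / (2 * sigma ** 2)))

    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx_ = y + dy, x + dx
                if ny < h and nx_ < w:
                    wgt = weight((y, x), (ny, nx_))
                    G.add_edge((y, x), (ny, nx_), capacity=wgt)
                    G.add_edge((ny, nx_), (y, x), capacity=wgt)

    for p in fg_seeds:
        G.add_edge("SRC", p, capacity=big)
    for p in bg_seeds:
        G.add_edge(p, "SNK", capacity=big)

    _, (reachable, _) = nx.minimum_cut(G, "SRC", "SNK")
    mask = np.zeros((h, w), dtype=int)
    for node in reachable:
        if node != "SRC":
            mask[node] = 1
    return mask

# Two bright blobs on a dark background; the seeds select only the left one.
img = np.zeros((8, 8))
img[1:4, 1:4] = 1.0   # object of interest
img[5:8, 5:8] = 1.0   # another foreground object we do not care about
print(seeded_graph_cut(img, fg_seeds=[(2, 2)], bg_seeds=[(0, 7), (6, 6)]))
```

The seeds are what carry the “interest”: marking the second blob as background keeps it out of the result even though it is also a foreground object.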

The following are the questions I had while reading the paper:

1. Do you think the qualities that PTP looks for are enough to score the quality of segmented images? What other properties would a failed segmentation have? One quality I can think of is that failed segmentations often have disjoint parts.

2. Can you think of some cases where PTP could fail? Would there be any case where a segmentation scores very low even though it was done correctly?

3. As I’ve written in my reflection, are there methods that allow segmentation algorithms to consider “interest” in an object? For example, if an image contained a car and a cat both in the foreground and the researcher was interested in the cat, would the algorithm be able to separate out only the cat?


03/04/20 – Lulwah AlKulaib – SocialAltText

Summary

The authors propose a system that generates alt text for images embedded in social media posts by utilizing crowd workers. Their goal is a better experience for blind and visually impaired (BVI) users of social media. Existing tools provide imperfect descriptions, some via automatic caption generation and others via object recognition. These systems are not enough, as in many cases their results aren’t descriptive enough for BVI users. The authors study how crowdsourcing can be used for both:

  • Evaluating the value provided by existing automated approaches
  • Enabling workflows that provide scalable and useful alt text for BVI users

They utilize real-time crowdsourcing to run experiments with varying depths of crowd interaction in assisting visually impaired users. They show the shortcomings of existing AI image captioning systems and compare them with their method. The paper proposes two experiences:

  • TweetTalk: a conversational assistant workflow.
  • Structured Q&A: a workflow that builds upon and enhances state-of-the-art generated captions.

They evaluated the conversational assistant with 235 crowd workers. They evaluated 85 tweets for the baseline image caption; each tweet was evaluated three times, for a total of 255 evaluations.

Reflection

The paper presents a novel concept, and their approach is a different take on utilizing crowd workers. I believe the experiment would have worked better if it had been tested on visually impaired users. Since the crowd workers hired were not visually impaired, it is harder to say that BVI users would have the same reaction. Since the study targets BVI users, they should have been the pool of testers. People interact with the same element in different ways, and what the authors showed seemed too controlled. Also, the questions were not the same across all images, which makes the results harder to generalize. The presented model tries to solve a problem for social media photos, and not having a plan for repeating the process on each photo might make interpreting images difficult.

I appreciated the authors’ use of existing systems and their attempt at improving AI-generated captions. Their results achieve better accuracy compared to state-of-the-art work.

I would have loved to see how different social media applications compare with each other, since they vary in how they present photos. Twitter, for example, allows a limited character count, while Facebook can present more text, which might help BVI users understand the image better.

In the limitations section, the authors mention that human-in-the-loop workflows raise privacy concerns and that the alt text approach would generalize to friendsourcing and utilizing social network users. I wonder how that generalizes to social media applications in real time, and how reliable friendsourcing would be for BVI users.

Discussion

  • What improvements would you suggest to make the TweetTalk experiment better?
  • Do you know of any applications that use humans in the loop in real time?
  • Would you have any privacy concerns if a social media application integrated a human-in-the-loop approach to help BVI users?


03/04/2020 – Ziyao Wang – Combining crowdsourcing and google street view to identify street-level accessibility problems

In this paper, the authors focus on a mechanism that uses untrained crowd workers to find and label accessibility problems in Google Street View imagery. They provide workers with images from Google Street View and let them find, label, and assess sidewalk accessibility problems. They compare the results of this labeling task completed by six dedicated labelers, including three wheelchair users, with those of MTurk workers. The comparison shows that crowd workers can determine the presence of an accessibility problem with high accuracy, which means this mechanism is promising for sidewalk accessibility assessment. However, the mechanism still has problems, such as locating the GSV camera in geographic space, selecting an optimal viewpoint, measuring sidewalk width, and the age of the images. In the experiments, the workers could not label some of the images due to camera position, and some images may have been captured three years earlier. Additionally, there is no way to measure the width of the sidewalk, which wheelchair users need.

Reflections:

The authors combined Google Street View imagery and MTurk crowdsourcing to build a system that can detect accessibility challenges. This kind of hybrid system has high accuracy in finding and labeling such accessibility challenges. If this system can be used practically, people with disabilities will benefit greatly from it.

However, there are some problems with the system. As mentioned in the paper, the images in Google Street View are old; some may have been captured years ago. If detection is based on these pictures, some newly arisen accessibility problems will be missed. For this problem, I have a rough idea: let the users of the system update the image library. When they find a difference between the library images and the actual sidewalk, they can upload newer pictures captured by themselves. As a result, other users will not suffer from the image-age problem. However, this solution would change the whole system. Google Street View imagery requires professional capture devices that are not available to most users, so Google Street View will not update its imagery with user-captured photos, and the system cannot update itself through that imagery. Instead, the system would have to build its own image library, which is totally different from the system introduced in the paper. Additionally, the photos provided by users may have low resolution, which would make it difficult for MTurk workers to label the accessibility challenges.

Similarly, the problem that workers cannot measure the width of the sidewalk could be solved if users uploaded the width while using the system. However, this still faces the problem of needing the system’s own database, and the system would have to be modified heavily.

Instead of detecting accessibility challenges, I think the system would be more useful for tracking and labeling bike lanes. Compared with sidewalk accessibility, detecting the existence of bike lanes suffers less from the image-age problem, because even if the bike lanes were built years ago, they still work. Also, there is no need to measure the width of the lanes, as all lanes should have enough space for bikes to pass.

Questions:

Is there any approach to solve the image-age problem, the camera viewpoint problem, and the width-measurement problem in the system?

What do you think about applying such a system to track and label bike lanes?

What other kinds of street-detection problems could this system be applied to?


03/04/2020 – Ziyao Wang – Real-time captioning by groups of non-experts

Traditional real-time captioning tasks are completed by professional captionists. However, hiring them is expensive. Alternatively, some automatic speech recognition systems have been developed, but these systems perform badly when the audio quality is low or multiple people are talking. In this paper, the authors developed a system that hires several non-expert workers to do the captioning task and merges their work to obtain high-accuracy captions. Because the workers are paid significantly less than experts, the cost is reduced even though multiple workers are hired. Also, the system does a good job of collecting the workers’ contributions and merging them into a high-accuracy output with low latency.

Reflections:

When solving problems that require high accuracy and low latency, I had always held the view that only AI or experts could complete such tasks. However, in this paper, the authors show that non-experts can also complete this kind of task if we have a group of people work together.

Compared with professionals, hiring non-experts costs much less. Compared with AI, people can handle complicated situations better. This system combines these two advantages and provides a cheap real-time captioning system with high accuracy.

It is true that this system has many advantages, but we should still consider it critically. On cost, hiring non-experts is much cheaper than hiring professional captionists. However, the system needs to hire 10 workers to reach 80 to 90 percent accuracy. Even if the workers have a low wage, for example 10 dollars per hour, the total cost reaches 100 dollars per hour. Hiring an expert costs only around 120 dollars per hour, which shows that the savings from applying the system are relatively small.
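A quick back-of-the-envelope sketch of that cost comparison (the rates are the illustrative numbers from this reflection, not figures from the paper):

```python
def hourly_cost(num_workers, wage_per_hour):
    """Total hourly cost of a captioning crew."""
    return num_workers * wage_per_hour

crowd_cost = hourly_cost(10, 10)    # 10 non-experts at ~$10/hr
expert_cost = hourly_cost(1, 120)   # one professional stenographer at ~$120/hr
hybrid_cost = hourly_cost(4, 10)    # e.g., 4 workers correcting an ASR draft
print(crowd_cost, expert_cost, hybrid_cost)  # 100 120 40
```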

On accuracy, there is a possibility that all 10 workers miss the same part of the audio. As a result, even after merging all of the workers’ results, the system will still miss that part of the caption. In contrast, though an ASR system may produce captions with errors, it can at least provide something for every word in the audio.

For these two reasons, I think hiring fewer workers, for example three to five, to fix the errors in a system-generated caption would save more money while maintaining high accuracy. With a provided caption, the workers’ tasks would be easier, and they might produce more accurate results. Also, in circumstances where the ASR system performs well, the workers would not need to spend time typing, and the latency of the system would be reduced.

Questions:

What are the advantages of hiring non-expert humans for captioning compared with experts or AI systems?

Would a system that hires fewer workers to fix the errors in an AI-generated caption be cheaper? Would it perform better?

For the system mentioned in the second question, does it have any limitations or drawbacks?


03/04/2020 – Palakh Mignonne Jude – Combining Crowdsourcing and Google Street View To Identify Street-Level Accessibility Problems

SUMMARY

The authors of this paper investigate the feasibility of recruiting MTurk workers to label and assess sidewalk accessibility problems as viewed through Google Street View. The authors conducted two studies: the first with 6 people (3 from their team of researchers and 3 wheelchair users), and the second investigating the performance of turkers. The authors created an interactive labeling interface as well as a validation interface (to help users accept/reject previous labels). They proposed different levels of annotation correctness comprising two spectra: a localization spectrum, which covers image-level and pixel-level granularity, and a specificity spectrum, which covers the amount of information evaluated for each label. They defined image-level correctness in terms of accuracy, precision, recall, and f-measure. To compute inter-rater agreement at the image level, they utilized Fleiss’ kappa. To evaluate the more challenging pixel-level agreement, they verified the labeling by showing that pixel-level overlap was greater between labelers on the same image than across different images. The authors used the labels produced in Study 1 as the ground truth dataset to evaluate turker performance. They also proposed two quality control approaches: filtering turkers based on a performance threshold and filtering labels based on crowdsourced validations.
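For reference, here is a small sketch of how image-level accuracy, precision, recall, F-measure, and Fleiss' kappa could be computed with standard libraries. The labels below are hypothetical, not the paper's data.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical image-level labels: 1 = "has an accessibility problem", 0 = "does not".
ground_truth = [1, 0, 1, 1, 0, 1]
turker_label = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(ground_truth, turker_label)
precision, recall, f1, _ = precision_recall_fscore_support(
    ground_truth, turker_label, average="binary")
print(f"acc={accuracy:.2f} p={precision:.2f} r={recall:.2f} f1={f1:.2f}")

# Inter-rater agreement: rows are images, columns are raters, values are labels.
ratings = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
table, _ = aggregate_raters(ratings)   # per-image counts of each category
print("Fleiss' kappa:", fleiss_kappa(table))
```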

REFLECTION

I really liked the motivation of this paper, especially given the large number of people who have physical disabilities. I am very interested to know how something like this would extend to other countries such as India, as it would greatly aid people with physical disabilities there, since many places have poor walking surfaces and no support for wheelchairs. I think that having such a system in place in India would definitely help disabled people be better informed about places they can visit.

I also liked the quality control mechanisms of filtering turkers and filtering labels, since these appear to be good ways to improve the overall quality of the labels obtained. I thought it was interesting that the performance of the system improved with turker count but the gains diminished in magnitude as the group size grew. I thought that the design of the labeling and verification interfaces was good and made it easy for users to perform their tasks.

QUESTIONS

  1. As indicated in the limitations section, this work ‘ignored practical aspects such as locating the GSV camera in geographical space and selecting an optimal viewpoint’. Has any follow-up study been performed that takes into account these physical aspects? How complex would it be to conduct such a study?
  2. The authors mention that image quality can be poor in some cases due to a variety of factors. How much of an impact would this have on the task at hand? Which labels would have been most affected if the image quality was very poor?
  3. The validation of labels was performed by crowd workers via the verification interface. Would there have been any change in the results obtained if experts had been used for the validation of labels instead of crowd workers (since they may have been able to identify more errors in the labels as compared to normal crowd workers)?


03/04/20 – Sukrit Venkatagiri – Pull the Plug?

Paper: Danna Gurari, Suyog Jain, Margrit Betke, and Kristen Grauman. 2016. Pull the Plug? Predicting If Computers or Humans Should Segment Images. 382–391. 

Summary: 
This paper proposes a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and methods. The framework uses a “pull-the-plug” model, predicting when to use human versus computer annotators. More specifically, the paper proposes a system that intelligently allocates computer effort to replace human effort for initial coarse segmentations. Second, it automatically identifies images for humans to re-annotate by predicting which images the automated methods did not segment well enough. This method could be used for a variety of use cases, and the paper tests it on three datasets and eight segmentation methods. The findings show that this method significantly outperformed prior work across a variety of metrics, ranging from quality prediction and initial segmentation to fine-grained segmentation and cost.

Reflection:
Overall, this was an interesting paper to read that is largely focused on performance and accuracy. The paper shows that the methods are superior to prior work and are now the state of the art for image segmentation on these three datasets, as well as for saving costs.

I wonder what this paper might have looked like if it was more focused on creativity and innovation, rather than performance and cost-savings. For example, in HCI there are studies of using crowds to generate ideas, solve mysteries, and critique designs. Perhaps this approach might be used in a way that humans and machines can provide suggestions and they build off of each other.

More specifically, related to this paper, I wonder how the results would generalize to datasets other than the three used here, or to real-world examples such as self-driving cars. Certainly, a lot more work needs to be done, and the system would need to be real-time, meaning human computation might not be a feasible method for self-driving cars. Still, it could certainly be used to generate training datasets for self-driving car algorithms.

This entire approach relies on the proposed prediction module, and it would be interesting to explore other edge cases where the predictions are better made by humans rather than through machine intelligence.

Finally, the finding that the computer segmented images more similarly to experts than crowd workers did was interesting, and I wonder why. Was it because the computer algorithms were trained on expert-generated training sets? Perhaps the crowd workers would perform better over time or with training. In that case, the results might have been better overall when combining the two.

Questions:

  1. How might you use this approach in your class project?
  2. Where does CV fail and where can humans augment it? What about the reverse?
  3. What are the limitations of a “pull-the-plug” approach, and how can they be overcome?
  4. Where else might this approach be used?


03/04/20 – Fanglan Chen – Real-time Captioning by Groups of Non-experts

Summary

Lasecki et al.’s paper “Real-time Captioning by Groups of Non-experts” explores a new approach that relies on a group of non-expert captionists to provide speech captions of good quality, and presents an end-to-end system called LEGION: SCRIBE which allows collective instantaneous captioning for live lectures on demand. In the speech captioning task, professional stenographers can achieve high accuracy, but their manual effort is very expensive and must be arranged in advance. For effective captioning, the researchers introduce the idea of having a group of non-experts caption audio and merging their inputs to achieve more accurate captions. The proposed SCRIBE has two components: an interface for real-time captioning designed to collect the partial captions from each crowd worker, and a real-time input combiner for merging the collective captions into a single output stream in real time. Their experiments show that the proposed solution is feasible and that non-experts can provide captioning of good quality and content coverage with short per-word latency. The proposed model can potentially be extended to allow dynamic groups to exceed the capacity of individuals in various human performance tasks.
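As a very rough illustration of the combining step, one could bucket each worker's timestamped words into short time windows and take a per-window majority vote. This is a deliberately simplified stand-in, not LEGION: SCRIBE's actual alignment algorithm, and the timestamps and words below are made up.

```python
from collections import Counter, defaultdict

def merge_partial_captions(worker_streams, window=1.0):
    """Naive combiner: group (timestamp, word) pairs from all workers into
    fixed-length time windows and keep the most common word in each window.
    Real systems align overlapping word sequences far more carefully."""
    buckets = defaultdict(list)
    for stream in worker_streams:
        for t, word in stream:
            buckets[int(t // window)].append(word.lower())
    merged = []
    for key in sorted(buckets):
        word, _ = Counter(buckets[key]).most_common(1)[0]
        merged.append(word)
    return " ".join(merged)

# Three hypothetical workers, each catching only part of the audio.
streams = [
    [(0.1, "the"), (1.2, "quick"), (3.3, "fox")],
    [(0.2, "the"), (2.1, "brown"), (3.2, "fox")],
    [(1.1, "quick"), (2.3, "brown")],
]
print(merge_partial_captions(streams))  # "the quick brown fox"
```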

Reflection

This paper conducts an interesting study of how to achieve better performance of a single task via collaborative efforts of a group of individuals. I think this idea aligns with ensemble modeling in machine learning. The idea presented in the paper is to generate multiple partial outputs (provided by team members and crowd workers) and then use an algorithm to automatically merge all of the noisy partial inputs into a single output. Similarly, ensemble modeling is a machine learning method where multiple diverse models are developed to generate or predict an outcome, either by using multiple different algorithms or using different training data sets. Then the ensemble model aggregates the output of each base model and generates the final output. The motivation for relying on a group of non-expert captionists to achieve better performance beyond the capacity of each non-expert corresponds to the idea of using ensemble models to reduce the generalization error and get more reliable results. As long as the base models are diverse and independent, the performance of the model increases when the ensemble approach is used. This approach also seeks the collaborative efforts of crowds in obtaining the final results. In both approaches, even though the model has multiple human/machine inputs as its sources, it acts and performs as a single model. I would be curious to see how ensemble models perform on the same task compared with the crowdsourcing proposed in the paper.
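To make the ensemble analogy concrete, here is a minimal sketch of hard-voting over diverse base models with scikit-learn. This is a generic illustration on synthetic data, not an experiment from either paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Diverse, independently trained base models -- analogous to independent captionists.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # majority vote, like merging several workers' outputs
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```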

In addition, I think the proposed framework may work well for general audio captioning, but I wonder how it would perform on domain-specific lectures. As we know, lectures in many domains, such as medical science, chemistry, and psychology, are expected to include terminology that might be difficult to capture by an individual without a professional background in the field. There could be cases where none of the crowd workers can type those terms correctly, resulting in an incorrect caption. I think the paper could be strengthened with a discussion of the situations in which the proposed method works best. To continue this point, another possibility is to leverage the advantages of pre-trained speech recognition models together with crowd workers to develop a human-AI team that achieves the desired performance.

Discussion

I think the following questions are worthy of further discussion.

  • Would it be helpful if the recruiting process for crowd workers took their backgrounds into consideration, especially for domain-specific lectures?
  • Although ASR may not be reliable on its own, is it useful to leverage it as a contributor alongside the input of crowd workers?
  • Is there any other potential to add a machine-in-the-loop component to the proposed framework?
  • What do you think of the proposed approach compared with ensemble modeling that merges the outputs of multiple speech recognition algorithms to get the final result?


03/04/20 – Fanglan Chen – Combining Crowdsourcing and Google Street View to Identify Street-level Accessibility Problems

Summary

Hara et al.’s paper “Combining Crowdsourcing and Google Street View to Identify Street-level Accessibility Problems” explores a crowdsourcing approach to locating and assessing sidewalk accessibility issues by labeling Google Street View (GSV) imagery. Traditional approaches for sidewalk assessment rely on street audits, which are very labor intensive and expensive, or on reporting calls from citizens. The researchers propose their interactive user interface as an alternative to proactively deal with this issue. Specifically, they investigate the viability of labeling sidewalk issues among two groups of diligent and motivated labelers (Study 1) and then explore the potential of relying on crowd workers to perform this labeling task, evaluating performance at different levels of labeling accuracy (Study 2). By investigating the viability of labeling across two groups (three members of the research team and three wheelchair users), the results of Study 1 are used to provide ground truth labels for evaluating crowd workers’ performance and to get a baseline understanding of what labeling this dataset looks like. Study 2 explores the potential of using crowd workers to perform the labeling task; their performance is evaluated at both the image and pixel levels of labeling accuracy. The findings suggest that it is feasible to use crowdsourcing for the labeling and verification tasks, which leads to a final result of better quality.

Reflection

Overall, this paper proposes an interesting approach for sidewalk assessment. What I think about most is how feasibly it could be used to deal with real-world issues. In the scenario studied by the researchers, a sidewalk in poor condition has severe problems and relates to a larger accessibility issue in urban space. The proposed crowdsourcing approach is novel. However, if we take a close look at the data source, we may question to what extent it can facilitate assessment in real time. It seems impossible to update the Google Street View (GSV) imagery on a daily basis, so the image sources are historical rather than reflecting the current conditions of the sidewalks.

I think image quality may be another big problem for this approach. The resolution of the GSV imagery is comparatively low, and the images are sometimes captured under poor lighting conditions, which makes it challenging for crowd workers to make correct judgments. There is a possibility of using existing machine learning models to enhance image quality by increasing resolution or adjusting brightness. That could be a good place to introduce machine learning algorithms to achieve better results on the task.
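A tiny sketch of the kind of conventional preprocessing meant here, using simple brightness adjustment and upscaling with Pillow; a learned super-resolution or low-light-enhancement model would go further, and the file path is hypothetical.

```python
from PIL import Image, ImageEnhance

def enhance_gsv_crop(path, brightness=1.4, scale=2):
    """Brighten and upscale a street-view crop before showing it to labelers.
    A learned super-resolution or low-light model could replace these steps."""
    img = Image.open(path)
    img = ImageEnhance.Brightness(img).enhance(brightness)  # factor > 1.0 brightens
    img = img.resize((img.width * scale, img.height * scale), Image.LANCZOS)
    return img

# Hypothetical usage:
# enhance_gsv_crop("gsv_crop_001.jpg").save("gsv_crop_001_enhanced.jpg")
```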

In addition, the focal point of the camera is another issue that may reduce the scalability of the project. The GSV imagery is not collected merely for sidewalk accessibility assessment, so it usually contains a lot of noise (e.g., occluding objects). It would be interesting to conduct a study of what percentage of GSV imagery is of good quality with respect to the sidewalk assessment task.

Discussion

I think the following questions are worthy of further discussion.

  • Are there any other important accessibility issues that exist but were not considered in the study?
  • What improvements can you think of that the authors could make to their analysis?
  • What other potential human performance tasks could be explored by incorporating street view images?
  • How effectively do you think this approach can deal with urgent real-world problems?


03/04/2020 – Palakh Mignonne Jude – Pull the Plug? Predicting If Computers or Humans Should Segment Images

SUMMARY

The authors of this paper aim to build a prediction system that is capable of determining whether the segmentation of images should be done by humans or computers, keeping in mind that there is a fixed budget of human annotation effort. They focus on the task of foreground object segmentation. They utilized image datasets from varied domains, such as the Biomedical Image Library with 271 grayscale microscopy image sets, Weizmann with 100 grayscale everyday object images, and Interactive Image Segmentation with 151 RGB everyday object images, with the aim of showcasing the generalizability of their technique. They developed a resource allocation framework, ‘PTP’, that predicts whether it should ‘Pull The Plug’ on machines or humans. They conducted studies on both coarse segmentation and fine-grained segmentation. The ‘machine’ algorithms were selected from among the algorithms currently used for foreground segmentation, such as Otsu thresholding and the Hough transform. The prediction model was built using multiple linear regression. The 522 images from the three datasets mentioned earlier were given to crowd workers from AMT to perform coarse segmentation. The authors found that their proposed system was able to eliminate 30-60 minutes of human annotation time.
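To sketch the flavor of such a prediction module, here is a toy stand-in: hypothetical mask features and Jaccard scores fed to a plain scikit-learn linear regression, rather than the paper's exact features or model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one row of mask-derived features per segmentation
# (e.g., boundary jaggedness, compactness, distance to image edge, area ratio)
# and the measured Jaccard overlap with ground truth as the quality target.
features = np.array([
    [0.80, 0.75, 0.30, 0.20],
    [0.20, 0.95, 0.60, 0.25],
    [0.90, 0.40, 0.10, 0.70],
    [0.15, 0.90, 0.55, 0.30],
    [0.70, 0.50, 0.20, 0.60],
])
jaccard = np.array([0.45, 0.90, 0.30, 0.88, 0.55])

quality_model = LinearRegression().fit(features, jaccard)

# Predict quality for new machine-made segmentations; send the lowest-scoring
# ones to human annotators first, within the available budget.
new_masks = np.array([[0.85, 0.45, 0.15, 0.65], [0.18, 0.92, 0.58, 0.28]])
scores = quality_model.predict(new_masks)
budget = 1  # number of images humans can re-annotate
needs_human = np.argsort(scores)[:budget]
print(scores, "-> re-annotate indices:", needs_human)
```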

REFLECTION

I liked the idea of the proposed system, which capitalizes on the strengths of both humans and machines and aims to identify when the skill of one or the other is better suited to the task at hand. It reminded me of reCAPTCHA (as highlighted by the paper ‘An Affordance-Based Framework for Human Computation and Human-Computer Collaboration’), which also utilized multiple affordances (both human and machine) in order to achieve a common goal.

I found it interesting to learn that this system was able to eliminate 30-60 minutes of human annotation time. I believe that if such a system were used effectively, it would enable developers to build systems faster and ensure that human effort is not wasted. I thought it was good that the authors attempted to incorporate variety when selecting their datasets; however, I believe it would have been interesting if they had combined these datasets with a few more containing more complex images (ones with many objects that could be in the foreground). I also liked that the authors published their code as an open-source repository for future extensions of their work.

QUESTIONS

  1. As part of this study, the authors focus on foreground segmentation. Would the proposed system extend well to other kinds of object segmentation, or would the quality of the segmentation and the performance of the system be hampered in any way?
  2. While the authors have attempted to indicate the generalizability of their system by utilizing different datasets, the Weizmann and BU-BIL datasets contain grayscale images with relatively clear foreground objects. If the images contained multiple objects, would the amount of time this system eliminates be as high? Is there any relation between the difficulty of the annotation task and the success of the system?
  3. Have there been any new systems (since this paper was published) that attempt to build on top of the methodology proposed in this paper? What modifications/improvements could be made to the proposed system (if any improvement is possible)?
