03/04/2020 – Ziyao Wang – Combining Crowdsourcing and Google Street View to Identify Street-Level Accessibility Problems

In this paper, the authors focus on a mechanism that uses untrained crowd workers to find and label accessibility problems in Google Street View (GSV) imagery. Workers are shown images from Google Street View and asked to find, label, and assess sidewalk accessibility problems. The authors compared the results of this labeling task completed by six dedicated labelers, including three wheelchair users, with results from MTurk workers. The comparison shows that crowd workers can determine the presence of an accessibility problem with high accuracy, which suggests this mechanism is a promising way to audit sidewalk accessibility. However, the mechanism still has problems, such as locating the GSV camera in geographic space and selecting an optimal viewpoint, the inability to measure sidewalk width, and the age of the images. In the experiments, the workers could not label some of the images because of the camera position, and some images may have been captured three years earlier. Additionally, there is no method to measure the width of the sidewalk, which wheelchair users need.
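As a rough sketch of how several workers' judgments for one image could be combined into a single verdict, the snippet below uses a simple majority vote; the paper compares such crowd verdicts against ground truth from the dedicated labelers, but the exact aggregation rule and the example labels here are only illustrative.

```python
from collections import Counter

def problem_present(worker_labels, threshold=0.5):
    """Majority vote over binary labels from crowd workers
    (True = the worker marked an accessibility problem in the GSV image)."""
    counts = Counter(worker_labels)
    return counts[True] / len(worker_labels) >= threshold

# Hypothetical labels from five MTurk workers for one image.
print(problem_present([True, True, False, True, True]))  # -> True
```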

Reflections:

The authors combined Google Street View imagery and MTurk crowdsourcing to build a system that can detect accessibility challenges. This kind of hybrid system achieves high accuracy in finding and labeling such accessibility challenges. If this system can be used in practice, people with disabilities will benefit greatly from it.

However, there are some problems with the system. As mentioned in the paper, the images in Google Street View are old; some may have been captured years ago. If detection is based on these pictures, some newly emerged accessibility problems will be missed. For this problem, I have a rough idea of letting users of the system update the image library: when they notice a difference between a library image and the actual sidewalk, they can upload up-to-date pictures they captured themselves. As a result, other users would no longer suffer from the image-age problem. However, this solution would change the whole system. Google Street View imagery requires professional capture devices that are not available to most users, so Google Street View will not update its imagery with user-contributed photos, and the system cannot update itself through that imagery. Instead, the system would have to build its own image library, which is quite different from the system introduced in the paper. Additionally, photos provided by users may have low resolution, making it difficult for MTurk workers to label the accessibility challenges.

Similarly, the problem that workers cannot measure the width of the sidewalk could be solved if users uploaded the width while using the system. However, this again requires the system to maintain its own database and would demand substantial modification.

Instead of detecting accessibility challenges, I think the system would be even more useful for tracking and labeling bike lanes. Compared with sidewalk accessibility, detecting the existence of bike lanes suffers less from the image-age problem, because even if a bike lane was built years ago it is likely still in place. Also, there is no need to measure lane width, as all lanes should have enough space for bikes to pass.

Questions:

Is there any approach that could solve the image-age, camera-viewpoint, and width-measurement problems in the system?

What do you think about applying such a system to track and label bike lanes?

What other kinds of street-level detection problems could this system be applied to?

03/04/2020 – Ziyao Wang – Real-Time Captioning by Groups of Non-Experts

Traditional real-time captioning is done by professional captionists, but hiring them is expensive. Alternatively, automatic speech recognition systems have been developed, yet they still perform poorly when audio quality is low or multiple people are talking. In this paper, the authors developed a system that recruits several non-expert workers to caption the audio in parallel and merges their work into a single high-accuracy caption output. Because the workers are paid significantly less than experts, the cost is reduced even though multiple workers are hired. The system also performs well at collecting the workers' contributions and merging them into an accurate output with low latency.
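As a rough illustration of the merging step, the sketch below combines partial captions from several workers by voting on words inside fixed time windows. This is a simplified stand-in for the paper's alignment-based merging, and the worker streams, timestamps, and thresholds are all invented for the example.

```python
from collections import defaultdict

def merge_captions(worker_streams, window=2.0, min_votes=2):
    """Merge partial captions from several workers by voting inside fixed
    time windows. Each stream is a list of (timestamp_sec, word) pairs."""
    votes = defaultdict(list)                       # (window slot, word) -> timestamps
    for stream in worker_streams:
        for t, word in stream:
            votes[(int(t // window), word.lower())].append(t)

    kept = [(sum(ts) / len(ts), word)               # keep words typed by >= min_votes workers
            for (_, word), ts in votes.items() if len(ts) >= min_votes]
    return " ".join(word for _, word in sorted(kept))

# Example: three workers each catch only part of the audio.
w1 = [(0.1, "the"), (0.6, "quick"), (2.3, "jumps")]
w2 = [(0.2, "the"), (0.7, "quick"), (1.1, "fox"), (2.4, "jumps")]
w3 = [(0.8, "quick"), (1.2, "fox"), (2.5, "jumps"), (3.1, "over")]
print(merge_captions([w1, w2, w3]))                 # -> "the quick fox jumps"
```

A vote threshold like this also hints at why coverage matters: a word that every worker misses, or that only one worker catches, never makes it into the merged caption.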

Reflections:

When it comes to problems that require both high accuracy and low latency, I had always held the view that only AI or experts could complete such tasks. However, in this paper the authors show that non-experts can also complete this kind of task if a group of people work together.

Compared with professionals, hiring non-experts costs much less. Compared with AI, people can handle some complicated situations better. This system combines these two advantages and provides a cheap, high-accuracy real-time captioning system.

The system certainly has many advantages, but we should still consider it critically. On cost, it is true that hiring non-experts costs much less per person than hiring professional captionists. However, the system needs about 10 workers to reach 80 to 90 percent accuracy. Even at a low wage of, say, 10 dollars per hour, the total cost reaches 100 dollars per hour, while hiring an expert costs around 120 dollars per hour, so the saving from applying the system is relatively small.

On accuracy, there is a possibility that all 10 workers miss the same part of the audio. In that case, even after merging all the workers' results, the system will still miss the caption for that part. In contrast, although an AI system may produce captions with errors, it can at least produce something for every word in the audio.

For these two reasons, I think hiring fewer workers, for example three to five, to fix the errors in an AI-generated caption would save more money while still maintaining high accuracy. With a draft caption provided, the workers' task becomes easier, and they may produce more accurate results. Also, in circumstances where the AI system performs well, the workers will not need to spend time typing, and the latency of the system will be reduced.

Questions:

What are the advantages of hiring non-expert humans to do the captioning compared with the experts or AI systems?

Would a system that hires fewer workers to fix the errors in an AI-generated caption be cheaper? Would such a system perform better?

Does the system mentioned in the second question have any limitations or drawbacks?

02/26/2020 – Ziyao Wang – Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning

As machine learning models are deployed across many industry domains, it is important to design interpretability tools that help model users, such as data scientists and machine learning practitioners, better understand how these models work. However, there has been little research evaluating how well these tools perform in practice. The authors of this paper fill this gap with experiments and surveys. They interviewed 6 data scientists from a large technology company to find out the most common issues data scientists face. They then conducted a contextual inquiry with 11 participants around those common issues, using the InterpretML implementation of generalized additive models (GAMs) and the SHAP Python package. Finally, they surveyed 197 data scientists. Based on the experiments and surveys, the authors highlight problems of misuse and over-trust of interpretability tools, and the need for communication between members of the HCI and ML communities.
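For context, the sketch below shows roughly how the two tools used in the contextual inquiry are typically invoked; it runs on synthetic data and is only a minimal illustration, not the study's actual tasks or protocol.

```python
import shap
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Glass-box GAM from InterpretML: global, per-feature shape functions.
ebm = ExplainableBoostingClassifier().fit(X, y)
global_expl = ebm.explain_global()              # per-feature contribution curves

# Post-hoc SHAP attributions for a black-box model: local, per-prediction.
rf = RandomForestClassifier(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(rf).shap_values(X[:10])
```

Both outputs look authoritative, which is exactly where the misuse and over-trust the paper describes can creep in.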

Reflection:

Before reading this paper, I held the view that interpretability tools should be able to cover most of data scientists' needs. Now I see that these tools are often not designed together with the ML community, which can limit how faithfully they reflect the models. When data scientists or machine learning practitioners use these tools to learn how models operate, they may run into problems like misuse or over-trust. I do not think this is the users' fault. Tools are designed to make users' tasks easier; if a tool confuses its users, the developers should change it to provide a better experience. In this case, the authors suggest that members of the HCI and ML communities work together when developing such tools. This requires members of both communities to leverage their strengths, so that the resulting tools let users understand the models accurately while remaining user-friendly. Meanwhile, comprehensive instructions should explain how to use the tools to understand the models easily and accurately. In the end, both the efficiency and the accuracy of the tools, and of the models they explain, will improve.

From the point of view of data scientists and machine learning practitioners, they should try to avoid over-trusting the tools. The tools cannot fully explain the models, and there may be mistakes. Users should always be critical of the tools instead of trusting them fully. They should read the instructions carefully and understand how to use the tools, what the tools are for, what the models are being used for, and how to use the models. If they think carefully when using these tools and models, instead of guessing at the meaning of the tools' results, the number of misuse and over-trust cases will drop sharply.

Questions:

  1. How should the proposed interactive interpretability tools be designed? What kinds of interactions should they include?
  2. How can we design a tool that lets users conveniently dig into the models, rather than use the models without knowing how they work?
  3. How can we design tools that best leverage the strengths of users' mental models?

02/26/2020 – Ziyao Wang – Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment

In this paper, the authors focus on the fairness of machine learning systems. As machine learning is widely applied, it is important that models judge fairly, which requires evaluation by developers, users, and the general public. With this aim, the authors conducted an empirical study with different automatically generated explanations to understand how they affect people's fairness judgments of ML systems. In an experiment involving 160 MTurk workers, they found that judging the fairness of a model is a complicated problem: certain explanation styles are perceived as inherently less fair while others enhance people's trust in the algorithm; different fairness issues are more effectively exposed by different styles of explanation; and there are individual differences arising from each person's background and judgment criteria. Finally, they suggest that evaluation should support different needs of fairness judgment and account for individual differences.
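To make the idea of explanation styles concrete, here is an illustrative sketch, not the paper's generation pipeline, of two styles of explanation for one prediction of a toy risk classifier; the feature names, data, and labels are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

features = ["prior_arrests", "age", "employed"]
X = np.array([[3, 22, 0], [0, 45, 1], [5, 30, 0], [1, 28, 1], [4, 19, 0], [0, 52, 1]])
y = np.array([1, 0, 1, 0, 1, 0])                  # hypothetical "high risk" labels
clf = LogisticRegression().fit(X, y)

person = np.array([[2, 25, 0]])

# Input-influence style: which features push this person's score up or down.
influence = dict(zip(features, (clf.coef_[0] * person[0]).round(2)))

# Sensitivity style: how the predicted risk would change if one feature changed.
base = clf.predict_proba(person)[0, 1]
flipped = person.copy()
flipped[0, 2] = 1                                  # what if the person were employed?
sensitivity = round(clf.predict_proba(flipped)[0, 1] - base, 2)

print(influence, sensitivity)
```

The same prediction can look quite different, and feel more or less fair, depending on which of these styles a person is shown.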

Reflections:

This paper leaves me with three main thoughts. First, we should pay attention to explaining machine learning models. Second, we should consider different needs when evaluating the fairness of models. Finally, when training models or designing human-AI interactions, we should consider users' individual differences.

On the first point, the models may be well trained and perform well, yet the public may not trust them because they know little about the algorithms. If we can provide fair and friendly explanations, public trust in the output of machine learning systems may increase. Also, if people are given explanations, they may propose suggestions grounded in practical situations, which will in turn improve the accuracy and fairness of the systems. For this reason, machine learning system developers should pay more attention to writing appropriate explanations.

Second, the explanations and the models should account for different needs. For experienced data scientists, we can provide comprehensive explanations that let them dig deeper. For people encountering a machine learning system for the first time, we should provide user-friendly, easy-to-understand explanations. For systems used by people with different backgrounds, we may need several versions of the explanation, for example one user guide and one developer guide.

The third point is the most complicated: it is hard to build a system that satisfies people with different judgment criteria. I wonder whether it is possible to develop systems with an interface where users can input their own preferences, which might partly address this problem. It is impossible to train a model that satisfies everyone's preferences, so we can only train a model that satisfies most users or the majority of the public, or leave room for users to select their own preferences.

Questions:

What kinds of information should we provide to the users in the explanations?

Apart from crowdsourcing workers, which groups of people should also be involved in the survey?

What kinds of information would you like to have if you needed to judge a system's fairness?

02/19/2020 – Ziyao Wang – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

The authors show that in a human-AI hybrid decision-making system, updates that aim to improve the accuracy of the AI may harm the teamwork. Experienced workers advised by an AI system have built a mental model of it, which improves the correctness of the team's results. However, an update that improves the AI's accuracy may diverge from the worker's mental model, so that the user can no longer make appropriate decisions with the AI's help. In this paper, the researchers propose a platform named CAJA that helps evaluate the compatibility between the AI and the human. With results from experiments using CAJA, developers can learn how to make updates compatible while still achieving high accuracy.

Reflection:

Before reading this paper, I assumed that it is always good to have an AI system with higher accuracy. This paper gives me a new point of view: instead of only the performance of the system, we should also consider the cooperation between the system and human workers. Here, an update to the AI system can destroy the mental model in the human's mind. Experienced workers have built a good working relationship with the AI tools: they know which pieces of advice to take and which may contain errors. If a patch makes the system more accurate overall while reducing correctness on the part the human trusts, the accuracy of the whole hybrid system will also drop. Humans may not trust the updated system until they reach a new equilibrium with it, and during this period the performance of the hybrid system may fall to a level even worse than keeping the previous, non-updated system. For this reason, developers should try to maximize the performance of the system before releasing the application to users, so that later updates do not make large changes and humans can stay familiar with the updated system.
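One way to make this concern measurable is a compatibility-style score: among the cases the old model got right, the part of its behavior the human has learned to trust, how many does the updated model still get right? The sketch below uses made-up predictions to show an update that keeps overall accuracy unchanged while breaking two previously correct cases; it follows the spirit of the paper's compatibility notion rather than reproducing its exact formulation.

```python
import numpy as np

def accuracy(pred, y):
    return float(np.mean(pred == y))

def compatibility(old_pred, new_pred, y):
    """Fraction of the examples the old model got right that the
    updated model still gets right."""
    old_correct = old_pred == y
    return float(np.mean(new_pred[old_correct] == y[old_correct]))

# Hypothetical labels and predictions on 8 cases.
y        = np.array([1, 0, 1, 1, 0, 0, 1, 0])
old_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # 6/8 correct
new_pred = np.array([1, 0, 0, 1, 0, 0, 1, 1])   # still 6/8 correct, but two trusted cases now fail
print(accuracy(new_pred, y), compatibility(old_pred, new_pred, y))  # 0.75, ~0.67
```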

We can learn from this that we should never ignore the interaction between humans and AI systems. A good design of this interaction can improve the performance of the whole system, while a design with poor human-AI interaction can harm it. When we implement a system that needs both human affordances and AI affordances, we should pay more attention to the cooperation between human and AI and leverage the affordances of both sides, instead of focusing only on the AI system. We should put ourselves in the position of the designer of the whole system, with a view of the overall situation, rather than considering ourselves merely programmers who focus only on the program.

Questions:

What are the criteria for deciding whether an update is compatible or not?

Would releasing instructions to users with each update help reduce the harm of updates?

If a new version of the system greatly improves accuracy but differs completely from the users' mental model, how do we reach a balance that maximizes the performance of the whole hybrid system?

02/05/20 – Ziyao Wang – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

The author surveyed how crowdsourcing has been applied in machine learning research. First, previous work was reviewed to derive categories for the application of crowdsourcing in machine learning. The applications were broken into four categories: data generation, evaluating and debugging models, hybrid intelligence systems, and behavioral studies to inform machine learning research. Within each category, the author discussed several specific areas, summarized the related lines of research, and introduced each of them. Finally, the author analyzed work on understanding crowd workers. Though crowdsourcing has greatly helped machine learning research, the author did not ignore the problems in this arrangement, such as dishonesty among workers. The survey closes with advice for machine learning researchers who use crowdsourcing: maintain a good relationship with crowd workers, take care with task design, and use pilot studies.

Reflection:

From this survey, readers can gain a thorough view of the applications of crowdsourcing in machine learning research. It covers most of the state of the art in machine learning areas related to crowdsourcing. Traditional machine learning often faces problems such as lack of data, models that cannot be evaluated, lack of user feedback, or systems that are not trustworthy. With crowdsourcing, many of these problems can be addressed with the help of crowd workers. Though this is only a survey of previous research, it gives readers a comprehensive view of how these technologies combine.

This survey reminds us of the importance of reviewing previous work. When we want to research a topic, there are thousands of studies that might help, but it is impossible to read them all. If there is a survey that summarizes previous work and sorts it into more specific categories, we can easily get a comprehensive view of the topic, and new ideas may emerge. In this paper, by working through the four categories of crowdsourcing applications in machine learning, the author arrives at the idea of studying the crowd itself and finally makes suggestions for future researchers. Similarly, if we survey the area of our own projects, we may find out what is needed and what is novel in the field, which will contribute to the success of the projects and the development of the field.

It is also important to think critically. In this survey, although the author describes numerous contributions of crowdsourcing to machine learning research, he still discusses the potential risks of this approach, for example dishonesty among workers. This is important for future research and should not be ignored. In our projects, we should likewise think critically so that the drawbacks of the ideas we propose can be judged fairly and the projects can be practical and valuable.

Questions:

Which factors contribute to good task design?

Is there any solution that can fully solve the problem of dishonesty among workers, rather than merely mitigating it?

In experiments that aim to find out users' reactions to something, can the reactions of paid workers be considered similar to those of real users?

02-05-2020 – Ziyao Wang – Principles of Mixed-Initiative User Interfaces

The author proposes combining automated services and direct manipulation in the design of user interfaces. Considering 12 factors, including developing significant value-added automation, uncertainty about a user's goals, the user's attention, costs, benefits, and dialog, the author studies the LookOut system, a mixed-initiative user interface that enables users and intelligent agents to collaborate efficiently. Value-added calendaring and scheduling, decision making under uncertainty, multiple interaction modalities, and the handling of invocation failures are evaluated, and future directions are set out. Through the discussion of costs and benefits and the study of LookOut, the combination of reasoning machinery and direct manipulation is shown to hold real promise for improving human-computer interaction.
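A minimal sketch of the cost-benefit reasoning behind those decisions under uncertainty: given the inferred probability that the user actually has a goal (say, wanting to schedule a meeting from an email), the agent picks whichever of doing nothing, asking, or acting has the highest expected utility. The utility numbers below are invented for illustration and are not taken from the paper.

```python
def best_action(p_goal):
    """Pick the action with highest expected utility given the probability
    that the user really has the goal."""
    utilities = {
        # (utility if the user has the goal, utility if they do not)
        "do_nothing":       (0.0,  1.0),   # no help, but no interruption
        "ask_dialog":       (0.7,  0.4),   # useful if the goal exists, mildly annoying otherwise
        "act_autonomously": (1.0, -0.5),   # great if right, costly if wrong
    }
    expected = {a: p_goal * u_goal + (1 - p_goal) * u_no_goal
                for a, (u_goal, u_no_goal) in utilities.items()}
    return max(expected, key=expected.get)

for p in (0.1, 0.5, 0.9):
    print(p, best_action(p))   # do_nothing, ask_dialog, act_autonomously
```

The thresholds where the best action switches are exactly the kind of boundaries the paper argues an agent should reason about before interrupting or acting on the user's behalf.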

Reflection:

Though this paper was written in 1999, the idea behind it is still valuable today. The combination of automated services and direct manipulation has been widely applied in current user interfaces. For example, in the design of Taobao's interface, the developers built numerous modules that users can arrange according to their preferences, while an AI system also arranges the modules according to search history and user actions. The combination of these two arrangements improves the user experience significantly. Beyond Taobao, most popular current applications and websites use designs similar to the one recommended by this 1999 paper. There must be other old papers whose ideas are still valuable today, so current researchers should review old papers regularly.

We certainly need to read up-to-date papers, which represent the current state of the art. However, some ideas proposed in old papers still work now. Some of those ideas were impossible to implement at the time, so the papers were ignored by other researchers. With the development of technology, we should revisit such papers from time to time: someday these once-impractical ideas may become feasible.

Beyond the idea itself, I also thought about how the author arrived at it. At the time, researchers focused either on tools for users to directly manipulate interfaces or on automated services that sense user activity and take automated actions; research combining the two was limited. The author considered both sides and pointed to a new direction for improving human-computer interaction. Similarly, if we combine two related up-to-date research topics, novel solutions to current challenges may emerge, and this approach could be applied in our course projects.

Questions:

Which applications have applied the proposed approach of combining automated services and direct manipulation?

What should we do if the agent's decision conflicts with the user's decision?

Is it ethical for agents to track user activities? If not, how can agents provide services automatically?

01/29/20 – Ziyao Wang – An Affordance-Based Framework for Human Computation and Human-Computer Collaboration.

In this paper, the authors conducted a literature review of publications representing the state of the art in human-computer collaboration and human computation. From this review, they grouped affordances into two categories: human intelligence and machine intelligence. Although affordances can be split into these two groups, there are also systems, such as reCAPTCHA and PatViz, that benefit from combining the two. Finally, they provided examples of how to use this framework and advice for future research. They identified human adaptability and machine sensing as two extensions of current work, and noted that future work (finding ways to measure human work, assessing human work in practice, and accounting for individual differences among human operators) will need collaboration among experts in theoretical computer science as well as psychology and neuroscience.

Reflections:

First, I feel that both human affordances and machine affordances contribute to the success of current systems. It is very important to allocate tasks so that humans and machines can support each other. Current systems may suffer from poor human-computer collaboration; for example, a system may fail to assign appropriate work to human workers, or its user interface may be difficult to use. To avoid such situations, policies and guidance are needed: there should be commonly used evaluation criteria and constraints on industry practice.

Second, researchers benefit from an overview of related research areas. In most cases, solving a problem requires help from experts in different areas, so the category a problem belongs to can become ambiguous. Researchers from different fields may then waste effort on similar studies and fail to benefit from prior work in another area. For this reason, it is important to group research with similar goals or related techniques. Current and future research will benefit from such categorization, discussions between experts will become much easier, more ideas can be proposed, and researchers can discover fields they had not considered before.

Additionally, in human computation and human-computer collaborative systems, problems are solved using both human intelligence and machine intelligence. For such a broad area, it is important to reflect regularly, so that researchers can consider the problems they are going to solve comprehensively. The table in the paper gives an overview of human-intelligence and machine-intelligence affordances, and shows in which areas there has already been a lot of research and which areas deserve more attention. With this common framework, understanding and discussing previous work becomes much easier, and novel ideas can emerge. This kind of reflection can be applied in other areas too, which would lead to rapid development in each industry.

Questions:

Why are there no updates to systems that are considered hard to use?

How can human work and machine work be assessed in practice?

For a user interface, is it more important to let new workers use it easily at the cost of limited customization, or to let experienced workers customize it and reach high efficiency even though new users may face some difficulty?

01/29/20 – Ziyao Wang – Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms

The authors found that most current research on crowd work focuses on AMT, with little exploration of other platforms. To enrich the diversity of crowd work research and accelerate its development, the authors contrast the key problems they found in AMT with the features of other platforms. Through this comparison, they found that different crowd platforms solve different problems present in AMT, yet none of the platforms provides sufficient support for well-designed worker analytics. They conclude that AMT still has a number of limitations and that research on crowd work will benefit from diversifying. Thanks to these contributions, future research on alternative platforms need not treat AMT as the default baseline.

Reflection:

AMT was the first crowd work platform, and many people have benefited from using it. However, if all companies and workers use only AMT, the lack of diversity will slow the development of crowd work platforms.

From AMT's point of view, without other crowd work platforms it is hard to effectively find solutions to its own limitations and problems. For example, the paper notes that the lack of automated tooling is one of AMT's limitations, whereas the platform WorkFusion allows the use of machine-automated workers, an improvement over AMT. A single platform can hardly identify its limitations by itself, but with competitors, each platform has to do a lot of research to provide a better user experience and stay ahead. As a result, research on crowd work is pushed to develop rapidly.

Other platforms should not simply copy AMT's pattern. Though AMT is the most popular crowd work platform, it still has limitations. Other platforms can adopt AMT's advantages, but they should avoid its limitations and find solutions to those drawbacks. A good baseline can help new platforms get started, but if they merely follow the baseline, all platforms will suffer from the same drawbacks and no one will propose solutions. To avoid this, companies should develop their own platforms, avoid the known drawbacks, and learn from the advantages of others.

Researchers should not focus only on AMT. Of course, research on AMT receives more attention because most users use AMT, but this focus is harmful to the development of crowd work platforms. Even when some platforms have developed solutions to certain problems, if researchers ignore those platforms the solutions cannot spread, and researchers and companies will spend unnecessary effort on similar solutions.

Questions:

What is the main reason most companies select AMT as their crowd work service provider?

What is the most significant advantage of each platform? Is there any chance of developing a platform that combines all the other platforms' advantages?

Why do most researchers focus on AMT only? Is it possible to encourage more researchers to do cross-platform analysis?

01/22/2020 – Ziyao Wang – Ghost Work

The introduction explains what ghost work is: work done by people who are hired through APIs to handle tasks that artificial intelligence cannot solve. They may determine whether a post contains adult content or whether the person logging in to an account is the account holder. These people are like ghosts because neither the app users nor the programmers are aware of their presence. Chapter 1 mainly discusses the emergence and development of ghost work. MTurk was built when Amazon faced the problem of correcting e-book information. Afterwards, Amazon used this API to hire students to do the job, and more companies paid for similar services. After years of development, the API lets workers carry out macro-tasks under the leadership of full-time employees. However, there are also problems with such ghost work: the hired workers can hardly protect their own interests, and companies can hardly trace the responsible worker when an issue occurs.

Reflection:

Ghost work can benefit both companies and workers. Workers from poor areas can earn money doing such jobs; in the book's example, a skilled worker can make about $40 per day, which is relatively high in some places, for example small towns in China. Also, this kind of job has no time or location constraints, which means homemakers or retired people can also do it.

Meanwhile, companies profit too. Because there are no time or location limitations, companies can always find cheap labor, which saves hiring expenses and increases profits. Because they can hire people all over the world, they have workers active at different hours of the day, almost as if they employed 24-hour staff, which means more profit and a better user experience.

However, ghost work carries risks as well. When problems occur, companies can hardly trace the culprit because they hire so many workers without knowing who they are. Also, humans make mistakes, so hired workers are likely to make mistakes on their tasks. Another point is that companies do not know the background of the people they hire; some workers may perform very poorly, and in the worst case they can barely understand the words on the screen, making the results untrustworthy.

For the workers, no one guarantees their rights. When companies refuse to pay, workers have nowhere to go to claim their wages. Also, after struggling with a problem, they may find that someone else has already solved it and they receive no pay. Another point is that tasks are not guaranteed to be available at all times; there may be few tasks during some workers' working hours, leaving those workers with limited income.

Questions:

Is there currently any policy to protect the users of these APIs, both the companies and the hired workers?

How can the workers protect themselves when the companies refuse to pay their salary?

How should mistakes made by workers be handled? Are there any remedies or punishments?
