04/29/2020 – Ziyao Wang – IdeaHound: Improving large-scale collaborative ideation with crowd-powered real-time semantic modeling

Innovating without guidance is hard. To help users, the authors propose that a map of the solution space being explored can inspire and direct exploration. After reviewing current automated approaches, they found that none of them can build adequate maps, and that the crowdsourcing approaches deployed today require external workers to perform tedious semantic judgment tasks. To resolve this problem, they present IdeaHound, a system that seamlessly integrates the semantic task of organizing ideas into users’ idea generation activities. Through several studies they show that the system yields high-quality maps without detracting from idea generation, that users are willing to generate and organize ideas simultaneously, and that the system produces more accurate models than existing approaches.

Reflections:

The idea of the paper is really interesting. Instead of hiring crowd workers to do the idea clustering, the authors designed an interface that lets users write their ideas and group them by themselves. Rather than having one group of people write ideas and hiring another group to cluster them, letting the first group do both idea generation and idea clustering is more efficient and more accurate. From these groupings, the system generates a map on which all the ideas are located; if two groups of ideas are similar, they are positioned near each other. With this map, users can be inspired to contribute more ideas. Additionally, when users write an idea, the system automatically recommends similar existing ideas, which can also inspire more novel ones. I really like the example in the authors’ presentation: without an inspiring map, or when ideas are generated automatically, we may get outputs like pizza topped with broccoli, which few people would accept. This made me aware of the importance of the system.
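
To make the mapping idea concrete, here is a minimal sketch of how a similarity-based idea map and a nearest-idea recommendation could be computed. This is not IdeaHound's actual model (which learns from the crowd's own spatial arrangements); the example ideas, the TF-IDF representation, and the MDS projection are my own simplifications.

```python
# Minimal sketch (not IdeaHound's model): lay out free-text ideas on a 2D "map"
# by similarity and recommend the nearest existing idea to a new one.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_similarity

ideas = [
    "a pizza vending machine for campus",
    "an app that matches leftover food with hungry students",
    "a community fridge stocked by local restaurants",
    "drone delivery of late-night snacks",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(ideas)            # embed each idea as a vector
sim = cosine_similarity(X)                     # pairwise idea similarity

# Project dissimilarities into 2D so that similar ideas land near each other.
dist = np.clip(1 - sim, 0, None)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for idea, (x, y) in zip(ideas, coords):
    print(f"({x:+.2f}, {y:+.2f})  {idea}")

# Recommend the existing idea most similar to a newly written one.
new_idea = vectorizer.transform(["late-night snack delivery by robot"])
scores = cosine_similarity(new_idea, X)[0]
print("Most similar existing idea:", ideas[scores.argmax()])
```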

I think this kind of interaction should be adopted by other applications that deploy crowdsourcing. Users are willing to do more than what is required if the extra tasks are interesting or feel meaningful to them. Humans are not hired by AIs; rather, AI should assist humans while they work. As is often said, we should leverage both human strengths and machine strengths, and in idea generation tasks it is more efficient to let machines support humans in completing the work.

Questions:

If you were only asked to provide ideas but the system also let you cluster them, would you cluster them voluntarily?

Can this idea of enabling users to do something, rather than requiring them to do it, be applied to other applications?

Do you like pizza topped with broccoli? What is your opinion of ideas generated entirely by systems? What about using a system to generate ideas and letting human workers select the useful ones?


04/29/2020 – Ziyao Wang – VisiBlends: A Flexible Workflow for Visual Blends

The authors designed VisiBlends, a system for creating visual blends. This has been a challenging task because the blended object must incorporate two input objects while keeping both recognizable. The system leverages both human and AI strengths to achieve this: humans find relevant images, annotate them for shape and coverage, and evaluate the outputs of automatic blend synthesis, while the machine runs a matching algorithm over the human annotations and blends the matched objects. Although the evaluation section notes that the workflow often fails on the first pass, the system is flexible, and users can iterate and adapt easily. With it, even novices can collaboratively complete difficult visual blending tasks.
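
To illustrate the matching step described above, here is a small sketch under a simplified annotation schema (object name, basic shape, and whether the object fills or only covers that shape). It is not the authors' implementation; the Annotation class, the example annotations, and the compatibility rule are assumptions made for illustration.

```python
# Minimal sketch of shape-based matching between two concepts' annotated images.
from dataclasses import dataclass
from itertools import product

@dataclass
class Annotation:
    name: str
    shape: str      # e.g. "sphere", "cylinder", "box"
    coverage: str   # "fills" or "covers"

concept_a = [Annotation("orange", "sphere", "fills"),
             Annotation("orange slice", "circle", "covers")]
concept_b = [Annotation("globe", "sphere", "fills"),
             Annotation("lamp post", "cylinder", "fills")]

def compatible(a: Annotation, b: Annotation) -> bool:
    # Two objects can be blended if they share a basic shape and
    # at least one of them fully fills that shape.
    return a.shape == b.shape and "fills" in (a.coverage, b.coverage)

matches = [(a.name, b.name) for a, b in product(concept_a, concept_b)
           if compatible(a, b)]
print(matches)   # e.g. [('orange', 'globe')]
```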

Reflections:

The system is interesting and useful: a user who knows nothing about visual blending can obtain a good blended design from a group of people who know little about design, programming, or visual blending. This is a good example of human-AI interaction. Neither the inexperienced workers nor the AI alone can handle such a challenging design task, but with the system they can collaborate to complete it.

I like the experiment in which workers are divided into two groups, one exposed to a set of previously blended objects and the other not. I was surprised that the group who had seen previous outputs performed worse than the other group. I think this is because their thinking was anchored on those successful examples, so they could not come up with novel blends during brainstorming. The system essentially gathers human ideas exhaustively; if those ideas are constrained by earlier successes, some novel blends will be lost. If I were a worker with some image-blending experience, I would consider whether I could blend the images rather than simply whether the output would be interesting. Since the blending itself is carried out automatically, workers who list all kinds of possibilities will probably produce better ideas.

I think the system might be better used by experts than by novice workers. Workers with no experience of the system find it hard to pick images that will work in the later blending step, and they may spend a huge amount of time uploading images that contain no objects with suitable shapes. Experts, in contrast, can picture what the output will look like and supply only appropriate images. Alternatively, a curated database of images could be provided so that users only select from images that blend well.

Questions:

How can the system’s success rate be improved?

Is it worth hiring professionals to take part in some stages of the workflow?

Could we design a system in which a single user uploads pictures, annotates them, and lets the system generate the output from those annotations? What is the advantage of hiring workers to do these tasks?


04/22/2020 – Ziyao Wang – SOLVENT: A Mixed-Initiative System for Finding Analogies Between Research Papers

The authors introduce SOLVENT, a mixed-initiative system in which humans annotate aspects of research papers that denote their background, purpose, mechanism, and findings, and a computational model constructs a semantic representation from these annotations that is useful for finding analogies among papers. They tested the system in three studies: the first used annotations from domain-expert researchers, the second used annotations from experts outside the papers’ domain, and the third used crowdsourcing. The results show that the system can detect analogies across domains, that even crowd workers with limited domain knowledge can produce usable annotations, and that the system outperforms comparable approaches.
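
To make the analogy-search idea concrete, here is a rough sketch that represents each paper by its annotated purpose text and ranks other papers by purpose similarity, regardless of surface domain. It is not the authors' exact model; the example annotations and the TF-IDF representation are my own simplifications.

```python
# Minimal sketch of analogy search over human-annotated "purpose" spans.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = {
    "paper_A": {"purpose": "reduce vibration in tall structures",
                "mechanism": "tuned mass damper attached to the building"},
    "paper_B": {"purpose": "suppress unwanted oscillation in a system",
                "mechanism": "active feedback control with actuators"},
    "paper_C": {"purpose": "classify images of animals",
                "mechanism": "convolutional neural network"},
}

names = list(papers)
vec = TfidfVectorizer()
purpose_matrix = vec.fit_transform(p["purpose"] for p in papers.values())
purpose_sim = cosine_similarity(purpose_matrix)

query = "paper_A"
qi = names.index(query)
# Rank other papers by how analogous their annotated purpose is to the query's.
ranked = sorted((n for n in names if n != query),
                key=lambda n: -purpose_sim[qi][names.index(n)])
print("Papers with an analogous purpose to", query, ":", ranked)
```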

Reflections:

I used to search only within computer science when working on my projects or trying to solve a problem. Of course, we can sometimes be inspired by ideas in papers from other domains, but such papers are quite difficult to find. There are countless papers across domains and no efficient way to locate the ones we need outside our field. Even when we do find a paper that is valuable for the problem at hand, a lack of background knowledge can make it hard to understand the ideas behind it.

This system is a great help in that situation. It lets people find related papers from other domains even when they have limited knowledge of those domains, and even as the number of papers grows sharply we can still find what we need efficiently. Before, we could only search by keywords tied to a specific area; with this system, we can search by ideas instead of specific words. This is beneficial when papers use abbreviations or analogies in their titles: keyword search may miss them, but idea search will still surface them as valuable. The human annotations also help us grasp a paper’s ideas quickly and exclude irrelevant papers efficiently.

One more point: cross-domain projects and research are increasing significantly. Such studies require reading a huge number of papers in both domains before a novel idea for the cross-domain problem emerges. With this system, researchers can easily find similarities across the two domains and get concise summaries of each paper’s background, purpose, mechanism, and findings. Their efficiency improves, and they can discover more novel interdisciplinary directions.

Questions:

Will the system’s performance degrade on larger databases and more difficult papers?

Is it possible to update the system’s results regularly as newly published papers are added to the database?

Can this system be applied in industry, for example, to find similarities between the production mechanisms of two products and use those findings to improve them?


04/22/2020 – Ziyao Wang – Opportunities for Automating Email Processing: A Need-Finding Study

The authors conducted a series of studies on email automation. First, they held a workshop with 13 computer science students who can program, asking them to write email rules in natural language or pseudocode in order to identify the categories of automation people need. They then analyzed the source code of email scripts on GitHub to see what programmers need and have already built. Finally, they deployed YouPS, a programmable system that lets users write custom email automation rules, and surveyed participants after a week of use. They found that today’s limited email automation cannot meet users’ requirements, with about 40% of the desired rules impossible to implement in existing email systems, and they summarized these unmet requirements for future development.

Reflections:

The topic of this paper is really interesting. We use email every day and are sometimes annoyed by it. Although email platforms already offer some automation and let users customize rules, annoying emails still reach the inbox and important emails still get classified as spam. I used to adapt myself to the automation: checking my spam folder and deleting advertisements from my inbox every day. It would be great if the automation were more user-friendly and offered more labels and rules to customize. This paper focuses on exactly that problem and conducts a thorough series of studies to understand users’ requirements. All the example rules shown in the results seem useful to me, and I really hope the system can be deployed in practice.
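
As an illustration of the kind of rule users asked for (for example, muting a mailing list unless you are mentioned), here is a standalone sketch. It does not use the YouPS API; the Message dataclass and the handle() convention are assumptions made purely for illustration.

```python
# Standalone sketch of a user-defined email routing rule (not the YouPS API).
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    subject: str
    body: str
    to_list: bool     # True if it arrived via a mailing list

MY_NAME = "Ziyao"

def handle(msg: Message) -> str:
    """Return the folder the incoming message should be filed into."""
    if msg.to_list and MY_NAME not in msg.body:
        return "Lists/Later"          # mute list traffic that doesn't mention me
    if "unsubscribe" in msg.body.lower():
        return "Promotions"           # likely an advertisement
    return "INBOX"

print(handle(Message("list@acm.org", "CFP", "Dear all ...", True)))  # Lists/Later
```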

We can also learn from the authors’ study methodology. First, they recruited computer science students to identify general requirements; these participants served as a pilot, giving the researchers an overview of what users need. They then did background research guided by the pilot findings. Finally, they combined the pilot and background findings to build a system and tested it with crowd workers, who stand in for a broader population. This sequence is a good template for our own projects, and we may follow a similar workflow in future work.

From my point of view, a significant limitation is that the system was tested on a small group of people. Neither computer science students nor programmers who publish code on GitHub represent the general public, and even crowd workers do not: most people know little about programming and do not complete HITs on MTurk, so their requirements are not considered. If conditions allow, the studies should be repeated with a broader population.

Questions:

What are your preferences for email automation? Is there anything you want that current automation does not provide?

Can the crowd workers represent the public?

How should we go about testing systems with the general public?


04/15/20 – Ziyao Wang – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

The authors focus on fact-checking, the task of assessing the veracity of claims, and propose a mixed-initiative approach that combines human knowledge and experience with AI’s efficiency and scalability in information retrieval. They argue that for fact-checking models to be used in practice, they should be transparent, support the integration of user knowledge, and quantify and communicate model uncertainty. Following these principles, they built a mixed-initiative system and ran experiments with MTurk participants. They found that the system helps people when its predictions are correct but can be harmful when they are wrong, that interaction between participants and the system was less effective than expected, and that gamifying the task did not improve user performance. They conclude that users tend to trust models and can be swayed by them into making the wrong choice, which is why transparent models are important in mixed-initiative systems.

Reflections:

I tried the system mentioned in the paper, and it is quite interesting. However, the first time I used it I was confused about what to do. Although the interface resembles Google.com, and I was fairly sure I should type something into the text box, there were few instructions about what to type, how the system works, or what to do after searching a claim. The results page is also confusing: I understand that the developers want to show me evidence about the claim along with the system’s prediction, but I was still unsure what to do next, and some of the returned results were unrelated to the claim I typed.

After several uses I became familiar with the system, and it does help me judge whether a claim is correct. I agree with the authors that some of the feedback about not being able to interact with the system properly stems from users’ unfamiliarity with it. Even so, the authors should provide more instructions so that users can get up to speed quickly; I think this relates to the system’s transparency and could raise users’ trust.

Another issue I noticed is that nothing on the page says the results should be treated only as a reference and that users should judge with their own minds. I think this may be one reason the users’ error rate increased significantly when the system made wrong predictions: participants who saw the prediction differ from their own answer may have changed their minds, assuming that the system, which they know little about, was more likely to be correct. If the system were more transparent to users, they might give more correct answers.
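
One way to make that caveat concrete is to surface uncertainty instead of a bare verdict. The sketch below is not the authors' model; it aggregates made-up per-article stance scores into a claim-level estimate and explicitly hands mixed evidence back to the user.

```python
# Minimal sketch of showing a verdict with uncertainty rather than a flat answer.
# stance: +1 supports the claim, -1 refutes it; rel: the retriever's relevance score.
evidence = [(+1, 0.9), (+1, 0.6), (-1, 0.7)]   # invented numbers

score = sum(stance * rel for stance, rel in evidence) / sum(rel for _, rel in evidence)
confidence = abs(score)   # crude proxy for how one-sided the evidence is

if confidence < 0.5:
    print(f"Evidence is mixed (score={score:+.2f}); please judge for yourself.")
else:
    verdict = "likely true" if score > 0 else "likely false"
    print(f"Model verdict: {verdict} (confidence {confidence:.2f}) - "
          "treat this as a reference, not a final answer.")
```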

Questions:

How can we help participants make correct judgments when the system provides wrong predictions?

What kinds of instructions should be added so that participants can get familiar with the system more quickly?

Can this system be used in areas other than fact-checking?


04/15/2020 – Ziyao Wang – Algorithmic accountability

In this report, the author examines how algorithms exert power and why they deserve scrutiny from computational journalists, using methods such as transparency analysis and reverse engineering. He analyzes atomic decision types, including prioritization, classification, association, and filtering, to assess algorithmic power. For the reverse engineering part, he reviews numerous real-world cases and frames reverse engineering as studying both inputs and outputs, considering the variable observability of input-output relationships as well as how to identify, sample, and find newsworthy stories about algorithms. Finally, the author discusses challenges that algorithmic accountability reporting may face in the future and argues that transparency can help hold newsroom algorithms to journalistic norms.

Reflections:

I am really interested in the reverse engineering part of this report. The author surveys cases of researchers reverse engineering algorithms, and it is exciting to see both the opportunities and the limitations of this approach to investigating them. Reverse engineering is valuable for explaining how algorithms work and for finding their limitations. Because many deployed models are trained with unsupervised or deep learning, they are hard to understand and explain, and we can usually only evaluate them with metrics like precision and recall. With reverse engineering, we can learn how such algorithms behave and modify them to avoid limitations and potential discrimination. However, there may be ethical issues: bad actors could reverse engineer applications to steal the ideas behind them, or exploit the weaknesses they uncover to bypass an application’s security.
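
The basic input-output probing the author describes can be sketched very simply: vary one attribute while holding the rest fixed and compare the outputs. The black_box() function below is a stand-in for a real system, and all values are invented for illustration.

```python
# Minimal sketch of input-output probing of a black-box decision function.
import random

def black_box(profile: dict) -> float:
    # Pretend this is an opaque pricing algorithm we cannot inspect.
    base = 10.0 + 0.5 * profile["items_viewed"]
    return base * (1.15 if profile["zip_code"].startswith("9") else 1.0)

def probe(attribute: str, values, trials: int = 200):
    """Average the black box's output for each value of one attribute."""
    results = {}
    for v in values:
        outputs = []
        for _ in range(trials):
            profile = {"items_viewed": random.randint(0, 20), "zip_code": "10001"}
            profile[attribute] = v           # vary only the attribute under study
            outputs.append(black_box(profile))
        results[v] = sum(outputs) / len(outputs)
    return results

print(probe("zip_code", ["10001", "94105"]))  # systematically higher for 94xxx
```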

As for algorithmic transparency, I realize I paid little attention to this principle before; I used to care only about whether an algorithm works. After reading this report, I see that transparency is an important aspect of building and maintaining systems. Rather than leaving researchers to uncover a system’s limitations through reverse engineering, it is better to disclose parts of the algorithm, how it is used, and related data to the public. On one hand, this raises public trust in the system; on the other, experts outside the company or organization can contribute to improving and securing it. That said, transparency is still far from a complete solution for balancing algorithmic power, so beyond the author’s suggestion that researchers apply reverse engineering, I think both corporations and governments should pay more attention to the transparency of their algorithms.

Questions:

After reading the report, I am still unclear on how to find the story behind an input-output relationship. How can we work out how an algorithm operates from an input-output map alone?

How can we prevent attackers from using reverse engineering to mount attacks?

Apart from journalists, which groups of people should also employ reverse engineering to analyze systems?


04/08/2020 – Ziyao Wang – CrowdScape: Interactively Visualizing User Behavior and Output

The authors present CrowdScape, a system that supports human evaluation of increasingly complex crowd work. It uses interactive visualization and mixed-initiative machine learning to combine information about worker behavior with worker outputs, helping requesters better understand crowd workers and leverage their strengths. The system addresses quality control in crowdsourcing from three angles: output evaluation, behavioral traces, and their integration. It visualizes workers’ behavior and output quality and combines the two to evaluate crowd work. The system has limitations, for example it breaks down when a worker completes the task in a separate text editor, and the behavioral traces are not very detailed, but it is still good support for quality control.

Reflections:

How do we evaluate the quality of outputs produced by crowd workers? For complex tasks there is no single correct answer, so the work is hard to judge. Researchers previously proposed tracing worker behavior to evaluate their work, but this alone is not accurate enough, since workers can produce equally good output while completing a task in very different ways. The authors offer a novel approach that evaluates workers using outputs, behavioral traces, and the combination of the two. The combination increases accuracy and makes it possible to analyze some complex tasks.
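
As a rough illustration of combining the two signal types, the sketch below flags submissions whose behavioral traces do not plausibly explain their output. The feature names, thresholds, and data are assumptions, not CrowdScape's actual features or model.

```python
# Minimal sketch: judge behavior and output together to flag suspicious work.
submissions = [
    {"worker": "w1", "time_on_task_s": 240, "keypresses": 310, "output_len": 180},
    {"worker": "w2", "time_on_task_s": 15,  "keypresses": 4,   "output_len": 175},
    {"worker": "w3", "time_on_task_s": 300, "keypresses": 280, "output_len": 160},
]

def suspicious(s: dict) -> bool:
    # A long answer produced with almost no typing or time suggests pasted or
    # low-effort work; neither signal alone tells the full story.
    typed_little = s["keypresses"] < 0.1 * s["output_len"]
    too_fast = s["time_on_task_s"] < 30
    return s["output_len"] > 100 and (typed_little or too_fast)

for s in submissions:
    print(s["worker"], "flag for review" if suspicious(s) else "looks fine")
```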

This system is valuable for crowdsourcing requesters. By building a mental model of the workers, they can better distinguish good results from poor ones. Crowdsourcing projects sometimes receive poor responses from inattentive workers; with this system, requesters can keep only the valuable results, which may increase the accuracy of their models, give a clearer view of their system’s performance, and yield more detailed feedback.

For system designers, the behavioral trace visualization is also quite useful for collecting detailed feedback and interaction data. By analyzing these data, designers can learn what kinds of interactions their users need and provide a better user experience.

However, I think the system may raise ethical issues. With it, HIT publishers can observe workers’ behavior while they complete HITs, collecting mouse movements, scrolling, keypresses, focus events, and clicks. This raises privacy concerns, and such information could be misused: workers’ machines would be at risk if their habits were harvested by attackers.

Questions:

Can this system be applied to tasks more complex than purely generative ones?

How can designers use this system to design interfaces that provide a better user experience?

How can we prevent attackers from using this system to collect user habits and attack their computers?


04/08/20 – Ziyao Wang – Agency plus automation: Designing artificial intelligence into interactive systems

The author argues that many developers and researchers are too optimistic because they overlook the human labor behind automated services, and that this long-standing focus on the AI side has left the interaction between AI and humans under-researched. He proposes that AI should enrich humans’ intellectual work rather than replace it, and presents interactive systems in three areas: data wrangling, exploratory analysis, and natural language translation. He integrates proactive computational support into these systems through predictive models and describes how the resulting hybrid systems perform. He concludes that only a fluent interleaving of automated suggestions and direct manipulation enables more productive and flexible work.

Reflections:

As a student who had taken only one class on human-computer interaction before this course, I held the mistaken view that we should focus only on the AI or data analytics side. In fact, a well-designed human-computer interaction brings a huge improvement in user experience. This paper reminds me of the importance of the interface and of what AI can do for human work.

It is not enough to simply let humans do what they can do, let AI do what it can do, and stitch the two together. Instead, the interaction between humans and AI should be designed to leverage the strengths of both sides and improve the performance of the whole system. I like the idea of having the AI make suggestions while the human works: results stay highly accurate while processing time drops. The human does not need to search for information manually, and the AI does not need to guess what the human is thinking; they cooperate, with the human ensuring accuracy while the AI provides fast background search and complex computation. The designed systems achieve a kind of harmony.

Additionally, the author notes that it is not always efficient for AI to defer to humans. Some simple jobs the AI can complete on its own, and in those cases it is more efficient to let it proceed without asking human workers for advice. For simple labeling, for instance, the AI can reach high accuracy even without human input, and waiting for a human response just wastes time. A fluent interleaving of automated suggestions and direct manipulation therefore yields the best overall performance, and we should consider both suggestions and automatic operation in our system designs.
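
One simple way to realize this interleaving is a confidence-threshold policy: act automatically when the model is very confident, suggest when it is moderately confident, and defer to the human otherwise. The sketch below is my own illustration, not the paper's systems; the thresholds and the predict() stub are assumptions.

```python
# Minimal sketch of routing between automation, suggestion, and human control.
from typing import Tuple

AUTO_THRESHOLD = 0.95      # apply silently above this confidence
SUGGEST_THRESHOLD = 0.60   # offer as a suggestion above this confidence

def predict(item: str) -> Tuple[str, float]:
    # Stand-in for a trained model returning (label, confidence).
    return ("spam", 0.97) if "win a prize" in item else ("ham", 0.55)

def route(item: str) -> str:
    label, conf = predict(item)
    if conf >= AUTO_THRESHOLD:
        return f"auto-applied '{label}'"                    # AI acts alone
    if conf >= SUGGEST_THRESHOLD:
        return f"suggest '{label}' for one-click confirm"   # AI suggests
    return "handed to human with no suggestion"             # human decides

print(route("win a prize now"))   # auto-applied 'spam'
print(route("meeting notes"))     # handed to human with no suggestion
```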

Questions:

The author proposes that systems should fluently interleave automated suggestions with direct manipulation. Are there situations in which an AI assistant would decrease human performance, so that humans should complete the task entirely on their own?

What’re the criteria for determining whether the AI should make suggestions to human or it should direct manipulation?

Compared with letting humans correct AI results, what are the advantages of having the AI make suggestions to humans?


03/25/2020 – Ziyao Wang – All Work and No Play? Conversations with a Question-and-Answer Chatbot in the Wild

The authors recruited 337 participants with diverse backgrounds and had them use CHIP, a question-and-answer agent, for five to six weeks, after which the participants completed a survey about their use of the system. The authors then analyzed the resulting data and survey responses to determine what kinds of conversational interactions users have with a QA agent in the wild and what signals can be used to infer user satisfaction with the agent’s functional performance and with playful interactions. In the end, they characterize users’ conversational interactions with a QA agent in the wild, suggest signals for inferring user satisfaction that could be used to build adaptive agents, and provide a nuanced understanding of the functions behind users’ conversational behaviors, such as distinguishing conversations with instrumental purposes from those with playful intentions.

Reflections:

QA agents are an important application of AI technology. Although such systems are designed to act like a secretary who knows everything reachable through the Internet, and they do well in conversations that serve primarily instrumental purposes, they can perform badly in conversations with playful intentions. Siri, for example, can help you call someone, schedule an Uber, or look up instructions for your device, but if you are happy and sing a song to it, it cannot understand what you mean and may disappoint you by replying that it does not understand. This is a hard problem, since every person has their own habits and an AI can hardly tell whether a conversation is playful or work-related. Because work-related conversations matter more, systems tend to assume that most conversations have instrumental purposes. This assumption ensures that no instrumental conversation is missed, but it can lower users’ satisfaction when they want to play with the agent. Developers understand this, yet it remains difficult for an AI system to distinguish the purpose of a conversation.

This situation could change with the findings of this paper. The results show how to distinguish the purposes of conversations and how to gauge whether the user is satisfied with a conversation, so an agent can adapt itself to meet users’ needs and raise their satisfaction. Developers of QA agents should therefore consider the characterized forms of conversational interaction and the signals for inferring user satisfaction in their future work. I expect future QA agents to become more adaptive to users’ habits, with user satisfaction rising accordingly.
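
As a toy illustration of how such signals might feed an adaptive agent, the heuristic below separates playful messages from instrumental ones using simple surface cues. The keyword lists and thresholds are invented and are not the paper's analysis.

```python
# Toy heuristic for guessing whether an utterance is playful or instrumental.
PLAYFUL_MARKERS = {"haha", "lol", ":)", "sing", "joke"}
TASK_MARKERS = {"how", "what", "when", "where", "schedule", "remind", "find"}

def likely_playful(utterance: str) -> bool:
    words = utterance.lower().split()
    playful = sum(w in PLAYFUL_MARKERS for w in words)
    task = sum(w in TASK_MARKERS for w in words)
    # Short messages with playful markers and no question mark lean playful.
    return playful > task and "?" not in utterance and len(words) <= 8

print(likely_playful("tell me a joke haha"))        # True
print(likely_playful("when is the HR meeting?"))    # False
```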

Questions:

How can we make use of the characterized forms of conversational interaction? Are there suggestions for how the agent should respond in each kind of conversation?

Given the signals for inferring user satisfaction, how can we develop a self-adaptive agent?

Do young people use QA agents the most compared with other groups? What other kinds of participants should be recruited to extend the coverage of the findings?


03/25/2020 – Ziyao Wang – Evaluating Visual Conversational Agents via Cooperative Human-AI Games

The authors propose that instead of measuring AI progress in isolation, it is better to evaluate the performance of whole human-AI teams on interactive downstream tasks. They designed a cooperative game in which a human works with an answerer bot to identify a secret image, known to the bot but not to the human, from a pool of images; at the end of the task, the human must pick out the secret image. Although the bot trained with reinforcement learning outperforms the one trained with supervised learning when paired with an AI questioner, there is no significant difference between them when they work with humans. The result reveals an apparent disconnect between AI-AI and human-AI evaluation: progress on the former does not seem predictive of progress on the latter.

Reflections:

This paper points out an apparent disconnect between AI-AI and human-AI evaluation, which is a good direction for future research. Compared with hiring people to evaluate models, using an AI system to evaluate another system is far more efficient and cheaper, but a model approved by an AI evaluator may still perform badly when interacting with humans. As the authors show, although the model trained with reinforcement learning beats the supervised one under AI evaluation, the two perform similarly when cooperating with human workers. GANs offer a similar example: even when the generator fools the discriminator, humans can often still tell generated results from real ones. For instance, many images on thispersondoesnotexist.com pass the discriminator, yet we can frequently spot an abnormal region that gives the picture away. This finding matters for future research: researchers should not rely only on simulated working environments, which can yield very different evaluations of a system’s performance; tests with humans in the workflow are needed to evaluate a trained model.

On the other hand, even if AI-evaluated models may not fully meet human needs, the training process is much more efficient than one that involves human evaluation. So even though AI-only evaluation may not be that accurate, we can still use it during development: let the AI perform a cheap, highly efficient first round of evaluation, and then bring in human evaluators to assess the models that pass. The development process then benefits from the advantages of both sides, and evaluation becomes both efficient and accurate.
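
The two-stage idea can be sketched as a simple pipeline: a cheap automated metric shortlists candidate models, and only the survivors go to costly human evaluation. The scores and the human_eval() stub below are invented for illustration.

```python
# Minimal sketch of a two-stage (automated then human) evaluation pipeline.
candidates = {"model_sl": 0.62, "model_rl": 0.71, "model_tiny": 0.40}  # AI-AI scores

AUTO_CUTOFF = 0.6

def human_eval(model_name: str) -> float:
    # Stand-in for an expensive human-in-the-loop study (e.g. the success rate
    # of human-AI teams in the game); the numbers here are made up.
    return {"model_sl": 0.55, "model_rl": 0.56}.get(model_name, 0.0)

shortlist = [m for m, score in candidates.items() if score >= AUTO_CUTOFF]
results = {m: human_eval(m) for m in shortlist}

print("shortlisted by AI-AI metric:", shortlist)
print("human-AI team scores:", results)   # note: the AI-AI gap may not persist
```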

Questions:

What else could the developed cooperative human-AI game be used for?

What is the practical use of ALICE?

Is human-AI performance always more important than AI-AI performance? Are there scenarios in which AI-AI performance matters more?
