02/05/2020 – Sukrit Venkatagiri – Power to the People: The Role of Humans in Interactive Machine Learning

Paper: Power to the People: The Role of Humans in Interactive Machine Learning

Authors: Saleema Amershi, Maya Cakmak, W. Bradley Knox, Todd Kulesza

Summary:
This paper discusses the rise of interactive machine learning systems and how to improve these systems, as well as users’ experiences with them, through a set of case studies. Typically, a machine learning workflow involves a complex, back-and-forth process of collecting data, identifying features, experimenting with different machine learning algorithms, and tuning parameters, after which the results are examined by practitioners and domain experts. The model is then updated to take in their feedback, which can affect performance and start the cycle anew. In contrast, feedback in interactive machine learning systems is iterative, rapid, and explicitly focused on user interaction and the ways in which end users interact with the system. The authors present case studies that explore multiple interactive and intelligent user interfaces, such as gesture-based music systems as well as visualization systems such as CueFlik and ManiMatrix. Finally, the paper concludes with a discussion of developing a common language across diverse fields, distilling principles and guidelines for human-machine interaction, and a call to action for increased collaboration between HCI and ML.

Reflection:
The case studies are interesting in that they highlight the differences between typical machine learning workflows and novel IUIs, as well as the differences between humans and machines. I find it interesting that most workflows leave the end user out of the loop for “convenience” reasons, even though it is often the end user who is the most important stakeholder.

Similar to [1] and [2], I find it interesting that there is a call to action for developing techniques and standards for appropriately evaluating interactive machine learning systems. However, the paper does not go into much depth on this. I wonder if it is the highly contextual nature of IUIs that makes it difficult to develop common techniques, standards, and languages. This in turn highlights some epistemological issues that need to be addressed within both the HCI and ML communities.

Another fascinating finding is that people valued transparency in machine learning workflows, but this transparency did not always equate to higher (human) performance. Indeed, it may just be a placebo effect where humans feel that “knowledge is power” even when it would not have made any difference. Transparency has benefits beyond accuracy, however. For example, transparency in how a self-driving car works can help determine whom to blame in the case of a self-driving car accident: perhaps the algorithm was at fault, or a pedestrian, the driver, the developer, or unavoidable circumstances such as a force of nature. With interactive systems, it is crucial to understand human needs and expectations.

Questions:

  1. This paper also talks about developing a common language across diverse fields. We notice the same idea in [1] and [2]. Why do you think this hasn’t happened yet?
  2. What types of ML systems might not work for IUIs? What types of systems would work well?
  3. How might these recommendations and findings change with systems where there is more than one end user, for example, an IUI that helps an entire town decide zoning laws, or an IUI that enables a group of people to book a vacation together?
  4. What was the most surprising finding from these case studies?

References:
[1] R. Jordon Crouser and Remco Chang. 2012. An Affordance-Based Framework for Human Computation and Human-Computer Collaboration. IEEE Transactions on Visualization and Computer Graphics 18, 12: 2859–2868. https://doi.org/10.1109/TVCG.2012.195
[2] Jennifer Wortman Vaughan. 2018. Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research. Journal of Machine Learning Research 18, 193: 1–46. http://jmlr.org/papers/v18/17-234.html


02/05/20 – Runge Yan – Power to the People: The Role of Humans in Interactive Machine Learning

When given a pattern and clear instructions for classification, a machine can learn a given task quickly. Case studies are presented to provide a sense of the users’ role in interactive machine learning: how the machine influences the users and vice versa. Then, several characteristics of people involved in interactive machine learning are presented as guidelines for understanding the end-user’s effect on the learning process:

People are active teachers, tend to give positive rewards, and want to act as a model for the learner. Also, given the nature of human intelligence, they want to provide extra information beyond a simple decision. This leads to another finding: people value appropriate transparency in the system, which in turn helps reduce the error rate in labeling.

Several guidelines for interactivity are presented. Instead of a small number of professionals designing the system, people can be more involved in the process and collect the data they want. A novel interactive machine learning system should be flexible in its input and output: users could try inputs with reasonable variation, assess the quality of the model, and even query the model directly; the output can be evaluated by the users rather than “experts,” possible explanations of error cases can be provided by users, and modifying the model is no longer off-limits to users.

Details are discussed to further suggest which methods are a better fit for a more interactive system: a common language, principles and guidelines, techniques and standards, handling input volume, algorithms, and collaboration between HCI and ML, among others. This paper lays a comprehensive foundation for future research on this topic.

Reflection

I once contributed to a dataset on sense-making and explanation. My job was to write two similar sentences where only one word (or phrase) differed – one of them common sense and the other nonsense. In addition, I wrote three sentences attempting to explain why the nonsensical one does not make sense, with only one of them best describing the reason. The model should understand the five sentences, pick the nonsense, and find the best explanation. I was asked to be somewhat extreme, for example, to write down a pair like “I put an eggplant in the fridge” and “I put an elephant in the fridge”. A mild difference was not allowed, for example, “I put a TV in the fridge.” A model will learn quickly from extreme comparisons; however, I’d prefer an iterative learning process where the difference narrows down (with one of them still nonsense and the other common sense).

When I tried to be a contributor on Figure Eight (previously CrowdFlower), the tutorial and intro task were quite friendly. I was asked to examine a LinkedIn account and determine whether the person was still working in the same position at the same company. The decision assistance made me feel comfortable – I knew what my job was and some possible obstacles along the way, and I could tell the difficulty increased in a reasonable way. When there was information that could not be captured by selecting from the given options, I was able to provide additional notes to the system, which made me feel that my work was valuable.

More interactivity is needed to take the model to the next level, but for a system previously built around restricted rules and restricted output, how much openness to allow is a crucial design decision.

Question

  1. More flexibility means more workload on the system and more demands on users. How should user contributions be balanced? For example, if one user wants to experiment with an input and another user is unwilling to, will the system accept input from both, or only from qualified users?
  2. How do we address the contribution of the users?


2/5/2020 – Jooyoung Whang – Principles of Mixed-Initiative User Interfaces

This paper seeks to find when it is better to allow direct user manipulation versus automated services (agents) in a human-computer interaction system. The author arrives at the concept of mixed-initiative user interfaces: systems that seek to achieve maximum efficiency by drawing on the strengths of both sides and their collaboration. In the proposal, the author claims that the major factors to consider when providing automated services are addressing performance uncertainty and predicting the user’s goals. According to the paper, many poorly designed systems fail to gauge when to provide automated service and misinterpret user intention. To overcome these problems, the paper argues that automated services should be provided only when they are expected to give greater benefit than the user doing the task manually. The author also writes that effective and natural transfer of control to the user should be provided so that users can efficiently recover and step toward their goals upon encountering errors. The paper also provides a use case of a system called LookOut.
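
The expected-utility reasoning summarized above can be illustrated with a small sketch. This is a hypothetical toy example, not LookOut’s actual implementation; the probability and utility values are made-up placeholders:

# Hypothetical sketch of the expected-utility test described above.
# p_goal is the estimated probability that the user wants the automated
# service (e.g., that an open email implies a scheduling goal). The
# utility values below are illustrative placeholders, not values from the paper.

def choose_action(p_goal: float) -> str:
    """Pick whichever option has the highest expected utility."""
    utilities = {
        # (utility if the user has the goal, utility if they do not)
        "automate":   (1.0, -0.8),  # helpful if right, intrusive if wrong
        "dialog":     (0.6, -0.2),  # asking first costs a little attention
        "do_nothing": (0.0,  0.0),  # never intrudes, never helps
    }
    expected = {
        action: p_goal * u_goal + (1 - p_goal) * u_no_goal
        for action, (u_goal, u_no_goal) in utilities.items()
    }
    return max(expected, key=expected.get)

print(choose_action(0.9))  # "automate"
print(choose_action(0.5))  # "dialog"
print(choose_action(0.1))  # "do_nothing"

With these placeholder values, a low goal probability leads to doing nothing, an intermediate probability leads to asking first, and only high confidence triggers automation, which mirrors the paper’s point about weighing the costs and benefits of intervention.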

I greatly enjoyed and appreciated the example that the author provided. I personally have never used LookOut before, but it seemed like a good program from reading the paper. I liked that the program gracefully handled subtleties such as recognizing phrases like “Hmm…” to sense that a user is thinking. It was also interesting that the paper tries to infer a user’s intentions using a probabilistic model. I recognized keywords such as utility and agents that also frequently appear in the machine learning context. In my previous machine learning experience, an agent acted according to policies leading to maximum utility scores. The paper’s approach is similar, except it involves user input and the utility is the user’s goal achievement or intention. The paper was a nice refresher for reviewing what I learned in AI courses as well as for putting humans into the context.

The following are the questions that I came up with while reading the paper:

1. The paper puts a lot of effort into trying to accurately acquire user intention. What if the intention was provided in the first place? For example, the user could start using the system by selecting their goal from a concise list. Would this benefit the system and user satisfaction? Would there be a case where it would not (such as the system misinterpreting even the provided user goal)?

2. One of the previous week’s readings provided the idea of affordances (what a computer or a human is each better at doing than the other). How does this align with automated service versus direct human manipulation? For example, since computers are better at processing big data, tasks related to this would preferably need to be automated.

3. The paper seems to assume that the user always has a goal in mind when using the system. How about purely exploratory systems? In scientific research settings, there are a lot of times when the investigators don’t know what they are looking for. They are simply trying to explore the data and see if there’s anything interesting. One could claim that this is still some kind of a goal, but it is a very ambiguous one as the researchers don’t know what would be considered interesting. How should the system handle these kinds of cases?


02/05/2020 – Palakh Mignonne Jude – Guidelines for Human-AI Interaction

SUMMARY

In this paper, the authors propose 18 design guidelines for human-AI interaction with the aim that these guidelines will serve as a resource for practitioners. The authors codified over 150 AI-related design recommendations and then, through multiple phases of refinement, modified this list and defined 18 generally applicable principles. As part of the first phase, the authors reviewed AI products, public articles, and relevant scholarly papers. They obtained a total of 168 potential guidelines, which were then clustered into 35 concepts. This was followed by a filtering process that reduced the set to 20 guidelines. As part of phase 2, the authors conducted a modified heuristic evaluation, attempting to identify both applications and violations of the proposed guidelines. They utilized 13 AI-infused products/features as part of this evaluation study. This phase helped merge, split, and rephrase different guidelines and reduced the total number of guidelines to 18. In the third phase, the authors conducted a user study with 49 HCI practitioners in an attempt to understand whether the guidelines were applicable across multiple products and to obtain feedback about their clarity. The authors ensured that the participants had experience in HCI and were familiar with discount usability testing methods. Modifications were made to the guidelines based on the feedback obtained from the user study regarding the clarity and relevance of the guidelines. In the fourth phase, the authors conducted an expert evaluation of the revisions. These experts comprised people with work experience in UX/HCI who were well-versed in discount usability methods. With the help of these experts, the authors assessed whether the 18 guidelines were easy to understand. After this phase, they published a final set of 18 guidelines.

REFLECTION

After reading the 1999 paper on ‘Principles of Mixed-Initiative User Interfaces’, I found the study performed in this paper to be much more extensive as well as more relatable, since the AI-infused systems considered were systems I had some knowledge of, as compared to the LookOut system, which I have never used. I felt that the authors performed a thorough comparison and included various important phases in order to formulate the best set of guidelines. I found it interesting that this study was performed by researchers from Microsoft 20 years after the original 1999 paper (also done at Microsoft). I believe the authors provided a detailed analysis of each of the guidelines, and it was good that they included identifying applications of the guidelines as part of the user study.

I felt that some of the violations reported by participants were very well thought out; for example, one reported violation involved a navigation product where an explanation was provided but was inadequate – a ‘best route’ was suggested, but no criteria were given for why the route was the best. I feel that such notes provided by the users were definitely useful in helping the authors assemble good and generalizable guidelines.

QUESTION

  1. In your experience, which of the 18 guidelines did you find to be most important? Was there any guideline that seemed ambiguous to you? For those with limited experience in the field of HCI, were there any guidelines that seemed unclear or difficult to understand?
  2. The authors mention that they do not explicitly include broad principles such as ‘build trust’, but instead made use of indirect methods by focusing on specific and observable guidelines that are likely to contribute to building trust. Is there a more direct evaluation that can be performed in order to measure building trust?
  3. The authors mention that it is essential that designers evaluate the influences of AI technologies on people and society. What methods can be implemented in order to ensure that this evaluation is performed? What are the long-term impacts of not having designers perform this evaluation?
  4. For the user study (as part of phase 3), 49 HCI practitioners were contacted. How was this done and what environment was used for the study?


02/05/20 – Vikram Mohanty – Principles of Mixed-Initiative User Interfaces

Paper Author: Eric Horvitz

Summary

This is a formative paper on how mixed-initiative user interfaces should be designed, taking into account the principles surrounding users’ ability to directly manipulate objects and combining them with the principles of interface agents aimed at automation. The paper outlines 12 critical factors for the effective integration of automated services with direct manipulation interfaces, and illustrates these points through different features of LookOut, a piece of software that provides automated scheduling services from emails in Microsoft Outlook.

Reflection

  1. This paper has aged well over the last 20 years. Even though this work has led to updated renditions that take into account recent developments in AI, the core principles outlined in this paper (e.g., being clear about the user’s goals, weighing costs and benefits before intervening in the user’s actions, the ability for users to refine results) still hold true today.
  2. The AI research landscape has changed a lot since this paper came out. To give some context, modern AI techniques such as deep learning weren’t prevalent, due to the lack of both datasets and computing power. The internet was nowhere near as big as it is now. The cost of automating everything back then would obviously have been bottlenecked by the lack of datasets. That feels like a strong motivation for aligning automated actions with the user’s goals and actions and factoring in context-dependent costs and benefits. For example, assigning a likelihood that an email message that has just received the focus of attention is in the goal category of “User will wish to schedule or review a calendar for this email” versus the goal category of “User will not wish to schedule or review a calendar for this email,” based on the content of the message (a toy sketch of this kind of goal-likelihood estimation follows this list). This is predominantly goal-driven and involves exploring the problem space to generate the necessary dataset. Today we are not bottlenecked by a lack of computing power or unavailability of datasets, and if we do not follow what the paper advocates about aligning automated actions with the user’s goals and actions or factoring in the context, we may end up with meaningless datasets or unnecessary automation.
  3. These principles do not treat agent intervention lightly at all. In a fast-paced world, in the race towards automation, this particular point might get lost easily. For LookOut’s intervention with a dialog or action, multiple studies were conducted to identify the most appropriate timing of messaging services as a function of the nature of the message. Carefully handling the presentation of automated agents is crucial for a positive user experience.
  4. The paper highlights how the utility of the system taking action when a goal is not desired can depend on any combination of the user’s attention status, the available screen real estate, or the user being rushed. This does not seem like something that can be easily determined by the system on its own or by algorithm developers. System developers or designers may have a better understanding of such possible real-world scenarios, and this therefore calls for researchers from both fields to work together towards a shared goal.
  5. Uncertainties or the limitations of AI should not come in the way of solving hard problems that can benefit users. Designing intelligent user interfaces that leverage the complementary strengths of humans and AI can help solve problems that neither party can solve on its own. HCI folks have long been at the forefront of thinking about how humans will interact with AI and how to do work that allows them to do so effectively.
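
Below is a toy sketch of the goal-likelihood estimation mentioned in point 2 of the reflection above. The training emails, labels, and choice of a naive Bayes text classifier are illustrative assumptions, not the actual model or data behind LookOut:

# Hypothetical illustration of estimating the likelihood that an email implies a scheduling goal.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Can we meet Tuesday at 3pm to review the draft?",
    "Let's schedule a call sometime next week.",
    "Here is the report you asked for.",
    "Thanks, the attached slides look good.",
]
wants_scheduling = [1, 1, 0, 0]  # 1 = user will likely want calendaring help

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, wants_scheduling)

new_email = "Are you free to meet on Friday afternoon?"
p_goal = model.predict_proba([new_email])[0][1]  # probability of the scheduling-goal class
print(f"P(scheduling goal) = {p_goal:.2f}")

The resulting likelihood would then be weighed against the context-dependent costs and benefits of acting, asking, or staying silent, as the paper advocates.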

Questions

  1. Which principles, in particular, do you find useful if you are designing a system where the intelligent agent is supposed to aid the users in open-ended problems that do not have a clear predetermined right/wrong solution, e.g., search engines or Netflix recommendations?
  2. Why don’t we see the “genie” or “Clippy” anymore? What does that tell us about “employing socially appropriate behaviors for agent-user interaction”?
  3. A) For folks who work on building interfaces, do you feel some elements can be made smarter? How do you see using these principles in your work? B) For folks who work on developing intelligent algorithms, do you consider end-user applications in your work? How do you see using these principles in your work? Can you imagine different scenarios where your algorithm isn’t 100% accurate?


02/05/20 – Fanglan Chen – Guidelines for Human-AI Interaction

The variability of current AI designs, as well as failures of automated inference – ranging from the disruptive or confusing to the more serious – calls for creating more effective and intuitive user experiences with AI. The paper “Guidelines for Human-AI Interaction” enriches the ongoing conversation on heuristics and guidelines for human-centered design of AI systems. In this paper, Amershi et al. identified more than 160 potential recommendations for human-AI interaction from respected sources ranging from scholarly research papers to blog posts and internal documents. Through a 4-phase framework, the research team systematically distilled and validated the guideline candidates into a unified set of 18 guidelines. This work empowers the community by providing a resource for designers working with AI and facilitates future research into the refinement and development of principles for human-AI interaction.

The proposed 18 guidelines in the paper are grouped into four sections that prescribe how an AI system should behave upon initial interaction, as the user interacts with the system, when the system is wrong, and over time. As far as I can see, the major research question is how to keep automated inference under some degree of control when it is operating under uncertainty. We can imagine that it would be extremely dangerous in scenarios where humans are unable to intervene when AI makes incorrect decisions. Take autonomous vehicles, for example: AI may behave abnormally in real-world situations that it has not faced in its training. How to integrate efficient dismissal or correction is an important question to consider in the initial design of an autonomous system.

Also, we need to be aware that while the guidelines for human-AI interaction are developed to support design decisions, they are not intended to be used as a simple checklist. One of the important intentions is to support and stimulate conversations between user experience and engineering practitioners that lead to better AI design. Another takeaway from this paper is that there will always be numerous situations where AI designers must consider trade-offs among guidelines and weigh the importance of one or more over others. Beyond the 4-phase framework presented in the paper, I think there are at least two points worth discussing. Firstly, the 4-phase framework is more of a narrowing-down process, and no open-ended questions are raised in the feedback cycle. The functioning and goals of apps in different categories may vary, and rising capabilities and use cases may suggest a need for additional guidelines. As AI design advances, we may need more innovative ideas about future AI design instead of being constrained to the existing guidelines. Secondly, it seems all the evaluators who participated in the user study work in the domain of HCI, and a number of them have years of experience in the field. I’m wondering whether the opinions of end users without HCI experience need to be considered as well and how wider involvement would impact the final results. I think the following questions are worthy of further discussion.

  • Which of the 18 proposed design guidelines are comparatively difficult to employ in AI designs? Why?
  • Besides the proposed guidelines, are there any design guidelines worthy of attention but not discussed in the paper?
  • Some of the guidelines seem to be of greater importance than others in user experience of specific domains. Do you think the guidelines need to be tailored to the specific categories of applications?
  • In the user study, do you think it would be important to include end users who actually use the apps but have no experience studying HCI?


02/05/2020 – Sukrit Venkatagiri – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

Paper: Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research
Author: Jennifer Wortman Vaughan

Summary:
This is a survey paper that provides an overview of crowdsourcing research as it applies to the machine learning community. It first provides an overview of crowdsourcing platforms, followed by an analysis of how crowdsourcing has been used in ML research: specifically, in generating data, evaluating and debugging models, building hybrid intelligent systems, and running behavioral experiments. The paper then reviews crowdsourcing literature that studies the behavior of the crowd, their motivations, and ways to improve work quality. In particular, the paper focuses on dishonest worker behavior, ethical payment for crowd work, and the communication and collaboration patterns of crowd workers. Finally, the paper concludes with a set of best practices for the optimal use of crowdsourcing in machine learning research.

Reflection:
Overall, the paper provides a thorough and analytic overview of the applications of crowdsourcing in machine learning research, as well as useful best practices for machine learning researchers to better make use of crowdsourcing. 

The paper largely focuses on ways crowdsourcing has been used to advance machine learning research, but also subtly talks about how machine learning can advance crowdsourcing research. This is interesting because it points to how these two fields are highly interrelated and co-dependent. For example, with the GalaxyZoo project, researchers attempted to optimize crowd effort, which meant that fewer judgements were necessary per image, allowing more images to be annotated overall. Other interesting uses of crowdsourcing were in evaluating unsupervised models and model interpretability. 

On the other hand, I wonder what a paper that was more focused on HCI research would look like. In this paper, humans are placed “in the loop,” while in HCI (and the real world) it’s often the machine that is in the loop of a human’s workflow. For example, the paper states that hybrid intelligent systems “leverage the complementary strengths of humans and machines to expand the capabilities of AI.” A more human-centered version would be “[…] to expand the capabilities of humans.”

Another interesting point is that all the hybrid intelligent systems mentioned in Section 4 had their own metrics to assess human, machine, and human+machine performance. This speaks to the need for a common language for understanding and then assessing human-computer collaboration, which is described in more detail in [1]. Perhaps it is the unique, highly-contextual nature of the work that prevents more standard comparisons across hybrid intelligent systems. Indeed, the paper mentions this with regards to evaluating and debugging models, that “there is no objective notion of ground truth.”

Lastly, the paper talks about two topics relevant to this semester. The first is algorithm aversion: participants who were given more control in algorithmic decision-making systems were more accurate, not because the human judgements were more accurate, but because the humans were more willing to listen to the algorithm’s recommendations. I wonder if this is true in all contexts, and how best to incorporate this work into mixed-initiative user interfaces. The second topic of relevance is that the quality of crowd work naturally varied with payment. However, very high wages increased the quantity of work but not always the quality. Combined with the various motivations that workers have, it is not always clear how much to pay for a given task, necessitating pilot studies, which this paper also heavily insists on. However, even if it was not explicitly mentioned, one thing is certain: we must pay fair wages for fair work [2].

Questions:

  1. What are some new best-practices that you learned about crowdsourcing? How do you plan to apply it in your project?
  2. How might you use crowdsourcing to advance your own research? Even if it isn’t in machine learning.
  3. Plenty of jobs are seemingly menial, e.g., assembly jobs in factories, working in a call center, or delivering mail, yet comparatively little effort has gone into making these jobs more “meaningful” and motivating to increase people’s willingness to do them.
    1. Why do you think there is such a large body of work around making crowd work more intrinsically motivating?
    2. Imagine you are doing crowd work for a living: would you prefer to be paid more for a boring task, or paid less for a task masquerading as a fun game?
  4. How much do you plan to pay crowd workers for your project? Additional reference: [2].
  5. ML systems abstract away the human labor that goes into making it work, especially as seen in the popular press. How might we highlight the invaluable role played by humans in ML systems? By “humans,” I mean the developers, the crowd workers, the end-users, etc.

References:
[1] R. Jordon Crouser and Remco Chang. 2012. An Affordance-Based Framework for Human Computation and Human-Computer Collaboration. IEEE Transactions on Visualization and Computer Graphics 18, 12: 2859–2868. https://doi.org/10.1109/TVCG.2012.195
[2] Mark E. Whiting, Grant Hugh, and Michael S. Bernstein. 2019. Fair Work: Crowd Work Minimum Wage with One Line of Code. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7, 1: 197–206.


2/5/2020 – Jooyoung Whang – Guidelines for Human-AI Interaction

The paper is a good distillation of the various design recommendations for human-AI interaction systems that have accumulated over the more than 20 years since the rise of AI. The authors ran four rounds of filtering to end up with a final set of 18 guidelines that have been thoroughly reviewed and tested. Their sources of data include commercial AI products, user reviews, and related literature. Across these rounds, the authors:

1. Extracted the initial set of guidelines

2. Reduced the number down via internal evaluation

3. Performed user study to verify relevance and clarity

4. Tested the guidelines with experts of the field

The authors provide a nicely summarized table containing all the guidelines and their examples. Rather than going in-depth about the resulting guidelines themselves, the authors focus more on the process and feedback that they received. The authors conclude by stating that the provided guidelines are mostly for general design cases and not specific ones.

When I was examining the guideline table, I liked how it was divided into four cases across the design iteration. In a usability engineering class that I took, I learned that a product’s design lifecycle consists of Analyze, Design, Prototype, and Evaluate, in that order (and the cycle can repeat). I could see that the guidelines focus a lot on Analyze, Design, and Evaluate. It was interesting that prototyping wasn’t strongly implied in the guidelines. I assume it may have been because the whole design iteration was considered one pass of prototyping. It may also have been because it is too hard to create a low-fidelity prototype of a system involving artificial intelligence. The reason for going through a prototyping process is to quickly filter out what works and what doesn’t. Since artificial intelligence by nature requires extensive training and complicated reasoning, a pass of prototyping will accordingly take longer than for other kinds of products.

It is very interesting that the guidelines (for the long term) instruct that the AI system must inform users of its actions. In my experience using AI systems such as voice recognition before I knew about machine learning techniques, the system mostly appeared as a black box. I also observed many people who intentionally avoided these kinds of systems out of suspicion. I think revealing portions of this information and giving control to the users is a very good idea. It will allow more people to adjust to the system quickly.

The following are the questions that came to me while I was reading the paper:

1. As noted in my reflection, it is expensive to go through an entire design process for human-AI systems. Is there a good workaround for this problem?

2. How much control do you think is appropriate to give to the users of the system? The paper mentions informing how the system will react to certain user actions and allowing the user to choose whether or not to use the system. But can we and should we allow further control?

3. The paper focuses on general cases of designing human-AI systems. They note that they’ve intentionally left out special cases. What kinds of special systems do you think will not need to follow the guidelines?


02/05/20 – Myles Frantz – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

Throughout this paper, a solo Microsoft researcher creates a seemingly comprehensive (and almost exhaustive) survey of the methods by which crowdsourcing can be used to enhance and improve various aspects of machine learning. The study was not limited to one of the most common crowdsourcing platforms; a multitude of other platforms were included as well, including but not limited to CrowdFlower, ClickWorker, and Prolific Academic. Through reading and summarizing around 200 papers, the key affordances were organized into 4 categories: data generation (the accuracy and quality of the data being generated), evaluating and debugging models (the accuracy of the predictions), hybrid intelligence systems (the collaboration between human and AI), and behavioral studies to inform machine learning research (realistic human interactions and responses to AI systems). Each of these categories has several examples underneath it, further describing various aspects along with their benefits and disadvantages. Included in these sub-categories are examples such as speech recognition, determining human behavior (and general attitudes) towards specific types of ads, and crowd workers’ intercommunication. With these various factors laid out, the author insists that platforms and requesters maintain a good relationship with their crowd workers, use good task design, and test tasks thoroughly.

I appreciate how vast and comprehensive this survey is. Many of the points seem to cover most of the research area. Furthermore, it does not seem this work could easily be condensed into a more compact form.

I wholeheartedly agree with one of the lasting points: ensuring the consumers of the platform (requesters and crowd workers) maintain a good working relationship. A multitude of platforms have overcorrected or undercorrected their issues, upsetting their target audience and clientele, thereby creating negative press and temporarily dipping their stock. A leading example of this is YouTube and its children’s content, where ads were being illegally targeted at children. YouTube in turn overcorrected and still ended up with negative press, since the changes hurt several of its creators.

Though not a fault of the survey, I disagree with the methods of hybrid forecasting (“producing forecasts about geopolitical events”) and of understanding reactions to ads. These seem to be an unfortunate but inevitable outcome of how companies and potentially governments attempt to predict and get ahead of incidents. Advertisements are not as bad by comparison, but in general the practice of striking the perfect balance between targeting the user and creating the perfect environment for viewing an advertisement seems malicious and not for the betterment of humanity.

  • While practically impossible, I would like to see what industry has created in the area of hybrid forecasting. Not knowing how far this kind of technology has spread invites imaginings reminiscent of a few Black Mirror episodes.
  • From the authors, I would like to see which platforms host each of the subcategories of features. This could be done on the reader’s side, though it might be a study in and of itself.
  • My final question would be a request for a subjective comparison of the “morality” of each platform. This could be done by comparing the quality of the workers’ discussions or how strong the gamification is across platforms.


02/05/20 – Myles Frantz – Guidelines for Human-AI Interaction

Through this paper, the Microsoft authors created and survey-tested a set of guidelines (or best practices) for designing and creating AI-human interactions. Throughout their study, they went through over 150 AI design recommendations, ran their initial set of guidelines through a strict set of heuristics, and finally ran multiple rounds of a user study consisting of 49 moderately experienced HCI practitioners (with at least one year of self-reported experience). From this, the resulting 18 guidelines fell into the categories of “Initially” (upon the user’s first interactions with the system), “During interaction”, “When wrong” (i.e., when the AI system errs), and “Over time”. These categories include guidelines such as (but not limited to) “Make clear what the system can do”, “Support efficient invocation”, and “Remember recent interactions”. Throughout the user study, the guidelines were tested for how relevant they would be in specific avenues of technology (such as navigation and social networks). Across these ratings, at least 50% of the respondents thought the guidelines were clear, while approximately 75% of the respondents thought the guidelines were at least neutral (or all right to understand). Finally, a set of HCI experts was asked to ensure further revisions of the guidelines were accurate and better reflected the area.

I agree with and really appreciate the insight into the relevance testing of each guideline against each sector of industry. Not only does this help avoid misapplication of guidelines in unintended sectors, it also helps create a guideline for the guidelines. This will help ensure people implementing this set of guidelines have a better idea of where they can best be used.

I also agree with and like the thorough testing that went into the vetting process for these guidelines. In last week’s readings, it seemed the surveys were mostly or solely based on reviews of papers and were subjective to the authors. Having multiple rounds of testing with people who have, on average, a high level of experience within the field lends great support to the guidelines.

  • One of my questions for the authors would be to ask for a post-mortem of the results and their impact on the industry. Regardless of citation counts, it would be interesting to see how many platforms integrate these guidelines into their systems, and to what extent.
  • Following up on the previous question, I would like to see another paper (possibly a survey) exploring the different implementation methods used across the different platforms. A comparison between the platforms would help better showcase and exemplify each guideline.
  • I would also like to see each of these guidelines run past a sample of expert psychologists to determine their effects in the long run. In light of what was described in the other paper (Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research) as algorithm aversion (“a phenomenon in which people fail to trust an algorithm once they have seen the algorithm make a mistake”), I would like to see whether these guidelines would create an environment that makes the interaction so immersive that human subjects either reject it or completely accept it.
