4/15/2020 – Nurendra Choudhary – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors study the human side of automated fact-checking systems, in particular human trust in them. In their experiments, they show that people improved their accuracy when shown correct model predictions. However, human judgement also degraded when the model predictions were incorrect. This demonstrates how much trust humans place in their fact-checking models. Additionally, the authors find that humans who interact with the AI system improve their predictions significantly, suggesting that model transparency is a key aspect of human-AI interaction.

The authors provide a novel mixed-initiative framework to integrate human intelligence with fact-checking models. They also analyze the benefits and drawbacks of such integrated systems.

The authors also point out several limitations in their approach, such as non-American representation among MTurk workers and bias towards AI predictions. Furthermore, they point out the system's effectiveness in mediating debates and in conveying real-time fact checks in an argument setting. Interaction with the tool could also serve as a platform for identifying the reasons behind differences of opinion.

Reflection

The paper is very timely in the sense that fake news has become a widely used tool for political and social gain. People who are unfamiliar with the power of the internet tend to believe unreliable sources and form very strong opinions based on them. Such a tool can be extremely powerful in eliminating such controversies. The idea of analyzing the human role in AI fact checkers is also extremely important. AI fact checkers lack perfect accuracy and, given the problem, perfect accuracy is a requirement. Hence, the role of human beings in the system cannot be underestimated. However, human mental models tend to trust the system after correct predictions and do not efficiently correct themselves after incorrect predictions. This becomes an inherent limitation for these AI systems. Thus, the paper's idea of introducing transparency is extremely appropriate and necessary. Given more insight into the mechanism of fact checkers, human beings would be able to better optimize their mental models, thus improving the performance of the collaborative human-AI team.

AI systems can analyze huge repositories of information, and humans can perform more detailed analysis. In that sense, a human-AI fact-checking team utilizes the most important capabilities of both human and AI. However, as pointed out in the paper, humans tend to overlook their own capabilities and rely on the model's prediction. This could be due to the trust humans build after seeing some correct predictions. Given the plethora of existing information, it would be really inconvenient for humans to assess it all. Hence, I believe these initial trials are extremely important to build the correct amount of trust and expectations.

Questions

  1. Can a fact-checker learn from the mistakes that humans point out? How would that work? Would that make the fact-checker dynamic? If so, what is the extent of this change and how would the human mental models adapt effectively to such changes?
  2. Can you suggest a better way of interaction between humans and models? Also, for what other tasks can such interaction be effective?
  3. As pointed out in the paper, humans tend to overlook their own capabilities and rely on the model's prediction. What is a reason for this? Is there a way to make the collaboration more effective?
  4. Here, the assumption is that human beings are the veto authority. Can there be a case when this is not true? Is it always right to trust the judgement of humans (in this case underpaid crowd workers)?

Word Count: 588

04/15/2020 – Nan LI – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

This paper introduces a mixed-initiative model which allows a human and a machine to check the authenticity of a claim cooperatively. The fact-checking model also follows existing interface design principles, which call for understandability and actionability of the interface. The main objectives of the design are to provide transparency, support for integrating user knowledge, and an explanation of system uncertainty. To prioritize transparency over raw predictive performance, the authors used a more transparent prediction model: linear models instead of deep neural networks. Further, users are allowed to change the reputations and stances behind the system prediction. To evaluate how the system could help users assess the factuality of claims, the authors conducted three user studies with MTurk workers. The study results indicate that users might over-trust the system. The system prediction helps the user when its prediction for the claim is correct. At the same time, it degrades human performance when the system prediction is wrong because of biases implicit in the training data.
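
To make the transparency argument concrete, here is a minimal sketch of a linear, inspectable claim score. It is not the paper's actual model: the `Evidence` class, the `claim_score` function, the value ranges, and the sample sources are all hypothetical, chosen only to show how a user could change a reputation or stance and see the prediction move.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Evidence:
    """One retrieved article (hypothetical structure, not from the paper)."""
    source: str
    reputation: float  # assumed scale: 0.0 (unreliable) to 1.0 (reliable)
    stance: float      # assumed scale: -1.0 (denies claim) to +1.0 (supports claim)


def claim_score(evidence):
    """Reputation-weighted average stance, in [-1, 1].

    Every term is visible, so a user can see which source pushed the
    score in which direction.
    """
    total_weight = sum(e.reputation for e in evidence)
    if total_weight == 0:
        return 0.0
    return sum(e.reputation * e.stance for e in evidence) / total_weight


# Hypothetical evidence for a single claim.
evidence = [
    Evidence("source_a", reputation=0.75, stance=1.0),
    Evidence("source_b", reputation=0.25, stance=-1.0),
]
system_score = claim_score(evidence)

# The user distrusts source_a and lowers its reputation.
adjusted = [replace(evidence[0], reputation=0.25), evidence[1]]
user_score = claim_score(adjusted)

print(f"system score: {system_score:+.2f}, after user edit: {user_score:+.2f}")
```

Because the score is a plain weighted sum, the effect of each user edit is easy to trace, which is the kind of property a deep neural network would not offer as directly.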

Reflection

I think the design of this approach is valuable because it does not blindly pursue the accuracy of prediction results, but also considers the transparency, understandability, and actionability of the interface. These attempts would improve the user experience, since users have more knowledge of how the system works and thus place more trust in it. On the other hand, this might be the reason that users over-trust the system, as indicated in the paper's experiment results. But still, I think the design of the approach is a good attempt.

However, I don't see the possibility that this system can help users. Although the design is very user friendly, it does not leverage human ability; instead, it just allows humans to participate in the fact-checking process. Even though the design of the fact-checking process is reasonable and understandable for users, it demands too much mental work from them, such as reading a lot of information, thinking, and reasoning. This is a reasonable process, but it is too burdensome.

Moreover, based on the figures in the paper, I don't think the system helps the user determine the authenticity of a claim, and I believe the experiment results show this as well. Further, I found that the accuracy of the user's judgment depends more on the type of claim. Different claims have significantly different accuracy; this impact is even larger than the effect of the system.

It is also interesting to see the users' feedback after they complete the task. It seems one of the users has the same opinion as me regarding the amount of information that needs to be read. The most striking feedback is that the user would be confused if they had more options. I think this only happens when users are not sure about the correctness and they have the right to change the system output. Finally, we can also see from the comments that when the system makes the same judgment as the users, the users are more sure of their answer. Still, if the system's prediction differs from the users' judgment, this seriously affects the accuracy of the users' judgment. This is understandable because when someone questions your decision, no matter how confident you are, you will waver a little, let alone when it is a machine that has 70% accuracy.

Questions:

  1. Do you think the system could really help humans to detect the factuality of claims? Why or why not?
  2. When the authors designed the model, to achieve the goal of transparency, they gave up a higher-accuracy prediction model and used linear models instead. What do you think of this? Which one is more important for your design? Transparency? Accuracy?
  3. What do you think of the interface design? Does it provide too much information to users? Do you like the design or not?

Word Count: 638

04/15/2020 – Nan LI – ALGORITHMIC ACCOUNTABILITY Journalistic investigation of computational power structures

Summary:

In this paper, the author first presents the forms of algorithmic power: prioritization, classification, association, and filtering. Based on this description, the author concludes that a significant number of people are influenced by the outcomes of algorithms. Thus, the author makes the point that it is important to interpret the output of algorithms in the course of making higher-level decisions. Next, the author examines the possibilities and weaknesses of requiring algorithm transparency. As an alternative, the paper introduces a method called reverse engineering. In this work, journalists combined interviews, document reviews, and reverse engineering analysis to shed light on how the algorithms function. The paper introduces five case studies of journalistic investigations and also presents the challenges and opportunities of doing algorithmic accountability work. The primary process of the inquiry includes identifying a newsworthy algorithm, sampling the input-output relationships to study the correlations, and finally seeking a story. Finally, the author provides a series of suggestions regarding transparency policy for algorithms.

Reflection:

First, algorithmic accountability is not a new topic. Algorithms have penetrated our lives and are applied in all walks of life: not only in entertainment, learning, and everyday tools, but also in areas with significant impact on security, privacy, and even the distribution of social resources. People are starting to ask: can algorithms be trusted? To what extent are they trustworthy? I have also seen many examples of guessing and analyzing the internal structure of such a black box. I want to share one of them.

The approach of reverse engineering, especially the process of sampling the input-output relationships of an algorithm to study the correlations, reminds me of a news report which identified bias in an algorithm. That algorithm was designed for individual risk assessment, that is, predicting the likelihood of each individual committing a future crime. It has become increasingly common in courtrooms across the nation, but in 2014 concerns were raised that the risk scores might be injecting bias into the courts. The way people found the bias in the algorithm is the same as reverse engineering. Here are their findings from that report (I also put the link below):

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.
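
The kind of input-output comparison behind such findings can be sketched as follows. This is a minimal illustration only, not the report's actual analysis: the `error_rates_by_group` function, the group labels, and all records are hypothetical, and the numbers are made up purely to show the calculation.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1  # flagged high risk, did not reoffend
        elif not predicted and actual:
            counts[group]["fn"] += 1  # labeled low risk, did reoffend
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        rates[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


# Made-up records: (group, predicted_high_risk, reoffended).
records = (
    [("group_a", True, False)] * 2 + [("group_a", False, False)] * 2
    + [("group_a", True, True)] * 3 + [("group_a", False, True)] * 1
    + [("group_b", True, False)] * 1 + [("group_b", False, False)] * 3
    + [("group_b", True, True)] * 2 + [("group_b", False, True)] * 2
)

rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

Comparing the two rates across groups needs only the algorithm's outputs and the observed outcomes, not its source code, which is why this style of audit works on a black box.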

Based on this outcome, it seems that reverse engineering is essential and efficient. I think this is a better way to examine algorithmic accountability than transparency. As mentioned in this article, leaving aside the trade secrets problem, disclosing the source code of algorithms might be helpful for specialists, but it does not improve the experience of ordinary users, who may not be able to make meaningful choices given their lack of expertise. Thus, identifying the issues with an algorithm, rather than focusing on its implementation, is more efficient in encouraging the designer to perfect the algorithm.

Questions:

  1. Do you think the algorithm is trustworthy? How much confidence do you have in an algorithm?
  2. What do you think about transparency? How transparent do you think the algorithm should be?
  3. What do you think of reverse engineering? Does it work? Do you have any other examples regarding this approach?

Link: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Word Count: 544
