Summary
Nguyen et al.’s paper “Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking” explores the use of automatic fact-checking, the task of assessing the veracity of claims, as an assistive technology to augment human decision making. Many previous papers propose automated fact-checking systems, but few of them consider having humans as part of a human-AI partnership to complete the same task. By involving humans in fact-checking, the authors study how people understand, interact with, and establish trust in an AI fact-checking system. The authors introduce their design and evaluation of a mixed-initiative approach to fact-checking, combining people’s judgment and experience with the efficiency and scalability of machine learning and automated information retrieval. Their user study shows that crowd workers involved in the task tend to trust the proposed system: participant accuracy on claims improves when they are exposed to correct model predictions. But sometimes the trust is so strong that exposure to the model’s incorrect predictions reduces their accuracy on the task.
Reflection
Overall, I think this paper conducted an interesting study on how the proposed system actually influences humans’ assessment of the factuality of claims in the fact-checking task. However, the model transparency studied in this research is different from what I expected. When talking about model transparency, I expect an explanation of how the training data were collected, what variables were used to train the model, and how the model works in a stepwise process. In this paper, the approach to increasing the transparency of the proposed system is to present the source articles on which the model bases its true or false judgment of the given claim. The next step is letting the crowd workers in the system group go through each source article, see whether it makes sense, and decide whether they agree or disagree with the system’s judgment. In this task, I feel a more important transparency problem is how the model retrieves the articles and how it ranks them for presentation. Noise in the training data may introduce bias into the model, but there is little we can tell merely by checking the retrieved results. That makes me think there might be different levels of transparency: at one level, we can check the input and output at each step, and at another level, we may see which attributes the model actually uses to make its prediction.
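To make the step-level notion of transparency concrete, here is a minimal sketch of what exposing the retrieval and ranking step could look like, assuming a simple TF-IDF similarity ranker; the claim and articles are invented for illustration, and this is not the paper’s actual retrieval or stance-classification model.

```python
# Minimal sketch of "step-level" retrieval transparency: show the ranking
# scores alongside the retrieved articles so users can see *why* each source
# was surfaced. Claim and articles are invented; not the paper's actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

claim = "Vitamin C prevents the common cold."
articles = [
    "A randomized trial found no effect of vitamin C on cold incidence.",
    "Local sports team wins championship after dramatic overtime.",
    "Review article: vitamin C supplementation and respiratory infections.",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([claim] + articles)
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

# Rank articles by similarity to the claim and expose the score used for ranking.
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"rank {rank}: score={scores[idx]:.3f} :: {articles[idx]}")
```

Exposing even this much of the pipeline lets a user ask why an off-topic or misleading article was ranked highly, which is closer to the kind of transparency I had in mind than only showing the final list of sources.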
The authors conducted three experiments with a participant survey on how users understand, interact with, and establish trust in a fact-checking system, and how the proposed system actually influences users’ assessment of the factuality of claims. The experiments are conducted as a comparative study between a control group and a system group to show that the proposed system actually works. Firstly, I would like to know whether the randomly recruited workers in the two groups differ demographically in ways that could affect the final results. Is there a better way to conduct such experiments? Secondly, the performance difference between the two groups in terms of human error is small, and there is no additional evidence that the difference is statistically significant (a sketch of such a check follows this paragraph). Thirdly, the paper reports experimental results on only five claims, including one whose retrieved articles incorrectly support it (claim 3), which does not seem representative and makes the task somewhat misleading. Would it be better to apply quality control to the claims in the task design?
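On the significance question, one standard check would be a chi-squared test on the correct/incorrect judgment counts of the two groups. The sketch below uses made-up counts purely for illustration; they are not the paper’s numbers.

```python
# Hypothetical sketch: testing whether the accuracy gap between the control
# and system groups is statistically significant. Counts are invented.
from scipy.stats import chi2_contingency

# rows: control group, system group; columns: correct, incorrect judgments
contingency = [
    [142, 58],   # hypothetical control-group outcomes
    [151, 49],   # hypothetical system-group outcomes
]

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")

# A p-value below a pre-chosen threshold (e.g. 0.05) would support the claim
# that the system group's advantage is not just noise.
```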
Discussion
I think the following questions are worthy of further discussion.
- Do you think that presenting the source articles leads users to develop more trust in the system?
- Why do the retrieval results of the proposed system degrade human performance on some claims in the fact-checking task?
- Do you think there is any flaw in the experimental design? Can you think of a way to improve it?
- Do you think we need personalized results in this kind of task where the ground truth is provided? Why or why not?
I totally agree with your opinion about transparency. I think it is good that the authors consider transparency, but they go about it in somewhat the wrong direction. The public’s expectation of transparency is an explanation of how the system works; we worry about whether there is bias in the data or the method. At the same time, we don’t want the system to sacrifice performance for transparency. I feel this system design is more about showing users that the output comes with evidence, namely the source articles, and I believe this is why users over-trust the system. In addition, it is really hard not to doubt your own conclusion when it differs from that of an AI system whose accuracy is higher than 70%, and I think this is why users’ performance degrades when the system gives a wrong judgment.
I am skeptical of the finding that displaying sources builds more trust in the system. It does create an echo chamber for users, which might be a good or a bad thing depending on the issue at hand. However, I don’t think that alone is enough to build more trust in the system.