04/15/20 – Akshita Jha – Believe It or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:
This paper discusses the problem of fact-checking, i.e., estimating the credibility of a given statement, which is extremely pertinent in today’s climate. With the generation and discovery of information becoming ever easier, judging the trustworthiness of that information is becoming increasingly challenging. Researchers have attempted to address this issue with a plethora of AI-based fact-checking tools. However, a majority of these are based on neural network models that provide no explanation of how they arrived at a particular verdict. How this lack of transparency, combined with a poor user interface, affects the usability or even the perceived authenticity of such tools has not yet been addressed. This tends to breed mistrust of the tool in users’ minds, which ends up being counter-productive to the original goal. This paper tackles the issue head-on by designing a transparent system that combines the scope and scalability of an AI system with input from the user. While this transparency increases the user’s trust in the system, the user’s input also improves the predictive accuracy of the system. Importantly, this interaction has the side benefit of addressing the deeper issue of information illiteracy in our society, by cultivating the skill of questioning the veracity of a claim and the reliability of a source. The researchers build a model that uses NLP tools to retrieve and assess articles related to a given statement and assign a stance to each article. The user can modify the weights associated with these stances and sources, thereby influencing the verdict the model generates on the veracity of the claim.

The researchers further conduct three studies that judge the usability, effectiveness, and flaws of this model. They compare participants’ assessments of multiple statements before and after exposure to the model’s prediction, and they check whether interacting with the model provides additional support compared to simply displaying the result. A third task examines whether gamifying the task has any effect. While the third task seems inconclusive, the first two lead the researchers to conclude that interaction with the system increases users’ trust in the model’s results, even when the model’s prediction is wrong. However, this interaction also helps improve the model’s predictions for the claims tested here.
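To make the mechanism concrete, here is a minimal sketch of the kind of weighted stance aggregation described above, assuming each retrieved article carries a model-predicted stance and a user-adjustable source-reputation weight. The names, value ranges, and the simple weighted average are my own illustration of the idea, not the paper’s actual implementation.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Article:
        source: str
        stance: float      # model-predicted stance: -1 (refutes) .. +1 (supports)
        reputation: float  # source weight in [0, 1], adjustable by the user

    def claim_verdict(articles: List[Article]) -> float:
        """Return a credibility score in [-1, 1] for the claim."""
        total_weight = sum(a.reputation for a in articles)
        if total_weight == 0:
            return 0.0  # no evidence carries any weight
        return sum(a.stance * a.reputation for a in articles) / total_weight

    articles = [
        Article("example-news.com", stance=0.8, reputation=0.9),
        Article("rumor-blog.net", stance=-0.6, reputation=0.2),
    ]
    print(claim_verdict(articles))  # ~0.55: the claim leans toward credible

Moving a slider changes the corresponding reputation value and hence the verdict, which is exactly the interaction the three studies evaluate.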

Reflections:
The paper brings up an interesting point about the transparency of the model. When people think of an AI fact-checking system, what usually comes to mind is a black box that takes in a statement, assesses it internally, and returns a binary answer. What is interesting here is that the ability to interact with the model’s predictions enables users to improve their own judgment and even compensate for the model’s shortcomings. This aspect of user interaction in AI systems has been grossly overlooked.

While there are clear, undeniable benefits to this model, human-modifiable fact-checking could also lead to a very dangerous issue. Using the slider to modify the reputation of a given source can potentially lead to users injecting their own biases into the system and effectively creating echo chambers of their own views. This could skew the verdict of the ML system and thus reinforce the user’s existing prejudices. I would suggest that the model assign x% weight to its own assessment of a source and (100-x)% to the user’s assessment, as a step towards ensuring that the user’s prejudices do not suppress the model’s judgment completely (a small sketch of this idea follows at the end of this section). That said, the way this interaction incidentally teaches users how to tackle misinformation and check the reputation of sources is highly laudable, and definitely worth preserving in future models along these lines.

From the human perspective, the Bayesian or linear approaches adopted by these models make them very intuitive to understand. However, one must not underestimate how much more powerful neural networks can be at aggregating relevant information and assessing its quality. A simple linear approach is bound to have its limitations, and hence it would be interesting to see a model that combines the power of neural networks with the techniques that make this one transparent. On a side note, it would have been useful to have more information on the NLP and ML methods used; the related work is also insufficient to provide a clear background on existing techniques. One glaring issue with the paper is how it glosses over the statistical insignificance of the result in task 2. The authors note that the p-value is only slightly above the 0.05 threshold. However, statistics teaches us that the exact value is not what matters; it is the threshold set before conducting the experiment that matters. Thus, the statement “..slightly larger than the 0.05..” is simply careless.
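Here is a minimal sketch of the safeguard suggested above: blend the model’s own estimate of a source’s reputation with the user’s slider value instead of letting the slider override it entirely. The function name, the linear blend, and the default value of x are my own illustration, not something from the paper.

    def blended_reputation(model_rep: float, user_rep: float, x: float = 0.5) -> float:
        """Combine the model's and the user's reputation scores; x in [0, 1]
        is the fraction of weight the model keeps for its own assessment."""
        return x * model_rep + (1 - x) * user_rep

    # Even if a user drags a source's slider to 0, the blended weight cannot
    # fall below x * model_rep, limiting how far personal bias can suppress
    # the model's judgment.
    print(blended_reputation(model_rep=0.8, user_rep=0.0, x=0.6))  # 0.48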

Questions:
1. Why is there no control group in tasks 2 and 3?
2. What are your general thoughts on the paper? Do you approve of the methodology?
