Summary
Fact checking is important to be done in a timely manner, especially nowadays when it’s used on live TV shows. While existing work presents many automated fact-checking systems, the human-in-the-loop is neglected.This paper presents the design and evaluation of a mixed initiative approach to fact-checking. The authors combine human knowledge and experience with the efficiency and scalability of automated information retrieval and machine learning. The authors present a user study in which participants used the proposed system to help their own assessment of claims. The presented results suggest that individuals tend to trust the system especially that participant accuracy assessing claims improved when exposed to correct model predictions.Yet, the participants’ trust is overestimated when the model was wrong. The exposure to the system’s predictions often reduced human accuracy. Participants that were given the option to interact with these incorrect predictions were often able improve their own performance. This suggests that in order to have better models, they have to be transparent especially when it comes to human-computer interaction as AI models might fail and humans could be the key factor in correcting them.
Reflection
I enjoyed reading this paper. It was very informative on the importance of transparent models in AI and machine learning. Also how transparent models could make the performance better when we include the human in the loop.
In their limitations, the authors discuss important points in relying on crowdworkers. They explain how MTurk participants should not all be given the same weight when looking at their responses since different participant demographics or incentives may influence findings. For example, non-US MTurk workers may not be representative of American news consumers or familiar with the latest and that could affect their responses. The authors also acknowledge that MTurk workers are paid by the task and that could cause some of them to respond by agreeing with the model’s response when in reality that is not the case, just so they could acquire the HIT and get paid. They found a minority of these responses and it made me think of ways to mitigate it. Like the papers from last week studying the behavior of an MTurk worker while completing the task might be an indicator if the worker actually agrees with the model or it is just to get paid.
The authors mention the negative impact that could potentially stem from their work and that could be as we saw in their experiment the model did a mistake but the humans over trusted it. The dependability on AI and technology makes users give them credit more than they should and such errors could impact the users perception of the truth.Addressing these limitations should be an essential requirement for further work.
Discussion
- Where would you use a system like this most?
- How would you suggest to mitigate errors produced by the system?
- As humans, we trust AI and technology more than we should, how would you redesign the experiment to ensure that the crowdworkers actually check the presented claims?