04/15/20 – Lulwah AlKulaib-BelieveItOrNot

Summary

Fact-checking needs to be done in a timely manner, especially now that it is used on live TV shows. While existing work presents many automated fact-checking systems, the human in the loop is often neglected. This paper presents the design and evaluation of a mixed-initiative approach to fact-checking, in which the authors combine human knowledge and experience with the efficiency and scalability of automated information retrieval and machine learning. The authors present a user study in which participants used the proposed system to help assess claims on their own. The results suggest that individuals tend to trust the system: participant accuracy in assessing claims improved when they were exposed to correct model predictions. Yet participants also over-trusted the system when the model was wrong, and exposure to incorrect predictions often reduced human accuracy. Participants who were given the option to interact with these incorrect predictions were often able to improve their own performance. This suggests that better models have to be transparent, especially in human-computer interaction, since AI models might fail and humans could be the key factor in correcting them.
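To make the mixed-initiative idea concrete, here is a minimal sketch of the kind of aggregation such a system might expose to users: per-article stance predictions weighted by source reputation, where the user can override either value and watch the claim score update. This is only an illustration under my own assumptions; the paper's actual model, source names, and weights differ.

```python
# Illustrative sketch (not the paper's actual model): a claim score built by
# aggregating per-article stance predictions weighted by source reputation.
# The user can override either value, and the prediction updates transparently.
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str
    stance: float      # -1.0 (refutes the claim) .. +1.0 (supports the claim)
    reputation: float  # 0.0 (unreliable source) .. 1.0 (highly reliable source)

def claim_score(evidence: list[Evidence]) -> float:
    """Reputation-weighted average stance; > 0 leans true, < 0 leans false."""
    total_weight = sum(e.reputation for e in evidence)
    if total_weight == 0:
        return 0.0
    return sum(e.stance * e.reputation for e in evidence) / total_weight

# Hypothetical retrieved articles for a claim.
evidence = [
    Evidence("outlet-a.example", stance=0.8, reputation=0.9),
    Evidence("outlet-b.example", stance=-0.6, reputation=0.4),
]
print(claim_score(evidence))   # model's initial estimate
evidence[0].reputation = 0.2   # user disagrees with a source's reliability
print(claim_score(evidence))   # prediction updates after the user's correction
```

The point of the sketch is the interaction loop: because the aggregation is simple and visible, a user who distrusts a source can adjust it and immediately see how the system's verdict changes, which is the kind of transparency the paper argues for.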

Reflection

I enjoyed reading this paper. It was very informative about the importance of transparent models in AI and machine learning, and about how transparency can improve performance when we include the human in the loop.

In their limitations, the authors discuss important points about relying on crowdworkers. They explain that MTurk participants should not all be given the same weight when analyzing their responses, since different participant demographics or incentives may influence findings. For example, non-US MTurk workers may not be representative of American news consumers or familiar with the latest news, and that could affect their responses. The authors also acknowledge that MTurk workers are paid by the task, which could lead some of them to simply agree with the model's prediction, even when they do not, just so they can complete the HIT and get paid. The authors found a minority of such responses, which made me think of ways to mitigate the problem. As in the papers from last week, studying a worker's behavior while they complete the task might indicate whether they actually agree with the model or are just answering to get paid.

The authors mention the negative impact that could potentially stem from their work: as we saw in their experiment, the model made a mistake but the humans over-trusted it. Our dependence on AI and technology makes users give these systems more credit than they deserve, and such errors could affect users' perception of the truth. Addressing these limitations should be an essential requirement for further work.

Discussion

  • Where would you use a system like this most?
  • How would you suggest mitigating errors produced by the system?
  • As humans, we trust AI and technology more than we should. How would you redesign the experiment to ensure that the crowdworkers actually check the presented claims?

2 thoughts on “04/15/20 – Lulwah AlKulaib-BelieveItOrNot”

  1. Hi,

    I also thought that there were not enough measures to gauge when a user was over-trusting the system or how to fix this problem. Their proposed system ultimately leaves everything up to the user (as in, it gives the user too much power), and the system’s classification can change in any direction according to the user’s input. I think a somewhat restrictive measure should be put in place so that the system can stay relatively immune to user bias.
    Regarding your first question, I would use this system a lot to fact-check rumors that people sometimes tell me. If I find that the system used many reliable sources to either accept or deny the claim, then I can decide on the claim’s validity.

  2. I agree with your points about the limitations of the paper, and (as I did in my summary) I would like to point again to the fact that their attention check was ignored, yet they still included those responses in the results. I’m just not sure how to account for that issue, as it seems they found that the workers were just clicking through but concluded that they were doing the task accurately anyway.

    To redesign it, I would likely keep the attention check in (and actually use it), while also providing an accuracy bonus and making sure the workers understood that completing the task accurately would double their pay. The task itself was already on the low end by targeting $7.50 an hour, so a bonus that doubled their pay for getting everything right would still only net $15 an hour (which is still pretty low, at $31,200 a year for a standard 40-hour work week). It’s possible that, with this bonus, the attention check isn’t needed, since workers should be paying more attention anyway, but I’d include it regardless.
