In this paper, the authors argue that current fully automatic fact-checking systems fall short for three reasons: lack of model transparency, failure to take world facts into consideration, and poor communication of model uncertainty. So the authors built a system that keeps humans in the loop. Their proposed system uses two classifiers that, for each document supporting a claim, predict the document's reliability and its veracity. Combining these weighted classifications, the system shows the user its confidence in its prediction about the claim, and users can further steer the system by modifying its weights. The authors evaluated the system in a user study with MTurk workers. They found their approach effective, but also noted that too much information or misleading predictions can lead to large user errors.
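As a rough mental model of that aggregation, here is a minimal Python sketch; since the review does not reproduce the paper's exact formula, I am assuming each document's veracity signal is weighted by its predicted reliability and by a user-adjustable weight, and all names below are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of combining per-document
# predictions into a claim-level confidence. The aggregation rule and all
# names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class DocumentEvidence:
    reliability: float   # classifier 1: how trustworthy the source is, in [0, 1]
    veracity: float      # classifier 2: support for the claim, -1 (refutes) .. +1 (supports)
    user_weight: float = 1.0  # user-adjustable override exposed by the interface

def claim_confidence(evidence: list[DocumentEvidence]) -> float:
    """Weighted average of per-document veracity scores.

    Each document's vote is scaled by its predicted reliability and by the
    weight the user assigns to it; the result lies in [-1, 1], where a
    positive score means the system leans toward 'true'.
    """
    total_weight = sum(e.reliability * e.user_weight for e in evidence)
    if total_weight == 0:
        return 0.0  # no usable evidence -> neutral
    weighted_votes = sum(e.veracity * e.reliability * e.user_weight for e in evidence)
    return weighted_votes / total_weight

# Example: two supporting articles and one refuting article from a weaker source.
docs = [
    DocumentEvidence(reliability=0.9, veracity=+1.0),
    DocumentEvidence(reliability=0.7, veracity=+1.0),
    DocumentEvidence(reliability=0.3, veracity=-1.0),
]
print(claim_confidence(docs))  # leans positive; lowering a user_weight shifts the score
```

The point of the sketch is simply that nudging a single user weight can flip the overall verdict, which is presumably where both the error-correction interface and the risk of user-introduced bias (discussed below) come in.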
First off, it was hilarious that the authors cited Wikipedia to introduce Information Literacy in a paper about evaluating information. I personally took it as a subtle joke left by the authors. However, it also led me to a question about the system. Unless I missed it, the authors did not explain where the relevant sources or articles that supported a claim came from. I was a little concerned that some of the articles used in the study might not have been reliable sources.
Also, the authors conducted the user study using their own predefined set of claims. While I understand this was needed for an efficient study, I wanted to know how the system would work in the wild. If a user searched a claim that he or she knows is true, would the system agree with high confidence? If not, would the user have been able to correct the system using the interface? It seemed that some portion of the users were confused, especially with the error-correction part of the system. I think these would have been valuable to know and would seriously need to be addressed if the system were to become a commercial product.
These are the questions that I had while reading the paper:
1. How much user intervention do you think is enough for these kinds of systems? I personally think that if users are given too much power over the system, they will apply their biases to the corrections and produce false positives.
2. What would be a good way for the system to retrieve only 'reliable' sources to reference? Stating that a claim is true based on a Wikipedia article would obviously not be very reassuring. Also, academic papers cannot address all claims, especially social claims. What would be a good threshold, and how could reliability be detected?
3. Given the current system, would you believe the results that it gives? Do you think the system meets the three requirements the authors argue all fact-checking systems should possess? I personally think that system transparency is still lacking. The system shows a lot about what kinds of sources it used and how much weight it puts on them, but it does not really explain how it reached its decision.
I think what the authors do for reliable sources (at least at the scale of this project) is use the Emergent dataset, which they discuss briefly. Since that is an annotated collection of articles, it might be where they look for their "trusted" sources. I feel like you would need human intervention here to vet any new sources and add them to the model's dataset so that it can make predictions, because otherwise it would not be able to calculate a reputation.
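To make that last point concrete, here is a small hypothetical sketch (not anything from the paper) of what gating new sources behind human vetting might look like: known domains get a stored reputation score, while unknown domains are queued for review instead of being scored.

```python
# Hypothetical sketch of vetting new sources before they feed a reputation model.
# Nothing here is from the paper; the reputation table and review queue are assumptions.
from typing import Optional

vetted_reputation = {
    "reuters.com": 0.95,       # example of a vetted, high-reputation source
    "example-blog.net": 0.40,  # example of a vetted, lower-reputation source
}

review_queue: list[str] = []  # domains waiting for a human annotator

def source_reputation(domain: str) -> Optional[float]:
    """Return a reputation score for vetted sources; queue unknown ones for human review."""
    if domain in vetted_reputation:
        return vetted_reputation[domain]
    # Unknown source: don't guess a score; ask a human to vet it first.
    review_queue.append(domain)
    return None

print(source_reputation("reuters.com"))  # 0.95
print(source_reputation("newsite.io"))   # None -> goes to review_queue
```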