04/15/20 – Myles Frantz – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this politically charged time, it is hard for the average person to separate accurate information from misinformation in the news media, and the many media sources publishing contradictory accounts make it even harder. Although a variety of companies already run fact-checking services, this team built a fact-checking system that combines crowdsourcing with machine learning. A machine learning model is paired with a user interface that lets Mechanical Turk workers adjust the reputation of each source and whether its citations support the claim. These controls allow a user to tweak the retrieved sources and read the underlying raw information. The team also built a gamified version of the interface intended to encourage more engaged use of the original system. Overall, participants appreciated being able to adjust the sources and to inspect the raw evidence supporting or contradicting a claim. 

Reflection

I think there is an inherent issue with the gamified experiment the researchers created, not with the environment itself but with human nature. When a task is gamified, I believe people will inherently try to game the system. This may have shown up only at a small scale within their research experiment, but it could be a much larger problem in other use cases unless it is explicitly restricted. 

I believe a crowd-worker fact-checking service will not work, because a crowdsourced fact checker is an easy target for any group of malicious actors. Using a variety of common techniques, from Distributed Denial of Service (DDoS) attacks that overwhelm a service to coordinated accounts that control the majority of responses, attackers have manipulated online systems before; similar majority-control attacks have been used to manipulate blockchain transactions and the flow of money. A fully fledged crowdsourced fact-checker would be just as prone to being overridden by these actors. 

In general, I believe giving users more visibility into a system encourages more usage. When using a program or an Internet of Things (IoT) device, people likely feel that they have little control over its internal workings. Providing this kind of insight, along with slight control over the algorithm, may give consumers a greater sense of control over these devices, and that sense of control may help encourage people to put their trust back into such programs. Much of the distrust is likely due to the opaque, iterative learning process of machine learning algorithms. 

Questions

  • Attention on Mechanical Turk is usually measured with a baseline (attention-check) question: if a worker is not paying attention (i.e., clicking through as fast as they can), they will not answer the baseline question correctly. Given that the team did not discard these workers, do you think removing their answers would strengthen the team's findings? (A minimal filtering sketch follows this list.) 
  • Along the same lines, even though the team treated the users' other interactions as a measure of attentiveness, do you think it was wise to ignore the attention check? 
  • Within your project, are you planning to implement sliders like this team did to help users interact with your machine learning algorithm? 
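
A minimal sketch of the re-analysis the first question imagines: drop any worker who failed a hypothetical attention-check item and recompute accuracy on the remaining responses. The response records, field names, and pass/fail flag are illustrative assumptions, not the paper's actual data.

# Hypothetical re-analysis: exclude workers who failed an attention check,
# then compare accuracy before and after filtering.
responses = [
    {"worker": "w1", "passed_check": True,  "correct": True},
    {"worker": "w2", "passed_check": False, "correct": False},
    {"worker": "w3", "passed_check": True,  "correct": True},
    {"worker": "w4", "passed_check": False, "correct": True},
]

def accuracy(rows):
    # Fraction of responses where the worker judged the claim correctly.
    return sum(r["correct"] for r in rows) / len(rows) if rows else 0.0

attentive = [r for r in responses if r["passed_check"]]
print(f"All workers:       {accuracy(responses):.2f}")
print(f"Attentive workers: {accuracy(attentive):.2f}")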


04/15/2020 – Mohannad Al Ameedi – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors propose a mixed-initiative approach to fact checking that combines human knowledge and experience with automated information retrieval and machine learning. The paper discusses the challenge posed by the massive amount of information available on the internet today, some of which may be inaccurate and therefore risky for information consumers. The proposed system retrieves relevant information about a given topic, uses machine learning and natural language processing to assess its factuality, and presents a confidence level to the users, letting them decide whether to rely on the information or to do manual research to validate the claims. This partnership between the AI system and human interaction can offer effective fact checking that supports human decision making in a scalable and effective way.

Reflection

I found the authors' approach very interesting. I recently had a discussion with a friend about a topic covered on Wikipedia; I thought the numbers and facts there were accurate, but the information turned out to be wrong, and he asked me to check an accredited source. If I had been able to use the system proposed in the paper, the accredited source could have ranked higher than Wikipedia.

The proposed system is very important in our digital age, where so much information is generated on a daily basis. We are not only searching for information but also receiving a flood of it through social media about current events, some of which have a high impact on our lives, so we need to assess the factuality of this information, and the proposed system can help a lot with that.

The proposed system is like a search engine that ranks documents not only by relevance to the search query but also by the fact-checking assessment of the information. The human interaction is like relevance feedback in a search engine, which can improve retrieval and lead to better ranking.
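
To make the search-engine analogy concrete, here is a minimal sketch of ranking documents by a weighted combination of query relevance and a fact-checking confidence score. The document list, the scores, and the weight alpha are hypothetical illustrations, not the paper's actual ranking function.

# Hypothetical combined ranking: relevance to the query plus the
# system's confidence that the document comes from a factual source.
docs = [
    {"title": "Accredited health agency report", "relevance": 0.72, "veracity": 0.95},
    {"title": "Wikipedia article",               "relevance": 0.90, "veracity": 0.60},
    {"title": "Anonymous blog post",              "relevance": 0.85, "veracity": 0.20},
]

def combined_score(doc, alpha=0.5):
    # alpha controls the trade-off between relevance and veracity.
    return alpha * doc["relevance"] + (1 - alpha) * doc["veracity"]

for doc in sorted(docs, key=combined_score, reverse=True):
    print(f"{combined_score(doc):.2f}  {doc['title']}")

Raising alpha recovers an ordinary relevance-only search engine; lowering it behaves more like the fact-checking-aware ranking described above.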

Questions

  • AI systems can be biased because their training data can be biased. How can we make the system unbiased?
  • The proposed system uses information retrieval to find relevant articles about a topic, then uses machine learning to validate the sources of the information and present a confidence level for each article. Do you think the system should filter out articles with poor accuracy because they might confuse the user, or might they still be valuable?
  • With the increasing use of social networks, many individuals write or share fake news, intentionally or unintentionally, and millions of people post information every day. Can we use the proposed system to assess fake news? If so, can the system scale to assess millions or billions of tweets or posts?


04/15/2020 – Sushmethaa Muhundan – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

This work aims to design and evaluate a mixed-initiative approach to fact-checking by blending human knowledge and experience with the efficiency and scalability of automated information retrieval and ML. The paper positions automatic fact-checking systems as assistive technology to augment human decision making. The proposed system fulfills three key properties: model transparency, support for integrating user knowledge, and quantification and communication of model uncertainty. Three experiments were conducted using MTurk workers to measure participants' performance in predicting the veracity of given claims using the system developed. The first experiment compared users who performed the task with and without seeing ML predictions. The second compared a static interface with an interactive interface in which users could mend or override the predictions of the AI system. Results showed that users were generally able to use the interface, but this was of little use when the predictions were already accurate. The last experiment compared a gamified task design with a non-gamified one, but no significant differences in performance were found. The paper also discusses the limitations of the proposed system and explores further research opportunities.

I liked the fact that the focus of the paper was more on designing automated systems that are user-friendly rather than on improving prediction accuracy. The paper takes into consideration the human element of human-AI interaction and focuses on making the system better and more meaningful. The proposed system aims to learn from the user and provide a personalized prediction based on the user's opinions and inputs.

I liked the focus on transparency and communication. Transparent models help users better understand the internal workings of the system and hence help build trust. Regarding communication, I feel that conveying the confidence of a prediction helps users make an informed decision. This is much better than a system that might have high precision but does not communicate confidence scores; in cases where such a system makes an error, the consequences are likely to be severe, since the user might blindly follow its prediction.

The side effect of making the system transparent was interesting: not only would transparency lead to higher trust levels, it would also help teach and structure the user's own information literacy skills regarding the logical process to follow when assessing a claim's validity. In this way, the proposed system truly leveraged the complementary strengths of the human and the AI.

  • Apart from the three properties incorporated in the study (transparency, support for integrating user knowledge, and communication of model uncertainty), what are some other properties that could be incorporated to improve such AI systems?
  • The study aims to leverage the complementary strengths of humans and AI but certain results were inconclusive as noted in the paper. Besides the limitations enumerated in the paper, what are other potential drawbacks of the proposed system?
  • Given that the study presented is in the context of automated fact-checking systems, what other AI systems can these principles be applied to?


04/15/2020 – Dylan Finch – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Word count: 567

Summary of the Reading

This paper presents the design and evaluation of a system that helps people check the validity of claims. A user enters a claim, and the system shows a list of related articles along with a prediction, based on those articles, of whether the claim is true, expressed as a percentage chance. Each article is also shown with a reputation score for its source and a score for how strongly it supports the claim. The user can adjust these scores if they think the system's information is inaccurate. 
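
As a rough illustration of how per-article scores could roll up into a single claim prediction, here is a minimal sketch that weights each article's support score by its source's reputation; a slider in the interface would simply overwrite a reputation or support value before the total is recomputed. The numbers, field names, and weighting scheme are assumptions for illustration, not the paper's exact model.

# Hypothetical aggregation: each article has a source reputation in [0, 1]
# and a support score in [-1, 1] (against vs. for the claim). The claim-level
# prediction is a reputation-weighted average of support, mapped to 0-100%.
articles = [
    {"source": "newswire.example", "reputation": 0.9, "support":  0.8},
    {"source": "blog.example",     "reputation": 0.3, "support": -0.6},
    {"source": "factsite.example", "reputation": 0.7, "support":  0.5},
]

def claim_truth_probability(articles):
    total_weight = sum(a["reputation"] for a in articles)
    if total_weight == 0:
        return 0.5  # no usable evidence either way
    weighted_support = sum(a["reputation"] * a["support"] for a in articles) / total_weight
    return (weighted_support + 1) / 2  # rescale [-1, 1] to [0, 1]

print(f"Before adjustment: {claim_truth_probability(articles):.0%}")
articles[0]["reputation"] = 0.4  # user slides the top source's reputation down
print(f"After adjustment:  {claim_truth_probability(articles):.0%}")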

The system seemed to help users come to the right conclusions when it had the right data but also seemed to make human judgements worse when the system had inaccurate information. This shows the usefulness of such a system and also gives a reason to be careful about implementing it.

Reflections and Connections

I think that this article tackles a very real and very important problem. Misinformation is more prevalent now than ever, and it is getting harder and harder to find the truth. This can have real effects on people’s lives. If people get the wrong information about a medication or a harmful activity, they may experience a preventable injury or even death. Misinformation can also have a huge impact on politics and may get people to vote in a way they might not otherwise.

The paper brings up the fact that people may over-rely on a system like this, blindly believing its results without putting more thought into them, and I think that is the paradox of this system. People want correct information, and we want it to be easy to find out whether something is correct, but the fact of the matter is that it's just not easy to find out if something is true. A system like this would be great when it worked and told people the truth, but it would make the problem worse when it came to the wrong conclusion and then made more people more confident in their wrong answer. No matter how good a system is, it will still fail. Even the best journalists in the world, writing for the most prestigious newspapers in the world, get things wrong, and a system like this one will get things wrong even more often. People should always be skeptical and do their own research before believing that something is true, because no easy answer like this can ever be 100% right, and if it can't be 100% right, we shouldn't trick ourselves into trusting it more than it deserves. This is a powerful tool, but we should not rely on it or anything like it.

Questions

  1. Should we even try to make systems like this if they will be wrong some of the time?
  2. How can we make sure that people don't over-rely on systems like this? Can we still use them without depending on them alone?
  3. What’s the best way to check facts? How do you check your facts?


04/15/20 – Akshita Jha – Believe It or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:
This paper discusses the issue of fact-checking, i.e., estimating the credibility of a given statement, which is extremely pertinent in today's climate. With the generation and exploration of information becoming increasingly simple, the task of judging the trustworthiness of the found information is becoming increasingly challenging. Researchers have attempted to address this issue with a plethora of AI-based fact-checking tools. However, a majority of these are based on neural network models that provide no description whatsoever of how they arrived at a particular verdict. It had not yet been addressed how this lack of transparency, in addition to a poor user interface, affects the usability or even the perceived authenticity of such tools. This tends to breed mistrust in the minds of users toward the tool, which ends up being counterproductive to the original goal. This paper tackles this important issue head-on by designing a transparent system that mixes the scope and scalability of an AI system with inputs from the user. While this transparency increases the user's trust in the system, the user's input also improves the predictive accuracy of the system. On an important note, there is an additional side benefit: this user interaction addresses the deeper issue of information illiteracy in our society by cultivating the skill of questioning the veracity of a claim and the reliability of a source. The researchers build a model that uses NLP tools to aggregate and assess various articles related to a given statement and assign a stance to each article. The user can modify the weights associated with each of these stances and influence the verdict the model generates on the veracity of the claim. They further conduct three studies in which they judge the usability, effectiveness, and flaws of this model. They compare participants' assessments of multiple statements before and after being exposed to the model's prediction. They also verify whether interaction with the model provides additional support compared to simply displaying the result. A third task estimates whether gamification of the task has any effect. While the third task is inconclusive, the first two lead the researchers to conclude that interaction with the system increases the user's trust in the model's results, even in the case of a false prediction, and that this interaction also helps improve the model's predictions for the claims tested.

Reflections:
The paper brings up an interesting point about the transparency of the model. When people talk about an AI system, what normally comes to mind is a binary system that takes in a statement, assesses it in a black box, and returns a binary answer. What is interesting is that the ability to interact with the model's predictions enables users to improve their own judgment and even compensate for the model's shortcomings. The user-interaction aspect of AI systems has been grossly overlooked. While there are some clear, undeniable benefits to this model, human-modifiable fact-checking could also lead to a very dangerous issue: using the slider to modify the reputation of a given source can let users inject their own biases into the system, effectively creating echo chambers of their own views. This could nefariously impact the verdict of the ML system and thus reinforce the user's possible prejudices. I would suggest that the model assign x% weight to its own assessment of the source and (100-x)% to the user's assessment; this would be a step toward ensuring that the user's prejudices do not completely suppress the model's judgment (see the sketch after this paragraph). Without doubt, however, the way this interaction inadvertently helps a user learn how to tackle misinformation and check the reputation of sources is highly laudable and is worth considering in future models along these lines. From the human perspective, the Bayesian or linear approaches adopted by these models make them very intuitive to understand. However, one must not underestimate how much more powerful neural networks are at aggregating relevant information and assessing its quality. A simple linear approach is bound to have its fallacies, and it would be interesting to see a model that uses the power of neural networks alongside the techniques that help with transparency. On a side note, it would have been useful to have more information on the NLP and ML methods used; the related work on these models is also insufficient to provide a clear background on existing techniques. One glaring issue with the paper is the way it glosses over the insignificance of its result in task #2. The authors mention that the p-value is just above the threshold, but statistics teaches us that the exact value is not what matters; it is the threshold set before conducting the experiment that does. Thus, the statement “..slightly larger than the 0.05..” is simply careless.
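
The x%/(100-x)% blending suggested above could look something like this minimal sketch, where the user's slider shifts, but never fully overrides, the reputation used in the verdict. The weight value and argument names are my own illustrative assumptions.

# Hypothetical blend of the model's and the user's source-reputation estimates.
def blended_reputation(model_reputation, user_reputation, model_weight=0.4):
    # Keep model_weight of the model's estimate; give the rest to the user.
    return model_weight * model_reputation + (1 - model_weight) * user_reputation

# A user drags a source's reputation from the model's 0.8 down to 0.1, but the
# blended value only drops to 0.38, limiting how much bias can be injected.
print(blended_reputation(model_reputation=0.8, user_reputation=0.1))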

Questions:
1. Why is there no control group in task 2 and task 3 as well?
2. What are your general thoughts on the paper? Do you approve of the methodology?


4/15/2020 – Nurendra Choudhary – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors study the human side of automated fact-checking systems, namely human trust in them. In their experiments, they show that humans were able to improve their accuracy when shown correct model predictions; however, human judgement is also shown to degrade when the model's predictions are incorrect. This establishes the trust relationship between humans and their fact-checking models. Additionally, the authors find that humans who interact with the AI system improve their predictions significantly, suggesting model transparency as a key aspect of human-AI interaction.

The authors provide a novel mixed-initiative framework to integrate human intelligence with fact checking models. Also, they analyze the benefits and drawbacks of such integrated systems.  

The authors also point out several limitations of their approach, such as the lack of non-American representation among MTurk workers and bias toward AI predictions. Furthermore, they point out the system's potential for mediating debates and for conveying real-time fact checks in an argument setting. Interaction with the tool could also serve as a platform for identifying the reasons behind differences of opinion.

Reflection

The paper is very timely in the sense that fake news has become a widely used tool for political and social gain. People unfamiliar with the power of the internet tend to believe unreliable sources and form very strong opinions based on them. Such a tool can be extremely powerful in eliminating such controversies. The idea of analyzing the human role in AI fact checkers is also extremely important. AI fact checkers lack perfect accuracy, and given the problem, perfect accuracy is a requirement; hence, the role of human beings in the system cannot be undermined. However, human mental models tend to trust the system after correct predictions and do not efficiently correct themselves after incorrect ones. This becomes an inherent limitation of these AI systems. Thus, the paper's idea of introducing transparency is extremely appropriate and necessary. Given more insight into the mechanism of fact checkers, human beings would be able to better optimize their mental models, improving the performance of the collaborative human-AI team.

AI systems can analyze huge repositories of information, and humans can perform more detailed analysis. In that sense, a fact-checking human-AI team utilizes the most important capabilities of both. However, as pointed out in the paper, humans tend to overlook their own capabilities and rely on the model's prediction, possibly because of the trust built up after a few correct predictions. Given the plethora of existing information, it would be really inconvenient for humans to assess it all. Hence, I believe these initial trials are extremely important for building the right amount of trust and expectations.

Questions

  1. Can a fact-checker learn from its mistakes pointed out by humans? How would that work? Would that make the fact-checker dynamic? If so, what is the extent of this change and how would the human mental models adapt effectively to such changes?
  2. Can you suggest a better way for humans and models to interact? For what other tasks could such interaction be effective?
  3. As pointed out in the paper, humans tend to overlook their own capabilities and rely on the model's predictions. What is the reason for this? Is there a way to make the collaboration more effective?
  4. Here, the assumption is that human beings are the veto authority. Can there be a case where this is not true? Is it always right to trust the judgement of humans (in this case, underpaid crowd workers)?

Word Count: 588


04/15/2020 – Nan LI – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

This paper introduces a mixed-initiative model that allows humans and machines to check the authenticity of a claim cooperatively. The fact-checking model also follows existing interface design principles that support the understandability and actionability of the interface. The main objectives of the design are transparency, support for integrating user knowledge, and communication of system uncertainty. To prioritize transparency over raw predictive performance, the authors used more transparent prediction models, linear models instead of deep neural networks. Further, users are allowed to change the reputations and stances behind the system's prediction. To evaluate how the system helps users assess the factuality of claims, the authors conducted three user studies with MTurk workers. The results indicate that users might over-trust the system: the system's prediction helps when it is correct, but it degrades human performance when the prediction is wrong due to biases implicit in the training data.
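
To illustrate the transparency trade-off mentioned in the summary, here is a minimal sketch of an interpretable linear stance classifier whose per-word weights could be surfaced to the user, something a deep neural network cannot offer as directly. The toy snippets and labels are invented for illustration; the paper's actual features and training data are not reproduced here.

# Minimal sketch: a logistic-regression stance classifier whose coefficients
# can be shown to users as an explanation of its prediction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [  # toy snippets labeled 1 (supports the claim) or 0 (refutes it)
    "study confirms the claim with strong evidence",
    "experts verify the statement is accurate",
    "report debunks the claim as false",
    "fact checkers found no evidence supporting it",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

# The learned weights double as an explanation: positive words push toward
# "supports", negative words toward "refutes".
weights = sorted(zip(vectorizer.get_feature_names_out(), model.coef_[0]),
                 key=lambda item: item[1])
for word, weight in weights[:3] + weights[-3:]:
    print(f"{word:12s} {weight:+.2f}")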

Reflection

I think the design of this approach is valuable because it does not blindly pursue prediction accuracy but also considers the transparency, understandability, and actionability of the interface. These choices should improve the user experience, since users have more knowledge of how the system works and therefore place more trust in it. On the other hand, this may be exactly why users over-trust the system, as the paper's experimental results indicate. Still, I think the design is a worthwhile attempt.

However, I don't see how this system can really help users. Although the design is very user-friendly, it does not leverage human ability; it just allows humans to participate in the fact-checking process. Even though that process is reasonable and understandable for users, the expectations placed on them demand too much mental work, such as reading a lot of information, thinking, and reasoning. It is a sensible process, but it is too burdensome.

Moreover, based on the figures in the paper, I don't think the system helps the user determine the authenticity of a claim, and I believe the experimental results reflect this. Further, the accuracy of the user's judgment seems to depend more on the type of claim: different claims show significantly different accuracy, and this effect is even larger than the effect of the system.

It is also interesting to see the users' feedback after they completed the task. One user seems to share my opinion about the amount of information that must be read. The most striking feedback is that users would be confused if they had more options; I think this only happens when they are unsure about the correctness yet have the ability to change the system's output. Finally, we can also see from the comments that when the system agrees with the user, the user becomes more sure of their answer, but when the system's prediction disagrees with the user, it seriously affects the accuracy of the user's judgment. This is understandable: when someone questions your decision, no matter how confident you are, you will waver a little, let alone when that someone is a machine with 70% accuracy.

Questions:

  1. Do you think the system could really help humans to detect the factuality of claims? Why or why not?
  2. When the authors designed the model, they gave up a higher-accuracy prediction model and used linear models instead to achieve transparency. What do you think of this trade-off? Which is more important for your design: transparency or accuracy?
  3. What do you think of the interface design? Does it provide too much information to users? Do you like the design or not?

Word Count: 638
