04/15/20 – Myles Frantz – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this politically charged time, it is hard for the average person to extract accurate information from the news media, and the problem is made worse by media sources publishing contradictory reports. While a variety of companies already run fact-checking services, this team built a fact-checking system that mixes crowdsourcing with machine learning: a machine learning model is paired with a user interface that lets Mechanical Turk workers adjust a source’s reputation and whether its citations support a claim. These controls also let a user drill into the retrieved sources and read the raw information. The team additionally created a gamified interface intended to encourage more engaged use of the original system. Overall, participants appreciated the ability to tweak the sources and to see which raw sources did or did not support the claim.

Reflection

I think there is an inherent issue with the gamified experiment the researchers created, not with the environment itself but with human nature. When a task is gamified, I believe people will inherently try to game the system. This effect may be small at the scale of their research experiment, but it would need to be restricted in other use cases.

I also believe a crowd-worker fact-checking service will not work, because a crowdsourced fact checker is an easy target for any group of malicious actors. Using a variety of common techniques, such as Distributed Denial of Service (DDoS) attacks, actors have overwhelmed systems in order to control the majority of responses; similar majority attacks have been used to control blockchain transactions and the flow of money. A fully fledged crowdsourced fact-checker would be similarly prone to being overridden by such actors.

In general, I believe giving users more visibility into a system encourages more usage. When using a program or an Internet of Things (IoT) device, people likely feel they have little control over the internal programming. Providing this insight and a small degree of control over the algorithm may give consumers a stronger sense of control, which in turn may encourage people to put their trust back into these programs. Much of the current distrust is likely due to the opaque, iterative learning process of machine learning algorithms.

Questions

  • Mechanical Turk worker attention is usually measured by including a baseline (attention-check) question, so that a worker who is not paying attention (i.e., clicking through as fast as they can) will not answer it accurately. Given that the team did not discard these workers, do you think removing their answers would have supported the team’s theory?
  • Along the same lines, even though the team treated the users’ other interactions as a measure of attentiveness, do you think it was wise to ignore the attention check?
  • Within your project, are you planning on implementing sliders, as this team did, to help users interact with your machine learning algorithm?


04/15/2020 – Mohannad Al Ameedi – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors propose a mixed-initiative approach to fact-checking that combines human knowledge and experience with automated information retrieval and machine learning. The paper discusses the challenge posed by the massive amount of information available on the internet today, some of which may not be accurate and therefore puts information consumers at risk. The proposed system retrieves information relevant to a claim, uses machine learning and natural language processing to assess its factuality, and presents a confidence level to the users, letting them decide whether to use the information or do manual research to validate the claims. The partnership between the artificial intelligence system and human interaction can offer fact-checking that supports human decisions in a scalable and effective way.

Reflection

I found the authors’ approach very interesting. I recently had a discussion with a friend about a topic covered on Wikipedia; I thought the numbers and facts mentioned were accurate, but the information turned out to be wrong, and he asked me to check an accredited source. If I had been able to use the system proposed in the paper, the accredited source could have been ranked higher than Wikipedia.

The proposed system is very important in our digital age, where so much information is generated on a daily basis. We are not only searching for information but also receiving a great deal of it through social media about current events, some of which have a high impact on our lives, so we need to assess the factuality of that information, and the proposed system can help a lot with that.

The proposed system is like a search engine that ranks documents not only by relevance to the search query but also by the fact-checking assessment of the information. The human interaction is like relevance feedback in a search engine, which can improve retrieval and lead to better ranking.
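
As a rough illustration of this search-engine analogy (not the authors’ actual implementation), a retrieval relevance score could be blended with a source-credibility score; the field names, weights, and scoring formula below are hypothetical.

```python
# Hypothetical sketch: rank retrieved articles by a blend of query relevance
# and a fact-checking credibility score, analogous to relevance feedback.
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    relevance: float    # 0..1, from the retrieval model
    credibility: float  # 0..1, from a source-reputation / fact-checking model

def blended_score(article: Article, alpha: float = 0.6) -> float:
    """Weighted combination; alpha controls how much relevance matters
    versus the fact-checking assessment."""
    return alpha * article.relevance + (1 - alpha) * article.credibility

articles = [
    Article("Wikipedia summary", relevance=0.9, credibility=0.5),
    Article("Accredited journal report", relevance=0.7, credibility=0.95),
]

# Relevance feedback could update alpha or per-source credibility over time.
for a in sorted(articles, key=blended_score, reverse=True):
    print(f"{a.title}: {blended_score(a):.2f}")
```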

Questions

  • AI systems can be biased because their training data can be biased. How can we make the system unbiased?
  • The proposed system uses information retrieval to find relevant articles about a topic, then uses machine learning to validate the source of the information and presents a confidence level for each article. Do you think the system should filter out articles with poor accuracy, since they might confuse the user, or might they still be valuable?
  • With the increased use of social networking, many individuals write or share fake news intentionally or unintentionally, and millions of people post information every day. Can we use the proposed system to assess fake news? If so, can the system scale to assess millions or billions of tweets or posts?


04/15/2020 – Subil Abraham – Nguyen et al., “Believe it or not”

In today’s era of fake news, where new information is constantly spawning everywhere, the importance of fact checking cannot be overstated. The public has a right to remain informed and to obtain true information from accurate, reputable sources. But all too often, people are inundated with information, and the cognitive load of fact checking everything themselves is too much. Automated fact checking has made strides, but previous work has focused primarily on model accuracy and not on the people who need to use these tools. This paper is the first to study an interface for humans to use a fact checking tool. The tool is pretrained on the Emergent dataset of annotated articles and sources and uses two models: one that predicts an article’s stance on a claim and another that calculates the accuracy of the claim based on the reputation of the sources. The application works by taking a claim and retrieving articles that talk about it. It uses the article stance model to classify whether the articles are for or against the given claim, and then predicts the claim’s accuracy based on the collective reputation of its sources. It conveys that its models are not perfectly accurate and provides confidence levels for its predictions. It also provides sliders for the human verifiers to adjust the predicted stance of the articles and the source reputations according to their beliefs or new information. The authors run three experiments to test the efficacy of the tool for human fact checkers. They find that users tend to trust the system, which can be problematic when the system is inaccurate.
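
To make the aggregation step concrete, here is a minimal sketch of how per-article stances weighted by source reputation could be rolled up into a claim-level estimate with a confidence value. This is an illustration under assumed field names and formulas, not the paper’s actual model.

```python
# Hypothetical sketch of aggregating per-article stance predictions, weighted
# by source reputation, into a single claim-veracity estimate with a crude
# confidence value. This is an illustration, not the paper's actual model.

def claim_veracity(articles):
    """articles: list of dicts with
       'stance'     in [-1, 1]  (-1 = against the claim, +1 = for it)
       'reputation' in [0, 1]   (source reputation weight, slider-adjustable)
    Returns (score, confidence), where score > 0 leans 'true'."""
    total_weight = sum(a["reputation"] for a in articles)
    if total_weight == 0:
        return 0.0, 0.0
    score = sum(a["stance"] * a["reputation"] for a in articles) / total_weight
    confidence = abs(score)  # crude: stronger agreement -> higher confidence
    return score, confidence

articles = [
    {"stance": +0.8, "reputation": 0.9},   # reputable source supports the claim
    {"stance": -0.6, "reputation": 0.3},   # low-reputation source disputes it
]
score, conf = claim_veracity(articles)
print(f"score={score:+.2f}, confidence={conf:.2f}")

# A user moving a reputation slider simply changes 'reputation' and the
# estimate is recomputed on the fly.
```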

I find it interesting that, in the first experiment, the System group’s error rate somewhat follows the stance classifier’s error rate. The crowd workers are probably not independently verifying the stance of the articles and simply trust the predicted stance they are shown. Potentially this could be mitigated by adding incentives (like extra reward) to have them actually read the articles in full. On the flip side, we can see that their accuracy (supposedly) improves when they are given the sliders to modify the stances and reputations. Maybe that interactivity was the cue they needed to understand that the predicted values aren’t set in stone and could be inaccurate. Though I find it strange that the Slider group in the second experiment did not adjust the sliders if they were questioning the sources. What I find even stranger is that the authors decided to keep the claim that allowing users to adjust the sliders made them more accurate. This claim is what most readers would take away unless they were carefully reading the experiments and the caveats. And I don’t like that they kept the second experiment’s results despite them not showing any useful signal. Ultimately, I don’t buy their push that this tool is useful for the general user as it stands now. I also don’t really see how this tool could serve as a technological mediator for people with opposing views, at least not the way they described it. I find that it could serve as a useful automation tool for expert fact checkers as part of their work, but not for the ordinary user, which is whom they model by using crowdworkers. I like the ideas the paper is going for, of having automated fact checking that helps the ordinary user, and I’m glad they acknowledge the drawbacks. But there are too many drawbacks preventing me from fully buying into the claims of this paper. It’s poetic that I have my doubts about the claims of a paper describing a system that asks you to question claims.

  1. Do you think this tool would actually be useful in the hands of an ordinary user? Or would it serve better in the hands of an expert fact checker?
  2. What would you like to see added to the interface, in addition to what they already have?
  3. This is a larger question, but is there value in having machine learning transparency in the way they have done it (with sliders you can manipulate to see the final value change)? How much detail is too much? And for more complex models where you can’t have that instantaneous feedback (like style transfer), how do you provide explainability?
  4. Do you find the experiments rigorous enough and conclusions significant enough to back up the claims they are making?


04/15/2020 – Subil Abraham – Diakopoulos, “Algorithmic accountability”

Algorithms have pervaded our everyday lives because computers have become essential to them. This pervasiveness also means that algorithms need to be closely scrutinized to ensure that they function as they should, without bias, obeying the guarantees their creators have promised. Algorithmic accountability is a category of journalism in which journalists investigate these algorithms to validate their claims and find any violations. The goal is to find mistakes, omissions, or bias creeping into the algorithms, because although computers do exactly what they’re told, they are still created by humans with blind spots. The author classifies the four kinds of decisions that algorithmic decision making falls under and argues that transparency alone is not enough, because full transparency can often be blocked by trade-secret claims. Journalists instead rely on reverse engineering, feeding in inputs and observing the outputs without looking at the inner workings, because they are often dealing with black-box algorithms. The paper examines five case studies of journalists who have done such investigations with reverse engineering, and it offers a theory and a methodology for finding newsworthy stories in this space.
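
As a toy illustration of this input/output reverse-engineering approach (not drawn from any of the paper’s case studies), one can hold every input attribute fixed, vary a single attribute, and watch how a black box’s output shifts. The black_box function and its fields below are entirely made up.

```python
# Toy illustration of input/output reverse engineering: probe a black box
# with controlled inputs and compare outputs across one varied attribute.
# The 'black_box' here is a made-up stand-in for a proprietary algorithm.

def black_box(application):
    """Opaque scoring function a journalist cannot inspect."""
    score = 0.5 + 0.004 * (application["income"] / 1000)
    if application["zip_code"].startswith("9"):   # hidden, possibly biased rule
        score -= 0.15
    return round(min(max(score, 0.0), 1.0), 3)

def probe(attribute, values, base):
    """Hold everything fixed except one attribute and record the outputs."""
    results = {}
    for v in values:
        app = dict(base, **{attribute: v})
        results[v] = black_box(app)
    return results

base_application = {"income": 50000, "zip_code": "24060"}
print(probe("zip_code", ["24060", "90210", "10001"], base_application))
# Systematic output differences for otherwise-identical inputs hint at
# rules worth investigating further.
```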

This paper is a very interesting look at how algorithms function in our lives from a non-CS/HCI perspective: it comes from journalism and examines the painstaking way journalists investigate these algorithms. Though it is not the focus, this work also brings to light the incredible roadblocks that come with investigating proprietary software, especially software from large, secretive companies that would leverage laws and expensive lawyers to fight such investigations if they were not in their favor. In an ideal world, everyone would have integrity and would disclose all the flaws in their algorithms, but that is unfortunately not the case, which is why the work these journalists are doing is important, especially when they don’t have easy access to the algorithms they’re investigating and sometimes don’t have access to the right inputs. There is a danger that a journalist could end up discredited because they did the best investigation they could with limited resources, and the PR team of the company being investigated latches onto a poor assumption or two to discredit otherwise good work. The difficulty of performing these investigations, especially for journalists without prior training or experience with computers, exemplifies the need for at least some computer science education for everyone, so that people can better understand the systems they’re dealing with and have a better handle on running investigations as algorithms pervade ever more of our lives.

  1. Do you think some of the laws in place that allow companies to obfuscate their algorithms should be relaxed to allow easier investigation?
  2. Do you think current journalistic protections are enough for journalists investigating these algorithms?
  3. What kind of tools or training can be given to journalists to make it easier for them to navigate this world of investigating algorithms?


04/15/20 – Jooyoung Whang – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

In this paper, the authors state that current fully automatic fact-checking systems fall short in three areas: model transparency, taking real-world facts into consideration, and communicating model uncertainty. So the authors built a system that includes humans in the loop. Their proposed system uses two classifiers: one predicts the reliability of each document supporting a claim, and the other predicts the veracity of the claim. Using these weighted classifications, the system shows the user its confidence in its prediction about the claim, and users can further steer the system by modifying its weights. The authors conducted a user study of their system with MTurk workers and found their approach effective, but they also noted that too much information or misleading predictions can lead to large user errors.

First off, it was hilarious that the authors cited Wikipedia to introduce information literacy in a paper about evaluating information; I took it as a subtle joke left by the authors. However, it also led me to a question about the system. Unless I missed it, the authors did not explain where the relevant sources or articles supporting a claim came from. I was a little concerned that some of the articles used in the study might not have been reliable sources.

Also, the authors conducted the user study using their own predefined set of claims. While I understand this was needed for an efficient study, I wanted to know how the system would behave in the wild. If a user searched a claim that he or she knows is true, would the system agree with high confidence? If not, would the user be able to correct the system using the interface? It seemed that some portion of the users were confused, especially by the error-correction part of the system. These things would have been valuable to know and would seriously need to be addressed if the system were to become a commercial product.

These are the questions that I had while reading the paper:

1. How much user intervention do you think is enough for these kinds of systems? I personally think that if users are given too much power over the system, they will apply their own biases to the corrections and produce false positives.

2. What would be a good way for the system to retrieve only ‘reliable’ sources to reference? Stating that a claim is true based on a Wikipedia article would obviously not be very reassuring, but academic papers cannot address all claims, especially social ones. What would be a good threshold, and how could reliability be detected?

3. Given the current system, would you believe the results it gives? Do you think the system addresses the three requirements the authors say all fact-checking systems should possess? I personally think that system transparency is still lacking: the system shows a lot about what kind of sources it used and how much weight it puts on them, but it does not really explain how it made the decision.


04/15/20 – Ziyao Wang – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

The authors focus on fact-checking, the task of assessing the veracity of claims, and propose a mixed-initiative approach to it. Their system combines human knowledge and experience with AI’s efficiency and scalability in information retrieval. They argue that if fact-checking models are to be used practically, they should be transparent, should support integrating user knowledge, and should quantify and communicate model uncertainty. Following these principles, they developed their mixed-initiative system and ran experiments with participants from MTurk. They found that the system can help humans when it gives correct predictions but can be harmful when it gives wrong ones, and that the interaction between participants and the system was not as effective as expected. They also found that turning the tasks into games did not improve users’ performance. In conclusion, users tend to trust models and may be led by them to make the wrong choice; for this reason, transparent models are important in mixed-initiative systems.

Reflection:

I tried the system mentioned in the paper, and it is quite interesting. However, the first time I used it, I was confused about what to do. Although the interface is similar to Google.com and I was fairly sure I should type something into the text box, there are few instructions about what to type, how the system works, or what to do after searching for a claim. The results page was also confusing: I understood that the developers wanted to show me findings about the claim along with the system’s prediction, but I was still unsure what to do next, and some of the returned results were not related to the claim I typed.

After several uses, I became familiar with the system, and it does help me judge whether a claim is correct. I agree with the authors that some of the feedback about not being able to interact with the system properly comes from users’ unfamiliarity with it. Even so, the authors should provide more instructions so that users can get familiar with the system quickly. I think this is related to the transparency of the system and could raise users’ trust.

Another issue I found during use is that there is no wording such as “the results should only be used as a reference; you should make the judgement with your own mind,” or a similar disclaimer. I think this may be one reason the error rate of users’ answers increased significantly when the system made wrong predictions. Participants may change their minds when they see that the prediction differs from their own conclusion, because they know little about the system and may assume it is more likely to be correct. If the system were more transparent to users, they might provide more correct answers about the claims.

Questions:

How can we help participants make correct judgements when the system provides wrong predictions?

What kinds of instructions should be added so that participants can get familiar with the system more quickly?

Can this system be used in areas other than fact-checking?


04/15/2020 – Dylan Finch – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Word count: 567

Summary of the Reading

This paper presents the design and evaluation of a system intended to help people check the validity of claims. The user starts by entering a claim into the system. The system then shows the user a list of articles related to the claim, along with a prediction, based on those articles, of whether the claim is true, expressed as a percentage chance. Each article is shown with a reputation score for its source and a support score indicating how strongly it backs the claim. The user can adjust these scores if they think the system’s information is inaccurate, as sketched below.
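
As a rough sketch of this adjust-and-recompute loop (with made-up field names and a simple weighted average standing in for the paper’s actual models), changing a reputation slider immediately changes the displayed percentage:

```python
# Hypothetical sketch of the adjust-and-recompute loop: the user overrides a
# source's reputation or an article's support score, and the claim's
# "percent true" estimate is recalculated. Field names are made up.

def percent_true(articles):
    """Reputation-weighted average of support scores, shown as a percentage.
    support: 0..1 (0 = refutes the claim, 1 = supports it)."""
    weight = sum(a["reputation"] for a in articles) or 1.0
    return 100 * sum(a["support"] * a["reputation"] for a in articles) / weight

articles = [
    {"source": "site-a.example", "support": 0.9, "reputation": 0.8},
    {"source": "site-b.example", "support": 0.2, "reputation": 0.4},
]
print(f"Before adjustment: {percent_true(articles):.0f}% likely true")

# The user disagrees with the system's reputation estimate for site-b and
# drags its slider up; the prediction updates immediately.
articles[1]["reputation"] = 0.9
print(f"After adjustment:  {percent_true(articles):.0f}% likely true")
```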

The system seemed to help users come to the right conclusions when it had the right data, but it also seemed to make human judgements worse when it had inaccurate information. This shows both the usefulness of such a system and a reason to be careful about deploying it.

Reflections and Connections

I think that this article tackles a very real and very important problem. Misinformation is more prevalent now than ever, and it is getting harder and harder to find the truth. This can have real effects on people’s lives. If people get the wrong information about a medication or a harmful activity, they may experience a preventable injury or even death. Misinformation can also have a huge impact on politics and may get people to vote in a way they might not otherwise.

The paper brings up the fact that people may over-rely on a system like this, blindly believing its results without putting more thought into them, and I think that is the paradox of this system. People want correct information, and we want it to be easy to find out whether something is correct, but the fact of the matter is that it is just not easy to determine whether something is true. A system like this would be great when it worked and told people the truth, but it would make the problem worse when it came to the wrong conclusion and made people more confident in their wrong answer. No matter how good a system is, it will still fail. Even the best journalists in the world, writing for the most prestigious newspapers, get things wrong, and a system like this one will get things wrong even more often. People should always be skeptical and should always do research before believing something is true, because no easy answer like this can ever be 100% right, and if it can’t be 100% right, we shouldn’t trick ourselves into trusting it more than it deserves. This is a powerful tool, but we should not rely on it, or anything like it, alone.

Questions

  1. Should we even try to make systems like this if they will be wrong some of the time?
  2. How can we make sure that people don’t over-rely on systems like this? Can we still use them without relying on them exclusively?
  3. What’s the best way to check facts? How do you check your facts?


04/15/2020 – Dylan Finch – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

Word count: 553

Summary of the Reading

This paper presents a survey of expert and non-expert perceptions of the risks of emerging technologies. The authors reused a risk survey that had previously been used to assess perceptions of risk, sending it to experts, in the form of people with careers related to technology, and non-experts, in the form of workers on MTurk. While MTurk workers might be slightly more tech-savvy than average, they also tend to be less educated.

The results showed that experts tended to rate more things as risky, while non-experts tended to downplay the risks of many activities. The results also showed that voluntary risks were seen as less risky than other forms of risk; it seems that people perceive more risk when they have less control. Finally, both experts and non-experts saw many emerging technologies as involuntary, even though these technologies usually obtain user consent for everything they do.

Reflections and Connections

I think that this paper is more important than ever, and it will only become more important as time goes on. In our modern world, more and more of the things we interact with every day are data-driven technologies that wield extreme power, both to help us do things better and to let bad actors hurt innocent people.

I also think that the paper’s conclusions match what I expected. Many new technologies are abstract, and their inner workings are never seen. They are also much harder for laypersons to understand than the technology of decades past. In the past, you could see that your money was secure in a vault, you could see that you had a big lock on your bike and that it would be hard to steal, and you knew that the physical laws of nature made it hard for other people to take your belongings, because you had a general idea of how hard it was to break your security measures and because you could see and feel the things you used to protect yourself. Now things are much different. You have no way of knowing what is protecting your money at the bank. You have no way of knowing, much less understanding, the security algorithms that companies use to keep your data safe. Maybe they’re good, maybe they’re not, but you probably won’t know until someone hacks in. The digital world also disregards many of the limits we experienced in the past and in real life. In real life, it is practically impossible for someone in India to rob me without going through a lot of hassle, but an online hacker can break into bank accounts across the world and be gone without a trace. This new world of risk is so hard to understand because we aren’t used to it and because it looks so different from the risks we experience in real life.

Questions

  1. How can we better educate people on the risks of the online world?
  2. How can we better connect abstract online security vulnerabilities to real-world, easy-to-understand vulnerabilities?
  3. Should companies need to be more transparent about security risks to their customers?


04/15/2020 – Sushmethaa Muhundan – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

This work aims to design and evaluate a mixed-initiative approach to fact-checking that blends human knowledge and experience with the efficiency and scalability of automated information retrieval and ML. The paper positions automatic fact-checking systems as assistive technology that augments human decision making. The proposed system fulfills three key properties, namely model transparency, support for integrating user knowledge, and quantification and communication of model uncertainty. Three experiments were conducted with MTurk workers to measure participants’ performance in predicting the veracity of given claims using the system. The first experiment compared users who performed the task with and without seeing ML predictions. The second compared a static interface with an interactive one in which users could amend or override the AI system’s predictions; results showed that users were generally able to use the interface, but the interaction offered little benefit when the predictions were already accurate. The last experiment compared a gamified task design with a non-gamified one and found no significant differences in performance. The paper also discusses the limitations of the proposed system and explores further research opportunities.

I liked that the focus of the paper was on designing automated systems that are user-friendly rather than on improving prediction accuracy. The paper takes into consideration the human element of the human-AI interaction and focuses on making the system better and more meaningful. The proposed system aims to learn from the user and provide a personalized prediction based on the user’s opinions and inputs.

I liked the focus on transparency and communication. Transparent models help users better understand the internal workings of the system and hence help build trust. Regarding communication, I feel that conveying the confidence of a prediction helps users make an informed decision. This is much better than a system that might have high precision but does not communicate confidence scores; when such a system makes an error, the consequences are likely to be serious, since the user might blindly follow its prediction.

The side effect of making the system transparent was interesting. Not only would transparency lead to higher trust levels, it would also help teach and structure the user’s own information literacy skills regarding the logical process to follow when assessing a claim’s validity. In this way, the proposed system truly leveraged the complementary strengths of the human and the AI.

  • Apart from the three properties incorporated in the study (transparency, support for integrating user knowledge, and communication of model uncertainty), what other properties could be incorporated to improve AI systems?
  • The study aims to leverage the complementary strengths of humans and AI but certain results were inconclusive as noted in the paper. Besides the limitations enumerated in the paper, what are other potential drawbacks of the proposed system?
  • Given that the study presented is in the context of automated fact-checking systems, what other AI systems can these principles be applied to?


04/15/2020 – Sushmethaa Muhundan – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

This work explores the impact of perceived risk on the choice to use a technology. A survey was conducted to assess the mental models of users and technologists regarding the risks of using emerging, data-driven technologies, and guidelines for risk-sensitive design were then explored in order to address and mitigate perceived risk. The model aims to identify when misaligned risk perceptions may warrant design reconsideration. Fifteen technology-related risks were devised, and 175 participants, comprising 26 experts and 149 non-experts, were recruited to rate the perceived risk of each. Results showed that technologists were more skeptical of data-driven technologies than non-experts, and the authors therefore urge designers to strive harder to make end users aware of the potential risks involved in these systems. The study recommends that design decisions about risk-mitigation features for a particular technology be sensitive to the difference between the public’s perceived risk and the acceptable marginal perceived risk at that risk level.

Throughout the paper, there is a focus on creating design guidelines that reduce risk exposure and increase public awareness of potential risks, and I feel this is the need of the hour. The paper focuses on identifying remedies that set appropriate expectations in order to help the public make informed decisions. This effort is good, since it strives to bridge the gap and keep users informed about the reality of the situation.

It is concerning that the results found technologists to be more skeptical about using data-driven technologies than non-experts. This is perturbing because it shows that the risks of the latest technologies are perceived more strongly by the group involved in creating them than by the people who use them.

Although the counts of experts and non-experts were skewed, it was interesting that, when the results were aggregated, both groups’ three highest perceived risks were the same; only the order of the ranking differed.

It was interesting to note that the majority of both groups rated nearly all risks related to emerging technologies as characteristically involuntary. This strongly suggests that the consent procedures in place are not effective: either the information is not being conveyed to users transparently, or it is presented in such a complex manner that the content is not understood by end users.

  • In the context of the current technologies we use on a daily basis, which factor is more important from your point of view: personal benefits (personalized content) or privacy?
  • The study involved a total of 175 participants, comprising 26 experts and 149 non-experts. Given the huge difference in these numbers, with a divide that is not even close to equal, was it feasible to analyze and draw conclusions from the study?
  • Apart from the suggestions in the study, what are some concrete measures that could be adopted to bridge the gap and keep the users informed about the potential risks associated with technology?
