04/15/20 – Ziyao Wang – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

The authors focused on fact-checking, which is the task of assessing the veracity of claims. They proposed a mixed-initiative approach to fact-checking in which they combined human knowledge and experience with AI's efficiency and scalability in information retrieval. They argue that if fact-checking models are to be used in practice, they should be transparent, should support integrating user knowledge, and should quantify and communicate model uncertainty. Following these principles, they developed their mixed-initiative system and ran experiments with participants recruited from MTurk. They found that the system helps users when its predictions are correct but can be harmful when its predictions are wrong, and that the interaction between participants and the system was not as effective as expected. Finally, they found that turning the task into a game did not improve users' performance. In conclusion, they found that users tend to trust models and may be led by them to make the wrong choice. For this reason, transparent models are important in mixed-initiative systems.

Reflection:

I have tried the system mentioned in the paper, and it is quite interesting. However, the first time I used it, I was confused about what I should do. Though the interface is similar to Google.com and I was fairly sure I should type something into the text box, there were limited instructions about what I should type, how the system works, and what I should do after searching for my claim. Also, after I searched for a claim, the results page was still confusing. I understand that the developers want to show me findings related to the claim and provide the system's prediction, but I was still unsure what to do next, and some of the returned results were unrelated to the claim I typed.

After using it several times, I became familiar with the system, and it does help me judge whether a claim is correct. I agree with the authors that some of the feedback about not being able to interact with the system properly comes from users' unfamiliarity with it. But beyond this, the authors should provide more instructions so that users can become familiar with the system quickly. I think this is related to the transparency of the system and may raise users' trust.

Another issue I found during use is that the system never tells users that its results should only be used as a reference and that they should make the judgement with their own minds. I think this may be one reason the error rate of users' answers increased significantly when the system made wrong predictions. Participants may change their minds when they see that the prediction differs from their own judgement, because they know little about the system and assume it is more likely to have the correct answer. If the system were more transparent to users, they might give more correct answers to the claims.

Questions:

How can we help participants make correct judgements when the system provides wrong predictions?

What kinds of instructions should be added so that participants can get familiar with the system more quickly?

Can this system be used in areas other than fact-checking?


04/15/2020 – Ziyao Wang – Algorithmic accountability

In this report, the author studied how algorithms exercise power and why they are worthy of scrutiny by computational journalists. He used methods such as transparency and reverse engineering to analyze the algorithms. He also analyzed four kinds of atomic decisions (prioritization, classification, association, and filtering) to assess algorithmic power. For the reverse engineering part, he analyzed numerous everyday cases and presented a view of reverse engineering that considers both inputs and outputs, including the variable observability of input-output relationships and how to identify, sample, and find newsworthy stories about algorithms. Finally, the author discussed challenges that the application of algorithmic accountability reporting may face in the future, and proposed that transparency can be used to effectively hold newsroom algorithms to journalistic norms.

Reflections:

I am really interested in the reverse engineering part of this report. The author collected different cases of researchers reverse engineering algorithms, and it is exciting to understand the opportunities and limitations of the reverse engineering approach to investigating them. Reverse engineering is valuable for explaining how algorithms work and for finding their limitations. Since many deployed algorithms or models are trained with unsupervised or deep learning, it is hard for us to understand and explain them; we can only evaluate them with metrics like recall or precision. With reverse engineering, however, we can learn how the algorithms behave and modify them to avoid limitations and potential discrimination. That said, I think there may be ethical issues in reverse engineering. If malicious actors reverse engineer an application, they can steal the ideas behind the application or its algorithms, or bypass its security by exploiting the weaknesses they found.
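As a rough, hypothetical illustration of the input-output flavor of reverse engineering the report describes, the sketch below probes a stand-in black-box ranking function with controlled input variations and records how the output changes. The function and its inputs are invented for the example, not taken from the report; a real audit would query the actual system.

```python
import itertools

def black_box_rank(query, personalized, location):
    """Stand-in for an opaque algorithm we can only observe from the outside."""
    # A real audit would call the actual system (e.g., a search or feed API) here.
    score = len(query) % 5
    if personalized:
        score += 2
    if location == "US":
        score += 1
    return score

# Systematically vary the observable inputs and record the outputs,
# looking for which input factors change the result and by how much.
observations = []
for query, personalized, location in itertools.product(
        ["flu shot", "election results"], [False, True], ["US", "DE"]):
    observations.append((query, personalized, location,
                         black_box_rank(query, personalized, location)))

for row in observations:
    print(row)  # Differences across rows hint at how each input is weighted.
```

Even this toy probe shows why observability matters: if we cannot control or even see an input (say, a hidden user profile), its influence is folded into variation we cannot attribute.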

As for algorithmic transparency, I realized I had paid little attention to this principle before; I used to only consider whether an algorithm works or not. After reading this report, however, I feel that algorithmic transparency is an important aspect of building and maintaining a system. Instead of leaving researchers to uncover a system's limitations through reverse engineering, it is better to make parts of the algorithms, how they are used, and some related data available to the public. On one hand, this raises public trust in the system because of its transparency; on the other hand, experts from outside the company or organization can contribute to improving and securing it. However, transparency is currently far from a complete solution to balancing algorithmic power. Apart from the author's idea that researchers can apply reverse engineering to analyze systems, I think both corporations and governments should pay more attention to the transparency of algorithms.

Questions:

After reading the report, I am still confused about how to find the story behind an input-output relationship. How can we figure out how an algorithm operates from an input-output map?

How can we prevent attackers from using reverse engineering to mount attacks?

Apart from journalists, which groups of people should also employ reverse engineering to analyze systems?


04/15/20 – Jooyoung Whang – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

In this paper, the authors conduct a survey with a listing of known technological risks, asking participants to rate the severity of each risk. The authors state that their research is an extension of prior work done in the 1980s. The survey was given to both experts and non-experts, where experts were recruited from Twitter and non-experts from MTurk. From the prior work and their own, the authors found that people tend to rate voluntary risks as low even when they are actually high. They also found that many emerging technological risks are regarded as involuntary, and that non-experts tend to underestimate the risks of new technologies. The authors also introduce a risk-sensitive design approach based on their findings, showing a risk-perception graph that can be used to decide whether non-experts perceive a proposed technology to be as risky as experts do or whether they underestimate it, and whether the design is acceptable.

This paper nicely captures the user characteristics of technical risk perception. I liked that the paper did not stop at explaining the results but went further to propose a tool for technical designers. However, it was a little unclear to me how to use the tool. The risk-perception graph the authors show only has “low” and “high” as axis labels, which are very subjective terms. A way to quantify risk perception would have served nicely.
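For instance, one simple way to put numbers on those axes (purely hypothetical, not from the paper) would be to map survey ratings onto a fixed scale and compare group means per technology:

```python
from statistics import mean

# Hypothetical 1-7 Likert ratings of perceived risk for one technology,
# collected separately from experts and non-experts.
expert_ratings = [6, 5, 7, 6, 5]
nonexpert_ratings = [3, 4, 2, 4, 3]

def perception_gap(expert, nonexpert):
    """Positive values mean non-experts underestimate the risk relative to experts."""
    return mean(expert) - mean(nonexpert)

gap = perception_gap(expert_ratings, nonexpert_ratings)
print(f"Risk-perception gap: {gap:.2f} points on a 7-point scale")
# A designer could flag any technology whose gap exceeds a chosen threshold
# as needing extra risk communication before release.
```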

This paper also made me think: what is the point of providing terms of use for a product if users still feel they have been involuntarily exposed to risk? I feel a better representation is needed. For example, a short summary outlining the most important risks in a sentence each, with details behind a separate link, would be more effective than throwing a wall of text at a (most likely) non-technical user.

I also think one way to address the gap in risk perception between designers and users is to involve users in the development process in the first place. I am unsure of the exact term, but I recall learning about a users-in-the-loop development cycle in a UX class. This development method allows designers to fix user problems early in the process and end up with higher-quality products. I feel it would also inform designers more about potential risks.

These are the questions that I had while reading the paper:

1. What are some disasters that may happen due to the gap in risk perception between users and designers of a system? Would any additional risks occur due to this gap?

2. What would be a good way to reduce the gap in risk perception? Do you think using the risk-perception graph from the paper is useful for addressing this gap? How would you measure the risk?

3. Would you use the authors’ proposed risk-sensitive design approach in your project? What kind of risks do you expect from your project? Are they technical issues and do you think your users will underestimate the risk?


04/15/2020 – Myles Frantz – Algorithmic accountability

Summary

With the prevalence of technology, the mainstream programs driving its rise dictate not only its technological impact but also the direction of news media and people's opinions. As journalists turn to various outlets and adapt to the efficiency created by technology, the tools they use may introduce bias based on their internal sources or optimizations and therefore introduce bias into their stories. The author measures algorithms against four different categories: prioritization, classification, association, and filtering. Using a combination of these categories, the report examines how autocomplete features can bias users' opinions. It has also been determined that popular search engines like Google tailor results based on what the user has previously searched. For a normal user this makes sense; however, for an investigative journalist these results may not accurately represent a source of truth.

Reflection

As noted in the report, there is a strong tension over how much transparency is applied to an algorithm. These transparency discrepancies may be due to government concerns about keeping certain secrets. This creates a strong sense of resistance and distrust toward the use of certain algorithms. Though these secrets are justified in the name of national security, the term may be misused or stretched for personal or political gain rather than correctly applied. These kinds of acts can occur at any level of government, from the lowest actors to the highest ranks.

One of the key discussion points raised to fix this potential bias is to teach journalists how to make better use of computer systems. This may only bridge journalists into a new medium they are not familiar with, and it could also be seen as merely giving them a crutch for understanding a truly fragmented news system.

Questions

  • Do you think introducing journalists to a computer science program would extend their capabilities, or would it only further channel their ideas while potentially removing some creativity? 
  • Since there is a degree of monopolization throughout the software ecosystem, do you believe people are “forced” to use technologies that tailor their results? 
  • Given that a lot of technology uses personal information in ways that could be misused, do you agree with this use being introduced with only a small disclaimer acknowledging the potential bias? 
  • There are many services that offer to clean your internet trail and clear the data that internet services cache to deliver faster and more tailored search results. Have you personally used any of these programs or step-by-step guides to clean your internet footprint? 
  • Many programs capture and record user activity, with only a small disclaimer at the end detailing how the data is used, and it is likely that many users do not read these for various reasons. Do you think that if ordinary consumers of technology saw how corrective and self-biasing the results could be, they would continue using the services? 


04/15/2020 – Dylan Finch – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Word count: 567

Summary of the Reading

This paper presents the design and evaluation of a system that helps people check the validity of claims. The system starts with a user entering a claim. It then shows the user a list of articles related to the claim, along with a prediction, based on those articles, of whether or not the claim is true, expressed as a percentage chance that the claim is true. Each article the system shows also carries a reputation score for its source and a support score indicating how strongly the article supports the claim. The user can then adjust these scores if they don't think the system has accurate information.
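As a minimal sketch of how such an aggregation could work (the exact model in the paper may differ; the weighting scheme, score ranges, and names below are my own assumptions), the claim-level prediction can be read as a reputation-weighted combination of each article's support:

```python
def predict_claim_veracity(articles):
    """Combine per-article support scores, weighted by source reputation.

    Each article is a dict with:
      reputation: 0..1, how trustworthy the source is judged to be
      support:   -1..1, how strongly the article refutes (-1) or supports (+1) the claim
    Returns a probability-like score in 0..1 that the claim is true.
    """
    total_weight = sum(a["reputation"] for a in articles)
    if total_weight == 0:
        return 0.5  # no usable evidence: stay at maximum uncertainty
    weighted_support = sum(a["reputation"] * a["support"] for a in articles)
    # Map the reputation-weighted support from [-1, 1] onto a [0, 1] scale.
    return 0.5 + 0.5 * (weighted_support / total_weight)

articles = [
    {"reputation": 0.9, "support": 0.8},   # reliable source that supports the claim
    {"reputation": 0.4, "support": -0.6},  # weaker source that refutes the claim
]
print(f"Estimated chance the claim is true: {predict_claim_veracity(articles):.0%}")
```

Adjusting a reputation or support value, as the interface allows, immediately shifts the aggregate estimate, which is how user knowledge feeds back into the prediction.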

The system seemed to help users come to the right conclusions when it had the right data but also seemed to make human judgements worse when the system had inaccurate information. This shows the usefulness of such a system and also gives a reason to be careful about implementing it.

Reflections and Connections

I think that this article tackles a very real and very important problem. Misinformation is more prevalent now than ever, and it is getting harder and harder to find the truth. This can have real effects on people’s lives. If people get the wrong information about a medication or a harmful activity, they may experience a preventable injury or even death. Misinformation can also have a huge impact on politics and may get people to vote in a way they might not otherwise.

The paper brings up the fact that people may over-rely on a system like this, blindly believing its results without putting more thought into them, and I think that is the paradox of this system. People want correct information, and we want it to be easy to find out whether something is correct, but the fact of the matter is that it is just not easy to find out whether something is true. A system like this would be great when it worked and told people the truth, but it would make the problem worse when it came to the wrong conclusion and made people more confident in their wrong answer. No matter how good a system is, it will still fail. Even the best journalists in the world, writing for the most prestigious newspapers, get things wrong, and a system like this one will get things wrong even more often. People should always be skeptical and should always do research before believing that something is true, because no easy answer like this can ever be 100% right, and if it can't be 100% right, we shouldn't trick ourselves into trusting it more than we should. This is a powerful tool, but we should not rely on it or anything like it.

Questions

  1. Should we even try to make systems like this if they will be wrong some of the time?
  2. How can we make sure that people don’t over rely on systems like this? Can we still use them without only using them?
  3. What’s the best way to check facts? How do you check your facts?


04/15/2020 – Dylan Finch – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

Word count: 553

Summary of the Reading

This paper presents a review of expert and non-expert feelings toward the risks of emerging technologies. The paper used a risk survey that had previously been used to assess perceptions of risk. The survey was sent to experts, in the form of people with careers related to technology, and to non-experts, in the form of workers on MTurk. While MTurk workers might be slightly more tech-savvy than average, they also tend to be less educated.

The results showed that experts tended to rate more things as riskier, while non-experts tended to downplay the risks of many activities much more than the experts did. The results also showed that more voluntary risks were seen as less risky than other forms of risk; it seems people perceive more risk when they have less control. Both experts and non-experts saw many emerging technologies as involuntary, even though these technologies usually obtain consent from users for everything.

Reflections and Connections

I think that this paper is more important than ever, and it will only continue to get more important as time goes on. In our modern world, more and more of the things we interact with every day are data-driven technologies that wield extreme power, both to help us do things better and for bad actors to hurt innocent people.

I also think that the paper's conclusions match what I expected. Many new technologies are abstract, and their inner workings are never seen. They are also much harder for laypersons to understand than the technology of decades past. In the past, you could see that your money was secure in a vault, you could see that you had a big lock on your bike and that it would be hard to steal, and you knew the physical laws of nature made it hard for other people to steal your stuff, because you had a general idea of how hard it was to break your security measures and because you could see and feel the things you used to protect yourself. Now, things are much different. You have no way of knowing what is protecting your money at the bank. You have no way of knowing, much less understanding, the security algorithms that companies use to keep your data safe. Maybe they're good, maybe they're not, but you probably won't know until someone hacks in. The digital world also disregards many of the limits that we experienced in the past and in real life. In real life, it is practically impossible for someone in India to rob me without going through a lot of hassle. But an online hacker can break into bank accounts all across the world and be gone without a trace. This new world of risk is so hard to understand because we aren't used to it and because it looks so different from the risks we experience in real life.

Questions

  1. How can we better educate people on the risks of the online world?
  2. How can we better connect abstract online security vulnerabilities to real world, easy to understand vulnerabilities?
  3. Should companies be required to be more transparent with their customers about security risks?


04/15/2020 – Sushmethaa Muhundan – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

This work aims to design and evaluate a mixed-initiative approach to fact-checking that blends human knowledge and experience with the efficiency and scalability of automated information retrieval and ML. The paper positions automatic fact-checking systems as an assistive technology to augment human decision making. The proposed system fulfills three key properties, namely model transparency, support for integrating user knowledge, and quantification and communication of model uncertainty. Three experiments were conducted with MTurk workers to measure participants' performance in predicting the veracity of given claims using the system. The first experiment compared users who performed the task with and without seeing ML predictions. The second compared a static interface with an interactive one in which users could amend or override the AI system's predictions; results showed that users were generally able to use the interface, but the interaction added little when the predictions were already accurate. The last experiment compared a gamified task design with a non-gamified one, and no significant differences in performance were found. The paper also discusses the limitations of the proposed system and explores further research opportunities.

I liked the fact that the focus of the paper was more on designing automated systems that were user-friendly rather than focussing on improving prediction accuracy. The paper takes into consideration the human element of the human-AI interaction and focuses on making the system better and more meaningful. The proposed system aims to learn from the user and provide a personalized prediction based on the opinions and inputs from the user.

I liked the focus on transparency and communication. Transparent models help users better understand the internal workings of the system and hence help build trust. Regarding communication, I feel that conveying the confidence of a prediction helps users make an informed decision. This is much better than a system that might have high precision but does not communicate confidence scores; in cases where such a system makes an error, the consequences are likely to be serious, since the user might blindly follow its prediction.
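As a small illustration of what communicating uncertainty alongside a verdict might look like (the wording, thresholds, and function below are my own, not taken from the paper):

```python
def describe_prediction(p_true: float, n_sources: int) -> str:
    """Turn a model probability into a verdict plus an explicit confidence statement."""
    verdict = "likely true" if p_true >= 0.5 else "likely false"
    confidence = max(p_true, 1 - p_true)
    caution = ""
    if confidence < 0.65:
        caution = " The evidence is weak; treat this only as a starting point."
    return (f"This claim is {verdict} "
            f"({confidence:.0%} confidence, based on {n_sources} sources).{caution}")

print(describe_prediction(0.72, 5))   # confident verdict, stated with its probability
print(describe_prediction(0.55, 2))   # low-confidence verdict flagged with a caution
```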

The side effect of making the system transparent was interesting: not only would transparency lead to higher trust, it would also help teach and structure the user's own information literacy skills regarding the logical process to follow when assessing a claim's validity. In this way, the proposed system truly leverages the complementary strengths of the human and the AI.

  • Apart from the three properties incorporated in the study (transparency, support for integrating user knowledge, and communication of model uncertainty), what are some other properties that can be incorporated to better the AI systems?
  • The study aims to leverage the complementary strengths of humans and AI but certain results were inconclusive as noted in the paper. Besides the limitations enumerated in the paper, what are other potential drawbacks of the proposed system?
  • Given that the study presented is in the context of automated fact-checking systems, what other AI systems can these principles be applied to?


04/15/2020 – Sushmethaa Muhundan – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

This work aims to explore the impact of perceived risk on choosing to use technology. A survey was conducted to assess the mental models of users and technologists regarding the risks of using emerging, data-driven technologies. Guidelines for developing a risk-sensitive design were then explored in order to address and mitigate the perceived risk; this model aims to identify when misaligned risk perceptions may warrant reconsideration. Fifteen risks relating to technology were devised, and a total of 175 participants, comprising 26 experts and 149 non-experts, were recruited to rate the perceived risk of each. Results showed that technologists were more skeptical than non-experts about using data-driven technologies. The authors therefore urge designers to strive harder to make end-users aware of the potential risks involved in their systems. The study recommends that design decisions regarding risk-mitigation features for a particular technology should be sensitive to the difference between the public's perceived risk and the acceptable marginal perceived risk at that risk level.

Throughout the paper, there is a focus on creating design guidelines that reduce risk exposure and increase public awareness relating to potential risks and I feel like this is the need of the hour. The paper focuses on identifying remedies to set appropriate expectations in order to help the public make informed decisions. This effort is good since it is striving to bridge the gap and keep the users informed about the reality of the situation.

It is concerning that the results found technologists to be more skeptical than non-experts about using data-driven technologies. This is perturbing because it shows that the risks of the latest technologies are perceived as greater by the group involved in creating them than by the users of the technology.

Although the counts of experts and non-experts were skewed, it was interesting that when the results were aggregated, the top three highest perceived risks were the same for both groups; the only difference was their order.

It was interesting to note that the majority of both groups rated nearly all risks related to emerging technologies as characteristically involuntary. This strongly suggests that the consent procedures in place are not effective: either the information is not being conveyed to users transparently, or it is presented in such a complex manner that the content is not understood by end-users.

  • In the context of the current technologies we use on a daily basis, which factor is more important from your point of view: personal benefits (personalized content) or privacy?
  • The study involved a total of 175 participants, comprising 26 experts and 149 non-experts. Given the huge difference in these numbers and that the split is not even close to equal, was it feasible to analyze and draw conclusions from the study?
  • Apart from the suggestions in the study, what are some concrete measures that could be adopted to bridge the gap and keep the users informed about the potential risks associated with technology?


04/15/20 – Akshita Jha – Believe It or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:
This paper discusses the issue of fact-checking, i.e., estimating the credibility of a given statement, which is extremely pertinent in today's climate. With the generation and exploration of information becoming ever simpler, judging the trustworthiness of the information found is becoming increasingly challenging. Researchers have attempted to address this issue with a plethora of AI-based fact-checking tools. However, a majority of these are based on neural network models that provide no description whatsoever of how they arrived at a particular verdict. It has not yet been addressed how this lack of transparency, in addition to a poor user interface, affects the usability or even the perceived authenticity of such tools. This tends to breed mistrust in the minds of users toward the tool, which ends up being counter-productive to the original goal. This paper tackles this important issue head-on by designing a transparent system that mixes the scope and scalability of an AI system with inputs from the user. While this transparency increases the user's trust in the system, the user's input also improves the predictive accuracy of the system. On an important note, there is an additional side benefit: this user interaction addresses the deeper issue of information illiteracy in our society by cultivating the skill of questioning the veracity of a claim and the reliability of a source.

The researchers build a model that uses NLP tools to aggregate and assess various articles related to a given statement and assign a stance to each article. The user can modify the weights associated with each of these stances and thereby influence the verdict the model generates on the veracity of the claim. They further conduct three studies in which they judge the usability, effectiveness, and flaws of this model. They compare participants' assessments of multiple statements before and after being exposed to the model's prediction, and they verify whether interacting with the model provides additional support compared to simply displaying the result. A third task estimates whether gamification of the task has any effect. While the third task seems inconclusive, the first two lead the researchers to conclude that interaction with the system increases the user's trust in the model's results, even in the case of a false prediction by the model. However, this interaction also helps improve the model's predictions for the claims tested here.

Reflections:
The paper brings up an interesting point about the transparency of the model. When people talk about an AI system, normally a binary system comes to mind: one that takes in a statement, assesses it in a black box, and returns a binary answer. What is interesting is that the ability to interact with the model's predictions enables users to improve their own judgment and even compensate for the model's shortcomings. User interaction in AI systems has been grossly overlooked. While there are some clear, undeniable benefits to this model, there is also a dangerous issue that human-modifiable fact-checking could lead to: using the slider to modify the reputation of a given source can let users inject their own biases into the system, effectively creating echo chambers of their own views. This could nefariously impact the verdict of the ML system and thus reinforce the user's possible prejudices. I would suggest that the model assign x% weight to its own assessment of the source and (100-x)% to the user's assessment, as sketched below. This would be a step toward ensuring that the user's prejudices do not completely suppress the model's judgment. Without doubt, however, the way this interaction inadvertently helps a user learn how to tackle misinformation and check the reputation of sources is highly laudable and worth considering in future models along these lines.

From the human perspective, the Bayesian or linear approaches adopted by these models make them very intuitive to understand. However, one must not underestimate how much more powerful neural networks are at aggregating relevant information and assessing its quality. A simple linear approach is bound to have its fallacies, and hence it would be interesting to see a model that uses the power of neural networks in addition to these techniques while still helping with the transparency aspect. On a side note, it would have been useful to have more information on the NLP and ML methods used; the related work regarding these models is also insufficient to provide a clear background on existing techniques. One glaring issue with the paper is its glossing over the insignificance of the result in task #2. The authors mention that the p-value is only slightly above the threshold, but statistics teaches us that the exact value is not what matters; it is the threshold set before conducting the experiment that matters. Thus, the statement “..slightly larger than the 0.05..” is simply careless.
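A minimal sketch of that blending idea, assuming illustrative reputation values on a 0-1 scale (the mixing parameter alpha and the function name are mine, not from the paper):

```python
def blended_reputation(model_rep: float, user_rep: float, alpha: float = 0.6) -> float:
    """Keep alpha of the model's own source assessment and (1 - alpha) of the user's.

    With alpha > 0 the user can shift, but never fully overwrite, the model's view,
    which limits how much personal bias can dominate the final verdict.
    """
    return alpha * model_rep + (1 - alpha) * user_rep

# The model rates a source 0.8; a skeptical user drags it down to 0.1.
print(f"{blended_reputation(model_rep=0.8, user_rep=0.1):.2f}")  # 0.52 rather than 0.1
```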

Questions:
1. Why is there no control group in task 2 and task 3 as well?
2. What are your general thoughts on the paper? Do you approve of the methodology?


4/15/2020 – Nurendra Choudhary – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

Summary

In this paper, the authors study how human mental models perceive the risks associated with AI systems. To analyze risk perception, they survey 175 individuals, both individually and comparatively, while also accounting for psychological factors. Additionally, they analyze the factors that lead to people's conceptions or misconceptions in risk assessment. Their analysis shows that technologists and AI experts consider the studied risks to pose a greater threat to society than non-experts do. Such differences, according to the authors, can be used to inform system design and decision-making.

However, most of the subjects agree that such system risks (identity theft, personal filter bubbles) were not voluntarily taken on but are a consequence or side effect of integrating some valuable tools or services. The paper also discusses risk-sensitive designs that need to be applied when the difference between public and expert opinion on risk is high. The authors emphasize integrating risk sensitivity earlier in the design process rather than the current practice, where it is an afterthought for an already deployed system.

Reflection

Given the recent ubiquity of AI technologies in everyday life (Tesla cars, Google Search, Amazon Marketplace, etc.), this study is very necessary. The risks do not just involve test subjects but a much larger populace that is unable to comprehend the technologies intruding into their daily lives, which leaves them vulnerable to exploitation. Several cases of identity theft or spam scams have already claimed victims due to a lack of awareness. Hence, it is crucial to analyze how much information can reduce such cases. Additionally, a system should provide a comprehensive analysis of its limitations and possible misuse.

Google Assistant records conversations to detect its initiation phrase “OK Google”. Its safety depends on the fact that the recording is a stream and no data is stored except a segment. However, a possible listener could extract the streams and use another program to integrate them into comprehensible knowledge that can be exploited. Users are confident in the system because of this speech segmentation, but an expert can see through the ruse and imagine the listener scenario just from the knowledge that such systems exist. This knowledge is not entirely expert-oriented and can be transferred to users, thus preventing exploitation.

Questions

  1. Think about systems that do not rely on or have access to user information (e.g., Google Translate, DuckDuckGo). What information can they still get from users? Can this be utilized in an unfair manner? Would these be risk-sensitive features? If so, how should the system design change?
  2. Unethical hackers generally work in networks and are able to adapt to security reinforcements. Can security reinforcements utilize risk-sensitive designs to overcome hacker adaptability? What changes along these lines could be made in the current system?
  3. Experts tend to show more caution towards technologies. What amount of knowledge introduces such caution? Can this amount be conveyed to all the users of a particular product? Would this knowledge help risk-sensitivity?
  4. Do you think the individuals selected for the task are a representative set? They utilized MTurk for their study. Isn’t there an inherent presumption of being comfortable with computers? How could this bias the study? Is the bias significant?

Word Count: 542
