04/15/2020 – Yuhang Liu – Algorithmic Accountability: Journalistic investigation of computational power structures

Summary: In this paper, the author observes that automated algorithms have become increasingly important in modern society and gradually regulate many aspects of our lives, yet the outline of what they do can still be difficult to grasp. It is therefore necessary to elucidate and articulate the power of algorithms. The author proposes a new notion, “algorithmic accountability reporting,” which can reveal how algorithms work and is well worth consideration by computational journalists. The author explores methods such as transparency and reverse engineering and how they can be useful in elucidating algorithmic power. The author also analyzes five case studies of journalists investigating algorithms and describes the challenges and opportunities they face when working on algorithmic accountability. The main contributions are: (1) a theoretical lens of various atomic algorithmic decisions, which raises major issues that can guide algorithm investigation and the development of algorithmic transparency policy; and (2) a preliminary evaluation and analysis of algorithmic accountability reporting, including its various limitations. The author also discusses the challenges of adopting this reporting method, including human resources, legality, and ethics, and looks ahead to how journalists themselves might employ transparency when they use algorithms.

Reflection: I think the author has put forward a very innovative idea. It is also the first thing that comes to my mind when I encounter or use a new algorithm: what are the boundaries of this algorithm, and what scope can it be applied to? Take an insurance company’s pricing algorithm as an example. We all know that the insurance cost is generated from a series of attributes, but people are often uncertain about the weight of each attribute in the algorithm, so they may doubt the results and even consider some results immoral. Therefore, it is very important to study the capabilities and boundaries of an algorithm.

At the same time, the article mentions the concept of reverse engineering, that is, studying an algorithm by studying its inputs and outputs. However, some websites have mechanisms that make the algorithm dynamic, so other methods are needed to handle such cases. Once the input-output relationship of the black box is determined, the challenge becomes a data-driven search for news stories. I therefore think this approach is mostly about finding out whether an algorithm behaves unreasonably, and whether the root cause is deliberate human action, negligence, or people’s deep-rooted ideas. So, in some respects, I think exploring the boundaries of algorithms is exploring the morality of algorithms, and this article provides a framework for reviewing that morality. The method can effectively uncover places where an algorithm is unreasonable, and news reporters can use it to discover meaningful news.
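As a rough illustration of the input-output idea, the sketch below probes a black box by holding one attribute fixed and varying another. It is only a sketch under stated assumptions: `black_box_quote` is a hypothetical stand-in for an opaque insurance pricing algorithm (its formula, the zip codes, and the ages are invented for illustration), and it is not taken from the paper.

```python
from statistics import mean


def black_box_quote(age: int, zip_code: str) -> float:
    """Hypothetical opaque pricing algorithm; a reporter would only see its outputs."""
    base = 500.0
    # Hidden rule the probe is meant to surface: a surcharge tied to zip code.
    surcharge = 150.0 if zip_code.startswith("9") else 0.0
    return base + max(0, 30 - age) * 10.0 + surcharge


def probe(zip_codes, ages):
    """Query the black box over the same ages for each zip code and average the quotes."""
    return {z: mean(black_box_quote(a, z) for a in ages) for z in zip_codes}


# Same set of ages for both zip codes, so any gap comes from the zip code alone.
averages = probe(["24060", "90210"], ages=range(20, 60, 10))
gap = averages["90210"] - averages["24060"]
print(averages, gap)
```

A consistent gap between otherwise identical inputs is the kind of input-output relationship that could then be examined as a possible news story. A dynamic algorithm, as noted above, would require repeating the probe over time.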

In addition, I think the framework described in this article is a special way of human-computer interaction, that is, people study the machine itself, and understand the process of algorithm operation through the feedback of the machine. This also broadened my understanding of human-computer interaction.

Problem:

  1. Do you think the framework described in the paper can be used to detect ethical issues in an algorithm?
  2. Can this framework be used in an automatic system to elucidate and articulate the power of algorithms?
  3. Is there any value in detecting algorithms’ power other than news value?

04/15/2020 – Yuhang Liu – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary:

This article discusses a system for fact-checking. First, the article argues that fact-checking is an important, challenging, and time-sensitive task. Systems of this type usually ignore the human’s influence on the system, even though that influence is very important. Therefore, the article builds a mixed-initiative system for fact-checking that enables users to interact with ML predictions to complete challenging fact-checking tasks. The authors designed an interface through which the user can see the sources behind a prediction. When users see a prediction but are not satisfied with it, the system also allows them to override the prediction using their own beliefs or inferences. Using this system, the authors conclude that when the model’s predictions are correct, they have a very positive impact on people. However, people should not overly trust the model’s predictions; when users think a prediction is wrong, the result can be improved through interaction. This also indirectly shows the importance of a transparent, interactive system for fact-checking.

Reflection:

When I saw the title of this article, I thought it might share a topic with my project, which uses crowdsourced workers to distinguish fake news, but as I read further I found that this is not the case. Still, I think it affirmed my thinking in some respects. First, fact-checking is very challenging, especially when real-time results are needed, so it is necessary to rely on human effort. Because labeled data is lacking, a model that tries to complete the task directly through machine learning will, in some cases, produce predictions that point in completely the opposite direction. For example, in my project, a rumor and a post refuting that rumor may both be classified as rumors, so we need crowd workers to tell them apart.

Secondly, regarding the project described in the article, I think its method is a very good direction. Human judgment is particularly important in this kind of system, and improving accuracy through humans is the main idea of many human-computer interaction systems. I think the method in the article is a good start: in a transparent system, people decide whether to override the predicted results. This does not force people to participate in the system, yet it gives their judgments significant weight in the predictions.

At the same time, I think the system has some of the limitations described in the article. For example, crowdsourced workers’ motivations and their own concerns may affect the final results of the system. So I think the article proposes a good direction, but more careful research is needed.

Problem:

  1. Do you think users can usually recognize that a prediction is incorrect and override it when the system is wrong?
  2. What role does the transparency of the system play in the interaction?
  3. How can we prevent users from trusting predictions too much in other human-computer interaction systems?

04/15/2020 – Palakh Mignonne Jude – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

SUMMARY

The authors of this paper design and evaluate a mixed-initiative fact-checking approach that blends prior human knowledge with the efficiency of automated ML systems. The authors found that users tend to over-trust the model, which could degrade human accuracy. They conducted three randomized experiments: the first compares users who perform the task with or without viewing ML predictions; the second compares a static interface with an interactive one (that enables users to fix model predictions); and the third compares a gamified task design to a non-gamified one. The authors designed an interface that displays the claim, the predicted correctness, and relevant articles. For the first experiment, the authors considered responses from 113 participants with 58 assigned to Control and 55 to System. For the second experiment, the authors considered responses from 109 participants with 51 assigned to Control and 58 to Slider. For the third experiment, the authors considered responses from 106 participants, and found no significant differences between the two groups.

REFLECTION

I liked the idea of a mixed-initiative approach to fact checking that builds on the affordances of both humans and AI. I thought it was good that the authors designed the experiments such that the confidence scores (and therefore the fallibility) of the system were openly shown to the users. I also felt that the interface design was concise and appropriate without being overly complex. I also liked the design of the gamified approach and was surprised to learn that the game design did not impact participant performance.

I agree that for this case in particular, participant demographics may affect the results, especially since the news articles considered were mainly related to American news. I wonder how much of a difference in the results would be observed in a follow-up study that considers different demographics as compared to this study. I also agree that caution must be exercised with such mixed-initiative systems, as imperfect data sets would have a considerable impact on model predictions, and that humans should not blindly trust the AI predictions. It would definitely be interesting to see the results obtained when users check their own claims and interact with other users’ predictions.

QUESTIONS

  1. The authors explain that the incorrect statement on Tiger Woods was due to the model having learnt the bi-gram ‘Tiger Woods’ incorrectly – something that a more sophisticated classifier may have avoided. How much of an impact would such a classifier have made on the results obtained overall? Have other complementary studies been conducted?
  2. The authors found that a smaller percentage of users used the sliders than expected. They state that while the sliders were intended to be intuitive, they may involve a learning curve, causing fewer users to adopt them. Would the use of a tutorial that enabled users to familiarize themselves have helped in this case?
  3. Were the experiments conducted in this study adequate? Are there any other experiments that the authors should have conducted in addition to the ones mentioned?

04/15/2020 – Palakh Mignonne Jude – What’s at Stake: Characterizing Risk Perceptions of Emerging Technologies

SUMMARY

The authors of this paper adapt a survey instrument from existing risk perception literature to analyze the perception of risk surrounding newer emerging data-driven technologies. The authors surveyed 175 participants (26 experts and 149 non-experts). They categorize an ‘expert’ to be anyone working in a technical role or earning a degree in a computing field. Inspired by the original 1980’s paper ‘Facts and Fears: Understanding Perceived Risk’, the authors consider 18 risks (15 new risks and 3 from the original paper). These 15 new risks include ‘biased algorithms for filtering job candidates’, ‘filter bubbles’, and ‘job loss from automation’. The authors also consider 6 psychological factors while conducting this study. The non-experts (as well as a few who were later on considered to be ‘experts’) were recruited using MTurk. The authors borrowed quantitative measures that were used in the original paper and added two new open-response questions – describing the worst-case scenario for the top three risks (as indicated by the participant) and adding new serious risks to society (if any). The authors also propose a risk-sensitive design based on the results of their survey.  

REFLECTION

I found this study to be very interesting and liked that the authors adapted the survey from existing risk perception literature. The motivation of the paper reminded me of a New York Times article titled ‘Twelve Million Phones, One Dataset, Zero Privacy’ and the long-term implications of such data collection and its impact on user privacy.

I found it interesting to learn that the survey results indicated that both experts and non-experts rated nearly all risks related to emerging technologies as characteristically involuntary. It was also interesting to learn that despite consent processes built into software and web services, the corresponding risks were not perceived to be voluntary. I thought that it was good that the authors included the open-response question on what the users perceived as the worst-case scenario for the top three riskiest technologies. I liked that they provided some amount of explanation for their survey results.

The authors mention that technologists should attempt to allow more discussion around data practices and be willing to hold off on rolling out new features that raise more concerns than excitement. However, this made me wonder if any technology companies would be willing to do so. It would probably cause additional overhead, and the company may not perceive the results to be worth the amount of time and effort that such evaluations may entail.

QUESTIONS

  1. In addition to the 15 new risks added by the authors for the survey, are there any more risks that should have been included? Are there any that needed to be removed or modified from the list? Are there any new psychological factors that should have been added?
  2. As indicated by the authors, there are gaps in the understanding of the general public. The authors suggest that educating the public would enable this gap to be reduced more easily as compared to making the technology less risky. What is the best way to educate the public in such scenarios? What design principles should be kept in mind for the same?
  3. Have any follow-up studies been conducted to identify ‘where’ the acceptable marginal perceived risk line should be drawn on the ‘Risk Perception Curve’ introduced in the paper?  

04/15/20 – Lee Lisle – Believe it or Not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

Nguyen et al.’s paper discusses the rise of misinformation and the need to combat it via tools that can verify claims while also maintaining users’ trust in the tool. They designed an algorithm that finds sources that are similar to a given claim to determine whether or not the claim is accurate. They also weight the sources based on reputation. They then ran 3 studies (with over 100 participants in each) where users could interact with the tool and change settings (such as source weighting) in order to evaluate their design. The first study found that the participants trusted the system too much: when it was wrong, they tended to be inaccurate, and when it was right, they were more typically correct. The second study allowed participants to change the inputs and inject their own expertise into the scenario. This study found that the sliders did not significantly impact performance. The third study focused on gamification of the interface, and found no significant difference.
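To make the source-weighting idea concrete, here is a minimal sketch of how weighted sources could be combined into a single score for a claim. It is not the authors’ model: the `claim_score` function, the field names, the outlet names, and all numbers are hypothetical, and a plain weighted average stands in for whatever aggregation the paper actually uses.

```python
def claim_score(articles):
    """Weighted average of article stances (+1 supports the claim, -1 refutes it).

    Each article's weight is its source weight, so heavily weighted
    sources pull the overall score harder.
    """
    total_weight = sum(a["reputation"] for a in articles)
    if total_weight == 0:
        return 0.0  # no usable evidence either way
    weighted = sum(a["reputation"] * a["stance"] for a in articles)
    return weighted / total_weight


# Hypothetical retrieved articles for one claim.
articles = [
    {"source": "outlet_a", "reputation": 0.9, "stance": -1.0},
    {"source": "outlet_b", "reputation": 0.3, "stance": 1.0},
    {"source": "outlet_c", "reputation": 0.6, "stance": 1.0},
]

before = claim_score(articles)

# Simulate a user lowering one source's weight with a slider.
articles[0]["reputation"] = 0.1
after = claim_score(articles)

print(f"before: {before:.2f}, after: {after:.2f}")
```

In this toy setup, lowering the weight of the one refuting source shifts the score toward “supported,” which is the kind of user adjustment the second study examined.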

Personal Reflection

I enjoyed this paper from a 50,000-foot perspective, as they tested many different interaction types and found what could be considered negative results. I think papers that show that not all work is necessarily good have a certain amount of extra relevance; they certainly show that there’s more at work than just novelty.

I especially appreciated the study on the effectiveness of gamification. Often, the prevailing theory is that gamification increases user engagement and increases the tools’ effectiveness. While the paper is not conclusive that gamification cannot do this, it certainly lends credence to the thought that gamification is not a cure-all.

However, I took some slight issue with their AI design. Particularly, the AI determined that the phrase “Tiger Woods” indicated a supportive position. While their stance was that AIs are flawed (true), I felt that this error was quite a bit more than we can expect from normal AIs, especially ones that are being tweaked to avoid these scenarios. I would have liked to see experiments 2 and 3 improved with a better AI, as it does not seem like they cross-compared studies anyway.

Questions

  1. Does the interface design including a slider to adjust source reputations and user agreement on the fly seem like a good idea? Why or why not?
  2. What do you think about the attention check and its apparent failure to accurately check? Should they have removed the participants with incorrect answers to this check?
  3. Should the study have included a pre-test to determine how the participants’ world view may have affected the likelihood of them agreeing with certain claims? I.e., should they have checked to see if the participants were impartial, or tended to agree with a certain world view? Why or why not?
  4. What benefit do you think the third study brought to the paper? Was gamification proved to be ineffectual, or is it a design tool that sometimes doesn’t work?

04/15/2020 – Bipasha Banerjee – Algorithmic Accountability

Summary 

The paper provides a perspective on algorithmic accountability through journalists’ eyes. The motivation of the paper is to detect how algorithms influence various decisions in different cases. The author explicitly investigates the area of computational journalism and how such journalists could use their power to “scrutinize” to uncover bias and other issues current algorithms pose. He lists a few of the decisions that algorithms make that have the potential to affect an algorithm’s capability to be unbiased. These decisions are classification, prioritization, association, and filtering. It is also mentioned that transparency is a key factor in building trust in an algorithm. The author then proceeds to discuss reverse engineering by providing examples from a few case studies. Reverse engineering is described in the paper as the way computational journalists have investigated how these algorithms work. Finally, he points out the challenges the method poses in the present scenario.

Reflection

The paper gives a unique perspective on algorithmic bias from computational journalists’ point of view. Most of the papers we read come from either a purely computational domain or the human-in-the-loop perspective. Having journalists who are not directly involved in the matter is, in my opinion, brilliant. This is because journalists are trained to be unbiased. From the CS perspective, we tend to be “AI” lovers and want to defend the machine’s decision and consider it as true. The humans using the system either blindly trust it or completely doubt it. Journalists, on the other hand, are always motivated to seek the truth, however unpleasant it might be. Having said that, I am intrigued to know the computational expertise level of the journalists, although having in-depth knowledge about AI systems might introduce a separate kind of bias. Nonetheless, this would be a valid experiment to conduct.

The challenges that the author mentioned include ethics and legality, among others. These are some of the challenges that are not normally discussed. We, from the computational side, need to be aware of these challenges. The “legal ramification” could be enormous if we do not end up using authorized data to train the model and then publish the results.

I agree with the author that transparency indeed helps bolster confidence in an algorithm. However, I also agree that it is difficult for companies to be transparent in the modern digital competitive era. It would be difficult for companies to take the risk and make all the decisions public. I believe there might be a middle ground for companies; they could publish part of the algorithmic decisions like the features they use and let the users know what data is being used. This might help improve trust. For example, Facebook could publish the reasons why they recommend a particular post, etc.

Questions

  1. Although the paper talks about using computational journalism, how in-depth is the computational knowledge of such people? 
  2. Is there a way for an algorithm to be transparent, yet the company not lose its competitive edge?
  3. Have you considered the “legal and ethical” aspect of your course project? I am curious to know about the data that is being used, the models, etc.

04/15/2020 – Bipasha Banerjee – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

The paper emphasizes the importance of a mixed-initiative model for fact-checking. It points out the advantages of humans and machines working closely together to verify the veracity of the facts. The paper’s main aim with the mixed-initiative approach was to make the system, especially the user interface, more transparent. The UI presents a claim to the user along with a list of articles related to the statement. The paper also mentions all the prediction models that have been used to create the UI experience. Finally, the authors conducted three experiments using crowd workers who had to predict the correctness of claims presented to them. In the first experiment, the users were shown the results page without the prediction of the truthfulness of the claim. Users were subsequently divided into two subgroups, where one group was given slightly more information. In the second experiment, the crowdworkers were presented with an interactive UI. They, too, were further divided into two subgroups, with one group having the power to change the initial predictions. The third experiment was a gamified version of the previous experiment. The authors concluded that human-AI collaboration could be useful, although the experiment brought to light some contradictory findings.

Reflection

I agree with the authors’ view that the transparency of a system increases the confidence of the users of that system. My favorite thing about the paper is that the authors describe the systems very well. They do a very good job of describing the AI models as well as the UI design and give a good explanation for their decisions. I also enjoyed reading about the experiments that they conducted with the crowdworkers. I had a slight doubt about how the project handled latency, especially when the related articles were presented to the workers in real time.

I also liked how the experiments were conducted in sub-groups, with one group having information not presented to the other. This shows that a lot of use cases were thought of when the experimentation took place. I agree with most of the limitations that the authors wrote. I particularly agree that if the predicted veracity is shown to the users, there is a high chance of it influencing people. We, as humans, have a tendency to believe machines and their predictions blindly.

I would also want to see the work being performed on another dataset. Additionally, if the crowdworkers have knowledge about the domain under discussion, how does that affect the performance? Having such knowledge would almost certainly improve their ability to assess the veracity of a claim, but a study like this might help determine to what extent. A potential use case could be researchers reading claims from research papers in their domain and assessing their correctness.

Questions

  1. How would you implement such systems in your course project?
  2. Can you think of other applications of such systems?
  3. Is there any latency when the user is presented with the associated articles?
  4. How would the veracity claim system extend to other domains (not news based)? How would it perform on other datasets? 
  5. Would crowdworkers experienced in a given domain perform better? The answer is likely yes, but by how much? And how can this help improve targeted systems (research paper acceptance, etc.)?

04/15/2020 – Myles Frantz – Algorithmic accountability

Summary

With the prevalence of technology, the mainstream programs that drive its rise dictate not only technological impact but also the direction of news media and people’s opinions. As journalists turn to various outlets and adapt to the efficiencies created by technology, the technology they use may introduce bias based on its internal sources or optimizations, and therefore introduce bias into their stories. This team measured multiple algorithms against four different categories: prioritization, classification, association, and filtering. Using a combination of these categories, the team used a user survey to measure how different autocomplete features bias users’ opinions. From these measurements, the team also determined that popular search engines like Google specifically tailor results based on what the user has previously searched. For a normal user this makes sense; for an investigative journalist, however, these results may not accurately represent a source of truth.

Reflection

As noted by the team, there is a strong conflict around transparency in algorithms. These transparency discrepancies may be due to government concerns that depend on certain secrets. This creates a strong sense of resistance to and distrust of certain algorithms. Though these secrets are claimed in the name of national security, there may be misuse of power or overstepping of definitions, where the term is overused for personal or political gain rather than applied appropriately. Such acts may occur at any level of government, from the lowest-level actors to the highest ranks.

One of the key discussion points raised by the team for fixing this potential bias in independent research is to teach journalists how to better use computer systems. This may seem only to bridge journalists to a new medium they are not familiar with. It could also be seen as an attempt to create a handicap that helps journalists better understand a truly fragmented news system.

Questions

  • Do you think introducing journalists into a computer science program would extend their capabilities or it would only further direct their ideas while potentially removing certain creativity? 
  • Since there is a type of monopolization throughout the software ecosystem, do you believe people are “forced” to use such technologies that tailor the results? 
  • Given how a lot of technology uses user information for potential misuse, do you agree with this information being introduced with a small disclaimer acknowledging the potential preference? 
  • There are a lot of services that offer better insights into cleaning your internet trail and clearing any biases that internet services cache to ensure faster and more tailored search results. Have you personally used any of these programs or step-by-step guides to clean your internet footprint?
  • Many programs capture and record user usage with a small disclaimer at the end detailing their usage on data. It is likely many users do not read these for various reasons. Do you think if normal consumers of technology were to see how corrective and auto biasing the results could be that they would continue using the services? 

04/15/20 – Myles Frantz – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this very politically charged time, it is hard for the average person to extract accurate information from the news media. Making this even more difficult, media sources publish contradictory information. Although a variety of companies already run fact-checking services, this team created a fact-checking system that mixes crowdsourcing and machine learning. It pairs a machine learning algorithm with a user interface that allows Mechanical Turk workers to adjust the reputation of sources and whether citations support a claim. These tools also allow a user to tweak the retrieved sources and read the raw information. The team also created a gamified interface allowing better and more integrated usage of their original system. Overall, the participants appreciated the ability to tweak the sources and to determine which raw sources supported or did not support the claim.

Reflection

I think there is an inherent issue with the gaming experiment created by the researchers. The issue is not part of the environment but stems from human nature. With a gamified method, I believe humans will inherently try to game the system. This may have occurred on a smaller scale within their research experiment, and it would need to be restricted in other use cases.

I believe a crowd worker fact-checking service will not work. A fact-checking service that is crowdsourced is an easy target for any group of malicious actors. Using a variety of common techniques, actors have used Distributed Denial of Service (DDoS) attacks to overwhelm and control the majority of responses. These kinds of attacks have also been used for controlling blockchain transactions and the flow of money. A fully fledged crowdsourced fact-checker could easily be overridden by such actors.

In general, I believe allowing users more visibility into the system encourages more usage. When using some program or Internet of Things (IoT) device, people likely feel that they do not have much control over the flow of the internal programming. Providing this insight into, and slight control over, the algorithm may help give consumers of these devices the impression of more control. This amount of control may help encourage people to put their trust back into programs. This is likely due to the nature of machine learning algorithms and their iterative learning process.

Questions

  • Measuring Mechanical Turk workers’ attention is usually done by creating a baseline question. This ensures that if the worker is not paying attention (i.e., clicking as fast as they can), they will not answer the baseline question accurately. Given that the team did not discard these workers, do you think the removal of their answers would support the team’s theory?
  • Along the same lines of questioning, despite the team regarding the user’s other interactions as measuring their attentiveness, do you think it is wise that they ignored the attention check?
  • Within your project, are you planning on implementing a slider like this team did to help interact with your machine learning algorithm?

04/15/2020 – Mohannad Al Ameedi – Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking

Summary

In this paper, the authors propose a mixed-initiative approach to fact checking that combines human knowledge and experience with automated information retrieval and machine learning. The paper discusses the challenge posed by the massive amount of information available on the internet today, some of which might not be accurate, which introduces a risk to information consumers. The proposed system retrieves relevant information about a certain topic and uses machine learning and natural language processing to assess the factuality of the information. It provides a confidence level to the users and lets them decide whether to use the information or do manual research to validate the claims. The partnership between the artificial intelligence system and human interaction can offer effective fact checking that supports human decisions in a scalable and effective way.

Reflection

I found the approach used by the authors to be very interesting. I personally had a discussion with a friend recently about a certain topic that was mentioned in Wikipedia. I thought the numbers and facts mentioned were accurate, but it turned out the information was wrong, and he asked me to check an accredited source. If I had had the opportunity to use the system proposed in the paper, the accredited source could have been ranked higher than Wikipedia.

The proposed system is very important in our digital age, where so much information is generated on a daily basis. We are not only searching for information but also receiving a great deal of it through social media about current events. Some of these events have a high impact on our lives, so we need to assess the factuality of this information, and the proposed system can help a lot with that.

The proposed system is like a search engine that ranks documents not only by relevance to the search query but also by the fact-checking assessment of the information. The human interaction is like relevance feedback in a search engine, which can improve the retrieval of information and lead to a better ranking.

Questions

  • AI systems can be biased because the training data can be biased. How can we make the system unbiased?
  • The proposed system uses information retrieval to retrieve relevant articles about a certain topic, then uses machine learning to validate the source of the information, and then presents the confidence level of each article. Do you think the system should filter out articles with poor accuracy, as they might confuse the user? Or might they be very valuable?
  • With the increased usage of social networking, many individuals write or share fake news intentionally or unintentionally. Millions of people post information every day. Can we use the proposed system to assess fake news? If yes, can we scale the system to assess millions or billions of tweets or posts?
