Paper 1: Sandvig, Christian, et al. “Auditing algorithms: Research methods for detecting discrimination on internet platforms.” Data and discrimination: converting critical concerns into productive inquiry (2014): 1-23.
Paper 2: Hannak, Aniko, et al. “Measuring personalization of web search.” Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.
It is safe to say that algorithmic personalization was once celebrated as a mark of “smartness” and high service quality among internet information providers. In a study by Lee [1], research participants attributed an algorithm’s fairness and trustworthiness to its perceived efficiency and objectivity, which means that reputed and widely used algorithms such as Google search appear more trustworthy. This makes it particularly severe when such algorithms serve discriminatory information or make decisions on users’ behalf. Sandvig et al.’s paper (paper 1) reviews the implications of discrimination on internet platforms along with the audit methods researchers can use to detect its prevalence and effects. Among the several methods proposed, Hannak et al. (paper 2) use the sockpuppet and crowdsourcing audit methods to measure personalization on Google search, comparing the result lists that different accounts receive for identical queries (a minimal sketch of such a comparison follows below). From the review of the two papers, it can be said that bias in algorithms can originate either in the code or in the data.
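To make the measurement concrete: Hannak et al. quantify personalization by issuing the same query from a treatment account and a fresh control account at the same time, then comparing the two ranked result lists with the Jaccard index (set overlap, ignoring order) and an edit distance (sensitive to reordering and substitution). The sketch below implements those two metrics in Python; the URL lists are hypothetical, and the real study adds further controls (e.g., for carry-over effects between queries) that are omitted here.

```python
def jaccard_index(a, b):
    """Overlap between two result lists as sets (1.0 = same URLs, any order)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def edit_distance(a, b):
    """Levenshtein distance between two ranked result lists."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))          # distances from the empty prefix of a
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i       # prev holds the diagonal cell
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # delete a[i-1]
                        dp[j - 1] + 1,                  # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute / match
            prev = cur
    return dp[n]

# Hypothetical result lists for one query from a control and a treatment account.
control   = ["a.com", "b.com", "c.com"]
treatment = ["a.com", "c.com", "d.com"]
print(jaccard_index(control, treatment))  # 0.5  -> half the URLs differ
print(edit_distance(control, treatment))  # 2    -> two positions changed
```

A Jaccard index below 1.0 with a nonzero edit distance is the signature of personalization: the treatment account is being shown a measurably different ranking for the identical query.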
- Bias in code can be attributed to financial motives. The examples given in paper 1 about American Airlines, Google Health, and product ranking highlight this fact. But how is this different from a supermarket selling its store brand at a low price? On the surface, both enterprises use the platform they created to best sell their products. However, the findings in paper 2 suggest that ranking is what separates a fair algorithm from an unfair one: unlike biased information providers, Kroger displays its store brand on the same shelf as competing brands, whereas placing AA flights or Google service links at the top of results demotes the alternatives. The measurable symptom is a clear difference in search rank between the personalized (AMT) results and the control results.
- Bias in data, I believe, is mainly caused by the user’s personal history and by the dominance of a particular type of information in the available corpus. Receiving similar information based on history can lead to ideological echo chambers, as seen in the previous paper. Another form of data bias enters algorithms through the word embeddings used in automatic text processing. For example, in “Man is to Computer Programmer as Woman is to Homemaker?” [2], Bolukbasi et al. show that because embeddings trained on historical text place computer-science terms closer to male names than to female names, a search engine using them could rank the web pages of male computer scientists above those of female scientists (see the sketch after this list). This type of discrimination cannot be blamed on a single entity, but on the prevalence of biased corpora and, ultimately, on human history itself!
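The geometry behind Bolukbasi et al.’s finding is easy to demonstrate. In the sketch below, the embedding vectors are made-up toy values (not real word2vec data), and the gender direction is taken from a single he−she difference rather than the PCA over many definitional pairs used in the paper; the sign of a word’s projection onto that direction shows which gender stereotype it leans toward.

```python
import numpy as np

# Toy 4-d vectors standing in for real word2vec embeddings; the values are
# hypothetical and chosen only to illustrate the geometry.
emb = {
    "he":         np.array([ 1.0, 0.2, 0.1, 0.0]),
    "she":        np.array([-1.0, 0.2, 0.1, 0.0]),
    "programmer": np.array([ 0.6, 0.8, 0.3, 0.1]),
    "homemaker":  np.array([-0.7, 0.7, 0.2, 0.2]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Gender direction from one definitional pair, as a stand-in for the paper's
# PCA over many such pairs.
g = emb["he"] - emb["she"]
g /= np.linalg.norm(g)

# Positive projection = "male-leaning", negative = "female-leaning".
for word in ("programmer", "homemaker"):
    print(word, round(cosine(emb[word], g), 3))
# programmer  0.572   -> leans toward "he"
# homemaker  -0.68    -> leans toward "she"
```

Any downstream system that scores documents by embedding similarity inherits these projections, which is exactly how a biased corpus can translate into biased search rankings without any biased code.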
Further, I would like to comment briefly on the other two research designs suggested in paper 1. A scraping audit can have unwanted consequences: what happens to data that is scraped and later blocked (or moderated) by the service provider? Recently, Twitter suspended Alex Jones’s profile, but his devoted followers were able to rebuild a fake profile with real tweets from data collected by web crawlers and scrapers. Also, a noninvasive user audit, even though completely legal, can be ineffective with a poor choice of experts.
Finally, given recent events, it would be valuable to research how algorithms share information across platforms. It is common to see ads for hotels and restaurants on Facebook after booking flight tickets with Google Flights. Is “Google personalization” limited only to Google?
[1] Lee, Min Kyung. “Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management.” Big Data & Society 5.1 (2018): 2053951718756684.
[2] Bolukbasi, Tolga, et al. “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings.” Advances in Neural Information Processing Systems. 2016.