- Sandvig, Christian, et al. “Auditing algorithms: Research methods for detecting discrimination on internet platforms.” Data and discrimination: converting critical concerns into productive inquiry (2014): 1-23.
- Hannak, Aniko, et al. “Measuring personalization of web search.” Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.
Both papers examine the personalization and recommendation algorithms that lie at the heart of many online businesses, be it travel booking, real estate, or your plain old web search engine. The first paper, “Auditing algorithms,” raises the possibility of bias creeping into these algorithms, or even of their being intentionally manipulated to advantage or disadvantage someone. It argues for transparency, proposes that algorithms be examined via audit studies, and outlines several methods for doing so. The second paper attempts to identify what triggers personalization in a search engine and to measure how much of it there is, while controlling for other factors that could change the results but are not relevant to personalization.
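To make the "controlling for other factors" idea concrete, here is a minimal sketch of how one might separate personalization from ordinary noise. It assumes a Jaccard overlap between result lists as the similarity measure, and the function names (`jaccard`, `personalization_gap`) and the URLs are hypothetical placeholders of mine, not taken from the paper's code. The idea: two identical control accounts issue the same query, and whatever difference shows up between them is noise; a treatment account only counts as personalized to the extent it differs from a control by more than that.

```python
def jaccard(a, b):
    """Jaccard overlap of two result lists, ignoring order (1.0 = same set)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def personalization_gap(control_a, control_b, treatment):
    """How much less similar the treatment is to a control than the
    two controls are to each other. Values near 0 suggest the
    differences are just noise; larger values suggest personalization."""
    noise_similarity = jaccard(control_a, control_b)
    treated_similarity = jaccard(control_a, treatment)
    return noise_similarity - treated_similarity


# Hypothetical result lists for one query (placeholder URLs).
control_a = ["u1", "u2", "u3", "u4"]
control_b = ["u1", "u2", "u3", "u5"]  # identical setup, slightly different results
treatment = ["u1", "u2", "u6", "u7"]  # account with one feature changed

gap = personalization_gap(control_a, control_b, treatment)
print(f"personalization gap: {gap:.3f}")
```

Jaccard overlap ignores ordering, so a real study would probably pair it with some rank-sensitive measure as well; this sketch only captures which results appear, not where.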
I think the first problem one might run into when attempting an audit study of the algorithms of an enormous entity like Google, Facebook or YouTube is the sheer scale of the task. When you are talking about algorithms serving millions, even billions, of people across the world, you are talking about algorithms that work with thousands or tens of thousands of variables and try to find the optimum values for each individual user. I speculate that a slight change in a user’s behavior might set off a chain reaction of variable changes in the algorithm. At this scale, human engineers are no longer in the picture, and the algorithm is evolving on its own (thanks, machine learning!), so it is possible that even the people who created the algorithm no longer understand how it works. Why do you think Facebook and YouTube are constantly fighting PR fires? They don’t have as much knowledge or control of their algorithms as they might claim. Even with the most direct method, a code audit, the auditors might make some progress only to lose it all because the algorithm changed out from under them. How do you audit an ever-shifting algorithm of that size and complexity? The only thing I can think of is to use another algorithm to audit the first one, since humans can’t do it at scale. But now you run into the problem of possible bias in the auditor algorithm. It’s turtles all the way down.
Even if we are talking about auditing something on a smaller scale, an audit study is still not a perfect solution, because things can slip through the cracks. Linus’s law, “Given enough eyeballs, all bugs are shallow,” doesn’t really hold even when everything is out in the open for scrutiny. OpenSSL was open source and a critical piece of infrastructure, yet the Heartbleed bug lay there unnoticed for about two years despite many people looking for bugs. What can we do to improve audit study methods so that they catch as many instances of bias as possible without becoming impractically expensive?
Coming to the second paper, I find it fascinating how much more the lower-ranked results change compared to rank 1. What I want to know is why the rank 1 results are so relatively stable. Is it simply a matter of having a very high PageRank and being of maximum relevance? Are there cases where a result is hard-coded for certain search queries (like how you often see a link to Wikipedia as the first or second result for many searches)? I think focusing specifically on the rank 1 results would be an interesting experiment: track the search results over a longer period of time, look at the average time between changes in the rank 1 result, and look at what kinds of search queries see the most volatility at rank 1.
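As a rough sketch of what the analysis side of that experiment might look like, assume we already have a log of `(query, timestamp, rank-1 URL)` observations collected at regular intervals. The function name `rank1_stats`, the log format, and the sample data below are all hypothetical; nothing here comes from the paper. For each query it counts how often the rank 1 result changed and the average number of days between consecutive changes.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def rank1_stats(observations):
    """observations: iterable of (query, timestamp, rank1_url) tuples.
    Returns {query: (number_of_changes, mean_days_between_changes)};
    the mean is None when a query has fewer than two changes."""
    by_query = defaultdict(list)
    for query, ts, url in observations:
        by_query[query].append((ts, url))

    stats = {}
    for query, rows in by_query.items():
        rows.sort(key=lambda row: row[0])  # order observations by time
        change_times = []
        prev_url = rows[0][1]
        for ts, url in rows[1:]:
            if url != prev_url:  # the rank-1 result flipped at this observation
                change_times.append(ts)
                prev_url = url
        gaps = [
            (later - earlier).total_seconds() / 86400  # seconds -> days
            for earlier, later in zip(change_times, change_times[1:])
        ]
        stats[query] = (len(change_times), mean(gaps) if gaps else None)
    return stats


# Made-up observations, purely for illustration.
log = [
    ("python", datetime(2024, 1, 1), "a"),
    ("python", datetime(2024, 1, 2), "a"),
    ("python", datetime(2024, 1, 3), "b"),
    ("python", datetime(2024, 1, 7), "a"),
    ("news", datetime(2024, 1, 1), "x"),
    ("news", datetime(2024, 1, 2), "y"),
    ("news", datetime(2024, 1, 3), "z"),
    ("news", datetime(2024, 1, 4), "x"),
]

stats = rank1_stats(log)
# Queries sorted from most to least volatile at rank 1.
most_volatile = sorted(stats, key=lambda q: stats[q][0], reverse=True)
print(stats)
print(most_volatile)
```

Note that the measured change times are only as precise as the sampling interval: a result that flips and flips back between two observations would go unseen, so the collection frequency matters as much as the analysis.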