Personalization on online platforms can be considered a double-edged sword. At first sight, personalization looks beneficial: it gives individual users a better experience when surfing the web, and it seems sensible given that the amount of accessible information is overwhelming. On a deeper look, however, personalization and other hidden filtering algorithms raise many questions. From the fear of filter bubbles to potential implicit discrimination, it has become a matter of public interest to scrutinize these black boxes that decide on our behalf what we may want to see online.
Revealing how hidden filtering algorithms function is a challenging process. In their work Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, Christian Sandvig et al. propose multiple research methods for auditing online algorithms: auditing the algorithm's code, surveying users of a specific online platform, scraping data, using sockpuppet accounts, and crowdsourcing. Each proposed technique faces its own set of ethical challenges, legal challenges, and/or gaps in available data or knowledge. The paper closely analyzes every technique and the challenges it faces, and it has inspired a lot of research in this area.
A few general takeaways from the work presented by Christian Sandvig et al. are:
- As the paper mentions, it is hard to generalize the results of an audit beyond the specific platform being studied. Such platform-specific audits could give an advantage to competitors of the studied platform unless regulation ensures that different platforms providing the same service are studied fairly and consistently.
- There are many legal restrictions on performing such studies. Whether workarounds are considered ethical depends on the importance of the results and on the right of the wide user base to know what happens behind the scenes.
- Combining two or more of the techniques mentioned in the paper could lead to more useful results, such as combining crowdsourcing with sockpuppet accounts to design more controlled experiments (a sketch of this idea follows this list). Where possible, combining code auditing with crowdsourcing could also help reverse-engineer the parts that remain unclear.
- Finally, algorithm auditing is becoming highly important, and it is necessary to open the way, relaxing some of these restrictions, to allow more effective auditing that ensures the transparency of online platforms.
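As a concrete illustration of the sockpuppet-plus-controlled-experiment idea above, here is a minimal Python sketch of how such an experiment could be structured. Everything in it is hypothetical: `Profile`, `fetch_results`, and the treatment names are illustrative stand-ins rather than any real API, and `fetch_results` fabricates deterministic dummy data so the sketch runs; a real audit would issue the queries through actual accounts and scrape the returned rankings.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Profile:
    """A sockpuppet account that varies exactly one attribute."""
    name: str
    treatment: Optional[str]  # e.g. a seeded browsing history; None = control

def fetch_results(profile: Profile, query: str) -> List[str]:
    """Placeholder for issuing `query` as `profile` and scraping the
    ranked results. Fabricates deterministic dummy data so this runs."""
    rng = random.Random(f"{profile.treatment}:{query}")
    pool = [f"result-{i}" for i in range(20)]
    rng.shuffle(pool)
    return pool[:10]

def run_experiment(queries: List[str], treatments: List[str]) -> Dict[str, float]:
    """For each treatment, issue every query from a control profile and a
    treatment profile at (nominally) the same time, and report the
    fraction of queries whose top-10 result lists differ."""
    control = Profile("control", None)
    rates = {}
    for t in treatments:
        puppet = Profile(f"puppet-{t}", t)
        changed = sum(fetch_results(control, q) != fetch_results(puppet, q)
                      for q in queries)
        rates[t] = changed / len(queries)
    return rates

if __name__ == "__main__":
    print(run_experiment(["privacy", "vpn review", "local news"],
                         ["browsing-history", "geolocation"]))
```

Crowdsourcing fits into this skeleton by replacing the synthetic treatment profiles with real recruited users, while the sockpuppet control accounts keep a clean baseline to compare against.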
One valuable algorithm audit study is the one performed by Aniko Hannak et al., presented in their work Measuring Personalization of Web Search. This work presents a well-designed study that analyzes how the Google search engine personalizes search results. The beauty of this work lies in the interpretability of their experimental setup and results, as well as the generality of their approach. The study examines the factors that contribute to the personalization of Google search results: the authors analyzed the similarity between search results for queries made at the same time with the same keywords, and studied the effect of factors such as geolocation, demographics, search history, and browsing history. They quantified personalization across these factors as well as across different search categories.
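To make that similarity comparison concrete, below is a minimal sketch (not the authors' code) of two standard metrics for comparing ranked result lists, of the kind used in such audits: the Jaccard overlap of the result sets, and a Kendall-tau-style measure of how much the shared results are reordered. The example lists are fabricated.

```python
from itertools import combinations
from typing import List

def jaccard(a: List[str], b: List[str]) -> float:
    """Overlap between two result lists, ignoring order:
    1.0 means identical sets, 0.0 means disjoint sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def kendall_tau_distance(a: List[str], b: List[str]) -> float:
    """Fraction of result pairs that the two lists rank in opposite
    order, computed over results common to both lists: 0.0 means the
    shared results keep the same order, 1.0 means fully reversed."""
    common = [r for r in a if r in b]  # shared results, in a's order
    if len(common) < 2:
        return 0.0
    pos_b = {r: i for i, r in enumerate(b)}
    discordant = sum(1 for x, y in combinations(common, 2)
                     if pos_b[x] > pos_b[y])  # a lists x first; b disagrees
    total = len(common) * (len(common) - 1) // 2
    return discordant / total

# Two hypothetical top-5 lists for the same query, issued at the same
# time from two different accounts:
control = ["a.com", "b.com", "c.com", "d.com", "e.com"]
treated = ["a.com", "c.com", "b.com", "f.com", "e.com"]
print(jaccard(control, treated))               # 0.667: one result swapped out
print(kendall_tau_distance(control, treated))  # 0.167: one shared pair flipped
```

Averaging such scores over many queries, while using pairs of identical accounts to measure the baseline noise, gives a quantitative per-factor picture of personalization of the kind the study reports.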
Some of the takeaways from this work could be:
- This work serves as a first step toward building a methodology that measures web-search personalization quantitatively. Although there are more parameters and conditions one could look at, the method presented in this work is a guiding step.
- The generality of their approach backs the previous point: their method could be applied to different online platforms to reveal initial traits of hidden ranking algorithms, such as product search on e-commerce websites or the ordering of posts in a newsfeed.
- As they mention, their findings reflect the most obvious factors that drive personalization algorithms. Building on their work, a deeper analysis could reveal other hidden traits that may carry some form of discrimination or limit exposure to certain information.
As mentioned at the beginning, personalization algorithms can bring various benefits; however, auditing these algorithms is necessary to ensure that their use has no undesirable effects.