Reflection #6 – [09/13] – [Lindah Kotut]

  • Sandvig et al., “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms”
  • Hannak et al., “Measuring Personalization of Web Search”

The two papers can be connected by using the categorization provided by Sandvig to describe the work done by Hannak. Sandvig proposes a “systematic statistical investigation” (which is what Hannak carries out) as the way to audit search algorithms. Two of the proposed audit designs fit Hannak’s work: the noninvasive user audit (with their use of Amazon Mechanical Turk workers to probe for volatility in search results) and the sock puppet audit (using browser agents that mimic actual users and allow for scalable, repeated perturbation of the search results).
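In the spirit of that methodology, the core comparison step can be sketched as: have a “treatment” agent and a pair of “control” agents issue the same query at the same time, then compare the ranked result lists with an overlap metric such as the Jaccard index; the control/control difference gives the noise floor that any claimed personalization must exceed. A minimal Python sketch with hypothetical result URLs (not either paper’s actual code or data):

```python
# Minimal sketch of a sock-puppet-style comparison: agents issue the same
# query and we compare the result lists they receive. URLs are placeholders.

def jaccard(results_a, results_b):
    """Set overlap between two result lists (1.0 = identical sets of URLs)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Results returned to two "control" agents and to one "treatment" agent.
control_1 = ["example.com/a", "example.com/b", "example.com/c"]
control_2 = ["example.com/a", "example.com/b", "example.com/d"]
treatment = ["example.com/a", "example.com/x", "example.com/y"]

noise_floor = 1.0 - jaccard(control_1, control_2)      # volatility between identical agents
personalization = 1.0 - jaccard(control_1, treatment)  # treatment vs. control divergence

# Personalization is only credible if it exceeds the control/control noise floor.
print(noise_floor, personalization)
```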

1. User Preference
If I were to probe this question of auditing algorithms, I would start with a simple question/premise:
Why do people prefer Google as a search engine? This is an especially pertinent question to ask, since both papers agree on Google’s dominance as a search engine. Unlike previous work on social media, where the user makes an involved investment in creating and maintaining an account, no such burden is required to use a search engine, and Bing and Yahoo, among other search engines, are worthy competitors.

Understanding users’ preferences, whether overt or hidden, would add depth when considering how the search engine either panders to or accounts for personal… foibles in presenting results, and, in Hannak’s case, what other axes of measurement are available to determine this.

It stands to reason, as Hannak points out, that Google constantly tweaks how its algorithm works; it is not an unreasonable deduction that part of the reason it does this is to account for hidden patterns in search habits that scale across its user base.

This question can be asked retroactively as well: would users’ perception of search engine results change if proof of filter bubbles were presented to them? Or would they simply be grateful for receiving only relevant results? If a key to a successful business is knowing your customer, then Google really knows its customers.

2. Volatility, Stability, Relevance … and Censorship
Both papers consider results returned by web search and claim that their approaches scale to other kinds of web search. Does that include image search? For image search differs from web search. Case in point: the campaign by comedian John Oliver to supplant a tobacco company’s mascot with a… less glamorous version, which led to the “new” mascot rising to the top of image search results (web search remained largely unchanged, except for news articles).

Hannak’s work also notes that its scope is limited to the US version of the search engine and to the English language. This version can, however, be served in another country (by manually changing the domain extension back to .com). If we compare the volatility of the same search engine across different countries (one of which has censorship laws), can this comparison be used to measure censorship (and is censorship a form of bias)? A measure of censorship could reveal which features (other than keywords) are used in the decision to censor, and we could extrapolate from that to consider other forms of bias, intentional or not.
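As a thought experiment only (neither paper actually does this), the same overlap measurement could be repeated against two country versions of the engine: queries whose cross-country divergence far exceeds the within-country noise floor would become candidates for censorship or other region-specific filtering. A hypothetical Python sketch, with illustrative queries, result URLs, and thresholds:

```python
# Hypothetical sketch: flag queries whose cross-country result divergence is
# much larger than the noise floor measured within a single country version.

def flag_divergent_queries(results_us, results_other, baseline_divergence=0.1, margin=0.3):
    """results_*: dicts mapping a query string to its ranked list of result URLs.
    baseline_divergence: typical (1 - Jaccard) between two identical agents,
    i.e. the within-country noise floor."""
    flagged = []
    for query in results_us:
        a, b = set(results_us[query]), set(results_other.get(query, []))
        divergence = 1.0 - (len(a & b) / len(a | b) if a | b else 1.0)
        if divergence > baseline_divergence + margin:
            flagged.append((query, round(divergence, 2)))
    return flagged

# Illustrative data only, standing in for result lists an auditor might collect.
us = {"news": ["a.com", "b.com", "c.com"], "protest": ["a.com", "d.com", "e.com"]}
other = {"news": ["a.com", "b.com", "c.com"], "protest": ["x.com", "y.com", "z.com"]}
print(flag_divergent_queries(us, other))  # -> [('protest', 1.0)]
```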

3. Search Engine Optimization (and Bias)
SEO, the process by which a web presence can be “optimized” to appear high in search rankings (with the use of tags, mobile-friendly design, etc.) so as to ensure that the page gets ranked favorably and contextually, is a layman’s form of auditing algorithms. Sandvig’s example of the YouTube “reply girls” fits this description.

Thus, knowing that this deductive knowledge can be misused by those with the expertise to shape their websites to target a particular demographic (or, as has been proven, to do this successfully and unethically with targeted advertisements) raises the question of:

4. Who bears the responsibility?

Sandvig’s “reply girls” example was used to showcase how an algorithm can be co-opted into an agent of discrimination. If that is proven to be the case, who is to be punished? If the EU’s intention of assigning blame to platforms for users who upload copyrighted content is anything to go by, then in our case the blame will be laid on the algorithm owners. But there is a trade-off to this, and it circles back to the first point in this reflection: does the user care?
