12 June 2012

Regulating Algorithms

The Guardian | Cory Doctorow | Google admits that Plato's cave doesn't exist:

Google implies that a page of search results is effectively the table of contents for a custom-made magazine assembled on the fly in response to a user's query

Google has, to date, always refused to frame itself [in an editorial position]. The pagerank algorithm isn't like an editor arguing aesthetics around a boardroom table as the issue is put to bed. The pagerank algorithm is a window on the wall of Plato's cave, whence the objective, empirical world of Relevance may be seen and retrieved.

That argument is a convenient one when the most contentious elements of your rankings are from people who want higher ranking. "We have done the maths, and your page is empirically less relevant than the pages above it. Your quarrel is with the cold, hard reality of numbers, not with our judgement."

The problem with that argument is that maths is inherently more regulatable than speech. If the numbers say that item X must be ranked over item Y, a regulator may decide that a social problem can be solved by "hard-coding" page Y to have a higher ranking than X, regardless of its relevance. This isn't censorship – it's more like progressive taxation.

That strikes me as the exact opposite of the truth. You have some numbers, you have some mathematical function, you get some output. The end. A regulator can decide that the output should be different, he can order you to claim the output is different, he can fine and jail you for refusing to state that the output is different. But the output is the output. Reality is reality. A regulator can demand that student pocket calculators report that 23*3=70, but that neither changes the maths nor makes regulating maths any easier than regulating speech.

And no, this would be nothing like progressive taxation or censorship, except in that all are forms of absolutist dirigisme.

I see what Google is trying to do here, but if they had gifted lawyers they would make their arguments one level higher up. Their search results are not free-speech protected editorial content. Those are still the results of objective calculations. But their design of the PageRank algorithm is subjective and editorial. They are exercising a meta-editorial technique unavailable to people in pre-computational societies. The design of the algorithm is a form of free speech, the results it produces are mathematical and unarguable. If the judges are smart enough to think on two different levels of abstraction (not a sure thing, to be sure) then they're in the clear.
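
To make the two levels concrete, here is a minimal sketch of a PageRank-style power iteration. It is a textbook simplification, not Google's actual system: the toy link graph, the function name `pagerank`, and the default parameters are all assumptions made for illustration. The damping factor stands in for a design choice; everything that follows from it is plain arithmetic.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Every link
    target must also appear as a key. `damping` is a design choice, not
    a fact of nature: pick a different value and you get different ranks.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web, purely for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

editorial_choice_1 = pagerank(toy_web, damping=0.85)
editorial_choice_2 = pagerank(toy_web, damping=0.50)
```

Changing `damping` is the editorial act. Once it is fixed, running the function twice on the same graph gives the same ranks, and no amount of arguing changes them.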

The implication is that Google has discovered a mathematical model of relevance, a way of measuring some objective criteria that allows a computer to score and compare the relevance of different web-pages.

But there is no such mathematics. Relevance is a subjective attribute. The satisfaction you experience in regards to a search-results page is generated by your mind, and it reflects the internal state of your neurons just as much as it reflects the external reality of the results.

Yes, relevance is subjective. But once you have some definition or goal or standard you are using as an approximation for relevance then matters are not subjective any longer. Minwise hashing is minwise hashing, no matter how you define "relevance." (Minwise hashing is a family of fast techniques used, among other things, for measuring similarity between sets of large documents, and thus useful for search.) There is a difference between subjectivity in the definition of the problem space and subjectivity in algorithms which run in that space.
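
For the curious, here is a rough sketch of the idea behind minwise hashing, assuming documents are reduced to sets of three-word shingles. The helper names, the shingle size, the number of hash functions, and the use of seeded BLAKE2b hashes are my own choices for illustration, not any particular search engine's implementation. The fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying sets.

```python
import hashlib


def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of `item`, varied by `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


doc_a = shingles("the quick brown fox jumps over the lazy dog")
doc_b = shingles("the quick brown fox leaps over the lazy dog")
estimate = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
exact = exact_jaccard(doc_a, doc_b)
```

Whether "similar shingle sets" is a good proxy for relevance is open to debate. What the function returns for two given documents is not.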

For instance: what constitutes a noisy nuisance is subjective. Indeed, the very concept of "loudness" is subjective. But the mathematics of sound pressure, and the circuits of a sound level meter are not subjective. We can write perfectly objective rules regarding sounds over certain decibels after certain times at night despite the subjectivity of loudness or noise or nuisance.
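
A small sketch of that split, using the standard definition of sound pressure level in air (20 times the base-10 logarithm of the RMS pressure over a 20 micropascal reference). The 55 dB limit, the 22:00 to 07:00 quiet hours, and the function names are hypothetical, picked only to show where the subjective choices end and the arithmetic begins.

```python
import math

P_REF = 20e-6  # reference sound pressure in air, in pascals (20 micropascals)


def sound_pressure_level(p_rms):
    """Sound pressure level in decibels for an RMS pressure in pascals."""
    return 20 * math.log10(p_rms / P_REF)


def violates_ordinance(p_rms, hour, limit_db=55.0, quiet_start=22, quiet_end=7):
    """Apply a hypothetical night-time noise rule.

    The limit and the quiet hours are policy choices (the subjective part).
    Checking a measurement against them is pure arithmetic.
    """
    in_quiet_hours = hour >= quiet_start or hour < quiet_end
    return in_quiet_hours and sound_pressure_level(p_rms) > limit_db


# 0.02 Pa RMS works out to about 60 dB, over the assumed limit at 23:00.
late_party = violates_ordinance(0.02, hour=23)
# The same sound at noon falls outside the assumed quiet hours.
noon_party = violates_ordinance(0.02, hour=12)
```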


  1. > That strikes me as the exact opposite of the truth.

    Well, it **WAS** written by Cory Doctorow...

  2. Ha! I never said I was *surprised* he got it backwards.