31 March 2010

Pictish and Claude Shannon

What was I just saying about Cory Doctorow and Boing Boing being a wretched hive of sophistry when it comes to politics, and yet regularly delivering cool geek stuff? Because here's another example from the latter category...
Boing Boing | Maggie Koerth-Baker | Pictish art may have actually been written language

How do you tell the difference between art and written language?

Oh, yeah. It's math.
[Rob Lee] and colleagues Philip Jonathan and Pauline Ziman analyzed the engravings, found on the few hundred known Pictish Stones. The researchers used a mathematical process known as Shannon entropy to study the order, direction, randomness and other characteristics of each engraving.

The resulting data was compared with that for numerous written languages, such as Egyptian hieroglyphs, Chinese texts and written Latin, Anglo-Saxon, Old Norse, Ancient Irish, Old Irish and Old Welsh. While the Pictish Stone engravings did not match any of these, they displayed characteristics of writing based on a spoken language.
There is, sadly, not a lot of detail about what specific characteristics make language stand out from decoration.[*]
It's true there isn't a lot of detail in the Discovery article that Koerth-Baker links to, but you can read all about it in Lee, Jonathan and Ziman's paper.  It's in press with Proc. R. Soc. A, so it isn't available through the journal yet, but in the meantime you can get it from Jonathan's website.  (Here's the pdf.)

(* There's not a lot of detail in that Discovery article because the author of the article found the math in the paper "head-spinning."  I don't mean to be a nerd-snob, but there's not much complicated math in there.  The idea of information entropy is a little tough to wrap your mind around at first, but it's not the math that makes it hard.  Information entropy is one of those simple-but-powerful ideas that, once grokked, changes the way you see the world.  Entropy is also at the heart of Information Theory, which has got to be one of the most important advances in theoretical science of the 20th century.  This is all a long way of me asking: what do they teach science journalists these days? And more generally, what have school boards and universities done to math education?

Nit-picky digression from my digression: Shannon entropy isn't so much a "mathematical process" as it is a mathematical attribute of an information stream.  This may seem like a small detail, but calling it a process is like calling the mass of a physical object a process.  Words mean things.)
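
To make the nit-pick concrete, here's a minimal sketch (mine, not anything from the paper) of entropy as an attribute: count the symbol frequencies in a stream and you get back a single number, in bits per symbol.

    from collections import Counter
    from math import log2

    def shannon_entropy(symbols):
        """Shannon entropy, in bits per symbol, of a sequence of symbols."""
        counts = Counter(symbols)
        total = len(symbols)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    # An attribute you measure, not a process you run:
    print(shannon_entropy("abracadabra"))  # roughly 2.04 bits per symbol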

I've only flipped through the paper briefly thus far, but it looks really interesting.  Give it a read.  My very basic summary of the method: if the symbols in the Picts' carvings are just decorations, as was previously believed, then the order they are arranged in shouldn't matter.  If pictures of a wolf and a stag are equally likely and don't mean anything, then you'd expect to see [chariot then wolf] about as often as [chariot then stag].
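
As a toy illustration of that null hypothesis (my own sketch; the symbol names are made up, not the Picts' actual repertoire), a sequence of independent, meaningless symbols produces pair counts that come out roughly equal:

    import random
    from collections import Counter

    symbols = ["chariot", "wolf", "stag", "boar"]  # hypothetical inventory

    random.seed(0)
    sequence = [random.choice(symbols) for _ in range(100_000)]

    # Count adjacent pairs; with no meaning attached, order carries no structure.
    pairs = Counter(zip(sequence, sequence[1:]))
    print(pairs[("chariot", "wolf")], pairs[("chariot", "stag")])  # roughly equal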

When symbols mean things, this doesn't happen.  In English you're much more likely to see [T-H] than [T-S], even though H and S are about equally common.  By analyzing the distribution of these bigram frequencies, the authors were able to determine that the Pictish carvings have higher information content than random decoration, and further to narrow down which general type of writing system the symbols belong to.  Pictish writing is semasiographic: the symbols denote individual meanings rather than sounds.
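
A rough way to see the effect in code (again my own toy, a simplified stand-in for the paper's actual statistics, which also correct for the small sample of carvings): compare the bigram entropy of a snippet of English with that of the same characters shuffled.  The real text's pair distribution is lopsided (lots of T-H, hardly any T-S), so its bigram entropy is lower than the shuffled version's; that departure from what independent, meaningless symbols would produce is the signature of writing.

    import random
    from collections import Counter
    from math import log2

    def bigram_entropy(seq):
        """Shannon entropy, in bits, of the distribution of adjacent pairs in seq."""
        pairs = list(zip(seq, seq[1:]))
        total = len(pairs)
        counts = Counter(pairs)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    text = "the quick brown fox jumps over the lazy dog " * 200
    shuffled = list(text)
    random.seed(0)
    random.shuffle(shuffled)

    print(bigram_entropy(text))      # lower: letter order is constrained
    print(bigram_entropy(shuffled))  # higher: same letters, no structure in the order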

~ ~ ~ ~ ~

This is Jim Campbell's portrait of Claude Shannon, inspired by the Nyquist-Shannon Sampling Theorem. That's some fine art geekery.
