29 April 2010

Library of Congress, Twitter & Gopher

When the L of C announced a couple of weeks ago that it was going to archive every message on Twitter, lots of people asked why. To me the answer was pretty obvious: because they can, and because it got their name in the papers. The storage requirements aren't that tough, and the cataloging is simple (essentially: don't bother, and just have your partners at Google roll out a basic plaintext search for you). Meanwhile it generated a ton of press and made the Library seem hip and relevant rather than dusty and boring. Any potential academic value is beside the point: it's a PR no-brainer.
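Just to put numbers on "aren't that tough", here's a rough back-of-envelope sketch. The daily tweet volume, per-tweet size and compression ratio are my own assumptions, not anything the Library or Twitter has published:

```python
# Back-of-envelope: how much storage does "every tweet ever" actually need?
# All three inputs are rough assumptions, not official figures:
#   - about 55 million tweets per day (Twitter's ballpark in early 2010)
#   - about 2.5 KB per tweet once you include the JSON metadata
#   - roughly 10:1 compression, since JSON of this kind is very repetitive

TWEETS_PER_DAY = 55 * 1000 * 1000   # assumed daily volume
BYTES_PER_TWEET = 2500              # assumed size including metadata
COMPRESSION_RATIO = 10              # assumed gzip-style ratio

raw_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
raw_per_year = raw_per_day * 365
compressed_per_year = raw_per_year / COMPRESSION_RATIO

print("raw per day:         %6.0f GB" % (raw_per_day / 1e9))
print("raw per year:        %6.1f TB" % (raw_per_year / 1e12))
print("compressed per year: %6.1f TB" % (compressed_per_year / 1e12))
# Prints roughly 138 GB/day and 50 TB/year raw, about 5 TB/year compressed --
# a handful of commodity disks, not an archive-scale problem.
```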

If that seems a little cynical of me, ask yourself whether the L of C will bother to archive the entirety of Gopherspace. It's also a valuable historical record, and the technical requirements are ridiculously trivial, but it won't generate any glowing press. My money is on them not bothering.
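And "ridiculously trivial" is not an exaggeration: a complete gopher client is a socket, a selector and a read loop. Here's a minimal sketch; gopher.floodgap.com is only an example of a long-running public server, and any gopher server on port 70 will do:

```python
# A complete (if minimal) gopher client: open a TCP connection to port 70,
# send the selector followed by CRLF, then read until the server closes
# the connection. That is essentially the entire protocol (RFC 1436).

import socket

def gopher_fetch(host, selector="", port=70):
    """Fetch a single gopher item and return the raw bytes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection: item is done
                break
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    # An empty selector requests the server's root menu. Menu lines are
    # tab-separated: item type + display string, selector, host, port.
    menu = gopher_fetch("gopher.floodgap.com")
    for line in menu.decode("latin-1").splitlines():
        if line == ".":           # a lone "." terminates a menu listing
            break
        print(line.split("\t")[0])  # show only the human-readable column
```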
boing boing | Cory Doctorow | All of Gopherspace as a single download

In 2007, John Goerzen scraped every gopher site he could find (gopher was a menu-driven text-only precursor to the Web; I got my first online gig programming gopher sites). He saved 780,000 documents, totalling 40GB. Today, most of this is offline, so he's making the entire archive available as a .torrent file; the compressed data is only 15GB. Wanna host the entire history of a medium? Here's your chance!
There are tentative plans to host this archive publicly, in the manner of archive.org; we'll have to wait and see whether anything comes of it.

Finally, I have tried to find a place willing to be a permanent host of this data, and to date have struck out. If anybody knows of such a place, please get in touch. I regret that so many Gopher sites disappeared before 2007, but life is what it is, and this is the best snapshot of the old Gopherspace that I'm aware of and would like to make sure that this piece of history is preserved.
Download A Piece of Internet History (via Waxy)
