History illustrates well that the records that endure are the ones chiseled into stone or, failing that, at least put on paper. Given the doubtful long-term reliability of hard drives, flash memory, and writable optical media, someone wishing to preserve information for the distant future might be well advised to make a paper copy of the most critical parts.
PaperBack is a tool for doing exactly that. It includes software that converts about half a megabyte of arbitrary data into a pattern that can be printed on a single page. For highly compressible information, it can manage three megabytes per page – roughly as much as two old 3.5″ diskettes. It also includes code for scanning the pattern back into digital form. While I doubt anybody will be doing this for multi-gigabyte video files, it may be worthwhile for some kinds of information. Anyone building the modern equivalent of an ancient Greek tomb might be especially well advised to consider the software. Hopefully, future generations will prove as capable at deciphering JPEG images as those in the recent past did at deciphering Linear B.
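To get a feel for those capacities, here is a rough page-count sketch. It assumes decimal megabytes and treats the two figures quoted above (about half a megabyte per page, up to three for highly compressible data) as exact per-page capacities. The function and constant names are illustrative and are not part of PaperBack.

```python
# Rough page-count estimate for printing data as PaperBack-style patterns.
# Assumptions (not taken from PaperBack itself): decimal megabytes, and the
# quoted per-page capacities treated as exact.

TYPICAL_BYTES_PER_PAGE = 500_000        # "about half a megabyte" per page
BEST_CASE_BYTES_PER_PAGE = 3_000_000    # highly compressible data


def pages_needed(size_bytes: int, bytes_per_page: int = TYPICAL_BYTES_PER_PAGE) -> int:
    """Return the number of whole pages needed to hold size_bytes."""
    if size_bytes < 0 or bytes_per_page <= 0:
        raise ValueError("size must be >= 0 and page capacity must be > 0")
    # Ceiling division: a partly filled page still has to be printed.
    return -(-size_bytes // bytes_per_page)


if __name__ == "__main__":
    samples = [("1.44 MB diskette", 1_440_000), ("1 GB archive", 10**9)]
    for label, size in samples:
        typical = pages_needed(size)
        best = pages_needed(size, BEST_CASE_BYTES_PER_PAGE)
        print(f"{label}: {typical} pages typical, {best} pages best case")
```

Under these assumptions a gigabyte comes to 2,000 sheets at the typical rate, which is why the approach suits small, critical files rather than video.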
A compiled version of the software is available for Windows. Mac and Linux users will need to compile the code for themselves.
What kind of data would this be worth using for?
Two racks, each the size of a modest refrigerator, each holding north of a petabyte’s worth of information. These are the PetaBoxes, the Internet Archive’s web-in-a-box systems. Designed as a shippable backup for every freely shareable file anyone cares to upload and hundreds of copies of everything else, too, they betray the archive’s US origins in the strip of American-style electric outlets running down one strut, a column of surprised clown-faces Fed-Exed from across the ocean. A couple of other things set them apart. Each rack draws 12 kilowatts, whereas a normal rack at the facility draws 4.5 kilowatts; the drive-housings are covered in a rather handsome fire-engine-red enamel. Apart from that, the PetaBoxes are just another pair of racks.
Yet housed in these machines are hundreds of copies of the web — every splenetic message-board thrash; every dry e-government document; every scientific paper; every pornographic ramble; every libel; every copyright infringement; every chunk of source code (for sufficiently large values of ‘every’, of course).
They have the elegant, explosive compactness of plutonium.
Comprehensive storage
Your average active computer user has more and more data. The first computer I effectively administered had 170 megabytes of hard disk space. Difficult choices had to be made about the relative merits of Doom versus SimCity. Now, just my primary email account has 1500 megabytes of data in it. I have 15 gigabytes’ worth of photos I have taken (all since 2005) and 20 gigabytes of music.
All this has been made possible by dramatically falling storage prices, combined with the spread of broadband internet. Soon, I expect that this combination will reach its logical conclusion. Right now, people are constrained by the size of their smallest hard drive, as well as by the difficulty of accessing larger remote drives. Eventually, I expect that most people will have a multi-terabyte disk connected to the internet at high speed and securely accessible from virtually any device in the world. The biggest question is whether this will be an ‘answering machine’ or a ‘voicemail’ solution…
I could imagine actual words written on parchment (not paper) surviving for thousands of years.
I am less confident about some pattern produced using unknown dyes and a commercial inkjet or laser printer.
Even if the pattern endures properly, any damage to the sheets would eliminate more data than erasing an equivalent space of written text would.
The British Parliament still writes all of its legislation on real vellum hides, which are then stored in an incredibly smelly room in the Cabinet Office…
Claire,
That’s pretty great. Does it include the whole mass of EU legislation, as well?
The second advantage is reliability. When a tape snaps, it can be spliced back together. The loss is rarely more than a few hundred megabytes—a bagatelle in information-technology circles. When a terabyte hard disk fails, by contrast, all the data on it may be lost. The consequence at CERN, specifically, is that a few hundred megabytes of its 100-petabyte tape repository are, on average, lost every year. Of the 50 petabytes of data held on hard disk, however, it loses a few hundred terabytes in the same period.
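To put the CERN figures above on a common scale, the following sketch converts them into annual loss fractions. The passage only says "a few hundred", so the value 300 is an assumption used for both media, and decimal units are assumed as well. All names are illustrative.

```python
# Annual data-loss fractions implied by the CERN figures quoted above.
# Assumptions: "a few hundred" is taken as 300 for both tape and disk,
# and units are decimal (1 PB = 10**15 bytes).

MB = 10**6
TB = 10**12
PB = 10**15

ASSUMED_FEW_HUNDRED = 300  # placeholder for "a few hundred"


def annual_loss_fraction(lost_bytes: int, stored_bytes: int) -> float:
    """Fraction of the stored data that is lost per year."""
    return lost_bytes / stored_bytes


tape_fraction = annual_loss_fraction(ASSUMED_FEW_HUNDRED * MB, 100 * PB)
disk_fraction = annual_loss_fraction(ASSUMED_FEW_HUNDRED * TB, 50 * PB)
disk_to_tape_ratio = disk_fraction / tape_fraction

if __name__ == "__main__":
    print(f"tape: {tape_fraction:.1e} of the repository lost per year")
    print(f"disk: {disk_fraction:.1e} of the repository lost per year")
    print(f"disk loses about {disk_to_tape_ratio:,.0f} times more per byte stored")
```

As long as the same "few hundred" applies to both media, the ratio does not depend on the assumed 300: a terabyte is a million megabytes and the disk pool is half the size of the tape repository, so the quoted figures imply disk loses roughly two million times more data per byte stored.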
…
Tape has two other benefits, as Evangelos Eleftheriou, manager of storage technologies at IBM’s research laboratory in Zurich, points out. It is cheaper than disks (a gigabyte of disk storage costs 10 cents, versus 4 cents for tape), and it lasts longer. Tapes can still be read reliably after three decades, against five years for disks.
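The price and lifetime figures above can be combined into a crude cost-per-gigabyte comparison over a fixed horizon. This is a sketch under strong assumptions: prices stay constant, media are replaced exactly at the end of the quoted reliable lifetime, and migration labour, drives, and power are ignored. The names are illustrative.

```python
# Crude media-cost comparison using the prices and lifetimes quoted above.
# Assumptions: constant prices, replacement exactly at end of the quoted
# lifetime, and no costs other than the media themselves.

DISK_CENTS_PER_GB = 10
TAPE_CENTS_PER_GB = 4
DISK_LIFETIME_YEARS = 5
TAPE_LIFETIME_YEARS = 30   # "three decades"


def media_purchases(horizon_years: int, lifetime_years: int) -> int:
    """Number of times the media must be bought to cover the horizon."""
    return -(-horizon_years // lifetime_years)  # ceiling division


def cents_per_gb_over(horizon_years: int, cents_per_gb: int, lifetime_years: int) -> int:
    """Total media cost in cents for keeping one gigabyte over the horizon."""
    return cents_per_gb * media_purchases(horizon_years, lifetime_years)


if __name__ == "__main__":
    horizon = 30
    disk = cents_per_gb_over(horizon, DISK_CENTS_PER_GB, DISK_LIFETIME_YEARS)
    tape = cents_per_gb_over(horizon, TAPE_CENTS_PER_GB, TAPE_LIFETIME_YEARS)
    print(f"{horizon} years on disk: {disk} cents per gigabyte")
    print(f"{horizon} years on tape: {tape} cents per gigabyte")
```

On these assumptions the up-front gap of 10 cents versus 4 cents widens to 60 cents versus 4 cents over thirty years, because the disks would be bought six times and the tape once.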