Disk archival techniques

Jules Richardson julesrichardsonuk at yahoo.co.uk
Tue May 17 17:34:25 CDT 2005


On Tue, 2005-05-17 at 17:06 -0500, Brian Wheeler wrote:
> As coincidence would have it, I work at Indiana University's Digital
> Library Program and there was a lecture on archiving audio which hits
> many of the same issues that have come up here.  The conclusions that
> they came up with for the project included:
> 	* There's no such thing as an eternal media:  the data must be
> transportable to the latest generation of storage

Yep, it's recognised that data needs to be periodically refreshed onto
whatever the current favourite media type is. The nice thing about a structured
and essentially human-readable metadata format is that there's a good
chance that it can be transferred as-is to a new type of media without
any reprocessing.

> 	* Metadata should be bundled with the content

Just to clarify: do you mean bundled alongside the content or interspersed
with it? (From the rest of your message I believe you mean the former,
which happens to be my view too...)

> 	* Act like you get one chance to read the media :(

Yep. Although sometimes multiple reads of media and a combination of the
resulting data can actually improve the ability to reconstruct it :)
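
As a rough illustration of combining several reads, here is a minimal sketch that takes a per-byte majority vote. It assumes the reads are already aligned and the same length, which is a big simplification; real recovery from marginal media usually works at a lower level than decoded bytes. The function name `combine_reads` is hypothetical.

```python
from collections import Counter


def combine_reads(reads: list[bytes]) -> bytes:
    """Combine several reads of the same sector by per-byte majority vote.

    Assumes every read is aligned and the same length (a simplification).
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must all be the same length")
    out = bytearray()
    for i in range(length):
        # Count how often each byte value appears at position i across reads
        votes = Counter(r[i] for r in reads)
        # Take the most common value; on a tie, the value seen first wins
        out.append(votes.most_common(1)[0][0])
    return bytes(out)


# Three reads, each with a different single-byte error in two of them
print(combine_reads([b"HELLO", b"HEXLO", b"HELL0"]))  # b'HELLO'
```

A vote only helps when errors differ between reads; a byte that reads wrong the same way every time will still be wrong.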

> I think the optimum format for doing this isn't a single file, but a
> collection of files bundled into a single package.  Someone mentioned
> tar, I think, and zip would work just as well.

The only danger there is that the two become separated over time, but in
my mind it's an acceptable risk. It's sort of like a librarian losing a
few volumes from a set of encyclopedias, I suppose - something that you'd
have to be really careless to do.
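
To make the "collection of files in one package" idea concrete, here is a minimal sketch using Python's standard `tarfile` module: the raw image and a human-readable metadata file sit side by side in a single tar archive. The member names `disk.img` and `metadata.json`, and the choice of JSON for the metadata, are assumptions for illustration only; zip would work the same way.

```python
import io
import json
import tarfile


def write_bundle(path: str, image: bytes, metadata: dict) -> None:
    """Write a raw disk image and its metadata alongside each other in one tar.

    Member names 'disk.img' and 'metadata.json' are hypothetical choices.
    """
    members = {
        "disk.img": image,
        # Metadata kept as a separate, human-readable file, not interspersed
        "metadata.json": json.dumps(metadata, indent=2, sort_keys=True).encode("utf-8"),
    }
    with tarfile.open(path, "w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def read_bundle(path: str) -> tuple[bytes, dict]:
    """Read the image and metadata back out of a bundle."""
    with tarfile.open(path, "r") as tar:
        image = tar.extractfile("disk.img").read()
        metadata = json.loads(tar.extractfile("metadata.json").read().decode("utf-8"))
    return image, metadata
```

Because the two files travel in one archive, separating them takes a deliberate act such as extracting and discarding one.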

> I don't think there's any real need to document the physical properties
> of the media for EVERY disk archived -- there should probably be a
> repository of 'standard' media types (1541's different-sectors-per-track
> info, FM vs MFM per track information, etc) plus overrides in the media
> metadata (uses fat-tracks, is 40 track vs 35, etc).

Now, the risk of separation might well be a problem there if there's a
single copy of some metadata shared by more than one disk image. I'd say
that each 'bundle' forming a disk image (raw data + metadata) needs to
totally describe that disk...
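
One way to reconcile a shared repository of standard media types with bundles that fully describe their disk is to expand the profile and overrides when the bundle is written, so the stored metadata never depends on the shared copy. This is a minimal sketch; the profile names, field names and the `resolve_metadata` function are all hypothetical, and only a couple of illustrative profiles are shown.

```python
import copy

# Hypothetical 'standard' media profiles (illustrative fields only)
STANDARD_PROFILES = {
    "c1541": {"tracks": 35, "sides": 1, "encoding": "GCR"},
    "pc_360k": {"tracks": 40, "sides": 2, "sectors_per_track": 9, "encoding": "MFM"},
}


def resolve_metadata(profile: str, overrides: dict) -> dict:
    """Expand a profile plus per-disk overrides into self-contained metadata."""
    # Deep copy so per-disk changes never touch the shared profile
    merged = copy.deepcopy(STANDARD_PROFILES[profile])
    # Shallow merge: per-disk values replace profile values
    merged.update(overrides)
    # Record where the defaults came from, without needing the profile later
    merged["base_profile"] = profile
    return merged


print(resolve_metadata("c1541", {"tracks": 40, "fat_tracks": True}))
```

The profile name is kept only as a note; anyone reading the bundle later gets the full description even if the shared repository is lost.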

cheers

Jules


