Disk archival techniques
Brian Wheeler
bdwheele at indiana.edu
Wed May 18 08:18:43 CDT 2005
On Tue, 2005-05-17 at 22:34 +0000, Jules Richardson wrote:
> On Tue, 2005-05-17 at 17:06 -0500, Brian Wheeler wrote:
> > As coincidence would have it, I work at Indiana University's Digital
> > Library Program and there was a lecture on archiving audio which hits
> > many of the same issues that have come up here. The conclusions that
> > they came up with for the project included:
> > * There's no such thing as an eternal medium: the data must be
> > transportable to the latest generation of storage
>
> Yep, it's a recognised need to periodically refresh data onto whatever
> the current favourite media type is. The nice thing about a structured
> and essentially human-readable metadata format is that there's a good
> chance that it can be transferred as-is to a new type of media without
> any reprocessing.
>
> > * Metadata should be bundled with the content
>
> Just to clarify; do you mean bundled alongside content or interspersed
> with? (From the rest of your message I believe you mean the former,
> which happens to be my view too...)
>
Bundled alongside so the 'raw' data (metadata or content) can be
manipulated with standard tools.
> > * Act like you get one chance to read the media :(
>
> Yep. Although sometimes multiple reads of media and a combination of the
> resulting data can actually improve the ability to reconstruct it :)
>
True. During this lecture they were talking about recording the
stop/starts required to get the actual audio into the system.
Apparently there's some standard for doing that for audio. There were
several horror stories as well.
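The "multiple reads can improve reconstruction" point above can be sketched as a byte-wise majority vote across several passes over the same media. This is purely illustrative (real imaging tools tend to weight reads by per-sector CRC status rather than voting blindly), and the function name is made up:

```python
from collections import Counter

def majority_vote(reads):
    """Combine several equal-length reads of the same data by taking
    the most common value at each byte offset."""
    assert len({len(r) for r in reads}) == 1, "reads must be equal length"
    return bytes(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)  # one column per byte offset
    )

# Three noisy reads of the same 4-byte sector; each read has at most
# one flipped byte, so the vote recovers the original data:
reads = [b"\x01\x02\x03\x04", b"\x01\xff\x03\x04", b"\x01\x02\x03\x00"]
print(majority_vote(reads))  # b'\x01\x02\x03\x04'
```

With three or more reads, any byte that is wrong in only a minority of passes gets corrected, which is why re-reading marginal media is worth the wear it costs.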
> > I think the optimum format for doing this isn't a single file, but a
> > collection of files bundled into a single package. Someone mentioned
> > tar, I think, and zip would work just as well.
>
> The only danger there is that the two become separated over time, but in
> my mind it's an acceptable risk. It's sort of like a librarian losing a
> few volumes from a set of encyclopedias, I suppose - something that you'd
> have to be really careless to do.
>
Yeah, but if the package is treated as a separate archival unit, then
the risk of separation should be fairly low. As to others' comments
about zip vs tar, I only suggested zip because it is more common today.
Just don't use ar or cpio! :)
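A minimal sketch of the kind of bundle being discussed: raw image data and human-readable metadata stored side by side in one zip, so either part can be pulled out with standard tools. The file names and metadata fields here are invented for illustration:

```python
import zipfile

# Write one archival unit: raw disk data plus plain-text metadata.
with zipfile.ZipFile("disk0001.zip", "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("image.raw", b"\xe5" * 256)  # dummy sector data
    z.writestr("metadata.txt",
               "media-type: 5.25in DS/DD\n"
               "tracks: 40\n"
               "encoding: MFM\n")

# Any standard zip tool (unzip, python -m zipfile, ...) can list and
# extract either member without special software:
with zipfile.ZipFile("disk0001.zip") as z:
    print(z.namelist())  # ['image.raw', 'metadata.txt']
```

Because the two members travel in one file, the bundle itself is the archival unit, which is what keeps the metadata from getting separated from the image.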
> > I don't think there's any real need to document the physical properties
> > of the media for EVERY disk archived -- there should probably be a
> > repository of 'standard' media types (1541's different-sectors-per-track
> > info, FM vs MFM per track information, etc) plus overrides in the media
> > metadata (uses fat-tracks, is 40 track vs 35, etc).
>
> Now, risk of separation might well be a problem there if there's a single
> copy of some metadata for more than one disk image. I'd say that each
> 'bundle' forming a disk image (raw data + metadata) needs to totally
> describe that disk...
>
Well, the on-disk-structure metadata is the only one that would benefit
from having a separate repository of definitions. I don't see any
reason to not allow the data to be fully defined if the archivist feels
the desire to do so, but a list of standard types (as well as a copy of
the full definitions stored somewhere) would take some of the tedium out
of it.
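The standard-types-plus-overrides idea might look something like this: a repository of full definitions, with each disk's metadata merged on top. The type names and fields are invented examples, not a proposed standard:

```python
# Repository of 'standard' media-type definitions (illustrative only).
STANDARD_TYPES = {
    "c64-1541": {"tracks": 35, "encoding": "GCR",
                 "sectors_per_track": "zoned (21/19/18/17)"},
    "ibm-360k": {"tracks": 40, "encoding": "MFM",
                 "sectors_per_track": 9},
}

def resolve(media_type, overrides=None):
    """Full per-disk definition = standard definition + per-disk overrides."""
    definition = dict(STANDARD_TYPES[media_type])  # copy, don't mutate
    definition.update(overrides or {})
    return definition

# A 40-track 1541 disk only needs to override the one field that
# differs from the standard 35-track definition:
print(resolve("c64-1541", {"tracks": 40}))
```

The archivist can still spell out every field per disk if they want, but the common cases reduce to a type name and a handful of overrides.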
Brian