Disk archival techniques

Brian Wheeler bdwheele at indiana.edu
Wed May 18 08:13:31 CDT 2005


On Tue, 2005-05-17 at 17:30 -0500, Randy McLaughlin wrote:
> From: "Brian Wheeler" <bdwheele at indiana.edu>
> Sent: Tuesday, May 17, 2005 5:06 PM
> 
> 
> > As coincidence would have it, I work at Indiana University's Digital
> > Library Program and there was a lecture on archiving audio which hits
> > many of the same issues that have come up here.  The conclusions that
> > they came up with for the project included:
> > * There's no such thing as an eternal medium:  the data must be
> > transportable to the latest generation of storage
> > * Metadata should be bundled with the content
> > * Act like you get one chance to read the media :(
> >
> > While this is a different context, the principle is basically the same.
> > I've got a pile of TK50 tapes I'm backing up using the SIMH tape format,
> > so this is relevant to that process as well.
> >
> > I think the optimum format for doing this isn't a single file, but a
> > collection of files bundled into a single package.  Someone mentioned
> > tar, I think, and zip would work just as well.  The container could
> > contain these components:
> > * content metadata - info from the disk's label/sleeve, etc
> > * media metadata - the type of media this came from
> > * archivist metadata - who did it, methods used, notes, etc
> > * badblock information - zero-filled blocks which are actually bad.
> > * content - a bytestream of the data
> >
> > I don't think there's any real need to document the physical properties
> > of the media for EVERY disk archived -- there should probably be a
> > repository of 'standard' media types (1541's different-sectors-per-track
> > info, FM vs MFM per track information, etc) plus overrides in the media
> > metadata (uses fat-tracks, is 40 track vs 35, etc).
> >
> > Emulators could use the content part of the file as-is and collectors
> > would have enough information to recreate the original media.  It would
> > also allow for cataloging fairly easily.
> >
> > Brian
> <snip>
> 
> I disagree on a few points:
> 
> Today we know what the 1541 structure is, we need enough detail to explain 
> it to future users.

> The differences between FM and MFM are not as simple as a binary decision. 
> RX02 is one example of mixed formatting, even with FM & MFM each 
> implementation can be fairly unique (hard vs. soft sectored, sector size, 
> flux density, etc).
> 
> 

I guess what I was getting at is that there should be a library of standard
types which fully define each format.  1541 disks look the same 99% of the
time unless half-tracks, fat tracks, or another copy protection scheme
was used.  So if there's a library that fully defines what a 1541 _is_,
there's no reason to have that exact definition copied for each disk
archived.  Not that it really takes up that much space, but it does make
it more tedious -- do you want to enter the track/sector geometry for
every disk you copy?
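
Here is a rough sketch of what I mean by a shared library plus per-disk
overrides.  The names (MEDIA_LIBRARY, resolve_media, the field names) are
made up for illustration and are not an existing format; the 1541 figures
are the commonly documented ones (35 tracks, GCR, 256-byte sectors, four
speed zones).

```python
import copy

# Hypothetical shared library of "standard" media definitions.
# Field names are illustrative assumptions, not an agreed format.
MEDIA_LIBRARY = {
    "cbm-1541": {
        "tracks": 35,
        "encoding": "GCR",
        "sector_size": 256,
        # (first_track, last_track, sectors_per_track) for each zone
        "zones": [
            (1, 17, 21),
            (18, 24, 19),
            (25, 30, 18),
            (31, 35, 17),
        ],
    },
}


def resolve_media(media_type, overrides=None):
    """Return the library definition with per-disk overrides applied.

    The library entry is deep-copied so that one disk's overrides
    never leak back into the shared definition.
    """
    definition = copy.deepcopy(MEDIA_LIBRARY[media_type])
    definition.update(overrides or {})
    return definition


def total_sectors(definition):
    """Count the sectors described by a definition's zone table."""
    return sum(
        (last - first + 1) * per_track
        for first, last, per_track in definition["zones"]
    )
```

A normal disk would then only need to record "cbm-1541" in its media
metadata, while an oddball would record something like
resolve_media("cbm-1541", {"tracks": 40}) along with whatever else differs
(an extended zone table, a note about fat tracks, and so on).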


> Most of the exact details can be understood by using current knowledge but 
> maybe not 50 years from now when someone is trying to understand it.
> 

True, but I suppose that's why we're discussing it now :)  


> One thing can be that for a given format part of the overall archive should 
> include technical details:
> 
> That is to say, one example would be that a site like asimov should include 
> technical information on the Apple disk interface as well as an explanation 
> of how the dsk images are created and restored.  It isn't necessary to 
> include the details with every dsk file, only within the general archive.
> 

Agreed.  If there's an archive with the 'bundling software' which
handles the metadata, there should be a library of definitions there
that can be copied to each user's archive as needed.
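
For the bundling side, a minimal sketch using a zip container might look
like the following.  The member names (content.bin, badblocks.json, etc.)
and the use of JSON for the metadata are assumptions for the sake of the
example; nothing here is a settled layout.

```python
import io
import json
import zipfile


def build_package(content, content_meta, media_meta, archivist_meta, bad_blocks):
    """Bundle a disk image and its metadata into one zip, returned as bytes.

    All member names below are hypothetical placeholders.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        # Info from the disk's label/sleeve
        pkg.writestr("content-metadata.json", json.dumps(content_meta, indent=2))
        # Media type (a name from the shared library) plus any overrides
        pkg.writestr("media-metadata.json", json.dumps(media_meta, indent=2))
        # Who did the archiving, methods used, notes
        pkg.writestr("archivist-metadata.json", json.dumps(archivist_meta, indent=2))
        # Block numbers that could not be read from the original media
        pkg.writestr("badblocks.json", json.dumps(sorted(bad_blocks)))
        # The raw bytestream, stored unmodified
        pkg.writestr("content.bin", content)
    return buf.getvalue()


def read_content(package_bytes):
    """Pull out just the bytestream, as an emulator would."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return pkg.read("content.bin")
```

An emulator only ever needs read_content(); a cataloging tool can read the
JSON members without touching the image at all.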

Brian


