Disk archival techniques
Brian Wheeler
bdwheele at indiana.edu
Wed May 18 08:13:31 CDT 2005
On Tue, 2005-05-17 at 17:30 -0500, Randy McLaughlin wrote:
> From: "Brian Wheeler" <bdwheele at indiana.edu>
> Sent: Tuesday, May 17, 2005 5:06 PM
>
>
> > As coincidence would have it, I work at Indiana University's Digital
> > Library Program and there was a lecture on archiving audio which hits
> > many of the same issues that have come up here. The conclusions that
> > they came up with for the project included:
> > * There's no such thing as an eternal medium: the data must be
> > transportable to the latest generation of storage
> > * Metadata should be bundled with the content
> > * Act like you get one chance to read the media :(
> >
> > While this is a different context, the principle is basically the same.
> > I've got a pile of TK50 tapes I'm backing up using the SIMH tape format,
> > so this is relevant to that process as well.
> >
> > I think the optimum format for doing this isn't a single file, but a
> > collection of files bundled into a single package. Someone mentioned
> > tar, I think, and zip would work just as well. The container could
> > contain these components:
> > * content metadata - info from the disk's label/sleeve, etc
> > * media metadata - the type of media this came from
> > * archivist metadata - who did it, methods used, notes, etc
> > * badblock information - 0 blocks which are actually bad.
> > * content - a bytestream of the data
> >
> > I don't think there's any real need to document the physical properties
> > of the media for EVERY disk archived -- there should probably be a
> > repository of 'standard' media types (1541's different-sectors-per-track
> > info, FM vs MFM per track information, etc) plus overrides in the media
> > metadata (uses fat-tracks, is 40 track vs 35, etc).
> >
> > Emulators could use the content part of the file as-is and collectors
> > would have enough information to recreate the original media. It would
> > also allow for cataloging fairly easily.
> >
> > Brian
> <snip>
>
> I disagree on a few points:
>
> Today we know what the 1541 structure is, we need enough detail to explain
> it to future users.
> The differences between FM and MFM are not as simple as a binary decision.
> RX02 is one example of mixed formatting, even with FM & MFM each
> implementation can be fairly unique (hard vs. soft sectored, sector size,
> flux density, etc).
>
>
I guess what I was getting at is there should be a library of standard
types which fully define the format. 1541's look the same 99% of the
time unless half-tracks, fat tracks, or another copy protection scheme
was used. So if there's a library that fully defines what a 1541 _is_,
there's no reason to have that exact definition copied for each disk
archived. Not that it really takes up that much space, but it does make
it more tedious -- do you want to enter the track/sector geometry for
every disk you copy?
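
As a rough illustration of the "library of standard types plus overrides" idea, here is a minimal Python sketch. Everything named in it (MediaType, LIBRARY, resolve, the "cbm-1541" key, the override field names) is hypothetical and made up for this example, not an existing tool or format. The 35-track entry uses the usual 1541 zone layout; the 40-track override assumes the extra tracks carry 17 sectors each, purely for the sake of the example.

```python
from dataclasses import dataclass, replace


def zoned(zones):
    """Expand (track_count, sectors_per_track) pairs into one entry per track."""
    return tuple(sectors for count, sectors in zones for _ in range(count))


@dataclass(frozen=True)
class MediaType:
    name: str
    tracks: int
    sectors_per_track: tuple  # one entry per track, starting at track 1
    encoding: str


# Hypothetical shared library: each standard type is defined exactly once.
# The 1541 entry uses the usual zone layout (tracks 1-17: 21 sectors,
# 18-24: 19, 25-30: 18, 31-35: 17).
LIBRARY = {
    "cbm-1541": MediaType(
        name="Commodore 1541",
        tracks=35,
        sectors_per_track=zoned([(17, 21), (7, 19), (6, 18), (5, 17)]),
        encoding="GCR",
    ),
}


def resolve(media_metadata, library=LIBRARY):
    """Return the standard definition with this disk's overrides applied."""
    base = library[media_metadata["type"]]
    return replace(base, **media_metadata.get("overrides", {}))


# A normal disk only names its type; an unusual one lists what differs.
plain = resolve({"type": "cbm-1541"})
forty = resolve({
    "type": "cbm-1541",
    "overrides": {
        "tracks": 40,
        # Assumption for illustration: tracks 36-40 hold 17 sectors each.
        "sectors_per_track": plain.sectors_per_track + (17,) * 5,
    },
})
print(plain.tracks, sum(plain.sectors_per_track))
print(forty.tracks, sum(forty.sectors_per_track))
```

The point of the design is that the per-disk media metadata stays tiny (a type name and, rarely, a few overrides), while the full geometry lives in one place.
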
> Most of the exact details can be understood by using current knowledge but
> maybe not 50 years from now when someone is trying to understand it.
>
True, but I suppose that's why we're discussing it now :)
> One thing can be that for a given format part of the overall archive should
> include technical details:
>
> That is to say one example would be a site like asimov should include
> technical information on the Apple disk interface as well as an explanation
> of how the dsk images are created and restored. It isn't necessary to
> include the details with every dsk file but within the general archive.
>
Agreed. If there's an archive with the 'bundling software' which
handles the metadata, there should be a library of definitions there
that can be copied to each user's archive as needed.
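
To make the 'bundling software' a bit more concrete, here is a minimal sketch of what it might do for the five-part package proposed earlier in the thread, using a zip container and Python's standard zipfile and json modules. The member names, the JSON encoding of the metadata, and the function names are all assumptions made for this example, not an agreed format.

```python
import io
import json
import zipfile

# Hypothetical member names, one per component of the proposed package.
CONTENT_META = "content-metadata.json"      # info from the label/sleeve
MEDIA_META = "media-metadata.json"          # media type plus any overrides
ARCHIVIST_META = "archivist-metadata.json"  # who did it, methods, notes
BAD_BLOCKS = "badblocks.json"               # block numbers that failed to read
CONTENT = "content.bin"                     # raw bytestream of the data


def write_bundle(dest, content, content_meta, media_meta, archivist_meta,
                 bad_blocks):
    """Write one archived disk as a single zip package."""
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr(CONTENT_META, json.dumps(content_meta, indent=2))
        z.writestr(MEDIA_META, json.dumps(media_meta, indent=2))
        z.writestr(ARCHIVIST_META, json.dumps(archivist_meta, indent=2))
        z.writestr(BAD_BLOCKS, json.dumps(sorted(bad_blocks)))
        z.writestr(CONTENT, content)


def read_bundle(src):
    """Read a package back into a dict of its five components."""
    with zipfile.ZipFile(src) as z:
        return {
            "content_meta": json.loads(z.read(CONTENT_META)),
            "media_meta": json.loads(z.read(MEDIA_META)),
            "archivist_meta": json.loads(z.read(ARCHIVIST_META)),
            "bad_blocks": json.loads(z.read(BAD_BLOCKS)),
            "content": z.read(CONTENT),
        }


# Round trip through an in-memory buffer; the values are made-up sample data.
buf = io.BytesIO()
write_bundle(
    buf,
    content=bytes(256) * 4,
    content_meta={"label": "example disk"},
    media_meta={"type": "cbm-1541"},
    archivist_meta={"archivist": "example", "notes": "read once"},
    bad_blocks={3, 1},
)
buf.seek(0)
bundle = read_bundle(buf)
print(bundle["media_meta"]["type"], bundle["bad_blocks"], len(bundle["content"]))
```

An emulator would only need the content member, while the media metadata just names a type from the shared library of definitions.
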
Brian