zip (was: Re: Disk archival techniques)

John A. Dundas III dundas at caltech.edu
Thu May 19 12:11:39 CDT 2005


At 2:45 PM -0400 5/18/05, Paul Koning wrote:
>  >>>>> "Jim" == Jim Leonard <trixter at oldskool.org> writes:
>
>  Jim> Jules Richardson wrote:
>  >>> The problem I see with zip is the single table of contents at the
>  >>> end.  Did you try corrupting THAT with a hex editor?
>  >> Ahh, no not at the time. I've just tried it now though and it
>  >> seems remarkably good at recovering from corruption in the TOC
>  >> area. Actually, looking at the zip file it appears to have
>  >> something resembling a file header before each file in the archive
>  >> as well as the TOC at the end.
>
>  Jim> As long as we're talking about fault-tolerant archives, neither
>  Jim> TAR nor ZIP are acceptable.  For years I've used RAR (WinRAR for
>  Jim> windows, RAR and RAR32 for DOS) which has "recovery record"
>  Jim> support (parity info). ...
>
>If you want fault tolerance, it may be a good idea to learn the topic
>of "erasure codes" -- a general concept for a way to split data into
>N+K pieces such that you can reconstruct the data from any N pieces
>(for N and K chosen to be whatever you wish).

Error-detecting and error-correcting codes are a huge topic in their
own right.  Fire codes and others are designed to correct particular
expected error patterns (e.g., contiguous bit errors up to some
maximum length, block errors, random bit errors, etc.).  I'm not sure
about completely arbitrary N and K; last I looked there were tradeoffs
and optimizations based on the expected error characteristics.

For example, the error characteristics (and thus detection/correction
capability) of a space-earth link are different from those of a
9-track magtape, which are different again from those of any sort of
disk.
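
To make the N+K idea quoted above concrete, here is a toy sketch in
Python.  It is not how RAR, BACKUP, or any real product does it, and
it says nothing about the tradeoffs mentioned above; it only shows
that "any N of N+K pieces" is achievable in principle.  The names
encode/decode and the choice of arithmetic modulo the prime 257 are
my own assumptions for illustration.  It needs Python 3.8 or later
for the modular inverse via pow().

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x, y) points, using arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k):
    """Return N+K (index, value) pieces; the first N carry the data itself."""
    n = len(data)
    assert n + k <= P, "piece indices must be distinct mod P"
    points = list(enumerate(data))
    extra = [(x, _interpolate(points, x)) for x in range(n, n + k)]
    return points + extra


def decode(pieces, n):
    """Rebuild the N data symbols from any N surviving pieces."""
    assert len(pieces) >= n, "need at least N pieces"
    chosen = pieces[:n]
    return [_interpolate(chosen, x) for x in range(n)]
```

With N=5 data bytes and K=3 extra pieces, decode() returns the
original five bytes from any five of the eight pieces, provided you
know which pieces were lost.  One wart of this toy: an extra piece can
take the value 256, which does not fit in a byte; real codes avoid
that by working in GF(2^8) instead.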

>VMS also implemented the XOR thing you mentioned in the BACKUP
>utility (as did RSTS, of course -- since it supports the same
>format).

Yes, Andy Goldstein (or whoever, but I think Andy wrote the first
version) really did a nice job with VMS BACKUP.  I once wrote a
program to read such tapes on a Unix system and had to learn the
techniques used.  Very nice.  Indeed an entire /BLOCK_SIZE block of
data within one /GROUP_SIZE group could be completely corrupted on
tape, yet BACKUP could correctly restore it.
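
The XOR idea itself fits in a few lines.  The following Python is
only a sketch of the general technique, not the actual BACKUP
save-set format; the function names and the tiny block and group
sizes are made up for illustration.  It assumes all blocks in a group
are the same length and that you already know which block is bad
(e.g., from a read error).

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def make_group(data_blocks):
    """Append one XOR recovery block to a group of data blocks."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(group, lost_index):
    """Rebuild the block at lost_index from the other blocks in the group.

    Works because XORing every block in a group, recovery block
    included, gives all zeros; leaving one out gives the missing one."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors)
```

For a group built with make_group([b"AAAA", b"BBBB", b"CCCC"]),
recover(group, 1) gives back b"BBBB" without ever looking at the
damaged copy.  This handles one bad block per group; two bad blocks
in the same group are beyond what a single XOR block can fix.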

John


