zip (was: Re: Disk archival techniques)

Jim Leonard trixter at oldskool.org
Wed May 18 13:30:02 CDT 2005


Jules Richardson wrote:
>>The problem I see with zip is the single table of contents at the end.
>>Did you try corrupting THAT with a hex editor?
> 
> Ahh, no not at the time. I've just tried it now though and it seems
> remarkably good at recovering from corruption in the TOC area. Actually,
> looking at the zip file it appears to have something resembling a file
> header before each file in the archive as well as the TOC at the end.
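
The layout described above can be checked with a small Python sketch. It is only an illustration: the sample file names and the helper names `build_sample_zip` and `describe_layout` are made up here, and the archive is built in memory. It confirms that each member starts with a local file header (signature `PK\x03\x04`) and that the end-of-archive record points back to a central directory (signature `PK\x01\x02`) located after the file data.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"    # local file header, written before each member
CENTRAL_SIG = b"PK\x01\x02"  # central directory entry (the "TOC")
EOCD_SIG = b"PK\x05\x06"     # end-of-central-directory record


def build_sample_zip():
    """Build a tiny two-member archive in memory (hypothetical sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", "first file\n")
        zf.writestr("b.txt", "second file\n")
    return buf.getvalue()


def describe_layout(data):
    """Report where the per-file headers and the central directory sit."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        local_offsets = [info.header_offset for info in zf.infolist()]

    # The end record is the last structure in the file; searching backwards
    # finds it when the archive has no trailing comment containing the marker.
    eocd = data.rfind(EOCD_SIG)
    # At offset +10: total entry count (2 bytes), central directory size
    # (4 bytes), central directory offset (4 bytes), all little-endian.
    entries, cd_size, cd_offset = struct.unpack("<HII", data[eocd + 10:eocd + 20])

    return {
        "local_offsets": local_offsets,
        "local_headers_ok": all(data[o:o + 4] == LOCAL_SIG for o in local_offsets),
        "entries": entries,
        "central_dir_offset": cd_offset,
        "central_dir_ok": data[cd_offset:cd_offset + 4] == CENTRAL_SIG,
    }


if __name__ == "__main__":
    print(describe_layout(build_sample_zip()))
```

Because every member carries its own header, a repair tool can in principle rebuild a damaged TOC by scanning for the local header signature, which would be consistent with the recovery behaviour seen above.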

As long as we're talking about fault-tolerant archives, neither TAR nor ZIP is 
acceptable.  For years I've used RAR (WinRAR for Windows, RAR and RAR32 for 
DOS), which has "recovery record" support (parity info).  A recovery record 
usually costs about 1% additional space (configurable up to 10%) but can 
completely recover mangled compressed data if the errors are small enough (i.e. 
no more than 512 bytes at a time, at a certain minimum distance from the next 
error, etc.).  For larger archives, RAR supports parity *files*, so if you 
split a large archive into, say, 10 parts and 3 recovery files, you can lose 
any 3 of the 13 files and still recover everything.  I do this when archiving 
data to DVD-R and it has saved my butt once (BOTH DVD sets got bitrot because 
of a flood).
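
To make the parity-file idea concrete, here is a minimal Python sketch. It is NOT RAR's actual scheme: it uses a single XOR parity block, which can rebuild exactly one lost part. Surviving several losses, as in the 10 + 3 example, needs a stronger erasure code (typically of the Reed-Solomon kind). The function names and the sample payload are hypothetical.

```python
def split_parts(data, n):
    """Split data into n equal-sized parts, zero-padding the last one."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(n)]


def xor_parity(parts):
    """Return the byte-wise XOR of all parts (they must be the same length)."""
    parity = bytearray(len(parts[0]))
    for part in parts:
        for i, byte in enumerate(part):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(parts, parity):
    """Rebuild the single part marked as None from the survivors and parity."""
    missing = [i for i, part in enumerate(parts) if part is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can rebuild exactly one lost part")
    survivors = [part for part in parts if part is not None]
    # XOR of the parity with every surviving part cancels them out,
    # leaving only the bytes of the missing part.
    rebuilt = xor_parity(survivors + [parity])
    restored = list(parts)
    restored[missing[0]] = rebuilt
    return restored


if __name__ == "__main__":
    payload = b"some archive payload worth protecting"
    parts = split_parts(payload, 10)
    parity = xor_parity(parts)

    damaged = list(parts)
    damaged[4] = None  # pretend one part was lost to bitrot
    restored = recover_missing(damaged, parity)
    print(b"".join(restored)[:len(payload)] == payload)
```

The cost here is one extra part for ten data parts, i.e. 10% overhead to survive the loss of any single part.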

ZIP was never built to be fault tolerant, and trying to recover a mangled TAR 
file goes completely pants if the TAR has other TARs inside it.

There *are* tools that generate parity information for archive files that don't 
have it themselves; the files they generate are named *.PAR.  Unfortunately I 
don't recall the names of the tools, but a quick google should find them.
-- 
Jim Leonard (trixter at oldskool.org)                    http://www.oldskool.org/
Want to help an ambitious games project?             http://www.mobygames.com/
Or check out some trippy MindCandy at             http://www.mindcandydvd.com/

