Lossy compression vs. archiving and OCR (was Re: Many things)

Randy McLaughlin randy at s100-manuals.com
Mon Jan 31 15:46:30 CST 2005


From: "Eric Smith" <eric at brouhaha.com>
Sent: Monday, January 31, 2005 3:15 PM
<snip>
> But that's what you yourself said that the DjVu software does.  It
> replaces glyphs with other glyphs that it thinks are similar.  No matter
> how good a job it thinks it can do of that, I DO NOT WANT IT FOR
> ARCHIVAL DOCUMENTS.
>
> I normally scan at 300 or 400 DPI; when there is very tiny text I
> sometimes use 600 DPI.
>
> Even at those resolutions, it can be difficult to tell some characters
> apart, especially from poor-quality originals.  But usually I can do
> it if I study the scanned page very closely.  No, OCR today cannot do
> as good a job at that as I can.  Someday OCR may be better.  But
> arbitrarily replacing the glyphs with other ones the software considers
> "good enough" is going to f*&# up any possibility of doing this by
> either a human OR OCR.
>
> And all to make the file a little smaller.  DVD-R costs about $0.25
> to store 4.7GB of data, so I just can't get excited about using lossy
> encoding for text and line art pages that usually don't encode with
> lossless G4 to more than 50K bytes per page.
>
> Eric
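The glyph substitution Eric objects to can be shown with a toy model of lossy "pattern matching and substitution" symbol coding, the general technique behind JB2/JBIG2-style bilevel encoders. This is a minimal Python sketch and NOT DjVu's actual matcher. The pixel-mismatch metric, the threshold, and the two 4x4 glyphs are hypothetical, chosen only to show how a near-match gets replaced by an already-stored glyph.

```python
# Toy lossy symbol coder (illustrative only; not DjVu's real JB2 matcher).
# Assumptions: glyphs are small same-size bitmaps, and "similar" means the
# fraction of differing pixels is at or below a chosen threshold.
from typing import List, Optional, Tuple

Bitmap = Tuple[str, ...]  # one string per row, '#' = black, '.' = white

def mismatch(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ; infinity if the sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    total = sum(len(r) for r in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total

def encode(glyphs: List[Bitmap], threshold: float) -> Tuple[List[Bitmap], List[int]]:
    """Return (dictionary, indices). A glyph close enough to a stored one
    is replaced by it, so its own pixels are discarded."""
    dictionary: List[Bitmap] = []
    indices: List[int] = []
    for g in glyphs:
        best: Optional[int] = min(range(len(dictionary)),
                                  key=lambda i: mismatch(g, dictionary[i]),
                                  default=None)
        if best is not None and mismatch(g, dictionary[best]) <= threshold:
            indices.append(best)                 # lossy: reuse stored glyph
        else:
            dictionary.append(g)                 # new symbol, stored exactly
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decode(dictionary: List[Bitmap], indices: List[int]) -> List[Bitmap]:
    return [dictionary[i] for i in indices]

# Two hypothetical glyphs that differ in 2 of 16 pixels (12.5%).
C = ("####", "#...", "#...", "####")
E = ("####", "#...", "###.", "####")

dictionary, idx = encode([C, E], threshold=0.15)
print(idx)  # [0, 0] -> the second glyph is decoded as the first
```

With a threshold of 0.0 only identical bitmaps are merged and both glyphs survive; any positive threshold trades fidelity for size, which is exactly the substitution that can make two characters indistinguishable to a later reader or OCR pass.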
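The storage cost argument in the quoted message can be checked with a little arithmetic. This is a minimal Python sketch, assuming "4.7GB" means 4.7 billion bytes (decimal, as DVD capacities are marketed) and "50K bytes" means 50,000 bytes for a worst-case G4 page; filesystem overhead is ignored.

```python
# Back-of-envelope storage cost from the figures quoted above.
# Assumptions: decimal units (4.7e9 bytes per disc, 50,000 bytes per page)
# and no filesystem overhead.
DVD_R_BYTES = 4_700_000_000
DVD_R_COST_USD = 0.25
G4_PAGE_BYTES = 50_000

def pages_per_disc(disc_bytes: int, page_bytes: int) -> int:
    """Whole pages that fit on one disc."""
    return disc_bytes // page_bytes

def cost_per_page(disc_cost: float, pages: int) -> float:
    """Media cost attributable to a single page."""
    return disc_cost / pages

pages = pages_per_disc(DVD_R_BYTES, G4_PAGE_BYTES)
print(pages)  # 94000
print(f"${cost_per_page(DVD_R_COST_USD, pages):.7f} per page")
```

That is roughly 94,000 worst-case pages per disc, or about 0.27 cents of media per thousand pages, which is the basis for saying lossy encoding saves very little.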

The point is not you or your preferences.  You can store the documents any
way you want, and you can decide whether or not to share them.  If you
decide to share, you can ship the documents on DVDs or offer them on a
website.

My documents are not perfect but I believe they are the best I can provide 
given the variables of convenience and cost.

These questions face every archivist: if I decided to archive only "perfect"
documents, how many could I archive?


Randy
www.s100-manuals.com 