manual file types

Antonio Carlini a.carlini at ntlworld.com
Mon Jan 31 17:53:46 CST 2005


> TIFF file (big)

The general consensus seems to be that bi-level scanning
at a resolution of at least 300dpi, but preferably 400dpi, is
adequate (although I tend to use 600dpi). G4-encoded TIFF is pretty
good space-wise (obviously lousy compared to text).
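
To put rough numbers on the resolution choice, here is a minimal
sketch of the uncompressed size of one bi-level page at each of
those resolutions. The 8.5 x 11 inch page size and the function
name are assumptions for illustration only, and the figures are
for the raw bitmap before any G4 encoding (how much G4 saves
depends on the page content, so no ratio is claimed here).

```python
import math

def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel page.

    Each row is padded up to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px / 8)
    return bytes_per_row * height_px

# Assumed page size: 8.5 x 11 inches (not stated in the discussion).
for dpi in (300, 400, 600):
    size = raw_bilevel_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1e6:.2f} MB uncompressed")
```

Doubling the resolution quadruples the raw bitmap, which is why
the choice of encoding matters more at 600dpi.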

> OCRed ASCII text (ugly)

OCR is (almost) certain to introduce errors. You'll need a 
significant investment in proof-reading to fix this!

> compressed PostScript of OCRed text (depending on OCR, could be nice).

If you can OCR, then any format that can represent that text in
whatever fonts and layout the original document used (and that is
efficient and openly documented) should do. Most of my
text documents are PDF. You can turn PDF into text (or HTML, I guess)
where appropriate.

But you cannot OCR (or at least, I bet you cannot OCR without
introducing errors).

Antonio

-- 

---------------

Antonio Carlini arcarlini at iee.org

 




