Inventory for handling scanned documents (was: Better indexing on bitsavers)

Eric Smith eric at brouhaha.com
Mon May 23 17:42:23 CDT 2005


Jan-Benedict Glaw wrote:
> Armed with this, you can have /n/ TIFFs for a book's /n/ pages or one
> huge multi-page TIFF containing them all, *plus* the bonus of added
> keywords, chapter captions, printed page number and the like.  All these
> goodies won't show up during TIFF->PDF conversion (for printing),

Certainly all those "goodies" can be transferred into PDF files.  For
instance, my "tumble" program can do that:
    http://tumble.brouhaha.com/

> 	- Could you live, for long-term storage of the data, with
> 	  single-page TIFFs?
> 	- These could be used to create multi-page TIFFs,

If you're going to use TIFF as your storage format, why not store
them as multi-page TIFF?  Anyone who needs individual pages can
easily enough "burst" them with a utility like tiffsplit.  If you
store them as a bunch of single files, it increases the chance that
someone will end up with only a partial document, bad pages, etc.
This is the same reason programs are often distributed as ZIP or tar
archives rather than as a bunch of separate files.
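A multi-page TIFF is just a chain of Image File Directories (IFDs), one per page, each ending with the offset of the next. Any splitter such as tiffsplit therefore has to walk that chain to find every page. Below is a minimal sketch of that walk, assuming a classic (non-BigTIFF) file already read into memory; `count_tiff_pages` is a hypothetical helper for illustration, not part of libtiff or any other library.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the pages (IFDs) in a classic TIFF held in memory.

    Hypothetical helper: handles classic TIFF only (magic 42), not BigTIFF.
    Each page has its own IFD, and the IFDs form a singly linked list.
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    pages = 0
    seen = set()
    while offset != 0:  # an offset of 0 terminates the IFD chain
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        (n_entries,) = struct.unpack(endian + "H", data[offset:offset + 2])
        # Skip the 12-byte directory entries; the next-IFD offset follows them.
        next_pos = offset + 2 + 12 * n_entries
        (offset,) = struct.unpack(endian + "I", data[next_pos:next_pos + 4])
        pages += 1
    return pages
```

Usage, with a hypothetical file name: `with open("manual.tif", "rb") as f: print(count_tiff_pages(f.read()))`. A real splitter would additionally copy each IFD's tags and image strips into a new file; this sketch only locates the pages.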

> 	- Would you like to see something like an X11-based reader which
> 	  could support searching and equal-or-better navigation
> 	  (compared to Acrobat Reader)?

Are you talking about a TIFF reader?  I'd like to see better PDF readers.
Evince is already much better than xpdf was, but there's probably room
for further improvement.

> Eric, I don't know how well your bookmark generation code works.
> Can it already handle truly tree-structured (nested) bookmarks if the
> data is available in tumble's input files?

Yes.
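On the PDF side, nested bookmarks map onto the document outline structure from the PDF specification: each outline item refers to its /Parent, its /Prev and /Next siblings, and its /First and /Last children, and an item that has children carries a /Count. The sketch below is not tumble's code; `Bookmark` and `link_outline` are hypothetical names, integer ids stand in for PDF object references (0 plays the role of the /Outlines dictionary), no PDF objects are actually written, and every item is assumed to be open, so /Count is simply the number of descendants.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical bookmark node: a title, a 1-based target page, and sub-bookmarks."""
    title: str
    page: int
    children: list[Bookmark] = field(default_factory=list)


def link_outline(roots: list[Bookmark]) -> dict[int, dict]:
    """Compute PDF-outline-style link fields for a bookmark tree.

    Ids are assigned depth-first; id 0 stands for the /Outlines dictionary.
    All items are assumed open, so Count is the number of descendants
    (and Count is omitted for items without children).
    """
    records: dict[int, dict] = {0: {}}
    next_id = 1

    def visit(nodes: list[Bookmark], parent_id: int) -> int:
        nonlocal next_id
        ids = []
        for node in nodes:
            node_id = next_id
            next_id += 1
            records[node_id] = {"Title": node.title, "Page": node.page, "Parent": parent_id}
            ids.append(node_id)
            descendants = visit(node.children, node_id)
            if descendants:
                records[node_id]["Count"] = descendants
        # Link siblings into a doubly linked list (Prev/Next).
        for prev_id, next_sibling in zip(ids, ids[1:]):
            records[prev_id]["Next"] = next_sibling
            records[next_sibling]["Prev"] = prev_id
        # The parent points at its first and last child.
        if ids:
            records[parent_id]["First"] = ids[0]
            records[parent_id]["Last"] = ids[-1]
        # Items in this subtree: each child plus all of its descendants.
        return sum(1 + records[i].get("Count", 0) for i in ids)

    records[0]["Count"] = visit(roots, 0)
    return records
```

For a tree with "Intro" and "Hardware" at the top level and "CPU" and "Memory" under "Hardware", the root record gets First 1, Last 2, and Count 4, and "Hardware" gets First 3, Last 4, and Count 2. Closed items would need a negative /Count, which this sketch does not model.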

I don't like the way the tumble control files work now, which is
part of why they're not documented.  I'm probably going to redesign
them to use an XML-based control language.

In my copious free time.  Sigh.
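The current tumble control format isn't documented and the XML redesign was only planned, so the element and attribute names below (`<document>`, `<input file=...>`, `<bookmark title=... page=...>`) are entirely hypothetical. The sketch only illustrates why XML suits nested bookmarks: nesting in the file maps directly onto nesting in the outline, and Python's standard `xml.etree.ElementTree` recovers that tree with a few lines of recursion.

```python
import xml.etree.ElementTree as ET

# Hypothetical control file: this schema is invented for illustration only.
EXAMPLE = """<document output="manual.pdf">
  <input file="manual.tif"/>
  <bookmark title="Introduction" page="1"/>
  <bookmark title="Hardware" page="5">
    <bookmark title="CPU" page="6"/>
    <bookmark title="Memory" page="12"/>
  </bookmark>
</document>"""


def parse_bookmarks(elem):
    """Turn the direct <bookmark> children of elem into (title, page, children) tuples."""
    return [
        (bm.get("title"), int(bm.get("page")), parse_bookmarks(bm))
        for bm in elem.findall("bookmark")  # matches direct children only
    ]


def parse_control(text):
    """Return (list of input file names, bookmark tree) from a control file string."""
    root = ET.fromstring(text)
    inputs = [inp.get("file") for inp in root.findall("input")]
    return inputs, parse_bookmarks(root)


def print_outline(bookmarks, depth=0):
    """Print the bookmark tree, indenting two spaces per nesting level."""
    for title, page, children in bookmarks:
        print(f"{'  ' * depth}{title} (page {page})")
        print_outline(children, depth + 1)


if __name__ == "__main__":
    files, outline = parse_control(EXAMPLE)
    print(files)
    print_outline(outline)
```

Run as a script, this prints the input list followed by the indented outline, with "CPU" and "Memory" nested under "Hardware".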

Eric


