Inventory for handling scanned documents (was: Better indexing on bitsavers)
Eric Smith
eric at brouhaha.com
Mon May 23 17:42:23 CDT 2005
Jan-Benedict Glaw wrote:
> Armed with this, you can have /n/ TIFFs for a book's /n/ pages or one
> huge multi-page TIFF containing them all, *plus* the bonus of added
> keywords, chapter captions, printed page number and the like. All these
> goodies won't show up during TIFF->PDF conversion (for printing),
Certainly all those "goodies" can be transferred into PDF files. For
instance, my "tumble" program can do that:
http://tumble.brouhaha.com/
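As an illustration of how one of those goodies, the printed page number, can live inside a PDF: the PDF format has a /PageLabels number tree in the document catalog that maps page index ranges to a numbering style, so front matter can show "i, ii, iii" and the body can restart at "1". The sketch below is not tumble's code; the tuple layout and the function names `page_label` and `page_labels_pdf` are hypothetical, and only the decimal and Roman styles are handled (PDF also defines letter styles).

```python
from __future__ import annotations


def to_roman(n: int) -> str:
    """Lowercase Roman numeral for a positive integer."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, text in numerals:
        while n >= value:
            out.append(text)
            n -= value
    return "".join(out)


def page_label(index: int, ranges: list[tuple[int, str, int]]) -> str:
    """Printed label for zero-based page `index`.

    `ranges` holds (first_page_index, style, first_number) tuples, and the
    first range must start at page 0. Styles use PDF's /S names, but only
    "D" (decimal), "r" and "R" (lower/upper Roman) are handled here.
    """
    start, style, first = max((r for r in ranges if r[0] <= index),
                              key=lambda r: r[0])
    n = first + (index - start)
    if style == "D":
        return str(n)
    roman = to_roman(n)
    return roman if style == "r" else roman.upper()


def page_labels_pdf(ranges: list[tuple[int, str, int]]) -> str:
    """Serialize the ranges as a /PageLabels number tree with one /Nums array."""
    entries = []
    for start, style, first in ranges:
        # /St (starting number) defaults to 1, so it is only written when needed.
        body = f"/S /{style}" + (f" /St {first}" if first != 1 else "")
        entries.append(f"{start} << {body} >>")
    return "<< /Nums [" + " ".join(entries) + "] >>"
```

For a scan with four pages of front matter followed by the body, `ranges = [(0, "r", 1), (4, "D", 1)]` labels the pages "i" through "iv" and then "1", "2", ..., and `page_labels_pdf(ranges)` yields the dictionary a PDF writer would attach to the catalog. Keywords, by contrast, would typically go in the document information dictionary's /Keywords entry.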
> - Could you live, for long-term storage of the data, with
> single-page TIFFs?
> - These could be used to create multi-page TIFFs,
If you're going to use TIFF as your storage format, why not store
them as multi-page TIFF? Anyone who needs individual pages can
easily enough "burst" them with a utility like tiffsplit. Storing
them as a bunch of separate files increases the chance that someone
will end up with only a partial document, bad pages, etc. (the same
reason programs are often distributed as ZIP or tar archives rather
than as a bunch of smaller files).
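A multi-page TIFF stores its pages as a linked list of Image File Directories (IFDs), each ending in the offset of the next one, with 0 marking the end. That structure gives a cheap completeness check on a single archive file: walk the chain and count the pages. The sketch below is a hypothetical helper (`count_tiff_pages` is not part of tiffsplit or libtiff); it handles only classic TIFF, not BigTIFF, and it does not verify that the image data each IFD points to is actually present.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the pages (IFDs) in a classic TIFF held in memory.

    Walks the linked list of Image File Directories and raises ValueError
    if the chain runs past the end of the data or loops back on itself.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("bad byte-order mark")
    # Header: 2-byte magic number, then 4-byte offset of the first IFD.
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    pages, seen = 0, set()
    while offset != 0:  # an offset of 0 terminates the chain
        if offset in seen or offset + 2 > len(data):
            raise ValueError(f"corrupt or truncated IFD chain at offset {offset}")
        seen.add(offset)
        (n_entries,) = struct.unpack_from(endian + "H", data, offset)
        next_ptr = offset + 2 + 12 * n_entries  # each IFD entry is 12 bytes
        if next_ptr + 4 > len(data):
            raise ValueError(f"IFD at offset {offset} is truncated")
        (offset,) = struct.unpack_from(endian + "I", data, next_ptr)
        pages += 1
    return pages
```

Comparing `count_tiff_pages(open(path, "rb").read())` against the page count the original document should have catches a download that was cut off inside the IFD chain, which is much harder to notice when the pages are scattered across separate files.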
> - Would you like to see something like an X11-based reader which
> could support searching and equal-or-better navigation
> (compared to Acrobat Reader)?
Are you talking about a TIFF reader? I'd like to see better PDF readers.
Evince is already much better than xpdf was, but there's probably room
for further improvement.
> Eric, I don't know how well your bookmark generation code works.
> Can it already handle truly tree-structured (nested) bookmarks if the
> data were available in tumble's input files?
Yes.
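For context on what "tree-like" bookmarks mean at the PDF level: PDF outline items form a tree in which each item names its /Parent, its siblings are chained through /Prev and /Next, a parent points at its first and last child through /First and /Last, and /Count records how many descendants are visible. The sketch below maps a nested bookmark list onto those dictionaries; it is not tumble's implementation, and the `Bookmark` class, `build_outline`, and the "PageIndex" placeholder key are all hypothetical (a real writer would emit a /Dest referring to the page object instead).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One node of a hypothetical nested bookmark list."""
    title: str
    page_index: int  # zero-based page the bookmark points to
    children: list[Bookmark] = field(default_factory=list)


def build_outline(roots: list[Bookmark]) -> dict[int, dict]:
    """Map a bookmark tree onto PDF outline dictionaries keyed by object number.

    Object 1 is the /Outlines root. Every item is left open, so an item's
    /Count is its total number of descendants.
    """
    objs: dict[int, dict] = {1: {"Type": "/Outlines"}}
    next_num = 2

    def add_level(items: list[Bookmark], parent: int) -> int:
        """Create `items` under `parent`; return how many objects were made at all depths."""
        nonlocal next_num
        nums, total = [], 0
        for bm in items:
            num = next_num
            next_num += 1
            objs[num] = {"Title": bm.title, "Parent": parent, "PageIndex": bm.page_index}
            nums.append(num)
            below = add_level(bm.children, num)
            if below:
                objs[num]["Count"] = below  # positive value: shown expanded
            total += 1 + below
        # Siblings form a doubly linked list; the parent points at both ends.
        for i, num in enumerate(nums):
            if i > 0:
                objs[num]["Prev"] = nums[i - 1]
            if i + 1 < len(nums):
                objs[num]["Next"] = nums[i + 1]
        if nums:
            objs[parent]["First"], objs[parent]["Last"] = nums[0], nums[-1]
        return total

    total = add_level(roots, 1)
    if total:
        objs[1]["Count"] = total  # visible items at all levels
    return objs
```

Building `[Bookmark("Chapter 1", 0, [Bookmark("1.1", 1), Bookmark("1.2", 3)]), Bookmark("Chapter 2", 5)]` gives a root whose /First and /Last are the two chapters, with the sections hanging off Chapter 1. Items meant to appear collapsed would carry a negative /Count instead, and their descendants would then not count toward the totals above them.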
I don't like the way the tumble control files work now, which is
part of why they're not documented. I'm probably going to redesign
them to use an XML-based control language.
In my copious free time. Sigh.
Eric
More information about the cctalk mailing list