Inventory for handling scanned documents (was: Better indexing on bitsavers)

der Mouse mouse at Rodents.Montreal.QC.CA
Fri May 20 15:29:59 CDT 2005


> We've now named quite a lot of applications and concepts about how to
> handle scanned documents.  I'd like to get the big picture:

> - How do you scan a paper document?  Page by page?

Page by page, usually.  If it's bound, and two pages fit together on my
scanner, I'll often do it two pages at a time.

>   Do you use a script or something like that?

No.  Though I will use shell history mechanisms to ease the task, each
scan generally involves at least a few keystrokes.

>   Do you directly scan b/w, or first use grayscale/colour and then
>   degrade that to b/w?

This is a judgement call.  Sometimes I'll do colour, sometimes
greyscale, sometimes binary.  When I do binary I'll sometimes let the
scanner do it and sometimes I'll scan it some other way and convert it
by hand.
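One common way to do such a greyscale-to-binary conversion is a fixed global threshold. This is a minimal sketch of that idea, not a description of the exact conversion used above. It assumes a page held as rows of grey values from 0 (black) to 255 (white), as in PGM, and produces rows of 1 (black) / 0 (white), as in PBM. The function names and the default threshold of 128 are hypothetical choices.

```python
def to_binary(grey, threshold=128):
    """Threshold a greyscale page (rows of 0=black..255=white, as in PGM)
    into a bitmap (rows of 1=black / 0=white, as in PBM).
    Hypothetical helper; 128 is an arbitrary default cut-off."""
    return [[1 if value < threshold else 0 for value in row] for row in grey]


def mean_threshold(grey):
    """Hypothetical helper: use the page's mean grey level as the cut-off.
    A crude stand-in for smarter automatic methods such as Otsu's."""
    values = [value for row in grey for value in row]
    return sum(values) // len(values)
```

Calling `to_binary(page, mean_threshold(page))` lets the cut-off follow the page's overall brightness instead of a fixed value; whether that helps depends heavily on the original.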

> - How do you work on the scanned images: Do you cut off the white rim
>   as much as possible?

Usually.
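Trimming the white rim amounts to finding the bounding box of the non-white pixels and cutting to it; netpbm's pnmcrop automates this kind of border removal. As a sketch of the idea only, assuming a bitmap stored as rows of 1 (black) / 0 (white), with hypothetical helper names and an optional margin to keep a little white around the text:

```python
def content_box(bitmap):
    """Return (top, bottom, left, right), inclusive, of the black pixels
    in a bitmap of rows of 1=black / 0=white, or None if the page is blank."""
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    if not rows:
        return None
    width = len(bitmap[0])
    cols = [x for x in range(width) if any(row[x] for row in bitmap)]
    return rows[0], rows[-1], cols[0], cols[-1]


def trim_rim(bitmap, margin=0):
    """Hypothetical helper: cut the bitmap down to its content box,
    keeping up to `margin` extra pixels of border on each side."""
    box = content_box(bitmap)
    if box is None:
        return [row[:] for row in bitmap]  # blank page: nothing to trim
    top, bottom, left, right = box
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(len(bitmap) - 1, bottom + margin)
    right = min(len(bitmap[0]) - 1, right + margin)
    return [row[left:right + 1] for row in bitmap[top:bottom + 1]]
```

A single stray speck near the edge will stop the trim short, so this works best after specks are cleaned up.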

>   How do you deal with images that are a tad rotated?

Ignore the rotation, usually.  Egregious cases I may rescan.  In some
cases, I touch up the orientation with pnmrotate (and a subsequent
pnmcut).
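The cut after the rotation is needed because a rotated page takes a larger canvas to hold its corners. A small sketch of that geometry follows, assuming an exact rotation about the centre; pnmrotate's own rotation method may give an output that differs by a pixel or so. The helper names are hypothetical, and the returned (left, top, width, height) are the kind of figures one would hand to a cropping step such as pnmcut. It is meant for small skew corrections only.

```python
import math


def rotated_canvas(width, height, degrees):
    """Size of the bounding box that holds a width x height image
    rotated by `degrees` about its centre."""
    t = math.radians(degrees)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return round(width * c + height * s), round(width * s + height * c)


def centered_cut(width, height, degrees):
    """Hypothetical helper: (left, top, width, height) of a window of the
    original size, centred in the rotated canvas. Small angles only."""
    new_w, new_h = rotated_canvas(width, height, degrees)
    if new_w < width or new_h < height:
        raise ValueError("angle too large for a same-size centred cut")
    return (new_w - width) // 2, (new_h - height) // 2, width, height
```

For a 2550x3300 page (8.5x11 inches at 300 dpi) rotated by 1 degree, the canvas grows to about 2607x3344, so roughly 28 columns and 22 rows come off each side.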

>   How do you deal with single black dots in white areas or the other
>   way around?

If the use is one for which they matter, I edit them out "by hand".
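For comparison with editing by hand, a common automated alternative (a swapped-in technique, not the method described above) flips any pixel whose eight neighbours all have the opposite colour. This catches both lone black dots on white and lone white holes in black. The sketch assumes a bitmap of rows of 1 (black) / 0 (white), and the function name is hypothetical.

```python
def remove_isolated(bitmap):
    """Hypothetical helper: flip every pixel whose neighbours (8-connected,
    fewer at the edges) all have the opposite colour. Decisions are made
    on the input, and results are written to a copy."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    out = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            here = bitmap[y][x]
            neighbours = [
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if neighbours and all(n != here for n in neighbours):
                out[y][x] = 1 - here
    return out
```

It will also erase legitimate one-pixel marks, such as tiny full stops at low resolution. That is one reason to check by hand when the result matters.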

> - What digital format do you like to get when it's all finished?
>   Plain PDF?  PDF with some bookmarks?  PDF with all headings as
>   bookmarks?  A new PDF-hyperref based index?  Multiple
>   TIFF/PNG/whatever images?  Something like a web-based slide-show?
>   ...or multiple formats (web-based for viewing, PDF for printing,
>   ...)?

I dislike PDF.  I loathe Web-specific stuff.

I generally just keep a directory around with the image files.
Sometimes I'll tar it up, sometimes I'll use compression programs like
gzip or bzip2 to reduce the storage requirements.
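In a script, Python's standard tarfile module can do the tar-plus-compression step in one go. Mode "w:gz" selects gzip and "w:bz2" selects bzip2. This is a minimal sketch, and the function name and its bzip2 default are hypothetical choices.

```python
import os
import tarfile


def archive_scans(directory, out_path, compression="bz2"):
    """Hypothetical helper: bundle a directory of page images into one
    compressed tar file. compression is "gz" (gzip) or "bz2" (bzip2)."""
    if compression not in ("gz", "bz2"):
        raise ValueError("compression must be 'gz' or 'bz2'")
    # Store entries under the directory's own name, not its full path.
    arcname = os.path.basename(os.path.normpath(directory))
    with tarfile.open(out_path, "w:" + compression) as tar:
        tar.add(directory, arcname=arcname)
    return out_path
```

Because page scans, binary ones especially, tend to compress well, the archive is usually much smaller than the directory. The exact ratio depends on the image format.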

> - What do you currently use as your software:

> 	Operating system:

NetBSD (1.4T plus a number of private hacks).

> 	PDF viewer:

GhostScript (8.30 at the moment).

> 	TIFF viewer:

tifftopnm, often transformed with pnm tools like pnmscale, with the
resulting p*m file displayed using a picture display program of my own.
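Shrinking a page for on-screen viewing is the step pnmscale handles in that pipeline. The sketch below is a stand-in, not pnmscale's own resampling: it averages each factor-by-factor block of a greyscale page given as rows of integer values. The function name is hypothetical, and edge rows or columns that do not fill a whole block are simply dropped.

```python
def box_downscale(pixels, factor):
    """Hypothetical helper: shrink a greyscale image (rows of ints) by an
    integer factor, averaging each factor x factor block (rounded down)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [pixels[yy][xx]
                     for yy in range(y, y + factor)
                     for xx in range(x, x + factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Averaging blocks turns fine black-and-white detail into grey, which often reads better on screen than simply dropping pixels.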

> 	Browser/other viewers you'd love to use:

I've yet to find really *good* such tools, probably because my
user-interface tastes are unusual.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse at rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

