Question about PDF manipulation
Jules Richardson
julesrichardsonuk at yahoo.co.uk
Thu Jun 2 05:53:12 CDT 2005
On Wed, 2005-06-01 at 22:28 -0700, Tom Jennings wrote:
> >> Please do not waste any time making new PDF documents,
>
> On Wed, 1 Jun 2005, Eric Smith wrote:
>
> > Please don't waste any time complaining about PDF documents. In many
> > cases, you're lucky to get the data in any form at all.
>
> While I do believe PDFs are often abused and partially deserve
> their bloated reputation (Adobe pushes them too much) they can be
> used properly as containers for multiple-component documents.
>
> It's unfortunate that it's so much work to hand-type or "OCR" old
> documents to produce PDFs as lovely as MSC's.
Well, the best that can be hoped for is that OCR technology will
gradually improve (one of the reasons I don't personally scan stuff as
bi-level). Obviously the ultimate goal would be to have text stored as
text (irrespective of surrounding markup - RTF / Word doc / HTML etc.)
but the technology's probably not quite there yet.
For the near future, the question's probably whether PDF, some other
encapsulation format, or a separate scan-per-page approach is the better
choice (and there are the questions of what resolution / bit depth to
scan at, whether any intermediate processing should be done to correct
skewed pages, etc.)
cheers
Jules