Screw Drivers and PDF Files

der Mouse mouse at Rodents.Montreal.QC.CA
Fri Jun 3 15:06:45 CDT 2005


> If you have to muck with a PDF then the PDF should never have been
> generated.

And that is what I (and others) have been saying all along: please
don't shove scans into a PDF!

> PDF is an END format -- it is assumed that the information and
> graphics are perfect BEFORE creating it.

It thus assumes there will never be any reason to pull it apart.  Even
if the information *is* perfect, that assumption is often false.

> If you have bad PDF files that you have a need to manipulate, blame
> the person who created the PDF, not the PDF format itself!

That's exactly what we have been doing - or more precisely, we've been
blaming those who chose PDF as the format for distributing
documentation scans.

Since no format will be perfect for all uses, choosing a packaging
format which is designed around the assumption that the package will
never be pulled apart is... broken.

> What *is* your need to extract images out of PDF?

Usually, to postprocess the page scan for better results, whatever
"better" means at the moment in question.

For example, a page scan may have a different white point on one side
than the other, and I may want to remove that bias before printing.  I
may want to take a greyscale-scanned page and convert it to bilevel for
printing (the printer's conversion, usually by dithering, will not
always be the best for what I want).  I may want to throw the scan at
some OCR technology.  I may even want to just look at it, or part of
it, on-screen - if you think PDF displayers are always at least as good
at displaying scanned page images as programs designed for image
display, you have either an unrealistically good impression of PDF
displayers or an unrealistically bad impression of image displayers.
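
To make the first two of those concrete, here is a rough sketch in
Python.  It assumes the page has already been extracted from the PDF
into a plain greyscale array (a list of rows of 0..255 values); the
function names, the column-wise white estimate and the fixed threshold
are all illustrative choices of mine, not anything a particular tool
does, and real scans would likely want something smarter.

```python
def flatten_white_point(page):
    """Remove a left-to-right white-point bias from a greyscale page.

    page is a list of rows, each a list of ints in 0..255.
    Assumption: the brightest value in each column is blank paper,
    so it is used as that column's white point.
    """
    width = len(page[0])
    col_white = [max(row[x] for row in page) for x in range(width)]
    return [
        [min(255, round(row[x] * 255 / col_white[x])) if col_white[x] else 0
         for x in range(width)]
        for row in page
    ]


def to_bilevel(page, threshold=128):
    """Convert greyscale to bilevel with a fixed threshold (no dithering)."""
    return [[1 if v >= threshold else 0 for v in row] for row in page]


# Tiny synthetic "scan": paper reads 250 on the left but only 200 on the
# right, with one dark ink pixel on each side.
page = [
    [250, 250, 200, 200],
    [250,  50, 200,  40],
    [250, 250, 200, 200],
]
flat = flatten_white_point(page)
bilevel = to_bilevel(flat)
```

The point is only that once the page is an ordinary image, each of
these steps is a few lines; none of it is possible while the pixels
are locked inside the PDF.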

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse at rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


More information about the cctalk mailing list