Question about PDF manipulation
Jules Richardson
julesrichardsonuk at yahoo.co.uk
Fri Jun 3 06:50:23 CDT 2005
On Fri, 2005-06-03 at 00:16 -0400, der Mouse wrote:
> > - Ran identify again on the resulting TIFF file, and the comment's now
> > changed to: "Image generated by ESP Ghostscript (device=pnmraw)"
>
> > ... so it looks like any TIFF 'metadata' isn't getting preserved.
>
> Worse, it is probably re-rendering the pixels, so unless it's quite
> careful, it's introduced blur due to mismatches between the original's
> pixel boundaries and the output's pixel boundaries. Did you compare
> the pixel contents bit-for-bit?
Sort of. What I tried was taking my original image and the one that
resulted from the conversion from PDF, setting the comment field of the
converted one back to that of the original TIFF image, then running a
checksum on both files.
The checksums were different, but that's probably not alarming; there's
likely a timestamp in there, or extra tags have been added, or the tags
are in a different order, etc.
I think I'll load the two images into a viewer, save the raw pixel data
from each and compare those, which should be a fair test.
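A minimal sketch of that comparison in Python, assuming both images have
already been saved as headerless raw pixel dumps with the same dimensions
and bit depth. The function name and the shape of the result are made up
for illustration; nothing here reads TIFF directly.

```python
def compare_raw_pixels(a: bytes, b: bytes) -> dict:
    """Compare two raw pixel dumps byte for byte.

    Assumes both dumps use the same layout (width, height, bit depth),
    so equal offsets refer to the same pixel sample.
    """
    common = min(len(a), len(b))
    first_difference = None
    differing_bytes = 0
    for i in range(common):
        if a[i] != b[i]:
            differing_bytes += 1
            if first_difference is None:
                first_difference = i
    same_length = len(a) == len(b)
    return {
        "same_length": same_length,
        "differing_bytes": differing_bytes,
        "first_difference": first_difference,
        "identical": same_length and differing_bytes == 0,
    }


def compare_raw_files(path_a: str, path_b: str) -> dict:
    """Read two raw dumps from disk and compare them."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return compare_raw_pixels(fa.read(), fb.read())
```

Unlike a checksum of the whole file, this ignores comments, timestamps
and tag order, and it says roughly how much of the data changed rather
than just that something did.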
> > Looking at the PDF file, I'm not convinced there's any TIFF data in
> > there to be honest. It looks more like the image is re-encoded from
> > the input TIFF to PDFs own way of storing bitmap data - in other
> > words it's not simply a wrapper for a bunch of TIFF images, but
> > merely a wrapper for bitmap data in PDF's own format.
>
> It's not *quite* that simple. I've seen PDFs containing JPEGs which I
> could pick the JPEGs out of simply by looking for the
> \xff\xd8\xff\xe0..JFIF marker.
>
> Unless of course the "PDF's own format" *is* JPEG, which would be both
> surprising and disappointing.
Hmm. I've certainly seen PDFs where the source JPEG data has been far
better quality than the data embedded in the resulting PDF file. I'd
always put that down to the quality of the PDF viewer, but maybe it's the
actual data - maybe PDF always recodes source JPEG data to JPEG
internally (with a drop in quality) and always recodes non-lossy data
(e.g. TIFF) to its own internal non-lossy format.
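One rough way to check a given file is to scan it for the marker der
Mouse mentions. The sketch below is Python; the function name is made up
for illustration, and it only looks for the JFIF header pattern quoted
above, so JPEG data that starts with a different segment (Exif, for
instance) would be missed, and a match says nothing about whether the
data was recoded.

```python
import re

# JPEG start-of-image marker, APP0 marker, two length bytes, then "JFIF".
# DOTALL lets "." match any byte, including a newline (0x0a).
JFIF_HEADER = re.compile(rb"\xff\xd8\xff\xe0..JFIF", re.DOTALL)


def find_jfif_offsets(data: bytes) -> list:
    """Return the byte offsets at which a JFIF header appears in data."""
    return [m.start() for m in JFIF_HEADER.finditer(data)]


def find_jfif_offsets_in_file(path: str) -> list:
    """Read a whole file (e.g. a PDF) and report JFIF header offsets."""
    with open(path, "rb") as f:
        return find_jfif_offsets(f.read())
```

An empty result for a PDF built from a TIFF would fit the idea that the
bitmap was stored some other way; a non-empty one means there is JPEG
data in there that could be cut out and compared with the source.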
cheers
Jules