Question about PDF manipulation
Jules Richardson
julesrichardsonuk at yahoo.co.uk
Thu Jun 2 16:27:29 CDT 2005
On Thu, 2005-06-02 at 16:58 -0400, Paul Koning wrote:
> >>>>> "Jan-Benedict" == Jan-Benedict Glaw <jbglaw at lug-owl.de> writes:
>
> >> Ghostscript reads PDF files every bit as well as PS files, and
> >> it's open source...
>
> Jan-Benedict> You didn't answer my question:-) Consider I prepare a
> Jan-Benedict> TIFF file that contains (with additional tags) eg. some
> Jan-Benedict> raw OCRed text, not read-checked. Now I prepare a PDF
> Jan-Benedict> from this and use gs to get the image back. Is my text
> Jan-Benedict> still there? Or do I get an image that "looks" almost
> Jan-Benedict> the original, but doesn't contain my extra-data?
>
> Oh. I didn't know TIFF could do that; I certainly would never store
> text in a TIFF file, no more than I would store images in a DOC file.
It's a pretty cool format for that kind of thing; I suppose that, like
HTML, there is a bare minimum of tags which any decoder should support,
and a decoder should be able to skip over anything it can't handle and
still output an image.
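To make the "skip what you can't handle" idea concrete, here is a rough
sketch of a reader that walks the first tag directory (IFD) of a TIFF,
keeps the handful of tags it knows, and steps over the rest. The function
names, the tiny KNOWN_TAGS table and the private tag 65000 in the demo
file are all made up for illustration; the tag numbers and the 12-byte
entry layout are as I understand them from the TIFF 6.0 spec. It only
decodes inline SHORT/LONG values, so treat it as a sketch and not a real
decoder.

```python
import struct

# Baseline tags this sketch understands (numbers from the TIFF 6.0 spec).
KNOWN_TAGS = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}


def read_first_ifd(data):
    """Return (known, skipped) for the first IFD of a TIFF held in bytes."""
    # Byte-order mark: b"II" is little-endian, b"MM" is big-endian.
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")

    # An IFD is a 2-byte entry count followed by 12-byte entries.
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    known, skipped = {}, []
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        tag, ftype, n = struct.unpack_from(endian + "HHI", data, entry)
        if tag in KNOWN_TAGS and ftype in (3, 4) and n == 1:
            # A single SHORT (3) or LONG (4) sits inline in the value field.
            fmt = "H" if ftype == 3 else "I"
            (value,) = struct.unpack_from(endian + fmt, data, entry + 8)
            known[KNOWN_TAGS[tag]] = value
        else:
            # Unknown tag: every entry is the same size, so just move on.
            skipped.append(tag)
    return known, skipped


def build_demo_tiff():
    """Hypothetical header-only TIFF: two size tags plus one private tag."""
    header = struct.pack("<2sHI", b"II", 42, 8)
    entries = [
        struct.pack("<HHIHH", 256, 3, 1, 640, 0),       # ImageWidth
        struct.pack("<HHIHH", 257, 3, 1, 480, 0),       # ImageLength
        struct.pack("<HHI4s", 65000, 2, 4, b"OCR\0"),   # made-up private tag
    ]
    next_ifd = struct.pack("<I", 0)                     # no further pages
    return header + struct.pack("<H", len(entries)) + b"".join(entries) + next_ifd
```

Calling read_first_ifd(build_demo_tiff()) should hand back the width and
height and list 65000 as skipped. The point is that the fixed-size
entries let a decoder ignore extra data (OCR text, say) without losing
its place.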
It tends to be let down by bad decoder code though - in particular
decoders typically either can't handle multi-page images (or do so
badly), or they don't support all the common compression schemes out
there.
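As a rough illustration of the two things decoders tend to get wrong:
multi-page TIFFs are a chain of IFDs, each ending in a 4-byte offset to
the next one (zero means "no more pages"), and each page names its
compression scheme in tag 259. The sketch below follows the chain and
reports the scheme per page. The function names and the two-page demo
file are hypothetical, and the table lists only the compression codes I
am sure of; anything else is reported as unsupported.

```python
import struct

# Compression codes from the TIFF 6.0 spec and its common extensions.
COMPRESSION = {
    1: "none",
    2: "CCITT RLE",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    7: "JPEG",
    8: "Deflate",
    32773: "PackBits",
}


def page_compressions(data):
    """Follow the IFD chain and name each page's compression scheme."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")

    pages, seen = [], set()
    # Stop at offset 0, or if a broken file loops back on itself.
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        code = 1  # the spec's default when the Compression tag is absent
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, ftype, n = struct.unpack_from(endian + "HHI", data, entry)
            if tag == 259 and ftype == 3 and n == 1:
                (code,) = struct.unpack_from(endian + "H", data, entry + 8)
        pages.append(COMPRESSION.get(code, "unsupported (%d)" % code))
        # The next-IFD offset sits right after the last entry.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return pages


def build_two_page_demo():
    """Hypothetical two-page TIFF with one Compression tag per page."""
    def ifd(compression, next_offset):
        return (struct.pack("<H", 1)
                + struct.pack("<HHIHH", 259, 3, 1, compression, 0)
                + struct.pack("<I", next_offset))
    header = struct.pack("<2sHI", b"II", 42, 8)
    # Header is 8 bytes and each one-entry IFD is 18, so page two is at 26.
    return header + ifd(4, 26) + ifd(5, 0)
```

On the demo file, page_compressions(build_two_page_demo()) should report
a Group 4 page followed by an LZW page. A decoder that never reads the
next-IFD offset will only ever see the first page.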
cheers
Jules
More information about the cctalk mailing list