Better indexing on bitsavers
Jan-Benedict Glaw
jbglaw at lug-owl.de
Fri May 20 02:46:25 CDT 2005
On Thu, 2005-05-19 23:37:18 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> On Thu, 2005-05-19 at 16:02 -0700, Al Kossow wrote:
> I tend to 'explode' any PDF files of scans (from whatever source) here
> once downloaded into their own directory; I just find it easier to
> manipulate via whatever image tool is most suitable for whatever I'm
> doing at the time, rather than being stuck with a PDF viewer. I suppose
> if I wanted to add metadata to that, I'd include an ASCII text file in
> the directory full of images with the relevant info in (I've done that
> with ROM and Disk images many a time; not needed to do it with Doc scans
> yet*)
That's basically what I did for the TeX+Images -> PDF script, one .TXT
file per image. This way, you can easily distribute work if needed.
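A minimal sketch of that layout in Python, not the actual script: it assumes each scan (say page001.png) sits next to an optional sidecar note (page001.txt) and emits a LaTeX file with one page per image. The function names, the sidecar naming and the page layout are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: these are the scan formats in use; pdflatex can include all three.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

# Characters LaTeX treats specially in running text, with their escapes.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text: str) -> str:
    """Escape special characters one at a time, so nothing is escaped twice."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def build_tex(scan_dir: Path) -> str:
    """Return a LaTeX document with one page per image found in scan_dir."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        r"\begin{document}",
    ]
    # Sort by file name so the page order follows the scan numbering.
    images = sorted(
        p for p in scan_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    for image in images:
        # Hypothetical convention: page001.png is described by page001.txt.
        sidecar = image.with_suffix(".txt")
        note = sidecar.read_text(encoding="utf-8").strip() if sidecar.exists() else ""
        lines.append(r"\begin{center}")
        lines.append(
            r"\includegraphics[width=\textwidth,height=0.9\textheight,"
            r"keepaspectratio]{" + image.name + "}"
        )
        lines.append(r"\end{center}")
        if note:
            lines.append(tex_escape(note))
        lines.append(r"\clearpage")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Writing the result to a .tex file inside the scan directory and running pdflatex on it there would then produce the PDF. Because every page carries its own text file, several people can fill in notes for different pages without touching the same file.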
> Maybe I'm atypical in usage :-) I'll rarely want to download scans and
> *not* keep a copy on local storage just in case, so I've never used the
> "view a PDF file in a web browser" side of things.
No, I consider that normal, fair and expected usage :)
Just an honest question: there is some kind of warez scene
scanning/OCRing/correcting scans of current best-selling books. How do
they do the job? They've basically got to solve the very same problem.
> > If something different comes around, the PDF spec is public, and by using
> > such a small subset it should be simple to translate.
>
> Yep true... plenty of tools already exist to pull PDF files apart. Well,
> you'll be converting all your bitsavers content to futurekeep format
> soon :-)
Pointers to tools? Even though I'm short on time, I'd like to learn more :)
Regards, JBG
--
Jan-Benedict Glaw jbglaw at lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));