Better indexing on bitsavers

Jan-Benedict Glaw jbglaw at lug-owl.de
Fri May 20 02:46:25 CDT 2005


On Thu, 2005-05-19 23:37:18 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> On Thu, 2005-05-19 at 16:02 -0700, Al Kossow wrote:
> I tend to 'explode' any PDF files of scans (from whatever source) here
> once downloaded into their own directory; I just find it easier to
> manipulate via whatever image tool is most suitable for whatever I'm
> doing at the time, rather than being stuck with a PDF viewer. I suppose
> if I wanted to add metadata to that, I'd include an ASCII text file in
> the directory full of images with the relevant info in (I've done that
> with ROM and Disk images many a time; not needed to do it with Doc scans
> yet*)

That's basically what I did for the TeX+Images -> PDF script, one .TXT
file per image. This way, you can easily distribute work if needed.
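
A minimal sketch of that layout, not the actual script: it assumes each
scanned page sits next to a text file with the same base name
(page001.png and page001.TXT), and that pdflatex with graphicx turns the
result into a PDF. The names pair_scans and build_tex, the suffix list
and the naming scheme are all hypothetical.

```python
from pathlib import Path

# Assumed image types; pdflatex can include these directly.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_scans(directory):
    """Return (image, text-or-None) pairs, one per scanned page, sorted by name."""
    pairs = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Hypothetical naming scheme: page001.png -> page001.TXT
        sidecar = image.with_suffix(".TXT")
        text = sidecar.read_text(encoding="utf-8") if sidecar.exists() else None
        pairs.append((image, text))
    return pairs


def build_tex(directory):
    """Emit a LaTeX document with one image per page."""
    lines = [r"\documentclass{article}", r"\usepackage{graphicx}", r"\begin{document}"]
    for image, text in pair_scans(directory):
        if text is not None:
            # Carry the per-page text along as TeX comments.
            lines += ["% " + row for row in text.splitlines()]
        lines.append(r"\includegraphics[width=\textwidth]{" + image.name + "}")
        lines.append(r"\newpage")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Because every page carries its own text file, different people can work
on different pages and simply drop their files into the same directory.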

> Maybe I'm atypical in usage :-) I'll rarely want to download scans and
> *not* keep a copy on local storage just in case, so I've never used the
> "view a PDF file in a web browser" side of things. 

No, I consider that normal, fair and expected usage :)

Just an honest question: there is some kind of warez scene
scanning/OCRing/correcting scans of current best-selling books. How do
they do the job? They've basically got to solve the very same problem.

> > If something different comes around, the PDF spec is public, and by using
> > such a small subset it should be simple to translate.
> 
> Yep true... plenty of tools already exist to pull PDF files apart. Well,
> you'll be converting all your bitsavers content to futurekeep format
> soon :-)

Any pointers to tools? Even though I'm short on time, I'd like to learn more :)

Regards, JBG

-- 
Jan-Benedict Glaw       jbglaw at lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));


