Better indexing on bitsavers

Jan-Benedict Glaw jbglaw at lug-owl.de
Fri May 20 07:05:36 CDT 2005


On Fri, 2005-05-20 11:36:24 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> On Fri, 2005-05-20 at 09:31 +0200, Jan-Benedict Glaw wrote:
> > On Thu, 2005-05-19 22:20:53 +0000, Jules Richardson <julesrichardsonuk at yahoo.co.uk> wrote:
> > > On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
> > > > I'm still thinking about how paper-based documentation can be made up
> > > > cleverly enough to gain text as well as images and mixing meta-data into
> > > > that. Maybe I'd do some C programming and hack something nice producing
> > > > PDF files holding everything? But first, I'd need to understand PDF
> > > > (whose specification actually is about 8cm thick...)
> > > 
> > > Doesn't this sort of imply that PDF is the wrong choice of format for
> > > jobs like these? (plus I'm pissed at Adobe because their current reader
> > > for Linux eats close on 100MB of disk space just to let me read a PDF
> > > file :-)
> > 
> > There are alternatives, like:
> > 
> > 	- A tarball containing all the TIFF (or whatever) images as well
> > 	  as some (generated) 
> 
> See my other post; that's my preference and what I tend to do with all
> image-based PDF content I download from anywhere anyway...

For the record (and my education), how do you extract these?

> > HTML page (containing some kind of slide
> > 	  show) as well as a small description file (use this with some
> > 	  program (to be written) to generate the HTML file(s)).
> > 
> > 	  This gives the chance that the description file can be done
> > 	  quite cleverly, so you'll get e.g. a clickable index for the TIFF
> > 	  files (though, needs to be done manually, but now this work
> > 	  load can actually be *distributed*)
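
To make that a bit more concrete, here is a rough sketch of such a
generator. Nothing of this exists yet: the description format (one
"filename<TAB>title" line per image, "#" for comments) and the name
build_index are simply made up for illustration.

```python
import html


def build_index(description: str) -> str:
    """Turn 'filename<TAB>title' lines into an HTML page with a clickable index.

    The description format is an assumption made for this sketch only.
    """
    items = []
    for line in description.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        filename, _, title = line.partition("\t")
        title = title or filename  # fall back to the file name if no title is given
        items.append('<li><a href="{}">{}</a></li>'.format(
            html.escape(filename, quote=True), html.escape(title)))
    return "<html><body>\n<ol>\n{}\n</ol>\n</body></html>\n".format("\n".join(items))
```

Since the description file is plain text, different people could each
type up the index for a different manual and just mail it in.
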
> 
> One of the things that I was working on a few years back was layering
> multiple delivery mechanisms over one form of content (where the dataset
> was sufficiently large that storage in multiple formats wasn't
> justified). 
> 
> Data was kept in the "purest" form on the server side, and a client
> could ask for content in whatever format they wanted (in this case raw
> images, PDF, HTML etc.) and over whatever interface mechanism they
> wanted (HTTP, FTP, WAP, email, network filesystem etc.)

Actually, I was working on something like that as well, but with a
different ulterior motive: build something like this as a redundant,
peer-to-peer capable database and many of the archiving-old-data problems
just vanish. (Indeed, it would make up a nice P2P system as well.)

> I could see some of the big archives around the planet (regardless of
> content) going this way in the future; user base is maximised through
> offering different formats whilst the "pure" dataset is all that's
> backed up and actually kept on disk.

That's what I dream about in long nights, though not as a centralized
database but as a distributed one. Imagine you shift in a raw-encoded
audio CD (with all the interleaved stuff intact, even containing all the
mischievous, intentional errors) ...and a front-end de-interleaving it
into WAV files (which another front-end could use to produce
ogg/mp3/wma/you name it).

Concepts like that can be applied to nearly all media types. Just record
what the underlying recording/reading machinery gets and write
filters for that.  These filters may get as complex as filesystem
drivers. In fact, this *is* a layered filesystem. Remember the thread(s)
about how to rescue tapes? ...or other HDD images? Apply these general
concepts and things may get easier :)
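
As a toy illustration of the layering (not a real CD decoder): the
"raw" format below is invented, with 8-byte blocks of which only the
first 6 bytes are payload, and the names strip_framing, pcm_to_wav and
stack are hypothetical. Only the wave module calls are real.

```python
import io
import wave
from functools import reduce


def strip_framing(raw: bytes, block: int = 8, payload: int = 6) -> bytes:
    """Lowest layer (made-up format): keep the first `payload` bytes of each block."""
    return b"".join(raw[i:i + payload] for i in range(0, len(raw), block))


def pcm_to_wav(pcm: bytes, rate: int = 44100, channels: int = 2) -> bytes:
    """Next layer: wrap 16-bit PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16 bits per sample
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()


def stack(*layers):
    """Compose filters bottom-up, like stacking filesystem layers."""
    return lambda data: reduce(lambda acc, layer: layer(acc), layers, data)


# The archive keeps only the raw dump; WAV is derived on request.
raw_to_wav = stack(strip_framing, pcm_to_wav)
```

Another filter (WAV to ogg, say) would just be one more entry in the
stack, and the raw dump stays the only thing that needs to be stored.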


For now (as we're not (yet) there), providing space isn't my main
problem. It's about having (time to write) the software.

MfG, JBG

-- 
Jan-Benedict Glaw       jbglaw at lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));



More information about the cctalk mailing list