Better indexing on bitsavers
Randy McLaughlin
cctalk at randy482.com
Thu May 19 16:32:46 CDT 2005
From: "Al Kossow" <aek at bitsavers.org>
Sent: Thursday, May 19, 2005 4:12 PM
> On a separate post I mentioned cross support/cross linking. It was my
> clumsy way of saying indexing. It would be nice if people pitched in and
> just did it. I may make a list of all of the chips listed in the individual
> PDFs that Al has posted for the westerndigital datasheets. If Al then
> posts this index with the PDFs (or creates an index folder) so the
> googlebot can scan it then a google search would point you where to get
> it.
>
> Simple with the task easily shared among many people.
>
> --
>
> That would be a wonderful thing. I have a HUGE backlog of scanned databook
> material, and just finished picking up almost 40 book boxes
> of 70's -> 90's data books from a third large collection.
>
> The first was from the databook collection of Haltek Electronics (RIP),
> the second from a private collection that was given to us with the promise
> that it would be scanned, and now this additional one.
>
> (I've found a few interesting things in the last lot already: a copy
> of the Fairchild '69 data book, a book by Gnostic Concepts on early
> 70's memory technology, and two of the classic error correcting codes
> books)
>
> There is no way I'm going to have time to OCR or index this. A simple
> text file per PDF with part number and page number would be wonderful.
> This is also the sort of data that Google seems to index REALLY well.
> Watching the hits on bitsavers, almost everyone finds the archive by
> stumbling upon the 'whatsnew.txt' or 'Index.txt' files.
>
> I'd be interested in suggestions for what books should be higher on the
> post-processing queue too. I probably have 50 databooks scanned but not
> PDFed right now. I've been concentrating mostly on getting the classic
> early stuff done first (2nd Edition TI TTL Data Book, etc.)
One extra caveat: when listing page numbers, include both the printed page
number and the PDF's declared page number, since the two often differ.
As I said, when someone downloads a copy to look for something and finds there
is no index, just make one and send it back to Al. By sharing the load, no one
has to "do it all right now by yourself".
I highly recommend a file format that can be concatenated into a "master
index", so each entry should carry the PDF filename plus page numbers. Each
index file should be named the same as its PDF with a different extension, and
the field widths should be standardized.
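To make the naming and concatenation idea concrete, here is a rough sketch in
Python. It is only an illustration: the ".idx" extension, the function names,
and the field order are my assumptions, not anything agreed on. Printed page
numbers are kept as text because databooks often number pages by section
(for example "3-17"), while the PDF page is a plain integer.

```python
from pathlib import PurePosixPath

# Hypothetical extension; the proposal only says "a different extension".
INDEX_SUFFIX = ".idx"


def index_name_for(pdf_name):
    """Return the index filename: same name as the PDF, different extension."""
    return str(PurePosixPath(pdf_name).with_suffix(INDEX_SUFFIX))


def master_index(per_pdf):
    """Concatenate per-PDF indexes into one sorted master index.

    per_pdf maps a PDF filename to a list of
    (part_number, printed_page, pdf_page) entries.
    Each master row is (part_number, pdf_filename, printed_page, pdf_page),
    so a part can be traced back to the file that contains it.
    """
    rows = []
    for pdf_name, entries in per_pdf.items():
        for part, printed_page, pdf_page in entries:
            rows.append((part, pdf_name, printed_page, pdf_page))
    # Sorting by part number first groups every mention of a chip together.
    return sorted(rows)
```

Because every row already carries its PDF filename, merging is nothing more
than appending the rows and sorting, which is what keeps the work easy to
split among volunteers.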
Al can decide whether it should be text only, HTML, Excel, etc. If there is a
vote, I vote for text only: either comma-delimited (with quotes when needed) or
fixed-width fields, which would be easier to read.
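For comparison, here is a small Python sketch of the two text-only layouts.
The field names, the column widths, and the function names are assumptions
picked for illustration; whatever gets standardized would replace them. The
comma-delimited version relies on the standard csv module, which adds quotes
only where a field needs them.

```python
import csv
import io

# Assumed field names and column widths; nothing here is an agreed standard.
FIELDS = ("part", "pdf_file", "printed_page", "pdf_page")
WIDTHS = (16, 32, 8, 6)


def to_csv(rows):
    """Render rows as comma-delimited text, quoting only when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def to_fixed_width(rows):
    """Render rows as fixed-width columns, padded with spaces.

    ljust() pads but never truncates, so a value longer than its column
    pushes the later columns on that line to the right.
    """
    lines = []
    for row in (FIELDS, *rows):
        cells = [str(value).ljust(width) for value, width in zip(row, WIDTHS)]
        lines.append("".join(cells).rstrip())
    return "\n".join(lines) + "\n"
```

The comma-delimited form survives long or odd part descriptions without any
tuning, while the fixed-width form is easier on the eyes but needs widths
chosen generously enough for the longest entry.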
Randy
www.s100-manuals.com