20,046 page doc archive still available

John Foust jfoust at threedee.com
Fri Sep 3 20:01:51 CDT 2004


Back in March 2001 I posted about a cache of 20,046 pages
of scanned docs I received from someone on the net.  
See the TOC below, followed by his explanation of how 
he did it.

It consumed several CD-Rs, compressed.  I now have a DVD burner 
as well, so I'd be glad to make copies on new or old media.  
(It is actually all available on a hidden web page that 
I disclose if someone sends me a pointed email, but I'd 
hate to stress my little T-1.)

Anyone care to upgrade it to OCR'd PDF or whatever would be
considered a next-best method of preservation and search-ability?
I know it's possible with a handful of Linux tools, but my to-do list 
is already too long.

- John

I've made contact with a guy who's scanned 20,046 pages of the
docs listed below, at 300 to 400 DPI.  He first told me about the
UCSD p-System docs he'd scanned.  Below the list is his description
of the process he followed.

I'm planning to get a copy of what he has and burn it to CD-R.
Does anyone else have an interest in these docs, or have any 
ideas about distribution without massive copyright violation?

- John

    6502
	MOS 6502 datasheet
	6502 Assembly Language Subroutines (Leventhal)

    AMD
	AMD 29000 Memory Design Handbook
	Am29027 Arithmetic Accelerator
	Am29C327 Floating Point Processor

    Data General
	C Language Reference Manual
	GATE User's Manual
	AOS/VS Internals Manual
	AOS/VS Programmer's Manual, volume 1
	AOS/VS System Calls Dictionary
	CEO User's Manual
	Eclipse 32-bit Principles of Operation
	Eclipse 32-bit System Functional Characteristics
	Fortran-77 Environment Manual
	Fortran-77 Reference Manual

    Fairchild
	Clipper User's Manual

    IDT
	RISC System Programmer's Guide
	R3000 Assembly Language Programmer's Guide
	R3000 Hardware User Manuals
	R3000 Language Programmer's Guide
	High-speed CMOS databook

    Motorola
	68000 Family Reference
	68020 User's Manual
	68851 User's Manual
	88100 User's Manual
	88200 User's Manual
	Linear Interface Integrated Circuits

    NCR
	53C90A/B Advanced SCSI Controller (2 different manuals)
	53C94/5/6 databook
	53CF94/96-2 Fast SCSI Controller
	Disk Array Controller Firmware
	Disk Array Controller Hardware
	Disk Array Controller Software
	Floppy Disk Controller (SCSI-to-FD)

    National Semiconductor
	NS32532 Datasheet
	Series 32000 Programmer's Reference Manual
	DP8490 Enhanced Asynchronous SCSI Interface
	NS32CG16 Programmer's Reference Supplement
	Graphics Handbook
	Series 32000 Databook
	DRAM Management databook
	Embedded Controller Databook

    Ohio Scientific
	C4P User's Manual (2 different manuals)
	65V Programmer's manual
	Schematics for:
	    502 CPU board
	    505 CPU board
	    527 24K memory board
	    540 Video board
	    542 Polled Keyboard

    Pinnacle Systems
	2 User's manuals for their 68k machine (My P-system machine)

    P-system manuals IV.12

	Operating System Reference
	Program Development Reference
	Application Development Guide
	Fortran 77 Reference
	Assembler Reference

    Weitek
	WTL4167 Floating-Point Coprocessor datasheet

Most of these are from about 1988 to 1992, with the exception of the OSI
documentation, of course, which is from 1979.

---

> What sort of process did you follow?  What sort of devices?

As far as the process, I scanned a manual in and checked to make sure
all the pages were there. If they weren't, I'd scan the pages that
didn't make it, and go through all the pages again. I'll admit this is a
little anal, but better safe than sorry. (When you're using a lot of
shell scripts, you never know if you accidentally deleted a page with an
"mv" command.) When all the pages were there, I'd go through the manual
one more time to check for general quality (no folded corners, no torn
pages, etc.). If all was good, the manual would be moved to the directory
that would be the root directory of my CD-ROM. That's pretty much it.
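
The "are all the pages there" check above could be sketched roughly as
follows in Python. This is only an illustration: the one-file-per-page
naming scheme (page_0001.tif and so on) and the function name are
assumptions, and the original shell scripts are not shown in this post.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: one file per page, e.g. page_0001.tif.
PAGE_RE = re.compile(r"^page_(\d+)\.tif$")

def missing_pages(manual_dir, expected_count):
    """Return the sorted page numbers in 1..expected_count with no scan file."""
    found = set()
    for entry in Path(manual_dir).iterdir():
        match = PAGE_RE.match(entry.name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(1, expected_count + 1)) - found)
```

An empty list would mean every expected page has a file; anything else
is the list of pages to rescan before the next pass.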

The big manuals of more than 1000 pages really sucked, because I'd
generally have to make 3 or more passes to get those completely correct.
If I was going to do it again, I'd probably break the larger manuals
into smaller chunks to avoid this problem.

One thing that made the whole process a lot easier was the netpbm
utilities. I wrote a script to convert the manuals from ~2500x3300 TIFs
to ~500x600 GIFs. My machine takes about 2 seconds to process a 300-400
DPI TIF, but only a fraction of a second for a 75 DPI GIF. I'd run my
script, then do something else for a while. When it was done, I could
flip through the GIFs with GQview and inspect about 2-4 pages per
second. That saved a lot of time.
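
A conversion step like the one described above might look something
like this Python sketch, which only builds the netpbm pipeline for one
page. The function names and the 75 DPI default are assumptions, and
the exact tools in the original script are not given in this post;
tifftopnm, pnmscale, ppmquant and ppmtogif are classic netpbm programs,
but names vary between netpbm releases.

```python
import shlex

def scale_factor(source_dpi, target_dpi=75):
    """Linear scale factor that takes a source_dpi scan down to target_dpi."""
    return target_dpi / source_dpi

def preview_command(tif_path, gif_path, source_dpi):
    """Build a shell pipeline that turns one TIFF page into a small GIF preview."""
    factor = scale_factor(source_dpi)
    return (
        f"tifftopnm {shlex.quote(tif_path)}"
        f" | pnmscale {factor:g}"     # shrink both dimensions by the same factor
        f" | ppmquant 256"            # GIF allows at most 256 colors
        f" | ppmtogif > {shlex.quote(gif_path)}"
    )
```

The resulting string could be handed to subprocess.run(cmd, shell=True,
check=True) in a loop over the pages of a manual, leaving the full-size
TIFs untouched.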

I assume that, by "devices", you mean what type of scanners I used. I
started with an HP 6350cse (with ADF) that I bought for this very
purpose. However, having never owned a scanner before, I was a little
disappointed with how slow the "fast" scanners are. Fortunately, imaging
is an integral part of the software my company sells and, as luck would
have it, we were demoing a new scanner from Fujitsu. This thing
literally does 60 pages/min at 300 dpi - *both* sides. It's about half
that fast at 400 dpi, which I had to use for the IC databooks to get the
fine print. Needless to say, I did most of my scanning on that.

By the way, to date, I've processed 20046 pages. I'm kinda burned out,
though, so it'll be a while before I do any more. 



