Better indexing on bitsavers

Randy McLaughlin cctalk at randy482.com
Thu May 19 17:48:18 CDT 2005


From: "Jules Richardson" <julesrichardsonuk at yahoo.co.uk>
Sent: Thursday, May 19, 2005 5:20 PM


> On Thu, 2005-05-19 at 23:46 +0200, Jan-Benedict Glaw wrote:
>> On Thu, 2005-05-19 16:32:46 -0500, Randy McLaughlin <cctalk at randy482.com> wrote:
>> > One extra caveat would be when listing page numbers both the printed page
>> > numbers and the PDF's declared page numbers should be included.
>>
>> Some time ago, I worked on a TeX skeleton (generated with a script's
>> aid) to produce a PDF file with nice bookmarks and the like.
>> However, I came to the conclusion that this isn't a real solution.
>>
>> I'm still thinking about how paper-based documentation can be made up
>> cleverly enough to gain text as well as images and mixing meta-data into
>> that. Maybe I'd do some C programming and hack something nice producing
>> PDF files holding everything? But first, I'd need to understand PDF
>> (whose specification actually is about 8cm thick...)
>
> Doesn't this sort of imply that PDF is the wrong choice of format for
> jobs like these? (plus I'm pissed at Adobe because their current reader
> for Linux eats close on 100MB of disk space just to let me read a PDF
> file :-)
>
> It might be good for text-based documents (offering text searching and
> the like), but is it necessarily the right thing for collections of page
> scans?
>
> cheers
>
> J.

PDF is a terrible way to package the documents.  It's just better than any
other practical method ;-)

So many talk about ASCII being the only "right way", but as Al can attest,
time and accuracy make image-oriented PDFs the way to go.

Finding errors in OCR'ed files is extremely time consuming.


Randy
www.s100-manuals.com 



