Many things

Antonio Carlini a.carlini at ntlworld.com
Mon Jan 31 17:49:29 CST 2005


> Although it doesn't really know what text is per se, one of its
> algorithms is to find glyph-like things. Once it has all glyph-like
> things isolated on a page, it compares them all to each other, and if
> two glyphs are similar enough, it will just represent them both (or N
> of them) with one compressed glyph image.

That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.
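To make the point concrete, here is a minimal sketch of that kind of glyph matching. It assumes tiny 5x5 bitmaps, a plain pixel-mismatch count as the similarity measure, and a greedy "first close-enough representative wins" rule. All names (substitute, E_GLYPH, C_GLYPH, threshold) are hypothetical, and this is not how any particular codec actually measures similarity:

```python
# Hypothetical 5x5 glyph bitmaps; '#' is ink, '.' is paper.
E_GLYPH = (".###.", "#...#", "#####", "#....", ".###.")
E_NOISY = (".###.", "#...#", "#####", "#....", ".####")  # one stray pixel
C_GLYPH = (".###.", "#....", "#....", "#....", ".###.")  # a different letter

def bits(glyph):
    """Flatten a bitmap into a list of booleans (True = ink)."""
    return [c == "#" for row in glyph for c in row]

def mismatch(a, b):
    """Count the pixels on which two same-sized glyphs disagree."""
    return sum(x != y for x, y in zip(bits(a), bits(b)))

def substitute(glyphs, threshold):
    """Greedy pattern matching and substitution (a sketch, assumed rules).

    Each glyph is compared with the representatives kept so far. If it
    differs from one by at most `threshold` pixels, it is replaced by that
    representative; otherwise it becomes a new representative."""
    reps, out = [], []
    for g in glyphs:
        for r in reps:
            if mismatch(g, r) <= threshold:
                out.append(r)  # the original image is thrown away here
                break
        else:
            reps.append(g)
            out.append(g)
    return out, reps

page = [E_GLYPH, E_NOISY, C_GLYPH]
strict, _ = substitute(page, threshold=1)
loose, _ = substitute(page, threshold=5)
print(strict == [E_GLYPH, E_GLYPH, C_GLYPH])  # the "c" survives
print(loose == [E_GLYPH, E_GLYPH, E_GLYPH])   # the "c" is now an "e"
```

With a tight threshold only the noisy "e" is merged. With a loose one the "c" (5 pixels away) is silently replaced by an "e", and nothing in the output records that it was ever a "c".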

> So for OCR purposes, I don't think this type of compression really
> hurts -- it replaces one plausible "e" image with another one.

But one of them might have been something other than an "e".

Antonio

-- 

---------------

Antonio Carlini arcarlini at iee.org





More information about the cctalk mailing list