[FoRK] image processing for OCR
Eugen Leitl <eugen at leitl.org>
Mon Jul 3 06:38:39 PDT 2006
On Mon, Jul 03, 2006 at 02:07:40PM +0100, Andy Armstrong wrote:
> Are you looking to automate this or is it a one off?
Automate, yes. Something like running a batch over an incoming directory
on a server, before plugging the result into FineReader or OmniPage.
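
A minimal sketch of what such a batch step might look like, assuming the
scans arrive as .tif files and the cleanup itself is some function that
maps raw file bytes to raw file bytes. The names process_incoming and
clean_page are made up for illustration, and the actual image filter is
left as a placeholder.

```python
from pathlib import Path


def process_incoming(incoming, outgoing, clean_page):
    """Run clean_page over every .tif in incoming, writing results to outgoing.

    clean_page is a hypothetical callback: bytes in, bytes out.
    Returns the list of files written on this run.
    """
    incoming, outgoing = Path(incoming), Path(outgoing)
    outgoing.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(incoming.glob("*.tif")):
        dst = outgoing / src.name
        if dst.exists():
            continue  # already handled on an earlier run
        dst.write_bytes(clean_page(src.read_bytes()))
        written.append(dst)
    return written
```

The OCR package would then be pointed at the output directory rather than
at the raw scans.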
> I think the easiest way to do it programmatically is to scan each
> raster turning filling on and off as you cross filled pixels. That
> implements the effect of filling the paths using an odd/even winding
> rule - which is what you want for text.
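
For reference, the raster toggle described above can be sketched roughly
as follows. This is only an illustration under stated assumptions: the
page is a list of rows of 0/1 pixels, each run of consecutive set pixels
counts as one crossing (so a thick outline toggles only once), and a row
with an odd number of runs is left untouched as a crude guard. The names
fill_row and fill_outlines are invented here. It does not deal properly
with outline segments that run along the raster, and it is applied to the
whole page rather than only to the damaged regions.

```python
def fill_row(row):
    """Even-odd fill of one raster; row is a list of 0/1 pixels."""
    # Collapse each run of set pixels into one (start, end) crossing,
    # so an outline several pixels thick toggles the state only once.
    runs = []
    x = 0
    while x < len(row):
        if row[x]:
            start = x
            while x < len(row) and row[x]:
                x += 1
            runs.append((start, x))  # end is exclusive
        else:
            x += 1
    # Assumption: an odd number of runs means the raster touched a
    # horizontal segment or a broken outline; leave such rows alone.
    if len(runs) % 2:
        return list(row)
    out = list(row)
    # Fill the gap between each pair of crossings: 1st-2nd, 3rd-4th, ...
    for (_, left_end), (right_start, _) in zip(runs[0::2], runs[1::2]):
        for x in range(left_end, right_start):
            out[x] = 1
    return out


def fill_outlines(image):
    """Apply the per-raster fill to every row of a 0/1 image."""
    return [fill_row(row) for row in image]


# A hollow rectangle standing in for an outlined glyph.
outline = [
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
]
filled = fill_outlines(outline)
```

The odd-run guard only covers the simplest case. A raster that grazes a
horizontal segment and also crosses other strokes can still have an even
number of runs, and will then be filled wrongly.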
Unfortunately, it has to be a bit more intelligent than that. Only
parts of the page have the stupid artefact, the others are fine.
(The error rate is still lousy, though; there's definitely a need
for an IUPAC proofreader.) There *must* be an off-the-shelf filter
for it already. Either it's too old-skool for the web, or
I don't know the proper terminology.
> The main problem then is handling the special case of the horizontal
> path segments at the tops and bottoms of letters - you only want to
> toggle filling where a path crosses the raster rather than where it
> just runs along the raster for a bit. The fact that your letter
> outlines are probably more than one pixel thick slightly complicates
> detecting that case.
I don't think this will work for http://eugen.leitl.org/sample.tif,
especially since the bottom of the scan is good quality.
> Is that what you need to do? If so I'll try to provide more detail :)
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE