Scanning software

MousePotato

New member
Can anyone suggest decent scanning software that works well for text documents and doesn't cost me an arm and a leg?
I've an older version of OmniPage Pro that is kinda crappy in that it
messes up text and doesn't display it the way the original looks.

Anyway, just wondering if someone has any suggestions for this ol' potato, thx in advance.
 
Re: Scanning software

I've been using TextBridge from ScanSoft for some time and find that while the text recognition is "decent", the formatting of the final document is often bad. In my experience, reducing font sizes by 0.5 points often helps (the damn thing doesn't seem to know about half points). The little tabular information I tried to get recognized needed a lot of manual changes. It works best with ordinary text.

TextBridge is part of the Pagis software, which lets you pre-scan as much material as you want and store it in XIF files (a layered multi-page image format giving very high compression) for recognition later. The scanning software can orient the scanned material automatically and is pretty good at it. Sometimes, though, if you scan text with images at too high a resolution, some parts of images (such as feathers in birds) can be assumed to be "text" and the orientation may suffer; in that case, image enhancements (such as 'make text black') will cause the parts of images assumed to be text to become black... The Pagis package has a XIF image editor with a nice tool for finely re-orienting any pages that were not properly oriented. It also allows adding and moving pages around, cropping, and a few other very basic image manipulations.

ScanSoft (www.scansoft.com) acquired OmniPage about a year ago and the latest version has just been released. Before that, I had read that OmniPage was better than TextBridge overall (the best, actually), particularly with tabular information.

While I find TextBridge and Pagis to be useful, I find that there is not enough control over many things, and certain things need trial and error to discover how they work (see example below). Overall, I find that a lot of manual changes are necessary to achieve perfection. But if text recognition is all you need (you want to do the formatting yourself, which is often faster than trying to fix the formatted output), then it's pretty good.

As with most software today, the documentation describes the obvious and you must discover the rest by yourself. Here's one example, about pre-scanning and XIF editing. You can either use your TWAIN interface or let Pagis directly control the scanner. If Pagis controls the scanner directly, you get one more property page in the scanning settings. This lets you set the mode (text and images, text only, or complete image without text/image discrimination), the resolution, etc. I wanted both image and text, and the possibility to manually reorient scans in the editor if necessary.

I discovered that I could only rotate by 90, 180 or 270 degrees if using the two modes that separate text and images. To be able to reorient by any angle I needed to use the 'image' mode (text/image discrimination can be done later on). But I also wanted to use my TWAIN interface for more control over the scanning pre-process. The problem is, the 'image' setting is only selectable if you let Pagis directly control the scanner. I found out that the mode can be selected and will remain set if you later switch to the TWAIN interface. But for this to work I needed to run the scanner setup wizard every time I switched the "use/don't use TWAIN interface" setting and LET IT DO ALL TEST SCANS, or I would later get freezes or improper image data. When changing from direct control to TWAIN, one scan is required. The other way around, 4 scans are necessary.

If I'm already using the TWAIN interface and want to reliably change the scanning mode, I therefore need to let it do 4 scans to switch away from the TWAIN interface, then change the mode, then do one more scan when re-enabling the TWAIN interface (because this involves running the setup wizard twice). This is insanely inefficient, but once you have figured this out you know how to better control scanning in Pagis, unless you always let it control the scanner directly.
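To put the sequence above in one place, here is a small Python sketch that just counts the steps and test scans. It is only a model of what I described: the function and dictionary names are made up for illustration and nothing in it is a real Pagis or TWAIN API.

```python
# Hypothetical model of the workflow described above.
# None of these names come from Pagis; they only mirror the steps in the post.

# Test scans the setup wizard asks for when switching interfaces.
WIZARD_SCANS = {
    ("direct", "twain"): 1,  # direct control -> TWAIN: one test scan
    ("twain", "direct"): 4,  # TWAIN -> direct control: four test scans
}


def change_mode_steps(interface, new_mode):
    """Return (steps, total_test_scans) for reliably changing the scan mode.

    interface is "twain" or "direct" (the interface currently in use).
    """
    steps = []
    scans = 0
    if interface == "twain":
        # The mode page is only available under direct control,
        # so TWAIN has to be switched off first.
        scans += WIZARD_SCANS[("twain", "direct")]
        steps.append("disable TWAIN, run setup wizard (4 test scans)")
    steps.append(f"select mode '{new_mode}'")
    if interface == "twain":
        # Switch back; the selected mode is kept.
        scans += WIZARD_SCANS[("direct", "twain")]
        steps.append("re-enable TWAIN, run setup wizard (1 test scan)")
    return steps, scans


if __name__ == "__main__":
    steps, scans = change_mode_steps("twain", "image")
    for step in steps:
        print(step)
    print("test scans needed:", scans)
```

Starting from TWAIN this adds up to 5 test scans for a single mode change, versus none if Pagis already controls the scanner directly.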

What I hope for you is that they took the best of TextBridge/Pagis/OmniPage, improved on it and put that back into OmniPage. Pagis/TextBridge don't seem to get updated anymore, which seems to confirm the superiority of OmniPage. Unfortunately OmniPage appears to be very expensive and I decided not to upgrade for now (even at a discount) because I don't do that much text recognition and TextBridge is good enough already, supporting 56 languages, which is more than I require.

Bottom line is, if you don't like your version of OmniPage your only option might be to upgrade to the latest version and cross your fingers. Just like voice recognition, text recognition is improving and will continue to improve but it will never be perfect. You will always need to do some touch-ups and re-read everything.

Regards, ../Klingon
 