Optimizing OCR Results

OCR accuracy can be affected by a variety of factors, such as documents scanned at an angle. The image clean-up and language settings described below can improve the accuracy of OCR processing.

Note: Image processing settings do not permanently alter images. They are applied only to optimize OCR performance.

Potential Problem: Skewed Text
Description: Images scanned at a slight angle may produce less accurate OCR results. OCR processing is most effective when text is level on the horizontal plane.
Optimization: Select Deskew image. This option can be configured from the Image Clean-up Options dialog box.

Potential Problem: Text Orientation
Description: Text orientation is extremely important in determining whether text will be properly recognized. English text can only be recognized when characters run from left to right.
Optimization: Select Rotate image. If you are not sure how your images should be rotated, select the Automatically option. This option can be configured from the Image Clean-up Options dialog box.

Potential Problem: Dirty Images
Description: A dirty image contains speckles, which are dots artificially added to an image by a scanner. Speckles can interfere with OCR accuracy.
Optimization: Select Despeckle image. Use the Specify the maximum size of noise to be removed (in pixels) option to optimize how your images are cleaned up. Click Preview to view how an image in the currently selected document will be cleaned up. These options can be configured from the Image Clean-up Options dialog box.

Potential Problem: Lines
Description: Lines can touch or run through words on a page. When this occurs, those words may not be recognized when the image is processed by OCR.
Optimization: Select the orientation of the lines that will be removed from the image. You can choose to remove horizontal lines, vertical lines, or both. Once the lines have been removed, the characters that intersected or touched them will be reconstructed to their original shape. This option can be configured from the Image Clean-up Options dialog box.

Potential Problem: Language
Description: OCR processing can be optimized for a particular language, which causes it to prefer certain types of characters when it is uncertain what a character should be.
Optimization: Select the language of the majority of the text on the page. This option can be configured from the Optical Character Recognition dialog box.
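
To make the despeckle setting more concrete, the following Python sketch shows one way a maximum noise size can work in principle. It is an illustration only, not the product's implementation: the function name despeckle, the representation of a page as a grid of 0 (background) and 1 (ink) values, and the reading of "size" as the number of pixels in a connected blob are all assumptions made for this example.

```python
from collections import deque


def despeckle(image, max_noise_size):
    """Return a copy of a binary image with small ink blobs erased.

    image: list of rows, each a list of 0 (background) or 1 (ink).
    max_noise_size: blobs containing this many pixels or fewer are
    treated as noise (hypothetical stand-in for the dialog option).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # never modify the input image
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one 8-connected blob of ink.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the blob only if it is small enough to count as noise.
            if len(blob) <= max_noise_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


# A tiny page: one 2x2 "character" and two isolated one-pixel specks.
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
cleaned = despeckle(page, max_noise_size=1)
```

With a maximum size of 1, only the two isolated specks are erased and the 2x2 block survives. Raising the value to 4 would erase the block as well, which is why previewing the result before settling on a value is worthwhile.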

To optimize OCR results

  1. Click Options from the Tools drop-down menu.
  2. In the Options dialog box, under Generate Text, click General.

  3. Under Settings, use the Language option to select the language of the text being optimized for OCR.
  4. Make sure the Perform image enhancement checkbox is selected.
  5. Click Configure to open the Image Clean-up Options dialog box. From here, determine how images will be enhanced to improve processing.
  6. In the Optimization priority option, select Accuracy, Speed or Balance (learn more about optimization priority).
  7. Click OK.

To optimize OCR results in the Laserfiche web client

  1. Click Settings on the upper right of the Laserfiche web client window.
  2. In the Settings dialog box, under Generate Text, click OCR Settings.

  3. Use the Language option to select the language of the text being optimized for OCR.
  4. In the Optimization priority option, select Accuracy, Speed or Balance (learn more about optimization priority).
  5. Make sure the Perform image enhancement checkbox is selected.
  6. Configure image enhancements, such as deskew and despeckle, as appropriate.
  7. Click Save.
