OmniPage Zone OCR

OmniPage Zone OCR performs Optical Character Recognition (OCR) to generate text from a specified area of an image. It can be used in Pre-Classification Processing, First Page Identification, Page Processing, Last Page Identification, or Post-Processing. You draw a box on a sample image, or specify its coordinates, and text is extracted from that area on every page the process handles. The extracted text is most commonly used to identify documents and to populate fields automatically. It can also be used to name documents and create folders.

Note: OmniPage Zone OCR does not associate any text with the document. It just reads and gives you access to what it reads in a token. If you want text associated with the document, use OmniPage OCR.

Example: The City of Wonderland configures a session to process building permit applications. Though they are mixed in with other documents, the applications all have "Building Permit Application" written at the top of the first page. They create a document class called "Permit Application" and configure an OmniPage Zone OCR process in the First Page Identification stage to generate text from a region at the top of the page. A page is identified as belonging to the class if that text contains the words "Building Permit Application".

Example: The City of Wonderland also wants to retrieve other information from the building permit applications and insert it into the fields. They create OmniPage Zone OCR zones in the regions that contain the date and the name of the applicant and insert tokens representing those zones into the document fields. When the documents are processed, the data will be automatically entered into the fields.
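
The two examples above can be sketched in a few lines of Python. This only illustrates the logic and is not Quick Fields code: the dictionary of zone tokens, the zone names ("Header", "Date", "Applicant Name"), the field names, and the case-insensitive match are all assumptions made for the sketch.

```python
# Hypothetical illustration: text read from zones drives identification
# and field population. zone_tokens maps a zone (token) name to its text.

def is_permit_application(zone_tokens: dict[str, str]) -> bool:
    """Identify the page if the header zone contains the application title."""
    header = zone_tokens.get("Header", "")
    # Assumption: the comparison ignores case.
    return "building permit application" in header.lower()


def populate_fields(zone_tokens: dict[str, str]) -> dict[str, str]:
    """Copy zone token values into document fields (field names are made up)."""
    return {
        "Date": zone_tokens.get("Date", "").strip(),
        "Applicant": zone_tokens.get("Applicant Name", "").strip(),
    }
```

In the actual product, the same effect is achieved through an identification condition and by inserting tokens into fields, as described in the steps below.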

To use OmniPage Zone OCR

  1. In the Session Configuration Pane, select the stage of processing where you want to use OmniPage Zone OCR.
  2. In the Tasks Pane, select OmniPage Zone OCR.
  3. You can optionally enter a name for the process under Process Name.
  4. Move through each step of the wizard at the bottom of the pane. You can also click Skip Wizard to display and configure the properties all at once.
  5. Page Range: When configuring a process in Page Processing or Post-Processing, you will be prompted to specify a page range. In other stages, default settings will be automatically applied.

    Note: When Zone OCR is configured to extract information from a zone on multiple pages, it only retains the value of the zone on the last page processed. If you want to accumulate all the values, use the Token Accumulator process.

  6. Region Selection: Define one or more regions, or zones, to be read by the OmniPage Zone OCR process.
    • To define a region, drag and resize the zone on the Display Pane or specify coordinates in pixels or by percentage in the Tasks Pane.  

      Tip: If you are zoomed in to a specific area of an image, adding a zone places it in the top left corner of the zoomed-in view for convenience.

    • Specify a name for the region. This will also be the name of the token that represents the value read from the region.
    • To define an additional region with the same settings, click Add again.
    • To remove a region, select it and click Remove.
    • To configure advanced OCR options for a specific region, select the region and click Advanced options.

    Note: Zones can be copied and pasted within this and other processes that contain zones. When copying a zone within this process, pasting using CTRL + V will paste the zone directly on top of the zone you copied. Right-clicking on a different area of the image and selecting Paste Zone from the context menu will paste the copied zone where you right-clicked. The advanced settings will be copied as well.

  7. Identification Condition: When configuring OmniPage Zone OCR in First Page Identification, you will be prompted to set an identification condition to match the information read from the region with the definitions for the document class.
  8. Language Selection: Select a language to help optimize the character recognition.

    Note: Arabic and Thai will be available if you are licensed for them in Quick Fields 10.1 and later.

  9. Optimization: Specify an optimization style. There is generally a trade-off between speed and accuracy.
    • Speed: Reduces the amount of time it takes to OCR. Generated text may be less accurate.
    • Balanced: Neither optimum speed nor optimum accuracy, but between the two.
    • Accuracy: Increases OCR quality. Processing time will also be increased.
  10. Orientation: Specify the zone's orientation. If a captured document's text in the zone matches this setting, OCR will be more accurate.

    Tip: 0 degrees represents a standard page with text that reads from left to right. 180 degrees represents a page that appears upside down.

  11. Optional: To preview how this process will affect scanned images and OCRed or extracted text, test processes. For the best results, add a custom sample page before testing. Adjust and test until you are satisfied with the results.
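
The note under step 5 (a zone read on multiple pages keeps only the value from the last page) can be pictured with the following sketch. The function names and the list of per-page readings are hypothetical. The second function only shows the idea of accumulation and is not a model of how the Token Accumulator process is implemented.

```python
# Hypothetical per-page readings of one zone, in page order.

def last_value(page_values):
    """Mimic the behavior described in the note: later pages overwrite earlier ones."""
    value = None
    for reading in page_values:
        value = reading  # each page replaces the previous value
    return value


def accumulate(page_values):
    """Conceptual alternative: keep every page's reading."""
    return list(page_values)
```

For readings of "A-1", "A-2", and "A-3" on three pages, last_value returns "A-3", while accumulate keeps all three.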

Note: Local image enhancements can be used with OmniPage Zone OCR.

Note: If you define a zone OCR region using percentages for scanned pages of a certain size, the size and placement of the region will change if a page of a different dimension is scanned. For example, a region defined on a scanned page that is 8.5 x 11 inches will have a different size and location on a scanned page that is 8.5 x 14 inches.
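
The arithmetic behind this note can be sketched as follows. The function name, the (left, top, width, height) ordering, and the 300 DPI scan resolution are assumptions chosen for illustration.

```python
def region_in_pixels(region_pct, page_width_px, page_height_px):
    """Convert a region given in percent of the page into pixel coordinates.

    region_pct is (left, top, width, height), each as a percentage.
    """
    left, top, width, height = region_pct
    return (
        round(page_width_px * left / 100),
        round(page_height_px * top / 100),
        round(page_width_px * width / 100),
        round(page_height_px * height / 100),
    )


DPI = 300  # assumed scan resolution
LETTER_PX = (2550, 3300)  # 8.5 x 11 inches at 300 DPI
LEGAL_PX = (2550, 4200)   # 8.5 x 14 inches at 300 DPI

region = (10, 5, 50, 10)  # hypothetical zone: 10% from left, 5% from top
on_letter = region_in_pixels(region, *LETTER_PX)  # (255, 165, 1275, 330)
on_legal = region_in_pixels(region, *LEGAL_PX)    # (255, 210, 1275, 420)
```

The horizontal values are the same on both pages because the width is unchanged, but on the longer page the zone starts 45 pixels lower and is 90 pixels taller.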

Note: Some processes come with the basic Quick Fields installation, and some must be purchased as add-ons. Contact your reseller for more information.

OmniPage Zone OCR Advanced Options

You can configure advanced OCR settings for the OmniPage Zone OCR process. In most cases, these settings should only be modified if your OCR accuracy is low.

To configure advanced settings

  1. Add an OmniPage Zone OCR process to your session.
  2. During the Region Selection configuration process, select a region you have added and click Advanced options.
  3. In the Advanced Options dialog box, configure the following settings. Note that each setting is organized into a category. Expand a category to view its settings.

    Setup

    Character preference: Specify the type of text that will most often appear in the region: letters or numbers, or None for no preference. When the OCR engine encounters a character that could be identified as either a letter or a number (e.g., 1 or I), it will default to the type you select. You can also select Custom to specify the preference per character rather than per zone. If you select Custom, a Custom Character Preference line appears in the Advanced section. See below for configuration information.

    Single line: If the region contains multiple lines of text, specify which lines should be returned. If you select True, only the first line will be returned. If you select False, all text will be returned.

    Create multi-value token: This option can only be set to True if Single line is set to False. If this option is set to True, when the token generated by this process is placed in a multi-value field, each line of text will appear as a separate value in the field. If it is set to False, all information in the token will still appear as a single value.
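
    The interaction between Single line and Create multi-value token can be sketched like this. The function and parameter names are hypothetical, and skipping blank lines is an assumption of the sketch rather than documented behavior.

```python
def zone_token(ocr_text, single_line, multi_value=False):
    """Return the token value for a zone under the two Setup options."""
    if single_line and multi_value:
        # Create multi-value token can only be True when Single line is False.
        raise ValueError("multi_value requires single_line to be False")

    # Assumption: blank lines are ignored when splitting the zone text.
    lines = [line for line in ocr_text.splitlines() if line.strip()]

    if single_line:
        return lines[0] if lines else ""  # first line only
    if multi_value:
        return lines  # one value per line for a multi-value field
    return ocr_text  # all text as a single value
```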

    Use existing text: This option specifies whether the zone should use existing text or OCR text from the image. If you select True, the process checks whether any text is already associated with the image (generated by OCR, retrieved by Laserfiche Capture Engine, or generated from a PDF) and uses it. If there is no text, the zone is OCRed and the result is returned. If you select False, the text within the zone bounds is always OCRed and used.
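
    The fallback described for Use existing text amounts to the following decision. This is a minimal sketch: the function name and the run_ocr callback are hypothetical stand-ins for whatever the product does internally.

```python
from typing import Callable, Optional


def zone_text(existing_text: Optional[str],
              run_ocr: Callable[[], str],
              use_existing_text: bool) -> str:
    """Pick between text already associated with the image and fresh OCR."""
    if use_existing_text and existing_text:
        return existing_text  # reuse the existing text, no OCR needed
    return run_ocr()  # no existing text, or the option is False
```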

    Confidence

    Confidence: Specify how confident the OCR engine should be when determining whether or not the OCRed value is correct. Choose a number between 1 and 100, where 100 is the most confident. Alternatively, enter 0 to disable this feature.

    Note: This setting applies to the entire region, not to each individual OCRed character.

    Tip: If you enable this feature, the recommended value is 65.

    Confidence method: Specify which statistical value the confidence level (set above) should apply to: the region's minimum value, median value, average, or average excluding outliers (an outlier is a figure that is significantly distant from the rest of the data). The OCR engine calculates these figures by taking into account each individual OCRed character in the region.

    Example: Jim sets his confidence level to 90 and his confidence method to minimum. In effect, Jim is specifying that the OCR engine should only be confident that it has returned an accurate value if the region's least accurate value (its minimum) is at least 90 out of 100. Alternatively, Mike sets his confidence level to 55 and his confidence method to average. In effect, Mike is specifying that the OCR engine should only be confident that it has returned an accurate value if the region's average confidence level is at least 55 out of 100.
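
    The way a confidence method reduces per-character scores to one figure for the region can be sketched as follows. The function names are hypothetical. The outlier rule used here (1.5 times the interquartile range) is an assumption, because this page does not say how the engine decides that a figure is "significantly distant".

```python
from statistics import mean, median, quantiles


def _without_outliers(values):
    """Drop values outside 1.5 x IQR of the quartiles (assumed rule)."""
    if len(values) < 4:
        return list(values)  # too few values to judge outliers
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]


def region_confidence(char_confidences, method):
    """Reduce per-character confidences (0-100) to one figure for the region."""
    if method == "minimum":
        return min(char_confidences)
    if method == "median":
        return median(char_confidences)
    if method == "average":
        return mean(char_confidences)
    if method == "average excluding outliers":
        return mean(_without_outliers(char_confidences))
    raise ValueError(f"unknown confidence method: {method}")


def meets_confidence(char_confidences, level, method):
    """True if the region passes; a level of 0 disables the check."""
    if level == 0:
        return True
    return region_confidence(char_confidences, method) >= level
```

    With hypothetical character scores of 95, 96, 97, 94, and 20, Jim's setting (90, minimum) fails because the minimum is 20, while Mike's setting (55, average) passes because the average is 80.4.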

    Clear data: Specify what value should be returned if the OCR engine does not meet the region's confidence level. If you select True, a blank value will be returned. If you select False, the OCRed value will be returned, despite the fact that it has failed to meet the confidence level.

    Mark field: Specify whether Quick Fields should visually distinguish fields populated with an OCRed value that did not meet the confidence level. This setting applies regardless of whether the Clear data option is set to True or False.
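
    The combined effect of Clear data and Mark field on a value that fails the confidence check can be summarized in a short sketch. The FieldValue type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldValue:
    text: str     # what ends up in the field
    marked: bool  # whether the field is visually flagged


def apply_confidence_result(ocr_text, passed, clear_data, mark_field):
    """Decide the field content and flag after the confidence check."""
    if passed:
        return FieldValue(ocr_text, False)
    # Failed the check: optionally blank the value, optionally flag the field.
    text = "" if clear_data else ocr_text
    return FieldValue(text, mark_field)
```

    Because the flag is decided independently of the text, a field can be marked whether it was cleared or kept.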

    Advanced

    Module: Specify a recognition module for the OCR engine to use. This setting should only be changed from Default by advanced users familiar with the alternatives (Dot matrix, 2-way vote, 3-way vote and Asian).

    Custom Character Preference: If you know the specific pattern for the information to be read from the zone, you can specify a character preference for each individual character in the string. The types of characters are represented by the following symbols:

    • @ - Letters
    • # - Numbers
    • ! - Punctuation marks
    • . - No preference

    Example: Mark is reading dates from a particular zone. The dates are always in a format similar to 08/01/2009, but sometimes the number one is read as a slash and vice versa. He sets the character preference to ##!##!#### so that each position in the string prefers either a number or a punctuation mark.
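
    The effect of a per-character pattern can be imitated with a small post-processing sketch. The engine applies the preference while recognizing characters. The code below only mimics the outcome afterward, and its confusion tables and function name are hypothetical.

```python
# Hypothetical tables of commonly confused characters.
TO_DIGIT = {"O": "0", "o": "0", "I": "1", "l": "1", "/": "1"}
TO_LETTER = {"0": "O", "1": "I"}
TO_PUNCT = {"1": "/", "I": "/", "l": "/"}


def apply_preference(text, pattern):
    """Nudge each character toward the type its pattern symbol prefers.

    Pattern symbols: @ letters, # numbers, ! punctuation, . no preference.
    """
    out = []
    for ch, pref in zip(text, pattern):
        if pref == "#" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)
        elif pref == "@" and not ch.isalpha():
            ch = TO_LETTER.get(ch, ch)
        elif pref == "!" and ch.isalnum():
            ch = TO_PUNCT.get(ch, ch)
        out.append(ch)
    out.extend(text[len(pattern):])  # characters beyond the pattern are kept
    return "".join(out)
```

    With the pattern ##!##!####, a misread such as 08/0//2009 comes back as 08/01/2009, because the fifth position prefers a number.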

    • Click Restore original defaults to revert the settings to their default state. To set the newly defined settings as default, select Set current as default.