Text Identification

Text Identification sorts documents into classes based on whether page text matches a specified pattern or patterns. It can be used in the First Page and Last Page Identification stages.

Note: To generate text for use in Text Identification, configure OmniPage OCR or Text Extraction in Pre-Classification Processing.

To configure Text Identification

  1. In the Session Configuration Pane, select the First Page Identification or Last Page Identification stage.
  2. In the Tasks Pane, select Text Identification.
  3. You can optionally enter a name for the process under Process Name.
  4. Move through each step of the wizard at the bottom of the pane. You can also click Skip Wizard to display and configure the properties all at once.
  5. Click Add Pattern. The New Text Pattern dialog box will appear.
  6. Specify a name for the pattern.
  7. Specify a pattern to match. For a list of common expressions, click the pattern button. For more information, see the Regular Expression Reference.
  8. To make the pattern match case sensitive, select Match Case. To make it case insensitive, clear Match Case.
  9. To test the pattern, click the plus sign next to Test the pattern to expand the Test Value box. Type a value that you expect to fit the pattern. The resulting match is displayed under Result value as you type.
  10. Click OK.
  11. Optional: To add another pattern, click Add Pattern again. A document is identified as belonging to the document class only if its text matches every pattern in the process.
  12. Optional: To preview how this process will affect scanned images and OCRed or extracted text, test processes. For the best results, add a custom sample page before testing. Adjust and test until you are satisfied with the results.
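
The matching rules in steps 8 and 11 can be sketched outside Quick Fields. The Python example below is an illustration only, not Quick Fields code: the TextPattern class and both function names are hypothetical, and Python's re module may not use the same regular expression syntax as the product. It also assumes that a pattern can match anywhere in the page text and that a process with no patterns identifies nothing. It shows how Match Case changes a single pattern's result and how a page is identified only when every pattern matches.

```python
import re
from dataclasses import dataclass


@dataclass
class TextPattern:
    """Hypothetical stand-in for one entry made in the New Text Pattern dialog box."""
    name: str
    expression: str
    match_case: bool = False  # mirrors the Match Case option


def pattern_matches(pattern: TextPattern, page_text: str) -> bool:
    # With Match Case cleared, the comparison ignores letter case.
    flags = 0 if pattern.match_case else re.IGNORECASE
    # Assumption: the pattern may match anywhere in the page text.
    return re.search(pattern.expression, page_text, flags) is not None


def identifies_page(patterns: list[TextPattern], page_text: str) -> bool:
    # The text must match all patterns in the process, not just one.
    # Assumption: a process with no patterns identifies nothing.
    return bool(patterns) and all(pattern_matches(p, page_text) for p in patterns)


patterns = [
    TextPattern("Invoice heading", r"\bINVOICE\b", match_case=True),
    TextPattern("Invoice number", r"invoice\s+no\.?\s*\d+"),
]

first_page = "INVOICE\nInvoice No. 10482\nTotal due: $250.00"
other_page = "Invoice No. 10482 (continued)"

print(identifies_page(patterns, first_page))  # both patterns match
print(identifies_page(patterns, other_page))  # the case-sensitive heading pattern fails
```

In this sketch the second page is rejected even though its invoice number matches, because the case-sensitive heading pattern finds no uppercase INVOICE. Trying sample text this way is comparable to using the Test Value box before running the process on real pages.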

Note: Some processes come with the basic Quick Fields installation, and some must be purchased as add-ons. Contact your reseller for more information.