Substitution

Substitution lets you find and replace text in tokens or page text using regular expressions, as in Pattern Matching. It is useful for correcting errors or other unwanted changes introduced during OCR. Substitution can be used in Pre-Classification Processing, First Page Identification, Page Processing, Last Page Identification, or Post-Processing.

Example: Pennybags Financial Advisors wants to be able to perform text searches on their old financial statements in Laserfiche. It is particularly important to them to be able to search for the term "interest accrued," but because of the font used and the poor photocopy quality of the old statements, this phrase is frequently OCRed incorrectly, with the wrong letters used and extra spaces sometimes inserted. They configure a text substitution process that looks for the phrase with any character substituted for the frequently misread letters, or with extra spaces, and replaces it with the correct phrase.
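The tolerant pattern described in this example can be sketched outside Quick Fields with Python's re module. This is only an illustration: the example above does not say which letters are misread, so treating t, r, and c as the problem letters is an assumption, and the names tolerant, PATTERN, and fix_phrase are hypothetical.

```python
import re

def tolerant(word, misread):
    """Build a pattern for `word` that accepts any character in place of
    the letters listed in `misread`, and optional spaces between letters."""
    parts = ["." if ch in misread else re.escape(ch) for ch in word]
    return r"\s*".join(parts)

# Assumption: "t" and "r" are misread in "interest", "c" and "r" in "accrued".
PATTERN = re.compile(
    tolerant("interest", "tr") + r"\s+" + tolerant("accrued", "cr"),
    re.IGNORECASE,
)

def fix_phrase(text):
    # Replace every tolerant match with the correct phrase.
    return PATTERN.sub("interest accrued", text)

print(fix_phrase("Total in1erest accnued: 4.50"))
```

A pattern this loose can also match unrelated text, so in practice it is worth testing it against sample pages before relying on it.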

To use Substitution

  1. In the Session Configuration Pane, select the stage of processing where you want to use Substitution.
  2. In the Tasks Pane, select Substitution.
  3. You can optionally enter a name for the process under Process Name.
  4. Move through each step of the wizard at the bottom of the pane. You can also click Skip Wizard to display and configure the properties all at once.
  5. Page Range: When configuring a process in Page Processing or Post-Processing, you will be prompted to specify a page range. In other stages, default settings will be automatically applied.
  6. Substitution Patterns: To configure a pattern, click Add Substitution. The New Substitution Pattern dialog box will open.
  7. Specify the name for the substitution. The best practice is to choose a name that will help you remember the function of the process when reviewing the session later.
  8. Choose whether to look for the pattern in an existing token value or the page text. If using a token value, you can use the token button (right arrow) to select tokens generated by the system or by other processes in Quick Fields. If using page text, you can choose whether to run the pattern match on the entire page or specify a custom line range.
  9. Specify a pattern to match. For a list of common expressions, click the pattern button. For more information, see the Regular Expression Reference.
  10. Specify the data that will replace the information that matches the pattern. If you have used a match group in the pattern, you can click the button to select it.
  11. To make the pattern match case sensitive, select Match Case. To make it case insensitive, clear this option.
  12. To test the pattern, click the plus sign next to Test the pattern to expand the Test Value box. Enter a value that you expect to match the pattern; the resulting value is displayed under Result value as you type.
  13. Under Options, select Clear token if no match is found to empty the token when the pattern does not match. If Replace the input token's value with the result is selected, the input token is overwritten with the result value.

  14. Optional: To preview how this process will affect OCRed or extracted text, test processes. For the best results, add a custom sample page before testing. Adjust and test until you are satisfied with the results.
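The pattern, replacement, Match Case, and Clear token if no match is found settings from the steps above can be approximated in Python as a rough mental model. This sketch is not how Quick Fields is implemented. The function name substitute and its parameters are hypothetical, and it assumes that every match in the input is replaced and that an unmatched input is otherwise passed through unchanged.

```python
import re

def substitute(value, pattern, replacement,
               match_case=False, clear_if_no_match=False):
    """Approximate a single substitution pattern applied to one input value."""
    # Match Case cleared -> case-insensitive matching.
    flags = 0 if match_case else re.IGNORECASE
    result, count = re.subn(pattern, replacement, value, flags=flags)
    # "Clear token if no match is found": return an empty value.
    if count == 0 and clear_if_no_match:
        return ""
    return result

print(substitute("Interest Acrued", r"acrued", "accrued"))
print(substitute("Interest Acrued", r"acrued", "accrued", match_case=True))
```

Trying a few values this way mirrors what the Test Value box does in step 12: you supply an input and check that the result is what you expect.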

Note: Some processes come with the basic Quick Fields installation, and some must be purchased as add-ons. Contact your reseller for more information.

Match Groups

The Substitution process has a feature called Match Groups. These are groups defined in a pattern that can be rearranged to reformat data. For example, match groups can be used to change a “Student Name” token that is formatted as First Name Last Name to Last Name, First Name. This helps organizations standardize their information when populating fields, naming entries, etc.
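The name reordering mentioned above can be illustrated with Python's re module. This is a sketch under assumptions: it handles only a simple two-word name, the function name last_first is hypothetical, and Python writes group references as \1 and \2 where Quick Fields uses ${1} and ${2}.

```python
import re

def last_first(student_name):
    # Group 1 captures the first name, group 2 the last name.
    # The replacement emits them in reverse order with a comma.
    return re.sub(r"^(\w+)\s+(\w+)$", r"\2, \1", student_name)

print(last_first("Ada Lovelace"))
```

Names with middle names, hyphens, or suffixes would need a more careful pattern than this one.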

Example: Central Florida University uses a Lookup process to retrieve each student’s five-digit student ID number from an external database. The ID numbers stored in the database are not formatted according to University standards: each ID number should consist of two digits, a dash, and three more digits (11-111), but the database stores them as five digits with no dash (11111). The Lookup process retrieves the incorrectly formatted student ID and saves the value in a “Student ID” token, which is then used to name each student record. To format the retrieved student IDs correctly, they use a pattern that separates the five-digit number into two groups using parentheses: (\d\d)(\d\d\d). Each set of parentheses is a match group, which is named after its position and denoted as ${position}.

  • (\d\d) = ${1} since it is listed first, and (\d\d\d) = ${2} since it is listed second

To add a dash between the groups, add the first match group ${1}, then a dash, then the second match group ${2}.

${1}-${2}

Select Replace the input token’s value with the result to overwrite the input token with the result value. For example, if this option is selected for the “Student ID” token above, the retrieved value 11111 will be replaced with 11-111.
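The student ID example can be reproduced in Python as a rough check of the logic. The sketch below is an illustration, not Quick Fields behavior: the helper names to_python_replacement and format_student_id and the tokens dictionary are hypothetical, \d is used as Python's digit class, and the helper translates the ${n} group syntax into Python's \g<n> form.

```python
import re

def to_python_replacement(qf_replacement):
    """Translate ${n} group references into Python's \\g<n> syntax."""
    # Escape literal backslashes first so re.sub does not interpret them.
    escaped = qf_replacement.replace("\\", "\\\\")
    return re.sub(r"\$\{(\d+)\}", r"\\g<\1>", escaped)

def format_student_id(value):
    # Two digits form match group 1, three digits form match group 2.
    pattern = r"^(\d\d)(\d\d\d)$"
    return re.sub(pattern, to_python_replacement("${1}-${2}"), value)

# Hypothetical token store; overwriting the entry mimics
# "Replace the input token's value with the result".
tokens = {"Student ID": "12345"}
tokens["Student ID"] = format_student_id(tokens["Student ID"])
print(tokens["Student ID"])
```

Because the pattern is anchored to exactly five digits, a value in any other format is left unchanged by this sketch.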

Note: You can enter the match group syntax manually or click Insert Group and select a group from the list. The groups are labeled Match Group 1, Match Group 2, etc.