First Page Identification
First Page Identification uses criteria you specify to determine whether a page from your scan source is the first page of a document in the document class you have defined. Once you create a document class, the First Page Identification stage appears underneath it. You can add First Page Identification processes to determine how documents will be assigned to the document class.
Note: If you do not specify any identification conditions in a document class, all documents and pages that have not already been assigned will be assigned to that class.
- How Documents are Identified
- How First Page Identification Works
- First Page Identification Processes
- First Page Identification and Multiple Document Classes
- First Page Identification and Efficiency
- Configuring a First Page Identification Process
- First Page Identification Conditions
How First Page Identification Works
When pages scanned into Quick Fields reach the First Page Identification stage, Quick Fields compares each page to the identification conditions defined in that stage's processes to determine whether the page should become the first page of a new document in that document class. A page can consist of an image or text. With the default Quick Fields settings, a page that matches the identification conditions is assigned to the document class as the first page of a new document. Each subsequent page that does not match an identification condition is appended to that document as an additional page. When another page does match the identification conditions, that page becomes the first page of a new document.
Example: The Human Resources Department is processing job applications. Each application packet consists of a standardized application form followed by supplementary materials such as resumes, letters of recommendation, transcripts, and writing samples. They configure a Quick Fields session with a document class named "Job Applications." In the First Page Identification stage under that document class, they configure a Form Identification process to recognize their standardized job application form. When they place the documents into the scanner, they arrange them so the standardized form precedes the supplementary materials for each applicant. When scanned, the applications are grouped into packets with the standardized form as the first page of each packet.
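The grouping behavior described above, applied to the Human Resources example, can be modeled in a few lines. This is a conceptual sketch in Python, not Quick Fields code (Quick Fields is configured through its interface rather than scripted this way). The `Page` and `Document` types, the `group_pages` function, and the choice to set aside pages that arrive before any first page are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Page:
    # Hypothetical page model: only the page text is tracked here.
    text: str = ""


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)


def group_pages(
    pages: List[Page], is_first_page: Callable[[Page], bool]
) -> Tuple[List[Document], List[Page]]:
    """Split a scanned batch into documents with a first-page test.

    A page that matches starts a new document; a page that does not match
    is appended to the most recently started document. Pages that arrive
    before any match are returned separately (an assumption of this sketch).
    """
    documents: List[Document] = []
    unassigned: List[Page] = []
    for page in pages:
        if is_first_page(page):
            documents.append(Document(pages=[page]))
        elif documents:
            documents[-1].pages.append(page)
        else:
            unassigned.append(page)
    return documents, unassigned


# Usage: the standardized form starts each applicant's packet.
batch = [
    Page("STANDARD APPLICATION FORM"), Page("resume"), Page("reference letter"),
    Page("STANDARD APPLICATION FORM"), Page("transcript"),
]
docs, leftover = group_pages(batch, lambda p: p.text.startswith("STANDARD APPLICATION"))
print([len(d.pages) for d in docs])  # [3, 2]
```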
Other Options and First Page Identification
Options configured in other areas of a session can also affect the First Page Identification stage.
Note: Using multiple document classes also involves special considerations; see First Page Identification and Multiple Document Classes.
First Page Identification Processes
First Page Identification processes match pages against specified criteria, called identification conditions, for each document class to determine whether a page should start a new document in that class. Some processes can be used only to identify documents, although some of these can be configured in the Pre-Classification Processing stage as well as in the First Page Identification stage. Other processes, such as Barcode or OmniPage Zone OCR, can both identify a document and extract information from it to populate fields and other document metadata during other stages of processing. When you configure these processes in the First Page Identification stage, you have the additional step of defining the identification condition. You can also add image enhancements or other processes to First Page Identification that do not contain identification conditions but help prepare the document for identification; for example, applying Despeckle can improve results when OmniPage Zone OCR reads a word from an image.
Note: The processes that appear in the Tasks Pane are determined by the stage selected in the Session Configuration Pane; only processes that can be used in that stage will be shown.
Note: If no identification condition is specified for a document class, all documents and pages that have not already been assigned will be assigned to that document class.
Note: Processes and image enhancements used within the First Page Identification stage apply only within First Page Identification. To store data from an image or permanently modify the image, use the process or enhancement in a different stage of processing.
First Page Identification and Multiple Document Classes
If you have the Document Classification add-on installed, you will be able to configure multiple document classes within a single Quick Fields session. With multiple document classes, the order of the classes and the First Page Identification criteria within each can interact to produce very different results. First Page Identification works by comparing each scanned page to the identification conditions for each document class. With multiple document classes, Quick Fields will proceed as follows:
- Compare each scanned page to First Page Identification criteria for the first document class.
- If the page matches the criteria for the first class, it will be assigned to that class and will not be compared to the criteria for any other classes.
- If the page is not assigned to the first class, it will be compared to the First Page Identification criteria for the next class. It will either be assigned to that class or proceed to the next class.
- If the page does not meet the First Page Identification criteria for any of the document classes, with the default Quick Fields settings it will be appended to the document that was created the last time a page did meet a First Page Identification condition.
Note: The above statements assume neither Last Page Identification conditions nor Limit document to __ pages are configured. If these options are configured, see How Documents are Identified to learn how these options interact.
Therefore, a page that fits the identification criteria for more than one document class is assigned only to the first class whose criteria it meets. For this reason, the order of document classes in the Session Configuration Pane is very important.
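The class-by-class comparison above can be sketched as a first-match-wins loop. This is a conceptual Python model, not a Quick Fields API; the `classify_pages` function and the representation of each document class as a name plus a test function are hypothetical. Like the list above, it ignores Last Page Identification and the Limit document to __ pages option.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: each document class is a (name, test) pair, listed in
# the same order as the classes in the Session Configuration Pane.
ClassRule = Tuple[str, Callable[[str], bool]]


def classify_pages(
    pages: List[str], classes: List[ClassRule]
) -> Tuple[List[Tuple[str, List[str]]], List[str]]:
    """Assign pages (modeled as their text) to documents of ordered classes."""
    documents: List[Tuple[str, List[str]]] = []  # (class name, pages)
    unassigned: List[str] = []
    for page in pages:
        matched: Optional[str] = None
        for name, test in classes:
            if test(page):
                matched = name
                break  # first match wins; later classes are not consulted
        if matched is not None:
            documents.append((matched, [page]))
        elif documents:
            # No class matched: append to the most recently created document.
            documents[-1][1].append(page)
        else:
            unassigned.append(page)
    return documents, unassigned


classes = [
    ("Invoices", lambda text: "INVOICE" in text),
    ("Urgent Invoices", lambda text: "URGENT" in text),
]
batch = ["URGENT INVOICE", "line items", "URGENT NOTICE"]
docs, _ = classify_pages(batch, classes)
print(docs)
# [('Invoices', ['URGENT INVOICE', 'line items']), ('Urgent Invoices', ['URGENT NOTICE'])]
```

Moving "Urgent Invoices" above "Invoices" in this model would send the first page to the urgent class instead, which mirrors why class order matters.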
To change the order of document classes
- Select the Classification node in the Session Configuration Pane.
- In the Tasks Pane, under Reorder Document Classes, select a document class.
- Use the arrows to move the document class up and down in the order.
Note: You can also use Pre-Classification Processing and Token Identification to improve the efficiency of First Page Identification. This is particularly helpful with multiple document classes.
First Page Identification, Accuracy, and Efficiency
To improve the accuracy or efficiency of sessions, you can use the Pre-Classification Processing stage and the Token Identification process so that each process runs on each page only once, even when you have multiple identification conditions or document classes. Token Identification is specifically designed to handle multiple conditions, as the following scenarios show.
Multiple Identification Conditions
To efficiently use multiple identification conditions to assign documents to a particular class, you can configure processes that extract information from the documents in Pre-Classification Processing and configure the Token Identification process in First Page Identification to assign only documents that meet all the specified conditions to the document class.
Example: The Wonderland Police Department is scanning tickets, and wants to create a document class for citations for parking at an expired meter. All parking violations are printed on 297x420 millimeter cards, and they are the only documents that use paper this size. They configure a Page Size Identification process in Pre-Classification Processing to determine whether documents are on this size paper. They also configure an Optical Mark Recognition process in Pre-Classification Processing to determine which box indicating the type of violation is checked. They create a document class called "Expired Meter Parking Violations." In the First Page Identification stage for that document class, they configure a Token Identification process to assign the document to that class if both conditions are true: it is on 297x420 millimeter paper and the "Expired Meter" box is checked.
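The Wonderland scenario can be modeled as two steps: Pre-Classification Processing computes tokens once per page, and Token Identification then tests them together. This Python sketch is conceptual; `pre_classify`, `is_expired_meter`, and the page fields `width_mm`, `height_mm`, and `checked_box` are hypothetical stand-ins for the Page Size Identification and Optical Mark Recognition results, not Quick Fields token names. Ignoring page orientation is also an assumption of the sketch.

```python
from typing import Any, Dict


def pre_classify(page: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for Pre-Classification Processing: compute each token once."""
    size = sorted((page["width_mm"], page["height_mm"]))  # ignore orientation
    return {
        "is_citation_size": size == [297, 420],
        "violation_box": page.get("checked_box"),
    }


def is_expired_meter(tokens: Dict[str, Any]) -> bool:
    # Token Identification with "all of these conditions are true":
    # the paper size and the checked box must both match.
    return bool(tokens["is_citation_size"]) and tokens["violation_box"] == "Expired Meter"


ticket = {"width_mm": 420, "height_mm": 297, "checked_box": "Expired Meter"}
other_ticket = {"width_mm": 297, "height_mm": 420, "checked_box": "Fire Hydrant"}
letter = {"width_mm": 216, "height_mm": 279, "checked_box": None}
results = [is_expired_meter(pre_classify(p)) for p in (ticket, other_ticket, letter)]
print(results)  # [True, False, False]
```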
Multiple Document Classes
Pre-Classification Processing and Token Identification can greatly improve the efficiency and accuracy of sessions that involve multiple document classes. Rather than running all the identification processes for each class in turn, you can configure the processes once in Pre-Classification Processing and use Token Identification in each document class to sort pages by the values stored in the tokens. This avoids repeating the same processing for every class, and it can also make identification more accurate.
Example: The City of Neverland Treasurer's Office is scanning invoices. They have affixed barcodes containing budget codes to each document, and will use the codes to sort them into 15 different document classes. They configure a Barcode process in Pre-Classification Processing to read each barcode once, and Token Identification in each of the 15 document classes to identify a document as belonging to the class if the token from the Barcode process contains the appropriate value. This will be much faster than if each document class contained its own barcode process, and each barcode had to be read multiple times until the identification succeeded.
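The efficiency gain in the Neverland example comes from reading each barcode once. The Python sketch below is a toy model, not Quick Fields code: `CountingReader`, `classify_per_class`, and `classify_with_token` are hypothetical, and each page is represented simply by its barcode value. It counts barcode reads under each approach for a page that matches the last of the 15 classes.

```python
from typing import Callable, List, Optional


class CountingReader:
    """Hypothetical barcode reader that counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, page: str) -> str:
        self.calls += 1
        return page  # toy model: the page is represented by its barcode value


def classify_per_class(page: str, codes: List[str],
                       read_barcode: Callable[[str], str]) -> Optional[str]:
    # Each document class runs its own Barcode process, so the barcode is
    # read again for every class tried until one matches.
    for code in codes:
        if read_barcode(page) == code:
            return code
    return None


def classify_with_token(page: str, codes: List[str],
                        read_barcode: Callable[[str], str]) -> Optional[str]:
    # Pre-Classification Processing reads the barcode once into a token;
    # Token Identification in each class only compares the stored value.
    token = read_barcode(page)
    for code in codes:
        if token == code:
            return code
    return None


codes = [f"BUDGET-{i:02d}" for i in range(1, 16)]  # 15 document classes
per_class, with_token = CountingReader(), CountingReader()
classify_per_class("BUDGET-15", codes, per_class)
classify_with_token("BUDGET-15", codes, with_token)
print(per_class.calls, with_token.calls)  # 15 1
```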
Example: Nassau University is scanning undergraduate and graduate admission applications in a single session. The two types of applications use forms that are very similar, but not identical. They configure Form Identification in Pre-Classification Processing with two master forms, one for each type of application. Each scanned application is compared to both forms and assigned a token based on which form it more closely resembles. Token Identification in the document classes should then be more accurate, because each application has already been compared to both forms. Without this step, many applications might be misassigned to the first document class because the forms are so similar.
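The accuracy argument can be illustrated with a toy similarity score. In this Python sketch, which is not Quick Fields' actual Form Identification algorithm, the hypothetical `similarity` function simply counts matching characters and the master forms are short strings. `first_over_threshold` mimics document classes that each check one form in turn and accept the first "close enough" match, while `best_form` mimics storing the closest of both masters in a token.

```python
from typing import Dict, Optional


def similarity(page: str, master: str) -> float:
    """Toy score: fraction of character positions that agree."""
    matches = sum(a == b for a, b in zip(page, master))
    return matches / max(len(page), len(master))


def first_over_threshold(page: str, masters: Dict[str, str],
                         threshold: float) -> Optional[str]:
    # Without Pre-Classification: each class checks its own master form in
    # turn and accepts the first one that is "similar enough".
    for name, master in masters.items():
        if similarity(page, master) >= threshold:
            return name
    return None


def best_form(page: str, masters: Dict[str, str]) -> str:
    # With Pre-Classification: compare against every master form once and
    # keep the closest match as the token value.
    return max(masters, key=lambda name: similarity(page, masters[name]))


masters = {
    "Undergraduate": "APPLICATION FORM U-1",
    "Graduate": "APPLICATION FORM G-1",
}
scanned = "APPLICATION FORM G-1"  # a graduate application
print(first_over_threshold(scanned, masters, 0.9))  # Undergraduate
print(best_form(scanned, masters))                  # Graduate
```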
Configuring a First Page Identification Process
Each process that can be used for First Page Identification has its own set of properties. In most cases when configuring a First Page Identification process, you will need to determine which properties a document must match to be identified — these are the identification conditions. When a page meets these conditions, it will be assigned as the first page of a new document in that document class.
Note: If the identification process depends on the results of some other process, you will need to configure that process first.
Example: Text Identification examines the text associated with a page. If text is not already associated with the pages you will scan, you will need to configure OCR or Text Extraction in Pre-Classification Processing to generate the text.
Example: Token Identification examines the tokens associated with a page. Configure processes that generate tokens in Pre-Classification Processing to use their values in Token Identification.
To configure a process in First Page Identification
- In the Session Configuration Pane, select the First Page Identification stage under the name of the document class the documents will be assigned to.
- In the Tasks Pane, select the name of the process.
- Configure the properties of the process. For more information, use the wizard and the help files for that process.
- When you encounter the Identification Condition property, follow the steps described in the Identification Conditions section of the help files.
- Optional: To test the configuration, select Test Processes or Test Current Process. For the best results, add a custom sample page before testing. Adjust and test until you are satisfied with the results.
Note: All First Page Identification processes are configured using the page designated as the first sample page, not the default sample page. If no first sample page is configured, the default sample page is displayed.
Identification Conditions
In the Identification Condition dialog box, you can define the conditions that must be met for a First Page Identification, Last Page Identification, or Conditional process to succeed. The identification condition consists of a logic statement, or a series of statements, that must describe a page for it to be identified as the first or last page of that document class.
Note: To use conditions from more than one process, use Token Identification as described in First Page Identification and Efficiency.
To create or modify an identification condition
Do any of the following until you have created a valid statement that suits your purpose:
- In the sentence Identify the page if any of these conditions are true, you can select the word any to change it to all, or the word true to change it to false.
- Next to the number 1, if you select the first phrase, you can choose a token available in that process, such as the token representing the information in a barcode or an OmniPage Zone OCR zone.
- By selecting the second phrase, you can choose from a menu of phrases relevant to that process, such as "equals, does not equal, is greater than, starts with," and so on.
- Depending on which process you are configuring, the last part of the statement may be a list of values to choose from, or a blank where you can type a value or add a token.
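The statement built in the dialog box can be read as an ordinary Boolean expression. This Python sketch models that reading under stated assumptions: tokens are plain strings, comparisons are exact and case-sensitive, "is greater than" compares numerically, and choosing false means each condition is expected to evaluate to false. The `OPERATORS` table and the `condition` and `identify` helpers are hypothetical, not part of Quick Fields.

```python
from typing import Callable, Dict, List

Tokens = Dict[str, str]
Condition = Callable[[Tokens], bool]

# A few of the comparison phrases offered by the dialog box (assumed semantics).
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda actual, value: actual == value,
    "does not equal": lambda actual, value: actual != value,
    "is greater than": lambda actual, value: float(actual) > float(value),
    "starts with": lambda actual, value: actual.startswith(value),
}


def condition(token: str, phrase: str, value: str) -> Condition:
    """Build one numbered line: <token> <phrase> <value>."""
    return lambda tokens: OPERATORS[phrase](tokens[token], value)


def identify(tokens: Tokens, conditions: List[Condition],
             quantifier: str = "any", expected: bool = True) -> bool:
    """Evaluate 'Identify the page if <any|all> of these conditions are <true|false>'."""
    results = [cond(tokens) == expected for cond in conditions]
    return any(results) if quantifier == "any" else all(results)


# A two-line rule like the Caleb and Associates example later in this section.
rule = [condition("Page Number", "equals", "1"), condition("Invoice", "equals", "Invoice")]
print(identify({"Page Number": "1", "Invoice": "Invoice"}, rule, "all"))  # True
print(identify({"Page Number": "2", "Invoice": "Invoice"}, rule, "all"))  # False
```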
In the examples below, the zone names shown in parentheses have been renamed to reflect the information being extracted. Renaming zones helps you keep track of which information is extracted from which zone. For example, renaming "Zone 1" to "Social Security Number" makes it clear that the social security number is extracted from that zone.
Example using Optical Mark Recognition
Quinn & Harte Legal Associates wants to process contracts according to whether or not they have been signed. They configure an OMR process to determine whether there is anything inside the signature box on the contract forms, and assign the documents to document classes accordingly.
Identify the page if any of these conditions are true:
- (Signature Box) is marked.
Select Add condition to add another line to the identification condition.
Example using OmniPage Zone OCR
Caleb and Associates processes multi-page invoices and purchase orders daily. Every page of an invoice has the word "INVOICE" in the top left, and each page is numbered. Because "INVOICE" appears at the top of every page of an invoice, it cannot by itself identify the first page of an invoice. It must be combined with a second condition that looks for the page number "1" at the bottom left of the page. If both conditions are met (the page has "INVOICE" at the top and "1" at the bottom), it is the first page of a new invoice.
Identify the page if all these conditions are true:
- (Page Number) equals 1.
- (Invoice) equals Invoice.
Select Add group to add a subgroup of conditions. To create additional conditions or groups within the subgroup, right-click and select Add child condition or Add child group.
Example using OmniPage Zone OCR
Gray University developed a new application form for new applicants. Their old application forms are still in use, so they now have two forms. Both forms have the word "Application" at the top, but on the new form the page number has moved from the bottom left of the page to the bottom right. Therefore, for a page to be considered the first page of an application, it must have the word "Application" at the top and the number "1" in either the bottom left or the bottom right corner.
Identify the page if all these conditions are true:
- (Application) equals Application.
- If any of these conditions are true:
  - (Page Number Left) equals 1
  - (Page Number Right) equals 1
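Because groups can nest, an identification condition forms a small tree. This Python sketch evaluates such a tree recursively, using the Gray University rule. The `Condition` and `Group` classes are hypothetical, only the "equals" comparison is modeled, and a token that was not extracted is treated as not matching, which is an assumption rather than documented Quick Fields behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Tokens = Dict[str, str]


@dataclass
class Condition:
    token: str
    value: str

    def evaluate(self, tokens: Tokens) -> bool:
        # Only "equals" is modeled; a missing token never matches.
        return tokens.get(self.token) == self.value


@dataclass
class Group:
    quantifier: str  # "any" or "all"
    children: List[Union["Condition", "Group"]] = field(default_factory=list)

    def evaluate(self, tokens: Tokens) -> bool:
        # Evaluate children recursively, so groups can nest to any depth.
        results = [child.evaluate(tokens) for child in self.children]
        return any(results) if self.quantifier == "any" else all(results)


# Gray University: "Application" at the top AND a "1" in either bottom corner.
rule = Group("all", [
    Condition("Application", "Application"),
    Group("any", [
        Condition("Page Number Left", "1"),
        Condition("Page Number Right", "1"),
    ]),
])

old_form_page1 = {"Application": "Application", "Page Number Left": "1"}
new_form_page1 = {"Application": "Application", "Page Number Right": "1"}
new_form_page2 = {"Application": "Application", "Page Number Right": "2"}
print([rule.evaluate(p) for p in (old_form_page1, new_form_page1, new_form_page2)])
# [True, True, False]
```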
You can also view and modify the format of the tokens used.
- Select Show token format to display the format of the tokens instead of their names. Select Edit format to open the Token Editor and edit the token format.
- Select Show token type to display the type of token it is. To change the token type, click Change type.