Automating Import from a Windows Folder with Import Agent
Import Agent is a tool for automatically retrieving files stored in a Windows folder and importing them into a Laserfiche repository. The Windows folder can be local to the Import Agent machine or stored on a network drive. During the import process, Import Agent can process the files (e.g., perform OCR), add metadata (e.g., populate fields, add tags), and perform additional tasks.
The Windows folder that files are retrieved from is referred to as the monitored directory. If a file is found in this folder, it will be imported into a repository. After a file is imported to Laserfiche, the source file can either be deleted or moved to another Windows folder. If a file cannot be imported into Laserfiche, it will be moved to an alternate Windows folder.
To automate import, you first need to create an Import Agent profile. Profiles define the Windows folder from which files will be retrieved, the import schedule, what will happen to source files after an import is attempted, the metadata that should be assigned to imported files, and other settings.
Creating a Profile to Automatically Import Files
- Go to the Windows Start menu.
- Select Laserfiche Import Agent Config to open the Import Agent Configuration Utility.
- Select Profile from the menu and select New. If you've already created a profile, you can choose to either create a new profile or copy an existing one. Copying a profile is helpful when you need a profile similar to an existing one and do not want to configure the entire profile from scratch.
- If you chose to create a new profile, the Sign in dialog box will prompt you to provide the credentials Import Agent will use to access Laserfiche.
- If the Laserfiche folder where files will be imported does not exist, the user must have rights to create it.
- If you have configured Import Agent to insert or replace content when a duplicate document is found, the user must have rights to do so.
- The user must have rights to create documents in the destination folder.
- The user must have rights to the specified volume.
- If you have configured Import Agent to assign metadata (e.g., templates, fields, tags), the account must have rights to do so.
- Sign into a repository: Specify a Laserfiche Server, repository, and authentication information for the profile. Optionally, select Use SSL Connection to connect to the Laserfiche Server securely.
- Sign into Laserfiche Cloud: If you are using Laserfiche Cloud, select the Cloud tab and provide a user account ID, username, and password. We recommend using a Service Principal user, which is a secure, unlicensed user that Laserfiche provides for connecting applications.
- Click OK.
- Configure each of the following tabs:
- Under Profile name, enter a unique profile name that describes what the profile does.
- Under Monitored folder, specify the full path to a folder on the current machine that you want to import documents from. Alternatively, use a UNC path to specify a folder on a network drive (e.g., \\ComputerName\ShareName\Path).
- Optional: Select Retrieve files from subfolders to import files from subfolders in the monitored folder.
- Under Filter, select one of the following options:
- All files. With the exception of hidden and locked files, all items in the Windows folder will be retrieved.
- Only those files whose names match one of these filters. You may limit the files that will be retrieved by specifying a filter for the file name and/or extension. The filter syntax is FileName.Extension. Click Add to create a filter.
- Under Laserfiche files, you can configure settings for Laserfiche files that are specific to the profile you are configuring. These settings override the global list file and Laserfiche briefcase settings configured in the File Types tab of Import Agent's options. You can configure the profile to:
- Recognize files with LST extensions as list files
- Recognize files with XML and LSTX extensions as list files
- Recognize files with LFB extensions as Laserfiche briefcases
- Optional: Under Sign-in account, click Change to modify the user name and password that Import Agent will use to access the repository. We recommend using a Service Principal user, which is a secure, unlicensed user that Laserfiche provides for connecting applications.
- In the Document properties section, under Name, specify a naming convention for new documents.
Tip: Use tokens to assign each document a unique name.
- Under Folder, specify the location inside the repository where new documents will be stored.
- Optional: Select Use the following folder if the above folder cannot be found. This option enables you to define an alternate destination folder in the event the primary folder has been deleted or moved.
Note: To define all of the above settings, enter regular text, click the token button to use tokens, or use a combination of the two.
- Under Volume, select the Laserfiche volume where new documents will be created. Learn more about volumes.
- Optional: Set the value of the Profile Count token.
- Assign fields using a template. Select a template and the fields associated with the selected template will appear.
- Assign fields independent of a template. Click Add/remove fields and select fields.
- Use a combination of the above.
- Select one or more tags. You can add informational or security tags.
- Optional: Add tag comments by double-clicking in the Comments column next to a tag.
- Choose one of the following options:
- Continuously: Import Agent will retrieve files continuously.
- Only at these times: Import Agent will only import files during a specified time interval. Click Add to define the interval. You can create multiple schedules.
- Disable this profile: The profile will be inactive until this option is cleared.
- All settings are optional.
- You can configure OCR settings, such as language preference and image rotation. Learn more.
- Under Generate text, choose whether to retrieve text and, if desired, use and configure OCR.
- Retrieve text: Automatically retrieve any available text from PDF or electronic documents.
- Use OCR if no text is available: OCR files that do not contain a text stream.
- Configure profile-specific OCR settings: Set the OCR settings for the current profile.
- Language: Select a language.
- Decolumnize text: Select to convert multiple columns of generated text into a single column. Clearing the check box will preserve column formatting in the OCRed text, even if that separates words and sentences.
- Perform image enhancement: Click Configure to clean up the image to aid the OCR process. After the OCR process is performed, the image will return to its original state.
- Deskew image: Straighten crooked images.
- Despeckle image: Remove undesired noise from an image. Specify a maximum size of the noise to remove. Size is specified as both width and height. For example, setting this option to 2 will remove all noise that is equal to or smaller than a 2 pixel x 2 pixel square.
- Rotate image: Automatically or manually rotate an image.
- Line removal: Select to remove horizontal or vertical lines from the image.
- Optimization priority: Specify an optimization style. There is generally a trade-off between speed and accuracy.
- Speed: Reduces the amount of time it takes to OCR. Generated text may be less accurate. Choose this option if you are more concerned about the speed of your OCRing process than about having a few errors in the generated text.
- Standard: Neither optimum speed nor optimum accuracy, but a balance between the two. Choose this option if you want the generated text to be fairly accurate but prefer that the OCR process not take the maximum amount of time to run.
- Accuracy: Increases OCR quality. Processing time will also be increased. Choose this option if you must have the most accurate text possible and are not concerned about how long it takes to run the OCRing process.
- Under Text file processing, specify if text files will be split into separate pages. If you select this option, specify the number of lines of text on each page.
- Under PDF file processing, choose if you want to generate Laserfiche image pages when importing PDF files.
- Generate Laserfiche pages: Enable Import Agent to generate Laserfiche TIFF image pages when importing a PDF file. If you know the documents you are importing are color, black and white, or high-quality color, select the corresponding option from this drop-down menu. If you are importing black and white documents, selecting that option can reduce file size and save space. The legacy option uses the default PDF storage process.
- Keep original PDF files: Retain the PDF file after generating TIFF image pages. The import process will import the original unaltered PDF along with the TIFF image pages.
- Preserve PDF annotations on Laserfiche pages: Convert PDF annotations to the equivalent Laserfiche annotations when generating TIFF image pages.
- Ignore PDF native text: By default, Import Agent uses native text extraction for PDFs. Select this option to use OCRed text instead of native text.
- Create pages at DPI: By default, Import Agent generates TIFF images that are 300x300 DPI. Select this checkbox to configure a different DPI value.
- Delete the source file.
- Move the source file to a Windows folder.
- Under Imported files, choose one of the following options:
- Deleted: Source files will be deleted.
- Moved to: Source files will be moved to another Windows folder. Specify the full path to a folder on the current machine or use a UNC path to specify a folder on a network drive (e.g., \\ComputerName\ShareName\Path). You can enter regular text, click the Token button to use tokens, or use a combination of the two.
- Optional: Select to delete unmodified files after a specific time frame. This setting deletes successfully imported files based on their last modified dates in Windows, not their import dates.
- Under Files that failed to import, specify the full path to a folder on the current machine or use a UNC path to specify a folder on a network drive. Files that Import Agent fails to import will be moved to this location.
- Under Monitored subfolders, specify if you want to delete subfolders that are empty after their contents have been imported.
- Under List file import, select Post process the files referenced in list files (LST/LSTX) after import to apply the settings in this tab to the files referenced in list files. If you clear this option, then files referenced in list files will not be deleted or moved.
- Under Image and text files, specify what will occur if an imported image or text file has the same name as an existing document in the destination folder:
- Create a new document: A new Laserfiche image document will be created.
- Update the existing document by: The new document will be merged with the existing document. Specify whether the new document will:
- Be appended to the existing document.
- Be prepended to the existing document.
- Replace the existing document. Only text and images will be replaced, not metadata or security.
- Under Electronic files, specify what will occur if an imported electronic file has the same name as an existing document in the destination folder:
- Create a new document: A new electronic document will be created.
- Replace the electronic file: The imported document will replace the existing document.
- Under Metadata, select one of the following options:
- Keep the existing document's metadata: Metadata associated with the existing document will not be modified.
- Replace the existing document's metadata: Metadata associated with the imported document will replace the metadata associated with the existing document.
- Merge the metadata. If the metadata conflicts ... Metadata associated with the imported document will be merged with the metadata associated with the existing document. Select which metadata will take precedence in the event of a conflict (e.g., if the same field contains different values).
- Under Version control, specify whether the existing document should be placed under version control when it is updated. Selecting this option retains the original document as the first version. The updated document (with the new, imported content) will be saved as the second version.
Note: You must have sufficient Windows administrative privileges to open this utility.
Important: The account you provide must have sufficient Laserfiche rights to perform all of the tasks specified in the profile.
Learn more about granting rights to users.
Note: If you select Windows authentication, the Windows account assigned to the Laserfiche Import Agent service will be used to sign into the repository.
General
Use the General tab to define the profile's name, the Windows folder from which files will be retrieved, and a file retrieval filter.
To configure the General tab
Note: Each monitored folder should have only one associated profile. Running two profiles on the same folder may cause files to be processed twice or not at all.
Example: If you specify that files must match *.tif, then only TIFF images will be retrieved. If you specify that files must match AAA*.tif, then only TIFF image files whose names start with AAA will be retrieved. The asterisk (*) used in these examples is a wildcard character that represents zero or more characters.
Wildcard Character | Description |
---|---|
* | Represents zero or more missing characters. For example, govern*s.txt would find governors.txt, governments.txt, and governs.txt. |
? | Represents any single character. For example, gr?y.doc would find gray.doc and grey.doc but not gravy.doc. |
Tip: Wildcard characters may be combined. For example, br?k*.tif would find brake.tif, braked.tif, broke.tif, broker.tif, and broken.tif.
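To try out a filter before saving the profile, the wildcard rules in the table can be approximated with Python's standard fnmatch module. This is a sketch under assumptions, not Import Agent's matching code: it assumes names are compared case-insensitively (as Windows file names are), and fnmatch also accepts bracket sets such as [abc], which the table does not list and Import Agent may treat differently. The helper name matches_any is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(filename: str, filters: list[str]) -> bool:
    """Return True if filename matches at least one FileName.Extension filter.

    Approximation only: both sides are lower-cased to mimic case-insensitive
    Windows file names; '*' matches zero or more characters and '?' matches
    exactly one character.
    """
    name = filename.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in filters)


# Examples taken from the table and tip above.
print(matches_any("broken.tif", ["*.doc", "br?k*.tif"]))  # True
print(matches_any("gravy.doc", ["gr?y.doc"]))             # False
```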
Note: If the profile's post-processing action is set to delete the source files, read-only files will be ignored. If the post-processing action is set to move the source files, read-only files will be imported. Learn more.
Properties
Use the Properties tab to define the sign-in account, the document properties (e.g., name, destination folders, volume), and the value of the Profile Count token.
To configure the Properties tab
Tip: To use the volume assigned to the parent folder, select Use parent folder's default volume. For example, if you configured a profile to import documents into a Laserfiche folder named Receipts, and this folder had been previously configured in the Laserfiche Client to use a default volume named Accounting, documents imported via this profile would be stored in the Accounting volume.
Fields
Use the Fields tab to assign a template and/or fields to imported documents. Configure the tab using any of the following methods:
Note: Dynamic fields are not supported in Import Agent.
After fields have been assigned, populate them by entering regular text, clicking the token button to use tokens, or using a combination of the two.
Tags
Use the Tags tab to assign tags and tag comments to imported documents.
To configure the Tags tab
Schedule
Use the Schedule tab to define when Import Agent will retrieve and import files. Scheduling a time interval lets you reduce network usage during peak hours.
To configure the Schedule tab
Tip: You can configure how often Import Agent checks for new files under File monitoring interval on the Advanced tab of Import Agent's options.
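The Only at these times option can be pictured as checking the current time against each configured interval, for example to keep imports out of peak network hours. The sketch below is a hypothetical illustration, not Import Agent's implementation; it assumes each interval is a start and end time of day, that the start is inclusive and the end exclusive, and that an interval whose end is earlier than its start wraps past midnight.

```python
from datetime import time


def in_schedule(now: time, intervals: list[tuple[time, time]]) -> bool:
    """Return True if `now` falls inside any (start, end) interval.

    Assumption: an interval with end < start spans midnight (e.g. 22:00-06:00).
    """
    for start, end in intervals:
        if start <= end:
            if start <= now < end:
                return True
        elif now >= start or now < end:  # interval wraps past midnight
            return True
    return False


# Hypothetical off-peak windows: overnight and over lunch.
off_peak = [(time(22, 0), time(6, 0)), (time(12, 0), time(13, 0))]
print(in_schedule(time(23, 30), off_peak))  # True
print(in_schedule(time(9, 0), off_peak))    # False
```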
Processing
Use the Processing tab to specify if text will be retrieved from files, how long text pages will be, and how PDFs will be processed. When configuring this tab, keep in mind that:
To configure the Processing tab
Note: If the text file has over 8,388,608 characters, then you must split the file before that character limit is reached to avoid an error.
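One way to stay under the 8,388,608-character limit (2^23) is to split large text files before they are placed in the monitored folder. The sketch below is a hypothetical pre-processing helper, not part of Import Agent. It breaks at line endings where possible and only cuts a line that is itself longer than the limit; it also assumes Import Agent counts characters the same way Python's len() does on decoded text, which the note does not confirm.

```python
MAX_CHARS = 8_388_608  # character limit from the note above (2**23)


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Chunks break at line endings where possible; a single line longer than
    max_chars is cut into max_chars-sized pieces.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Hard-split any line that cannot fit in a chunk on its own.
        while len(line) > max_chars:
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk if this line would push the current one over.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        if line:
            current.append(line)
            size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk could then be written to its own file (for example, report_part1.txt, report_part2.txt) before the files are copied into the monitored folder.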
Post-processing
Use the Post-processing tab to specify what should happen to source files after an import.
Was the file imported successfully into Laserfiche? | Options |
---|---|
Yes | You can choose to either delete the source file or move it to another Windows folder. |
No | You can specify the Windows folder where these files will be stored. |
To configure the Post-processing tab
Tip: Regularly check the folder for files that failed to import. Once you determine why Import Agent failed to import a file, correct the issue, then move the file back to the original Windows folder. The file will be imported the next time the profile runs.
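Moving corrected files back to the monitored folder can be scripted. A minimal sketch, assuming you pass in the Files that failed to import folder and the monitored folder from your profile (the paths in the usage comment are hypothetical). It skips any file whose name already exists in the monitored folder rather than overwriting it, and it does not touch subfolders.

```python
import shutil
from pathlib import Path


def requeue_failed(failed_dir: Path, monitored_dir: Path) -> list[Path]:
    """Move top-level files from failed_dir back into monitored_dir.

    Files whose names already exist in monitored_dir are skipped so that
    nothing is overwritten. Returns the new paths of the moved files.
    """
    moved: list[Path] = []
    for src in sorted(failed_dir.iterdir()):
        if not src.is_file():
            continue  # this sketch leaves subfolders alone
        dest = monitored_dir / src.name
        if dest.exists():
            continue  # avoid clobbering a file that is waiting to be imported
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


# Hypothetical usage with UNC paths; substitute your profile's folders:
# requeue_failed(Path(r"\\ComputerName\ShareName\Failed"),
#                Path(r"\\ComputerName\ShareName\Incoming"))
```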
Document Handling
Use the Document Handling tab to specify what should happen if the name of the document being imported matches the name of a document already in the destination folder.
To configure the Document Handling tab
Note: Profiles are stored as XML files under ProgramData (e.g., C:\ProgramData\Laserfiche\ImportAgent\Profiles).
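The Metadata options on the Document Handling tab (keep, replace, or merge with one side taking precedence on conflicts) can be pictured as operations on field-name/value pairs. The sketch below is purely illustrative: modeling fields as a plain dictionary, the mode names, and the resolve_metadata helper are all hypothetical, and templates and tags are not modeled.

```python
from typing import Dict, Literal

Mode = Literal["keep", "replace", "merge_prefer_existing", "merge_prefer_imported"]


def resolve_metadata(existing: Dict[str, str], imported: Dict[str, str],
                     mode: Mode) -> Dict[str, str]:
    """Return the field values the existing document ends up with (illustrative only)."""
    if mode == "keep":
        return dict(existing)            # imported values are ignored
    if mode == "replace":
        return dict(imported)            # imported values replace all existing ones
    if mode == "merge_prefer_existing":
        return {**imported, **existing}  # union; existing value wins a conflict
    if mode == "merge_prefer_imported":
        return {**existing, **imported}  # union; imported value wins a conflict
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, if both documents have an Invoice field with different values, the two merge modes differ only in which Invoice value survives; fields present on just one side are kept either way.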
Settings that Apply to All Profiles
Import Agent's Options dialog box lets you configure settings that apply to all profiles.
To open the Options dialog box
- Open the Import Agent Configuration Utility.
- Select Profile from the menu, and then select Options.
- Select one of the following tabs.
- General: Configures the Count token, the Laserfiche Distributed Computing Cluster, and OCR settings.
- Under Session Count token, you can configure the Session Count token which is a per user/session token that starts at 1 by default.
- To set a minimum width on the Session Count token, use the following notation: %(Count, #). The variable # should be replaced by the minimum width that the value returned by the Count token must meet. If the width of the returned value is smaller than the specified number, leading spaces will be added to the value. For example, if the Count token has been specified as %(Count, 3) and the value returned is a single digit, two leading spaces would be inserted before the current Count value.
- If you want the Session Count token to use leading zeros, prepend the current value of the token with the desired number of zeros.
- Under Laserfiche Distributed Computing Cluster, select the Enable checkbox if you want to connect the Import Agent service to a Laserfiche Distributed Computing Cluster (DCC) installation. When enabled, Import Agent will send OCR requests to the specified DCC Scheduler instead of processing OCR on the Import Engine Computer. Enabling a Distributed Computing Cluster can improve the speed at which OCR is performed.
- Scheduler: Specify the name of the computer where the Laserfiche Distributed Computing Cluster Scheduler is installed.
- Port: Specify the port to use when communicating with the Distributed Computing Cluster Scheduler. By default, DCC uses port 8108.
- Under OCR, select or clear the following options.
- Language: Select a language to help optimize the character recognition.
- Decolumnize text: Select this option to convert multiple columns of generated text into a single column. Clearing the checkbox will preserve column formatting in the OCRed text, even if that separates words and sentences.
- Perform image enhancement: Images will be temporarily enhanced prior to OCR processing to optimize the processing.
Click Configure to open the Image Clean-up Options dialog box and configure the desired temporary image enhancements.
- Deskew image: Straighten crooked images.
- Despeckle image: Remove undesired noise from an image.
- Specify the maximum size of noise to be removed (in pixels): Size is specified as both width and height. For example, setting this option to 2 will remove all noise that is equal to or smaller than a 2 pixel x 2 pixel square.
- Rotate image: Rotates images to an orientation that is appropriate for OCR processing. After the OCR process is performed, the image will return to its original orientation.
- Automatically: The direction in which text flows on the page will be detected. The image will be rotated so the text flows horizontally (left to right).
- By this amount: The amount by which the image will be rotated. An image can be rotated by 90, 180, or 270 degrees.
- Line removal: Remove lines from an image.
- Horizontal: Removes horizontal lines from the image. Characters that are damaged due to the line removal will be repaired.
- Vertical: Removes vertical lines from the image. Characters that are damaged due to the line removal will be repaired.
- Optimization priority: Choose one of the following:
- Speed: Reduces the amount of time it takes to OCR. Generated text may be less accurate. Best for documents with clear text.
- Balance: Strikes a balance between speed and accuracy. Best for documents with average text.
- Accuracy: Increases OCR quality. Processing time will also be increased. Best for documents with less clear text.
- File Types: Configures how images, text, list, and Laserfiche briefcase files will be imported.
- Images: Choose if certain types of image files will be imported as Laserfiche documents or electronic documents. By default, the Laserfiche Client opens electronic documents in their native application (e.g., Word, Excel, Paint, etc.) and Laserfiche documents in the repository document viewer.
Note: If they are set to be imported as Laserfiche documents, then bi-tonal images are imported as TIFF-G4, JPEG images are imported in their native format, and all other image formats are imported as TIFF LZW.
- Text: Choose if certain types of text files will be imported as Laserfiche documents or electronic documents. By default, the Laserfiche Client opens electronic documents in their native application (e.g., Word, Excel, Paint) and Laserfiche documents in the Laserfiche Document Viewer. To add an extension, click Add, specify a file extension, and click OK. Repeat this step as necessary.
- List: Import Agent can handle LST, XML, and LSTX files as lists of files or as imaged or electronic documents. Select or clear the Recognize files with LST extensions as list files option and/or the Recognize files with XML and LSTX extensions as list files option.
- Select these options to have Import Agent follow the import instructions contained in the file. In most cases this will cause Import Agent to import a set of files. If a list file specifies a document's name, field data, and/or an index status, these values will take precedence over the values defined in the corresponding profile.
- Clear these options to have Import Agent import the file as a Laserfiche imaged document or an electronic document. Import Agent will not attempt to read any import instructions that may be contained in the file.
- Laserfiche Briefcase: Import Agent can import an LFB (Laserfiche Briefcase) file either as a Laserfiche briefcase or as an imaged or electronic document.
- Select Recognize files with LFB extensions as Laserfiche briefcases to treat the file as a Laserfiche briefcase. Import Agent will attempt to follow the import instructions contained in the file. In most cases, this will cause Import Agent to import a set of files.
- Clear this option to import the file as a Laserfiche imaged document or an electronic document. Import Agent will not attempt to read any import instructions that may be contained in the file.
Note: For information on the structure of XML import list files, see the schema definition file and sample XML files included in the installation directory (e.g., C:\Program Files\Laserfiche\Import Agent\List File Examples).
Note: A list file will only be imported if Import Agent can retrieve all referenced files. One reason Import Agent may fail to find a file is the notation used to specify the file paths. If the desired file is on a network drive, use a relative path or a UNC path. See other reasons why a list file may fail to import.
Note: All settings and metadata assigned to files via the Laserfiche briefcase will take precedence over the settings and metadata defined in the corresponding profile.
- Advanced: Import Agent receives notifications from your operating system that help it determine when files are available for import. In some situations, Import Agent is unable to import a file after receiving a notification (e.g., the file was in use when the notification was sent). To prevent files from being overlooked, Import Agent periodically checks monitored directories. You can specify how often a check should be performed, as well as how long Import Agent should wait before importing files that have been recently modified. These options are useful in situations where content is being added to a file as it is being created.
- Under File monitoring interval, specify the number of seconds that must elapse between Import Agent's checks for new files.
- Under New files, specify the number of seconds that must elapse after a file has been modified before it can be imported into Laserfiche.
- Under Import Thread Count, specify the number of threads Import Agent will run on. This allows you to run Import Agent on machines with lower processing speeds.
- If you are running Import Agent on a powerful machine, you can increase thread count and enable parallel processing in the next step.
- If Import Agent is taking too many resources and affecting the performance of other products, reduce the thread count and clear the parallel processing option in the next step.
- Select Enable parallel processing for each profile to allow profiles to run in parallel.
- Click OK.
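The New files setting on the Advanced tab can be illustrated with a short sketch: a file counts as ready only once its last-modified time is at least the configured number of seconds in the past. This is an assumption about how the check works, written for illustration; ready_for_import and files_ready are hypothetical names, and Import Agent's real logic (for example, its handling of locked files) may differ.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def ready_for_import(path: Path, settle_seconds: float,
                     now: Optional[float] = None) -> bool:
    """Return True if path has not been modified for at least settle_seconds."""
    if now is None:
        now = time.time()
    # os.path.getmtime returns the last-modified time in seconds since the epoch.
    return now - os.path.getmtime(path) >= settle_seconds


def files_ready(folder: Path, settle_seconds: float) -> List[Path]:
    """List top-level files in folder that have been unmodified long enough."""
    now = time.time()
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and ready_for_import(p, settle_seconds, now))
```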
Example: Your scanner has been configured to create multi-page TIFF images. In this scenario, you should increase the time interval that Import Agent waits so your scanner has enough time to create all of the document's pages.
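The %(Count, #) width rule described for the Session Count token on the General tab pads the returned value with leading spaces up to the minimum width and leaves longer values unchanged. A short sketch of that arithmetic follows; format_count is a hypothetical helper, not a Laserfiche API.

```python
def format_count(value: int, min_width: int) -> str:
    """Pad value with leading spaces to min_width, as %(Count, #) is described to do."""
    # str.rjust never truncates, so values wider than min_width are returned as-is.
    return str(value).rjust(min_width)


print(repr(format_count(7, 3)))     # '  7'
print(repr(format_count(1234, 3)))  # '1234'
```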
Deleting or Disabling Profiles
Deleting a profile removes it completely. Disabling a profile makes it inactive but does not remove it.
- Go to the Windows Start menu.
- Select Laserfiche Import Agent Config to open the Import Agent Configuration Utility.
- Delete: Select a profile and click the Delete button.
- Disable: Double-click a profile. Select the Schedule tab, and select Disable this profile.
Organizing Profiles
Each column in the Configuration Utility's main window displays a different piece of information about your profiles. You can choose which columns are shown or hidden.
To show or hide columns
- Open the Configuration Utility.
- Select View from the menu, and select Choose Columns.
- Each column will be listed by name under either Available Columns or Selected Columns. Available Columns will not appear in the main window of the Import Agent Configuration Utility; Selected Columns will. Configure which columns are shown or hidden by selecting columns and clicking the left or right arrow buttons.
- Use the Move up and Move down buttons to configure the left-to-right display order of the columns. The topmost item under Selected Columns will appear as the leftmost column.
- Click OK.