OCR, Text Extraction, and Page Generation
These attributes influence how pages and text are generated from existing documents.
For settings that are specifically related to automatic text or page generation from new documents during import, see the section on importing. The attributes [Settings]MImportFlags and [Settings]MImportBreak may be of particular interest.
OCR: Image Cleanup
[LfImageCleanup]Deskew
If TRUE, straightens images that have been scanned at an angle.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]Despeckle
If TRUE, removes unwanted marks (e.g., dots or stray pixels) from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]SpeckleSize
Specifies the maximum size, in pixels, of the speckles that can be removed. The same value is used for both width and height. For example, if this option is set to 2 and [LfImageCleanup]Despeckle is TRUE, all noise that fits within a 2-by-2-pixel square is removed.
Valid values:
- Integers
Default value: 2
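The following Python sketch illustrates the kind of filtering that [LfImageCleanup]SpeckleSize describes. It is not Laserfiche's implementation. It assumes a binary image stored as a list of rows of 0/1 values and treats a "speckle" as any 8-connected group of ink pixels whose bounding box fits within a SpeckleSize-by-SpeckleSize square. The function name `despeckle` is hypothetical.

```python
from collections import deque


def despeckle(image, speckle_size=2):
    """Erase small isolated marks from a binary image (illustrative sketch).

    image: list of rows, each a list of 0 (background) or 1 (ink).
    Assumption: a "speckle" is an 8-connected group of ink pixels whose
    bounding box fits inside a speckle_size x speckle_size square.
    Returns a new image; the input is left unchanged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for start_y in range(height):
        for start_x in range(width):
            if image[start_y][start_x] != 1 or seen[start_y][start_x]:
                continue
            # Collect every ink pixel connected to this one (breadth-first).
            component = []
            queue = deque([(start_y, start_x)])
            seen[start_y][start_x] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Measure the component's bounding box.
            rows = [y for y, _ in component]
            cols = [x for _, x in component]
            box_height = max(rows) - min(rows) + 1
            box_width = max(cols) - min(cols) + 1
            # Erase it only if it fits within the speckle square.
            if box_height <= speckle_size and box_width <= speckle_size:
                for y, x in component:
                    result[y][x] = 0
    return result


page = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
cleaned = despeckle(page, speckle_size=2)  # corner dot removed, 3x3 block kept
```

With the default size of 2, the isolated dot in the corner is erased while the 3-by-3 block survives. Raising the size to 3 would erase both, so larger values can also start removing small legitimate marks such as periods.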
[LfImageCleanup]Rotate
If TRUE, rotates documents before OCR is performed, as specified by [LfImageCleanup]RotateType.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]RotateType
Specifies how documents are rotated when [LfImageCleanup]Rotate is TRUE: automatically, or by a fixed number of degrees clockwise.
Valid values:
- Integers. 0 = Auto-rotate. 90, 180, 270 = rotate by that number of degrees clockwise.
Default value: 0
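A minimal Python sketch of how [LfImageCleanup]Rotate and [LfImageCleanup]RotateType combine, based only on the values listed above. The helper names are hypothetical, and the "auto" result stands in for automatic orientation detection, which is not modeled here. `rotate_clockwise` shows what a fixed 90, 180, or 270 setting does to a 2-D pixel grid.

```python
def resolve_rotation(rotate, rotate_type=0):
    """Return the rotation step to apply before OCR (hypothetical helper).

    None       -> no rotation ([LfImageCleanup]Rotate is FALSE)
    "auto"     -> automatic orientation detection (RotateType 0)
    90/180/270 -> fixed clockwise rotation, in degrees
    """
    if not rotate:
        return None
    if rotate_type == 0:
        return "auto"
    if rotate_type in (90, 180, 270):
        return rotate_type
    raise ValueError(f"unsupported RotateType: {rotate_type}")


def rotate_clockwise(grid, degrees):
    """Rotate a 2-D list clockwise in 90-degree steps."""
    for _ in range((degrees // 90) % 4):
        # Reversing the rows and then transposing is one clockwise quarter turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


step = resolve_rotation(True, 90)
if step in (90, 180, 270):
    rotated = rotate_clockwise([[1, 2], [3, 4]], step)  # [[3, 1], [4, 2]]
```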
[LfImageCleanup]HorizLineRemoval
If TRUE, removes horizontal lines from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]VertLineRemoval
If TRUE, removes vertical lines from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]LineRemovalCharProtection
If TRUE, protects characters that intersect lines from being damaged during line removal.
Valid values:
- TRUE/FALSE
Default value: FALSE
Other OCR Options
[Settings]OCRConflict
Determines how text extraction handles electronic documents that already contain image pages.
Valid values:
- Integers. When the text extraction process encounters an electronic document that already has image pages: 0 = text is extracted from the electronic document and the image pages are deleted; 1 = the existing image pages are OCRed; 2 = the document is skipped without further processing.
Default value: 2
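The three [Settings]OCRConflict values can be read as a small dispatch table. This Python sketch maps each value to the steps described above for a single electronic document. The enum, the function name, and the step strings are hypothetical, and the behavior for documents without image pages (plain text extraction) is an assumption, since the attribute only covers documents that already have image pages.

```python
from enum import IntEnum


class OcrConflict(IntEnum):
    """Values of [Settings]OCRConflict (names are hypothetical)."""
    EXTRACT_AND_DELETE_IMAGES = 0  # extract text, delete existing image pages
    OCR_IMAGE_PAGES = 1            # OCR the existing image pages
    SKIP = 2                       # leave the document unprocessed (default)


def plan_text_extraction(has_image_pages, setting=OcrConflict.SKIP):
    """Return the processing steps for one electronic document."""
    setting = OcrConflict(setting)  # raises ValueError for undocumented values
    if not has_image_pages:
        # Assumption: no conflict, so text is simply extracted.
        return ["extract text from electronic file"]
    if setting is OcrConflict.EXTRACT_AND_DELETE_IMAGES:
        return ["extract text from electronic file", "delete image pages"]
    if setting is OcrConflict.OCR_IMAGE_PAGES:
        return ["OCR image pages"]
    return []  # SKIP: no additional processing
```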
[OCR]ForceSingleColumn
Valid values:
- TRUE/FALSE
Default value: FALSE
[OCR]Language
Specifies the language used to recognize characters. In the web client, either this attribute or [OCR]LanguageIETF must be specified if the user is not using the default language (English).
Valid values:
- String
Default value: English
[OCR]LanguageIETF
Specifies the IETF language tag for the OCR language. In the web client, either this or [OCR]Language must be specified if the user is not using the default language (English).
Valid values:
- String
Default value: en
[OCR]Optimization
Determines whether the OCR process prioritizes speed, accuracy, or a balance of the two.
Valid values:
- Integers. 0 = Optimize speed, 1 = Balance speed and accuracy, 2 = Optimize accuracy.
Default value: 1
[OCR]UseFormFix
Valid values:
- TRUE/FALSE
Default value: FALSE
[Options]OCREngineProgId
This is the same setting as OCR Engine in Options → Generate Text → General.
Valid values:
- Integers. Set to 6 to hide dialog box. Delete attribute to show dialog box.
PDF-Specific Attributes
[Settings]GeneratePagesMonochrome
Determines whether to create monochrome image pages when generating pages for PDFs that already exist in the repository.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]IncludePDFAnnotsExistingDocs
For the web client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This attribute does not apply if you are using Snapshot to generate pages.
Valid values:
- TRUE/FALSE
Default value: TRUE
[WebAccess]PdfTextExtractMethod
Determines whether text is extracted from a PDF using Laserfiche's own method or the PDF IFilter program on the user's computer. This attribute applies to the web client only.
Valid values:
- String. "native" = Use Laserfiche's method, "ifilter" = Use IFilter.
Default value: native
[HiddenDialogs]ConfirmPageGeneration
Controls whether the user is warned when generating pages for a PDF file that already has some Laserfiche image pages. When the warning appears, the user can choose to continue or cancel the operation.
This attribute applies to the web client only.
Valid values:
- Integers. Set to 6 to disable dialog. Delete attribute to enable dialog.
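[HiddenDialogs]ConfirmPageGeneration, like [Options]OCREngineProgId, uses a presence-based convention: storing 6 suppresses the dialog, and deleting the attribute brings it back. The Python sketch below models that convention on a plain dictionary. The dictionary stands in for wherever the attributes are actually stored, the helper names are hypothetical, and treating any value other than 6 as "show the dialog" is an assumption.

```python
HIDE_VALUE = 6  # documented value that suppresses the dialog


def dialog_is_shown(attributes, name):
    """Return True if the dialog controlled by `name` should appear.

    Assumption: any value other than 6, including a missing attribute,
    means the dialog is shown.
    """
    return attributes.get(name) != HIDE_VALUE


def hide_dialog(attributes, name):
    """Suppress the dialog by storing the documented value 6."""
    attributes[name] = HIDE_VALUE


def show_dialog(attributes, name):
    """Re-enable the dialog by deleting the attribute (no error if absent)."""
    attributes.pop(name, None)


settings = {}
key = "[HiddenDialogs]ConfirmPageGeneration"
hide_dialog(settings, key)  # settings == {key: 6}; warning no longer shown
```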
Others
[OfficePlugin]TextExtraction
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]EnableEDocTextExtract
Valid values:
- TRUE/FALSE
Default value: TRUE