OCR, Text Extraction, and Page Generation

These attributes influence how pages and text are generated from existing documents.

For settings that are specifically related to automatic text or page generation from new documents during import, see the section on importing. The attributes [Settings]MImportFlags and [Settings]MImportBreak may be of particular interest.
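Attributes throughout this section are written in the form [Section]Name. For scripts that need to handle these identifiers, the notation can be split into its two parts. The following is a minimal sketch; the helper parse_attribute and the Attribute type are hypothetical names used for illustration and are not part of any Laserfiche API.

```python
import re
from typing import NamedTuple


class Attribute(NamedTuple):
    """Hypothetical container for an attribute written as [Section]Name."""
    section: str
    name: str


# One bracketed section followed immediately by an alphanumeric name.
_ATTRIBUTE_RE = re.compile(r"^\[(?P<section>[^\[\]]+)\](?P<name>\w+)$")


def parse_attribute(text: str) -> Attribute:
    """Split '[Section]Name' into its section and name parts."""
    match = _ATTRIBUTE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in [Section]Name form: {text!r}")
    return Attribute(match["section"], match["name"])


attr = parse_attribute("[Settings]MImportFlags")
print(attr.section, attr.name)
```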

OCR: Image Cleanup

[LfImageCleanup]Deskew

If TRUE, straightens images that have been scanned at an angle.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]Despeckle

If TRUE, removes unwanted marks (e.g., dots or stray pixels) from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]SpeckleSize

Specifies the maximum size of speckles that can be removed. The value applies to both width and height, in pixels. For example, setting this option to 2 and setting [LfImageCleanup]Despeckle to TRUE will remove all noise that is equal to or smaller than a 2-pixel by 2-pixel square.

Valid values:

  • Integers

Default value: 2
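One way to read the size rule above is as a bounding-box test: a mark qualifies for removal only if both its width and its height are within the configured size. The sketch below illustrates that reading only. The function name is hypothetical, and the actual despeckle algorithm is not described in this section.

```python
def is_removable_speckle(width: int, height: int, speckle_size: int = 2) -> bool:
    """Return True if a mark of width x height pixels fits inside a
    speckle_size x speckle_size square.

    Assumption: the size limit is applied to both dimensions
    independently, as in the 2 x 2 example in the description.
    """
    return width <= speckle_size and height <= speckle_size


print(is_removable_speckle(2, 2))  # fits the default 2 x 2 square
print(is_removable_speckle(3, 1))  # too wide for the default size
```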

[LfImageCleanup]Rotate

If TRUE, rotates documents according to the setting in [LfImageCleanup]RotateType before carrying out OCR.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]RotateType

Specifies how documents are rotated when [LfImageCleanup]Rotate is TRUE: either automatically or by a fixed number of degrees clockwise.

Valid values:

  • Integers. 0 = Auto-rotate. 90, 180, 270 = rotate by that number of degrees clockwise.

Default value: 0
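The interaction between [LfImageCleanup]Rotate and [LfImageCleanup]RotateType can be summarized in a small sketch. The function name is hypothetical, and the code assumes that RotateType has no effect when Rotate is FALSE, which the descriptions above imply but do not state outright.

```python
VALID_ROTATE_TYPES = (0, 90, 180, 270)


def describe_rotation(rotate: bool = True, rotate_type: int = 0) -> str:
    """Describe the rotation applied before OCR for a pair of settings.

    Defaults mirror the documented defaults (Rotate=TRUE, RotateType=0).
    """
    if rotate_type not in VALID_ROTATE_TYPES:
        raise ValueError(f"RotateType must be one of {VALID_ROTATE_TYPES}")
    if not rotate:
        # Assumption: RotateType is ignored when Rotate is FALSE.
        return "no rotation"
    if rotate_type == 0:
        return "auto-rotate"
    return f"rotate {rotate_type} degrees clockwise"


print(describe_rotation())           # documented defaults
print(describe_rotation(True, 90))
```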

[LfImageCleanup]HorizLineRemoval

If TRUE, removes horizontal lines from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]VertLineRemoval

If TRUE, removes vertical lines from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]LineRemovalCharProtection

If TRUE, protects characters that intersect lines from being damaged during line removal.

Valid values:

  • TRUE/FALSE

Default value: FALSE

Other OCR Options

[Settings]OCRConflict

Determines how text extraction for electronic documents that contain image pages will be handled.

Valid values:

  • Integers. Each value determines what happens when the text extraction process encounters electronic documents with existing image pages.
  • 0 = Text is extracted from the electronic documents and the image pages are deleted.
  • 1 = The existing image pages are OCRed.
  • 2 = The documents are skipped without additional processing.

Default value: 2
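For scripts that read or audit this setting, the three values can be given readable names. The enumeration below is an illustrative sketch; the member names and the describe helper are hypothetical, and only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class OcrConflict(IntEnum):
    """Hypothetical names for the documented [Settings]OCRConflict values."""
    EXTRACT_AND_DELETE_IMAGES = 0
    OCR_EXISTING_IMAGES = 1
    SKIP_DOCUMENT = 2


DEFAULT_OCR_CONFLICT = OcrConflict.SKIP_DOCUMENT  # documented default: 2

_DESCRIPTIONS = {
    OcrConflict.EXTRACT_AND_DELETE_IMAGES:
        "extract text from the electronic document; image pages are deleted",
    OcrConflict.OCR_EXISTING_IMAGES:
        "OCR the existing image pages",
    OcrConflict.SKIP_DOCUMENT:
        "skip the document without additional processing",
}


def describe(value: int) -> str:
    """Return a description for a raw setting value; raises ValueError if invalid."""
    return _DESCRIPTIONS[OcrConflict(value)]


print(describe(DEFAULT_OCR_CONFLICT))
```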

[OCR]ForceSingleColumn

Valid values:

  • TRUE/FALSE

Default value: FALSE

[OCR]Language

Specifies the language used to recognize characters. In the web client, either this or [OCR]LanguageIETF must be specified if the user is not using the default language (English).

Valid values:

  • String

Default value: English

[OCR]LanguageIETF

Specifies the IETF language tag for the OCR language. In the web client, either this or [OCR]Language must be specified if the user is not using the default language (English).

Valid values:

  • String

Default value: en

[OCR]Optimization

Determines whether the OCR process favors speed, accuracy, or a balance of the two.

Valid values:

  • Integers. 0 = Optimize speed, 1 = Balance both speed and accuracy, 2 = Optimize accuracy.

Default value: 1

[OCR]UseFormFix

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Options]OCREngineProgId

This is the same setting as OCR Engine in Options → Generate Text → General.

Valid values:

  • Integers. Set to 6 to hide dialog box. Delete attribute to show dialog box.

PDF-Specific Attributes

[Settings]GeneratePagesMonochrome

Determines whether to create monochrome image pages when generating pages for PDFs that already exist in the repository.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Settings]IncludePDFAnnotsExistingDocs

For the web client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This attribute does not apply if you are using Snapshot to generate pages.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[WebAccess]PdfTextExtractMethod

Determines whether text is extracted from a PDF using Laserfiche's method or using the PDF IFilter program on the user's computer. This attribute applies to the web client only.

Valid values:

  • String. "native" = Use Laserfiche's method, "ifilter" = Use IFilter.

Default value: native

[HiddenDialogs]ConfirmPageGeneration

Controls whether the user is warned when they generate pages for a PDF file that already has some Laserfiche image pages. When the warning dialog is enabled, the user can choose to continue or abort the operation.

This attribute applies to the web client only.

Valid values:

  • Integers. Set to 6 to disable dialog. Delete attribute to enable dialog.

Others

[OfficePlugin]TextExtraction

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]EnableEDocTextExtract

Valid values:

  • TRUE/FALSE

Default value: TRUE