OCR, Text Extraction, and Page Generation

These attributes influence how pages and text are generated from existing documents.

For settings that apply specifically to automatic text or page generation from new documents during import, see the section on importing. The attributes [Settings]MImportFlags and [Settings]MImportBreak may be of particular interest.

OCR: Image Cleanup

[LfImageCleanup]Deskew

If TRUE, straightens images that have been scanned at an angle.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]Despeckle

If TRUE, removes unwanted marks (e.g., dots or stray pixels) from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]SpeckleSize

Specifies the maximum size of speckles that can be removed. Size is specified as both width and height, in pixels. For example, setting this option to 2 and setting [LfImageCleanup]Despeckle to TRUE will remove all noise that is equal to or smaller than a 2 pixel by 2 pixel square.

Valid values:

  • Integers

Default value: 2
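
To make the size rule concrete, the following Python sketch removes every connected blob of black pixels whose bounding box fits inside a SpeckleSize by SpeckleSize square. It is an illustration only, not the client's actual cleanup algorithm: the bitmap representation (lists of 0/1), the function name despeckle, and the use of 8-connectivity are all assumptions made for this example.

```python
from typing import List

Bitmap = List[List[int]]  # hypothetical representation: 1 = black (ink), 0 = white


def despeckle(image: Bitmap, speckle_size: int = 2) -> Bitmap:
    """Return a copy of image with small connected blobs of ink removed.

    A blob is removed when its bounding box is at most
    speckle_size x speckle_size pixels (assumed interpretation).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one 8-connected blob and collect its pixels.
            stack = [(y, x)]
            seen[y][x] = True
            blob = []
            while stack:
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Measure the blob's bounding box.
            ys = [p[0] for p in blob]
            xs = [p[1] for p in blob]
            blob_height = max(ys) - min(ys) + 1
            blob_width = max(xs) - min(xs) + 1
            # Erase the blob only if it fits inside the speckle square.
            if blob_height <= speckle_size and blob_width <= speckle_size:
                for by, bx in blob:
                    result[by][bx] = 0
    return result
```

With the default size of 2, a 2 by 2 blob is erased while a stroke 3 pixels long is kept, because its bounding box exceeds the limit in one dimension.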

[LfImageCleanup]Rotate

If TRUE, rotates documents according to the setting in [LfImageCleanup]RotateType before carrying out OCR.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]RotateType

Specifies how documents are rotated when [LfImageCleanup]Rotate is TRUE: automatically, or by a fixed number of degrees clockwise.

Valid values:

  • Integers. 0 = Auto-rotate. 90, 180, 270 = rotate by that number of degrees clockwise.

Default value: 0
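
The interaction between [LfImageCleanup]Rotate and [LfImageCleanup]RotateType can be summarized in a short Python sketch. The function name resolve_rotation and its return values are hypothetical and exist only to illustrate the documented values; this is not part of any client API.

```python
from typing import Optional, Union


def resolve_rotation(rotate: bool = True,
                     rotate_type: int = 0) -> Optional[Union[str, int]]:
    """Describe the rotation applied before OCR (illustrative helper).

    Returns None when rotation is off, "auto" for auto-rotate,
    or the number of degrees clockwise.
    """
    if not rotate:
        return None  # Rotate is FALSE, so RotateType is ignored
    if rotate_type == 0:
        return "auto"
    if rotate_type in (90, 180, 270):
        return rotate_type
    raise ValueError(f"Unsupported RotateType: {rotate_type}")
```

With both attributes at their defaults (TRUE and 0), the sketch reports auto-rotation.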

[LfImageCleanup]HorizLineRemoval

If TRUE, removes horizontal lines from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]VertLineRemoval

If TRUE, removes vertical lines from scanned documents.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[LfImageCleanup]LineRemovalCharProtection

If TRUE, protects characters that intersect lines from being damaged during line removal.

Valid values:

  • TRUE/FALSE

Default value: FALSE

Other OCR Options

[Settings]OCRConflict

Determines how text extraction for electronic documents that contain image pages will be handled.

Valid values:

  • Integers. Each value determines what happens when the text extraction process encounters electronic documents with existing image pages. 0 = Text is extracted from the electronic documents and the image pages are deleted. 1 = The existing image pages are OCRed. 2 = The documents are skipped without additional processing.

Default value: 2
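
The three values can be read as a small decision table. The Python sketch below mirrors the descriptions above; the enum, the function name, and the returned strings are hypothetical, and the branch for documents without image pages is an assumption (the attribute only describes documents that already have image pages).

```python
from enum import IntEnum


class OcrConflict(IntEnum):
    """Illustrative names for the documented [Settings]OCRConflict values."""
    EXTRACT_AND_DELETE_PAGES = 0
    OCR_EXISTING_PAGES = 1
    SKIP_DOCUMENT = 2


def plan_text_extraction(has_image_pages: bool, ocr_conflict: int = 2) -> str:
    """Return a description of the action taken for one electronic document."""
    if not has_image_pages:
        # Assumption: with no image pages there is no conflict to resolve.
        return "extract text from electronic document"
    mode = OcrConflict(ocr_conflict)  # raises ValueError for unknown values
    if mode is OcrConflict.EXTRACT_AND_DELETE_PAGES:
        return "extract text from electronic document and delete image pages"
    if mode is OcrConflict.OCR_EXISTING_PAGES:
        return "OCR existing image pages"
    return "skip document"
```

Under the default value of 2, a document that already has image pages is skipped.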

[OCR]ForceSingleColumn

This is the same setting as Decolumnize text in the Generate Text: General section of the Options dialog box.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[OCR]Language

Specifies the language used to recognize characters. In the Windows client, this must be specified if not using the default language (English). In the web client, either this or [OCR]LanguageIETF must be specified if not using the default language (English).

Valid values:

  • String

Default value: English

[OCR]LanguageIETF

Specifies the IETF language tag for the OCR language. In the web client, either this or [OCR]Language must be specified if the user is not using the default language (English). This attribute does not apply to the Windows client.

Valid values:

  • String

Default value: en

[OCR]Optimization

Determines whether the OCR process favors speed, accuracy, or a balance of the two.

Valid values:

  • Integers. 0 = Optimize speed, 1 = Balance both speed and accuracy, 2 = Optimize accuracy.

Default value: 1

[OCR]UseFormFix

This is the same setting as Perform image enhancement in the Generate Text: General section of the Options dialog box.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Settings]OCR

This is the same setting as the OCR / Extract Text checkbox in the Generate Searchable Text dialog box. Applies to the Windows client only.

Valid values:

  • Integers. 1 = Yes, 0 = No.

Default value: 1

[OCR]PagesOption

This is the same setting as "Which image pages would you like to OCR?" in the Generate Searchable Text dialog box. Applies to the Windows client only.

Valid values:

  • Integers. 0 = All image pages, 1 = All image pages without text, 2 = Specified pages.

Default value: 0

[OCR]OptimizeForSpeed

Determines whether speed will be optimized when OCRing. Applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[OCR]OcrCancelTimeout

When canceling an OCR job, the client will wait this number of milliseconds before abandoning the background operation that is delaying the cancellation. Applies to the Windows client only.

Valid values:

  • Integers

Default value: 10000

[Options]OCREngineProgId

This is the same setting as OCR Engine in Options → Generate Text → General.

Valid values:

  • Integers. Set to 6 to hide dialog box. Delete attribute to show dialog box.

PDF-Specific Attributes

[Settings]GeneratePagesMonochrome

Determines whether to create monochrome image pages when generating pages for PDFs that already exist in the repository.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Settings]GeneratePagesDirectImageExtraction

For documents that have pages with different DPI, extracting the images directly from the PDF is faster but could lead to differences between the displays for each page. Set this option to FALSE to ensure that new images are generated for each page, rather than extracting the images directly. Use [Settings]PdfImportResolution to specify the image resolution.

This attribute applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]GeneratePagesRecompressImages

Determines the compression used when generating pages from a PDF. When this is FALSE, the images are saved as a TIFF with either LZW or Group 4 compression. When this is TRUE, use [Settings]ImportJPEGCompressionLevel to set the compression level. Set this to TRUE to produce images of smaller sizes.

This attribute applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Settings]PdfPageImportOption

Determines how image pages will be generated from PDFs in the repository. The two options are the same ones that the user can select from Tools → Options → Generate Pages → General → PDFs in the repository.

This attribute applies to the Windows client only.

Note: Using Snapshot to generate pages will not preserve PDF annotations.

Valid values:

  • Integers. 0 = Generate images from the pages of PDF files, 1 = Use Snapshot to generate pages.

Default value: 0

[Settings]IncludePDFAnnotsExistingDocs

For the web client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This is the same option as Preserve PDF annotations on Laserfiche pages in Tools → Options → Generate Pages → General → PDFs in the repository. This attribute does not apply if you are using Snapshot to generate pages.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]GeneratePagesPreservePdfAnnotations

For the Windows client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This is the same option as Preserve PDF annotations on Laserfiche pages in Tools → Options → Generate Pages → General → PDFs in the repository. This attribute does not apply if you are using Snapshot to generate pages.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]BurnPDFAnnotationsOnLFImage

Determines whether PDF annotations will be burned directly onto the Laserfiche page, as opposed to being converted into Laserfiche annotations, when generating pages for a PDF.

This attribute applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default value: FALSE
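
For the Windows client, the fate of PDF annotations depends on three attributes together: [Settings]PdfPageImportOption, [Settings]GeneratePagesPreservePdfAnnotations, and [Settings]BurnPDFAnnotationsOnLFImage. The Python sketch below is one reading of how they combine. The function name and return strings are hypothetical, and the precedence shown (burning only matters when annotations are preserved) is an assumption that the descriptions above do not state explicitly.

```python
def pdf_annotation_handling(page_import_option: int = 0,
                            preserve_annotations: bool = True,
                            burn_annotations: bool = False) -> str:
    """Describe what happens to PDF annotations during page generation."""
    if page_import_option == 1:
        # Snapshot does not preserve PDF annotations.
        return "not preserved (Snapshot)"
    if not preserve_annotations:
        return "not included"
    if burn_annotations:
        # Assumption: burning applies only when annotations are preserved.
        return "burned onto the page image"
    return "converted to Laserfiche annotations"
```

With all three attributes at their defaults, annotations are converted to Laserfiche annotations.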

[Settings]UseAlternatePdfTextMethod

This is the same setting as Use an alternative method to generate text in the Advanced PDF Import Options dialog box. This attribute applies to the Windows client only. See [WebAccess]PdfTextExtractMethod for the web client equivalent.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[Settings]AlternatePdfTextMethod

If [Settings]UseAlternatePdfTextMethod is TRUE, this specifies the method used to generate text. Windows client only.

Valid values:

  • Integers. 0 = OCR, 1 = IFilter.

Default value: 0
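
Because [Settings]AlternatePdfTextMethod only takes effect when [Settings]UseAlternatePdfTextMethod is TRUE, the effective method is determined by the pair. A minimal Python sketch of that dependency follows; the function name and the label "default" for the standard extraction method are hypothetical.

```python
def windows_pdf_text_method(use_alternate: bool = False,
                            alternate_method: int = 0) -> str:
    """Return the effective PDF text generation method (illustrative)."""
    if not use_alternate:
        # AlternatePdfTextMethod is ignored unless the alternate is enabled.
        return "default"
    methods = {0: "ocr", 1: "ifilter"}
    if alternate_method not in methods:
        raise ValueError(f"Unsupported AlternatePdfTextMethod: {alternate_method}")
    return methods[alternate_method]
```

Setting only [Settings]AlternatePdfTextMethod to 1 therefore changes nothing in this reading until [Settings]UseAlternatePdfTextMethod is also set to TRUE.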

[WebAccess]PdfTextExtractMethod

Determines whether text is extracted from a PDF using Laserfiche's method, or using the PDF IFilter program on the user's computer. This attribute applies to the web client only. See [Settings]AlternatePdfTextMethod and [Settings]UseAlternatePdfTextMethod for the analogous Windows client attributes.

Valid values:

  • String. "native" = Use Laserfiche's method, "ifilter" = Use IFilter.

Default value: native

[Settings]GeneratePagesForPdfWithoutTextStream

This is the same setting as Generate images and text for PDFs without a text stream in the Advanced PDF Import Options dialog box.

This setting does not apply to the web client.

Valid values:

  • TRUE/FALSE

Default value: FALSE

[HiddenDialogs]ConfirmPageGeneration

If TRUE, the user is warned when they generate pages for a PDF file that already has some Laserfiche image pages. They can then choose to continue or abort the operation.

This attribute applies to the web client only.

Valid values:

  • Integers. Set to 6 to disable dialog. Delete attribute to enable dialog.

Others

[Settings]ShowLinkAllAnnotations

If enabled, adds a Link all Image Annotations option to the Tools menu in the Windows client. Click on this option to automatically generate linked text annotations from existing image highlight, redaction, underline, and strikethrough annotations on a document.

This attribute does not apply to the web client.

Valid values:

  • Integers. 1 = Show option, 0 = Hide option.

Default value: 0

[OfficePlugin]TextExtraction

This is the same setting as Automatically extract text when saving documents from Microsoft Office under Generate Text: General in the Options dialog box.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]EnableEDocTextExtract

This is the same setting as Generate searchable text in New Documents: Settings in the Options dialog box. It applies to the web client only.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[Settings]GeneratePagesUseClientHelper

When generating pages from a PDF that already exists in the repository, this determines whether a client helper process will be used. Set this to TRUE to hand page generation to the helper process, which improves performance within the current process. Set this to FALSE to continue generating pages within the current process.

This attribute applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default value: TRUE

[HiddenDialogs]NoIFilter

Determines whether a dialog box appears when some documents have no text extracted because IFilter is not installed. The dialog box provides the user with a link to the Laserfiche Support site to install IFilter. This attribute applies to the Windows client only.

Valid values:

  • TRUE/FALSE

Default behavior: Dialog box appears.

[HiddenDialogs]GeneratePages

Determines whether the Generate Pages dialog box, which allows the user to change the settings for page generation, appears when the user attempts to generate pages. This attribute applies to the Windows client only.

Valid values:

  • TRUE to hide dialog box, FALSE to show dialog box.

Default value: FALSE