OCR, Text Extraction, and Page Generation
These attributes influence how pages and text are generated from existing documents.
For settings that relate specifically to automatic text or page generation from new documents during import, see the section on importing. The attributes [Settings]MImportFlags and [Settings]MImportBreak may be of particular interest.
OCR: Image Cleanup
[LfImageCleanup]Deskew
If TRUE, straightens images that have been scanned at an angle.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]Despeckle
If TRUE, removes unwanted marks (e.g., dots or stray pixels) from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]SpeckleSize
Specifies the maximum size of speckles that can be removed. Size is specified as both width and height, in pixels. For example, setting this option to 2 and setting [LfImageCleanup]Despeckle to TRUE will remove all noise that is equal to or smaller than a 2 pixel by 2 pixel square.
Valid values:
- Integers
Default value: 2
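The size rule above can be sketched in a few lines of Python. This is only an illustration of the documented threshold, not the product's actual algorithm: the function name and parameters are hypothetical, and it assumes that a mark counts as a speckle only when both its width and its height fit within the SpeckleSize square.

```python
def is_removable_speckle(width_px: int, height_px: int,
                         speckle_size: int = 2, despeckle: bool = True) -> bool:
    """Return True if a mark of the given size would fall under the threshold.

    Hypothetical helper: mirrors [LfImageCleanup]Despeckle and
    [LfImageCleanup]SpeckleSize, assuming both dimensions must fit.
    """
    if not despeckle:
        # With Despeckle set to FALSE, nothing is removed regardless of size.
        return False
    return width_px <= speckle_size and height_px <= speckle_size


print(is_removable_speckle(2, 2))  # fits inside a 2 x 2 square
print(is_removable_speckle(3, 1))  # too wide for the default threshold
```

Raising SpeckleSize removes larger marks but, under this reading, also risks removing small punctuation such as periods.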
[LfImageCleanup]Rotate
If TRUE, rotates documents before carrying out OCR according to the setting in [LfImageCleanup]RotateType.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]RotateType
Specifies how documents are rotated when [LfImageCleanup]Rotate is TRUE: either automatically or by a fixed number of degrees clockwise.
Valid values:
- Integers. 0 = Auto-rotate. 90, 180, 270 = rotate by that number of degrees clockwise.
Default value: 0
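As a rough illustration of how [LfImageCleanup]Rotate and [LfImageCleanup]RotateType interact, the Python sketch below resolves the two values into a single description. The function name and return strings are hypothetical; only the value meanings come from the descriptions above.

```python
from typing import Optional

# Values documented for [LfImageCleanup]RotateType.
VALID_ROTATE_TYPES = {0, 90, 180, 270}


def resolve_rotation(rotate: bool = True, rotate_type: int = 0) -> Optional[str]:
    """Describe the rotation applied before OCR, or None if rotation is off."""
    if rotate_type not in VALID_ROTATE_TYPES:
        raise ValueError(f"RotateType must be one of {sorted(VALID_ROTATE_TYPES)}")
    if not rotate:
        # Rotate = FALSE: RotateType is ignored.
        return None
    if rotate_type == 0:
        return "auto"
    return f"{rotate_type} degrees clockwise"


print(resolve_rotation())          # defaults: Rotate = TRUE, RotateType = 0
print(resolve_rotation(True, 90))
```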
[LfImageCleanup]HorizLineRemoval
If TRUE, removes horizontal lines from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]VertLineRemoval
If TRUE, removes vertical lines from scanned documents.
Valid values:
- TRUE/FALSE
Default value: TRUE
[LfImageCleanup]LineRemovalCharProtection
If TRUE, protects characters that intersect lines from being damaged during line removal.
Valid values:
- TRUE/FALSE
Default value: FALSE
Other OCR Options
[Settings]OCRConflict
Determines how text extraction for electronic documents that contain image pages will be handled.
Valid values:
- Integers. Each value describes what happens when the text extraction process encounters an electronic document with existing image pages. 0 = Text is extracted from the electronic document and the image pages are deleted. 1 = The existing image pages are OCRed. 2 = The document is skipped without additional processing.
Default value: 2
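The three [Settings]OCRConflict values can be summarized as a small lookup. The Python sketch below is illustrative only: the enum member names and the function are hypothetical, and the action strings paraphrase the descriptions above.

```python
from enum import IntEnum


class OcrConflict(IntEnum):
    """Hypothetical names for the documented [Settings]OCRConflict values."""
    EXTRACT_TEXT_DELETE_IMAGES = 0
    OCR_EXISTING_IMAGES = 1
    SKIP_DOCUMENT = 2


ACTIONS = {
    OcrConflict.EXTRACT_TEXT_DELETE_IMAGES:
        "extract text from the electronic document and delete the image pages",
    OcrConflict.OCR_EXISTING_IMAGES:
        "OCR the existing image pages",
    OcrConflict.SKIP_DOCUMENT:
        "skip the document without additional processing",
}


def describe_ocr_conflict(value: int = 2) -> str:
    """Return the action for a given OCRConflict value (default is 2)."""
    # OcrConflict(value) raises ValueError for anything other than 0, 1, or 2.
    return ACTIONS[OcrConflict(value)]


print(describe_ocr_conflict())
```

Because the default is 2, electronic documents that already have image pages are left untouched unless the attribute is changed.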
[OCR]ForceSingleColumn
This is the same setting as Decolumnize text in the Generate Text: General section of the Options dialog box.
Valid values:
- TRUE/FALSE
Default value: FALSE
[OCR]Language
Specifies the language used to recognize characters. In the Windows client, this must be specified if not using the default language (English). In the web client, either this or [OCR]LanguageIETF must be specified if not using the default language (English).
Valid values:
- String
Default value: English
[OCR]LanguageIETF
Specifies the IETF language tag for the OCR language. In the web client, either this or [OCR]Language must be specified if the user is not using the default language (English). This attribute does not apply to the Windows client.
Valid values:
- String
Default value: en
[OCR]Optimization
Determines whether the OCR process favors speed, favors accuracy, or balances the two.
Valid values:
- Integers. 0 = Optimize speed, 1 = Balance both speed and accuracy, 2 = Optimize accuracy.
Default value: 1
[OCR]UseFormFix
This is the same setting as Perform image enhancement in the Generate Text: General section of the Options dialog box.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]OCR
This is the same setting as the OCR / Extract Text checkbox in the Generate Searchable Text dialog box. Applies to the Windows client only.
Valid values:
- Integers. 1 = Yes, 0 = No.
Default value: 1
[OCR]PagesOption
This is the same setting as "Which image pages would you like to OCR?" in the Generate Searchable Text dialog box. Applies to the Windows client only.
Valid values:
- Integers. 0 = All image pages, 1 = All image pages without text, 2 = Specified pages.
Default value: 0
[OCR]OptimizeForSpeed
Determines whether speed will be optimized when OCRing. Applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default value: FALSE
[OCR]OcrCancelTimeout
When canceling an OCR job, the client will wait this number of milliseconds before abandoning the background operation that is delaying the cancellation. Applies to the Windows client only.
Valid values:
- Integers
Default value: 10000
[Options]OCREngineProgId
This is the same setting as OCR Engine in Options → Generate Text → General.
Valid values:
- Integers. Set to 6 to hide dialog box. Delete attribute to show dialog box.
PDF-Specific Attributes
[Settings]GeneratePagesMonochrome
Determines whether to create monochrome image pages when generating pages for PDFs that already exist in the repository.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]GeneratePagesDirectImageExtraction
For documents that have pages with different DPI, extracting the images directly from the PDF is faster but could lead to differences between the displays for each page. Set this option to FALSE to ensure that new images are generated for each page, rather than extracting the images directly. Use [Settings]PdfImportResolution to specify the image resolution.
This attribute applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]GeneratePagesRecompressImages
Determines the compression used when generating pages from a PDF. When this is FALSE, the images are saved as a TIFF with either LZW or Group 4 compression. When this is TRUE, use [Settings]ImportJPEGCompressionLevel to set the compression level. Set this to TRUE to produce images of smaller sizes.
This attribute applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]PdfPageImportOption
Determines how imaged pages will be generated from PDFs in the repository. The two options on offer are the same options that the user can select from Tools → Options → Generate Pages → General → PDFs in the repository.
This attribute applies to the Windows client only.
Note: Using Snapshot to generate pages will not preserve PDF annotations.
Valid values:
- Integers. 0 = Generate images from the pages of PDF files, 1 = Use Snapshot to generate pages.
Default value: 0
[Settings]IncludePDFAnnotsExistingDocs
For the web client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This is the same option as Preserve PDF annotations on Laserfiche pages in Tools → Options → Generate Pages → General → PDFs in the repository. This attribute does not apply if you are using Snapshot to generate pages.
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]GeneratePagesPreservePdfAnnotations
For the Windows client only. When generating pages for existing PDFs in the repository, this determines whether to include PDF annotations on the Laserfiche pages. This is the same option as Preserve PDF annotations on Laserfiche pages in Tools → Options → Generate Pages → General → PDFs in the repository. This attribute does not apply if you are using Snapshot to generate pages.
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]BurnPDFAnnotationsOnLFImage
Determines whether PDF annotations will be burned directly onto the Laserfiche page, as opposed to being converted into Laserfiche annotations, when generating pages for a PDF.
This attribute applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]UseAlternatePdfTextMethod
This is the same setting as Use an alternative method to generate text in the Advanced PDF Import Options dialog box. This attribute applies to the Windows client only. See [WebAccess]PdfTextExtractMethod for the web client equivalent.
Valid values:
- TRUE/FALSE
Default value: FALSE
[Settings]AlternatePdfTextMethod
If [Settings]UseAlternatePdfTextMethod is TRUE, this specifies the method used to generate text. Windows client only.
Valid values:
- Integers. 0 = OCR, 1 = IFilter.
Default value: 0
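The relationship between [Settings]UseAlternatePdfTextMethod and [Settings]AlternatePdfTextMethod can be sketched as follows. This Python snippet is a minimal illustration under stated assumptions: the function name is hypothetical, and "default" is a placeholder label for whatever method the client uses when the alternative method is not enabled.

```python
# Values documented for [Settings]AlternatePdfTextMethod.
ALTERNATE_METHODS = {0: "OCR", 1: "IFilter"}


def pdf_text_method(use_alternate: bool = False, alternate_method: int = 0) -> str:
    """Return which text generation method the two attributes select."""
    if not use_alternate:
        # AlternatePdfTextMethod only matters when UseAlternatePdfTextMethod is TRUE.
        return "default"
    if alternate_method not in ALTERNATE_METHODS:
        raise ValueError("AlternatePdfTextMethod must be 0 (OCR) or 1 (IFilter)")
    return ALTERNATE_METHODS[alternate_method]


print(pdf_text_method())         # both attributes at their defaults
print(pdf_text_method(True, 1))  # alternative method enabled, IFilter selected
```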
[WebAccess]PdfTextExtractMethod
Determines whether text is extracted from a PDF using Laserfiche's method, or using the PDF IFilter program on the user's computer. This attribute applies to the web client only. See [Settings]AlternatePdfTextMethod and [Settings]UseAlternatePdfTextMethod for the analogous Windows client attributes.
Valid values:
- String. "native" = Use Laserfiche's method, "ifilter" = Use IFilter.
Default value: native
[Settings]GeneratePagesForPdfWithoutTextStream
This is the same setting as Generate images and text for PDFs without a text stream in the Advanced PDF Import Options dialog box.
This setting does not apply to the web client.
Valid values:
- TRUE/FALSE
Default value: FALSE
[HiddenDialogs]ConfirmPageGeneration
Determines whether the user is warned when generating pages for a PDF file that already has some Laserfiche image pages. When the warning appears, the user can choose to continue or abort the operation.
This attribute applies to the web client only.
Valid values:
- Integers. Set to 6 to disable dialog. Delete attribute to enable dialog.
Others
[Settings]ShowLinkAllAnnotations
If enabled, adds a Link all Image Annotations option to the Tools menu in the Windows client. Click on this option to automatically generate linked text annotations from existing image highlight, redaction, underline, and strikethrough annotations on a document.
This attribute does not apply to the web client.
Valid values:
- Integers. 1 = Show option, 0 = Hide option.
Default value: 0
[OfficePlugin]TextExtraction
This is the same setting as Automatically extract text when saving documents from Microsoft Office under Generate Text: General in the Options dialog box.
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]EnableEDocTextExtract
This is the same setting as Generate searchable text in New Documents: Settings in the Options dialog box. It applies to the web client only.
Valid values:
- TRUE/FALSE
Default value: TRUE
[Settings]GeneratePagesUseClientHelper
When generating pages from a PDF that already exists in the repository, this determines whether a client helper process will be used. Set this to TRUE to generate pages in the helper process, which improves performance within the current process. Set this to FALSE to continue generating pages within the current process.
This attribute applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default value: TRUE
[HiddenDialogs]NoIFilter
Determines whether a dialog box appears when some documents have no text extracted because IFilter is not installed. The dialog box provides the user with a link to the Laserfiche Support site to install IFilter. This attribute applies to the Windows client only.
Valid values:
- TRUE/FALSE
Default behavior: Dialog box appears.
[HiddenDialogs]GeneratePages
Determines whether the Generate Pages dialog box, which allows the user to change the settings for page generation, appears when the user attempts to generate pages. This attribute applies to the Windows client only.
Valid values:
- TRUE to hide dialog box, FALSE to show dialog box.
Default value: FALSE