Regular Expressions

Regular expressions are a way of defining patterns in information by using special symbols. These expressions can be used to identify, constrain, display, or extract information.

Using regular expressions with Laserfiche applications

Laserfiche applications use regular expressions in different ways, depending on the purpose of the application. See some common regular expression examples.

Forms

Regular expressions can restrict the format of values entered in a field. For example, a regular expression can require that a user adds dashes to a phone number or Social Security number, but it cannot automatically add those dashes. To automatically change the appearance of entered values, you'll want to use field masks. Field masks change how values are displayed on a form, and are accomplished using very simple JavaScript code.
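
As a rough illustration of restricting a field's format, the sketch below checks that a phone number includes its dashes. It uses Python's re module only to show the idea; the pattern and the is_valid_phone helper are hypothetical, and Forms evaluates the expression you configure on the field itself.

```python
import re

# Hypothetical pattern: three digits, a dash, three digits, a dash, four digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_valid_phone(value: str) -> bool:
    """Return True only if the entire value has the dashed format."""
    return PHONE_PATTERN.fullmatch(value) is not None

print(is_valid_phone("555-123-4567"))  # True
print(is_valid_phone("5551234567"))    # False: the dashes are required, not added
```

The check only accepts or rejects a value; it never rewrites it, which is why a field mask is needed to insert the dashes automatically.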

Workflow

Regular expressions can be used to extract specific information from tokens used in your workflow. You can apply regular expressions to tokens with the Token Editor or Token Dialog, and the Pattern Matching activity.
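
A minimal sketch of extracting part of a token value, written with Python's re module for illustration. The token value and the INV- prefix are made-up examples; in Workflow you would enter only the pattern itself in the token editor or the Pattern Matching activity.

```python
import re

# Hypothetical token value; the real format depends on your workflow.
token_value = "Invoice INV-20417 received 2024-03-15"

# The parentheses capture only the digits that follow "INV-".
match = re.search(r"INV-(\d+)", token_value)
invoice_number = match.group(1) if match else None

print(invoice_number)  # 20417
```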

Quick Fields

Regular expressions can be applied to document text and token values using the Pattern Matching and Substitution processes. When the text or token values match a pattern, Quick Fields can perform some action, such as identifying the document as belonging to a document class, automatically annotating the text that meets the pattern, or substituting different text. Pattern Matching expressions can also be used as constraints on particular fields to ensure data used in them meets specified criteria.
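
The two operations described above, matching and substitution, might look like the following sketch. It uses Python's re module and a made-up line of page text; Quick Fields applies the pattern through its own Pattern Matching and Substitution processes.

```python
import re

# Hypothetical OCR text from a scanned page.
page_text = "Employee SSN: 123-45-6789. Contact: 555-0100."

# Three digits, two digits, four digits, separated by dashes.
SSN_PATTERN = r"\b\d{3}-\d{2}-\d{4}\b"

# Pattern matching: does the text contain something shaped like an SSN?
contains_ssn = re.search(SSN_PATTERN, page_text) is not None

# Substitution: replace every match with different text.
masked_text = re.sub(SSN_PATTERN, "XXX-XX-XXXX", page_text)

print(contains_ssn)  # True
print(masked_text)   # Employee SSN: XXX-XX-XXXX. Contact: 555-0100.
```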

Documents

In Repository Administration, constraints for text fields can be created using regular expressions. Constraints for numeric fields (number, integer, and long integer) do not use regular expressions and instead use relational operators. For more information about relational operators, see Relational Operator Reference.

Helpful regular expressions to use when configuring field constraints in Repository Administration.

Connector

If you only want to use part of the value stored in a token, you can use regular expressions to extract just the information you need.

Capture Profiles

Regular expressions can be used to account for variations in a zone's anchor text and help locate the data you want to capture. When text matches the regular expression pattern, the capture profile can accurately find the correct information on the document.
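
To show how one pattern can absorb variations in anchor text, here is a sketch in Python's re module. The sample strings, the pattern, and the capture_invoice_number helper are all hypothetical; a capture profile would use only the pattern.

```python
import re

# Hypothetical anchor-text variations seen across scanned invoices.
samples = [
    "Invoice No: 1042",
    "Invoice Number: 1042",
    "Invoice #: 1042",
    "INVOICE NO. 1042",
]

# "No", "No.", "Number", or "#", an optional colon, and flexible spacing.
ANCHOR = re.compile(r"invoice\s*(?:no\.?|number|#)\s*:?\s*(\d+)", re.IGNORECASE)

def capture_invoice_number(text):
    """Return the digits after the anchor text, or None if no anchor is found."""
    match = ANCHOR.search(text)
    return match.group(1) if match else None

captured = [capture_invoice_number(s) for s in samples]
print(captured)  # ['1042', '1042', '1042', '1042']
```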

Types of regular expressions

Character Classes

A character class matches any one character from a set that you define. The following regular expression character classes are available.

Regular Expression Character Classes
Regular Expression Description
. Matches any character except "\n". For example, the pattern a.e matches "ave" in "have" and "ate" in "water". To match a period character ".", precede the period with the escape character "\" to produce "\.".

[aeiou] Matches any single character included in the specified set of characters. The characters are case-sensitive. For example, the expression [abc123] matches exactly one of the characters "a", "b", "c", "1", "2", or "3", and the pattern [as] matches "a" and "s" in "Laserfiche".
[^aeiou] Matches any single character not in the specified set of characters. The characters are case-sensitive. For example, the expression [^abc123] matches any character except "a", "b", "c", "1", "2", or "3", and the pattern [^as] matches "L", "e", "r", "f", "i", "c", "h", and "e" in "Laserfiche".
[0-9a-fA-F] A hyphen (-) specifies a contiguous character range from left to right. For example, the expression [0-9] matches any single digit from 0 to 9, and [A-Z] matches any capital letter. The pattern [A-X] matches "X" in "XY".
\p{name} Matches any character in the named character class specified by {name}. Supported names are Unicode category groups and block ranges, for example, Ll, Nd, Z, IsGreek, and IsBoxDrawing. The pattern \p{IsCyrillic} matches "Д" in "ДA". *
\P{name} Matches any single character not in the Unicode category or block named in {name}. For example, the pattern \P{IsCyrillic} matches "A" in "ДA" because "A" is not part of the Cyrillic block. Use this construct to exclude characters from a particular Unicode category or block. *
\w Matches any word character. For example, the pattern \w matches "A", "B", "1", and "2" in "AB 1.2". Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If you are defining a regular expression in Laserfiche repository administration, \w is equivalent to [a-zA-Z_0-9].
\W Matches any non-word character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. For example, the pattern \W matches " " (blank space) and "." in "AB 1.2". If you are defining a regular expression in Laserfiche repository administration, \W is equivalent to [^a-zA-Z_0-9].
\s Matches any white-space character. For example, the pattern \s matches " " (blank space) in "AB 1.2". Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If you are defining a regular expression in Laserfiche repository administration, \s is equivalent to [ \f\n\r\t\v].
\S Matches any non-white-space character. For example, the pattern \S matches "A", "B", "1", ".", and "2" in "AB 1.2". Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If you are defining a regular expression in Laserfiche repository administration, \S is equivalent to [^ \f\n\r\t\v].
\d Matches any decimal digit. For example, the pattern \d matches "1", "2", "3", and "4" in "ab 1.234". Equivalent to \p{Nd} for Unicode. If you are defining a regular expression in Laserfiche repository administration, \d is equivalent to [0-9].
\D Matches any character that is not a decimal digit. For example, the pattern \D matches "a", "b", " " (blank space), and "." in "ab 1.234". Equivalent to \P{Nd} for Unicode. If you are defining a regular expression in Laserfiche repository administration, \D is equivalent to [^0-9].

*These regular expressions are not available when configuring field constraints in Laserfiche repository administration.
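
The examples from the table above can be reproduced with a short sketch in Python's re module, which shares this part of the syntax with the .NET and ECMAScript flavors that Laserfiche uses. The variable names are illustrative. Python's re module does not support \p{name}, so that construct is left out.

```python
import re

dot_matches = re.findall(r"a.e", "have water")          # ['ave', 'ate']
set_matches = re.findall(r"[as]", "Laserfiche")         # ['a', 's']
negated_matches = re.findall(r"[^as]", "Laserfiche")    # ['L', 'e', 'r', 'f', 'i', 'c', 'h', 'e']
range_matches = re.findall(r"[A-X]", "XY")              # ['X']
word_matches = re.findall(r"\w", "AB 1.2")              # ['A', 'B', '1', '2']
non_word_matches = re.findall(r"\W", "AB 1.2")          # [' ', '.']
digit_matches = re.findall(r"\d", "ab 1.234")           # ['1', '2', '3', '4']
non_digit_matches = re.findall(r"\D", "ab 1.234")       # ['a', 'b', ' ', '.']

print(negated_matches)
```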

Alternative Character Class Syntax Available in Laserfiche repository administration
Regular Expression Description
[[:alnum:]] Any alphanumeric character.
[[:alpha:]] Any alphabetical character in the following ranges: a-z and A-Z.
[[:blank:]] A space or a tab.
[[:digit:]] Any whole number from 0 to 9.
[[:lower:]] Any lower-case character (i.e., a-z).
[[:print:]] Any printable character.
[[:punct:]] Any punctuation character.
[[:space:]] Any white space character.
[[:upper:]] Any upper-case character (i.e., A-Z).
[[:xdigit:]] Any hexadecimal digit (i.e., 0-9, a-f and A-F).
[[:word:]] Any alphanumeric character or an underscore.

Quantifiers

Quantifiers specify how many times an element must occur for a match. They apply to the character, group, or character class that immediately precedes them.

Regular Expression Quantifiers
Regular Expression Description
* Zero or more matches. Matches the previous element zero or more times (equivalent to {0,}). For example, \d* matches zero or more consecutive digits, and the pattern d* matches "d" twice in "1dad" (it also produces empty matches at positions where there is no "d").
+ One or more matches. Matches the previous element one or more times (equivalent to {1,}). For example, \d+ matches one or more consecutive digits, such as a whole number, and the pattern to+ matches "to" in "tough" and "too" in "tooth".
? Zero or one match. Matches the previous element zero or one time. For example: \d? matches a single digit or a blank value (equivalent to {0,1}), and the pattern card? matches "card" in "cards" and "car" in "cars".
{n} Exactly n matches. Matches the previous element n times. For example: (pizza){2} only matches "pizzapizza", and the pattern ,\d{3} matches ",234" and ",567" in "1,234,567.890".
{n,} At least n matches. Matches the previous element at least n times. For example: (abc){2,} matches "abcabc" and "abcabcabc", but not "abc", and the pattern \d{2,} matches "11" and "24" in "11.24".
{n,m} At least n, but no more than m, matches. Matches the previous element at least n times, but no more than m times. For example: \d{2,4} matches a two-, three-, or four-digit number, such as "113" and "2444" in "113.2444".
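
The quantifier examples above can be checked with the following sketch, written in Python's re module for illustration. The variable names are not part of any Laserfiche product.

```python
import re

plus_matches = re.findall(r"to+", "tough tooth")            # ['to', 'too']
optional_matches = re.findall(r"card?", "cards cars")       # ['card', 'car']
exact_matches = re.findall(r",\d{3}", "1,234,567.890")      # [',234', ',567']
at_least_matches = re.findall(r"\d{2,}", "11.24")           # ['11', '24']
range_matches = re.findall(r"\d{2,4}", "113.2444")          # ['113', '2444']

# * can match zero characters, so empty matches are filtered out here.
star_matches = [m for m in re.findall(r"d*", "1dad") if m]  # ['d', 'd']

# A quantifier placed after a group repeats the whole group.
two_pizzas = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None  # True
one_pizza = re.fullmatch(r"(pizza){2}", "pizza") is not None        # False

print(plus_matches, optional_matches, star_matches)
```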

Note: By default, .NET regular expressions are "greedy": a quantifier matches as much text as possible. You can add a question mark "?" after a quantifier to make it "lazy," so that it matches as little text as possible. For example, in "First Name: John Last Name: Smith", the group in Name:\s(.+)\s captures "John Last Name:", while the group in Name:\s(.+?)\s captures "John".
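
The difference between greedy and lazy matching in the note above can be seen in this sketch. It uses Python's re module, whose quantifiers are also greedy by default; the variable names are illustrative.

```python
import re

text = "First Name: John Last Name: Smith"

# Greedy: .+ takes as much as it can, then backs up to the last white space.
greedy = re.search(r"Name:\s(.+)\s", text)

# Lazy: .+? stops at the first white space that lets the pattern succeed.
lazy = re.search(r"Name:\s(.+?)\s", text)

greedy_capture = greedy.group(1)
lazy_capture = lazy.group(1)

print(greedy_capture)  # John Last Name:
print(lazy_capture)    # John
```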

Character Escapes

Most regular expression language operators are unescaped single characters. The escape character \ (a single backslash) signals to the regular expression parser that the character following the backslash either has a special meaning (for example, \t for a tab) or should be treated as a literal character rather than as an operator.

Example: The parser treats an asterisk (*) as a repeating quantifier and a backslash followed by an asterisk (\*) as the Unicode character 002A.

Note: The character escapes listed are recognized both in regular expressions and in replacement patterns.

Regular Expression Character Escapes
Regular Expression Description
Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
\a Matches a bell (alarm) \u0007.*
\b Matches a backspace \u0008 if in a [] character class. This character is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.*
\t Matches a tab \u0009.
\r Matches a carriage return \u000D.
\v Matches a vertical tab \u000B.
\f Matches a form feed \u000C.
\n Matches a new line \u000A.
\e Matches an escape \u001B.*
\040 Matches an ASCII character as octal (up to three digits). Numbers with no leading zero are back references if they have only one digit or if they correspond to a capturing group number. For example, \040 represents a space.
\x20 Matches an ASCII character using hexadecimal representation (exactly two digits).
\cC Matches an ASCII control character. For example, \cC is Control+C.
\c Matches an ASCII control character.
\\ Matches a backslash.

* These regular expressions are not available when configuring field constraints in Laserfiche repository administration.
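
A short sketch of how escaping changes what a pattern matches, using Python's re module for illustration. The sample strings are made up, and re.escape is a Python helper rather than a Laserfiche feature.

```python
import re

text = "2*3 costs 1.50 or 1x50"

# \* is a literal asterisk rather than the "zero or more" quantifier.
asterisk_matches = re.findall(r"\d\*\d", text)      # ['2*3']

# Unescaped, the period matches any character; escaped, only a period.
any_char_matches = re.findall(r"1.50", text)        # ['1.50', '1x50']
period_matches = re.findall(r"1\.50", text)         # ['1.50']

# \t matches a tab and \x20 matches a space (hexadecimal 20).
fields = re.split(r"\t", "name\tvalue")             # ['name', 'value']
space_count = len(re.findall(r"\x20", "a b c"))     # 2

# re.escape escapes the special characters in a literal string for you.
literal_pattern = re.escape("a.e")
literal_matches = re.findall(literal_pattern, "have a.e")  # ['a.e']

print(asterisk_matches, any_char_matches, period_matches)
```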

Grouping Constructs

The following regular expression grouping constructs let you extract a subset of information from a token or text.

Regular Expression Grouping Constructs
Regular Expression Description
(expr) Match or capture group. Matches the exact expression in the parentheses. For example, the pattern "(1-3)" matches "1-3" in "1-34", but nothing in "1".
(?:expr) Non-capturing group. Groups the contained expressions together (e.g., to apply a quantifier or alternation to multiple symbols at once) without creating a capture group, which keeps the capture results uncluttered. For example, (?:https?|ftp)://\S+ matches URLs that start with http, https, or ftp without capturing the protocol separately.
(?=expr) Positive lookahead. Asserts that the text that follows matches expr, without consuming any characters. The match succeeds only if expr can be matched at the current position, but the input position does not advance, so subsequent patterns still process the same text.
(?<name>expr) Creates a named capture group for future use in the regular expression.*
\k<name> References a named capture group created in the expression (back reference). Matches the string captured by that capture group. *

*These regular expressions are not available when configuring field constraints in Laserfiche repository administration.
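
The grouping constructs above might be exercised as in the sketch below, written in Python's re module. One difference is an assumption worth flagging: Python spells named groups (?P<name>expr) and named back references (?P=name), whereas .NET uses (?<name>expr) and \k<name>. The sample strings are hypothetical.

```python
import re

# Capturing group: group(1) holds the text matched inside the parentheses.
captured = re.search(r"(1-3)", "1-34").group(1)               # '1-3'

# Non-capturing group: alternation applies, but no group is recorded.
url = re.search(r"(?:https?|ftp)://\S+", "See ftp://files.example.com/a.txt now")
url_text = url.group(0)                                       # 'ftp://files.example.com/a.txt'
url_groups = url.groups()                                     # ()

# Lookahead: digits count only when "kg" follows, and "kg" is not consumed.
weights = re.findall(r"\d+(?=kg)", "12kg 30lb 7kg")           # ['12', '7']

# Named group plus back reference (Python spelling) to find a doubled word.
doubled = re.search(r"(?P<word>\b\w+)\s(?P=word)\b", "it was was fine")
doubled_word = doubled.group("word")                          # 'was'

print(captured, url_text, weights, doubled_word)
```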

Metacharacters

The following regular expression characters cause a match to succeed or fail depending on the position of the match within the text.

Example: The regular expression ^FTP returns only those occurrences of the character string "FTP" that occur at the beginning of a line.

Regular Expression Metacharacters
Regular Expression Description
^ Matches at the beginning of the text or line. For example, the pattern ^\d{2} matches "12" in "12-34".
$ Matches at the end of the text, before \n, or at the end of the line. For example, the pattern \d{2}$ matches "34" in "12-34".
\A Matches at the beginning of the text. For example, the pattern \A\d{2} matches "12" in "12-34". (Ignores the Multiline option)*
\Z Matches at the end of the text or before \n at the end of the text. For example, the pattern \d{2}\Z matches "34" in "12-34". (Ignores the Multiline option)*
\G Matches only at the point where the previous match ended. When used with Match.NextMatch(), ensures that matches are all contiguous. For example, the pattern \G\(\d\) matches "(1)" and "(2)" in "(1)(2)[3](4)", but matches only "(1)" in "(1) (2)3".*
\b Matches on a boundary between a \w and a \W, or at the start or end of a line. For example, the pattern \b\w matches "s" twice in "sea shells". The pattern \w\b matches "w" once in "workflow".
\B Matches at a position that is not a \b boundary. For example, the pattern \B\w matches "e" and "a" in "sea" and "h", "e", "l", "l", and "s" in "shells".

*These expressions are not available when configuring field constraints in Laserfiche repository administration.
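
The position-based examples above can be reproduced with this sketch in Python's re module. Only ^, $, \A, \b, and \B are shown, because Python's \Z behaves differently from the .NET version and Python's re module has no \G. Variable names are illustrative.

```python
import re

start_matches = re.findall(r"^\d{2}", "12-34")            # ['12']
end_matches = re.findall(r"\d{2}$", "12-34")              # ['34']
text_start_matches = re.findall(r"\A\d{2}", "12-34")      # ['12']

# ^FTP succeeds only when "FTP" is at the beginning.
ftp_at_start = re.search(r"^FTP", "FTP site") is not None       # True
ftp_in_middle = re.search(r"^FTP", "an FTP site") is not None   # False

word_starts = re.findall(r"\b\w", "sea shells")           # ['s', 's']
word_ends = re.findall(r"\w\b", "workflow")               # ['w']
non_boundary = re.findall(r"\B\w", "sea shells")          # ['e', 'a', 'h', 'e', 'l', 'l', 's']

print(word_starts, word_ends, non_boundary)
```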

Alternations

An alternation is a character that modifies a regular expression to allow either/or matching.

Regular Expression Alternations
Regular Expression Description
| Matches any one of the terms separated by the vertical bar (|) character. For example, cat|dog|tiger matches either "cat", "dog", or "tiger". When multiple subexpressions are satisfied, the left-most expression is used. You can also group alternatives within parentheses. For example, the pattern c(ar|haracter|all) matches "car" and "character" in "This car has character".
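
A sketch of alternation in Python's re module, including the left-most rule. The sample strings are made up for illustration.

```python
import re

animals = re.findall(r"cat|dog|tiger", "a dog chased a cat")   # ['dog', 'cat']

# finditer is used so that the whole match is returned, not just the group.
grouped = [m.group(0) for m in
           re.finditer(r"c(ar|haracter|all)", "This car has character")]  # ['car', 'character']

# When more than one alternative could match, the left-most one is used.
leftmost = re.search(r"car|carpet", "carpet").group(0)         # 'car'

print(animals, grouped, leftmost)
```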

Options

Regular expression options let you specify how a regular expression pattern is interpreted.

Note: These regular expressions are not available when configuring field constraints in Laserfiche repository administration.

Regular Expression Options
Regular Expression Description
(?i) Enables case-insensitive matching. For example, (?i)[A-Z] would match any upper-case or lower-case letter.
(?m) Enables multi-line mode. ^ and $ will match the beginning and end of a line, instead of the beginning and end of the text.
(?n) Enables capturing named groups only (explicit capture). I.e., groups using this format: (?<name> subexpression).
(?s) Enables single-line mode. The period (.) matches every character, including \n, instead of every character except \n.
(?x) Enables ignoring white space in the regular expression.
(?-i) Disables case-insensitive matching. For example, (?i)[A-Z](?-i)[A-Z] matches any upper-case or lower-case letter that is followed by an upper-case letter.
(?-m) Disables multi-line mode. ^ and $ match only at the beginning and end of the text.
(?-n) Disables capturing named groups only (explicit capture).
(?-s) Disables single-line mode. The period (.) matches any character except \n.
(?-x) Disables ignoring white space in the regular expression.
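
The effect of the inline options can be sketched with Python's re module, with one caveat: Python accepts (?i), (?m), (?s), and (?x) only at the start of a pattern and has no mid-pattern (?-i), so the scoped form (?i:...) stands in for the (?i)...(?-i) example. Sample strings are hypothetical.

```python
import re

# (?i): case-insensitive matching.
letters = re.findall(r"(?i)[A-Z]", "aB1")                     # ['a', 'B']

text = "first\nsecond"

# (?m): ^ matches at the start of every line, not only the start of the text.
default_starts = re.findall(r"^\w+", text)                    # ['first']
multiline_starts = re.findall(r"(?m)^\w+", text)              # ['first', 'second']

# (?s): the period also matches the newline character.
dot_default = re.search(r"first.second", text) is not None        # False
dot_single_line = re.search(r"(?s)first.second", text) is not None  # True

# (?x): white space inside the pattern is ignored.
spaced = re.findall(r"(?x) \d{3} - \d{4}", "555-0100")        # ['555-0100']

# Scoped option: any-case letter followed by an upper-case letter.
scoped = re.findall(r"(?i:[A-Z])[A-Z]", "aB bc Cd")           # ['aB']

print(letters, multiline_starts, dot_single_line, scoped)
```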

More resources

  • To test your regular expressions, use https://regex101.com
  • Laserfiche Workflow, Quick Fields, and Process Automation use .NET regular expressions. The Microsoft Developer Network (MSDN) website has additional information about this type of regular expression, including all supported language elements, best practices, and examples.
  • Repository Administration uses ECMAScript regular expressions. The Microsoft Developer Network (MSDN) website lists all of the supported language elements.