Search Strings

One of the principal uses of regular expressions is the search for (and subsequent replacement of) substrings in character strings. In general, a user is interested in a specific selection of character strings that match a regular expression. In ABAP, the search with regular expressions is realized using the addition REGEX of the statement FIND, whereby the found substrings are determined with no overlaps according to the leftmost-longest rule.

Leftmost-longest rule

First, the substring is determined that matches the regular expression and starts furthest to the left in the character string ("leftmost"). If several matching substrings start at this position, the longest one is chosen ("longest"). This procedure is then repeated for the remainder of the character string after the found location.

Example

For the regular expression d*od*, five substrings are found in doobedoddoo: do at offset 0, o at offset 2, dodd at offset 5, o at offset 9 and o at offset 10.

DATA result_tab TYPE match_result_tab.
FIND ALL OCCURRENCES OF REGEX 'd*od*' IN 'doobedoddoo'
                      RESULTS result_tab.
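The matches listed above can be read from the result table. The following is a minimal sketch, assuming an executable program with list output; the work area result_line is an illustrative name and not part of the original example.

```abap
DATA result_line TYPE match_result.
DATA result_tab TYPE match_result_tab.

FIND ALL OCCURRENCES OF REGEX 'd*od*' IN 'doobedoddoo'
     RESULTS result_tab.

" Each row describes one match by its offset and length
LOOP AT result_tab INTO result_line.
  WRITE: / 'Offset:', result_line-offset,
           'Length:', result_line-length.
ENDLOOP.
```

For the five substrings named above, the offset and length pairs should be 0/2, 2/1, 5/4, 9/1 and 10/1.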

Operators for search strings

The following operators support searching in character strings. These operators are made up of the special characters ^, $, \, <, >, (, ), = and !. A special character can be turned into a literal character by prefixing it with \.
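As a small illustration of the prefix \, the following sketch searches for a literal dollar sign. The sample text and the variable name off are assumptions made for this example; without the backslash, $ would be interpreted as an anchor instead of a character.

```abap
DATA off TYPE i.

" \$ searches for the character $ itself, not for the end of a line
FIND REGEX '\$' IN 'Price: 5$' MATCH OFFSET off.

IF sy-subrc = 0.
  WRITE: / 'Literal $ found at offset', off.
ENDIF.
```

In this sample text, the dollar sign is the ninth character, so the reported offset should be 8.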

Start and end of a character string

The operators ^ and $ act as anchor characters for the offset before the first character of a line and the offset after the last character of a line. If the character string to be searched contains control characters such as a line feed, it is interpreted as consisting of several lines.

The operators \A and \Z have the same effect as ^ and $, but always refer to the whole character string instead of to single lines.

Note

The operators ^ and $ behave differently from \A and \Z only if the character string contains control characters such as line feeds. Within ABAP programs, these control characters normally occur only when externally generated data records are imported.

Example

The following search finds Smile at the start of the first line and at the end of the last line of the internal table text_tab.

DATA text(10) TYPE c.
DATA text_tab LIKE TABLE OF text.

DATA result_tab TYPE match_result_tab.

APPEND 'Smile' TO text_tab.
APPEND ' Smile' TO text_tab.
APPEND '  Smile' TO text_tab.
APPEND '   Smile' TO text_tab.
APPEND '    Smile' TO text_tab.
APPEND '     Smile' TO text_tab.

FIND ALL OCCURRENCES OF REGEX '^(?:Smile)|(?:Smile)$'
     IN TABLE text_tab RESULTS result_tab.
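The difference between ^ and \A only shows up when a single character string contains a line feed. The following sketch builds such a string with the constant cl_abap_char_utilities=>newline and counts the matches of both anchors. The variable names and the sample text are assumptions made for this illustration.

```abap
DATA text TYPE string.
DATA line_cnt TYPE i.
DATA string_cnt TYPE i.

" Build a two-line string: Smile, line feed, Smile
CONCATENATE `Smile` cl_abap_char_utilities=>newline `Smile`
            INTO text.

" ^ anchors at the start of each line
FIND ALL OCCURRENCES OF REGEX '^Smile' IN text
     MATCH COUNT line_cnt.

" \A anchors only at the start of the whole character string
FIND ALL OCCURRENCES OF REGEX '\ASmile' IN text
     MATCH COUNT string_cnt.

WRITE: / 'Matches with ^ :', line_cnt,
       / 'Matches with \A:', string_cnt.
```

If the line-based interpretation described above applies, the first search should report more matches than the second.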

Start and end of a word

The operator \< matches at the start of a word and the operator \> matches at the end of a word. The operator \b matches at both the start and the end of a word. A word is defined as an uninterrupted sequence of alphanumeric characters.

Example

The following search finds the three words One, two and 3. Instead of the expression \<[[:alnum:]]+\>, \b[[:alnum:]]+\b can also be used.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.

text = `One, two, 3!`.

FIND ALL OCCURRENCES OF REGEX '\<[[:alnum:]]+\>'
     IN text RESULTS result_tab.
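To see which words were found, each match can be cut out of the text using its offset and length. This is a sketch under the assumption of an executable program with list output; the names result_line and word are illustrative.

```abap
DATA text TYPE string.
DATA word TYPE string.
DATA result_line TYPE match_result.
DATA result_tab TYPE match_result_tab.

text = `One, two, 3!`.

FIND ALL OCCURRENCES OF REGEX '\<[[:alnum:]]+\>'
     IN text RESULTS result_tab.

" Cut each matched word out of the text by offset and length
LOOP AT result_tab INTO result_line.
  word = text+result_line-offset(result_line-length).
  WRITE / word.
ENDLOOP.
```

The punctuation and blanks between the words are not part of any match, so only One, two and 3 should be listed.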

Preview conditions

The operator (?=...) defines a regular expression s as a subsequent condition for a preceding regular expression r. In a search, the regular expression r(?=s) matches the same substring as the regular expression r, but only if the regular expression s matches the substring that immediately follows it.

The operator (?!...) acts in the same way as (?=...), with the difference that r(?!s) matches the substring for r only if s does not match the subsequent substring.

Note

The substring found by the preview s is not a part of the match found by r(?=s).

Example

The following search finds the substring la at offset 7.

DATA text TYPE string.
DATA result_tab TYPE match_result_tab.

text = `Shalalala!`.

FIND ALL OCCURRENCES OF REGEX '(?:la)(?=!)'
     IN text RESULTS result_tab.
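The negated operator (?!...) can be sketched with the same text. The following variant, which is an illustration and not part of the original example, searches for every la that is not immediately followed by an exclamation mark; the work area result_line is an assumed name.

```abap
DATA text TYPE string.
DATA result_line TYPE match_result.
DATA result_tab TYPE match_result_tab.

text = `Shalalala!`.

" (?!!) requires that the next character is not an exclamation mark
FIND ALL OCCURRENCES OF REGEX '(?:la)(?!!)'
     IN text RESULTS result_tab.

LOOP AT result_tab INTO result_line.
  WRITE: / 'la found at offset', result_line-offset.
ENDLOOP.
```

Here the substrings la at offsets 3 and 5 should be found, while the la at offset 7 is excluded because it is followed by the exclamation mark.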