Processing Byte Strings and Character Strings

This section describes the statements intended specifically for processing byte strings and character strings in byte-type and character-type data objects.

Byte Strings and Character Strings

As of Release 6.10, flat structures and byte-type data objects can be treated as character strings only outside Unicode programs. Before Release 6.10, this was possible in all programs.

Instructions for Byte and Character String Processing

The following table lists the keywords for byte and character string processing and indicates which type of processing each statement supports:

Keyword        Byte string processing   Character string processing
CONCATENATE    x                        x
FIND           x                        x
REPLACE        x                        x
SHIFT          x                        x
SPLIT          x                        x
CONDENSE       -                        x
CONVERT TEXT   -                        x
OVERLAY        -                        x
TRANSLATE      -                        x
SET BIT        x                        -
GET BIT        x                        -

As of Release 6.10, byte string processing and character string processing are clearly separated. The statements in the table that support both types of processing have the following optional addition as of this release:

... IN {BYTE|CHARACTER} MODE ...

This addition determines which type of processing is performed. If it is not specified, these statements perform character string processing.
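The table and the rule for the addition can be summarized in a small lookup model. The following Python sketch is purely illustrative and is not ABAP: the names SUPPORT and processing_mode are hypothetical, and the model only encodes which type of processing a statement performs (as of Release 6.10), not what the statement actually does.

```python
from typing import Optional

# Illustrative model (not ABAP) of the table above. Names are hypothetical.
BOTH = frozenset({"BYTE", "CHARACTER"})

SUPPORT = {
    "CONCATENATE": BOTH,
    "FIND": BOTH,
    "REPLACE": BOTH,
    "SHIFT": BOTH,
    "SPLIT": BOTH,
    "CONDENSE": frozenset({"CHARACTER"}),
    "CONVERT TEXT": frozenset({"CHARACTER"}),
    "OVERLAY": frozenset({"CHARACTER"}),
    "TRANSLATE": frozenset({"CHARACTER"}),
    "SET BIT": frozenset({"BYTE"}),
    "GET BIT": frozenset({"BYTE"}),
}


def processing_mode(keyword: str, in_mode: Optional[str] = None) -> str:
    """Return the processing a statement performs.

    in_mode models the optional addition IN {BYTE|CHARACTER} MODE.
    """
    supported = SUPPORT[keyword]
    if in_mode is not None:
        # The addition exists only for statements that support both modes.
        if supported != BOTH:
            raise ValueError(f"{keyword} has no addition IN ... MODE")
        if in_mode not in BOTH:
            raise ValueError(f"unknown mode {in_mode!r}")
        return in_mode
    # Without the addition, statements that support both modes perform
    # character string processing; the others support only one mode anyway.
    return "CHARACTER" if "CHARACTER" in supported else "BYTE"
```

For example, processing_mode("FIND") yields "CHARACTER", processing_mode("FIND", "BYTE") yields "BYTE", and asking for byte mode on CONDENSE raises an error, mirroring the "-" entry in the table.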

Before Release 6.10, this addition could not be specified, and the system always performed character string processing: flat structures and byte strings were treated as character strings (implicit casting). Even so, all statements that support explicit byte string processing as of Release 6.10 (according to the table above) returned the correct result for byte strings before Release 6.10, provided that the character-type interpretation of the binary content was irrelevant. As of Release 6.10, character string processing of byte-type operands is only possible outside Unicode programs, whereas the statements GET BIT and SET BIT can be used in all programs.
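Why implicit casting gave correct results for these statements can be illustrated with a single-byte code page, in which every byte value corresponds to exactly one character. The Python sketch below assumes Latin-1 (ISO 8859-1) as the code page of a non-Unicode system; the function name is hypothetical and the code only models a FIND on byte data, not the ABAP statement itself.

```python
def find_via_implicit_cast(data: bytes, pattern: bytes, codepage: str = "latin-1") -> int:
    """Model of a character-mode search on byte data viewed as characters."""
    # In a single-byte code page every byte value maps to exactly one character,
    # so character offsets and byte offsets coincide.
    return data.decode(codepage).find(pattern.decode(codepage))


data = bytes(range(256))          # every possible byte value once
pattern = bytes([0x7F, 0x80, 0x81])

print(find_via_implicit_cast(data, pattern), data.find(pattern))  # 127 127
```

A statement that interprets particular byte values as characters (for example, treating X'20' as a blank) does not meet the proviso above, because the result then depends on the character meaning of the binary content.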

Operands in Byte and Character String Processing

Operands in Byte String Processing

With byte string processing, whether triggered by the addition IN BYTE MODE or performed by the statements GET BIT and SET BIT, the relevant operands must be byte-type, because they are processed byte by byte. This condition applies both inside and outside of classes and in both Unicode and non-Unicode programs.
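The operand requirement for byte string processing can be modeled as a type check that is independent of the program type. In this Python sketch, bytes stands in for a byte-type data object and str for a character-type one; the function name is hypothetical and the joining logic is a simplification of CONCATENATE ... IN BYTE MODE.

```python
def concatenate_in_byte_mode(*operands: bytes) -> bytes:
    """Simplified model of CONCATENATE ... IN BYTE MODE (not ABAP)."""
    for op in operands:
        # Only byte-type operands are accepted; a character string is rejected
        # in every program, Unicode or not.
        if not isinstance(op, (bytes, bytearray)):
            raise TypeError(f"byte-type operand expected, got {type(op).__name__}")
    # In this model the operands are simply joined byte by byte.
    return b"".join(operands)


print(concatenate_in_byte_mode(b"\x01\x20", b"\x02"))  # b'\x01 \x02'
```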

Operands in Character String Processing

With character string processing, whether defined by the addition IN CHARACTER MODE or performed by a statement that supports only character string processing, the relevant operands must be character-type, because they are processed character by character and the way characters are stored in memory depends on the code page used. This condition is essential for character string processing to work correctly, but it is checked in different ways:
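The dependence on the code page can be seen by encoding the same character string in two different code pages. This Python sketch uses Latin-1 and UTF-16LE purely as examples (an assumption for illustration, not a statement about any particular SAP system): the character offset of a character stays the same, while its byte offset depends on the encoding.

```python
text = "Grüße"

# The same character string occupies different bytes depending on the code page.
latin1 = text.encode("latin-1")    # single-byte code page: 1 byte per character
utf16 = text.encode("utf-16-le")   # 2 bytes per character for these characters

print(len(text), len(latin1), len(utf16))                  # 5 5 10
print(text.find("ß"),
      latin1.find("ß".encode("latin-1")),
      utf16.find("ß".encode("utf-16-le")))                 # 3 3 6
```

Processing such data byte by byte would therefore give code-page-dependent results, which is why character string processing requires character-type operands.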

Note that in Unicode programs, the term character-type has a more specific meaning than in non-Unicode programs:

In non-Unicode programs, which in particular include all programs before Release 6.10, this broader meaning of character-type allows character string processing of byte strings, with the same results as byte string processing as of Release 6.10, provided that the statement supports both types of processing (according to the table above).

Note

If data objects of the character-type data types d, n, or t are used as target fields for interim results in character string processing, the rules for data type c apply to the assignment, not the type-specific conversion rules.
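The difference matters, for example, when a result such as "12" is written to a field of type n with length 5. The Python sketch below contrasts the two behaviors; both function names are hypothetical, and the n conversion is a simplified model of the type-specific rule (digits only, right-justified, zero-padded, truncated on the left), shown for comparison only.

```python
def assign_with_c_rules(result: str, length: int) -> str:
    """Character string processing: the target is filled as for type c."""
    # Left-justified, truncated on the right, padded with blanks on the right.
    return result[:length].ljust(length)


def convert_c_to_n(value: str, length: int) -> str:
    """Simplified type-specific conversion to type n (for comparison only)."""
    # Keep only the digits 0-9, right-justify, pad with zeros on the left,
    # and truncate on the left if there are too many digits.
    digits = "".join(ch for ch in value if ch in "0123456789")
    return digits[-length:].rjust(length, "0")


print(repr(assign_with_c_rules("12", 5)))  # '12   '
print(repr(convert_c_to_n("12", 5)))       # '00012'
```

With the c rules, the n field ends up containing blanks, which is not valid content for type n; this is the effect the note warns about.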

Handling Closing Blanks in Character String Processing

For operands of fixed-length data types (c, d, n, and t, or structures regarded as character-type), statements for character string processing generally take leading blanks into account, while closing blanks are truncated. Exceptions to this rule are listed for the relevant statements. For operands of the data type string, all blanks are taken into account. For the statement CONCATENATE, closing blanks can be preserved by using the addition RESPECTING BLANKS.

If the result of a statement for character string processing is assigned to an operand of fixed length that is longer than the result, the operand is generally padded with blanks on the right. If the result is assigned to a string, the length of the string generally adapts to that of the result. Exceptions to this rule are listed for the relevant statements.
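The rules for operands and results can be modeled together for CONCATENATE. In this Python sketch, the class FixedChar is a hypothetical stand-in for a field of type c with a fixed length and a plain str stands in for the data type string; the function names are also hypothetical. Truncating results that are longer than a fixed-length target is an assumption of the sketch, since the exact behavior is statement-specific.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FixedChar:
    """Hypothetical stand-in for a character-type field of fixed length (type c)."""
    length: int
    value: str = ""

    def content(self) -> str:
        # A fixed-length field is always filled to its full length with blanks.
        return self.value[: self.length].ljust(self.length)


Operand = Union[FixedChar, str]  # str models the data type string


def concatenate(*operands: Operand, respecting_blanks: bool = False) -> str:
    parts = []
    for op in operands:
        if isinstance(op, FixedChar):
            text = op.content()
            if not respecting_blanks:
                # Closing blanks of fixed-length operands are ignored;
                # leading blanks are kept.
                text = text.rstrip(" ")
        else:
            # All blanks of string operands are taken into account.
            text = op
        parts.append(text)
    return "".join(parts)


def assign_result(result: str, target_length: Optional[int] = None) -> str:
    """Assign a result to a fixed-length target (length given) or to a string."""
    if target_length is None:
        return result  # a string adapts its length to the result
    # Shorter results are padded on the right with blanks; truncating longer
    # results on the right is an assumption of this sketch.
    return result[:target_length].ljust(target_length)


field = FixedChar(5, " AB")                            # content ' AB  '
print(repr(concatenate(field, "CD")))                  # ' ABCD'
print(repr(concatenate(field, "CD", respecting_blanks=True)))  # ' AB  CD'
print(repr(assign_result(concatenate(field, "CD"), 8)))        # ' ABCD   '
```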

Note

These rules also apply, in particular, to the processing of byte strings in non-Unicode programs. If the closing bytes of a byte string contain values that represent the blank character in the current code page, these bytes are truncated in fixed-length result fields that are not long enough, while result fields that are too long are padded with these byte values.
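Applying the blank rules to byte data can be sketched as follows. The Python model assumes an ASCII-based single-byte code page in which X'20' is the blank character; the function names are hypothetical, and the model simply treats closing X'20' bytes of fixed-length byte operands as closing blanks and uses X'20' to pad fixed-length results.

```python
BLANK = b"\x20"  # assumed: the blank character in an ASCII-based single-byte code page


def concatenate_as_characters(*fields: bytes) -> bytes:
    """Model of non-Unicode character string processing of fixed-length byte fields."""
    # Closing bytes equal to the blank character are treated as closing blanks.
    return b"".join(f.rstrip(BLANK) for f in fields)


def assign_to_fixed(result: bytes, length: int) -> bytes:
    # A target field that is too long is filled on the right with the blank byte.
    return result[:length].ljust(length, BLANK)


fields = (b"\x41\x20", b"\x42")
print(concatenate_as_characters(*fields))  # b'AB'
print(b"".join(fields))                    # b'A B' (plain byte-by-byte join)
print(assign_to_fixed(b"\x01", 3))         # b'\x01  '
```

The byte X'20' is lost in the character-type result but kept by the plain byte join, which is exactly the case where the character-type interpretation of binary content is not irrelevant.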