Contents: the methods init, options, match, matchAll, replace and replaceAll; the value attribute subPatterns; the exception attribute compilationError.
A regular expression (regexp, for short) is a pattern that denotes a set of strings, possibly an infinite set. Searching for matches for a regexp is a very powerful operation that editors and scripting languages on Unix systems have traditionally offered.
This library implements a regular expression syntax compatible with that of Perl. It is based on Philip Hazel's PCRE library, and most of Philip Hazel's PCRE documentation also applies to the BETA version.
The pcre pattern encapsulates a regular expression. It takes a Text reference as an enter parameter, which is passed on to the init method. The pcre pattern has an empty do-part and exits a reference to itself, so you can use it as in the following example:
re: ^Pcre; do 'trigger' -> pcre -> re[]; (filename[], re[]) -> myGrep;
The init method takes a Text reference as an enter parameter. This string describes the regular expression according to the syntax described in the pcre documentation. Init compiles the regular expression into an internal format suitable for matching against strings. This operation takes some CPU time, so the result (stored in the pcre object) should be kept if the same pattern is to be used many times.
When compiling the regular expression, init uses the options defined by the options method.
You can call init several times if you want to change the regular expression matched by the pcre object.
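The compile-once, match-many idiom described above can be sketched by analogy with Python's standard `re` module, whose syntax derives from Perl's as PCRE's does. This is not the BETA pcre API: `re.compile` stands in for init, and `grep_lines` is a hypothetical helper playing the role of the example's `myGrep`.

```python
import re

# Compile once (the analogue of init) and keep the compiled object,
# instead of recompiling the expression for every line searched.
trigger = re.compile(r"trigger")

def grep_lines(lines, pattern):
    """Hypothetical helper: return the lines containing a match for pattern."""
    return [line for line in lines if pattern.search(line)]

lines = ["pull the trigger", "no match here", "trigger-happy"]
print(grep_lines(lines, trigger))  # ['pull the trigger', 'trigger-happy']

# Changing the expression (like calling init again) means compiling anew.
here = re.compile(r"here")
print(grep_lines(lines, here))  # ['no match here']
```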
The options method is a virtual pattern, which you can specialise. Put the options you need into the do part. For example:
re: @Pcre (# options:: (# do CASELESS; DO_STUDY; #); #) do 'tRiGgEr' -> re; (filename[], re[]) -> myGrep;
There is an alternative way to specify certain options: placing them in the textual representation of the regular expression. For example, the option CASELESS can be specified by prepending the string '(?i)' to the regular expression.
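The equivalence between setting an option and embedding it in the pattern text can be illustrated with Python's `re` module, which accepts the same `(?i)` spelling. Here `re.IGNORECASE` plays the role of CASELESS. This is an analogy only, not the BETA pcre interface.

```python
import re

# Option given separately, like CASELESS in the options method.
flag_version = re.compile(r"trigger", re.IGNORECASE)
# Option embedded in the pattern text, like prepending "(?i)".
inline_version = re.compile(r"(?i)trigger")
# No option at all: matching is case sensitive.
plain_version = re.compile(r"trigger")

for text in ["TRIGGER", "tRiGgEr", "trigger"]:
    # The first two columns agree for every input; the third is True only for "trigger".
    print(text,
          bool(flag_version.search(text)),
          bool(inline_version.search(text)),
          bool(plain_version.search(text)))
```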
The following options are supported ('-' means there is no text version):
Option | Text version | Used in | Description |
CASELESS | (?i) | init | Ignore case when matching |
MULTILINE | (?m) | init | ^ and $ match after/before newlines |
DOTALL | (?s) | init | . matches newlines |
EXTENDED | (?x) | init | Extended regexp syntax |
ANCHORED | ^ | init | Match only at start of string |
DOLLAR_ENDONLY | - | init | $ doesn't match before terminal newline |
EXTRA | (?X) | init | Support PCRE extensions to Perl regexps |
NOTBOL | - | init or match | Do not match ^ at start of string |
NOTEOL | - | init or match | Do not match $ at end of string |
UNGREEDY | (?U) | init | Quantifiers not greedy by default |
NOTEMPTY | - | init or match | Empty string cannot match entire expression |
C_LOCALE | - | init | Use C locale instead of default locale |
DO_STUDY | - | init | Study regexp after compiling it |
RETURN_NONE | - | init or match | Return NONE for subpatterns that didn't match |
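Notes: an option set in a superpattern can be switched off again in a specialisation. For example, clearCASELESS turns off CASELESS:
(* p is a perl regexp with my favourite options, including case insensitivity, but just this once I want a case sensitive regexp. *) p: @PcreWithMyFavouriteOptions (# options:: (# do clearCASELESS #) #);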
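The MULTILINE and DOTALL options in the table can be checked with Python's `re` module, which supports the same `(?m)` and `(?s)` inline spellings. This only illustrates the semantics the two engines share. Python has no ungreedy-by-default option corresponding to UNGREEDY, for instance.

```python
import re

text = "first line\nsecond line"

# Default: ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# MULTILINE, spelled (?m): ^ also matches just after each newline.
starts_multiline = re.findall(r"(?m)^\w+", text)

# Default: . does not match a newline, so this search fails.
across_default = re.search(r"line.second", text)
# DOTALL, spelled (?s): . matches the newline as well.
across_dotall = re.search(r"(?s)line.second", text)

print(starts_default)                 # ['first']
print(starts_multiline)               # ['first', 'second']
print(across_default)                 # None
print(repr(across_dotall.group(0)))   # 'line\nsecond'
```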
The match method takes a Text reference and exits true or false, depending on whether the text matched the expression. It also contains a set of methods that can be overridden to provide much more information about the match.
The INNER part of the match method is only called in the case of a match.
The match method provides the following methods: options, pre, matchPos, matchText, preMatchText, postMatchText, subMatchPos, subMatchText, sub1, sub2, sub3, ..., noMatch and position.
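The options method of match can be overridden in much the same way as the options method of the pcre pattern, in order to pass options to the matching stage of the regular expression engine. The options table above marks the options usable at this stage as 'init or match'.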
The pre method is called before any matching takes place. By default it does nothing, but you can specialise it in your own subclasses.
The matchPos method can be called from the inner part of the match method. It exits an integer pair giving the start and end positions of the matched text in the original text. Positions count from 1, and the end position is that of the last matched character. See the example below.
The matchText method can be called from the inner part of the match method. It exits a Text reference to the text that matched the regular expression. See the example below.
The preMatchText and postMatchText methods can be called from the inner part of the match method. They exit a Text reference to the text that preceded (or followed) the text that matched the regular expression.
For example:
(# t1: ^Text; t2: ^Text; t3: ^Text; s: @Integer; e: @Integer; do 'abc123def' -> ('\\d+' -> pcre).match (# do preMatchText -> t1[]; matchText -> t2[]; postMatchText -> t3[]; matchPos -> (s, e); #); ... #);
This puts 'abc' in t1, '123' in t2 and 'def' in t3. It also puts 4 in s and 6 in e.
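The relationship between the reported positions and the surrounding text can be sketched with Python's `re` module. The hypothetical `describe_match` converts Python's 0-based, end-exclusive offsets into the 1-based, inclusive pair that the example above reports (4 and 6 for '123'). That convention is inferred from the example, not stated by the library. Python raw strings need only one backslash where a BETA string needs two.

```python
import re

def describe_match(pattern, text):
    """Hypothetical analogue of matchPos/matchText/preMatchText/postMatchText.

    Returns None when nothing matches (the noMatch case); otherwise the text
    before, inside and after the first match, plus a 1-based inclusive span.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return {
        "pre": text[:m.start()],
        "match": m.group(0),
        "post": text[m.end():],
        # Python's start is 0-based and its end exclusive; convert both.
        "pos": (m.start() + 1, m.end()),
    }

print(describe_match(r"\d+", "abc123def"))
# {'pre': 'abc', 'match': '123', 'post': 'def', 'pos': (4, 6)}
```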
The subMatchPos method can be called from the inner part of the match method. It enters an integer n and exits an integer pair giving the start and end positions of the nth subpattern in the original text. See the example below.
The subMatchText method can be called from the inner part of the match method. It enters an integer n and exits a Text reference to the text matched by the nth subpattern. See the example below.
The methods sub1, sub2, sub3, ... can be called from the inner part of the match method. Each exits a Text reference to the text matched by the corresponding subpattern; sub1 is simply shorthand for 1 -> subMatchText, sub2 for 2 -> subMatchText, and so on. See the example below:
(# t1: ^Text; t2: ^Text; t3: ^Text; s: @Integer; e: @Integer; do 'abc123def' -> ('([a-z])(\\d+)([a-z]+)' -> pcre).match (# do sub1 -> t1[]; sub2 -> t2[]; 3 -> subMatchText -> t3[]; 3 -> subMatchPos -> (s, e); #); ... #);
This puts 'c' in t1, '123' in t2 and 'def' in t3. It also puts 7 in s and 9 in e.
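Numbered subpatterns behave the same way in Python's `re` module, so the example can be mirrored there. `sub_match` is a hypothetical helper. Its 1-based, inclusive positions follow the convention inferred from the example (7 and 9 for 'def').

```python
import re

def sub_match(m, n):
    """Hypothetical analogue of n -> subMatchText and n -> subMatchPos."""
    return m.group(n), (m.start(n) + 1, m.end(n))

m = re.search(r"([a-z])(\d+)([a-z]+)", "abc123def")
t1 = m.group(1)               # like sub1
t2 = m.group(2)               # like sub2
t3, (s, e) = sub_match(m, 3)  # like 3 -> subMatchText and 3 -> subMatchPos
print(t1, t2, t3, s, e)       # c 123 def 7 9
```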
The noMatch method is called by match when no match is found. You can specialise it to specify an action for that case.
The position method controls where in the input string the search for a match starts. You can specialise it, assigning a different start position to its 'value' variable.
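Starting a search part-way into the input can be sketched with the `pos` argument of Python's `Pattern.search`. The sketch assumes that the position value is 1-based, like the positions reported by matchPos. `search_from` is a hypothetical helper, not part of the pcre library.

```python
import re

pattern = re.compile(r"\d+")
text = "12 apples and 34 pears"

def search_from(pattern, text, position):
    """Hypothetical helper: search starting at a 1-based position.

    Assumption: the BETA position value is 1-based, so subtract 1
    to get the 0-based pos that Pattern.search expects.
    """
    return pattern.search(text, position - 1)

print(search_from(pattern, text, 1).group(0))  # 12
print(search_from(pattern, text, 3).group(0))  # 34
```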
The replace method inherits from the match method. It enters two Text references: first the search string, and second a default replacement string. It exits two values: first a boolean that is true if the text matched the expression, and second a Text reference to the new text with the replacement carried out. If no replacement is carried out, the text exited is a copy of the search string entered. Replace also contains a set of methods that can be overridden to provide much more information about the match and to control the replacement text more precisely. See the example below.
The INNER part of the replace method is only called in the case of a match.
The replace method provides the following methods: options, pre, matchPos, matchText, preMatchText, postMatchText, subMatchPos, subMatchText, sub1, sub2, sub3, ..., noMatch, position and rep.
The rep method controls the replacement string. Its 'value' variable is a reference to the default replacement text. By assigning a new reference to 'value' you can choose another replacement string dynamically, based on information obtained from the other methods available in replace.
(# p: @Boolean; t1: ^Text; do ('The y2k problem', 'year 2000') -> ('\\by2k\\b' -> pcre).replace -> (p, t1[]); ... #);
This puts true in p and 'The year 2000 problem' in t1. (The escape sequence '\b' in a regular expression matches a word boundary. In a BETA string you have to double the backslash.)
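The boolean-plus-new-text result of replace can be approximated with Python's `re.subn`, which returns the new string together with the number of substitutions made. `replace_first` is hypothetical. It assumes that replace substitutes only the first match, since replaceAll handles every match. The second call shows that `\b` rejects 'y2k' inside the word 'y2kbug'.

```python
import re

def replace_first(pattern, text, replacement):
    """Hypothetical analogue of replace: returns (matched, new_text).

    When nothing matches, new_text is an unchanged copy of text.
    Note that Python interprets backslashes in the replacement string.
    """
    new_text, count = re.subn(pattern, replacement, text, count=1)
    return count > 0, new_text

print(replace_first(r"\by2k\b", "The y2k problem", "year 2000"))
# (True, 'The year 2000 problem')
print(replace_first(r"\by2k\b", "The y2kbug problem", "year 2000"))
# (False, 'The y2kbug problem')
```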
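The following example uses rep to build the replacement from the first subpattern, so the default replacement string can be empty:
(# p: @Boolean; t1: ^Text; do ('The y3k problem', '') -> ('\\by([0-9]+)k\\b' -> pcre).replace (# rep:: (# do 'year %s000' -> putFormat (# do sub1 -> s #) -> value[]; #); #) -> (p, t1[]); ... #);
This puts 'The year 3000 problem' in t1.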
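Choosing the replacement dynamically, as rep does, corresponds in Python's `re` module to passing a function instead of a string to `re.subn`. The function receives the match object and can read its groups, much as rep reads sub1. `expand_year` is a hypothetical name used only for this sketch.

```python
import re

def expand_year(m):
    """Hypothetical rep analogue: build the replacement from group 1 (sub1)."""
    return "year %s000" % m.group(1)

new_text, count = re.subn(r"\by([0-9]+)k\b", expand_year, "The y3k problem", count=1)
print(count > 0, new_text)  # True The year 3000 problem
```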
The matchAll method is similar to match, but calls INNER several times, once for each match. It is not yet fully documented; see the comments in pcre.bet and the demo programs.
The replaceAll method is similar to replace, but calls INNER several times, once for each match. It is not yet fully documented; see the comments in pcre.bet and the demo programs.
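Because matchAll and replaceAll are not documented here, the following Python sketch only illustrates the general idea of visiting or replacing every match, using `re.finditer` and `Pattern.sub`. It makes no claim about the exact BETA interface. Positions are again reported as 1-based, inclusive pairs.

```python
import re

text = "y2k, y3k and y10k"
pattern = re.compile(r"\by([0-9]+)k\b")

# Like matchAll: visit every match in turn, recording text and position.
found = [(m.group(0), m.start() + 1, m.end()) for m in pattern.finditer(text)]

# Like replaceAll: replace every match, building each replacement from group 1.
replaced = pattern.sub(lambda m: "year %s000" % m.group(1), text)

print(found)     # [('y2k', 1, 3), ('y3k', 6, 8), ('y10k', 14, 17)]
print(replaced)  # year 2000, year 3000 and year 10000
```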
Basic Libraries - Reference Manual, © 1990-2002 Mjølner Informatics
[Modified: Friday January 4th 2002 at 13:10]