|
|
Regular Expressions Module for Modula-2
By Tom Breeden (tmb@virginia.edu)
November 21, 2004
Overview
Support for pattern matching with regular expressions (a la AmigaDOS).
Syntax Characters are | # ? % - ( ) [ ] *
The code is a straightforward, non-optimized implementation of algorithms
3.3 (Thompson's Construction) for producing an NFA (Non-deterministic Finite Automaton)
from a regular
expression and 3.4 for simulating an NFA from the new dragon book,
Compilers, Principles, Techniques, and Tools, Aho, Sethi, Ullman.
| | OR operator |
| AND is implied by adjacency |
# | Zero or more operator |
? | Matches any character |
' | Escape character for syntax symbols |
% | Matches empty string |
- | Range of characters |
( ) [ ] | grouping |
* | abbrev for '#?' |
Examples
#? | Accepts any string including empty string |
?#? | Accepts any string except empty string |
'#Hello'# | Accepts only #Hello# |
(0|1|%)(0-9|%)(0-9|%)|2[0-4][0-9]|25[0-5] | Accepts numbers 0 to 255, with leading zeros optionally |
Definition Module
DEFINITION MODULE RegularExpressions; (* v1.1-073004 *)
(* Copyright (C) 1989 Thomas Breeden.
Support for pattern matching a la AmigaDos.
AlternateOp '|'
ConcatOp -> implied by adjacency
KleeneClosureOp '#' ie, 0 or more
AnyCharOp '?'
QuoteOp "'"
EpsiOp '%' ie, null
RangeOp '-'
also
grouping '(' and ')' OR '[' and ']'
abbrev for '#?' '*'
*)
TYPE NFA; (* Non-deterministic Finite Automaton *)
RegExprErrs = (NoErr, NoMemory, BadExpr, NilNfa);
PROCEDURE CreateNFA(VAR Handle:NFA; RegExpr:ARRAY OF CHAR; VAR Err:RegExprErrs);
PROCEDURE InvalidNFA():NFA;
PROCEDURE RunNFA(Handle:NFA; Str:ARRAY OF CHAR; VAR Err:RegExprErrs):BOOLEAN;
PROCEDURE DiscardNFA(VAR Handle:NFA; VAR Err:RegExprErrs);
PROCEDURE IsSimpleNFA(Handle:NFA):BOOLEAN;
(*DEBUG*) PROCEDURE DumpNFA(Handle:NFA);
(* NOTE:
1. Careful: case is significant.
2. Max size of the ARRAY OF CHAR params is 256.
3. Strings should contain only 7 bit codes, null delimited.
*)
|
Test Programs
TstRegExpr Test Program, tty based. |
| TstRegExpr.mod | | |
TstRegExprGUI Test Program, with GUI. |
| Windows/TstRegExprGUI.exe | | For 32 bit Windows. |
| AmigaOS4/TstRegExprGUI | | For AmigaOS4 Pre-release. |
Associated Modules
Null delimited static strings |
| |
| Str0 | | Basic procedures. |
| Str1 | | More procedures. |
Large sets |
| |
| Set0 | | Basic procedures. |
Miscellaneous support modules |
| Debugging0 | | Assert and Debug procs. |
Notes
- Input characters are limited (mostly) to 7 bit ASCII. Characters above 367C are
used internally in the algorithm code.
- The input RegExpr string length for CreateNFA and the input test string length for
RunNFA is limited to about 256 characters.
- The created NFA has at most twice as many states as the number of symbols and
operators in the regular expression.
- Most of the development was done on AmigaOS. The StonyBrook Win32 Modula-2 compiler
was used to build the TstRegExprGUI program. My multiplatform GUI module is pretty primitive
and untested on any but my own Windows config.
|