Regular Expressions Module for Modula-2
By Tom Breeden (tmb@virginia.edu)
November 21, 2004
Overview
Support for pattern matching with regular expressions (a la AmigaDOS).
Syntax Characters are | # ? % - ( ) [ ] *
The code is a straightforward, non-optimized implementation of algorithms
3.3 (Thompson's Construction) for producing an NFA (Non-deterministic Finite Automaton)
from a regular
expression and 3.4 for simulating an NFA from the new dragon book,
Compilers, Principles, Techniques, and Tools, Aho, Sethi, Ullman.
Syntax
Examples
Definition Module
Test Programs
Associated Modules
Download
Notes
| | OR operator |
| AND is implied by adjacency |
# | Zero or more operator |
? | Matches any character |
' | Escape character for syntax symbols |
% | Matches empty string |
- | Range of characters |
( ) [ ] | grouping |
* | abbrev for '#?' |
Examples
#? | Accepts any string including empty string |
?#? | Accepts any string except empty string |
'#Hello'# | Accepts only #Hello# |
(0|1|%)(0-9|%)(0-9|%)|2[0-4][0-9]|25[0-5] | Accepts numbers 0 to 255, with leading zeros optionally |
Definition Module
DEFINITION MODULE RegularExpressions; (* v1.1-073004 *)
(* Copyright (C) 1989 Thomas Breeden.
Support for pattern matching a la AmigaDos.
AlternateOp '|'
ConcatOp -> implied by adjacency
KleeneClosureOp '#' ie, 0 or more
AnyCharOp '?'
QuoteOp "'"
EpsiOp '%' ie, null
RangeOp '-'
also
grouping '(' and ')' OR '[' and ']'
abbrev for '#?' '*'
*)
TYPE NFA; (* Non-deterministic Finite Automaton *)
RegExprErrs = (NoErr, NoMemory, BadExpr, NilNfa);
PROCEDURE CreateNFA(VAR Handle:NFA; RegExpr:ARRAY OF CHAR; VAR Err:RegExprErrs);
PROCEDURE InvalidNFA():NFA;
PROCEDURE RunNFA(Handle:NFA; Str:ARRAY OF CHAR; VAR Err:RegExprErrs):BOOLEAN;
PROCEDURE DiscardNFA(VAR Handle:NFA; VAR Err:RegExprErrs);
PROCEDURE IsSimpleNFA(Handle:NFA):BOOLEAN;
(*DEBUG*) PROCEDURE DumpNFA(Handle:NFA);
(* NOTE:
1. Careful: case is significant.
2. Max size of the ARRAY OF CHAR params is 256.
3. Strings should contain only 7 bit codes, null delimited.
*)
|
Test Programs
TstRegExpr Test Program, tty based. |
| TstRegExpr.mod | | |
TstRegExprGUI Test Program, with GUI. |
| Windows/TstRegExprGUI.exe | | For 32 bit Windows. |
| AmigaOS4/TstRegExprGUI | | For AmigaOS4 Pre-release. |
Associated Modules
Null delimited static strings |
| |
| Str0 | | Basic procedures. |
| Str1 | | More procedures. |
Large sets |
| |
| Set0 | | Basic procedures. |
Miscellaneous support modules |
| Debugging0 | | Assert and Debug procs. |
Download
RegExprM2.zip includes all test programs and source files listed above.
Notes
- Input characters are limited (mostly) to 7 bit ASCII. Characters above 367C are
used internally in the algorithm code.
- The input RegExpr string length for CreateNFA and the input test string length for
RunNFA is limited to about 256 characters.
- The created NFA has at most twice as many states as the number of symbols and
operators in the regular expression.
- Most of the development was done on AmigaOS. The StonyBrook Win32 Modula-2 compiler
was used to build the TstRegExprGUI program. My multiplatform GUI module is pretty primitive
and untested on any but my own Windows config.
|