modula-2 home

  Home  
  Tutorial  
  Win32 API  
  Reference  
  Projects  
 

 

Regular Expressions Module for Modula-2

By Tom Breeden (tmb@virginia.edu)
November 21, 2004

Overview

Support for pattern matching with regular expressions (a la AmigaDOS).

Syntax Characters are | # ? % - ( ) [ ] *
 

The code is a straightforward, non-optimized implementation of algorithms 3.3 (Thompson's Construction) for producing an NFA (Non-deterministic Finite Automaton) from a regular expression and 3.4 for simulating an NFA from the new dragon book, Compilers, Principles, Techniques, and Tools, Aho, Sethi, Ullman.

 

Syntax   Examples   Definition Module   Test Programs   Associated Modules   Download   Notes

Syntax

|OR operator
 AND is implied by adjacency
#Zero or more operator
?Matches any character
'Escape character for syntax symbols
%Matches empty string
-Range of characters
( ) [ ]grouping
*abbrev for '#?'

 


Examples

#?Accepts any string including empty string
?#?Accepts any string except empty string
'#Hello'#Accepts only #Hello#
(0|1|%)(0-9|%)(0-9|%)|2[0-4][0-9]|25[0-5]Accepts numbers 0 to 255, with leading zeros optionally

 


Definition Module

DEFINITION MODULE RegularExpressions;     (* v1.1-073004 *)

(* Copyright (C) 1989 Thomas Breeden.

       Support for pattern matching a la AmigaDos.

        AlternateOp       '|'
        ConcatOp             -> implied by adjacency
        KleeneClosureOp   '#'               ie, 0 or more
        AnyCharOp         '?'
        QuoteOp           "'"
        EpsiOp            '%'               ie, null
        RangeOp           '-'

        also
        grouping          '(' and ')'  OR   '[' and ']'
        abbrev for '#?'   '*'
*)

TYPE   NFA;         (* Non-deterministic Finite Automaton *)

       RegExprErrs  = (NoErr, NoMemory, BadExpr, NilNfa);

PROCEDURE CreateNFA(VAR Handle:NFA; RegExpr:ARRAY OF CHAR; VAR Err:RegExprErrs);
PROCEDURE InvalidNFA():NFA;

PROCEDURE RunNFA(Handle:NFA; Str:ARRAY OF CHAR; VAR Err:RegExprErrs):BOOLEAN;

PROCEDURE DiscardNFA(VAR Handle:NFA; VAR Err:RegExprErrs);

PROCEDURE IsSimpleNFA(Handle:NFA):BOOLEAN;

(*DEBUG*) PROCEDURE DumpNFA(Handle:NFA);

(* NOTE:
     1. Careful: case is significant.
     2. Max size of the ARRAY OF CHAR params is 256.
     3. Strings should contain only 7 bit codes, null delimited.
*)

Test Programs

TstRegExpr  Test Program, tty based.
 TstRegExpr.mod 
TstRegExprGUI  Test Program, with GUI.
 Windows/TstRegExprGUI.exe For 32 bit Windows.
 AmigaOS4/TstRegExprGUI For AmigaOS4 Pre-release.

Associated Modules

Null delimited static strings  
 Str0 Basic procedures.
 Str1 More procedures.
Large sets  
 Set0 Basic procedures.
Miscellaneous support modules
 Debugging0 Assert and Debug procs.

Download

RegExprM2.zip includes all test programs and source files listed above.


Notes

  • Input characters are limited (mostly) to 7 bit ASCII. Characters above 367C are used internally in the algorithm code.
  • The input RegExpr string length for CreateNFA and the input test string length for RunNFA is limited to about 256 characters.
  • The created NFA has at most twice as many states as the number of symbols and operators in the regular expression.
  • Most of the development was done on AmigaOS. The StonyBrook Win32 Modula-2 compiler was used to build the TstRegExprGUI program. My multiplatform GUI module is pretty primitive and untested on any but my own Windows config.