Runtime Environment

This topic provides background information about the runtime environment used by Stony Brook Modula-2 and Ada95 programs.

This information is particularly important for anyone who wants to combine Stony Brook code with code written in another language. If you write your programs entirely in Modula-2 or Ada95, you are unlikely to need the information in this topic.

The Stony Brook Compiler and Runtime Library have been designed to operate with the fewest possible assumptions, thereby allowing maximum flexibility in combining Stony Brook with other languages. This topic documents all the assumptions the compiler makes about the runtime environment. You will find information about:

Memory models
The data group
Mixed model programming
Calling procedure assumptions
Program initialization
Public name usage

The CPU Word

To discuss memory models in 16-bit and 32-bit processor modes more clearly, we introduce the concept of a CPU word. The CPU word size determines the "direct addressability" within a segment.

16-bit mode CPU word : 2 bytes, address range 64K bytes
32-bit mode CPU word : 4 bytes, address range 4G bytes
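
The relationship between CPU word size and address range can be checked with a little arithmetic. The following is a minimal sketch; the function name address_range is hypothetical and is used only for illustration.

```python
def address_range(word_bytes: int) -> int:
    """Bytes directly addressable by an offset that is word_bytes bytes wide."""
    return 2 ** (8 * word_bytes)

K = 1024
G = 1024 ** 3

print(address_range(2) // K, "K bytes")  # 16-bit mode CPU word
print(address_range(4) // G, "G bytes")  # 32-bit mode CPU word
```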

Memory Models

The Intel IA-32 family of machines has a unique architecture that divides memory into segments (64K bytes for 16-bit segments, 4G bytes for 32-bit segments). The processor has four segment registers, each of which allows you to access a segment of data or code:

CS points to the segment in which the program is executing code.
DS points to the default data segment.
SS points to the stack segment.
ES is set as needed to access data that are not accessible by any of the others.

Note: With the 80386 and later processors, 2 additional segment registers (FS and GS) were added.

With this architecture, you can think of memory addresses as either near or far addresses:

Near addresses are offsets from a segment register. They occupy one CPU word of storage and can access a maximum of 64K bytes of data in 16-bit mode and 4G bytes in 32-bit mode.
Far addresses contain an offset and a segment. They can address any byte of memory.
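
As an illustration of how a far address selects a byte, the sketch below assumes 16-bit real mode (as under DOS), where the processor forms the address as segment * 16 + offset. This topic does not state which processor mode is in use; in protected mode the segment value is a selector and this arithmetic does not apply. The names FarAddress and linear are hypothetical.

```python
from typing import NamedTuple

class FarAddress(NamedTuple):
    segment: int  # 16-bit segment value
    offset: int   # 16-bit offset within the segment (the "near" part)

def linear(addr: FarAddress) -> int:
    """Real-mode translation (assumption): segment * 16 + offset."""
    return (addr.segment << 4) + addr.offset

a = FarAddress(0x1234, 0x0010)
b = FarAddress(0x1235, 0x0000)

# Two different segment:offset pairs can name the same byte of memory.
print(hex(linear(a)), hex(linear(b)))
```

A near address corresponds to the offset field alone, with the segment supplied implicitly by a segment register.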

32-bit mode:

In 32-bit mode, a single segment can access 4G bytes of memory. Programs run in what is called a FLAT memory model. This model points all segments to the same memory, so a program does not have to concern itself with segments as it does in 16-bit mode. The 32-bit mode always sets the memory models as follows:

CODE = SMALL

DATA = SMALL

CONSTANT = SMALL

16-bit mode:

Code generated by compilers for Intel processors is more efficient when the compiler can assume that there is precisely one segment of code and one segment of data. Unfortunately, this assumption does not hold for many applications. To avoid compromising efficiency or addressability in all situations, many compilers provide different memory models. The compiler can make different assumptions about the size of code and data for different programs, according to the memory model you choose.

The memory model chosen determines whether near or far addressing is used to access various objects like procedures, variables, parameters, and pointers. The Stony Brook compiler provides several different memory models for several different purposes:

Two memory models for code

Three memory models for data

Three memory models for constants

The memory models are chosen at compile time by using the memory model compiler options. Specifying a memory model affects only the default behavior of the compiler. You can override the method of addressing for each data object individually, when necessary.

Note: Since a fixed memory model is used for 32-bit mode, the remainder of the memory model discussion pertains to 16-bit mode only.

Code memory models (16-bit only)

The supported code models and their bounds are:

Small	The combined code of all modules must be less than 64K bytes.
Large	Each module can have up to 64K bytes of code. The total is limited only by the memory available on your system. Large is the default.

Data memory models (16-bit only)

For purposes of defining the memory models, the data accessed by a Stony Brook program is divided into five data classes:

Static data consists of all the variables in your program that are not contained in a procedure.
Initialized data consists of all variables with initializers which are outside of a procedure.
Constant data consists of literal strings, floating point constants, and some constants implicitly created by the compiler.
Dynamic data consists of all variables defined inside procedures, and is always stored on the stack.
Heap data is the data you allocate by use of the NEW standard procedure. Pointer/Access types generally refer to heap data.

The data models supported are:

Small	All data, including stack and heap, must be less than 64K bytes.
Medium	All static data must be less than 64K bytes. The stack can contain an additional 64K bytes for dynamic data. Heap data is limited only by the available memory. Medium is the default.
Large	The same requirements as the medium model, except each module can have up to 64K bytes of static data.
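
The limits above can be expressed as a small check. This is only a sketch of the rules as stated in this topic: the function name and its parameters are hypothetical, initialized and constant data are ignored, and heap data is not checked for the medium and large models because it is limited only by available memory.

```python
SEGMENT_LIMIT = 64 * 1024  # size of one 16-bit segment in bytes

def fits_data_model(model, static_by_module, stack_bytes, heap_bytes):
    """Return True if the given sizes respect the limits of a data model."""
    total_static = sum(static_by_module.values())
    if model == "small":
        # All data, including stack and heap, shares one 64K segment.
        return total_static + stack_bytes + heap_bytes < SEGMENT_LIMIT
    if model == "medium":
        # Static data shares one segment; the stack may use another.
        return total_static < SEGMENT_LIMIT and stack_bytes <= SEGMENT_LIMIT
    if model == "large":
        # Each module may have up to 64K bytes of static data of its own.
        per_module_ok = all(n <= SEGMENT_LIMIT for n in static_by_module.values())
        return per_module_ok and stack_bytes <= SEGMENT_LIMIT
    raise ValueError("unknown data model: " + model)

sizes = {"Crt": 40000, "Main": 30000}  # hypothetical static data sizes
for model in ("small", "medium", "large"):
    print(model, fits_data_model(model, sizes, 16384, 200000))
```

With these example sizes only the large model fits, because the combined static data of the two modules exceeds 64K bytes.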

By default, the compiler assumes that the stack is in a separate segment. If you use the memory model option to change the stack to reside in the data segment, the startup code adjusts the stack segment and stack pointer to make the stack segment the same as the data segment.

Constant memory models (16-bit only)

The three constant data memory models supported are:

Small	Stores the constant data in the data group
Medium	Stores the constant data in the code segment (the default)
Large	Allocates a separate segment for the constant data

The Data Group (16-bit only)

When you use the small or medium data memory model, the static data, and possibly the runtime stack, are placed in a group called DGROUP. A group is a logical structure used by the linker to represent a set of data that is limited to 64K bytes. The data segment register (DS) is used to point to the data group.

When any Stony Brook procedure is called, the data segment register is assumed to point to the data group. If you are calling Stony Brook code from another language, you must first ensure that DS is set up properly. Alternatively, use the procedure attribute LoadDS to inform the compiler that the procedure will be called without DS pointing to the data group. This causes the compiler to generate code to set up DS on entry and restore it on exit from the procedure.

Mixed Model Programming (16-bit only)

Mixed model programming is mixing modules compiled with different memory models in the same program.

You cannot do mixed model programming with most compilers that target Intel processors. Those compilers require that you compile all modules of a single program with the same memory models. In those cases, the compiler does not know how the other modules were compiled and therefore must assume that they were compiled the same way. Because Stony Brook compilers read a symbol file for every external module to which you refer, they can transmit information about the memory models used in separate compilations. This makes it possible to mix modules compiled with different memory models in the same program.

There are, however, some restrictions on what you can do with mixed models. The following paragraphs state the conditions for compatibility between modules compiled with different memory models.

A module compiled with the large code model cannot make calls to modules compiled with the small code model. This is because the procedures in the module compiled with the small code model assume that the caller and the procedure are in the same segment. All other combinations of code models of the caller and procedure are allowed.
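
The code model rule above reduces to a single forbidden combination. A minimal sketch, using a hypothetical function name:

```python
CODE_MODELS = ("small", "large")

def call_allowed(caller_code_model: str, callee_code_model: str) -> bool:
    """Return True if a module may call procedures in another module."""
    for model in (caller_code_model, callee_code_model):
        if model not in CODE_MODELS:
            raise ValueError("unknown code model: " + model)
    # Small-model procedures assume the caller is in the same code segment,
    # so a large-model caller cannot call them.
    return not (caller_code_model == "large" and callee_code_model == "small")

for caller in CODE_MODELS:
    for callee in CODE_MODELS:
        print(caller, "->", callee, call_allowed(caller, callee))
```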

In files compiled with the small data model, address parameters are assumed to be near addresses. This means that the actual parameter must be in the data group.

The following classes of data cannot be passed as near address parameters:

Static data of large model modules
Heap data of medium and large model modules
Local data of modules compiled without the /STACK:DS qualifier
Literal strings and floating point constants in modules compiled with the medium or large constant model.

The compiler issues an error message if you try to pass data that is not in the data group as a near address parameter.

In modules compiled with the medium or large data model, far addresses are used for address parameters. You can pass any data to address parameters in these modules.
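
The data model rules for address parameters can be summarized as a lookup on the class of the data and on how its owning module was compiled. The sketch below follows the restrictions listed above literally; the Module record, the class names, and both function names are hypothetical, and the real compiler may apply further checks.

```python
from dataclasses import dataclass

@dataclass
class Module:
    data_model: str      # "small", "medium", or "large"
    constant_model: str  # "small", "medium", or "large"
    stack_in_ds: bool    # True if compiled with the stack in the data segment

def in_data_group(data_class: str, owner: Module) -> bool:
    """Return True if data of this class, owned by this module, is in DGROUP."""
    if data_class == "static":
        return owner.data_model != "large"
    if data_class == "heap":
        return owner.data_model == "small"
    if data_class == "local":
        return owner.stack_in_ds
    if data_class == "constant":
        return owner.constant_model == "small"
    raise ValueError("unknown data class: " + data_class)

def can_pass(data_class: str, owner: Module, callee_data_model: str) -> bool:
    """Return True if the data may be passed as an address parameter."""
    if callee_data_model == "small":
        return in_data_group(data_class, owner)  # near address required
    return True  # medium and large callees take far addresses

caller = Module(data_model="medium", constant_model="medium", stack_in_ds=False)
print(can_pass("static", caller, "small"))  # static data of a medium module
print(can_pass("heap", caller, "small"))    # heap data of a medium module
print(can_pass("heap", caller, "large"))    # far address, always allowed
```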

Using the near and far keywords (16-bit only)

You can use the near and far keywords in procedure declarations to override the defaults for the address length implied by the code and data models you specified at compile time.

Assumptions for Calling Procedures

When a procedure compiled by a Stony Brook compiler is called, the compiler assumes the following:

DS must be the address of DGROUP, the data group. You can use the LoadDS procedure attribute on the procedure declaration to eliminate this requirement.
The stack segment register SS must be the same as DS if the file in which the procedure is defined was compiled with the memory model option that places the stack in the data segment.
The values of all other registers are not used.

All of the above conditions must be met when calling a Stony Brook procedure from another language. When calling a procedure in another language from Stony Brook, you must be sure that the assumptions the procedure makes are compatible with those listed above.
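
As a way of reasoning about these assumptions when writing glue code in another language, they can be modeled as a precondition check. This is a sketch only: the function and parameter names are hypothetical, register values are plain integers, and it assumes that with LoadDS the SS requirement is measured against the data group that the procedure loads into DS on entry.

```python
def call_assumptions_met(ds, ss, dgroup, *, load_ds=False, stack_in_ds=False):
    """Return True if the register state satisfies the calling assumptions."""
    # With LoadDS the procedure sets up DS itself, so the caller's DS
    # does not have to point to the data group.
    ds_ok = load_ds or ds == dgroup
    # Assumption: when the stack resides in the data segment, SS must name
    # the data group, which is the value DS holds while the procedure runs.
    ss_ok = (not stack_in_ds) or ss == dgroup
    return ds_ok and ss_ok

DGROUP = 0x2000  # hypothetical segment value of the data group

print(call_assumptions_met(0x2000, 0x3000, DGROUP))
print(call_assumptions_met(0x1000, 0x3000, DGROUP))
print(call_assumptions_met(0x1000, 0x3000, DGROUP, load_ds=True))
```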

Program Initialization

When you run a Stony Brook program, certain initialization code must be executed before the first line of your program.

Before executing its first line, your main program calls the initialization code of each separate compilation unit in the program. These units are Modula-2 IMPLEMENTATION MODULEs and Ada95 Library Package bodies. The initialization code is the code after the BEGIN statement in the implementation module or Package.

If your main program is not written in Stony Brook, you must ensure that the initialization code of every Modula-2 module and Ada95 Package linked into your program is called before you call any procedures in that code. You can do this by explicitly calling the initialization code of each Modula-2 module or Ada95 Package that has any, using the public name of the initialization code.

The public name of the initialization code for a Modula-2 module or Ada95 Package is the compilation unit name with $InitCode appended. For example, if the module name is Crt, the public name of its initialization code is:

Crt$InitCode

To call it:

You call it with no parameters.
You use a far call instruction in 16-bit mode, even for modules compiled with small code model. For 32-bit mode, a near call instruction is used.
For 16-bit mode, you must have DS set to point to the data group before calling the initialization code.
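
The requirements for calling initialization code can be collected into a small description that a build script or binding generator might use. This is a sketch with hypothetical function and key names; it encodes only the rules listed above.

```python
def init_code_call(unit_name: str, mode_bits: int) -> dict:
    """Describe how to call the initialization code of a compilation unit."""
    if mode_bits not in (16, 32):
        raise ValueError("mode_bits must be 16 or 32")
    return {
        "public_name": unit_name + "$InitCode",
        "parameters": 0,
        # Far call in 16-bit mode, even for the small code model.
        "call": "far" if mode_bits == 16 else "near",
        # DS must point to the data group only in 16-bit mode.
        "ds_must_point_to_data_group": mode_bits == 16,
    }

print(init_code_call("Crt", 16))
print(init_code_call("Crt", 32))
```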

Public Name Usage

The Stony Brook compiler defines a public name for each variable and procedure declared in a Modula-2 Definition module or Ada95 Library Package. The public name is the name of the module, followed by an underscore character and the name of the symbol.

For example, the public symbol name of the SetTextAttr procedure in the Crt module is:

Crt_SetTextAttr
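
When generating declarations for another language, the Modula-2 naming rule can be applied mechanically. A minimal sketch, assuming the StonyBrook (default) calling convention; the function name public_name is hypothetical.

```python
def public_name(module: str, symbol: str) -> str:
    """Public name of a Modula-2 variable or procedure: Module_Symbol."""
    return module + "_" + symbol

print(public_name("Crt", "SetTextAttr"))
```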

For Ada95, public names are more complicated. Because Ada95 supports procedure name overloading, the compiler appends a numeric value to the end of the public symbol of each procedure. The exact structure of this numeric value is undocumented. If you want to call your Ada95 procedures from other languages, you should therefore specify their public symbols explicitly with pragmas, following the public name convention described above.

Note: The calling convention currently in effect can change how public symbols are formatted. The above example is for the StonyBrook (default) calling convention.