[HOWTOUSE.DOC]
[Harold V. McIntosh, 28 December 1983]

This disk is intended as a collection of examples for the programming
language REC, although many of the examples are useful in their own
right, particularly CNVRT.REC which compiles the programming language
CONVERT. The files REC80.COM and REC86.CMD are binary files containing
object code for REC destined respectively for the Intel 8080 and the
Intel 8086. REC80 may be executed by the Intel 8080, Intel 8085, or the
Zilog Z80, since it uses none of the instructions which differentiate
these CPU's. REC86 will presumably execute in any of the Intel family
of 8086 CPU's, that is the 8088, 80186, 80286, although in fact it has
only been proven on Godbout's dual 8085-8088 board.

REC80 partitions the available memory in the ratio 5-4-1 for compile
area, pushdown area and workspace, after setting aside space for the
8080's pushdown stack and having lost whatever space was occupied by the
binary program and the tables FXT and VRT.  REC86 is configured for 8080
mode with a 64K code segment, and a fixed partition for the available
memory. Thus REC80 will work in almost any system - at least 16K or
larger, but since the source code is available on the companion disk,
either of the two versions can be reassembled to suit whatever system
desired by changing the EQU's defining the memory partition or else
by reprogramming the partitioning alogrithm.

The assumed operating system is CP/M 2.2, on the one hand, or CP/M86
on the other hand. However, deliberate advantage is taken of the CP/M
transfer vector in BIOS to get direct access to console I-O. This was
necessary in some editor-like applications of REC, for which CP/M's
tampering with the data stream was too slow and conflicted with the
natural editing functions of the programs involved. The source code
contains provision to restore the CP/M I-O. We have observed that the
direct access prevents REC programs from working with MicroShell in all
respects. REC80 will run with CP/M 1.4, because it does not use any of
the features which distinguish the two versions.

There are some minor aspects in which the programs are dependent on
the nature of the console equipment, again when used for editing and
similar purposes. Although some ASCII standards are universally res-
pected, such as for the assignment of carriage return and line feed,
many others are not, or do not even exist. Thus screen clear, and the
cursor movements may require some adjustment, although this adjustment
can be made in the applications programs and would not require any
change in the binary REC programs.  As written, they have all worked
successfully with the TeleVideo 912-C and TeleVideo 910 terminals.

REC has operators which give direct access to the I-O ports. None of
these applications programs use them.   Anyone planning to use them
would clearly have to know the configuration of his own system to do
so.  The operators in REC86 are literal transcriptions of those in
REC80, and have not been tested for 16-bit port addresses.

REC expects to find one or two file names on its command line.  If it
finds neither, it enters a conversational mode, in which it displays
a logo with the version date.  In this mode REC expressions may be
entered, using CP/M's editing facilities [which means that you can get
back to CP/M by typing ^C at the beginning of a blank line].  They are
compiled and promptly executed, giving one an opportunity to experiment
with REC as a language.  Try ('hello'TL;) for example. A line may be
edited, but once a carriage return is given, it is compiled and cannot
be retrieved for additional changes. As many lines will be prompted as
required to balance parentheses; there is no guarantee that the program
entered makes sense, of course.

When REC encounters a file name on the command line, it supposes that
it is the name of a source file which is to be loaded and executed. A
second file name is preserved and transmitted to the executing program
in the CP/M location 05CH, which means that an application program can
make use of CP/M's file handling facilities. REC does not preserve the
remainder of the command line, however.

There is no such thing as REC object code, because the compiler is a
loader, which loads for immediate execution.

REC, as a language, can be described rather simply. There are four
control symbols - the two parentheses, colon and semicolon. There are
also operations, and some of these can have truth values [true or
false], and are designated predicates. Operations are executed in the
sequence they are written, which can be interrupted in two ways. A
colon means to repeat the sequence from the beginning, a semicolon means
to quit entirely. A predicate can interrupt the sequence in one further
way - when it is false you take up a new sequence. The new sequence can
be written after the nearest colon or semicolon, a place which otherwise
is inaccessible. After mentioning two idiosyncracies the description of
the control is complete.

First, several predicates in a sequence all go over to the same alternate
sequence in the case they are false. Instead of giving a program the form
of a binary tree in which the point at which execution has arrived is
well defined by the hisory of previous definitions, many of the branches
are allowed to overlap. Often this is of no concern, but it will now and
then happen that it is necessary to know how the program got to a given
point. Setting and testing variables is a simple resolution which goes
outside the framework of a simple control structure - it implies a theory
of variables. Repeating a past predicate may be inconvenient [say there
is a very long calculation involved] or impossible [a real-time event
has passed]. This quandray already appears in the calculation of an
exclusive or [a and not-b or not-a and b; you may have to do one or the
other twice].  Recognizing the existence of this point goes a long way
toward meeting the confusion it may cause, which in many years of REC
programming has seemed to be the only really serious source of indecision
in laying out programs.

The second point is that logical negatives are essential in setting up
a program, and can be obtained in a slightly backhanded way by reversing
the sense of all predicates in the final segment of a REC expression. In
a simpler form, if p is a predicate, (p) is its negative.

Indeed, boolean combinations of predicates can be realized through the
flow of control set up by the syntactic elements we have mentioned. If
p and q are two predicates, (p;q;) is their logical or, (pq;) is their
and. (p(q);(p)q;) is the exclusive or but, as remarked, it may involve
doing p or q twice. Since the evaluation of a series of predicates is
sequential, from left to right, no more of them will be executed than is
necessary to reach a decision. As a consequence the boolean operations are
not really commutative, and care must be taken  with their ordering. In
practice this situation is more of an advantage than a disadvantage.

Few programs can subsist without subroutines, and REC is no exception.
The mechanism of subroutine definition is to enclose a list of the
desired subroutines and their names, together with the program which
is to use them, in a pair of curly braces. The combination might show
the form {(www) a (xxx) b ... (yyy) n (zzz)}, in which the first
parenthesis [which itself could be enclosed in braces because of some
subroutine nesting] the definition of the subroutine a, and so on. It 
is agreed that all the subroutine definitions will be valid within the
main program and each other, but only within the curly braces, and
unless superceded at a lower level by a new subroutine definition on
that level.

In reality, the syntactical definition of REC says nothing whatsoever
about the choice of operators or predicates for the language, nor about
the symbolism used to represent them, beyond the reservation of the
symbols which have been mentioned.  It is a historical custom that REC
has used single letters for symbols, mainly because of the convenience
of a single keystroke in its interactive use, and the resulting economy
of not having to parse identifiers. REC80, which is linguistically
identical to REC86, displays a collection of operators and predicates
assigned to slightly less than the 96 printable characters of the ASCII
alphabet, whose detailed description is often quite intricate, and which
may be deduced from the explanations given in the source listings. Some
twenty years of experience, and the influence of several people have gone
into the choice of a relatively general-purpose collection. Anyone could
start afresh and create a new version of REC by changing its inherent
function library; something which has been done from time to time in the
past.

The REC in this collection makes use of three principal data structures,
to which a proportionate share of the available memory is dedicated. They
are: the compiling area, the pushdown list, and the workspace. The first
of these is where the program which REC compiles is normally stored;
if access to the compiler itself is granted in the choice of functions
[C, in the present example], the use of this area can be programmed.
The second, the pushdown list, reflects a decision to work on the level
of reversed Polish notation, passing arguments to functions through a
stack and recovering results from the same place. Chains of characters
are likely to be the principal data type used, which means that some
provision must be made for handling variable length arguments.  The
third structure is a sort of accumulator for long character strings,
and is serviced by specialized functions such as searches, insertions,
and deletions.

The source listings contain descriptions of each of the operators and
predicates, and should be consulted for the details of their functioning.
The listing is broken down into separate modules according to these
functional groupings, namely compiler, pushdown list, workspace, main
program and input-output.

REC interacts with the CP/M operating system by treating BDOS as a
subroutine [k, or K] to which the necessary parameters are passed.
Space for buffers, file control blocks and the like can be taken from
the pushdown list, the workspace, or from absolute locations in the
memory.

The disk is mostly filled with applications programs.  Five of them
are compilers for "languages," such as one would meet in a Computer
Science course. Indeed, in Puebla they are used for exercises in just
such courses. Consequently, a certain background in things mathematical
 - symbolic logic,  for example - would probably be useful preparation
for the examples. Be that as it may, the applications of these languages
are in turn readily comprehensible and rather amusing.  Much sport can
be had making up Turing Machines or Markov Algorithms to do simple
things.

The languages are:

		Turing Machine
		Markov Algorithm
		Post Production
		LISP
		CONVERT.

They range from simple to complex; by far the most attention was
devoted to the last of these, because it is a practical language of
many applications.  So is LISP, of course, but here it was written in
REC as an exercise, which means that one should turn to one of the
many serious LISP processors available if the intention is to use it
for a major undertaking.

In all cases one or two sample programs are shown for these languages
so that some comparisons can be made between the ways that each of
these languages goes about doing things; a binary sum is one example,
a self-compiler of the language where appropriate is another.  The
user of this disk is urged to prepare for unpleasant surprises by
making a backup copy before doing any experimentation. The REC versions
of the sample programs will actually run, and are intended to be
significant and illustrative in their own right. However, the five
programs cited above will compile them from their own source programs.
If this compilation is botched, one will be deprived of the enjoyment
of the sample programs until the art of compilation has been mastered;
this may or may not take a while.

One of the great stimulants for the evolution of REC in particular
directions was the desire to have a language with which CONVERT could
be compiled; CONVERT has previously existed as an interpreter within
LISP. Even yet there are details to be mastered in the generation of
search patterns for CONVERT, so that CNVRT.REC will suffice for a time.
CONVERT is a pattern matching language whose evolution from a scheme
such as Post Productions is relatively evident. The idea behind its
operation is to isolate segments of a text, either because they satisfy
some requirement or because they are situated in a prescribed envoiron-
ment [letters between a pair of parentheses, for example]. The segments
so isolated are to be combined into new text, preferably with the help
of very general functions and control structures. The whole process can
be repeated over and over until a desired result is obtained.

A brief outline of CNVRT is the following. The same structure that REC
has is followed, to facilitate compilation. Thus a string of subroutines
is to be written terminated by a main program; CNVRT.REC will insert the
curly braces and the set of system subroutines.

Each individual program is a quadruple, (()()()()), consisting of a list
of pattern definitions, a list of skeleton definitions, a list of the
variables to be used, and a rule set. Some of these elements may be null,
but they must nevertheless be represented in the quadruple by a parenthesis
pair.

Pattern and skeleton lists follow the convention of alternating the body
of the definition with the name which will represent it. Variables are
simply numbers, in the range 0 to 20 or so (32, minus variables used by
the system), separated by spaces. The only penalty attached to listing
more variables than are needed is that they will be needlessly saved and
unsaved many times.

The rule set is a sequence of pairs, each pair must be followed by a
colon or semicolon; in the nature of REC this determines whether the
rule is a repetitive rule or a terminal rule. The pair consists of a
pattern and a skeleton; the pattern is a predicate which ascertains
whether the content of REC's workspace has a prescribed form or not.
If it does, it may have identified some strings and labelled them as
values of the variables. A successful pattern match is followed by
substitution in the skeleton, an unsuccessful match results in passing
to the next rule. If no rules remain, the process halts, the workspace
remains unchanged.

The skeleton portion of the rule determines the new content of the
workspace. The workspace may be transformed iteratively by a repetitive
rule, in which case the same rule set is scanned anew. Among the patterns
are functions, whose arguments fill the workspace, and whose own rule set
determines the further course of transformation. Several functions can
work in sequence, each building up its part of the new workspace.

Patterns are primitive and composite. Primitive patterns can be constants
or variables. Implicitly, any character string which is not otherwise a
pattern is a constant, recognizing the same character string in the
workspace text. An interplay between round parentheses and angular
brackets is used to quote these syntactic elements. Variables are strings
of characters which are identified by special symbols, and which ought to
have been selected in advance of the pattern matching. Part of the intricacy
of CNVRT.REC, whose development has still not been brought to completion,
lies in the mixing of the generation of variable candidates with the search
and matching mechanism. If they are not mixed, however, the search becomes
even more time consuming than it already is.

Composite patterns are either boolean combinations of simpler patterns, or
pattern definitions, or recursive combinations of the two. Two special
cases, maximal iteration and minimal iteration, handle the majority of
applications of recursively defined patterns [a parenthesis nest, a list
of even length, or something similar], and may indeed be sufficient for
all such combinations.

Skeletons are either constants, variables, or functions. Again, anything
which is not explicitly something else is implicitly a constant, which
saves a great deal of quotation. Variables are character strings which
were dissected out of the source text by the matching pattern paired with
the skeleton. For the moment, functions are defined externally, but with
time and experience inherent functions will probably be included in the
language - counterparts of such things as do, while, or if in some of the
other popular languages.

Two principal examples of the use of CNVRT are included; one is the
traditional symbol manipulation exercise of calculating symbolic
derivatives without deltas and epsilons, but rather from a systematic
application of the rules for the construction of algebraic expressions
and the rules for the derivatives of these constructions; both can be
given a very elegant formulation in recursive terms.

The second example is not complete, but is included because of its
utility in understanding another popular language - "C". The five
programs

		ASMBL.CNV
		STMNT.CNV
		DCLRN.CNV
		EXPRN.CNV
		PRGRM.CNV

each illustrate one aspect of"C".  They can be used to test one's
understanding of respectively, "C", Kernighan and Ritchey's book,
CNVRT, or any combination of the three.  In particular, STMNT does
not exactly handle "CASE" correctly; this can be easily corrected,
it is a useful exercise to see how to do so.  It is interesting that
the five programs were written in one month's time, using appendix
A of the book, with little previous understanding of many of the
principal ingredients.  Two months are still lacking to realize the
hypothetical goal of establishing a "C" compiler in a new machine,
and will evidently be spent in learning how to bring down the size
of all the intermediates a little.

As a final illustration, a CNVRT "compiler" of REC is shown, which
in turn may be examined for some insight into how REC works.

One portion of the REC compiler, which occupies about 1.5K bytes,
does the actual compilation, and functions just about as efficiently
as a loader would.  In fact, the custom of restricting REC functions
to single letters results in the compiled code occupying about three
times the volume of the source code.  This is understandable in terms
of the jumps and calls, each three byte instructions, into which the
majority of REC characters translate.  Particularly in the days of
paper tape readers or punched card readers, the saving in volume amply
repaid the time dedicated to compilation.

A second portion of REC contains functions managing the pushdown list,
with another third involved in the operation of the workspace. Some
space is dedicated to tables, and to input-output routines.  However,
the majority of these depend upon CP/M for their operation, which is
a part of the system overhead.  All told, REC for either machine, 8080
or 8086, occupies 5K bytes of code.

There is obviously some inefficiency resulting from compiling a program
in REC rather than writing it in assembly language.  The restriction of
the number of functions to the ASCII alphabet has resulted in some
ingenious combinations and programming conventions which all take their
toll in space.  As usual, the rapidity with which certain tasks can be
programmed in any language overshadows many of the penalties which are
paid for a systematic and stylized way of doing things.

The situation becomes more pronounced in CNVRT; to program every
possible combination within a general class without fail requires
elaborate precautions which are hardly ever necessary in the simpler
examples.  One has only to think of the similar situation in "C",
FORTRAN, and other languages where the generality of indexing in
arrays leads to formulations which appear excessive when applied to
one-dimensional character strings.  CNVRT seems to inflate by a
factor between two and five when compared to REC, again depending
on the proportion of comments present.  This can mean a factor of
ten to twenty compared to machine language.

This condition will have to be overcome before a "C" compiler written
in CNVRT becomes a practical reality. As it would seem, the same could
be said of a "C" compiler written in "C".

The programming examples are intended to be self-sufficient.  Thus,
when executed, say by a command line such as:

A>REC80 DERIV.REC

they will respond with an indication of the kind of input that they
expect.  Many are one-shot, running through a single cycle of oper-
ation before returning to CP/M, others will cycle indefinitely until
given null input.  A person who wishes to improvise some programs of
his own should note the trick employed by the five language compilers.
Any commentary enclosed between double square brackets - [[...]] -
will be echoed by the program as part of its initialization.  Also,
the source programs have three blank lines near their beginning, which
designates a preferred location for the pre-defined subroutines which
the compiler will incorporate into its object program.

In general, these two features must be found within the first 1K bytes
of the source program, and considerable caution should be exercised
regarding the placement of parentheses, double square brackets, and
triple blank lines at the beginning of programs.  Also, a title such
as [PROGRAM.XXX] will be changed to [PROGRAM.REC], helping to identify
the compiled program.

The compilers require a command line of the form:

A>REC80 LANGUAGE.REC PROGRAM.LNG

where LNG is the extension - TNG, MKV, PST, LSP, or CNV - identifying
the language.  In each case, the program will expect the user to employ
it as an editor, compiling the source code line by line.  Inexperienced
users will probably not want to follow this procedure; therefore when
the language extension is omitted:

A>REC80 LANGUAGE.REC PROGRAM

the compiler will follow an automatic sequence which will require no
operator intervention. However, the file must already exist on the disk
with the appropriate extension.  The usual CP/M conventions for designating
alternative disks may be used, nor need REC80 reside on disk A. In this
mode of operation there is no interaction via the console, and thus there
need be no preoccupation with the cursor controls and screen clearing
commands.

[end]