.ll 80
.lt 80
Mike:

This is an effort to provide some explanation of what I've done
(and some insights into the cryptic compiler tables).  You'll have
to edit it for the consumption of the general public, since I'm not
sure just which parts of the work you will want (or legally be able) to
distribute.

The floating-point-instruction-set C compiler
was generated by modifying the 'Version 6.5'
C compiler.
As it turned out, it was only necessary to modify the second pass
of the compiler (that part whose source is contained in 'c1?.c').
However, given the vagaries of distribution, I've sent back the source
for all the passes to ensure that everyone is working on the same basis.

It should be noted at the start that I did not fix any
of the known bugs in the compiler (sequential execution, integer pointer
incrementation, ad nauseam).

The files 'cc.1' and 'cc.6' are new manual entries.

The shell command file 'install' copies 'c0', 'c1', and 'c2'
to '/lib/c0', '/lib/c1', and '/lib/c2';
it also updates '/lib/libc.a' and '/lib/liba.a'
with subroutines created from files in the directory 'subs'.
BE SURE to back up existing versions BEFORE 'install' is used (or else!).

The shell command file 'run' recompiles the compiler (from scratch).
Note that it is slightly different from the 'run' provided in Version 6.5.

In modifying the C-language compiler source, deletions of lines were accomplished
by commenting them out and tagging them 'BELL', that is:

.nf
/*	printf("This line was in the original and was deleted.\\n");	/*BELL*/
.fi

while additions were tagged 'ADO', for example:

.nf
	printf("This line was added to the compiler.\\n");		/*ADO*/
.fi

(There are also some lines tagged 'DIAG' that I added; they are presently
commented out.)
Fewer than one hundred lines in all are affected.

The file 'table.s' was substantially modified;
no effort was made to tag the changes.

One line of 'cvopt.c' was modified to allow differentiation between
constants and short floating-point constants.
The new 'cvopt', when applied to the old 'table.s', produces the old 'table.i',
so if anyone back East is worried about such matters, there is no need to
keep both versions.

The result of all this dickering is a compiler that produces code which,
when confronted with the need to do floating-point operations, puts the
operands on the stack and then either
does a 'fadd sp' (or 'fsub sp', 'fmul sp', or 'fdiv sp'),
or, if a conversion or test is to be performed, calls a subroutine
(all of whose names begin with the letters 'fis' and all of whose source code
is in the file 'subs/fis.s').

Just like the one, only and original C compiler, this one expands 'float'
entities to 'double' entities when they are passed to subroutines.

The compiler assumes that a function returning a 'float' or 'double'
leaves (should leave) the result in a '.globl' variable named 'ac0'.
The file 'subs/fp.s' contains the source for a floating-point simulator in which
simulated register zero is named 'ac0' and is in '.comm'.
In those cases where C programs are to be linked with subroutines that
use 11-45 instructions, this simulator must be used.
('subs/fp.s' also has the 'fpsim' business that was used up to now to deal
with floating-point problems; it may be useful to those with
assembly-language routines and a reluctance to recode.)

The compiler doesn't produce the world's best code.  Part of the problem is
that when an expression like
	a + b
.br
is to be compiled for later use, there's no way to know whether the later
use is to be as an argument to a function (in which case a double entity should
be put on the stack)
or in a place where putting a float on the stack would suffice.
Therefore, double results are always generated.  Thus the following program:

.nf
float	a,b,c;
test()
	{
	a = b + c;
	}
.fi

produces the following code:

.nf
_test:
	clr	-(sp)
	clr	-(sp)
	mov	2+_b,-(sp)
	mov	_b,-(sp)
	mov	2+_c,-(sp)
	mov	_c,-(sp)
	fadd	sp
	mov	(sp)+,_a
	mov	(sp)+,2+_a
	cmp	(sp)+,(sp)+
.fi

where clearly the 'clr's and the 'cmp' aren't needed.  Oh, well.\ .\ .
(On the bright side, things like 'a = b' and 'a =+ b' are done better.)

On to table.s (at least to the degree that I understand it).
A lot of this will be stuff you (and others) have already deduced, but
I'll include it here for the unenlightened.

There are actually four tables: 'regtab', 'cctab', 'efftab', and 'sptab'.
The table to be used is determined by the use to which an expression is to
be put and the side effects of expression evaluation desired.
\&'sptab' evaluates the expression and leaves its value on the stack;
\&'regtab' evaluates the expression and leaves its value in a register
(symbolized in 'table.s' by the character 'R');
\&'efftab' evaluates the expression without leaving the value anywhere
(but generating any side effects of the evaluating--for example, in
evaluating 'a = b' it moves the contents of the memory location associated
with the name 'b' to the memory location associated with the name 'a');
\&'cctab' evaluates the expression in such a way that its value can be tested
(that is, it sets the condition codes correctly).

Within each of the four tables there are entries for those operators ('+', '=+',
NAME, etc.) for which code having the properties associated with the
table can be produced.
Each entry consists of a number associated with the operator and a pointer
to yet another table.

Within a table for a particular operator there are entries for those operand
pairs (or for that operand in the case of unary operators) for which
code can be produced.
For each operand pair there is information which encodes the type of operand
for which code may be generated and information about the code to be produced
(huzzah! we're at the payoff at last).
The operand-pair description always appears in 'table.s' in the
following form:

.nf
	%<operand description>,<operand description>
.fi

where the operand description to the left of the comma applies to the
left-hand operand.

Each of the operand descriptions has the same form:

.nf
	<class description><optional type description><optional '*'>
.fi

The class description is one of the following characters (for each
of which a 'meaning' is given):

.nf
	z (Zero)	The operand is a constant whose value is zero.
	1 (one)		The operand is a constant whose value is one.
	c (Constant)	The operand is a constant.
	a (Assemblable)	The operand has an assembly-language representation.
			For example, a constant whose value is zero can be
			represented by '$0' in UNIX assembly language.
			Likewise, the memory location associated with the
			name 'b' can be represented by '_b'.
	e (Extra)	The operand has no assembly-language representation,
			but an extra register is available for use in the
			compilation of the current expression.
			Note that it is not possible to use 'e' in both
			operand descriptions of an operand-pair description.
	n (aNything)	Any operand is permitted.
.fi

The optional type description, if present, is one of the following characters
(for each of which a 'meaning' is given):

.nf
	b (Byte)	The operand should be accessed with byte instructions.
	w (Word)	The operand should be accessed with word instructions.
	l (Long)	The operand is of type 'long'.
	f (Float)	The operand is of type 'float' or type 'double'.
	d (Double)	The operand is of type 'double'.
.fi

In the absence of a type description, it is assumed that the entry in question
can be used to generate code for either the byte case or the word case.

The optional '*', if present, indicates that the operand should be a pointer
to an entity whose nature has been described rather than just the entity.

In using the table associated with an operator, the compiler examines each
operand-pair description in turn to see if the code associated with that
description can be used on the operands it is currently confronted with.
The order of entries within the table is therefore important.
If a table had a
	%n,a
.br
entry followed by a
	%n,c
.br
entry, the second entry would never be used (if the right-hand side is
a constant, it is assemblable, so the first entry would be used to generate
code).

Following the operand-pair description is a representation of the code
to be generated.  Note that, for any given representation, it is possible
to have more than one operand-pair description associated with the code; for
example,

.nf
	%ab,n*
	%a,nw*
		representation follows here
.fi

indicates that the representation can be applied if the left-hand operand
is an assemblable byte entity and the right-hand operand is a pointer to
anything OR if the left-hand operand is an assemblable byte or word entity
and the right-hand side is a pointer to any word entity.
Note also that a label can be associated with a representation.  The label
must appear before any operand-pair descriptions.  An example:

.nf
	%[label:]
	%ab,n*
		representation follows here
.fi

The label is useful when a given representation can be applied to more than
one operator--the table entries for those operators other than the
first take the form:

.nf
	%ab,n*
	%	[label]
.fi

Now what about those representations?
Well, in 'table.s', each one is terminated with a blank line.
Some lines begin with 'F' or 'S' and instruct the compiler
to do more table searching.
The 'F' entries apply to the First (left-hand) operand;
the 'S' entries apply to the Second (right-hand) operand.
The most important cases are summarized below:

.nf
	F		Evaluate the First operand, leaving its value
			in a register.  The register is later symbolized 'R'.
			'R' is the register the compiler expects the value
			of the expression to be left in when 'regtab' is
			being used; in other cases, it is a scratch register.
	S		Evaluate the Second operand into register R.
	FS		Evaluate the First operand onto the Stack.
	SS		Evaluate the Second operand onto the Stack.
	F* (or S*)	The operand is a pointer to an entity.
			Put the address of the entity (possibly plus or
			minus a constant) in register R.
	FS* (or SS*)	Put the address of the pointed-to entity
			(possibly plus or minus a constant) on the Stack.
	F1* (or S1*)	Put the address of the pointed-to entity
			(possibly plus or minus a constant) into a register.
			The register is later symbolized 'R1'.
			'R1' is the 'extra' register that the compiler was
			notified would be used by having an 'e' in an
			operand description;  don't try to use F1* or S1*
			if there wasn't an 'e' in the right place.
.fi

In those cases where a pointer is involved (F*, S*, etc.) the 'constant'
referred to is later symbolized '#1' (if an 'F' pointer is involved)
or '#2' (if an S pointer is involved).

Lines that do not begin with an 'F' or 'S' cause generation of code
without further searches through regtab et al.
The lines are just copied to the assembly-language output file, with the
following being the most important 'escapes':

.nf
	R	Produces 'rx', where rx is the name of the register into
		which the value of the expression is to be placed when
		'regtab' is in use, and is the name of a scratch register
		otherwise.
	R+	If 'R' would produce 'r1', R+ will produce 'r2'.
	R-	If 'R' would produce 'r1', R- will produce 'r0'.
	R1	Produces 'rx' where rx is the name of the 'extra' register.
	A1	Produces the assemblable string associated with the
		left-hand operand.
	A2	Produces the assemblable string associated with the
		right-hand operand.
	A1+	Produces a string equivalent to '2+A1' when it is meaningful;
		produces 'A1' otherwise.  Used with longs (and with floats
		and doubles in some compilers).
	A2+	See above.
	A1'	If A1 would produce '(rx)+', A1' produces '(rx)'--
		postincrementation is inhibited (as is postdecrementation).
		Use of this 'feature' can cause bad code to be generated;
		try '*--ip =^ 5;' for a laugh.
	#1	Produces a string representing the offset part of
		the left-hand operand (which must be a pointer).
		Since this offset is applied to a register (or a thing on
		the stack), F* or F1* (or FS*) must appear before this does.
	#2	Produces a string representing the offset part of
		the right-hand operand (which must be a pointer).
		Since this offset is applied to a register (or a thing on
		the stack), S* or S1* (or SS*) must appear before this does.
	B1	Produces 'b' or nothing depending on whether the left-hand
		operand is (or is not) a byte entity.
	B2	Produces 'b' or nothing depending on whether the right-hand
		operand is (or is not) a byte entity.
	BE	Produces 'b' or nothing depending on whether the whole
		expression is (or is not) a byte entity.
	F	Produces 'f', 'of', 'fo', or nothing as appropriate.
	I	Produces the primary string associated with an operator
		('add', 'sub', etc.).
	I'	Produces the secondary string associated with an operator.
		For example, shifts are primarily associated with 'ash'
		but secondarily (in special cases) associated with 'asr'
		or 'asl'.
.fi

"Let's check our understanding" with a typical 'table.s' entry:

.nf
/ +, -, |, &~, <<
cr40:
%n,z
	F

%n,1
	F
	I'	R

%[add1:]
%n,aw
	F
	IB2	A2,R

	.
	.
	.
.fi


Here, we're going to try to perform an addition (subtraction, etc.) and
leave the result in a register (hence the 'r' in 'cr40').
If we're trying to add anything to zero ('%n,z') we just evaluate the
left-hand side, leaving its value in the appropriate register ('F').
If we're trying to add anything to one ('%n,1') we evaluate the
left-hand side into the appropriate register ('F'), then use a secondary
operator on the relevant register ('I'	R')--for example, 'inc'
instead of 'add' or 'dec' instead of 'sub'.
To add anything to something which is assemblable and
is a word entity ('%n,aw'),
we evaluate the left-hand side into a register ('F'), then
perform the appropriate operation ('add', 'sub', etc.) using the
second operand in the address field of the instruction.  (Note that
although in this case we KNOW that 'A2' is a word entity and
might therefore expect that the 'B2' is unnecessary, this particular
code representation is labelled--and therefore might be used in other
instances where the 'B2' WAS necessary.)

I feel obligated to conclude with a "Caveat lector".
My confidence in my understanding of the compiler tables is by no
means as high as I might like it to be.  Everything I have said above should
be taken with a full shaker of salt.  Further, keep in mind that future
compilers which find their way out of Bell Labs will probably use the same
letters to represent different things.  But this description may be better
than nothing.
