.pl 72
.ND
.sp 5
.TL
MACTAB II
.sp
Multiple Assembly-language Compiler
.br
Table Formatter.
.sp 2
.AU
Ross Nealon
.sp
.AI
University of Wollongong.
.AU
Recent mods. by Peter Lamb
.sp
.AI
University of Melbourne
.bp
.NH 1
GENERAL DESCRIPTION.
.PP
MACTAB is one of the companion programs to the
MAC cross-assembler.
Its purpose is to aid the user in the
formatting and production of a description file
for the MAC cross-assembler.
.PP
This file is expected to be an absolute
binary data file, containing a concise description
of the target machine's architecture, the desired
format of the assembly-language source line, and
other necessary details.
.PP
All error messages are reported to the standard
output, and are flagged with the line number of the
description line in error.
Any error will abort production of the description
file.
.bp
.NH 1
TERMINOLOGY.
.IP "r-file:  " 14
An r-file is a file containing a formatted
description of a machine for the MAC cross-assembler.
.IP "d-file:  " 14
A d-file is the (almost) human-readable input to
mactab.
.IP "opcode:  " 14
An opcode symbolic is the symbolic
label chosen to represent a particular
instruction. Thus - 'sub' may be chosen
to mean the subtraction instruction.
.IP "class:  " 14
The opcode class is a number, which describes
to which category of instructions this
particular symbolic belongs.
Classes may represent addressing modes, or
instruction types, such as branches, loads
and stores.
A label class is a symbolic which groups
together labels which have special properties,
for example register names or branch conditions.
.IP "descriptor:  " 14
A format descriptor is a concise description
of a binary instruction, giving the location
of the argument fields within the instruction,
and the width of the fields.
There must exist one format descriptor
for each possible binary instruction format.
.IP "picture:  " 14
The argument picture is the part of the source line
after the symbolic opcode. The picture
contains expressions, which are evaluated to
form arguments to the instruction being
assembled.
The picture can be used to recognise different
classes of opcodes.
Thus - '#123' might mean immediate mode,
while '123' might mean the contents of address 123.
.IP "labels:  " 14
Labels are defined as standard MAC labels.
A label begins with an alphabetic character
from the set {a-z . _}, which may optionally
be followed by one to seven alphanumerics from
the set {a-z . _ 0-9}.
MACTAB allows the user to pre-define labels,
that is - to assign a value to a label.
This label thereafter cannot be redefined,
and has its assigned value in any expression.
Mac local symbols (those starting with '`')
cannot be predefined.
.IP "classed labels:" 14
A label may be assigned to be a member of one or more
label classes (A class name is of the same form
as a label).
Classed labels are generally used for naming registers etc. where
a general expression may not be desirable.
.IP "literals:  " 14
Literals are special labels, that are defined as reserved,
and have no value. Literals cannot be used in expressions,
but are useful for recognising different argument pictures.
Thus, if 'x' is defined as a literal, the picture 'expr,x'
indicates that an expression, then a comma, then the literal 'x'
are required in that order for a successful parse of that
source line.
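.PP
The label rule given under "labels" above can be checked mechanically.
The following Python sketch is an illustration only and is not part of MAC or MACTAB.
The name is_label is hypothetical, and the sketch assumes that a label is at most
eight characters long and that upper-case letters are not accepted.
.DS L
```python
import re

# First character from {a-z . _}, then up to seven from {a-z . _ 0-9}.
LABEL = re.compile("[a-z._][a-z._0-9]{0,7}")

def is_label(name):
    """Return True if name has the form of a standard MAC label."""
    return LABEL.fullmatch(name) is not None
```
.DE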
.bp
.NH 1
SYNOPSIS.
.PP
mactab  [-lnfx] [-hheader] d-file  [r-file]
.sp 2
.IP "    d-file:  " 14
Input description source code.
.IP "    r-file:  " 14
Output filename.
.sp 2
.PP
Only the d-file name is required.
MACTAB reads from the d-file until end-of-file or the 'end' section
is encountered.
MACTAB then collects the individually compiled
sections, and writes them to the file 'r-file'
if one was specified.
Any error will abort production of the r-file,
but the scan of the source continues.
.DS
-l:	produce a listing which shows groups
	of opcodes arranged by the groups
	of argument pictures allowed for them.

-f:	(implies -l) additionally supply a list
	of all the binary output forms of the
	instructions.
	The output is interpreted as:

	0 or 1	bits produced by 'o', 'n' or 'v' formats
	a - m   bits generated by args a - m, as is
	A - M   bits generated by args a - m, PC relative
	____	underlined: a swapped bit-field, with the
		first half underlined and the last half plain.

-x:	(implies -l) also provide the information
	given by -l in a cross-reference form.
	All the argument pictures are printed and
	numbered, and then a per opcode table
	of allowed arguments is printed. '**'
	means that combination is illegal.

-n:	produce the output in a form for n/troff
	to produce a pretty listing complete
	with emboldening etc.

.DE
.sp 3
.NH 1
SOURCE FORMAT.
.PP
The source description (d-file) is made up
of several distinct parts, three of which are optional.
These parts are called SECTIONS.
The order of the sections is not critical, but
some sections require that other sections be
previously defined.
It is suggested that the user follow the
given section ordering and internal arrangement.
.PP
A section has the general form:-
.DS L
            section-name
            .
            .
            <description>
            .
            .
            %
.DE
.sp 2
.PP
The '%' at the end indicates end-of-section.
The section-names and end-of-section
character must begin in column one of the source line.
Any line may contain a comment, introduced by ';' and
continuing to the end of the line. An otherwise empty line is
permitted only if its first character is a ';'.
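.PP
The section layout described above can be illustrated by a short Python sketch.
It is not MACTAB's own reader; the name split_sections is hypothetical, and
the sketch assumes that a ';' never occurs inside a quoted string or an
argument picture.
.DS L
```python
def split_sections(lines):
    """Group d-file lines into {section-name: [description lines]}."""
    sections = {}
    current = None                      # name of the section being read
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()   # strip any comment
        if not line:
            continue                    # comment-only line
        if line.startswith("%"):
            current = None              # end-of-section marker
        elif current is None:
            current = line.split()[0]   # section-name in column one
            sections[current] = []
        else:
            sections[current].append(line)
    return sections
```
.DE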
.bp
.NH 1
HEADER SECTION.
.PP
The header section generally describes the
target machine's architecture to MAC.
A header section MUST be supplied.
This section produces the header record
at the beginning of each r-file.
.PP
MACTAB allows the definition of special
pseudo opcodes for the definition of
constants during the assembly process.
These opcodes, called Define Constant (dc)
opcodes, can be given one extra identifying
character, appended to the 'dc'.
.IP "      e.g.-" 16
dc4     dcf     dc.
.br
dcb     dch     dc_
.PP
The identifying character must be a legal
alphanumeric, so that the whole dc opcode
forms a legal MAC label.
.sp 2
.PP
Each dc can be constructed to define
constants of particular length and/or format.
Thus - 'dcb' may be declared to define one byte
of storage with a constant in it, 'dch'
to define a half-word constant, 'dcw'
to define a full-word constant.
The source line required is:-
.IP "" 12
dc      <character>    <format>
.PP
The required <format> field is described in detail
later in this manual. Please refer to the
FORMATS section for this description.
.sp 2
.PP
MACTAB and MAC allow the user to define
a default dc, which has no identifying
character. The source line required is:-
.IP "" 12
defmt   <format>
.PP
Where <format> is as described later.
.sp 2
.PP
For the use of loaders handling MAC
output, perhaps from a range of different
machines, the magic number put into the
m.out file may be specified here.
The magic number defaults to
MWORD in m.out.h.
.IP "" 12
magic   <value>
.sp 2
.PP
Program counter incrementation is important
for addressing relative to the program counter.
The pc can be pre-incremented (before an instruction
is executed) or post-incremented (after an instruction is executed).
Declaration is as follows:-
.IP "" 12
pc      post           OR
.br
pc      pre
.bp
.PP
The assembler requires the width of the basic
address unit (byte) in bits. Declaration is:-
.IP "" 12
byte    <width>
.PP
Where <width> is a constant (numeric)
with a value equal to the byte width.
.sp
.PP
Similarly - the assembler needs to know the
number of bytes per word.
Its definition is:-
.IP "" 12
word    <length>
.PP
Where <length> is the number of bytes per word.
.sp 2
.PP
The address width of the machine (in bits) may also be selected. This
controls the format of the address field of the listing. The
default is 16.
.IP "" 12
addr	<value>
.sp 2
.PP
Some machines require an instruction to start
on a multiple of some basic unit (b.u.), although not all
instructions are equal in length to that multiple
(you don't believe that such machines exist ?? Ho, Ho).
For these machines Mac provides padding to make up the difference.
Instr tells mac how long (in b.u.'s) an instruction is; the default
is 1 b.u.
Bind binds instructions to 'instr' boundaries.
The padding is always with b.u.'s of value 0.
.IP "" 12
bind
.IP "" 12
instr	<value>
.sp 2
.PP
The user may supply an opcode value
that is treated as illegal by the target machine.
If none is supplied - zero (0) is assumed.
This declaration is now superseded, and produces only a warning.
The declaration is:-
.IP "" 12
ii      <value>
.sp 2
.PP
MAC allows the user to supply a string of up
to thirty characters that is printed at the head of each
page generated by MAC (listings and so on).
This is optional.
.IP "" 12
mac     "Title string"
.sp 2
.PP
MAC allows the user to give the page length
for listings and so on, specified as a number of lines.
The default is 60.
.IP "" 12
page    <lines>
.bp
.PP
A typical declaration is illustrated below.
The format descriptors on the define constant
lines are described later.
.DS
.sp
header
pc      post
byte    8
word    2
dc      b       a:8
dc      f       a:16
defmt           a:8
mac     "Dummy machine"
page    60
ii      0xff
%
.DE
.bp
.NH 1
LITERALS SECTION.
.PP
The literals section allows definition of the
special class of labels called literals.
Literals are standard MAC labels that
have been defined as reserved,
and have no value assigned to them.
Literals cannot be used in expressions,
but are useful in an instruction's argument field
for recognising different pictures.
A literal cannot be redefined as a label with
a value.
A typical definition may be:-
.DS
.sp
literals
x
y
a
%
.sp
.DE
.PP
The literals section (if needed) MUST appear
before the args section.
This section is optional and may be totally
omitted if desired.
.sp 3
.NH 1
LABELS SECTION.
.PP
This section is also optional.
This section allows the user to pre-define
up to NSYM (see mactab.h) labels and assign them values.
Mac local symbols (those starting with '`') may not
be predefined.
These labels will always be defined to
all users of the r-file,
and so provide a mechanism
for remembering frequently used
addresses or values, such as
register names or
subroutines in the monitor Read-Only-Memory.
The labels are defined to have non-relocatable
values.
This section must appear before the 'args' and 'classes' sections
if classed labels are to be used (see the section on label classes).
The definition is:-
.IP "" 12
label    <value>
.PP
Where <value> is a legal MAC constant.
A typical section definition is:-
.DS
.sp
labels
adc     4
nul     0
tty     0xc0fe
mask    0777
%
.DE
.sp 3
.NH 1
CLASSES SECTION.
.PP
This section is the last optional section. Its purpose
is to bind some of the symbols defined in the labels section
into classes. A label may be bound to any number of classes, from
zero up to the number of classes defined. The number of permitted
classes is equal to the number of bits in a 'C' 'int' variable,
i.e. 16 on PDP-11's, 32 on VAX, Interdata 8/32, 3220 etc. If a
label is not mentioned in this section, then it is considered
'unclassed'. A classed label may appear in any position in
an argument picture where a class of which it is
a member is specified.
A class definition is:-
.IP "" 14
<classname>	<label1> <label2> <label3>
or
.IP "" 14
<classname>	<label4>,<label5>,<label6>
.PP
A classes section may be:-
.DS
.sp
classes
intreg	r0,r1,r2,r3
fpreg	fr0,fr2
index	ir0,ir1
%
.sp
.DE
.bp
.NH 1
FORMATS SECTION.
.PP
This section describes the format of the binary instruction
(or data for a define constant) to MAC.
Generally - each class of instructions (zero
page, pc relative, register indexed etc)
will have a separate format descriptor.
Each descriptor can define one instruction
or data format.
.PP
Each descriptor is made up of field descriptors,
called subset descriptors.
Each subset is defined as follows:-
.IP "" 12
<subset-name>:<subset-width>
.sp
.PP
Each subset name is one identifying character.
The letter 'o' means opcode value, '!' means
the current value of the program counter,
a '#' means a constant will follow,
and the letters 'a' to 'm' mean the values of argument
expressions one to thirteen respectively.
The special subset name 'v' indicates a selected value
is to be used.  The mechanism for selecting
this value is described later in the argument section.
The subset name 'n' means the next (least significant)
number of bits of the opcode value.
.PP
Prefixing the subset names 'a' to 'm' with the
letter 'p' indicates that this argument should
be assembled program counter relative.
That is - the value assembled is the value of the
expression minus the current value of the location
counter.
Prefixing the subset names 'a' to 'm' with the
special field 'r<n>', where <n> is a decimal constant,
implies that the argument expression should be
assembled with the most significant <n> bits
of the expression's value
swapped with the least significant <n> bits.
This format prefix assumes that the maximum
length of the expression's value will be
two times the constant <n>.
.PP
The name 'o' implies assemble the value
of the current opcode being assembled here.
Then 'o:8' implies that the opcode
value should be assembled in the next 8 bits
of memory.
Thus 'a:16' means assemble argument 'a' (argument one)
in the next 16 bits, '#123:16' means assemble
the decimal constant 123 in the next 16
bits, 'pa:8' means argument one made pc
relative in the next 8
bits, 'r8a:16' means assemble argument one
in 16 bits, with the first 8 bits and the next 8 bits
swapped.
The form 'n:6' requests the least significant (next) six
bits of the opcode value to be assembled in six bits.
The opcode value is then shifted right by six bits.
The form 'v:3' indicates the next (least sig.)
three bits of the selected value should be assembled
in a field three bits wide.
This action is similar to that of the 'n' subset name,
except that MAC uses a selected value, and not the opcode
value.
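.PP
The bit manipulations implied by the 'r<n>', 'p', 'n' and 'v' forms
can be sketched in Python as follows.
This is an illustration under stated assumptions, not MAC's implementation:
the function names are hypothetical, and truncating a pc relative value to
the field width in two's complement is an assumption.
.DS L
```python
def swap_halves(value, n):
    """r<n>: exchange the top n bits and the bottom n bits of a 2n-bit value."""
    mask = (1 << n) - 1
    low = value & mask
    high = (value >> n) & mask
    return (low << n) | high

def pc_relative(expr, location, width):
    """p: expression minus location counter, truncated to width bits (assumed)."""
    return (expr - location) & ((1 << width) - 1)

def take_low_bits(value, width):
    """n or v: return (field, remainder) after removing the low width bits."""
    return value & ((1 << width) - 1), value >> width
```
.DE
.PP
For example, swap_halves(0x1234, 8) gives 0x3412, which is what
the form 'r8a:16' would assemble for an argument of 0x1234.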
.bp
.PP
The full descriptor is made up of several
subset descriptors grouped together, optionally separated by white space.
.DS
|------------------------------|
| opcode | arg 1 |    arg 2    |
|------------------------------|
    8        8         16            (bits)
.sp
The above instruction may be described as:-
.sp
o:8 a:8 b:16 
.DE
.sp
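.PP
As an illustration of how such a descriptor lays out an instruction,
the Python sketch below packs a list of field values into one integer.
The name pack is hypothetical, and the sketch assumes that the first subset
occupies the most significant bits, as the diagram above suggests.
.DS L
```python
def pack(fields):
    """fields is a list of (value, width) pairs, leftmost field first.

    Returns (packed integer, total width in bits).
    """
    word = 0
    total = 0
    for value, width in fields:
        # Truncate each value to its field width before appending it.
        word = (word << width) | (value & ((1 << width) - 1))
        total += width
    return word, total

# o:8 a:8 b:16 with opcode 0x10, argument one 0x05, argument two 0x1234
instruction, bits = pack([(0x10, 8), (0x05, 8), (0x1234, 16)])
```
.DE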
.PP
Each descriptor must be prefixed with a number
indicating the number of arguments to this instruction,
blanks/tabs, then the format descriptor.
.IP "      e.g.-  " 12
2       o:8 a:8 b:16 
.sp
.PP
A typical definition may be:-
.DS
.sp
formats
0       o:8
2       o:8 b:4 a:12 
1       #0xf:4 o:4 a:8 
1       o:6 a:12 
1       a:4 o:4 #0x12:8 
1       o:8 r4a:8 
1       o:8 pa:8 
2       n:7 v:3 o:3 a:3 b:16 
1       o:8 a:8 !:8 
%
.sp 2
.DE
.PP
MACTAB scans each format descriptor for validity,
and reports any inconsistencies.
The total width of the format descriptor in bits
must be an exact multiple of the width of the
basic address unit (byte). For this
reason, the header section must be defined before
the formats section.
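.PP
The width check can be sketched in Python as follows.
This is not MACTAB's own scanner: the names SUBSET, descriptor_width and
check_descriptor are hypothetical, and the sketch assumes that constants after '#'
are written in lower case.
.DS L
```python
import re

# optional prefix (p or r<n>), subset name, ':' and width
SUBSET = re.compile("(p|r[0-9]+)?([a-mnov!]|#[0-9a-fx]+):([0-9]+)")

def descriptor_width(descriptor):
    """Return the total width in bits of a format descriptor."""
    compact = "".join(descriptor.split())   # white space is optional
    pos = 0
    total = 0
    for m in SUBSET.finditer(compact):
        if m.start() != pos:
            break                           # unrecognised text before match
        total += int(m.group(3))
        pos = m.end()
    if pos != len(compact):
        raise ValueError("bad subset descriptor near: " + compact[pos:])
    return total

def check_descriptor(descriptor, byte_width):
    """True if the descriptor fills a whole number of bytes."""
    return descriptor_width(descriptor) % byte_width == 0
```
.DE
.PP
With a byte width of 8, check_descriptor("o:8 a:8 b:16", 8) is true,
while a descriptor 18 bits wide would be rejected.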
.PP
Each descriptor is assigned a number,
starting at zero (0) and being incremented by one
for each new descriptor.
These logical numbers are the only method of referring
to a format descriptor.
.bp
.NH 1
OPCODES SECTION.
.PP
This section is used to describe all
of the possible opcode symbolics (except the pseudo
opcodes which are always defined) and the
values of each symbolic.
Each different argument picture may be used to
select a new opcode class (a particular value
out of a class of values).
Thus - the construct
.sp
.DS L
		sub   expression
.DE
.PP
may select a value from
a list for the 'sub' instruction, while
the construct
.sp
.DS L
		sub   #expression
.DE
.PP
may be set
to select another value from the same list
for the 'sub' instruction.
The table is set out as lines of triples enclosed
in parentheses.
Each triple corresponds to a legal opcode value. The first
member of the triple is the opcode class. If this value matches the
opcode class value generated by the argument scan, the triple
is used for generating the opcode.
If there is no triple matching the class returned by the
argument scan, then this combination of argument
and opcode is illegal.
The second member of the triple is the number of a
format descriptor to use in assembling the instruction,
if no other descriptor is selected (see argument section).
The third member of the triple is the value used
by the instruction formatter for the opcode.
.DS
.sp
opcodes
add	(0 0 0x10) (2 1 0x30)
jmp	(3 2 0x7f)
sub	(0 0 0x50) (2 1 0x70)
%
.sp
.DE
.PP
In the above example there are two classes for each
of the instructions 'add' and 'sub', the first one
selected when the class returned by the argument is
0, and the second selected when the class returned is 2.
Any argument which selects a class other than 0 or 2
is illegal with 'add' or 'sub'. For the case of 'add',
if the class is 0, then the format selected is format 0,
provided the argument did not preselect the format, and
the opcode value made available to the formatter is '0x10'.
The 'jmp' instruction will only be legal with argument
pictures returning class 3.
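.PP
The selection rule can be restated as a small Python sketch.
It is an illustration only; the table OPCODES mirrors the example above, and the
names select and preselected_format are hypothetical.
.DS L
```python
# (class, format descriptor number, opcode value) triples per symbolic
OPCODES = {
    "add": [(0, 0, 0x10), (2, 1, 0x30)],
    "jmp": [(3, 2, 0x7f)],
    "sub": [(0, 0, 0x50), (2, 1, 0x70)],
}

def select(opcode, arg_class, preselected_format=None):
    """Return (format number, opcode value) for the class from the argument scan."""
    for cls, fmt, value in OPCODES[opcode]:
        if cls == arg_class:
            if preselected_format is not None:
                fmt = preselected_format    # the argument picture chose a format
            return fmt, value
    raise ValueError("illegal combination of argument and opcode")
```
.DE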
.PP
There is no limit on the number of opcodes or classes.
However, if the listing options are to operate correctly
the largest class should be one less than the
wordsize of the machine (max class no. should
be 15 on a PDP-11, or 31 on a VAX or PERKIN-ELMER
8/32 or 3220). This limit may be raised in future releases.
.PP
The reader intending to create d-files should first look
at some of the d-files supplied with MAC.
.NH 1
ARGS SECTION.
.PP
This section is responsible for the generation of the
parser table used by MAC.
This section takes the user's description of
the argument pictures and using a top-down
recursive descent algorithm builds a
finite-state parser table.
Each picture is made up of literals, the reserved
keyword 'expr' meaning an expression, delimiters (commas
and so on) and some required characters,
such as a '#' meaning immediate mode, '$' meaning
zero-page addressing, '@' meaning indexed and so on.
These characters are 'required' in the sense that
they must be present in the argument picture for
MAC to recognise that format of picture.
Of course - it is totally up to the user
as to
which characters will mean what.
.PP
MACTAB recognises the keyword 'expr', and
wherever it occurs, MACTAB substitutes a call
to a pre-defined expression parser.
MACTAB also recognises label class names which have
been previously defined in the CLASSES section. Wherever
a class name appears in the argument picture, a label
bound to that class may appear in the source of an
assembly program.
.PP
The user can associate a series of actions to
perform upon recognition of an argument picture.
The actions currently implemented are:-
.IP "    1)" 8
Select a new format descriptor
.IP "    2)" 8
Select a class of opcode values for
this symbolic
.IP "    3)" 8
Select a value (numeric constant)
to be used in formatting the instruction.
.sp
.PP
Typical argument pictures could be:-
.DS
x
expr , x
expr , ( y )
# expr
a , expr
( $ expr ) , y
fpreg , expr
intreg , fpreg

; Where intreg and fpreg are classed
; labels (see CLASSES section)
.DE
.PP
Selection of actions is by addition of
four constants after the picture but on the same
source line.
The first describes which actions to take (if any).
Each bit in this constant stands for a particular action.
If a bit is set, MAC will try to perform the requested
action.
The remaining three constants are arguments to
the actions, specifying such things as
which new format descriptor, which
opcode class, and the actual numeric
value (8 bits max) to use when selecting a value.
If the bits are numbered with 0 as the least-significant bit
(right-most), then if bit zero is set, MAC will select
a new format descriptor, and assume the fourth
constant will be the new format descriptor.
If bit one is set, MAC will select a class of
opcode values, that class being the value of
the third constant. If no class is selected,
the default is class 0.
If bit two is set, MAC assumes the second constant is
an 8-bit value to be used as the selected value
(see the 'v' subset name in the FORMATS section).
Selection actions are optional.
The actions are selected by preceding the four constants
with a brace '{'.
All four constants are needed after the brace.
Constants should be separated by blanks and/or tabs.
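.PP
The meaning of the four constants can be summarised by the following Python sketch.
It is an illustration, not MAC's code; the name decode_actions and the
dictionary keys are hypothetical, and truncating the selected value to 8 bits
is an assumption based on the stated maximum.
.DS L
```python
def decode_actions(mask, value, opclass, fmt):
    """Interpret the four constants that follow an argument picture.

    mask    first constant, one bit per action
    value   second constant, the selected value
    opclass third constant, the opcode class
    fmt     fourth constant, the format descriptor number
    """
    result = {"format": None, "class": 0, "value": None}
    if mask & 1:                    # bit 0: select a new format descriptor
        result["format"] = fmt
    if mask & 2:                    # bit 1: select an opcode class
        result["class"] = opclass
    if mask & 4:                    # bit 2: select a value (assumed 8 bits)
        result["value"] = value & 0xff
    return result
```
.DE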
.bp
.PP
Several pictures should always be defined by
the user.
The picture " " (a string) should be defined,
for the dc and title pseudo opcodes.
A blank line is equivalent to the case of
an instruction having no argument picture.
See appendix 1 for examples of real definitions.
.sp 5
.NH 1
END SECTION.
.PP
The end section is needed to actually
create the named r-file.
Upon recognition of this section,
MACTAB collects the compiled sections,
and writes them in their correct order
onto the r-file.
Any error, other than a missing end section,
will prevent the r-file from
being produced.
If an end-of-file is encountered before
the end section,
MACTAB reports this as a warning only,
and assumes an end section.
In this case only, an r-file will
be produced.
No terminating '%' is required for this section.
.bp
