.	C Manual - Language - nroff with cdoc0
.HI 1
.ce
C Manual - Language
.ce
Edited by R.P.A. Collinson
.ce
Document No: DOC/UNIX.K3.10/1
.sp 5
.ce
Contents
.in +20
.SC 1 Introduction
.SC 2 "C Reference Manual by D.M. Ritchie & K. Thompson"
.SC 3 "Notes on C"
.br
3.1	The C language
.br
3.2	File name conventions
.br
3.3	The Operation of the C compiler
.br
3.4	Extension to cc
.br
3.5	Extension to the C compiler
.br
3.6	Register Allocation
.br 
3.7	Loading C programs
.SC 4 "Unix Commands for C"
.br
ar, as, cc, ld
.SC 5 "Debugging C programs"
.br
cdb, ctracer, db
.SC 6 "Miscellaneous System routines"
.br
abort, ctime, hmul, ldiv, perror, reset, time
.SC 7 "System Error numbers"
.sp 5
.in -20
.so unixlicense
.SH 1 "Introduction"
.NF
.PG
This manual gives information for UNIX users who wish
to program in C. It should be used in conjunction with the UKC
document "UNIX Introduction" (DOC/UKC.UNIX.K0/1) which gives
details of the operating system; and the "UNIX Programmers Manual"
from Bell Labs which gives an alphabetic list of commands, system calls and
user subroutines.
For space reasons, this manual does not cover input/output in C or how to create and
control new processes; there are two subsidiary documents covering these
topics:
.sp
1)	C Manual - Input/Output - DOC/UNIX.K3.10/2
.sp
2)	C Manual - process control - DOC/UNIX.K3.10/3.
.PG
The sections of this manual are self-explanatory and (where relevant)
are prefaced by a short introduction passing on some of the folk-lore and
pitfalls not mentioned clearly in the documentation.
.PG
The second section of this manual is a reprint of the 'C Reference Manual'
obtained from Bell Labs. Newcomers to C are advised not to consult
the 'C-reference manual' until they have read and experimented with programs
in the 'C-Tutorial' by B.W. Kernighan.
.SH 3 "Notes on C"
.PG
This section contains details on the operation of the C compiler; how
C programs are loaded; and some extensions to the compiler which are present
in the program but which are not documented in the official manuals.
.SB "The C Language"
.PG
This subsection passes on some of the pitfalls which are not mentioned in the documentation.
.sp
Comments.
.PG
Comments may not be nested, for instance
.br
	/* /* ..... */ */
.br
is illegal.  The compiler picks up the first '*/' and takes it
as the closing bracket of the comment.
.sp
Semi-colons.
.PG
Semi-colons in C are a problem. A statement is
defined as ending with a semi-colon, but note that
.sp
	{............};
.sp
is defined as two statements, with the semicolon being a 'null' statement.
In the code section of the program, a closing bracket is always preceded by a semi-colon, so
.sp
	{ a = 8 }
.sp
is illegal; and must be written as
.sp
	{ a = 8; }
.sp
A closing bracket never needs to be followed by a semi-colon (except in one position - see below).
However, it usually does not matter if a semi-colon is present, but there are a couple
of places where it certainly must not be placed, for example
.sp
	if(...)
	{
	.
	.
	.
	}
	else x = 9;
.sp
If a semi-colon were put after the closing bracket, it would form a null statement ending the 'if',
leaving a trailing 'else' which the compiler rejects.
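.PG
The layout above can be tried out with a small routine. The following is only a sketch,
written in present-day C syntax; the routine name 'pick' is invented for the illustration
and does not come from any library.
.sp
```c
	/* Returns 1 when a is non-zero, otherwise 9. */
	int pick(int a)
	{
		int x;

		if (a)
		{
			x = 1;
		}			/* no semi-colon after this bracket */
		else x = 9;
		return x;
	}
```
.PG
With this layout, pick(5) yields 1 and pick(0) yields 9.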
.PG
The one exception to the "don't put semi-colons after }'s" rule is that a
semicolon must be placed after a variable definition statement which performs initialisation, for
example:
.sp
	char *names[] {
		"fred",
		"bill",
		"joe",
		"arthur",
		"mike",
		"bob"
	};
.sp
must have a semi-colon after the closing }.
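.PG
For comparison, here is a sketch of the same table as a present-day compiler would expect it.
It assumes a modern C compiler, which requires an '=' sign before the initialiser list (the
form shown above omits it); the routines 'name_count' and 'find_name' are invented for
the illustration.
.sp
```c
	#include <string.h>

	/* Present-day C requires '=' before the initialiser list. */
	char *names[] = {
		"fred",
		"bill",
		"joe",
		"arthur",
		"mike",
		"bob"
	};	/* this semi-colon is required */

	/* Number of entries in the table. */
	int name_count(void)
	{
		return (int)(sizeof(names) / sizeof(names[0]));
	}

	/* Index of the given name in the table, or -1 if it is absent. */
	int find_name(const char *s)
	{
		int i;

		for (i = 0; i < name_count(); i++)
			if (strcmp(names[i], s) == 0)
				return i;
		return -1;
	}
```
.PG
Because the array size is left to the compiler, adding a name to the list needs no other change.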
.sp
Binding of operators.
.PG
A very common mistake is to get the priority of the '&' operator wrong.
For instance:
.sp
	if(a & 010) {.....}
.sp
will perform the contents of the curly brackets if the correct bit is set in a.
However, if the sense of the test is to be reversed, the statement
.sp
	if(a & 010 == 0) {...}
.sp
does not do what it seems, because the == operator binds more tightly than & and is executed first. So the statement
must be written
.sp
	if((a & 010) == 0) {.....}
.sp
The moral of this tale is (as usual) "if in doubt, use brackets",
but it is possible to find out the code which C generates by using the -S
option to 
.it cc
and inspecting the '.s' file.  This is often a good plan (assuming you can read
assembly code).
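.PG
The difference between the two forms can also be checked by running them. The sketch below is
in present-day C; the routine names are invented, and the first routine writes out explicitly the
grouping which the compiler applies to the unbracketed test.
.sp
```c
	/* What the compiler makes of  a & 010 == 0 :
	   == binds more tightly than &, so 010 == 0 is evaluated first. */
	int unbracketed(int a)
	{
		return a & (010 == 0);
	}

	/* The intended test: true when the bit 010 is clear in a. */
	int bracketed(int a)
	{
		return (a & 010) == 0;
	}
```
.PG
Since 010 == 0 is false (zero), the unbracketed form yields zero whatever the value of a,
so the code in the curly brackets would never be performed.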
.SB "File name conventions"
.PG
Unix expects certain files to have fixed extensions and also
generates files with given extensions; these extensions are:-
.sp
.in+10
.ti-10
file.c	A file containing a C source program; the compiler 
.it cc
(I) will not accept a C program on a file unless the file has the
extension '.c'.
.br
.ti-10
file.s	A file containing an assembler source program destined for processing
by 
.it as
(I).
.ti-10
file.o	A file containing program and data in loader format.
.ti-10
file.i	Output from the
.it cc
program with only macro processing being done - see below.
.ti-10
file.a	A file in archive format generated by 
.it ar
(I). In the context of the C compiler, the subfiles on the
archive file are usually '.o' files ready for searching by the link-loader
.it ld
(I).
.ti-10
a.out	The output file from the Unix loader
.it ld.
.in -10
.SB "The Operation of the C compiler"
.PG
To compile a C program the system program
.it cc
(I) is used.
.it cc
has two functions: first, 
it performs a macro pass on the source of the
program, dealing with the #define and #include statements;
and secondly,
it controls all the phases of the compilation and linking.
The output from the macro pass may be obtained by giving the -P switch to 
.it cc
and inspecting the output written onto the file with the extension '.i'.
.PG
A program written in C is translated by the C-compiler into source for the UNIX
assembler.
This source may be optimised (by the use of the -O switch to
.it cc).
The source is passed to
.it as
to produce
a '.o' file which 
is then linked with any system calls and subroutines using the UNIX loader
.it ld
(I).
.it cc
deletes any workfiles which it may use on the user's current directory.
.PG
The loading phase links a small header file
which calls the user's 'main' routine and on return performs a 
system 
.it exit
(II) operation; the user program modules; and selected
files from the C library archive file (/lib/libc.a) and the
assembler library archive file (/lib/liba.a).
The two archive files
contain a number of loader format files with one user routine per '.o' file.
A particular '.o' file is not loaded into the program unless it contains a definition
for a previously undeclared name; in this way only the code which
the program requires is actually loaded.
.br
.ne 10
.PG
For example, if the program
.br
	main()
	{	char c;
		c = getchar();
		putchar(c);
		exit();
	}

.br
was entered into a file called single.c; and the
command
.br
	cc single.c
.br
issued; the following sequence of events would occur.
.it cc
calls the
two passes of the compiler and generates a file called single.s, which only contains the
code relating to the program in single.c.  When 
.it as
is called, it will generate a file called single.o which will contain a symbol table
with three undeclared symbols (_getchar, _putchar and _exit - the underscore is prefixed
automatically to the name).
The loader is then called by
.it cc
and the loader will look in the archive files for subfiles containing
definitions of _getchar, _putchar and _exit.  When the subfile is found, it is loaded after the
user program and this in turn will add a few more undeclared symbols to the symbol table.
The search through the archive files is performed serially, so it is important
that the archive subfiles are in the correct sequence.
The symbol table in the a.out and .o files may be inspected by the use of 
.it nm
(I).
.PG
The sequence of operations which take place on a C program means that program
segments may be split over several
source files and the '.o' files may be retained to regenerate a whole program.
For instance, the command
.br
.ne 5
.sp
	cc f1.c  f2.c
.br
will generate three files - f1.o, f2.o and a.out.
If the program is to be recompiled with only f2.c altered, the command
.sp
	cc f1.o f2.c
.sp
may be used. This will recompile f2.c; generating a new version of f2.o, which is loaded in the
file a.out by the loader with the unchanged version of f1.o.
.PG
It is not possible to input a '.s' file into 
.it cc.
It is illegal to say, for example
.sp 
	cc f.s
.sp
The only way round this problem is to compile the '.s' file using
.it as
and then pass the resultant '.o' file into the loader (perhaps using cc).
.SB "Extension to cc"
.PG
The #define and #include features are performed by
.it cc
and not by the C compiler itself.
.it cc
will
perform conditional compilation depending on the definition of a variable.
The following commands are used to control conditional compilation:-
.sp
.in+20
.ti-20
#ifdef <definevar>	compile code which follows if the variable 'definevar' is defined
by an appropriate #define statement.
.ti-20
#ifndef <definevar>	compile code which follows if the variable 'definevar' is not defined.
.ti-20
#endif		end of conditional.
.sp
.in-20
The # character must be the first character in the line, and the file must
start with a single # on a line. The conditions may be nested.
.PG
This feature may be used to put debugging statements into the code, e.g:-
.sp
	#define DEBUG 01
	.
	.
	.
	.
	#ifdef DEBUG
		printf("Debugging s = %o\\n", s);
	#endif
	.
	.
	.
.br
which would result in the printf statement being compiled.
If the #define statement was not present, the printf statement would not be
compiled.
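.PG
A complete routine using the feature might look like the sketch below. It is written in
present-day C; the routine 'sum_to' and the counter 'debug_lines' are invented for the
illustration, and 'puts' is used for the debugging output simply to keep the example short.
.sp
```c
	#include <stdio.h>

	#define DEBUG 01

	int debug_lines = 0;	/* counts the debugging lines printed */

	/* Sum of the integers 1..n, with optional debugging output. */
	int sum_to(int n)
	{
		int i, s = 0;

		for (i = 1; i <= n; i++) {
			s += i;
	#ifdef DEBUG
			debug_lines++;
			puts("Debugging: partial sum updated");
	#endif
		}
		return s;
	}
```
.PG
Removing the #define line leaves sum_to returning the same result, but the two statements between #ifdef
and #endif are no longer compiled.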
.SB "Extension to the C compiler"
.PG
A double length integer may be defined as type 'long'. For example:
.br
	long a,b;
.br
would result in space being allocated for two double length integers.  This feature
has not been extensively tested, but it is safe to assume that the compiler will accept 'long'
in any position where it currently accepts 'int'.  However, although long addition and
subtraction do work; multiplication, division and the remainder operation on long
integers should be performed explicitly by calls to 
.it hmul
(III) and
.it ldiv
(III). When it encounters long multiplication, division and remaindering, the compiler
generates references to routines which are not present in the current system; this
fact will not be signalled as an error until the loading phase of the compilation.
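.PG
The sketch below, in present-day C, shows long addition and subtraction, together with one
possible reading of a 'high order product'. The routine 'high_product' is hypothetical: it is
not the library routine hmul, whose calling sequence is given on its own manual page, and it assumes
unsigned 16-bit operands with a 32-bit product.
.sp
```c
	/* Long addition and subtraction, which the text says do work. */
	long add_long(long a, long b)
	{
		return a + b;
	}

	long sub_long(long a, long b)
	{
		return a - b;
	}

	/* Assumption: the high order product is taken to be the top 16 bits
	   of the 32-bit product of two unsigned 16-bit values. */
	unsigned high_product(unsigned a, unsigned b)
	{
		unsigned long p;

		p = (unsigned long)(a & 0177777) * (unsigned long)(b & 0177777);
		return (unsigned)((p >> 16) & 0177777);
	}
```
.PG
For example, 300 times 300 is 90000, which does not fit in 16 bits; the high order part is 1.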
.SB "Register Allocation"
.PG
C uses the following register allocation:-
.in+20
.ti-10
r0	- used for partial results and returning results from function calls.
.ti-10
r1	- used for partial results.
.ti-10
r2-r4	- allocated to the user by the 'register' variable definition.
.ti-10
r5	- points to base of current local variables for a routine.
.ti-10
sp	- points to current top of stack.
.in-20
.PG
Routine calls are always:
.sp
	jsr	pc,routinename
.sp
and the first instruction in the routine is:-
.sp
	jsr	r5,csv
.sp
which adjusts sp and r5 and saves r2, r3 and r4.
.br
The instruction used to return from a routine is
.sp
	jmp	cret
.sp
which resets the stack and the registers; and performs the appropriate 'rts'
instruction.
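.PG
A routine asks for register variables as in the sketch below, which is written in present-day C
with an invented routine name. Whether the two variables actually occupy registers is up to the compiler;
a present-day compiler is free to ignore the request.
.sp
```c
	/* Counts the characters in s.  Both locals are requested as
	   register variables. */
	int count_chars(const char *s)
	{
		register const char *p = s;
		register int n = 0;

		while (*p++)
			n++;
		return n;
	}
```
.PG
With only r2-r4 available, no more than three such variables in a routine can be given registers under the
allocation described above.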
.SB "Loading C programs"
.PG
Unix programs are compiled and stored on the a.out file in three
segments:
the 'text' segment, the 'data' segment and the 'bss' segment.  The
text segment is the (possibly pure) code of the program; the data segment
contains initialised variables and strings; and the bss segment contains
the uninitialised data areas.
.PG
Unix loads the file into store, again making three segments: the text segment (possibly
write protected) which starts at virtual store location 0; the data segment, which
first contains the 'data' from the file followed by the bss segment; and a stack
which starts at location 0177777 and grows downwards through the store.  When a program is
loaded, it is always loaded into zeroised store and so all variables from the
bss segment are initially set to zero.
.PG
There is thus a difference between global arrays and variables (which are located in
the bss segment) and local variables and arrays (located on the stack).
If the following program is considered:
.br
.ne 11
.sp
	int f[55];
	.
	.
	main()
	{	int g[55];
	.
	.
	.
	}

The array f will be located in the bss part of the data segment and will be initialised to zero, but the
array g will be located on the stack and its contents are undefined.
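.PG
The practical consequence is sketched below in present-day C, with invented routine names:
a global array may be relied upon to start as zeros, whereas a local array must be set
explicitly before it is read.
.sp
```c
	int f[55];	/* global: starts out as all zeros */

	/* Returns 1 if every element of f is zero. */
	int f_is_zero(void)
	{
		int i;

		for (i = 0; i < 55; i++)
			if (f[i] != 0)
				return 0;
		return 1;
	}

	/* A local array is set explicitly before it is read. */
	int local_sum(void)
	{
		int g[55];
		int i, s = 0;

		for (i = 0; i < 55; i++)
			g[i] = i;	/* explicit initialisation */
		for (i = 0; i < 55; i++)
			s += g[i];
		return s;
	}
```
.PG
The sketch never reads g before the first loop has filled it, since its contents at that point are undefined.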
.SH 4 "Unix Commands for C"
.PG
This section contains reprints from section I of the "Unix Programmers Manual"
giving the commands relating to C compilations. The commands given are:
.sp
.in+10
.ti-10
ar	The archive program.
.ti-10
as	The Unix assembler.
.ti-10
cc	Compile a C program.
.ti-10
ld	The Unix link-loader.
.sp
.in-10
.SH 5 "Debugging C programs"
.PG
This section contains the manual pages for two programs supplied with the
system:
.it cdb
(I), the C debugging program; and 
.it db
(I), the general purpose debugger.
The section also contains the manual for ctracer, which can be compiled with
your program by the use of a special version of
.it cc
and provides an interactive method of monitoring and inspecting the running program.
.SB "Notes on cdb"
.PG
.it cdb
allows the user to inspect and run a C program under breakpoint control; it
is also used to investigate the reasons for core dumps produced intentionally or
unintentionally.
The following points should be borne in mind:
.br
1)	Exit from 
.it cdb
by typing Control-D.
.br
2)	In most circumstances it is necessary to have a core dump; obtain one
by entering your program and typing Control-shift-\\.
.br
3)	
.it cdb
allows the user to access local variable names by typing a routine
name, a colon and a variable name, e.g:
.br
	main:argc
.br
will print the contents of argc (assuming it has been defined). Global
variables may be accessed by simply typing the global name.  With local names, there are
some restrictions on their access. First, if the variable is defined as a 'register' variable,
its contents cannot be printed in this way (type $r to get the register contents).
Secondly, remember that C uses a stack for execution and a local variable is not present
unless that routine is actually being executed.
.SB "Ctracer"
.PG
To compile ctracer with your program, use 'cct' to compile the program. This is a special
version of the compiler with modifications designed to insert subroutine calls to routines at certain
places in the program.  The compiler is also the floating point version.
.PG
Ctracer works fine unless your program takes its input from
the standard input channel and the standard input is from a file. In this case
ctracer reads a lot of the file looking for its own particular variety of commands and generally
the whole thing collapses in a heap.
.PG
Ctracer originated in Amsterdam (which accounts for some of the English found
in the manual) and is not totally foolproof. However, in most cases it
will tell the user what is going on in the program.
.SH 6 "Miscellaneous System routines"
.PG
The following routines are included in this section:-
.in+10
.ti-10
abort	Used to generate a core dump.
.ti-10
ctime	Convert time and date to a printable string.
.ti-10
hmul	Return high order product.
.ti-10
ldiv	Long division routines.
.ti-10
perror	Print system error messages - see Section 7 of this manual for a numeric list of
error messages.
.ti-10
reset	Perform a 'long jump'.
.ti -10
time	Get the system's idea of the time.
.in-10
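.PG
As an illustration of two of these routines working together, the sketch below obtains the time and
converts it to a printable string. It assumes the present-day C library forms of 'time' and 'ctime',
which may differ from the calling sequences on the reprinted manual pages; the routine 'time_stamp' and its
buffer are invented for the example.
.sp
```c
	#include <string.h>
	#include <time.h>

	static char stamp[32];	/* holds the most recent result */

	/* Returns the current time and date as a printable string,
	   without the newline which ctime puts on the end. */
	char *time_stamp(void)
	{
		time_t now = time((time_t *)0);
		char *s = ctime(&now);
		size_t n;

		stamp[0] = 0;
		if (s == 0)
			return stamp;	/* conversion failed */
		strncpy(stamp, s, sizeof(stamp) - 1);
		stamp[sizeof(stamp) - 1] = 0;
		n = strlen(stamp);
		if (n > 0)
			stamp[n - 1] = 0;	/* drop the newline */
		return stamp;
	}
```
.PG
The string is copied because ctime returns a pointer to storage which later calls may overwrite.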
.SH 7 "System Error Numbers"
.PG
This section contains a numeric list of the system error numbers and their meaning.
In C, the error number from a system call is passed back to the user in a variable
called 'errno' which must be defined as external.
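.PG
A sketch of the usual pattern follows. It assumes a present-day UNIX system, where the header
file errno.h supplies the external declaration of 'errno' and symbolic names for the error numbers;
the routine 'open_error' is invented for the illustration.
.sp
```c
	#include <errno.h>
	#include <fcntl.h>
	#include <unistd.h>

	/* Tries to open path for reading; returns 0 on success or the
	   error number left in errno on failure. */
	int open_error(const char *path)
	{
		int fd;

		errno = 0;
		fd = open(path, O_RDONLY);
		if (fd < 0)
			return errno;
		close(fd);
		return 0;
	}
```
.PG
The value of errno is only meaningful immediately after a system call has reported failure, so the sketch
copies it at once.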
