Pascal, Part II.

This lecture will be about the part of Pascal that is most "Pascalish",
i.e. those aspects of the language that differentiate it from such
languages as Fortran and Algol.

RECORDS

     type gender=(male,female,other);
          persrec=record
               name:packed array[1..10]of char;
               ssn:integer;
               sex:gender
               end;
     var pers,pers2:persrec;

     begin
     pers.name := 'Hedrick   ';
     pers.ssn := 123456;
     pers.sex := male;
     writeln(pers.name);
     pers2 := pers;
     ...

TYPE section contains definitions of user-declared types. Once defined,
used like INTEGER, REAL, etc. The following are equivalent:

     type bigarray=array[1..100]of integer;
     var x:bigarray;

and:

     var x:array[1..100]of integer;

GENDER is an "enumerated type". If you declare a variable to be of type
GENDER, it can take on the values MALE, FEMALE, or OTHER, and only
those. MALE, FEMALE, and OTHER are added to the language as constants.

A RECORD contains a list of "fields". Since PERS is a PERSREC, it
consists of three fields:

     PERS:            [DEC-20 implementation]
            ---------------------
     NAME:  :   :   :   :   :   :    5 chars to a word on DEC-20
            ---------------------
            :   :   :   :   :   :
            ---------------------
     SSN:   :                   :
            ---------------------
     SEX:   :                   :
            ---------------------

You can treat PERS as a single object, or you can look at these
individual fields. If you say

     PERS2 := PERS

you are treating it as a single object. The whole record, i.e. all of
its fields, is copied.

To look at an individual field, say PERS.field, e.g. PERS.NAME.
PERS.field is treated as a simple variable with the declaration given in
the RECORD declaration. Thus PERS.SSN is an integer variable. You can do
arithmetic with it or anything else:

     PERS.SSN := PERS.SSN + 1

When a variable is declared in the VAR section, it is just a fixed piece
of memory, with space for all of its fields. If you need to create
records dynamically, Pascal puts them into the "heap". This is a part of
memory that expands automatically. NEW allocates more space. In this
case, all that gets allocated is a "pointer" to the variable.
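
The record operations described so far (assigning to fields, copying a
whole record, and doing arithmetic on a field) can be collected into one
complete program. This is only a sketch: the program name RECDEMO is made
up for illustration, and the width used to print the integers depends on
the implementation.

```pascal
program recdemo(output);
{ Sketch only: RECDEMO is a hypothetical name, not part of the lecture. }
type gender = (male, female, other);
     persrec = record
                 name: packed array[1..10] of char;
                 ssn: integer;
                 sex: gender
               end;
var pers, pers2: persrec;
begin
  pers.name := 'Hedrick   ';      { string must be exactly 10 characters }
  pers.ssn := 123456;
  pers.sex := male;

  pers2 := pers;                  { whole-record copy: all three fields }
  pers2.ssn := pers2.ssn + 1;     { changes the copy, not the original }

  writeln(pers.name, pers.ssn);   { original record, unchanged }
  writeln(pers2.name, pers2.ssn); { copy, with SSN one larger }
  if pers2.sex = male then
    writeln('the SEX field was copied too')
end.
```

Because PERS2 := PERS copies the fields themselves, later changes to
PERS2 do not affect PERS.
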
Pointers are indicated by ^

     {This program fragment reads names, one per line of input.
      It constructs a list of records with these names}

     type perslist = record
               name:packed array[1..10]of char;
               next: ^perslist
               end;
     var listhead,newone: ^perslist;

     begin
     listhead := nil;
     while not eof do
       begin
       new(newone);
       read(newone^.name); readln;
       newone^.next := listhead;
       listhead := newone
       end
     end.

In a type declaration, ^ BEFORE a type name means we have a pointer to a
record of that type.

     RECORD
       ...
       NEXT: ^ FOO     <------ address of a FOO is put here
       ...

     RECORD
       ...
       NEXT:FOO        <----- an actual FOO record is put here
       ...

Alternatively, you can say

     FOOPTR = ^ FOO;
     ...
     RECORD
       ...
       NEXT: FOOPTR
       ...

In the body of the program, ^ AFTER a variable means to follow the
pointer. Consider

     LISTHEAD^.NEXT

LISTHEAD is NOT a record. It is a pointer to a record. So LISTHEAD.NEXT
would be illegal. LISTHEAD^ is a record, the record pointed to by
LISTHEAD. Since it is a record, it has fields. Thus LISTHEAD^.NEXT is
one of its fields.

NB: Some versions of the Pascal standard do not have any way to recover
space once it is allocated. The proposed new standard has
DISPOSE(LISTHEAD). This is implemented on the DEC-20. Note that DISPOSE
returns the record pointed to by LISTHEAD. If that record has pointers
in it, the records pointed to by them are not returned. You should
return them first if you want to. The rule is: Each call to DISPOSE
returns exactly one record.

NIL is a pointer constant. It is compatible with any pointer type. It is
a pointer that points nowhere. If you use LISTHEAD^ when LISTHEAD
contains NIL, you will get an error.

FILES

     program test(infile,output);
     type binfile=file of integer;
     var infile:binfile

This is equivalent to

     program test(infile,output);
     var infile:file of integer

A file is just a variable. It must be declared, like any other.

FILE OF CHAR is a normal readable file, i.e. a file with characters in
it. Each time you do an input, you get one character.
FILE OF INTEGER is a binary file. Each time you do an input, you get a
complete integer, in internal format.

You can have FILE OF anything, although some implementations may not
allow FILE OF FILE, and pointers in files are a bit odd. All except CHAR
are binary. Pascal just dumps the internal code for the object, using
however many words it takes in memory.

When you declare a file, you get a "buffer variable", name^

     var infile:file of integer

INFILE^ is the buffer variable for INFILE. It is of the same type as the
base type of the file, in this case INTEGER. The buffer variable acts as
a "window" into the file. It contains the current element of the file.

     GET(INFILE) reads the next element of the file, putting it into
          the buffer variable.
     PUT(INFILE) writes the current contents of the buffer variable.

Here is a program to copy a binary file:

     program bincopy(infile,outfile);
     var infile,outfile:file of integer;
     begin
     reset(infile);      {reset opens a file for read}
     rewrite(outfile);   {rewrite opens a file for write}
     while not eof(infile) do   {EOF is true at end of file}
       begin
       outfile^ := infile^;
       put(outfile);
       get(infile)
       end
     end.   {all files are automatically closed at the end}

Note - listing INFILE and OUTFILE in the PROGRAM statement tells the
system to get file names from the outside:

     JCL in IBM
     prompt for a file name in DEC-20
     logical file names for VAX
     UCSD is non-standard

   - You must open all files except INPUT and OUTPUT with RESET or
     REWRITE. (INPUT and OUTPUT are opened automatically if listed in
     the PROGRAM statement.)
   - You must declare all files except INPUT and OUTPUT. (INPUT and
     OUTPUT are predeclared as FILE OF CHAR.)
   - RESET reads the first element.
   - EOF is set when a GET fails, i.e. when you try to do a read beyond
     the last one.

Text files:

The following are predeclared:

     type text=file of char;
     var input,output:text

The program BINCOPY shown above will work on text files, if you change
the file declarations to FILE OF CHAR.
That is, the primitives for text files are still GET and PUT. But in
addition, READ and WRITE are defined, and most users will use them.
Instead of dealing with single characters, they deal with larger
objects. E.g. to read 1.23E3 with GET you would see a 1, a ., a 2, a 3,
an E and a 3. But if you said READ(X), this would automatically call GET
for you 6 times and put the characters together to form the number
1230.0.

To show the relationship between READ and GET, I will show how to write
READ in terms of GET, at least for reading integers:

READ(I), where I is an integer:

     while (input^ = ' ') or eoln(input) do
       get(input);
     i := 0;
     while (input^ in ['0' .. '9']) do
       begin
       i := ord(input^) - ord('0') + 10*i;
       get(input)
       end

   - skip spaces and end of lines
   - decode digits until you see a non-digit
   - NB: input^ "looks ahead" by one character. That is, after reading
     123, I is 123, but INPUT^ contains the first character after the
     123. This is because you can't tell that you are at the end of a
     number until you see a non-digit. So INPUT^ is left at this
     non-digit.

READ(CH), where CH is a CHAR:

     CH := INPUT^;
     GET(INPUT)

   - This is so READ(CH) done right after READ(I) will get the first
     character after the integer. This character is already in INPUT^.
     So you first use the character, and then do a GET.

READ can read
   - integers (with sign)
   - reals. In the Pascal standard, if you do READ(X) and X is a real,
     the thing you read must have the syntax of a real. That is, if the
     program wants a real, you can't type 123, you must type 123.0
   - CHAR (a single character)
   - in DEC-20 and VAX, PACKED ARRAY OF CHAR. In UCSD, STRING.

End of line is funny. Some systems don't have EOL characters. So at end
of line, INPUT^ contains a blank. That is, when you type carriage
return, what the program sees is a blank. In order to know that it was a
carriage return and not a real blank, Pascal sets a special flag,
EOLN(INPUT).
So to copy a text file:

     while not eof do
       if eoln
         then begin
           writeln;
           readln
           end
         else begin
           output^ := input^;
           put(output);
           get(input)
           end

   - You can't do OUTPUT^ := INPUT^ at the end of line, since that would
     turn all end of lines into blanks.
   - WRITELN writes an end of line. Conceptually it is like
     WRITE(carriage-return). But since some systems don't use
     carriage-return, a special function is needed.
   - READLN reads past an end of line. It puts the first character of
     the next line into INPUT^:
          skip the rest of the characters on the current line
          skip the end of line
          get the 1st char of the next line into INPUT^

READLN(X,Y) is like READ(X,Y); READLN;

Note that all of these functions take optional arguments to indicate
what file they apply to. If you leave the argument out, the default is
INPUT for input functions and OUTPUT for output functions.

     READ(X)     =  READ(INPUT,X)
     READLN(X)   =  READLN(INPUT,X)
     READLN      =  READLN(INPUT)
     EOF         =  EOF(INPUT)
     EOLN        =  EOLN(INPUT)
     WRITE(X)    =  WRITE(OUTPUT,X)
     WRITELN(X)  =  WRITELN(OUTPUT,X)
     WRITELN     =  WRITELN(OUTPUT)

WRITE can write
   - integers
   - reals
   - char
   - packed array of char
   - Boolean. Writes as TRUE or FALSE

You can choose the format for WRITE, by using a : after the expression

     WRITE(I:5,X:10,Z:I)

specifies to use 5 columns for I, 10 columns for X, and I columns for Z.

Reals are normally written in E format, e.g. 1.2345E01

To get F format, use another colon:

     WRITE(X:10:2)

specifies F format, in 10 columns, with 2 digits after the decimal pt.,
e.g. 1.23 with 6 leading blanks. It is like format F10.2 in Fortran.

INTERACTIVE I/O

The RESET problem:

Consider:

     {wrong}
     RESET(FOO);
     WRITE('Please type a number: ');
     READ(FOO,I);

This doesn't do what you expect. RESET is defined as reading the first
character, and READ(FOO,I) uses the first character from FOO^ before
reading the second character. This is the one-character lookahead
problem. Thus the program will try to read the number before printing
the prompt.
Solutions:

Tell the user to hit carriage-return at the start of the program, and
wait for the prompt before typing the real data. This works when you are
reading numbers, since the program skips end of lines and blanks. Thus
the extra carriage-return doesn't hurt.

More generally, have the program do a READLN to throw away the extra
carriage-return:

     RESET(FOO);
     WRITE('Please type a number: ');
     READLN(FOO);
     READ(FOO,I);

You must do the READLN after the WRITE, as it reads the first character
on the next line.

If you are on the DEC-20 or any CDC system, declare the file as
interactive. This makes Pascal supply the extra carriage return for you.
For the file INPUT, which is opened automatically, put / after it in the
PROGRAM statement (CDC) or :/ (DEC-20)

     PROGRAM FOO(INPUT:/,OUTPUT)   - DEC-20
     PROGRAM FOO(INPUT/,OUTPUT)    - CDC

Other files are opened via RESET. I think on CDC, / still works. On
DEC-20, specify /I in the RESET:

     RESET(FOO,'','/I')

Some systems (including VAX) use "lazy I/O". This delays reading
characters until they are actually used. In this case, the original
version will work fine.

The READLN problem:

Often you want to throw away junk. READLN is good for this. It throws
away anything left on the current line and goes to the next. But READLN
reads the first character of the next line. Thus you must do it after
the prompt.

     WRITE('Please type a number: ');
     READLN;
     READ(I)

If you used

     {wrong}
     WRITE('Please type a number: ');
     READLN(I);

this would be equivalent to

     {wrong}
     WRITE('Please type a number: ');
     READ(I);
     READLN;

This would do the READLN at the wrong time.

Note that the solution to both problems is the same. The correct
sequence to use is

     write prompt
     readln
     read(x)

The sequence

     {wrong}
     write prompt
     readln(x)

will result in the program waiting for input before printing the prompt,
unless your implementation uses "lazy I/O", in which case the wrong way
is right.
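
For reference, here is the correct sequence written out as a whole
program. This is a sketch under stated assumptions: the program name
ASKINT is made up, and the program assumes an implementation without
lazy I/O, in which the line being thrown away by READLN is the initial
carriage-return (typed by the user, or supplied by an interactive
file). On a lazy I/O system the READLN would instead wait for the user
to type a line and discard it.

```pascal
program askint(input, output);
{ Sketch only: ASKINT is a hypothetical name.
  Assumes one-character lookahead (no lazy I/O). }
var i: integer;
begin
  write('Please type a number: ');   { 1. write the prompt }
  readln;                            { 2. throw away the rest of the old line }
  read(i);                           { 3. read the number from the new line }
  writeln('You typed ', i:1)
end.
```

The READ leaves INPUT^ at the first character after the number, so a
later prompt in the same program can use the same three steps.
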
Each implementation has a slightly different solution for interactive
I/O:

DEC-20 and CDC
   - put the READLN in as explained above
   - make the file interactive, to prevent the implicit GET after RESET.
     You must specifically request this.

UCSD
   - make the file interactive. This changes the definition of READ and
     READLN, so they no longer do one-character lookahead. In this case,
     the method shown above as wrong is right:
          write prompt; readln(x)
   - INPUT and OUTPUT are interactive by default
   - after reading an item, the contents of INPUT^ are different than in
     standard Pascal, because there is no one-character lookahead.

NBS (PDP-11), CMU (PDP-10), VAX
   - lazy I/O - GET doesn't do anything until someone actually looks at
     INPUT^. The read is done then. This allows you to use
          write prompt; readln(x)
     But it doesn't always work, e.g. if you pass a file buffer variable
     as a parameter to a procedure and do GET on the file.

All known textbooks ignore this. They teach readln(x,y) as reading a
line with x and y on it, implying (and in some texts saying) that

     write prompt; readln(x,y)

will work. The major motivation of UCSD's change to the semantics of
GET, and lazy I/O, is to make Pascal work with these incorrectly written
textbooks. With these implementations standard textbooks can be used as
long as the student does not think clearly about what is going on. If he
does, he will wonder how this sequence can possibly work.

MORE DATA TYPES

Subrange types

     type smallint=0..255;
          lightcolor= pink..lavender;
     var i:smallint

These allow the system
   - to save space by using only enough bits for the subrange
   - to put in checking code to verify that you don't produce something
     outside the range

Packed records

     packed record
       a:0..255;
       b:0..3;
       c:^form
       end

This will all be put in one PDP-10 word. If PACKED were not used, each
item would be in a separate word. This is a time-space tradeoff. Putting
more than one thing in a word saves space, but slows down access.
Packed records can also be used for tricks in preparing magic control
blocks for operating system calls.

Records and arrays can be packed.

Variant records

     record
       name:packed array[1..10]of char;
       case sex:sextype of
         male:(battingave:real; beer: beerbrand);
         female:(bowlingave:color; age:0..21)
       end

All records of this type have a NAME and SEX field. Depending upon the
value of SEX, they have

     NAME, SEX, BATTINGAVE, BEER
or
     NAME, SEX, BOWLINGAVE, AGE

These fields are stored in the same place. The layout for each case is:

              male                          female
     ======================      =========================
     :       name         :      :         name          :
     ----------------------      -------------------------
     :                    :      :                       :
     ======================      =========================
     :       sex          :      :         sex           :
     ======================      =========================
     :    battingave      :      :      bowlingave       :
     ======================      =========================
     :       beer         :      :         age           :
     ======================      =========================

This allows you to save space when you know that certain fields will
never be needed at the same time.

You can also declare a variant record without a place to store the key:

     record
       name:packed array[1..10]of char;
       case sextype of
         male:(battingave:real; beer: beerbrand);
         female:(bowlingave:color; age:0..21)
       end

Then you can't tell by looking which type of record you have. This can
be useful for tricks in converting data types:

     var x:packed record
           case Boolean of
             true: (r:real);
             false: (i:integer)
           end;

     begin
     x.r := 1.0;
     writeln(x.i)

This will write the real number 1.0 as if it were an integer. It might
be useful for seeing what the representation of real numbers is on your
system.

Sets

     type cset=set of char;
     var s1,s2:cset; ch:char;
     ...
     s1 := ['A'];
     s2 := ['B'];
     s1 := s1 + s2;              {s1 is now ['A','B']}
     s1 := s1 * ['B'..'Z'];      {'B'..'Z' is the set of B through Z.
                                  * is intersection. s1 is now ['B']}
     s2 := ['A'..'C','P'..'Q'];  {A,B,C,P,Q}
     s1 := ['A'..ch]

operations:

     +        union
     *        intersection
     -        difference
     = <>     equality
     <= >=    inclusion
     IN       membership      IF 'A' IN S1 THEN ...
   - You can have sets of any finite type: subranges, enumerated types,
     or CHAR.
          SET OF 0..35
          SET OF COLOR
   - Each implementation has a maximum set size. 72 on DEC-20. Since
     there are 128 ASCII characters, SET OF CHAR is kludged on the
     DEC-20.
   - Sets are supposed to be implemented fairly efficiently, as bit
     vectors, using full-word logical operations.
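
The set operations listed above can be tried out with a small complete
program. This is only a sketch: the names SETDEMO, SMALL and SMALLSET
are made up for illustration, and the base type 0..35 is chosen so that
the set fits within the DEC-20 limit mentioned above.

```pascal
program setdemo(output);
{ Sketch only: SETDEMO, SMALL and SMALLSET are hypothetical names. }
type small = 0..35;
     smallset = set of small;
var s1, s2: smallset;
    i: small;
begin
  s1 := [1, 2, 3];
  s2 := [3..6];
  s1 := s1 + s2;        { union:        s1 is now [1..6]  }
  s1 := s1 * [2..4];    { intersection: s1 is now [2,3,4] }
  s1 := s1 - [3];       { difference:   s1 is now [2,4]   }

  { inclusion: true only if every member of s1 is also in s2 }
  if s1 <= s2 then
    writeln('s1 is included in s2')
  else
    writeln('s1 is not included in s2');

  { membership: print each element of the base type that is in s1 }
  for i := 0 to 35 do
    if i in s1 then
      writeln(i:2, ' is a member of s1')
end.
```

Stepping through the base type and testing IN is the usual way to list
the members of a set, since Pascal has no statement that enumerates a
set directly.
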