SECTION 10 ATOM, STRING, ARRAY, AND STORAGE MANIPULATION 10.1 Pnames and Atom Manipulation The term "print name" (of an atom) in LISP 1.5 referred to the characters that were output whenever the atom was printed. Since these characters were stored on the atom's property list under the property PNAME, pname was used interchangeably with "print name". In INTERLISP, all pointers have pnames, although only literal atoms and strings have their pname explicitly stored. The pname of a pointer are those characters that are output when the 1 pointer is printed using prin1, 2 e.g., the pname of the atom ABC%(D consists of the five characters ABC(D. The pname of the list (A B C) consists of the seven characters (A B C) (two of the characters are spaces). Sometimes we will have occasion to refer to the prin2-pname. The prin2-pname are those characters output when the corresponding pointer is printed using prin2. 3 Thus the prin2-pname of the atom ABC%(D is the six characters ABC%(D. ------------------------------------------------------------------------ 1 except that for the purposes of the functions described in this chapter, i.e., unpack, nchars, etc. the prin1-pname of an integer is defined as though radix=10. Note that integers will still be printed by prin1 using the current radix, as described in Section 14. However, we want pack[unpack[X9]] to always be X9 (and not sometimes X11) regardless of the setting of the radix. 2 % is the escape character. See Sections 2 and 14. 3 Note that the prin2-pname also depends on what readtable is being used (see Section 14), since this determines where %'s will be inserted. Note also that the prin2-pname of an integer depends on the setting of radix. 10.1 pack[x] If x is a list of atoms, the value of pack is a single atom whose pname is the concatenation of the pnames of the atoms in x, e.g., pack[(A BC DEF G)]=ABCDEFG. If the pname of the value of pack[x] is the same as that of a number, pack[x] will be that number, e.g., pack[(1 3.4)]=13.4, pack[(1 E -2)]=.01. Although x is usually a list of atoms, it can be a list of arbitrary INTERLISP pointers. The value of pack is still a single atom whose pname is the same as the concatenation of the pnames of all the pointers in x, e.g., pack[((A B)"CD")] = %(A% B%)CD. In other words, mapc[x;prin1] and prin1[pack[x]] 4 always produce the same output. In fact, pack actually operates by calling prin1 to convert the pointers to a stream of characters (without printing) and then makes an atom out of the result. Note: In INTERLISP-10, atoms are restricted to < 127 characters. Attempting to create a larger atom either via pack or by typing one in (or reading from a file) will cause an error, ATOM TOO LONG. unpack[x;flg;rdtbl] The value of unpack is the pname of x as a list 5 of characters (atoms), e.g., unpack[ABC] = (A B C) unpack["ABC(D"] = (A B C %( D) In other words prin1[x] and mapc[unpack[x];prin1] produce the same output. If flg=T, the prin2-pname of x is used, (and computed with respect to rdtbl) e.g., unpack["ABC(D";T]= (%" A B C %( D %"). Note: unpack[x] performs n conses, where n is the number of characters in the pname of x. dunpack[x;scratchlist;flg;rdtbl] ------------------------------------------------------------------------ 4 Except for integers when radix is other than 10, e.g., mapc[(X 9);PRIN1] produces X11 when radix is 8, but pack[(X 11Q)]=X9. (See footnote 1.) 5 There are no special "character-atoms" in INTERLISP, i.e., an atom consisting of a single character is the same as any other atom. 10.2 a destructive version of unpack that does not perform any conses but instead uses scratchlist to make a list equal to unpack[x;flg]. If the p-name is too long to fit in scratchlist, dunpack calls unpack and returns unpack[x;flg]. Gives an error if scratchlist is not a list. 6 nchars[x;flg;rdtbl] number of characters in pname of x. If flg=T, the prin2-pname is used. E.g. nchars["ABC"]=3, nchars["ABC";T]=5. nthchar[x;n;flg;rdtbl] Value is nth character of pname of x. Equivalent to car[nth[unpack[x;flg];n]] but faster and does no conses. n can be negative, in which case counts from end of pname, e.g., -1 refers to the last character, -2 next to last, etc. If n is greater than the number of characters in the pname, or less than minus that number, or 0, the value of nthchar is NIL. packc[x] like pack except x is a list of character 7 codes, e.g., packc[(70 79 79)]=FOO. chcon[x;flg;rdtbl] like unpack, except returns the pname of x as a list of character codes, e.g., chcon[FOO] = (70 79 79). If flg=T, the prin2-pname is used. chcon1[x] returns character code of first character of pname of x, e.g., chcon1[FOO] = 70. Thus chcon[x] could be written as mapcar[unpack[x];chcon1]. dchcon[x;scratchlist;flg;rdtbl] similar to dunpack character[n] n is a character code. Value is the atom having the corresponding single character as its ------------------------------------------------------------------------ 6 Both nthchar and nchars work much faster on objects that actually have an internal representation of their pname, i.e., literal atoms and strings, than they do on numbers and lists, as they do not have to simulate printing. 7 INTERLISP-10 uses ASCII code. 10.3 8 pname, e.g., character[70] = F. Thus, unpack[x] could be written as mapcar[chcon[x];character]. fcharacter[n] fast version of character that compiles open. gensym[char] Generates a new atom of the form xnnnn, where x=char (or A if char is NIL) in which each of the n's is a digit. Thus, the first one generated is A0001, the second A0002, etc. gensym provides a way of generating new atoms for various uses within the system. The value of gennum, initially 10000, determines the next gensym, e.g., if gennum is set to 10023, gensym[]=A0024. The term gensym is used to indicate an atom that was produced by the function gensym. Atoms generated by gensym are the same as any other literal atoms: they have property lists, and can be given function definitions. Note that the atoms are not guaranteed to be new. For example, if the user has previously created A0012, either by typing it in, or via pack or gensym itself, when gennum gets to 10011, the next value returned by gensym will be the A0012 already in existence. mapatoms[fn] Applies fn to every literal atom in the system, e.g., mapatoms[(LAMBDA(X)(AND(SUBRP X)(PRINT X)))] will print every subr. Value of mapatoms is NIL. 10.2 String Functions stringp[x] Is x if x a string, NIL otherwise. Note: if x is a string, nlistp[x] is T, but atom[x] is NIL. strequal[x;y] Is x if x and y are both strings and equal, i.e., print the same, otherwise NIL.. Equal uses strequal. Note that strings may be equal without being eq. mkstring[x] Value is string corresponding to prin1 of x. ------------------------------------------------------------------------ 8 See footnote 2. 10.4 rstring[] Reads a string - see Section 14. substring[x;n;m] Value is the substring of x consisting of the nth through mth characters of x. If m is NIL, the substring is the nth character of x thru the end of x. n and m can be negative numbers, as with nthchar. Returns NIL if the substring is not well defined, e.g., n or m > nchars[x] or < minus[nchars[x]] or n corresponds to a character in x to the right of the character indicated by m. If x is not a string, equivalent to substring[mkstring[x];n;m], except substring does not have to actually make the string if x 9 is a literal atom. For example, substring[(A B C);4;6]="B C". gnc[x] get next character of string x. Returns the next character of the string, (as an atom), and removes the character from the string. Returns NIL if x is the null string. If x isn't a string, a string is made. Used for sequential access to characters of a string. Note that if x is a substring of y, gnc[x] does not remove the character from y, i.e., gnc doesn't physically change the string of characters, just the pointer and the byte 10 count. glc[x] gets last character of string x. Above remarks about gnc also supply to glc. concat[x1;x2;...;xn] lambda nospread function. Concatenates (copies of) any number of strings. The arguments are transformed to strings if they aren't strings. Value is the new string, e.g., concat["ABC";DEF;"GHI"] = "ABCDEFGHI". The value of concat[] is the null string, "". rplstring[x;n;y] Replace characters of string x beginning at ------------------------------------------------------------------------ 9 See string storage section that follows. 10 See string storage section that follows. 10.5 character n with string y. n may be positive or negative. x and y are converted to strings if they aren't already. Characters are smashed into (converted) x. Returns new x. Error if there is not enough room in x for y, i.e., the 11 new string would be longer than the original. Note that if x is a substring of z, z will also be modified by the action of rplstring. mkatom[x] Creates an atom whose pname is the same as that of the string x or if x isn't a string, the same as that of mkstring[x], e.g., mkatom[(A B C)] is the atom %(A% B% C%). In INTERLISP-10, if the atom would have > 126 characters, causes an error, ATOM TOO LONG. Searching Strings strpos is a function for searching one string looking for another. Roughly it corresponds to member, except that it returns a character position number instead of a tail. This number can then be given to substring or utilized in other calls to strpos. strpos[x;y;start;skip;anchor;tail] x and y are both strings (or else they are converted automatically). Searches y beginning at character number start, (or else 1 if start is NIL) and looks for a sequence of characters equal to x. If a match is found, the corresponding character position is returned, otherwise NIL, e.g., strpos["ABC","XYZABCDEF"]=4 strpos["ABC","XYZABCDEF";5]=NIL strpos["ABC","XYZABCDEFABC";5]=10 skip can be used to specify a character in x that matches any character in y, e.g., strpos["A&C&";"XYZABCDEF";NIL;&]=4 If anchor is T, strpos compares x with the characters beginning at position start, or 1. If that comparison fails, strpos returns NIL without searching any further down y. Thus it can be used to compare one string with some portion of another string, e.g., strpos["ABC";"XYZABCDEF";NIL;NIL;T]=NIL strpos["ABC";"XYZABCDEF";4;NIL;T]=4 ------------------------------------------------------------------------ 11 If y was not a string, x will already have been partially modified since rplstring does not know whether y will "fit" without actually attempting the transfer. 10.6 Finally, if tail is T, the value returned by strpos if successful is not the starting position of the sequence of characters corresponding to x, but the position of the first character after that, i.e., starting point plus nchars[x] e.g., strpos["ABC";"XYZABCDEFABC";NIL;NIL;NIL;T]=7. Note that strpos["A";"A";NIL;NIL;NIL;T]=2, even though "A" has only one character. Example Problem Given the strings x, y, and z, write a function foo that will make a string corresponding to that portion of x between y and z, e.g., foo["NOW IS THE TIME FOR ALL GOOD MEN";"IS";"FOR"] is " THE TIME ". Solution: (FOO [LAMBDA (X Y Z) (AND (SETQ Y (STRPOS Y X NIL NIL NIL T)) (SETQ Z (STRPOS Z X Y)) (SUBSTRING X Y (SUB1 Z]) strposl[a;str;start;neg]str is a string (or else it is converted automatically to a string), a is a list of 12 characters or character codes. strposl searches str beginning at character number start (or else 1 if start=NIL) for one of the characters in a. If one is found, strposl returns as its value the corresponding character position, otherwise NIL. E.g., strposl[(A B C);"XYZBCD"]=4. If neg=T, strposl searches for a character not on a, e.g., strposl[(A B C); "ABCDEF";NIL;T]=4. If a is an array, it is treated as a bit table. The bits of (ELT A 1) correspond to character codes 0 to 43Q, of (ELT A 2) to codes 44Q to 107Q, etc. Thus an array whose first element was 17Q would be equivalent to a list (40Q 41Q 42Q 43Q) or (% ! %" #). ------------------------------------------------------------------------ 12 If any element of a is a number, it is assumed to be a character code. Otherwise, it is converted to a character code via chcon1. Therefore, it is more efficient to call strposl with a a list of character codes. 10.7 If a is not a bit table (array), strposl first converts it to a bit table using makebittable described below. If strposl is to be called frequently with the same list of characters, a considerable savings can be achieved by converting the list to a bit table once, and then passing the bit table to strposl as its first argument. makebittable[l;neg;a] makes a bit table suitable for use by strposl. l and neg are as for strposl. If a is not a suitable array, makebittable will create an array and return that as its value. Otherwise it uses (and changes) a. Note: if neg=T, strposl must call makebittable whether a is a list or an array. To obtain bit table efficiency with neg=T, makebittable should be called with neg=T, to construct the "inverted" table, and the resulting table (array) should be given to strposl with neg=NIL. String Storage A string is stored in 2 parts; the characters of the string, and a pointer to the characters. The pointer, or "string pointer", indicates the byte at which the string begins and the length of the string. It occupies one word of storage. In INTERLISP-10, the characters of the string are stored five characters to a word in a portion of the address space devoted exclusively to storing characters. Since the internal pname of literal atoms also consists of a pointer to the beginning of a string of characters and a byte count, conversion between literal atoms and strings does not require any additional storage for the characters of the pname, although one cell is required 13 for the string pointer. When the conversion is done internally, e.g., as in substring, strpos, or strposl, no additional storage is required for using literal atoms instead of strings. The use of storage by the basic string functions is given below: mkstring[x] x string no space x literal atom new pointer other new characters and pointer substring[x;n;m] x string new pointer x literal atom new pointer other new characters and pointer ------------------------------------------------------------------------ 13 Except when the string is to be smashed by rplstring. In this case, its characters must be copied to avoid smashing the pname of the atom. rplstring automatically performs this operation. 10.8 gnc[x] and glc[x] x string no space, pointer is modified other like mkstring, but doesn't make much sense concat[x1;x2;...xn]args any type new characters for whole new string, one new pointer rplstring[x;n;y] x string no new space unless characters are in pname space (as result of mkstring[atom]) in which case x is quietly copied to string space x other new pointer and characters y any type type of y doesn't matter 10.3 Array Functions Space for arrays and compiled code are both allocated out of a common array space. Arrays of pointers and unboxed numbers may be manipulated by the following functions: array[n;p;v] This function allocates a block of n+2 words, of which the first two are header information. The next p <= n are cells which will contain unboxed numbers, and are initialized to unboxed 0. The last n-p >= 0 cells will contain pointers initialized with v, i.e., both car and cdr are available for storing information, and each initially contain v. If p is NIL, 0 is used (i.e., an array containing all INTERLISP pointers). The value of array is the array, also called an array pointer. If sufficient space is not available for the array, a garbage collection of array space, GC: 1, is initiated. If this is unsuccessful in obtaining sufficient space, an error is generated, ARRAYS FULL. Array-pointers print as #n, where n is the octal representation of the pointer. Note that #n will be read as a literal atom, and not an array pointer. arraysize[a] Returns the size of array a. Generates an error, ARG NOT ARRAY, if a is not an array. arrayp[x] Value is x if x is an array pointer otherwise NIL. No check is made to ensure that x actually addresses the beginning of an array. swparrayp[x] Value is x if x is a swappable array, NIL otherwise. 10.9 14 elt[a;n] Value is nth element of the array a. elt generates an error, ARG NOT ARRAY, if a is not 15 the beginning of an array. If n corresponds to the unboxed region of a, the value of elt is the full 36 bit word, as a boxed integer. If n corresponds to the pointer region of a, the value of elt is the car half of the corresponding element. seta[a;n;v] sets the nth element of the array a. Generates an error, ARG NOT ARRAY, if a is not the beginning of an array. If n corresponds to the unboxed region of a, v must be a number, and is unboxed and stored as a full 36 bit word into the nth element of a. If n corresponds to the pointer region of a, v replaces the car half of the nth element. The value of seta is v. Note that seta and elt are always inverse operations. eltd[a;n] same as elt for unboxed region of a, but returns cdr half of nth element, if n corresponds to the pointer region of a. setd[a;n;v] same as seta for unboxed region of a, but sets cdr half of nth element, if n corresponds to the pointer region of a. The value of setd is v. In other words, eltd and setd are always inverse operations. 10.4 Storage Functions reclaim[n] Initiates a garbage collection of type n. Value of reclaim is number of words available (for that type) after the collection. Garbage collections, whether invoked directly by the user or indirectly by need for storage, do not confine their activity solely to the data ------------------------------------------------------------------------ 14 elt[a;1] is the first element of the array (actually corresponds to the 3rd cell because of the 2 word header). 15 arrayp is true for pointers into the middle of arrays, but elt and seta must be given a pointer to the beginning of an array, i.e., a value of array. 10.10 type for which they were called, but automatically collect some or all of the other types (see Section 3). ntyp[x] Value is type number for the data type of INTERLISP pointer x, e.g., ntyp[(A . B)] is 8, the type number for lists. Thus GC: 8 indicates a garbage collection of list words. type number arrays, compiled code 1 machine code 2 swapped array handles 4 stack pointers 5 list words 8 atoms 12 floating point numbers 16 large integers 18 small integers 20 string pointers 24 pname storage 28 16 string storage 30 typep[x;n] eq[ntyp[x];n] gcgag[message] affects messages printed by garbage collector. If message=T, whenever a garbage collection is begun, GC: is printed, followed by the type number. When the garbage collection is complete, two numbers are printed the number of words collected for that type, and the total number of words available for that type, i.e., allocated but not necessarily currently in use (see minfs below). Example: _RECLAIM(18) GC: 18 511, 3071 FREE WORDS 3071 _RECLAIM(12) GC: 12 1020, 1020 FREE WORDS 1020 ------------------------------------------------------------------------ 16 New user data types (see Section 23) are assigned type numbers beginning with 31. 10.11 If message=NIL, no garbage collection message is printed, either on entering or leaving the garbage collector. If message is a list, car of message is printed (using prin1) when the garbage collection is begun, and cdr is printed when the collection is finished. If message is a literal atom or string, message is printed when the garbage collection is begun, and nothing is printed when the collection finishes. If message is a number, the message is the same as for gcgag[T], except if the total number of free pages left after the collection is less than message, the number of free pages is printed, e.g., _GCGAG(100) T _RECLAIM() GC:8 10369, 10369 FREE WORDS, 87 PAGES LEFT. The initial setting for gcgag is 40. The value of gcgag is its previous setting. minfs[n;typ] Sets the minimum amount of free storage which will be maintained by the garbage collector for data types of type number typ. If, after any garbage collection for that type, fewer than n free words are present, sufficient storage will be added (in 512 word chunks) to raise the level to n. If typ=NIL, 8 is used, i.e., the minfs refers to list words. If n=NIL, minfs returns the current minfs setting for the corresponding type. A minfs setting can also be changed dynamically, even during a garbage collection, by typing control-S followed by a number, followed by a 17 period. If the control-S was typed during a garbage collection, the ------------------------------------------------------------------------ 17 When the control-S is typed, INTERLISP immediately clears and saves the input buffer, rings the bell, and waits for input, which is terminated by any non-number. The input buffer is then restored, and the program continues. If the input was terminated by other than a period, it is ignored. 10.12 number is the new minfs setting for the type being collected, otherwise for type 8, i.e., list words. Note: A garbage collection of a "related" type may also cause more storage to be assigned to that type. See discussion of garbage collector algorithm, Section 3. storage[flg] Prints amount of storage (by type number) used by and assigned to the user, e.g., _STORAGE() TYPE USED ASSIGNED 1 8927 12288 2 5120 5120 4 23 512 8 6037 15360 12 2169 3584 16 0 512 18 173 2048 24 110 2048 28 802 2048 30 312 512 SUM 23673 44032 If flg=T, includes storage used by and assigned to the system. Value is NIL. gctrp[n] garbage collection trap. Causes a (simulated) control-H interrupt when the number of free list words (type 8) remaining equals n, i.e., when a garbage collection would occur in n more conses. The message GCTRP is printed, the function interrupt (Section 16) is called, and a break occurs. Note that by advising (Section 19) interrupt the user can program the handling of a 18 gctrp instead of going into a break. Value of gctrp is its last setting. gctrp[-1] will "disable" a previous gctrp since there are never -1 free list words. gctrp is initialized this way. ------------------------------------------------------------------------ 18 For gctrp interrupts, interrupt is called with intype (its third argument) equal to 3. If the user does not want to go into a break, the advice should still allow interrupt to be entered, but first set intype to -1. This will cause interrupt to "quietly" go away by calling the function that was interrupted. The advice should not exit interrupt via return, as in this case the function that was about to be called when the interrupt occurred would not be called. 10.13 gctrp[] returns number of list words left, i.e., number of conses until next type 8 garbage collection, see Section 21. conscount[n] conscount[] returns number of conses since INTERLISP started up. If n is not NIL, resets conscount to n. closer[a;x] Stores x into memory location a. Both x and a must be numbers. openr[a] Value is the number in memory location a, i.e., boxed. 10.14