SECTION 17 1 AUTOMATIC ERROR CORRECTION - THE DWIM FACILITY 17.1 Introduction A surprisingly large percentage of the errors made by INTERLISP users are of the type that could be corrected by another LISP programmer without any information about the purpose of the program or expression in question, e.g., misspellings, certain kinds of parentheses errors, etc. To correct these types of errors we have implemented in INTERLISP a DWIM facility, short for DO-What-I-Mean. DWIM is called automatically 2 whenever an error occurs in the evaluation of an INTERLISP expression. DWIM then proceeds to try to correct the mistake using the current context of computation plus information about what the user had previously been doing, (and what mistakes he had been making) as guides to the remedy of the error. If DWIM is able to make the correction, the computation continues as though no error had occurred. Otherwise, the procedure is the same as though DWIM had not intervened: a break occurs, or an unwind to the last errorset, as described in Section 16. The following protocol illustrates the operation of DWIM. Example The user defines a function fact of one argument, n. The value of fact[n] is to be n factorial. _DEFINEQ((FACT (LAMBDA (N) (COND ((ZEROP N9 1) ((T (ITIMS N (FACCT 8SUB1 N] (FACT) _ Note that the definition of fact contains several mistakes: itimes and fact have been misspelled; the 9 in N9 was intended to be a right ------------------------------------------------------------------------ 1 DWIM was designed and implemented by W. Teitelman. It is discussed in [Tei2]. 2 Currently, DWIM only operates on unbound atoms and undefined function errors. 17.1 parenthesis, but the shift key was not depressed; similarly, the 8 in 8SUB1 was intended to be a left parenthesis; and finally, there is an extra left parenthesis in front of the T that begins the final clause in the conditional. _PRETTYPRNT((FACCT] [1] =PRETTYPRINT [2] =FACT [3] (FACT [LAMBDA (N) (COND ((ZEROP N9 1) ((T (ITIMS N (FACCT 8SUB1 N]) (FACT) _ After defining fact, the user wishes to look at its definition using PRETTYPRINT, which he unfortunately misspells.[1] Since there is no function PRETTYPRINT in the system, a U.D.F. error occurs, and DWIM is called. DWIM invokes its spelling corrector, which searches a list of functions frequently used (by this user) for the best possible match. Finding one that is extremely close, DWIM proceeds on the assumption that PRETTYPRNT meant PRETTYPRINT, notifies the user of this, [2] and calls prettyprint. At this point, PRETTYPRINT would normally print (FACCT NOT PRINTABLE) and exit, since facct has no definition. Note that this is not an INTERLISP error condition, so that DWIM would not be called as described above. However, it is obviously not what the user meant. This sort of mistake is corrected by having prettyprint itself explicitly invoke the spelling corrector portion of DWIM whenever given a function with no expr definition. Thus with the aid of DWIM, prettyprint is able to determine that the user wants to see the definition of the function fact,[3] and proceeds accordingly. _FACT(3] [4] N9 [IN FACT] -> N ) ? YES [IN FACT] (COND -- ((T --))) -> (COND -- (T --)) ITIMS [IN FACT] -> ITIMES [5] FACCT [IN FACT] -> FACT 8SUB1 [IN FACT] -> ( SUB1 ? YES 6 _PP FACT [6] (FACT [LAMBDA (N) (COND ((ZEROP N) 1) (T (ITIMES N (FACT (SUB1 N]) FACT _ The user now calls his function fact.[4] During its execution, five errors occur, and DWIM is called five times.[5] At each point, the error 17.2 is corrected, a message printed describing the action taken, and the computation allowed to continue as if no error had occurred. Following the last correction, 6 is printed, the value of fact(3). Finally, the user prettyprints the new, now correct, definition of fact.[6] In this particular example, the user was shown operating in TRUSTING mode, which gives DWIM carte blanche for most corrections. The user can also operate in CAUTIOUS mode, in which case DWIM will inform him of intended corrections before they are made, and allow the user to approve or disapprove of them. For most corrections, if the user does not respond in a specified interval of time, DWIM automatically proceeds with the correction, so that the user need intervene only when he does not approve. Sample output is given below. Note that the user responded to the first, second, and fifth questions; DWIM responded for him on the third and fourth. _FACT(3) N9 [IN FACT] -> N ) ? YES [1] U.D.F. T [IN FACT] FIX? YES [2] [IN FACT] (COND -- ((T --))) -> (COND -- (T --)) ITIMS [IN FACT] -> ITIMES ? ...YES [3] FACCT [IN FACT] -> FACT ? ...YES [4] 8SUB1 [IN FACT] -> ( SUB1 ? NO [5] U.B.A. (8SUB1 BROKEN) : We have put a great deal of effort into making DWIM "smart", and experience with perhaps fifty different users indicates we have been very successful; DWIM seldom fails to correct an error the user feels it should have, and almost never mistakenly corrects an error. However, it 3 is important to note that even when DWIM is wrong, no harm is done: since an error had occurred, the user would have had to intervene anyway if DWIM took no action. Thus, if DWIM mistakenly corrects an error, the user simply interrupts or aborts the computation, UNDOes the DWIM change using UNDO described in Section 22, and makes the correction he would have had to make without DWIM. It is this benign quality of DWIM that makes it a valuable part of INTERLISP. 17.2 Interaction with DWIM DWIM is enabled by performing either DWIM[C], for CAUTIOUS mode, or ------------------------------------------------------------------------ 3 Except perhaps if DWIM's correction mistakenly caused a destructive computation to be initiated, and information was lost before the user could interrupt. We have not yet had such an incident occur. 17.3 4 DWIM[T] for TRUSTING mode. In addition to setting dwimflg to T and redefining faulteval and faultapply as described on page 17.11, DWIM[C] sets approveflg to T, while DWIM[T] sets approveflg to NIL. The setting of approveflg determines whether or not the user wishes to be asked for approval before a correction that will modify the definition of one of his functions. In CAUTIOUS mode, i.e., approveflg=T, DWIM will ask for approval; in TRUSTING mode, DWIM will not. For corrections to 5 expressions typed in by the user for immediate execution, DWIM always 6 acts as though approveflg were NIL, i.e., no approval necessary. In either case, DWIM always informs the user of its action as described below. Spelling Correction Protocol The protocol used by DWIM for spelling corrections is as follows: If the correction occurs in type-in, print = followed by the correct spelling, followed by a carriage-return, and then continue, e.g., user types: _(SETQ FOO (NCOCN FIE FUM)) DWIM types: =NCONC If the correction does not occur in type-in, print the incorrect spelling, followed by [IN function-name], ->, and then the correct 7 spelling, e.g., ITIMS [IN FACT] -> ITIMES as shown on page 17.2. Then, if approveflg=NIL, print a carriage return, make the correction and ------------------------------------------------------------------------ 4 INTERLISP arrives with DWIM enabled in CAUTIOUS mode. DWIM can be disabled by executing DWIM[] or by setting dwimflg to NIL. See page 17.20. 5 Typed into lispx. lispx is used by evalqt and break, as well as for processing the editor's E command. Functions that call the spelling corrector directly, such as editdefault (Section 9), specify whether or not the correction is to be handled as type-in. For example, in the case of editdefault, commands typed directly to the editor are treated as type-in, so that corrections to them will never require approval. Commands given as an argument to the editor, or resulting from macro expansions, or from IF, LP, ORR commands etc. are not treated as type-in, and thus approval will be requested if approveflg=T. 6 For certain types of corrections, e.g., run-on spelling corrections, 8-9 errors, etc., dwim always asks for approval, regardless of the setting of approveflg. 7 The appearance of -> is to call attention to the fact that the user's function will be or has been changed. 17.4 continue. Otherwise, print a few spaces and a ? and then wait for 8 approval. The user then has six options. He can: 1. Type Y; DWIM types ES, and proceeds with the correction. 2. Type N; DWIM types O, and does not make the correction. 3. Type ^; DWIM does not make the correction, and furthermore guarantees that the error will not cause a break. See footnote on page 17.12. 4. Type control-E; for error correction, this has the same effect as typing N. 9 5. Do nothing; in which case DWIM will wait a specified interval, and if the user has not responded, DWIM will type ... followed 10 by the default answer. 6. Type space or carriage-return; in which case DWIM will wait indefinitely. This option is intended for those cases where the user wants to think about his answer, and wants to insure that DWIM does not get "impatient" and answer for him. The procedure for spelling correction on other than INTERLISP errors is analogous. If the correction is being handled as type-in, DWIM prints = followed by the correct spelling, and returns it to the function that called DWIM, e.g., =FACT as shown on page 17.2. Otherwise, DWIM prints the incorrect spelling, followed by the correct spelling. Then if approveflg=NIL, DWIM prints a carriage-return and returns the correct spelling. Otherwise, DWIM prints a few spaces and a ? and then waits for approval. The user can then respond with Y, N, control-E, space, carriage return, or do nothing as described. Note that since the spelling corrector itself is not errorset protected, ------------------------------------------------------------------------ 8 Whenever an interaction is about to take place and the user has typed ahead, DWIM types several bells to warn the user to stop typing, then clears and saves the input buffers, restoring them after the interaction is complete. Thus if the user has typed ahead before a DWIM interaction, DWIM will not confuse his type ahead with the answer to its question, nor will his type ahead be lost. 9 Equal to dwimwait seconds. DWIM operates by dismissing for 500 milliseconds, then checking to see if anything has been typed. If not, it dismisses again, etc. until dwimwait seconds have elapsed. Thus, there will be a delay of at most 1/2 second before DWIM responds to the user's answer. 10 The default is always YES unless otherwise stated. 17.5 typing N and typing control-E may have different effects when the 11 spelling corrector is called directly. The former simply instructs the spelling corrector to return NIL, and lets the calling function decide what to do next; the latter causes an error which unwinds to the last errorset, however far back that may be. Parentheses Errors Protocol As illustrated earlier on page 17.2, DWIM will correct errors consisting 12 of typing 8 for left parenthesis and 9 for right parenthesis. In these cases, the interaction with the user is similar to that for spelling correction. If the error occurs in type-in, DWIM types = followed by the correction, e.g., user types: _(SETQ FOO 8CONS FIE FUM] DWIM types: = ( CONS lispx types: (A B C D) Otherwise, DWIM prints the offending atom, [IN function-name], ->, the proposed correction, a few spaces and a ?, and then waits for approval, e.g., N9 [IN FACT] -> N ) ? as shown on page 17.2. The user then has 13 the same six options as for spelling correction. If the user types Y, DWIM then operates exactly the same as when approveflg=NIL, i.e., makes the correction and prints its message. U.D.F. T Errors Protocol DWIM corrects certain types of parentheses errors involving a T clause in a conditional, namely errors of the form: 1. (COND --) (T --), i.e., the T clause appears outside and immediately following the COND; 2. (COND -- (-- & (T --))), i.e., the T clause appears inside a previous clause; and ------------------------------------------------------------------------ 11 The DWIM error correction routines are errorset protection. 12 Actually, DWIM uses the value of the variables lparkey and rparkey to determine the corresponding lower case character for left and right parentheses. lparkey and rparkey are initially 8 and 9 respectively, but they can be reset for other keyboard layouts, e.g., on some terminals left parenthesis is over 9, and right parenthesis is over 0. 13 except the waiting time is 3*dwimwait seconds. 17.6 3. (COND -- ((T --))), i.e., the T clause has an extra pair of 14 parentheses around it. If the error occurs in type-in, DWIM simply types T FIXED and makes the correction. Otherwise if approveflg=NIL, DWIM makes the correction, and prints a message consisting of [IN function-name], followed by one of the above incorrect forms of COND, followed by ->, then on the next line the corresponding correct form of the COND, e.g., [IN FACT] (COND -- ((T --))) -> (COND -- (T --)) as shown on page 17.2. If approveflg=T, DWIM prints U.D.F. T, followed by [IN function-name], several spaces, and then FIX? and waits for approval. The user then has the same options as for spelling corrections and parenthesis errors. If the user types Y or defaults, DWIM then proceeds exactly the same as when approveflg=NIL, i.e., makes the correction and prints its message, as shown on page 17.3. Having made the correction, DWIM must then decide how to proceed with the computation. In case 1, (COND --) (T --), DWIM cannot know whether the last clause of the COND before the T clause succeeded or not, i.e., if the T clause had been inside of the COND, would it have been entered? Therefore DWIM asks the user 'CONTINUE WITH T CLAUSE' (with a default of YES). If the user types N, DWIM continues with the form after the COND, i.e., the form that originally followed the T clause. In case 2, (COND -- (-- & (T --))), DWIM has a different problem. After moving the T clause to its proper place, DWIM must return as the value of the COND, the value of the expression corresponding to &. Since this value is no longer around, DWIM asks the user, 'OK TO REEVALUATE' and then prints the expression corresponding to &. If the user types Y, or defaults, DWIM continues by reevaluating &, otherwise DWIM aborts, and a U.D.F. T error will then occur (even though the COND has in fact been 15 fixed). In case 3, (COND -- ((T --))), there is no problem with continuation, so no further interaction is necessary. ------------------------------------------------------------------------ 14 For U.D.F. T errors that are not one of these three types, DWIM takes no corrective action at all, i.e., the error will occur. 15 If DWIM can determine for itself that the form can safely be reevaluated, it does not consult the user before reevaluating. DWIM can do this if the form is atomic, or car of the form is a member of the list okreevalst, and each of the arguments can safely be reevaluated, e.g., (SETQ X (CONS (IPLUS Y Z) W)) is safe to reevaluate because SETQ, CONS, and IPLUS are all on okreevalst. 17.7 17.3 Spelling Correction The spelling corrector is given as arguments a misspelled word (word means literal atom), a spelling list (a list of words), and a number: xword, splst, and rel respectively. Its task is to find that word on splst which is closest to xword, in the sense described below. This word is called a respelling of xword. rel specifies the minimum "closeness" between xword and a respelling. If the spelling corrector cannot find a word on splst closer to xword than rel, or if it finds two or more words equally close, its value is NIL, otherwise its value is 16 the respelling. The exact algorithm for computing the spelling metric is described later on page 17.17, but briefly "closeness" is inversely proportional to the number of disagreements between the two words, and directly proportional to the length of the longer word, e.g., PRTTYPRNT is "closer" to PRETTYPRINT than CS is to CONS even though both pairs of words have the same number of disagreements. The spelling corrector operates by proceeding down splst, and computing the closeness between each word and xword, and keeping a list of those that are closest. Certain differences between words are not counted as disagreements, for example a single transposition, e.g., CONS to CNOS, or a doubled letter, e.g., CONS to CONSS, etc. In the event that the spelling corrector finds a word on splst with no disagreements, it will stop searching and return this word as the respelling. Otherwise, the spelling corrector continues through the entire spelling list. Then if it has found one and only one "closest" word, it returns this word as the respelling. For example, if xword is VONS, the spelling corrector will probably return CONS as the respelling. However, if xword is CONZ, the spelling corrector will not be able to return a respelling, since CONZ is equally close to both CONS and COND. If the spelling corrector finds an acceptable respelling, it interacts with the user as described earlier. In the special case that the misspelled word contains one or more alt- modes, the spelling corrector operates somewhat differently. Instead of trying to find the closest word as above, the spelling corrector searches for those words on splst that match xword, where an alt-mode can match any number of characters (including 0), e.g., FOO$ matches FOO1 and FOO, but not NEWFOO. $FOO$ matches all three. In this case, the entire spelling list is always searched, and if more than one respelling is found, the spelling corrector prints AMBIGUOUS, and returns NIL. For example, CON$ would be ambiguous if both CONS and COND were on the spelling list. If the spelling corrector finds one and only one respelling, it interacts with the user as described earlier. For both spelling correction and spelling completion, regardless of whether or not the user approves of the spelling corrector's choice, the respelling is moved to the front of splst. Since many respellings are ------------------------------------------------------------------------ 16 The spelling corrector can also be given an optional functional argument, fn, to be used for selecting out a subset of splst, i.e., only those members of splst that satisfy fn will be considered as possible respellings. 17.8 of the type with no disagreements, this procedure has the effect of considerably reducing the time required to correct the spelling of frequently misspelled words. Synonyms Spelling lists also provide a way of defining synonyms for a particular context. If a dotted pair appears on a spelling list (instead of just an atom), car is interpreted as the correct spelling of the misspelled word, and cdr as the antecedent for that word. If car is identical with the misspelled word, the antecedent is returned without any interaction or approval being necessary. If the misspelled word corrects to car of the dotted pair, the usual interaction and approval will take place, and then the antecedent, i.e., cdr of the dotted pair, is returned. For example, the user could make IFLG synonymous with CLISPIFTRANFLG by adding (IFLG . CLISPIFTRANFLG) to spellings3, the spelling list for unbound atoms. Similarly, the user could make OTHERWISE mean the same as ELSEIF by adding (OTHERWISE . ELSEIF) to clispifwordsplst, or make L be synonymous with LAMBDA by adding (L . LAMBDA) to lambdasplst. Note that L could also be used as a variable without confusion, since the association of L with LAMBDA occurs only in the appropriate context. Spelling Lists Any list of atoms can be used as a spelling list, e.g., brokenfns, filelst, etc. Various system packages have their own spellings lists, e.g., lispxcoms, prettycomsplst, clispforwordsplst, editcomsa, etc. These are documented under their corresponding sections, and are also indexed under "spelling lists." In addition to these spelling lists, the system maintains, i.e., automatically adds to, and occasionally prunes, four lists used solely for spelling correction: spellings1, spellings2, 17 spellings3, and userwords. Spellings1 is a list of functions used for spelling correction when an input is typed in apply format, and the function is undefined, e.g., EDTIF(FOO). Spellings1 is initialized to contain defineq, break, makefile, editf, tcompl, load, etc. Whenever lispx is given an input in apply format, i.e., a function and arguments, the name of the function 18 is added to spellings1. For example, typing CALLS(EDITF) will cause CALLS to be added to spellings1. Thus if the user typed CALLS(EDITF) and later typed CALLLS(EDITV), since spellings1 would then contain ------------------------------------------------------------------------ 17 All of the remarks on maintaining spelling lists apply only when DWIM is enabled, as indicated by dwimflg=T. 18 Only if the function has a definition. 17.9 19 CALLS, DWIM would be successful in correcting CALLLS to CALLS. Spellings2 is a list of functions used for spelling correction for all other undefined functions. It is initialized to contain functions such as add1, append, cond, cons, go, list, nconc, print, prog, return, setq, etc. Whenever lispx is given a non-atomic form, the name of the function is added to spellings2. For example, typing (RETFROM (STKPOS (QUOTE FOO) 2)) to a break would add retfrom to spellings2. Function names are also added to spellings2 by define, defineq, load (when loading compiled code), unsavedef, editf, and prettyprint. Spellings3 is a list of words used for spelling correction on all unbound atoms. Spellings3 is initialized to editmacros, breakmacros, brokenfns, and advisedfns. Whenever lispx is given an atom to evaluate, 20 the name of the atom is added to spellings3. Atoms are also added to spellings3 whenever they are edited by editv, and whenever they are set via rpaq or rpaqq. For example, when a file is loaded, all of the variables set in the file are added to spellings3. Atoms are also added to spellings3 when they are set by a lispx input, e.g., typing (SETQ FOO (REVERSE (SETQ FIE --))) will add both FOO and FIE to spellings3. Userwords is a list containing both functions and variables that the user has referred to e.g., by breaking or editing. Userwords is used for spelling correction by arglist, unsavedef, prettyprint, break, editf, advise, etc. Userwords is initially NIL. Function names are added to it by define, defineq, load, (when loading compiled code, or loading exprs to property lists) unsavedef, editf, editv, editp, prettyprint, etc. Variable names are added to userwords at the same time as they are added to spellings3. In addition, the variable lastword is always set to the last word added to userwords, i.e., the last function or variable referred to by the user, and the respelling of NIL is defined to be the value of lastword. Thus, if the user has just defined a function, he can then edit it by simply typing EDITF(), or prettyprint it by typing PP(). Each of the above four spelling lists are divided into two sections separated by a NIL. The first section contains the "permanent" words; the second section contains the temporary words. New words are added to 21 the corresponding spelling list at the front of its temporary section. ------------------------------------------------------------------------ 19 If CALLLS(EDITV) were typed before CALLS had been "seen" and added to spellings1, the correction would not succeed. However, the alternative to using spelling lists is to search the entire oblist, a procedure that would make spelling correction intolerably slow. 20 Only if the atom has a value other than NOBIND. 21 Except that functions added to spellings1 or spellings2 by lispx are always added to the end of the permanent section. 17.10 (If the word is already in the temporary section, it is moved to the front of that section; if the word is in the permanent section, no action is taken.) If the length of the temporary section then exceeds a specified number, the last (oldest) word in the temporary section is forgotten, i.e., deleted. This procedure prevents the spelling lists from becoming cluttered with unimportant words that are no longer being used, and thereby slowing down spelling correction time. Since the spelling corrector moves each word selected as a respelling to the front of its spelling list, the word is thereby moved into the permanent section. Thus once a word is mispelled and corrected, it is considered important and will never be forgotten. The maximum length of the temporary section for spellings1, spellings2, spellings3 and userwords is given by the value of #spellings1, #spellings2, #spellings3, and #userwords, initialized to 30, 30, 30, and 60 respectively. Using these values, the average length of time to 22 search a spelling list for one word is about 4 milliseconds. Generators for Spelling Correction For some applications, it is more convenient to generate candidates for a respelling one by one, rather than construct a complete list of all possible candidates, e.g., spelling correction involving a large directory of files, or a natural language data base. For these purposes, splst can be an array (of any size). The first element of this array is the generator function, which is called with the array itself as its argument. Thus the function can use the remainder of the array to store "state" information, e.g., the last position on a file, a pointer into a data structure, etc. The value returned by the function is the next candidate for respelling. If NIL is returned, the spelling "list" is considered to be exhausted, and the closest match is returned. If a candidate is found with no disagreements, it is returned immediately without waiting for the "list" to exhaust. 17.4 Error Correction As described in Section 16, whenever the interpreter encounters an atomic form with no binding, or a non-atomic form car of which is not a function or function object, it calls the function faulteval. Similarly, when apply is given an undefined function, it calls faultapply. When DWIM is enabled, faulteval and faultapply are redefined to first call dwimblock, a part of the DWIM package. If the user aborts by typing control-E, or if he indicates disapproval of DWIM's intended correction by answering N as described on page 17.5, or ------------------------------------------------------------------------ 22 If the word is at the front of the spelling list, the time required is only 1 millisecond. If the word is not on the spelling list, i.e., the entire list must be searched, the time is proportional to the length of the list; to search a spelling list of length 60 takes about 7 milliseconds. 17.11 23 if DWIM cannot decide how to fix the error, dwimblock returns NIL. In this case, faulteval and faultapply proceed exactly as described in Section 16, by printing a U.B.A. or U.D.F. message, and going into a break if the requirements of breakcheck are met, otherwise unwinding to the last errorset. If DWIM can (and is allowed to) correct the error, dwimblock exits by performing reteval of the corrected form, as of the position of the call to faulteval or faultapply. Thus in the example at the beginning of the chapter, when DWIM determined that ITIMS was ITIMES misspelled, DWIM called reteval with (ITIMES N (FACCT 8SUB1 N)). Since the interpreter uses the value returned by faulteval exactly as though it were the value of the erroneous form, the computation will thus proceed exactly as though no error had occurred. In addition to continuing the computation, DWIM also repairs the cause 24 of the error whenever possible. Thus in the above example, DWIM also changed (with rplaca) the expression (ITIMS N (FACCT 8SUB1 N)) that caused the error. Error correction in DWIM is divided into three categories: unbound atoms, undefined cars of form, and undefined function in apply. Assuming that the user approves if he is asked, the action taken by DWIM for the various types of errors in each of these categories is summarized below. The protocol of DWIM's interaction with the user has been described earlier. Unbound Atoms 1. If the first character of the unbound atom is ', DWIM assumes that the user (intentionally) typed 'atom for (QUOTE atom) and makes the appropriate change. No message is typed, and no approval requested. If the unbound atom is just ' itself, DWIM assumes the user wants the next expression quoted, e.g., (CONS X '(A B C)) will be changed to (CONS X (QUOTE (A B C))). Again no message will be printed or approval asked. (If no expression follows the ', DWIM gives up.) 2. If CLISP (Section 23) is enabled, and the atom is part of a CLISP construct, the CLISP transformation is performed and the result returned, e.g., N-1 is transformed to (SUB1 N). ------------------------------------------------------------------------ 23 If the user answers with ^, (see page 17.5) dwimblock is exited by performing reteval[FAULTEVAL;(ERROR!)], i.e., an error is generated at the position of the call to faulteval. 24 If the user's program had computed the form and called eval, e.g., performed (EVAL (LIST X Y)) and the value of x was a misspelled function; it would not be possible to repair the cause of the error, although DWIM could correct the misspelling each time it occurred. 17.12 25 3. If the atom contains an 8, DWIM assumes the 8 was intended to be a left parenthesis, and calls the editor to make appropriate repairs on the expression containing the atom. DWIM assumes that the user did not notice the mistake, i.e., that the entire expression was affected by the missing left parenthesis. For example, if the user types (SETQ X (LIST (CONS 8CAR Y) (CDR Z)) Y), the expression will be changed to (SETQ X (LIST (CONS (CAR Y) (CDR Z)) Y)). The 8 does not have to be the first character of the atom, e.g., DWIM will handle (CONS X8CAR Y) correctly. 26 4. If the atom contains a 9 DWIM assumes the 9 was intended to be a right parenthesis and operates as in number 3. 5. If the atom begins with a 7, the 7 is treated as a ', e.g., 7FOO becomes 'FOO, and then (QUOTE FOO). 6. If the atom is an edit command (a member of editcomsa), and the error occurred in type-in, the effect is the same as though the user typed EDITF(), followed by the atom, i.e., DWIM assumes the user wants to be in the editor editing the last thing he referred to. Thus, if the user defines the function foo and then types P, he will see =FOO, followed by EDIT, followed by the printout associated with the execution of the P command, followed by *, at which point he can continue editing foo. 7. If dwimuserfn=T, DWIM calls dwimuserfn, and if it returns a non-NIL value, DWIM returns this value. dwimuserfn is discussed below. 8. If the unbound atoms occurs in a function, DWIM attempts spelling correction using as a spelling list the list of lambda and prog variables of the function. 9. If the unbound atom occurred in a type-in to a break, DWIM attempts spelling correction using the lambda and prog variables of the broken function. 10. Otherwise, DWIM attempts spelling correction using spellings3. ------------------------------------------------------------------------ 25 actually the value of lparkey, initially 8. See footnote on page 17.6. 26 actually the value of rparkey, initially 9. See footnote on page 17.6. 17.13 If all fail, DWIM gives up. Undefined car of Form 1. If car of the form is T, DWIM assumes a misplaced T clause and operates as described on page 17.6. 2. If car of the form is F/L, DWIM changes the F/L to FUNCTION(LAMBDA,e.g., (F/L (Y) (PRINT (CAR Y))) is changed to (FUNCTION (LAMBDA (Y) (PRINT (CAR Y))). No message is printed and no approval requested. If the user omits the variable list, DWIM supplies (X), e.g., (F/L (PRINT (CAR X))) becomes (FUNCTION (LAMBDA (X) (PRINT (CAR X)))). DWIM determines that the user has supplied the variable list when more than one expression follows F/L, car of the first expression is not the name of a function, and every element in the first expression is atomic. For example, DWIM will supply (X) when correcting (F/L (PRINT (CDR X)) (PRINT (CAR X))). 3. If car of the form is IF, if, or one of the CLISP iterative statement operators, e.g., FOR, WHILE, DO et al, the indicated transformation is performed, and the result returned as the corrected form. 4. If car of the form has a function definition, DWIM attempts spelling correction on car of the definition using as spelling list the value 27 of lambdasplst, initially (LAMBDA NLAMBDA). 5. If car of the form has an EXPR property, DWIM prints car of the form, followed by 'UNSAVED', performs an unsavedef, and continues. No approval is requested. 6. If car of the form has a property FILEDEF, the definition is to be found on a file. If the value of the property is atomic, the entire file is to be loaded. If a list, car is the name of the file and cdr the relevant functions, and loadfns will be used. DWIM first checks to see if the file appears in the attached directory, 's directory, or 's directory, and if found, types "SHALL I LOAD" followed by the file name or list of functions. If the user approves, DWIM loads the function(s) or file, and continues the computation. TRANSOR is handled in this fashion. ------------------------------------------------------------------------ 27 The user may wish to add to lambdasplst if he elects to define new "function types" via an appropriate dwimuserfn. For example, the QLAMBDAs of SRI's QLISP are handled in this way. 17.14 7. If CLISP is enabled, and car of the form is part of a CLISP construct, the indicated transformation is performed, e.g., (N_N-1) becomes (SETQ N (SUB1 N)). 8. If car of the form contains an 8, DWIM assumes a left parenthesis was intended e.g., (CONS8CAR X). 9. If car of the form contains a 9, DWIM assumes a right parenthesis was intended. 10. If car of the form is a list, DWIM attempts spelling correction on caar of the form using lambdasplst as spelling list. If successful, DWIM returns the corrected expression itself. 11. If car of the form is a small number, and the error occurred in type-in, DWIM assumes the form is really an edit command and operates as described in case 6 of unbound atoms. 12. If car of the form is an edit command (a member of editcomsl), DWIM operates as in 11. 13. If dwimuserfn=T, dwimuserfn is called, and if it returns a non-NIL value, DWIM returns this value. 14. If the error occurs in a function, or in a type-in while in a break, DWIM checks to see if the last character in car of the form is one of the lambda or prog variables, and if the first n-1 characters are the name of a defined function, and if so makes the corresponding change, e.g., (MEMBERX Y) will be changed to (MEMBER X Y). The protocol followed will be the same as for that of spelling correction, e.g., if approveflg=T, DWIM will type MEMBERX [IN FOO] -> MEMBER X? 15. Otherwise, DWIM attempts spelling correction using spellings2 as the spelling list. If all fail, DWIM gives up. Undefined Function in Apply 1. If the function has a definition, DWIM attempts spelling correction on car of the definition using lambdasplst as spelling list. 2. If the function has an EXPR property, DWIM prints its name followed by 'UNSAVED', performs an unsavedef and continues. No approval is requested. 17.15 3. If the function has a property FILEDEF, DWIM proceeds as in case 6 of undefined car of form. 4. If the error resulted from type-in, and CLISP is enabled, and the function name contains a CLISP operator, DWIM performs the indicated transformation, e.g., the user types FOO_(APPEND FIE FUM). 5. If the function name contains an 8, DWIM assumes a left parenthesis was intended, e.g., EDIT8FOO]. 6. If the "function" is a list, DWIM attempts spelling correction on car of the list using lambdasplst as spelling list. 7. If the function is a number and the error occurred in type-in, DWIM assumes the function is an edit command, and operates as described in case 6 of unbound atoms, e.g., the user types (on one line) 3 - 1 P. 8. If the function is the name of an edit command (on either editcomsa or editcomsl), DWIM operates as in 7, e.g., user types F COND. 9. If dwimuserfn=T, dwimuserfn is called, and if it returns a non-NIL value, this value is treated as the form used to continue the computation, i.e., it will be eval-ed. 10. Otherwise DWIM attempts spelling correction using spellings1 as the spelling list, 11. Otherwise DWIM attempts spelling correction using spellings2 as the spelling list. If all fail, DWIM gives up. 17.5 DWIMUSERFN Dwimuserfn provides a convenient way of adding to the transformations that DWIM performs, e.g., the user might want to change atoms of the form $X to (QA4LOOKUP X). The user defines dwimuserfn as a function of no arguments, and then enables it by setting dwimuserfn to T. DWIM will call dwimuserfn before attempting spelling correction, but after performing its other transformations, e.g., F/L, 8, 9, CLISP, etc. If dwimuserfn returns a non-NIL value, this value is treated as a form to be evaluated and returned as the value of faulteval or faultapply. Otherwise, if dwimuserfn returns NIL, DWIM proceeds as when dwimuserfn is not enabled, and attempts spelling correction. Note that in the event that dwimuserfn is to handle the correction, it is also responsible for any modifications to the original expression, i.e., DWIM simply takes its value and returns it. 17.16 In order for dwimuserfn to be able to function, it needs to know various things about the context of the error. Therefore, several of DWIM's internal variables have been made SPECVARS (See Section 18) and are therefore "visible" to dwimuserfn. Below are a list of those variables that may be useful. faultx for unbound atoms and undefined car of form, faultx is the atom or form. For undefined functions in apply, faultx is the name of the function. faultargs for undefined functions in apply, faultargs is the list of arguments. faultapplyflg Is T for undefined functions in apply. (Since faultargs may be NIL, faultapplyflg is necessary to distinguish between unbound atoms and undefined function in apply, since faultx is atomic in both cases). tail for unbound errors, tail is the tail car of which is the unbound atom. Thus dwimuserfn can replace the atom by another expression by performing (/RPLACA TAIL expr) parent for unbound atom errors, parent is the form in which the unbound atom appears, i.e., tail is a tail of parent. type-in? true if error occurred in type-in. faultfn name of function in which error occurred. (faultfn is TYPE-IN when the error occurred in type-in, and EVAL or APPLY when the error occurred under an explicit call to EVAL or APPLY). dwimifyflg true if error was encountered during dwimifying as opposed to during running the program. expr definition of faultfn, or argument to eval, i.e., the superform in which the error occurs. 17.6 Spelling Corrector Algorithm The basic philosophy of DWIM spelling correction is to count the number of disagreements between two words, and use this number divided by the length of the longer of the two words as a measure of their relative disagreement. One minus this number is then the relative agreement or closeness. For example, CONS and CONX differ only in their last character. Such substitution errors count as one disagreement, so that the two words are in 75% agreement. Most calls to the spelling 17.17 28 corrector specify rel=70, so that a single substitution error is permitted in words of four characters or longer. However, spelling correction on shorter words is possible since certain types of differences such as single transpositions are not counted as disagreements. For example, AND and NAD have a relative agreement of 100. The central function of the spelling corrector is chooz. chooz takes as arguments: a word, a spelling list, a minimum relative agreement, and an 29 optional functional argument, xword, splst, rel, and fn respectively. chooz proceeds down splst examining each word. Words not satisfying fn, or those obviously too long or too short to be sufficiently close to xword are immediately rejected. For example, if rel=70, and xword is 5 30 characters long, words longer than 7 characters will be rejected. If tword, the current word on splst, is not rejected, chooz computes the number of disagreements between it and xword by calling a subfunction, skor. skor operates by scanning both words from left to right one character at 31 a time. Characters are considered to agree if they are the same characters; or appear on the same teletype key (i.e., a shift mistake), ------------------------------------------------------------------------ 28 Integers between 0 and 100 are used instead of numbers between 0 and 1 in order to avoid floating point arithmetic. 29 fn=NIL is equivalent to fn=(LAMBDA NIL T). 30 Special treatment is necessary for words shorter than xword, since doubled letters are not counted as disagreements. For example, CONNSSS and CONS have a relative agreement of 100. (Certain teletype diseases actually produce this sort of stuttering.) chooz handles this by counting the number of doubled characters in xword before it begins scanning splst, and taking this into account when deciding whether to reject shorter words. 31 skor actually operates on the list of character codes for each word. This list is computed by chooz before calling skor using dchcon, so that no storage is used by the entire spelling correction process. 17.18 32 for example, * agrees with :, 1 with !, etc.; or if the character in xword is a lower case version of the character in tword. Characters that agree are discarded, and the skoring continues on the rest of the characters in xword and tword. If the first character in xword and tword do not agree, skor checks to see if either character is the same as one previously encountered, and not accounted-for at that time. (In other words, transpositions are not handled by lookahead, but by lookback.) A displacement of two or fewer positions is counted as a tranposition; a displacement by more than two positions is counted as a disagreement. In either case, both characters are now considered as accounted for and are discarded, and skoring continues. If the first character in xword and tword do not agree, and neither are equal to previously unaccounted-for characters, and tword has more characters remaining than xword, skor removes and saves the first character of tword, and continues by comparing the rest of tword with xword as described above. If tword has the same or fewer characters remaining than xword, the procedure is the same except that the 33 character is removed from xword. In this case, a special check is first made to see if that character is equal to the previous character in xword, or to the next character in xword, i.e., a double character typo, and if so, the character is considered accounted-for, and not 34 counted as a disagreement. When skor has finished processing both xword and tword in this fashion, the value of skor is the number of unaccounted-for characters, plus the number of disagreements, plus the number of tranpositions, with two qualifications: (1) if both xword and tword have a character unaccounted-for in the same position, the two characters are counted only once, i.e., substitution errors count as only one disagreement, not two; and (2) if there are no unaccounted-for characters and no ------------------------------------------------------------------------ 32 For users on model 33 teletypes, as indicated by the value of model33flg being T, @ and P appear on the same key, as do L and /, N and L, and O and _, and DWIM will proceed accordingly. The initial value for model33flg is NIL. Certain other terminals, e.g., Anderson Jacobs terminal, have keyboard layouts similar to the model 33, i.e., N on same key as ^, etc. In this case, the user might also want to set model33flg to T. 33 Whenever more than two characters in either xword or tword are unaccounted for, skoring is aborted, i.e., xword and tword are considered to disagree. 34 In this case, the "length" of xword is also decremented. Otherwise making xword sufficiently long by adding double characters would make it be arbitrarily close to tword, e.g., XXXXXX would correct to PP. 17.19 disagreements, transpositions are not counted. This permits spelling 35 correction on very short words, such as edit commands, e.g., XRT->XTR. 17.7 DWIM Functions dwim[x] If x=NIL, disables DWIM; value is NIL. If x=C, enables DWIM in cautious mode; value is CAUTIOUS. If x=T, enables DWIM in trusting mode; value is TRUSTING. For all other values of x, generates an error. dwimify[x] x is a form or the name of a function. dwimify performs all corrections and transformations that would occur if x were actually run. dwimify is undoable. DW edit macro. dwimifies current expression. addspell[x;splst;n] Adds x to one of the four spelling lists as 36 follows: if splst=NIL, adds x to userwords and to spellings2. Used by defineq. If splst=0, adds x to userwords. Used by load when loading exprs to property lists. If splst=1, adds x to spellings1 (at end of permanent section). Used by lispx. if splst=2, adds x to spellings2 (at end of permanent section). Used by lispx. If splst=3, adds x to userwords and spellings3. splst can also be a spelling list, in which case n is the (optional) length of the temporary section. ------------------------------------------------------------------------ 35 Transpositions are also not counted when fastypeflg=T, for example, IPULX and IPLUS will be in 80% agreement with fastypeflg=T, only 60% with fastypeflg=NIL. The rationale behind this is that transpositions are much more common for fast typists, and should not be counted as disagreements, whereas more deliberate typists are not as likely to combine tranpositions and other mistakes in a single word, and therefore can use more conservative metric. fastypeflg is initially NIL. 36 If x is already on the spelling list, and in its temporary section, addspell moves x to the front of that section. See page 17.10 for complete description of algorithm for maintaining spelling lists. 17.20 addspell sets lastword to x when splst=NIL, 0 or 3. If x is not a literal atom, addspell takes no action. misspelled?[xword;rel;splst;flg;tail;fn] If xword=NIL or alt-mode, misspelled? prints = followed by the value of lastword, and returns this as the respelling, without asking for approval. Otherwise, misspelled? checks to see if xword is really misspelled, i.e., if fn applied to xword is true, or xword is already contained on splst. In this case, misspelled? simply returns xword. Otherwise misspelled? computes and returns fixspell[xword;rel;splst;flg;tail;fn] 37 fixspell[xword;rel;splst;flg;tail;fn;tieflg] The value of fixspell is either the respelling 38 of xword or NIL. fixspell performs all of the interactions described earlier, including requesting user approval if necessary. If xword=NIL or $ (alt-mode), the respelling is the value of lastword, and no approval is requested. If flg=NIL, the correction is handled in type-in mode, i.e., approval is never requested, and xword is not typed. If flg=T, xword is typed (before the =) and approval is requested if approveflg=T. If tail is not NIL, and the correction is successful, car of tail is replaced by the respelling (using /rplaca). In addition, fixspell will correct misspellings caused by ------------------------------------------------------------------------ 37 fixspell has some additional arguments, for internal use by DWIM. 38 If for some reason xword itself is on splst, then fixspell aborts and calls error!. If there is a possibility that xword is spelled correctly, misspelled? should be used instead of fixspell. 17.21 39 running two words together. In this case, car of tail is replaced by the two words, and the value of fixspell is the first one. For example, if fixspell is called to correct the edit command (MOVE TO AFTERCOND 3 2) with tail=(AFTERCOND 3 2), tail would be changed to (AFTER COND 2 3), and fixspell would return AFTER (subject to user approval where 40 necessary). If tieflg=T and a tie occurs, i.e., more than one word on splst is found with the same degree of "closeness", the first word is taken as the correct spelling. If tieflg=NIL and a tie occurs, fixspell returns NIL, i.e., no correction. If tieflg=ALL, the value of fixspell is a list of the respellings (even if there is only one), and fixspell will not perform any interaction with the user, nor modify tail, the idea being that the calling program will handle those tasks. The time required for a call to fixspell with a spelling list of length 60 when the entire list must be searched is .5 seconds. If fixspell determines that the first word on the spelling list is the respelling and does not need to search any further, the time required is .02 seconds. In other words, the time required is proportional to the number of words with which xword is compared, with the time for one comparison, i.e., one call skor, being roughly .01 seconds (varies slightly with the number of characters in the words being compared.) ------------------------------------------------------------------------ 39 In this case, user approval is always requested. In addition, if the first word contains either fewer than 3 characters, or fewer characters than the second word, the default will be N. 'Run-on' spelling corrections can be suppressed by setting the variable runonflg to NIL (initially T). 40 If tail=T, fixspell will also perform run-on corrections, returning a dotted pair of the two words in the event the correction is of this type. 17.22 The function chooz is provided for users desiring spelling correction without any output or interaction: 41 chooz[xword;rel;splst;tail;fn;tieflg] The value of chooz is the corrected spelling of 42 xword or NIL; chooz performs no interaction and no output. rel, splst, tail, tieflg, and fn are as described under fixspell above. If tail is not NIL and the misspelling consists of running two words together, e.g., (BREAKFOO) for (BREAK FOO), the value of chooz will be a dotted pair of the two words, e.g., (BREAK . FOO). If the correction involves a synonym (see page 17.9), i.e., if xword is or corrects to word1 in (word1 . word2), the value of chooz will be 43 (word1 word2). fncheck[fn;noerrorflg;spellflg;propflg;tail] The task of fncheck is to check whether fn is the name of a function and if not, to correct 44 its spelling. If fn is the name of a function or spelling correction is successful, fncheck adds the (corrected) name of the function to userwords using addspell, and returns it as its value. noerrorflg informs fncheck whether or not the calling function wants to handle the unsuccessful case: if noerrorflg is T, fncheck simply returns NIL, otherwise it prints fn NOT A FUNCTION and generates a non-breaking error. ------------------------------------------------------------------------ 41 chooz has some additional arguments, for internal use by DWIM. 42 chooz does not perform spelling completion, only spelling correction. 43 not (word1 . word2), since this would correspond to a run-on correction. 44 Since fncheck is called by many low level functions such as arglist, unsavedef, etc., spelling correction only takes place when dwimflg=T, so that these functions can operate in a small INTERLISP system which does not contain DWIM. 17.23 If fn does not have a definition, but does have an EXPR property, then spelling correction is not attempted. Instead, if propflg=T, fn is considered to be the name of a function, and is returned. If propflg=NIL, fn is not considered to be the name of a function, and NIL is returned or an error generated, depending on the value of noerrorflg. fncheck calls misspelled? to perform spelling correction, so that if fn=NIL, the value of lastword will be returned. spellflg corresponds to misspelled?'s fourth argument, flg. If spellflg=T, approval will be asked if DWIM was enabled in CAUTIOUS mode, i.e., if approveflg=T. tail corresponds to the fifth argument to misspelled?. fncheck is currently used by arglist, unsavedef, prettyprint, break0, breakin, chngnm, advise, printstructure, firstfn, lastfn, calls, and edita. For example, break0 calls fncheck with noerrorflg=T since if fncheck cannot produce a function, break0 wants to define a dummy one. printstructure however calls fncheck with noerrorflg=NIL, since it cannot operate without a function. Many other system functions call misspelled? or fixspell directly. For example, break1 calls fixspell on unrecognized atomic inputs before attempting to evaluate them, using as a spelling list a list of all break commands. Similarly, lispx calls fixspell on atomic inputs using a list of all lispx commands. When unbreak is given the name of a function that is not broken, it calls fixspell with two different spelling lists, first with brokenfns, and if that fails, with userwords. makefile calls misspelled? using filelst as a spelling list. Finally, load, bcompl, brecompile, tcompl, and recompile all call misspelled? if their input file(s) won't open. 17.24