SUBROUTINE GETLEX (ITMP) C C C ************************************************************************ C * C THIS PROGRAM IS PROVIDED ON AN "AS IS" BASIS ONLY. DIGITAL EQUIPMENT * C COMPUTER USER'S SOCIETY, DIGITAL EQUIPMENT CORPORATION, MONSANTO, AND * C THE AUTHOR DISCLAIM ALL WARRANTIES ON THE PROGRAM, INCLUDING WITHOUT * C LIMITATION, ALL IMPLIED WARRANTIES OF MERCHANTABLITY AND FITNESS. * C * C FULL PERMISSION AND CONSENT IS HEREBY GIVEN TO DECUS AND TO THE DECUS * C SPECIAL INTEREST GROUPS TO REPRODUCE, DISTRIBUTE, AND PUBLISH AND * C PERMIT OTHERS TO REPRODUCE IN WHOLE OR IN PART, IN ANY FORM AND * C WITHOUT RESTRICTION, THIS PROGRAM AND ANY INFORMATION RELATING THERETO * C * C ************************************************************************ C C C VERSION: V02.00 C C AUTHOR: RL AURBACH MAPC 11-MAR-79 C C GETLEX PERFORMS LEXICAL TOKEN ANALYSIS. C GIVEN AN ASCII STRING, IT RETURNS A CODE IDENTIFYING C THE NEXT LEGICAL TOKEN, AND IN SOME CASES, PLACES THE VALUE OF C THE TOKEN ON THE LEXICAL STACK. C C COMPILATION INSTRUCTIONS: C C GETLEX,GETLEX=GETLEX/-SP C [FILE WIRCOM.FTN INCLUDED] C [FILE PARSE.COM INCLUDED] C C MODIFICATION HISTORY: C C V01.00 EAS 18-SEP-75 ORIGINAL CODE C V01.01 EAS 03-NOV-75 RT-11 VERSION C V01.02 EAS 06-DEC-76 ADD SUPPORT FOR - TO BE EITHER A C MINUS SIGN OR A HYPHEN. C V01.03 RLA 09-JUN-78 RSX-11M VERSION C V01.04 RLA 14-JUL-78 ADD SUPPORT FOR LOWER-CASE CHARACTERS. C V02.00 RLA 11-MAR-78 1) FORTRAN REWRITE C 2) ADD SUPPORT FOR ALPHAMERIC SYMBOLS C IN THE FORM '74LS00' C C C PAGE 2 - WIRLEX - DOCUMENTATION. C C ORIGINAL CONCEPTION DUE TO: C ERIC SCHIFF C LASSP C CORNELL UNIVERSITY C ITHACA, NEW YORK 14853 C C GETLEX IS A SUBROUTINE TO PERFORM A VERY SIMPLE LEXICAL ANALYSIC PROCEDURE. C C FIVE CLASSES OF LEXICAL TOKENS ARE RECOGNIZED: C C 1 END-OF-MEDIUM OR END-OF-FILE C 2 END-OF-RECORD OR END-OF-LINE C 3 NON-ALPHANUMERIC CHARACTER C 4 ALPHANUMERIC SYMBOL C 5 INTEGER LITERAL C C THE FIRST THREE TOKENS HAVE THE OBVIOUS MEANINGS. C AN ALPHANUMERIC SYMBOL IS ANY CHARACTER STRING COMPOSED SOLELY OF LETTERS C AND NUMBERS. AT LEAST ONE OF THE CHARACTERS IN THE SYMBOL MUST BE A LETTER. C OTHERWISE THE SYMBOL WILL BE ASSUMED TO BE AN INTEGER LITERAL. C C THE ROUTINE WORKS ON A LINE BUFFER WITH STATUS WORDS LOCATED IN THE LABELED C PROGRAM SECTION "PARSE". THE LINE BUFFER CAN HOLD 122 CHARACTERS. THE C NUMBER OF CHARACTERS IN THE BUFFER IS INDICATED BY THE WORD ICHAR AS A C POSITIVE INTEGER. A "CURRENT CHARACTER" POINTER IS MAINTAINED AS "NEXCHR". C A POINTER TO THE CURRENTLY ANALYZED TOKEN IS MAINTAINED AS "TOKEN". C C THE ROUTINE MAINTAINS A 'NEW LINE' FLAG WHICH RESULTS IN THE CALLING OF C ANOTHER SUBROUTINE 'GETLIN' WHEN SET (TRUE). THE FLAG IS SET AFTER C ENCOUNTERING AN END-OF-LINE; GETLIN IS CALLED ON THE NEXT CALL TO GETLEX. C C THE RESULTS OF GETLEX'S OPERATION ARE LEFT ON AN INTEGER STACK MAINTAINED C IN THE LABELED PROGRAM SECTION "LEXATT". THE STACK IS MAINTAINED IN A C FORTRAN-COMPATABLE MANNER USING A STACK OFFSET POINTER LEXSP. C C GETLEX IS CALLED WITH AN INTEGER ARGUMENT. UPON RETURN FROM GETLEX, THIS C ARGUMENT CONTAINS THE LEXICAL TYPE CODE (1-5) LISTED ABOVE. C C FOR CODES 1, 2, AND 3, NO INFORMATATION IS PLACED ON THE LEXICAL STACK IN C "LEXATT". FOR AN ALPHANUMERIC SYMBOL (CODE 4), TWO WORDS ARE PUSHED ON THE C STACK. ONE WORD DEEP IN THE STACK IS A POINTER TO THE FIRST CHARACTER IN C THE SYMBOL. AT THE TOP OF THE STACK IS THE NUMBER OF CHARACTERS IN THE C SYMBOL. FOR AN INTEGER LITERAL (CODE 5), ONE WORD IS PUSHED ON THE STACK. C THIS WORD IS FILLED WITH THE VALUE OF THE INTEGER LITERAL. C C PAGE 2 A - WIRLEX - DOCUMENTATION (CONTINUED). C C C THE ROUTINE HAS TWO ENTRY POINTS. C IF CALLED AT "GETLEX", THE '-' SYMBOL, IF FOLLOWED BY NUMERIC CHARACTERS, IS C INTERPRETED AS A MINUS SIGN. C IF CALLED AT "GETLX0", THE '-' SYMBOL IN SIMILAR CIRCUMSTANCES IS INTERPRETED C AS A DASH. C C THAT IS, AT "GETLEX", '-74 ' IS INTERPRETED AS INTEGER LITERAL (-74) FOLLOWED C BY NON-ALPHANUMERIC TERMINATOR (SPACE). C ON THE OTHER HAND, IF CALLED AT "GETLX0", '-74 ' IS INTERPRETED AS C NON-ALPHANUMERIC TERMINATOR (-), INTEGER LITERAL (74), AND NON-ALPHANUMERIC C TERMINATOR (SPACE). C C CALLING SEQUENCE: C C CALL BY: CALL GETLEX (ITMP) C OR C CALL GETLX0 (ITMP) C C WHERE: ITMP IS THE INTEGER ASSOCIATED VARIABLE. C ON RETURN, IT CONTAINS THE NUMERIC CODE OF THE TOKEN. C C ON ENTRY: ITMP ARBITRARY C "PARSE" COMMON AREA ASSUMED VALID C C ON EXIT: ITMP CONTAINS CODE OF LEXICAL TOKEN C "PARSE"-ICHAR UNCHANGED C TOKEN POINTS TO BEGINNING OF CURRENT TOKEN C NEXCHR POINTS TO NEXT CHARACTER IN BUFFER C NEWLIN .TRUE. IF EOL DETECTED, ELSE UNCHANGED C ICODE UNCHANGED C LINBUF UNCHANGED C C ERROR CONDITIONS: C C NONE C C SPECIAL NOTES: C C *** VERSIONS 1.4 AND LATER *** C C THIS ROUTINE NOW RECOGNIZES LOWER CASE CHARACTERS AS ALPHAMERIC C CHARACTERS. NO CASE CONVERSION IS DONE. C C SUBROUTINES CALLED: C C GETLIN PUSH C C LOCAL FUNCTIONS: C C EOL NUMBER ALPHA C C PAGE 3 - WIRLEX - DATA STORAGE DESCRIPTIONS. C C CALLING PARAMETERS: C INTEGER ITMP !GETLEX ASSOCIATED VARIABLE C C PARAMETER STATEMENTS C INCLUDE 'WIRCOM.FTN' INCLUDE 'PARSE.COM' C C LOCAL VARIABLE AND ARRAY DEFINITIONS: C LOGICAL*2 MINFLG !IF TRUE, TREAT '-' AS MINUS SIGN LOGICAL*2 NEG !IF TRUE, NUMBER IS NEGATIVE (AND THEREFORE ! NOT ALPHABETIC) LOGICAL*2 EOL !LOGICAL FUNCTION TRUE IF EOL FOUND LOGICAL*2 ALPHA !LOGICAL FUNCTION TRUE IF ALPHABETIC CHAR FOUND LOGICAL*2 NUMBER !LOGICAL FUNCTION TRUE IF NUMBER FOUND INTEGER*2 N !LOCAL VALUE. = VALUE OF NUMERIC LITERAL BYTE ZERO !"0" BYTE NINE !"9" BYTE UA !"A" BYTE UZ !"Z" BYTE LA !"a" BYTE LZ !"z" BYTE SEMI !";" BYTE DASH !"-" C C COMMON AREAS C C C LOCAL VARIABLE AND ARRAY INITIALIZATION: C DATA ZERO,NINE / '0', '9' / DATA UA,UZ,LA,LZ / 'A', 'Z', 'a', 'z' / DATA SEMI,DASH / ';', '-' / C C PAGE 4 - WIRLEX C C C DEFINE LOCAL LOGICAL FUNCTIONS. C C EOL IS TRUE IF AN END-OF-LINE IS FOUND C EOL(L) = L .GT. ICHAR C C NUMBER IS TRUE IF A NUMERIC CHARACTER IS FOUND C NUMBER(L) = (LINBUF(L) .GE. ZERO) .AND. (LINBUF(L) .LE. NINE) C C ALPHA IS TRUE IF AN ALPHABETIC CHARACTER IS FOUND C ALPHA(L) = ((LINBUF(L) .GE. UA) .AND. (LINBUF(L) .LE. UZ)) .OR. 1 ((LINBUF(L) .GE. LA) .AND. (LINBUF(L) .LE. LZ)) C C GETLEX ENTRY POINT C C TREAT - AS MINUS SIGN C MINFLG = .TRUE. GOTO 1 C C GETLX0 ENTRY POINT C C TREAT - AS DASH C ENTRY GETLX0 (ITMP) MINFLG = .FALSE. C C CHECK TO SEE IF GETLIN NEEDS TO BE CALLED. CALL IT IF NECESSARY. C THEN CHECK FOR END-OF-FILE (ICODE .NE. 0). C C CHECK FOR THE KIND OF TOKEN (TYPES 1,2,3,4,5). C DISPATCH TO (10,20,30,40,50) RESPECTIVELY TO HANDLE EACH TYPE. C C NOTE THAT IF A NUMBER IS SEEN, THE TOKEN COULD BE EITHER NUMERIC C ALPHANUMERIC. WE ASSUME IT IS NUMERIC UNTIL PROVEN DIFFERENT. C 1 CONTINUE IF (.NOT. NEWLIN) GOTO 2 CALL GETLIN IF (ICODE .NE. 0) GOTO 10 2 CONTINUE TOKEN=NEXCHR IF (EOL(NEXCHR)) GOTO 20 IF (NUMBER(NEXCHR)) GOTO 52 IF (ALPHA(NEXCHR)) GOTO 40 GOTO 30 C C PAGE 5 - WIRLEX C C HANDLE END OF FILE TOKEN C 10 CONTINUE ITMP=1 RETURN C C HANDLE END OF LINE TOKEN C 20 CONTINUE NEWLIN=.TRUE. ITMP=2 RETURN C C HANDLE NON-ALPHAMERIC TERMINATOR TOKEN C C IF THE CHARACTER IS A SEMICOLON, IT INDICATES A COMMENT. C COMMENTS ARE TREATED AS AN EOL. C C IF THE CHARACTER IS A DASH AND THE NEXT CHARACTER IS A NUMBER, C THEN IF MINFLG IS TRUE, THE CHARACTER IS A MINUS SIGN. C ELSE IT IS A DASH. C 30 CONTINUE IF (LINBUF(NEXCHR) .EQ. SEMI) GOTO 20 IF (LINBUF(NEXCHR) .NE. DASH) GOTO 35 IF (EOL(NEXCHR+1) .OR. .NOT. MINFLG) GOTO 35 IF (NUMBER(NEXCHR+1)) GOTO 50 35 CONTINUE NEXCHR=NEXCHR+1 ITMP=3 RETURN C C PAGE 6 - WIRLEX C C HANDLE ALPHANUMERIC TOKEN C C SEARCH FOR THE END OF THE TOKEN (END OF LINE OR NON-ALPHANUMERIC CHARACTER) C C PUSH THE STARTING ADDRESS OF THE TOKEN AND ITS LENGTH ONTO THE LEXICAL STACK. C 40 CONTINUE NEXCHR=NEXCHR+1 IF (EOL(NEXCHR)) GOTO 42 IF (ALPHA(NEXCHR) .OR. NUMBER(NEXCHR)) GOTO 40 42 CONTINUE CALL PUSH(TOKEN) CALL PUSH(NEXCHR-TOKEN) ITMP=4 RETURN C C HANDLE NUMERIC TOKEN C C ACCUMULATE THE NUMERICAL VALUE OF THE TOKEN WHILE SEARCHING FOR ITS END. C C IF AN EOL OR A NON-ALPHANUMERIC SYMBOL IS FOUND, IT MARKS THE END OF THE C NUMERIC LITERAL. C C IF AN ALPHABETIC CHARACTER IS FOUND, THE SYMBOL IS REALLY AN ALPHANUMERIC ONE C AND CONTROL IS PASSED TO THE ALPHANUMERIC TOKEN HANDLER (AT 40). C C AS A SPECIAL CASE, IF MINFLG IS TRUE, THE STRING "-72S" WILL BE INTERPRETED C AS NUMBERIC TOKEN (-72) FOLLOWED BY ALPHAMERIC TOKEN (S). C 50 CONTINUE NEG = .TRUE. NEXCHR=NEXCHR+1 GOTO 53 52 CONTINUE NEG = .FALSE. 53 CONTINUE N=0 54 CONTINUE N=10*N+(LINBUF(NEXCHR)-"60) NEXCHR=NEXCHR+1 IF (EOL(NEXCHR)) GOTO 58 IF (ALPHA(NEXCHR) .AND. .NOT. NEG) GOTO 40 IF (NUMBER(NEXCHR)) GOTO 54 58 CONTINUE IF (NEG) N=-N CALL PUSH(N) ITMP=5 RETURN C C END