PROGRAM DESCRIPTION:

THIS ROUTINE TAKES TWO FILES: A USER DEFINED STOP LIST AND A FILE TO
BE KEY-WORD-IN-CONTEXT (KWIC) INDEXED. THE USER SUPPLIES THE LOCATION
OF THE INPUT FILES, A PLACE TO WRITE THE INDEX FILE, AND A TITLE FOR
THE LISTING. THE ROUTINE READS THE ENTIRE MASTER FILE (THE DATA TO BE
INDEXED) INTO CORE, AND THE WHOLE FILE MUST FIT IN CORE AT ONCE FOR
THE PROGRAM TO RUN. THE PROGRAM ALSO MAKES A FREQUENCY FILE, WHICH
GIVES THE NUMBER OF TIMES EACH INDEX TERM WAS USED.

THIS PROGRAM WAS WRITTEN BY G.B. MOERSDORF AT THE OHIO STATE
UNIVERSITY. THE SYSTEM WAS DEVELOPED ON OSU'S NON DISK, NON SWAPPING
32K PDP-10. THE SYSTEM RUNS UNDER A 4NN72 OR BETTER MONITOR.

THE CODE WAS WRITTEN TO BE COMPLETELY DEVICE INDEPENDENT. THE ONLY
RESTRICTION ON THE INPUT DEVICES IS THAT THEY CAN DO IMAGE BINARY
MODE (10) INPUT. THE RESTRICTION ON THE LISTING DEVICE IS THAT IT CAN
DO ASCII LINE MODE (1) OUTPUT. THE LISTING WIDTH CAN BE ADJUSTED TO
ANY SIZE LINE PRINTER OR TELETYPE WHICH HAS MORE THAN 60 PRINT
POSITIONS.

QUICK INSTRUCTIONS TO RUN KWIC:

THE BEST WAY TO DESCRIBE THE INDEX IS BY MAKING ONE. TO USE THE DEMO
DATA SUPPLIED, DO THE FOLLOWING:

(1) MOUNT THE DISTRIBUTION DECTAPE AND EITHER ASSIGN IT THE LOGICAL
    NAME 'DSK' OR USE 'PIP' TO COPY THE TAPE TO YOUR DISK AREA.

(2) TYPE RUN DTA#:KWIC (FOR DISK RU KWIC)

(3) WHEN ASKED FOR 'MASTER FILE' TYPE CR TO USE THE DEFAULT, OR TYPE
    THE FILE NAME SPECIFICATION IN THE FORM 'DEV:FILE.EXT', THAT IS,
    DEVICE NAME, COLON, FILE NAME, PERIOD, EXTENSION, THEN CR.
    DEFAULTS: DEV=DSK, FILE=KWIC, EXT=MAS.
    THE DEFAULTS SPECIFY THE NAME OF THE TEST DATA SET.

(4) WHEN ASKED FOR 'STOP FILE' TYPE CR OR A FILE SPECIFICATION AS
    ABOVE.
    DEFAULTS: DEV=DSK, FILE=KWIC, EXT=STP.

(5) WHEN PROMPTED FOR 'INDEX FILE' TYPE CR OR A FILE SPECIFICATION AS
    ABOVE.
    DEFAULTS: DEV=DSK, FILE=KWIC, EXT=NDX.
    (THIS WILL WRITE THE LISTING ON DECTAPE, OR ON DISK IF YOU HAVE
    ONE, UNDER THE NAME 'KWIC.NDX' PPN 0,0).

(6) WHEN PROMPTED WITH 'FREQUENCY FILE' TYPE A CARRIAGE RETURN TO
    DEFAULT TO 'DSK:KWIC.FRQ'. THIS IS WHERE THE PROGRAM WILL WRITE
    THE FREQUENCY FILE.
(7) WHEN PROMPTED WITH 'LISTING TITLE' TYPE YOUR NIFTY COMPANY NAME
    OR SLOGAN. (MAX 80 CHARACTERS)

(8) WHEN IT PRINTS 'EXIT' THE INDEX HAS BEEN WRITTEN ON THE FILE
    DESCRIBED IN STEP 5 AND THE FREQUENCY LIST ON THE FILE SPECIFIED
    IN STEP 6.

(9) PRINT THE INDEX AND FREQUENCY FILES WITH 'PIP'. AREN'T THEY
    BEAUTIFUL?

(10) IF THEY ARE NOT BEAUTIFUL GO TO 'IMPLEMENTATION ON YOUR 10'.

FORMAT OF 'STOP LIST' FILE:

THE USER CREATES A 'STOP LIST' OF WORDS WHICH THE USER FEELS HAVE NO
USE AS INDEX TERMS FOR HIS PARTICULAR APPLICATION. ONE SUCH 'STOP
LIST' IS SUPPLIED WITH THE PACKAGE; IT IS CALLED 'KWIC.STP'. THE
SUPPLIED LIST IS A GENERALIZED STOP LIST WHICH CONTAINS 'LOW VALUE'
KEYWORDS SUCH AS A, AN, IN, THE.

THIS FILE MUST BE IN ALPHABETICAL ORDER. THE FILE MAY HAVE STANDARD
D.E.C. SEQUENCE NUMBERS. EACH WORD TO BE STOPPED MUST BE DELIMITED BY
A CARRIAGE RETURN LINE FEED. SPACES AND TABS ARE IGNORED.

FORMAT OF 'MASTER' FILE:

THE MASTER FILE CONSISTS OF THE DATA TO BE INDEXED BY THE KWIC
PROGRAM. THIS MAY BE ANY TYPE OF ALPHANUMERIC DATA. THE USUAL DATA
WOULD BE IN THE FORM OF MANY BOOK TITLES IN A SPECIFIC AREA OF STUDY,
OR POSSIBLY A WHOLE LIBRARY'S CATALOGUE, BUT THE PROGRAM IS FLEXIBLE
ENOUGH TO ALLOW KWIC INDEXING OF A THESIS PAPER OR SIMILAR DOCUMENT
(FOR WHAT IT'S WORTH). THE DELIMITERS FOR EACH FIELD OF DATA (ALL 3
DELIMITERS) ARE DECIDED UPON BY THE USER AT ASSEMBLE TIME.

THIS FILE SHOULD HAVE SEQUENCE NUMBERS, AS THEY ARE USED IN THE
IDENTIFICATION OF SYNTAX ERROR LOCATIONS. AFTER DEBUGGING THE DATA
YOU MAY REMOVE THE SEQUENCE NUMBERS TO SAVE DISK SPACE AND THE
PROGRAM WILL OPERATE NORMALLY.

THE GENERAL FORMAT OF AN ITEM IN THE MASTER FILE IS AS FOLLOWS:

1) STANDARD D.E.C. SEQUENCE NUMBER.

2) FIELD OF DATA TO BE INDEXED. (IT MAY BE CONTINUED ON ANY NUMBER OF
   LINES, I.E. A CARRIAGE RETURN LINE FEED IS IGNORED COMPLETELY.)

3) THE DELIMITER FOR THE SORT FIELD ('=' ON THE DISTRIBUTED VERSION).

4) NEXT, ANY DATA TO BE TOTALLY IGNORED BY THE SYSTEM, SUCH AS
   COPYRIGHT DATE AND PUBLISHER.
   THIS WAS DONE SO THAT THE SAME DATA BASE CAN BE USED FOR THIS
   PROGRAM AS FOR OTHERS, I.E. ONES THAT USE DATA NOT NORMALLY KWIC
   INDEXED.

5) THE I.D. DELIMITER CHARACTER. (IN THE DISTRIBUTION IT IS A '['.)

6) THE IDENTIFICATION NUMBER TO BE ASSOCIATED WITH THE ITEM. THE
   MAXIMUM LENGTH OF THIS FIELD IS ALSO ADJUSTABLE AT ASSEMBLE TIME.
   IN THE DISTRIBUTION IT IS 10 DIGITS.
   WARNING! USE NO SPACES OR TABS IN THIS FIELD.

7) THE END OF ITEM DELIMITER. IN THE DISTRIBUTION IT IS A ']'.

NOTE 1: SINCE A CARRIAGE RETURN LINE FEED IS IGNORED, TO CONTINUE A
        WORD ON ANOTHER LINE THE USER MERELY TYPES THE REST OF THE
        WORD WITH NO SPACES. BUT IF HE WISHES TO DELIMIT THE WORD
        WITH A SPACE HE MUST TYPE IT EITHER AT THE END OF THE LINE OR
        AT THE BEGINNING OF THE CONTINUATION LINE.

        EX:  THE OHIO STATE UNIVE
             RSITY               (CONTINUATION OF SAME WORD)

             THE OHIO STATE
              UNIVERSITY         (TWO SEPARATE WORDS; NOTE THE SPACE
                                 THAT BEGINS THE CONTINUATION LINE)

NOTE 2: A SPACE AND A TAB ARE THE ONLY CHARACTERS WHICH DELIMIT A
        WORD FROM ITS NEIGHBOR. SEQUENTIAL SPACES OR TABS ARE REDUCED
        TO ONE SPACE ON THE LISTING.

NOTE 3: TWO CONVENTIONS HAVE BEEN USED IN THE TEST DATA WHICH YOU MAY
        WANT TO USE. THE FIRST IS TO PLACE ALL THE AUTHORS' NAMES IN
        PARENS. THIS WILL MAKE ALL THE AUTHORS' NAMES APPEAR IN ONE
        SPOT IN THE INDEX. THE SECOND CONVENTION IS TO USE A '/' IN
        FRONT OF ANY WORD WHICH IS NOT IN THE TITLE BUT WHICH YOU
        FEEL HAS VALUE AS AN INDEX TERM FOR THIS ITEM.

IMPLEMENTATION ON YOUR 10:

THERE ARE MANY ASSEMBLY PARAMETERS. THE ONES WHICH DIRECTLY AFFECT
YOUR INSTALLATION ARE AS FOLLOWS:

SWITCH OR   DEFAULT   DESCRIPTION OR
VARIABLE    VALUE     ACTION TAKEN
____________________________________________________________

LPTSIZ      132       THE WIDTH OR DESIRED WIDTH OF THE INDEX LINE.
                      YOU MAY WANT TO RESTRICT THIS FOR DUPLICATION
                      PURPOSES. IT MAY BE ANY EVEN NUMBER FROM 60 TO
                      THE WIDTH OF THE OUTPUT DEVICE LINE.

DELSRT      "="       DELIMITER FOR THE SORTED DATA FIELD (THE FIRST
                      FIELD DELIMITER). THIS MAY BE ANY CHARACTER
                      GREATER THAN A SPACE (OCTAL 40). PUT THE
                      CHARACTER IN DOUBLE QUOTES.

DELKEY      "["       DELIMITER FOR THE IDENTIFICATION FIELD. SAME
                      RESTRICTIONS AS FOR DELSRT. DON'T THINK YOU'RE
                      SMART AND USE THE SAME CHARACTER FOR ALL OR
                      SOME OF THE DELIMITERS.

DELEOL      "]"       DELIMITER FOR THE END OF THE ITEM (FOLLOWS THE
                      IDENTIFICATION FIELD). SAME RESTRICTIONS AS FOR
                      DELKEY.

MAXLIN      ^D50      NUMBER OF LINES PUT ON A PAGE (NOT INCLUDING
                      THE HEADER).

SIZWRD      ^D50      MAXIMUM NUMBER OF CHARACTERS IN ANY ONE WORD,
                      I.E. BEFORE A SPACE OR TAB. (THIS ALLOWS ALL
                      THOSE ALL TIME FAVORITES LIKE
                      ANTIDISESTABLISHMENTARIANISM.)

MAXSAM      ^D300     MAXIMUM NUMBER OF WORDS WHICH ARE NOT STOP
                      WORDS AND ARE IDENTICAL. THIS IS THE SIZE OF
                      THE HASH TABLE.

DEBUG       0         IF 1, MAKES A NON REENTRANT DEBUGGING VERSION.
                      (USED ONLY WHEN FIXING THE PROGRAM.)

REENT       1         IF 1, GIVES REENTRANT CODE. IF 0, MAKES A NON
                      REENTRANT VERSION.

IDLEN       ^D10      MAXIMUM SIZE OF THE I.D. FIELD.

FREQSW      1         IF 1, ASSEMBLES THE FREQUENCY LIST CODE. IF 0,
                      NO FREQUENCY LIST IS GENERATED.

NOTES AND RANDOM INFO:

1) DO NOT (NOT!) USE A STRING OF 5 OR MORE "_" CHARACTERS IN SEQUENCE
   ON ANY DATA FILE.

2) THE USER CANNOT SPECIFY A PPN IN A FILE SPECIFICATION.

3) DO NOT END THE STOP LIST WITHOUT A CARRIAGE RETURN LINE FEED.

   EX:  NOT   THIS^Z
                  ^ WRONG WAY

        BUT   THIS
              ^Z
              ^ CORRECT WAY

4) IF NO SEQUENCE NUMBERS ARE ON THE MASTER FILE, THE ERROR MESSAGES
   WILL NOT LOCATE THE LINES IN ERROR ON THE FILE BUT MERELY PRINT
   THE FACT THAT THEY EXIST.

5) THE SYSTEM RUNS UNDER 4NN72 OR BETTER MONITORS. (THERE SHOULD BE
   NO MONITOR RESTRICTIONS IF IT RUNS UNDER OUR MONITOR.)

6) ON OUR SYSTEM, USING ALL OF USER CORE (23K), WE CAN HOLD AND KWIC
   INDEX A LIBRARY CATALOGUE OF 4000 ITEMS.

7) USING THE SAME DATA (A SMALL AMOUNT) THIS PROGRAM HAS RUN FASTER
   ON THE 10 THAN ON OUR 370/165.

8) IF A WORD IN THE STOP LIST IS LONGER THAN 12 CHARACTERS IT WILL BE
   TRUNCATED IN THE LISTING BUT ITS VALUE WILL BE UNCHANGED.

ERROR MESSAGES:

THE FOLLOWING IS A LIST OF ERROR MESSAGES AND THEIR MEANINGS.
1) CANNOT INIT XXXXX DEVICE
   THE DEVICE SPECIFIED IN AN INPUT PARAMETER OR A DEFAULT
   SPECIFICATION WAS NOT CORRECT OR NOT AVAILABLE TO THE USER.

2) CANNOT FIND XXXXX FILE
   THE FILE SPECIFIED (TYPE 'XXXXX') COULD NOT BE FOUND.

3) CANNOT ENTER XXXXX FILE
   THE DIRECTORY ON THE DEVICE SPECIFIED TO WRITE THE 'XXXXX' LISTING
   ON WAS FULL.

4) ?READ ERROR ON 'XXXXX' FILE
   A DEVICE ERROR OCCURRED ON THE 'XXXXX' FILE WHILE READING.

5) ?WRITE ERROR ON 'XXXXX' FILE
   A DEVICE ERROR OCCURRED ON THE 'XXXXX' FILE WHILE WRITING.

6) ?MASTER FILE NO LONGER AVAILABLE
   THE PROGRAM RELEASES THE MASTER FILE FOR A SHORT PERIOD WHILE IT
   READS IN THE STOP LIST FILE. THIS IS SO THAT ON A DECTAPE SYSTEM
   THESE TWO FILES MAY BE ON THE SAME DRIVE. THIS ERROR OCCURS WHEN
   THE PROGRAM LOOKS FOR THE MASTER FILE THE SECOND TIME (AFTER THE
   STOP LIST IS READ IN) AND CANNOT FIND IT. THIS SHOULD NEVER
   HAPPEN. IF IT DOES, THE JOB BOMBS OFF.

7) ?FATAL UUO FAILURE -BADFAL-
   A CORE UUO FAILED WHILE DEALLOCATING CORE. THIS SHOULD BE AN
   IMPOSSIBLE CONDITION. THE JOB BOMBS OFF.

8) ?MAXIMUM SIZE WORD EXCEEDED WORD=CCCCCCCCCC
   A WORD LONGER THAN THE LENGTH SPECIFIED BY THE 'SIZWRD' ASSEMBLY
   CONSTANT WAS FOUND. THE JOB BOMBS OFF. THE 'CCCCCCCCCC' WILL BE
   THE WORD IN ERROR.

9) ?TOO MANY MATCHES FOR ARRAY WORD=CCCCCCCCC
   MORE THAN THE NUMBER OF IDENTICAL INDEX ITEMS SPECIFIED BY THE
   'MAXSAM' ASSEMBLY CONSTANT WERE FOUND. THE JOB BOMBS OFF. THE
   'CCCCCCCCC' WILL BE THE WORD WHICH OCCURRED TOO MANY TIMES.

10) ?CORE UUO FAILED--TRYING AGAIN
    IF THE CORE UUO FAILS WHILE TRYING TO READ IN DATA THIS MESSAGE
    IS PRINTED. 30 SECONDS LATER THE PROGRAM WILL TRY TO ALLOCATE THE
    CORE AGAIN. IT CONTINUES LOOPING TILL IT GETS THE CORE. THIS IS
    USEFUL ON NON SWAPPING SYSTEMS WHERE A USER CAN WAIT FOR THE CORE
    TO BECOME FREE.

11) ?ERROR IN LINE NNNNN---
    LINE NNNNN, OR ONE OF THE LINES NEAR IT, IS BAD. THE SPECIFIC
    ERROR FOLLOWS. IF ANY OF THESE ERRORS (THE ONES IN SECTION 11)
    OCCUR, THE KWIC INDEX AND FREQUENCY LIST ARE NOT GENERATED, ONLY
    THE STOP LIST.

    ---I.D. NUMBER TOO LONG
       THE IDENTIFICATION NUMBER IS LONGER THAN THE 'IDSIZ' ASSEMBLY
       CONSTANT ALLOWS. (THE I.D. LENGTH PARAMETER IS LISTED AS
       'IDLEN' UNDER 'IMPLEMENTATION ON YOUR 10'.)

    ---NO I.D. NUMBER FOUND
       MEANS JUST WHAT IT SAYS.

    ---NO SORT DELIM FOUND
       MEANS JUST WHAT IT SAYS.

    ---SYNTAX ERROR
       UNDIAGNOSABLE ERROR. (YOUR GUESS)

HAVING PROBLEMS:

IF YOU FIND ANY BUGS OR HAVE ANY SUGGESTIONS PLEASE USE THE ADDRESS
BELOW.

        G.B. MOERSDORF
        PDP-10 ROOM
        CALDWELL LAB.
        OHIO STATE UNIVERSITY
        COLUMBUS, OHIO 43210
        614-422-8039
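
ILLUSTRATIVE SKETCH OF THE MASTER FILE FORMAT:

THE FOLLOWING PYTHON SKETCH IS NOT PART OF THE DISTRIBUTION AND IS
NOT THE ASSEMBLY CODE OF THE PROGRAM. IT ONLY ILLUSTRATES THE MASTER
FILE ITEM FORMAT AND THE USE OF THE STOP LIST AS DESCRIBED IN THIS
DOCUMENT, USING THE DISTRIBUTED DELIMITERS ('=', '[', ']') AND I.D.
LENGTH (10). THE FUNCTION NAMES, THE SAMPLE TITLES, THE SORT ORDER
WITHIN A KEYWORD AND THE LINE LAYOUT ARE ALL ASSUMPTIONS MADE FOR THE
EXAMPLE. SEQUENCE NUMBERS ARE ASSUMED TO HAVE BEEN REMOVED ALREADY.

```python
from collections import Counter

# Distribution defaults from the parameter table (DELSRT, DELKEY, DELEOL, IDLEN).
DELSRT, DELKEY, DELEOL, IDLEN = "=", "[", "]", 10


def parse_master(text):
    """Return (sort_field, ident) pairs from master-file text (no sequence numbers)."""
    text = text.replace("\r", "").replace("\n", "")  # CR LF is ignored completely
    items, pos = [], 0
    while (end := text.find(DELEOL, pos)) >= 0:
        raw, pos = text[pos:end], end + 1
        if DELSRT not in raw:
            raise ValueError("NO SORT DELIM FOUND")
        field, rest = raw.split(DELSRT, 1)  # text after DELSRT is ignored data
        ident = rest.split(DELKEY, 1)[1] if DELKEY in rest else ""
        if not ident:
            raise ValueError("NO I.D. NUMBER FOUND")
        if len(ident) > IDLEN:
            raise ValueError("I.D. NUMBER TOO LONG")
        # Runs of spaces and tabs are reduced to one space.
        items.append((" ".join(field.split()), ident))
    return items


def kwic(items, stop_words):
    """Build (keyword, left, right, ident) entries plus keyword frequencies."""
    stop = set(stop_words)
    entries = []
    for field, ident in items:
        words = field.split()
        for i, word in enumerate(words):
            if word in stop:
                continue  # stop-list words never become index terms
            entries.append((word, " ".join(words[:i]), " ".join(words[i:]), ident))
    # Assumed ordering: by keyword, then by the text that follows it.
    entries.sort(key=lambda e: (e[0], e[2]))
    return entries, Counter(e[0] for e in entries)


def format_line(entry, width=60):
    """Assumed layout: keyword starts at mid-line, I.D. follows the text."""
    _, left, right, ident = entry
    half = width // 2
    return f"{left[-(half - 1):]:>{half - 1}} {right[:half]:<{half}} {ident}"


if __name__ == "__main__":
    # Hypothetical sample data; a word is continued across the second line break.
    master = (
        "COMPUTER PROGRAMMING FOR THE\r\n PDP-10=1970 DEC[101]\r\n"
        "THE ART OF PROGRAMM\r\nING=ADDISON[102]\r\n"
    )
    entries, freq = kwic(parse_master(master), ["A", "AN", "FOR", "IN", "OF", "THE"])
    for entry in entries:
        print(format_line(entry))
    for word, count in sorted(freq.items()):
        print(word, count)
```

THE SKETCH HOLDS ALL ITEMS IN MEMORY BEFORE SORTING, WHICH MIRRORS
THE REQUIREMENT THAT THE WHOLE MASTER FILE FIT IN CORE. IT MAKES NO
ATTEMPT TO IMITATE THE HASH TABLE SIZED BY 'MAXSAM'.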