/*****************************************************************************
    spell.c         spelling checker V2.3
    copyright Harold Z. Bencowitz
    changed most recently 13-feb-87
******************************************************************************

description:
    spell is a spelling checker written in whitesmith's C which runs on
    rt11 and tsx+. it has been tested on v5.3 and v6.01 respectively. words
    in the input file are compared to one or more dictionaries (files of
    alphabetized words) and an alphabetized list of the unmatched words is
    sent to the output file(s). the output list can list each word as many
    times as it is used or optionally only once. spell can be used as a
    tool to alphabetize a list of words (eg a dictionary) or to produce an
    alphabetized list of the words in a text file without comparing to a
    dictionary.

operating instructions:
    instructions are also explained by the help (/h) command. the operator
    is prompted by '*' from the rt11 command string interpreter (csi). in
    response enter:

    [outfile1][,outfile2][,outfile3]=infile1[,infile2][,infile3][/o][/p]

    where there can be 0-3 output files (can be tt:, lp: etc) and 1-6 input
    files. if none is specified the output file default is *.spl where the
    name is that of input file 1. input files 2-6 are optional dictionaries
    (in addition to the main dictionary sy:spell.dic). the default input
    filetype is .wrd for both the file being checked and user dictionaries.
    note that if a device name is specified in the output or input file
    lists, that device is used as the default device for any other files in
    the same list. the default dictionary is sy:spell.dic. the options can
    be combined freely, except that with /h (help) the rest of the command
    is ignored.
    other options are:
        /d  do not use the default dictionary
        /u  unique - output file contains unique words, ie only one copy
            of each unmatched word
        /r  runoff - ignores lines in the input file starting with '.'
        /i  input file is a word list, ie one word per line
        /s  shell sort - much faster for input files already sorted or
            nearly sorted, eg a dictionary

    spell can be used (if no dictionary specified and /d option) to convert
    a text file to an alphabetized list of the words contained or to
    alphabetize a list of words. to alphabetize a word list (eg a user
    dictionary) use options /d/s/i/u.

possible future changes:
    a display in context feature, possibly with automatic dictionary
    updates, is under development.

revision history:
    v1.1 completed 16-oct-83
    v1.2 completed 06-sep-86
        minor code changes for compatibility with newer versions of
        Whitesmith's
    v2.0 completed 31-oct-86
        a complete revision with most of the code rewritten. the basic
        algorithm was completely changed to read input file into memory
        and access dictionaries from disk (opposite to v1.2). bugs fixed
        in input of hyphenated words. bugs fixed in handling of runoff
        files. default output filetype added. timing added. quick sort
        added. faster copy routine added. char/word optimized. default
        dictionaries combined into one. output routine changed. new
        faster input routine to read dictionary files.
    v2.1 completed 11-dec-86
        all input/output routines rewritten. external sort (merge) added.
        words less than 2 characters excluded. enter/leave and other
        changes for improved error handling.
    v2.2 completed 19-dec-86
        /i option for faster input of word lists. better quicksort.
        default dictionary compressed. bugs in getword() and hgetc()
        fixed. strip out common words in load().
    v2.3 completed 13-feb-87
        use of memory altered to use all of memory without fixed
        char/word ratio (pointer array grows down from memtop). bug in
        sp_out() fixed. skip most dictionary word comparisons. bugs in
        hopen() and merge() fixed.

limitations:
    the default dictionary must be on SY: and in compressed format. before
    compression it must be in the user dictionary format. user dictionary
    files must be alphabetized, all in lower case, and each word separated
    by <cr><lf>, including the last word. only a - z and ' characters may
    be used. user dictionaries are compressed into dictionaries using
    program compd. the instructions for this program are in on-line help.

    the output list of words (from spell) is always alphabetized and can
    not be obtained in order of appearance. uppercase letters are converted
    to lower case before comparing to the dictionaries. words can not be
    displayed in context.

    there is no limit to the size of the input file and the dictionary
    files other than available disk space. there is no specific terminal
    requirement.

installation and building:
    compile spell.c and link with hclib.obj (my c library) and clib
    (whitesmith's c library). to install, place spell.sav wherever you wish
    and the default dictionary (spell.dic) on the system disk (sy:).

    spell.dic can be replaced by any ascii word list, provided the list is
    compressed using program compd.sav. the list must contain only
    lower case alphabetic characters (a-z) and the apostrophe character
    ('), with each word separated by carriage return-line feed (ie '\n'),
    and must be in alphabetical order.

    it is probably easier to use this default dictionary and either add
    words to it as needed or else put additional words into special user
    dictionaries (format as above but not compressed), although if a large
    user dictionary is to be used it may be faster to add the words to
    spell.dic since a compressed dictionary is read faster.

implementation notes:
    the program sizes available memory at run time and uses all of it to
    store words. as the input file is read it is parsed to remove all
    characters but ' and a-z. upper case letters are converted to lower
    case. hyphenated words at the end of a line are joined. hyphenated
    words within a line are treated as two separate words.
    runoff '.' commands are ignored if option /r is selected.

    the input words are loaded into memory. unless /d is specified each
    input word is compared to a list of the most common english words and
    not loaded if matched. words of less than 2 characters are also not
    loaded. the number of words and characters loaded does not include
    words rejected because they matched a common word or were only one
    character long. if the input file is a word list it will be loaded
    faster with the /i option, which uses getword() instead of parsing the
    file a character at a time.

    the words are then alphabetized. quick sort is used but shell sort
    (option /s) is twice as fast if the words are close to alphabetical.
    if option /u is selected, duplicate words are removed.

    remaining words in memory are then compared to the default dictionary
    (unless option /d selected) and any other dictionaries specified. the
    routine which reads the dictionary files makes very strict assumptions
    about the format of the file as described above. dictionary files must
    only contain a-z [lower case], ', <cr>, and <lf>. unless /d or /i is
    selected, the input words are compared to a list of common words and
    rejected if matched before loading them into memory.

    spell can be used as a tool to prepare dictionaries since its output is
    in the correct format. the output is written to a temporary disk file.
    this file is copied to the specified output files or renamed if on the
    same disk.
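
    the input parsing rules described in these notes can be sketched as
    follows. this is only an illustration in present-day standard C, not
    the code used by spell: the names next_word() and word_char() are
    hypothetical, the text is assumed to be already in memory as a
    NUL-terminated string, and the common-word list and the /r and /i
    options are left out.

```c
#include <ctype.h>
#include <stddef.h>

// hypothetical helper: may c appear inside a word (a letter or ')?
static int word_char(int c)
{
    return c == '\'' || isalpha((unsigned char)c);
}

// copy the next word from *src into out, converted to lower case.
// cap is the size of out including the terminating NUL (cap >= 1).
// a '-' directly followed by a line end joins the two halves of a
// word; any other non-word character ends the word. words of fewer
// than 2 characters are skipped. returns the word length, or 0 when
// the text is exhausted. *src is advanced past the text consumed.
size_t next_word(const char **src, char *out, size_t cap)
{
    const char *p = *src;
    size_t n = 0;

    for (;;) {
        while (*p != '\0' && !word_char(*p))
            p++;                        // skip separators
        if (*p == '\0')
            break;                      // no more words (n is 0 here)
        for (;;) {
            if (word_char(*p)) {
                if (n + 1 < cap)        // silently truncate long words
                    out[n++] = (char)tolower((unsigned char)*p);
                p++;
            } else if (p[0] == '-' && p[1] == '\n' && word_char(p[2])) {
                p += 2;                 // hyphen at end of line: join
            } else if (p[0] == '-' && p[1] == '\r' && p[2] == '\n'
                       && word_char(p[3])) {
                p += 3;                 // same, with <cr><lf> line ends
            } else {
                break;                  // end of this word
            }
        }
        if (n >= 2)
            break;                      // accept the word
        n = 0;                          // too short: look for the next
    }
    out[n] = '\0';
    *src = p;
    return n;
}
```

    called repeatedly on the text "exam-<lf>ple well-known", this sketch
    yields "example", "well" and "known": only a hyphen at the end of a
    line joins the two halves of a word.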