/*****************************************************************************

	spell.c 	spelling checker			V2.3
			copyright Harold Z. Bencowitz
			changed most recently 13-feb-87

******************************************************************************

description:

	spell is a spelling checker written in whitesmith's C which
	runs on rt11 and tsx+. it has been tested on v5.3 and v6.01
	respectively. words in the input file are compared to one
	or more dictionaries (files of alphabetized words) and an
	alphabetized list of the unmatched words is sent to the output
	file(s). the output list can list each word as many times as it
	is used or optionally only once. spell can be used as a tool to
	alphabetize a list of words (eg a dictionary) or to produce an
	alphabetized list of the words in a text file without comparing
	to a dictionary.

operating instructions:

	instructions are also explained by the help (/h) command. the
	operator is prompted by '*' from the rt11 command string
	interpreter (csi). in response enter:

     [outfile1][,outfile2][,outfile3]=infile1[,infile2]...[,infile6][/opt]...

	where there can be 0-3 output files (can be tt:, lp: etc) and 1-6
	input files. if no output file is specified the default is
	*.spl, where the name is that of input file 1. input files 2-6 are
	optional dictionaries (in addition to the default dictionary
	sy:spell.dic). the default input filetype is .wrd for both the
	file being checked and user dictionaries. note that if a
	device name is specified in the output or input file lists, that
	device is used as the default device for any other files in the
	same list.
	the options can be combined in any combination, except that with
	/h the rest of the command is ignored:

	/h	help
	/d	do not use the default dictionary
	/u	unique - output file contains only one copy of each
		unmatched word
	/r	runoff - ignores lines in the input file starting with '.'
	/i	input file is a word list, ie one word per line
	/s	shell sort - about twice as fast as quick sort for input
		files already sorted or nearly sorted, eg a dictionary

	spell can be used (with the /d option and no dictionary specified)
	to convert a text file to an alphabetized list of the words it
	contains or to alphabetize a list of words. to alphabetize
	a word list (eg a user dictionary) use options /d/s/i/u.
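
	as an illustration of why /s suits input that is already nearly
	sorted, here is a minimal sketch of a shell sort in standard c.
	it is not the routine used in spell.c; the name shell_sort and
	the use of an array of word pointers are assumptions made for
	the example.

```c
#include <string.h>

// hypothetical sketch, not the spell.c routine: shell sort of an array
// of word pointers, with the gap halved on each pass.
void shell_sort(const char *v[], size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        for (size_t i = gap; i < n; i++) {
            const char *w = v[i];
            size_t j = i;
            // gapped insertion: move larger words up until w fits
            while (j >= gap && strcmp(v[j - gap], w) > 0) {
                v[j] = v[j - gap];
                j -= gap;
            }
            v[j] = w;
        }
    }
}
```

	on a list that is already in order the inner loop exits at once
	for every word, so each pass costs only one comparison per word.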

possible future changes:

	a display in context feature, possibly with automatic dictionary
	updates, is under development.

revision history:

	v1.1 completed 16-oct-83	

	v1.2 completed 06-sep-86	minor code changes for compatibility
					with newer versions of whitesmith's C

	v2.0 completed 31-oct-86	a complete revision with most of
					the code rewritten. the basic
					algorithm was completely changed to
	read input file into memory and access dictionaries from disk
	(opposite to v1.2). bugs fixed in input of hyphenated words. bugs
	fixed in handling of runoff files. default output filetype added.
	timing added. quick sort added. faster copy routine added. char/word
	optimized. default dictionaries combined into one. output routine
	changed. new faster input routine to read dictionary files.

	v2.1 completed 11-dec-86	all input/output routines rewritten.
					external sort (merge) added. words
	less than 2 characters excluded. enter/leave and other changes for
	improved error handling.

	v2.2 completed 19-dec-86	/i option for faster input of word
					lists. better quicksort. default
	dictionary compressed. bugs in getword() and hgetc() fixed.
	strip out common words in load().

	v2.3 completed 13-feb-87	use of memory altered to use
					all of memory without fixed
	char/word ratio (pointer array grows down from memtop).	bug in
	sp_out() fixed. skip most dictionary word comparisons. bugs in
	hopen() and merge() fixed.

limitations:

	the default dictionary must be on SY: and in compressed format.
	before compression it must be in the user dictionary format.
	user dictionary files must be alphabetized, all in lower case, and
	each word separated by <cr><lf> including the last word. only a - z
	and ' characters may be used. a dictionary in user format is
	compressed into the default dictionary format using program compd.
	the instructions for this program are in on-line help. the output
	list of words (from spell) is always alphabetized and cannot be
	obtained in order of appearance.
	uppercase letters are converted to lower case before comparing to
	the dictionaries. words cannot be displayed in context. there is
	no limit to the size of the input file and the dictionary files
	other than available disk space. there is no specific terminal
	requirement.
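
	the user dictionary format described above can be checked with
	a short routine. the following is a minimal sketch in standard
	c, not code from spell.c: the name dict_ok, the 64 character
	word limit, and the use of strcmp() ordering (which places '
	before the letters) are all assumptions made for the example.

```c
#include <string.h>

// hypothetical helper, not part of spell.c: returns 1 if buf holds a
// well-formed user dictionary as described in the limitations, else 0.
int dict_ok(const char *buf)
{
    char prev[64] = "";   // previous word, used for the ordering check
    char word[64];

    while (*buf != '\0') {
        size_t n = 0;
        // collect one word: only a-z and ' are allowed
        while ((*buf >= 'a' && *buf <= 'z') || *buf == '\'') {
            if (n + 1 >= sizeof word)
                return 0;             // too long for this sketch
            word[n++] = *buf++;
        }
        word[n] = '\0';
        // every word, including the last, must end with <cr><lf>
        if (n == 0 || buf[0] != '\r' || buf[1] != '\n')
            return 0;
        buf += 2;
        // words must be in ascending order (assumed: strcmp order)
        if (strcmp(prev, word) > 0)
            return 0;
        strcpy(prev, word);
    }
    return 1;
}
```

	the caller is assumed to have read the whole file into a
	nul-terminated buffer without any end of line translation.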

installation and building:

	compile spell.c and link with hclib.obj (my c library) and clib
	(whitesmith's c library). to install, place spell.sav wherever you
	wish and the default dictionary (spell.dic) on the system disk
	(sy:). spell.dic can be replaced by any ascii word list, provided
	the list is compressed using program compd.sav. the list must
	contain only lower case alphabetic characters (a-z) and the
	apostrophe character ('), with each word separated by
	carriage return-line feed (ie '\n') and in alphabetical order.
	it is probably easier to use this default dictionary and
	either add words to it as needed or else put additional words
	into special user dictionaries (format as above but not compressed),
	although if a large user dictionary is to be used it may be faster
	to add the words to spell.dic since a compressed dictionary is read
	faster.

implementation notes:

	the program sizes available memory at run time and uses all
	of it to store words. as the input file is read it is parsed
	to remove all characters but ' and a-z. upper case letters
	are converted to lower case. hyphenated words at the end of
	a line are joined. hyphenated words within a line are treated
	as two separate words. runoff '.' commands are ignored if
	option /r is selected. the input words are loaded into memory.
	unless /d or /i is specified each input word is compared to a
	list of the most common english words and not loaded if matched.
	words of less than 2 characters are also not loaded. the number
	of words and characters loaded will not include those rejected
	because of a match or because they are only one character long.
	if the input file is a word list it will be loaded faster with
	the /i option which uses getword() instead of parsing the file a
	character at a time. the words are then alphabetized. quick sort
	is used but shell sort (option /s) is twice as fast if the words
	are close to alphabetical. if option /u is selected, duplicate
	words are removed. remaining words in memory are then compared
	to the default dictionary (unless option /d selected) and any
	other dictionaries specified. the routine which reads the
	dictionary files makes very strict assumptions about the format
	of the file as described above. dictionary files must only
	contain a-z [lower case], ', <cr>, and <lf>.
	spell can be used as a tool to prepare dictionaries since its
	output is in the correct format. the output is written to a
	temporary disk file. this file is copied to the specified
	output files or renamed if on the same disk.
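
	the steps above (load, alphabetize, remove duplicates, compare
	to a dictionary) can be sketched in standard c as follows. this
	is an illustration under stated assumptions, not the code in
	spell.c: all names are hypothetical, the library qsort() stands
	in for the program's own sorts, words are held in a fixed array
	rather than in all of memory, the dictionary is an in-memory
	sorted array rather than a disk file, and the common word list
	and runoff handling are left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 32   // longest word kept; an arbitrary limit for this sketch

// split text into lower case words of a-z and ', rejoining a word that is
// hyphenated at the end of a line; words of less than 2 characters are
// dropped. returns the number of words stored (at most max).
size_t load_words(const char *text, char words[][MAXLEN], size_t max)
{
    size_t count = 0, n = 0;

    if (max == 0)
        return 0;
    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c) || c == '\'') {
            if (n < MAXLEN - 1)
                words[count][n++] = (char)tolower(c);
            continue;
        }
        if (c == '-' && n > 0 &&
            (text[1] == '\n' || (text[1] == '\r' && text[2] == '\n'))) {
            text += (text[1] == '\r') ? 2 : 1;  // skip line end, word stays open
            continue;
        }
        if (n >= 2) {                // any other character ends the word
            words[count][n] = '\0';
            count++;
        }
        n = 0;
        if (c == '\0' || count == max)
            break;
    }
    return count;
}

static int cmp_words(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

// remove adjacent duplicates from a sorted list (the /u step)
size_t unique_words(char words[][MAXLEN], size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (kept == 0 || strcmp(words[kept - 1], words[i]) != 0) {
            if (kept != i)
                strcpy(words[kept], words[i]);
            kept++;
        }
    }
    return kept;
}

// one pass over two sorted lists: keep only the words not found in dict
size_t unmatched(char words[][MAXLEN], size_t count,
                 const char *const dict[], size_t ndict)
{
    size_t kept = 0, d = 0;
    for (size_t i = 0; i < count; i++) {
        while (d < ndict && strcmp(dict[d], words[i]) < 0)
            d++;                     // dictionary words before this one
        if (d < ndict && strcmp(dict[d], words[i]) == 0)
            continue;                // found in the dictionary
        if (kept != i)
            strcpy(words[kept], words[i]);
        kept++;
    }
    return kept;
}

// load, sort, optionally remove duplicates, then compare to the dictionary
size_t spell_sketch(const char *text, char words[][MAXLEN], size_t max,
                    const char *const dict[], size_t ndict, int unique)
{
    size_t count = load_words(text, words, max);
    qsort(words, count, MAXLEN, cmp_words);
    if (unique)
        count = unique_words(words, count);
    return unmatched(words, count, dict, ndict);
}
```

	because both lists are alphabetized, the comparison never moves
	backwards in the dictionary, so one sequential read of each
	dictionary is enough in this sketch.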