                        SCRABBLE, ANYONE?

                          John Krahmer


                          INTRODUCTION
     My wife is a word game wizard.  She can regularly beat me at 
Scrabble, Boggle, Anagrams, Crossword Puzzles or any other game 
requiring the manipulation of letters.  I don't even bother 
watching Wheel of Fortune with her; when she's in the room, the 
show doesn't even have a Neilson rating.  Perhaps because of this 
untarnished record of consecutive losses, I occasionally gave 
some thought to the idea of a computer program that would give me 
enough help to at least be a respectable opponent.  She never had 
any objection to my use of such a program, probably because of 
her boredom with playing against an inferior, but I always ran 
into the stumbling block of combinations and permutations.


                           THE PROBLEM
     It is the theory of combinations and permutations that says 
a poker player has only one chance in 2,598,960 of getting a 
straight flush in the first five cards that are dealt (and this 
is without regard to the order in which the cards are dealt).  As 
applied to words, the order of the letters becomes important as 
well and the number of permutations takes a quantum leap.  Not 
only would the simple generation of the permutations take a long 
time but, since not all sequences of letters are also words, 
there is the added difficulty of distinguishing words from non-
words.  I toyed with the idea of using a spelling checker to 
identify legitimate words from a list of letter combinations, but 
the number of possible permutations made that idea ridiculous, 
not to mention the difficulty of modifying a spelling checker to 
find words that were spelled correctly instead of incorrectly.  
The problem seemed insurmountable until I happened upon the 
ingenious concept of "word signatures."  (See Scientific 
American, Vol. 251, No. 4, pp. 20-21 (Oct. 1984)).


                          THE SOLUTION
     A "word signature" is simply the alphabetical re-arrangement 
of the letters contained in any given word.  For example, the 
word signature for the word, "reaction," is "aceinort."  As it 
happens the word, "creation," has the same word signature.  Thus, 
with the combination of letters forming the word signature 
"aceinort," at least two legitimate words can be made by using 
all eight letters.  If less than eight letters are used, the 
words "certain, nectar, action, carton, ornate, ration, erotic, 
canoe, ocean, enact, trace, tonic, cane, race, care, iron and 
torn" can be formed (among others).

     The idea of word signatures meant that I did not have to be 
concerned with all possible combinations after all.  I only had 
to know a list of letters in alphabetical order to determine any 
word or words that could be made with those letters, provided I 
had a database of words linked to the word signatures.  It began 
to look like dBASE might be the answer to the problem of creating 
a viable Scrabble aid.  By combining a data base of legitimate 
words with an index keyed to word signatures, I could take any 
given combination of letters and quickly access the words by 
entering the word signature and using the dBASE FIND command.  My 
difficulty now became how to create a database of words without 
going through the drudgery of typing in a long list of words and 
word signatures.


                      CREATING THE DATABASE
     The simplest way of doing this seemed to be by using a 
spelling checker to output the contents of its dictionary to a 
disk file but, alas, my spelling checker would only send its 
contents to a printer, not to a disk file.  Undaunted by this 
limitation, I bought some plugs and cable from a local computer 
store and wired my two Osborne CP/M machines together and 
"printed" the dictionary from one computer to a disk file on the 
other (MS-DOS users might be able to use the redirection 
capability of DOS to do the same thing on one machine).  The 
resulting disk file was then read into a dBASE II .DBF file with 
a single field named "Word" by using the Standard Data Format 
(SDF) option.  

     Because my purpose was to create a database that would be 
useful in playing Scrabble, I arbitrarily DELETED all words of 
more than eight letters because a player normally has only seven 
letters in a "hand" and can usually link those letters to only 
one letter already played on the game board.  There seemed little 
need for words of more than eight letters.  Words of three 
letters or less were also DELETED because they are easy to find 
and there was no point in unnecessarily increasing the size of 
the database.  The database ended up with 12,759 words.  

     Since the game of Scrabble is really a varient of the class 
of word games described as "anagrams," the name ANA12759.DBF 
suggested itself as a good way to identify the file as an anagram 
database with a specified number of words contained as individual 
records.  The name could then be changed to accurately reflect 
the addition of more words and help insure using the most current 
version of the database.  


                 GENERATING THE WORD SIGNATURES
     As the next step, I APPENDED the words to another dBASE II 
file containing two fields with a length of eight characters 
each, one named "Word" and the other named "Key," and ran the 
12,759 words in the resulting .DBF file through a dBASE II 
command file to generate the word signatures and REPLACE the 
"Key" field in the data base with the proper signature.  This 
command file appears below in Figure 1 under the title 
SIGSORT.CMD.

     While the command file is fairly short, it has some 
interesting aspects because of the nested loops that perform the 
internal sorting of letters within strings, a technique that 
might be useful for other applications.  Following the standard 
lines to clear the system, set defaults and use a named database, 
the command file stores the ASCII value of a capital "A" (65 
decimal) to a memory variable called WHATCHAR and the length of 
the word to be sorted to a memory variable called LENGTH.  A 
space character was stored to the memory variable MKEY just to 
establish that location.  I added a non-functional line at that 
point to write a screen message that all variables were loaded 
just to keep track of the processing sequence (debugging).  The 
next several lines run through two nested loops that do two main 
things.  First, so long as the length of the word is not 
exceeded, the command file keeps testing each letter in the word 
in sequential order against the letter stored in WHATCHR to 
determine if the letter appears one or more times in the word.  
If the letter does appear, it is stored in the first non-blank 
location in the MKEY memory variable.  Second, when the length of 
the word is exceeded, the program increments the ASCII value of 
the search letter by one (i.e., "A" increments to "B" by adding 1 
to the decimal value) and the sequential search is performed 
again with the last search being for the letter "Z" (90 decimal).  
To speed up the processing a little bit, a CASE statement is used 
to test the word with the dBASE @ function (i.e., "does this 
character appear in this string" function) and a negative 
response immediately causes the command file to move on to the 
next letter in the alphabet.  The last part of the program takes 
the rearranged word and replaces the word signature with the 
alphabetized letters contained in MKEY.

     This command file will be useful if you create your own word 
database and need to generate the necessary word signatures.  It 
will also be useful if you want to add large numbers of words to 
the original database in ANA12759.DBF and need to sort the 
letters of each word before you APPEND them to the original 
database.  (The original database and the programs described in 
this article are available on disk in the FOG Library.  One small 
word of warning is necessary about the use of SIGSORT: It takes 
about six seconds to sort each word into a word signature.  If 
you multiply six seconds by several thousand words, it is readily 
apparent that the processing time must be measured in hours for a 
sizable database.  Fortunately, the processing time can be 
calculated in advance and a database can be split into separate 
parts for overnight processing, with the parts reassembled into a 
whole when all of the word signatures have been generated.  It 
took about 24 hours to generate the signatures for ANA12759.DBF.  
An interesting experiment would be a comparison of processing 
times for SIGSORT with compiled versions of dBASE .CMD or .PRG 
files on different hardware configurations.


           USING THE DATABASE AND THE WORD SIGNATURES
     The last step was to write a command file to USE ANA12759 to 
FIND each word signature that would form a word contained in any 
given combination of letters.  This command file appears below in 
Figure 2 under the name ANAGRAM.CMD.  Oddly enough this was the 
easiest part of the job because the command file was highly 
repetitive in its search logic.  The beginning point, of course, 
was to index the database on KEY (the word signature) to take 
advantage of the speed of the dBASE FIND function.  I originally 
did this directly from the dot prompt, but later modified the 
command file to generate the index file automatically if it is 
missing when the command file is run.  Because the database is 
one that is likely to grow in size by the addition of more words 
and word signatures over a period of time, the only lines that 
need to be changed to reflect these additions appear in lines 4 
and 5 of the command file.  If you have added words to the 
database without having the index file in USE, just change the 
name of the database and erase the outdated index file.  The 
command file will ask if you want a new index created and a "Yes" 
answer will generate the file for you.

     After selecting the database and index file for USE (or 
generating the index file if needed), the command file goes 
through a series of error traps and tests.  The first IF/ENDIF 
tests for the existence of the database itself, primarily to be 
sure the name has not been changed from that expected by the 
command file.  The next IF/ENDIF tests for the index file and 
asks if the user wants to create the index file if it is not 
found (again, a simple failure to change the name of the index 
file may be the reason the file is not found).  Following these 
two file verifications, a long command string is stored to the 
memory variable TEST to determine if the letters that are entered 
are already in alphabetical order.  The program runs a little bit 
faster if the letters are entered alphabetically, but this is not 
critical to the operation.

     One more error test checks the length of the entered string 
to be sure it is between four and eight letters long, to be 
compatible with the word lengths contained in the database.  At 
this point, the program begins a word signature search if the 
search string is already in alphabetical order or runs through an 
alphabetizing loop if the letters are not already alphabetized.

     The search pattern itself is simply a process of storing 
each letter in the search string to a separate memory variable 
(with periods used for padding) and then using the FIND command 
to test each possible pattern to see if the database contains a 
word formed from the permutation.  For ease of calculating 
patterns, these memory variables are named "A", "B", "C", etc.  
This violates the usual dBASE custom of naming memory variables 
with meaningful names, but the need to manipulate the possible 
permutations made the use of such names useful in the context of 
this program.  The search sequence itself is tedious, but simple.  
The search for a string of eight letters is illustrative.  If the 
eighth letter is not a period (causing an abort of that search 
sequence), the program attempts to FIND a word signature that 
matches the letters contained in memory variables "A" through "H" 
by combining those variables in a macro sequence (i.e., 
&A&B&C...etc.).  If a match is found, the word formed by the 
letters in the word signature is displayed.

     The program then proceeds through a search for seven letter 
words by doing a FIND for all combinations of seven letters that 
can be derived from eight letters, provided that the seven 
letters are kept in alphabetical order.  It is at this point that 
the concept of word signatures proves its worth in word searches.  
Because the order of the letters is important in forming a word, 
a complete set of permutations would require fifty-six searches.  
By using word signatures, the number of searches is reduced to 
eight.  As a look at the program will demonstrate, this reduction 
factor becomes even more important as the searches for six, five 
and four letter combinations proceeds.  Without the underlying 
theory of word signatures, the program would quickly become 
unmanageable.


                         PROCESSING TIME
     If this program does nothing else, it certainly proves the 
speed of the dBASE II FIND command.  A search for all possible 
words of four to eight letters that can be formed from the word 
signature "aceinort" takes about two and one-half minutes on an 
Osborne 4 with a Westwind/Trantor hard disk using the ANA12759 
database.  I do not have access to other equipment for comparison 
tests, but such comparisons would be of interest.  Compiled 
versions of dBASE would no doubt be much faster.  The search time 
will increase when more words can be formed from a given word 
signature because the program must perform a screen write as an 
intervening step between searches.

     Pascal experts might be able to use the idea of word 
signaures to develop a "stand-alone" program that performs 
similar functions.  I wonder how fast that would be?


                           ODD RESULTS
     When running the program, you will probably find that the 
same word sometimes appears more than once in the list of 
possible words.  This is not an error, but a function of the 
search process that treats a letter in a particular position in a 
word signature as being different from the same letter in a 
another position in the same word signature.  For example, the 
word "data" would show up three times if the search string were 
entered as "database" because the word signature is "aaabdest" 
and a search for four letter words would sequentially identify 
the various "a" combinations.


                          PROGRAM USES
     Besides being a useful tool for Scrabble players like me who 
tend to read words like "aceinort" literally, the program is 
great for solving anagram puzzles.  It has also been of interest 
in testing letter combinations derived from various word games.  
Merely having a database composed of words has been useful in 
finding words where letter locations are important by using the 
the dBASE "LIST for $(WORD,<POSITION 1>)="?" .AND. 
$(WORD,<POSITION 2>)="?" .AND. etc.,"  While this is fairly slow, 
it does solve Wheel of Fortune and Hangman puzzles in a 
reasonable time.

     Beyond such miscellaneous uses, the program has been a lot 
of fun for my eight year old daughter (who happens to be named 
Ana, perhaps the best reason of all for naming the database 
ANA12759).  She loves reading and playing word games and has 
enjoyed using the program to solve anagrams and to find words 
that can be made from different letter combinations.  I have no 
way to measure a difference, but I think her spelling has 
improved by using the program as a game.

     One more use, equally difficult to measure, is the simple 
fact that some programs allow people to enjoy themselves in ways 
that were difficult or impossible before personal computers were 
generally available.  There is something intriguing about typing 
in a random sequence of letters and letting the computer list the 
words that can be formed from the combination of letters.  I have 
a suspicion that semantics, linguistics analysis and encryption 
theory can somehow be tied into the concept of word signatures.  
It is an idea that deserves further exploration.

     I hope you enjoy the program.  By the way, my wife can still 
beat me at Scrabble, but the scores are closer.
