CIPHER: an encryption program Page 1 B. Z. Lederman CIPHER is a program which will encrypt and decrypt alpha-numeric data by adding a bias to every character. The bias is generated by a pseudo-random number generator which is initialized by a user selected key number. As the bias is different for each text character, the text is very effectively scrambled, and can be retrieved only by subtracting the bias. This is done by the same program: the user enters the same key number used for encryption, and selects the decode function. As presently configured, there are one billion key numbers for a given program configuration. The purpose of an encryption program is to take data and place it in a form where it's meaning cannot be easily determined by someone other than those for whom the data is intended. An effective encryption scheme is one which does not use a large amount of resource to implement (in this case resources would include time, CPU processing time, and storage space), but which results in an encrypted form which would be very difficult to break without the proper key. As no cipher is absolutly unbreakable, the goal is to make breaking the cipher more expensive (in time or resource) than the data is worth. The method selected here of adding a pseudo-random bias seperatly to each character meets these goals well: it can be done quickly by computer, it does not increase the size of the data stored, and the resulting data cannot be decoded by normal means. There are several methods for attempting to break a code. The most obvious is to count the distribution of characters, and compare this with the distribution of characters in normal english (e is the most common letter, t the next most common, etc.): this will not work with this method of encryption. Because the bias added comes from a pseudo-random number generator, and because such numbers are evenly distributed over the number range, the frequency distribution of the original text is lost. This can be seen by encrypting a message which consists of only one character, repeated many times. The encrypted message will consists of many different characters, all occuring about the same number of times. A more sophisticated method is to look for pairs of characters: for example, the combination "th" occurs frequently in english. With a pseudo-random number sequence, this pairing of letters also is lost, and cannot be used to break the code. (This is true if the sequence of numbers produced by the pseudo-random generator does not repeat itself until a long sequence of numbers is produced. All such generators eventually repeat, but if the repitition period is long compared with the length of the message, it cannot be detected. If the sequence were to repeat many times within a message, the same sequence of numbers might fall on an identical pair of letters, giving a clue to the method of encryption. Since good generators produce strings which are several thousands of numbers long before repeating, this is not likely to be a problem.) An unsophisticated method is by brute force: if the person breaking the message knows (or can guess) the method of encryption, then they could try every possible key, one at a time, and look to see if decodes a reasonable message. Even if the decoder has the proper pseudo-random number generator (and there are theoretically an infinite number of them), there are billions of possible keys for each generator, so brute force is not likely to work. CIPHER: an encryption program Page 2 B. Z. Lederman The success of the encryption scheme is obviously dependant upon the quality of the pseudo-random number generator. Note the "pseudo" qualifier: what is desired is to generate a sequence of numbers which are as evenly distributed over a range as possible, and where it is little correlation between adjacent numbers (i.e., the "pattern" or sequence of numbers does not repeat often); but it is also neccessary for the sequence to be predictable so that it may be repeated with the same key, otherwise the data could never be decoded. The closer the sequence comes to even distribution, and the longer the string of numbers before the sequence repeats, the more difficult it will be for the code to be broken. The generator supplied with Fortran-77 is quite good in both respects, and it can easily be replaced with a similar generator for additional security, or if the same scheme is to be implemented on many different computer systems or languages, where the generator supplied with each system or language would differ. Random number generators which are initialized by time of day or some other randomizing event are not acceptable, as the decoder would not be likely to hit the same time of day when decoding. The code is most likely to be broken if the person trying to decode a message has access to the same encryption program as was used to code the message, and can guess the correct key. If this program is being used to store department or company confidential information (such as personnel records, budget plans, etc.) on a system and it is intended to keep such information secure from other company employees, then the primary source of security is to keep the key number secure. Persons responsible for encrypting data should not use their telephone numbers, pager numbers, birth date, social security number, zip code, or any other easily guessed number as a key, as the employees from whom the data is supposed to be secure can try all of these obvious choices. Unfortunatly, all easily remembered numbers (such as 123456, 102938, 987654321, et.) should probably also not be used. There is also the problem that the key number should probably be recorded somewhere (as it will be just about impossible to retrieve the data without the key), but not where it is easily obtained by un-authorized personnel. This problem is not unique to data encryption, but also applies to such things as issuing passes to secure areas, and keys (the kind for doors), where ease of access must be balanced against security. With data encryption, the problem is potentially more serious, however. If you lose a door key you can call a locksmith (or if you are really desperate, you can break a window): but if you loose the key for this encryption program, it will be very difficult to retrieve the data, for the reasons described above. An alternative to adding the pseudo-random number to each character is to Exclusive-OR the number with the character. This potentially could be used to encrypt non-alphanumeric data. Another advantage is that the numbers could be computed in a range which has a maximum of 7 bits: this would allow the encryption of 7 bit characters to result in characters which are still 7 bits long, for transmission over lines where the 8th bit is required for parity. This also means that the method of encryption and decryption are identical, as the exclusive-or operation is symmetrical. Preliminary testing shows it should be as effective as the addition method, CIPHER: an encryption program Page 3 B. Z. Lederman but further testing is needed to see if it results in as random a character distribution in the encrypted text as the addition method.