CONVERT, COMPRESS AND RESTORE UTILITY USER MANUAL

The Convert, Compress and Restore Utility (CRU) allows you to:

     1)  Convert files between fixed and variable length records.
     2)  Convert carriage control types (FORTRAN, LIST, and NONE).
     3)  Convert between 8 column tabs and spaces.
     4)  Truncate or pad records.
     5)  Remove trailing blanks and tabs from records.
     6)  Select a range of pages for output.
     7)  Compress (or restore) records.

CRU is designed like a DEC utility, and can be used interactively or
by means of a command file. CRU allows up to 4 levels of indirect
command file specification.

0.1 CRU COMMAND FORMAT

The general format for entering CRU command lines is:

     outfile/sw=infile/sw

outfile
infile    Standard DEC file specification, which is of the form:

               dev:[ufd]filename.filetype;version

          If the output file is not specified, it defaults to the
          same file name and type as the input file. An input file
          MUST be specified.

/sw       One of the switches described in the next section.

0.2 CRU SWITCHES

Switch    Description                                      Default
--------  -----------------------------------------------  ---------
/DE       Deletes file after processing is complete. It    /-DE
          is valid only for the input file.

/UC       Uncompresses records which were previously       /-UC
          compressed by this utility. This is also valid
          only for the input file.

/CC:typ   Specifies the carriage control type desired for  /CC:LIST
          the output file. Valid types are LIST, FTN, and
          NONE.

/CM[:n]   Uses data compression on the output file.        /-CM
          Data file compression uses imbedded control
          bytes to contain a character count, and a bit
          flag which indicates whether the byte following
          is repeated, or whether the byte(s) are data
          values. Any time there are three or more
          successive identical bytes in the uncompressed
          record, they are converted to a control byte
          plus one byte which contains the value. The
          optional parameter n may be used to disable
          compression of the first n characters of the
          record. The default for n is 0.

/CS       Enables the conversion of spaces in the record   /-CS
          to the equivalent 8 column tabs, whenever such
          conversion would result in fewer characters in
          the record.

/CT       Enables the conversion of 8 column tabs in the   /-CT
          record to the equivalent spaces. This switch is
          complementary to the /CS switch.

/FO:rtyp  Specifies the record type of the output file.    /FO:V
          Legal values for rtyp are F (fixed length) and
          V (variable length). When fixed length output
          is selected, all records are padded or
          truncated if necessary, regardless of the state
          of the /PD and /TR switches. The record length
          may be specified (e.g. /FO:F:132); if not, the
          length of the first input record is used.

/LO       Enables the logging of the file being processed, /-LO
          and a report of blocks read and written after
          processing is complete.

/NC       Disables any carriage control conversions. By    /-NC
          default, CRU converts carriage control based on
          the attributes of the input file and the
          desired attributes of the output file.

/PD:o:n   Causes any records shorter than n characters to  /-PD
          be padded using the character specified by o
          (in octal). The default for o is 40 (space),
          and the default for n is 80 characters. This
          switch is ignored for fixed length output
          files, except for the pad character.

/PG:m:n   Enables page selection, such that only pages     /-PG
          from m through n inclusive are output. The
          default for m is page 1, and if n is not
          specified, all pages from m to the end of the
          file will be output.

/PS:n     Used in conjunction with the /PG switch to       /-PS
          enable page incrementing by counting lines. The
          page size (n) by default is 66 lines.

/RT       Enables the removal of trailing blanks and tabs  /-RT
          from output records. This is valid only for
          variable length output files.

/SP       Causes the output file to be spooled when the    /-SP
          processing is complete.

/TR:n     Causes any records longer than n characters to   /-TR
          be truncated.
          The default for n is 132. This switch is ignored
          for fixed length output files.

0.3 OPERATION

The various options which CRU can perform on the records are applied
in a specific order, and this order should be considered when
determining the desired output.

0.3.1 UNCOMPRESS

The first action taken is to uncompress input records if necessary.
Uncompression must be applied first since all other functions assume
that the record is not compressed.

0.3.2 CONVERT CARRIAGE CONTROL

Carriage control is converted next, provided it is not deselected by
the /NC switch. Carriage control conversions affect the record length
and therefore must be completed before the pad or truncate functions.

The goal of carriage control conversions is to preserve the appearance
of a text file when printed, while converting characters where
necessary in the records. For example, if a record from a file with
FORTRAN carriage control begins with a "0", the printed output is
double spaced. To convert this record to LIST carriage control, a null
record is first generated to skip a line, and then the "0" is removed
from the front of the original record and the record is output.

There are currently restrictions on converting FORTRAN records with
carriage returns or line feeds inhibited ($ and + control characters)
to LIST carriage control, and similar restrictions on converting NONE
(imbedded) carriage control to LIST carriage control. Because of these
restrictions, overprinting cannot be converted to LIST carriage
control. (This restriction could be removed with a little more work.)

The carriage control type of the input file is determined from the
attributes field of the input file FDB, and the appropriate conversion
is chosen accordingly. Should you have a file which has an incorrect
value in the record attributes field, the file should first be copied
using the /NC switch (to inhibit conversions) into a file with the
correct attributes.

0.3.3 PAGE SELECT

If the page switch (/PG) has been used, the record is examined for
page feeds.
Records with FORTRAN carriage control are checked for a leading "1",
and all records are checked for the presence of form feeds anywhere
within the record. In addition, when the page size option (/PS) is
used, the page number is incremented each time the line number is
greater than the number of lines specified for a page.

Page selection criteria are applied at this point. If the current page
is not to be output, no further action is taken with the record.

0.3.4 CONVERT TABS TO SPACES
0.3.5 CONVERT SPACES TO TABS

Since conversion of 8 column tabs to spaces, or the reverse conversion
of spaces to 8 column tabs, affects the record length, these
operations are applied before truncating or padding the record.

To convert tabs to spaces, the input record is examined for tabs. When
a tab is found, enough spaces are inserted in the output record such
that the last space inserted is at the next tab position.

To convert spaces to 8 column tabs, the record is examined at each tab
position (8,16,24,...) for a string of 2 to 8 consecutive spaces
ending at that tab position. If such a string is found, a single tab
is substituted for it in the output record.

0.3.6 TRUNCATION

Truncation limits the maximum length of the record. For variable
length records, the maximum length is specified with the truncate
switch. For fixed length output, records are always truncated if
necessary, and the truncate switch is ignored.

0.3.7 PAD RECORD

Using the pad switch establishes a minimum length for the output
record, and the character appended to any records which are shorter
than the minimum. For variable length records, the minimum length is
specified with the pad switch. Fixed length output records use the
record length specified with the file type switch, and ignore any
length specified with the pad switch.

Note that information can be stripped from the end of a variable
length record by specifying a truncate length which is shorter than
the pad length.
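
The interaction between truncation and padding noted above can be
illustrated with a short sketch. It is written in Python purely for
illustration and is not taken from the CRU sources. The function name
truncate_then_pad and its parameters are hypothetical, and the sketch
assumes variable length output with truncation applied before padding,
as sections 0.3.6 and 0.3.7 describe.

```python
def truncate_then_pad(record: bytes, trunc_len=None, pad_len=None,
                      pad_char=0o40) -> bytes:
    """Hypothetical model of the /TR:n and /PD:o:n steps, in that order."""
    # Truncation: limit the maximum record length.
    if trunc_len is not None and len(record) > trunc_len:
        record = record[:trunc_len]
    # Padding: enforce a minimum length using the pad character (octal 40
    # is a space, matching the documented default for o).
    if pad_len is not None and len(record) < pad_len:
        record = record + bytes([pad_char]) * (pad_len - len(record))
    return record


# Truncate length (10) shorter than pad length (16): the tail " rev B" is
# cut off first, then the record is filled back out with spaces.
stripped = truncate_then_pad(b"PART-0042 rev B", trunc_len=10, pad_len=16)
print(stripped)
```

In this sketch the result is 16 bytes long, but only the first 10 bytes
of the original record survive; everything after them is pad
characters.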

For fixed length output files, processing is completed at this point,
since the remaining options would change the length of the record in
many cases.

0.3.8 REMOVE TRAILING BLANKS

In the absence of imbedded control characters, trailing blanks and
tabs are wasted characters. Since truncating or padding a record can
both modify characters at the end of a record, this function is
performed after truncation and padding are completed.

0.3.9 COMPRESSION

After all other processing is completed, the record is then compressed
(if compression is specified). The compression algorithm segments the
record into two types of sub-strings. These are strings which repeat
the same character three or more times (and contain no other
characters), and all other strings. In the output record, each
sub-string is stored with a control byte which identifies the type of
sub-string and its length (up to 127 characters). For strings which
repeat the same character, the output is represented in two bytes: one
control byte, and the following byte which contains the value to be
repeated. The usefulness of this compression therefore depends on how
much of the record is comprised of this type of sub-string.

0.4 RECOVERING SPACE FROM TEXT FILES

For whatever reasons, computer output often contains a large
proportion of blanks. There are two options which can minimize the
number of blanks left in a variable length record.

The first option converts successive blanks, when possible, into 8
column tabs. (Most terminals and printers support 8 column tabs. If
not, they can be set to NOHHT under RSX.) To convert, there must be a
blank at the end position of a tab, and to be worthwhile, there must
be at least one blank preceding it.

The second option removes spaces (and tabs) from the end of the
record, where they serve no function.

The net result of these two options is therefore to reduce the number
of blocks required to store a given text file, without affecting its
format when printed.
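
The two space-saving options can be sketched as follows. This is an
illustrative Python model, not the CRU implementation. The function
names spaces_to_tabs and remove_trailing are hypothetical, and the
sketch assumes that the record contains no tabs on entry and that
columns are counted from 1, so that tab positions fall at columns
8, 16, 24, ... as in section 0.3.5.

```python
TAB_WIDTH = 8  # CRU works with 8 column tabs


def spaces_to_tabs(record: str) -> str:
    """Replace 2 to 8 spaces ending at a tab position with one tab."""
    out = []
    for start in range(0, len(record), TAB_WIDTH):
        field = record[start:start + TAB_WIDTH]
        # Only a full 8 column field reaches the next tab position.
        if len(field) == TAB_WIDTH:
            kept = field.rstrip(" ")
            # A single space is left alone: a tab would save nothing.
            if TAB_WIDTH - len(kept) >= 2:
                field = kept + "\t"
        out.append(field)
    return "".join(out)


def remove_trailing(record: str) -> str:
    """Drop trailing blanks and tabs, as the /RT switch is described."""
    return record.rstrip(" \t")


record = "NAME" + " " * 12 + "QTY" + " " * 5   # 24 characters
packed = remove_trailing(spaces_to_tabs(record))
print(repr(packed), len(record), len(packed))
```

Expanding the tabs in the packed record back to 8 column stops gives
the original text without its trailing blanks, which is the sense in
which the printed format is unaffected.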

To get some idea of whether your files will benefit from this, you can
use the /LO switch, which will report the number of blocks read and
written.

     >CRU /CS/RT/LO=TEST.TXT
     TEST.TXT
     Blocks read: 143  Blocks written: 121
     >

0.5 USING DATA COMPRESSION

Data compression is intended to reduce the amount of storage media
required for archives. It could also reduce the transmission time of
files over communication lines (assuming your communications package
does not already perform a similar compression).

Text files which have been processed to convert spaces to tabs and
remove trailing blanks in general do not benefit much from this data
compression. The best candidates are fixed length records with blank
fillers, partially blank fields such as names and addresses, fields
with leading zeros, nulls, etc. There is no restriction on data types.

To compress:

     >CRU PERSON.CRU/CM/LO=PERSON.DAT
     PERSON.DAT
     Blocks read: 624  Blocks written: 180
     >

To restore:

     >CRU PERSON.DAT/FO:F/LO=PERSON.CRU/UC
     PERSON.CRU
     Blocks read: 180  Blocks written: 624
     >

0.6 CRU ERROR MESSAGES

CRU -- Bad file name
     Explanation: File name syntax entered incorrectly.
     User action: Check file names and enter correctly.

CRU -- Bad device name
     Explanation: Device name syntax entered incorrectly.
     User action: Check device names and enter correctly.

CRU -- Command Syntax Error
     Explanation: Command entered incorrectly. The incorrect portion
                  of the command will be printed following this
                  message.
     User action: Check the command line entered. Correct the syntax
                  and try again.

CRU -- Device not in system
     Explanation: The driver for the device specified is not present
                  in the operating system.
     User action: Check the device requested. Enter a valid device or
                  bring up an operating system which supports the
                  desired device.

CRU -- Device offline
     Explanation: The device specified is offline.
     User action: Bring the device online and mount it, or enter
                  another device.

CRU -- Failed to spool output file
     Explanation: The PRINT$ routine failed to spool the file.
     User action: Check that the print queue is installed correctly.
                  Print the file using standard procedures.

CRU -- Fixed length output incompatible with compression
     Explanation: When compressing a file, the output file must
                  contain variable length records, even though the
                  input file contains fixed length records.
     User action: Do not specify a fixed length file organization
                  (/FO:F) when compressing a file.

CRU -- Get Command Line - Bad @ File Name
     Explanation: The indirect command file name was entered
                  incorrectly.
     User action: Enter the command file name using correct file name
                  syntax.

CRU -- Get Command Line - Command Line Too Large
     Explanation: Command line exceeded 80 characters.
     User action: Reduce size of command line.

CRU -- Get Command Line - Failed to Open @ File
     Explanation: CRU could not open the specified command file.
     User action: Check the command file. Enter the correct command
                  file specification.

CRU -- Get Command Line - I/O error
     Explanation: An I/O error occurred getting the next command from
                  the command file.
     User action: Check the command file. Retry the operation.

CRU -- Get Command Line - Max @ File Depth Exceeded
     Explanation: Too many nested command file levels.
     User action: Reduce the number of nested command file levels.

CRU -- Illegal carriage control type
     Explanation: The carriage control type specified was not LIST,
                  FTN, or NONE.
     User action: Enter carriage control correctly.

CRU -- Illegal file organization type
     Explanation: The file organization entered was not F or V.
     User action: Enter a valid file organization type.

CRU -- Illegal pad/truncate length
     Explanation: The pad or truncate length specified was negative or
                  greater than the maximum available record size.
     User action: Enter a valid pad or truncate length.

CRU -- Illegal record size specified
     Explanation: The record size entered for a fixed length file is
                  negative or greater than the maximum record size
                  available.
     User action: Increase record size if necessary (see "Record too
                  large for available buffers") or enter a valid
                  record size.

CRU -- Input records are fixed length
     Explanation: Uncompress was specified for an input file which
                  contains fixed length records.
     User action: Check input file name. Enter correct input file.

CRU -- More than one Input or Output file
     Explanation: More than one input or output file was specified on
                  the command line, and CRU does not support multiple
                  input or output files.
     User action: Restrict input and output to single files. (Use PIP
                  to merge files if that is desired.)

CRU -- No input file name
     Explanation: No input file name was specified in the command.
     User action: Specify the desired input file.

CRU -- No such file
     Explanation: The specified input file could not be found.
     User action: Check for proper device, UIC, and file name for the
                  input file.

CRU -- Privilege violation
     Explanation: The device specified was not mounted or the user has
                  insufficient privilege to open the requested file.
     User action: Mount device or get proper privilege.

CRU -- Record too large for available buffers
     Explanation: A record was encountered that was more than 512
                  bytes in length.
     User action: Edit the symbol UBSZ in CRU.MAC and increase it to
                  the necessary size. Re-assemble and build CRU.

CRU -- Wildcard File Names Not Allowed
     Explanation: CRU does not support wildcard file names.
     User action: Enter the complete file name desired.