Subject: Object file symbols limited to 8 characters [+FIX] (#172 - #15 of 19)
Index: cc,as,ld,ar,ranlib,nm,nlist,adb,... (2.11BSD)

Description:
For some time now (seems like eons ;-)) the object file format used by Unix for the PDP-11 has restricted symbols to 8 significant characters (actually 7, because the C compiler prefixes symbols with a leading tilde (~) or underscore (_)). Aside from the "creative constraints" this imposes on the programmer there was the continuing problem of 'name collisions', especially when porting applications from machines whose object file format permitted longer symbol names.

Numerous workarounds have been employed in the past. The most common one relied on a combination of a name collision detection program ('shortc') and the flexname capability of the C preprocessor ('cpp'). This served to mask the problem while making debugging difficult due to mangled/synthetic symbol names.

Repeat-By:
Attempt to compile the following program:

	int this_is_a_long_name;
	int this_is_a_long_name_too;
	main() { exit(0); }

Fix:
This section is repeated in each of the 19 parts which make up the update kit. You should read it perhaps once or twice, but then skip over it (how to do that is mentioned below).

Taking a "hint" from the a.out(5) man page:

"The compiler will note name collisions when they occur within a single file... There is really little that can be done about this. Some thought is being given to modifying the loader to flag detectable collisions, but the real solution would be to change over to the 4BSD a.out format. This would involve modifying the compiler, assembler and adb and then simply porting the 4.3BSD ld, nm, ranlib, strip and nlist. Or perhaps simply porting the entire 4.3BSD suite might be best ... Anyone interested in a project?"

This I have done. No more volunteers for the project need apply ;-)

The new limit on symbol length is 32 characters!
There is still a limit (but it is _much_ more reasonable now) simply because of address space constraints - it needs to be possible to hold at least one of the 'symbol' or 'string' tables in memory in many cases (nice to hold both, but - i know, get a 486 ;-)). It must be noted though that it is almost trivial now to raise the limit if that is desired - the programs which need to know the maximum length of a symbol string all have an easily changed #define statement now (usually MAXSYMLEN but there are a couple of exceptions). The 'string table' format itself doesn't care how long the strings are. The actual a.out format won't have to change again to accommodate a higher limit on symbol name length!

The "string table" object file format has been ported and all the necessary changes made throughout the entire system. The changes were *massive* and widespread. Programs affected of course included the assembler and compiler. Other programs affected were anything which accessed a symbol table entry either via nlist(3) [ps, pstat, fstat, vmstat, etc] or by reading object files [ld, ranlib, nm, adb, strip, etc].

The actual changes to the compiler and assembler were minor because those programs had already been modified earlier (updates #142, 143, 152, 153). The compiler only needed to have the maximum size of a symbol name raised. The assembler already knew how to generate 'string table' object files - all that needed to be done in 'as' was to flip a bit telling it to generate the new object format instead of the old style.

+++++++++++++++

And now for a bit of a narrative about what was done. The detailed instructions for applying this part (#15 of 19) of the update kit follow the 'story' below. This started out as a semi-organized accounting of what was done but then devolved into a semi-rambling tale due to the sheer bulk of the changes.
You can skip to the details for applying #172 by searching for the string "=======" below - this header is replicated in all parts of this kit.

+++++++++++++++

Alas, the remaining changes were not so simple. Complete replacements for ranlib(1), ar(1), nlist(3) were ported from the Net-2 release. Other programs such as symorder(1) and two new programs 'symcompact' and 'strcompact' (used to compress/compact symbol and string tables) were written from scratch.

Perhaps the two hardest parts of the whole effort were rewriting the linker 'ld' and making *large* modifications to the debugger 'adb'. This was a very difficult job. 'ld' needed to scan new style ranlib archives, as well as use the "virtual memory" facility (the 'libvmf' routines posted earlier) for symbol table management and so on. 'adb' was a MESS (having been written in a pseudo block structured macro language). Since the new symbol table entry could be so much larger than the old it was no longer possible for adb to hold as much of the symbol table in memory - an alternate method took a while to develop and implement; more on that in the patch which deals with adb (actually the changes to adb are so large there are two substantial parts of this update kit just for adb!).

After the basic programs (ar, ld, ranlib, etc) were running the system had to be completely recompiled from sources, beginning with the object libraries. After those were done the process of recompiling the rest of the system could proceed. Guess what happens when you recreate libc.a with a buggy linker? Yep - the system is rendered useless until backup copies of everything can be reloaded. Don't let this happen to you - be sure (and i'll repeat the point later) to back up the system (or at least key executables and .a files) before installing this upgrade.

In all there were about 330 files modified during the change of object file format. Some of these were not directly related to the new object file format.
There were a number of (obsolete) references to "BSD2_10" lingering in the system. Those have been replaced with "pdp11" and the 'BSD2_10' define has been removed from the C preprocessor (cpp). DO NOT use 'BSD2_10' to #ifdef pdp-11 sensitive code, use "pdp11" instead.

During the recompile of the libraries a fairly large number of "shortened" names were lengthened - these included syscall routines such as "gethostname" which no longer had to be munged into "gethname". Also a surprising number of typographical errors were uncovered (mainly in the Fortran libraries) where an extra character (beyond the 7th character) was left off or accidentally added. These were all fixed and eventually, after a couple of evenings, the libraries were built and installed.

After the libraries were done it was the application programs' turn to be recompiled. This took the better part of a couple of weeks to finally make it thru due to (as it turned out) the iterative nature of the task. A symbol would come up undefined and the wrong definition/use would have to be tracked down to exactly where it was coming from. Finally, however, the task was done and it was time to move on to the kernel.

The kernel proved to be surprisingly easy - no real complications arose except when it came time to reboot: a bug had been introduced into 'autoconfig' (which uses 'nlist' to scan the kernel symbol table). Ouch! That was another couple of late nights. Since the compiler supports unsigned longs now a number of small changes which ifdef'd 'u_long' to 'long' were removed. REMEMBER - you need to recompile 'autoconfig' and install it before rebooting the new kernel ;-)

The performance of 'ps' though (and anything else which used nlist(3), 'fstat', 'w' are good examples) was unacceptably slow. So, amidst other delays (real work, the earthquake - which almost tossed the disc drive to the floor, etc) the "symorder" program was written (with ideas borrowed from the Net-2 version).
The symorder(1) program rather insists on holding both the symbol and string tables in memory - this was a problem (or could be if the kernel symbol table grows much more) so two new and original programs were written: 'symcompact' and 'strcompact'.

The first program compacts the symbol table by removing 'register' local variables (they're of no use to anyone - the debugger doesn't/can't do anything with them) and redundant global text symbols (symbols in an overlaid program which are in the root segment do not need both the '~' and '_' symbols present).

The second program 'strcompact' is one that any 'string table' based object file system can use. It implements "shared strings" for symbols - if a program has many references to 'error' as a local symbol, why store the string 'error' more than once? Simply store one instance and then update the symbol table entries to all point to the same string!

Using both 'strcompact' and 'symcompact' on the /unix image resulted in a file that was 15kb smaller. Running 'symorder' then puts the most frequently used symbols at the front of the symbol table; with that done the performance of 'w', 'pstat', and other programs which nlist(3) the kernel was acceptable.

Some of the parts of this kit are large. The large patch files have been split into pieces which the 'patch' program will handle; other parts (the replacement 'ar' sources) were left as a single 'shar' file rather than split them up.

Each part of this kit consists of:

  a 'patchfile' - this is used with the "patch" program to update files.

  an optional 'script' - this is run ("sh script") to perform initialization, remove files, create directories and so on.

  an optional 'new.sources' - this is a "shar" file containing complete sources for a program.

ALL pathnames are _absolute_ - this way you do not have to "cd" around the system, you should be able to apply all the patches while you are in /tmp (or /usr/tmp - wherever you have the most free space).
Be sure that you have at least 40mb free on /usr before rebuilding the system - if you do not then building in stages will be necessary.

Part 19 contains the detailed instructions for rebuilding the system _after_ the previous 18 patches have been applied. The patches (#158 thru #175) should be applied in order following the directions in each part.

DO NOT recompile anything once the patching has begun until requested to do so in part 19. Many of the system include files are modified and the object file format is being changed - recompilation will not be possible until the transformation of the system and object libraries is complete.

AT A MINIMUM you will want to back up the following files (unless you have a known good backup already made) in case you need to recompile something before part 19 is done:

	/bin/ar
	/bin/ld
	/bin/nm
	/bin/as
	/usr/bin/ranlib
	/lib/c0
	/lib/crt0.o
	/lib/mcrt0.o
	/lib/libc.a
	/usr/include/*.h
	/usr/include/sys/*.h

In part 19 there is a *complete* list of all files affected (all 336 of them) - you may wish to back those up also.

And now the common header ('boilerplate') is over (at last ;-)), let the installation guide begin.

As always, the complete 2.11BSD updates are available via anonymous FTP to 'ftp.iipo.gtegsc.com' in the directory /pub/2.11BSD

==========

#172 (Part #15 of 19)

0) Be in a temp directory ("cd /tmp" or "cd /usr/tmp")

1) Save the following shar archive to a file (/tmp/172 for example)

2) Unpack the archive: sh 172

3) Unpack the new source replacements: sh new.sources

4) rm 172 new.sources

Part 15 of 19 is done. DO NOT rebuild or compile _anything_ at this point!

===== cut here
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3.
Execute the file with /bin/sh (not csh) to create: # new.sources # This archive created: Fri Feb 4 23:25:47 1994 export PATH; PATH=/bin:/usr/bin:$PATH if test -f 'new.sources' then echo shar: "will not over-write existing file 'new.sources'" else sed 's/^X//' << \SHAR_EOF > 'new.sources' X#! /bin/sh X# This is a shell archive, meaning: X# 1. Remove everything above the #! /bin/sh line. X# 2. Save the resulting text in a file. X# 3. Execute the file with /bin/sh (not csh) to create: X# /usr/src/ucb/symorder.c X# /usr/src/ucb/symcompact.c X# /usr/src/ucb/strcompact.c X# This archive created: Fri Jan 28 21:21:13 1994 Xexport PATH; PATH=/bin:/usr/bin:$PATH Xif test -f '/usr/src/ucb/symorder.c' Xthen X echo shar: "will not over-write existing file '/usr/src/ucb/symorder.c'" Xelse Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/symorder.c' XX/* XX * Program Name: symorder.c XX * Date: January 21, 1994 XX * Author: S.M. Schultz XX * XX * ----------------- Modification History --------------- XX * Version Date Reason For Modification XX * 1.0 21Jan94 1. Initial release into the public domain. XX*/ XX XX/* XX * This program reorders the symbol table of an executable. This is XX * done by moving symbols found in the second file argument (one symbol XX * per line) to the front of the symbol table. XX * XX * NOTE: This program needs to hold the string table in memory. XX * For the kernel which has not been 'strcompact'd this is about 21kb. XX * It is highly recommended that 'strcompact' be run first - that program XX * removes redundant strings, significantly reducing the amount of memory XX * needed. Running 'symcompact' will reduce the run time needed by XX * this program by eliminating redundant non-overlaid text symbols. 
XX*/ XX XX#include XX#include XX#include XX#include XX#include XX#include XX#include XX XX#define NUMSYMS 125 XX char *order[NUMSYMS]; XX int nsorted; XX char *Pgm; XX void cleanup(); XXstatic char sym1tmp[20], sym2tmp[20], strtmp[20]; XXstatic char *strtab, *oldname; XX XXmain(argc, argv) XX int argc; XX char **argv; XX { XX FILE *fp, *fp2, *sym1fp, *sym2fp, *strfp; XX int cnt, nsyms, len, c; XX char fbuf1[BUFSIZ], fbuf2[BUFSIZ]; XX off_t symoff, stroff, ltmp; XX long strsiz; XX struct nlist sym; XX struct xexec xhdr; XX XX Pgm = argv[0]; XX XX signal(SIGQUIT, cleanup); XX signal(SIGINT, cleanup); XX signal(SIGHUP, cleanup); XX XX if (argc != 3) XX { XX fprintf(stderr, "usage: %s symlist file\n", Pgm); XX exit(EX_USAGE); XX } XX fp = fopen(argv[2], "r+"); XX if (!fp) XX { XX fprintf(stderr, "%s: can't open '%s' for update\n", Pgm, XX argv[2]); XX exit(EX_NOINPUT); XX } XX setbuf(fp, fbuf1); XX cnt = fread(&xhdr, 1, sizeof (xhdr), fp); XX if (cnt < sizeof (xhdr.e)) XX { XX fprintf(stderr, "%s: Premature EOF reading header\n", Pgm); XX exit(EX_DATAERR); XX } XX if (N_BADMAG(xhdr.e)) XX { XX fprintf(stderr, "%s: Bad magic number\n", Pgm); XX exit(EX_DATAERR); XX } XX nsyms = xhdr.e.a_syms / sizeof (struct nlist); XX if (!nsyms) XX { XX fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[2]); XX exit(EX_OK); XX } XX stroff = N_STROFF(xhdr); XX symoff = N_SYMOFF(xhdr); XX/* XX * Seek to the string table size longword and read it. Then attempt to XX * malloc memory to hold the string table. First make a sanity check on XX * the size. XX*/ XX fseek(fp, stroff, L_SET); XX fread(&strsiz, sizeof (long), 1, fp); XX if (strsiz > 48 * 1024L) XX { XX fprintf(stderr, "%s: string table > 48kb\n", Pgm); XX exit(EX_DATAERR); XX } XX strtab = (char *)malloc((int)strsiz); XX if (!strtab) XX { XX fprintf(stderr, "%s: no memory for strings\n", Pgm); XX exit(EX_OSERR); XX } XX/* XX * Now read the string table into memory. 
Reduce the size read because XX * we've already retrieved the string table size longword. Adjust the XX * address used so that we don't have to adjust each symbol table entry's XX * string offset. XX*/ XX cnt = fread(strtab + sizeof (long), 1, (int)strsiz - sizeof (long), fp); XX if (cnt != (int)strsiz - sizeof (long)) XX { XX fprintf(stderr, "%s: Premature EOF reading strings\n", Pgm); XX exit(EX_DATAERR); XX } XX/* XX * Now open the file containing the list of symbols to XX * relocate to the front of the symbol table. XX*/ XX fp2 = fopen(argv[1], "r"); XX if (!fp2) XX { XX fprintf(stderr, "%s: Can not open '%s'\n", Pgm, argv[1]); XX exit(EX_NOINPUT); XX } XX getsyms(fp2); XX XX/* XX * Create the temporary files which will hold the new symbol table and the XX * new string table. One temp file receives symbols _in_ the list, XX * another file receives all other symbols, and the last file receives the XX * new string table. XX*/ XX strcpy(sym1tmp, "/tmp/sym1XXXXXX"); XX mktemp(sym1tmp); XX strcpy(sym2tmp, "/tmp/sym2XXXXXX"); XX mktemp(sym2tmp); XX strcpy(strtmp, "/tmp/strXXXXXX"); XX mktemp(strtmp); XX sym1fp = fopen(sym1tmp, "w+"); XX sym2fp = fopen(sym2tmp, "w+"); XX strfp = fopen(strtmp, "w+"); XX if (!sym1fp || !sym2fp || !strfp) XX { XX fprintf(stderr, "%s: Can't create %s, %s or %s\n", Pgm, XX sym1tmp, sym2tmp, strtmp); XX exit(EX_CANTCREAT); XX } XX setbuf(sym1fp, fbuf2); XX/* XX * Now position the executable to the start of the symbol table. For each XX * symbol scan the list for a match on the symbol name. If the XX * name matches write the symbol table entry to one tmp file, else write it XX * to the second symbol tmp file. XX * XX * NOTE: Since the symbol table is being rearranged the usefulness of XX * "local" symbols, especially 'register' symbols, is greatly diminished. XX * Not that they are terribly useful in any event - especially the register XX * symbols, 'adb' claims to do something with them but doesn't. 
In any XX * event this suite of programs is targeted at the kernel and the register XX * local symbols are of no use. For this reason 'register' symbols are XX * removed - this has the side effect of even further reducing the symbol XX * and string tables that must be processed by 'nm', 'ps', 'adb' and so on. XX * This removal probably should have been done earlier - in 'strcompact' or XX * 'symcompact' and it may be in the future, but for now just do it here. XX*/ XX fseek(fp, symoff, L_SET); XX while (nsyms--) XX { XX fread(&sym, sizeof (sym), 1, fp); XX if (sym.n_type == N_REG) XX continue; XX if (inlist(&sym)) XX fwrite(&sym, sizeof (sym), 1, sym1fp); XX else XX fwrite(&sym, sizeof (sym), 1, sym2fp); XX } XX XX/* XX * Position the executable file to where the symbol table starts. Truncate XX * the file to the current position to remove the old symbols and strings. Then XX * write the symbol table entries which are to appear at the front, followed XX * by the remainder of the symbols. As each symbol is processed adjust the XX * string table offset and write the string to the strings tmp file. XX * XX * It was either re-scan the tmp files with the symbols again to retrieve XX * the string offsets or simply write the strings to yet another tmp file. XX * The latter was chosen. 
XX*/ XX fseek(fp, symoff, L_SET); XX ftruncate(fileno(fp), ftell(fp)); XX ltmp = sizeof (long); XX rewind(sym1fp); XX rewind(sym2fp); XX nsyms = 0; XX while (fread(&sym, sizeof (sym), 1, sym1fp) == 1) XX { XX if (ferror(sym1fp) || feof(sym1fp)) XX break; XX oldname = strtab + (int)sym.n_un.n_strx; XX sym.n_un.n_strx = ltmp; XX len = strlen(oldname) + 1; XX ltmp += len; XX fwrite(&sym, sizeof (sym), 1, fp); XX fwrite(oldname, len, 1, strfp); XX nsyms++; XX } XX fclose(sym1fp); XX while (fread(&sym, sizeof (sym), 1, sym2fp) == 1) XX { XX if (ferror(sym2fp) || feof(sym2fp)) XX break; XX oldname = strtab + (int)sym.n_un.n_strx; XX sym.n_un.n_strx = ltmp; XX len = strlen(oldname) + 1; XX ltmp += len; XX fwrite(&sym, sizeof (sym), 1, fp); XX fwrite(oldname, len, 1, strfp); XX nsyms++; XX } XX fclose(sym2fp); XX/* XX * Next write the string table size longword followed by the XX * string table itself. XX*/ XX fwrite(&ltmp, sizeof (long), 1, fp); XX rewind(strfp); XX while ((c = getc(strfp)) != EOF) XX putc(c, fp); XX fclose(strfp); XX/* XX * And last (but not least) we need to update the a.out header with XX * the correct size of the symbol table. 
XX*/ XX rewind(fp); XX xhdr.e.a_syms = nsyms * sizeof (struct nlist); XX fwrite(&xhdr.e, sizeof (xhdr.e), 1, fp); XX fclose(fp); XX free(strtab); XX cleanup(); XX } XX XXinlist(sp) XX register struct nlist *sp; XX { XX register int i; XX XX for (i = 0; i < nsorted; i++) XX { XX if (strcmp(strtab + (int)sp->n_un.n_strx, order[i]) == 0) XX return(1); XX } XX return(0); XX } XX XXgetsyms(fp) XX FILE *fp; XX { XX char asym[128], *start; XX register char *t, **p; XX XX for (p = order; fgets(asym, sizeof(asym), fp) != NULL;) XX { XX if (nsorted >= NUMSYMS) XX { XX fprintf(stderr, "%s: only doing %d symbols\n", XX Pgm, NUMSYMS); XX break; XX } XX for (t = asym; isspace(*t); ++t) XX ; XX if (!*(start = t)) XX continue; XX while (*++t) XX ; XX if (*--t == '\n') XX *t = '\0'; XX *p++ = strdup(start); XX ++nsorted; XX } XX fclose(fp); XX } XX XXvoid XXcleanup() XX { XX if (strtmp[0]) XX unlink(strtmp); XX if (sym1tmp[0]) XX unlink(sym1tmp); XX if (sym2tmp[0]) XX unlink(sym2tmp); XX exit(EX_OK); XX } XSHAR_EOF Xchmod 644 '/usr/src/ucb/symorder.c' Xfi Xif test -f '/usr/src/ucb/symcompact.c' Xthen X echo shar: "will not over-write existing file '/usr/src/ucb/symcompact.c'" Xelse Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/symcompact.c' XX/* XX * Program Name: symcompact.c XX * Date: January 21, 1994 XX * Author: S.M. Schultz XX * XX * ----------------- Modification History --------------- XX * Version Date Reason For Modification XX * 1.0 21Jan94 1. Initial release into the public domain. XX*/ XX XX/* XX * This program compacts the symbol table of an executable. This is XX * done by removing '~symbol' references when _both_ the '~symbol' and XX * '_symbol' have an overlay number of 0. The assembler always generates XX * both forms. The only time both forms are needed is in an overlaid XX * program and the routine has been relocated by the linker, in that event XX * the '_' form is the overlay "thunk" and the '~' form is the actual XX * routine itself. 
Only 'text' symbols have both forms. Reducing the XX * number of symbols greatly speeds up 'nlist' processing as well as XX * cutting down memory requirements for programs such as 'adb' and 'nm'. XX * XX * NOTE: This program attempts to hold both the string and symbol tables XX * in memory. For the kernel which has not been 'strcompact'd this XX * amounts to about 49kb. IF this program runs out of memory you should XX * run 'strcompact' first - that program removes redundant strings, XX * significantly reducing the amount of memory needed. Alas, this program XX * will undo some of strcompact's work and you may/will need to run XX * strcompact once more after removing excess symbols. XX*/ XX XX#include XX#include XX#include XX#include XX#include XX#include XX#include XX XX char *Pgm; XXstatic char strtmp[20]; XX XXmain(argc, argv) XX int argc; XX char **argv; XX { XX FILE *fp, *strfp; XX int cnt, nsyms, len, c, symsremoved = 0; XX void cleanup(); XX char *strtab; XX char fbuf1[BUFSIZ], fbuf2[BUFSIZ]; XX off_t symoff, stroff, ltmp; XX long strsiz; XX register struct nlist *sp, *sp2; XX struct nlist *symtab, *symtabend; XX struct xexec xhdr; XX XX Pgm = argv[0]; XX signal(SIGQUIT, cleanup); XX signal(SIGINT, cleanup); XX signal(SIGHUP, cleanup); XX XX if (argc != 2) XX { XX fprintf(stderr, "%s: filename argument missing\n", Pgm); XX exit(EX_USAGE); XX } XX fp = fopen(argv[1], "r+"); XX if (!fp) XX { XX fprintf(stderr, "%s: can't open '%s' for update\n", Pgm, XX argv[1]); XX exit(EX_NOINPUT); XX } XX setbuf(fp, fbuf1); XX cnt = fread(&xhdr, 1, sizeof (xhdr), fp); XX if (cnt < sizeof (xhdr.e)) XX { XX fprintf(stderr, "%s: Premature EOF reading header\n", Pgm); XX exit(EX_DATAERR); XX } XX if (N_BADMAG(xhdr.e)) XX { XX fprintf(stderr, "%s: Bad magic number\n", Pgm); XX exit(EX_DATAERR); XX } XX nsyms = xhdr.e.a_syms / sizeof (struct nlist); XX if (!nsyms) XX { XX fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[1]); XX exit(EX_OK); XX } XX stroff = N_STROFF(xhdr); XX symoff = 
N_SYMOFF(xhdr); XX/* XX * Seek to the string table size longword and read it. Then attempt to XX * malloc memory to hold the string table. First make a sanity check on XX * the size. XX*/ XX fseek(fp, stroff, L_SET); XX fread(&strsiz, sizeof (long), 1, fp); XX if (strsiz > 48 * 1024L) XX { XX fprintf(stderr, "%s: string table > 48kb\n", Pgm); XX exit(EX_DATAERR); XX } XX strtab = (char *)malloc((int)strsiz); XX if (!strtab) XX { XX fprintf(stderr, "%s: no memory for strings\n", Pgm); XX exit(EX_OSERR); XX } XX/* XX * Now read the string table into memory. Reduce the size read because XX * we've already retrieved the string table size longword. Adjust the XX * address used so that we don't have to adjust each symbol table entry's XX * string offset. XX*/ XX cnt = fread(strtab + sizeof (long), 1, (int)strsiz - sizeof (long), fp); XX if (cnt != (int)strsiz - sizeof (long)) XX { XX fprintf(stderr, "%s: Premature EOF reading strings\n", Pgm); XX exit(EX_DATAERR); XX } XX/* XX * Next seek to the symbol table position in the file, allocate memory XX * for the symbol table and read it in. XX*/ XX fseek(fp, symoff, L_SET); XX symtab = (struct nlist *)malloc(nsyms * sizeof (struct nlist)); XX if (!symtab) XX { XX fprintf(stderr, "%s: no memory for symbols\n", Pgm); XX exit(EX_OSERR); XX } XX cnt = fread(symtab, sizeof (struct nlist), nsyms, fp); XX if (cnt != nsyms) XX { XX fprintf(stderr, "%s: premature EOF in symbols\n", Pgm); XX exit(EX_DATAERR); XX } XX symtabend = &symtab[nsyms]; XX/* XX * Now compute the in memory address of the strings for each symbol. We XX * do not need to adjust the offset for the string table size longword because XX * the strings were read in using a biased address. XX*/ XX for (sp = symtab; sp < symtabend; sp++) XX sp->n_un.n_name = strtab + (int)sp->n_un.n_strx; XX XX/* XX * Now look for symbols with overlay numbers of 0 (root/base segment) and XX * of type 'text'. 
For each symbol found check if there exists both a '~' XX * and '_' prefixed form of the symbol. Preserve the '_' form and clear XX * the '~' entry by zeroing the string address of the '~' symbol. XX*/ XX for (sp = symtab; sp < symtabend; sp++) XX { XX if (sp->n_ovly || !sp->n_un.n_name) XX continue; XX if ((sp->n_type & N_TYPE) != N_TEXT) XX continue; XX if (sp->n_un.n_name[0] != '~') XX continue; XX/* XX * At this point we have the '~' form of a non overlaid text symbol. Look XX * thru the symbol table for the '_' form. All of 1) symbol type, 2) Symbol XX * value and 3) symbol name (starting after the first character) must match. XX*/ XX for (sp2 = symtab; sp2 < symtabend; sp2++) XX { XX if (sp2->n_ovly || !sp2->n_un.n_name) XX continue; XX if ((sp2->n_type & N_TYPE) != N_TEXT) XX continue; XX if (sp2->n_un.n_name[0] != '_') XX continue; XX if (sp2->n_value != sp->n_value) XX continue; XX if (strcmp(sp->n_un.n_name+1, sp2->n_un.n_name+1)) XX continue; XX/* XX * Found a match. Null out the '~' symbol's string address. XX*/ XX symsremoved++; XX sp->n_un.n_strx = NULL; XX break; XX } XX } XX/* XX * Done with the nested scanning of the symbol table. Now create a new XX * string table (from the remaining symbols) in a temporary file. XX*/ XX strcpy(strtmp, "/tmp/strXXXXXX"); XX mktemp(strtmp); XX strfp = fopen(strtmp, "w+"); XX if (!strfp) XX { XX fprintf(stderr, "%s: can't create '%s'\n", Pgm, strtmp); XX exit(EX_CANTCREAT); XX } XX setbuf(strfp, fbuf2); XX XX/* XX * As each symbol is written to the tmp file the symbol's string offset XX * is updated with the new file string table offset. XX*/ XX ltmp = sizeof (long); XX for (sp = symtab; sp < symtabend; sp++) XX { XX if (!sp->n_un.n_name) XX continue; XX len = strlen(sp->n_un.n_name) + 1; XX fwrite(sp->n_un.n_name, len, 1, strfp); XX sp->n_un.n_strx = ltmp; XX ltmp += len; XX } XX/* XX * We're done with the memory string table - give it back. Then reposition XX * the new string table file to the beginning. 
XX*/ XX free(strtab); XX rewind(strfp); XX XX/* XX * Position the executable file to where the symbol table begins. Truncate XX * the file. Write out the valid symbols, counting each one so that the XX * a.out header can be updated when we're done. XX*/ XX nsyms = 0; XX fseek(fp, symoff, L_SET); XX ftruncate(fileno(fp), ftell(fp)); XX for (sp = symtab; sp < symtabend; sp++) XX { XX if (sp->n_un.n_strx == 0) XX continue; XX nsyms++; XX fwrite(sp, sizeof (struct nlist), 1, fp); XX } XX/* XX * Next write out the string table size longword. XX*/ XX fwrite(&ltmp, sizeof (long), 1, fp); XX/* XX * We're done with the in memory symbol table, release it. Then append XX * the string table to the executable file. XX*/ XX free(symtab); XX while ((c = getc(strfp)) != EOF) XX putc(c, fp); XX fclose(strfp); XX rewind(fp); XX xhdr.e.a_syms = nsyms * sizeof (struct nlist); XX fwrite(&xhdr.e, sizeof (xhdr.e), 1, fp); XX fclose(fp); XX printf("%s: %d symbols removed\n", Pgm, symsremoved); XX cleanup(); XX } XX XXvoid XXcleanup() XX { XX if (strtmp[0]) XX unlink(strtmp); XX exit(EX_OK); XX } XSHAR_EOF Xchmod 644 '/usr/src/ucb/symcompact.c' Xfi Xif test -f '/usr/src/ucb/strcompact.c' Xthen X echo shar: "will not over-write existing file '/usr/src/ucb/strcompact.c'" Xelse Xsed 's/^X//' << \SHAR_EOF > '/usr/src/ucb/strcompact.c' XX/* XX * Program Name: strcompact.c XX * Date: January 21, 1994 XX * Author: S.M. Schultz XX * XX * ----------------- Modification History --------------- XX * Version Date Reason For Modification XX * 1.0 21Jan94 1. Initial release into the public domain. XX*/ XX XX/* XX * This program compacts the string table of an executable image by XX * preserving only a single string definition of a symbol and updating XX * the symbol table string offsets. Multiple symbols having the same XX * string are very common - local symbols in a function often have the XX * same name ('int error' inside a function for example). 
This program XX * reduced the string table size of the kernel at least 25%! XX*/ XX XX#include XX#include XX#include XX#include XX#include XX#include XX#include XX XX char *Pgm; XX char *Sort = "/usr/bin/sort"; XXstatic char strtmp[20], tempfn[20], symtmp[20]; XXstatic int shared; XXextern long atol(); XXextern time_t time(); XX XXmain(argc, argv) XX int argc; XX char **argv; XX { XX char fbuf1[BUFSIZ], fbuf2[BUFSIZ]; XX char buf1[128], buf2[128]; XX char *string1, *string2, *tab1pos, *tab2pos, *cp; XX FILE *aoutfp, *symfp, *strfp; XXregister FILE *fp; XX struct xexec xhdr; XXregister struct nlist *sp; XX struct nlist *symtab, *symtabend; XX int nsyms, c, cnt, len; XX void cleanup(); XX off_t symoff, stroff, ltmp, offset1, offset2; XX XX Pgm = argv[0]; XX signal(SIGQUIT, cleanup); XX signal(SIGINT, cleanup); XX signal(SIGHUP, cleanup); XX XX if (argc != 2) XX { XX fprintf(stderr, "%s: missing filename argument\n", Pgm); XX exit(EX_USAGE); XX } XX aoutfp = fopen(argv[1], "r+"); XX if (!aoutfp) XX { XX fprintf(stderr, "%s: can not open '%s' for update\n", XX Pgm, argv[1]); XX exit(EX_NOINPUT); XX } XX cnt = fread(&xhdr, 1, sizeof (xhdr), aoutfp); XX if (cnt < sizeof (xhdr.e)) XX { XX fprintf(stderr, "%s: premature EOF\n", Pgm); XX exit(EX_DATAERR); XX } XX if (N_BADMAG(xhdr.e)) XX { XX fprintf(stderr, "%s: Bad magic number\n", Pgm); XX exit(EX_DATAERR); XX } XX nsyms = xhdr.e.a_syms / sizeof (struct nlist); XX if (!nsyms) XX { XX fprintf(stderr, "%s: '%s' stripped\n", Pgm, argv[1]); XX exit(EX_OK); XX } XX XX strcpy(strtmp, "/tmp/strXXXXXX"); XX mktemp(strtmp); XX strcpy(tempfn, "/tmp/SYMXXXXXX"); XX mktemp(tempfn); XX strcpy(symtmp, "/tmp/symXXXXXX"); XX mktemp(symtmp); XX XX symoff = N_SYMOFF(xhdr); XX stroff = N_STROFF(xhdr); XX XX/* XX * Now move to the start of the string table, bypassing the string table XX * size longword. 
XX*/ XX fseek(aoutfp, stroff + sizeof (long), L_SET); XX XX fp = fopen(tempfn, "w+"); XX if (!fp) XX { XX fprintf(stderr, "%s: can't create temp file\n", Pgm); XX exit(EX_CANTCREAT); XX } XX/* XX * Now read the string table and produce lines of the form: XX * XX * string_offset<tab>symbol_string XX * XX * in the temp file. XX*/ XX ltmp = sizeof (long); XX while (1) XX { XX if (feof(aoutfp) || ferror(aoutfp)) XX break; XX sgets(aoutfp, fp, &ltmp); XX } XX fclose(fp); XX/* XX * Next we sort the temp file on the second field (symbol name). Duplicates XX * are _not_ suppressed this time since we will be scanning the symbol table XX * looking for references to offsets belonging to the same symbol. XX*/ XX sprintf(fbuf1, "%s +1 -2 -o %s %s", Sort, tempfn, tempfn); XX system(fbuf1); XX fp = fopen(tempfn, "r"); XX if (!fp) XX fatal("Can't reopen sorted file"); XX/* XX * Now use the local buffer to leave more room to malloc for the XX * symbol table. XX*/ XX setbuf(fp, fbuf1); XX XX/* XX * We need to hold the entire symbol table in memory - for the kernel this XX * is approximately 28kb. XX*/ XX symtab = (struct nlist *)calloc(nsyms, sizeof (struct nlist)); XX if (!symtab) XX fatal("no memory for symbol table"); XX symtabend = &symtab[nsyms]; XX XX fseek(aoutfp, symoff, L_SET); XX cnt = fread(symtab, sizeof (struct nlist), nsyms, aoutfp); XX if (cnt != nsyms) XX fatal("Premature EOF reading symbols"); XX XX/* XX * The sorted strings file looks like this: XX * XX * 1234 _foobar XX * 168 _foobar XX * 6238 _foobar XX * ... XX * 6512 _blatz XX * XX * We want to make all string offsets to '_foobar' be 1234. When a different XX * symbol is encountered (_blatz) we know we're done with the previous symbol XX * and the process starts over. 
XX*/
XX
XX	string1 = buf1;
XX	string2 = buf2;
XX	fgets(string1, sizeof(buf1), fp);
XX
XX	while (fgets(string2, sizeof (buf1), fp))
XX		{
XX		tab1pos = index(string1, '\t');
XX		tab2pos = index(string2, '\t');
XX		if (!tab1pos || !tab2pos)
XX			fatal("malformed input from sort file");
XX		tab1pos++;
XX		tab2pos++;
XX/*
XX * Compare the previous and current symbol. If they are different then
XX * copy the second string to the first and continue the scanning.
XX*/
XX		if (strcmp(tab1pos, tab2pos))
XX			{
XX			strcpy(string1, string2);
XX			continue;
XX			}
XX/*
XX * If they are the same then look thru the symbol table for references to the
XX * current offset, replacing it with the offset from the first instance of
XX * the symbol
XX*/
XX		offset2 = atol(string2);
XX		for (sp = symtab; sp < symtabend; sp++)
XX			{
XX			if (sp->n_un.n_strx == offset2)
XX				{
XX				shared++;
XX				sp->n_un.n_strx = atol(string1);
XX				}
XX			}
XX/*
XX * Since the strings matched we do not swap the buffers and continue looking
XX * for matches on the symbol pointed to by 'string1'.
XX*/
XX		continue;
XX		}
XX	fclose(fp);
XX	fprintf(stderr, "%s: %d shared strings found\n", Pgm, shared);
XX	if (!shared)
XX		{
XX		fclose(aoutfp);
XX		fatal((char *)NULL);
XX		}
XX/*
XX * Now use "uniq -1" on the temp file to remove the duplicates, preserving
XX * the first mention of each symbol (which is the one used above).
XX*/
XX	sprintf(fbuf1, "/usr/bin/uniq -1 %s", tempfn);
XX	fp = popen(fbuf1, "r");
XX	if (!fp)
XX		fatal("popen uniq failed");
XX/*
XX * Now create the temporary files which will hold the new string table and
XX * symbol table. As the output from 'uniq' is processed the symbol table
XX * is scanned and matches on the 'offset' cause symbols to be output to
XX * the symbol table file. As a symbol is placed in the file it is cleared
XX * in memory so it is not processed more than once.
XX*/
XX
XX	symfp = fopen(symtmp, "w+");
XX	if (!symfp)
XX		fatal("Create of symtmp failed");
XX	setbuf(symfp, fbuf1);
XX
XX	strfp = fopen(strtmp, "w+");
XX	if (!strfp)
XX		fatal("Create of strtmp failed");
XX	setbuf(strfp, fbuf2);
XX
XX/*
XX * Initialize the string table offset to the minimum - the long word size
XX * includes itself in the string table size.
XX*/
XX	ltmp = sizeof (long);
XX
XX	while (fgets(buf1, sizeof(buf1), fp))
XX		{
XX		tab1pos = index(buf1, '\t');
XX		*tab1pos++ = '\0';
XX		tab2pos = index(tab1pos, '\n');
XX		*tab2pos++ = '\0';
XX/*
XX * Get the offset and enter into the symbol table scan to look for
XX * references to this offset. It is a fatal error not to find a match.
XX * Write matched symbols out to the file and then clear their string offset
XX * so they are not found again.
XX*/
XX		offset1 = atol(buf1);
XX		cnt = 0;
XX		for (sp = symtab; sp < symtabend; sp++)
XX			{
XX			if (sp->n_un.n_strx == 0)
XX				continue;
XX			if (sp->n_un.n_strx != offset1)
XX				continue;
XX			sp->n_un.n_strx = ltmp;	/* NEW offset */
XX			fwrite(sp, sizeof (struct nlist), 1, symfp);
XX			sp->n_un.n_strx = 0;
XX			cnt++;
XX			}
XX		if (!cnt)
XX			fatal("No symbols found in offset scan");
XX/*
XX * Now write the string (including the terminating null) and update the
XX * string table offset (for the next symbol written).
XX*/
XX		len = strlen(tab1pos) + 1;
XX		fwrite(tab1pos, len, 1, strfp);
XX		ltmp += len;
XX		}
XX/*
XX * Close down the input pipe and reposition the temp file output for
XX * updating. Free the symbol table - we're done with it now. Position
XX * the a.out file to where the symbol table starts.
XX*/
XX	pclose(fp);
XX	rewind(symfp);
XX	rewind(strfp);
XX	free(symtab);
XX	fseek(aoutfp, symoff, L_SET);
XX
XX/*
XX * Now append the new symbol table. Then write the string table length
XX * followed by the string table. Finally truncate the file to the new
XX * length, reflecting the smaller string table.
XX*/
XX	while ((c = getc(symfp)) != EOF)
XX		putc(c, aoutfp);
XX	fwrite(&ltmp, sizeof (long), 1, aoutfp);
XX	while ((c = getc(strfp)) != EOF)
XX		putc(c, aoutfp);
XX	ftruncate(fileno(aoutfp), ftell(aoutfp));
XX	fclose(aoutfp);
XX	fclose(symfp);
XX	fclose(strfp);
XX	fatal((char *)NULL);
XX	}
XX
XXfatal(str)
XX	char	*str;
XX	{
XX
XX	if (tempfn[0])
XX		unlink(tempfn);
XX	if (strtmp[0])
XX		unlink(strtmp);
XX	if (symtmp[0])
XX		unlink(symtmp);
XX	if (!str)
XX		exit(EX_OK);
XX	fprintf(stderr, "%s: %s\n", Pgm, str);
XX	exit(EX_SOFTWARE);
XX	}
XX
XXvoid
XXcleanup()
XX	{
XX	fatal((char *)NULL);
XX	}
XX
XXsgets(aoutfp, fp, ltmp)
XX	register FILE *aoutfp, *fp;
XX	long	*ltmp;
XX	{
XX	char	buf[128];
XX	int	c;
XX	register char *cp;
XX
XX	cp = buf;
XX	while ((c = getc(aoutfp)) != EOF)
XX		{
XX		if (cp < &buf[sizeof (buf) - 1])
XX			*cp++ = c;
XX		if (c == '\0')
XX			break;
XX		}
XX	*cp++ = '\0';
XX	if (buf[0] == '\0')
XX		return;
XX	fprintf(fp, "%ld\t%s\n", *ltmp, buf);
XX	*ltmp += (strlen(buf) + 1);
XX	}
XSHAR_EOF
Xchmod 644 '/usr/src/ucb/strcompact.c'
Xfi
Xexit 0
X# End of shell archive
SHAR_EOF
fi
exit 0
# End of shell archive
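
For readers who want to see the idea behind strcompact without the temp files, here is a minimal in-memory sketch of the same compaction: every symbol name is kept once and each symbol's string offset is pointed at the surviving copy. It is not the code from the kit. The names 'struct sym' and 'compact_strings' are hypothetical, 'struct sym' stands in for the n_un.n_strx field of struct nlist, and the table here has no leading size longword, so offsets count from 0. It also uses a simple linear search where strcompact uses sort(1) and uniq(1) to stay within the PDP-11 address space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct nlist: only the string offset matters. */
struct sym {
	long strx;		/* offset of the symbol name in the string table */
};

/*
 * Copy each distinct NUL-terminated name from old_tab into new_tab once
 * and rewrite every symbol offset to refer to the surviving copy.
 * new_tab must have room for old_len bytes.  Returns the new table length.
 *
 * Names are visited in increasing old offset and a new offset is never
 * larger than the old one, so a symbol that has already been rewritten
 * can not be matched again by a later old offset.
 */
size_t
compact_strings(const char *old_tab, size_t old_len, char *new_tab,
    struct sym *syms, size_t nsyms)
{
	size_t new_len = 0;	/* bytes emitted to new_tab so far */
	size_t o = 0;		/* current offset in old_tab */

	/* The table must end with a terminated string. */
	assert(old_len == 0 || old_tab[old_len - 1] == '\0');

	while (o < old_len) {
		const char *name = old_tab + o;
		size_t len = strlen(name) + 1;
		size_t n = 0;
		size_t i;

		/* Look for an identical name that was already emitted. */
		while (n < new_len && strcmp(new_tab + n, name) != 0)
			n += strlen(new_tab + n) + 1;

		/* First mention of this name: append it at offset n. */
		if (n == new_len) {
			memcpy(new_tab + new_len, name, len);
			new_len += len;
		}

		/* Point every symbol that used the old offset at n. */
		for (i = 0; i < nsyms; i++)
			if (syms[i].strx == (long)o)
				syms[i].strx = (long)n;

		o += len;
	}
	return new_len;
}
```

A caller would read the string table and symbol table into memory, call compact_strings(), and then write the symbols and the shorter table back out, much as the last part of strcompact does with its two temp files.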