Subject: Object file symbols limited to 8 characters [+FIX] (#168 - #11 of 19)
Index: cc,as,ld,ar,ranlib,nm,nlist,adb,... (2.11BSD)

Description:
	For some time now (it seems like eons ;-)) the object file format used by Unix for the PDP-11 has restricted symbols to 8 significant characters (actually 7, because the C compiler prefixes symbols with a leading tilde (~) or underscore (_)).

	Aside from the "creative constraints" this imposes on the programmer, there was the continuing problem of 'name collisions', especially when porting applications from machines whose object file format permitted longer symbol names.

	Numerous workarounds have been employed in the past. The most common one relied on a combination of a name collision detection program ('shortc') and the flexname capability of the C preprocessor ('cpp'). This served to mask the problem, but it made debugging difficult because of the mangled/synthetic symbol names.

Repeat-By:
	Attempt to compile the following program:

	int this_is_a_long_name;
	int this_is_a_long_name_too;
	main() { exit(0); }

Fix:
	This section is repeated in each of the 19 parts which make up the update kit. You should read it perhaps once or twice, but then skip over it (how to do that is mentioned below).

	Taking a "hint" from the a.out(5) man page:

	"The compiler will note name collisions when they occur within a single file... There is really little that can be done about this. Some thought is being given to modifying the loader to flag detectable collisions, but the real solution would be to change over to the 4BSD a.out format. This would involve modifying the compiler, assembler and adb and then simply porting the 4.3BSD ld, nm, ranlib, strip and nlist. Or perhaps simply porting the entire 4.3BSD suite might be best ... Anyone interested in a project?"

	This I have done. No more volunteers for the project need apply ;-)

	The new limit on symbol length is 32 characters!
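To see why the Repeat-By program fails under the old format, it helps to mimic what the old a.out kept of each name. This is a minimal sketch, assuming (as the description says) 8 stored characters of which the compiler's leading '_' uses one; `old_symbol` and `old_collision` are hypothetical helpers written for illustration, not part of any 2.11BSD program.

```c
#include <string.h>

/* Assumption: the old a.out stored at most 8 characters per name, and the
 * C compiler's leading '_' used up one of them. */
#define OLD_NAMELEN 8

/* Hypothetical helper: form the name the old object format would keep for
 * the C identifier `cname`, i.e. '_' followed by at most 7 characters. */
static void old_symbol(const char *cname, char buf[OLD_NAMELEN + 1])
{
    buf[0] = '_';
    strncpy(buf + 1, cname, OLD_NAMELEN - 1);
    buf[OLD_NAMELEN] = '\0';        /* strncpy may not terminate the copy */
}

/* Hypothetical helper: nonzero if two different C identifiers would end up
 * as the same symbol in the old 8-character format. */
static int old_collision(const char *a, const char *b)
{
    char sa[OLD_NAMELEN + 1], sb[OLD_NAMELEN + 1];

    if (strcmp(a, b) == 0)
        return 0;                   /* identical names are not a collision */
    old_symbol(a, sa);
    old_symbol(b, sb);
    return strcmp(sa, sb) == 0;
}
```

Both variables in the Repeat-By program reduce to "_this_is", and so do "gethostname" and "gethostbyname" ("_gethost"), which is the kind of clash that forced names like "gethname" in the past.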
	There is still a limit (but it is _much_ more reasonable now), simply because of address space constraints: in many cases it needs to be possible to hold at least one of the 'symbol' or 'string' tables in memory (it would be nice to hold both, but - I know, get a 486 ;-)). Note, though, that it is now almost trivial to raise the limit if that is desired. The programs which need to know the maximum length of a symbol string all have an easily changed #define statement (usually MAXSYMLEN, but there are a couple of exceptions). The 'string table' format itself doesn't care how long the strings are, so the actual a.out format won't have to change again to accommodate a higher limit on symbol name length!

	The "string table" object file format has been ported and all the necessary changes made throughout the entire system. The changes were *massive* and widespread. Programs affected of course included the assembler and compiler. Also affected was anything which accessed a symbol table entry, either via nlist(3) [ps, pstat, fstat, vmstat, etc] or by reading object files [ld, ranlib, nm, adb, strip, etc].

	The actual changes to the compiler and assembler were minor because those programs had already been modified earlier (updates #142, 143, 152, 153). The compiler only needed to have the maximum size of a symbol name raised. The assembler already knew how to generate 'string table' object files; all that needed to be done in 'as' was to flip a bit telling it to generate the new object format instead of the old style.

+++++++++++++++

	And now for a bit of a narrative about what was done. The detailed instructions for applying this part (#11 of 19) of the update kit follow the 'story' below. This started out as a semi-organized account of what was done but then devolved into a semi-rambling tale due to the sheer bulk of the changes.
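Returning briefly to the 'string table' format described above: a symbol entry no longer carries its name, it carries an offset (n_un.n_strx in the mkovmake patch in this part) into a separate table of NUL-terminated strings. The sketch below is a simplified in-memory model of that lookup, assuming a hypothetical `struct sym` rather than the real nlist structure and ignoring how the table is laid out on disk. MAXSYMLEN only bounds the copy buffer, much as in the patched mkovmake, which is why raising the limit is a one-line change.

```c
#include <stddef.h>
#include <string.h>

#define MAXSYMLEN 32    /* same role as the #define in the patched programs */

/* Hypothetical, simplified symbol entry: the name is an offset into the
 * string table rather than a fixed char[8] array. */
struct sym {
    long strx;                  /* offset of the NUL-terminated name */
    unsigned char type;
    unsigned long value;
};

/* Copy a symbol's name out of the string table into a zeroed buffer of
 * MAXSYMLEN + 2 bytes, similar to what the patched mkovmake does, but
 * always leaving the last byte as a terminator.  Returns buf, or NULL if
 * the offset lies outside the table. */
static char *sym_name(const struct sym *sp, const char *strtab,
                      size_t strsize, char buf[MAXSYMLEN + 2])
{
    size_t n;

    if (sp->strx < 0 || (size_t)sp->strx >= strsize)
        return NULL;
    n = strsize - (size_t)sp->strx;     /* bytes left in the table */
    if (n > MAXSYMLEN + 1)
        n = MAXSYMLEN + 1;              /* longer names are truncated */
    memset(buf, 0, MAXSYMLEN + 2);
    memcpy(buf, strtab + sp->strx, n);
    return buf;
}
```

Two symbols whose names differ only after the 8th character now resolve to two distinct strings.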
	You can skip to the details for applying #168 by searching for the string "=======" below; this header is replicated in all parts of this kit.

+++++++++++++++

	Alas, the remaining changes were not so simple. Complete replacements for ranlib(1), ar(1) and nlist(3) were ported from the Net-2 release. Other programs, such as symorder(1) and two new programs 'symcompact' and 'strcompact' (used to compact symbol and string tables), were written from scratch.

	Perhaps the two hardest parts of the whole effort were rewriting the linker 'ld' and making *large* modifications to the debugger 'adb'. This was a very difficult job. 'ld' needed to scan new style ranlib archives, as well as use the "virtual memory" facility (the 'libvmf' routines posted earlier) for symbol table management and so on. 'adb' was a MESS (having been written in a pseudo block structured macro language). Since the new symbol table entry could be so much larger than the old, it was no longer possible for adb to hold as much of the symbol table in memory. An alternate method took a while to develop and implement; more on that in the patch which deals with adb (actually the changes to adb are so large that there are two substantial parts of this update kit just for adb!).

	After the basic programs (ar, ld, ranlib, etc) were running, the system had to be completely recompiled from sources, beginning with the object libraries. Once those were done, the process of recompiling the rest of the system could proceed. Guess what happens when you recreate libc.a with a buggy linker? Yep, the system is rendered useless until backup copies of everything can be reloaded. Don't let this happen to you: be sure (and I'll repeat the point later) to back up the system (or at least key executables and .a files) before installing this upgrade.

	In all, about 330 files were modified during the change of object file format. Some of these changes were not directly related to the new object file format.
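As for the ranlib archives 'ld' has to scan: the idea is that the archive carries a table of contents mapping each defined symbol to the member that defines it, so the linker can resolve an undefined name without reading every member. The sketch below assumes a simplified in-memory layout loosely modelled on the 4.3BSD ranlib entry (a string-table offset plus a member offset); `struct toc_entry` and `toc_lookup` are hypothetical, and the real 2.11BSD ld and ranlib formats are not shown in this posting.

```c
#include <string.h>

/* Hypothetical table-of-contents entry: where the symbol's name lives in
 * the TOC string table, and where the defining member starts. */
struct toc_entry {
    long strx;      /* offset of the symbol name in the TOC string table */
    long off;       /* file offset of the archive member defining it */
};

/* Return the offset of the member that defines `name`, or -1 if no member
 * of the archive defines it.  A linear scan keeps the sketch short. */
static long toc_lookup(const struct toc_entry *toc, int ntoc,
                       const char *strtab, const char *name)
{
    int i;

    for (i = 0; i < ntoc; i++)
        if (strcmp(strtab + toc[i].strx, name) == 0)
            return toc[i].off;
    return -1;
}
```

With full-length names, "_gethostname" and "_gethostbyname" are separate entries instead of one ambiguous "_gethost".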
	There were a number of (obsolete) references to "BSD2_10" lingering in the system. Those have been replaced with "pdp11", and the 'BSD2_10' define has been removed from the C preprocessor (cpp). DO NOT use 'BSD2_10' to #ifdef PDP-11 sensitive code; use "pdp11" instead.

	During the recompile of the libraries a fairly large number of "shortened" names were lengthened. These included syscall routines such as "gethostname", which no longer had to be munged into "gethname". A surprising number of typographical errors were also uncovered (mainly in the Fortran libraries) where a character beyond the 7th had been left off or accidentally added; with only 7 significant characters such mistakes had gone unnoticed. These were all fixed and eventually, after a couple of evenings, the libraries were built and installed.

	After the libraries were done it was the application programs' turn to be recompiled. This took the better part of a couple of weeks to finally make it thru, due (as it turned out) to the iterative nature of the task: a symbol would come up undefined and have to be tracked down to exactly where the wrong definition/use was coming from. Finally, however, the task was done and it was time to move on to the kernel.

	The kernel proved to be surprisingly easy. No real complications arose until it came time to reboot: a bug had been introduced into 'autoconfig' (which uses 'nlist' to scan the kernel symbol table). Ouch! That was another couple of late nights. Since the compiler now supports unsigned longs, a number of small changes which #ifdef'd 'u_long' to 'long' were removed.

	REMEMBER - you need to recompile 'autoconfig' and install it before rebooting the new kernel ;-)

	The performance of 'ps', though (and of anything else which used nlist(3); 'fstat' and 'w' are good examples), was unacceptably slow. So, amidst other delays (real work, the earthquake, which almost tossed the disc drive to the floor, etc), the "symorder" program was written (with ideas borrowed from the Net-2 version).
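The reordering symorder performs can be pictured as a stable partition of the symbol table: the names programs actually ask nlist(3) for move to the front, so a front-to-back search meets them early. This is a minimal sketch under that assumption; `struct osym` and `symorder_sketch` are hypothetical names, and a real tool also has to rewrite the object file on disk, which is omitted here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol entry: only the name matters for the ordering. */
struct osym {
    const char *name;
    unsigned long value;
};

/* Nonzero if `name` is one of the `nwanted` names in `list`. */
static int is_wanted(const char *name, const char *const *list, int nwanted)
{
    int i;

    for (i = 0; i < nwanted; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Stable partition: symbols named in `list` move to the front, the rest
 * follow, and both groups keep their original relative order.  Returns the
 * number of symbols moved forward, or -1 if memory runs out. */
static int symorder_sketch(struct osym *syms, int nsyms,
                           const char *const *list, int nwanted)
{
    struct osym *tmp;
    int i, k = 0, nfront;

    if (nsyms <= 0)
        return 0;
    tmp = malloc((size_t)nsyms * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (i = 0; i < nsyms; i++)         /* first pass: wanted symbols */
        if (is_wanted(syms[i].name, list, nwanted))
            tmp[k++] = syms[i];
    nfront = k;
    for (i = 0; i < nsyms; i++)         /* second pass: everything else */
        if (!is_wanted(syms[i].name, list, nwanted))
            tmp[k++] = syms[i];
    memcpy(syms, tmp, (size_t)nsyms * sizeof *tmp);
    free(tmp);
    return nfront;
}
```

Keeping the partition stable means the rest of the table is left in its original order.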
	The symorder(1) program rather insists on holding both the symbol and string tables in memory. This was a problem (or could become one if the kernel symbol table grows much more), so two new and original programs were written: 'symcompact' and 'strcompact'.

	The first program compacts the symbol table by removing 'register' local variables (they're of no use to anyone; the debugger doesn't/can't do anything with them) and redundant global text symbols (symbols in an overlaid program which are in the root segment do not need both the '~' and '_' symbols present).

	The second program, 'strcompact', is one that any 'string table' based object file system can use. It implements "shared strings" for symbols: if a program has many references to 'error' as a local symbol, why store the string 'error' more than once? Simply store one instance and then update the symbol table entries to all point to the same string!

	Using both 'strcompact' and 'symcompact' on the /unix image resulted in a file that was 15kb smaller. Running 'symorder' then puts the most frequently used symbols at the front of the symbol table, and with that the performance of 'w', 'pstat' and other programs which nlist(3) the kernel was acceptable.

	Some of the parts of this kit are large. The large patch files have been split into pieces which 'patch' will handle; other parts (the replacement 'ar' sources) were left as a single 'shar' file rather than split up.

	Each part of this kit consists of:

	a 'patchfile' - this is used with the "patch" program to update files.
	an optional 'script' - this is run ("sh script") to perform initialization, remove files, create directories and so on.
	an optional 'new.sources' - this is a "shar" file containing complete sources for a program.

	ALL pathnames are _absolute_, so you do not have to "cd" around the system; you should be able to apply all the patches while you are in /tmp (or /usr/tmp, wherever you have the most free space).
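The "shared strings" idea behind 'strcompact' can be sketched as rebuilding the string table so each distinct name is stored once and every symbol's offset points at that single copy. This is not the strcompact source: it assumes a bare array of offsets standing in for the symbol table and a hypothetical `share_strings` helper, and it uses a linear search for brevity where a real tool would likely want hashing.

```c
#include <stddef.h>
#include <string.h>

/* Rebuild a string table so that each distinct name is stored only once.
 * strx[i] is symbol i's offset into `old`; on return it is the offset of
 * the shared copy in `out`.  `out` must have room for every distinct name.
 * Returns the size of the new table in bytes. */
static size_t share_strings(long *strx, int nsyms, const char *old, char *out)
{
    size_t used = 0;
    int i, j;

    for (i = 0; i < nsyms; i++) {
        const char *s = old + strx[i];  /* not yet rewritten */
        long found = -1;

        /* Symbols before i already point into `out`: reuse a match. */
        for (j = 0; j < i; j++)
            if (strcmp(out + strx[j], s) == 0) {
                found = strx[j];
                break;
            }
        if (found < 0) {                /* first occurrence: append it */
            size_t len = strlen(s) + 1;

            memcpy(out + used, s, len);
            found = (long)used;
            used += len;
        }
        strx[i] = found;
    }
    return used;
}
```

Three local symbols named 'error' plus one 'main' shrink from four stored strings to two, which is the saving described above applied to every repeated name.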
	Be sure that you have at least 40mb free on /usr before rebuilding the system; if you do not, then building in stages will be necessary.

	Part 19 contains the detailed instructions for rebuilding the system _after_ the previous 18 patches have been applied. The patches (#158 thru #175) should be applied in order, following the directions in each part. DO NOT recompile anything once the patching has begun until requested to do so in part 19. Many of the system include files are modified and the object file format is being changed; recompilation will not be possible until the transformation of the system and object libraries is complete.

	AT A MINIMUM you will want to back up the following files (unless you have a known good backup already made) in case you need to recompile something before part 19 is done:

	/bin/ar
	/bin/ld
	/bin/nm
	/bin/as
	/usr/bin/ranlib
	/lib/c0
	/lib/crt0.o
	/lib/mcrt0.o
	/lib/libc.a
	/usr/include/*.h
	/usr/include/sys/*.h

	In part 19 there is a *complete* list of all files affected (all 336 of them); you may wish to back those up also.

	And now the common header ('boilerplate') is over (at last ;-)), let the installation guide begin.

	As always, the complete 2.11BSD updates are available via anonymous FTP to 'ftp.iipo.gtegsc.com' in the directory /pub/2.11BSD

==========
#168 (Part #11 of 19)

	This part updates the following files. BACK THESE UP if you have any worries about the procedure or do not have a bootable backup already at hand.

	/usr/src/local/mkovmake/mkovmake.c
	/usr/src/local/welcome/welcome.c
	/usr/src/local/less/makefile

	0) Be in a temp directory ("cd /tmp" or "cd /usr/tmp")
	1) Save the following shar archive to a file (/tmp/168 for example)
	2) Unpack the archive:
		sh 168
	3) Patch the files:
		patch -p0 < patchfile
	4) rm 168 patchfile

	Part 11 of 19 is done. DO NOT rebuild or compile _anything_ at this point!

====== cut here
#! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2.
Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # patchfile # This archive created: Fri Feb 4 23:00:51 1994 export PATH; PATH=/bin:/usr/bin:$PATH if test -f 'patchfile' then echo shar: "will not over-write existing file 'patchfile'" else sed 's/^X//' << \SHAR_EOF > 'patchfile' X*** /usr/src/local/mkovmake/mkovmake.c.old Fri Oct 19 15:18:19 1990 X--- /usr/src/local/mkovmake/mkovmake.c Sat Jan 8 21:08:12 1994 X*************** X*** 5,10 **** X--- 5,11 ---- X * v1.2: -o option renamed to -f, new -o and -l options for output name X * and libraries. X * v2: Overlay optimizer! Many changes, practically different program. X+ * v3: Modified to use new object file (strings table) format. 1/9/94 - sms. X */ X X #include X*************** X*** 12,30 **** X #include X #include X #include X X! #ifndef NOVL X! #define NOVL 15 X! #endif NOVL X! #define N_NAMELENGTH 8 X #define IN_BASE 0 X #define UNCOMMITTED -1 X- #define strequ !strcmp X- #define strnequ !strncmp X #define isobj(name) name[0] && name[0] != '-' && rindex(name,'.') \ X! && strequ(rindex(name,'.'), ".o") X #define isarc(name) name[0] && name[0] != '-' && rindex(name,'.') \ X! && strequ(rindex(name,'.'), ".a") X X struct modstruct X { X--- 13,27 ---- X #include X #include X #include X+ #include X X! #define MAXSYMLEN 32 X #define IN_BASE 0 X #define UNCOMMITTED -1 X #define isobj(name) name[0] && name[0] != '-' && rindex(name,'.') \ X! && !strcmp(rindex(name,'.'), ".o") X #define isarc(name) name[0] && name[0] != '-' && rindex(name,'.') \ X! && !strcmp(rindex(name,'.'), ".a") X X struct modstruct X { X*************** X*** 194,200 **** X for (i = 1, n = 0; i < argc; ++i) X { X for (j = 1; j < i; ++j) X! if (strequ(argv[i], argv[j])) X break; X if (argv[i][0] == '-' && argv[i][1] == 'Z') X ovinit = UNCOMMITTED; X--- 191,197 ---- X for (i = 1, n = 0; i < argc; ++i) X { X for (j = 1; j < i; ++j) X! 
if (!strcmp(argv[i], argv[j])) X break; X if (argv[i][0] == '-' && argv[i][1] == 'Z') X ovinit = UNCOMMITTED; X*************** X*** 442,448 **** X { X while(**youhave) X { X! if (strnequ(*ineed, *youhave++, N_NAMELENGTH)) X { X ++friendly; X break; X--- 439,445 ---- X { X while(**youhave) X { X! if (!strcmp(*ineed, *youhave++)) X { X ++friendly; X break; X*************** X*** 495,514 **** X getnames(n) X int n; X { X! struct exec exp; X struct nlist namentry; X! FILE *obj; X! long offset; X int nundf, ntext; X X if ((obj = fopen(module[n].name,"r")) == NULL) X { X fprintf(stderr, "mkovmake: cannot open %s.\n", module[n].name); X exit(8); X } X! fseek(obj, 0L, 0); X! fread((char *)&exp, 1, sizeof(struct exec), obj); X! module[n].text = exp.a_text; X if (!optimize) X { X fclose(obj); X--- 492,512 ---- X getnames(n) X int n; X { X! struct xexec exp; X struct nlist namentry; X! FILE *obj, *strfp = NULL; X! off_t stroff; X int nundf, ntext; X+ char name[MAXSYMLEN + 2]; X X+ bzero(name, sizeof (name)); X if ((obj = fopen(module[n].name,"r")) == NULL) X { X fprintf(stderr, "mkovmake: cannot open %s.\n", module[n].name); X exit(8); X } X! fread((char *)&exp, 1, sizeof(exp), obj); X! module[n].text = exp.e.a_text; X if (!optimize) X { X fclose(obj); X*************** X*** 515,526 **** X return(0); X } X X! offset = (long)sizeof(struct exec) + ((long)exp.a_text+exp.a_data)*2; X! fseek(obj, offset, 0); X X ntext = nundf = 0; X! while (fread((char *)&namentry, sizeof(struct nlist), 1, obj) == 1) X { X if (namentry.n_type & N_EXT) X { X switch (namentry.n_type&N_TYPE) X--- 513,525 ---- X return(0); X } X X! fseek(obj, N_SYMOFF(exp), L_SET); X X ntext = nundf = 0; X! 
while (fread((char *)&namentry, sizeof(namentry), 1, obj) == 1) X { X+ if (feof(obj) || ferror(obj)) X+ break; X if (namentry.n_type & N_EXT) X { X switch (namentry.n_type&N_TYPE) X*************** X*** 539,554 **** X module[n].undfnames = (char **) malloc(++nundf * 2); X if (!module[n].textnames || !module[n].undfnames) X { X! nosyms: fprintf(stderr, "mkovmake: not enough memory for symbols list.\n"); X fprintf(stderr, "mkovmake: can't optimize.\n"); X optimize = 0; X fclose(obj); X return(1); X } X X ntext = nundf = 0; X! fseek(obj, offset, 0); X! while (fread((char *)&namentry, sizeof(struct nlist), 1, obj) == 1) X { X if (namentry.n_type & N_EXT) X { X--- 538,557 ---- X module[n].undfnames = (char **) malloc(++nundf * 2); X if (!module[n].textnames || !module[n].undfnames) X { X! nosyms: fprintf(stderr, "mkovmake: out of memory for symbols list.\n"); X fprintf(stderr, "mkovmake: can't optimize.\n"); X optimize = 0; X fclose(obj); X+ if (strfp) X+ fclose(strfp); X return(1); X } X X+ strfp = fopen(module[n].name, "r"); X ntext = nundf = 0; X! fseek(obj, N_SYMOFF(exp), L_SET); X! stroff = N_STROFF(exp); X! while (fread((char *)&namentry, sizeof(namentry), 1, obj) == 1) X { X if (namentry.n_type & N_EXT) X { X*************** X*** 557,579 **** X case N_UNDF: X if (!namentry.n_value) X { X if (!(module[n].undfnames[nundf] X! = malloc(N_NAMELENGTH))) X goto nosyms; X! strncpy(module[n].undfnames[nundf++], X! namentry.n_name, N_NAMELENGTH); X if (listnames) X! pname(n,namentry.n_name,0); X } X break; X case N_TEXT: X! if (!(module[n].textnames[ntext] X! = malloc(N_NAMELENGTH))) X goto nosyms; X! strncpy(module[n].textnames[ntext++], X! namentry.n_name, N_NAMELENGTH); X if (listnames) X! pname(n,namentry.n_name,1); X break; X } X } X--- 560,586 ---- X case N_UNDF: X if (!namentry.n_value) X { X+ fseek(strfp, X+ stroff + namentry.n_un.n_strx, X+ L_SET); X+ fread(name, sizeof (name), 1, strfp); X if (!(module[n].undfnames[nundf] X! = strdup(name))) X goto nosyms; X! 
nundf++; X if (listnames) X! pname(n, name, 0); X } X break; X case N_TEXT: X! fseek(strfp, stroff + namentry.n_un.n_strx, X! L_SET); X! fread(name, sizeof (name), 1, strfp); X! if (!(module[n].textnames[ntext]= strdup(name))) X goto nosyms; X! ntext++; X if (listnames) X! pname(n,name,1); X break; X } X } X*************** X*** 581,586 **** X--- 588,594 ---- X module[n].undfnames[nundf] = ""; X module[n].textnames[ntext] = ""; X fclose(obj); X+ fclose(strfp); X return(0); X } X X*************** X*** 587,603 **** X /* X * pname(n,s,t) X * prints global Text(t=1) and Undf(t=0) name s encountered in module n. X- * (takes care of possible lack of null terminator) X */ X pname(n,s,t) X int n,t; X char *s; X { X- char buf[N_NAMELENGTH+1]; X- strncpy(buf, s, N_NAMELENGTH); X- buf[N_NAMELENGTH] = 0; X if (t) X! fprintf(stderr, "%s: T %s\n", module[n].name, buf); X else X! fprintf(stderr, "%s: U %s\n", module[n].name, buf); X } X--- 595,607 ---- X /* X * pname(n,s,t) X * prints global Text(t=1) and Undf(t=0) name s encountered in module n. X */ X pname(n,s,t) X int n,t; X char *s; X { X if (t) X! fprintf(stderr, "%s: T %s\n", module[n].name, s); X else X! 
fprintf(stderr, "%s: U %s\n", module[n].name, s); X } X*** /usr/src/local/welcome/welcome.c.old Wed Jan 6 21:01:29 1993 X--- /usr/src/local/welcome/welcome.c Fri Dec 31 22:42:51 1993 X*************** X*** 3,12 **** X #include "cpu.h" X #include X X- #ifdef PDP X- #include X- #endif X- X extern struct nlist namelist[]; X extern struct nlist nl[]; X X--- 3,8 ---- X*** /usr/src/local/less/makefile.old Sat Jan 23 17:49:37 1993 X--- /usr/src/local/less/makefile Sun Jan 16 20:34:29 1994 X*************** X*** 139,148 **** X--- 139,150 ---- X X install_less: less X for f in $(INSTALL_LESS); do rm -f $$f; cp less $$f; done X+ chmod 751 $(INSTALL_LESS) X touch install_less X X install_help: less.help X for f in $(INSTALL_HELP); do rm -f $$f; cp less.help $$f; done X+ chmod 444 $(INSTALL_HELP) X touch install_help X X install_man: $(MANUAL) SHAR_EOF fi exit 0 # End of shell archive