The development of mm 0.91 and ccmd 2.1
[28-Jun-1994]

Nelson H. F. Beebe
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
USA
Email: beebe@math.utah.edu (Internet)

========================================================================
COPYRIGHTS, AND ALL THAT

The copyright on all of the work for mm 0.91 and ccmd 2.1 is hereby
assigned by the developer, Nelson H. F. Beebe, to The Trustees of
Columbia University in the City of New York, with grateful thanks to

* Mark Crispin and his many Arpanet collaborators who produced the
  original mm in DEC-20 assembly code,

* the TOPS-20 developers and users who originally designed and
  implemented the COMND% JSYS parser in TOPS-20,

* Andrew Lowry and Howie Kaye who wrote ccmd, the C reimplementation
  of the wonderful DEC TOPS-20 COMND% JSYS command parser,

* Frank da Cruz, Christine M. Gianone, and all of the other Kermit
  contributors,

for the past 16 years of daily use of mm and kermit.  They have both
handled umpteen gigabytes of data for me.

========================================================================

Version 2.1 of ccmd and 0.91 of mm represent a major step forward for
these packages, including the conversion of the code for compilation
under strict ANSI/ISO Standard C compilers.

The changes from mm 0.90 and ccmd 2.0 are too numerous to usefully
include as rcsdiff patches to that version; thus, the new versions
will have a brand new source distribution after they have been tested
on numerous machines.

The work began after the initial port of mm 0.90 and ccmd 2.0 to the
DEC Alpha running OSF/1 version 2.0.  That work uncovered a dozen or
more bugs, mostly due to the lack of interprocedure type checking in
the compilers available at the time these packages were written in the
mid 1980s, but also due to a coding style that can be described as
`vintage K&R', notably, the omission of function and function argument
types when they were intended to be int types.
This practice led in several instances to the passing of arguments
that were really pointers: while this works on many machines, it fails
on those for which int and void* do not have the same size (e.g. DEC
Alpha, SGI Indigo MIPS R4000 (IRIX 5.0 or later), and Intel 80xxx).

The README.OSF1 file records fixes to these bugs, some of which
occurred many times:

    (1)  FILE*, instead of char*, argument to unlink().
    (2)  missing initial FILE* argument in fprintf().
    (3)  non-comment non-blank text following #endif and #else.
    (4)  int, instead of char*, argument to fopen().
    (5)  use of functions before definition or declaration.
    (6)  function declaration as "type functionname()" and later
         definition as "static type functionname()".
    (7)  missing argument in signal handler.
    (8)  wrong argument type to time().
    (9)  extra */ on preprocessor #endif comment.
    (10) duplication of library routine (basename()) with different
         argument types.
    (11) wrong argument type in lookupzone() in dt.c; on the DEC
         Alpha, int is a 32-bit value, while pointers are 64-bit
         values.

The further development of mm 0.91 and ccmd 2.1 repaired a great many
more, typically wrong numbers or types of arguments, and non-void
functions returning without defining a function value.

Although I have been running mm under Solaris 2.x for 18 months, I
have not found it as reliable as the version under SunOS 4.1.x.  In
particular, there is a nasty bug that results in mm going into an
infinite CPU-hogging loop with the curious side effect that the
process cannot be killed.  The only solution is to reboot the
computer.  Inasmuch as our department plans to upgrade 50 SunOS 4.1.3
systems to Sun Solaris 2.3 in the summer of 1994, the existence of
this bug is particularly troublesome; we would probably choose to run
the SunOS 4.1.3 version instead just to avoid it.

This bug has been seen within a few minutes of starting mm, and also
after 3 weeks or more of successful use, which has made it frustrating
to track down.
I ran mm under dbx for weeks, but each time the loop happened, either
dbx could not regain control, or when it did, the stack trace and
local variable dumps were uninformative.

Given the large number of errors that the new development has
uncovered and fixed, I have some hope now that that particular bug has
either been eliminated, or will be catchable, and I expect to run mm
under dbx on several architectures during a suitable test period.

In order to root out further instances of such bugs, over the course
of 4 days, I made dozens of passes over the source code of ccmd and
mm, using Sun Solaris 2.3 lint and cc 3.0, and gcc 2.5.8, turning on
all warnings supported by those compilers.

It rapidly became evident that the only way to do this properly was to
provide the compilers with full information about function arguments
and types, through Standard C and C++ function prototype declarations.

Fortunately, the public-domain mkptypes utility makes it easy to
generate these prototypes automatically from C source code, and I
followed modern programming practice of placing function prototypes
for global functions in header files to eliminate, or at least
minimize, prototype duplication.  lint fortunately provides the
necessary information to distinguish global functions from local ones;
the latter have been declared static, reducing global namespace
pollution.

Sun's Solaris 2.x cscope utility for building a rapid-access database
for C and C++ code was enormously helpful in tracking down function,
variable, preprocessor symbol, and header file uses.

------------------------------

All extern declarations have been moved from function bodies to the
file preamble, enhancing their visibility, and reducing duplication.

------------------------------

gcc identified all of the places where function types and function
arguments were omitted, and the omissions have been rectified.
Functions that don't return a value have been properly declared of
type void, rather than int.
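
The declaration style described above can be sketched as follows.  This
is only an illustration under stated assumptions, not code from mm or
ccmd: the names name_length(), clear_buffer(), and reset_and_measure()
are hypothetical.  It shows a prototype for a global function (which
would normally live in a header file), a file-local helper declared
static, and a function with no return value declared void instead of
being left as an implicit int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not from the mm or ccmd sources. */

/* Prototype for a global function; in practice this belongs in a header. */
size_t name_length(const char *name);

/* File-local helper: static keeps it out of the global namespace. */
static void clear_buffer(char *buf, size_t len);

/* Explicit return and argument types let the compiler check every call. */
size_t name_length(const char *name)
{
    return (name == NULL) ? 0 : strlen(name);
}

/* Returns nothing, so it is declared void rather than implicit int. */
static void clear_buffer(char *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Global wrapper that zeroes a buffer and reports its string length. */
size_t reset_and_measure(char *buf, size_t len)
{
    clear_buffer(buf, len);
    return name_length(buf);
}
```

With the prototypes in scope, a call such as name_length(42) draws a
compile-time diagnostic instead of silently passing an int where a
pointer is expected.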
------------------------------

In order to enhance compiler optimization possibilities, and conform
to modern coding practice, I also introduced the const modifier on
char* and char** variables where possible.  This took numerous passes
over the source code, but with the help of gcc's warning messages, was
completed reliably.

I had to rewrite the body of one routine (cmxerr() in ccmd/ccmdio.c)
to remove the need for temporary modification of a const string.
Although most compilers will just issue a warning, Sun Solaris 2.x
lint 3.0 views it as a fatal error, and the replacement code is
marginally faster because it removes one level of function call.  The
old code has been preserved in the false branch of a preprocessor
conditional.

------------------------------

gcc will produce several warnings of the type

    filename.c:27: warning: cast discards `const' from pointer target type

These arise from functions that return char* pointers into const char*
argument strings; strchr() is the best-known example; there is no way
to be strict about this in Standard C, and C++ designers are wrestling
with a syntax to deal with it cleanly.  They also arise from casting
of constant strings in the many fdb structures.  Both of these
situations are expected, and the warnings are harmless.

------------------------------

gcc will raise numerous complaints about local variables shadowing
global variables; I simply ignore these, since the set of globals
varies from machine to machine.

I did draw the line at locals shadowing locals, e.g.

    int foo()
    {
        int n;

        bar(n);
        {
            int n;

            for (n = 0; n < 3; ++n)
                bar(n);
        }
    }

Every such case was dealt with by renaming the nested local variable.

------------------------------

gcc and lint also raise warnings about arguments that are unused;
these are either artifacts of older versions of ccmd and mm, or they
arise because the function is a member of a family of functions that
are stored in tables, and must be called with identical arguments.
There is one aberration that I have not attempted to repair: function
indiract() in ccmd/stdact.c takes 4 arguments, instead of the 3 that
all of the other parsing action functions take.  This is a design flaw
which could be easily remedied through introduction of a global
variable to hold the fourth argument, or more cleanly, but also more
laboriously, by passing a struct as a single argument.  I have not
done either of these, but instead, simply introduced a special type
cast, HANDLER_CAST4, to override the default function interpretation
at the point of the two calls in ccmd/ccmd.c.

------------------------------

The compilers and lint detected many instances of dead code, such as
local variables declared, or declared and assigned values, but then
never used.  I've eliminated them all, although in the case of larger
code sections, such as unused functions, I've left them in the code
bracketed by preprocessor lines like this:

    #if 0
    ...dead code...
    #endif

------------------------------

In ccmd, lint raised a great many warnings like these:

    bitwise operation on signed value possibly nonportable

Most have been eliminated by the introduction of a new unsigned data
type in ccmd/ccmd.h, flag_t, which is used for the command parser's
bit flags.  It is currently typedef'ed as "unsigned int".

------------------------------

In mm, there are 9 lint warnings

    function falls off bottom without returning value

which arise because a function ends with a while(1) loop that contains
a return statement; a smarter lint implementation would detect that
this is legitimate, and be silent.

------------------------------

lint raises warnings about free() calls where the argument type does
not match the library type (often char*, but strictly, void* in
Standard C).
These could be eliminated by explicit casts, but a cleaner solution
would be to redefine free to include an appropriate cast:

    #define free(p) (free)((char*)(p))

In Standard C, the parentheses around (free) prevent recursive macro
expansion; unfortunately, older preprocessors, and even the odd
supposedly Standard C preprocessor, will go into an infinite loop with
this form, so I have refrained from using it, and instead, just ignore
lint and compiler warnings about argument types to free().

The same problem arises with the first argument of realloc(), and as
with free(), it has been left unresolved.

------------------------------

In Standard C, memory lengths passed to library routines are of type
size_t, which is required to be an unsigned type.  However, older
implementations have defined it as a signed type, and still older ones
variously use int or long.  This affects many library calls in ccmd
and mm.  Rather than use explicit casts, I recommend using a good
Standard C compiler which will supply the appropriate casts at compile
time, based on its having seen prototypes for all library functions
used in ccmd and mm.

------------------------------

lint warns

    constant in conditional context

about loops coded as "while (1) {...}".  A better choice would be
"for(;;) {...}", which eliminates the warning, and has the same
effect; it might even produce tighter code.

------------------------------

A common practice in mm 0.90 and ccmd 2.0 was conditionals of the form

    if (a = b) {...}

The problem with this is that it requires further study to determine
whether the programmer intended assignment, and subsequent test for
non-zero, or whether the code is wrong, and should have been an
equality test:

    if (a == b) {...}

gcc and lint warn about these, and I fixed several errors where == was
meant, but = was entered.  I even fixed one assignment written as
"a == b;", changing it to the correct "a = b;".
gcc recommends additional parentheses when the first form is wanted,
but there is a better solution: C's comma operator.  I've therefore
rewritten almost all of the first form as

    if ((a = b, a)) {...}

which makes it clear that assignment, then a test for non-zero, is
wanted.  The only places that I refrained from this were ones with
side effects, such as

    if (*a++ = *b++) {...}

------------------------------

lint warnings of the form

    pointer cast may result in improper alignment

can be safely ignored; they are a generic problem with lint
implementations.

------------------------------

mm and ccmd have 5 functions that take variable numbers of arguments.
The current implementation uses the Berkeley <varargs.h> interface.
This poses problems for type checking, and Standard C provides
<stdarg.h>, which has the restriction that the first argument must
always be explicitly typed, with ... marking the remaining arguments:

    extern int printf(const char *, ...);

lint doesn't know about the varargs.h style, and warns

    argument does not match remembered type: arg #1

I have not updated the code to use the new style, which would
eliminate the lint warnings; that is work for the next version.

All of the routines that take a variable number of arguments are
easily identified in ccmd/ccmd.h and mm/extern.h by declarations like
this:

    extern int sorry ARGS(());
    /* NB: cannot represent variable number of arguments with
       old-style varargs.h interface */

The (()) string is distinctive, and occurs in only 5 places in the
entire source code, in those two header files.

------------------------------

lint detects more than 350 external names that collide with others in
their first 6 characters.  I have not attempted to produce
preprocessor symbols to map them to names of 6 or fewer characters, on
the grounds that such severe limits have effectively disappeared,
spurred on perhaps by C++'s name mangling that can require hundreds of
unique characters in external names.
------------------------------

In mm/message.h, getmsg is #define'd to Getmsg, to avoid a conflict
with a Sun library routine.  getmsg() is defined in mm/send.c, and
used there and in mm/sendcmds.c.

------------------------------

The return values of malloc() and realloc() are now always properly
type cast; Standard C types them as void*, where the void* type is
guaranteed to be as large as the pointer to any non-function data
type.

The private memory management functions dcalloc(), dfree(), dmalloc(),
drealloc(), safe_free(), and safe_realloc() now deal with void*,
rather than char*, data.  In a pre-Standard C environment where void*
is not recognized, the code is simply compiled with void redefined to
char at compile time.

------------------------------

I ran mm under the Sun dbx 3.0 debugger with its "check -leaks" option
to check for allocated, but unfreed, memory: this resulted in the
addition of several free() calls to eliminate some of the memory
leaks.

I also used dbx's "check -access" option to check for references to
uninitialized variables, and as a consequence, made a source change in
ccmd.c to initialize h->enabled in one place where it was missing.

------------------------------

The Makefiles have been updated to have machine-specific target names,
avoiding the need to make source code changes to configuration files.

ccmd has only a modest number of machine dependencies, which are
easily handled by compile-time preprocessor symbol definitions.  mm
has many more; they are encapsulated in the s-xxx.h files, and one of
those files is included in config.h by this code:

    #ifdef S_FILE
    #include S_FILE
    #else
    #include "s-sun50.h"
    #endif

This form permits defining the s-xxx.h file as the value of S_FILE at
compile time, so that config.h should not need editing.

I did find, however, that Sun's lint 3.0 does not handle the
#include S_FILE form: it always included the other file.
For development purposes, I made it default to s-sun50.h to match the
system where most of the mm 0.91 and ccmd 2.1 development was carried
out.

------------------------------

========================================================================
FUTURE WORK

(1) Extend ccmd to support 8-bit characters; the current version is
    based on 7-bit characters, with break table sizes and macros in
    ccmd/cmfncs.h set accordingly.  In mm/send.c, characters are
    masked against 0x7f to strip the high order bit, and break tables
    in mm/address.c, mm/alias.c, mm/keywords.c, mm/parse.c,
    mm/parsemsg.c, mm/seq.c, and mm/sys-prof.c are initialized
    assuming 128 characters.

(2) Convert to use stdarg.h instead of varargs.h.

(3) Extend function headers to support compilation under K&R C, and
    Standard C and C++, possibly using the style

        #if STDC
        int foo(int a, const char *b)
        #else
        int foo(a,b)
        int a;
        const char *b;
        #endif
        {
            ...
        }

    where STDC is defined like this:

        #if defined(__STDC__) || defined(__cplusplus) || defined(c_plusplus)
        #define STDC 1
        #else
        #define STDC 0
        #endif

    or possibly using argument-count specific macros to allow code
    like this:

        int foo ARGS_2((a,b), int a, const char *b)
        {
            ...
        }

    with definitions like this:

        #if STDC
        #define ARGS_2(list,a,b) (a, b)
        #else
        #define ARGS_2(list,a,b) list a; b;
        #endif

    I made some exploratory attempts at C++ with copies of a couple of
    mm source files modified to use Standard C/C++ style function
    headers.  C++ has some extra reserved words, including new, this,
    and delete, which are used in the mm and ccmd source code as
    variable names; this could be easily worked around with
    preprocessor definitions.  More troublesome is the reuse of
    typedef names as variable names, which is illegal in C++, e.g.

        typedef struct mail_msg {
            headers *headers;  /* list of unordered headers */
            ...
        }

    Fixing this will require manual editing of the source code.
    Nevertheless, the strict type checking of C++ compilers has proved
    so useful in my other C code development that I routinely use them
    in preference to C compilers.  It is much easier to fix bugs
    caught at compile time than at core dump time.

(4) Redesign ccmd and rewrite from scratch in C++, with much more
    information hiding and visibility control, and a much simpler
    interface to the ccmd parsing facilities, which were largely
    copied intact from the DEC-20 assembly code implementation.

(5) Convert ccmd/ccmd.guide to (La)TeXinfo for convenient typeset and
    online documentation.
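
As a rough sketch of what item (2) might look like, here is a
variable-argument function written against the Standard C stdarg.h
interface.  It is an illustration under stated assumptions, not code
from mm or ccmd: the function name format_message() is hypothetical,
and vsnprintf() is used only for brevity (it comes from a later C
standard than the compilers discussed here).  The point is that the
last fixed argument is explicitly typed, so the compiler and lint can
check it.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical function for illustration; not from the mm or ccmd sources. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);                  /* fmt is the last named argument */
    n = vsnprintf(buf, size, fmt, ap);  /* format the remaining arguments */
    va_end(ap);                         /* every va_start needs a va_end  */

    return n;                           /* characters that were required  */
}
```

A call such as format_message(buf, sizeof buf, "%s %d", "mm", 91)
leaves "mm 91" in buf.  A prototype of this form can be written in
full in a header file, which the old-style ARGS(()) declarations
cannot express.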