A. Dolenc, A. Lemmke and D. Keppel
A few words about the intended audience before we begin. This document is mainly for those who have never ported a program to another platform -- a specific hardware and software environment -- and, evidently, for those who plan to write large systems which must be used across different vendor machines.
If you have done some porting before, you may not find the information herein very useful.
We suggest that [Can89] be read in conjunction with this document. Submitters to the News group comp.lang.c have repeatedly recommended [Hor90, Koe89].
Disclaimer: The code fragments presented herein are intended to make applications "more" portable, meaning that they may still fail with some compilers and/or environments.
This file can be obtained via anonymous ftp from sauna.hut.fi [130.233.251.253] in ~ftp/pub/CompSciLab/doc. The files portableC.tex, portableC.bib and portableC.ps.Z are the LaTeX, BibTeX and the compressed PostScript, respectively.
The aim of this document is to collect the experience of several people who have had to write and/or port programs in C to more than one platform.
In order to keep this document within reasonable bounds we must restrict ourselves to programs which must execute under Unix-like operating systems and those which implement a reasonable Unix-like environment. The only exception we will consider is VMS.
A wealth of information can be obtained from programs which have been written to run on several platforms. This is the case with publicly available software such as that developed by the Free Software Foundation and the MIT X Consortium.
When discussing portability one focuses on two issues:
We include in our discussions the standardization efforts of the language and the environment. Special attention will be given to floating-point representations and arithmetic, to limitations of specific compilers, and to VMS.
Our main focus will be boiler-plate problems. System programming and twisted code associated with bizarre interpretations of [X3J88] (henceforth referred to as the Standard) will not be extensively covered in this document.
All standards have a good and an evil side. Due to the nature of this document we are forced to focus our attention on the latter.
The American National Standards Institute (ANSI) has recently approved a standard for the C programming language [X3J88]. The Standard concentrates on the syntax and semantics of the language and specifies a minimum environment (the name and contents of some header files and the specification of some run-time library functions).
Copies of the ANSI C Standard can be obtained from the following address:
American National Standards Institute
Sales Department
1430 Broadway
New York, NY 10018
(Voice) (212) 642-4900
(Fax) (212) 302-1286
We first draw attention to the fact that the Standard states some environmental limits. These limits are lower bounds, meaning that a correct (compliant) compiler may refuse to compile an otherwise correct program which exceeds one of those limits.
Below are the limits which we judge to be the most important. The ones related to the preprocessor are listed first.
It is really unfortunate that some of these limits may force a programmer to code in a less elegant way. We are of the opinion that the remaining limits stated in the Standard can usually be obeyed if one follows "good" programming practices.
However, these limits may break programs which generate C code, such as compiler-compilers and many C++ compilers.
The following are examples of unspecified and undefined behaviour:
The list is long. One of the main reasons for explicitly defining what is not covered by the Standard is to allow the implementor of the C environment to make use of the most efficient alternative.
The objective of the POSIX working group P1003.1 is to define a common interface for UNIX. Granted, the ANSI C standard does specify the contents of some header files and the behaviour of some library functions, but it falls short of defining a useful environment. This is the task of P1003.1.
We do not know how far P1003.1 addresses the problems presented in this document as at the moment we lack proper documentation. Hopefully, this will be corrected in a future release of this document.
Preprocessors may present different behaviour in the following:
#define D define
#D this that
The Standard does not allow such a syntax (see section 3.8.3 §20 in [X3J88]).
The #pragma directive should pose no problems even to old preprocessors if it is indented. Furthermore, it is advisable to enclose such directives in #ifdef's in order to document under which platform they make sense:
#ifdef <platform-specific-symbol>
  #pragma ...
#endif
#ifdef __STDC__
# define GLUE(a,b) a##b
#else
# define GLUE(a,b) a/**/b
#endif
If needed, one could define similar macros to GLUE several arguments.
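A minimal sketch of such a macro for three arguments follows. The name GLUE3 and the identifier it builds are our own, chosen only for illustration; the pre-ANSI branch relies on the same empty-comment trick as GLUE, which not every old preprocessor honours.

```c
#ifdef __STDC__
# define GLUE3(a,b,c) a##b##c
#else
# define GLUE3(a,b,c) a/**/b/**/c
#endif

/* GLUE3(buf,_,size) pastes the three tokens into the single identifier buf_size */
static int GLUE3(buf,_,size) = 512;

int get_buf_size(void)
{
    return buf_size;   /* refers to the variable declared through GLUE3 */
}
```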
#ifdef __STDC__
# define MAKESTRING(s) # s
#else
# define MAKESTRING(s) "s"
#endif
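One subtlety worth showing: with an ANSI preprocessor the argument of # is not macro-expanded before it is turned into a string, so stringizing the value of a macro needs a second level of indirection. The sketch below assumes an ANSI preprocessor for the values noted in the comments; EXPANDSTRING and BUFFER_SIZE are hypothetical names of our own.

```c
#include <string.h>

#ifdef __STDC__
# define MAKESTRING(s) # s
#else
# define MAKESTRING(s) "s"
#endif

/* The extra level lets the argument be macro-expanded first (ANSI only) */
#define EXPANDSTRING(s) MAKESTRING(s)

#define BUFFER_SIZE 512   /* hypothetical configuration constant */

const char *name_text  = MAKESTRING(BUFFER_SIZE);   /* "BUFFER_SIZE" */
const char *value_text = EXPANDSTRING(BUFFER_SIZE); /* "512" */
```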
There are good publicly available preprocessors which are ANSI C compliant. One such preprocessor is the one distributed with the X Window System developed by the MIT X Consortium.
Take note of #pragma directives which alter the semantics of the program and consider the case when they are not recognized by a particular compiler. Evidently, if the behaviour of the program relies on their correct interpretation then, in order for the program to be portable, all target platforms must recognize them properly.
Finally, we must add that the Standard has fortunately included a #error directive with obvious semantics. Indent the #error since old preprocessors do not recognize it.
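As a small illustration, a program that assumes a 32-bit int could refuse to compile elsewhere with an indented #error. The particular assumption tested here is only an example of ours.

```c
#include <limits.h>

/* Stop compilation on machines where int is narrower than 32 bits.
   The directive is indented, following the advice above. */
#if INT_MAX < 2147483647
  #error "this program assumes that int has at least 32 bits"
#endif
```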
The syntax defined in the Standard is a superset of the one defined in K&R. It follows that if one restricts oneself to the K&R syntax there should be no problems with an ANSI C compliant compiler. The Standard extends the syntax with the following:
We encourage the use of the reserved words const and volatile since they aid in documenting the code. It is useful to add the following to one's header files if the code must also be compiled by a non-conforming compiler:
#ifndef __STDC__
# define const
# define volatile
#endif
However, one must then make sure that the behaviour of the application does not depend on the presence of such keywords.
The syntax does not pose any problem with regard to interpretation because it can be defined precisely. However, programming languages are always described using a natural language, e.g. English, and this can lead to different interpretations of the same text.
Evidently, [KR78] does not provide an unambiguous definition of the C language, otherwise there would have been no need for a standard. Although the Standard is much more precise, there is still room for different interpretations in situations such as f(p=&a, p=&b, p=&c). Does this mean f(&a,&b,&c) or f(&c,&c,&c)? Even "simple" cases such as a[i] = b[i++] are compiler-dependent [Can89].
As stated in the Introduction we would like to exclude such topics.
The reader is instead directed to the USENET news groups comp.std.c or comp.lang.c, where such discussions take place and from where the above example was taken. The Journal of C Language Translation could, perhaps, be a good reference. Another possibility is to obtain a clarification from the Standards Committee at the following address:
X3 Secretariat, CBEMA
311 1st St NW Ste 500
Washington DC, USA
A long time ago (1969), Unix said "papa" for the first time at AT&T (then called Bell Laboratories, or Ma Bell for the intimate) on a PDP-7, and it soon moved to the PDP-11. Everyone liked Unix very much and its widespread use we see today is probably due to the relative simplicity of its design and of its implementation (it is written, of course, mostly in C).
However, these same facts also made it easy for each vendor to develop its own dialect. In particular, the University of California at Berkeley distributes the so-called BSD (Berkeley Software Distribution) Unix, whereas AT&T distributes (sells) System V Unix. The systems of all other vendors descend from one of these two major dialects.
The differences between these two major flavours should not upset most application programs. In fact, we would even say that most differences are just annoying.
BSD Unix has an enhanced signal handling capability and implements sockets. However, all Unix flavours differ significantly in their raw i/o interface (that is, the ioctl system call), which should be avoided if possible.
The reader interested in knowing more about the past and future of Unix can consult [Man89, Int90].
Many useful system header files are in different places in different systems, or they define different symbols. We will assume henceforth that the application has been developed on a BSD-like Unix and must be ported to a System V-like Unix, to VMS, or to a Unix-like system with header files which comply with the Standard.
In the following sections, we show how to handle the simplest cases which arise in practice. Some of the code which appears below was derived from the header file Xos.h which is part of the X Window System distributed by MIT. We have added changes, e.g. to support VMS.
Many header files are unprotected in many systems, notably those derived from BSD version 4.2 and earlier. By unprotected we mean that an attempt to include a header file more than once will either cause compilation errors (e.g. due to recursive includes) or, in some implementations, warnings from the preprocessor stating that symbols are being redefined. It is good practice to protect header files.
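The usual protection is an include guard, sketched below for a hypothetical header point.h; the macro name POINT_H is our own choice. We avoid names beginning with an underscore followed by an upper-case letter because the Standard reserves those for the implementation.

```c
/* point.h: hypothetical header, protected against multiple inclusion */
#ifndef POINT_H
#define POINT_H

struct point {
    int x;
    int y;
};

#endif /* POINT_H */

/* A second inclusion of the same text (simulated here by repeating it)
   is skipped by the preprocessor, so struct point is not redefined. */
#ifndef POINT_H
#define POINT_H

struct point {
    int x;
    int y;
};

#endif /* POINT_H */
```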
They provide the same functionality in all systems except that some symbols must be renamed.
#ifdef SYSV
# define _ctype_ _ctype
# define toupper _toupper
# define tolower _tolower
#endif
Note however that the definitions in <ctype.h> are not portable across character sets.
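A related trap: in some older libraries toupper and tolower (like the System V _toupper and _tolower) are only guaranteed to work when the argument is already a letter of the opposite case, and the <ctype.h> functions are only defined for values representable as unsigned char (and EOF). A defensive sketch, with function names of our own, tests the character first; it does nothing about the character-set issue just mentioned.

```c
#include <ctype.h>

/* Convert c to lower case only if it is an upper-case letter.
   c must be a character value, not EOF. */
int safe_tolower(int c)
{
    unsigned char uc = (unsigned char) c;

    return isupper(uc) ? tolower(uc) : uc;
}

/* Convert c to upper case only if it is a lower-case letter */
int safe_toupper(int c)
{
    unsigned char uc = (unsigned char) c;

    return islower(uc) ? toupper(uc) : uc;
}
```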
Many files which BSD systems expect to find in the sys directory are placed in /usr/include in System V. Other systems, like VMS, do not even have a sys directory.
The symbols used in the open function call are defined in different header files in the two types of systems:
#ifdef SYSV
# include <fcntl.h>
#else
# include <sys/file.h>
#endif
The semantics of the error number may differ from one system to another and the list may differ as well (e.g. BSD systems have more error numbers than System V). Some systems, e.g. SunOS, define the global symbol errno which will hold the last error detected by the run-time library. This symbol is not available in most systems, although the Standard requires that such a symbol be defined (see section 4.1.3 of [X3J88]).
The most portable way to print error messages is to use perror.
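A minimal sketch of this practice, with a helper name of our own: perror writes its argument, a colon, and a system-dependent message for the current value of errno to stderr. On Unix systems fopen sets errno when it fails; the Standard itself does not require this, so the message may be less informative elsewhere.

```c
#include <stdio.h>

/* Open a file for reading; on failure, report the reason with perror */
FILE *open_for_reading(const char *name)
{
    FILE *fp = fopen(name, "r");

    if (fp == NULL)
        perror(name);   /* e.g. "name: No such file or directory" */
    return fp;
}
```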
System V has more definitions in this header file than BSD-like systems, and the corresponding library has more functions as well. This header file is unprotected under VMS and Cray, and in that case we must do it ourselves:
#if defined(CRAY) || defined(VMS)
# ifndef __MATH__
#  define __MATH__
#  include <math.h>
# endif
#endif
Some systems cannot be treated as System V or BSD but are really a special case, as one can see in the following:
#ifdef SYSV
#ifndef SYSV_STRINGS
# define SYSV_STRINGS
#endif
#endif
#ifdef _STDH_ /* ANSI C Standard header files */
#ifndef SYSV_STRINGS
# define SYSV_STRINGS
#endif
#endif
#ifdef macII
#ifndef SYSV_STRINGS
# define SYSV_STRINGS
#endif
#endif
#ifdef vms
#ifndef SYSV_STRINGS
# define SYSV_STRINGS
#endif
#endif

#ifdef SYSV_STRINGS
# include <string.h>
# define index strchr
# define rindex strrchr
#else
# include <strings.h>
#endif
As one can easily observe, System V-like Unix systems use different names for index and rindex and place them in a different header file. Although VMS supports System V features better, it must be treated as a special case.
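For new code it is simplest to use the Standard names directly. As an illustration, a hypothetical base_name function finds the last component of a path with strrchr (rindex on older BSD systems). It assumes Unix file syntax; VMS file specifications under DCL look quite different.

```c
#include <string.h>

/* Return the last component of a Unix-style path name */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash != NULL ? slash + 1 : path;
}
```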
When using time.h one must also include types.h. The following code does the trick:
#ifdef macII
# include <time.h> /* on a Mac II we need this one as well */
#endif
#ifdef SYSV
# include <time.h>
#else
# ifdef vms
#  include <time.h>
# else
#  ifdef CRAY
#   ifndef __TYPES__ /* it is not protected under CRAY */
#    define __TYPES__
#    include <sys/types.h>
#   endif
#  else
#   include <sys/types.h>
#  endif /* of ifdef CRAY */
#  include <sys/time.h>
# endif /* of ifdef vms */
#endif
The above is not sufficient for the code to be portable, since the structure which defines time values is not the same in all systems. Different systems vary in the way time_t values are represented; the Standard, for instance, only requires that time_t be an arithmetic type. Recognizing this difficulty, the Standard defines a function called difftime to compute the difference between two time values of type time_t, and mktime, which converts a broken-down time held in a struct tm into a value of type time_t.
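A sketch of how the two functions combine, with helper names of our own: mktime builds time_t values from calendar dates, and difftime returns their difference in seconds as a double, so the code never looks inside a time_t. Using noon and setting tm_isdst to -1 (letting mktime decide whether daylight saving time applies) avoids surprises near midnight.

```c
#include <string.h>
#include <time.h>

/* Convert a calendar date (local time, at noon) to time_t; (time_t)-1 on failure */
static time_t date_to_time(int year, int month, int day)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;
    tm.tm_isdst = -1;   /* let mktime determine whether DST is in effect */
    return mktime(&tm);
}

/* Seconds elapsed from the first date to the second, computed with difftime */
double seconds_between(int y1, int m1, int d1, int y2, int m2, int d2)
{
    time_t t1 = date_to_time(y1, m1, d1);
    time_t t2 = date_to_time(y2, m2, d2);

    return difftime(t2, t1);
}
```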
In some systems the definitions in both header files are contradictory. For instance, the following will produce compilation errors under VMS:
#include <varargs.h>
#include <stdio.h>
This is because <stdio.h> includes <stdarg.h>, which in turn redefines all the symbols (va_start, va_end, etc.) in <varargs.h>. The solution we adopt is to always include <varargs.h> last, and to avoid defining, in the same module, both functions which use <varargs.h> and functions which use the ellipsis notation.
In other words, we are not saying that you should not use them, only that you should be careful if you do. Furthermore, the behaviour of a longjmp invoked from a nested signal handler is undefined.
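The sketch below shows one variadic function (sum_ints, a name of our own) written both ways, with the choice made by __STDC__, so that any one compilation uses only one of the two headers. In the <varargs.h> style there are no named parameters, so the count is fetched with va_arg like the remaining arguments.

```c
#ifdef __STDC__
#include <stdarg.h>

/* Sum n int arguments, ANSI C style: the count is a named parameter */
int sum_ints(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    while (n-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
#else
#include <varargs.h>

/* The same function in the old <varargs.h> style */
int sum_ints(va_alist)
va_dcl
{
    va_list ap;
    int n, total = 0;

    va_start(ap);
    n = va_arg(ap, int);
    while (n-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
#endif
```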
Finally, the symbols _setjmp and _longjmp are only defined under SunOS, BSD, and HP-UX.
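One point where care is needed with the Standard setjmp and longjmp: automatic variables that are modified between the call to setjmp and the longjmp have indeterminate values after the jump unless they are declared volatile. A minimal sketch, with hypothetical function names:

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical error routine that abandons the current computation */
static void give_up(void)
{
    longjmp(env, 1);
}

int run_with_recovery(void)
{
    /* volatile: the value set below must survive the longjmp */
    volatile int progress = 0;

    if (setjmp(env) != 0)
        return progress;   /* reached via longjmp */
    progress = 5;
    give_up();
    return -1;             /* not reached */
}
```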
In practice, one runs much too frequently into several unstated compiler limitations:
To say that the implementation of numerical algorithms which exhibit the same behaviour across a wide variety of platforms is difficult is an understatement. This section provides very little help, but we hope it is worth reading. Any additional suggestions and information are very much appreciated, as we would like to expand this section.
One problem when writing numerical algorithms is obtaining machine constants. Typical values one needs are:
On Suns they can be obtained from <values.h>. The ANSI C Standard requires that such constants be defined in the header file <float.h>.
Suns and standards apart, these values are not always readily available, e.g. on Tektronix workstations running UTek. One solution is to use a modified version of a program called machar, which can be obtained from the network. Machar is described in [Cod88] and can be obtained by anonymous ftp from netlib.
It is straightforward to modify the C version of machar to generate a C preprocessor file which can be included directly by C programs.
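As a rough illustration of the kind of probing machar performs (machar itself determines many more parameters, far more carefully), the sketch below computes the machine epsilon of double by repeated halving, so that it can be compared with DBL_EPSILON from <float.h>. It assumes binary floating point (FLT_RADIX 2) with round-to-nearest; the volatile qualifiers force each intermediate result to be stored as a double, which matters on machines that evaluate expressions in extended precision.

```c
#include <float.h>

/* Smallest power of two eps such that 1.0 + eps != 1.0 in double */
double computed_epsilon(void)
{
    volatile double eps = 1.0;
    volatile double sum;

    for (;;) {
        sum = 1.0 + eps / 2.0;
        if (sum == 1.0)
            break;
        eps = eps / 2.0;
    }
    return eps;
}
```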
There is also a publicly available program called config.c which attempts to determine many properties of the C compiler and machine that it is run on. This program was submitted to comp.sources.misc.
In the days of K&R [KR78] one was "encouraged" to use float and double interchangeably, since all expressions with such data types were always evaluated using the double representation: a real nightmare for those implementing efficient numerical algorithms in C. This rule applied, in particular, to floating-point arguments, and for most compilers around it does not matter whether one defines the argument as float or double.
According to the ANSI C Standard such programs will continue to exhibit the same behaviour as long as one does not prototype the function. Therefore, when prototyping functions make sure the prototype is included when the function definition is compiled so the compiler can check if the arguments match.
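A minimal sketch of this arrangement, with a hypothetical function scale: the prototype normally lives in a header included both by the callers and by the file containing the definition; here the two pieces are shown together. With the prototype in scope, float arguments are converted to float rather than promoted to double, and a mismatch between prototype and definition is diagnosed.

```c
/* scale.h (hypothetical): included by every caller and by scale.c */
float scale(float x, float factor);

/* scale.c: because scale.h is included here too, the compiler checks
   that this definition agrees with the prototype */
float scale(float x, float factor)
{
    return x * factor;
}
```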
Be careful with the == and != operators when comparing floating types. Expressions such as
if (float_expr1 == float_expr2)
will seldom be satisfied due to rounding errors.
To get a feeling for rounding errors, try evaluating the following expression using your favourite C compiler [KM86]:
(image not reproduced)
Most computers will produce zero regardless of whether one uses float or double. Although the absolute error is large, the relative error is quite small and probably acceptable for many applications.
It is better to use expressions such as (image not reproduced) or (image not reproduced) (if (image not reproduced)), where 0 < K < 1 is a function of:
Other possibilities exist and the choice depends on the application.
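The recommended expressions did not survive conversion. One common form of relative comparison, which may or may not match the authors' version, is sketched below; the function name nearly_equal and the parameter k (playing the role of K) are ours. Note that a purely relative test only accepts a comparison against exactly zero when both values are zero; an absolute term may be added when that matters.

```c
/* Nonzero if a and b agree to within the relative tolerance k,
   i.e. |a - b| <= k * max(|a|, |b|) */
int nearly_equal(double a, double b, double k)
{
    double diff = a > b ? a - b : b - a;
    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;
    double larger = abs_a > abs_b ? abs_a : abs_b;

    return diff <= k * larger;
}
```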
The development of reliable and robust numerical algorithms is a very difficult undertaking. Methods for certifying that the results are correct within reasonable bounds must usually be implemented. A reference such as [PFTV88] is always useful.
Floating-point exceptions (overflow, underflow, division by zero, etc.) are not signaled automatically in some systems. In that case, they must be explicitly enabled.
Always enable floating-point exceptions since they may be an indication that the method is unstable. Otherwise, one must be sure that such events do not affect the output.
In this section we will report some common problems encountered when porting a C program to a VMS environment which we have not mentioned previously.
Under VMS one can use two flavours of command interpreters: DCL and DEC/Shell. The syntax of file specifications under DCL differs significantly from the Unix syntax.
Some C run-time library functions in VMS which take file specifications as arguments or return file specifications to the caller will accept an additional argument indicating which syntax is preferred. It is useful to use these run-time library functions via macros as follows:
#ifdef VMS
#ifndef VMS_CI /* Which Command Interpreter flavour to use */
# define VMS_CI 0 /* 0 for DEC/Shell, 1 for DCL */
#endif
# define Getcwd(buff,siz) getcwd((buff),(siz),VMS_CI)
# define Getname(fd,buff) getname((fd),(buff),VMS_CI)
# define Fgetname(fp,buff) fgetname((fp),(buff),VMS_CI)
#else
# define Getcwd(buff,siz) getcwd((buff),(siz))
# define Getname(fd,buff) getname((fd),(buff))
# define Fgetname(fp,buff) fgetname((fp),(buff))
#endif /* of ifdef VMS */
More pitfalls await the unaware who accept file specifications from the user or take them from environment values (e.g. using the getenv function).
The easiest solution is to force the linker to add the module using the /INCLUDE command modifier. Of course, there is the possibility that the command line may exceed 256 characters...(*sigh*).
long int str[2] = {0x41424344, 0x0};   /* ASCII "ABCD", assuming a 32-bit long */
printf("%s\n", (char *)&str);
A little-endian machine (e.g. a VAX) will print "DCBA", whereas a big-endian machine (e.g. one based on an MC68000 microprocessor) will print "ABCD".
int *p = (int *) malloc(...);
...
free(p);
This code may malfunction in architectures where int* and char* have different representations because free expects a pointer of the latter type.
A null pointer of a given type will always convert to a null pointer of another type if implicit or explicit conversion is performed. (See item 4 above.)
The contents of a null pointer may be anything the implementor wishes and dereferencing it may cause strange things to happen...
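Returning to the byte-order example above: a common way around the problem is to avoid reinterpreting memory altogether and to assemble multi-byte values from individual bytes with shifts, which gives the same result on every machine. In the sketch below the function names are ours; get_be32 assumes data stored most significant byte first, and is_little_endian only inspects the first byte of an int (mixed byte orders, such as the PDP-11's for 32-bit longs, also exist).

```c
/* Nonzero if the least significant byte of an int is stored first */
int is_little_endian(void)
{
    unsigned int one = 1;

    return *(unsigned char *) &one == 1;
}

/* Assemble a 32-bit big-endian value from four bytes; the result does
   not depend on the byte order of the machine running the code */
unsigned long get_be32(const unsigned char *p)
{
    return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
           ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
}
```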
We are grateful for the help of Antti Louko (HTKK/Lsk) and Jari Helminen (HTKK) in commenting and correcting a previous draft of this document. We thank all the contributors of USENET News groups comp.std.c and comp.lang.c from where we have taken a lot of information. Some information within was obtained from [Hew88].
DEC, PDP-11, VMS and VAX are trademarks of Digital Equipment Corporation.
HP is a trademark of Hewlett-Packard, Inc.
MC68000 is a trademark of Motorola.
PostScript is a registered trademark of Adobe Systems, Inc.
Sun is a trademark of Sun Microsystems, Inc.
UNIX is a registered trademark of AT&T.
X Window System is a trademark of MIT.
Notes On Writing Portable Programs In C
(June 1990, 5th Revision)
This document was generated using the LaTeX2HTML translator Version 96.1-h (September 30, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The translation was initiated by Christopher Lott on Thu Mar 13 13:33:05 EST 1997
Document processed by Christopher Lott.