This chapter introduces a small, but real, worked example to illustrate some of the features, and highlight some of the pitfalls, of the GNU Autotools discussed so far. All of the source can be downloaded from the book's web page (5). The text is peppered with my own pet ideas, accumulated over several years of working with the GNU Autotools, and you should be able to apply these easily to your own projects. I will begin by describing some of the choices and problems I encountered during the early stages of the development of this project. Then, by way of illustrating the issues covered, I will move on to show you a general infrastructure that I use as the basis for all of my own projects, followed by the specifics of the implementation of a portable command line shell library. The chapter finishes with a sample shell application that uses that library.

9.1.1 Project Directory Structure

Before starting to write code for any project, you need to decide on the directory structure you will use to organise the code. I like to build each component of a project in its own subdirectory, and to keep the configuration sources separate from the source code. The great majority of GNU projects I have seen use a similar method, so adopting it yourself will likely make your project more familiar to your developers by association.

The top level directory is used for configuration files, such as `configure' and `aclocal.m4', and for a few other sundry files, `README' and a copy of the project license for example.

Any significant libraries will have a subdirectory of their own, containing all of the sources and headers for that library along with a `Makefile.am' and anything else that is specific to just that library. Libraries that are part of a small group of like libraries, a set of pluggable application modules for example, are kept together in a single directory.

The sources and headers for the project's main application will be stored in yet another subdirectory, traditionally named `src'. There are other conventional directories your developers might expect too: a `doc' directory for project documentation, and a `test' directory for the project self test suite.

To keep the project top-level directory as uncluttered as possible, as I like to do, you can take advantage of Autoconf's `AC_CONFIG_AUX_DIR' by creating another directory, say `config', which will be used to store many of the GNU Autotools intermediate files, such as install-sh. I always store all project specific Autoconf M4 macros in this same subdirectory.

So, this is what you should start with:

$ pwd
~/mypackage
$ ls -F
Makefile.am  config/     configure.in  lib/  test/
README       configure*  doc/          src/

9.1.2 C Header Files

There is a small amount of boiler-plate that should be added to all header files, not least of which is a small amount of code to prevent the contents of the header from being scanned multiple times. This is achieved by enclosing the entire file in a preprocessor conditional which evaluates to false after the first time it has been seen by the preprocessor. Traditionally, the macro used is in all upper case, and named after the installation path without the installation prefix. Imagine a header that will be installed to `/usr/local/include/sys/foo.h', for example. The preprocessor code would be as follows:

#ifndef SYS_FOO_H
#define SYS_FOO_H 1
...
#endif /* !SYS_FOO_H */

Apart from comments, the entire content of the rest of this header file must be between these few lines. It is worth mentioning that inside the enclosing ifndef, the macro SYS_FOO_H must be defined before any other files are #included. It is a common mistake to not define that macro until the end of the file, but mutual dependency cycles are only stalled if the guard macro is defined before the #include which starts that cycle(6).
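As a minimal sketch of the guard at work, the following single file simulates including the same header twice by repeating its guarded body. The header name `demo/point.h', the DEMO_POINT_H macro and the DemoPoint type are hypothetical names chosen for illustration only:

```c
/* First "inclusion" of a hypothetical header, demo/point.h.  */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H 1
typedef struct { int x; int y; } DemoPoint;
#endif /* !DEMO_POINT_H */

/* Second "inclusion": DEMO_POINT_H is already defined, so the
   preprocessor skips the body and the typedef is not repeated.  */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H 1
typedef struct { int x; int y; } DemoPoint;
#endif /* !DEMO_POINT_H */

/* Uses the type declared by the first inclusion only.  */
int
demo_point_sum (DemoPoint p)
{
  return p.x + p.y;
}
```

Without the guard, the second typedef would conflict with the first and the file would not compile.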

If a header is designed to be installed, it must #include other installed project headers from the local tree using angle-brackets. There are some implications to working like this:

You must be careful that the names of header file directories in the source tree match the names of the directories in the install tree. For example, when I plan to install the aforementioned `foo.h' to `/usr/local/include/project/foo.h', from which it will be included using `#include <project/foo.h>', then in order for the same include line to work in the source tree, I must name the source directory it is installed from `project' too, or other headers which use it will not be able to find it until after it has been installed.
When you come to developing the next version of a project laid out in this way, you must be careful about finding the correct header. Automake takes care of that for you by using `-I' options that force the compiler to look for uninstalled headers in the current source directory before searching the system directories for installed headers of the same name.
You don't have to install all of your headers to `/usr/include' -- you can use subdirectories. And all without having to rewrite the headers at install time.

9.1.3 C++ Compilers

In order for a C++ program to use a library compiled with a C compiler, it is necessary for any symbols exported from the C library to be declared between `extern "C" {' and `}'. This code is important, because a C++ compiler mangles(7) all variable and function names, whereas a C compiler does not. On the other hand, a C compiler will not understand these lines, so you must be careful to make them invisible to the C compiler.

Sometimes you will see this method used, written out in long hand in every installed header file, like this:

#ifdef __cplusplus
extern "C" {
#endif

...

#ifdef __cplusplus
}
#endif

But that is a lot of unnecessary typing if you have a few dozen headers in your project. Also the additional braces tend to confuse text editors, such as emacs, which do automatic source indentation based on brace characters.

Far better, then, to declare them as macros in a common header file, and use the macros in your headers:

#ifdef __cplusplus
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else /* !__cplusplus */
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif /* __cplusplus */
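To show how a header might then use these macros, here is a small sketch. The function demo_add is a hypothetical example and not part of any real library; the macro definitions are repeated only so that the fragment stands on its own:

```c
#ifdef __cplusplus
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else /* !__cplusplus */
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif /* __cplusplus */

/* What an installed header would contain: the declarations are
   wrapped, so a C++ compiler sees them with C linkage, while a C
   compiler sees the macros expand to nothing.  */
BEGIN_C_DECLS
extern int demo_add (int a, int b);
END_C_DECLS

/* The definition, as it would appear in the library's C source.  */
int
demo_add (int a, int b)
{
  return a + b;
}
```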

I have seen several projects that name such macros with a leading underscore -- `_BEGIN_C_DECLS'. Any symbol with a leading underscore is reserved for use by the compiler implementation, so you shouldn't name any symbols of your own in this way. By way of example, I recently ported the Small(8) language compiler to Unix, and almost all of the work was writing a Perl script to rename huge numbers of symbols in the compiler's reserved namespace to something more sensible so that GCC could even parse the sources. Small was originally developed on Windows, and the author had used a lot of symbols with a leading underscore. Although his symbol names didn't clash with his own compiler, in some cases they were the same as symbols used by GCC.

9.1.4 Function Definitions

As a stylistic convention, the return types for all function definitions should be on a separate line. The main reason for this is that it makes it very easy to find the functions in a source file, by looking for a single identifier at the start of a line followed by an open parenthesis:

$ egrep '^[_a-zA-Z][_a-zA-Z0-9]*[ \t]*\(' error.c
set_program_name (const char *path)
error (int exit_status, const char *mode, const char *message)
sic_warning (const char *message)
sic_error (const char *message)
sic_fatal (const char *message)

There are emacs lisp functions and various code analysis tools, such as ansi2knr (see section 9.1.6 K&R Compilers), which rely on this formatting convention, too. Even if you don't use those tools yourself, your fellow developers might like to, so it is a good convention to adopt.

9.1.5 Fallback Function Implementations

Due to the huge number of Unix varieties in common use today, many of the C library functions that you take for granted on your preferred development platform are very likely missing from some of the architectures you would like your code to compile on. Fundamentally there are two ways to cope with this:

Use only the few library calls that are available everywhere. In reality this is not actually possible because there are two lowest common denominators with mutually exclusive APIs, one rooted in BSD Unix (`bcopy', `rindex') and the other in SYSV Unix (`memcpy', `strrchr'). The only way to deal with this is to define one API in terms of the other using the preprocessor. The newer POSIX standard deprecates many of the BSD originated calls (with exceptions such as the BSD socket API). Even on non-POSIX platforms, there has been so much cross pollination that often both varieties of a given call may be provided. Nonetheless, you would be wise to write your code using POSIX endorsed calls, and where they are missing, define them in terms of whatever the host platform provides.

This approach requires a lot of knowledge about various system libraries and standards documents, and can leave you with reams of preprocessor code to handle the differences between APIs. You will also need to perform a lot of checking in `configure.in' to figure out which calls are available. For example, to allow the rest of your code to use the `strcpy' call with impunity, you would need the following code in `configure.in':

AC_CHECK_FUNCS(strcpy bcopy)

And the following preprocessor code in a header file that is seen by every source file:

#if !HAVE_STRCPY
#  if HAVE_BCOPY
#    define strcpy(dest, src)   bcopy (src, dest, 1 + strlen (src))
#  else /* !HAVE_BCOPY */
     error no strcpy or bcopy
#  endif /* HAVE_BCOPY */
#endif /* HAVE_STRCPY */

Alternatively you could provide your own fallback implementations of function calls you know are missing on some platforms. In practice you don't need to be as knowledgeable about problematic functions when using this approach. You can look in GNU libiberty(9) or François Pinard's libit project(10) to see for which functions other GNU developers have needed to implement fallback code. The libit project is especially useful in this respect as it comprises canonical versions of fallback functions, and suitable Autoconf macros assembled from across the entire GNU project. I won't give an example of setting up your package to use this approach, since that is how I have chosen to structure the project described in this chapter.

Rather than writing code to the lowest common denominator of system libraries, I am a strong advocate of the latter school of thought in the majority of cases. As with all things it pays to take a pragmatic approach; don't be afraid of the middle ground -- weigh the options on a case by case basis.
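To give a feel for what a fallback function itself can look like, here is a minimal sketch of a replacement for `strrchr'. The name fallback_strrchr is hypothetical, chosen so that the fragment cannot clash with the system's own function; a real package would compile such a file under the standard name, and only on hosts where `configure' found the function to be missing:

```c
#include <stddef.h>

/* Return a pointer to the last occurrence of C in S, or NULL if C
   does not occur.  The terminating NUL counts as part of S, as it
   does for the standard strrchr.  */
char *
fallback_strrchr (const char *s, int c)
{
  const char *last = NULL;

  do
    {
      if (*s == (char) c)
        last = s;
    }
  while (*s++);

  return (char *) last;
}
```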

9.1.6 K&R Compilers

K&R C is the name now used to describe the original C language specified by Brian Kernighan and Dennis Ritchie (hence, `K&R'). I have yet to see a C compiler that doesn't support code written in the K&R style, yet it has fallen very much into disuse in favor of the newer ANSI C standard. Although it is increasingly common for vendors to unbundle their ANSI C compiler, the GCC project (11) is available for all of the architectures I have ever used.

There are four differences between the two C standards:

ANSI C expects full type specification in function prototypes, such as you might supply in a library header file:
extern int functionname (const char *parameter1, size_t parameter2);
The nearest equivalent in K&R style C is a forward declaration, which allows you to use a function before its corresponding definition:
extern int functionname ();
As you can imagine, K&R has very bad type safety, and does not perform any checks that only function arguments of the correct type are used.
The function headers of each function definition are written differently. Where you might see the following written in ANSI C:
int
functionname (const char *parameter1, size_t parameter2)
{
  ...
}
K&R expects the parameter type declarations separately, like this:
int
functionname (parameter1, parameter2)
     const char *parameter1;
     size_t parameter2;
{
  ...
}
There is no concept of an untyped pointer in K&R C. Where you might be used to seeing `void *' pointers in ANSI code, you are forced to overload the meaning of `char *' for K&R compilers.

Variadic functions are handled with a different API in K&R C, imported with `#include <varargs.h>'. A K&R variadic function definition looks like this:

int
functionname (va_alist)
     va_dcl
{
  va_list ap;
  char *arg;

  va_start (ap);
  ...
  arg = va_arg (ap, char *);
  ...
  va_end (ap);

  return arg ? strlen (arg) : 0;
}

ANSI C provides a similar API, imported with `#include <stdarg.h>', though it cannot express a variadic function with no named arguments such as the one above. In practice, this isn't a problem since you always need at least one parameter, either to specify the total number of arguments somehow, or else to mark the end of the argument list. An ANSI variadic function definition looks like this:

int
functionname (char *format, ...)
{
  va_list ap;
  char *arg;

  va_start (ap, format);
  ...
  arg = va_arg (ap, char *);
  ...
  va_end (ap);

  return format ? strlen (format) : 0;
}
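For a complete, compilable instance of the ANSI style, here is a small sketch in which a NULL pointer marks the end of the argument list. The function demo_total_length is a hypothetical example that sums the lengths of the strings it is passed:

```c
#include <stdarg.h>
#include <string.h>

/* Sum the lengths of a NULL-terminated list of strings.  FIRST is
   the one named parameter that ANSI C requires.  */
size_t
demo_total_length (const char *first, ...)
{
  va_list ap;
  const char *arg;
  size_t total = 0;

  va_start (ap, first);
  for (arg = first; arg != NULL; arg = va_arg (ap, const char *))
    total += strlen (arg);
  va_end (ap);

  return total;
}
```

The caller must cast the terminating NULL to a pointer type, as in `demo_total_length ("ab", "cde", (const char *) NULL)', because the compiler has no prototype information for the unnamed arguments.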

Except in very rare cases where you are writing a low level project (GCC for example), you probably don't need to worry about K&R compilers too much. However, supporting them can be very easy, and if you are so inclined, can be handled either by employing the ansi2knr program supplied with Automake, or by careful use of the preprocessor.

Using ansi2knr in your project is described in some detail in section `Automatic de-ANSI-fication' in The Automake Manual, but boils down to the following:

Add this macro to your `configure.in' file:
AM_C_PROTOTYPES

Rewrite the contents of `LIBOBJS' and/or `LTLIBOBJS' in the following fashion:

# This is necessary so that .o files in LIBOBJS are also built via
# the ANSI2KNR-filtering rules.
Xsed='sed -e "s/^X//"'
LIBOBJS=`echo X"$LIBOBJS"|\
        [$Xsed -e 's/\.[^.]* /.\$U& /g;s/\.[^.]*$/.\$U&/']`

Personally, I dislike this method, since every source file is filtered and rewritten with ANSI function prototypes and declarations converted to K&R style, adding a fair overhead in additional files in your build tree, and in compilation time. This would be reasonable were the abstraction sufficient to allow you to forget about K&R entirely, but ansi2knr is a simple program, and does not address any of the other differences between compilers that I raised above, and it cannot handle macros in your function prototypes or definitions. If you decide to use ansi2knr in your project, you must make the decision before you write any code, and be aware of its limitations as you develop.

For my own projects, I prefer to use a set of preprocessor macros along with a few stylistic conventions so that all of the differences between K&R and ANSI compilers are actually addressed, and so that the unfortunate few who have no access to an ANSI compiler (and who cannot use GCC for some reason) needn't suffer the overheads of ansi2knr.

The four differences in style listed at the beginning of this subsection are addressed as follows:

The function prototype argument lists are declared inside a PARAMS macro invocation so that K&R compilers will still be able to compile the source tree. PARAMS removes ANSI argument lists from function prototypes for K&R compilers. Some developers continue to use __P for this purpose, but strictly speaking, macros starting with `_' (and especially `__') are reserved for the compiler and the system headers, so using `PARAMS', as follows, is safer:
#if __STDC__
#  ifndef NOPROTOS
#    define PARAMS(args)      args
#  endif
#endif
#ifndef PARAMS
#  define PARAMS(args)        ()
#endif
This macro is then used for all function declarations like this:
extern int functionname PARAMS((const char *parameter));
When the PARAMS macro is used for all function declarations, ANSI compilers are given all the type information they require to do full compile time type checking. The function definitions proper must then be declared in K&R style so that K&R compilers don't choke on ANSI syntax. There is a small amount of overhead in writing code this way, however: The ANSI compile time type checking can only work in conjunction with K&R function definitions if it first sees an ANSI function prototype. This forces you to develop the good habit of prototyping every single function in your project. Even the static ones.
The easiest way to work around the lack of void * pointers, is to define a new type that is conditionally set to void * for ANSI compilers, or char * for K&R compilers. You should add the following to a common header file:
#if __STDC__
typedef void *void_ptr;
#else /* !__STDC__ */
typedef char *void_ptr;
#endif /* __STDC__ */

The difference between the two variadic function APIs poses a stickier problem, and the solution is ugly. But it does work. First you must check for the headers in `configure.in':

AC_CHECK_HEADERS(stdarg.h varargs.h, break)

Having done this, add the following code to a common header file:

#if HAVE_STDARG_H
#  include <stdarg.h>
#  define VA_START(a, f)        va_start(a, f)
#else
#  if HAVE_VARARGS_H
#    include <varargs.h>
#    define VA_START(a, f)      va_start(a)
#  endif
#endif
#ifndef VA_START
  error no variadic api
#endif

You must now supply each variadic function with both a K&R and an ANSI definition, like this:

int
#if HAVE_STDARG_H
functionname (const char *format, ...)
#else
functionname (format, va_alist)
     const char *format;
     va_dcl
#endif
{
  va_list ap;
  char *arg;

  VA_START (ap, format);
  ...
  arg = va_arg (ap, char *);
  ...
  va_end (ap);

  return arg ? strlen (arg) : 0;
}

9.2 A Simple Shell Builders Library

An application which most developers try their hand at sooner or later is a Unix shell. There is a lot of functionality common to all traditional command line shells, which I thought I would push into a portable library to get you over the first hurdle when that moment is upon you. Before elaborating on any of this I need to name the project. I've called it sic, from the Latin so it is, because like all good project names it is somewhat pretentious and it lends itself to the recursive acronym sic is cumulative.

The gory detail of the minutiae of the source is beyond the scope of this book, but to convey a feel for the need for Sic, some of the goals which influenced the design follow:

Sic must be very small so that, in addition to being used as the basis for a full blown shell, it can be linked (unadorned) into an application and used for trivial tasks, such as reading startup configuration.
It must not be tied to a particular syntax or set of reserved words. If you use it to read your startup configuration, I don't want to force you to use my syntax and commands.
The boundary between the library (`libsic') and the application must be well defined. Sic will take strings of characters as input, and internally parse and evaluate them according to registered commands and syntax, returning results or diagnostics as appropriate.
It must be extremely portable -- that is what I am trying to illustrate here, after all.

9.2.1 Portability Infrastructure

As I explained in 9.1.1 Project Directory Structure, I'll first create the project directories, a top-level directory and a subdirectory to put the library sources into. I want to install the library header files to `/usr/local/include/sic', so the library subdirectory must be named appropriately. See section 9.1.2 C Header Files.

$ mkdir sic
$ mkdir sic/sic
$ cd sic/sic

I will describe the files I add in this section in more detail than the project specific sources, because they comprise an infrastructure that I use relatively unchanged for all of my GNU Autotools projects. You could keep an archive of these files, and use them as a starting point each time you begin a new project of your own.

9.2.1.1 Error Management

A good place to start with any project design is the error management facility. In Sic I will use a simple group of functions to display simple error messages. Here is `sic/error.h':

#ifndef SIC_ERROR_H
#define SIC_ERROR_H 1

#include <sic/common.h>

BEGIN_C_DECLS

extern const char *program_name;
extern void set_program_name (const char *argv0);

extern void sic_warning      (const char *message);
extern void sic_error        (const char *message);
extern void sic_fatal        (const char *message);

END_C_DECLS

#endif /* !SIC_ERROR_H */

This header file follows the principles set out in 9.1.2 C Header Files.

I am storing the program_name variable in the library that uses it, so that I can be sure that the library will build on architectures that don't allow undefined symbols in libraries (12).

Keeping those preprocessor macro definitions designed to aid code portability together (in a single file), is a good way to maintain the readability of the rest of the code. For this project I will put that code in `common.h':

#ifndef SIC_COMMON_H
#define SIC_COMMON_H 1

#if HAVE_CONFIG_H
#  include <sic/config.h>
#endif

#include <stdio.h>
#include <sys/types.h>

#if STDC_HEADERS
#  include <stdlib.h>
#  include <string.h>
#elif HAVE_STRINGS_H
#  include <strings.h>
#endif /*STDC_HEADERS*/

#if HAVE_UNISTD_H
#  include <unistd.h>
#endif

#if HAVE_ERRNO_H
#  include <errno.h>
#endif /*HAVE_ERRNO_H*/
#ifndef errno
/* Some systems #define this! */
extern int errno;
#endif

#endif /* !SIC_COMMON_H */

You may recognise some snippets of code from the Autoconf manual here, in particular the inclusion of the project `config.h', which will be generated shortly. Notice that I have been careful to conditionally include any headers which are not guaranteed to exist on every architecture. The rule of thumb here is that only `stdio.h' is ubiquitous (though I have never heard of a machine that has no `sys/types.h'). You can find more details of some of these in section `Existing Tests' in The GNU Autoconf Manual.

Here is a little more code from `common.h':

#ifndef EXIT_SUCCESS
#  define EXIT_SUCCESS  0
#  define EXIT_FAILURE  1
#endif

The implementation of the error handling functions goes in `error.c' and is very straightforward:

#if HAVE_CONFIG_H
#  include <sic/config.h>
#endif

#include "common.h"
#include "error.h"

static void error (int exit_status, const char *mode,
                   const char *message);

static void
error (int exit_status, const char *mode, const char *message)
{
  fprintf (stderr, "%s: %s: %s.\n", program_name, mode, message);

  if (exit_status >= 0)
    exit (exit_status);
}

void
sic_warning (const char *message)
{
  error (-1, "warning", message);
}

void
sic_error (const char *message)
{
  error (-1, "ERROR", message);
}

void
sic_fatal (const char *message)
{
  error (EXIT_FAILURE, "FATAL", message);
}
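The convention in the static error function is that a negative exit_status means "report and carry on", while anything else ends the program. The following sketch isolates that convention so it can be tried without exiting. The function demo_report and its extra stream and program parameters are hypothetical; it returns a flag where the code above would call exit:

```c
#include <stdio.h>

/* Print a diagnostic in the same "program: mode: message." format,
   then report whether the caller would be expected to exit.
   Returns non-zero when EXIT_STATUS is zero or positive.  */
int
demo_report (FILE *stream, const char *program, int exit_status,
             const char *mode, const char *message)
{
  fprintf (stream, "%s: %s: %s.\n", program, mode, message);

  return exit_status >= 0;
}
```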

I also need a definition of program_name; set_program_name copies the filename component of path into the exported data, program_name. The xstrdup function just calls strdup, but aborts if there is not enough memory to make the copy:

const char *program_name = NULL;

void
set_program_name (const char *path)
{
  if (!program_name)
    program_name = xstrdup (basename (path));
}
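The two helpers used by set_program_name are not listed in this section, so here is a minimal sketch of what such functions might look like. Both demo_basename and demo_xstrdup are hypothetical stand-ins, not the versions used by Sic; the sketch assumes `/' is the only directory separator, and it copies with malloc and strcpy rather than calling strdup:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the filename component of PATH, that is, the
   text after the last '/' (or all of PATH if there is none).  */
const char *
demo_basename (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash ? slash + 1 : path;
}

/* Duplicate STRING on the heap, aborting instead of returning NULL
   when memory is exhausted.  */
char *
demo_xstrdup (const char *string)
{
  char *copy = (char *) malloc (strlen (string) + 1);

  if (!copy)
    {
      fprintf (stderr, "Memory exhausted\n");
      abort ();
    }

  return strcpy (copy, string);
}
```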

9.2.1.2 Memory Management

A useful idiom common to many GNU projects is to wrap the memory management functions to localise out of memory handling, naming them with an `x' prefix. By doing this, the rest of the project is relieved of having to remember to check for `NULL' returns from the various memory functions. These wrappers use the error API to report memory exhaustion and abort the program. I have placed the implementation code in `xmalloc.c':

#if HAVE_CONFIG_H
#  include <sic/config.h>
#endif

#include "common.h"
#include "error.h"

void *
xmalloc (size_t num)
{
  void *new = malloc (num);
  if (!new)
    sic_fatal ("Memory exhausted");
  return new;
}

void *
xrealloc (void *p, size_t num)
{
  void *new;

  if (!p)
    return xmalloc (num);

  new = realloc (p, num);
  if (!new)
    sic_fatal ("Memory exhausted");

  return new;
}

void *
xcalloc (size_t num, size_t size)
{
  void *new = xmalloc (num * size);
  bzero (new, num * size);
  return new;
}

Notice in the code above, that xcalloc is implemented in terms of xmalloc, since calloc itself is not available in some older C libraries.
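One thing to bear in mind when building a calloc replacement on top of malloc is that the product num * size can wrap around for very large arguments. The following sketch, which is my own illustration rather than code from Sic, shows one way to check for that. The names demo_mul_overflows and demo_checked_calloc are hypothetical, and this version returns NULL on failure instead of calling sic_fatal:

```c
#include <stdlib.h>
#include <string.h>

/* Return non-zero if NUM * SIZE cannot be represented in a size_t.  */
int
demo_mul_overflows (size_t num, size_t size)
{
  return size != 0 && num > (size_t) -1 / size;
}

/* Allocate zeroed storage for NUM objects of SIZE bytes each, or
   return NULL if the request overflows or cannot be satisfied.  */
void *
demo_checked_calloc (size_t num, size_t size)
{
  void *p;

  if (demo_mul_overflows (num, size))
    return NULL;

  p = malloc (num * size);
  if (p)
    memset (p, 0, num * size);

  return p;
}
```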

Rather than create a separate `xmalloc.h' file, which would need to be #included from almost everywhere else, the logical place to declare these functions is in `common.h', since the wrappers will be called from most everywhere else in the code:

#ifdef __cplusplus
#  define BEGIN_C_DECLS         extern "C" {
#  define END_C_DECLS           }
#else
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif

#define XCALLOC(type, num)                                  \
        ((type *) xcalloc ((num), sizeof(type)))
#define XMALLOC(type, num)                                  \
        ((type *) xmalloc ((num) * sizeof(type)))
#define XREALLOC(type, p, num)                              \
        ((type *) xrealloc ((p), (num) * sizeof(type)))
#define XFREE(stale)                            do {        \
        if (stale) { free (stale);  stale = 0; }            \
                                                } while (0)

BEGIN_C_DECLS

extern void *xcalloc    (size_t num, size_t size);
extern void *xmalloc    (size_t num);
extern void *xrealloc   (void *p, size_t num);
extern char *xstrdup    (const char *string);
extern char *xstrerror  (int errnum);

END_C_DECLS

By using the macros defined here, allocating and freeing heap memory is reduced from:

char **argv = (char **) xmalloc (sizeof (char *) * 3);
do_stuff (argv);
if (argv)
  free (argv);

to the simpler and more readable:

char **argv = XMALLOC (char *, 3);
do_stuff (argv);
XFREE (argv);
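Here is a self-contained sketch that exercises the XMALLOC and XFREE macros. To keep it independent of the rest of the library, it uses a hypothetical demo_xmalloc in place of xmalloc, and demo_sum_squares is an invented caller that assumes n is at least 1:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for xmalloc: never returns NULL to the caller.  */
static void *
demo_xmalloc (size_t num)
{
  void *p = malloc (num);

  if (!p)
    {
      fprintf (stderr, "Memory exhausted\n");
      exit (EXIT_FAILURE);
    }
  return p;
}

#define XMALLOC(type, num)                                  \
        ((type *) demo_xmalloc ((num) * sizeof (type)))
#define XFREE(stale)                            do {        \
        if (stale) { free (stale);  stale = 0; }            \
                                                } while (0)

/* Sum the squares of 0 .. N-1 using a temporary heap array.
   Returns -1 if XFREE failed to reset the pointer.  */
int
demo_sum_squares (int n)
{
  int *squares = XMALLOC (int, n);
  int i, total = 0;

  for (i = 0; i < n; ++i)
    squares[i] = i * i;
  for (i = 0; i < n; ++i)
    total += squares[i];

  XFREE (squares);      /* frees the array and zeroes the pointer */

  return squares == NULL ? total : -1;
}
```

Because XFREE resets its argument, a second XFREE on the same variable is harmless, which removes one common source of double free bugs.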

In the same spirit, I have borrowed `xstrdup.c' and `xstrerror.c' from the GNU project's libiberty. See section 9.1.5 Fallback Function Implementations.

9.2.1.3 Generalised List Data Type

In many C programs you will see various implementations and re-implementations of lists and stacks, each tied to its own particular project. It is surprisingly simple to write a catch-all implementation, as I have done here with a generalised list operation API in `list.h':

#ifndef SIC_LIST_H
#define SIC_LIST_H 1

#include <sic/common.h>

BEGIN_C_DECLS

typedef struct list {
  struct list *next;    /* chain forward pointer */
  void *userdata;       /* in case you want to use raw Lists */
} List;

extern List *list_new     (void *userdata);
extern List *list_cons    (List *head, List *tail);
extern List *list_tail    (List *head);
extern size_t list_length (List *head);

END_C_DECLS

#endif /* !SIC_LIST_H */

The trick is to ensure that any structures you want to chain together have their forward pointer in the first field. Having done that, the generic functions declared above can be used to manipulate any such chain by casting it to List * and back again as necessary.

For example:

struct foo {
  struct foo *next;

  char *bar;
  struct baz *qux;
  ...
};

...
  struct foo *foo_list = NULL;

  foo_list = (struct foo *) list_cons ((List *) new_foo (),
                                       (List *) foo_list);
...

The implementation of the list manipulation functions is in `list.c':

#include "list.h"

List *
list_new (void *userdata)
{
  List *new = XMALLOC (List, 1);

  new->next = NULL;
  new->userdata = userdata;

  return new;
}

List *
list_cons (List *head, List *tail)
{
  head->next = tail;
  return head;
}

List *
list_tail (List *head)
{
  return head->next;
}

size_t
list_length (List *head)
{
  size_t n;

  for (n = 0; head; ++n)
    head = head->next;

  return n;
}
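The userdata field also allows raw List nodes to be used directly, without a wrapping structure. As a sketch of that use, the following fragment pushes a few strings onto a stack built from raw nodes. The demo_ functions mirror the ones above but are renamed, and use plain malloc, so that the fragment stands alone; demo_push_words is a hypothetical caller:

```c
#include <stdlib.h>

typedef struct list {
  struct list *next;    /* chain forward pointer */
  void *userdata;       /* payload for raw lists */
} List;

static List *
demo_list_new (void *userdata)
{
  List *node = (List *) malloc (sizeof (List));

  if (!node)
    abort ();
  node->next = NULL;
  node->userdata = userdata;
  return node;
}

static List *
demo_list_cons (List *head, List *tail)
{
  head->next = tail;
  return head;
}

static size_t
demo_list_length (List *head)
{
  size_t n;

  for (n = 0; head; ++n)
    head = head->next;
  return n;
}

/* Push COUNT strings onto a stack, store the string on top of the
   stack in *TOP, free the nodes and return the stack depth.  */
size_t
demo_push_words (const char **words, size_t count, const char **top)
{
  List *stack = NULL;
  size_t i, depth;

  for (i = 0; i < count; ++i)
    stack = demo_list_cons (demo_list_new ((void *) words[i]), stack);

  depth = demo_list_length (stack);
  *top = stack ? (const char *) stack->userdata : NULL;

  while (stack)
    {
      List *next = stack->next;
      free (stack);
      stack = next;
    }
  return depth;
}
```

Since each new node is consed onto the front, the last string pushed is the first one found at the head of the list.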

9.2.2.1 `sic.c' & `sic.h'

Here are the functions for creating and managing sic parsers.

#ifndef SIC_SIC_H
#define SIC_SIC_H 1

#include <sic/common.h>
#include <sic/error.h>
#include <sic/list.h>
#include <sic/syntax.h>

typedef struct sic {
  char *result;                 /* result string */
  size_t len;                   /* bytes used by result field */
  size_t lim;                   /* bytes allocated to result field */
  struct builtintab *builtins;  /* tables of builtin functions */
  SyntaxTable **syntax;         /* dispatch table for syntax of input */
  List *syntax_init;            /* stack of syntax state initialisers */
  List *syntax_finish;          /* stack of syntax state finalizers */
  SicState *state;              /* state data from syntax extensions */
} Sic;

#endif /* !SIC_SIC_H */

9.2.2.2 `builtin.c' & `builtin.h'

Here are the functions for managing tables of builtin commands in each Sic structure:

typedef int (*builtin_handler) (Sic *sic,
                                int argc, char *const argv[]);
typedef struct {
  const char *name;
  builtin_handler func;
  int min, max;
} Builtin;

typedef struct builtintab BuiltinTab;

extern Builtin *builtin_find (Sic *sic, const char *name);
extern int builtin_install   (Sic *sic, Builtin *table);
extern int builtin_remove    (Sic *sic, Builtin *table);

9.2.2.3 `eval.c' & `eval.h'

Having created a Sic parser, and populated it with some Builtin handlers, a user of this library must tokenize and evaluate its input stream. These files define a structure for storing tokenized strings (Tokens), and functions for converting char * strings both to and from this structure type:

#ifndef SIC_EVAL_H
#define SIC_EVAL_H 1

#include <sic/common.h>
#include <sic/sic.h>

BEGIN_C_DECLS

typedef struct {
  int  argc;            /* number of elements in ARGV */
  char **argv;          /* array of pointers to elements */
  size_t lim;           /* number of bytes allocated */
} Tokens;

extern int eval       (Sic *sic, Tokens *tokens);
extern int untokenize (Sic *sic, char **pcommand, Tokens *tokens);
extern int tokenize   (Sic *sic, Tokens **ptokens, char **pcommand);

END_C_DECLS

#endif /* !SIC_EVAL_H */

These files also define the eval function, which examines a Tokens structure in the context of the given Sic parser, dispatching the argv array to a relevant Builtin handler, also written by the library user.
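The implementation of eval is not shown here, but the general idea of dispatching an argv array through a table of named handlers can be sketched as follows. Everything in this fragment is hypothetical: the Demo types drop the Sic parameter for brevity, and I am assuming that min and max bound the number of arguments after the command name, with a negative max meaning "no limit":

```c
#include <string.h>

typedef int (*DemoHandler) (int argc, char *const argv[]);

typedef struct {
  const char *name;
  DemoHandler func;
  int min, max;         /* assumed: argument count bounds */
} DemoBuiltin;

/* A trivial handler: report how many arguments it was given.  */
static int
demo_echo (int argc, char *const argv[])
{
  (void) argv;
  return argc - 1;
}

/* A table of builtins, terminated by an entry with a NULL name.  */
static const DemoBuiltin demo_table[] = {
  { "echo", demo_echo, 0, -1 },
  { NULL,   NULL,      0,  0 }
};

/* Find the handler named by ARGV[0], check the argument count and
   call it.  Returns -1 for unknown commands or bad counts.  */
int
demo_eval (int argc, char *const argv[])
{
  const DemoBuiltin *b;
  int nargs = argc - 1;

  if (argc < 1)
    return -1;

  for (b = demo_table; b->name; ++b)
    if (strcmp (b->name, argv[0]) == 0)
      {
        if (nargs < b->min || (b->max >= 0 && nargs > b->max))
          return -1;
        return b->func (argc, argv);
      }

  return -1;
}
```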

9.2.2.4 `syntax.c' & `syntax.h'

When tokenize splits a char * string into parts, by default it breaks the string into words delimited by whitespace. These files define the interface for changing this default behaviour, by registering callback functions which the parser will run when it meets an `interesting' symbol in the input stream. Here are the declarations from `syntax.h':

BEGIN_C_DECLS

typedef int SyntaxHandler (struct sic *sic, BufferIn *in,
                           BufferOut *out);

typedef struct syntax {
  SyntaxHandler *handler;
  char *ch;
} Syntax;

extern int syntax_install (struct sic *sic, Syntax *table);
extern SyntaxHandler *syntax_handler (struct sic *sic, int ch);

END_C_DECLS

A SyntaxHandler is a function called by tokenize as it consumes its input to create a Tokens structure; the two functions associate a table of such handlers with a given Sic parser, and find the particular handler for a given character in that Sic parser, respectively.
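Before any handlers are registered, the default behaviour is simply to split the input at whitespace. As a rough sketch of that default only, and not of the library's actual tokenize function, here is a hypothetical demo_tokenize which splits a writable string in place and stores pointers to the words in a caller supplied array:

```c
#include <string.h>

/* Split LINE at runs of blanks, overwriting the first blank after
   each word with a NUL.  Store up to MAX_WORDS word pointers in
   WORDS and return the number stored.  */
int
demo_tokenize (char *line, char **words, int max_words)
{
  static const char blanks[] = " \t\n";
  int count = 0;

  line += strspn (line, blanks);        /* skip leading blanks */
  while (*line && count < max_words)
    {
      words[count++] = line;
      line += strcspn (line, blanks);   /* find the end of the word */
      if (*line)
        *line++ = '\0';
      line += strspn (line, blanks);    /* skip to the next word */
    }

  return count;
}
```

A syntax handler for a character such as a quote mark would need to take over at the point where this sketch blindly treats every blank as a delimiter.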

9.2.3 Beginnings of a `configure.in'

Now that I have some code, I can run autoscan to generate a preliminary `configure.in'. autoscan will examine all of the sources in the current directory tree looking for common points of non-portability, adding macros suitable for detecting the discovered problems. autoscan generates the following in `configure.scan':

# Process this file with autoconf to produce a configure script.
AC_INIT(sic/eval.h)

# Checks for programs.

# Checks for libraries.

# Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS(strings.h unistd.h)

# Checks for typedefs, structures, and compiler characteristics.
AC_C_CONST
AC_TYPE_SIZE_T

# Checks for library functions.
AC_FUNC_VPRINTF
AC_CHECK_FUNCS(strerror)

AC_OUTPUT()

Since the generated `configure.scan' does not overwrite your project's `configure.in', it is a good idea to run autoscan periodically even in established project source trees, and compare the two files. Sometimes autoscan will find some portability issue you have overlooked, or weren't aware of.

Looking through the documentation for the macros in this `configure.scan', AC_C_CONST and AC_TYPE_SIZE_T will take care of themselves (provided I ensure that `config.h' is included into every source file), and AC_HEADER_STDC and AC_CHECK_HEADERS(unistd.h) are already taken care of in `common.h'.

autoscan is no silver bullet! Even here in this simple example, I need to manually add macros to check for the presence of `errno.h':

AC_CHECK_HEADERS(errno.h strings.h unistd.h)

I also need to manually add the Autoconf macro for generating `config.h'; a macro to initialise automake support; and a macro to check for the presence of ranlib. These should go close to the start of `configure.in':

    ...
    AC_CONFIG_HEADER(config.h)
    AM_INIT_AUTOMAKE(sic, 0.5)

    AC_PROG_CC

    AC_PROG_RANLIB
    ...

An interesting macro suggested by autoscan is AC_CHECK_FUNCS(strerror). This tells me that I need to provide a replacement implementation of strerror for the benefit of architectures which don't have it in their system libraries. This is resolved by providing a file with a fallback implementation for the named function, and creating a library from it and any others that `configure' discovers to be lacking from the system library on the target host.

You will recall that `configure' is the shell script the end user of this package will run on their machine to test that it has all the features the package wants to use. The library that is created will allow the rest of the project to be written in the knowledge that any functions required by the project but missing from the installer's system libraries will be available nonetheless. GNU `libiberty' comes to the rescue again -- it already has an implementation of `strerror.c' that I was able to use with a little modification.

Being able to supply a simple implementation of strerror, as the `strerror.c' file from `libiberty' does, relies on there being a well defined sys_errlist variable. It is a fair bet, however, that if the target host has no strerror implementation, the system sys_errlist will be broken or missing too. I need to write a configure macro to check whether the system defines sys_errlist, and tailor the code in `strerror.c' to use this knowledge.

To avoid clutter in the top-level directory, I am a great believer in keeping as many of the configuration files as possible in their own sub-directory. First of all, I will create a new directory called `config' inside the top-level directory, and put `sys_errlist.m4' inside it:

    AC_DEFUN(SIC_VAR_SYS_ERRLIST,
    [AC_CACHE_CHECK([for sys_errlist],
    sic_cv_var_sys_errlist,
    [AC_TRY_LINK([int *p;], [extern int sys_errlist; p = &sys_errlist;],
                sic_cv_var_sys_errlist=yes, sic_cv_var_sys_errlist=no)])
    if test x"$sic_cv_var_sys_errlist" = xyes; then
      AC_DEFINE(HAVE_SYS_ERRLIST, 1,
        [Define if your system libraries have a sys_errlist variable.])
    fi])

I must then add a call to this new macro in the `configure.in' file, being careful to put it in the right place -- somewhere between typedefs and structures and library functions according to the comments in `configure.scan':

SIC_VAR_SYS_ERRLIST
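To show how the HAVE_SYS_ERRLIST result might be consumed, here is a minimal sketch of a fallback message lookup. It is not the `libiberty' code: the name sic_strerror_sketch, the small built-in table, and the assumption that a sys_nerr count accompanies sys_errlist are all mine, for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#if HAVE_SYS_ERRLIST
/* Assumed declarations; real systems differ in the exact types, and
   sys_nerr is assumed to be available alongside sys_errlist.  */
extern char *sys_errlist[];
extern int sys_nerr;
#endif

/* Hypothetical fallback: return a message for ERRNUM without calling
   the system strerror.  */
const char *
sic_strerror_sketch (int errnum)
{
  static char buf[32];

  assert (errnum >= 0);

#if HAVE_SYS_ERRLIST
  /* configure found a usable table: index it directly.  */
  if (errnum < sys_nerr)
    return sys_errlist[errnum];
#else
  /* No usable table: know a handful of common codes by name.  */
  switch (errnum)
    {
    case ENOENT: return "No such file or directory";
    case EDOM:   return "Numerical argument out of domain";
    case ERANGE: return "Numerical result out of range";
    }
#endif

  /* Anything else gets a generic, numbered message.  */
  snprintf (buf, sizeof buf, "Error %d", errnum);
  return buf;
}
```

Because the choice is made by the preprocessor, the same source file compiles on hosts with and without the table; only the generated `config.h' differs.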

GNU Autotools can also be set to store most of their files in a subdirectory, by calling the AC_CONFIG_AUX_DIR macro near the top of `configure.in', preferably right after AC_INIT:

    AC_INIT(sic/eval.c)
    AC_CONFIG_AUX_DIR(config)
    AM_CONFIG_HEADER(config.h)
    ...

Having made this change, many of the files added by running autoconf and automake --add-missing will be put in the aux_dir.

The source tree now looks like this:

    sic/
      +-- configure.scan
      +-- config/
      |     +-- sys_errlist.m4
      +-- replace/
      |     +-- strerror.c
      +-- sic/
            +-- builtin.c
            +-- builtin.h
            +-- common.h
            +-- error.c
            +-- error.h
            +-- eval.c
            +-- eval.h
            +-- list.c
            +-- list.h
            +-- sic.c
            +-- sic.h
            +-- syntax.c
            +-- syntax.h
            +-- xmalloc.c
            +-- xstrdup.c
            +-- xstrerror.c

In order to correctly utilise the fallback implementation, AC_CHECK_FUNCS(strerror) needs to be removed and strerror added to AC_REPLACE_FUNCS:

    # Checks for library functions.
    AC_REPLACE_FUNCS(strerror)

This will be clearer if you look at the `Makefile.am' for the `replace' subdirectory:

    ## Makefile.am -- Process this file with automake to produce Makefile.in
    INCLUDES                = -I$(top_builddir) -I$(top_srcdir)

    noinst_LIBRARIES        = libreplace.a
    libreplace_a_SOURCES    =
    libreplace_a_LIBADD     = @LIBOBJS@

The code tells automake that I want to build a library for use within the build tree (i.e. not installed -- `noinst'), and that has no source files by default. The clever part here is that when someone comes to install Sic, they will run configure which will test for strerror, and add `strerror.o' to LIBOBJS if the target host environment is missing its own implementation. Now, when `configure' creates `replace/Makefile' (as I asked it to with AC_OUTPUT), `@LIBOBJS@' is replaced by the list of objects required on the installer's machine.

Having done all this at configure time, when my user runs make, the files required to replace functions missing from their target machine will be added to `libreplace.a'.

Unfortunately this is not quite enough to start building the project. First I need to add a top-level `Makefile.am' from which to ultimately create a top-level `Makefile' that will descend into the various subdirectories of the project:

    ## Makefile.am -- Process this file with automake to produce Makefile.in

    SUBDIRS = replace sic

And `configure.in' must be told where it can find instances of Makefile.in:

AC_OUTPUT(Makefile replace/Makefile sic/Makefile)

I have written a bootstrap script for Sic; for details see 8. Bootstrapping:

    #! /bin/sh

    set -x
    aclocal -I config
    autoheader
    automake --foreign --add-missing --copy
    autoconf

The `--foreign' option to automake tells it to relax the GNU standards for various files that should be present in a GNU distribution. Using this option saves me from having to create empty files as we did in 5. A Minimal GNU Autotools Project.

Right. Let's build the library! First, I'll run bootstrap:

    $ ./bootstrap
    + aclocal -I config
    + autoheader
    + automake --foreign --add-missing --copy
    automake: configure.in: installing config/install-sh
    automake: configure.in: installing config/mkinstalldirs
    automake: configure.in: installing config/missing
    + autoconf

The project is now in the same state that an end-user would see, having unpacked a distribution tarball. What follows is what an end user might expect to see when building from that tarball:

    $ ./configure
    creating cache ./config.cache
    checking for a BSD compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking whether make sets ${MAKE}... yes
    checking for working aclocal... found
    checking for working autoconf... found
    checking for working automake... found
    checking for working autoheader... found
    checking for working makeinfo... found
    checking for gcc... gcc
    checking whether the C compiler (gcc ) works... yes
    checking whether the C compiler (gcc ) is a cross-compiler... no
    checking whether we are using GNU C... yes
    checking whether gcc accepts -g... yes
    checking for ranlib... ranlib
    checking how to run the C preprocessor... gcc -E
    checking for ANSI C header files... yes
    checking for unistd.h... yes
    checking for errno.h... yes
    checking for string.h... yes
    checking for working const... yes
    checking for size_t... yes
    checking for strerror... yes
    updating cache ./config.cache
    creating ./config.status
    creating Makefile
    creating replace/Makefile
    creating sic/Makefile
    creating config.h

Compare this output with the contents of `configure.in', and notice how each macro is ultimately responsible for one or more consecutive tests (via the Bourne shell code generated in `configure'). Now that the `Makefile's have been successfully created, it is safe to call make to perform the actual compilation:

    $ make
    make all-recursive
    make[1]: Entering directory `/tmp/sic'
    Making all in replace
    make[2]: Entering directory `/tmp/sic/replace'
    rm -f libreplace.a
    ar cru libreplace.a
    ranlib libreplace.a
    make[2]: Leaving directory `/tmp/sic/replace'
    Making all in sic
    make[2]: Entering directory `/tmp/sic/sic'
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c builtin.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c error.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c eval.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c list.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c sic.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c syntax.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xmalloc.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xstrdup.c
    gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xstrerror.c
    rm -f libsic.a
    ar cru libsic.a builtin.o error.o eval.o list.o sic.o syntax.o xmalloc.o
    xstrdup.o xstrerror.o
    ranlib libsic.a
    make[2]: Leaving directory `/tmp/sic/sic'
    make[1]: Leaving directory `/tmp/sic'

On this machine, as you can see from the output of configure above, I have no need of the fallback implementation of strerror, so `libreplace.a' is empty. On another machine this might not be the case. In any event, I now have a compiled `libsic.a' -- so far, so good.

9.3 A Sample Shell Application

What I need now is a program that uses `libsic.a', if only to give me confidence that it is working. In this section, I will write a simple shell which uses the library. But first, I'll create a directory to put it in:

    $ mkdir src
    $ ls -F
    COPYING  Makefile.am  aclocal.m4  configure*    config/   sic/
    INSTALL  Makefile.in  bootstrap*  configure.in  replace/  src/
    $ cd src

In order to put this shell together, we need to provide just a few things for integration with `libsic.a'...

9.3.1 `sic_repl.c'

In `sic_repl.c'(13) there is a loop for reading strings typed by the user, evaluating them and printing the results. GNU readline is ideally suited to this, but it is not always available -- or sometimes people simply may not wish to use it.

With the help of GNU Autotools, it is very easy to cater for building with and without GNU readline. `sic_repl.c' uses this function to read lines of input from the user:

    static char *
    getline (FILE *in, const char *prompt)
    {
      static char *buf = NULL;        /* Always allocated and freed
                                         from inside this function.  */
      XFREE (buf);

      buf = (char *) readline ((char *) prompt);

    #ifdef HAVE_ADD_HISTORY
      if (buf && *buf)
        add_history (buf);
    #endif

      return buf;
    }

To make this work, I must write an Autoconf macro which adds an option to `configure', so that when the package is installed, it will use the readline library if `--with-readline' is used:

    AC_DEFUN(SIC_WITH_READLINE,
    [AC_ARG_WITH(readline,
    [  --with-readline         compile with the system readline library],
    [if test x"${withval-no}" != xno; then
      sic_save_LIBS=$LIBS
      AC_CHECK_LIB(readline, readline)
      if test x"${ac_cv_lib_readline_readline}" = xno; then
        AC_MSG_ERROR(libreadline not found)
      fi
      LIBS=$sic_save_LIBS
    fi])
    AM_CONDITIONAL(WITH_READLINE, test x"${with_readline-no}" != xno)
    ])

Having put this macro in the file `config/readline.m4', I must also call the new macro (SIC_WITH_READLINE) from `configure.in'.
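When the package is configured without `--with-readline', something else has to supply the line reading that getline relies on. The following is a minimal sketch of such a stand-in, assuming only standard C stdio. The name fallback_readline is hypothetical and is not taken from the Sic sources; like readline, it returns freshly allocated storage that the caller must free.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for readline(): print PROMPT (if any), then
   return one line read from IN in malloc'd storage, without its
   trailing newline.  Returns NULL at end of input or on failure.  */
char *
fallback_readline (FILE *in, const char *prompt)
{
  size_t size = 64, len = 0;
  char *line, *grown;

  assert (in != NULL);

  if (prompt)
    {
      fputs (prompt, stdout);
      fflush (stdout);
    }

  line = malloc (size);
  if (!line)
    return NULL;

  while (fgets (line + len, (int) (size - len), in))
    {
      len += strlen (line + len);
      if (len > 0 && line[len - 1] == '\n')
        {
          line[len - 1] = '\0';   /* complete line: strip the newline */
          return line;
        }

      /* No newline yet: enlarge the buffer and keep reading.  */
      size *= 2;
      grown = realloc (line, size);
      if (!grown)
        {
          free (line);
          return NULL;
        }
      line = grown;
    }

  if (len > 0)
    return line;                  /* last line had no newline */

  free (line);
  return NULL;
}
```

A build without readline could map the readline call in getline onto a function of this shape, so the rest of `sic_repl.c' need not know which one it is using.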

9.3.2 `sic_syntax.c'

The syntax of the commands in the shell I am writing is defined by a set of syntax handlers which are loaded into `libsic' at startup. I can get the C preprocessor to do most of the repetitive code for me, and just fill in the function bodies:

    #if HAVE_CONFIG_H
    #  include <sic/config.h>
    #endif

    #include "sic.h"

    /* List of builtin syntax. */
    #define syntax_functions                \
            SYNTAX(escape,  "\\")           \
            SYNTAX(space,   " \f\n\r\t\v")  \
            SYNTAX(comment, "#")            \
            SYNTAX(string,  "\"")           \
            SYNTAX(endcmd,  ";")            \
            SYNTAX(endstr,  "")

    /* Prototype Generator. */
    #define SIC_SYNTAX(name)                \
            int name (Sic *sic, BufferIn *in, BufferOut *out)

    #define SYNTAX(name, string)            \
            extern SIC_SYNTAX (CONC (syntax_, name));
    syntax_functions
    #undef SYNTAX

    /* Syntax handler mappings. */
    Syntax syntax_table[] = {

    #define SYNTAX(name, string)            \
            { CONC (syntax_, name), string },
      syntax_functions
    #undef SYNTAX

      { NULL, NULL }
    };

This code writes the prototypes for the syntax handler functions, and creates a table which associates each with one or more characters that might occur in the input stream. The advantage of writing the code this way is that when I want to add a new syntax handler later, it is a simple matter of adding a new row to the syntax_functions macro, and writing the function itself.
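The same preprocessor technique can be tried in isolation. The sketch below is a cut-down, self-contained version with two made-up handlers; every name in it (DemoHandler, DemoSyntax, demo_functions, demo_lookup) is hypothetical, and CONC is defined locally on the assumption that it simply pastes two tokens together.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token pasting helper, assumed to behave like the CONC used above.  */
#define CONC(a, b) a##b

typedef int DemoHandler (const char *input);

typedef struct {
  DemoHandler *handler;
  const char *chars;
} DemoSyntax;

/* One row per handler: adding a row extends both expansions below.  */
#define demo_functions          \
        DEMO (space,   " \t\n") \
        DEMO (comment, "#")

/* First expansion: a prototype for each handler.  */
#define DEMO(name, string) static DemoHandler CONC (demo_, name);
demo_functions
#undef DEMO

/* Second expansion: the lookup table, ended by a NULL sentinel.  */
static DemoSyntax demo_table[] = {
#define DEMO(name, string) { CONC (demo_, name), string },
  demo_functions
#undef DEMO
  { NULL, NULL }
};

/* The function bodies are the only part written out by hand.  */
static int demo_space (const char *input) { (void) input; return 1; }
static int demo_comment (const char *input) { (void) input; return 2; }

/* Find the handler registered for character CH, or NULL if none.  */
DemoHandler *
demo_lookup (int ch)
{
  DemoSyntax *entry;

  /* strchr would match the terminating NUL of every row.  */
  assert (ch != '\0');

  for (entry = demo_table; entry->handler; ++entry)
    if (strchr (entry->chars, ch))
      return entry->handler;

  return NULL;
}
```

With this in place, demo_lookup ('#') returns demo_comment, and a character that appears in no row yields NULL.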

9.3.3 `sic_builtin.c'

In addition to the syntax handlers I have just added to the Sic shell, the language of this shell is also defined by the builtin commands it provides. The infrastructure for this file is built from a table of functions which is fed into various C preprocessor macros, just as I did for the syntax handlers.

One builtin handler function has special status: builtin_unknown. This is the builtin that is called if the Sic library cannot find a suitable builtin function to handle the current input command. At first this doesn't sound especially important -- but it is the key to any shell implementation. When there is no builtin handler for the command, the shell will search the user's command path, `$PATH', to find a suitable executable. And this is the job of builtin_unknown:

    int
    builtin_unknown (Sic *sic, int argc, char *const argv[])
    {
      char *path = path_find (argv[0]);
      int status = SIC_ERROR;

      if (!path)
        {
          sic_result_append (sic, "command \"");
          sic_result_append (sic, argv[0]);
          sic_result_append (sic, "\" not found");
        }
      else if (path_execute (sic, path, argv) != SIC_OKAY)
        {
          sic_result_append (sic, "command \"");
          sic_result_append (sic, argv[0]);
          sic_result_append (sic, "\" failed: ");
          sic_result_append (sic, strerror (errno));
        }
      else
        status = SIC_OKAY;

      return status;
    }

    static char *
    path_find (const char *command)
    {
      char *path = xstrdup (command);

      if (*command == '/')
        {
          /* An absolute path is used as given. */
          if (access (command, X_OK) < 0)
            goto notfound;
        }
      else
        {
          char *PATH = getenv ("PATH");
          char *pbeg, *pend;
          size_t len;

          /* Nothing to search if PATH is unset. */
          if (!PATH)
            goto notfound;

          for (pbeg = PATH; *pbeg != '\0'; pbeg = pend)
            {
              /* Skip separators, then measure the next directory. */
              pbeg += strspn (pbeg, ":");
              if (*pbeg == '\0')
                break;          /* only trailing separators were left */
              len = strcspn (pbeg, ":");
              pend = pbeg + len;
              path = XREALLOC (char, path, 2 + len + strlen(command));
              *path = '\0';
              strncat (path, pbeg, len);
              if (path[len -1] != '/') strcat (path, "/");
              strcat (path, command);

              if (access (path, X_OK) == 0)
                  break;
            }

          if (*pbeg == '\0')
            goto notfound;
        }

      return path;

     notfound:
      XFREE (path);
      return NULL;
    }
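The path_execute function called by builtin_unknown is not shown in this section. As a rough idea of what running the located program involves, here is a minimal sketch that assumes a POSIX host with fork, execv and waitpid. The name run_command and its return convention (the child's exit status, or -1) are hypothetical and differ from the SIC_OKAY/SIC_ERROR convention used by the real code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH with ARGV in a child process and
   return its exit status, or -1 if the child could not be created,
   could not be waited for, or did not exit normally.  */
int
run_command (const char *path, char *const argv[])
{
  int status;
  pid_t pid;

  assert (path != NULL && argv != NULL);

  pid = fork ();
  if (pid < 0)
    return -1;                  /* fork failed; errno says why */

  if (pid == 0)
    {
      /* Child: replace this process image with the command.  */
      execv (path, argv);
      _exit (127);              /* only reached if execv failed */
    }

  /* Parent: wait for the child and decode how it finished.  */
  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Exiting the child with 127 when execv fails follows the usual shell convention for a command that could not be run.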

Running `autoscan' again at this point adds AC_CHECK_FUNCS(strcspn strspn) to `configure.scan'. This tells me that these functions are not truly portable. As before I provide fallback implementations for these functions in case they are missing from the target host -- and as it turns out, they are easy to write:

    /* strcspn.c -- implement strcspn() for architectures without it */

    #if HAVE_CONFIG_H
    #  include <sic/config.h>
    #endif

    #include <sys/types.h>

    #if STDC_HEADERS
    #  include <string.h>
    #elif HAVE_STRINGS_H
    #  include <strings.h>
    #endif

    #if !HAVE_STRCHR
    #  ifndef strchr
    #    define strchr index
    #  endif
    #endif

    size_t
    strcspn (const char *string, const char *reject)
    {
      size_t count = 0;
      while (strchr (reject, *string) == 0)
        ++count, ++string;

      return count;
    }

There is no need to add any code to `Makefile.am', because the configure script will automatically add the names of the missing function sources to `@LIBOBJS@'.
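The companion fallback for strspn can be sketched in the same style. This is not the file from the Sic sources: it is written with strchr only, and it uses the hypothetical name sketch_strspn so that it cannot clash with a system strspn when compiled on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the initial part of STRING that consists
   only of characters found in ACCEPT.  */
size_t
sketch_strspn (const char *string, const char *accept)
{
  size_t count = 0;

  assert (string != NULL && accept != NULL);

  /* Test for the terminating NUL explicitly: strchr would otherwise
     "find" it at the end of ACCEPT and run past the end of STRING.  */
  while (*string != '\0' && strchr (accept, *string) != NULL)
    ++count, ++string;

  return count;
}
```

Called as sketch_strspn ("::/usr/bin", ":"), it returns 2, which is how path_find skips over the separators in `$PATH'.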

The `strcspn.c' implementation above uses the autoconf generated `config.h' to get information about the availability of headers and type definitions. It is interesting that autoscan reports that strchr and strrchr, which are used in the fallback implementations of strcspn and strspn respectively, are themselves not portable! Luckily, the Autoconf manual tells me exactly how to deal with this: by adding some code to my `common.h' (paraphrased from the literal code in the manual):

    #if !STDC_HEADERS
    #  if !HAVE_STRCHR
    #    define strchr index
    #    define strrchr rindex
    #  endif
    #endif

And another macro in `configure.in':

    AC_CHECK_FUNCS(strchr strrchr)

9.3.4 `sic.c' & `sic.h'

Since the application binary has no installed header files, there is little point in maintaining a corresponding header file for every source. Instead, all of the structures shared by these files, and the non-static functions in them, are declared in `sic.h':

    #ifndef SIC_H
    #define SIC_H 1

    #include <sic/common.h>
    #include <sic/sic.h>
    #include <sic/builtin.h>

    BEGIN_C_DECLS

    extern Syntax syntax_table[];
    extern Builtin builtin_table[];
    extern Syntax syntax_table[];

    extern int evalstream    (Sic *sic, FILE *stream);
    extern int evalline      (Sic *sic, char **pline);
    extern int source        (Sic *sic, const char *path);
    extern int syntax_init   (Sic *sic);
    extern int syntax_finish (Sic *sic, BufferIn *in, BufferOut *out);

    END_C_DECLS

    #endif /* !SIC_H */

To hold together everything you have seen so far, the main function creates a Sic parser and initialises it by adding syntax handler functions and builtin functions from the two tables defined earlier, before handing control to evalstream which will eventually exit when the input stream is exhausted.

    int
    main (int argc, char * const argv[])
    {
      int result = EXIT_SUCCESS;
      Sic *sic = sic_new ();

      /* initialise the system */
      if (sic_init (sic) != SIC_OKAY)
          sic_fatal ("sic initialisation failed");
      signal (SIGINT, SIG_IGN);
      setbuf (stdout, NULL);

      /* initial symbols */
      sicstate_set (sic, "PS1", "] ", NULL);
      sicstate_set (sic, "PS2", "- ", NULL);

      /* evaluate the input stream */
      evalstream (sic, stdin);

      exit (result);
    }

Now, the shell can be built and used:

    $ bootstrap
    ...
    $ ./configure --with-readline
    ...
    $ make
    ...
    make[2]: Entering directory `/tmp/sic/src'
    gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic.c
    gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_builtin.c
    gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_repl.c
    gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_syntax.c
    gcc -g -O2 -o sic sic.o sic_builtin.o sic_repl.o sic_syntax.o \
    ../sic/libsic.a ../replace/libreplace.a -lreadline
    make[2]: Leaving directory `/tmp/sic/src'
    ...
    $ ./src/sic
    ] pwd
    /tmp/sic
    ] ls -F
    Makefile     aclocal.m4   config.cache    configure*    sic/
    Makefile.am  bootstrap*   config.log      configure.in  src/
    Makefile.in  config/      config.status*  replace/
    ] exit
    $

This chapter has developed a solid foundation of code, which I will return to in 12. A Large GNU Autotools Project, when Libtool will join the fray. The chapters leading up to that explain what Libtool is for, how to use it and integrate it into your own projects, and the advantages it offers over building shared libraries with Automake (or even just Make) alone.
