Chapter 1  Introduction

This version of the document refers to CCured 1.3.5 (based on CIL 1.3.5) and was last modified on 28th September, 2006.

CCured is a source-to-source translator for C. It analyzes the C program to determine the smallest number of run-time checks that must be inserted in the program to prevent all memory safety violations. The resulting program is memory safe, meaning that it will stop rather than overrun a buffer or scribble over memory that it shouldn't touch. Many programs can be made memory-safe this way while losing only 10–60% run-time performance (the performance cost is smaller for cleaner programs, and can be improved further by holding CCured's hand on the parts of the program that it does not understand by itself). Using CCured we have found bugs that Purify misses with an order of magnitude smaller run-time cost.

Small programs can be passed through CCured automatically. For medium size and large programs you have to hold CCured's hand a bit but we tried to explain the process clearly in this manual. We have used CCured on programs such as sendmail, bind, openssl, Apache modules, Linux device drivers, and the SPEC95 benchmarks. Some of these programs are quite big (300Kloc) and it can take a few days for somebody to “port” the program to CCured.

The translator itself is written in Ocaml (a dialect of ML). There is also a Perl script, ccured, that operates as a drop-in replacement for 'gcc', so that software packages' existing Makefiles can be used with very minor changes. Finally, CCured provides a library of runtime functions (including the Boehm-Weiser conservative garbage collector).

CCured is implemented on top of the CIL framework for analysis and transformation of C programs. You can use CIL to easily write a program analysis module that works on ANSI C code as well as on code that uses the GNU C extensions.

If you are anxious to see CCured in action you can try out our online demo.

In this manual you can find a tutorial on getting started with CCured (Chapter 3), documentation for all of the features (although some of the more researchy features are not yet fully documented) and step-by-step accounts of what it took to use CCured on several example programs (Chapter 6). We suggest that you read the chapters in order and go to the “Advanced CCured Features” only if you need them. Chapter 10 (CCured Warnings and Errors) will help you figure out if you are running into an error that is covered by an advanced feature.

In addition to this manual, you can find information on CCured in the research papers that we have written. A comprehensive look at CCured can be found in our ACM TOPLAS article, which includes much of the material from the other CCured papers. You may also be interested in the POPL '02 paper describing the type system and inference algorithm, and the “CCured in the Real World” paper from PLDI '03 that discusses several advanced features that we discovered were important for large legacy systems.

1.1  Authors

CCured was developed primarily by George Necula, Scott McPeak, Westley Weimer, Matthew Harren and Jeremy Condit. Other people helped with various components: Shree Rahul, Raymond To, Aman Bhargava, James Lee, Winston Liaw.

This work was supported in part by the National Science Foundation under Grants No. 9875171, 0085949 and 0081588, and gifts from Microsoft Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the other sponsors.

Chapter 2  Installation

CCured works on Linux and MS Windows. On Windows, Win95 operation is unreliable, Win98 should work, and Win2k or WinXP is highly recommended instead. CCured might also work on other systems that use gcc, but we have not tried it.

CCured is somewhat sensitive to the version of the compiler that you are using. More precisely, CCured is sensitive to the format of the system include files that you are using. When you install CCured it will create slightly modified copies of some of the system include files. These copies are created based on some patches that we distribute with CCured. If your include files are different from those that we used to create the patches then the CCured installation might fail. At the moment we have tested CCured with the following compilers: Some of the system include files that CCured depends on are really part of the standard library. CCured has been tried with the following versions of glibc: glibc-2.2.3, glibc-2.2.5, glibc-2.2.93, and glibc-2.3.2. To find out the version of your glibc you can run /lib/libc.so.6 on Linux.

If you want to use CCured on Windows then you must get a complete installation of cygwin (make sure you install the development tools such as gcc and ld, as well as the perl interpreter). You must also get the source-code Ocaml distribution and compile it yourself using the cygwin tools (as opposed to getting the Win32 native-code version of Ocaml). You will need Ocaml release 3.08 or higher to build CCured. If you have not done this before then take a look here. (You do not need to worry about cvs and ssh unless you need to use the master CVS repository for CCured.)

2.1  Get the CCured sources

Download the CCured distribution (latest version is distrib/ccured-1.3.5.tar.gz). See Section 13 for recent changes to the CCured distribution.

2.2  Configure and Compile CCured

Run the following commands in the top level directory. If you are using Windows then at least the configure command must be run from within bash.
./configure
make
make quicktest (optional)
cd test
make testrun/hello INFERBOX=infer
The configure script tries to find appropriate defaults for your system. You can control its actions by passing the following arguments: The last line in the above sequence of commands will apply CCured (and then run the result) on the file test/small1/hello.c. (See the test/Makefile for the sequence of commands.)

It is possible that you get a configuration error saying that certain patterns did not match. This means that your standard include files are different than those that we have prepared the distribution for. (See the above discussion.) Your recourse in this case is either to install one of the versions of the compiler that we tested CCured for or to extend the patch files so that they match your includes. It is not hard, and it is explained in the patcher documentation.

After running make you have built a few executables (in the obj directory) and have configured the bin/ccured Perl script. If you want to move this script to another directory (e.g. to /usr/local/bin) make sure to copy the CilConfig.pm file to the same directory.

Now you can continue with a tutorial (Chapter 3), or you can jump ahead and find out how to run CCured (Chapter 4).

2.3  Test CCured

Once you have built CCured you can run
make quicktest
This will run a few small examples.

Chapter 3  CCured Tutorial



CCured is an extension of the C programming language that distinguishes among various kinds of pointers depending on their usage. The purpose of this distinction is to be able to prevent improper usage of pointers and thus to guarantee that your programs do not access memory areas they shouldn't access. You can continue to write C programs but CCured will change them slightly so that they are type safe. In this chapter we explain in which situations your program will be changed and in what way.

CCured leaves unchanged code that does not use pointers or arrays. Actually, CCured is implemented on top of the C Intermediate Language (CIL) infrastructure, which means that C programs are first translated into a subset of the C language that has simple semantic rules. The following are some of the transformations that are performed: For a complete description of the CIL infrastructure see the CIL documentation.

3.1  CCured Attributes

The most significant difference between C and CCured is that CCured pays close attention to how pointers are manipulated and it classifies pointers into various kinds according to what you do with them. We'll discuss the various kinds starting in the next section but before that we need to introduce an important notation that you can use to communicate to CCured which pointer kinds you want for your pointers. The same notation is then used by CCured to explain in the transformed program what pointer kind is inferred for each pointer.

CCured uses type attributes to express the kind of pointers. Type attributes exist in a limited form in ANSI C (i.e. the volatile, const and restrict type qualifiers) and in a richer form in the GCC dialect of C. CCured, just like GCC, allows any attributes to be specified for types, names of variables, functions or fields, and for structure or union declarations. Unlike GCC, CCured has precise rules for how attributes are interpreted in a declaration (GCC instead relies on knowing the semantics of the attribute in order to associate it with the proper element of a declaration). The rule of thumb is that the attribute of a pointer type is written immediately following the * pointer-type constructor and the attribute of a name is written immediately before the semicolon or the = sign that terminates the declaration of the name. CCured uses pointer kinds such as SAFE, SEQ and WILD, and the corresponding attributes are formed by adding two leading underscores. For example, in the following declaration:

int * __WILD * __SEQ x __SAFE; 
the type of x is declared to be a SEQuence pointer to a WILD pointer (just like pointer-types in C, attributes are read from right-to-left). The __SAFE attribute in this case applies to the name x, which in the context of CCured means that whenever we take the address of the variable x we are going to obtain a SAFE pointer. The type of such a pointer would be int * __WILD * __SEQ * __SAFE (read as a SAFE pointer to a SEQ pointer to a WILD pointer to an integer).

(The complete attribute-parsing rules for CIL are described in the CIL manual.)
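
A few more declarations may make the placement rules concrete. This is only a sketch: the empty macro definitions are an assumption made so that the fragment is also accepted by a plain C compiler (they are not taken from the CCured distribution and should be dropped when the file is given to CCured), and the names cursor, rows, counter and read_counter are hypothetical.

```c
/* Assumption: define the kind attributes away when they are not
   already provided, so that an ordinary C compiler accepts the file. */
#ifndef __SAFE
#define __SAFE
#endif
#ifndef __SEQ
#define __SEQ
#endif
#ifndef __WILD
#define __WILD
#endif

/* Attribute right after '*': the kind of that pointer type. */
int * __SAFE cursor;              /* SAFE pointer to int */
int * __SAFE * __SEQ rows;        /* SEQ pointer to a SAFE pointer to int */

/* Attribute right before '=' or ';': applies to the declared name. */
int counter __SAFE = 3;           /* taking &counter yields a SAFE pointer */

/* Hypothetical helper that takes the address of 'counter'. */
int read_counter(void) {
    int * __SAFE p = &counter;
    return *p;
}
```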

CCured is designed to work on regular C programs (i.e. without pointer-kind attributes). One of the main features of CCured is that it will analyze your pointer usage and will find for all pointers in your program what is the best pointer-kind that can be ascribed to that pointer. However, you can also place pointer-kind annotations and force CCured to use certain pointer kinds.

3.2  SAFE pointers

The main action in CCured concerns pointers and arrays. Pointers in C can be assigned to l-values, dereferenced, subject to pointer arithmetic and cast to other pointer or non-pointer types. In contrast, pointers in a typical type-safe language (e.g. Java, Basic, ML) cannot be subject to arithmetic or (arbitrary) casts. CCured allows all the pointer operations that C allows but gives preferential treatment to pointers that are not subject to arithmetic or to casts. CCured refers to such pointers as SAFE pointers.

Consider for example this small code-fragment that computes the length of a linked list:

struct list { 
   void * car; 
   struct list * cdr; 
};

int length(struct list * l) {
  int i = 0;
  while(l) {
    l = l->cdr; 
    i ++;
  }
  return i;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The only pointers used in this code fragment are pointers to list cells and they are not subject to arithmetic or to casts. In fact, this code fragment can be transcribed literally into Java or C#. You can see in the cured code that CCured has inferred that these pointers are indeed SAFE.
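
As a usage sketch, the fragment can be exercised with a small hand-built list. The driver length_of_three is hypothetical and is not part of the CCured test suite; like length itself, it uses no pointer arithmetic and no casts.

```c
#include <stddef.h>

struct list {
    void *car;
    struct list *cdr;
};

/* Same traversal as in the fragment above. */
int length(struct list *l) {
    int i = 0;
    while (l) {
        l = l->cdr;
        i++;
    }
    return i;
}

/* Hypothetical driver: links three cells and measures the list. */
int length_of_three(void) {
    struct list c = { NULL, NULL };
    struct list b = { NULL, &c };
    struct list a = { NULL, &b };
    return length(&a);
}
```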

Properties of SAFE pointers
The SAFE pointers are the best kind of pointers, meaning that they incur the least amount of run-time cost. Here is a list of the properties of SAFE pointers: All of these restrictions are such that the following invariant holds for all SAFE pointers:
A SAFE pointer to type T is either 0 or else it points to a valid area of memory containing an object of type T. Furthermore, all other pointers to the same area are also SAFE and agree on the type T of the stored object.


3.2.1  Safe Casts

Casting a pointer to an integer is always allowed. CCured does actually allow certain other casts on SAFE pointers. For example it is safe to cast a pointer to a structure containing two integers into a pointer to integers. In general it is safe to cast a pointer to a long structure into a pointer to a short structure as long as the two structures agree on the types of the elements in the overlapping portion. CCured is actually quite liberal about these rules and will think of nested combinations of structures and arrays as one big structure with non-structure and non-array fields. This feature is called physical subtyping. For example, in the code shown below, all of the four casts implicit in the assignments are safe and CCured will infer that all pointers involved are SAFE.

struct large {
  struct small {
      int * f1;
      int * f2; 
  } a;
  int * f3;
} x;

struct small * s1 = & x;
int *        * s2 = & x;
struct { int *a1, *a2, *a3; } * s3 = & x;
struct { int *a1, *a2[2];   } * s4 = & x;
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice that all of the s1, s2, s3 and s4 are aliases for the address of x but they agree on the type of the object pointed to.
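
The idea that nested structures are viewed as one flat structure can be checked by hand in plain C. The sketch below is an illustration only, not CCured's algorithm; struct flat and both helper functions are hypothetical names, and the code assumes a compiler that accepts nested member designators in offsetof (GCC and Clang do).

```c
#include <stddef.h>

struct small { int *f1; int *f2; };
struct large { struct small a; int *f3; };
struct flat  { int *a1, *a2, *a3; };   /* same shape as the target of s3 */

/* Returns 1 when each field of 'flat' sits at the same offset as the
   corresponding flattened field of 'large'. */
int flattened_layouts_agree(void) {
    return offsetof(struct large, a.f1) == offsetof(struct flat, a1)
        && offsetof(struct large, a.f2) == offsetof(struct flat, a2)
        && offsetof(struct large, f3)   == offsetof(struct flat, a3);
}

/* Reading the prefix through a 'struct small *' sees the same pointers. */
int prefix_view_agrees(void) {
    static int v1, v2, v3;
    struct large x = { { &v1, &v2 }, &v3 };
    struct small *s1 = &x.a;            /* same address as &x */
    return (void *)s1 == (void *)&x && s1->f1 == &v1 && s1->f2 == &v2;
}
```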

Following are two examples of casts that are not allowed (for SAFE pointers; you can see that CCured infers the WILD kind for the pointers involved):

int y1;
int * * x1 = & y1; // Cast an int * to a int * *  
int y2;
struct { int * a1, a2; } * x2 = &y2;
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If the first cast were allowed then by writing to y1 an arbitrary integer we would be invalidating the assumption that x1 points to a pointer value. The second cast is similar.

3.2.2  Union types

Further complications arise in the case of union types. A pointer to a union type can be SAFE if it obeys all of the restrictions mentioned above and, in addition, every two fields of the union type agree on the types of the elements in their overlap. For example, in the code below the type of x can be a SAFE pointer.

union { 
    int *f1;
    int *f2[2];
    struct { int *a1, *a2, *a3; } f3;
} * x;

int* foo() { return x->f1; } //use x so it is analyzed.
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

But in the following code the type of x cannot possibly be SAFE because the type of the field f3.a2 does not match the type of the overlapping field f2[1] and thus x->f3.a2 could be used to write an arbitrary integer that can later be interpreted as a pointer using the expression x->f2[1].

union { 
    int *f1;
    int *f2[2];
    struct { int *a1, a2, *a3; } f3;
} * x;

int* foo() { return x->f1; } //use x so it is analyzed.
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If your program uses a union with incompatible fields you can still obtain SAFE pointers if you rewrite the union to be a struct. This will waste some space, and in some cases (those in which the program requires the type to have a given size) it might break the semantics of the program.

Or, you can use tagged unions (Section 9.7) in which CCured will insert run-time checks to ensure that you are not trying to read a pointer for a field of a union when you wrote a scalar (or an incompatible pointer) using another field.
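
The effect of such a check can be imitated by hand in plain C. The sketch below is only an analogue of the idea, written under the assumption that a tag records which field was written last; it is not the representation or the check that CCured generates (see Section 9.7 for that), and all names in it are hypothetical.

```c
#include <stddef.h>

enum slot_tag { SLOT_INT, SLOT_PTR };

struct slot {
    enum slot_tag tag;           /* which field was written last */
    union {
        int  as_int;
        int *as_ptr;
    } u;
};

void slot_set_int(struct slot *s, int v)  { s->tag = SLOT_INT; s->u.as_int = v; }
void slot_set_ptr(struct slot *s, int *p) { s->tag = SLOT_PTR; s->u.as_ptr = p; }

/* Returns 0 and stores the pointer in *out only if a pointer was the
   last thing written; returns -1 (and leaves *out alone) otherwise. */
int slot_get_ptr(const struct slot *s, int **out) {
    if (s->tag != SLOT_PTR)
        return -1;
    *out = s->u.as_ptr;
    return 0;
}
```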

3.2.3  SAFE Function Pointers

There is nothing special about function pointers. They can be safe provided that they are not cast to incompatible pointer types. A function pointer type is compatible only with another function pointer type with the same number and type of arguments and the same result type.

A common problem with function pointers (and functions) in CCured arises when your program uses external functions without prototypes. This makes CCured think that the function takes no arguments and returns an integer, and every time you use it in a different way CCured behaves as if you were casting the function pointer (denoted implicitly by the function's name) to the type needed in the cast. CCured will print warnings about using functions without prototypes and we recommend that you fix those problems and try CCured again. For a discussion of what happens when you do not use your function pointer in a clean way you should read to the end of this tutorial chapter and then read Section 9.1.
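
A minimal sketch of the clean case: every function is declared with a full prototype before use, and the function pointer type has exactly the same argument and result types as the functions assigned to it, so no cast is involved. The names add, sub, binop and apply are hypothetical.

```c
/* Prototypes are visible before any use of the functions. */
int add(int a, int b);
int sub(int a, int b);

/* One function pointer type with a fixed argument list and result. */
typedef int (*binop)(int, int);

int apply(binop f, int a, int b) {
    return f(a, b);        /* no cast needed: the types match exactly */
}

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
```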

3.3  Checks for SAFE Pointers

As we mentioned above, every time a SAFE pointer is dereferenced it must be checked whether it is null or not. We know from the invariant for SAFE pointers that non-null pointers can be dereferenced, and we can count on the value read through them having the type given by the pointer type.

A null check appears in the output of CCured as a call to the function CHECK_NULL. This and other run-time checking functions used by CCured have a name that starts with the prefix CHECK_ and are declared in the file ccuredcheck.h. You will see in that file that most of these functions are declared inline.

Checking for null pointers is necessary not just when reading or writing through them but also when they are used to compute the address of a subobject of the object they point to. For example, in the following code CCured will add a run-time check that s is not null before computing the value of x. Then again there will be a check that x is not null before dereferencing it.

struct str  {
   int a, b;
};

int getaddr(struct str * s) { 
   int * x = & (s->b);
   return *x;        
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The first check in the code above is necessary to enforce the invariant that SAFE pointers are either 0 or else valid pointers. Without that check the value of x would be 4 (on most machines) which would break the invariant and would defeat the second null-check thus letting you dereference an invalid pointer.
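
The placement of the two checks can be sketched by hand as follows. The helper is_null is a hypothetical stand-in that reports failure through a return value; it is not the CHECK_NULL function from ccuredcheck.h, whose signature and failure behaviour are not reproduced here.

```c
#include <stddef.h>

struct str {
    int a, b;
};

/* Hypothetical stand-in for a null check. */
static int is_null(const void *p) {
    return p == NULL;
}

/* Returns 0 and stores s->b in *out, or -1 if one of the checks fails. */
int getaddr_checked(struct str *s, int *out) {
    int *x;
    if (is_null(s))          /* first check: before computing &(s->b) */
        return -1;
    x = &(s->b);
    if (is_null(x))          /* second check: before dereferencing x */
        return -1;
    *out = *x;
    return 0;
}

/* The offset that a null 's' would turn into a bogus non-null address. */
size_t offset_of_b(void) {
    return offsetof(struct str, b);
}
```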

At this point you are starting to see some of the subtleties in the design of CCured. To ensure that we got everything right we have formalized the type system of CCured and we proved (for a subset of CCured) that the set of run-time checks and invariants achieve memory safety. In fact, in the first implementation of CCured we had forgotten about the first null check in the above example and the need for it was revealed while trying to prove that CCured is sound. To read our formalization and see the soundness proofs take a look at our paper “CCured: Type-Safe Retrofitting of Legacy Code”.

CCured includes a simple optimizer that tries to eliminate redundant checks and checks that cannot possibly fail (such as checking that the address of a global variable is non-null). Currently the optimizer is fairly naive. For example, it does not know that since s is a non-null SAFE pointer to struct str then &(s->b) is guaranteed to be non-null as well, thus the second check is not really necessary.

Speaking of too many checks, some of the more experienced C programmers will have noticed that our run-time checks prevent a common idiom for computing the offset of fields in structures. The typical code for doing that is shown below (and is defined as the macro offsetof in many C libraries):

struct str  {
   int a, b;
};

int get_offset_of_b() {
  return (int) &(((struct str*)0)->b);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

CCured recognizes in this specific case that you are casting the result of the & operator to an integer, so it avoids the run-time check.
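
In new code the same computation can be written with the standard offsetof macro from <stddef.h>, as in the sketch below. How CCured treats the macro depends on how your C library expands it, which this example does not assume.

```c
#include <stddef.h>

struct str {
    int a, b;
};

/* Same result as the cast-of-null idiom, without writing the cast. */
int get_offset_of_b(void) {
    return (int) offsetof(struct str, b);
}
```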

3.4  Checks for Returning Pointers

Note: Checks for returning stack pointers have been disabled in the current version. This was done because recent versions of gcc perform more aggressive inlining that results in false positives for our return-pointer checks. For more information, contact the CCured developers.

One of the unsafe features of C is that the address of a local variable can be returned from a function and later be used in a context in which the storage for the local variable has been reused. Many C compilers try to give warnings when they notice this happening but it is far too easy to fool them. For example, in the code example below the function bar does return the address of its local variable and this is going to be missed by most compilers.

int *foo(int *in) { // in is a stack address
   *in = 5;
   return in;     
}
int* bar() {
  int local = 0;
  return foo(&local);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Each function that returns a pointer value will have one call to CHECK_RETURNPTR that will verify that the pointer value is not in the stack frame of the function that is returning. Note that the pointer value can be 0, or can be a pointer to a heap area or to a cooler stack frame (a caller stack frame).

There is one complication with the return checks. There is no portable way to implement the check that a pointer is in the frame of the returning function. Currently CCured checks that the pointer value is not in the 1Mbyte range that starts at the current address of the frame pointer going towards lower addresses. However, getting the address of the frame pointer is somewhat unreliable in the context of heavy optimizations. The method that seems to work best is to introduce a volatile local variable whose address is then the address of the stack frame. Since the address of the variable is taken and the variable appears first in the local declarations, it appears that both GCC and MSVC will allocate such a variable at the highest address in the frame.

Note that we have observed the CHECK_RETURNPTR check to lead to spurious failures in the case when the function returning a pointer is inlined into its caller. For example, in the code example above, the check for the return of foo should succeed and the one for the return of bar should fail. If foo is inlined into bar then foo's check will see that in is in the current stack frame and will generate a run-time error. This is not an ideal situation and we are looking for a better solution.
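
The range test described in this section can be sketched on plain integer addresses. This is an illustration under stated assumptions, not the code of CHECK_RETURNPTR: the names are hypothetical, the treatment of the two window boundaries is a guess, and the frame address is simply passed in rather than obtained from a volatile local.

```c
#include <stdint.h>

#define FRAME_WINDOW ((uintptr_t)1 << 20)   /* 1 Mbyte, as described above */

/* Returns 1 if address p lies in the window of FRAME_WINDOW bytes that
   starts at 'frame' and extends toward lower addresses. */
int in_frame_window(uintptr_t p, uintptr_t frame) {
    return p <= frame && frame - p < FRAME_WINDOW;
}

/* A return is rejected only for pointers inside the window; null,
   heap addresses and caller frames (higher addresses) all pass. */
int return_allowed(uintptr_t p, uintptr_t frame) {
    return !in_frame_window(p, frame);
}
```

The fixed window also shows why inlining causes trouble: once foo is inlined into bar, both checks see the same frame address.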

3.5  Checks for Writing Pointers

Another possible unsoundness with addresses of local variables is when the address of a local variable is written to a global or to the heap. In that case the pointer value might be used later at a time when the underlying storage is being reused by another activation frame.

For example, in the following code fragment, both of the assignments are checked using the CHECK_STOREPTR run-time function. The first one is checked because we are obviously writing to a global variable. The second one is checked because we are writing through a pointer and thus we cannot know for sure whether we are writing to the heap or to the stack.

int *g;
void foo(int * *x) {
  g = *x; // Check this
 
  *x = g; // Check this  
}        
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The CHECK_STOREPTR is passed both the address we are writing to and the pointer that is being written. This function will in fact allow writing of stack pointers into cooler stack locations (deeper into the stack). The function will also allow the writing of null pointers anywhere. As a special feature, CCured will allow the writing of stack pointers that are in the stack frame of main or at higher addresses. This is useful because the command-line arguments and the environment strings are allocated on the stack of the program before main is called.

On rare occasions, we have encountered programs that do want to write the address of local variables into global variables. CCured provides an easy-to-use mechanism for dealing with those situations. If you add the attribute __HEAPIFY to the name of a local variable, CCured will move that variable to the heap using dynamic memory allocation. In fact, just one allocation is made for all __HEAPIFY local variables in a stack frame. Take a look at what happens in the following code fragment (do not be fooled by the call to free; in CCured that is only a hint to the built-in garbage collector):

int *g;
void foo() {
  int local __HEAPIFY = 5;
  g = &local;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
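
A hand-written analogue of this transformation in plain C might look as follows. It is a sketch only and does not reproduce the code that CCured emits; struct foo_locals and the error-code return value are hypothetical additions.

```c
#include <stdlib.h>

int *g;

/* All "heapified" locals of one activation live in a single block. */
struct foo_locals {
    int local;
};

/* Returns 0 on success, -1 if the allocation fails. */
int foo(void) {
    struct foo_locals *frame = malloc(sizeof *frame);
    if (frame == NULL)
        return -1;
    frame->local = 5;
    g = &frame->local;     /* safe to publish: not a stack address */
    return 0;              /* the block is deliberately not freed here */
}
```

Without a garbage collector, whoever owns g is responsible for eventually releasing the block.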

3.6  SEQuence Pointers

So far we have discussed pointers for which we disallow most casts and pointer arithmetic. In this section we will discuss another family of CCured pointers that can be used in pointer arithmetic operations. We call these sequence pointers and they come in two flavors: those that can only be advanced through pointer arithmetic (called forward-sequence pointers or FSEQ) and the regular sequence pointer that can be moved both forward and backward (we use the kind SEQ for these pointers). The cost that the programmer pays for using these more capable pointers is that each dereference will be accompanied by a bounds check.

Consider the following code fragment. The pointer x cannot be SAFE because it is involved in pointer arithmetic. Since we are adding a non-constant value CCured cannot be certain that the pointer is only advancing so it will assign the more general SEQ kind to it.

int * arith(int * x, int delta) {
   return x + delta;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

3.6.1  Representation

By looking at the CCured output for the above code you will notice several changes in the code. First, the type of the x parameter and the name of the function have changed as follows:

struct seq_int {
   int *  __SEQ  _p ;
   struct meta_seq_int {
      void *_b ;
      void *_e ;
   } _ms ;
};
typedef struct seq_int seq_int;

int * arith_sq(seq_int p, int delta);
We see that CCured has created the type seq_int (sequence pointer to an integer). This type has two components, the regular pointer value (field _p) and a metadata component (the field _ms). Metadata is the CCured terminology for additional information that CCured is carrying with the pointers in order to be able to check their usage. All multi-word pointers in CCured are represented as a structure with two elements: a field _p that stores the value of the pointer and the field _ms that stores the metadata for the pointer.

In the case of a SEQ pointer the metadata consists of two pointers, one that stores the beginning of the memory area in which a pointer was created (stored in field _b), and one that stores the end of that memory area (in field _e). Such a memory area is also called a home area for a pointer. The metadata describing the home area is generated by CCured for a pointer obtained by allocation or by taking the address of a variable, and is passed along in an assignment. Thus, a SEQ pointer carries with it the beginning and the end of the home area from which it originates and these values will be used to perform the necessary bounds checking.
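
The following sketch mirrors that representation by hand. It assumes that _e points one element past the end of the home area, which the text above does not state; seq_from_array and seq_add are hypothetical helpers, not functions of the CCured runtime.

```c
#include <stddef.h>

/* Hypothetical mirror of the layout shown above. */
struct seq_int {
    int *_p;                 /* current pointer value */
    struct {
        void *_b;            /* beginning of the home area */
        void *_e;            /* end of the home area (assumed one past) */
    } _ms;
};

/* Metadata is created where the pointer is born, here from an array. */
struct seq_int seq_from_array(int *base, size_t count) {
    struct seq_int s;
    s._p = base;
    s._ms._b = base;
    s._ms._e = base + count;
    return s;
}

/* Arithmetic moves only _p; the home area is carried along unchanged. */
struct seq_int seq_add(struct seq_int s, int delta) {
    s._p += delta;
    return s;
}
```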



The structures denoting fat pointers are named by adding a prefix corresponding to the kind of fat pointer to a canonical name of the type. For the general rules for naming types, see Section A.1.

Notice also that the name of the arith function has changed. CCured mangles the names of globals whose type has changed. We do this to ensure that you are not going to be linking your CCured code with a library that, for example, calls the arith function with a regular pointer argument. The mangling is always in the form of a suffix separated from the main name by an underscore. The suffix is constructed as a sequence of letters, each one signifying a certain kind of pointer (q stands for SEQ). The order of the letters corresponds to the order in which the pointer type is encountered in a depth-first in-order traversal of the structure of the global's type (for functions we scan the result type first and then the arguments in order; however, we do not scan structures and unions). For the general rules on global name mangling, see Section 8.1.

Sequence pointers have an additional capability: they can be set to any integer value, not just to 0 as in the case of SAFE pointers. We allow this because the sequence pointers have the additional fields that can be encoded to identify an integer disguised as a pointer. In particular both the _b and the _e fields of a SEQ pointer are null in the case when the pointer is actually an integer. The example below uses this capability of sequence pointers. Notice that null pointers (those in which all three fields are 0) are just a special case in which a SEQ pointer is actually an integer.

int * __SEQ getSeq() {
    return 5;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
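
The encoding of an integer inside a sequence pointer can be sketched by hand as below. The struct repeats the layout from Section 3.6.1; the three helper functions are hypothetical and only illustrate the rule that _b and _e are both null for a disguised integer.

```c
#include <stddef.h>
#include <stdint.h>

struct seq_int {
    int *_p;
    struct { void *_b; void *_e; } _ms;
};

/* An integer stored in a sequence pointer has no home area. */
struct seq_int seq_from_integer(uintptr_t value) {
    struct seq_int s;
    s._p = (int *) value;    /* never dereferenced in this sketch */
    s._ms._b = NULL;
    s._ms._e = NULL;
    return s;
}

/* Both metadata fields null: the value is really an integer. */
int seq_is_integer(struct seq_int s) {
    return s._ms._b == NULL && s._ms._e == NULL;
}

/* The null pointer is the special case in which all three fields are 0. */
int seq_is_null(struct seq_int s) {
    return seq_is_integer(s) && s._p == NULL;
}
```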

3.6.2  Invariants for SEQ pointers

When sequence pointers are assigned, passed as arguments or as return values, written or read from memory they carry their metadata unchanged. The same happens when they are subject to casts or to pointer arithmetic. (Pointer arithmetic affects only the _p component of the sequence pointer but not the home area.) There are two operations in which SEQ pointers “are born”: by using the name of a global or local array (possibly embedded inside other structures or arrays) or by dynamic memory allocation. It is at that time that the metadata for the SEQ pointers is computed and initialized: in the case of an array based on the array length, and in the case of memory allocation based on the allocated size. Take a look at the code CCured generates to initialize the metadata for the r1 and r2 pointers in the code below, both for the case of the memory allocation and for using the name of an array.

extern void* malloc(unsigned int);
int foo(int x) {
  int *p, *r1, *r2;
  int a[8];

  r1 = (int*)malloc(16);
  r2 = a;
  p = r1; 
  p = r2;
  return *(p + x); // Force p to be SEQ
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Whenever a pointer is subject to pointer arithmetic CCured will force that pointer to be SEQ. This means that the pointer should come accompanied by appropriate metadata. Thus the CCured inferencer will propagate the request for the metadata backwards through the data flow, across function calls and returns, to all the places where the pointer might be produced. For example in the previous example p is subject to pointer arithmetic so it must be SEQ. But since p is assigned from r1 and r2, they too must be SEQ. Finally the request reaches the name of the array a (in which case the metadata for a SEQ pointer is computed based on the length of a) and also the malloc (in which case the metadata is computed based on the allocated length).

3.6.3  Run-time checks for SEQ pointers

One of the design decisions for SEQ pointers was whether to check that the pointer remains within bounds after each arithmetic operation, or to allow pointers to go temporarily out of bounds and do the check when you use the pointer. We chose to check the dereferences because the C standard actually allows pointers to point outside their home area.

Sometimes a pointer is subject to arithmetic and then assigned to a pointer that is used only for reading and writing. The latter pointer will be inferred to be SAFE and the SEQ pointer will be converted to a SAFE pointer.

int foo(int x) {
  int *p, *safe;
  p += x; // p is SEQ
  safe = p; // safe is SAFE
  return *safe;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

To convert a SEQ pointer to SAFE we must check that it is within the bounds of the home area or is null. Note that even though a SEQ pointer might contain arbitrary integers, a SAFE pointer can only contain the integer 0. In the above code you can observe a run-time call to CHECK_xxx that performs the bounds checking. The same check is used when reading or writing through a SEQ pointer, as in the example below.

int addAll(int * p, int len, int stride) {
    int sum = 0, i;
    for(; len >= 0; len -= stride, p += stride) {
       sum += *p;
    }
    return sum;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
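The conversion check described above can be modelled in plain C. This is only a sketch: the struct seq_int and the helpers seq_add and seq_to_safe are hypothetical names chosen for illustration, not CCured's generated code or its CHECK_xxx run-time functions. The sketch also stores an element offset instead of raw _p, _b and _e addresses, so that out-of-bounds positions stay well defined in standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a SEQ pointer to int (not CCured's real layout). */
struct seq_int {
    int      *base;  /* start of the home area, NULL for a disguised integer */
    size_t    len;   /* number of elements in the home area */
    ptrdiff_t pos;   /* current position, may be temporarily out of bounds */
};

/* Pointer arithmetic is never checked. */
static struct seq_int seq_add(struct seq_int s, ptrdiff_t n) {
    s.pos += n;
    return s;
}

/* SEQ -> SAFE conversion. Returns 1 and stores a plain pointer in *out when
   the value is null or inside the home area; returns 0 in the cases where
   CCured would stop the program. */
static int seq_to_safe(struct seq_int s, int **out) {
    assert(out != NULL);
    if (s.base == NULL) {
        if (s.pos != 0)
            return 0;            /* a SAFE pointer may only hold integer 0 */
        *out = NULL;
        return 1;
    }
    if (s.pos < 0 || (size_t)s.pos >= s.len)
        return 0;                /* outside the home area */
    *out = s.base + s.pos;
    return 1;
}
```

In this model a pointer that is pushed 200 elements past its home and then pulled back passes the check only once it is back inside the home area, which mirrors the decision to check at use rather than at each arithmetic operation.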

Just like for SAFE pointers we must check the pointer validity when taking the address of a field of an object pointed to by a SEQ pointer:

struct elem {
   int f1, f2;
   int nested[8];
};

int foo(struct elem *array, int len) {
  int * pnested, * pnestedseq;
  array += len; // Make array a SEQ
  pnested = & array->f2; // A bounds check here
  pnestedseq = & array->nested[2]; // A bounds check here
  pnestedseq += len; // pnestedseq is a SEQ
}   
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In the example above array is a SEQ pointer pointing to elements of type struct elem. As long as we only do pointer arithmetic on array no bounds checking is necessary. However, when we take the address of the field f2 we will obtain a SAFE pointer and thus we must be sure that array is within the bounds of its home area. Another interesting situation occurs on the next line in the above example where we obtain a SEQ pointer with a home included within the home of array. The home of pnestedseq is the nested array within the struct elem element pointed to by array. But again we must know that array is within bounds.

Just like for SAFE pointers we must check for stack addresses when a SEQ pointer is written through a pointer or returned from a function. But in this case the check is more subtle. To see why consider the following program fragment:

// return a fat pointer to my own local, but using arithmetic
// to hide the fact that it's mine
int * sneaky()
{
  int local[2];
  int *x = local;
  x += 200;            // push x (apparently) above my frame
  return x;
}
int main() {
  int *plocal = sneaky();
  return *(plocal - 200); // Back into its (vanished) home
  
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If we were to check that the _p field is not a local stack address we would fail to notice the unsoundness in the above example. For this reason the stack checks are performed using the _b pointer. Another reason to do so is that if the _b pointer is null then we are returning an integer disguised as a pointer and we should not care whether it is equal to a stack address (since such a pointer cannot ever be dereferenced).

The bounds checks in CCured are more involved than in languages like Java where all of the elements have the same size. It turns out that we can write faster checks if we maintain the invariant that each FSEQ or SEQ pointer points to an area that contains a whole number of elements of the given type. Consider the following code fragment:

char buffer[17]; 
int main() {
  long * __FSEQ p = buffer; // This will fail
  return p[2];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

This will fail with an alignment check because there is no room for a whole number of long elements in buffer.
Failure ALIGNSEQ at ./foo.c:3: main(): Creating an unaligned sequence
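The invariant behind this failure is simple arithmetic: the size of the home area must be a multiple of the element size. The helper below is a hypothetical illustration of that condition (the name holds_whole_elements is an assumption, and the manual does not show how CCured implements the ALIGNSEQ check internally). Sizes are passed explicitly so the sketch does not depend on the size of long on any particular platform.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when a home area of area_bytes holds a whole number of elements
   of elem_bytes each, and 0 in the case where CCured would report ALIGNSEQ.
   Hypothetical helper for illustration only. */
static int holds_whole_elements(size_t area_bytes, size_t elem_bytes) {
    assert(elem_bytes > 0);
    return area_bytes % elem_bytes == 0;
}
```

For the 17-byte buffer above the condition fails whether long is 4 or 8 bytes wide, since 17 is a multiple of neither.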
In general, this check can appear as part of a cast from a sequence pointer to a sequence pointer of a wider base type. However, the above program is perfectly fine. We can do several things:

3.6.4  Casts allowed for SEQ pointers

It is always possible to cast a SEQ pointer to a SAFE pointer with the same underlying type. A bounds-check is performed in that case. It is also possible to cast a SAFE pointer to a SEQ pointer, in which case the home area of the new pointer is the memory range occupied by one element of the SAFE pointer type.

Casting of SEQ pointers is only allowed when the underlying types are the same or very closely related. We cannot freely allow the casting of SEQ pointers using the physical subtyping rules that we used for SAFE pointers. To see why consider the following program:
[--noSplitPointers]
struct wide {
   int i;
   int *p;
};
void foo(struct wide * __SEQ x) {
  int * __SEQ pi = (int*)x; // cast the parameter x, of type struct wide *
  *(pi + 1) = 5;
}
Notice that the cast has the property that it casts a pointer to a large structure to a pointer to a smaller structure that is compatible with the large one. If we were to allow the above program then we would be able to write an arbitrary integer in a place where the pointer-type field p is stored. The rule for SEQ pointers is that an infinite tiling of the two types being cast must be compatible. This allows us to cast a pointer to an array into a pointer to array elements (a very useful operation when working with multi-dimensional arrays):

double a[8][8];
int zero() {
  double * pa = a;
  for(int i=0; i<sizeof(a)/sizeof(double); i++) {
     * pa ++ = 0.0;
  }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

3.7  FSEQ Pointers

We observed in our experiments that most SEQ pointers only move forward. Thus the lower-bound check is not needed, and neither is the _b field of the SEQ pointer. To capture this common case CCured uses the FSEQ pointer kind (forward sequence). The FSEQ pointer is very similar to the SEQ pointer with a few exceptions.

Consider the following example:

int addAll(int * p, int len) {
    int sum = 0, i;
    for(; len >= 0; len --, p ++) {
       sum += *p;
    }
    return sum;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

A FSEQ pointer is represented as a two-word structure with just the _p (actual pointer value) and _ms._e (end of the home area) fields:

struct meta_fseqp_int {
   void *_e ;
};
struct fseqp_int {
   int *  __FSEQ  _p ;
   struct meta_fseqp_int _ms ;
};

typedef struct fseqp_int fseqp_int;

int addAll_f(int *  __FSEQ  p, void *p_ms_e , int len);




A FSEQ pointer can also encode an integer, in which case the _e field is null.

A FSEQ pointer with a non-null _e field points always to an address that is above or equal to the beginning of the home area. However, it might be beyond the end of the home area, and that is why a FSEQ pointer requires an upper-bound check whenever it is used (see the CCured output for the above example).

When doing stack checks for the FSEQ pointer we use the value of the _e field.

However, we must check for each arithmetic operation on FSEQ pointers whether it is advancing the pointer or not. This is done using the CHECK_ADVANCE run-time function. Notice that just because we add 1 to a pointer it does not mean that we are advancing it. We might be trying to overflow the addition and to break the invariant.
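The overflow concern can be made concrete with a small model. The function below is a hypothetical stand-in for the advance check, not the real CHECK_ADVANCE (whose signature is not given in this section); the name fseq_advance and its parameters are assumptions. Addresses are modelled as uintptr_t values so that wrap-around can be tested without undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the new address in *out when adding n elements of
   elem_bytes each moves the pointer forward (or leaves it in place) without
   wrapping around the address space; returns 0 otherwise. */
static int fseq_advance(uintptr_t p, ptrdiff_t n, size_t elem_bytes,
                        uintptr_t *out) {
    assert(out != NULL && elem_bytes > 0);
    if (n < 0)
        return 0;                       /* moving backwards */
    if ((uintptr_t)n > (UINTPTR_MAX - p) / elem_bytes)
        return 0;                       /* p + n * elem_bytes would wrap */
    *out = p + (uintptr_t)n * elem_bytes;
    return 1;
}
```

The second test is the case the text warns about: a positive increment that would wrap around and land below the start of the home area.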

What remains to be said about FSEQ pointers is how CCured infers that a pointer is FSEQ as opposed to SEQ. CCured looks at all arithmetic operations and if we are always adding a positive constant to a pointer then CCured will infer that pointer to be FSEQ. Another useful heuristic that CCured uses is that pointer arithmetic expressed using the array indexing notation is taken as an indication that we are advancing the pointer:

int addAll(int * p, int len) {
    int sum = 0, i;
    for(i=0; i<len; i++) {
       sum += p[i];
    }
    return sum;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Finally, FSEQ pointers can be cast to and from SAFE pointers using the same rules that we discussed for SEQ pointers. FSEQ pointers can also be cast to and from SEQ pointers. When casting a SEQ pointer into a FSEQ pointer we must perform a lower-bound check. When casting a FSEQ pointer into a SEQ pointer we consider that the home area starts at the place where the pointer is pointing (provided that the pointer is not encoding an integer and is within bounds). Note however that such a cast will never occur in a program without pointer-kind annotations. The CCured inferencer will instead prefer to propagate the constraint that all pointers which are assigned to a SEQ pointer must themselves be SEQ pointers and thus have a valid _b field.
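The two casts between SEQ and FSEQ described above can be sketched as follows. All names here (struct seq, struct fseq, seq_to_fseq, fseq_to_seq) are hypothetical, and addresses are modelled as uintptr_t values. Two behaviors are assumptions of this sketch rather than documented facts: an integer disguised as a pointer keeps null metadata across either cast, and an out-of-bounds FSEQ pointer is rejected by the FSEQ-to-SEQ cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seq  { uintptr_t p, b, e; };   /* value, home begin, home end */
struct fseq { uintptr_t p, e; };      /* value, home end */

/* SEQ -> FSEQ: requires a lower-bound check. Returns 0 on failure. */
static int seq_to_fseq(struct seq s, struct fseq *out) {
    assert(out != NULL);
    if (s.b != 0 && s.p < s.b)
        return 0;                     /* below the start of the home area */
    out->p = s.p;
    out->e = (s.b == 0) ? 0 : s.e;    /* assumption: integers stay integers */
    return 1;
}

/* FSEQ -> SEQ: the new home area starts where the pointer points. */
static int fseq_to_seq(struct fseq f, struct seq *out) {
    assert(out != NULL);
    if (f.e == 0) {                   /* integer disguised as a pointer */
        out->p = f.p; out->b = 0; out->e = 0;
        return 1;
    }
    if (f.p >= f.e)
        return 0;                     /* assumption: out of bounds is rejected */
    out->p = f.p; out->b = f.p; out->e = f.e;
    return 1;
}
```

Note how the second cast loses information: everything between the original start of the home area and the current position becomes unreachable from the resulting SEQ pointer.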

3.7.1  Arrays of unspecified length

We saw that a SEQ pointer obtains its metadata from the length present in the array type from which it originates. But occasionally it is useful to have arrays with either unspecified length or with zero length. Consider the following code, in which the struct open is open-ended, that is, the number of integer pointers contained in the rest array field is determined at allocation time (in this case 4):

struct open {
  int   count;
  int * rest[0];
};
extern void* malloc(unsigned int);
int main() {

 struct open *p = (struct open*)malloc(20);
 return p->rest[5];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

CCured supports this paradigm and computes the size of the rest field at the time of the allocation. Then CCured will turn the array rest into a sized array, which is essentially a structure with two fields: a field _size which stores the size of the array and a second field _array which contains the array itself:

struct _sized_a_char {
   unsigned int _size ;
   char (  __SIZED  _array)[20] ;
};
Sized arrays are very similar to the Java arrays in that they store their length in the first word of the data structure. When a SEQ pointer is created from such an array the metadata is computed based on the stored size.
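The length of an open array can be derived from the allocation size with a short calculation. The helper below is a hypothetical sketch of that arithmetic (the name open_array_length is an assumption, and the manual does not state that CCured computes the size in exactly this way). Assuming a target with 4-byte int and 4-byte pointers, malloc(20) for struct open leaves (20 - 4) / 4 = 4 elements for rest, which matches the count quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of trailing array elements that fit in an allocation, given the
   byte offset of the open array within the struct and the size of one
   element. Hypothetical helper for illustration only. */
static size_t open_array_length(size_t alloc_bytes, size_t array_offset,
                                size_t elem_bytes) {
    assert(elem_bytes > 0);
    if (alloc_bytes < array_offset)
        return 0;                 /* allocation too small for the header */
    return (alloc_bytes - array_offset) / elem_bytes;
}
```

On a target with 8-byte pointers and the array at offset 8, the same 20-byte allocation would leave room for only one element.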

There is one more situation in which CCured will automatically infer that an array must be sized. That is when the array is declared external and without a length:

extern a[];
int main() {
    return a[3];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Additionally, the programmer can request that an array be represented in sized form by using the __SIZED attribute on the array name:

int *a [8] __SIZED;
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Note that CCured does not support the allocation of sequences of arrays of structures with open arrays. For example, the following code will produce a warning and will allocate a sequence of struct open elements each with a zero-length rest field!

struct open {
  int   count;
  int * rest[0];
};
extern void* malloc(unsigned int);
int main() {

 struct open *p = (struct open*)malloc(20);
 p ++; // Make p FSEQ
 return p->rest[5];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

You might have noticed in the previous example that when reading from a SEQ pointer, CCured will allow the reading of the byte immediately following the array. This “feature” is part of our current mechanism for handling null-terminated strings, in which case the terminal null character can be read but not written, as discussed in the next example.

3.8  WILD Pointers

The pointer kinds we have seen so far can be dereferenced and can be subject to pointer arithmetic but can only be cast in very restrictive ways. Therefore we cannot really hope to be able to annotate all existing C programs with kinds like SAFE, SEQ and FSEQ. We also need a pointer kind that can be cast to any other type. The WILD pointer kind plays this role. Looking back at the kinds of pointers we have introduced so far we observe that the most restrictive kind of pointer, the SAFE pointer, is also the cheapest to use. It requires only one word for storage and only a null check for dereference, just like Java references. Then as we add more capabilities we also increase the cost of the pointer. The FSEQ pointers have all the capabilities of SAFE pointers but can also move forward. The additional cost is an extra word required for the storage of the end of the valid range and an upper-bound check required before dereference. The SEQ pointers have the additional capability of moving backwards and the additional cost of one more storage word and a lower-bound check before dereference. Keeping with this trend it is to be expected that WILD pointers are going to be even more costly. As we shall see, the WILD pointers must be able not only to find the bounds of the range in which they are supposed to navigate but they must also know for each word in that range whether it is a pointer or a non-pointer. The previously-introduced kinds of pointers did not need to maintain that information at run-time because the lack of casts allowed the compiler to keep track of such information statically.

One way to think of the CCured pointer-kind inferencer is as an analysis that classifies your pointers into two big categories: those for which the static type is an accurate description of the values pointed to, and those for which it is not. We refer to the first category as the statically-typed pointers and they consist of the pointers discussed so far: SAFE, SEQ, and FSEQ. The second category consists of the dynamically-typed pointers and includes the WILD pointers. Since the compiler cannot verify statically the type of the values the WILD pointers point to, CCured inserts code to maintain at run-time information about the contents of a memory range pointed to by WILD pointers.

Consider the following example, in which a (of type int *) is cast to type int * *:

int foo(int * a) {
   int * * g = (int * *)a; // Bad cast
   return 0;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

There are several new things in the CCured version of the above program fragment. We start with the type of a:

struct meta_wildp_int {
   void *_b ;
}   ;
struct wildp_int {
   int *  __WILD  _p ;
   struct meta_wildp_int _ms ;
}   ;
typedef struct wildp_int wildp_int;

int foo_w(wildp_int __WILD a);




A WILD pointer is represented as a two-word structure. As usual the _p field stores the actual pointer and just like for SEQ pointers the _b field stores the beginning of the pointer's home area. The major difference in the representation of WILD pointers is the layout of the home area. Since we must keep track at run-time of what is stored in each location in a dynamically-typed area we will store a bitmap (one bit per word) at the end of the home area. And just like for sized arrays the word immediately before the home area stores the size in words of the home area. Such an example is shown in the code below where the address of the local variable h is the home for the pointer p.

int foo() {
   int * h = 0;
   int * p = (int *) &h;
   return 0;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The type of h is:

struct _tagged_wildp_int {
   unsigned int _len ;
   wildp_int    _data  __attribute__((__packed__)) ;
   int _tags[(sizeof(wildp_int    ) + 127U) >> 7]  __attribute__((__packed__)) ;
};
typedef struct _tagged_wildp_int _tagged_wildp_int;
The _data field in a tagged area stores the actual data, which in this case is a WILD pointer. The _tags field contains one word for every 32 words of data. Each bit in the tags is 1 if the corresponding word in the data field stores the _b field of a WILD pointer, and 0 otherwise (if it contains an integer or the _p field of a WILD pointer or equivalently a value that is not CCured metadata). To maintain this invariant we must update the tag bits every time we perform a memory write. When we write an integer into a word we clear the bit corresponding to that word. When we write a WILD pointer we set the two bits corresponding to the two words to 0 and 1 respectively. When we read an integer we do not need to check tags. When we read a WILD pointer we check that the word from which we'll read the _b component has its tag bit set. If this is not true then we check whether the _b component that we'll read is 0. In this latter case we would be reading a pointer that cannot be used for memory dereference anyway. This latter situation occurs when we read a pointer from an area that has been initialized with zeros.

Notice that this scheme ensures that we will never interpret a word in a tagged area as a _b field unless it was last written with the contents of a _b field. This however does not prevent code that overwrites the _p fields from running (except that the resulting pointers might not be later usable).
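The tag discipline described above can be modelled in a few lines of C. This is a simplified sketch, not CCured's run-time library: the struct tagged_area, the fixed AREA_WORDS size, and the helpers write_int, write_wild and read_wild are all hypothetical, and a single 32-bit word of tags covers the whole area. Each WILD pointer occupies two consecutive words, _p first and _b second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AREA_WORDS 8   /* size of the hypothetical tagged area, in words */

struct tagged_area {
    uintptr_t data[AREA_WORDS];
    uint32_t  tags;    /* bit i is 1 iff data[i] holds a _b field */
};

/* Writing an integer clears the tag of the word written. */
static void write_int(struct tagged_area *a, size_t i, uintptr_t v) {
    assert(i < AREA_WORDS);
    a->data[i] = v;
    a->tags &= ~(UINT32_C(1) << i);
}

/* Writing a WILD pointer sets the tags of its two words to 0 and 1. */
static void write_wild(struct tagged_area *a, size_t i,
                       uintptr_t p, uintptr_t b) {
    assert(i + 1 < AREA_WORDS);
    a->data[i] = p;
    a->tags &= ~(UINT32_C(1) << i);
    a->data[i + 1] = b;
    a->tags |= UINT32_C(1) << (i + 1);
}

/* Reading a WILD pointer succeeds (returns 1) when the _b word is tagged,
   or when it is untagged but zero; otherwise the _b word was overwritten
   by something that is not metadata and the read is refused (returns 0). */
static int read_wild(const struct tagged_area *a, size_t i,
                     uintptr_t *p, uintptr_t *b) {
    assert(i + 1 < AREA_WORDS && p != NULL && b != NULL);
    int tagged = (int)((a->tags >> (i + 1)) & 1u);
    if (!tagged && a->data[i + 1] != 0)
        return 0;
    *p = a->data[i];
    *b = a->data[i + 1];
    return 1;
}
```

In this model, overwriting the _b word of a stored pointer with an integer clears its tag, so a later attempt to read the pointer back is refused, while reading from a zero-initialized part of the area yields a pointer with a null _b that can never be dereferenced.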

The following run-time support functions are used in conjunction with WILD pointers:

/* Fetch the size (in words) of the tagged area pointed to by a WILD pointer.
   This also checks that the pointer has a valid _b field */
unsigned int 
CHECK_FETCHLENGTH(void *_p,          /* The _p field of the pointer */
                  void *_b);         /* The _b field */

/* Do bounds checking for WILD pointers */
CHECK_BOUNDS_LEN(void *_b,           
                 unsigned int bwords,/* Result of FETCHLENGTH */
                 void *_p, 
                 unsigned int plen); /* The size in bytes of the memory area
                                        being accessed */

/* Clear the tags for a memory range. This is called before writing a scalar
   or a structure containing at least one scalar into a tagged area. */
CHECK_ZEROTAGS(void *base,            /* The base of the tagged area */
               unsigned int nrwords,  /* Number of data words in the area */
               void *start,           /* Start of the memory range for which 
                                         to clear the tags */
               unsigned int size);   /* Size in bytes of the memory range for
                                        which to clear the tags */

/* Set the tags for writing a pointer. This also checks that we are not
   writing a stack pointer. This is called for EACH pointer in a structure
   that is being written. */
CHECK_WILDPOINTERWRITE(void *base,    /* The base of the tagged area in which
                                         we write */
                       unsigned int nrwords, /* Number of data words in the 
                                                area */
                       void **where, /* The address in the tagged area where 
                                        we are about to write */
                       void *_b,     /* The _b field of the written pointer */
                       void *_p);    /* The _p field of the written pointer */


/* Check that the pointer we are about to read has a _b field that has not
   been tampered with */
CHECK_WILDPOINTERREAD(void *base,    /* The base of the tagged area from which
                                        we read */
                      unsigned int nrwords, /* Number of data words in the 
                                               area */
                      void **where, /* The address in the tagged area from 
                                       which we are about to read */
                      void *_b,     /* The _b field of the pointer being read */
                      void *_p);    /* The _p field of the pointer being read */
The code fragment below uses these runtime functions:

struct s {
  int  i; // Some integer
  int *q; // And some pointer
} g, * __WILD pg = &g;

int foo(struct s * __WILD x) {
   // Read an integer from x. 
   // Must do bounds check
   int read = x->i;
   // Read a pointer from x
   // Must do bounds check and check that the _b field is valid
   int * ptr = x->q;
   // Write an integer
   x->i = 5;
   // Write a pointer
   x->q = (int*)6;
   // Read and write the whole struct
   g = *x;     
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

A WILD pointer can be used in a very flexible way but there are some constraints. Like any of the other pointers that can be subject to pointer arithmetic the WILD pointer carries with it the identity of the home area and can be used to access only that area. But most importantly a WILD pointer can only be cast to and from another WILD pointer and can only point to scalars or other WILD pointers. Essentially this means that the dynamically-typed universe can touch the statically-typed universe in only a single way: a WILD pointer can be stored in a statically-typed area.

The following pointer kind is not legal: “int * SAFE * WILD”. To see why consider the (ill-typed) code below:

int foo(int * __SAFE * __WILD x) {
  int * __SAFE y;
  *(int __WILD *)x = 5; // Ok since x can be cast to another __WILD pointer
  y = * x; // Ok
  return *y; // Ok since y is SAFE and non-null
}
Essentially we cannot count on the accuracy of the types pointed to by WILD pointers. For this reason we can only allow WILD pointers to point to scalars or other WILD pointers. So, in the above example CCured does not recognize x as a valid type.

For the same reason we cannot cast between WILD pointers and non-WILD pointers.

3.9  Split Metadata

In the previous examples pointers with metadata are represented as structures. It turns out that gcc or the Microsoft Visual C compiler are not very effective at optimizing code that uses many variables with structured types. Thus, CCured has the ability to split such variables into several single-word variables. Consider again one of the examples from before:

int * arith(int * x, int delta) {
   return x + delta;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice that x becomes split into three variables: x (for the _p field, or the regular pointer value), and x_b and x_e for the beginning and end metadata fields. Notice also that function parameters and arguments (but not the results) are split in the same way:

int * arith_sq(int * x, void * x_b, void * x_e, int delta);
By default, CCured splits the metadata. You must pass the argument --noSplitPointers to prevent it.

Chapter 4  Invoking CCured

CCured consists of several components: an Ocaml application that does the main work, a set of Perl scripts that are used to invoke the CCured application, and a set of header and run-time library files.

The easiest way to use CCured is through the bin/ccured script. This script is intended to be used in the same context and with the same command-options as either the gcc compiler or the Microsoft Visual C compiler (MSVC). This script is configured at installation time to know where the rest of the CCured installation resides. If you move this script to another directory you must also make a copy in the new directory of the CilConfig.pm file.

Since ccured is a drop-in replacement for gcc, for most software projects you can reuse the regular build-infrastructure:
make mystuff CC="bin/ccured [options]"
Here is the sequence of actions that the ccured script performs:
  1. It recognizes among the command-line arguments those that are intended for the pre-processor; then, for each source file (i.e. with the extension .c), calls the preprocessor and places the result in a file with the extension .i in the same directory as the source file.
  2. For every .i file that it produces and that must be compiled (i.e. the -E option was not specified to require only preprocessing) ccured will save a copy of the file with the extension .o, thus “fooling” make that the object file was actually produced.
  3. Whenever ccured is invoked to link into a library a number of .o files that are actually preprocessed sources saved in the previous step, the CCured engine will be invoked to parse all of the files and produce a single C file with the same content but with names of types and variables properly renamed. The output is then saved as the resulting library. You should use the --mode=AR argument to CCured if you want to pass the remaining arguments as for the ar utility. More details about the merging stage that CCured uses can be found .
  4. Finally, when ccured is called to link into an executable a number of object files and libraries it will separate from them those that are actually saved sources and will merge them all in memory. The resulting file is then subject to CCured type inference followed by the insertion of run-time checks. Optionally, an optimizer is invoked to try to clean up some of the inserted run-time checks. The result is saved using the full name of the desired executable with the suffix ccured.c. Finally this file is preprocessed, compiled and linked using the underlying compiler.
Notice that by default the curing process is invoked on the whole program. This is necessary to allow the CCured inferencer to see all the uses of your pointers. An alternative is to annotate the include files with pointer-kind annotations and let the inferencer do the inference within one file only.

4.1  Command-line options

You always run CCured with the command options of gcc (e.g., -c to compile only, -o to specify the output file, etc.). Here are some common ways to invoke CCured:

Most of the command-line options that you pass to ccured will be passed along to the underlying preprocessor, compiler or linker. However the ccured script recognizes the following special options:

For the best performance you should use the options:
--optimize --releaselib --alwaysStopOnError --failIsTerse
All of the other options that start with (and are not recognized as compiler options) are passed unmodified to the CCured Ocaml application.

4.2  Controlling error handling at run time

After you cure a program you can run it as usual. The operations of the CCured-inserted run-time checks can be controlled with a few environment variables when the target program is run:

Chapter 5  CCured Type Inference

If you do not pass the --nocure argument to ccured, CCured will automatically infer pointer kinds (see Section 3.1) for the pointer types in your program. In a nutshell, this is done by creating a graph with one node for every pointer type in the program. If the program contains a cast or assignment from one type to another, the graph contains an edge between the corresponding nodes.

Once the graph has been created, the inferencer will examine every node and edge in the graph. If the edge represents a cast that is not captured by our notion of physical subtyping (i.e., a cast that we cannot statically verify to be valid) we mark the involved nodes (and thus the pointers in the program they are associated with) as WILD. Since WILD pointers may only point to other WILD areas, any node connected to a WILD node must also be WILD.

The inferencer then checks the remaining nodes to see if the types they represent are involved in pointer arithmetic. A node that is only incremented can be made FSEQ while a node that is subject to general arithmetic must be SEQ. All remaining nodes (i.e., those that adhere to our notion of physical subtyping and have no other constraints) become SAFE.
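The three steps above can be sketched as a small graph algorithm. The code below is a deliberately simplified model written for this explanation, not the CCured inferencer (which is implemented in Ocaml): the types struct node and struct edge, the flag names, and the function infer_kinds are all hypothetical. It also omits the backward propagation of SEQ requests through the data flow described in Section 3.6.

```c
#include <assert.h>
#include <stddef.h>

enum kind { SAFE, FSEQ, SEQ, WILD };

struct node {
    int bad_cast;       /* involved in a cast outside physical subtyping */
    int incremented;    /* only ever advanced */
    int general_arith;  /* subject to general pointer arithmetic */
    enum kind kind;     /* result of the inference */
};

struct edge { size_t from, to; };   /* a cast or assignment between nodes */

static void infer_kinds(struct node *nodes, size_t n_nodes,
                        const struct edge *edges, size_t n_edges) {
    size_t i;
    int changed;

    /* Step 1: nodes involved in bad casts start out WILD. */
    for (i = 0; i < n_nodes; i++)
        nodes[i].kind = nodes[i].bad_cast ? WILD : SAFE;

    /* Step 2: anything connected to a WILD node becomes WILD;
       repeat until no node changes. */
    do {
        changed = 0;
        for (i = 0; i < n_edges; i++) {
            assert(edges[i].from < n_nodes && edges[i].to < n_nodes);
            struct node *a = &nodes[edges[i].from];
            struct node *b = &nodes[edges[i].to];
            if ((a->kind == WILD) != (b->kind == WILD)) {
                a->kind = WILD;
                b->kind = WILD;
                changed = 1;
            }
        }
    } while (changed);

    /* Step 3: classify the remaining nodes by the arithmetic they see. */
    for (i = 0; i < n_nodes; i++) {
        if (nodes[i].kind == WILD)
            continue;
        if (nodes[i].general_arith)
            nodes[i].kind = SEQ;
        else if (nodes[i].incremented)
            nodes[i].kind = FSEQ;
    }
}
```

The fixpoint loop in step 2 is what makes one bad cast expensive: every pointer type reachable from it through casts or assignments ends up WILD.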

The actual inferencer includes support for a number of specialty pointer types, like the run-time type information pointer (RTTI), but the basic idea remains as described above. The end result of the inferencer is the graph, which serves as a mapping from types in the program to pointer kinds. The module that inserts run-time checks into the program uses this information to determine which checks to put where.

In the process of adding run-time checks to make the program type- and memory-safe, CCured introduces new types, changes old types and changes function prototypes. In each of these cases a new name is introduced to eliminate confusion and to prevent the resulting program from linking improperly.

5.1  Using the pointer browser

To inspect the results of the inference you should use the pointer browser. Every time you run CCured it will produce a directory called foo.browser (where foo is the name of the executable you are creating). Alternatively, you can use the --browserdir option to CCured to specify in which directory it should place the browser. That directory contains HTML files and Javascript programs that you can use to find the reasoning that CCured has used during pointer kind inference. To start the browser, point your web browser (Mozilla or IE; Netscape was broken beyond belief when we checked last in September 2002) to the file foo.browser/index.html and get going.

The browser will show you the preprocessed and merged file with annotations about the pointer kinds. The file that you will see also has the result of processing the polymorphism directives (see Section 7.1) and the wrappers (see Chapter 8).

The documentation for the pointer browser is at browser_help.html.

There is an alternative lower-level way to inspect the result of the inference, which is described in Appendix A.

Chapter 6  Using CCured

In this chapter we describe the typical steps that must be taken to use CCured on a new software package. For each step we give general instructions. Then, we look at a few concrete software packages and we describe all the steps that were necessary. We describe the warnings and the errors that we ran into and how we solved them. Most of these warnings and errors are also discussed in Chapter 10. Take a look there if the explanation here is too terse.
  1. Regular build First try to download the sources and build them in the regular manner (using gcc). It might be a good idea to set up a CVS repository right after you download the sources, in order to better keep track of the changes you are making.

  2. Build using CIL Next you should try to build the package using CIL (the front-end that CCured uses). This step is optional but is a good idea just in case your software package exposes a bug in CIL. To perform this step you should edit the Makefile as follows (assuming it uses CC to invoke the compiler and the linker):
    ifdef CCURED
     CC:=/home/necula/ccured/bin/ccured
    endif
    ifdef NOCURE
     CC+= --nocure
    endif
    ifdef NOMERGE
     CC+= --nomerge
    endif
    
    If your project includes a more complicated setup, you must make the necessary changes to use the above commands instead of gcc (both for compiling and for linking).

    Then you can run:
    make clean
    make CCURED=1 NOCURE=1 NOMERGE=1
    
    This should build your project as before, except that each source file is first preprocessed, then passed through the ccured.asm.exe executable to produce the CIL output (a file with suffix cil.c that contains the source of your program after being processed by the CIL front-end), then preprocessed again and then finally passed to gcc. Here is an example of what you should see for the file util.c from the mathopd package:
    /home/necula/ccured/bin/ccured --nocure --nomerge -c -O -Wall -DHAVE_CRYPT_H util.c -o util.o
    gcc -D_GNUCC -E  -O -DHAVE_CRYPT_H util.c -o ./util.i
    /home/necula/ccured/obj/x86_LINUX/ccured.asm.exe --cilout ./utilcil.c --nocure --warnall ./util.i
    gcc -D_GNUCC -E  -O -DHAVE_CRYPT_H -I/home/necula/ccured/include ./utilcil.c -o ./utilcil.i
    gcc -D_GNUCC -c -O -DHAVE_CRYPT_H -Wall -o util.o ./utilcil.i
    
    At this point you should ensure that the executable still works as expected. If it does not then you have found a bug in the CIL front-end or in the ccured Perl script that tries to impersonate gcc. CIL has been tested extensively, so you can consider yourself truly unfortunate. Please let us know about your problem.

    Next, we try the same thing, but this time with merging. In this mode of operation, the CIL front-end will attempt to create a single C source file from all of the files in your project. In this case, when make invokes ccured to compile a source file, the resulting object file will contain just the preprocessed version of the source. And when ccured is invoked to link the executable then all of the preprocessed sources are merged into a single file (with the suffix combcil.c), which is processed as before. Here is an example of what you should see for the mathopd package:
    make clean
    make CCURED=1 NOCURE=1
    ...
    gcc -D_GNUCC -E  -O -DHAVE_CRYPT_H util.c -o ./util.i
    /home/necula/ccured/bin/ccured --nocure  -o mathopd base64.o cgi.o config.o core.o dump.o imap.o log.o main.o redirect.o request.o util.o -lcrypt
    /home/necula/ccured/obj/x86_LINUX/ccured.asm.exe --cilout ./mathopd_combcil.c --nocure  base64.o cgi.o config.o core.o dump.o imap.o log.o main.o redirect.o request.o util.o
    gcc -D_GNUCC -E -I/home/necula/ccured/include ./mathopd_combcil.c -o ./mathopd_combcil.i
    gcc -D_GNUCC -c  -o mathopd_comb.o ./mathopd_combcil.i
    gcc -D_GNUCC  -o mathopd mathopd_comb.o -lcrypt
    
    You should again try to run your executable. If it does not work as expected then you have found a bug in the merger. This is again unlikely, but if it happens, let us know.

    If your project is built by first creating some libraries then merging works in a slightly different way. Take a look at the UCSPI TCP example (Section 6.2) next to see how that goes.

  3. Build with CCured

    Now we start using CCured:
    make clean
    make CCURED=1
    
    The sequence of operations is the same as in the case of CIL with merging, except that this time, after the sources are merged into foo_comb.c, the ccured.asm.exe engine is invoked. CCured will print information about the stages that it goes through. You should watch for warnings and error messages, especially in the “Inference” stage and in the “Curing” stage. If you are lucky the above steps are enough to build the executable. However, many times CCured will stop with some error. The first thing you should do in that case is to scan the warnings that lead to the error and proceed as explained in Chapter 10.

  4. Write the wrappers

    If curing succeeds the resulting file (with suffix cured.c) is passed to gcc for compilation and linking. Often you will see linking errors that mean that CCured has changed the interface of a function in a way that is incompatible with the library version of the function. See the examples below and Chapter 8 for a tutorial on writing wrappers.

  5. Run and debug the cured code

    Now that you have built your executable with CCured, you should run it on as many examples as possible. Remember that CCured is engineered to catch the majority of the bugs at run time (it is designed with the philosophy that the C programmer is better than the CCured static analyzer, so CCured silently inserts a run-time check whenever it cannot ensure statically that what the program is doing is guaranteed to be correct).

    When you get an error message from CCured you should investigate it to see if it is a false alarm or a true bug.

    You can tell CCured to continue the execution after it encounters an error if you set the environment variable CCURED_CONTINUE_ON_ERROR.
Next we look at concrete examples. If you want to try your hand at using CCured on real code you might want to try it on these packages and then use the instructions when you get stuck.

6.1  Example: mathopd HTTP server

6.1.1  Step 1: Regular Build

From the README file: “This is Mathopd, a fast, lightweight, non-forking HTTP server for UN*X systems.”

We describe here the steps required for processing version 1.4-gamma of mathopd (a development version). This package consists of 5000 lines of code.

We download mathopd-1.4-gamma.tar.gz, unpack, change the Makefile as required for Linux and then we try it out:
cd src
make
Next we edit the configuration file (doc/sample.cfg) so that we can run the server on port 8000. Right after the line Server { we add Port 8000. Then we must become root and create a directory for the log:
su root
mkdir /var/mathopd
chmod 777 /var/mathopd
exit
./mathopd <../doc/sample.cfg
Now from another machine (make sure you have ~/public_html/index.html on the server machine):
explorer http://manju.cs.berkeley.edu:8000/~necula/
and we see that it works.

6.1.2  Step 2: Build with CIL

Completely uneventful.

6.1.3  Step 3: Build with CCured

For mathopd we saw the warning:
Warning: Generated automatic vararg descriptor for log_d: struct autoVarargDescr_log_d : char const   */* __attribute__((___ptrnode__(922))) */,
uid_t
If this is a printf-like function you should declare it!
As explained in Chapter 10, we take a look at the implementation of log_d (we find it in the merged file mathopd_comb.c) to make sure CCured did not miss anything. Sure enough, log_d is a printf-like function. The same goes for die. We fix this by adding the following pragmas in main.c (see Section 9.6.1 for details):
#pragma ccuredvararg("log_d", printf(1))
#pragma ccuredvararg("die", printf(2))
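For illustration, a printf-like function is one where a designated argument is a format string that governs the remaining arguments. The sketch below is hypothetical: format_log is not part of mathopd, and we assume that the number in printf(N) counts arguments from 1, as the two pragmas above suggest. Compilers other than CCured ignore the unknown pragma (gcc may warn about it under -Wall).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-like function: argument 3 is the format string. */
#pragma ccuredvararg("format_log", printf(3))
int format_log(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(out, outlen, fmt, ap);  /* never writes past outlen */
    va_end(ap);
    return n;  /* length the full message would have */
}
```

A call such as format_log(buf, sizeof buf, "%s=%d", "port", 8000) passes a char * and an int after the format string, which is the kind of argument list the pragma asks CCured to check against the format.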
In the process of doing mathopd, we encountered the warning:
/home/necula/ccured/include/netdb_wrappers.h:329: Warning: Solver: changing User Specified SAFE node 1371 (the local variable p_ith_alias) to WILD
This turned out to be a bug in the wrapper for the socket functions. We found it using the browser (see Section 5.1).

Then we saw a warning:
/usr/include/sys/socket.h:156: Warning: sendmsg appears to be external
  (it has a wrapper), yet it has a mangled name: sendmsg_scsws_.
  Did you forget to use __ptrof and a version of __mkptr?
 For more information, consult the online documentation on
  "Writing Wrappers".
This turned out to be due to the same socket wrapper error.

Another warning you might see when you run CCured is:
3 incompatible types flow into node void  *1127
  Type struct iovec_LEAN  *1178 at /home/necula/ccured/include/socket_wrappers.h:237
  Type char */* __NODE(1371) __ROSTRING  */ *1372 at /home/necula/ccured/include/netdb_wrappers.h:332
  Type struct iovec_LEAN  *1146 at /home/necula/ccured/include/socket_wrappers.h:219
This means that a void * node is cast to several incompatible types. When you investigate this (using the browser, for example, or just following the line numbers) you discover that the __trusted_add_iov issue is the cause of this also.

Once we fix the above problem we notice that there are no more WILD pointers:
ptrkinds: Graph contains 12886 nodes
ptrkinds:   SAFE - 9256 ( 72%)
ptrkinds:   SEQ - 429 (  3%)
ptrkinds:   FSEQ - 3201 ( 25%)

6.1.4  Step 4: Write the wrappers

We see now that there are some missing functions:
mathopd_comb.o: In function `log_request':
mathopd_comb.o(.text+0x1b02d): undefined reference to `asctime_qs'
See Chapter 8 for information on how to write this wrapper. Essentially, this is what we had to add to the time_wrappers.h file:
#pragma ccuredwrapper("asctime_wrapper", of("asctime"))
__inline static
char *asctime_wrapper(const struct tm *timep) {
  struct tm *thinTimep = __ptrof(timep);
  char *thinRet = asctime(thinTimep);
  return __mkptr_string(thinRet);
}

6.1.5  Step 5: Run and Debug the cured code

We run the mathopd server and get an error right away:
Failure at config.c:924: new_pool(): Ubound
Aborted
We look at config.c and find this line:

p->ceiling = t + s;
It looks like the pointer that is stored in the ceiling field is out of bounds, but that is OK: this field is never used as a pointer. So we change its type to long instead. We could change it to FSEQ as well. See Chapter 10 for more possible solutions.
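A minimal sketch of this kind of change, under stated assumptions: the struct layout and the helper pool_room are hypothetical (mathopd's real pool has more fields), and we use uintptr_t where the text above uses long. The idea is only that the one-past-the-end address is kept as an integer that is compared, never dereferenced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool layout; mathopd's real struct has more fields. */
struct pool {
    char *floor;        /* start of the block; a real pointer */
    uintptr_t ceiling;  /* address one past the end; only compared */
};

/* Returns 0 on success, -1 if the allocation fails. */
int new_pool(struct pool *p, size_t s)
{
    char *t;

    assert(p != NULL);
    t = malloc(s);
    if (t == NULL)
        return -1;
    p->floor = t;
    p->ceiling = (uintptr_t)t + s;  /* was: p->ceiling = t + s; */
    return 0;
}

/* Number of bytes still available above the address cur. */
size_t pool_room(const struct pool *p, const char *cur)
{
    return (size_t)(p->ceiling - (uintptr_t)cur);
}
```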

Now mathopd seems to work!! But of course you should be using it for real in order to find the bugs.

6.2  Example: UCSPI TCP Suite

6.2.1  Step 1: Regular Build

This is a package that provides “tcpserver” and “tcpclient”. From their web page: “they are easy-to-use command-line tools for building TCP client-server applications.” The package also includes a number of clients built using these tools. There are 6700 lines of code in this package. We actually found one bug in this library.

We downloaded the version 0.88 (as of January 9, 2003). Following the instructions:
gunzip ucspi-tcp-0.88.tar
tar -xf ucspi-tcp-0.88.tar
cd ucspi-tcp-0.88
make
Before we install it, we edit the file conf-home to point to the current directory (we do not want to mess up /usr/local). Then we continue:
make setup check
And now the big moment:
./http@ www.yahoo.com
Bingo!

6.2.2  Step 2: Build with CIL

This software package has a strange build interface. The Makefile contains things like:
addcr.o: \
compile addcr.c buffer.h exit.h
        ./compile addcr.c

compile: \
warn-auto.sh conf-cc
        ( cat warn-auto.sh; \
        echo exec "`head -1 conf-cc`" '-c $${1+"$$@"}' \
        ) > compile
        chmod 755 compile
What is happening is that the compile script is created from the first line of conf-cc and then used as a compiler. That line contains gcc -O2 right now. We change that file to use instead the value of the CC environment variable:
$CC -O2
(You might want to also change warn-auto.sh to add a -v argument to /bin/sh so that you see what is going on.)

Then we add the following to the Makefile to define CC (and export it to child scripts). It turns out that this file was also missing a clean target, so we add that as well:
export CC:=gcc
ifdef CCURED
 CC:=/home/necula/ccured/bin/ccured
endif
ifdef NOCURE
 CC+= --nocure
endif
ifdef NOMERGE
 CC+= --nomerge
endif

clean:
        rm -f *.i *.o *.a
        rm -f *cil.c *infer.c *comb.c *cured.c
        rm http@ tcpclient tcpserver
Just to test the new setup we make it again and test it again.

Now we try to make it with CIL.
make clean
make CCURED=1 NOCURE=1 NOMERGE=1
Now we get an error:
tcpserver.o: In function `main':
tcpserver.o(.text+0xa29): undefined reference to `env_get'
tcpserver.o(.text+0xa50): undefined reference to `env_get'
Clearly we have done something wrong. A quick investigation reveals that tcpserver.c does need env_get, which is defined in env.c, but seems to be missing from envcil.c. This means that the CIL front-end has dropped this function. This is an embarrassing CIL bug. When things disappear like this, the fault is most often in the algorithm that CIL uses to remove “unnecessary” things (such as locals or prototypes that are not used). To disable that stage, pass the --keepunused flag to CCured.

Anyway, we fix that bug and now everything works. We now have to try the merging. For this we must also intervene in the way the Makefile links libraries and executables. It uses the scripts makelib to make a library and load to make an executable.

For the load script all we need to change is the conf-ld to use $CC instead of gcc. We do this and we run with merging. We get this error:
gcc -D_GNUCC  -o tcpserver -s tcpserver_comb.o cdb.a dns.a time.a unix.a byte.a
cdb.a: could not read symbols: Archive has no index; run ranlib to add one
As explained in Chapter 10, this is because we should not use ar to archive files in the merging mode; we should use ccured --mode=AR instead. We achieve this by changing the Makefile, which creates the makelib script, so that the makelib script uses the environment variable AR instead of ar. We define AR in the Makefile as follows:
export AR:=ar
ifdef CCURED
  ifndef NOMERGE
    AR:=/home/necula/ccured/bin/ccured --mode=AR
  endif
endif

We run again in merging mode and now we get:
ranlib: cdb.a: File format not recognized
You look at cdb.a and find that it is a merged source file. You should not use ranlib on such files. We edit the Makefile again and rerun.

Now we get this error message when trying to merge rblsmtpd from a number of object files and libraries:
/usr/include/sys/socket.h:189: Error: Incompatible declaration for accept (4). Previous was at rblsmtpd.c:103 (0) (different type constructors: void  vs. int )
What happens is that the file rblsmtpd.c defines its own global accept (luckily, with a different type than the one in the library; otherwise CCured would not have noticed!). Yet, one of the other files that are merged (socket_accept.c) uses the standard library's accept. This looks like a bug. When the linker puts everything together the references to the “accept” from socket_accept will be resolved to the “accept” from rblsmtpd (which is clearly not an acceptable replacement for the socket function).

We change the name of the “accept” in rblsmtpd, rerun, and now everything works. A result of our work so far is that for each of the utility programs that make up this package we have its sources in one source file (e.g. tcpclient_comb.c). And we have found a bug even before we started to use the actual CCured!

6.2.3  Step 3: Build with CCured

The first thing we see when we enable CCured on this package is:
chkshsgr.c:8: Warning: Calling function getgroups without proper prototype: will be WILD.
  getgroups has type void * __attribute__((___ptrnode__(12))) /* /* missing proto */  */()
chkshsgr.c:8: Warning: Calling function _exit with 1 arguments when expecting 0: will be WILD.
  _exit has type void ()
Two warnings, both due to missing or incomplete prototypes. In the case of getgroups it is a missing prototype. We add to chkshsgr.c the following:
#include <unistd.h> // For getgroups
#include <grp.h>    // For setgroups
We also edit the prototype of _exit in exit.h:
extern void _exit(int); // The "int" was missing
Now CCured succeeds in making this executable and proceeds to make tcpserver. Here we find this one:
tcpserver.c:352: Error: You did not turn on the handling of inline assembly. Better hide this assembly somewhere else!
This is interesting! We look at the inline assembly (a good place to look in is tcpserver_comb.c, you'll see them all in there). One is a use of ntohs, which is harmless because it does not involve pointers. We'll leave this alone.

But the other 5 or 6 such things are uses of the macros FD_SET and friends from <sys/select.h>. We investigate and find that these macros are defined in <bits/select.h>, and luckily that file also provides a regular C implementation for them, along the following lines:
#if defined __GNUC__ && __GNUC__ >= 2

# define __FD_ZERO(fdsp) \
  do {                                                                        \
    int __d0, __d1;                                                           \
    __asm__ __volatile__ ("cld; rep; stosl"                                   \
                          : "=c" (__d0), "=D" (__d1)                          \
                          : "a" (0), "0" (sizeof (fd_set)                     \
                                          / sizeof (__fd_mask)),              \
                            "1" (&__FDS_BITS (fdsp)[0])                       \
                          : "memory");                                        \
  } while (0)
#else   /* ! GNU CC */
# define __FD_ZERO(set)  \
  do {                                                                        \
    unsigned int __i;                                                         \
    fd_set *__arr = (set);                                                    \
    for (__i = 0; __i < sizeof (fd_set) / sizeof (__fd_mask); ++__i)          \
      __FDS_BITS (__arr)[__i] = 0;                                            \
  } while (0)

#endif  /* GNU CC */
I am going to patch that include file to make the conditional test always false and thus ensure that the C version is used always.
  1. First, we tell CCured that <bits/select.h> is a file that must be patched (you can see that it is not patched already because it is not present in the directory cil/include/gcc_2.95.3; by the time you read this, this whole patching business should have been done already). We go into cil/Makefile.gcc and add to the list of PATCH_SYSINCLUDES the name bits/select.h (but we do it only in the Linux section).
  2. Then, we specify the patch. We add the following to the file cil/ccured_GNUCC.patch:
    <<< file=bits/select.h, system=linux
    #if defined __GNUC__ && __GNUC__ >= 2
    ===
    #if 0 && defined __GNUC__ && __GNUC__ >= 2
    >>> 
    
    This says that the specified patch should be applied to the file bits/select.h when CCured is run on a Linux system. (Not all platforms have bits/select.h.) Matching is done whitespace-insensitive. Now you rebuild CCured (just run make) and you should find the patched file in the cil/include/gcc_2.95.3/bits. Make sure it is as you need. More information about the patcher is at ../cil/patcher.html
And since we have left one inline assembly in, we must tell CCured to accept it as is. We change the Makefile to pass the --allowInlineAssembly flag to CCured.
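To make the plain C branch selected by the patch concrete, here is a sketch of descriptor-set operations written without inline assembly. It is self-contained on purpose: my_fd_set, MY_SETSIZE and the my_fd_* functions are hypothetical stand-ins, not the definitions from <bits/select.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for fd_set, so the sketch does not depend on
   any particular <bits/select.h>. */
#define MY_SETSIZE 256
#define MY_NBITS   (sizeof(unsigned long) * CHAR_BIT)

typedef struct {
    unsigned long bits[MY_SETSIZE / (sizeof(unsigned long) * CHAR_BIT)];
} my_fd_set;

/* Clears every bit with an ordinary loop, like the non-GNU __FD_ZERO. */
void my_fd_zero(my_fd_set *set)
{
    size_t i;

    assert(set != NULL);
    for (i = 0; i < sizeof(set->bits) / sizeof(set->bits[0]); ++i)
        set->bits[i] = 0;
}

/* Marks descriptor fd as a member of the set. */
void my_fd_set_bit(int fd, my_fd_set *set)
{
    assert(fd >= 0 && fd < MY_SETSIZE);
    set->bits[fd / MY_NBITS] |= 1UL << (fd % MY_NBITS);
}

/* Returns 1 if descriptor fd is in the set, 0 otherwise. */
int my_fd_isset(int fd, const my_fd_set *set)
{
    assert(fd >= 0 && fd < MY_SETSIZE);
    return (int)((set->bits[fd / MY_NBITS] >> (fd % MY_NBITS)) & 1UL);
}
```

Code of this shape involves only array indexing and integer arithmetic, which is why the C versions of the macros are preferable to the assembly versions when curing.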

Now we find more missing prototype problems:
tcpserver.c:210: Warning: Calling function close with 1 arguments when expecting 0: will be WILD.
  close has type int ()
In fact, close does not have a prototype at all. CCured has supplied one without arguments while making unix.a! We add the “#include <unistd.h>” to tcpserver.c and go on.

We add a few more prototypes and then we get:
buffer_get.c:10: Warning: Calling function (*op) with 3 arguments when expecting 0: will be WILD.
  (*op) has type int ()
We find in buffer.h and buffer_get.c a function pointer type declared as “int (*op)()”. Again, the argument types are missing. We fill those in.
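The sketch below shows what filling in the argument types looks like. The struct is a simplified, hypothetical version of the package's buffer type, and the three parameter types (int, char *, unsigned int) are our assumption about a read-like operation; the warning itself only tells us that three arguments are passed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified buffer type. The point is the fully
   prototyped function pointer; "int (*op)()" would leave the
   argument types unspecified. */
struct buffer {
    char *x;                               /* storage */
    unsigned int n;                        /* capacity of x */
    int fd;                                /* descriptor handed to op */
    int (*op)(int, char *, unsigned int);  /* was: int (*op)(); */
};

/* Example operation: fills buf with 'a' and reports the count. */
int fill_with_a(int fd, char *buf, unsigned int len)
{
    (void)fd;  /* unused in this sketch */
    memset(buf, 'a', len);
    return (int)len;
}

/* Calls the operation through the typed pointer. */
int buffer_feed(struct buffer *b)
{
    assert(b != NULL && b->op != NULL);
    return b->op(b->fd, b->x, b->n);
}
```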

Now starts the real fun, chasing away the WILD pointers (see Chapter 7 for general techniques). We see this message:
** 1: Bad cast at cdb_make.c:36 (char  *510 ->struct cdb_hplist  *1376)
** 2: Bad cast at pathexec_env.c:42 (char  *510 ->char */* __NODE(2537)  */ *2538)
** 3: Bad cast at pathexec_env.c:67 (char */* __NODE(2537)  */ *2538 ->char  *2553)
** 4: Bad cast at sig.c:12 (void (int  ) *2695 ->void () *2694)
** 5: Bad cast at sig_catch.c:9 (void () *673 ->void (int  ) *2711)
ptrkinds: Graph contains 4383 nodes
ptrkinds:   SAFE - 3142 ( 72%)
ptrkinds:   SEQ - 15 (  0%)
ptrkinds:   FSEQ - 127 (  3%)
ptrkinds:   WILD - 1099 ( 25%)
535 pointers are void*
5 bad casts of which 0 involved void* and 2 involved function pointers
1 (20%) of the bad casts are downcasts
0 incompatible equivalence classes
Casts 4 and 5 are due to missing argument types in function types. We edit sig.h to add “int” as the argument type for signal handlers.

We investigate cast number 2 and we find something like this:
 e = (char **) alloc((elen + 1) * sizeof(char *));
This is a custom allocator. We must declare it (in alloc.h):
extern void *alloc(unsigned int);
#pragma ccuredalloc("alloc", nozero, sizein(1)) // We added this line
We run CCured again and there are no more bad casts (it looks like all the others were due to alloc), but there are still a bunch of WILD pointers:
ptrkinds: Graph contains 4575 nodes
ptrkinds:   SAFE - 3324 ( 73%)
ptrkinds:   SEQ - 41 (  1%)
ptrkinds:   FSEQ - 150 (  3%)
ptrkinds:   WILD - 1060 ( 23%)
579 pointers are void*
0 bad casts of which 0 involved void* and 0 involved function pointers
No bad casts, so no downcasts
2 incompatible types flow into node void  *518
  Type char */* __NODE(2549)  */ *2550 at pathexec_env.c:67
  Type char  *102 at dns_transmit.c:63
2 incompatible equivalence classes
Notice that we have more pointers in the program. This is due to the allocator, which is now polymorphic and is duplicated several times. But we also have incompatible equivalence classes. This is because there is a void * pointer that is used with several incompatible types (in this case char * and char **). See Section 7.1 for more details on this. This turns out to be because the function alloc_free with a declared argument of type void * is used in two places with different argument types. We simply declare that function to be polymorphic (in alloc.h):
#pragma ccuredpoly("alloc_free")
Finally, CCured succeeds, with no WILD pointers, but there still is a warning that we have not looked at:
pathexec_env.c:42: Warning: Encountered sizeof(char */* __attribute__((___ptrnode__(2595))) */) when type contains pointers. Use sizeof expression. Type has a disconnected node.
As explained in Section 9.5 we should take a look at the code. We find this typical example, and we fix it accordingly:
  e = (char **) alloc((elen + 1) * sizeof(* e)); // Was sizeof(char *)
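The same idiom in a self-contained form: the element size is taken from the pointer being assigned, so the allocation stays tied to the type of e. This is a sketch in which malloc stands in for the package's alloc and new_env_table is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates a NULL-filled table with room for elen strings plus a
   terminator. malloc stands in for the package's alloc here. */
char **new_env_table(size_t elen)
{
    char **e;
    size_t i;

    assert(elen < SIZE_MAX / sizeof(*e));  /* the product cannot overflow */
    e = malloc((elen + 1) * sizeof(*e));   /* not sizeof(char *) */
    if (e == NULL)
        return NULL;
    for (i = 0; i <= elen; ++i)
        e[i] = NULL;
    return e;
}
```

If the declaration of e later changes to another pointer type, the sizeof(*e) expression follows it without further edits.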

6.2.4  Step 4: Write the wrappers

In this package there is an interesting case:
tcpserver_comb.o: In function `env_get_qf':
tcpserver_comb.o(.text+0x15d10): undefined reference to `environ_qq'
This time the global that needs a wrapper is a pointer to data, not a function. You cannot write a wrapper for this, but you can replace its accesses with functions.

Since the program always accesses environ with an index operation environ[i], we can write a function environ_idx that takes an integer and returns the ith element in environ. I show below the case where it is appropriate to trust that the index is within the bounds:

extern char ** environ;
char* environ_idx(int i) {
  char * __SAFE * __SAFE p_environ = __trusted_add(environ, i);
  // We are going to believe that i is within bounds
  return __mkptr_string(* p_environ);
  
}
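When trusting the index is not appropriate, a checked variant is possible in plain C because environ is terminated by a NULL entry. The sketch below is our own illustration, with hypothetical names string_array_idx and environ_idx_checked; it does not use the CCured wrapper helpers and costs a scan of up to i entries per access.

```c
#include <assert.h>
#include <stddef.h>

/* Returns element i of a NULL-terminated string array, or NULL when
   i is negative or past the terminator. */
char *string_array_idx(char **arr, int i)
{
    int k;

    assert(arr != NULL);
    if (i < 0)
        return NULL;
    for (k = 0; k < i; ++k)
        if (arr[k] == NULL)
            return NULL;  /* i is beyond the end of the array */
    return arr[i];        /* may itself be the NULL terminator */
}

extern char **environ;

/* Checked counterpart of environ_idx (hypothetical name). */
char *environ_idx_checked(int i)
{
    return string_array_idx(environ, i);
}
```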

6.2.5  Step 5: Run and debug the cured code

We have built the suite of tools and now we run it. Right away we get a run-time error, so we set CCURED_CONTINUE_ON_ERROR (Section 4.2) to see them all.
Failure STORE_SP at pathexec_env.c:47: pathexec_qq(): Storing stack address
...
Failure STORE_SP at /home/necula/ccured/include/functions/deepcopy_stringarray.h:70: __deepcopy_stringarray_to_compat___0_ssqq(): Storing stack address
There are two distinct errors. We'll fix them, but here is a way to silence CCured if you are lazy: we can use CCURED_ERROR_HANDLERS to specify that we want to ignore all STORE_SP errors in those two functions. For this we write a text file (ucspi.handlers):
ignore STORE_SP at *:*:pathexec_qq
ignore STORE_SP at *:*:__deepcopy_stringarray_to_compat___0_ssqq
Now we run as follows:
CCURED_ERROR_HANDLERS=ucspi.handlers ./http@
and http@ seems to work.

Let's go back to fixing these errors. These errors are all trying to store strings that are obtained from the environ variable. It turns out that those strings are on the stack (allocated before main is invoked). The solution would be to copy those strings on the heap. But right after I saw this error I realized that CCured should not complain if the address that is being stored is in the stack frame of main or at higher addresses. So, I added this feature to CCured and now you will not see these particular errors. But if you run http@ www.yahoo.com you will get:
Failure STORE_SP at dns_transmit.c:213: dns_transmit_start_sqqff(): Storing stack address
It does not take much to find that this is due to the following code:
# 5 "dns_resolve.c"
int dns_resolve(char *q,char qtype[2])
{
  struct taia stamp;
  struct taia deadline;
  char servers[64];

  if (dns_transmit_start(&dns_resolve_tx,servers,1,q,qtype,"\0\0\0\0") == -1) return -1;
dns_transmit_start then stores the address of the array servers into the heap. The solution here is to move the servers array into the heap (or make it a global). We can achieve the former by declaring:
 char servers[64]  __HEAPIFY;
and CCured will move it to the heap (see Section 3.5).
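For comparison, this is roughly what moving the array to the heap by hand looks like. Everything here is a hypothetical sketch: remember_servers stands in for the part of dns_transmit_start that keeps the pointer, and resolve_sketch for the caller. Unlike the __HEAPIFY annotation, the manual version leaves the question of when to free the block to the programmer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for the state that dns_transmit_start keeps in the heap. */
static char *saved_servers = NULL;

/* Hypothetical callee: remembers the pointer beyond the call. */
void remember_servers(char *servers)
{
    assert(servers != NULL);
    saved_servers = servers;
}

/* Caller with the array moved to the heap by hand. Returns 0 on
   success, -1 if the allocation fails. */
int resolve_sketch(void)
{
    char *servers = malloc(64);  /* was: char servers[64]; */

    if (servers == NULL)
        return -1;
    memset(servers, 0, 64);
    servers[0] = 127;            /* placeholder contents */
    remember_servers(servers);   /* no stack address escapes */
    return 0;
}

/* Returns the pointer that outlived the call to resolve_sketch. */
char *current_servers(void)
{
    return saved_servers;
}
```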

Bingo! It seems to work. Now, you can start measuring performance and, if desired, you can try to make CCured infer better checks for your code.

6.3  Example: PING

In this section I describe what it took to do the ping utility from netkit-base-0.17.

We download, compile, and test it. Then we set up the Makefiles to use ccured instead of gcc and we test that it works in the --nocure mode. Then we turn on curing.

First we see two errors pointing out that there is inline assembly in ping. CCured also prints the instructions and we see that they are just bit manipulations. So, we just pass --allowInlineAssembly to ccured and go on.

Now we get:
Failure UBOUND at ping.c:1303: main(): Ubound
To investigate this one we can use CCURED_SLEEP_ON_ERROR (see Chapter 4). The fragment of code that causes this error is similar to the following code:

struct icmp {
   int various;
   char data[1];
};
char outpack[65536];

char foo() {
 // Get the 8th data character
 return ((struct icmp*)outpack)->data[8];
}
The problem is that once outpack is cast to a struct icmp * it loses the ability to access most elements in the original array. The solution for this particular error involves a slight rewrite of the access as:

char foo() {
 // Get the offset of the 8th data character
 int off = (int) & ((struct icmp*)0)->data[8];
 return outpack[off];
}
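An equivalent way to compute the offset is the standard offsetof macro from <stddef.h>, which avoids the cast of a null pointer. This is a sketch of an alternative, not the change made to ping; data_at is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

struct icmp {
    int various;
    char data[1];
};

char outpack[65536];

/* Reads the idx-th data character through the original array. */
char data_at(size_t idx)
{
    size_t off = offsetof(struct icmp, data) + idx;

    assert(off < sizeof(outpack));  /* stay inside outpack */
    return outpack[off];
}
```

Here data_at(8) reads the same byte as the rewritten foo above, since both index outpack at the offset of data plus eight characters.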
We need to fix that problem in all accesses to the data field. After that, we run into another problem:
Failure ALIGNSEQ at ./ping_combcured.c:3923: pinger(): Creating an unaligned sequence
This is because at some point we create a sequence pointer whose home area does not contain a whole number of elements. See Section 3.6.3 for the various ways to address this problem. We choose to simply tell CCured to allow partial elements in structures and to adapt its checks accordingly. To achieve this we pass --allowPartialElementsInSequence to CCured.

Now it works, we are done.

6.4  Example: THTTPD Server

In this section, we take a look into how to cure thttpd, described by its author as “a simple, small, portable, fast, and secure HTTP server.” It is currently the fifth most popular HTTP server on the net. You can get it at . The version I work with is 2.23 beta 1.

6.4.1  Step 1: Regular Build

First, we unpack the files:
gunzip thttpd-2.23beta1.tar.gz
tar -xvf thttpd-2.23beta1.tar
Look in the directory for the file configure. We don't want to mess up the /usr/local/ folder, so edit configure and delete the folder path from ac_default_prefix. The generated Makefile should now place thttpd in the folder where the source code resides. Run configure.
./configure
The file should do some work and generate a few files.

There is one more modification we should make to avoid conflicts with any existing HTTP servers (e.g. Apache). In config.h, go to line 321 and change the default port from 80 to something like 7500.

Now we can run make. The compiler does its work and should produce a file named thttpd. Run the program and test the server from another computer. In my case, I steered my browser to http://manju.cs.berkeley.edu:7500/~andypang/.
make
a few seconds later...
./thttpd
If you see your homepage, thttpd is working. If not, make sure you have a public_html folder. If you are lazy, there is another way to check if thttpd is working. Check whether the site for the server is up at the port you specified (e.g. http://manju.cs.berkeley.edu:7500/). A page with a green background should appear notifying you that thttpd is running.

6.4.2  Step 2: Build with CIL

It's time to look into that Makefile.in that came with thttpd. Makefile.in is the file from which configure generates Makefile, which in turn is used to build the application.

Below the line that says “You shouldn't need to edit anything below here,” we make the same changes to the CC variable as in the other tutorials.
ifdef CCURED
 CC:=/home/andypang/cil/bin/ccured
endif
ifdef NOCURE
 CC+= --nocure
endif
ifdef NOMERGE
 CC+= --nomerge
endif
Be sure to insert these ifdefs below the block of variable declarations (below line 60). Otherwise, gcc would be used to compile the files.

There will also be more files to clean after CIL/CCured is run, so I added the following to clean up the CIL files and folders that will be generated:
CIL =           $(SRC:.c=.i) *cil*

COMB =          *comb*
I modified the variable CLEANFILES to include those files.
CLEANFILES =    $(ALL) $(OBJ) $(GENSRC) $(GENHDR) $(CIL) $(COMB)
Now we are ready to compile using CIL. First, we run configure again to update the Makefile.
./configure
make clean
make CCURED=1 NOCURE=1 NOMERGE=1
Ensure the application still works correctly. It should. Now build with merging.
make clean
make CCURED=1 NOCURE=1
Again, check for proper functionality.

6.4.3  Step 3: Build with CCured

The big moment.
make clean
make CCURED=1
You should see a number of errors and warnings. Many of these will involve a warning of a “malformed format string” looking something like this:
libhttpd.c:226: Warning: Malformed format string [child wait - %m]
If you take a look at the string, you will see that the warning stems from a call to syslog with a %m. This is safe, and the warning can be ignored.

Now we see a prototyping warning:
thttpd.c:437: Warning: Calling function sigset without proper prototype: will be WILD.
  sigset has type void * __attribute__((___ptrnode__(1666))) /* /* missing proto */  */()
At this point we realize that some files are read-only. We chmod the thttpd folder from its parent directory so that we can write to any file in the directory:
chmod -R u+w thttpd-2.23beta1
Back to the problem at hand. If one punches in sigset on Google, one will find that it is a function in signal.h. Add the following to thttpd.c to remedy the warning:
extern void *sigset(int, void*);
Although the warning also appeared for the file libhttpd.c, the prototype in thttpd.c will work for all files in the project.

The next most prevalent warning should be a sscanf error, where CCured warns that it does not expect the type char *. CCured does not accept strings in sscanf because one could potentially read in an unbounded string. This characteristic could be used for malicious intentions.

Although %400[a-zA-Z] is used, and hence strings that are accepted by sscanf will be limited to 400 bytes, CCured currently does not support this. This may change in the future, because this method of scanning strings is not susceptible to the same problems as simply using %s.

To fix this error, use CCured's sscanf/fscanf functions in place of the ones in stdio.h. Please refer to Section 9.6.3 for more details. As an example, the following is how I modified the code at Line 213 in tdate_parse.c.
    /* DD-mth-YY HH:MM:SS GMT */
    if ( 
        (resetSScanfCount(cp), 
         tm_mday = ccured_fscanf_int(ccured_sscanf_file, "%d-"),
         ccured_fscanf_string(ccured_sscanf_file, "%400[a-zA-Z]-", str_mon),
         tm_year = ccured_fscanf_int(ccured_sscanf_file, "%d "),
         tm_hour = ccured_fscanf_int(ccured_sscanf_file, "%d:"),
         tm_min = ccured_fscanf_int(ccured_sscanf_file, "%d:"),
         tm_sec = ccured_fscanf_int(ccured_sscanf_file, "%d GMT"),
         getScanfCount()) == 6 &&
        /*
        sscanf( cp, "%d-%400[a-zA-Z]-%d %d:%d:%d GMT",
                &tm_mday, str_mon, &tm_year, &tm_hour, &tm_min,
                &tm_sec ) == 6 &&
        */
            scan_mon( str_mon, &tm_mon ) )
        {
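For reference, the commented-out call can be exercised on its own with the standard library. The sketch below wraps it in a hypothetical helper, parse_date_tail, to show why the 400-character width matters: the destination needs 401 bytes (400 letters plus the terminating NUL), and the return value counts the fields converted.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "DD-mth-YY HH:MM:SS GMT" with standard sscanf. mon must have
   room for 401 bytes: up to 400 letters plus the terminating NUL.
   Returns the number of converted fields (6 on success). */
int parse_date_tail(const char *cp, int *mday, char *mon,
                    int *year, int *hour, int *min, int *sec)
{
    assert(cp != NULL && mon != NULL);
    return sscanf(cp, "%d-%400[a-zA-Z]-%d %d:%d:%d GMT",
                  mday, mon, year, hour, min, sec);
}
```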
Then we see several warnings regarding the use of sizeof:
libhttpd.c:2782: Warning: Encountered sizeof(char */* __attribute__((___ptrnode__(5491))) */) when type contains pointers. Use sizeof expression. Type has a disconnected node.
We take a look at this line and find that it is a function call to RENEW.
nameptrs = RENEW( nameptrs, char*, maxnames );
The definition of RENEW can be found in libhttpd.h.
#define RENEW(o,t,n) ((t*) realloc( (void*) o, sizeof(t) * (n) ))
The warning arises from the fact that the CCured inferencer cannot make the connection between nameptrs and char *'s. See Section 9.5 for more details.

To fix the warning, we let CCured's inferencer connect the argument of sizeof to nameptr by modifying the code in the following way:

In libhttpd.c
nameptrs = RENEW( nameptrs, nameptr*, maxnames );
In libhttpd.h
#define RENEW(o,t,n) (realloc( (void*) o, sizeof(t) * (n) ))
Now CCured will know that we are allocating memory based on the size of the pointers to nameptr.

The macro NEW is defined one line before RENEW, and we fix a couple of calls to NEW in the same way.
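A related way to write such a macro, shown here only as a sketch and not as the change made to thttpd, is to take the element size from the pointer itself so that no type name is passed at all. RENEW_ARRAY and grow_names are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant of RENEW: the element size comes from the
   pointer itself. sizeof does not evaluate its operand, so o is
   evaluated only once at run time. */
#define RENEW_ARRAY(o, n) (realloc((void *)(o), sizeof(*(o)) * (n)))

/* Grows a table of strings to hold maxnames entries. Returns the new
   table, or NULL (leaving the old table valid) on failure. */
char **grow_names(char **nameptrs, size_t maxnames)
{
    assert(maxnames > 0);
    return RENEW_ARRAY(nameptrs, maxnames);
}
```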

We now face a number of warnings of the following form:
libhttpd.c:2519: Warning: Solver: changing User Specified SAFE node 5283 (an unnamed location (often an inserted cast)) to FSEQ
A quick look at libhttpd.c shows that a call to qsort is made. The wrapper for qsort can be found in stdlib_wrappers.h, and there appear to be two versions of qsort. One version supports polymorphism and the other does not. Add -DUSE_POLYMORPHIC_QSORT in Makefile.in after we make the call to cil/bin/ccured, so that CCured will know to use the polymorphic qsort.

This fixes the problems in libhttpd.c but not tdate_parse.c. We go to Line 113 and find a call to qsort. At first it looks like a problem with sizeof, but fixing the calls to sizeof proves to be fruitless (although still a good programming practice).

Let's turn to the browser to track down the bad cast for us. Looking at the problematic node will show us that it is in the arguments passed into the comparator for qsort. The arguments of the comparator are declared as char *, but they are also cast to struct strlong * in the function. This is a bad cast. Change the types of the accepted arguments to void * and the problem is solved.
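A comparator of the corrected shape can be sketched as follows. The layout of struct strlong is an assumption (tdate_parse.c defines its own), and sort_strlong is a hypothetical helper. In standard C the comparator parameters are const void *, and converting them to the element type needs no cast.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; tdate_parse.c defines its own struct strlong. */
struct strlong {
    const char *s;
    long l;
};

/* Comparator with the parameter types that qsort expects. */
int strlong_compare(const void *v1, const void *v2)
{
    const struct strlong *a = v1;
    const struct strlong *b = v2;

    return strcmp(a->s, b->s);
}

/* Sorts the table by its string field. */
void sort_strlong(struct strlong *tab, size_t n)
{
    assert(tab != NULL);
    qsort(tab, n, sizeof(*tab), strlong_compare);
}
```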

Next we tackle a similar problem in tdate_parse.c:
** 1: Bad cast (seq) at tdate_parse.c:202 (struct tm  *7100 ->char  *7136)
Once again, we use the browser to track down the problem, and find a cast from a struct tm * to a char *. We change the char * to a void * and the bad cast disappears.
    (void) memset( (void*) &tm, 0, sizeof(struct tm) );
The last bad cast:
** 1: Bad cast at timers.h:41 (void */* __NODE(346)  */ *347 ->long  *349)
Going to Line 41 reveals a union of void *, int, and long. This cannot possibly be safe because i could later be used as a pointer with p. We change this union to a tagged union as instructed in Section 9.7.
union ClientData {
    void* p;
    int i;
    long l;
} __TAGGED;
typedef union ClientData ClientData;
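
To illustrate what tagging buys, here is a hand-written discriminated union in plain C. It is only a sketch of the idea: the __TAGGED attribute makes CCured maintain and check the tag for you, and its actual representation may differ. The names TaggedClientData, cd_from_ptr, cd_from_int and cd_get_ptr are hypothetical.

```c
#include <stddef.h>

enum cd_tag { CD_PTR, CD_INT, CD_LONG };

/* A union plus a tag recording which field was written last. */
typedef struct {
    enum cd_tag tag;
    union { void *p; int i; long l; } u;
} TaggedClientData;

TaggedClientData cd_from_ptr(void *p) {
    TaggedClientData c; c.tag = CD_PTR; c.u.p = p; return c;
}
TaggedClientData cd_from_int(int i) {
    TaggedClientData c; c.tag = CD_INT; c.u.i = i; return c;
}

/* Reading the pointer field succeeds only if it is the one last
   written; otherwise NULL is returned instead of reinterpreting
   an integer as a pointer. */
void *cd_get_ptr(const TaggedClientData *c) {
    return c->tag == CD_PTR ? c->u.p : NULL;
}
```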

Chapter 7  How to Eliminate WILD Pointers

As explained in the tutorial, you can use the WILD pointer types to do most of the things that you can do with pointers in C. And, in fact, CCured's inferencer will turn some of your pointers into WILD pointers if you use them in unusual ways.

WILD pointers are bad. Every time you access them you have to also access the tags. And what makes them really annoying is that they spread very quickly. Even a few bad casts in your program can lead to a contamination of 30% of the pointers with WILDness. And that means that you'll have to write lots of wrappers, and hard ones. (In fact, the support that we provide for writing wrappers does not work in all cases in the presence of WILD pointers.)

So, we recommend that you take a look at the warnings and messages that CCured gives and try to address the bad casts. In this chapter, we describe a few tricks that you can use to change the code, and a few features that CCured has to help you do that.

First, a few notes: When it notices bad casts, CCured will print something like this:
** 1: Bad cast at cdb_make.c:36 (char  *510 ->struct cdb_hplist  *1376)
** 2: Bad cast at pathexec_env.c:42 (char  *510 ->char */* __NODE(2537)  */ *2538)
** 3: Bad cast at pathexec_env.c:67 (char */* __NODE(2537)  */ *2538 ->char  *2553)
** 4: Bad cast at sig.c:12 (void (int  ) *2695 ->void () *2694)
** 5: Bad cast at sig_catch.c:9 (void () *673 ->void (int  ) *2711)
ptrkinds: Graph contains 4383 nodes
ptrkinds:   SAFE - 3142 ( 72%)
ptrkinds:   SEQ - 15 (  0%)
ptrkinds:   FSEQ - 127 (  3%)
ptrkinds:   WILD - 1099 ( 25%)
535 pointers are void*
5 bad casts of which 0 involved void* and 2 involved function pointers
1 (20%) of the bad casts are downcasts
0 incompatible equivalence classes
This means that there are 5 bad casts (which contaminate 25% of your pointers). There are no incompatible equivalence classes in this case.

You can either go directly to the line numbers at which the bad casts are reported, or you can use the browser (Section 5.1).

Bad casts number 4 and 5 in the example above are clear indications that there are some incomplete function types in your program. Go and add the argument types.

The other bad casts are due to an undeclared memory allocator. After we fix those we rerun and we get:
ptrkinds: Graph contains 4575 nodes
ptrkinds:   SAFE - 3324 ( 73%)
ptrkinds:   SEQ - 41 (  1%)
ptrkinds:   FSEQ - 150 (  3%)
ptrkinds:   WILD - 1060 ( 23%)
579 pointers are void*
0 bad casts of which 0 involved void* and 0 involved function pointers
No bad casts, so no downcasts
2 incompatible types flow into node void  *518
  Type char */* __NODE(2549)  */ *2550 at pathexec_env.c:67
  Type char  *102 at dns_transmit.c:63
2 incompatible equivalence classes
Notice that we have more pointers in the program. This is due to the allocator, which is now polymorphic and is duplicated several times. But we also have incompatible equivalence classes. This is because there is a void * pointer that is used with several incompatible types (in this case char * and char **). See Section 7.1 for more details on this.

7.1  Polymorphism

Polymorphism is the ability of a program fragment to operate on data of different types. This is a useful thing to be able to do and since C does not have special support for it, each programmer implements polymorphism by extensive use of casting. But not all casts are equal. Consider for example a function that just returns its argument:

int identity_bad(int x) { return x; }
This function can be used with any type that fits in an integer, provided the appropriate casts from the type to int and back are inserted. But as we have already discussed in Section 9.4 this won't work in CCured because the pointers you get out are not usable.

A better way to do this is as follows:

void* identity(void* x) { return x; }
It is a common paradigm in C to use void* for a “pointer to I don't know what” type. CCured supports this view directly by considering each use of void * in the program as an occurrence of an unknown type. The CCured inferencer will try to find a replacement type that makes sense in that context. For example, in the following code fragment CCured will think of both occurrences of void * as actually being int * *.

void* identity(void* x) { return x; }

int main() {
    int * * p = 0;
    int * * res = identity(p);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

This model works for even very complicated code, such as the following fragment that defines a function apply which applies a function pointer to some arguments (see in the output that all pointers are inferred SAFE):

// Applies a function to an argument
void * apply(void* (*f)(void*), void *arg) {
   return f(arg);
}

// A simple dereference function
int * deref(int * * addr) {
    return *addr;
} 

int  main() {
     int * x = 0;
     int * res = apply(deref, & x);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In the above example there are four occurrences of void * in the definition of apply. Based on the actual usage of apply the first two are mapped to int * and the latter two are mapped to int * *.

This very flexible scheme breaks down when you have inconsistent usage of a void * type, such as in the following code:

void* identity(void* x) { return x; }

int main() {
    int * p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In the above code the identity function is used with both int * and int ** arguments. Since CCured cannot find any single non-WILD type that is compatible with all contexts in which the void * is used, it is going to infer that the type of the void * argument is WILD. And since the argument is assigned to the result (implicitly due to the return statement) the result type is also WILD. (You can use the browser to see all the different incompatible types that “flow” into a void *.) It seems that we need a way to tell CCured to treat the two invocations separately.

CCured has a crude but effective mechanism for doing just that. First, you have to tell CCured that a function is polymorphic:

#pragma ccuredpoly("identity")
(You can list multiple names in one ccuredpoly pragma. The pragma can appear anywhere in your program.)

If you tell CCured that a function is polymorphic it will take the following steps:
  1. For each call site of the function, CCured will create a copy of the function and it will assign it the name /*15*/identity, where the number 15 is a running counter to ensure that the names are different.
  2. Then it will perform the usual inference in which case each copy of the identity function is used only once.
  3. Finally, for each combination of pointer kinds in the various flavors of identity CCured will keep one copy and erase all the others.
Consider as an example the code from above, in which all pointers are now SAFE. The output contains calls to /*1*/identity and /*2*/identity but since they both have the same pointer kinds for the arguments and results, only the body of /*1*/identity is kept:

#pragma ccuredpoly("identity")
void* identity(void* x) { return x; }

int main() {
    int * p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
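
Conceptually, the copying described in the steps above behaves as if each call site had its own monomorphic copy of the function. The plain-C sketch below spells this out by hand. The names identity_1 and identity_2 are hypothetical stand-ins for the /*1*/identity and /*2*/identity copies, and the concrete parameter types are written out only to show what the inference decides for each copy.

```c
/* Copy used at the first call site: void* stands for int**. */
int **identity_1(int **x) { return x; }

/* Copy used at the second call site: void* stands for int*. */
int *identity_2(int *x) { return x; }

/* Each call uses its own copy, so no void* has to be compatible
   with both int* and int** at the same time. */
int check_copies(void) {
    int value = 7;
    int *p = &value;
    int **res_pp = identity_1(&p);
    int *res_p = identity_2(p);
    return res_pp == &p && res_p == &value;
}
```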

If the copies of the polymorphic function do not all have the same pointer kind then multiple definitions are kept, as in the code below where we have both a SAFE and a WILD copy of the identity function:

#pragma ccuredpoly("identity")
void* identity(void* x) { return x; }

int main() {
    int * __WILD p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Polymorphic types
A similar mechanism is also available for types. You can add in the arguments of the ccuredpoly pragma strings like "struct list" to say that a copy of struct list must be created for each occurrence in the program. The inference will then find out which of the copies have to be compatible and at the very end will keep only one copy for each kind. Note however that this form of polymorphism does not have any run-time cost because only types are duplicated. It will however slow down the CCured type inference.

Note: If the polymorphism directives do not seem to take any effect, pass the -verbose option to ccured to see how it parses them.

For example, here is how you would write polymorphic list length:

#pragma ccuredpoly("length", "struct list")
struct list {
   void *car;
   struct list *cdr;
};

int length(struct list *l) {
  int i; // declared outside the loop so that it can be returned
  for(i = 0; l; i ++, l=l->cdr) ;
  return i;
}

int main() {
    struct list list_of_int = { 5, 0 };
    struct list list_of_wild_ptr = { (int * __WILD)5, 0 };
    struct list wild_list = { 5 , (struct list * __WILD)0 };

    int l1 = length(& list_of_int);
    int l2 = length(& list_of_wild_ptr);
    int l3 = length(& wild_list);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

You can see in the browser information that the references to struct list have been replaced with separate names such as struct /*45*/list.

In the case of recursive structures (whose name is referred to directly or indirectly in the types of the fields), the fields use the same version of the structure as the structure itself.

CCured has polymorphism for types and for functions because those are the entities that can be copied legally in C. There is no similar polymorphism for data variables, nor should there be.

If you have a type name for a polymorphic structure, then CCured will replace all occurrences of the type name with a reference to the structure itself, meaning that each use of the type name gets its own independent copy.

7.2  User-defined memory allocators

If your program has a user-defined memory allocator that is used to allocate data of different types then its return type will be WILD and so will be all of the pointers you store with the allocated area. Declaring such a function to be polymorphic will likely not help because the function is probably using a global data structure (the allocation buffer) that is shared by all polymorphic copies of the function.

CCured allows you to declare a function to be a user-defined memory allocator using one of the following pragmas:

#pragma ccuredalloc("myfunc", <zerospec>, <sizespec>)
<zerospec> ::= zero | nozero
<sizespec> ::= sizein(k) | sizemul(k1, k2)
The zero argument means that the allocator zeroes the allocated area. Otherwise CCured will zero it itself, if it contains pointers. The sizein(k) argument means that the allocator is being passed the size (in bytes) of the area to be allocated in argument number k (counting starts at 1). The sizemul(k1, k2) argument means that the allocator allocates a number of bytes equal to the product of the arguments number k1 and k2.

For example the following are the pragmas for the standard library allocators malloc and calloc:

void* malloc(unsigned int size);
#pragma ccuredalloc("malloc", nozero, sizein(1))
void* calloc(unsigned int nr_elems, unsigned int size);
#pragma ccuredalloc("calloc", zero, sizemul(1, 2))
A memory allocator should have return type void *. In the pre-ANSI C days allocators were written with the type char *. Once you declare a function to be an allocator, its return type will be changed to unsigned long. At all call sites CCured will examine what kind of data is being allocated and will construct the metadata for it.

Note that declaring a function an allocator has the effect of also making it polymorphic. This means that CCured will create as many copies of your allocators as you have allocation sites. (After curing only copies with distinct calling convention will be kept, however.)

Note that when you declare a custom-memory allocator as such, CCured will trust that you are not going to re-use the memory area that you return. This means that you can use this feature to write unsafe programs in CCured. The following program will succeed in trying to dereference the address 5!

#pragma ccuredalloc("myalloc", zero, sizein(1))
int data[8];
void* myalloc(int sz) {
  return data;
}
int main() {
 int ** p = (int **)myalloc(8);
 data[1] = 5; 
 return *p[1]; // Will dereference 5 !!!
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Most often the custom-memory allocators are just wrappers around the system malloc. In that case there is no danger of unsoundness.
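
As a hedged sketch of that common case, the wrappers below simply check the result of the system allocator and are declared with the pragma forms described above. The names xmalloc and xcalloc are hypothetical. A compiler other than CCured ignores the pragmas (possibly with a warning), so the code also builds as ordinary C.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator wrappers around the system allocator. */
void *xmalloc(unsigned int size);
#pragma ccuredalloc("xmalloc", nozero, sizein(1))
void *xcalloc(unsigned int nr_elems, unsigned int size);
#pragma ccuredalloc("xcalloc", zero, sizemul(1, 2))

/* Allocate size bytes or exit; the memory is not zeroed here. */
void *xmalloc(unsigned int size) {
    void *p = malloc(size);
    if (p == NULL) { fprintf(stderr, "out of memory\n"); exit(1); }
    return p;
}

/* Allocate nr_elems * size zeroed bytes or exit. */
void *xcalloc(unsigned int nr_elems, unsigned int size) {
    void *p = calloc(nr_elems, size);
    if (p == NULL) { fprintf(stderr, "out of memory\n"); exit(1); }
    return p;
}
```

Because these wrappers never hand out the same memory twice while it is live, they do not run into the reuse problem shown by myalloc above.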

Note also that CCured relies on the fact that the result of the custom-memory allocators is assigned to a variable of the right type. It is from the type of the destination of the allocator, or from the type cast with which the allocator is used, that CCured knows what kind of metadata to create.

7.3  Pointers with Run-Time Type Information

There are many C programs in which void * pointers are used non-parametrically. An example is a global variable (of type void *) that is used to store values of different types at different times. Consider for example the following code, where CCured is forced to infer that the g pointer has kind WILD because the struct foo and struct bar are incompatible:

struct foo { 
  int f1;
} gfoo;

struct bar {
  int * f1;
  int f2;
} gbar;

void * g;

int main() {
  int acc = 0;
  g = (void *)&gfoo; 
  acc += ((struct foo *)g)->f1;
  g = (void *)&gbar; 
  acc += ((struct bar *)g)->f2;
  return acc;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In this example g is used polymorphically but not in a way that could be handled through our support of polymorphism. (This form of polymorphism is called non-parametric polymorphism.) CCured will consider the casts on g as bad and will mark those pointers WILD.

CCured contains special support for handling such cases, by tagging the polymorphic values with information about their actual type. To enable this behavior you must use the RTTI pointer kind qualifier on the polymorphic pointer. Consider again the example from before but with an RTTI annotation:

struct foo { 
  int f1;
} gfoo;

struct bar {
  int * f1;
  int f2;
} gbar;

void * __RTTI g;

int main() {
  int acc = 0;
  g = (void *)&gfoo; 
  acc += ((struct foo *)g)->f1;
  g = (void *)&gbar; 
  acc += ((struct bar *)g)->f2;
  return acc;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If you use the browser, you will see that there are no more bad casts and no WILD pointers in this example. If you also look at the CCured output for the above example you will see that instead the g variable is now represented using two words, one to store its value and another to store the actual type of the pointer it contains. This type is created when g is assigned to and is checked when g is used.
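
The two-word idea can be sketched by hand in plain C. This is only an illustration under assumptions: the struct rtti_ptr layout, the type_id values and the helper names are hypothetical and are not the representation CCured actually emits.

```c
#include <stddef.h>

struct foo { int f1; };
struct bar { int *f1; int f2; };

/* Hypothetical type identifiers; CCured derives its own encoding. */
enum type_id { TYPE_NONE, TYPE_FOO, TYPE_BAR };

/* Two words: the pointer value and the type it currently holds. */
struct rtti_ptr {
    void *value;
    enum type_id type;
};

/* The type is recorded when the pointer is assigned... */
struct rtti_ptr rtti_from_foo(struct foo *p) {
    struct rtti_ptr r = { p, TYPE_FOO };
    return r;
}
struct rtti_ptr rtti_from_bar(struct bar *p) {
    struct rtti_ptr r = { p, TYPE_BAR };
    return r;
}

/* ...and checked when the pointer is cast back. A mismatch yields
   NULL in this sketch. */
struct bar *rtti_to_bar(struct rtti_ptr r) {
    return r.type == TYPE_BAR ? (struct bar *)r.value : NULL;
}
```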

CCured can work with run-time type information only for certain pointer types. We call such types extensible, and for each such type we also construct a name. Specifically, the extensible types are:

RTTI pointers can be created only by casting from a scalar or a SAFE pointer to an extensible type, and can be cast only to scalars and to SAFE pointers to an extensible type. In the example above, struct foo and struct bar are extensible types and we can cast pointers to these structs to void * __RTTI and back.

CCured also supports the RTTI pointer kind on pointers whose base type is different from void. Consider the following example:

struct foo {
   int *f1;
   int  f2;    
} gfoo;

struct bar {
   int *f3;
   int  f4;
   int  f5;
} gbar;

#pragma ccured_extends("Sbar", "Sfoo")

struct foo * __RTTI g;

int main() {
  int acc = 0;
  g = (struct foo *)&gfoo; 
  acc += g->f2;
  g = (struct foo *)&gbar; 
  acc += g->f2;
  acc += ((struct bar *)g)->f5;
  gfoo.f1 ++; // To make foo.f1 and bar.f3 both FSEQ pointers
  return acc;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice that the RTTI pointer kind is used with the base type struct foo. An RTTI pointer is strictly more powerful than a SAFE pointer of the same base type. This means that g in the code above can be used to access the fields f1 and f2 without any overhead. This is because CCured enforces the requirement that an RTTI pointer of base type T contains only pointer values whose base type extends T. The extension relationship is a subset of the physical subtyping relationship: we say that type T extends type Q if:

The ccured_extends pragmas use extensible type names to declare an extension hierarchy (similar to a single-inheritance class hierarchy) in which void is the top. Note that only extensible types can appear in the hierarchy and an extensible type can appear at most once on the left side of a ccured_extends pragma. An RTTI pointer can contain values that are pointers to some extensible base type that extends that of the RTTI pointer itself.

The RTTI pointer kind can be applied only to base types that are either void or non-leaf in the extension hierarchy.

For example, in the following code
struct foo { int x; };
struct bar { int y; int z; };
typedef int MY_INT __NOUNROLL;
#pragma ccured_extends("Sbar", "Sfoo")
#pragma ccured_extends("Sfoo", "TMY_INT")
we can use the RTTI pointer kind for struct foo * and MY_INT * but not for struct bar *. Notice that in all declared extension relationships physical subtyping is respected.

The inferencer will spread the RTTI pointer kind backwards through assignments but only on pointers that can be RTTI. If you want to cut short the propagation of the RTTI pointer kind you can use the SAFE pointer kind.

To summarize, RTTI pointers can be used with the following constraints:

Interestingly enough, the RTTI pointer kind can be used to implement virtual method dispatch in a type-safe way, as shown in the example below:

typedef struct parent {
  void * __RTTI * vtbl; // virtual table, with various types of functions
  int  *f1;             // some field
} Parent;

#pragma ccured_extends("Schild", "Sparent")

typedef struct child {
  void * __RTTI * vtbl;
  int  *f2;
  int   f3;
} Child;

// virtual method foo for class P
// notice that the self parameter is an RTTI. It must 
// be of base type void to ensure that foo_P and foo_C have the 
// same type
int* foo_P(void * __RTTI self_rtti, Parent *x) {
  Parent * self = (Parent *)self_rtti; // downcast
  return self->f1;
}

// virtual method bar for class P
int * bar_P(void * __RTTI self_rtti) {
  Parent * self = (Parent *)self_rtti;
  return self->f1;
}

int* foo_C(void * __RTTI self_rtti, Parent *x) {
  Child * self = (Child *)self_rtti;
  return self->f2 + self->f3;
}

// Name the types of the virtual methods, to make them extensible
typedef int * FOO_METHOD(void *, Parent *) __NOUNROLL;
typedef int * BAR_METHOD(void *) __NOUNROLL;

// Now the virtual tables
void * vtbl_P[] = { (void*) (FOO_METHOD *)foo_P,
                    (void*) (BAR_METHOD *)bar_P };


// child inherits bar_P
void * vtbl_C[] = { (void*) (FOO_METHOD *)foo_C,
                    (void*) (BAR_METHOD *)bar_P };


int array[8];

// Now the constructors
void ctor_P(Parent * p) {  p->vtbl = vtbl_P; p->f1 = array; }

void ctor_C(Child * c) {  c->vtbl = vtbl_C;  c->f2 = array;  c->f3 = 5; }

int main() {
  Parent p;
  Child c;
  Parent * pp = &p, * pc = &c;
  Child  * pc1;
      
  // Construct
  ctor_P(&p); ctor_C(&c);

  // Now try a downcast
  pc1 = (Child * __RTTI)pc;
  // Now invoke some virtual methods
  {
    FOO_METHOD *pfoo = (FOO_METHOD *) pp->vtbl[0];
    pfoo((void *)pp, pc);
    pfoo = (FOO_METHOD *) pc->vtbl[0];
    pfoo((void *)pc, pp);  
   }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice the use of the __NOUNROLL typedefs for the function types.

7.3.1  Implementation Details

CCured collects all extensible types in your program (either those declared using the ccured_extends pragma or those that are used in casts to and from RTTI pointers) and constructs the extension hierarchy. An encoding of this hierarchy is dumped in the resulting code in the array RTTI_ARRAY. Each entry in the array corresponds to an extensible type and it contains the difference between the index of the entry corresponding to the parent of the extensible type and the index of the current entry. The root of the extension hierarchy is always at index 0 and that entry contains 0. The function CHECK_RTTICAST is used to walk this encoding to verify a cast from an RTTI pointer into a SAFE pointer or another RTTI pointer.
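
The walk over such a parent-offset encoding can be sketched as follows. This is an illustration under assumptions: the element type of the array, the example hierarchy and the function name extends_type are hypothetical, and the real signature of CHECK_RTTICAST is not shown in this manual.

```c
/* Hypothetical hierarchy: void (index 0) <- foo (1) <- bar (2),
   plus baz (3) extending void directly. Each entry holds
   parent_index - own_index; the root holds 0. */
const int rtti_array_sketch[] = { 0, -1, -1, -3 };

/* Returns 1 if the type at index `from` is, or extends, the type at
   index `to`, by following parent links until the root is reached. */
int extends_type(const int *table, int from, int to) {
    int cur = from;
    for (;;) {
        if (cur == to) return 1;
        if (table[cur] == 0) return 0;   /* reached the root */
        cur += table[cur];               /* step to the parent */
    }
}
```

Storing offsets rather than absolute indices keeps every entry small, and the zero at the root doubles as the loop terminator.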

7.4  Specifying Trusted Code

In this section we describe a few mechanisms that you can use to override CCured's reasoning. These are powerful mechanisms but you can use them to write unsafe code.

7.4.1  Trusted casts

Occasionally there are casts in your program that are judged as bad, yet you know that they are sound and it is too inconvenient to change the program to expose the soundness to CCured. In that case, you can use the __trusted_cast built-in function. In the following example we know that the boxedint type can encode an integer (if odd) or a pointer to a boxedint if even. We could use RTTI pointers to encode this safely in CCured. Or, we can use a trusted cast:

typedef int boxedint; // If even, then a pointer to a boxedint
int unroll(boxedint x) {
  if(x & 1) return x;
  return unroll(* (int*)__trusted_cast(x));
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
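
For comparison, here is the same encoding written in plain C with no CCured built-ins, which makes visible the unchecked integer-to-pointer cast that __trusted_cast asks CCured to accept. The names boxed and unroll_plain are hypothetical, and intptr_t is used instead of int (an assumption of this sketch) so that an address always fits in the boxed value.

```c
#include <stdint.h>

/* Odd values are integers; even values are addresses of another
   boxed value. */
typedef intptr_t boxed;

intptr_t unroll_plain(boxed x) {
    if (x & 1) return x;                 /* an integer: done */
    return unroll_plain(*(boxed *)x);    /* a pointer: follow it */
}
```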

CCured will not complain if the argument and result type of __trusted_cast are incompatible. However, it will ensure the following:

For example, in the code below, the variable q and the field f1 in struct bar are made FSEQ. The FSEQ constraint propagates back through __trusted_cast to p.

struct foo {
   int   * f1;
   int     f2;
};
struct bar {
   int   * f1; // This is FSEQ !
   int   * f2;
};
int main(struct bar * p) {
    struct foo * q = __trusted_cast(p);
    p->f1 ++;        // Make bar.f1 FSEQ
    return q[1].f2; // Make q FSEQ
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If you look carefully at the above examples you will see one of the potential dangers of using __trusted_cast: you are on your own to ensure that the argument type and the result type match. In the above example, this is not true because the field f1 in struct foo is SAFE while the field f1 in struct bar is FSEQ!

If you want to prevent a pointer arithmetic operation from generating sequence pointers, you can use the __trusted_add function:

int foo(int *p) {
    int * q = __trusted_add(p, 4);
    return *q;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

You can use a __trusted_cast to cast an integer into a pointer. This works as expected if the type of the resulting pointer is SAFE (as in the example with boxedint earlier in this section). But if it is FSEQ or SEQ then you will get exactly the same effect as if the __trusted_cast was not there: you will obtain a pointer with null metadata and thus unusable for memory dereference.

A better way to cast an integer (or a SAFE pointer) into a SEQ or FSEQ pointer is to use the __mkptr built-in function. This function takes as a second argument some other pointer whose metadata is used in constructing the result:

int g[8];
int main() {
  int * __SAFE pg = & g[2];
  int * __SEQ sg = __mkptr(pg, g); // We know that the home area of pg and g
                                   // are the same
  int pg1 = (int) & g[3];
  int * __SEQ sg1 = __mkptr(pg1, g);
  return sg[1] + sg1[1];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
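
The effect of borrowing metadata can be sketched in plain C with an explicit fat pointer. Everything here is an assumption made for illustration: the struct seq_int layout and the names mk_seq and seq_read are hypothetical, and the explicit home_len parameter is needed only because a plain C pointer carries no bounds, whereas __mkptr takes them from its second argument.

```c
#include <stddef.h>

/* Hypothetical fat pointer: the pointer plus the bounds of its home area. */
struct seq_int {
    int *p;
    int *base;   /* first element of the home area */
    int *end;    /* one past the last element */
};

/* Build a fat pointer for p, taking the bounds from an array known
   to contain it (the role played by the second argument of __mkptr). */
struct seq_int mk_seq(int *p, int *home, size_t home_len) {
    struct seq_int s = { p, home, home + home_len };
    return s;
}

/* Bounds-checked read of s.p[i]; returns 0 and leaves *out
   untouched when the access would fall outside the home area. */
int seq_read(struct seq_int s, ptrdiff_t i, int *out) {
    if (i < s.base - s.p || i >= s.end - s.p) return 0;
    *out = s.p[i];
    return 1;
}
```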

Another useful built-in function is __mkptr_size. It allows you to specify the size of the home area in which a pointer lives:

int g[8];
int main() {
  int * __SAFE pg = & g[2];
  // We know that there are at least 2 more integers after pg
  int * __SEQ sg = __mkptr_size(pg, 2 * sizeof(int)); 
  return sg[1];
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

There are other built-in functions that you can use to achieve various things behind CCured's back. Those are mostly intended for use in wrappers for the library functions (which you have to trust anyway). These are described in Chapter 8 and declared in ccured.h.

7.4.2  Turning off curing

You can turn the curing off for a fragment of a source file, for a function, or for a block statement.

You can use the ccured pragma to turn curing off for a fragment of a source file (in CCured, pragmas can only appear at global scope and therefore you cannot use this mechanism to turn curing off for part of the definition of a global function):

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ
#pragma ccured(off)
int foo() {
   return g[2]; // CCured won't see this and will leave g SAFE
                // But also CCured won't check this code
}
#pragma ccured(on)
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Alternatively, you can add the nocure attribute to a function to tell CCured to not cure this function:

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ

// We must put the attribute in a prototype
int foo(void) __NOCURE;
int foo(void) {
   return g[2]; // CCured won't see this and will leave g SAFE
                // But also CCured won't check this code
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

At a finer-grained level, you can use the __NOCUREBLOCK attribute with a block statement:

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ

int foo(void) { 
   int res;
   { __NOCUREBLOCK
     res = g[2]; // CCured won't see this and will leave g SAFE
   }
   return res; 
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In all of these cases, the CCured inferencer does not even look at the non-cured portions of the code. However, CCured will at least change the non-cured code to access the fat pointers properly. For example, in the following example the global g is a sequence pointer. While CCured will not complain about the unsafe cast to int **, it will make sure that at least the proper component of g is used:

int * g; // This will be FSEQ

int ** foo(void) { 
   int res = g[2]; // Make g FSEQ
   { __NOCUREBLOCK
     return (int **)g; // But not WILD
   }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Finally, to avoid curing a whole source file (say trusted_foo.c), you can use the --leavealone=trusted argument to CCured. All source files whose names start with the given “leave alone” prefix are not merged and are not scanned by CCured at all. Instead they are compiled with gcc and linked into the final executable.

Chapter 8  Writing Wrappers

C programs require linking with external libraries before they can be run. Most often this is just the standard C library but occasionally your program might want to use other libraries as well. Often you do not want to process the library code with CCured, either because you do not have the source code or because it would be inconvenient to do so. There are several problems that arise when you link code that was cured with regular code:

Both of the above problems can be solved by interposing small wrapper functions around calls to the external functions. Such a wrapper function can perform the necessary checks on the arguments (thus addressing the first problem above) and can also perform some conversions on the arguments and result values before and after invoking the external function (thus addressing the second problem above). However, to write a wrapper you must understand the input-output behavior of the function (i.e., you must read the man-page entry for that function). When writing wrappers you can use a number of built-in functions described in Section 8.5.

Before we look at how one writes wrapper functions, we'll explain how CCured detects the need for a wrapper and how it reports it to you. CCured changes the name of all functions that it processes by adding a suffix describing the kinds of the pointers involved in the prototype of the function. This process is called name mangling and is described in Section 8.1. For example, consider the strchr function from the standard library (it returns NULL or a pointer to the first occurrence of chr in the str argument):

char* strchr(char* str, int chr);
Most likely CCured will view code that uses the returned pointer as manipulating a pointer to a sequence, and will require that the result of strchr be a FSEQ pointer. In the cured code you will see the following prototype for strchr in that case:

fseqp_char strchr_fs(char* __SAFE str, int chr);
Note that CCured has added a suffix _fs to the name of the function to say that its return value is a FSEQ pointer. When the code is passed to gcc to link you will likely see the following error message:

main.c: undefined reference to `strchr_fs`
This is how you learn that the cured code requires a strchr with a different calling convention than the one provided by the library. (If CCured had left alone both the type and the name of strchr the linker would not have complained but the program would have behaved incorrectly at run-time.)
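
To make the different calling convention concrete, the sketch below writes such a function by hand in plain C: it calls the library strchr and attaches bounds to the result. The struct fseq_char_sketch and the function name strchr_fs_sketch are hypothetical. The real fseqp_char type and the way CCured expects wrappers to be written are not shown here, so treat this purely as an illustration of why the library function cannot be called directly.

```c
#include <string.h>

/* Hypothetical stand-in for a forward-sequence pointer to char:
   the pointer plus the end of the area it may move forward in. */
struct fseq_char_sketch {
    char *p;
    char *end;
};

/* Calls the library strchr and attaches bounds to the result, so
   callers can check accesses made through the returned pointer. */
struct fseq_char_sketch strchr_fs_sketch(char *str, int chr) {
    struct fseq_char_sketch r;
    r.p = strchr(str, chr);
    /* The area ends just past the terminating NUL of str. */
    r.end = (r.p != NULL) ? str + strlen(str) + 1 : NULL;
    return r;
}
```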

In the next section we describe in detail the name mangling algorithm, even though it is probably better to use the browser to inspect the type of the undefined mangled functions.

8.1  Name Mangling

There are two forms of mangling: shallow and deep. Shallow mangling differentiates types without descending into structures. It is less verbose and is used on all functions and variables defined in the code being cured. Deep mangling does consider the types of the structure fields and is used for imported (used but not defined) symbols. (The --shallowMangling command line option causes all symbols to be mangled using shallow mangling.)

The mangling algorithm scans the type of a global and produces a suffix that is appended to the name of the global. The mangling computed as above is appended to the global name, following a _ character. As an exception, if the mangling is empty or contains only s characters, then no mangling occurs (and no _ is inserted).

Here are a few examples of mangling:

struct list {
 void * __WILD data;
 struct list * __SAFE next;
};
struct hash {
 int count;
 struct list * __FSEQ buckets;
};
struct list * __FSEQ getBuckets(struct hash * __SAFE);
// has deep mangling      getBuckets_fcws_scf_
// has shallow mangling   getBuckets_fs

struct twolists {
  struct list * __SEQ  one;
  struct list * __SAFE two;
};
struct twolists * __SAFE getTwoLists(void);
// has deep mangling     getTwoLists_scqcws_s_
// has shallow mangling  getTwoLists

int main() {
  printf("Mangling of struct list = %s\n", CCURED_MANGLING_OF(struct list));
  printf("Mangling of getBuckets = %s\n", CCURED_MANGLING_OF(getBuckets));
}
Note that types are mangled recursively (you can have nested constructions c..._) but a structure is mangled only once in a global's type.

The general rule is that the shallow mangling can be obtained from the deep mangling by removing all c..._ sequences (including nested ones). If what remains is a sequence of s's then the shallow mangling is empty.
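The rule above can be sketched in plain C. This is only an illustration, not CCured's implementation: the helper names shallow_from_deep and mangled_name are hypothetical, and the sketch assumes that the letter c appears in a deep mangling only as the opener of a c..._ sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of CCured): derive the shallow suffix from
   a deep suffix by dropping every c..._ sequence, including nested ones.
   Assumes 'c' occurs only as the opener of such a sequence. */
void shallow_from_deep(const char *deep, char *out, size_t outsz) {
    size_t n = 0;
    int depth = 0;
    assert(deep != NULL && out != NULL && outsz > 0);
    for (; *deep != '\0'; deep++) {
        if (*deep == 'c') {
            depth++;                    /* entering a structure */
        } else if (*deep == '_' && depth > 0) {
            depth--;                    /* leaving a structure */
        } else if (depth == 0 && n + 1 < outsz) {
            out[n++] = *deep;           /* top-level pointer kind */
        }
    }
    out[n] = '\0';
    if (strspn(out, "s") == n) {
        out[0] = '\0';                  /* only s's remain: empty mangling */
    }
}

/* Hypothetical helper: append the suffix after a '_' unless it is empty. */
void mangled_name(const char *name, const char *suffix,
                  char *out, size_t outsz) {
    if (suffix[0] == '\0') {
        snprintf(out, outsz, "%s", name);
    } else {
        snprintf(out, outsz, "%s_%s", name, suffix);
    }
}
```

Applied to the two deep suffixes from the examples above, fcws_scf_ reduces to fs and scqcws_s_ reduces to the empty suffix, which agrees with the shallow manglings listed there.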

CCured has a facility for inquiring about the mangling of a type or variable at run-time, as shown at the end of the above code. You can use the macro CCURED_MANGLING_OF applied to a type or variable (i.e. anything that can be the argument of sizeof) to obtain a string literal that encodes the mangling. Perhaps the most important use of this is to detect at run-time whether a global is compatible with the library or not (its mangling should be the empty string). We shall see how this is done later in this chapter. CCured also defines the macro CCURED_HAS_EMPTY_MANGLING that yields 1 if its parameter has empty mangling. This is equivalent to 0 == * CCURED_MANGLING_OF but is a compile-time constant (after curing).

8.2  Writing Simple Wrappers

Returning to our strchr example, what do we do if the linker complains that it needs the strchr_fs version of the function? The basic idea is that we write the function strchr_fs that in turn calls strchr from the library. The problem with this approach is that over time you might have to write many versions of strchr (such as strchr_qs, strchr_qq and many more). CCured provides a relatively simple mechanism by which you write a wrapper specification and it generates all the required versions automatically from your specification. Furthermore, the wrapper specification looks just like a C function (along with some indication to CCured that this is a wrapper):

extern char* strchr(char*, int);  // Make sure you have a prototype for the 
                                  // function you are wrapping
extern void exit(int);

                                  // Now tell CCured that strchr_wrapper is 
                                  // a wrapper for strchr
#pragma ccuredwrapper("strchr_wrapper", of("strchr"))
char* strchr_wrapper(char* str, int chr) // must match strchr's signature
{
  char* result;
  result = strchr(str, chr);       // Call the underlying function.
  return __mkptr_string(result);   // result should be 0 or a string
}

void foo(char* s){
  char* res = strchr(s, 'q');
  // Taking the address of a wrapped function is handled, too
  char * (*p_strchr)(char *, int) = & strchr;

  if(res != (*p_strchr)(s, 'q')) exit(1);

  res ++; // Make sure res is __FSEQ
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice that we have used the __mkptr_string helper function that takes in a standard null-terminated C string and produces any required fat pointer to it. This is done using a simple trick: we let CCured mangle the helper function name as needed and we provide in ccuredlib.c all the required versions. Since the argument should always be a SAFE pointer the versions that we provide are: __mkptr_string_fs, __mkptr_string_qs and, of course, __mkptr_string, which is the identity function. These functions produce the necessary metadata by looking for the null-termination. Call the __mkptr_string function only on the result of trusted library functions, or else bad things might happen (e.g., because the string argument may not actually be null-terminated).
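To make the idea concrete, here is a minimal plain-C sketch of what an FSEQ-producing variant such as __mkptr_string_fs might do. The two-field layout (a pointer _p plus an end pointer _e), the struct name and the function name are assumptions made for illustration; the real definitions live in ccuredlib.c and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout of a fat FSEQ pointer to char: pointer plus end of area. */
struct sk_fseq_char {
    char *_p;   /* the ordinary C pointer */
    char *_e;   /* one past the last byte that may be accessed */
};

/* Sketch only: build the metadata by scanning for the null terminator.
   This is safe only if s really is null-terminated, which is why the
   argument must come from a trusted library function. */
struct sk_fseq_char sk_mkptr_string_fs(char *s) {
    struct sk_fseq_char fat = { s, NULL };
    if (s != NULL) {
        fat._e = s + strlen(s) + 1;       /* include the terminator */
        assert(fat._e[-1] == '\0');       /* the area ends with the terminator */
    }
    return fat;
}
```

For the string "abc" the sketch yields an area of four bytes; a null input stays null with null metadata.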

Now, if you click the “Browse” link for the above code fragment you will see several things:

If you look at the CCured output for the above example you will see that the code has two instances of the wrapper, one for each calling convention. (If you had more call sites for strchr, CCured would coalesce all wrappers that end up having the same mangling.)





It is very important to write the wrappers such that the actual call to the underlying function has arguments and result values that are SAFE (i.e., values that use the standard C representation)! This means two things:

So far we have addressed the issue of compatibility with the library, but it is still possible for the client program to call strchr with the wrong arguments, thus provoking a memory safety violation in the library itself. To prevent this we add code to the wrapper that verifies that the input argument is a valid string (all the characters up to and including the null-termination are in the home area):

extern char* strchr(char*, int); 
#pragma ccuredwrapper("strchr_wrapper5", of("strchr"))
char* strchr_wrapper5(char* str, int chr)
{
  char* result;
  result = strchr(
                  __stringof(str), // Check that str is a string
                  chr              
                  );
  return result;                   //Passing the return value directly
}

void foo(char* s){
  char* res = strchr(s, 'q');
  res ++; // Force res to be __FSEQ
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The __stringof function first checks that str is a valid string and then returns the pointer value (leaving the metadata behind). Alternatively, the __ptrof function performs bounds checks on FSEQ and SEQ pointers and null checks as well.

If you browse the above example you will observe that the “s” argument of the foo function is no longer SAFE but FSEQ. If you investigate you will find that this is required by the __stringof function itself (in order to be able to perform safely the check that there is a null character before the end of the memory area).
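The check that motivates the FSEQ requirement can be sketched in plain C as follows. The fat-pointer layout and the sk_ names are assumptions for illustration, and this sketch treats a null pointer as invalid; the real __stringof may handle that case differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout of a fat FSEQ pointer to char. */
struct sk_fseq_char {
    char *_p;   /* the ordinary C pointer */
    char *_e;   /* one past the last accessible byte */
};

/* Returns 1 if a null terminator exists within [_p, _e), 0 otherwise.
   Without the _e field there would be no safe bound for this search. */
int sk_is_string(struct sk_fseq_char fat) {
    if (fat._p == NULL || fat._e == NULL || fat._p >= fat._e) {
        return 0;
    }
    return memchr(fat._p, '\0', (size_t)(fat._e - fat._p)) != NULL;
}

/* Sketch of the check-then-strip step: stop the program if the check
   fails, otherwise hand back the bare pointer without its metadata. */
char *sk_stringof(struct sk_fseq_char fat) {
    if (!sk_is_string(fat)) {
        abort();
    }
    assert(fat._p != NULL);   /* guaranteed by sk_is_string */
    return fat._p;
}
```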

There are quite a few helper functions that you can use in the wrapper. They are all documented in ccured.h.





We have not yet discussed the issue of where to put the wrappers that you write. The rule is that the wrapper, along with the #pragma that declares it, must appear somewhere in the project (CCured merges all files together so it will eventually see it). You should put the pragma next to the wrapper function because otherwise the CIL front-end might think that the wrapper is not used and will remove it from the code!

To ensure that you have the wrapper available whenever you want to use the library it is a good idea to put the wrapper in the header file that declares the corresponding function (in that case we suggest you use “__inline static” as declaration specifiers). We also suggest that you put the wrappers at the end of such header files to ensure that you have the prototypes for the wrapped functions already. We have done just that with the wrappers that we wrote for the standard C library. You can find them in the files in the cil/include directory. For example, the stdio_wrappers.h file contains the wrappers for the stdio.h standard header. (We use a patched version of stdio.h that includes stdio_wrappers.h at the end.) We suggest that you look through those files to find more examples of wrappers.

8.3  Writing Complex Wrappers

For most wrappers the description from the previous section should be enough. Some wrappers, however, are more complicated. We discuss in this section some of these complications, along with the CCured mechanisms that address them, for a function similar to the sendmsg function (declared in <sys/socket.h>). Here are the required declarations, along with a code fragment that uses sendmsg in such a way that its msg_iov field becomes FSEQ:

int sendmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return sendmsg(fd, &msg, 0);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Since our code does arithmetic on the msg_iov field, it becomes FSEQ and the required mangling for sendmsg is sendmsg_scsfs_ (all the other pointers in the prototype remain SAFE). Let us try the wrapper-writing method from above (isolate the arguments of sendmsg):

int sendmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return sendmsg(fd, &msg, 0);
}

#pragma ccuredwrapper("sendmsg_wrapper1", of("sendmsg"))
__inline static
int sendmsg_wrapper1(int fd, struct msghdr * msg, int flags) {
  return sendmsg(fd, __ptrof(msg), flags);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

We still get the error:
ccuredcode.tmp/ex44.c:1: Error: sendmsg appears to be external
  (it has a wrapper), yet it has a mangled name: sendmsg_scsfs_.
  Did you forget to use __ptrof and a version of __mkptr?
 For more information, consult the online documentation on
  "Writing Wrappers".
This happens because the wrapper and the true sendmsg still share all of the pointer nodes in the struct msghdr and struct iovec. We must have two versions of at least struct msghdr: one that the client code uses and one that the library is using. If you browse the above code and look at the prototype for the sendmsg function (not the wrapper) you will see that CCured has tried to address this issue by creating the structures msghdr_COMPAT and iovec_COMPAT (called _COMPAT because they are reachable from the prototype of the wrapped function). But that was not sufficient in this case: if you browse the code you will see that even the msg_iov field of the new msghdr_COMPAT is FSEQ. And you get the following error message:
Error: The suffix for the compatible version of msghdr_COMPAT is sfs. 
This means that you have misused this compatible version. Please check your code.
The problem in this case is subtle: because sendmsg is declared to take a struct msghdr * argument and the result type of __ptrof is void *, the CIL front-end has inserted a cast (struct msghdr *) in the call to sendmsg in the wrapper. When CCured later decides to change the prototype of sendmsg it does not change the cast as well, which makes it appear that an argument of type struct msghdr * is assigned to a formal of type struct msghdr_COMPAT *. To prevent this from happening, you must store the result of the __ptrof function in a local variable, whose type you write as struct msghdr __COMPAT *. Note that __COMPAT is here an attribute of the structure name, not part of the name; CCured will do the right thing when it creates the compatible version of msghdr. So, we try the following:

int sendmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return sendmsg(fd, &msg, 0);
}

#pragma ccuredwrapper("sendmsg_wrapper2", of("sendmsg"))
__inline static
int sendmsg_wrapper2(int fd, struct msghdr * msg, int flags) {
  struct msghdr __COMPAT *msg1 = __ptrof(msg);
  return sendmsg(fd, msg1, flags);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Using __ptrof is not enough, however, since the object still has a FSEQ field. CCured detects this and changes the compat version to have the same FSEQ field, so we get the same error as before. What we should do instead is to make a copy of the structure pointed to by msg and copy the fields one by one. There are several ways in which CCured can help with that. First, there are some macros that you can use.

You can replace the declaration of msg1 above with:

  __DECL_COMPAT_STACK(msg1, msghdr, msg);
This macro (defined and explained in ccured.h) first reserves some storage for a struct msghdr_COMPAT on the stack (hence the STACK in the name). Then it declares the msg1 variable and it copies the contents of the struct msghdr pointed to by msg into the local copy. This copying is done only if necessary (the msg pointer is not null and the structure it points to is not already compatible). Here is the expansion of the above macro invocation:

  /* Declare the __deepcopy function that we need */                        
  void __deepcopy_msghdr_to_compat(struct msghdr __COMPAT * compat, struct msghdr * fat);
  /* Declare the place where we'll make the copy */                         
  struct msghdr __COMPAT msg1__area;
  struct msghdr * msg1__ptrof = __ptrof_nocheck(msg);
  struct msghdr __COMPAT * msg1 =
    /* We are done if we have NULL or an already compat struct */
    (msg1__ptrof && (! CCURED_HAS_EMPTY_MANGLING(struct msghdr))) ?
      /* Now do the copying as specified in the argument */
      (__deepcopy_msghdr_to_compat(& msg1__area, msg1__ptrof), & msg1__area)
    : /* No copying is needed. Use a trusted_cast to prevent CCured from 
         connecting the two versions of the structure */
      (struct msghdr __COMPAT *)__trusted_cast(msg1__ptrof);
Notice that care is taken to avoid copying if not necessary and also to prevent CCured from connecting directly the two versions of the struct.

There are more macros like __DECL_COMPAT_STACK, that allow you to allocate the space on the heap instead of the stack, or even to avoid copying. We'll discuss these later. Now we look again at the sendmsg wrapper in which we use the new macro:

int sendmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return sendmsg(fd, &msg, 0);
}

#pragma ccuredwrapper("sendmsg_wrapper3", of("sendmsg"))
__inline static
int sendmsg_wrapper3(int fd, struct msghdr * msg, int flags) {
  __DECL_COMPAT_STACK(msg1, msghdr, msg);
  return sendmsg(fd, msg1, flags);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

If you browse this code you will see that in addition to the macro expansion there is now a definition for the function __deepcopy_msghdr_to_compat. CCured recognizes this name in a prototype (which is added by the __DECL_COMPAT macros) and fills in the body with code that tries to copy one field at a time from a struct msghdr to a struct msghdr_COMPAT.
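The field-by-field copy can be pictured with the following plain-C sketch. It is not the code that CCured generates: the sk_ names are hypothetical, and the fat layout of msg_iov (a pointer _p plus an end pointer _e) is an assumption made for illustration. As in the example above, only msg_iov is fat, and struct iovec itself is unchanged.

```c
#include <assert.h>
#include <stddef.h>

struct sk_iovec { char *iov_base; int iov_len; };

/* Assumed fat (FSEQ) representation of the msg_iov field. */
struct sk_fseq_iovec { struct sk_iovec *_p; struct sk_iovec *_e; };

/* Client-side layout: msg_iov carries metadata. */
struct sk_msghdr_fat {
    void *msg_name;
    int msg_namelen;
    struct sk_fseq_iovec msg_iov;
    int msg_iovlen;
    void *msg_control;
    int msg_controllen;
    int msg_flags;
};

/* Library-side layout: every pointer is an ordinary C pointer. */
struct sk_msghdr_compat {
    void *msg_name;
    int msg_namelen;
    struct sk_iovec *msg_iov;
    int msg_iovlen;
    void *msg_control;
    int msg_controllen;
    int msg_flags;
};

/* Copy one field at a time, dropping the metadata of the fat field. */
void sk_deepcopy_msghdr_to_compat(struct sk_msghdr_compat *compat,
                                  const struct sk_msghdr_fat *fat) {
    assert(compat != NULL && fat != NULL);
    compat->msg_name       = fat->msg_name;
    compat->msg_namelen    = fat->msg_namelen;
    compat->msg_iov        = fat->msg_iov._p;   /* keep _p, drop _e */
    compat->msg_iovlen     = fat->msg_iovlen;
    compat->msg_control    = fat->msg_control;
    compat->msg_controllen = fat->msg_controllen;
    compat->msg_flags      = fat->msg_flags;
}
```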

Here is how CCured fills in the code of deepcopy functions:

For our example, the deepcopy code that CCured generates would do. But such code might not be what you need. If the iov_base pointer in the iovec struct becomes non-SAFE then the default deepcopy function will abort. In that case, you can write the parts of the deepcopy function that you care about and let CCured fill in the rest. In the next example we show the full wrapper for sendmsg, which works even when the embedded struct iovec is mangled:

int sendmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return sendmsg(fd, &msg, 0);
}

extern void* malloc(int);
extern void  free(void*);
#pragma ccuredwrapper("sendmsg_wrapper4", of("sendmsg"))
__inline static
int sendmsg_wrapper4(int fd, struct msghdr * msg, int flags) {
    __DECL_COMPAT_STACK(msg1, msghdr, msg);
    int result = sendmsg(fd, msg1, flags);
    // We can now free the msg_iov (if it was allocated)
    if(msg1->msg_iov != msg->msg_iov) {
      free(msg1->msg_iov);
    }
    return result;
}


__inline static
__DEEPCOPY_TO_COMPAT_PROTO(iovec) {
  compat->iov_base = __ptrof_nocheck(fat->iov_base);
}

__inline static
__DEEPCOPY_TO_COMPAT_PROTO(msghdr) {
  // We leave the msg_name and msg_control to CCured
  
  if(CCURED_HAS_EMPTY_MANGLING(* fat->msg_iov)) {
    // We do not need to copy the msg_iov array
    compat->msg_iov = __ptrof(fat->msg_iov);
  } else {
    int len = fat->msg_iovlen;
    int v;
    compat->msg_iov = malloc(len * sizeof(compat->msg_iov[0])); 
    for (v=0; v<len; v++) {
      struct iovec __COMPAT *iptr = __trusted_add(compat->msg_iov, v);
      __deepcopy_iovec_to_compat(iptr, & fat->msg_iov[v]);
    } 
  }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Notice several important things in the bodies for the deepcopy functions:

Now we look briefly at the companion function recvmsg. Instead of sending a message, it receives one. This means that the msg_iov data is an output value, not an input. We use the following wrapper in this case:

int recvmsg(int fd, struct msghdr * msg, int flags);
struct msghdr {
     void         * msg_name;       /* optional address */
     int            msg_namelen;    /* size of address */
     struct iovec * msg_iov;        /* scatter/gather array */
     int            msg_iovlen;     /* # elements in msg_iov */
     void         * msg_control;    /* ancillary data, see below */
     int            msg_controllen; /* ancillary data buffer len */
     int            msg_flags;      /* flags on received message */
};
struct iovec {
     char*   iov_base;  /* The base address of the fragment */
     int     iov_len;   /* Length in bytes of the fragment */
};

int foo(int fd, struct iovec *array, int array_len) {
    struct msghdr msg = { 0, 0, array, array_len, 0, 0, 0};
    // Make the msg_iov be FSEQ
    struct iovec * foo = msg.msg_iov + 1;
    return recvmsg(fd, &msg, 0);
}

extern void* malloc(int);
extern void  free(void*);

#pragma ccuredwrapper("recvmsg_wrapper", of("recvmsg"))
__inline static
int recvmsg_wrapper(int s, struct msghdr *fat_msg, int flags) {
  __DECL_COMPAT_STACK(lean_msg, msghdr, fat_msg);
  int result = recvmsg(s, lean_msg, flags);
  // Copy the contents of lean_msg into fat_msg
  __COPYOUT_FROM_COMPAT(lean_msg, msghdr, fat_msg);
  if(lean_msg->msg_iov != fat_msg->msg_iov) {
    // If we allocated new space for msg_iov, free it now
    free(lean_msg->msg_iov);
  }
  return result;
} 


__inline static
__DEEPCOPY_FROM_COMPAT_PROTO(iovec) {
  // Notice how we use iov_len to construct the metadata for the iov field
  fat->iov_base = __mkptr_size(compat->iov_base, compat->iov_len);
}

__inline static
__DEEPCOPY_FROM_COMPAT_PROTO(msghdr) {
  fat->msg_name    = __mkptr_size(compat->msg_name, compat->msg_namelen);
  fat->msg_control = __mkptr_size(compat->msg_control, compat->msg_controllen);

  if(CCURED_HAS_EMPTY_MANGLING(* fat->msg_iov)) {
    // We do not need to copy the msg_iov array
    fat->msg_iov = __mkptr_size(compat->msg_iov,
                                compat->msg_iovlen * sizeof(fat->msg_iov[0]));
  } else {
    int len = compat->msg_iovlen;
    int v;
    fat->msg_iov = malloc(len * sizeof(fat->msg_iov[0])); 
    for (v=0; v<len; v++) {
      struct iovec __COMPAT *iptr = __trusted_add(compat->msg_iov, v);
      __deepcopy_iovec_from_compat(& fat->msg_iov[v], iptr);
    } 
  }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

For recvmsg we use the companion functions that copy _FROM_COMPAT. We declare the deepcopy functions as before, except that now we must assign to the fat fields, and we must use the __mkptr family of functions to construct the metadata. Notice that for the output argument of recvmsg we copy the data from the stack-allocated lean_msg into fat_msg.
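The way a length field supplies the metadata for a pointer coming back from the library can be sketched in plain C. This is only an illustration of the idea behind __mkptr_size: the fat layout and the sk_ names are assumptions, and the precondition checked by the assert belongs to this sketch, not to CCured.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of a fat FSEQ pointer to char. */
struct sk_fseq_char {
    char *_p;   /* the ordinary C pointer */
    char *_e;   /* one past the last accessible byte */
};

/* Sketch: build the fat pointer from a base pointer and a byte count. */
struct sk_fseq_char sk_mkptr_size(char *p, size_t len) {
    struct sk_fseq_char fat = { p, NULL };
    assert(p != NULL || len == 0);   /* sketch-only precondition */
    if (p != NULL) {
        fat._e = p + len;
    }
    return fat;
}

/* Returns 1 if n bytes starting at _p lie inside the area, else 0. */
int sk_can_access(struct sk_fseq_char fat, size_t n) {
    if (fat._p == NULL || fat._e == NULL || fat._p > fat._e) {
        return 0;
    }
    return n <= (size_t)(fat._e - fat._p);
}
```

No checking is done when the fat pointer is built, so the length passed in must be one that the library is trusted to report correctly (iov_len for iov_base in the wrapper above).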

The wrappers above are written without close attention to memory leaks; they are just examples intended to show you how to write wrappers.

The CCured library defines the functions __deepcopy_stringarray_from_compat and __deepcopy_stringarray_to_compat that operate on nul-terminated sequences of pointers to nul-terminated strings (such as argv in the prototype for main). To use these functions you must add:
#include "functions/deepcopy_stringarray.h"

8.4  Wrappers for data

So far we have looked at wrappers for functions. Sometimes the global that needs to be wrapped is not a function but a name for a data area. In the standard library that is very rare. One example is the environ global variable of type char ** that stores a pointer to a null-terminated sequence of environment strings (like the sometimes-used third argument of main). If you access this variable directly you might get an error like this:
tcpserver_comb.o: In function `env_get_qf':
tcpserver_comb.o(.text+0x15d10): undefined reference to `environ_qq'
You cannot write wrappers for data! You MUST replace the access to the data variable with a function. There are many ways in which you can write the function. If the wrapped global had type char * you could just write a function:

// This is for the case when environ is a pointer to a string
char* get_environ(void) {
  return __mkptr_string(environ);
}
But this would not be enough for our case. In general, both of the pointer types in environ could be fat, so you must do a deepcopy:

#include "functions/deepcopy_stringarray.h"
// This is OK, but allocates too often
char** get_environ(void) {
  return __deepcopy_stringarray_from_compat(environ);
}
If your program accesses the environ variable a lot, you do not want to make a deepcopy every time. So, you can make one and save it in a variable, which you can then access at will.
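The "copy once and save it" idea can be sketched in plain C as follows. The function sk_copy_string_array is a hypothetical stand-in for __deepcopy_stringarray_from_compat (it only duplicates the strings and builds no CCured metadata), and get_environ_cached is likewise a name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;   /* POSIX */

/* Stand-in for the deepcopy helper: duplicate a null-terminated array of
   null-terminated strings. Returns NULL on allocation failure. */
static char **sk_copy_string_array(char **src) {
    size_t n = 0, i;
    char **dst;
    if (src == NULL) {
        return NULL;
    }
    while (src[n] != NULL) {
        n++;
    }
    dst = malloc((n + 1) * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        dst[i] = malloc(len);
        if (dst[i] == NULL) {          /* undo the partial copy */
            while (i > 0) {
                free(dst[--i]);
            }
            free(dst);
            return NULL;
        }
        memcpy(dst[i], src[i], len);
    }
    dst[n] = NULL;
    return dst;
}

/* Make the copy on first use and return the saved copy afterwards. */
char **get_environ_cached(void) {
    static char **cached = NULL;
    if (cached == NULL) {
        cached = sk_copy_string_array(environ);
    }
    return cached;
}
```

One consequence of this design is that the saved copy does not reflect later changes to the environment, so it suits programs that only read environ.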

Or, maybe your program always accesses environ with an index operation environ[i]. In that case you can avoid deepcopying altogether by writing a function environ_idx that takes an integer and returns the ith element of environ. We show below the case where it is appropriate to trust that the index is within bounds:

extern char ** environ;
char* environ_idx(int i) {
  char * __SAFE * __SAFE p_environ = __trusted_add(environ, i);
  // We are going to believe that i is within bounds
  return __mkptr_string(* p_environ);
  
}

8.5  Wrapper Helper Functions

8.5.1  Helpers for incoming wrapper arguments

These functions have two purposes. First, they convert the incoming argument (which may be a fat pointer) to a regular C pointer or data to be used in run-time checks and for passing to C library functions. Second, these functions direct CCured to request some minimal metadata to accompany the pointer. For example, on entry to a string library function CCured will want to request that the pointer be accompanied by extent, and the target of the pointer be a null-terminated string buffer.

8.5.2  Helpers for outgoing wrapper return values

These functions allow you to construct fat pointers from regular C pointers. These functions do not do any checking.

8.6  Final Notes on Wrappers

Chapter 9  Advanced CCured Issues

In this chapter we discuss a collection of issues having to do with using CCured on real programs. Some of the issues are related to sound handling of dark corners of the C programming language (e.g. function pointers, initialization of globals, variable argument functions). Other issues are related to mechanisms for giving hints to the CCured inferencer, with the ultimate goal of reducing the number of cases in which the inferencer gives up and decides to use the conservative but expensive WILD pointers (e.g. polymorphism, custom memory allocators).

9.1  Function Pointers

One of the signs that a C program is a “serious” one is the use of function pointers. There would be nothing wrong or unsafe about that if it wasn't also the case that most programmers do not feel it necessary to use accurate types for function pointers, or even to use function prototypes. This is probably due to the fact that the syntax for function types in C is terrible. How often have you declared your function pointers to have type void (*)() when you actually wanted to say int * (* (* x3)(int x))(float) (a pointer to a function that takes an int and returns a pointer to a function that takes a float and returns a pointer to an int)?

Of course, misusing function pointers can lead to the worst kind of errors. Fortunately such errors rarely go unnoticed in code that is executed.

CCured supports two kinds of function pointers. The SAFE function pointers can only be invoked with the same number of arguments. If the types of the arguments are not right it is the argument that becomes WILD not the function pointer. A SAFE function pointer can only be cast to an integer or to the same function pointer type. We also have WILD function pointers which you can (try to) use as you please. In fact a WILD function pointer can be cast to any other WILD pointer type and can be stored in any tagged area. For this reason its representation must match that of any WILD pointer. However the capabilities of a WILD function pointer are typically quite different from those of a regular function pointer. For example, you should not be able to read or write from a function pointer.

The next picture shows the meaning of the _b field for a WILD function pointer.



Any function whose address is taken and becomes WILD, or that is used without a prototype (see the discussion at the end of this section), is a tagged function and has an associated descriptor that encodes the pointer to the function's code and the number of arguments. Here is an example:

int taggedfun(int anint, int * aptr) {
    return anint + * aptr;
}

int main() {
  int * i = taggedfun; // Bad cast. taggedfun becomes tagged
  // Now we invoke it
  ((void (*)(int,int*))i)(5, i);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The structure of a function descriptor is as shown below and a pointer to the _pfun field is used as the _b field whenever the address of the function is taken.

struct __functionDescriptor {
  unsigned int _len ; // Always 0
  void (* _pfun)() ; // Pointer to a function
  unsigned int _nrargs ; // The number of arguments
};
Since the _len field is always initialized to zero, whenever this WILD pointer is used for a read or a write it would appear that it points into a zero-length tagged memory area, so the bounds check will fail. We then have to protect against the pointer being subject to arithmetic prior to invocation. We do this by storing in the function descriptor the actual pointer to the function and checking at the time of a call through a WILD function pointer that the _p field of the pointer is equal to the _pfun field in the descriptor.

Finally we have to ensure that the function is called with the right number and kinds of arguments. There is no hope of ensuring this statically because a WILD function pointer can be used as liberally as any other WILD pointer. So, CCured conservatively forces all arguments and the return type to be WILD pointers. This includes arguments and return types that are actually scalars (see the example above for how integers are wrapped into WILD pointers). This ensures that the types are the same (or compatible), so all we have to check is that the right number of arguments is passed to the function. To perform these checks we use the following run-time support function:

/* Check that a function pointer points to a valid tagged function and check
   that we are passing enough arguments. We allow the passing of more
   arguments than the function actually expects */
__CHECK_FUNCTIONPOINTER(void *_p, /* The _p field of the function pointer */
                        void *_b, /* The _b field */
                        int nrActualArgs); /* The number of actual arguments */
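The checks described above can be sketched in plain C as follows. This is an illustration written from the description in this section, not the code in the CCured run-time library: the sk_ names are hypothetical, the sketch returns 0 where the real check would stop the program, and it uses a function-pointer type for _p instead of void * to stay within standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sk_generic_fn)(void);

/* Mirrors the descriptor layout shown above. */
struct sk_fn_descriptor {
    unsigned int  _len;     /* always 0, so reads and writes fail */
    sk_generic_fn _pfun;    /* pointer to the function */
    unsigned int  _nrargs;  /* the number of arguments */
};

/* Returns 1 if the call may proceed, 0 otherwise. The _b field is
   expected to point at the _pfun field of a descriptor. */
int sk_check_function_pointer(sk_generic_fn p, void *b,
                              unsigned int nr_actual_args) {
    struct sk_fn_descriptor *d;
    if (b == NULL) {
        return 0;                        /* no metadata, e.g. from an integer */
    }
    d = (struct sk_fn_descriptor *)
            ((char *)b - offsetof(struct sk_fn_descriptor, _pfun));
    if (d->_len != 0) {
        return 0;                        /* not a function descriptor */
    }
    if (d->_pfun != p) {
        return 0;                        /* pointer was changed by arithmetic */
    }
    return nr_actual_args >= d->_nrargs; /* extra arguments are allowed */
}

/* Two empty functions used only to exercise the check. */
void sk_target_a(void) {}
void sk_target_b(void) {}
```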
Also, always use prototypes for the external functions you are using. Otherwise, it will appear to CCured that you are casting the function pointer to various incompatible types corresponding to each use, and the function will be declared tagged (and pointers to such functions will be WILD). You get some help from CCured here because its whole-program merger will construct prototypes for the functions that are defined somewhere in your program. But when you use even simple things like printf you must include the proper header files.

9.2  The main Function

The main function is the entry point to your program. The most general type of the function is:

int main(int argc, char **argv, char **envp);
although when the arguments are not used it is common to not write them. Depending on how you use the argv and envp arguments, CCured might decide that they should be of some non-SAFE type. In that case CCured will generate code that makes copies of the appropriate kind of the argv and envp arguments.

Take a look at what happens for this example:

int main(int argc, char **argv) {
   for(; *argv; argv ++) { // Scan the args
      char *p = *argv;
      while( *p) { p ++; }
   }
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

CCured will also insert a call to ccuredInit, which initializes the CCured run-time library.

9.3  Global Initialization

C has a very rich language of initializers for globals and locals. The language is so rich that neither gcc nor MSVC implements it fully. For a discussion of how our front-end handles initialization, please see the CIL documentation.

Once programs are presented to CCured all the initialization for locals is turned into assignments, but most initialization code for globals is preserved. However, in some cases CCured must insert some checks related to the initializers. These checks are placed in a special function called a global initializer.

The name of a global initializer starts with __globinit. CCured will try to insert a call to the global initializer that it creates in the main function to ensure that it is run before anything else in the program. If it cannot find a main it will emit a warning:
Warning: Cannot find main to add global initializer __globinit_myfile
If you see such warnings and intend to actually run the code, make sure whoever invokes any function in the cured code calls the global initializer first.
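
When the cured code is a library without a main, the caller has to arrange for the initializer to run. The following is a minimal sketch of one way to do that. It assumes that the generated initializer is named __globinit_myfile (the name from the warning above) and takes no arguments; that signature is an assumption. The stub definition only stands in for the generated function so that the fragment compiles by itself, and ensure_cured_init and lib_entry are hypothetical names.

```c
/* Stand-in for the initializer that CCured would generate; in a real
   build this definition comes from the cured file. The void(void)
   signature is an assumption. */
static int globinit_runs = 0;
void __globinit_myfile(void) { globinit_runs++; }

/* Hypothetical guard: run the global initializer exactly once. */
static void ensure_cured_init(void) {
  static int done = 0;
  if (!done) {
    done = 1;
    __globinit_myfile();
  }
}

/* Hypothetical library entry point that is called from uncured code. */
int lib_entry(int x) {
  ensure_cured_init();   /* must happen before any cured code runs */
  return x + 1;
}
```

Every function that can be the first one called from outside would need the same guard.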

9.4  Casting Integers to Pointers

In CCured it is Ok to cast any kind of pointer to an integer, and in fact any pointer comparison is performed after such a cast. But if you try to cast an integer to a pointer the following two things happen:
  1. The inferencer will not allow the destination pointer to be SAFE unless the value you are casting is the constant 0. Any other kind of pointer can be used.
  2. More importantly, the resulting pointer will have metadata that will prevent you from using the pointer in a memory dereference. For a FSEQ pointer the _e field will be null, for a SEQ both the _b and the _e fields will be null and for a WILD the _b field will be null.
This means that such pointers cannot be used in memory dereferences. If your program casts a pointer into an integer and then back to a pointer this will be an issue. CCured will emit a warning whenever this happens. So far we have very few programs that do this and even then in one of few forms.

Some programs are just not careful about keeping pointers separate from integers and gratuitously cast to integers. The solution in that case is to change the type of the intermediate location to a void* (or to a more precise type of pointer if possible).

Other programs cast pointers to integers because they want to do pointer arithmetic and do not want to worry about the implicit scaling that C uses for pointer arithmetic. Use char* to do such arithmetic.
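
For instance, a byte offset into a structure can be applied through a char* view instead of through an integer. This is a minimal sketch in plain C; struct packet and payload_of are hypothetical names, and the sketch does not show which pointer kinds CCured would infer for it.

```c
#include <stddef.h>

struct packet {
  int  id;
  char payload[16];
};

/* Integer-cast version that should be avoided:
     (char *)((long)p + offsetof(struct packet, payload))
   The version below never leaves the pointer domain. */
char *payload_of(struct packet *p) {
  char *base = (char *)p;                 /* byte-granular view of *p */
  return base + offsetof(struct packet, payload);
}
```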

Some other programs also want to do arithmetic but of a kind not allowed for char* such as the following code which tries to align a pointer to a 16 byte boundary:

int* alignit(int *x ) {
  return (int*)((int)x & ~15);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

The solution here is a cute trick that can be used in some situations to cast an integer into a pointer, provided you know that it has the same metadata as some other legal pointer. Thus to cast the integer x to a pointer, while borrowing the metadata from pointer p, do:

p + (x - (int)p);
Thus you are turning a cast into pointer arithmetic. CCured will force the kind of the pointer to be either WILD or SEQ but everything will work as expected. Of course you have to scale the difference back by the size of the type that p points to. Here is how the previous alignit function can be written:

int* alignit(int *x ) {
  int ix = (int)x & ~15;
  return x + ((ix - (int)x) >> 2);
}

Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment
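
The same trick carries over to other element types if the shift is replaced by a division by the element size. The sketch below is a variant for double*; align16_down is a hypothetical name, and the use of intptr_t instead of int is our substitution so that the arithmetic also works where pointers are wider than int. It assumes that p is aligned to sizeof(double), so that the byte difference is an exact multiple of the element size.

```c
#include <stdint.h>

/* Round p down to a 16-byte boundary without casting an integer back to
   a pointer: the result is derived from p by pointer arithmetic. */
double *align16_down(double *p) {
  intptr_t addr    = (intptr_t)p;
  intptr_t aligned = addr & ~(intptr_t)15;
  /* The byte difference is <= 0; divide by the element size to undo the
     implicit scaling of pointer arithmetic. */
  return p + (aligned - addr) / (intptr_t)sizeof(double);
}
```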

If you are lazy and do not want to change your code you can ask CCured to insert code that records the line number of every cast from a scalar to a pointer. Then when a non-pointer is dereferenced the CCured run-time system will try to tell you which particular cast in your program produced this fake pointer. Use the interceptCasts pragma for this purpose (see Section 9.10). We have not found this feature very useful because in fact not too many programs cast integers to pointers.

9.5  The sizeof Issue

It is obvious by now that CCured will change the layout of some datatypes. That can lead to several kinds of problems. For example, if you are calling a library that is not cured then you had better not change the layout of the data that is passed back and forth. This issue is discussed more in Chapter 8. Another problem is when the code is written assuming that datatypes have a certain layout, such as the following code, which assumes that a pointer is 4 bytes long:

#include <stdio.h>
int *a[8];
void bad_code(int * *x) {
  int * pa = a; // Make a's elements WILD
  printf("a has %d elements", sizeof(a) / 4);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

This code will probably print 16 instead of 8 because each element of a is now 8 bytes long. Such code is very ill and it cannot be cured without manual intervention. So, let's assume that we change the code to:

#include <stdio.h>
int *a[8];
void so_and_so_code(int * *x) {
  int * pa = a; // Make a's elements WILD
  printf("a has %d elements", sizeof(a) / sizeof(int *));
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

As it turns out this code is perfectly fine but our inferencer cannot tell that there is a connection between the type of the element of a and the int * that appears in the argument of sizeof. Even though the bad cast will force the array elements to be WILD pointers the int * that appears in the argument of sizeof will be a SAFE pointer. Thus this code will also print 16. In fact, you will see a warning:
pathexec_env.c:42: Warning: Encountered sizeof(int */* __attribute__((___ptrnode__(2595))) */) when type contains pointers. Use sizeof expression. Type has a disconnected node.
If CCured says that the type has a connected node, then you are probably Ok. It means that the node inside sizeof is connected to the other nodes, so it will probably get the right kind. However, if CCured says that the type has disconnected nodes then you should worry.

To really point out the connection change the code as shown below. It can be argued also that this code is clearer and thus should be used even if you do not use CCured.

#include <stdio.h>
int *a[8];
void good_code(int * *x) {
  int * pa = a; // Make a's elements WILD
  printf("a has %d elements", sizeof(a) / sizeof(a[0]));
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

Similar problems arise for any use of sizeof, such as in the argument of allocation functions.
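
For allocation, the same advice means writing the size in terms of the variable being allocated rather than in terms of a type name. A minimal sketch in plain C, with make_table as a hypothetical name:

```c
#include <stdlib.h>

/* Allocate a table of n int pointers, all initialized to NULL.
   sizeof(*table) names the element through the variable itself, so the
   size follows whatever representation the element type ends up with. */
int **make_table(size_t n) {
  int **table = malloc(n * sizeof(*table));   /* not sizeof(int *) */
  if (table != NULL) {
    for (size_t i = 0; i < n; i++) table[i] = NULL;
  }
  return table;
}
```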

9.6  Variable argument functions

Variable-argument functions in C are inherently unsafe since there is no language-level mechanism to ensure that the actual arguments agree in type and number with the arguments that the function will be using. There are several ways to implement variable-argument functions in C, and CCured supports some of them quite well. There are two kinds of variable-argument functions in C, referred to below as vararg functions and valist functions. CCured supports both kinds and will scan the program to find out, for each function, what types of arguments are passed. In Section 9.6.1 we describe how the programmer can prevent this automatic inference by specifying the set of types of arguments.

CCured redefines the macros in <stdarg.h> and <vararg.h> to do special bookkeeping. In vararg functions, the macro va_start is used to initialize a va_list variable to point to the trailing arguments. CCured checks that the second argument is the last formal before the ellipsis (...).

Both in vararg and valist functions the macro va_arg can be used, as follows:

 T x = va_arg(args, T)
args must be a va_list variable and T must be compatible after the usual actual argument promotions (e.g. char and short to int and float to double) with one of the types in the struct associated with args. CCured checks this at run-time.
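
As an illustration of these rules, here is a small vararg function in standard C. It is only a sketch: sum_args and its kinds string are hypothetical, and under CCured the accepted types (int and double here) would be the ones inferred or declared for the function.

```c
#include <stdarg.h>

/* Sum `count` trailing arguments. `kinds` has one character per argument:
   'i' for an int-compatible argument, 'd' for a double-compatible one. */
double sum_args(const char *kinds, int count, ...) {
  va_list args;
  double total = 0.0;
  va_start(args, count);              /* count is the last formal before ... */
  for (int i = 0; i < count; i++) {
    if (kinds[i] == 'i') {
      total += va_arg(args, int);     /* char and short arrive as int */
    } else {
      total += va_arg(args, double);  /* float arrives as double */
    }
  }
  va_end(args);
  return total;
}
```

Fetching with va_arg(args, short) or va_arg(args, float) would be wrong even in plain C, because the promoted types are the ones actually passed.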

The CCured support for variable argument functions is quite flexible. Multiple variable argument lists can be processed in parallel, an argument list can be re-initialized with va_start and processed multiple times. A function can even work with variable argument lists that have different sets of types accepted (but for this you need to specify manually the set of types of arguments as explained in Section 9.6.1). Variable argument lists can be passed down but the regular CCured checks for stack allocated variables will prevent the passing of these lists up the call chain and also their storing in the heap.

The main thing that is not supported in CCured is the fetching of an argument with a different type than it was stored. It remains to be seen if this is a problem. We have looked at several variable argument functions (including full implementations of printf and sprintf) and so far we have found that CCured accepts those functions without any change except for the specification of the struct of the accepted argument types (as explained below).

9.6.1  Programmer control over vararg functions

If you do not want CCured to find automatically all the types that can be passed to a function, you can specify the set of types that can be used for arguments. Also, you should not let CCured infer the argument types for printf-like functions, but you should instead use the special support for them, as explained in Section 9.6.2.

You can declare the argument types by declaring a descriptor. This is a struct data type whose fields have the types that can be passed to the function. The order and the names of the fields do not matter. For example, such a struct for printf would be the following (this structure is defined in ccured.h):

struct printf_arguments {
   int      f_int;
   double   f_double;
   char    *f_string;
};
The simplest way to specify that such a struct describes the types of arguments for a variable argument function is to use a pragma:

#pragma ccuredvararg("myvarargfunction", sizeof(struct printf_arguments))
An equivalent method is to associate the __CCUREDVARARG(struct printf_arguments) attribute with the type of the function myvarargfunction:

int (__CCUREDVARARG(struct printf_arguments) myvarargfunction)(int last, ...);
You have to use this method if you want to specify that a function pointer is variable argument:

int (__CCUREDVARARG(struct printf_arguments) * myvarargptr)(int last, ...);
typedef int (__CCUREDVARARG(struct printf_arguments) fptr)(char *format,...);
A more fine-grained way to specify the same thing is to use the __CCUREDVARARG type attributes for va_list every time it appears. This allows you to specify different sets of types for different locals:

va_list __CCUREDVARARG(struct printf_arguments) args1, 
        __CCUREDVARARG(struct some_other_type) args2;

9.6.2  Printf-like functions

Since the vast majority of uses of variable argument functions is for printf-like functions, CCured contains special support for them. Specifically, if a vararg function is declared to be a printf-like function then all of its invocations in which the format string is a constant will be checked statically. For the other invocations a wrapper for printf will be called that will check the types of the actuals against the format string before calling the real printf function.

To declare a function to be printf-like use the following pragma:

#pragma ccuredvararg("myprintf", printf(1))
where the last argument is the index of the format argument in the argument list (starting with 1). Note that you will get a run-time error if you try to use the va_arg macro in the implementation of such a function. In those implementations you should invoke functions like vprintf and vsprintf instead.
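
A printf-like function written in this style forwards its argument list instead of reading it. The sketch below is plain C with a hypothetical function my_format. It uses vsnprintf rather than vsprintf to bound the write; that the CCured run-time library treats vsnprintf like vprintf and vsprintf is an assumption. Since the format string is the third argument, our reading of the pragma description above is that the matching declaration would use printf(3).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-like function: formats into buf and returns the
   number of characters that the full output needs (as vsnprintf does). */
int my_format(char *buf, size_t size, const char *format, ...) {
  va_list args;
  int n;
  va_start(args, format);
  n = vsnprintf(buf, size, format, args);  /* no va_arg in this body */
  va_end(args);
  return n;
}
```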

GCC already has support for communicating to the compiler that a function is printf-like. This is done as follows:

int myprintf(const char* format, ...) __attribute__((format(printf, 1, 2)))
where the “1” means that the first argument is the format string and the “2” means that we should start checking with the second argument. CCured recognizes this attribute and it considers it equivalent with the ccuredvararg from above. Note that the second argument in the format attribute is ignored in CCured.

You can use the format attribute even for function pointers:

int (__attribute__((format(printf, 1, 2))) *myptr)(char *format, ...);
Note that CCured does not currently like passing pointers to printf with the intention of printing the pointer value. You should manually cast those pointers to long when passing them to printf-like functions.

Also, you should not let CCured infer automatically the descriptors for printf-like functions. Otherwise, it is quite likely that the descriptor that will be inferred is different from the built-in descriptor printf_arguments (which the runtime library uses to check the calls to printf-like functions). CCured will warn you about all automatically inferred descriptors and you should manually inspect all the functions involved.

As for the regular variable argument functions, the pragma works only for named functions but not for pointers to functions. For that purpose you must use attributes:

int (__CCUREDFORMAT(1) * myprintf)(char *format, ...);
typedef int (__CCUREDFORMAT(1) fptr)(char *format,...);

9.6.3  Scanf-like functions

Since it proved too much trouble to handle scanf-like functions in a safe yet transparent way we currently require the programmer to rewrite the invocations to scanf using a number of functions that we provide. For example instead of

  int entry;   double then;   char buffer[6];

 ... fscanf(file, "Entry:%d;  Then:%lf;  5 digits:%5[0-9]; useless text.", 
            &entry, &then, buffer) ...
you should write

 ... (resetScanfCount(), 
      entry = ccured_fscanf_int(file, "Entry:%d"),
      then  = ccured_fscanf_double(file, ";  Then:%lf"),
      ccured_fscanf_string(file, ";  5 digits:%5[0-9]", buffer), 
      ccured_fscanf_nothing(file, "; useless text."), //advance the file pointer.
      getScanfCount ()) ...
The functions resetScanfCount and getScanfCount are necessary only if you use the result of the call to fscanf in the original code. Note that our replacement scanf functions can be used to return only one result at a time, consequently the format string that is passed must contain only one format specifier, possibly along with characters to be matched.

The following are the scanf-like functions that we currently support:

  extern int    ccured_fscanf_int(FILE *, char *format);
  extern double ccured_fscanf_double(FILE *, char *format);
  extern void   ccured_fscanf_string(FILE *, char *format, char *string);
  extern void   ccured_fscanf_nothing(FILE *, char *format);
If the original program uses scanf, just consider that you are using fscanf from stdin. If instead your program contains sscanf then you can use the function

void resetSScanfCount(char *string);
to dump the string to the temporary file ccured_sscanf_file then use the replacement for fscanf from above. For example,

 ... (resetSScanfCount(inputString), 
      entry = ccured_fscanf_int(ccured_sscanf_file, "Entry:%d"),
      then  = ccured_fscanf_double(ccured_sscanf_file, ";   Then:%lf"),
      ccured_fscanf_string(ccured_sscanf_file, ";   5 digits:%5[0-9]", buffer), 
      getScanfCount ()) ...   //getScanfCount is required when using resetSScanfCount
Note that the current support for scanf is far from satisfactory and will likely change in the future.

9.6.4  Implementation Issues

Almost all of the checking for variable-argument functions is done at run-time. At the time of a call each actual argument is compared with the types in the struct associated with the vararg function. A global data structure is filled with the number of arguments (in the global __ccured_va_count) and a list of indices giving, for each actual argument, the index of its type within the struct (in __ccured_va_tags).

In the body of a vararg function, a data structure is allocated on the stack to hold a copy of the global description of the arguments that was created by the caller. The call to va_start initializes this data structure and each call to va_arg checks that we are not reading past the end of the actuals and also that the type of the fetched argument matches that of the actual argument.
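
The bookkeeping described above can be modeled in a few lines of plain C. This is a hypothetical model for illustration only, not CCured's run-time code: struct va_desc, desc_start and desc_check_fetch are invented names, and the real checks stop the program instead of returning 0.

```c
/* Index of each accepted type within the descriptor struct. */
enum arg_tag { TAG_INT = 0, TAG_DOUBLE = 1, TAG_STRING = 2 };

struct va_desc {
  int count;                 /* number of actual arguments */
  const enum arg_tag *tags;  /* one descriptor index per actual */
  int next;                  /* index of the next argument to fetch */
};

/* Models va_start: copy the caller's description, start at the first actual. */
void desc_start(struct va_desc *d, int count, const enum arg_tag *tags) {
  d->count = count;
  d->tags = tags;
  d->next = 0;
}

/* Models the check done by va_arg: returns 1 if fetching an argument of
   kind `wanted` is allowed, 0 if it would read past the end or use the
   wrong type. Advances only on success. */
int desc_check_fetch(struct va_desc *d, enum arg_tag wanted) {
  if (d->next >= d->count) return 0;          /* past the last actual */
  if (d->tags[d->next] != wanted) return 0;   /* type mismatch */
  d->next++;
  return 1;
}
```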

9.7  Tagged Unions

As we have seen in Section 3.2.2 CCured can handle union types whose fields have compatible pointer types at corresponding offsets. If this is not the case then you will need to tell CCured how to handle the union. One option is to turn the union into a struct, but we do not recommend this because it increases memory usage and can change the behavior of your program if your code writes to one union field and then reads from a different one. A better option is to declare that the union is a tagged union. CCured actually supports two forms of tagged unions: one in which CCured adds a tag field and maintains it for you, and one in which your program maintains its own tag, and CCured checks that it is used properly.

9.7.1  CCured-maintained tags

You can declare a union to be tagged by adding the attribute __TAGGED to its definition. CCured will expand the union to contain a tag field. A tag is an RTTI value (Section 7.3) that encodes the type of the last field written in each union value. Here is an example:

union int_or_ptr {
  int   i;
  int  *p;
} __TAGGED; // We declare it tagged

int main() {
  union int_or_ptr x;
  int i;
  x.i = 5;
  i = x.i; // This will work
  i = * x.p; // This will fail
  x.p = &i;
  i = x.i; // This will fail
  i = * x.p; // This will work
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

You can see that CCured has defined the following structure:

struct tagged_int_or_ptr {
   struct RTTI_ELEMENT *    __tag    ;
   union int_or_ptr __data    ;
} __TAGGED  ;
You can also see in the code that CCured generates assignments to the __tag field before each assignment to a union field. And CCured inserts calls to CHECK_UNIONTAG before each read-access to the field.
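
The effect of these generated assignments and checks can be modeled in plain C as follows. This is a hypothetical model only: CCured's tag is an RTTI value rather than an enum, the names below are invented, and a failed check in cured code stops the program instead of returning 0.

```c
enum last_written { WROTE_NONE, WROTE_INT, WROTE_PTR };

struct tagged_model {
  enum last_written tag;            /* plays the role of __tag */
  union { int i; int *p; } data;    /* plays the role of __data */
};

/* Writes set the tag first, then the union field. */
void write_i(struct tagged_model *u, int v)  { u->tag = WROTE_INT; u->data.i = v; }
void write_p(struct tagged_model *u, int *v) { u->tag = WROTE_PTR; u->data.p = v; }

/* Each read first checks the tag; returns 1 and stores the value on
   success, 0 where the cured program would report an error. */
int read_i(const struct tagged_model *u, int *out) {
  if (u->tag != WROTE_INT) return 0;
  *out = u->data.i;
  return 1;
}
int read_p(const struct tagged_model *u, int **out) {
  if (u->tag != WROTE_PTR) return 0;
  *out = u->data.p;
  return 1;
}
```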


9.7.2  User-defined tags

Many programs define their own tagged unions, in which a struct contains a tag field and a union holding one of several types of data. In these cases, using a CCured-supplied tag is redundant. You can annotate a union to tell CCured what meaning you assign to various tag values, and CCured will then check that the tags are maintained properly.

When a tag is modified, the “data” portion of the structure will be zeroed. Therefore, when writing both the tag and data portions, programs must always modify the tag first, followed by the data.

When a program reads or writes the data part of a tagged union, CCured will read the tag and check that it is appropriate for the union field being accessed.

Tags are defined by annotating each union field with __SELECTEDWHEN(exp) where exp is a boolean expression. exp may contain integer arithmetic and comparisons, and it can refer to the runtime value of a field in an enclosing struct by specifying the name of that field. For example:

enum tags {
  TAG_ZERO = 0,
};

struct host {
  short tag; // 0 for integer, 1 for structure, 10--12 for pointer to int
  union bar {
    int anint      __SELECTEDWHEN(tag == TAG_ZERO);

    struct str {
      int * * ptrptr;
      float f;
    } structure    __SELECTEDWHEN(tag == 1);

    int * ptrint   __SELECTEDWHEN(tag >= 10 && tag <= 12);
  } data;
} g;
int x;

int main() {
  g.tag = 12;                 //Select g.data.ptrint
  g.data.ptrint = &x;

  int* px = g.data.ptrint;    //To check that it's okay to access g.data.ptrint,
                              //CCured checks "g.tag >= 10 && g.tag <= 12"
  return 0;
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In this case, the __SELECTEDWHEN attributes tell CCured that the field data.anint is active when the tag field is 0, the field data.structure is active when the tag field is 1, and the field data.ptrint is active when the tag field is between 10 and 12.
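
The reason for the tag-first rule can be seen in a small plain C model of the behavior described above. The names are hypothetical and the model zeroes the data on every tag write; it is a sketch of the documented effect, not of the code that CCured generates.

```c
#include <string.h>

struct host_model {
  short tag;                                /* 0: anint, 10..12: ptrint */
  union { int anint; int *ptrint; } data;
};

/* Models a write to the tag: the data portion is zeroed. */
void set_tag(struct host_model *h, short tag) {
  h->tag = tag;
  memset(&h->data, 0, sizeof h->data);
}

/* Models the check for __SELECTEDWHEN(tag >= 10 && tag <= 12). */
int ptrint_selected(const struct host_model *h) {
  return h->tag >= 10 && h->tag <= 12;
}
```

Writing the data first and the tag second would leave the union zeroed, which is why the tag has to be written first.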


9.8  Annotated Lengths

Programmers can annotate array pointers with length attributes. CCured will then use the annotated length whenever it needs to do a bounds check on that pointer, instead of transforming the pointer into a fat pointer.

Length annotations are allowed in two situations: struct fields may have length annotations that depend on the values of other fields in that struct, and function parameters may have lengths that depend on other parameters in that function. (NB: the annotations on function parameters are not yet implemented. Coming soon ...)

Only pointer types may be annotated. The annotation __SIZE(exp) on a field means that the associated pointer is exp bytes long, where the expression exp can involve integer constants, arithmetic, sizeof, and the names of other fields in the same struct. So __SIZE(1 + foo) means that the specified pointer has a length that's one greater than the runtime value of field foo in the same object.

__COUNT(exp) means that the pointer is exp elements long. So when annotating a pointer with type T*, the annotation __COUNT(exp) is equivalent to __SIZE(exp * sizeof(T)).

Any field that is referred to by a __SIZE or __COUNT annotation is a metadata field. When a metadata field is modified, any pointer fields that depend on it are set to NULL. Therefore, when writing both the metadata and pointer fields, programs must always modify the metadata first, followed by the pointer.

When an annotated pointer field is read, CCured will read any metadata fields as well, and associate that length with the pointer. When a pointer field is written, CCured will check that the buffer's length is greater than or equal to the length specified by the current value of the metadata fields.

extern void* malloc(int);
#pragma ccuredalloc("malloc", sizein(1), nozero)

struct bar {
  int nrInts;
  int *ints __COUNT(nrInts);
};

struct foo {
  int sizeBars;
  struct bar * bars __SIZE(sizeBars);
};

// Now the function that uses it

void init(struct foo* pFoo) {
  int nrBars = 5;
  pFoo->sizeBars = nrBars * sizeof(* pFoo->bars);
  pFoo->bars = (struct bar*)malloc(pFoo->sizeBars);
}
Browse the CCured inferred pointer kinds, or see the CCured output for this code fragment

In this code, we first overwrite the field pFoo->sizeBars, which automatically sets the field pFoo->bars to NULL. The next step is to write a new pointer to the pFoo->bars field. During this write, CCured will check that the pointer being written (in this case, the result of malloc) is at least “pFoo->sizeBars” bytes long.
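
The two rules (a metadata write clears the pointer, a pointer write is checked against the metadata) can be modeled in plain C. This is a hypothetical sketch: the names are invented, buf_len stands for the length that CCured would know from the metadata of the pointer being stored, and a failed check in cured code stops the program instead of returning 0.

```c
#include <stddef.h>

struct foo_model {
  size_t sizeBars;   /* metadata field: length of bars in bytes */
  void  *bars;       /* stands for: struct bar *bars __SIZE(sizeBars) */
};

/* Models a write to the metadata field: dependent pointers become NULL. */
void set_size(struct foo_model *f, size_t size) {
  f->sizeBars = size;
  f->bars = NULL;
}

/* Models a write to the pointer field. Returns 1 if the write is allowed. */
int set_bars(struct foo_model *f, void *buf, size_t buf_len) {
  if (buf_len < f->sizeBars) return 0;   /* buffer too short: check fails */
  f->bars = buf;
  return 1;
}
```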

9.9  Memory Management

The high order bit: we use the Boehm-Weiser garbage collector.

TODO : finish this section

9.10  CCured Pragmas

The following pragmas are recognized by CCured. Note that pragmas can only appear in between global declarations. Some of them are discussed in more detail in following sections:
  1. #pragma box(off) - Turn off curing. The code that is not cured does not contribute constraints to the type inferencer. However the code is changed slightly during the program transformation phase to make sure that whenever it refers to global variables that were cured it uses their _p field.
  2. #pragma box(on) - Turn curing back on
  3. #pragma nobox("myfunc") - Turn curing off for the function myfunc
  4. #pragma boxtext(...) - CCured will turn this pragma into the ... text in the cured file
  5. #pragma ccuredpoly("myfunc1", "myfunc2", "struct foo") - CCured will treat the myfunc1 and myfunc2 functions and the foo structure polymorphically. See Section 7.1.
  6. #pragma ccuredalloc("malloc", nozero, sizein(1)) - CCured will treat malloc as an allocation function whose length is passed in the first argument and which does not zero the allocated area. See Section 7.2.
  7. #pragma ccuredalloc("calloc", zero, sizemul(1,2)) - CCured will treat calloc as an allocation function whose length is the product of the first two arguments and which does zero the allocated area. See Section 7.2.
  8. #pragma ccuredvararg("myfunc", sizeof(struct myfunc_arguments)) - Declares myfunc to be a variable argument function that can be passed a variable number of arguments each having one of the types of the fields of struct myfunc_arguments. See Section 9.6.
  9. #pragma ccuredvararg("myprintf", printf(2)) - Declares myprintf to be a printf-like function whose format string is in the second argument. See Section 9.6.
  10. #pragma ccuredwrapper("foo_wrapper", of("foo")) - Declares foo_wrapper to be a wrapper for foo. See Chapter 8. Implies #pragma cilnoremove and #pragma ccuredpoly for foo_wrapper.
  11. #pragma cilnoremove("func1", "var2", "type foo", "struct bar") - Instructs CIL to keep the declarations and definitions of the function func1 and variable var2, the definition of type foo and of structure bar.

Chapter 10  CCured Warnings and Errors

As you use CCured you might encounter various kinds of problems. Most of these are due to a combination of aggressive coding practices and CCured being less smart than the programmer. (Note: this section is continuously being expanded; if you do not see the answer to your question, or if the answer is not helpful, let us know).

10.1  Merging

10.2  Inference

10.3  Curing

10.4  Linking

10.5  Running the Cured Code

When you run the code you might get run-time errors. Make sure you read Chapter 4 on ways to control the handling of errors.

Chapter 11  License

Copyright (c) 2001-2002, All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Chapter 12  Bug reports

If you find a bug in CCured, please send email to George Necula.

Chapter 13  Changes



Below are some of the changes in the CCured system. These are in addition to changes made to the underlying CIL infrastructure

Appendix A  Inference Results

The correspondence between the source code and the annotated graph can be found in the infer.c file. You may need to pass --emitinfer to ccured to see this file. For a base file named FOO.c, the graph will be shown in FOOinfer.c. This file represents the state of the tool after pointer kind inference but before run-time checks have been inserted and before names have been mangled. The infer.c file consists of two sections: the annotated source code and the inference graph.

A.0.1  Annotated Source

The annotated source code is just the original source code with the special syntax __NODE(n) associated with every pointer type in the program. There are two main places where the __NODE attribute can appear: immediately following a pointer-type constructor * or immediately following the name of a variable. In the former case the attribute specifies the node associated with the pointer-type while in the latter case the attribute specifies the node associated with the address-of the variable.

For example, the lines:

  int * __NODE(3) ptr __NODE(4);
  int * __NODE(6) * __NODE(7) matrix __NODE(8);
indicate that node 3 in the graph is associated with the pointer variable ptr and node 4 is associated with the address of ptr. Node 7 is associated with the top-level pointer in the variable matrix, while node 6 is associated with the inner pointer (e.g., node 6 is associated with the type of the expression matrix[0]). __NODE(n) is a type qualifier (like const or restrict) that modifies the pointer type constructor just to its left.

Consider the following simple test case:

  int main() { 
    int * ptr;
    int base;
    ptr = &base;
    ptr ++; 
  } 
If we run it through CCured's inferencer we will see:

  int main() {
    int * __NODE(90) ptr __NODE(91);
    int base __NODE(92); 
    ptr = & base;
    ptr = ptr + 1;
  } 
Node 90 is associated with ptr. The additional nodes to the right are associated with the addresses of variables. Node 91 is associated with &ptr and node 92 is associated with &base. Since there is an assignment between ptr and &base, we expect to see an edge between nodes 90 and 92 in the graph. The graph itself can be found at the end of the C code near a line like:

  /* Now the solved graph (solver) */

A.0.2  Inference Graphs

In order to see the whole graph, you must pass the argument --emitGraphDetailLevel=3 to CCured. The graph at the end of FILEinfer.c is represented as a printout of every node. Each node description has the following form:

  ID : Location (flags) (node this type points to)
   K=Kind/Reason  T=(base type of this node)
   S=(successor edges)
   P=(predecessor edges)
So the entry for node 90 (from the example in Section A.0.1) might look like:

  90 : Local(./simple.i.main.ptr).1 (stack,posarith,) ()
   K=FSEQ/from_flag T=int
   S=
   P=92:Cast
The node number is 90, and the location field says that this node represents a local variable named ptr inside the main function of the file simple.i. The location of a node is a unique name for the pointer type. Locations use a hierarchical naming scheme composed of a place followed by a period and an index within the place. The places include the variable places (Glob, Static, Local) as well as Field and Anon.

Since each place can contain many occurrences of a pointer-type constructor (e.g. in a declaration int * * x) we use indices to differentiate between such occurrences. For a variable place (Glob, Static, Local), the index 0 is always associated with the node corresponding to the address of the variable. Then we traverse the type of the variable in depth-first order (for functions we start with the result and then continue with the arguments) and we assign indices starting at 1. For Field and Anon we do a similar traversal.

The flags field is a list of the following values:

Flag      Meaning
stack     this pointer may contain a stack address
escape    this value may be assigned through a pointer and escape to the heap
upd       a write may be performed through this pointer
posarith  this pointer may be subject to positive pointer arithmetic
arith     this pointer may be subject to arbitrary pointer arithmetic
null      this pointer may become NULL
int       an int may be cast into this pointer
noproto   this pointer is associated with a function that is missing a prototype
interf    this pointer is associated with a function that is part of an interface
sized     this pointer may point into a sized array
reach_s   this pointer may flow into a string
reach_q   this pointer may flow into a SEQ pointer
reach_i   this pointer may flow into an INDEX pointer

In our example, node 90 has the posarith flag because it represents ptr and ptr++ appears in the code. Node 90 has the stack flag because &base is a stack address and the assignment ptr = &base occurs in the code.

Continuing our example, the text K=FSEQ/from_flag tells us that the pointer type associated with node 90 is FSEQ because of one of the flags (in this case, the posarith flag). That is, the type inference has made node 90 a forward sequence pointer (one that carries its upper bound with it) because positive pointer arithmetic is performed on it (so we will need a run-time check to make sure that it stays in bounds).

The text T=int tells us the base type of node 90, that is, node 90 is associated with a pointer to an int.

The S= and P= lists show successors and predecessors of this node. In our example, the text P=92:Cast means that node 90 has a predecessor edge of type Cast from node 92 (the node for &base). That is, information from node 92 flows into node 90. This is the graph's way of representing the assignment statement ptr = &base. There are various sorts of edges in the graph:
Edge Name   Meaning
Cast        the value of pred may flow into succ
Compat      pred and succ must have equal types
Safe        if pred is WILD, succ must be WILD as well
            (usually links structures and their fields)
Null        a NULL flows in the direction of the edge
Index       if pred is INDEX, succ must be INDEX as well

Of these, Cast, Compat and Safe are the most common. Cast represents casts and assignments, Safe links up fields and structures so that they have consistent pointer kinds and Compat ensures equality between underlying pointers. As an example, in the code:

        int * __NODE(1) * __NODE(2) a;
        int * __NODE(3) * __NODE(4) b;
        a = b; 
We would see a Cast edge from 4 to 2 and a Compat edge between 1 and 3. The Compat edge ensures that nodes 1 and 3 will end up with the same pointer kind.
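The effect of Compat and Safe edges can be illustrated with a small fixpoint computation. This is only a sketch of the constraints stated in the table, not the CCured solver: the three-kind ordering, the rule that a Compat conflict is resolved by taking the less safe kind, and all names (ptr_kind, edge, propagate) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified kinds, ordered from cheapest to most expensive (assumption). */
typedef enum { K_SAFE = 0, K_SEQ = 1, K_WILD = 2 } ptr_kind;

typedef enum { E_COMPAT, E_SAFE } edge_type;

typedef struct {
    int pred;        /* index of the predecessor node */
    int succ;        /* index of the successor node */
    edge_type type;
} edge;

/* Apply the edge constraints repeatedly until nothing changes.
   Kinds only ever move upwards, so the loop terminates. */
static void propagate(ptr_kind *kinds, const edge *edges, size_t nedges) {
    int changed = 1;
    assert(kinds != NULL);
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < nedges; i++) {
            ptr_kind *p = &kinds[edges[i].pred];
            ptr_kind *s = &kinds[edges[i].succ];
            if (edges[i].type == E_COMPAT && *p != *s) {
                /* Compat: both ends must end up with the same kind. */
                ptr_kind k = (*p > *s) ? *p : *s;
                *p = k;
                *s = k;
                changed = 1;
            } else if (edges[i].type == E_SAFE && *p == K_WILD && *s != K_WILD) {
                /* Safe: a WILD predecessor forces a WILD successor. */
                *s = K_WILD;
                changed = 1;
            }
        }
    }
}
```

With nodes numbered as in the fragment above, a Compat edge between 1 and 3 and an initial kind of SEQ on node 1 leaves both nodes 1 and 3 as SEQ, while nodes 2 and 4 are unaffected.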

A.1  Type Names

CCured will also rename types that are expanded to contain extra information used by the run-time checks that ensure safety. For example, a SEQ pointer needs to carry information about its bounds. This information is stored adjacent to the main pointer, but as a result a SEQ pointer is larger than a normal C pointer. Thus a structure that contains a SEQ pointer will be larger than an otherwise-identical structure that contains a normal C pointer (or a SAFE pointer). In addition, the offsets of any structure fields that come after the SEQ pointer will be different. To mark these differences and indicate new types, CCured will change the names of types and structures. CCured does this by prepending strings to the structure or type name.
Pointer Kind   Type Prefix
WILD           wildp_
INDEX          indexp_
SEQ            seq_
SEQN           seq_
FSEQ           fseqp_
FSEQN          fseqp_
SAFE           No associated prefix
RWSTRING       No associated prefix
ROSTRING       No associated prefix

For example, given a program of the form:

  Foo * ptr;
  ptr++;
CCured will probably infer that ptr should be an FSEQ pointer. The resulting cured code will have the following form:

  typedef struct {        /* type made by CCured */
    Foo * __FSEQ _p;      /* actual pointer */ 
    void * _e;            /* end of the allocated region: upper bound */
  } fseqp_Foo;            /* FSEQ pointer to a Foo */

  fseqp_Foo ptr;
  // code for "ptr++;"
If the original code were:

  Foo ** ptr;
  ptr++;                /* forces ptr to be FSEQ */
  ptr[0]--;             /* forces *ptr to be SEQ */ 
We would expect the final type of ptr in the cured code to be fseqp_seq_Foo, because ptr will have the type “(forward sequence pointer) to (sequence pointer) to Foo.” Note that fseqp_seq_Foo is not merely a name mangled form of Foo, since both the definition of Foo and the definition of fseqp_seq_Foo will exist in the final cured program.
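The prefixing scheme can be sketched as a small helper that prepends one prefix per pointer level, outermost first. The prefixes come from the table above; the function names and the ptr_kind enumeration are hypothetical, and the sketch ignores any further details of the real renaming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { PK_SAFE, PK_FSEQ, PK_SEQ, PK_INDEX, PK_WILD } ptr_kind;

/* Prefixes as listed in the table above; SAFE has no prefix. */
static const char *kind_prefix(ptr_kind k) {
    switch (k) {
    case PK_FSEQ:  return "fseqp_";
    case PK_SEQ:   return "seq_";
    case PK_INDEX: return "indexp_";
    case PK_WILD:  return "wildp_";
    default:       return "";
    }
}

/* Build the name of a pointer type. kinds[0] is the outermost pointer,
   base is the name of the type finally pointed to.
   Returns 0 on success, -1 if buf is too small. */
static int cured_type_name(char *buf, size_t len,
                           const ptr_kind *kinds, size_t nkinds,
                           const char *base) {
    size_t used = 0;
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i <= nkinds; i++) {
        const char *piece = (i < nkinds) ? kind_prefix(kinds[i]) : base;
        size_t n = strlen(piece);
        if (used + n >= len) return -1;   /* keep room for the terminator */
        memcpy(buf + used, piece, n + 1);
        used += n;
    }
    return 0;
}
```

Calling cured_type_name with the kinds FSEQ and SEQ and the base name Foo produces fseqp_seq_Foo, the name discussed above.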

Appendix B  A Tour of the Source Code

This section was updated in January 2003
./: 
  configure:            a shell script that creates the CCured makefiles by 
                        scanning your system for existing programs (like gcc)
  Makefile.cil.in:      instructions for building CIL
  Makefile.in:          instructions for building CCured
  Makefile.ocaml:       used by Makefile.ccured for the Ocaml part
  Makefile.gcc:         Included in the above if you use gcc
  Makefile.msvc:        Included in the above if you use Microsoft Visual C

./src: (ML code)
  main:                 driver for CCured: parses command line arguments
                        and transforms its input C files

./src/ccured: (ML code)
  cure:                  inserts run-time checks into C code based on 
                        pointer annotations
  curesplit:             turns multi-word structure pointer representations 
                        into multiple single-word variables (increases
                        performance by allowing later compilers to make
                        better optimizations)
  curestats:             counts the static number of run-time checks
                        inserted
  curechecks:           Some run-time checks for CCured
  cxxpp:                A preprocessor for EDG's output on C++ lowering.
  markptr:              marks pointers based on their usage so that the
                        inferencer can pick an efficient representation
  markutil:             Various utility functions
  optim:                optimizes the placement of run-time checks
  ptrnode:              a graph data structure used by the inferencer
  poly:                 handling of polymorphism
  seoptim:              a symbolic-execution based run-time check
                        eliminator
  solver:               an old inferencer
  solveutil:            support functions common to all inferencers
  type:                 an implementation of physical subtyping; determines
                        if one C type is a physical subtype of another
  typecheck:            verify that the pointer representations have been
                        assigned soundly
  unionfind:            a support data structure
  vararg:               handling of variable argument functions
  wrappers:             handling of wrappers

./bin/: (scripts)
  ccured:               a drop-in replacement for 'gcc' and for MS 'cl'

./include/: (C header files for use with programs being cured)
  ccured_GNUCC.patch:   a description of what to patch (modify) in GCC's
                        standard header files
  ccured_MSVC.patch:    as above, but for MS VC
  gcc_<version>/:       the patched files created when ccured_GNUCC.patch
                        is applied to the files in your /usr/include.
                        <version> is your version of gcc.  These files are
                        created when you build ccured.
  cl_<version>/:        as above, but for MSVC.
  ccured.h:             included before curing
  ccuredcheck.h:        inline macros for doing run-time checks (included
                        after curing)
  ccuredannot.h:        various declarations and macros common to ccured.h and
                        ccuredcheck.h
  *_wrappers.h:         wrappers for various functions in the standard library.
                        For example, we patch stdio.h to #include
                        stdio_wrappers.h, so that whenever you include stdio.h
                        in the target program the appropriate wrappers are
                        brought in as well.

./lib/: (C code for use at runtime)
  ccuredlib.c:          a library for error handling, wrapper helpers, etc
  gc/:                  boehm-weiser garbage collector 
                        (http://www.hpl.hp.com/personal/Hans_Boehm/gc/)
  browser*:             browser Javascript code

Appendix C  Old Tutorials

C.1  ftpd

Note: this section was written on 6/29/01. CCured has changed since then.

Once you download the sources for a new package, you can run the translator on it. To create a Makefile target for a package you typically add to cil/Makefile a target that just invokes make on the package's own Makefile with the CC variable bound to “ccured --merge”. I'll use the 6/29/01 version of ftpd as my example.

First we try to run it without our tool involved:
  % make ftpd-clean
  % cd test/ftpd/ftpd
  % make
This succeeds, generating an 'ftpd' binary. In the case of 'ftpd', running it is slightly complicated:
  % ./ftpd -D -d -p 3333
  (then in another window)
  % telnet localhost 3333
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
  220 madrone.cs.berkeley.edu FTP server (Version 6.5/OpenBSD, linux port 0.3.2) ready.
  (etc)
This is (to our way of measuring) success.

Now we try it in 'cil' mode:
  % cd cil
  % make ftpd-clean
  % make ftpd
At the moment, this also works, producing another 'ftpd' binary. We test it the same way, and rejoice at its success.

Finally, we dare to try it in 'box' mode, meaning the instrumentation module will be used:
  % make ftpd-clean
  % make ftpd INFERBOX=4
After crunching for a while, it reports this error (you have to scroll back a bit to see the right one):
  ./ls_all.c:1338: Bug: Calling non-wild ioctl with too many args
This is an error from the 'box' module, complaining about what it perceives to be a type error. If we investigate the named source line, we see

  if(ioctl(1, 0x5413, & win) == 0 && win.ws_col > 0)
confirming that 'ioctl' is involved. Since the *_all.c files are the output of our tool, and do not themselves #include any files, we can simply search in this file for ioctl's declaration. We do so, and see

  extern int ioctl(int __fd , unsigned long int __request , ...) ;
Hmmm... looks like it was declared to accept any number (>=2) of args, so this looks like a bug in the 'box' module; it should accept this code, but it does not.

The next step is to write a tiny C program which calls ioctl (see test/small2/ioctl.c), and verify it fails the same way

  % make scott/ioctl INFERBOX=4
  [...]
  ioctl.c:9: Bug: Calling non-wild ioctl with too many args
  [...]
Yep, same problem. Now we report this to George: he wrote the 'box' module, so he is typically much faster at identifying the problem.

In the meantime (waiting for George to magically fix the problem), we could temporarily comment-out the call so we can proceed to find other bugs. Or, perhaps we change the ioctl call to instead call a wrapper function (wrappers are defined in lib/ccuredlib.c, which gets linked into the translated program).

Eventually (see "make go INFERBOX=4") we'll get an executable. If it runs correctly, celebration is in order. If not, it will usually fail because of a failed runtime check (this one is from a test vector for which go fails);

  % make go INFERBOX=4
  % cd test/spec/099.go/src
  % ./go 5 4
  [...]
  array bug: index is 5980 (vs 5980)
  Failure: Ubound
  Abort
Tracking down the source of such failures is the most time-consuming part of pushing a program through. Sometimes it's a bug in the translator, in which case ideally a test case can be isolated for easy diagnosis.

Sometimes (more and more often) it's a bug in the original program (go had 10 array bounds violation bugs at last count). In this case you have to change the original code to fix the bug; this may be easy or hard. If it's hard, try just surrounding the offending statement with an explicit bounds check in an 'if' statement, so the program skips the bad statements (that is what I did to cause all of the "array bug" outputs in the "5 4" case above).
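The workaround of guarding the offending statement can look like the following sketch. The array board, its size BOARD_LEN and the function guarded_store are hypothetical stand-ins and are not taken from the go sources; the message only imitates the style of the "array bug" output shown above.

```c
#include <assert.h>
#include <stdio.h>

#define BOARD_LEN 16            /* hypothetical array size */

static int board[BOARD_LEN];

/* Perform the store only if the index is in bounds. Returns 1 when the
   store happened and 0 when the bad statement was skipped. */
static int guarded_store(int idx, int value) {
    if (idx < 0 || idx >= BOARD_LEN) {
        fprintf(stderr, "array bug: index is %d (vs %d)\n", idx, BOARD_LEN);
        return 0;
    }
    board[idx] = value;
    return 1;
}
```

The program then keeps running past the out-of-bounds access, and each skipped statement leaves a trace that you can use to locate the real bug later.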

C.2  Writing Wrappers Manually

To interface with external code, you are usually better off using the automatic wrapper system described in Chapter 8. However if that isn't possible, you'll need to write a wrapper directly in C:
Step 1:
Consult the name-mangling algorithm documented in Section 8.1 and ccuredlib.c to decode the required types of the parameters.

Step 2:
Determine the semantics of the function being wrapped (e.g., if it's a unix libc call, consult its man page). In particular, find out how memory passed via pointers is accessed (read and written).

Step 3:
Write the wrapper, and make calls to the verification and query functions in the section in ccuredlib.c titled “general-purpose”. If the function manipulates wild pointers, be sure to update tags; conversely, if no wild pointers are involved, there are no tags to worry about.
Good examples to consult (in ccuredlib.c) include read_w, fgets_ffw, stat_ww, strcat_www, memmove_www.

C.3  Apache Modules

This section applies to CCured as of July 2002

C.3.1  Introduction

This writing assumes Apache 1.3.19 and an x86/Linux system. Apache is an open source web server that has the ability to dynamically load third-party modules. Modules can examine and alter HTTP requests and also examine and alter the webserver replies. For example, a compression module might examine the HTTP request to see if it contains the “Accept-Encoding: gzip” tag. If it does it might alter the HTTP reply, replacing the text of the webpage body with a compressed version of that text. Modules can be configured (via a file called httpd.conf) so that their behavior is limited to a certain location or directory.

Apache modules share the same address space as the Apache webserver: no software fault isolation is present. As a result, if the module crashes it brings down that webserver (although Apache is usually configured to immediately spawn a new webserver thread to replace the fallen one). More distressingly, a module with a security violation (for example, a format-string bug) can allow remote users to gain shell access to the webserver machine (one version of mod_php3 features such a vulnerability; CCured prevents that vulnerability).

Apache modules are typically single files with a fairly standard naming convention: mod_foo.c is the foo module, where foo ranges over fairly descriptive keywords like gzip, random, urlcount, auth or layout. mod_foo.c almost invariably contains a global data structure of type module with the C name foo_module. This data structure is a table of function pointers and entry points. Once mod_foo.c has been compiled to the shared object mod_foo.so, Apache will dynamically load it and call the function pointers listed in the foo_module structure at the appropriate time (e.g., when a new request comes in).

C.3.2  Curing Apache Modules

Most Apache modules are of a relatively modest size and curing them is no great chore. However, some annotation work must be done. Since the cured module must interact with the non-cured Apache webserver, objects that are passed between them must not change size. As a result, WILD and other fat pointers cannot be introduced into such objects. Annotations must be added to convince CCured that the module can be made safe without such run-time checks.
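The size problem can be seen in a few lines of C. In this sketch config_thin stands for a record as the uncured server sees it and config_fat for the same record after a fat pointer has been introduced; both structures, the fat_char_ptr type and the same_layout helper are hypothetical and only illustrate why fat pointers must be kept out of shared objects.

```c
#include <assert.h>
#include <stddef.h>

/* The record as the uncured server expects it. */
struct config_thin {
    int   enabled;
    char *file;
};

/* Hypothetical fat pointer: the pointer plus its upper bound. */
typedef struct {
    char *_p;
    void *_e;
} fat_char_ptr;

/* The same record if the pointer field became fat. */
struct config_fat {
    int          enabled;
    fat_char_ptr file;
};

/* Returns 1 when two layouts agree in total size and in the offset
   of the field being compared. */
static int same_layout(size_t size_a, size_t off_a,
                       size_t size_b, size_t off_b) {
    return size_a == size_b && off_a == off_b;
}
```

Comparing sizeof and offsetof for the two structures shows that the fat version is larger, so code compiled against the thin layout would read the wrong bytes.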

Imagine that you are trying to cure mod_urlcount.c. Take out your favorite text editor and open up the file. Near the top you should find a configuration record structure. Each module defines a separate configuration record structure with a separate (non-exported) name. For example, mod_urlcount has:

typedef struct urlcount_config_rec {
    int         urlcount_default;
    CounterType urlcount_type;
    int         urlcount_auto_add;
    char       *urlcount_file;
} urlcount_config_rec;
Each module also contains functions that create, manipulate and merge such configuration structures. This is the mechanism through which Apache modules maintain global state. Each time Apache calls one of the function pointers exported by the module, it passes along a way to get to the appropriate configuration record. Since Apache does not know how the config structure will be defined, it uses void pointers to describe the type. CCured comes with a set of macros that instantiate those void *s on a per-module basis. Add the line:

NEW_MODULE_TYPE(urlcount, urlcount_config_rec)  // this is a macro
where the first parameter is the module suffix name and the second is the type name of the configuration record. This macro declares a type named module_urlcount. As mentioned earlier, each module exports a module structure (full of function pointers). We must redeclare the module to take advantage of the instantiated types. Change:

module urlcount_module; // full of "void *"s
to

module_urlcount urlcount_module; // uses "urlcount_config_rec *", not "void *"
Now scroll down a bit and look for the keyword void. Apache modules often feature unnecessary casts to void. For example, mod_headers contains this function:

static void *merge_headers_config(pool *p, void *basev, void *overridesv)
{
    headers_conf *a = (headers_conf *) ap_pcalloc(p, sizeof(headers_conf));
    headers_conf *base = (headers_conf *) basev, 
        *overrides = (headers_conf *) overridesv;
    a->headers = ap_append_arrays(p, base->headers, overrides->headers);
    return a;
}
Every void in this function really stands for headers_conf (the mod_headers version of urlcount_config_rec). Change it so that the void types are no longer present:

static headers_conf *
    merge_headers_config(pool *p, headers_conf *basev, headers_conf *overridesv)
{
    headers_conf *a = (headers_conf *) ap_pcalloc(p, sizeof(headers_conf));
    headers_conf *base = (headers_conf *) basev, 
        *overrides = (headers_conf *) overridesv;
    a->headers = ap_append_arrays(p, base->headers, overrides->headers);
    return a;
}
Repeat this process with all configuration functions that contain void. Now search for ap_get_module_config. It is a macro that contains a (safe) cast to and from void * – it allows modules to extract their configuration record from the global server state. For example, mod_headers contains:

headers_conf *serverconf =
    (headers_conf *) ap_get_module_config(s->module_config, &headers_module);
Change this to:

    headers_conf * serverconf;
    { __NOBOXBLOCK
    serverconf = ap_get_module_config(s->module_config, &headers_module);
    } 
The __NOBOXBLOCK block keyword tells CCured to leave the block alone: we are asserting that it is already safe. Modify every instance of ap_get_module_config and ap_get_perdir_module_config the same way.

Now look for a datatype with the suffix entry. For example, mod_headers features:

typedef struct {
    hdr_actions action;
    char *header;
    char *value;
} header_entry;
This marks a use of Apache's polymorphic (via void *) array routines. Insert the following macro declaration to tell CCured about this array type:

NEW_TABLE_TYPE(header_entry, header_entry)      // macro
This macro declares a new type, array_header_FOO (where FOO is the first argument) that is a specialized version of the Apache-provided type array_header. Other data structures (for example, the configuration record structure) will contain array_headers. We change them to use this new datatype. Change all declarations like:

typedef struct {
    array_header *headers;
} headers_conf;
into:

typedef struct {
    array_header_header_entry *headers;
} headers_conf;
Now search for every call to ap_make_array, ap_append_arrays, ap_push_array and append FOO (in our running example, header_entry) to the name of each called function. For example, change:

new = (header_entry *) ap_push_array(dirconf->headers);
into:

new = (header_entry *) ap_push_array_header_entry(dirconf->headers);
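The idea behind such a specializing macro can be sketched in self-contained C. This is not the definition of NEW_TABLE_TYPE that ships with CCured: the macro SKETCH_TABLE_TYPE, the simplified header layout and the function sketch_push_array_FOO are all assumptions, chosen only to show how a per-element-type header removes the void * casts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a macro like NEW_TABLE_TYPE: it generates a
   typed array header and a typed push function for one element type. */
#define SKETCH_TABLE_TYPE(NAME, ELT)                                   \
    typedef struct {                                                   \
        int nelts;                                                     \
        int nalloc;                                                    \
        ELT *elts;                                                     \
    } array_header_##NAME;                                             \
    static ELT *sketch_push_array_##NAME(array_header_##NAME *a) {     \
        if (a->nelts == a->nalloc) {                                   \
            int n = a->nalloc ? 2 * a->nalloc : 4;                     \
            ELT *grown = realloc(a->elts, (size_t)n * sizeof(ELT));    \
            if (grown == NULL) return NULL;                            \
            a->elts = grown;                                           \
            a->nalloc = n;                                             \
        }                                                              \
        return &a->elts[a->nelts++];                                   \
    }

/* Element type from the running example; action is simplified to int. */
typedef struct {
    int   action;
    char *header;
    char *value;
} header_entry;

/* Generates array_header_header_entry and
   sketch_push_array_header_entry. */
SKETCH_TABLE_TYPE(header_entry, header_entry)
```

Because the push function already returns a header_entry *, the caller no longer needs a cast from an untyped pointer, which is the property the inference can take advantage of.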
Next, surround all global table declarations with #pragmas that tell CCured to leave them alone (because Apache must read them). Often there are three such global tables per module. One is an array of const command_rec structures, one is an array of const handler_rec structures, and the last is a module. Surround them all with #pragmas as follows:

static const handler_rec mod_gzip_handlers[] =
{
    {"mod_gzip_handler", mod_gzip_handler},
    {CGI_MAGIC_TYPE,     mod_gzip_handler},
    {"cgi-script",       mod_gzip_handler},
    {"*",                mod_gzip_handler},
    {NULL}
};
becomes:

#pragma box(off)
static const handler_rec mod_gzip_handlers[] =
{
    {"mod_gzip_handler", mod_gzip_handler},
    {CGI_MAGIC_TYPE,     mod_gzip_handler},
    {"cgi-script",       mod_gzip_handler},
    {"*",                mod_gzip_handler},
    {NULL}
};
#pragma box(on)
Finally, change the declaration of the global module to use the new specialized type we created earlier. For example, change:

module MODULE_VAR_EXPORT urlcount_module = {
into:

module_urlcount MODULE_VAR_EXPORT urlcount_module = {
Voilà.

C.3.3  Linking Apache Modules

Suppose you have just finished making the source modifications to mod_foo.c. Now you want to test it on Apache. Use CCured to compile it to mod_foo.o. Make sure that there are no WILD pointers and that the sizes of types involved in the apache-module interface did not change. Now you must link it:
    $ gcc -shared -o mod_foo.so mod_foo.o
    $ cp mod_foo.so /path/to/apache/bin/
Now go to your Apache binary directory and edit httpd.conf. Go to the LoadModule section and add something appropriate according to the documentation for your module. For example, mod_usertrack can be configured by adding:
LoadModule usertrack_module bin/mod_usertrack.so
CookieTracking On
CookieExpires "1 days"
Now try to start Apache:
    $ ./apachectl stop
    $ ./apachectl start
    ./apachectl start: httpd started
If you see the “httpd started” message, it worked. If there were messages about undefined symbols, you probably have to write a few wrappers. For example, you might see:
    ./httpd: undefined symbol strdup_ff: mod_foo cannot be loaded
In this case you must write a wrapper for strdup that uses FSEQ pointers. Suppose you write it in wrapper_foo.c and compile that to wrapper_foo.o. Now go back to the linking step:
    $ gcc -shared -o mod_foo.so mod_foo.o wrapper_foo.o
    $ cp mod_foo.so /path/to/apache/bin/
And try to start Apache again. Eventually this process converges (you can skip ahead by using a utility like nm to list all of the undefined symbols in mod_foo.so if you like) and your Apache module will be up and running.
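As a rough illustration of what a wrapper such as strdup_ff has to do, the sketch below checks its argument against the bounds it carries and returns a result with fresh bounds. The fseqp_char layout follows the fseqp_Foo example in Appendix A, but the function name strdup_ff_sketch, the calling convention and the error behaviour (returning a null result instead of stopping the program) are assumptions; consult ccuredlib.c for the real conventions. The copy is made with malloc and memcpy so that the sketch stays within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical FSEQ pointer to char: the pointer plus its upper bound. */
typedef struct {
    char *_p;
    void *_e;
} fseqp_char;

/* Duplicate the string s points to, after verifying that it is
   NUL-terminated inside its bounds. */
static fseqp_char strdup_ff_sketch(fseqp_char s) {
    fseqp_char r = { NULL, NULL };
    if (s._p == NULL || (char *)s._e <= s._p) return r;

    /* Look for the terminator without reading past the upper bound. */
    size_t avail = (size_t)((char *)s._e - s._p);
    const char *nul = memchr(s._p, '\0', avail);
    if (nul == NULL) return r;

    size_t len = (size_t)(nul - s._p);
    char *copy = malloc(len + 1);
    if (copy == NULL) return r;
    memcpy(copy, s._p, len + 1);

    /* The result carries the bounds of the new allocation. */
    r._p = copy;
    r._e = copy + len + 1;
    return r;
}
```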

If for some reason your Apache module crashes at run-time, consider using the underlying CIL --logcalls mechanism to track down the error (Apache modules do not work well with normal debuggers). Make sure that the debugging comments are directed to syslog(3) rather than printf(3) or somesuch.

As daunting as it may seem, it actually takes less than 30 minutes to cure an Apache module of average size and get it up and running. Some of that time is spent reading the module's documentation so that it can be loaded and tested correctly. Good luck!

Appendix D  Using the Regression Tester

The regression tester is a program that allows you to do two things: run the tests, and analyze the output of a run. Since the running of the tests and the analysis of the output are separated, you can easily do things like compare the results of multiple runs or extract various reports from a single output (like which tests have succeeded and which have failed, plus such information split by test groups). You can also extract some data from each test (such as the running time) and make simple reports.

Test cases can have comments associated with them (such as reminders of why it fails) and can be associated with zero or more test groups.

D.1  Running the regression

The regression tester is implemented in Perl as "testsafec.pl", which in turn contains simple wrappers for functions provided by the more generic RegTest.pm.

The regression tester uses relative paths so it must be run in the cil/test directory.

The basic command for running the tests is "testsafec --run". This runs all of the test cases, saving the log in the file "safec.log". Before creating this file, it renames previous versions of this file as "safec.log.<n>", where n is an integer from 1 up to a configurable maximum.

The following command line options are useful for running the tests (see "testsafec --help" for a complete list):
 --one <testname>       : runs only the named test
 --gory                 : shows lots of details about the execution of the
                          test, such as the commands executed
 --dryrun               : only pretends to run the test. Useful to see what
                          would be run
 --log                  : select the base name of the log file (default
                          "safec.log")
 --logversions <n>      : keep logs up to version <n>. Default is 5.
 --noremake             : runs the commands without trying to remake the safec
                          compiler before each test. Useful if you want to 
                          work on the compiler while the tests are running
 --safecdebug           : uses the DEBUG version of the safec compiler and
                          uses the C compiler in debug mode. By default it
                          uses the RELEASE version and the optimizing compiler.



 --group <groupname>    : adds all the tests in the named group to the list of 
                          tests to be run or to participate in the analysis of
                          the log. (Right now we have groups: apache,
                          bad, cil, box, infer.) If no such option is
                          specified then all tests are selected. Multiple such
                          options can be given and are cumulative. 
 --nogroup <groupname>  : excludes the tests in the named group from running
                          or from the analysis. Multiple such options can be
                          given and are cumulative. These options are
                          processed after all --group options have 
                          been processed. 

 --listtests            : list the tests that are enabled along with their
                          group membership. This is useful to find out what
                          tests and groups exist.

 --stoponerror          : stop at the first error
 --showoutput           : show the output on the console. Normally output is
                          saved in a file.  
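The interaction of --group and --nogroup can be summarized by a small selection predicate. This is a sketch of our reading of the option descriptions above, not code from RegTest.pm (which is written in Perl); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name occurs in list. */
static int in_list(const char *name, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0) return 1;
    return 0;
}

/* A test is selected if no --group option was given or it belongs to at
   least one included group, and it belongs to no excluded group.
   Exclusions are applied after all inclusions. */
static int test_selected(const char *const *test_groups, size_t ngroups,
                         const char *const *include, size_t ninclude,
                         const char *const *exclude, size_t nexclude) {
    int selected = (ninclude == 0);
    for (size_t i = 0; i < ngroups && !selected; i++)
        selected = in_list(test_groups[i], include, ninclude);
    for (size_t i = 0; i < ngroups && selected; i++)
        if (in_list(test_groups[i], exclude, nexclude)) selected = 0;
    return selected;
}
```

For example, a test in the groups cil and bad is selected by "--group cil --group box", but dropped again if "--nogroup bad" is also given.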

D.2  Analyzing the results

The basic command for analyzing log files is "testsafec". This will prompt the user to select one of the several log files that exist in the current directory and then (by default) it will print a list of the failed test cases, with a short (user provided) comment and the last error message detected in the output for that test case.

The following commands are useful during analysis:
 --log                  : select the log file (see above)
 --group, --nogroup     : select groups (see above)
 --listtests            : list tests and groups (see above)
 --param=<pnames>       : show a report about the successes, with the columns
                          being the named parameters (separated by ,). Run
                          "testsafec --help" to see what parameters are
                          available. Use --param=ALL to make a report with 
                          all available parameters.
 --sort=<pnames>        : sorts the report by the given parameters. 
Furthermore, the reports that are generated can be compared. To compare the results of two runs (whose logs are, say, “safec.log.1” and “safec.log.2”), run
 test/compare safec.log.2 safec.log.1 --group=slow
This will compare the results of safec.log.2 using as reference the results in log safec.log.1.

D.3  Configuring the regression

For this you have to edit testsafec.pl. You will see a large section in the middle of the file containing lines like:
$TEST->add3Tests("test/array1");
(check out the definition of add3Tests at the bottom of the file). This adds three tests named "test/array1-cil", "test/array1-box" and "test/array1-inferbox", each one containing one command that invokes "make test/array1 ...", where ... are appropriate parameters.

A second optional string parameter to add2Tests is something to be added to the command line.

A third optional array parameter is a list of patterns to be used in scanning the output of the test cases. This is an advanced feature and you are on your own.

To add just one test do (as in the body of add3Tests):
    $TEST->newTest(Name => "mytestname",
                   Dir => "..",
                   Cmd => "make something",
                   Enabled => 1,
                   Comm => "Print this along with the test name",
                   Group => ["cil", "othergroup"],
                   Patterns => \%mypatterns);
Sometimes you might want to add just a comment or to add one group to a certain test. Use the following simple functions:
 $TEST->addGroups("mytestname", "group1", "group2");
 $TEST->addComment("mytestname", "Another line of comment");
There are some wrappers defined at the end of testsafec.pl:
   $TEST->add3BadComment("test/scope3", "missing prototype");
(this one adds a comment and the group "bad" to all three test cases)
  $TEST->addBadComment("li-box", "bug in box.ml");
(the same but just for one test)
  $TEST->enable("li-box", 0);   (disable the li-box test case)
For more advanced customization, read the Perl code. It is fairly easy to understand, especially testsafec.pl.

D.4  An alternative interface

A simpler (but somewhat less powerful) interface to add tests is provided by the smAddTest and smFailTest functions.

Suppose you want to add a test which runs “make sometarget someoptions”. Just say
  smAddTest("sometarget someoptions");
somewhere after smAddTest is defined.

If at some point this test stops working for somereason (just a human-readable one-line description of what's wrong), and you want the regression tester to expect failure, change it to read
  smFailTest("somereason", "sometarget someoptions");
There are also facilities for associating a diagnosis with a failure (for example, if failure of a particular test is often due to a known, fixable cause), and for running a shell script upon completion of a test. See the scott/tprintf and scott/ptrkinds tests for examples of each.

Finally, there is a script called regrtest in the base cil directory which runs testsafec.pl with a preset group of options. To use this, just say “./regrtest” in the cil directory. It accepts the “-help” option.

D.5  The Automated Regression Tester

Every time you commit a change to the CIL or CCURED repositories you trigger two regression tests. The first is called the QUICK test; it takes about 5 minutes to complete, and you should receive email with the results. If you don't, then the automated regression tester is either not running or has encountered a problem that it cannot fix.

The automated regression tester is implemented as a script mk-reports.pl and lives in the home directory of user regtest on manju. To see whether the tester is running run ps -Af and look for a line perl mk-reports.pl -daemon. If you do not see any let me know.

Here are the operations that are performed by the tester:
  1. A new directory is created under /home/regtest/cil.nightly. The name of the directory encodes the time of the commit (e.g. 2001-12-07_17_50_-0800.dir is the directory for the commit at 17:50 on 07/12/2001 in the timezone that is 8 hours behind UTC).
  2. A complete copy of CIL and CCURED is checked out in that directory and make setup is run in there.
  3. Then in the test directory we run testsafec but only on the groups that are known to terminate quickly. This run takes about 3 minutes. A copy of the safec.log file generated is saved as /home/regtest/cil.nightly/2001-12-07_17_50_-0800.safec.quick.log.
  4. Then we run testsafec again to produce a report with all available parameters. This report is saved as /home/regtest/cil.nightly/2001-12-07_17_50_-0800.report.quick.txt. In the event that any of the previous steps fail this file will be created but with zero length.
  5. Then we compare this report both with the previous commit and with a reference report and a message is sent to the user who performed the commit. This report is also saved as /home/regtest/cil.nightly/2001-12-07_17_50_-0800.msg.quick.txt.
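The directory naming used in step 1 can be reproduced with a short formatting routine. This is a sketch in C, whereas mk-reports.pl itself is a Perl script; the function name commit_dir_name is hypothetical, and the use of a leading "+" for zones ahead of UTC is an assumption, since only a negative offset appears in the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a directory name such as 2001-12-07_17_50_-0800.dir.
   tz_minutes is the offset from UTC in minutes (e.g. -480 for UTC-8).
   Returns 0 on success, -1 if buf is too small. */
static int commit_dir_name(char *buf, size_t len,
                           int year, int month, int day,
                           int hour, int minute, int tz_minutes) {
    int off = abs(tz_minutes);
    int n = snprintf(buf, len, "%04d-%02d-%02d_%02d_%02d_%c%02d%02d.dir",
                     year, month, day, hour, minute,
                     tz_minutes < 0 ? '-' : '+', off / 60, off % 60);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Since the fields run from year down to minute, sorting the directory names alphabetically also sorts them by commit time as long as the zone offset is the same.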
The automated regression tester should be running continuously and monitoring the commits every 60 seconds. However, between midnight and 6am it alternates between the QUICK run described above and a SLOW run that processes those tests that the QUICK run does not do. A SLOW run might take anywhere between 15 and 30 minutes. The SLOW report is generated using the same checked-out repository as the QUICK run. Results of the regression test, reports and messages for the SLOW run are saved in similar files as for the QUICK run except that the word “quick” is replaced by “slow” in the name of those files.

If the QUICK report is zero-length then the SLOW run is not performed. This is at the moment the only way to prevent the SLOW run from happening.

If both the QUICK and SLOW reports exist then the directory containing the distribution can be deleted.
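The two rules above depend only on the state of the report files, so they can be checked mechanically. The following Python sketch assumes nothing beyond what the text states; the function names are hypothetical and the paths are supplied by the caller.

```python
import os


def slow_run_allowed(quick_report: str) -> bool:
    """The SLOW run is performed only if the QUICK report exists and is non-empty."""
    return os.path.isfile(quick_report) and os.path.getsize(quick_report) > 0


def checkout_can_be_deleted(quick_report: str, slow_report: str) -> bool:
    """The checked-out distribution may be removed once both reports exist."""
    return os.path.isfile(quick_report) and os.path.isfile(slow_report)
```

This sketch only reports whether deletion is permitted; it does not delete anything itself.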

The reference report is in /home/regtest/cil.nightly/reference.report.quick.txt. You can edit it manually to change the reference for the messages but please do so only to improve it.

You can start regtest manually (though this shouldn't be necessary): “sudo become-regtest” (only certain users can do this), then “perl mk-reports.pl”.

For each of your commits, a copy of the repository is checked out. This takes about 150MB of disk space on manju and 30 minutes' worth of regression testing. If you want to disable the SLOW run (because, for example, you made a silly mistake and are committing a fix right away) you have to manually delete the contents of the corresponding report.quick.txt file, without deleting the file itself.
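Emptying a report without removing it can be done from any editor or shell; as one possibility, here is a small Python sketch. The function name disable_slow_run is hypothetical, and the path of the QUICK report must be supplied by the caller.

```python
def disable_slow_run(quick_report: str) -> None:
    """Truncate the QUICK report to zero length while keeping the file in place."""
    # "r+" fails with FileNotFoundError if the report is missing, so a
    # mistyped path does not silently create a new empty file.
    with open(quick_report, "r+") as report:
        report.truncate(0)
```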

Appendix E  Debugging support

Most of the time we debug our code using the Errormsg module along with the pretty printer. But if you want to use the Ocaml debugger here is an easy way to do it. Say that you want to debug the invocation of CCured that arises out of the following command:
ccured --separate -c hello.c 
You must follow the installation instructions to install the Elisp support files for Ocaml and to extend your .emacs appropriately. Then, from within Emacs, run
ALT-X my-camldebug
This will ask you for the command to use for running the Ocaml debugger (initially the default will be “ocamldebug” or the last command you entered). Use the following command:
ccured --ocamldebug -c hello.c 
This will run ccured as usual and invoke the Ocaml debugger when the cilly engine starts. The advantage of invoking the debugger this way is that the directory search paths are set automatically and the right set of arguments is passed to the debugger.

For the make-based interface to our regression tests you can pass the argument OCAMLDEBUG=1 to make to achieve the same effect.

Appendix F  Experimental Features

To print the annotations that are needed for separate compilation, add --annout=foo.txt. Information about these annotations will be printed in that file.


This document was translated from LATEX by HEVEA.