
Chapter 7  How to Eliminate WILD Pointers

As explained in the tutorial, you can use the WILD pointer types to do most of the things that you can do with pointers in C. And, in fact, CCured's inferencer will turn some of your pointers into WILD pointers if you use them in unusual ways.

WILD pointers are bad. Every time you access them you have to also access the tags. And what makes them really annoying is that they spread very quickly. Even a few bad casts in your program can lead to a contamination of 30% of the pointers with WILDness. And that means that you'll have to write lots of wrappers, and hard ones. (In fact, the support that we provide for writing wrappers does not work in all cases in the presence of WILD pointers.)

So, we recommend that you take a look at the warnings and messages that CCured gives and try to address the bad casts. In this chapter, we describe a few tricks that you can use to change the code, and a few features that CCured has to help you do that.

First, a few notes: When it notices bad casts, CCured will print something like this:
** 1: Bad cast at cdb_make.c:36 (char  *510 ->struct cdb_hplist  *1376)
** 2: Bad cast at pathexec_env.c:42 (char  *510 ->char */* __NODE(2537)  */ *2538)
** 3: Bad cast at pathexec_env.c:67 (char */* __NODE(2537)  */ *2538 ->char  *2553)
** 4: Bad cast at sig.c:12 (void (int  ) *2695 ->void () *2694)
** 5: Bad cast at sig_catch.c:9 (void () *673 ->void (int  ) *2711)
ptrkinds: Graph contains 4383 nodes
ptrkinds:   SAFE - 3142 ( 72%)
ptrkinds:   SEQ - 15 (  0%)
ptrkinds:   FSEQ - 127 (  3%)
ptrkinds:   WILD - 1099 ( 25%)
535 pointers are void*
5 bad casts of which 0 involved void* and 2 involved function pointers
1 (20%) of the bad casts are downcasts
0 incompatible equivalence classes
This means that there are 5 bad casts (which contaminate 25% of your pointers). There are no incompatible equivalence classes in this case.

You can either go directly to the line numbers at which the bad casts are reported, or you can use the browser (Section 5.1).

Bad casts number 4 and 5 in the example above are clear indications that there are some incomplete function types in your program. Go and add the argument types.
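As a rough illustration of that fix, the sketch below uses a fully prototyped handler type everywhere. The names handler_t, record_signal, catch_signal, raise_signal and last_recorded are hypothetical and do not come from the program in the report; the point is only that once every declaration spells out the (int) argument, no cast between void () and void (int) is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Incomplete form that provokes the bad cast:
 *     void catch_signal(void (*f)());
 * Complete form: every declaration spells out the argument types. */
typedef void (*handler_t)(int);     /* hypothetical name */

static handler_t installed = NULL;  /* currently installed handler */
static int last_signal = 0;         /* last signal number delivered */

/* Example handler with the full prototype void (int). */
void record_signal(int sig) { last_signal = sig; }

/* Store the handler; no cast is needed because the types agree. */
void catch_signal(handler_t f) { installed = f; }

/* Deliver a signal number to the installed handler. */
void raise_signal(int sig) {
  assert(installed != NULL);
  installed(sig);
}

/* Report the last signal number seen by record_signal. */
int last_recorded(void) { return last_signal; }
```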

The other bad casts are due to an undeclared memory allocator. After we fix those we rerun and we get:
ptrkinds: Graph contains 4575 nodes
ptrkinds:   SAFE - 3324 ( 73%)
ptrkinds:   SEQ - 41 (  1%)
ptrkinds:   FSEQ - 150 (  3%)
ptrkinds:   WILD - 1060 ( 23%)
579 pointers are void*
0 bad casts of which 0 involved void* and 0 involved function pointers
No bad casts, so no downcasts
2 incompatible types flow into node void  *518
  Type char */* __NODE(2549)  */ *2550 at pathexec_env.c:67
  Type char  *102 at dns_transmit.c:63
2 incompatible equivalence classes
Notice that we have more pointers in the program. This is due to the allocator, which is now polymorphic and is duplicated several times. But we also have incompatible equivalence classes. This is because there is a void * pointer that is used with several incompatible types (in this case char * and char **). See Section 7.1 for more details on this.

7.1  Polymorphism

Polymorphism is the ability of a program fragment to operate on data of different types. This is a useful thing to be able to do, and since C does not have special support for it, each programmer implements polymorphism by extensive use of casting. But not all casts are equal. Consider for example a function that just returns its argument:

int identity_bad(int x) { return x; }
This function can be used with any type that fits in an integer, provided the appropriate casts from the type to int and back are inserted. But as we have already discussed in Section 9.4 this won't work in CCured because the pointers you get out are not usable.

A better way to do this is as follows:

void* identity(void* x) { return x; }
It is a common paradigm in C to use void* for a “pointer to I don't know what” type. CCured supports this view directly by considering each use of void * in the program as an occurrence of an unknown type. The CCured inferencer will try to find a replacement type that makes sense in that context. For example, in the following code fragment CCured will think of both occurrences of void * as actually being int * *.

void* identity(void* x) { return x; }

int main() {
    int * * p = 0;
    int * * res = identity(p);
}

This model works for even very complicated code, such as the following fragment that defines a function apply which applies a function pointer to some arguments (see in the output that all pointers are inferred SAFE):

// Applies a function to an argument
void * apply(void* (*f)(void*), void *arg) {
   return f(arg);
}

// A simple dereference function
int * deref(int * * addr) {
    return *addr;
} 

int  main() {
     int * x = 0;
     int * res = apply(deref, & x);
}

In the above example there are four occurrences of void * in the definition of apply. Based on the actual usage of apply the first two are mapped to int * and the latter two are mapped to int * *.

This very flexible scheme breaks down when you have inconsistent usage of a void * type, such as in the following code:

void* identity(void* x) { return x; }

int main() {
    int * p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}

In the above code the identity function is used with both int * and int ** arguments. Since CCured cannot find any single non-WILD type that is compatible with all contexts in which the void * is used, it infers that the type of the void * argument is WILD. And since the argument is assigned to the result (implicitly, through the return statement), the result type is also WILD. (You can use the browser to see all the different incompatible types that “flow” into a void *.) It seems that we need a way to tell CCured to treat the two invocations separately.

CCured has a crude but effective mechanism for doing just that. First, you have to tell CCured that a function is polymorphic:

#pragma ccuredpoly("identity")
(You can list multiple names in one ccuredpoly pragma. The pragma can appear anywhere in your program.)

If you tell CCured that a function is polymorphic it will take the following steps:
  1. For each call site of the function, CCured will create a copy of the function and it will assign it the name /*15*/identity, where the number 15 is a running counter to ensure that the names are different.
  2. Then it will perform the usual inference in which case each copy of the identity function is used only once.
  3. Finally, for each combination of pointer kinds in the various flavors of identity CCured will keep one copy and erase all the others.
Consider as an example the code from above, in which all pointers are now SAFE. The output contains calls to /*1*/identity and /*2*/identity but since they both have the same pointer kinds for the arguments and results, only the body of /*1*/identity is kept:

#pragma ccuredpoly("identity")
void* identity(void* x) { return x; }

int main() {
    int * p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}
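The duplication described above can be pictured as monomorphization done by hand. The sketch below is only an illustration in plain C: identity_1 and identity_2 are stand-ins for the /*1*/identity and /*2*/identity copies, the concrete argument types are the ones each call site would suggest, and use_both is a hypothetical driver.

```c
#include <assert.h>
#include <stddef.h>

/* Copy made for the call identity(&p): every void * stands for int **. */
int **identity_1(int **x) { return x; }

/* Copy made for the call identity(p): every void * stands for int *. */
int *identity_2(int *x) { return x; }

/* Each call site uses its own copy, so no incompatible types meet. */
int use_both(void) {
  int value = 42;
  int *p = &value;
  int **res_pp = identity_1(&p);
  int *res_p = identity_2(p);
  assert(res_pp == &p && res_p == p);
  return **res_pp + *res_p;
}
```

In the cured output both copies end up with the same pointer kinds, which is why only one body needs to be kept.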

If the copies of the polymorphic function do not all have the same pointer kind then multiple definitions are kept, as in the code below where we have both a SAFE and a WILD copy of the identity function:

#pragma ccuredpoly("identity")
void* identity(void* x) { return x; }

int main() {
    int * __WILD p = 0;
    int * * res_pp = identity(& p);
    int * res_p    = identity(p);
}

Polymorphic types
A similar mechanism is also available for types. You can add in the arguments of the ccuredpoly pragma strings like "struct list" to say that a copy of struct list must be created for each occurrence in the program. The inference will then find out which of the copies have to be compatible and at the very end will keep only one copy for each kind. Note however that this form of polymorphism does not have any run-time cost because only types are duplicated. It will however slow down the CCured type inference.

Note: If the polymorphism directives do not seem to take effect, pass the -verbose option to ccured to see how it parses them.

For example, here is how you would write polymorphic list length:

#pragma ccuredpoly("length", "struct list")
struct list {
   void *car;
   struct list *cdr;
};

int length(struct list *l) {
  int i = 0;
  // Walk the cdr links, counting the cells
  for(; l; i ++, l=l->cdr) ;
  return i;
}

int main() {
    struct list list_of_int = { 5, 0 };
    struct list list_of_wild_ptr = { (int * __WILD)5, 0 };
    struct list wild_list = { 5 , (struct list * __WILD)0 };

    int l1 = length(& list_of_int);
    int l2 = length(& list_of_wild_ptr);
    int l3 = length(& wild_list);
}

You can see in the browser information that the references to struct list have been replaced with separate names such as struct /*45*/list.

In the case of recursive structures (whose name is referred to directly or indirectly in the types of the fields), the fields use the same version of the structure as the structure itself.

CCured has polymorphism for types and for functions because those are the entities that can be copied legally in C. There is no similar polymorphism for data variables, nor should there be.

If you have a type name for a polymorphic structure, then CCured will replace all occurrences of the type name with a reference to the structure itself, meaning that each use of the type name gets its own independent copy.

7.2  User-defined memory allocators

If your program has a user-defined memory allocator that is used to allocate data of different types, then its return type will be WILD and so will all of the pointers you store in the allocated area. Declaring such a function to be polymorphic will likely not help because the function is probably using a global data structure (the allocation buffer) that is shared by all polymorphic copies of the function.

CCured allows you to declare a function to be a user-defined memory allocator using one of the following pragmas:

#pragma ccuredalloc("myfunc", <zerospec>, <sizespec>)
<zerospec> ::= zero | nozero
<sizespec> ::= sizein(k) | sizemul(k1, k2)
The zero argument means that the allocator zeroes the allocated area. Otherwise CCured will zero it itself, if it contains pointers. The sizein(k) argument means that the allocator is being passed the size (in bytes) of the area to be allocated in argument number k (counting starts at 1). The sizemul(k1, k2) argument means that the allocator allocates a number of bytes equal to the product of the arguments number k1 and k2.

For example the following are the pragmas for the standard library allocators malloc and calloc:

void* malloc(unsigned int size);
#pragma ccuredalloc("malloc", nozero, sizein(1))
void* calloc(unsigned int nr_elems, unsigned int size);
#pragma ccuredalloc("calloc", zero, sizemul(1, 2))
A memory allocator should have return type void *. In the pre-ANSI C days allocators were written with the type char *. Once you declare a function to be an allocator, its return type will be changed to unsigned long. At all call sites CCured will examine what kind of data is being allocated and will construct the metadata for it.

Note that declaring a function to be an allocator also has the effect of making it polymorphic. This means that CCured will create as many copies of your allocator as you have allocation sites. (After curing, however, only copies with distinct calling conventions will be kept.)

Note that when you declare a custom-memory allocator as such, CCured will trust that you are not going to re-use the memory area that you return. This means that you can use this feature to write unsafe programs in CCured. The following program will succeed in trying to dereference the address 5!

#pragma ccuredalloc("myalloc", zero, sizein(1))
int data[8];
void* myalloc(int sz) {
  return data;
}
int main() {
 int ** p = (int **)myalloc(8);
 data[1] = 5; 
 return *p[1]; // Will dereference 5 !!!
}

Most often the custom-memory allocators are just wrappers around the system malloc. In that case there is no danger of unsoundness.

Note also that CCured relies on the fact that the result of a custom memory allocator is assigned to a variable of the right type. It is from the type of the destination of the allocation, or from the type cast with which the allocator is used, that CCured knows what kind of metadata to create.
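A minimal sketch of the common case, a pair of wrappers around the system allocators, might look as follows. The names xmalloc, xcalloc and sum_zeroed are hypothetical. The #ifdef CCURED guard is an assumption made here so that an ordinary compiler does not see the pragmas; check how your CCured installation identifies itself before relying on it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void *xmalloc(size_t size);
void *xcalloc(size_t nr_elems, size_t size);

/* Assumption: CCURED is defined only when CCured processes the file,
 * so an ordinary compiler never sees the pragmas. */
#ifdef CCURED
#pragma ccuredalloc("xmalloc", nozero, sizein(1))
#pragma ccuredalloc("xcalloc", zero, sizemul(1, 2))
#endif

/* Wrapper around malloc that aborts instead of returning NULL. */
void *xmalloc(size_t size) {
  void *p = malloc(size);
  if (p == NULL) abort();
  return p;
}

/* Wrapper around calloc; the memory comes back zeroed. */
void *xcalloc(size_t nr_elems, size_t size) {
  void *p = calloc(nr_elems, size);
  if (p == NULL) abort();
  return p;
}

/* The result is assigned to a typed pointer, so the element type
 * is visible at the allocation site. */
int sum_zeroed(size_t n) {
  assert(n > 0);
  int *a = xcalloc(n, sizeof *a);
  int sum = 0;
  for (size_t i = 0; i < n; i++) sum += a[i];
  free(a);
  return sum;
}
```

Because the wrappers never hand out the same memory twice while it is live, they do not introduce the unsoundness shown in the myalloc example.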

7.3  Pointers with Run-Time Type Information

There are many C programs in which void * pointers are used non-parametrically. An example is a global variable (of type void *) that is used to store values of different types at different times. Consider for example the following code, where CCured is forced to infer that the g pointer has kind WILD because the struct foo and struct bar are incompatible:

struct foo { 
  int f1;
} gfoo;

struct bar {
  int * f1;
  int f2;
} gbar;

void * g;

int main() {
  int acc = 0;
  g = (void *)&gfoo; 
  acc += ((struct foo *)g)->f1;
  g = (void *)&gbar; 
  acc += ((struct bar *)g)->f2;
  return acc;
}

In this example g is used polymorphically but not in a way that could be handled through our support of polymorphism. (This form of polymorphism is called non-parametric polymorphism.) CCured will consider the casts on g as bad and will mark those pointers WILD.

CCured contains special support for handling such cases, by tagging the polymorphic values with information about their actual type. To enable this behavior you must use the RTTI pointer kind qualifier on the polymorphic pointer. Consider again the example from before but with an RTTI annotation:

struct foo { 
  int f1;
} gfoo;

struct bar {
  int * f1;
  int f2;
} gbar;

void * __RTTI g;

int main() {
  int acc = 0;
  g = (void *)&gfoo; 
  acc += ((struct foo *)g)->f1;
  g = (void *)&gbar; 
  acc += ((struct bar *)g)->f2;
  return acc;
}

If you use the browser, you will see that there are no more bad casts and no WILD pointers in this example. If you also look at the CCured output for the above example you will see that instead the g variable is now represented using two words, one to store its value and another to store the actual type of the pointer it contains. This type is created when g is assigned to and is checked when g is used.
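The two-word representation can be modelled in plain C as a pointer paired with a type tag. This is a conceptual sketch only, not the code CCured generates: the names rtti_ptr, type_tag, from_foo, from_bar, as_foo, as_bar and run_example are hypothetical, and a failed check returns NULL here where a real run-time check would stop the program.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int f1; };
struct bar { int *f1; int f2; };

/* Hypothetical tags, one per type that can be stored. */
enum type_tag { TAG_NONE, TAG_FOO, TAG_BAR };

/* Two words: the pointer value and a record of its actual type. */
struct rtti_ptr {
  void *value;
  enum type_tag tag;
};

/* The tag is created when the pointer is assigned. */
struct rtti_ptr from_foo(struct foo *p) {
  struct rtti_ptr r = { p, TAG_FOO };
  return r;
}

struct rtti_ptr from_bar(struct bar *p) {
  struct rtti_ptr r = { p, TAG_BAR };
  return r;
}

/* The tag is checked when the pointer is used. */
struct foo *as_foo(struct rtti_ptr r) {
  return r.tag == TAG_FOO ? (struct foo *)r.value : NULL;
}

struct bar *as_bar(struct rtti_ptr r) {
  return r.tag == TAG_BAR ? (struct bar *)r.value : NULL;
}

/* Mirrors main from the example above, with the checks made explicit. */
int run_example(void) {
  struct foo gfoo = { 1 };
  struct bar gbar = { NULL, 2 };
  int acc = 0;
  struct rtti_ptr g = from_foo(&gfoo);
  assert(as_bar(g) == NULL);   /* the wrong type is rejected */
  acc += as_foo(g)->f1;
  g = from_bar(&gbar);
  acc += as_bar(g)->f2;
  return acc;
}
```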

CCured can work with run-time type information only for certain pointer types. We call such types extensible, and for each extensible type we also construct a name. Specifically, the extensible types include void, structure types (named with an S prefix, such as "Sfoo" for struct foo), and typedefs declared with __NOUNROLL (named with a T prefix, such as "TMY_INT" for MY_INT).

RTTI pointers can be created only by casting from a scalar or from a SAFE pointer to an extensible type, and they can be cast only to scalars and to SAFE pointers to an extensible type. In the example above, struct foo and struct bar are extensible types, and we can cast pointers to these structs to void * __RTTI and back.

CCured also supports the RTTI pointer kind on pointers whose base type is different from void. Consider the following example:

struct foo {
   int *f1;
   int  f2;    
} gfoo;

struct bar {
   int *f3;
   int  f4;
   int  f5;
} gbar;

#pragma ccured_extends("Sbar", "Sfoo")

struct foo * __RTTI g;

int main() {
  int acc = 0;
  g = (struct foo *)&gfoo; 
  acc += g->f2;
  g = (struct foo *)&gbar; 
  acc += g->f2;
  acc += ((struct bar *)g)->f5;
  gfoo.f1 ++; // To make foo.f1 and bar.f3 both FSEQ pointers
  return acc;
}

Notice that the RTTI pointer kind is used with the base type struct foo. An RTTI pointer is strictly more powerful than a SAFE pointer of the same base type. This means that g in the code above can be used to access the fields f1 and f2 without any overhead. This is because CCured enforces the requirement that an RTTI pointer of base type T contains only pointer values whose base type extends T. The extension relationship is a subset of the physical subtyping relationship: we say that type T extends type Q if:

The ccured_extends pragmas use extensible type names to declare an extension hierarchy (similar to a single-inheritance class hierarchy) in which void is the top. Note that only extensible types can appear in the hierarchy and that an extensible type can appear at most once on the left side of a ccured_extends pragma. An RTTI pointer can contain values that are pointers to some extensible base type that extends that of the RTTI pointer itself.

The RTTI pointer kind can be applied only to base types that are either void or non-leaf in the extension hierarchy.

For example, in the following code
struct foo { int x; };
struct bar { int y; int z; };
typedef int MY_INT __NOUNROLL;
#pragma ccured_extends("Sbar", "Sfoo")
#pragma ccured_extends("Sfoo", "TMY_INT")
we can use the RTTI pointer kind for struct foo * and MY_INT * but not for struct bar *. Notice that physical subtyping is respected in all declared extension relationships.

The inferencer will spread the RTTI pointer kind backwards through assignments, but only on pointers that can be RTTI. If you want to cut short the propagation of the RTTI pointer kind you can use the SAFE pointer kind.

To summarize, RTTI pointers can be used with the following constraints:

Interestingly enough, the RTTI pointer kind can be used to implement virtual method dispatch in a type-safe way, as shown in the example below:

typedef struct parent {
  void * __RTTI * vtbl; // virtual table, with various types of functions
  int  *f1;             // some field
} Parent;

#pragma ccured_extends("Schild", "Sparent")

typedef struct child {
  void * __RTTI * vtbl;
  int  *f2;
  int   f3;
} Child;

// virtual method foo for class P
// notice that the self parameter is an RTTI. It must 
// be of base type void to ensure that foo_P and foo_C have the 
// same type
int* foo_P(void * __RTTI self_rtti, Parent *x) {
  Parent * self = (Parent *)self_rtti; // downcast
  return self->f1;
}

// virtual method bar for class P
int * bar_P(void * __RTTI self_rtti) {
  Parent * self = (Parent *)self_rtti;
  return self->f1;
}

int* foo_C(void * __RTTI self_rtti, Parent *x) {
  Child * self = (Child *)self_rtti;
  return self->f2 + self->f3;
}

// Name the types of the virtual methods, to make them extensible
typedef int * FOO_METHOD(void *, Parent *) __NOUNROLL;
typedef int * BAR_METHOD(void *) __NOUNROLL;

// Now the virtual tables
void * vtbl_P[] = { (void*) (FOO_METHOD *)foo_P,
                    (void*) (BAR_METHOD *)bar_P };


// child inherits bar_P
void * vtbl_C[] = { (void*) (FOO_METHOD *)foo_C,
                    (void*) (BAR_METHOD *)bar_P };


int array[8];

// Now the constructors
void ctor_P(Parent * p) {  p->vtbl = vtbl_P; p->f1 = array; }

void ctor_C(Child * c) {  c->vtbl = vtbl_C;  c->f2 = array;  c->f3 = 5; }

int main() {
  Parent p;
  Child c;
  Parent * pp = &p, * pc = &c;
  Child  * pc1;
      
  // Construct
  ctor_P(&p); ctor_C(&c);

  // Now try a downcast
  pc1 = (Child * __RTTI)pc;
  // Now invoke some virtual methods
  {
    FOO_METHOD *pfoo = (FOO_METHOD *) pp->vtbl[0];
    pfoo((void *)pp, pc);
    pfoo = (FOO_METHOD *) pc->vtbl[0];
    pfoo((void *)pc, pp);
   }
}

Notice the use of the __NOUNROLL typedefs for the function types.

7.3.1  Implementation Details

CCured collects all extensible types in your program (either those declared using the ccured_extends pragma or those that are used in casts to and from RTTI pointers) and constructs the extension hierarchy. An encoding of this hierarchy is dumped in the resulting code in the array RTTI_ARRAY. Each entry in the array corresponds to an extensible type and contains the difference between the index of the entry for its parent and the index of the current entry. The root of the extension hierarchy is always at index 0 and that entry contains 0. The function CHECK_RTTICAST walks this encoding to verify a cast from an RTTI pointer into a SAFE pointer or another RTTI pointer.
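The walk over such an encoding can be sketched as follows. This is an illustration under stated assumptions, not the actual CHECK_RTTICAST: the function name rtti_extends, the table name rtti_table, the sample hierarchy, and the sign convention (each entry holds the parent index minus the entry's own index) are all choices made here, and Sother is a hypothetical extra type.

```c
#include <assert.h>

/* Hypothetical hierarchy:
 *   index 0: void      (root, entry 0)
 *   index 1: TMY_INT   extends void     (0 - 1 = -1)
 *   index 2: Sfoo      extends TMY_INT  (1 - 2 = -1)
 *   index 3: Sbar      extends Sfoo     (2 - 3 = -1)
 *   index 4: Sother    extends void     (0 - 4 = -4)
 */
static const int rtti_table[] = { 0, -1, -1, -1, -4 };

/* Returns 1 if the type at index `from` is the type at index `to`
 * or extends it (directly or transitively), and 0 otherwise. */
int rtti_extends(const int *table, int from, int to) {
  int i = from;
  for (;;) {
    if (i == to) return 1;           /* found the target on the path */
    if (table[i] == 0) return 0;     /* reached the root without a match */
    assert(i + table[i] >= 0);       /* the parent index must be valid */
    i += table[i];                   /* step to the parent entry */
  }
}
```

A cast from an RTTI pointer whose run-time type has index 3 (Sbar) to a pointer to Sfoo would be accepted under this model, since rtti_extends(rtti_table, 3, 2) holds, while a cast from index 4 to Sfoo would be rejected.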

7.4  Specifying Trusted Code

In this section we describe a few mechanisms that you can use to override CCured's reasoning. These are powerful mechanisms but you can use them to write unsafe code.

7.4.1  Trusted casts

Occasionally there are casts in your program that are judged as bad, yet you know that they are sound and it is too inconvenient to change the program to expose the soundness to CCured. In that case, you can use the __trusted_cast built-in function. In the following example we know that the boxedint type can encode an integer (if odd) or a pointer to a boxedint if even. We could use RTTI pointers to encode this safely in CCured. Or, we can use a trusted cast:

typedef int boxedint; // If even, then a pointer to a boxedint
int unroll(boxedint x) {
  if(x & 1) return x;
  return unroll(* (int*)__trusted_cast(x));
}
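To make the encoding concrete, here is a plain C sketch of the same odd/even convention, written without CCured built-ins. It departs from the example above in two labelled ways: the hypothetical type boxed is an intptr_t so that it can hold a pointer on platforms where int is too narrow, and the hypothetical function unbox loops where unroll recurses. The sketch also assumes that objects of type boxed live at even addresses, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Odd value: an integer payload.
 * Even value: the address of another boxed value. */
typedef intptr_t boxed;   /* hypothetical; wide enough to hold a pointer */

/* Follow even (pointer) values until an odd (integer) value is found. */
intptr_t unbox(boxed x) {
  while ((x & 1) == 0) {
    assert(x != 0);       /* a null link would be a malformed box */
    /* Under CCured, this integer-to-pointer conversion is where
     * __trusted_cast would appear. */
    x = *(boxed *)x;
  }
  return x;
}
```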

CCured will not complain if the argument and result type of __trusted_cast are incompatible. However, it will ensure the following:

For example, in the following code, the variable q and the field f1 in struct foo are made FSEQ. The FSEQ constraint propagates back through __trusted_cast to p.

struct foo {
   int   * f1;
   int     f2;
};
struct bar {
   int   * f1; // This is FSEQ !
   int   * f2;
};
int main(struct bar * p) {
    struct foo * q = __trusted_cast(p);
    p->f1 ++;        // Make foo.f1 FSEQ
    return q[1].f2; // Make q FSEQ
}

If you look carefully at the above examples you will see one of the potential dangers of using __trusted_cast: you are on your own to ensure that the argument type and the result type match. In the above example, this is not true because the field f1 in struct bar is SAFE while the field f1 in struct foo is FSEQ!

If you want to prevent a pointer arithmetic operation from generating sequence pointers, you can use the __trusted_add function:

int foo(int *p) {
    int * q = __trusted_add(p, 4);
    return *q;
}

You can use a __trusted_cast to cast an integer into a pointer. This works as expected if the type of the resulting pointer is SAFE (as in the example with boxedint earlier in this section). But if it is FSEQ or SEQ then you will get exactly the same effect as if the __trusted_cast was not there: you will obtain a pointer with null metadata and thus unusable for memory dereference.

A better way to cast an integer (or a SAFE pointer) into a SEQ or FSEQ pointer is to use the __mkptr built-in function. This function takes as a second argument some other pointer whose metadata is used in constructing the result:

int g[8];
int main() {
  int * __SAFE pg = & g[2];
  int * __SEQ sg = __mkptr(pg, g); // We know that the home area of pg and g
                                   // are the same
  int pg1 = (int) & g[3];
  int * __SEQ sg1 = __mkptr(pg1, g);
  return sg[1] + sg1[1];
}

Another useful built-in function is __mkptr_size. It allows you to specify the size of the home area in which a pointer lives:

int g[8];
int main() {
  int * __SAFE pg = & g[2];
  // We know that there are at least 2 more integers after pg
  int * __SEQ sg = __mkptr_size(pg, 2 * sizeof(int)); 
  return sg[1];
}
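The role of the metadata in these two built-ins can be modelled in plain C. The sketch below is a conceptual model, not CCured's actual representation: the struct seq_ptr layout (value, start of home area, end of home area) and the names mk_like, mk_size, in_bounds and seq_read are assumptions made for illustration. In particular, mk_size assumes that the home area starts at the pointer itself and spans the given number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a sequence pointer: value plus home area bounds. */
struct seq_ptr {
  int *p;     /* the pointer value itself */
  int *base;  /* first element of the home area */
  int *end;   /* one past the last element of the home area */
};

/* Models __mkptr(p, other): borrow the bounds from another pointer. */
struct seq_ptr mk_like(int *p, struct seq_ptr other) {
  struct seq_ptr r = { p, other.base, other.end };
  return r;
}

/* Models __mkptr_size(p, size): the home area starts at p and
 * spans size_bytes bytes. */
struct seq_ptr mk_size(int *p, size_t size_bytes) {
  struct seq_ptr r = { p, p, p + size_bytes / sizeof(int) };
  return r;
}

/* The bounds check a cured access s[i] would need.
 * A pointer with null metadata never passes. */
int in_bounds(struct seq_ptr s, ptrdiff_t i) {
  if (s.base == NULL) return 0;
  ptrdiff_t off = (s.p - s.base) + i;
  return off >= 0 && off < (s.end - s.base);
}

/* Checked read of s[i]. */
int seq_read(struct seq_ptr s, ptrdiff_t i) {
  assert(in_bounds(s, i));
  return s.p[i];
}
```

The model also shows why an integer cast to a SEQ pointer without __mkptr is unusable: with no bounds to borrow, every access fails the check.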

There are other built-in functions that you can use to achieve various things behind CCured's back. Those are mostly intended for use in wrappers for the library functions (which you have to trust anyway). These are described in Chapter 8 and declared in ccured.h.

7.4.2  Turning off curing

You can turn the curing off for a fragment of a source file, for a function, or for a block statement.

You can use the ccured pragma to turn curing off for a fragment of a source file. In CCured, pragmas can only appear at global scope, so you cannot use this mechanism to turn curing off for part of the definition of a global function:

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ
#pragma ccured(off)
int foo() {
   return g[2]; // CCured won't see this and will leave g SAFE
                // But also CCured won't check this code
}
#pragma ccured(on)

Alternatively, you can add the nocure attribute to a function to tell CCured to not cure this function:

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ

// We must put the attribute in a prototype
int foo(void) __NOCURE;
int foo(void) {
   return g[2]; // CCured won't see this and will leave g SAFE
                // But also CCured won't check this code
}

At a finer-grained level, you can use the __NOCUREBLOCK attribute with a block statement:

int * g; // This is a pointer to several integers
         // but we do not want to make it SEQ

int foo(void) { 
   int res;
   { __NOCUREBLOCK
     res = g[2]; // CCured won't see this and will leave g SAFE
   }
   return res; 
}

In all of these cases, the CCured inferencer does not even look at the non-cured portions of the code. However, CCured will at least change the non-cured code to access the fat pointers properly. In the following example the global g is a sequence pointer. While CCured will not complain about the unsafe cast to int **, it will make sure that at least the proper component of g is used:

int * g; // This will be FSEQ

int ** foo(void) { 
   int res = g[2]; // Make g FSEQ
   { __NOCUREBLOCK
     return (int **)g; // But not WILD
   }
}

Finally, to avoid curing a whole source file (say trusted_foo.c), you can use the --leavealone=trusted argument to CCured. All source files whose names start with the given “leave alone” prefix are not merged and are not scanned by CCured at all. Instead they are compiled with gcc and linked into the final executable.

