
Appendix A  Inference Results

The correspondence between the source code and the annotated graph can be found in the infer.c file. You may need to pass --emitinfer to ccured to see this file. For a base file named FOO.c, the graph will be shown in FOOinfer.c. This file represents the state of the tool after pointer kind inference but before run-time checks have been inserted and before names have been mangled. The infer.c file consists of two sections: the annotated source code and the inference graph.

A.0.1  Annotated Source

The annotated source code is the original source code with the special syntax __NODE(n) attached to every pointer type in the program. The __NODE attribute can appear in two main places: immediately following a pointer-type constructor * or immediately following the name of a variable. In the former case the attribute specifies the node associated with the pointer type, while in the latter case it specifies the node associated with the address of the variable.

For example, the lines:

  int * __NODE(3) ptr __NODE(4);
  int * __NODE(6) * __NODE(7) matrix __NODE(8);
indicate that node 3 in the graph is associated with the pointer variable ptr and node 4 is associated with the address of ptr. Node 7 is associated with the top-level pointer in the variable matrix, node 6 is associated with the inner pointer (e.g., node 6 is associated with the type of the expression matrix[0]), and node 8 is associated with the address of matrix. __NODE(n) is a type qualifier (like const or restrict) that modifies the pointer type constructor just to its left.

Consider the following simple test case:

  int main() { 
    int * ptr;
    int base;
    ptr = &base;
    ptr ++; 
  } 
If we run it through CCured's inferencer we will see:

  int main() {
    int * __NODE(90) ptr __NODE(91);
    int base __NODE(92); 
    ptr = & base;
    ptr = ptr + 1;
  } 
Node 90 is associated with ptr. The additional nodes to the right are associated with the addresses of variables. Node 91 is associated with &ptr and node 92 is associated with &base. Since there is an assignment between ptr and &base, we expect to see an edge between nodes 90 and 92 in the graph. The graph itself can be found at the end of the C code near a line like:

  /* Now the solved graph (solver) */

A.0.2  Inference Graphs

To see the whole graph, you must pass the argument --emitGraphDetailLevel=3 to CCured. The graph at the end of FOOinfer.c is represented as a printout of every node. Each node description has the following form:

  ID : Location (flags) (node this type points to)
   K=Kind/Reason  T=(base type of this node)
   S=(successor edges)
   P=(predecessor edges)
So the entry for node 90 (from the example in Section A.0.1) might look like:

  90 : Local(./simple.i.main.ptr).1 (stack,posarith,) ()
   K=FSEQ/from_flag T=int
   S=
   P=92:Cast
The node number is 90. The location field says that this node represents a local variable named ptr inside the main function of the file simple.i. The location of a node is a unique name for the pointer type. Locations use a hierarchical naming scheme: a location is composed of a place followed by a period and an index within the place. The places referred to in this section are the variable places (Glob, Static and Local) and the Field and Anon places.

Since each place can contain many occurrences of a pointer-type constructor (e.g. in a declaration int * * x), we use indices to differentiate between such occurrences. For a variable place (Glob, Static, Local), the index 0 is always associated with the node corresponding to the address of the variable. Then we traverse the type of the variable in depth-first order (for functions we start with the result and then continue with the arguments) and we assign indices starting at 1. For Field and Anon we do a similar traversal.

The flags field is a list of the following values:

  Flag      Meaning
  stack     this pointer may contain a stack address
  escape    this value may be assigned through a pointer and escape to the heap
  upd       a write may be performed through this pointer
  posarith  this pointer may be subject to positive pointer arithmetic
  arith     this pointer may be subject to arbitrary pointer arithmetic
  null      this pointer may become NULL
  int       an int may be cast into this pointer
  noproto   this pointer is associated with a function that is missing a prototype
  interf    this pointer is associated with a function that is part of an interface
  sized     this pointer may point into a sized array
  reach_s   this pointer may flow into a string
  reach_q   this pointer may flow into a SEQ pointer
  reach_i   this pointer may flow into an INDEX pointer

In our example, node 90 has the posarith flag because it represents ptr and ptr++ appears in the code. Node 90 has the stack flag because &base is a stack address and the assignment ptr = &base occurs in the code.

Continuing our example, the text K=FSEQ/from_flag tells us that the pointer type associated with node 90 is FSEQ because of one of the flags (in this case, the posarith flag). That is, the type inference has made node 90 a forward sequence pointer (one that carries its upper bound with it) because positive pointer arithmetic is performed on it (so we will need a run-time check to make sure that it stays in bounds).

The text T=int tells us the base type of node 90, that is, node 90 is associated with a pointer to an int.

The S= and P= lists show successors and predecessors of this node. In our example, the text P=92:Cast means that node 90 has a predecessor edge of type Cast from node 92 (the node for &base). That is, information from node 92 flows into node 90. This is the graph's way of representing the assignment statement ptr = &base. There are various sorts of edges in the graph:

  Edge Name  Meaning
  Cast       the value of pred may flow into succ
  Compat     pred and succ must have equal types
  Safe       if pred is WILD, succ must be WILD as well (usually links structures and their fields)
  Null       a NULL flows in the direction of the edge
  Index      if pred is INDEX, succ must be INDEX as well

Of these, Cast, Compat and Safe are the most common. Cast represents casts and assignments, Safe links up fields and structures so that they have consistent pointer kinds and Compat ensures equality between underlying pointers. As an example, in the code:

        int * __NODE(1) * __NODE(2) a;
        int * __NODE(3) * __NODE(4) b;
        a = b; 
We would see a Cast edge from 4 to 2 and a Compat edge between 1 and 3. The Compat edge ensures that nodes 1 and 3 will end up with the same pointer kind.

A.1  Type Names

CCured will also rename types that are expanded to contain extra information used by the run-time checks that ensure safety. For example, a SEQ pointer needs to carry information about its bounds. This information is stored adjacent to the main pointer, but as a result a SEQ pointer is larger than a normal C pointer. Thus a structure that contains a SEQ pointer will be larger than an otherwise-identical structure that contains a normal C pointer (or a SAFE pointer). In addition, the offsets of any structure fields that come after the SEQ pointer will be different. To mark these differences and indicate new types, CCured will change the names of types and structures. CCured does this by prepending strings to the structure or type name.

  Pointer Kind  Type Prefix
  WILD          wildp_
  INDEX         indexp_
  SEQ           seq_
  SEQN          seq_
  FSEQ          fseqp_
  FSEQN         fseqp_
  SAFE          No associated prefix
  RWSTRING      No associated prefix
  ROSTRING      No associated prefix

For example, given a program of the form:

  Foo * ptr;
  ptr++;
CCured will probably infer that ptr should be an FSEQ pointer. The resulting cured code will have the following form:

  typedef struct {        /* type made by CCured */
    Foo * __FSEQ _p;      /* actual pointer */ 
    void * _e;            /* end of the allocated region: upper bound */
  } fseqp_Foo;            /* FSEQ pointer to a Foo */

  fseqp_Foo ptr;
  // code for "ptr++;"
If the original code were:

  Foo ** ptr;
  ptr++;                /* forces ptr to be FSEQ */
  ptr[0]--;             /* forces *ptr to be SEQ */ 
We would expect the final type of ptr in the cured code to be fseqp_seq_Foo, because ptr will have the type “(forward sequence pointer) to (sequence pointer) to Foo.” Note that fseqp_seq_Foo is not merely a name-mangled form of Foo, since both the definition of Foo and the definition of fseqp_seq_Foo will exist in the final cured program.

