The correspondence between the source code and the annotated graph can
be found in the infer.c file. You may need to pass --emitinfer to
ccured to see this file. For a base file named FOO.c, the graph
will be shown in FOOinfer.c. This file represents the state of the tool
after pointer kind inference but before run-time checks have been inserted
and before names have been mangled. The infer.c file consists of two
sections: the annotated source code and the inference graph.
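The naming rule above (FOO.c gives FOOinfer.c) can be sketched as a small helper. This is only an illustration: infer_file_name is a hypothetical function, not part of CCured, and it assumes the base file always ends in ".c".

```c
#include <string.h>

/* Hypothetical helper (not part of CCured): derive the name of the
 * inference dump from a base file name, e.g. "FOO.c" -> "FOOinfer.c".
 * Returns 0 on success, -1 if the name does not end in ".c" or the
 * output buffer is too small. */
int infer_file_name(const char *base, char *out, size_t out_size)
{
    size_t len = strlen(base);
    if (len < 2 || strcmp(base + len - 2, ".c") != 0)
        return -1;

    size_t stem = len - 2;                       /* length without ".c" */
    if (stem + strlen("infer.c") + 1 > out_size) /* +1 for the NUL */
        return -1;

    memcpy(out, base, stem);
    strcpy(out + stem, "infer.c");
    return 0;
}
```

Calling infer_file_name("FOO.c", buf, sizeof buf) fills buf with "FOOinfer.c".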
A.0.1 Annotated Source
The annotated source code is just the original source code with the special
syntax __NODE(n) associated with every pointer type in the program.
There are two main places where the __NODE attribute can appear:
immediately following a pointer-type constructor * or immediately
following the name of a variable. In the former case the attribute specifies
the node associated with the pointer type, while in the latter
case the attribute specifies the node associated with the address of the
variable.
For example, the lines:
int * __NODE(3) ptr __NODE(4);
int * __NODE(6) * __NODE(7) matrix __NODE(8);
indicate that node 3 in the graph is associated with the pointer variable
ptr and node 4 is associated with the address of ptr. Node 7 is
associated with the top-level pointer in the variable matrix, while node 6
is associated with the inner pointer (e.g., node 6 is associated with the type
of the expression matrix[0]). __NODE(n) is a type qualifier (like
const or restrict) that modifies the pointer type constructor just to
its left.
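To experiment with the two attribute positions, the annotated declarations can be fed to an ordinary C compiler if __NODE(n) is defined away. The empty macro below is an assumption made purely for illustration (it is not something CCured is known to provide), and node_demo is a hypothetical function.

```c
/* Assumption for illustration only: define __NODE(n) as an empty macro
 * so that annotated declarations compile with an ordinary C compiler. */
#define __NODE(n)

int node_demo(void)
{
    /* node 3: the pointer type of ptr; node 4: the address of ptr */
    int * __NODE(3) ptr __NODE(4);
    /* node 7: the top-level pointer; node 6: the inner pointer;
     * node 8: the address of matrix */
    int * __NODE(6) * __NODE(7) matrix __NODE(8);
    int value = 42;

    ptr = &value;
    matrix = &ptr;
    return matrix[0][0];   /* matrix[0] has the type tagged with node 6 */
}
```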
Consider the following simple test case:
int main() {
    int * ptr;
    int base;
    ptr = &base;
    ptr++;
}
If we run it through CCured's inferencer we will see:
int main() {
    int * __NODE(90) ptr __NODE(91);
    int base __NODE(92);
    ptr = & base;
    ptr = ptr + 1;
}
Node 90 is associated with ptr. The additional nodes to the right are
associated with the addresses of variables. Node 91 is associated with
&ptr and node 92 is associated with &base. Since there is an
assignment between ptr and &base, we expect to see an edge between
nodes 90 and 92 in the graph. The graph itself can be found at the end of
the C code near a line like:
/* Now the solved graph (solver) */
A.0.2 Inference Graphs
In order to see the whole graph, you must pass the argument
--emitGraphDetailLevel=3 to CCured. The graph at the end of
FOOinfer.c is represented as a printout of every node. Each node
description has the following form:
ID : Location (flags) (node this type points to)
K=Kind/Reason T=(base type of this node)
S=(successor edges)
P=(predecessor edges)
So the entry for node 90 (from the example in Section A.0.1) might
look like:
90 : Local(./simple.i.main.ptr).1 (stack,posarith,) ()
K=FSEQ/from_flag T=int
S=
P=92:Cast
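For scripts that post-process the graph dump, the first line of a node entry can be split with sscanf. The sketch below is an assumption-laden illustration: struct node_header and parse_node_header are hypothetical names, and the parser assumes the location contains no spaces, as in the entry above.

```c
#include <stdio.h>

/* Hypothetical record for the first line of a node entry. */
struct node_header {
    int  id;              /* node number, e.g. 90 */
    char location[128];   /* e.g. "Local(./simple.i.main.ptr).1" */
    char flags[128];      /* e.g. "stack,posarith," (may be empty) */
};

/* Parse "ID : Location (flags) (...)".
 * Returns 0 on success and -1 if the line does not start with
 * "ID : Location". Assumes the location contains no spaces. */
int parse_node_header(const char *line, struct node_header *out)
{
    out->flags[0] = '\0';   /* stays empty when the flag list is "()" */
    int n = sscanf(line, "%d : %127s (%127[^)])",
                   &out->id, out->location, out->flags);
    /* n == 2 means the id and location matched but the flag list was empty. */
    return (n >= 2) ? 0 : -1;
}
```

Applied to the first line of the entry for node 90, this yields id 90, the location Local(./simple.i.main.ptr).1 and the flag text "stack,posarith,".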
The node number is 90. The location field says that this node represents a
local variable named ptr inside the main function of the file
simple.i. The location of a node is a unique name for the pointer type.
Locations use a hierarchical naming scheme composed of a place
followed by a period and an index within the place. The following places
are used:
- Glob(var) - this is associated with the global variable var.
- Static(file.var) - this is associated with the file-scope variable
var from file file.
- Local(file.func.var) - this is associated with the local variable
var occurring in function func from file file.
- Field(fieldname) - this is associated with a field in a structured
type.
- Anon(nr) - this is associated with an occurrence of a pointer-type
constructor in a cast or in the sizeof expression.
Since each place can contain many occurrences of a pointer-type constructor
(e.g. in a declaration int * * x) we use indices to differentiate between
such occurrences. For a variable place (Glob, Static, Local), the
index 0 is always associated with the node corresponding to the address of the
variable. Then we traverse the type of the variable in depth-first order (for
functions we start with the result and then continue with the arguments) and
we assign indices starting at 1. For Field and Anon we do a similar
traversal.
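The naming scheme can be made concrete with a small parser that splits a location into place, name and index, and applies the rule that index 0 of a variable place denotes the address of the variable. This is a sketch under stated assumptions: struct location, parse_location and is_address_node are hypothetical names, and the textual shape Place(name).index is taken from the example entry above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decomposition of a location such as
 * "Local(./simple.i.main.ptr).1". */
struct location {
    char place[16];   /* Glob, Static, Local, Field or Anon */
    char name[112];   /* text between the parentheses */
    int  index;       /* index within the place */
};

/* Returns 0 on success, -1 if the text is not of the form Place(name).index. */
int parse_location(const char *text, struct location *out)
{
    const char *open  = strchr(text, '(');
    const char *close = strrchr(text, ')');
    if (!open || !close || close < open || close[1] != '.' ||
        !isdigit((unsigned char)close[2]))
        return -1;

    size_t place_len = (size_t)(open - text);
    size_t name_len  = (size_t)(close - open - 1);
    if (place_len >= sizeof out->place || name_len >= sizeof out->name)
        return -1;

    memcpy(out->place, text, place_len);
    out->place[place_len] = '\0';
    memcpy(out->name, open + 1, name_len);
    out->name[name_len] = '\0';
    out->index = atoi(close + 2);
    return 0;
}

/* For variable places (Glob, Static, Local), index 0 is the node for
 * the address of the variable. */
int is_address_node(const struct location *loc)
{
    int is_var = strcmp(loc->place, "Glob") == 0 ||
                 strcmp(loc->place, "Static") == 0 ||
                 strcmp(loc->place, "Local") == 0;
    return is_var && loc->index == 0;
}
```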
The flags field is a list of the following values:
Flag     | Meaning
stack    | this pointer may contain a stack address
escape   | this value may be assigned through a pointer and escape to the heap
upd      | a write may be performed through this pointer
posarith | this pointer may be subject to positive pointer arithmetic
arith    | this pointer may be subject to arbitrary pointer arithmetic
null     | this pointer may become NULL
int      | an int may be cast into this pointer
noproto  | this pointer is associated with a function that is missing a prototype
interf   | this pointer is associated with a function that is part of an interface
sized    | this pointer may point into a sized array
reach_s  | this pointer may flow into a string
reach_q  | this pointer may flow into a SEQ pointer
reach_i  | this pointer may flow into an INDEX pointer
In our example, node 90 has the posarith flag because it represents
ptr and ptr++ appears in the code. Node 90 has the stack flag
because &base is a stack address and the assignment ptr = &base
occurs in the code.
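When checking a flag list by hand or in a script, note that a plain substring search would wrongly find arith inside posarith. A minimal sketch of a whole-item lookup follows; has_flag is a hypothetical helper, and the comma-separated form with a trailing comma is taken from the example entry for node 90.

```c
#include <string.h>

/* Return 1 if `flag` appears as a whole item in a comma-separated flag
 * list such as "stack,posarith,", and 0 otherwise. */
int has_flag(const char *flags, const char *flag)
{
    size_t want = strlen(flag);
    const char *p = flags;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == want && strncmp(p, flag, len) == 0)
            return 1;
        if (!comma)
            break;          /* last item had no trailing comma */
        p = comma + 1;      /* move to the next item */
    }
    return 0;
}
```

With the flags of node 90, has_flag("stack,posarith,", "posarith") returns 1 while has_flag("stack,posarith,", "arith") returns 0.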
Continuing our example, the text K=FSEQ/from_flag tells us that
the pointer type associated with node 90 is FSEQ because of one of the flags
(in this case, the posarith flag). That is, the type inference has made
node 90 a forward sequence pointer (one that carries its upper bound with
it) because positive pointer arithmetic is performed on it (so we will need
a run-time check to make sure that it stays in bounds).
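The from_flag reasoning can be illustrated with a deliberately simplified sketch. It assumes, for illustration only, that two flags alone decide the kind: arbitrary arithmetic gives SEQ, positive arithmetic gives FSEQ, and anything else stays SAFE. The actual inference also takes graph edges and the other kinds into account, which this sketch ignores; all names below are hypothetical.

```c
/* Hypothetical bit flags for the two arithmetic-related node flags. */
enum node_flag {
    FLAG_POSARITH = 1 << 0,   /* positive pointer arithmetic */
    FLAG_ARITH    = 1 << 1    /* arbitrary pointer arithmetic */
};

enum pointer_kind { KIND_SAFE, KIND_FSEQ, KIND_SEQ };

/* Simplified illustration, not the real solver: pick a kind from the
 * arithmetic flags only. */
enum pointer_kind kind_from_flags(unsigned flags)
{
    if (flags & FLAG_ARITH)
        return KIND_SEQ;      /* may move in either direction */
    if (flags & FLAG_POSARITH)
        return KIND_FSEQ;     /* only moves forward: upper bound suffices */
    return KIND_SAFE;         /* no arithmetic observed */
}
```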
The text T=int tells us the base type of node 90, that is, node 90 is
associated with a pointer to an int.
The S= and P= lists show successors and predecessors of this node.
In our example, the text P=92:Cast means that node 90 has a predecessor
edge of type Cast from node 92 (the node for &base). That is,
information from node 92 flows into node 90. This is the graph's way of
representing the assignment statement ptr = &base. There are various
sorts of edges in the graph:
Edge Name | Meaning
Cast      | the value of pred may flow into succ
Compat    | pred and succ must have equal types
Safe      | if pred is WILD, succ must be WILD as well (usually links structures and their fields)
Null      | a NULL flows in the direction of the edge
Index     | if pred is INDEX, succ must be INDEX as well
Of these, Cast, Compat and Safe are the most common. Cast
represents casts and assignments, Safe links up fields and structures
so that they have consistent pointer kinds and Compat ensures equality
between underlying pointers. As an example, in the code:
int * __NODE(1) * __NODE(2) a;
int * __NODE(3) * __NODE(4) b;
a = b;
we would see a Cast edge from 4 to 2 and a Compat edge between 1
and 3. The Compat edge ensures that nodes 1 and 3 will end up with the
same pointer kind.
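The effect of a Compat edge can be sketched as a small fixed-point loop over the edge list. This is an illustration under assumptions, not CCured's solver: the names are hypothetical, Cast edges are stored but deliberately left unhandled, and the ordering SAFE < FSEQ < SEQ < WILD used to pick the shared kind is an assumption of the sketch.

```c
#include <stddef.h>

enum edge_type { EDGE_CAST, EDGE_COMPAT };

/* Assumed ordering for this sketch: later kinds win when two differ. */
enum pointer_kind { KIND_SAFE, KIND_FSEQ, KIND_SEQ, KIND_WILD };

struct edge {
    int pred;              /* predecessor node number */
    int succ;              /* successor node number */
    enum edge_type type;
};

/* Make both ends of every Compat edge share one kind. kinds[] is indexed
 * by node number. Repeats until nothing changes, so chains of Compat
 * edges are handled as well. Cast edges are ignored here. */
void equalize_compat(enum pointer_kind *kinds,
                     const struct edge *edges, size_t n_edges)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n_edges; i++) {
            if (edges[i].type != EDGE_COMPAT)
                continue;
            enum pointer_kind *a = &kinds[edges[i].pred];
            enum pointer_kind *b = &kinds[edges[i].succ];
            if (*a != *b) {
                enum pointer_kind k = (*a > *b) ? *a : *b;
                *a = k;
                *b = k;
                changed = 1;
            }
        }
    }
}
```

For the example above, the edge list would hold a Cast edge {4, 2} and a Compat edge {1, 3}; if node 1 starts as FSEQ and node 3 as SAFE, both end up FSEQ.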
A.1 Type Names
CCured will also rename types that are expanded to contain extra
information used by the run-time checks that ensure safety. For example, a
SEQ pointer needs to carry information about its bounds. This information
is stored adjacent to the main pointer, but as a result a SEQ pointer is
larger than a normal C pointer. Thus a structure that contains a SEQ pointer
will be larger than an otherwise-identical structure that contains
a normal C pointer (or a SAFE pointer). In addition, the offsets of any
structure fields that come after the SEQ pointer will be different. To
mark these differences and indicate new types, CCured will change the names
of types and structures. CCured does this by prepending strings to the
structure or type name.
Pointer Kind | Type Prefix
WILD         | wildp_
INDEX        | indexp_
SEQ          | seq_
SEQN         | seq_
FSEQ         | fseqp_
FSEQN        | fseqp_
SAFE         | No associated prefix
RWSTRING     | No associated prefix
ROSTRING     | No associated prefix
For example, given a program of the form:
Foo * ptr;
ptr++;
CCured will probably infer that ptr should be an FSEQ pointer. The
resulting cured code will have the following form:
typedef struct { /* type made by CCured */
    Foo * __FSEQ _p; /* actual pointer */
    void * _e; /* end of the allocated region: upper bound */
} fseqp_Foo; /* FSEQ pointer to a Foo */
fseqp_Foo ptr;
// code for "ptr++;"
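To show why the upper bound travels with the pointer, here is a minimal runnable sketch of such a two-word pointer. It is not CCured's run-time library: the helper names fseq_make, fseq_next and fseq_in_bounds are hypothetical, Foo is a stand-in type, and the __FSEQ attribute is omitted so that an ordinary C compiler accepts the code.

```c
#include <stddef.h>

typedef struct { int value; } Foo;   /* stand-in for the program's Foo */

typedef struct {
    Foo  *_p;   /* actual pointer */
    void *_e;   /* end of the allocated region: upper bound */
} fseqp_Foo;

/* Build a fat pointer to the start of an array of `count` Foo objects. */
fseqp_Foo fseq_make(Foo *base, size_t count)
{
    fseqp_Foo fp = { base, (void *)(base + count) };
    return fp;
}

/* "ptr++": only _p moves; the bound _e is carried along unchanged. */
fseqp_Foo fseq_next(fseqp_Foo fp)
{
    fp._p += 1;
    return fp;
}

/* Returns 1 if a whole Foo can be read at _p without passing _e. */
int fseq_in_bounds(fseqp_Foo fp)
{
    if (fp._p == NULL || (char *)fp._p > (char *)fp._e)
        return 0;
    return (size_t)((char *)fp._e - (char *)fp._p) >= sizeof(Foo);
}
```

For a two-element array, the pointer is in bounds initially and after one fseq_next, and out of bounds after a second one, which is the situation a run-time check before a dereference would have to catch.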
If the original code were:
Foo ** ptr;
ptr++; /* forces ptr to be FSEQ */
ptr[0]--; /* forces *ptr to be SEQ */
we would expect the final type of ptr in the cured code to be
fseqp_seq_Foo, because ptr will have the type “(forward sequence
pointer) to (sequence pointer) to Foo.” Note that fseqp_seq_Foo is
not merely a name-mangled form of Foo, since both the definition of
Foo and the definition of fseqp_seq_Foo will exist in the final
cured program.
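The way prefixes compose into a name such as fseqp_seq_Foo can be sketched as follows. The function cured_type_name and the enum are hypothetical, the prefixes are the ones listed in the table above, and the sketch assumes that prefixes are simply concatenated from the outermost pointer inward (SEQN and FSEQN, which share the prefixes of SEQ and FSEQ, are left out for brevity).

```c
#include <string.h>

enum pointer_kind { KIND_SAFE, KIND_WILD, KIND_INDEX, KIND_SEQ, KIND_FSEQ };

/* Prefix for one pointer level, following the table of type prefixes. */
static const char *kind_prefix(enum pointer_kind k)
{
    switch (k) {
    case KIND_WILD:  return "wildp_";
    case KIND_INDEX: return "indexp_";
    case KIND_SEQ:   return "seq_";
    case KIND_FSEQ:  return "fseqp_";
    default:         return "";        /* SAFE: no associated prefix */
    }
}

/* Build a type name from pointer kinds listed outermost first, followed
 * by the base type name. Returns 0 on success, -1 if `out` is too small. */
int cured_type_name(char *out, size_t out_size,
                    const enum pointer_kind *kinds, size_t n_kinds,
                    const char *base)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i <= n_kinds; i++) {
        /* After the last prefix, append the base type name. */
        const char *piece = (i < n_kinds) ? kind_prefix(kinds[i]) : base;
        size_t len = strlen(piece);
        if (used + len + 1 > out_size)
            return -1;
        memcpy(out + used, piece, len + 1);   /* copies the NUL as well */
        used += len;
    }
    return 0;
}
```

With kinds { KIND_FSEQ, KIND_SEQ } and base "Foo" the result is "fseqp_seq_Foo"; with { KIND_FSEQ } alone it is "fseqp_Foo".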