CCured is an extension of the C programming language that distinguishes among
various kinds of pointers depending on their usage. The purpose of this
distinction is to prevent improper usage of pointers and thus to
guarantee that your programs do not access memory areas they shouldn't access.
You can continue to write C programs but CCured will change them slightly so
that they are type safe. In this chapter we explain in which situations your
program will be changed and how.
CCured leaves code that does not use pointers or arrays unchanged, apart from
a normalization step: CCured is implemented on top of the C Intermediate Language (CIL)
infrastructure, which means that C programs are first translated into a subset
of the C language that has simple semantic rules. The following are some of
the transformations that are performed:
- C expressions and statements are separated into expressions (no
side-effect and no control-flow), instructions (assignments and function
calls, with one side-effect and no control-flow) and statements (the
control-flow constructs). This means that CIL serializes the side-effects and
the control-flow.
- All type and structure declarations are moved to the beginning of the
file.
- The scope of variables is resolved and local variables are renamed
accordingly. All local variables are moved to function-scope and are declared
at the beginning of the function body. The initialization for such variables
is done using explicit assignment instructions.
- Anonymous structures and unions are given unique names.
- All implicit casts and conversions are expressed as explicit casts.
- All GNU CC extensions are compiled into regular C code.
For a complete description of the CIL infrastructure see
the CIL documentation.
3.1 CCured Attributes
The most significant difference between C and CCured is that CCured pays
close attention to how pointers are manipulated and it classifies pointers
into various kinds according to what you do with them. We'll discuss the
various kinds starting in the next section but before that we need to
introduce an important notation that you can use to communicate to CCured
which pointer kinds you want for your pointers. The same notation is then used
by CCured to show, in the transformed program, which pointer kind is inferred
for each pointer.
CCured uses type attributes to express the kind of pointers. Type attributes
exist in a limited form in ANSI C (i.e. the volatile, const and
restrict type qualifiers) and in a richer form in the GCC dialect of C.
CCured, just like GCC, allows any attributes to be specified for types, names
of variables, functions or fields, and for structure or union declarations.
Unlike GCC, CCured has precise rules for how attributes are interpreted in a
declaration (instead GCC relies on knowing the semantics of the attribute in
order to associate it with the proper element of a declaration). The rule of
thumb is that the attribute of a pointer type is written immediately following the * pointer-type constructor and the attribute of a name is
written immediately before the semicolon or the = sign that terminates the
declaration of the name. CCured uses pointer kinds such as SAFE,
SEQ and WILD, and the corresponding attributes are formed by adding two
leading underscores. For example, in the following declaration:
int * __WILD * __SEQ x __SAFE;
the type of x is declared to be a SEQuence pointer to a WILD
pointer to an integer (just like pointer types in C, attributes are read from
right to left). The __SAFE attribute in this case applies to the name
x, which in the context of CCured means that whenever we take the address
of the variable x we are going to obtain a SAFE pointer. The type of
such a pointer would be int * __WILD * __SEQ * __SAFE (read as a
SAFE pointer to a SEQ pointer to a WILD pointer to an integer).
(The complete attribute-parsing rules for CIL are described in the
CIL manual.)
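If you want the same annotated declaration to also compile with an ordinary C compiler, one option is to erase the attributes with empty macros. The sketch below assumes such a shim (the empty #defines are hypothetical, not taken from CCured's headers) and uses C11 _Generic to confirm that, once the attributes are erased, &x has the three-level type int ***, matching the right-to-left reading above.

```c
/* Hypothetical shim: when not compiling with CCured, erase the pointer-kind
   attributes so that annotated declarations become plain C. */
#define __SAFE
#define __SEQ
#define __WILD

/* x: a SEQ pointer to a WILD pointer to int. The trailing __SAFE applies to
   the name x, i.e. to the pointer obtained by taking &x. */
int * __WILD * __SEQ x __SAFE;

/* &x has type int * __WILD * __SEQ * __SAFE, which is int *** once the
   attributes are erased. _Generic (C11) selects on that static type. */
int address_has_three_levels(void) {
    return _Generic(&x, int ***: 1, default: 0);
}
```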
CCured is designed to work on regular C programs (i.e. without pointer-kind
attributes). One of the main features of CCured is that it analyzes your
pointer usage and finds, for each pointer in your program, the best
pointer kind that can be ascribed to it. However, you can also place
pointer-kind annotations to force CCured to use certain pointer kinds.
3.2 SAFE pointers
The main action in CCured concerns pointers and arrays. Pointers in C can be
assigned to l-values, dereferenced, subject to pointer arithmetic and cast to
other pointer or non-pointer types. In contrast, pointers in a typical
type-safe language (e.g. Java, Basic, ML) cannot be subject to arithmetic or
(arbitrary) casts. CCured allows all the pointer operations that C allows but
gives preferential treatment to pointers that are not subject to arithmetic or
to casts. CCured refers to such pointers as SAFE pointers.
Consider for example this small code-fragment that computes the length of a
linked list:
struct list {
void * car;
struct list * cdr;
};
int length(struct list * l) {
int i = 0;
while(l) {
l = l->cdr;
i ++;
}
return i;
}
The only pointers used in this code fragment are pointers to list cells and
they are not subject to arithmetic or to casts. In fact, this code fragment
can be transcribed literally into Java or C#. You can see in the cured code
that CCured has inferred that these pointers are indeed SAFE.
Properties of SAFE pointers
The SAFE pointers are the best kind of pointers, meaning that they incur
the least amount of run-time cost. Here is a list of the properties of
SAFE pointers:
- Cannot be subject to pointer arithmetic (adding or subtracting an
integer from it).
- Cannot be cast (except subject to stringent rules which we'll discuss
below). Note that assignment, passing actual arguments and returning are
implicit casts.
- Can be set to a compile-time constant equal to 0 but not to any other
integer expression.
- Can be cast to an integer and can be subtracted from another pointer.
This is useful for comparisons.
- SAFE pointers use the standard C representation: a single machine
word.
- Every time a SAFE pointer is dereferenced, a null check is inserted
before the dereference.
All of these restrictions are such that the following invariant holds for all
SAFE pointers:
A SAFE pointer to type T is either 0 or else it points to a valid area
of memory containing an object of type T. Furthermore, all other pointers
to the same area are also SAFE and agree on the type T of the stored
object.
3.2.1 Safe Casts
Casting a pointer to an integer is always allowed. CCured does in fact allow
certain other casts on SAFE pointers. For example, it is safe to cast a
pointer to a structure containing two integers into a pointer to an integer. In
general it is safe to cast a pointer to a long structure into a pointer to a
short structure as long as the two structures agree on the types of the
elements in the overlapping portion. CCured is actually quite liberal about
these rules and will think of nested combinations of structures and arrays as
one big structure with non-structure and non-array fields. This feature is
called physical subtyping. For example, in the code shown below, all of
the four casts implicit in the assignments are safe and CCured will infer that
all pointers involved are SAFE.
struct large {
struct small {
int * f1;
int * f2;
} a;
int * f3;
} x;
struct small * s1 = & x;
int * * s2 = & x;
struct { int *a1, *a2, *a3; } * s3 = & x;
struct { int *a1, *a2[2]; } * s4 = & x;
Notice that s1, s2, s3 and s4 are all aliases for the address of x, and
they agree on the types stored in the portions of x that they overlap.
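Physical subtyping rests on the different views agreeing field by field in their overlap. The plain C check below only illustrates that condition; it is not CCured's algorithm, which compares types rather than just offsets. It gives the anonymous structures from the example hypothetical names (flat3 and flat_arr) and verifies with offsetof that each view places its fields at the same offsets as the leaf fields of struct large.

```c
#include <stddef.h>

struct small { int *f1; int *f2; };
struct large { struct small a; int *f3; };

/* Hypothetical names for the anonymous structures used by s3 and s4. */
struct flat3    { int *a1, *a2, *a3; };
struct flat_arr { int *a1, *a2[2]; };

/* Returns 1 if every field of each shorter view sits at the same offset as
   the corresponding leaf field of struct large. */
int layouts_agree(void) {
    size_t f1 = offsetof(struct large, a.f1);
    size_t f2 = offsetof(struct large, a.f2);
    size_t f3 = offsetof(struct large, f3);
    return offsetof(struct small, f1) == f1
        && offsetof(struct small, f2) == f2
        && offsetof(struct flat3, a1) == f1
        && offsetof(struct flat3, a2) == f2
        && offsetof(struct flat3, a3) == f3
        && offsetof(struct flat_arr, a1) == f1
        && offsetof(struct flat_arr, a2[0]) == f2
        && offsetof(struct flat_arr, a2[1]) == f3;
}
```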
Following are two examples of casts that are not allowed (for SAFE
pointers; you can see that CCured infers the WILD kind for the pointers
involved):
int y1;
int * * x1 = & y1; // Cast an int * to an int * *
int y2;
struct { int * a1, a2; } * x2 = &y2;
If the first cast were allowed then by writing to y1 an arbitrary
integer we would be invalidating the assumption that x1 points to a
pointer value. The second cast is similar.
3.2.2 Union types
Further complications arise in the case of union types. A pointer to a union
type can be SAFE if it obeys all of the restrictions mentioned above and,
in addition, any two fields of the union agree on the types of the
elements in their overlap. For example, in the code below the type of x can
be a SAFE pointer.
union {
int *f1;
int *f2[2];
struct { int *a1, *a2, *a3; } f3;
} * x;
int* foo() { return x->f1; } //use x so it is analyzed.
But in the following code the type of x cannot possibly be SAFE
because the type of the field f3.a2 does not match the type of the
overlapping field
f2[1] and thus x->f3.a2 could be used to write an arbitrary integer
that can later be interpreted as a pointer using the expression x->f2[1].
union {
int *f1;
int *f2[2];
struct { int *a1, a2, *a3; } f3;
} * x;
int* foo() { return x->f1; } //use x so it is analyzed.
If your program uses a union with incompatible fields you can still obtain
SAFE pointers if you rewrite the union to be a struct. This wastes some
space, and in cases where the program requires the type to have a given
size it might break the semantics of the program.
Alternatively, you can use tagged unions (Section 9.7), for which CCured will
insert run-time checks to ensure that you are not trying to read a pointer from
a field of a union after writing a scalar (or an incompatible pointer) through
another field.
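A hand-written tagged union gives a feel for the kind of check involved. The sketch below is not the CCured tagged-union feature of Section 9.7, and all type and function names in it are hypothetical: the tag records which field was written last, and the checked accessor refuses to hand out the pointer field after an integer was stored.

```c
/* Which field of the union was written last. */
enum which { AS_INT, AS_PTR };

struct tagged {
    enum which tag;
    union {
        int  i;
        int *p;
    } u;
};

/* Writers update the tag together with the field. */
void set_int(struct tagged *t, int v)  { t->tag = AS_INT; t->u.i = v; }
void set_ptr(struct tagged *t, int *v) { t->tag = AS_PTR; t->u.p = v; }

/* Checked read of the pointer field: returns 0 and stores the pointer in
   *out if a pointer was the last thing written, or -1 on a tag mismatch. */
int get_ptr(const struct tagged *t, int **out) {
    if (t->tag != AS_PTR)
        return -1;
    *out = t->u.p;
    return 0;
}
```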
3.2.3 SAFE Function Pointers
There is nothing special about function pointers. They can be safe provided
that they are not cast to incompatible pointer types. A function pointer type
is compatible only with another function pointer type with the same number and
type of arguments and the same result type.
A common problem with function pointers (and functions) in CCured arises when your
program uses external functions without prototypes. This makes CCured think
that the function takes no arguments and returns an integer, and every
time you use it in a different way CCured behaves as if you were casting the
function pointer (denoted implicitly by the function's name) to the type
needed at that use. CCured will print warnings about using functions without
prototypes and we recommend that you fix those problems and try CCured again.
For a discussion of what happens when you do not use your function pointer in
a clean way you should read to the end of this tutorial chapter and then read
Section 9.1.
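As a small illustration of a clean use, the sketch below (the names add, binop and apply are hypothetical) declares a full prototype and a function-pointer type with the same number and types of arguments and the same result type, so passing add where a binop is expected involves no cast at all.

```c
/* Full prototype: every use of add agrees on (int, int) -> int. */
int add(int a, int b);

/* Function-pointer type compatible with add: same argument types,
   same result type. */
typedef int (*binop)(int, int);

int add(int a, int b) {
    return a + b;
}

/* Calls through the pointer; no cast is needed when passing add. */
int apply(binop f, int a, int b) {
    return f(a, b);
}
```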
3.3 Checks for SAFE Pointers
As we mentioned above, every time a SAFE pointer is dereferenced it must be
checked for null. We know from the invariant for SAFE
pointers that non-null pointers can be dereferenced and we can count on the
value read through them having the type given by the pointer type.
A null check appears in the output of CCured as a call to the function
CHECK_NULL. This and other run-time checking functions used by CCured
have a name that starts with the prefix CHECK_ and are declared in the
file ccuredcheck.h. You will see in that file that most of these
functions are declared inline.
Checking for null pointers is necessary not just when reading or writing
through them but also when they are used to compute the address of a subobject
of the object they point to. For example, in the following code CCured will
add a run-time check that s is not null before computing the value of
x. Then again there will be a check that x is not null before
dereferencing it.
struct str {
int a, b;
};
int getaddr(struct str * s) {
int * x = & (s->b);
return *x;
}
The first check in the code above is necessary to enforce the invariant that
SAFE pointers are either 0 or else valid pointers. Without that check the
value of x would be 4 (on most machines) which would break the invariant
and would defeat the second null-check thus letting you dereference an invalid
pointer.
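The two checks can be written out by hand to make their placement explicit. In this sketch check_null is a hypothetical stand-in for CHECK_NULL (the real function is declared in ccuredcheck.h and its signature is not reproduced here); it simply aborts on a null pointer.

```c
#include <stdio.h>
#include <stdlib.h>

struct str {
    int a, b;
};

/* Hypothetical stand-in for CHECK_NULL: abort if p is null. */
static void check_null(const void *p) {
    if (p == NULL) {
        fprintf(stderr, "Failure: null pointer\n");
        abort();
    }
}

int getaddr(struct str *s) {
    check_null(s);      /* first check: before computing &s->b */
    int *x = &(s->b);
    check_null(x);      /* second check: before dereferencing x */
    return *x;
}
```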
At this point you are starting to see some of the subtleties in the design of
CCured. To ensure that we got everything right we have formalized the type
system of CCured and we proved (for a subset of CCured) that the set of
run-time checks and invariants achieve memory safety. In fact, in the first
implementation of CCured we had forgotten about the first null check in the
above example and the need for it was revealed while trying to prove that
CCured is sound. To read our formalization and see the soundness proofs take a
look at our paper “CCured: Type-Safe Retrofitting of
Legacy Code”.
CCured includes a simple optimizer that tries to eliminate redundant
checks and checks that cannot possibly fail (such as checking that the address
of a global variable is non-null). Currently the optimizer is fairly naive.
For example, it does not know that since s is a non-null SAFE pointer
to struct str then &(s->b) is guaranteed to be non-null as well, thus
the second check is not really necessary.
Speaking of too many checks, some of the more experienced C programmers
will have noticed that our run-time checks prevent a common idiom for computing
the offset of fields in structures. The typical code for doing that is shown
below (this is how the macro offsetof is defined in many C libraries):
struct str {
int a, b;
};
int get_offset_of_b() {
return (int) &(((struct str*)0)->b);
}
CCured recognizes in this specific case that you are casting the result of
the & operator to an integer, so it avoids the run-time check.
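In portable code the standard offsetof macro from stddef.h is the usual way to obtain a field offset; whether a particular C library expands it to the idiom above or to a compiler builtin varies. A minimal sketch:

```c
#include <stddef.h>

struct str {
    int a, b;
};

/* offsetof yields the byte offset of b as a size_t constant expression. */
size_t get_offset_of_b(void) {
    return offsetof(struct str, b);
}
```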
3.4 Checks for Returning Pointers
Note: Checks for returning stack pointers have been disabled
in the current version. This was done because recent versions of
gcc perform more aggressive inlining that results in false positives
for our return-pointer checks. For more information, contact the
CCured developers.
One of the unsafe features of C is that the address of a local variable can
be returned from a function and later be used in a context in which the
storage for the local variable has been reused. Many C compilers try to give
warnings when they notice this happening but it is way too easy to fool them.
For example, in the code example below the function bar does return the
address of its local variable and this is going to be missed by most
compilers.
int *foo(int *in) { // in is a stack address
*in = 5;
return in;
}
int* bar() {
int local = 0;
return foo(&local);
}
Each function that returns a pointer value will have one call to
CHECK_RETURNPTR that will verify that the pointer value is not in the
stack frame of the function that is returning. Note that the pointer value can
be 0, or can be a pointer to a heap area or to a cooler stack frame (a
caller stack frame).
There is one complication with the return checks. There is no portable way to
implement the check that a pointer is in the frame of the returning function.
Currently CCured checks that the pointer value is not in the 1Mbyte range
that starts at the current address of the frame pointer going towards lower
addresses. However, getting the address of the frame pointer is somewhat
unreliable in the context of heavy optimizations. The method that seems to
work best is to introduce a volatile local variable whose address is then the
address of the stack frame. Since the address of the variable is taken and the
variable appears first in the local declarations, it appears that both GCC and
MSVC will allocate such a variable at the highest address in the frame.
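The range test itself can be modeled on plain integer addresses. This is a hypothetical sketch of the condition described above, assuming a downward-growing stack and that the frame address has already been obtained (for example through the volatile local); in_current_frame and FRAME_WINDOW are illustrative names, not CCured's. It reports whether a pointer falls in the 1 Mbyte window that starts at the frame address and extends toward lower addresses, and it never flags a null pointer.

```c
#include <stdint.h>

/* Size of the window treated as "the current frame" (1 Mbyte). */
#define FRAME_WINDOW ((uintptr_t)1 << 20)

/* Returns 1 if ptr lies at or below frame and within FRAME_WINDOW bytes
   of it; null is never flagged. */
int in_current_frame(uintptr_t ptr, uintptr_t frame) {
    if (ptr == 0)
        return 0;
    uintptr_t low = frame >= FRAME_WINDOW ? frame - FRAME_WINDOW : 0;
    return ptr >= low && ptr <= frame;
}
```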
Note that we have observed the CHECK_RETURNPTR check to lead to spurious
failures when the function returning a pointer is inlined into its
caller. For example, in the code above the check for the return of
foo should succeed and the one for the return of bar should fail. If
foo is inlined into bar then foo's check will see that in is
in the current stack frame and will generate a run-time error. This is not an
ideal situation and we are looking for a better solution.
3.5 Checks for Writing Pointers
Another possible unsoundness with addresses of local variables is when the
address of a local variable is written to a global or to the heap. In that
case the pointer value might be used later at a time when the underlying
storage is being reused by another activation frame.
For example, in the following code fragment, both of the assignments are
checked using the CHECK_STOREPTR run-time function. The first one is
checked because we are obviously writing to a global variable. The second one
is checked because we are writing through a pointer and thus we cannot know
for sure whether we are writing to the heap or to the stack.
int *g;
void foo(int * *x) {
g = *x; // Check this
*x = g; // Check this
}
The CHECK_STOREPTR is passed both the address we are writing to and the
pointer that is being written. This function will in fact allow writing of
stack pointers into cooler stack locations (deeper into the stack). The
function will also allow the writing of null pointers anywhere. As a special
feature, CCured will allow the writing of stack pointers that are in the stack
frame of main or at higher addresses. This is useful because the
command-line arguments and the environment strings are allocated on the stack
of the program before main is called.
On rare occasions, we have encountered programs that do want to write the
address of local variables into global variables. CCured provides an
easy-to-use mechanism for dealing with those situations. If you add the
attribute __HEAPIFY to the name of a local variable, CCured will
move that variable to the heap using dynamic memory allocation. In fact, just
one allocation is made for all __HEAPIFY local variables in a stack
frame. Take a look at what happens in the following code fragment (do not be
fooled by the call to free; in CCured that is only a hint to the built-in
garbage collector):
int *g;
void foo() {
int local __HEAPIFY = 5;
g = &local;
}
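The effect of __HEAPIFY can be approximated by hand in plain C. The sketch below is an analogue, not the code CCured generates: the variable's storage comes from malloc, so the address stored in g stays valid after foo returns. Since plain C has no garbage collector, some caller must eventually free the block.

```c
#include <stdlib.h>

int *g;

/* Analogue of "int local __HEAPIFY = 5;": the variable lives in a heap
   block, so storing its address in the global g is safe. */
void foo(void) {
    int *local = malloc(sizeof *local);
    if (local == NULL)
        abort();
    *local = 5;
    g = local;
}
```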
3.6 SEQuence Pointers
So far we have discussed pointers for which we disallow most casts and
pointer arithmetic. In this section we will discuss another family of CCured
pointers that can be used in pointer arithmetic operations. We call these
sequence pointers and they come in two flavors: those that can only be
advanced through pointer arithmetic (called forward-sequence pointers or
FSEQ) and the regular sequence pointer that can be moved both forward and
backward (we use the kind SEQ for these pointers). The cost that the
programmer pays for using these more capable pointers is that each dereference
will be accompanied by a bounds check.
Consider the following code fragment. The pointer x cannot be SAFE
because it is involved in pointer arithmetic. Since we are adding a
non-constant value CCured cannot be certain that the pointer is only advancing
so it will assign the more general SEQ kind to it.
int * arith(int * x, int delta) {
return x + delta;
}
3.6.1 Representation
By looking at the CCured output for the above code you will notice several
changes in the code. First, the type of the x parameter and the name of
the function have changed as follows:
struct seq_int {
int * __SEQ _p ;
struct meta_seq_int {
void *_b ;
void *_e ;
} _ms ;
};
typedef struct seq_int seq_int;
int * arith_sq(seq_int x, int delta);
We see that CCured has created the type seq_int (sequence pointer to an
integer). This type has two components, the regular pointer value (field
_p) and a metadata component (the field _ms). Metadata is the
CCured terminology for additional information that CCured is carrying with the
pointers in order to be able to check their usage. All multi-word pointers in
CCured are represented as a structure with two elements: a field _p that
stores the value of the pointer and the field _ms that stores the
metadata for the pointer.
In the case of a SEQ pointer the metadata consists of two pointers, one
that stores the beginning of the memory area in which a pointer was created
(stored in field _b), and the end of that memory area (in field _e).
Such a memory area is also called the home area of a pointer. The
metadata describing the home area is generated by CCured when a pointer is obtained by
allocation or by taking the address of a variable, and is passed along in
assignments. Thus, a SEQ pointer carries with it the beginning and the end
of the home area from which it originates and these values will be used to
perform the necessary bounds checking.
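The representation and the propagation of the home area can be mimicked in plain C. The sketch below is a hypothetical model, not CCured's generated code: seq_of_array plays the role of a SEQ pointer being created from an array, and seq_add performs pointer arithmetic that updates only _p. The arithmetic is done on uintptr_t so that a temporarily out-of-bounds value is not itself undefined behavior in the model (this relies on the usual flat address representation).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of seq_int: the pointer plus its home area. */
struct seq_int {
    int *_p;
    struct { void *_b; void *_e; } _ms;
};

/* A SEQ pointer is born from an array of n ints: the home area is the
   whole array. */
struct seq_int seq_of_array(int *arr, size_t n) {
    struct seq_int s = { arr, { arr, arr + n } };
    return s;
}

/* Pointer arithmetic moves only _p; the home area is copied unchanged. */
struct seq_int seq_add(struct seq_int s, ptrdiff_t d) {
    s._p = (int *)((uintptr_t)s._p + (uintptr_t)d * sizeof(int));
    return s;
}
```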
The structures denoting fat pointers are named by adding a prefix
corresponding to the kind of fat pointer to a canonical name of the type. For
the general rules for naming types, see Section A.1.
Notice also that the name of the arith function has changed. CCured
mangles the names of globals whose type has changed. We do this to ensure that
you are not going to be linking your CCured code with a library that, for
example, calls the arith function with a regular pointer argument. The
mangling is always in the form of a suffix separated from the main name by an
underscore. The suffix is constructed as a sequence of letters, each one
signifying a certain kind of pointer (q stands for SEQ). The order of
the letters corresponds to the order in which the pointer type is encountered
in a depth-first in-order traversal of the structure of the global's type
(for functions we scan the result type first and then the arguments in order;
however, we do not scan structures and unions). For the general rules on
global name mangling, see Section 8.1.
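The suffix construction can be sketched as a small string routine. This is a hypothetical illustration covering only three kinds: q for SEQ is stated above, while s for SAFE and f for FSEQ are inferred from the names arith_sq and addAll_f in this chapter and may not match CCured's full table. The caller supplies the kinds already in traversal order (result type first, then the arguments).

```c
#include <string.h>

enum kind { K_SAFE, K_SEQ, K_FSEQ };

static char kind_letter(enum kind k) {
    switch (k) {
    case K_SAFE: return 's';   /* inferred from arith_sq (assumption) */
    case K_SEQ:  return 'q';   /* stated in the text */
    case K_FSEQ: return 'f';   /* inferred from addAll_f (assumption) */
    }
    return '?';
}

/* Writes "<name>_<letters>" into out (capacity cap). Returns 0 on success,
   -1 if the result does not fit. */
int mangle(const char *name, const enum kind *kinds, size_t n,
           char *out, size_t cap) {
    size_t len = strlen(name);
    if (len + n + 2 > cap)          /* name, '_', n letters, '\0' */
        return -1;
    memcpy(out, name, len);
    out[len] = '_';
    for (size_t i = 0; i < n; i++)
        out[len + 1 + i] = kind_letter(kinds[i]);
    out[len + 1 + n] = '\0';
    return 0;
}
```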
Sequence pointers have an additional capability: they can be set to any
integer value, not just to 0 as in the case of SAFE pointers. We allow this
because the sequence pointers have the additional fields that can be encoded to
identify an integer disguised as a pointer. In particular both the _b and
the _e fields of a SEQ pointer are null in the case when the pointer
is actually an integer. The example below uses this capability of sequence
pointers. Notice that null pointers (those in which all three fields are 0)
are just a special case in which a SEQ pointer is actually an integer.
int * __SEQ getSeq() {
return 5;
}
3.6.2 Invariants for SEQ pointers
- Cannot be cast (except subject to stringent rules which we'll discuss
below). Note that assignment, passing actual arguments and returning are
implicit casts.
- Can be subject to pointer arithmetic (adding or subtracting an
integer from it).
- Can be set to any integer value.
- Can be cast to an integer and can be subtracted from another pointer.
This is useful for comparisons.
- SEQ pointers are represented using three words.
- Every time a SEQ pointer is dereferenced, both a check that the
pointer is not null or a disguised integer and a bounds check are inserted.
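The check that precedes a dereference can be modeled as a predicate. In this hypothetical sketch (a plain C model of seq_int, not CCured's CHECK_ functions) a null _b marks a null pointer or a disguised integer, and the access is allowed only if a whole int fits between _p and _e without _p starting below _b.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a SEQ pointer to int. */
struct seq_int {
    int *_p;
    struct { void *_b; void *_e; } _ms;
};

/* An integer disguised as a SEQ pointer: both bounds are null. */
struct seq_int seq_of_int(uintptr_t v) {
    struct seq_int s = { (int *)v, { NULL, NULL } };
    return s;
}

/* Returns 1 if *s._p may be read or written. */
int seq_check_deref(struct seq_int s) {
    if (s._ms._b == NULL)
        return 0;                           /* null or disguised integer */
    uintptr_t p = (uintptr_t)s._p;
    uintptr_t b = (uintptr_t)s._ms._b;
    uintptr_t e = (uintptr_t)s._ms._e;
    return p >= b && p <= e && e - p >= sizeof(int);   /* bounds check */
}
```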
When sequence pointers are assigned, passed as arguments or as return values,
written or read from memory they carry their metadata unchanged. Same happens
when they are subject to casts or to pointer arithmetic. (Pointer arithmetic
affects only the _p component of the sequence pointer but not the home
area.) There are two operations in which SEQ pointers “are born”: by
using the name of a global or local array (possibly embedded inside other
structures or arrays) or by dynamic memory allocation. It is at that time that
the metadata for the SEQ pointers is computed and initialized: in the case
of an array based on the array length, and in the case of memory allocation
based on the allocated size. Take a look at the code CCured generates to
initialize the metadata for the r1 and r2 pointers in the code below,
both for the case of the memory allocation and for using the name of an array.
extern void* malloc(unsigned int);
int foo(int x) {
int *p, *r1, *r2;
int a[8];
r1 = (int*)malloc(16);
r2 = a;
p = r1;
p = r2;
return *(p + x); // Force p to be SEQ
}
Whenever a pointer is subject to pointer arithmetic CCured will force that
pointer to be SEQ. This means that the pointer should come accompanied by
appropriate metadata. Thus the CCured inferencer will propagate the request
for the metadata backwards through the data flow, across function calls
and returns, to all the places where the pointer might be produced. For
example in the previous example p is subject to pointer arithmetic so it
must be SEQ. But since p is assigned from r1 and r2, they too
must be SEQ. Finally the request reaches the name of the array a (in
which case the metadata for a SEQ pointer is computed based on the length of
a) and also the malloc (in which case the metadata is computed based
on the allocated length).
3.6.3 Run-time checks for SEQ pointers
One of the design decisions for SEQ pointers was whether to check that the
pointer remains within bounds after each arithmetic operation, or to allow
pointers to go temporarily out of bounds and do the check when you use the
pointer. We chose to check the dereferences because the C standard actually
allows pointers to point outside their home area.
Sometimes a pointer is subject to arithmetic and then assigned to a pointer
that is used only for reading and writing. The latter pointer will be inferred
to be SAFE and the SEQ pointer will be converted to a SAFE pointer.
int foo(int *p, int x) {
int *safe;
p += x; // p is SEQ
safe = p; // safe is SAFE
return *safe;
}
To convert a SEQ pointer to SAFE we must check that it is
within the bounds of the home area or is null. Note that even though a SEQ
pointer might contain arbitrary integers, a SAFE pointer can only contain
the integer 0. In the above code you can observe a run-time call to
CHECK_xxx that performs the bounds checking. The same check is used when
reading or writing through a SEQ pointer, as in the example below.
int addAll(int * p, int len, int stride) {
int sum = 0;
for(; len > 0; len -= stride, p += stride) {
sum += *p;
}
return sum;
}
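The SEQ-to-SAFE conversion check described above differs from the dereference check in one respect: null is an acceptable SAFE value, but any other disguised integer is not. A hypothetical model (seq_to_safe is an illustrative name; CCured's actual check function is the one that appears in its output):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a SEQ pointer to int. */
struct seq_int {
    int *_p;
    struct { void *_b; void *_e; } _ms;
};

/* On success stores the raw pointer in *out and returns 0; returns -1 if
   the conversion would break the SAFE invariant. */
int seq_to_safe(struct seq_int s, int **out) {
    if (s._ms._b == NULL) {             /* not a real pointer */
        if (s._p != NULL)
            return -1;                  /* a non-zero integer cannot be SAFE */
        *out = NULL;                    /* null is a valid SAFE value */
        return 0;
    }
    uintptr_t p = (uintptr_t)s._p;
    uintptr_t b = (uintptr_t)s._ms._b;
    uintptr_t e = (uintptr_t)s._ms._e;
    if (p < b || p > e || e - p < sizeof(int))
        return -1;                      /* outside the home area */
    *out = s._p;
    return 0;
}
```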
Just like for SAFE pointers we must check the pointer validity when taking
the address of a field of an object pointed to by a SEQ pointer:
struct elem {
int f1, f2;
int nested[8];
};
void foo(struct elem *array, int len) {
int * pnested, * pnestedseq;
array += len; // Make array a SEQ
pnested = & array->f2; // A bounds check here
pnestedseq = & array->nested[2]; // A bounds check here
pnestedseq += len; // pnestedseq is a SEQ
}
In the example above array is a SEQ pointer pointing to elements of
type struct elem. As long as we only do pointer arithmetic on array no
bounds checking is necessary. However, when we take the address of the field
f2 we will obtain a SAFE pointer and thus we must be sure that
array is within the bounds of its home area. Another interesting situation
occurs on the next line in the above example where we obtain a SEQ pointer
with a home included within the home of array. The home of pnestedseq
is the nested array within the struct elem element pointed to by
array. But again we must know that array is within bounds.
Just like for SAFE pointers we must check for stack addresses when a
SEQ pointer is written through a pointer or returned from a function. But
in this case the check is more subtle. To see why consider the following
program fragment:
// return a fat pointer to my own local, but using arithmetic
// to hide the fact that it's mine
int * sneaky()
{
int local[2];
int *x = local;
x += 200; // push x (apparently) above my frame
return x;
}
int main() {
int *plocal = sneaky();
return *(plocal - 200); // Back into its (vanished) home
}
If we were to check that the _p field is not a local stack address we
would fail to notice the unsoundness in the above example. For this reason the
stack checks are performed using the _b pointer. Another reason to do so is that
if the _b pointer is null then we are returning an integer disguised as a
pointer and we should not care whether it is equal to a stack address (since
such a pointer cannot ever be dereferenced).
The bounds checks in CCured are more involved than in languages like Java
where all of the elements have the same size. It turns out that we can write
faster checks if we maintain the invariant that each FSEQ or SEQ pointer
points to an area that contains a whole number of elements of the given type.
Consider the following code fragment:
char buffer[17];
int main() {
long * __FSEQ p = buffer; // This will fail
return p[2];
}
This will fail with an alignment check because there is no room for a whole
number of long elements in buffer.
Failure ALIGNSEQ at ./foo.c:3: main(): Creating an unaligned sequence
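The alignment condition reduces to a divisibility test. This is a hypothetical model, not CCured's check: it asks whether the bytes between the pointer and the end of its home area hold a whole number of elements. For char buffer[17] viewed as long elements the answer is no, since 17 is not a multiple of 4 or 8, the usual sizes of long.

```c
#include <stddef.h>

/* Returns 1 if bytes_to_end is a whole number of elem_size-byte elements. */
int whole_elements(size_t bytes_to_end, size_t elem_size) {
    return elem_size != 0 && bytes_to_end % elem_size == 0;
}
```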
In general, this check can appear as part of a cast from a sequence pointer
to a sequence pointer of a wider base type. However, the above program is
perfectly fine. We can do several things:
3.6.4 Casts allowed for SEQ pointers
It is always possible to cast
a SEQ pointer to a SAFE pointer with the same underlying type. A
bounds-check is performed in that case. It is also possible to cast a SAFE
pointer to a SEQ pointer, in which case the home area of the new pointer is
the memory range occupied by one element of the SAFE pointer type.
Casting of SEQ pointers is only allowed when the underlying types are the
same or very closely related. We cannot freely allow the casting of SEQ
pointers using the physical subtyping rules that we used for SAFE pointers.
To see why consider the following program:
[--noSplitPointers]
struct wide {
int i;
int *p;
};
void foo(struct wide * __SEQ x) {
int * __SEQ pi = (int*)x;
*(pi + 1) = 5;
}
Notice that the cast has the property that it casts a pointer to a large
structure to a pointer to a smaller structure that is compatible with the large
one. If we were to allow the above program then we would be able to write an
arbitrary integer in a place where the pointer-type field p is stored (pi + 1
lands on the field p of the first element). The
rule for SEQ pointers is that infinite tilings of the two types being
cast must be compatible. This allows us to cast a pointer to an array into a
pointer to array elements (a very useful operation when working with
multi-dimensional arrays):
double a[8][8];
void zero(void) {
double * pa = (double *) a; // view the 8x8 array as 64 doubles
for(unsigned i=0; i<sizeof(a)/sizeof(double); i++) {
* pa ++ = 0.0;
}
}
3.7 FSEQ Pointers
We observed in our experiments that most SEQ pointers only move forward.
For those pointers the lower-bound check is not needed, and neither is the _b field of the
SEQ pointer. To capture this common case CCured
uses the FSEQ pointer kind (forward sequence). The FSEQ pointer is
very similar to the SEQ pointer with a few exceptions.
Consider the following example:
int addAll(int * p, int len) {
int sum = 0;
for(; len > 0; len --, p ++) {
sum += *p;
}
return sum;
}
A FSEQ pointer is represented as a two-word structure with just the
_p (actual pointer value) and _ms._e (end of the home area) fields:
struct meta_fseqp_int {
void *_e ;
};
struct fseqp_int {
int * __FSEQ _p ;
struct meta_fseqp_int _ms ;
};
typedef struct fseqp_int fseqp_int;
int addAll_f(int * __FSEQ p, void *p_ms_e , int len);
A FSEQ pointer can also encode an integer, in which case the _e field
is null.
A FSEQ pointer with a non-null _e field always points to an address at or
above the beginning of its home area. However, it might point beyond the end
of the home area, which is why a FSEQ pointer requires an upper-bound check
whenever it is dereferenced (see the CCured output for the above
example).
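The following is a minimal sketch of the FSEQ representation and the single upper-bound check. The type fseq_int and the function fseq_can_deref are hypothetical. The check returns a flag rather than aborting. A null _e is treated as "encodes an integer", as the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model mirroring struct fseqp_int: no _b field is kept. */
typedef struct {
    int  *_p;  /* actual pointer value */
    void *_e;  /* end of the home area; NULL means the value is an integer */
} fseq_int;

/* Returns 1 if *f._p may be read. Only the upper bound is checked; the
   lower bound holds by the FSEQ invariant. Addresses are compared as
   uintptr_t to avoid undefined pointer comparisons. */
int fseq_can_deref(fseq_int f) {
    if (f._e == NULL) return 0;   /* encodes an integer: not dereferenceable */
    uintptr_t p = (uintptr_t)f._p, e = (uintptr_t)f._e;
    return p <= e && e - p >= sizeof(int);
}
```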
When doing stack checks for the FSEQ pointer we use the value of the
_e field.
However, for each arithmetic operation on a FSEQ pointer we must check
whether it actually advances the pointer. This is done using the
CHECK_ADVANCE run-time function. Note that adding a positive amount to a
pointer does not necessarily advance it: the addition might overflow and wrap
around, which would break the invariant.
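The following sketch illustrates the idea behind such a check. It is not the real CHECK_ADVANCE, whose signature is not shown here. The function advances is hypothetical, and it works on raw addresses in uintptr_t, where wrap-around is well defined. A move counts as an advance only if the resulting address is not below the original one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: moving an address p by delta_bytes keeps the
   FSEQ invariant only if the result is not below p. Unsigned addition wraps
   modulo 2^N, so a huge "positive" delta can land below p. */
int advances(uintptr_t p, uintptr_t delta_bytes) {
    uintptr_t q = p + delta_bytes;   /* may wrap around on overflow */
    return q >= p;
}
```

For example, moving an address near the top of the address space by 8 bytes wraps around to a small address, and the check rejects it.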
It remains to explain how CCured infers that a pointer is FSEQ as opposed to
SEQ. CCured examines all arithmetic operations on the pointer, and if they
always add a positive constant to it, CCured infers that pointer to be FSEQ.
Another useful heuristic that CCured uses is that pointer arithmetic expressed
using array-indexing notation is taken as an indication that we are advancing
the pointer:
int addAll(int * p, int len) {
int sum = 0, i;
for(i=0; i<len; i++) {
sum += p[i];
}
return sum;
}
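The following toy version of the two heuristics summarizes each arithmetic operation on a pointer as one record and then picks FSEQ or SEQ. It is purely illustrative, and the types and names are hypothetical. It assumes the pointer is already known to need arithmetic, so it ignores the SAFE case. It does not reflect how the CCured inferencer is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { KIND_FSEQ, KIND_SEQ } ptr_kind;

/* One pointer-arithmetic operation, as seen by this toy inferencer. */
typedef struct {
    int  is_constant;  /* 1 if the added amount is a compile-time constant */
    long constant;     /* that constant, meaningful when is_constant is 1 */
    int  via_index;    /* 1 if written in array-index notation, p[i] */
} arith_op;

/* FSEQ if every operation is an index expression or adds a positive
   constant; otherwise the pointer might move backwards, so SEQ. */
ptr_kind infer_kind(const arith_op *ops, size_t n) {
    for (size_t k = 0; k < n; k++) {
        if (ops[k].via_index) continue;                        /* taken as advancing */
        if (ops[k].is_constant && ops[k].constant > 0) continue;
        return KIND_SEQ;
    }
    return KIND_FSEQ;
}
```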
Finally, FSEQ pointers can be cast to and from SAFE pointers using the
same rules that we discussed for SEQ pointers. FSEQ pointers can also be
cast to and from SEQ pointers. When casting a SEQ pointer into a FSEQ
pointer we must perform a lower-bound check. When casting a FSEQ pointer
into a SEQ pointer we consider that the home area starts at the place where
the pointer is pointing (provided that the pointer is not encoding an integer
and is within bounds). Note, however, that such a cast never occurs in a
program without pointer-kind annotations. The CCured inferencer instead
prefers to propagate the constraint that all pointers assigned to a SEQ
pointer must themselves be SEQ pointers, and thus have a valid
_b field.
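The following is a minimal sketch of the two casts between SEQ and FSEQ pointers. The structures are simplified, hypothetical models of the representations shown earlier, and the conversion functions are hypothetical names. Each function returns 0 where a real run-time check would fail. Here a pointer equal to _e is treated as out of bounds for the FSEQ-to-SEQ cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int *_p; void *_b; void *_e; } seq_int;
typedef struct { int *_p; void *_e; } fseq_int;

/* SEQ -> FSEQ: the FSEQ invariant requires _p to be at or above the start
   of the home area, so a lower-bound check is performed. */
int seq_to_fseq(seq_int s, fseq_int *out) {
    if (s._b != NULL && (uintptr_t)s._p < (uintptr_t)s._b) return 0;
    out->_p = s._p;
    out->_e = s._e;
    return 1;
}

/* FSEQ -> SEQ: the new home area starts where the pointer points, provided
   it does not encode an integer and is within bounds. */
int fseq_to_seq(fseq_int f, seq_int *out) {
    if (f._e == NULL) return 0;                        /* encodes an integer */
    if ((uintptr_t)f._p >= (uintptr_t)f._e) return 0;  /* not within bounds */
    out->_p = f._p;
    out->_b = f._p;   /* home area begins at the current position */
    out->_e = f._e;
    return 1;
}
```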
3.7.1 Arrays of unspecified length
We saw that a SEQ pointer obtains its metadata from the length present in
the array type from which it originates. But occasionally it is useful to
have arrays with either unspecified length or zero length. Consider the
following code, in which struct open is open-ended; that is, the number
of integer pointers contained in the rest array field is determined at
allocation time (in this case 4, assuming 4-byte ints and pointers):
struct open {
int count;
int * rest[0];
};
extern void* malloc(unsigned int);
int main() {
struct open *p = (struct open*)malloc(20);
return p->rest[5];
}
CCured supports this paradigm and computes the size of the rest field at
the time of the allocation. Then CCured will turn the array rest into a
sized array, which is essentially a structure with two fields: a field
_size which stores the size of the array and a second field _array
which contains the array itself:
struct _sized_a_char {
unsigned int _size ;
char ( __SIZED _array)[20] ;
};
Sized arrays are very similar to Java arrays in that they store their
length in the first word of the data structure. When a SEQ pointer is
created from such an array the metadata is computed based on the stored size.
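The following is a minimal sketch of this idea for an array of int. The length is recorded at allocation time, and a SEQ pointer is later derived from it. A C99 flexible array member stands in for the CCured-generated layout. The names sized_int, sized_alloc and seq_from_sized are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sized array of int: the length word precedes the elements. */
typedef struct {
    unsigned int _size;   /* number of elements, stored in the first word */
    int _array[];         /* the elements themselves */
} sized_int;

typedef struct { int *_p; int *_b; int *_e; } seq_int;

/* Allocate a sized array and record its length at allocation time. */
sized_int *sized_alloc(unsigned int n) {
    sized_int *s = malloc(sizeof(sized_int) + n * sizeof(int));
    if (s != NULL) s->_size = n;
    return s;
}

/* Build a SEQ pointer to the first element; its bounds come from the
   stored size rather than from a static array type. */
seq_int seq_from_sized(sized_int *s) {
    seq_int q = { s->_array, s->_array, s->_array + s->_size };
    return q;
}
```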
There is one more situation in which CCured will automatically infer that an
array must be sized. That is when the array is declared external and without a
length:
extern int a[];
int main() {
return a[3];
}
Additionally, the programmer can request that an array be represented in sized
form by using the __SIZED attribute on the array name:
int *a [8] __SIZED;
Note that CCured does not support the allocation of sequences of structures
with open arrays. For example, the following code will produce a warning and
will allocate a sequence of struct open elements, each with a zero-length
rest field!
struct open {
int count;
int * rest[0];
};
extern void* malloc(unsigned int);
int main() {
struct open *p = (struct open*)malloc(20);
p ++; // Make p FSEQ
return p->rest[5];
}
You might have noticed in the previous example that when reading from a
SEQ pointer, CCured allows reading the byte immediately following the
array. This “feature” is part of our current mechanism for handling
null-terminated strings, in which the terminating null character can be
read but not written, as discussed in the next example.
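The following hypothetical pair of predicates illustrates the read/write asymmetry for a SEQ pointer with home area [b, e). A read of n bytes may extend to the byte at e, for example a terminating '\0'. A write must stay strictly inside the home area. The function names are made up, and this is not how CCured's checks are actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reading n bytes at p is allowed up to and including the byte at e. */
int seq_can_read(const char *b, const char *e, const char *p, size_t n) {
    uintptr_t ub = (uintptr_t)b, ue = (uintptr_t)e, up = (uintptr_t)p;
    return up >= ub && up <= ue + 1 && ue + 1 - up >= n;
}

/* Writing n bytes at p must stay strictly inside [b, e). */
int seq_can_write(const char *b, const char *e, const char *p, size_t n) {
    uintptr_t ub = (uintptr_t)b, ue = (uintptr_t)e, up = (uintptr_t)p;
    return up >= ub && up <= ue && ue - up >= n;
}
```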
3.8 WILD Pointers
The pointer kinds we have seen so far can be dereferenced and can be subject
to pointer arithmetic, but can only be cast in very restrictive ways. Therefore
we cannot really hope to be able to annotate all existing C programs with
kinds like SAFE, SEQ and FSEQ. We also need a pointer kind that can be
cast to any other type. The WILD pointer kind plays this role. Looking back
at the kinds of pointers we have introduced so far, we observe that the most
restrictive kind of pointer, the SAFE pointer, is also the cheapest to use.
It requires only one word of storage and only a null check before
dereference, just like Java references. As we add more capabilities, we also
increase the cost of the pointer. The FSEQ pointers have all the
capabilities of SAFE pointers but can also move forward. The additional cost
is an extra word to store the end of the valid range and an upper-bound check
before dereference. The SEQ pointers have the additional capability of moving
backwards, at the cost of one more storage word and a lower-bound check before
dereference. In keeping with this trend, WILD pointers are even more costly.
As we shall see, WILD pointers must be able not
only to find the bounds of the range in which they are supposed to navigate but
they must also know for each word in that range whether it is a pointer or a
non-pointer. The previously-introduced kinds of pointers did not need to
maintain that information at run-time because the lack of casts allowed the
compiler to keep track of such information statically.
One way to think of the CCured pointer-kind inferencer is as an analysis that
classifies your pointers into two big categories: those for which the static
type is an accurate description of the values pointed to; and those for which
it is not. We refer to the first category as the statically-typed
pointers and they consist of the pointers discussed so far: SAFE, SEQ,
and FSEQ. The second category consists of the dynamically-typed pointers,
which include the WILD pointers. Since the compiler cannot statically verify
the type of the values that WILD pointers point to, CCured inserts code that
maintains, at run time, information about the contents of a memory range
pointed to by WILD pointers.
Consider the following example, in which a (of type int
*) is cast to type int * *:
int foo(int * a) {
int * * g = (int * *)a; // Bad cast
return 0;
}
There are several new things in the CCured version of the above
program fragment. We start with the type of a:
struct meta_wildp_int {
void *_b ;
} ;
struct wildp_int {
int * __WILD _p ;
struct meta_wildp_int _ms ;
} ;
typedef struct wildp_int wildp_int;
int foo_w(wildp_int __WILD a);
A WILD pointer is represented as a two-word structure. As usual the
_p field stores the actual pointer and just like for SEQ pointers the
_b field stores the beginning of the pointer's home area. The major
difference in the representation of WILD pointers is the layout of the home
area. Since we must keep track at run time of what is stored in each location in
a dynamically-typed area we will store a bitmap (one bit per word) at the end
of the home area. And just like for sized arrays the word immediately before
the home area stores the size in words of the home area. Such an example is
shown in the code below where the address of the local variable h is the
home for the pointer p.
void foo() {
  int * h = 0;
  int * p = (int *) &h;  // casting int ** to int * makes p (and h) WILD
}
The type of h is:
struct _tagged_wildp_int {
unsigned int _len ;
wildp_int _data __attribute__((__packed__)) ;
int _tags[(sizeof(wildp_int ) + 127U) >> 7] __attribute__((__packed__)) ;
};
typedef struct _tagged_wildp_int _tagged_wildp_int;
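The array bound of _tags rounds the size of the data up to whole tag words. The following is a quick check of that arithmetic. It assumes a 32-bit target with 4-byte words, so one 32-bit tag word covers 32 data words, or 128 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tag words needed for `bytes` bytes of data: ceil(bytes / 128), written
   the same way as the _tags bound, (bytes + 127U) >> 7. */
size_t tag_words_for(size_t bytes) {
    return (bytes + 127U) >> 7;
}

/* Location of the tag bit describing data word number `word`. */
size_t tag_word_index(size_t word) { return word / 32; }
unsigned tag_bit_index(size_t word) { return (unsigned)(word % 32); }
```

For a two-word wildp_int (8 bytes on such a target), a single tag word suffices.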
The _data field in a tagged area stores the actual data, which in this
case is a WILD pointer. The _tags field contains one word for every 32
words of data. Each bit in the tags is 1 if the corresponding word in the data
field stores the _b field of a WILD pointer, and 0 otherwise (if it
contains an integer or the _p field of a WILD pointer or equivalently
a value that is not CCured metadata). To maintain this invariant we
must update the tag bits every time we perform a memory write. When we write an
integer into a word we clear the bit corresponding to that word. When we write
a WILD pointer then we set the two bits corresponding to the two words to 0
and 1 respectively. When we read an integer we do not need to check tags. When
we read a WILD pointer, we check that the tag bit for the word from which
we'll read the _b component is set. If it is not, we check whether the
_b component that we'll read is 0. In this
latter case we would be reading a pointer that cannot be used for memory
dereference anyway. This latter situation occurs when we read a pointer from
an area that has been initialized with zeros.
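The following sketch models a tagged area and applies the write and read rules above. It assumes uintptr_t as the word type and a fixed area of NWORDS words. Following struct wildp_int, it places a WILD pointer's _p word before its _b word. All names are hypothetical, and the bounds checks are omitted, so callers must keep w + 1 < NWORDS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NWORDS 64

/* Bit w of tags is 1 exactly when data[w] holds the _b field of a WILD pointer. */
typedef struct {
    uintptr_t data[NWORDS];
    uint32_t  tags[(NWORDS + 31) / 32];
} tagged_area;

static void set_tag(tagged_area *a, size_t w, int on) {
    uint32_t mask = (uint32_t)1 << (w % 32);
    if (on) a->tags[w / 32] |= mask; else a->tags[w / 32] &= ~mask;
}

static int get_tag(const tagged_area *a, size_t w) {
    return (int)((a->tags[w / 32] >> (w % 32)) & 1u);
}

/* Start from zeroed memory: all data words 0, all tags clear. */
void init_area(tagged_area *a) { memset(a, 0, sizeof *a); }

/* Writing an integer clears the tag of its word. */
void write_int(tagged_area *a, size_t w, uintptr_t v) {
    a->data[w] = v;
    set_tag(a, w, 0);
}

/* Writing a WILD pointer sets the tags of its two words to 0 (_p) and 1 (_b). */
void write_wild(tagged_area *a, size_t w, uintptr_t p, uintptr_t b) {
    a->data[w] = p;     set_tag(a, w, 0);
    a->data[w + 1] = b; set_tag(a, w + 1, 1);
}

/* Reading a WILD pointer at word w is allowed if its _b word is tagged,
   or if that word holds 0 (a pointer that can never be dereferenced). */
int read_wild_ok(const tagged_area *a, size_t w) {
    return get_tag(a, w + 1) || a->data[w + 1] == 0;
}
```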
Notice that this scheme ensures that we will never interpret a word in a
tagged area as a _b field unless it was last written with the contents
of a _b field. This, however, does not prevent code that overwrites the
_p fields from running (except that the resulting pointers might not be
later usable).
The following run-time support functions are used in conjunction with WILD
pointers:
/* Fetch the size (in words) of the tagged area pointed to by a WILD pointer.
This also checks that the pointer has a valid _b field */
unsigned int
CHECK_FETCHLENGTH(void *_p, /* The _p field of the pointer */
void *_b); /* The _b field */
/* Do bounds checking for WILD pointers */
CHECK_BOUNDS_LEN(void *_b,
unsigned int bwords,/* Result of FETCHLENGTH */
void *_p,
unsigned int plen); /* The size in bytes of the memory area
being accessed */
/* Clear the tags for a memory range. This is called before writing a scalar
or a structure containing at least one scalar into a tagged area. */
CHECK_ZEROTAGS(void *base, /* The base of the tagged area */
unsigned int nrwords, /* Number of data words in the area */
void *start, /* Start of the memory range for which
to clear the tags */
unsigned int size); /* Size in bytes of the memory range for
which to clear the tags */
/* Set the tags for writing a pointer. This also checks that we are not
writing a stack pointer. This is called for EACH pointer in a structure
that is being written. */
CHECK_WILDPOINTERWRITE(void *base, /* The base of the tagged area in which
we write */
unsigned int nrwords, /* Number of data words in the
area */
void **where, /* The address in the tagged area where
we are about to write */
void *_b, /* The _b field of the written pointer */
void *_p); /* The _p field of the written pointer */
/* Check that the pointer we are about to read has a _b field that has not
   been tampered with */
CHECK_WILDPOINTERREAD(void *base,           /* The base of the tagged area from
                                               which we read */
                      unsigned int nrwords, /* Number of data words in the
                                               area */
                      void **where,         /* The address in the tagged area
                                               from which we are about to read */
                      void *_b,             /* The _b field of the pointer being read */
                      void *_p);            /* The _p field of the pointer being read */
The code fragment below uses these runtime functions:
struct s {
int i; // Some integer
int *q; // And some pointer
} g, * __WILD pg = &g;
int foo(struct s * __WILD x) {
// Read an integer from x.
// Must do bounds check
int read = x->i;
// Read a pointer from x
// Must do bounds check and check that the _b field is valid
int * ptr = x->q;
// Write an integer
x->i = 5;
// Write a pointer
x->q = (int*)6;
// Read and write the whole struct
g = *x;
return 0;
}
A WILD pointer can be used in a very flexible way, but there are some
constraints. Like any of the other pointers that can be subject to pointer
arithmetic, the WILD pointer carries with it the identity of its home area
and can be used to access only that area. But most importantly, a WILD
pointer can only be cast to and from another WILD pointer and can only
point to scalars or other WILD pointers. Essentially this means that the
dynamically-typed universe can touch the statically-typed universe in only a
single way: a WILD pointer can be stored in a statically-typed area.
The pointer type “int * SAFE * WILD” is not legal. To see why, consider
the (ill-typed) code below:
int foo(int * __SAFE * __WILD x) {
int * __SAFE y;
*(int __WILD *)x = 5; // Ok since x can be cast to another __WILD pointer
y = * x; // Ok
return *y; // Ok since y is SAFE and non-null
}
Essentially, we cannot count on the accuracy of the types pointed to by
WILD pointers. For this reason, we can only allow WILD pointers to point
to scalars or other WILD pointers. So, in the above example, CCured does not
accept the type of x as valid.
For the same reason we cannot cast between WILD pointers and non-WILD
pointers.
3.9 Split Metadata
In the previous examples, pointers with metadata are represented as
structures. It turns out that neither gcc nor the Microsoft Visual C compiler
is very effective at optimizing code that uses many variables with
structured types. Thus, CCured has the ability to split such variables into
several single-word variables. Consider again one of the examples from before:
int * arith(int * x, int delta) {
return x + delta;
}
Notice that x is split into three variables: x (for the _p
field, or the regular pointer value), and x_b and x_e for the
beginning and end metadata fields. Notice also that function parameters and
arguments (but not the results) are split in the same way:
int * arith_sq(int * x, void * x_b, void * x_e, int delta);
By default, CCured splits the metadata. You must pass the argument
--noSplitPointers to prevent it.
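The following sketch contrasts the two representations of a SEQ argument. In the first, the metadata travels inside one structure parameter. In the second, it is passed as separate single-word parameters, in the style of the arith_sq prototype above. The names seq_int, arith_struct and arith_split are hypothetical. As an assumption of this sketch, both functions return the result as a structure so that their outputs can be compared.

```c
#include <assert.h>

typedef struct { int *_p; void *_b; void *_e; } seq_int;

/* Unsplit form: pointer and metadata passed as one aggregate. */
seq_int arith_struct(seq_int x, int delta) {
    seq_int r = { x._p + delta, x._b, x._e };   /* metadata is unchanged */
    return r;
}

/* Split form: each field becomes its own scalar parameter, which compilers
   can more easily keep in registers. */
seq_int arith_split(int *x, void *x_b, void *x_e, int delta) {
    seq_int r = { x + delta, x_b, x_e };
    return r;
}
```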