[Next] [up] [Previous]
Next: 8. Sequential and Up: EVOLVING ALGEBRAS:AN ATTEMPT TO Previous: 6. Example 2:

7. Example 3: The strcpy function

Q: Will you show me an example where the algorithm is given by a program?

A: Yes. Consider the piece P of C code in Figure 1 which is taken from Kernighan and Ritchie's book ``The C Programming Language'' [KR, page 105].

void strcpy (char *s, char *t)
{
    while (*s++ = *t++)
	;
}

Q: I know too little about C. What does P do and what does this unpronounceable ``strcpy'' stand for? Is P well formed? The while loop does not seem to do anything.

A: P is well formed and defines an algorithm that copies a character string from one place in memory to another. I guess, ``strcpy'' abbreviates ``string-copying''. The pointer variables t and s point initially to the first memory locations in the original place and the new place respectively; *s is the content of the memory location s. The equality sign denotes assignment. The expression *s++ evaluates to the content of the location s and has a side effect of resetting s to the next location. Kernighan and Ritchie admit that the code is ``cryptic at first sight'' [KR, page 106].

I should make a remark about the notion of a string in C. Usually, people have in mind strings of printable characters, but this is not important for us here. The important restriction is that a string does not contain the so-called null character which is used to indicate the end of a string representation in memory. The memory can be viewed as a linearly ordered set of memory locations holding characters. Each character fits into one location. (There isn't much difference in C between characters, bytes and very small integers. In particular, the null character corresponds to the number zero.) Because of the necessary null character at the end, it takes (l+1) memory locations to store a string of l non-null characters.

Q: How can an assignment be the condition of a while loop?

A: An assignment statement returns the value assigned. Nonzero values represent truth and 0 represents falsity in C. The value 0 is returned when the null character is encountered.

We will construct an evolving algebra A modeling the strcpy algorithm. First, we will describe the universes of A and most of the basic functions. Then we will describe the transition rules and the remaining basic functions. Since P uses pointers to memory locations, A has universes Char, Loc and Addr. Elements of Char will be called characters; null is a distinguished character. The universe Loc comes equipped with a successor function. Elements of Loc will be called memory locations. ( C allows more arithmetic on memory locations, but only the successor function is used in P .) A dynamic function LocCont from Loc to Char assigns characters to memory locations. Elements of Addr will be called addresses. A dynamic function AddrCont assigns memory locations to addresses. (According to C, pointer addresses are composed of memory locations and AddrCont is defined by LocCont, but this information is not necessary for explaining P and will be ignored.) We assume that the two content functions are total (on their respective domains).

Q: An address probably identifies a little block of a memory where the address of a pointer can be stored. What is the difference between Addr and Loc?

A: You are right of course. The size of the block depends on the implementation. For our purposes, there is no reason to go into these details.

To reflect (a somewhat simplified version of) the parse tree for P corresponding to the official grammar of C and shown in Figure 2, A has a universe Parsetree that comprises nodes of the parse tree. A distinguished element Root and unary functions Child1, Child2, Child3, Parent have the obvious meaning. In addition, there is a universe of labels. A function Label assigns labels to nodes as shown in Figure 2. (The labels do not include expressions in parentheses which are added for greater clarity.) Each label is a distinguished element and therefore can be mentioned in rules.

[Figure 2]

In order to simulate P , we should know at each moment where we are in the program. To this end, a dynamic distinguished element C will be the currently active node of Parsetree. Initially, C is the root. It will be convenient to abbreviate Child1(C), Child2(C), Child3(C) as C1, C2, C3 respectively. Let me write a few transition rules governing the movement of C. I presume that C never stays two moments in a row at the same place; in other words, it will never have the same value at two consecutive states.

if C = Root then
   if Val(C1) = undef then C := C1
   if Val(C2) = undef then C := C2
   if Val(C3) = undef then C := C3
   endif
endif

Q: What is Val?

A: Val is a function on Parsetree. Think about each node n of the parse tree as the root of the corresponding subtree. When the code corresponding to the subtree has been executed, the resulting value is assigned to n. For example, the execution in question may be an expression evaluation; then the value of the expression is assigned to n. Initially, Val is nowhere defined.

Q: What if n corresponds to a statement that does not return any value?

A: When the statement has been executed, the value done will be assigned to n.

Q: What will be the value of a declaration node?

A: You'll see. Here are some additional rules governing the movement of C and using some obvious abbreviations.

if C is a leaf then C := Parent(C) endif

if C has exactly one child then
   if Val(C1) = undef then C := C1
   else
      C := Parent(C)
      Val(C1) := undef
   endif
endif

Q: I understand that the case of the while-statement node is an exception, but what about the case of the assignment node? Is it treated the same way as the case of the root node?

A: No. The problem is that the definition of C does not specify whether the left or the right expression is evaluated first. This was a deliberate decision [KR, pp. 53-54]; the order of evaluation is left to the implementer. To reflect the implementer's decision, we use an external function ChooseChild.

if Label(C) = assignment then
   if Val(C1) = undef and Val(C2) = undef then C := ChooseChild
   elseif Val(C1) = undef then C := C1
   elseif Val(C2) = undef then C := C2
   else
      C := Parent(C)
      Val(C1) := undef
      Val(C2) := undef
   endif
endif

Q: Why do you reset the values of C1, C2 to undef?

A: To have a proper environment when C comes down to the assignment node next time.

Q: I do not see why do you need ChooseChild. Since you have allowed nondeterminism to resolve contradicting rules, you may as well use it. To choose a child, just say that C gets C1 if Val(C1) is undefined and that C gets C2 if Val(C2) is undefined. If the values of both C1 and C2 are undefined then one of them is nondeterministically chosen.

A: You are right. As an implementor, I would prefer the ChooseChild version though. It is more explicit. Now, let me give you the remaining rules. They are grouped with respect to the location of C.

if Label(C) = declaration then
   Val(C) := Val(C1)
   AddrCont(Val(Decl(C1))) := Val(C1) + 1
endif

Here ChooseAddr is a zero-ary external basic function that returns an address.

Q: Why should it be external?

A: P doesn't tell us how to allocate addresses.

Q: You suppose, I guess, that both occurrences of ChooseAddr evaluate to the same value.

A: Yes, ChooseAddr has a definite value in each state (where it is evaluated).

Q: And what is the PredefinedLoc function? You did not mention it earlier.

A: A call to strcpy should provide two parameters, namely, the initial values of s and t. PredefinedLoc carries this information.

Q: Is it an internal or external function?

A: In this case, the distinction does not matter. Since the two parameters should be known when P starts, PredefinedLoc can as well be an internal function, a static internal function. Further,

if Label(C) = identifier then Val(C) := AddrCont(Val(Decl(C))) endif

Q: What is Decl?

A: Decl maps each identifier node to the declaration node where the identifier in question is declared. The left identifier node is mapped to the left declaration node and the right identifier node is mapped to the right declaration node. Here is the rule for post-increment nodes.

if Label(C) = post-increment and Val(C1)<> undef then
   Val(C) := Val(C1)
   AddrCont(Val(Decl(C1))) := Val(C1) + 1
endif

Q: Let me understand this complicated second update. It would be easier for me to speak about a specific identifier node, say, the left one, which is related to the pointer variable s. Then Decl(C1) is the left declaration node and Val(Decl(C1)) is the address of s. On the other hand, Val(C1) is the memory location pointed to by s. Thus, we are changing s to point to the next location (the successor of the old value of s).

A: That is right. Notice that Val(C) does not reflect the change. But the next time when our identifier node becomes active, it will have a new value.

The dereferencing operator (also known as the indirection operator) of C is formalized by our LocCont function. However, in the case of the left child of the assignment node, we are not really interested in dereferencing.

if Label(C) = dereference and Val(C1)<>undef then
   if C = Child1(Parent(C)) then Val(C) := Val(C1) endif
   if C = Child2(Parent(C)) then Val(C) := LocCont(Val(C1)) endif
endif

The assignment-node rule should be clear now.

if Label(C) = assignment and Val(C1)<>undef and Val(C2)<>undef then
   LocCont(Val(C1)) := Val(C2)
   Val(C) := Val(C2)
endif

Q: I do not understand why the result of the assignment statement is Val(C2) rather than simply done.

A: As I mentioned before, an assignment statement returns the assigned value. This feature allows C programmers to use commands like a = b = c = 0. . We take advantage of this feature to indicate when the string copy operation should halt. When null has been copied, it is passed up to the while-statement node halting execution of the while statement.

if Label(C) = while-statement then
   if Val(C1) = undef then C := C1
   elseif Val(C1)<>null then
      C := C2
      Val(C1) := undef
   else
      C := Parent(C)
      Val(C) := done
      Val(C1) := undef
   endif
endif

Q: Is all that a part of your EA description of C?

A: No, not really, but the description used to look like that.


[Next] [up] [Previous]
Next: 8. Sequential and Up: EVOLVING ALGEBRAS:AN ATTEMPT TO Previous: 6. Example 2:


huggins@acm.org
Fri Dec 9 14:18:02 EST 1994