Semantics

A friend of mine once said, "What do you mean you don't want to argue about semantics? What else is there to argue about?"

In the realm of programming languages, expressing the intended semantics of an interface is one of the most important aspects of the design. Naturally, I'll focus this discussion on the issues with expressing semantics in C++ programs.

A strictly typed language can be used to express many simple semantics, such as "the first argument of the function is a signed integer." However, the C++ language does not offer a way to express ideas such as "the range of legal values for the first argument is between -2 and +2 inclusive."
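To make the gap concrete, here is a minimal sketch of how such a range constraint usually ends up being handled: in documentation and in a run-time check, not in the type. The function name scale_offset and its behavior are hypothetical, chosen only for illustration.

```cpp
#include <stdexcept>

// Hypothetical operation. The type says only "int"; the documented
// contract says "offset must lie between -2 and +2 inclusive".
int scale_offset(int offset)
{
    // The range is not part of the type, so it is enforced at run time.
    if (offset < -2 || offset > 2) {
        throw std::out_of_range("offset must be between -2 and +2 inclusive");
    }
    return offset * 10;
}
```

A call such as scale_offset(3) compiles without complaint; the violation is only detected when the call executes.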

The purpose of this document is to 1) provide a place to collect ideas regarding the various interface semantics that are not expressed directly by C++, 2) analyze those semantics, 3) evolve a consistent set of terminology, and 4) develop a set of questions that a developer can ask himself with regard to an interface, that will help him to correctly document the semantics of each argument, the return type, and the behavior of the operation overall.

Ambiguous Semantic Analysis

Each subsection of this document will discuss a particular semantic issue.

Life Expectancy For Objects Passed By Reference

It's common knowledge that passing arguments by reference rather than by value is more efficient once the size of the argument exceeds a certain threshold. The threshold varies, but is generally related to the register size of the target CPU.

Pass-by-value semantics are the default for integers, and involve passing a copy of the object to the operation. If the operation modifies the argument during its execution, it is modifying a copy of the original object rather than the object itself.

If an object is passed by reference, and the operation modifies the object, it is actually modifying the original object, rather than a copy. Therefore, although the act of passing by reference can be more efficient than passing by value, the semantics are different from the point-of-view of the implementation of the operation. In other words, either the operation must be careful not to modify the referenced argument, or the semantic must be made clear to the client.

Of course, using the "const" qualifier for the reference in the interface specification neatly "prevents" the implementation of the interface from modifying the argument, effectively making the argument read-only.
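The three cases discussed so far can be sketched side by side. The function names are hypothetical and exist only to illustrate the difference in semantics.

```cpp
// Pass by value: the operation works on its own copy.
void increment_copy(int value) { ++value; }

// Pass by reference: the operation modifies the caller's object.
void increment_original(int& value) { ++value; }

// Pass by const reference: the argument is read-only inside the operation.
int read_only(const int& value)
{
    // ++value;  // would not compile: value is const here
    return value + 1;
}
```

Given an int n equal to 1, calling increment_copy(n) leaves n at 1, while calling increment_original(n) changes it to 2. read_only(n) can inspect n but not change it.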

However, what is not covered by C++ are the time-related aspects of the referenced object. When an object is passed by value, the operation implementation is free to keep a copy of the copy for its own use, and the lifetime of that copy is under its control.

The possibility of an operation storing a reference beyond the execution of the operation itself, however, has other implications. Does the operation expect the referenced object to persist beyond the execution of the operation? For example, what happens if the operation is part of an object, a copy of the reference is stored within the state of the object, and subsequently another operation is invoked that uses the reference? The expectation of the interface, in this case, is that the referenced object remains valid for some period of time after the reference is obtained through the original operation.
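A minimal sketch of that situation follows. The class Greeter and its members are hypothetical; the point is only that nothing in the signature of set_name tells the client that the argument must outlive the call.

```cpp
#include <string>

// Hypothetical class: set_name() stores the address of its argument,
// and greeting() uses that stored address in a later call.
class Greeter {
public:
    // The signature looks like an ordinary read-only argument.
    void set_name(const std::string& name) { name_ = &name; }

    // Valid only while the object passed to set_name() still exists.
    std::string greeting() const
    {
        if (name_ == nullptr) {
            return "Hello";
        }
        return "Hello, " + *name_;
    }

private:
    const std::string* name_ = nullptr;
};
```

If the client passes a named string that outlives the Greeter, everything works. If the client passes a temporary, as in set_name(std::string("Ada")), the code still compiles, but the temporary is destroyed at the end of that statement and a later call to greeting() has undefined behavior.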

The question then becomes, "How long must the referenced object persist beyond the invocation of this operation?" Possible answers to this question include:



Possible Language Extensions

In my opinion, C++ is one of the most powerful instruments for developing system software. There is, however, room for improvement. The following sections are reserved for a discussion of possible improvements.

Alignment

The static type checking facilities of C++ make the architect's task of ensuring interface compliance, and the programmer's job of identifying mistakes, much easier. System (not processor) imposed alignment constraints represent one of the holes in the type system; closing it would help to check the integrity of systems code at compile time.

One of the major problems encountered is when the alignment required by the user is greater than that guaranteed by the run-time environment. The run-time alignment is encountered in three places: static storage, the stack, and the heap.

Static alignment is the responsibility of the compiler within a compilation unit, but the alignment of compilation units is the responsibility of the linker and the loader. The compiler can take two approaches to this problem: 1) assume that externally controlled alignment requirements have been met by the user, or 2) not implement the alignment primitives. Clearly, the first gives more power to the system designer, and the consequences are no worse than the current alignment situation.

There are two approaches to stack alignment. First, the ABI can specify a minimum stack alignment and the compiler can then make the appropriate assumptions. The second would be an extension that allows run-time dynamic alignment of the stack frame.

Given the backward compatibility issues involved with ABI changes, and the negative effect of a static alignment requirement on the stack size, the dynamic run-time approach to aligning objects on the stack frame is likely the best alternative.
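For comparison, C++11 later added the alignas specifier, which can be applied to a local variable. The sketch below assumes a compiler that supports the requested alignment for automatic objects. Support for alignments stricter than alignof(std::max_align_t) is implementation-defined, and the value 64 and the function name are chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical alignment requirement, stricter than a typical stack default.
constexpr std::size_t kAlignment = 64;

// Reports whether a local buffer declared with alignas actually landed on
// a kAlignment boundary within this call's stack frame.
bool local_buffer_is_aligned()
{
    alignas(kAlignment) unsigned char buffer[128];
    buffer[0] = 0;  // touch the buffer so it is really used
    return reinterpret_cast<std::uintptr_t>(buffer) % kAlignment == 0;
}
```

How the compiler meets the request, whether by relying on an ABI guarantee or by adjusting the frame at run time, is not visible in the source.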

For similar reasons, a dynamic run-time approach to aligning heap objects is probably best. Indeed, this practice is common, although it is generally ad-hoc and not standardized. Since I'm not a fan of heap allocation via malloc and new, I'm willing to leave this issue to the user who could either provide their own malloc/new with stricter alignment, or dynamically adjust allocated memory using traditional ad-hoc methods. Either way, the compiler can assume the user knows what he is doing.
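One form of the ad-hoc adjustment mentioned above can be sketched as follows: allocate more than is needed, then round the pointer up to the required boundary. The names AlignedBlock, allocate_aligned, and release are hypothetical, and the alignment is assumed to be a power of two. std::align is a standard library function introduced in C++11.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical helper type: keeps both pointers, because free() needs the
// original one and the user needs the adjusted one.
struct AlignedBlock {
    void* raw;      // pointer returned by malloc
    void* aligned;  // adjusted pointer handed to the user
};

// Allocates size bytes aligned to alignment (assumed to be a power of two).
AlignedBlock allocate_aligned(std::size_t size, std::size_t alignment)
{
    // Over-allocate so that an aligned address always fits in the block.
    std::size_t space = size + alignment - 1;
    void* raw = std::malloc(space);
    if (raw == nullptr) {
        return {nullptr, nullptr};
    }
    void* p = raw;
    // std::align rounds p up to the next multiple of alignment.
    void* aligned = std::align(alignment, size, p, space);
    return {raw, aligned};
}

void release(const AlignedBlock& block) { std::free(block.raw); }
```

The cost of this approach is up to alignment - 1 wasted bytes per allocation, plus the bookkeeping needed to remember the original pointer.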

Object Alignment

Type Alignment

Dynamic Alignment

Stack

Heap