ISO: WG21/N0363
ANSI: 93-0156
Author: John Max Skaller
Date: 11/9/93
Reply to: maxtal@suphys.physics.us.oz.au

C++ Memory model: Discussion paper
----------------------------------

The following details describe a proposed memory model for C++. Except where noted, the model attempts to provide the underlying framework already specified by, or required by, the ARM and Working Paper.

This is a discussion paper, not a proposal, but the intention is to provide a normative memory model as part of the standard. This seems to be required for construction of a suitable object model.

COMMENT
-------

One of the major strengths of the C language is its ability to deal with raw storage at the machine address level. Without these facilities, operating system programming, for example, would not be possible. We attempt in the C++ Memory model to preserve and indeed strengthen these abilities by defining exactly when certain operations are allowed and when they are ill-formed.

The model is characterised in such a way that a suitable compiler and run-time system, interpreter, or hardware system could actually test the execution of a C++ program with given inputs for conformance.

The principal substantive change in this model is the complete separation of function and data addresses. This was previously the case for portable, strictly conforming programs anyhow. Implementors may provide a library function that maps data and function pointers to each other as appropriate.

A second substantive rule (I'm not sure if it's a change) is that addresses of freed memory can't be used, even for comparisons, assignments or copying. The reason is that the usual operations for this on some architectures may cause a fault if the pointers are not currently valid. This can usually be got around, but only at some cost in efficiency.

SUMMARY
-------

1) There are two address spaces and two types of address: data and function.
   The types void* and function_t* are universal types to which any data or function address, respectively, may be implicitly converted.

   Rationale. This is a distinct difference from the existing WP. Conversion of a function pointer to a void* is allowed in the WP provided the void* has sufficient bits to hold it. Similarly, a data pointer may be converted to a function pointer. (Explicit casts are required in both cases.) Both these conversions are non-portable. Finally, function pointers can be cast to function pointers of different types. Thus a cast to some fixed function pointer type such as void (*)() may be used to hold arbitrary function pointers, but nothing prevents the user from accidentally calling through such a pointer, with undefined results. Providing a function_t* type allows portable and relatively safe manipulation of function pointers, similar to void*. Disallowing the non-portable conversions is not a major change, since they are required to be supported only if certain other implementation dependent criteria are satisfied: portable programs could not easily use this feature anyhow.

2) No address calculations can be performed on function addresses other than comparison for equality and inequality, and explicit casting to some function pointer type. Copying, assignment, and initialisation by 0, however, are supported. The type function_t* does not support calls.

   Rationale. Function addresses cannot sensibly be compared to see which is lower than the other on many architectures.

3) Address calculations are supported for data addresses if, and only if, the pointers are in the same address segment, and both addresses are valid, in which case <, <=, > and >= are well defined and yield a total order.

   Rationale. This was intended to be a clarification of the ARM. Less-than comparison of addresses on segmented architectures can be defined; however, the ARM does not require this.
   Comparing void* values that are machine addresses of data lying in a linear address range is explicitly made valid here. It covers those cases where a library function returns a void* which can be cast to char* and address calculations performed. Since an object occupies a contiguous sequence of bytes, comparisons of addresses within an object are well defined and we chose to require that such comparisons are in fact defined in the language.

   The rule restricting comparisons to valid addresses is intended to work as follows: when an allocation is performed, the addresses of all the memory bytes so allocated are marked 'allocated' and 'valid', and the address one past the end of the segment is also marked 'valid'. The purpose of this is to support the C requirement that the address one past the end of an array may be used in a comparison.

4) A comparison for a, b of data type T is well defined and has the same value as the comparison for (void*)a, (void*)b if, and only if, that comparison is well defined.

   Rationale. This provides an important connection between ordinary pointer comparisons and machine addresses.

5) The library functions malloc, realloc, operator new and operator new[] return the address of the first byte of a contiguous sequence of data addresses in the same segment.

   Rationale. Specifying this is necessary to allow address calculations and object construction.

6) Addresses of bytes within the same complete object are in the same memory segment. (Object is defined elsewhere.)

   Rationale. This requirement comes from the ARM definition of object. It is provided here for completeness.

7) The address past the end of an array is said to be 'just' in the same memory segment as the addresses of the array.

   Rationale. This feature is required for C compatibility.

8) No address comparisons, including equality and inequality, may be performed on addresses which are not currently valid.

   Rationale. This is additional to the existing rules.
   I included this restriction because it is necessary for segmented architectures with hardware protection and/or address translation, such as the 80386 family.

9) The unit of addressability is the byte, which is big enough to hold a char.

   Rationale. This is from the ARM/WP. The section on bit order is elided because it is circular and meaningless.

10) Every address may be considered to have the following flags associated with it:

    flag            synonym for !flag
    -----------------------------------
    isValid         isInvalid
    isAllocated     isUnallocated
    isInitialised   isUninitialised
    isReadWrite     isReadOnly
    isReliable      isUnreliable

11) There are four types of access to memory:

    1) normal read
    2) normal write
    3) locked read
    4) locked write

12) No accesses to unallocated memory are allowed.

13) Read access to uninitialised memory is not allowed.

14) Write access to read only memory is not allowed.

15) Normal access to unreliable memory is not allowed.

16) All other accesses are legal and yield well defined results.

17) No operations may be performed on pointers to invalid memory locations, including comparisons.

18) Allocated memory is always valid. In addition, the byte past the last allocated byte is always valid.

19) The flags of memory bytes are set by various operations such as allocation by malloc, execution of a constructor, etc. (The exact operations must be specified somewhere in the standard, possibly here.)

20) Non-normative: The ReadWrite/ReadOnly attributes are associated with the const cv-qualifier.

21) Non-normative: The Reliable/Unreliable attribute and the locked memory access forms are associated with the volatile qualifier.

22) Allocated reliable memory has the property after which it is named: memory. This is expressed by saying that

    a) consecutive read accesses to the same location yield the same value if there is no intervening write access
    b) the value stored by a write access is the one read by a subsequent read access

23) These properties only hold for reliable memory.
    Accesses to unreliable memory yield implementation defined results.

    Rationale. This section explains precisely what constitutes a violation of the abstract machine memory model. Thus, it provides a basis for one of the distinctions between safe and unsafe pointer casts, and explains why certain unsafe casts are permitted. For example, casting away const is permitted, since no violation exists until a write access is done to read-only memory.

24) Exact details of pointer and address calculations should be given, here or elsewhere.
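
ILLUSTRATION (non-normative)
----------------------------

The flag and access rules (10 to 18) might be checked by a conforming test system along the following lines. This is only a sketch, not part of the model: the names ByteFlags, Access, Segment, accessAllowed, canRead, write, comparable and release are hypothetical, and the choice that a successful write marks a byte initialised is an assumption, since rule 19 leaves the flag-setting operations unspecified.

```cpp
#include <cassert>   // only needed by callers that check results with assert
#include <cstddef>
#include <vector>

// Hypothetical per-byte flags, named after the table in rule 10.
struct ByteFlags {
    bool isValid = false;
    bool isAllocated = false;
    bool isInitialised = false;
    bool isReadWrite = false;
    bool isReliable = true;
};

// The four access kinds of rule 11.
enum class Access { NormalRead, NormalWrite, LockedRead, LockedWrite };

inline bool isRead(Access a) {
    return a == Access::NormalRead || a == Access::LockedRead;
}
inline bool isNormal(Access a) {
    return a == Access::NormalRead || a == Access::NormalWrite;
}

// Rules 12 to 16: an access is legal unless a prohibition applies.
inline bool accessAllowed(const ByteFlags& f, Access a) {
    if (!f.isAllocated) return false;                 // rule 12
    if (isRead(a) && !f.isInitialised) return false;  // rule 13
    if (!isRead(a) && !f.isReadWrite) return false;   // rule 14
    if (isNormal(a) && !f.isReliable) return false;   // rule 15
    return true;                                      // rule 16
}

// Hypothetical model of one allocated segment of n bytes.
class Segment {
public:
    explicit Segment(std::size_t n) : flags_(n + 1) {
        for (std::size_t i = 0; i < n; ++i) {
            flags_[i].isValid = true;
            flags_[i].isAllocated = true;
            flags_[i].isReadWrite = true;
        }
        flags_[n].isValid = true;  // rule 18: one past the end is valid, not allocated
    }

    // Direct flag access, e.g. to model const or volatile storage.
    ByteFlags& at(std::size_t offset) { return flags_.at(offset); }

    bool canRead(std::size_t offset, Access a = Access::NormalRead) const {
        return isRead(a) && accessAllowed(flags_.at(offset), a);
    }

    // Assumption: a successful write marks the byte initialised.
    bool write(std::size_t offset, Access a = Access::NormalWrite) {
        ByteFlags& f = flags_.at(offset);
        if (isRead(a) || !accessAllowed(f, a)) return false;
        f.isInitialised = true;
        return true;
    }

    // Rules 3, 8 and 17: both addresses must lie in this segment and be valid.
    bool comparable(std::size_t a, std::size_t b) const {
        return a < flags_.size() && b < flags_.size() &&
               flags_[a].isValid && flags_[b].isValid;
    }

    // Models freeing the segment: every address becomes invalid again.
    void release() {
        for (ByteFlags& f : flags_) f = ByteFlags{};
    }

private:
    std::vector<ByteFlags> flags_;
};
```

Under these assumptions, a freshly constructed Segment(4) rejects a read of byte 0 until that byte has been written, allows offsets 0 and 4 to be compared but rejects a write at offset 4, and after release() rejects every comparison. Clearing isReliable on a byte leaves only the locked access forms usable, which mirrors the association with volatile in rule 21.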