                                  ISO: WG21/N0363
                                  ANSI: 93-0156
                                  Author: John Max Skaller
                                  Date:  11/9/93
                                  Reply to: maxtal@suphys.physics.us.oz.au


     C++ Memory model: Discussion paper
     ----------------------------------

The following details describe a proposed memory model for C++.
Except where noted, the model attempts to provide the underlying
framework already specified by, or required by, the ARM and Working
Paper.

This is a discussion paper, not a proposal, but the intention
is to provide a normative memory model as part of the standard.
This seems to be required for construction of a suitable object model.


     COMMENT
     -------

One of the major strengths of the C language is its ability to
deal with raw storage at the machine address level. Without
these facilities, operating system programming, for example,
would not be possible.

We attempt in the C++ Memory model to preserve and indeed
strengthen these abilities by defining exactly when
certain operations are allowed and when they are ill-formed.

The model is characterised in such a way that a suitable
compiler and run-time system, interpreter,
or hardware system could actually test the execution of
a C++ program with given inputs for conformance.

The principal substantive change in this model is the
complete separation of function and data addresses.
This was previously the case for portable, strictly
conforming programs anyhow.

Implementors may provide a library function that
maps data and function pointers to each other as
appropriate.

A second substantive rule (I'm not sure if it's a change)
is that addresses of freed memory cannot be used,
even for comparisons, assignments or copying.

The reason is that the usual machine operations for these
on some architectures may cause a fault if
the pointers are not currently valid. This can
usually be worked around, but only at some cost in efficiency.


     SUMMARY
     -------

1) There are two address spaces and two types of address:
data and function. The types void* and function_t* are
universal types to which any data address or function address,
respectively, may be implicitly converted.


    Rationale. This is a distinct difference from the existing
    WP. Conversion of a function pointer to a void* is allowed
    in the WP provided the void* has sufficient bits to hold it.
    Similarly, a data pointer may be converted to a function
    pointer. (Explicit casts are required in both cases).

    Both these conversions are non-portable.

    Finally, function pointers can be cast to function pointers
    of different types. Thus a cast to some fixed function
    pointer type such as void (*)() may be used to hold arbitrary
    function pointers, but there is nothing preventing the
    user accidentally calling such a function, with undefined
    results.
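    A short C++ sketch of this current idiom, assuming void (*)()
    as the fixed holder type (the names generic_fn, erase and
    call_int_fn are illustrative only): the round trip through
    reinterpret_cast is well defined, but only the caller's
    discipline ensures the pointer is cast back to its original
    type before the call.

```cpp
#include <cassert>

// A fixed "holder" type for arbitrary function pointers, as in the WP idiom.
using generic_fn = void (*)();

// Sample function for illustration.
int add_one(int x) { return x + 1; }

// Store a function pointer as generic_fn. Converting to another function
// pointer type and back again yields the original pointer.
generic_fn erase(int (*f)(int)) {
    return reinterpret_cast<generic_fn>(f);
}

// Recover the original type before calling. Calling through generic_fn
// directly, or casting back to the wrong type, is undefined behaviour,
// and nothing in the type system diagnoses it.
int call_int_fn(generic_fn g, int arg) {
    auto f = reinterpret_cast<int (*)(int)>(g);
    assert(f != nullptr);
    return f(arg);
}
```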

    Providing a function_t* type allows portable and relatively
    safe manipulation of function pointers, similar to void*.

    Disallowing the non-portable conversions is not a major change,
    since they are required to be supported only if certain
    other implementation dependent criteria are satisfied:
    portable programs could not easily use this feature anyhow.

2) No address calculations can be performed on function addresses
other than comparison for equality and inequality, and explicit
casting to some function pointer type.
Copying, assignment, and initialisation by 0, however, are supported.
The type function_t* does not support calls.

    Rationale. Function addresses cannot sensibly be
    compared to see which is lower than the other on
    many architectures.
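    Since no function_t* type exists in the language, the sketch
    below models it as a C++ class. The class name function_ptr,
    its cast member and the sample functions are hypothetical; the
    point is only that implicit conversion from any function
    address, null initialisation, copying, == and != are available,
    while ordering, arithmetic and calls are not.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the proposed function_t*; not standard C++.
class function_ptr {
public:
    // Initialisation by 0 (here: default or nullptr) gives a null value.
    function_ptr() = default;
    function_ptr(std::nullptr_t) {}

    // Any function address converts implicitly (rule 1).
    template <class R, class... Args>
    function_ptr(R (*f)(Args...))
        : p_(reinterpret_cast<void (*)()>(f)) {}

    // Only equality and inequality are provided (rule 2): there is no <,
    // no arithmetic, and no operator(), so calls cannot be made.
    friend bool operator==(function_ptr a, function_ptr b) { return a.p_ == b.p_; }
    friend bool operator!=(function_ptr a, function_ptr b) { return !(a == b); }

    // An explicit cast to a concrete function pointer type is required
    // before a call; choosing the correct type remains the caller's job.
    template <class F>
    F cast() const { return reinterpret_cast<F>(p_); }

private:
    void (*p_)() = nullptr;
};

// Sample functions for illustration.
int twice(int x) { return 2 * x; }
double half(double x) { return x / 2; }
```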

3) Address calculations are supported for data addresses if,
and only if, the pointers are in the same address segment,
and both addresses are valid, in which case <, <=, > and >=
are well defined and yield a total order.

    Rationale. This was intended to be a clarification of
    the ARM. Ordering comparisons (such as <) of addresses on
    segmented architectures can be defined; however, the ARM
    does not require this.

    Comparing void* values that are machine addresses of data
    lying within a single linear address range is explicitly
    made valid here. This covers those cases where a library
    function returns a void* that can be cast to char*
    and address calculations performed on it.

    Since an object occupies a contiguous sequence of bytes,
    comparisons of addresses within an object are well
    defined and we chose to require that such comparisons
    are in fact defined in the language.
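    C++ already guarantees one such ordering within an object:
    non-static data members with the same access control are
    allocated so that later-declared members have higher addresses,
    and array elements are ordered by index. A sketch (Sample and
    the helper names are illustrative only):

```cpp
#include <cassert>

struct Sample {
    int first;
    int second;
    int third;
};

// All three pointers point into the same complete object s, so < is
// well defined and follows the declaration order of the members.
bool members_in_order(const Sample& s) {
    return &s.first < &s.second && &s.second < &s.third;
}

// Elements of one array are likewise ordered by index.
bool elements_in_order(const int (&a)[4]) {
    return &a[0] < &a[1] && &a[1] < &a[3];
}
```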

    The rule restricting comparisons to valid addresses
    is intended to work as follows: when an allocation
    is performed, the addresses of all the memory bytes
    so allocated are marked 'allocated' and 'valid'
    and the address one past the end of the segment is
    also marked 'valid'.

    The purpose of this is to support the C requirement
    that the address one past the end of an array
    may be used in a comparison.
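    A checking implementation might realise this marking scheme
    with a shadow table of allocated blocks. The sketch below is one
    hypothetical way to do it, assuming abstract integer addresses
    and non-overlapping blocks; ShadowMemory, allocate, release,
    isValid and isAllocated are invented names, not part of the
    model itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

using Address = std::uintptr_t;   // abstract address, not a real pointer

class ShadowMemory {
public:
    // Allocating n bytes at base: [base, base + n) becomes allocated and
    // valid, and base + n (one past the end) becomes valid only.
    void allocate(Address base, std::size_t n) {
        assert(n > 0);
        blocks_[base] = n;
    }

    // After release, no address of the block is valid any more.
    void release(Address base) { blocks_.erase(base); }

    bool isAllocated(Address a) const {
        const Block* b = blockAtOrBelow(a);
        return b != nullptr && a < b->first + b->second;
    }

    bool isValid(Address a) const {
        if (isAllocated(a)) return true;
        const Block* b = blockAtOrBelow(a);
        return b != nullptr && a == b->first + b->second;  // one past the end
    }

    // Ordering comparisons require both operands to be valid.
    bool less(Address a, Address b) const {
        assert(isValid(a) && isValid(b));
        return a < b;
    }

private:
    using Block = std::pair<const Address, std::size_t>;

    // The block with the greatest base address <= a, or nullptr.
    const Block* blockAtOrBelow(Address a) const {
        auto it = blocks_.upper_bound(a);
        if (it == blocks_.begin()) return nullptr;
        return &*std::prev(it);
    }

    std::map<Address, std::size_t> blocks_;   // base -> size in bytes
};
```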

4) A comparison of a and b, of the same data pointer type T, is well
defined, and has the same value as the corresponding comparison of
(void*)a and (void*)b, if and only if that comparison is well defined.

    Rationale. This provides an important connection between
    ordinary pointer comparisons and machine addresses.
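    A small C++ check of this connection for pointers into one
    array; the function name comparisons_agree is illustrative
    only. Converting to const void* preserves the address, so the
    typed and untyped comparisons should agree.

```cpp
#include <cassert>

// For pointers of the same type into one array, the typed comparison
// agrees with the comparison of the same addresses as void*.
template <class T>
bool comparisons_agree(T* a, T* b) {
    const void* va = a;
    const void* vb = b;
    return (a < b) == (va < vb) && (a == b) == (va == vb);
}
```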

5) The library functions malloc, realloc, operator new and
operator new[] return the address of the first byte
of a contiguous sequence of data addresses in the same
segment.

    Rationale. Specifying this is necessary to allow
    address calculations and object construction.
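    The sketch below shows why the guarantee matters: because the
    block returned by malloc is one contiguous run of bytes, char*
    arithmetic over it is meaningful and an object can be
    constructed at a computed offset with placement new. Point and
    construct_second_point are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Point { int x, y; };

// Constructs a Point in the second slot of a malloc'd block and returns
// its byte offset from the start of the block (or -1 on failure).
std::ptrdiff_t construct_second_point(int x, int y) {
    void* raw = std::malloc(3 * sizeof(Point));
    if (raw == nullptr) return -1;
    char* base = static_cast<char*>(raw);

    // base + sizeof(Point) lies in the same block and is suitably aligned.
    Point* p = new (base + sizeof(Point)) Point{x, y};
    assert(p->x == x && p->y == y);

    // Both addresses lie in the same block, so the difference is defined.
    std::ptrdiff_t offset = reinterpret_cast<char*>(p) - base;

    p->~Point();     // trivial here, shown for completeness
    std::free(raw);  // from here on, neither base nor p may be used
    return offset;
}
```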

6) Addresses of bytes within the same complete object are in the same
memory segment. (Object is defined elsewhere).

    Rationale. This requirement comes from the ARM definition
    of object. It is stated here for completeness.

7) The address past the end of an array is said to be 'just' in
the same memory segment as the addresses of the array.

    Rationale. This feature is required for C compatibility.
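    The familiar C loop relies on exactly this rule: the
    one-past-the-end address a + n is formed and compared on every
    iteration but never dereferenced. A minimal sketch (sum is an
    illustrative name):

```cpp
#include <cassert>
#include <cstddef>

// a + n is "just" in the array's segment: it may be formed and
// compared, but never dereferenced.
int sum(const int* a, std::size_t n) {
    int total = 0;
    for (const int* p = a; p != a + n; ++p)  // a + n is one past the end
        total += *p;
    return total;
}
```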

8) No address comparisons, including equality and inequality,
may be performed on addresses which are not currently valid.

    Rationale. This is additional to the existing rules.
    I included this restriction because it is necessary for
    segmented architectures with hardware protection and/or
    address translation, such as the 80386 family.
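    In practice this rule suggests a defensive idiom: perform any
    comparison involving a block while it is still allocated, and
    overwrite the pointer with a null value as soon as the block is
    freed, since a null pointer may always be compared. The sketch
    below illustrates the idiom; Buffer and release_if_owner are
    hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

struct Buffer {
    char* data = nullptr;
};

// Frees b.data if it is the same block as candidate. The comparison is
// made while the block is still allocated, never after the free.
bool release_if_owner(Buffer& b, const char* candidate) {
    bool owned = (b.data != nullptr && b.data == candidate);
    if (owned) {
        std::free(b.data);
        b.data = nullptr;  // do not leave a dangling address behind
    }
    return owned;
}
```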

9) The unit of addressability is the byte, which is big enough
to hold a char.

    Rationale. This is from the ARM/WP. The section on bit order
    is elided because it is circular and meaningless.
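    C and C++ guarantee the corresponding facts directly: sizeof(char)
    is 1, a byte has CHAR_BIT bits where CHAR_BIT is at least 8, and
    the bytes of an object can be copied out and back with memcpy.
    A small sketch (roundtrip is an illustrative name):

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// The byte is the unit of addressability and holds exactly one char.
static_assert(sizeof(char) == 1, "a char occupies exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Copy an object's bytes out and back in; the value survives.
unsigned roundtrip(unsigned value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    unsigned copy = 0;
    std::memcpy(&copy, bytes, sizeof copy);
    return copy;
}
```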

10) Every address may be considered to have the following flags
associated with it:

  flag                 synonym for !flag
  -----------------------------------
  isValid              isInvalid
  isAllocated          isUnallocated
  isInitialised        isUninitialised
  isReadWrite          isReadOnly
  isReliable           isUnreliable

11) There are four types of access to memory:

  1) normal read
  2) normal write
  3) locked read
  4) locked write

12) No accesses to unallocated memory are allowed.
13) Read access to uninitialised memory is not allowed.
14) Write access to read only memory is not allowed.
15) Normal access to unreliable memory is not allowed.
16) All other accesses are legal and yield well defined results.
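Rules 12) to 16) are mechanical enough for a checking implementation
to apply directly. The sketch below encodes the flags of 10) (omitting
isValid, which rule 18) implies for all allocated memory) and the
access kinds of 11); Flags, Access and accessAllowed are hypothetical
names.

```cpp
#include <cassert>

// Hypothetical per-address flags from 10).
struct Flags {
    bool allocated = false;
    bool initialised = false;
    bool readWrite = false;
    bool reliable = true;
};

// The four access kinds from 11).
enum class Access { NormalRead, NormalWrite, LockedRead, LockedWrite };

// Returns true if the access is legal under rules 12) to 16).
bool accessAllowed(const Flags& f, Access k) {
    bool isRead = (k == Access::NormalRead || k == Access::LockedRead);
    bool isNormal = (k == Access::NormalRead || k == Access::NormalWrite);
    if (!f.allocated) return false;              // 12) unallocated memory
    if (isRead && !f.initialised) return false;  // 13) uninitialised read
    if (!isRead && !f.readWrite) return false;   // 14) write to read-only
    if (isNormal && !f.reliable) return false;   // 15) normal access, unreliable
    return true;                                 // 16) all other accesses
}
```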

17) No operations may be performed on pointers to invalid
memory locations, including comparisons.

18) Allocated memory is always valid. In addition, the byte
past the last allocated byte is always valid.

19) The flags of memory bytes are set by various operations
such as allocation by malloc, execution of a constructor, etc.
(The exact operations must be specified somewhere in
the standard, possibly here.)
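As an illustration only, here is one plausible set of transitions for
the two operations named above, allocation by malloc and execution of a
constructor, together with free, whose effect follows from 8). The state
type and function names are hypothetical, and the precise list of
operations remains open.

```cpp
#include <cassert>

// Hypothetical state of one byte.
struct ByteState {
    bool valid = false;
    bool allocated = false;
    bool initialised = false;
};

// malloc: the byte becomes allocated and valid, but not yet initialised.
void onMalloc(ByteState& b) { b = ByteState{true, true, false}; }

// A constructor initialises the bytes it stores to.
void onConstruct(ByteState& b) {
    assert(b.allocated);  // 12): no access to unallocated memory
    b.initialised = true;
}

// free: every flag is cleared; the address is no longer even valid.
void onFree(ByteState& b) { b = ByteState{}; }
```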


20) Non-normative: The ReadWrite/ReadOnly attributes are associated with
the const cv-qualifier.

21) Non-normative: The Reliable/Unreliable attribute and the locked
memory access forms are associated with the volatile qualifier.

22) Allocated reliable memory has the property after which it is
named: memory. This is expressed by saying that

  a) consecutive read accesses to the same location yield
     the same value if there is no intervening write access

  b) the value stored by a write access is the value returned by
     a subsequent read access to the same location, if there is
     no intervening write access
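Since the model is meant to be checkable, properties a) and b) can be
tested against a recorded trace of accesses to one reliable location.
The sketch below is a hypothetical checker (Event and behavesAsMemory
are invented names); it also rejects a read that precedes every write,
in line with 13).

```cpp
#include <cassert>
#include <vector>

// One access to a single memory location in an execution trace.
struct Event {
    bool isWrite;
    int value;  // value written, or value the read returned
};

// True if every read returns the value of the most recent preceding
// write, which is what properties a) and b) require together.
bool behavesAsMemory(const std::vector<Event>& trace) {
    bool written = false;
    int current = 0;
    for (const Event& e : trace) {
        if (e.isWrite) {
            current = e.value;
            written = true;
        } else if (!written || e.value != current) {
            return false;
        }
    }
    return true;
}
```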

23) These properties only hold for reliable memory. Accesses to
unreliable memory yield implementation defined results.


    Rationale. This section explains precisely what constitutes
    a violation of the abstract machine memory model.

    Thus, it provides a basis for one of the distinctions between safe
    and unsafe pointer casts, and explains why certain
    unsafe casts are permitted. For example, casting away const
    is permitted, since no violation exists until a write
    access is done to read-only memory.
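    The const case can be seen in a few lines of C++. The cast
    itself is harmless; the write is well defined here only because
    the object actually passed is modifiable. Passing the address of
    an object defined const, such as const int k = 1;, would make the
    same write a violation. The function name is illustrative only.

```cpp
#include <cassert>

// Casting away const is not itself a violation; only a write to
// read-only memory is.
void increment_through_const(const int* p) {
    int* q = const_cast<int*>(p);  // no violation yet
    ++*q;  // well defined only if *p is not actually read-only
}
```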


24) Exact details of pointer and address calculations should be given,
    here or elsewhere.