JTC1/SC22/WG21 N0716

 
	The One Definition Rule
           Jerry Schwarz
	   X3J16/95-0116
	    WG21/N0716
 
------------   Introduction -------------------
 
 
This note concerns issues relating to section 3.2[basic.def.odr], "One
Definition Rule".  The main issue that needs to be resolved is what
constraints are placed on a program when a single entity (class,
function, ....) is defined in more than one translation unit.
 
In this note I will use "the ODR" as shorthand for issues relating to
such entities.  It might be better to call this the MDR (multiple
definition rule) but ODR seems to be established usage for these
issues.  I will refer to a requirement that there be exactly one
definition of something in a program as the UDR (the unique definition
rule).
 
---------------  The intent of the ODR  ---------
 
To me, understanding the ODR has always started from the idea that
ideally there would be only a single definition of entities that are
subject to it, but since the WP relies heavily on "translation units"
to support separate compilation, we cannot impose such a constraint.
So, the intent is that the multiple definitions are somehow derived
from a single source (include file, macro, program generator,
programmer's head ....).  The effect of the ODR is to make it possible
for the system to determine from any of the available definitions what
the "true" definition is and to behave as if the entity really is
defined in only one place.
 
Under this interpretation our task in refining the words in the
WP is to determine what constraints can reasonably be imposed that
allow an implementation to behave properly.
 
This approach sidesteps the question of linkage except for the
entities subject to the ODR.  For example, in an enum declaration
 
	enum Color { RED, GREEN, BLUE } ;
 
If the linkage rules establish that Color is the same type
in two different translation units, then the ODR itself
establishes that RED, GREEN and BLUE are the same enumerators
in the two translation units.  Similarly given
 
	class X {
	    enum Color { RED, GREEN, BLUE } ;
	};
 
The ODR applies to X, so the system must behave as if there were
only one definition of X::Color, X::RED, etc.  Thus linkage questions
do not arise for these.
 
Also, consider
 
	class X {
	    int f() {
		static int local = 0 ;
		return ++local ;
	    }
	};
 
Assuming this definition satisfies the ODR, the system must behave as
if there is only a single definition and therefore as if there is only
a single local. The committee reached that conclusion recently.  Under
my approach it becomes a consequence of the ODR.
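
As a minimal sketch of the "single local" behaviour described above,
the following can be compiled as one translation unit.  The names
Counter, step_between_objects and demo_single_local are hypothetical
and chosen only for illustration; a single file cannot show the
multi-translation-unit case, only that every caller shares one object.

```cpp
#include <cassert>

// Hypothetical stand-in for the note's class X (a struct so f() is public).
struct Counter {
    int f() {
        static int local = 0;   // a single object, shared by every caller
        return ++local;
    }
};

// Calls f() through two distinct objects and reports how far the
// counter advanced between the two calls.
inline int step_between_objects() {
    Counter a, b;
    int first = a.f();
    int second = b.f();
    return second - first;
}

inline void demo_single_local() {
    // Both calls incremented the same static local, so the step is 1.
    assert(step_between_objects() == 1);
}
```

If each object (or each copy of the definition) had its own local, the
two calls would return the same value and the step would be 0.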
 
The working paper currently gives external linkage to some entities
that under this proposal would not need linkage and I propose to
give them no linkage.
 
Proposal 1:  Add to the working paper
 
	"The program behaves as if there were a single definition
         of entities subject to the ODR."
 
Proposal 1A: Make the following changes to existing words
 
    3.5, basic.link, paragraph 5:
   	In addition, a member of class scope has external linkage if the name
	of the class has external linkage.
 
	change "a member" to "a function member"	
	
    3.5, basic.link, paragraph 8:
	'Two names that are the same and that are declared in different
	scopes shall denote the same object, function, type, enumerator,
	or template if ...
 
	delete enumerator
 
    3.5, basic.link, paragraph 8:
        -- both names refer to member of the same namespace or to members
        of the same class;
 
	change "members" to "function members"
 
    9.4, class.mfct, paragraph 6
        A static local variable in a member function always refers to the
	same object, whether or not the member function is inline.
 
	This sentence is now redundant. I propose making it a footnote
	of the sentence in proposal 1.
 
 
	
-------------  What entities are subject to the ODR? -------------
 
The current WP lists class types and enum types.  This list is
obviously incomplete and some form of the ODR (and not the UDR) needs
to be applied to several other entities.
 
Templates are discussed in more detail below.
 
Proposal 2: Make the following subject to the ODR.
 
	class types
 
	enum types
 
	member functions (when declared as inline, but out of class)
 
	non-member inline functions explicitly declared extern.
 
	class templates
 
	function templates
 
	class template instances. A point of instantiation of a class
	template serves as a definition of the instance.
 
	instances of function templates. [ Inline template functions
        present another question: are we treating them as if they
	are automatically "extern"? ]
	
--------------- What constraints should be applied -----------------
 
There have been a variety of constraints suggested in the past.  These
vary along a dimension that I call strictness vs. leniency.  For example
consider:
 
	// file 1
	class X {   	
	   int a ;
	} ;
 
	// file 2
	class X {
	  private: int a ;
	} ;
 
 
A lenient rule would allow these (as being semantically identical),
while a strict rule would not.
 
I have always been an advocate of the strictest possible rule.
Partially this is for convenience -- strict rules seem easier to state
than lenient ones -- but also it seems to follow from the general
intent of the ODR as I laid it out above.  Looking at these
definitions of X, there is no way to tell if the "true definition"
contains the redundant "private:".
 
So as one component of the ODR I propose that we require the two
definitions to be the same sequence of tokens.  I will call this
condition "token identity".
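
A rough sketch of a token identity check, applied to the two
definitions of X shown earlier, might look as follows.  The tokenizer
is a deliberate simplification and an assumption of this example (it is
not the WP's translation phases): identifiers and numbers are maximal
runs of letters, digits and underscores, and every other non-space
character is a token by itself.  The names rough_tokens,
token_identical and demo_token_identity are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Crude tokenizer, for illustration only.
inline std::vector<std::string> rough_tokens(const std::string& text) {
    std::vector<std::string> out;
    std::string word;
    for (char raw : text) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (std::isalnum(c) || c == '_') {
            word += raw;                    // extend the current identifier/number
            continue;
        }
        if (!word.empty()) { out.push_back(word); word.clear(); }
        if (!std::isspace(c)) out.push_back(std::string(1, raw));
    }
    if (!word.empty()) out.push_back(word);
    return out;
}

// Two definitions are "token identical" if their token sequences match.
inline bool token_identical(const std::string& a, const std::string& b) {
    return rough_tokens(a) == rough_tokens(b);
}

inline void demo_token_identity() {
    // Differences in white space do not matter.
    assert(token_identical("class X { int a ; } ;", "class X{int a;};"));
    // The redundant "private:" adds tokens, so the strict rule rejects it.
    assert(!token_identical("class X { int a ; } ;",
                            "class X { private: int a ; } ;"));
}
```

The check is purely textual, which is what makes the strict rule easy
to state: no judgment about semantic equivalence is involved.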
 
But token identity is insufficient because two names which are the
same token might refer to different entities in the different
translation units, so I propose the further constraint that
corresponding tokens that refer to entities from outside the
definition refer to the same entities. I call this "name identity".
 
An entity is "from outside a definition" if it is not "inside the
definition".  "inside of E" is defined as follows
 
    (A) members of a class (or class template) are inside the class
    (B) enumerators are inside their type
    (C) local variables and types of a function (or function template)
	are inside that	function.
    (D) Transitive closure. That is, if A is inside B and B is inside C
        then A is inside C.
 
Here are some examples of the implications of name identity.
 
	extern int g ;
	static int s = .... ;
	extern inline void h() { .... }
	inline void hh() { .... } ;
	class X { ... } ;
	class Y {
	    int f1() { return g ; } // ok
	    int f2() { return s ; } // not ok
	    void f3() { h() ; } // ok
	    void f4() { hh() ; }  // not ok
	    X x ;  // ok
	} ;
 
We do not need to look into the bodies of the definitions of h or X to
determine that references to them are ok. They have external linkage and
are themselves subject to the ODR.
 
In the past there have been proposals that f4 should be allowed
providing the definitions of hh in the different translation units
were semantically equivalent.  But defining semantically equivalent
becomes a quagmire.  I would be reluctant to propose an ODR that made
it impossible to use inline functions in in-class definitions, but
luckily our previous acceptance of "extern inline" means we can avoid
that dilemma.  We can allow the use of explicitly extern inlines which
are themselves subject to the ODR. This proposal "breaks" lots of
existing code, but most of the code is easily repaired (by adding
"extern" to the inline declaration) and (assuming proposal 5 is
accepted) the breakage is such that a diagnostic is not required (and
would probably not be given by most systems.)
 
But "name identity" isn't quite enough.  Sometimes entities can be
referred to without being explicitly mentioned.  In particular we have
overloaded operators, operators new and delete, constructors, implicit
conversion operators, and default arguments. [Have I left anything
out?] So we need to impose "implicit operation identity" as well.
Actually, I haven't been able to find any examples where constructors
or implicit conversion functions matter. That is, all the examples I
have found that would fail this constraint are already ill-formed on
some other grounds.  However, I see no harm in mentioning them in this
constraint and we would be covered if someone comes up with a clever
example later.
 
Default arguments are the most delicate because they need to be
subject to a full ODR themselves in order to get the right result.
Consider, for example
 
	static int n ;
	extern int f(int = n ) ;
	extern void g(int = f() ) ;
	class X {
	    void h() { g() ; }
	} ;
 
The definition of X ought to be considered a violation of the ODR.
I'm not sure how to say this without getting into the "semantic
equivalence" quagmire that I have been trying to avoid.
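
The hidden dependency can be made concrete with a single-file sketch.
All names here (hidden_n, inner, outer, Caller, demo_default_argument_chain)
are hypothetical, and the functions return int so that the effect is
observable.  The body of Caller::h mentions neither inner nor
hidden_n, yet its result depends on both through the chain of default
arguments.

```cpp
#include <cassert>

static int hidden_n = 7;                      // internal linkage

int inner(int x = hidden_n) { return x; }     // default argument names hidden_n
int outer(int x = inner()) { return x; }      // default argument calls inner()

struct Caller {                               // hypothetical stand-in for X
    int h() { return outer(); }               // only outer appears in the tokens
};

inline void demo_default_argument_chain() {
    Caller c;
    // outer() expands to outer(inner(hidden_n)), so h() yields hidden_n.
    assert(c.h() == 7);
}
```

If two translation units each had their own hidden_n, token identity
and name identity of the body of h alone would not detect the
difference.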
 
In order to be effective, token identity must be applied after
pre-processing.  It might also be applied before pre-processing and
that would be an even stricter rule.  I do not propose this because
there seem to be some definitional issues (macros can expand to more
than one declaration or span the end of one declaration and the
beginning of another) and the value of the increased strictness seems
minor.
 
There is one example that the constraint of name identity breaks
which I think should be allowed
 
	// const int example
	const int size = 99 ;
	class X {
	    int a[size] ; // used in a constant expression
	    int f() { return size ; } // value used.
	} ;
 
This is common, and I think we have to allow it.  So I would make an
exception to name identity for static (i.e. internal linkage) const
integral or enumeration objects that are initialized with constant
expressions and whose value (and not address) is used.  In such cases
the two names must have the same type and value.
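
A small sketch of the two permitted kinds of use, under the assumption
that only the value of the constant matters.  The names buffer_size,
Buffer and demo_const_value_use are hypothetical.

```cpp
#include <cassert>

// A namespace-scope const int has internal linkage by default, so each
// translation unit that sees this line gets its own object.
const int buffer_size = 99;

struct Buffer {                              // hypothetical stand-in for X
    int a[buffer_size];                      // used in a constant expression
    int f() const { return buffer_size; }    // value used, address not taken
};

inline void demo_const_value_use() {
    static_assert(sizeof(Buffer::a) / sizeof(int) == 99, "array bound is 99");
    Buffer b = Buffer();
    assert(b.f() == 99);
}
```

Neither use can distinguish one translation unit's buffer_size from
another's, provided both have the same type and value.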
 
This exception might be widened to allow for more types, e.g.
 
	// const pointer to function example
	void somefunction() ;
	void (* const ptf)() = somefunction ;
	class X {
	    void f() { (*ptf)() ; }
	} ;
 
Or it might be tightened to allow the const's to appear only in
constant expressions (which would allow the declaration of a in the
example, but not the definition of f).
 
Some people (perhaps John Skaller) believe that this approach should
be broadened even further.  For example, allow static const variables
of any type (provided their initializers satisfy certain constraints).
On balance, I think that the gains from allowing such constructs are
marginal and not worth the extra complexity that would be required
in the ODR.
 
Proposal 3:
 
	If an entity subject to the ODR (those listed in proposal 2)
	is defined in more than one translation unit then the two
	definitions shall satisfy the following constraints.
 
	(a) At phase 5 of the translation process the definitions
	    shall consist of the identical sequence of tokens.
 
	(b) Except as noted below, all names in these definitions that
	    are looked up (see 3.4) shall find declarations for the
	    same entity or for an entity defined within the
	    definition.
 
	    (b1) If a name refers to a const object with internal
	         or no linkage in both definitions, and that object
		 has the same integral or enumeration type in both
		 definitions, and the object is initialized with a
		 constant expression and the value (but not the
		 address) of the object is used, then the object need
		 not be the same object in both definitions.
 
	(c) All overloaded operators, implicitly applied conversion
	    operators or constructors, and operator new functions or
	    operator deletes called implicitly, shall refer to
	    identical entities or to entities declared within the
	    definitions.
 
        (d) Any default arguments used by function calls (implicit or
	    explicit) in the definition are treated as if the
	    token sequence used in declaring them were present in the
	    definition.  That is, they are subject to (a), (b),
	    and (c).  The lookup mentioned in (c) is that done when
	    the default argument occurs and not at the use.
 
	    This rule applies recursively to any function calls
	    in a default argument.
 
-----------------------  Templates --------------------------------
 
Template definitions themselves should be subject to the full set of
constraints as proposed above.  This is not sufficient because
name identity and operation identity would apply only to the names
that are actually resolved in the initial look up.
 
For templates there is another lookup at the point of instantiation
and we need to apply the identity constraints again.
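
The second lookup can be illustrated with a sketch in which a name
inside the template cannot be resolved until an argument type is
known.  This assumes lookup of dependent names behaves as in later
standardized C++ (found through the namespace of the argument type at
instantiation), which this note does not itself specify.  The names
twice, describe, shop, Item and demo_instantiation_lookup are
hypothetical.

```cpp
#include <cassert>

// 'describe' is called with an argument of type T, so the call cannot
// be resolved when the template is defined; it is looked up again
// where the template is instantiated.
template <class T>
int twice(T t) { return 2 * describe(t); }

namespace shop {
    struct Item { int price; };
    // Found at the point of instantiation via the argument's namespace.
    inline int describe(Item i) { return i.price; }
}

inline void demo_instantiation_lookup() {
    shop::Item item = { 21 };
    assert(twice(item) == 42);   // describe resolves to shop::describe
}
```

Two translation units could contain the identical token sequence for
twice and for the call, yet see different functions named describe at
the point of instantiation, which is why the identity constraints need
to be applied a second time.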
 
Proposal 4:
 
	Template definitions shall satisfy 3a.
 
	Template definitions shall satisfy 3b, 3c, 3d,
	as it applies to names from the template's enclosing
	scope. (see 14.2.2)
 
	If identical instances of a template occur in multiple files
	then name and operation identity (3b, 3c, 3d) shall apply to
	dependent names. (See 14.2.3).
 
-----------------------------  Diagnosable Errors ------------------
 
While it is possible to imagine a system that diagnosed failures of
the ODR, most current systems do not, and I don't think we want to
impose the implementation burden on vendors.
 
More delicate is whether we should make ill-formed some usages that
would cause violation of the ODR if the definition does in fact occur
in more than one file.  For example
 
	const int a = 99 ;
	struct X {
	    const int* f() { return &a ; }
	} ;
 
If the definition of X appears in more than one translation unit then
it will violate the ODR.  Should we therefore make the definition
ill-formed?  I'm undecided.
 
Proposal 5:
 
	Violations of the ODR do not require diagnostics.
 
 
----------------------   What Uses Require Definitions ----------------
 
This is already discussed in section 3.2, so I am considering it in
this paper.
 
It is well understood that some declarations and expressions require a
definition and others do not.
 
	class X ; // not defined
	X* p ; // ok
	X x ; // not ok
 
It is important that this be clear in the WP.  This should also serve
to determine at what point template classes are instantiated.  That is
the point of instantiation of a template instance is (related to) the
first place in a translation unit where that instance would need to be
defined.
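
A single-file sketch of the distinction, covering both an ordinary
class and a class template instance.  The names Widget, Box,
make_widget and demo_definition_needed are hypothetical.

```cpp
#include <cassert>

class Widget;                    // declared, not defined
Widget* widget_ptr = 0;          // ok: a pointer does not need the definition
Widget make_widget();            // ok: a function type may mention Widget

template <class T> struct Box;   // class template declared, not defined
Box<int>* box_ptr = 0;           // ok: a pointer needs no instantiation

// From here on the definitions are visible.
class Widget { public: int value; };
template <class T> struct Box { T item; };

Widget make_widget() { Widget w; w.value = 42; return w; }

inline void demo_definition_needed() {
    Widget w = make_widget();    // an object of type Widget: definition required
    Box<int> b;                  // an object of type Box<int>: instantiated here
    b.item = w.value;
    assert(b.item == 42);
    assert(widget_ptr == 0 && box_ptr == 0);
}
```

Moving the declaration of w or b above the corresponding definition
would make the file ill-formed, while the pointer declarations are
accepted where they stand.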
 
Although the words in the WP currently attempt to indicate places
where a definition is not needed, I think it is easier to say where
definitions are needed.  I have been guided here by what I believe is
common usage and ease of implementation.  There is no "right or
wrong" answer here.  In what follows X is the class type or a
cv-qualified variant of it.
 
A) Definition of an object of type X or array of X
 
	   X x ;
 
   The following do not require a definition of X
 
	   X& xr = .... ;  // a reference
	   extern X ex ;   // not a definition of ex
           extern X& ;     //
	   X fx() ;        // types may be embedded in function types
	   X* px ;         // pointers are definitely ok		
	
 
B) Declaration of a non-static member of a class
 
	    class Y {
		X x ;
	    } ;
	
    Static members are covered by (A).
 
    It might be implementable to require the definition of X only if
    the definition of Y is ever needed, but I think current practice
    is to require it at the point of definition.  And doing so avoids
    further questions about delayed definitions and circularities.
 
C) Use as a base class
 
	    class Y : public X { } ;
 
 
D) rvalues of type X or lvalues of type X that are converted to rvalues.
 
   This is already in the WP.  (3.9[basic.lval], 4.1[conv.lval]).  I'm
   glad we don't have to figure all this out, but it might require
   some tinkering anyway.
 
   Firstly (D) doesn't deal with intermediates
 
	class X;
 	class Y;
	class Z;
	struct X {
	    X(Y) ;
	};
	struct Z {
	    operator Y() ;
	};
	Z fz(int) ;
	X x (fz(0)) ; // Y has to be defined although there is no
		      // expression of type Y.  There is only the
		      // intermediate value.
 
	Also it is unclear to me how it deals with
 
	    X& xref() ;
 	    X  xval() ;
	    xref() ; // presumably ok
	    xval() ; // cannot be ok (caller allocates space on some systems)
 
So proposal 6 must be regarded as tentative until some more details
are determined.
 
Proposal 6:
 
	If X is a class the following require definition of X.
 
	a) Definition of an object with type (cv-qualified)  X.
	b) Declaration of a non-static member with type (cv-qualified) X.
	c) Use of X as a base class
	d) An rvalue of type (cv-qualified) X. [This probably requires
	   more work]
 
	A program is ill-formed if, in any translation unit, there is
	no definition of X before one of the above.

	If X is a class template instance then any of the above uses
	requires that the template have been defined (and not just
	declared) at the point of the use.
 
---------------  extern "C" -------------------------------------
 
Somebody (Mike Anderson?) raised this on the reflector recently.  We
want to say there can be only one extern "C" function in a program
with a given name.  Right now the only constraint prohibits multiple
such functions in "an overload set".  This doesn't seem to cover either
different translation units or different namespaces.  So I propose to
add the restriction.
 
For systems that use name mangling the implication is that the
namespace is not used in the mangle of an extern "C" function.
That implementation would make this a diagnosable error ("at
linktime") the way that multiple definitions of a single function are.
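
A sketch of what the restriction means for namespaces, assuming the
behaviour of later standardized C++, where declarations of a function
with C language linkage and the same name in different namespaces
refer to the same function.  The names first, second, odr_note_probe
and demo_extern_c_identity are hypothetical.

```cpp
#include <cassert>

namespace first {
    extern "C" int odr_note_probe() { return 1995; }  // the one definition
}
namespace second {
    extern "C" int odr_note_probe();                  // same function, redeclared
}

inline void demo_extern_c_identity() {
    // The call through second:: reaches the definition given in first::.
    assert(second::odr_note_probe() == 1995);
    int (*p)() = first::odr_note_probe;
    int (*q)() = second::odr_note_probe;
    assert(p == q);                                   // one function, one address
}
```

Supplying a second definition inside namespace second would then be a
redefinition of the same function rather than a distinct overload.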
 
What about declarations?  Should we allow
 
	extern "C" int f() ;
	extern "C" int f(int) ;
 
in separate compilation units?  It doesn't seem to matter too much
because if the definition is given externally (i.e. really in C code)
then we never have a complete C++ program to call ill-formed.
 
Proposal 7 :
 
	A program is ill-formed if it contains definitions for more
        than one function with language linkage "C" with the same name
        (even if these are distinct functions).