ISO/IEC JTC1/SC22/WG14 N825

                    Document Number:  WG14 N825/X3J11 98-024


   WG14/N825            C9X Public Comment            WG14/N825
                        ==================


Sponsoring National Body: J11                 Date: 98/05/15
Author: Tom MacDonald
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2


        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        %%                                   %%
        %% Problems With Undefined Behavior  %%
        %%                                   %%
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


As I understand it, the intents of "undefined behavior" in the current
Draft are:

   - let a programmer know something is not portable
   - often an outright error
   - no diagnostic required
   - if implementation elects to issue a diagnostic, it has to be
     a warning and not a fatal error (i.e., program is translated
     into something)

There seem to be some conflicting statements in the C9X Draft:

       3.18  Undefined behavior

       [#1] Behavior,  upon  use  of  a  nonportable  or  erroneous
       program  construct, of erroneous data, or of indeterminately
       valued  objects,  for  which  this  International   Standard
       imposes  no  requirements.   Permissible  undefined behavior
       ranges  from  ignoring   the   situation   completely   with
       unpredictable  results,  to  behaving  during translation or
       program execution in a documented manner  characteristic  of
       the   environment   (with  or  without  the  issuance  of  a
       diagnostic  message),  to  terminating  a   translation   or
       execution (with the issuance of a diagnostic message).

The paragraph above indicates the implementation can terminate the
translation process if undefined behavior is detected.

Paragraph 3 in 3.18 contradicts that reading:

       [#3] The implementation must successfully translate a  given
       program  unless  a syntax error is detected, a constraint is
       violated, or it can determine that every possible  execution
       of that program would result in undefined behavior.

Another problem with paragraph 3 above is that there are 8 phases
of translation.  Translation Phase 7 says:

       ... The resulting tokens are syntactically and semantically
       analyzed and translated as a translation unit.

Paragraph 3 above indicates the implementation must successfully translate
the entire program.  Typically the translator only translates through
phase 7, and phase 8 creates the program image using the output of
the translator:

         8.  All  external  object  and  function  references   are
             resolved.   Library  components  are linked to satisfy
             external  references  to  functions  and  objects  not
             defined   in   the   current  translation.   All  such
             translator output is collected into  a  program  image
             which contains information needed for execution in its
             execution environment.

So, here's a scenario:

       The following include file cannot be found by the translator

            #include <x\y>

        and "6.1.7  Header names" says this is undefined behavior.
        At this point the implementation is allowed to behave in
        an unpredictable way producing unpredictable results.
        Seems like one of those unpredictable results is producing
        the following output:

            command not found

What does it mean to say "that every possible execution results in
undefined behavior" for such a case?  It's not obvious.


What should we do?  *Warning* radical suggestion ahead!!!

Let's delete paragraph 3 above.  I'm not sure it accomplishes whatever
we as a committee wanted it to accomplish.  It also changes one of the
original motivations for undefined behavior.

Originally, one of the intents of undefined behavior was to allow an
implementation to extend C in a particular way, but not force other
vendors to extend in the same way.  We always said that any other vendor can
just reject that program if it's undefined behavior.  Now the vendor must
successfully translate the program (assuming we fix existing wording
problems).  The problem now is that a vendor cannot issue a fatal error at
translation time if undefined behavior is found.  Granted they can issue a
warning, but it's easy to miss a warning when a recompilation of a large
application occurs.

The current wording places a burden on the implementors.  When customer X
complains that Vendor A successfully compiled a program containing an
obvious error, the vendor is forced to explain this decision.  Customer
support costs are expensive and vendors try to minimize them.  Paragraph 3
appears, from the vendor point of view, to be an attempt to significantly
increase the customer support costs.

Remember, an implementation cannot refuse to translate a program just
because one of the following occurs:

          - An unmatched ' or " character is encountered  on  a  logical
            source line during tokenization (6.1).

          - A reserved keyword token is used in translation phase 7 or 8
            for some purpose other than as a keyword (6.1.1).

          - The reserved token  complex  or  imaginary  is  used  before
            <complex.h> is included (6.1.1).

          - The first character of an identifier is a digit (6.1.2).

          - The same identifier has both internal and  external  linkage
            in the same translation unit (6.1.2.2).

          - A  block  containing  a  variably  modified  object  having
            automatic storage duration is entered by a jump to a labeled
            statement (6.1.2.4).

          - The whole-number and fraction parts of a  floating  constant
            are both omitted (6.1.3.1).

          - For a  function  call  without  a  function  prototype,  the
            function  is  defined  without a function prototype, and the
            types of the arguments after promotion  are  not  compatible
            with those of the parameters after promotion (6.3.2.2).

          - A pointer is converted to other than an integer  or  pointer
            type (6.3.4).

          - An expression is shifted by  a  negative  number  or  by  an
            amount  greater  than  or equal to the width of the promoted
            expression (6.3.7).

          - An expression that is required to  be  an  integer  constant
            expression  does  not  have  an integer type, contains casts
            (outside  operands   to   sizeof   operators)   other   than
            conversions  of  arithmetic  types  to integer types, or has
            operands  that  are  not  integer   constants,   enumeration
            constants,    character   constants,   fixed-length   sizeof
            expressions, or immediately-cast floating constants (6.4).

          - A constant expression in an initializer does not evaluate to
            one  of the following:  an arithmetic constant expression, a
            null pointer constant, an address constant,  or  an  address
            constant  for  an  object  type  plus  or  minus  an integer
            constant expression (6.4).

          - An arithmetic constant expression does not  have  arithmetic
            type,  contains casts (outside operands to sizeof operators)
            other than conversions of  arithmetic  types  to  arithmetic
            types,  or  has  operands  that  are  not integer constants,
            floating   constants,   enumeration   constants,   character
            constants, or sizeof expressions (6.4).

          - An address constant is created neither explicitly using  the
            unary  &  operator  or  an  integer constant cast to pointer
            type, nor implicitly by the use of an expression of array or
            function type (6.4).

          - An identifier for an object is declared with no linkage  and
            the  type  of the object is incomplete after its declarator,
            or after its init-declarator if it has an initializer (6.5).

          - A function is declared  at  block  scope  with  an  explicit
            storage-class specifier other than extern (6.5.1).

          - A structure or union  is  defined  as  containing  no  named
            members (6.5.2.1).

          - A bit-field is declared with a type other than  a  qualified
            or  unqualified  version  of  signed  int  or  unsigned  int
            (6.5.2.1).

          - A tag is declared with the bracketed list twice  within  the
            same scope (6.5.2.3).

          - etcetera ...
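
To make one item from the list concrete, here is a minimal sketch in C
of the shift case (6.3.7).  It is only an illustration under stated
assumptions: the names UINT_WIDTH_BITS, shift_is_defined,
unchecked_shift and checked_shift are hypothetical and appear nowhere
in the Draft, and the width computation assumes unsigned int has no
padding bits.  The unchecked function is undefined only for some
argument values, so a translator cannot show that every possible
execution is undefined and, under paragraph 3, would have to translate
it.  The checked variant shows the kind of run-time guard a programmer
is left with.

```c
#include <assert.h>
#include <limits.h>

/* Width in bits of unsigned int, assuming no padding bits
   (an assumption of this sketch, not a guarantee of the Draft). */
#define UINT_WIDTH_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Hypothetical helper: nonzero when shifting an unsigned int by
   `count` stays within the limits described in 6.3.7. */
int shift_is_defined(int count)
{
    return count >= 0 && count < UINT_WIDTH_BITS;
}

/* Undefined only for some values of `count`, so not every possible
   execution of a program containing this function is undefined. */
unsigned int unchecked_shift(unsigned int value, int count)
{
    return value << count;
}

/* Guarded variant: stops at run time (unless NDEBUG is defined)
   instead of relying on the translator to reject the construct. */
unsigned int checked_shift(unsigned int value, int count)
{
    assert(shift_is_defined(count));
    return value << count;
}
```

A caller that passes a count of -1 to unchecked_shift has undefined
behavior, yet nothing in the source text lets the translator refuse
the translation unit on that basis.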

Many customers are not going to understand why the vendors successfully
translated their application when some obvious error occurred.  Vendors
will be forced to provide non-standard ways of getting fatal errors for
obvious mistakes.

Paragraph 3 seems to be doing a disservice to both vendors and users.