WG15 Defect Report Ref: 9945-2-09
Topic: LC_CTYPE


This is an approved interpretation of 9945-2:1993.


Last update: 1997-05-20


								9945-2-9
	Class: No change



 _____________________________________________________________________________


	Topic:			LC_CTYPE
	Relevant Sections:	E.3.5.3


Defect Report:
-----------------------
 
          In Section 3.5.3 - Variables, the standard states that: 
 
               This  variable  [LC_CTYPE]  shall  determine   the 
               interpretation of sequences of bytes of text  data 
               as  characters  (e.g.  single-  versus   multibyte 
               characters),  which  characters  are  defined   as 
               letters  (character  class  alpha)  and   <blank>s 
               (character class  blank),  and  the  behaviour  of 
               character   classes   within   pattern   matching. 
               Changing the value of LC_CTYPE after the shell has 
               started shall not affect the lexical processing of 
               shell commands  in  the  current  shell  execution 
               environment or its subshells (see 3.12). 
 
          [Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p. 128, lines 
          268-276] 
 
          The standard also states  that  the  LANG  variable  ``shall 
          provide a default value for the LC_* variables, as described
          in 2.6'' [Ibid., p. 128,  line  261]  and  that  the  LC_ALL 
          variable ``shall interact with the LANG and  LC_*  variables 
          as described in 2.6.''  [Ibid., p. 128, line 264] 
 
          In  Section  2.6  -  Environment  Variables,  the   standard 
          summarizes the meanings of these variables: 
 
                    LANG   This  variable  shall  determine   the 
                           locale category for any  category  not 
                           specifically selected via  a  variable 
                           starting with LC_.  LANG and  the  LC_ 
                           variables can be used by  applications 
                           to determine the language for messages 
                           and instructions, collating sequences, 
                           date   formats,    etc.     Additional 
                           semantics of this  variable,  if  any, 
                           are implementation defined. 
 
                    LC_ALL This variable shall override the value 
                           of the LANG variable and the value  of 
                           any of the  other  variables  starting 
                           with LC_. 
 
                    [...] 
 
                    LC_CTYPE This variable  shall  determine  the 
                           locale category for character handling 
                           functions.  This environment  variable 
                           shall determine the interpretation  of 
                           sequences of bytes  of  text  data  as 
                           characters   (e.g.   single-    versus 
                           multibyte       characters),       the 
                           classification  of  characters   (e.g. 
                           alpha,   digit,   graph),   and    the 
                           behaviour   of   character    classes. 
                           Additional semantics of this variable, 
                           if any, are implementation defined. 
 
          [Ibid., pp. 76-77, lines 2635-2658] 
 
          Does changing LC_ALL (or LANG if LC_CTYPE is not set) affect 
          the lexical processing of  shell  commands  in  the  current 
          shell execution environment?  Is the intent of the  standard 
          that any changes to environment variables that cause  a  new 
          LC_CTYPE to be used shall be ignored by the shell once it has
          started execution? 
 
          An implementation of sh must use  the  locale  specified  in 
          LC_CTYPE   when   reading   a    script.     For    example, 
          isalpha/isalnum is used to parse variable names. 
 
          Consider this simple command: 
 
               FO<O-umlaut>=BAR cmd 
 
          If isalnum('<O-umlaut>'), then this will parse as a variable 
          assignment, otherwise it is argument 0.  Similarly, cmd will 
          be subject to alias expansion in the former case.  There  is 
          no need to validate variable names at other times.  In  such 
          an implementation, changing LC_CTYPE causes no problems. 
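
          The name test described above can be approximated in the shell
          itself.  The sketch below is only an illustration: the helper
          is_name_in_locale is hypothetical, and it relies on the
          character classes of pattern matching rather than on the
          shell's own lexer.  It is exercised here only with the C
          locale, in which <O-umlaut> belongs to neither alpha nor alnum.

```shell
# Hypothetical helper: succeed if $2 looks like a variable name when
# characters are classified according to locale $1.
# The body runs in a subshell, so the caller's locale is left untouched.
is_name_in_locale() (
    LC_ALL=$1
    export LC_ALL
    case $2 in
        # empty, bad first character, or a bad character anywhere
        '' | [!_[:alpha:]]* | *[!_[:alnum:]]*) exit 1 ;;
    esac
    exit 0
)

for word in FOO 'FOÖ' 9FOO; do
    if is_name_in_locale C "$word"; then
        echo "$word: would be treated as a name"
    else
        echo "$word: would not be treated as a name"
    fi
done
```

          In a locale whose alnum class contains <O-umlaut>, the second
          word might be accepted instead.  This depends on the shell,
          since some shells ignore the locale when matching patterns.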
 
          What are the problems with the following commands: 
 
          LANG=locale-with-O-umlaut 
          FO<O-umlaut>=BAR 
 
          Then consider this sequence of commands: 
 
               [ -n "$FO<O-umlaut>" ] && alias echo=: 
               echo foo 
 
          In both cases the parsing of the second line  is  determined 
          by  the  execution   of   the   first   line.    Traditional 
          implementations execute  the  first  line,  then  parse  and 
          execute the second line.  What would a compiler do? 
 
          On the other hand, if they were embedded in  {...}  or  any
          other shell compound command,  they  would  both  be  parsed 
          before being executed.  So we have two cases where behaviour 
          is poorly defined or context dependent. 
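
          The difference between the two cases can be observed with
          aliases, since alias substitution takes place when a command
          is read rather than when it is executed.  The following is a
          minimal sketch: the alias names greet_separate and
          greet_grouped are invented for illustration, and the first
          command is needed only by bash, which does not expand aliases
          in scripts by default.

```shell
# bash expands aliases in scripts only on request; other shells lack
# shopt, and the resulting error is deliberately ignored.
shopt -s expand_aliases 2>/dev/null || true

# Case 1: two separate lines. The alias is defined before the
# second line is read, so it is substituted there.
alias greet_separate='echo hello'
separate=$(greet_separate)

# Case 2: one compound command. The whole group is parsed before
# alias runs, so greet_grouped is looked up as an ordinary command
# and is normally not found.
grouped=$({ alias greet_grouped='echo hello'; greet_grouped; } 2>/dev/null) || true

echo "separate lines: '$separate'"
echo "compound command: '$grouped'"
```

          The same ordering question arises for any setting that is
          changed by one command and consulted while the next is read.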
 
          I suggest that the behaviour of setting LC_CTYPE  be  made
          undefined.  Changing LANG  in  an  interactive  shell  is  a 
          reasonable  thing  to  do,   and   an   implementation   may 
          immediately change all locales with no problems.  Having all 
          but one locale change, and just in the shell, is unintuitive 
          and not required. 
 

WG15 response for 9945-2:1993 
-----------------------------------
The standard clearly states that changes to LC_CTYPE shall
not take effect within the current shell execution
environment.  This is discussed in the rationale in Section
E.3.5.3.

Rationale for Interpretation:
-----------------------------
None.

 _____________________________________________________________________________