Doc No:   N1750=05-0010
Project:  Programming Language C++
Date:     2005-01-13
Author:   Beman Dawes
Email:    bdawes@acm.org

Critique of Code Conversion Proposal (N1683)

N1683=04-0123, Proposed Library Additions for Code Conversion, proposes sorely needed code conversion facilities for the standard library. (See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html.) Without these facilities, programmers concerned with internationalization are forced to reinvent the wheel; Boost has run into that problem two or three times in existing libraries and will hit it again in libraries currently in the Boost pipeline. The LWG should accept the proposal as addressing a useful, high-priority need.

That being said, there are several concerns which indicate the proposal can be further refined and improved.

1. Hard-wired types in wstring_convert

The underlying wstring_convert design seems flexible enough to cope with conversion between any two character types which meet std::basic_string charT requirements. Conversion is actually performed by std::codecvt, which is already parameterized by both the internT and externT character types. It seems artificial to restrict byte_string to std::basic_string<char> and to restrict wide_string to std::basic_string specializations that use the default traits and allocator. Other character types, including the proposed char16_t and char32_t, will need string conversions to and from other wide string types, yet with the current restrictions wstring_convert could not be used for that purpose.

Discussions on the Boost list focused on two possible generalizations:

  1. Specify the two string types as template parameters, so that any types may be used which meet the std::basic_string charT requirements.
  2. Specify the conversions as algorithms operating on iterators or iterator ranges, using non-member functions.

(2) is not proposed here because I believe it to be an over-generalization for functionality which has little use outside of strings.

Suggested change:

    template< class Codecvt, class Elem = wchar_t >

becomes:

    template< class Codecvt, class WideS = std::wstring, class NarrowS = std::string >

and change types within the class accordingly. See the modified synopsis below.
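
As a rough illustration of what "change types within the class accordingly" could mean, the sketch below derives the character types from the two string parameters instead of hard-wiring them. The names converter_types, wchar_codecvt and default_types are hypothetical, and the static_assert checks (which assume a C++11 compiler) are an assumed constraint that is not stated in N1683.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical name; only the type plumbing is shown, no conversion logic.
template <class Codecvt, class WideS = std::wstring, class NarrowS = std::string>
struct converter_types
{
    typedef WideS wide_string_type;
    typedef NarrowS narrow_string_type;
    typedef typename WideS::value_type wide_char_type;     // derived, not hard-wired
    typedef typename NarrowS::value_type narrow_char_type; // derived, not hard-wired
    typedef typename Codecvt::state_type state_type;

    // Assumed constraint: the string element types must match the facet's types.
    static_assert(std::is_same<wide_char_type, typename Codecvt::intern_type>::value,
                  "WideS::value_type must be Codecvt::intern_type");
    static_assert(std::is_same<narrow_char_type, typename Codecvt::extern_type>::value,
                  "NarrowS::value_type must be Codecvt::extern_type");
};

// With the defaults, the types reduce to the ones the original proposal hard-wires.
typedef std::codecvt<wchar_t, char, std::mbstate_t> wchar_codecvt;
typedef converter_types<wchar_codecvt> default_types;
```

Any std::basic_string specialization whose value_type matches the facet, including one with non-default traits or allocator, could then be supplied as WideS or NarrowS.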

2. Need target-argument form for wstring_convert conversion functions

wstring_convert's conversion functions are in the form:

    byte_string to_bytes(const wide_string& wstr) const;

While this form is often useful and should be retained, it may imply an extra copy of the result if a compiler is not smart enough to optimize the copy away.

Suggested change: add overloads of the form:

    void to_bytes(const wide_string& wstr, byte_string & target) const;
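
The relationship between the two forms can be sketched with free functions, with the value-returning form written in terms of the target-argument form. This is a minimal sketch, not the proposal's implementation: the free-function signatures (taking the facet as a first argument) and the typedef codecvt_type are assumptions made so the example stands alone, failure is reported by throwing std::range_error, and unshift() is deliberately omitted.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

// Target-argument form: writes the converted bytes into a caller-supplied string.
// Simplification: unshift() is not called, so state-dependent encodings are not
// returned to their initial shift state.
void to_bytes(const codecvt_type& cvt, const std::wstring& wstr, std::string& target)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from = wstr.data();
    const wchar_t* const from_end = from + wstr.size();
    char buf[64];

    target.clear();  // in practice this keeps whatever capacity target already has
    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        const std::codecvt_base::result r =
            cvt.out(state, from, from_end, from_next, buf, buf + sizeof buf, to_next);
        assert(buf <= to_next && to_next <= buf + sizeof buf);
        const bool converted =
            (r == std::codecvt_base::ok || r == std::codecvt_base::partial);
        if (!converted || from_next == from)  // error, noconv, or no progress
            throw std::range_error("to_bytes: conversion failed");
        target.append(buf, to_next);
        from = from_next;
    }
}

// Value-returning form, expressed in terms of the target-argument form.
std::string to_bytes(const codecvt_type& cvt, const std::wstring& wstr)
{
    std::string result;
    to_bytes(cvt, wstr, result);
    return result;
}
```

A caller converting many strings in a loop could pass the same target each time, for example with a facet obtained from std::use_facet<codecvt_type>(std::locale::classic()), so that the string's storage is reused rather than reallocated for every result.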

3. Need way to access error strings

Member functions should be added to access the two error string prefixes. See the modified synopsis below.

4. More explicit name for wstring_convert

"wstring" might be misleading, depending on the actual types involved. "convert" is a verb, yet nouns make better class names.

Suggested change:

    wstring_convert

to:

    string_converter

5. Improved member names

The proposal uses the name "byte" to identify the narrow case in member names. That will be misleading if the actual type is something other than char. However, it isn't clear what a better set of member names would be. The modified synopsis below uses "wide" and "narrow" in names, even though they may also be misleading in the case where the sizes are the same. Perhaps a better set of names will surface as the proposal moves forward.

Modified synopsis

This modified synopsis applies all of the suggestions above to make their impact easier to visualize.

template<class Codecvt,
    class WideS = wstring,
    class NarrowS = string>
class string_converter
{
public:
    typedef NarrowS narrow_string_type;
    typedef typename NarrowS::value_type narrow_char_type;
    typedef WideS wide_string_type;
    typedef typename WideS::value_type wide_char_type;
    typedef typename Codecvt::state_type state_type;
    typedef typename wide_string_type::traits_type::int_type int_type;

    string_converter();
    string_converter(const narrow_string_type& narrow_err);
    string_converter(const narrow_string_type& narrow_err,
        const wide_string_type& wide_err);

    wide_string_type from_narrow(narrow_char_type value) const;
    wide_string_type from_narrow(const narrow_char_type *ptr) const;
    wide_string_type from_narrow(const narrow_string_type& str) const;
    wide_string_type from_narrow(const narrow_char_type *first,
        const narrow_char_type *last) const;

    void from_narrow(narrow_char_type value, wide_string_type & target) const;
    void from_narrow(const narrow_char_type *ptr, wide_string_type & target) const;
    void from_narrow(const narrow_string_type& str, wide_string_type & target) const;
    void from_narrow(const narrow_char_type *first,
        const narrow_char_type *last, wide_string_type & target) const;

    narrow_string_type to_narrow(wide_char_type wchar) const;
    narrow_string_type to_narrow(const wide_char_type *wptr) const;
    narrow_string_type to_narrow(const wide_string_type& wstr) const;
    narrow_string_type to_narrow(const wide_char_type *first,
        const wide_char_type *last) const;

    void to_narrow(wide_char_type wchar, narrow_string_type & target) const;
    void to_narrow(const wide_char_type *wptr, narrow_string_type & target) const;
    void to_narrow(const wide_string_type& wstr, narrow_string_type & target) const;
    void to_narrow(const wide_char_type *first,
        const wide_char_type *last, narrow_string_type & target) const;

    const narrow_string_type & narrow_error() const;
    const wide_string_type & wide_error() const;

    // exposition only
private:
    narrow_string_type narrow_err_string;
    wide_string_type wide_err_string;
};
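
To give a feel for how the synopsis might be used, here is a partial, hypothetical implementation covering only the string overloads of from_narrow and the two error accessors. Several points are assumptions rather than anything the synopsis specifies: the facet is obtained from std::locale::classic() via std::use_facet, a failed conversion yields the stored wide error string if one was supplied and otherwise throws std::range_error, and the has_wide_err flag is an invented implementation detail.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

template <class Codecvt, class WideS = std::wstring, class NarrowS = std::string>
class string_converter
{
public:
    typedef NarrowS narrow_string_type;
    typedef typename NarrowS::value_type narrow_char_type;
    typedef WideS wide_string_type;
    typedef typename WideS::value_type wide_char_type;
    typedef typename Codecvt::state_type state_type;

    string_converter() : has_wide_err(false) {}
    string_converter(const narrow_string_type& narrow_err)
        : narrow_err_string(narrow_err), has_wide_err(false) {}
    string_converter(const narrow_string_type& narrow_err,
                     const wide_string_type& wide_err)
        : narrow_err_string(narrow_err), wide_err_string(wide_err), has_wide_err(true) {}

    // Target-argument form; does the actual work.
    void from_narrow(const narrow_string_type& str, wide_string_type& target) const
    {
        // Assumption: use the facet installed in the classic locale.
        const Codecvt& cvt = std::use_facet<Codecvt>(std::locale::classic());
        state_type state = state_type();
        const narrow_char_type* from = str.data();
        const narrow_char_type* const from_end = from + str.size();
        enum { buf_size = 64 };
        wide_char_type buf[buf_size];

        target.clear();
        while (from != from_end) {
            const narrow_char_type* from_next = from;
            wide_char_type* to_next = buf;
            const std::codecvt_base::result r =
                cvt.in(state, from, from_end, from_next, buf, buf + buf_size, to_next);
            assert(buf <= to_next && to_next <= buf + buf_size);
            const bool converted =
                (r == std::codecvt_base::ok || r == std::codecvt_base::partial);
            if (!converted || from_next == from) {
                // Assumed failure policy: error string if supplied, else throw.
                if (!has_wide_err)
                    throw std::range_error("string_converter: conversion failed");
                target = wide_err_string;
                return;
            }
            target.append(buf, to_next);
            from = from_next;
        }
    }

    // Value-returning form, written in terms of the target-argument form.
    wide_string_type from_narrow(const narrow_string_type& str) const
    {
        wide_string_type result;
        from_narrow(str, result);
        return result;
    }

    const narrow_string_type& narrow_error() const { return narrow_err_string; }
    const wide_string_type& wide_error() const { return wide_err_string; }

private:
    narrow_string_type narrow_err_string;
    wide_string_type wide_err_string;
    bool has_wide_err;  // hypothetical: distinguishes "no error string" from ""
};
```

With these definitions, string_converter<std::codecvt<wchar_t, char, std::mbstate_t> > conv("?", L"?"); followed by conv.from_narrow(std::string("abc")) would be expected to produce L"abc" for plain ASCII input, and conv.wide_error() would return the L"?" passed at construction. The to_narrow direction would follow the same pattern using Codecvt::out.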

Acknowledgements

This critique is based on discussions with Thorsten Ottosen, Stefan Slapeta, Rob Stewart, and Jonathan Turkanis.


Revised: 13 January 2005