Doc No: N1750=05-0010
Project: Programming Language C++
Author: Beman Dawes
N1683=04-0123, Proposed Library Additions for Code Conversion, proposes sorely needed code conversion facilities for the standard library. (See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html) Without these facilities, programmers concerned with internationalization are forced to reinvent the wheel; Boost has run into that problem two or three times in existing libraries, and will hit it again in libraries currently in the Boost pipeline. The LWG should accept the proposal as meeting a useful, high-priority need.
That said, several concerns suggest the proposal can be refined further.
The underlying wstring_convert design seems flexible enough to cope with conversion between any two character types which meet the std::basic_string charT requirements. Conversion is actually performed by std::codecvt, which is already parameterized by both the internT and externT character types. It seems artificial to restrict byte_string to std::basic_string<char> and to restrict wide_string to std::basic_string specializations which use the default traits and allocator. Other character types, including the proposed char16_t and char32_t, will need string conversions to and from other wide string types, yet with the current restrictions wstring_convert could not be used for that purpose.
Discussions on the Boost list focused on two possible generalizations:
(2) is not proposed here because I believe it to be an over-generalization for functionality which has little use outside of strings.
The suggested change is to replace:
template< class Codecvt, class Elem = wchar_t >
with:
template< class Codecvt, class WideS = std::wstring, class NarrowS = std::string >
and change types within the class accordingly. See the modified synopsis below.
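As a rough illustration of this generalization, the sketch below parameterizes a converter on the two string types and derives the character types from them. The class name string_converter_sketch, the by-reference facet constructor, and the output sizing rule are assumptions made for this example only; they are not taken from N1683. Only the narrow-to-wide direction is shown.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical sketch: a converter parameterized on the string types rather
// than on a single wide element type. Not the proposal's actual class.
template<class Codecvt, class WideS = std::wstring, class NarrowS = std::string>
class string_converter_sketch {
public:
    typedef WideS wide_string_type;
    typedef NarrowS narrow_string_type;
    typedef typename WideS::value_type wide_char_type;
    typedef typename NarrowS::value_type narrow_char_type;
    typedef typename Codecvt::state_type state_type;

    // Assumption: the facet is borrowed by reference for simplicity.
    explicit string_converter_sketch(const Codecvt& cvt) : cvt_(cvt) {}

    wide_string_type from_narrow(const narrow_string_type& str) const {
        if (str.empty()) return wide_string_type();
        // Assumption: the encoding never yields more wide elements than
        // narrow elements, so str.size() is a sufficient output size.
        wide_string_type out(str.size(), wide_char_type());
        state_type state = state_type();
        const narrow_char_type* first = str.data();
        const narrow_char_type* from_next = 0;
        wide_char_type* to_next = 0;
        std::codecvt_base::result r =
            cvt_.in(state, first, first + str.size(), from_next,
                    &out[0], &out[0] + out.size(), to_next);
        if (r == std::codecvt_base::noconv)   // facet performs no conversion
            return wide_string_type(str.begin(), str.end());
        if (r != std::codecvt_base::ok)
            throw std::range_error("conversion failed");
        out.resize(static_cast<typename wide_string_type::size_type>(to_next - &out[0]));
        return out;
    }

private:
    const Codecvt& cvt_;
};

// Usage sketch with the classic locale's wchar_t/char facet.
inline void widen_demo() {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& facet = std::use_facet<facet_type>(std::locale::classic());
    string_converter_sketch<facet_type> conv(facet);
    assert(conv.from_narrow("abc") == L"abc");
}
```

Because the character types come from WideS::value_type and NarrowS::value_type, the same template could in principle be instantiated with string types that use non-default traits or allocators, provided a matching Codecvt facet exists.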
wstring_convert's conversion functions are in the form:
byte_string to_bytes(const wide_string& wstr) const;
While this form is often useful and should be retained, it may imply an extra copy of the result if a compiler is not smart enough to optimize the copy away.
The suggested change is to add overloads of the form:
void to_bytes(const wide_string& wstr, byte_string & target) const;
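The following sketch shows how the two forms might coexist, with the value-returning form written in terms of the target-parameter form. The class ascii_converter_sketch and its ASCII-only conversion are hypothetical stand-ins so that the example is self-contained; the assumption that the target's previous contents are replaced rather than appended to is also mine, since the critique does not say.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposal's converter; a real implementation
// would use a codecvt facet instead of this ASCII-only loop.
class ascii_converter_sketch {
public:
    typedef std::string byte_string;
    typedef std::wstring wide_string;

    // Additional form: writes into a caller-supplied string, so a caller
    // converting repeatedly can reuse the target's storage.
    // Assumption: any previous contents of target are replaced.
    void to_bytes(const wide_string& wstr, byte_string& target) const {
        target.clear();
        target.reserve(wstr.size());
        for (std::size_t i = 0; i < wstr.size(); ++i) {
            // Assumption: anything outside 7-bit ASCII becomes '?'.
            const unsigned long code = static_cast<unsigned long>(wstr[i]);
            target.push_back(code < 0x80UL ? static_cast<char>(code) : '?');
        }
    }

    // Existing form: convenient, but returns the result by value.
    byte_string to_bytes(const wide_string& wstr) const {
        byte_string result;
        to_bytes(wstr, result);
        return result;
    }
};

// Usage sketch: one target string reused across several conversions.
inline void to_bytes_demo() {
    ascii_converter_sketch conv;
    std::string reused;
    conv.to_bytes(L"first", reused);
    assert(reused == "first");
    conv.to_bytes(L"second", reused);
    assert(reused == "second");
    assert(conv.to_bytes(L"third") == "third");
}
```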
Member functions need to be added to access the two error string prefixes. See the modified synopsis below.
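A minimal sketch of such accessors follows. The accessor names narrow_error and wide_error match the modified synopsis; the class name error_strings_sketch and the private member names are hypothetical, and the conversion functions themselves are omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: only the error-string storage and its accessors.
template<class WideS = std::wstring, class NarrowS = std::string>
class error_strings_sketch {
public:
    error_strings_sketch(const NarrowS& narrow_err, const WideS& wide_err)
        : narrow_err_(narrow_err), wide_err_(wide_err) {}

    // Read-only access to the strings supplied at construction.
    const NarrowS& narrow_error() const { return narrow_err_; }
    const WideS& wide_error() const { return wide_err_; }

private:
    NarrowS narrow_err_;
    WideS wide_err_;
};

// Usage sketch: the caller can inspect what the converter was given.
inline void error_strings_demo() {
    error_strings_sketch<> errs("<bad narrow>", L"<bad wide>");
    assert(errs.narrow_error() == "<bad narrow>");
    assert(errs.wide_error() == L"<bad wide>");
}
```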
"wstring" might be misleading, depending on the actual types involved. "convert" is a verb, yet nouns make better class names.
The proposal uses the name "byte" to identify the narrow case in member names. That will be misleading if the actual type is something other than char. However, it isn't clear what a better set of member names would be. The modified synopsis below uses "wide" and "narrow" in names, even though they may also be misleading in the case where the sizes are the same. Perhaps a better set of names will surface as the proposal moves forward.
This modified synopsis applies all of the suggestions above to make their impact easier to visualize.
template<class Codecvt,
    class WideS = wstring,
    class NarrowS = string>
class string_converter
{
public:
    typedef NarrowS narrow_string_type;
    typedef typename NarrowS::value_type narrow_char_type;
    typedef WideS wide_string_type;
    typedef typename WideS::value_type wide_char_type;
    typedef typename Codecvt::state_type state_type;
    typedef typename wide_string_type::traits_type::int_type int_type;

    string_converter(const narrow_string_type& narrow_err);
    string_converter(const narrow_string_type& narrow_err,
        const wide_string_type& wide_err);

    wide_string_type from_narrow(narrow_char_type value) const;
    wide_string_type from_narrow(const narrow_char_type *ptr) const;
    wide_string_type from_narrow(const narrow_string_type& str) const;
    wide_string_type from_narrow(const narrow_char_type *first,
        const narrow_char_type *last) const;

    void from_narrow(narrow_char_type value, wide_string_type & target) const;
    void from_narrow(const narrow_char_type *ptr, wide_string_type & target) const;
    void from_narrow(const narrow_string_type& str, wide_string_type & target) const;
    void from_narrow(const narrow_char_type *first,
        const narrow_char_type *last, wide_string_type & target) const;

    narrow_string_type to_narrow(wide_char_type wchar) const;
    narrow_string_type to_narrow(const wide_char_type *wptr) const;
    narrow_string_type to_narrow(const wide_string_type& wstr) const;
    narrow_string_type to_narrow(const wide_char_type *first,
        const wide_char_type *last) const;

    void to_narrow(wide_char_type wchar, narrow_string_type & target) const;
    void to_narrow(const wide_char_type *wptr, narrow_string_type & target) const;
    void to_narrow(const wide_string_type& wstr, narrow_string_type & target) const;
    void to_narrow(const wide_char_type *first,
        const wide_char_type *last, narrow_string_type & target) const;

    const narrow_string_type & narrow_error() const;
    const wide_string_type & wide_error() const;

    // exposition only
};
This critique is based on discussions with Thorsten Ottosen, Stefan Slapeta, Rob Stewart, and Jonathan Turkanis.
Revised: 13 January 2005