N1957 (WG21) 06-0027 (J16)
PROPOSED LIBRARY ADDITIONS FOR CODE CONVERSION

P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com

2006-02-21

Dinkumware has been marketing several character code conversion aids for a number of years, as part of a package of supplemental features we call CoreX. They have now been merged into our latest comprehensive product, the Dinkum Compleat Library. Based on the success of that package, we now feel confident in proposing two template classes from it for inclusion in a future standard C++ library.

We submitted these template classes earlier, as N1683, and met with several criticisms. But after review, we decided to resubmit the proposal without change and better defend the design decisions in committee. The descriptions that follow are taken primarily from our documentation.


Template class wstring_convert performs conversions between a wide string and a byte string. It lets you specify a code conversion facet (like template class codecvt) to perform the conversions, without affecting any streams or locales. Say, for example, you have a code conversion facet called codecvt_utf8 that you want to use to output to cout a UTF-8 multibyte sequence corresponding to a wide string, but you don't want to alter the locale for cout. You can write something like:

    wstring_convert<codecvt_utf8<wchar_t> >
        myconv;
    std::string mbstring = myconv.to_bytes(L"Hello\n");
    cout << mbstring;

Note that the Standard C++ library currently uses code conversion facets only within template class basic_filebuf, for converting from multibyte sequences when reading from a file and for converting to multibyte sequences when writing to a file. Something like template class wstring_convert is needed to perform similar conversions between string objects, without involving file I/O.
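The example above can be fleshed out into a compilable sketch. It assumes the versions of these templates that were later adopted into C++11 as std::wstring_convert (in <locale>) and std::codecvt_utf8 (in <codecvt>, deprecated since C++17), which differ in some details from the interface proposed here. The helper names wide_to_utf8 and hello_demo are hypothetical.

```cpp
#include <cassert>
#include <codecvt>    // std::codecvt_utf8 (assumed: C++11 form of the facet)
#include <iostream>
#include <locale>     // std::wstring_convert (assumed: C++11 form of the template)
#include <string>

// Encode a wide string as UTF-8 without imbuing any stream with a locale.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > myconv;
    return myconv.to_bytes(wide);
}

void hello_demo()
{
    // cout stays a plain byte stream; only the string is converted.
    std::cout << wide_to_utf8(L"Hello\n");

    // ASCII characters encode as themselves in UTF-8.
    assert(wide_to_utf8(L"Hello\n") == "Hello\n");
}
```

Calling hello_demo() from main writes the converted bytes to cout and leaves cout's locale unchanged.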

wstring_convert

namespace Dinkum {
    namespace codecvt {
template<class Codecvt,
    class Elem = wchar_t>
    class wstring_convert
    {
public:
    typedef std::basic_string<char> byte_string;
    typedef std::basic_string<Elem> wide_string;
    typedef typename Codecvt::state_type state_type;
    typedef typename wide_string::traits_type::int_type int_type;

    wstring_convert();
    wstring_convert(const byte_string& byte_err);
    wstring_convert(const byte_string& byte_err,
        const wide_string& wide_err);

    wide_string from_bytes(char byte) const;
    wide_string from_bytes(const char *ptr) const;
    wide_string from_bytes(const byte_string& str) const;
    wide_string from_bytes(const char *first, const char *last) const;

    byte_string to_bytes(Elem wchar) const;
    byte_string to_bytes(const Elem *wptr) const;
    byte_string to_bytes(const wide_string& wstr) const;
    byte_string to_bytes(const Elem *first, const Elem *last) const;

    // exposition only
private:
    byte_string byte_err_string;
    wide_string wide_err_string;
    };
    }  // namespace codecvt
}  // namespace Dinkum

The template class describes an object that controls conversions between wide string objects of class std::basic_string<Elem> and byte string objects of class std::basic_string<char> (also known as std::string). The template class defines the types wide_string and byte_string as synonyms for these two types. Conversion between a sequence of Elem values (stored in a wide_string object) and multibyte sequences (stored in a byte_string object) is performed by an object of class Codecvt, which must meet the requirements of the standard code-conversion facet std::codecvt<Elem, char, std::mbstate_t>.

An object of this template class stores a wide-error string, called wide_err_string here for the sake of exposition. It also stores a byte-error string, called byte_err_string here for the sake of exposition.

wstring_convert::byte_string

typedef std::basic_string<char> byte_string;

The type is a synonym for std::basic_string<char>.

wstring_convert::from_bytes

wide_string from_bytes(char byte) const;
wide_string from_bytes(const char *ptr) const;
wide_string from_bytes(const byte_string& str) const;
wide_string from_bytes(const char *first, const char *last) const;

The first member function converts the single-element sequence byte to a wide string. The second member function converts the nul-terminated sequence beginning at ptr to a wide string. The third member function converts the sequence stored in str to a wide string. The fourth member function converts the sequence defined by the range [first, last) to a wide string.
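As a rough illustration of the four overloads, the sketch below exercises each one with a UTF-8 facet. It assumes std::wstring_convert and std::codecvt_utf8 as later standardized in C++11, where the member functions are not const, so it may not match the proposed interface exactly. The names utf8_to_wide and from_bytes_demo are hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > utf8_converter;

// Decode a UTF-8 byte string into a wide string.
std::wstring utf8_to_wide(const std::string& bytes)
{
    utf8_converter conv;
    return conv.from_bytes(bytes);
}

void from_bytes_demo()
{
    utf8_converter conv;
    const char utf8[] = "\xC3\xA9t\xC3\xA9";       // U+00E9 't' U+00E9 as UTF-8
    const std::wstring e_acute(1, wchar_t(0xE9));  // U+00E9 as one wide element
    const std::wstring expected = e_acute + L"t" + e_acute;

    assert(conv.from_bytes('A') == L"A");                    // single byte
    assert(conv.from_bytes(utf8) == expected);               // nul-terminated
    assert(conv.from_bytes(std::string(utf8)) == expected);  // byte_string
    assert(conv.from_bytes(utf8, utf8 + 2) == e_acute);      // [first, last)
}
```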

In all cases:

wstring_convert::int_type

typedef typename wide_string::traits_type::int_type int_type;

The type is a synonym for wide_string::traits_type::int_type.

wstring_convert::state_type

typedef typename Codecvt::state_type state_type;

The type is a synonym for Codecvt::state_type.

wstring_convert::to_bytes

byte_string to_bytes(Elem wchar) const;
byte_string to_bytes(const Elem *wptr) const;
byte_string to_bytes(const wide_string& wstr) const;
byte_string to_bytes(const Elem *first, const Elem *last) const;

The first member function converts the single-element sequence wchar to a byte string. The second member function converts the nul-terminated sequence beginning at wptr to a byte string. The third member function converts the sequence stored in wstr to a byte string. The fourth member function converts the sequence defined by the range [first, last) to a byte string.
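The same four shapes can be sketched in the wide-to-byte direction. As before, this assumes the C++11 forms std::wstring_convert and std::codecvt_utf8 rather than the Dinkum::codecvt declarations shown here. The names to_bytes_demo and encode_utf8 are hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > utf8_converter;

// Encode a wide string as a UTF-8 byte string.
std::string encode_utf8(const std::wstring& wide)
{
    utf8_converter conv;
    return conv.to_bytes(wide);
}

void to_bytes_demo()
{
    utf8_converter conv;
    // U+00E9 't' U+00E9, nul-terminated.
    const wchar_t wide[] = { wchar_t(0xE9), L't', wchar_t(0xE9), 0 };

    assert(conv.to_bytes(wchar_t(0xE9)) == "\xC3\xA9");               // single element
    assert(conv.to_bytes(wide) == "\xC3\xA9t\xC3\xA9");               // nul-terminated
    assert(conv.to_bytes(std::wstring(wide)) == "\xC3\xA9t\xC3\xA9"); // wide_string
    assert(conv.to_bytes(wide, wide + 1) == "\xC3\xA9");              // [first, last)
}
```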

In all cases:

wstring_convert::wide_string

typedef std::basic_string<Elem> wide_string;

The type is a synonym for std::basic_string<Elem>.

wstring_convert::wstring_convert

wstring_convert();
wstring_convert(const byte_string& byte_err);
wstring_convert(const byte_string& byte_err,
    const wide_string& wide_err);

The first constructor constructs a conversion object with no stored wide-error string or byte-error string. The second constructor stores a copy of byte_err in the stored byte-error string. The third constructor stores a copy of byte_err in the stored byte-error string and stores a copy of wide_err in the stored wide-error string.
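To show what the stored error strings are for, here is a minimal sketch based on std::wstring_convert as later standardized in C++11. In that version, a failed conversion returns the corresponding stored error string if the object was constructed with error strings, and otherwise throws std::range_error. The sketch assumes that behavior, which is not spelled out in the text above. The names decode_or and throws_on_bad_input are hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::wstring_convert<std::codecvt_utf8<wchar_t> > utf8_converter;

// Decode UTF-8, returning wide_err instead of throwing on malformed input.
std::wstring decode_or(const std::string& bytes, const std::wstring& wide_err)
{
    utf8_converter conv(std::string("?"), wide_err);  // byte_err, wide_err
    return conv.from_bytes(bytes);
}

// Report whether a converter with no stored error strings throws on this input.
bool throws_on_bad_input(const std::string& bytes)
{
    utf8_converter conv;
    try {
        conv.from_bytes(bytes);
    } catch (const std::range_error&) {
        return true;
    }
    return false;
}

void error_string_demo()
{
    // 0xFF can never appear in well-formed UTF-8.
    assert(decode_or("\xFF", L"<bad>") == L"<bad>");
    assert(decode_or("ok", L"<bad>") == L"ok");
    assert(throws_on_bad_input("\xFF"));
    assert(!throws_on_bad_input("ok"));
}
```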


Template class wbuffer_convert looks like a wide stream buffer, but performs all its I/O through an underlying byte stream buffer that you specify when you construct it. Like template class wstring_convert, it lets you specify a code conversion facet to perform the conversions, without affecting any streams or locales. The previous example can also be written as:

    wbuffer_convert<codecvt_utf8<wchar_t> >
        mybuf(cout.rdbuf());  // construct wide stream buffer object
    std::wostream mystr(&mybuf); // construct wide ostream object
    mystr << L"Hello\n";

Something like template class wbuffer_convert is needed to perform code conversions when writing to streams other than files.
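A testable variant of this idea writes through the converting buffer into an in-memory byte buffer instead of cout. It is only a sketch, and it assumes std::wbuffer_convert and std::codecvt_utf8 as later standardized in C++11, not the Dinkum::codecvt declarations in this paper. The function name write_as_utf8 is hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Write wide text through a converting stream buffer and return the bytes
// that reached the underlying byte stream buffer.
std::string write_as_utf8(const std::wstring& text)
{
    std::stringbuf bytes;  // underlying byte stream buffer
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > mybuf(&bytes);
    std::wostream mystr(&mybuf);  // wide ostream over the converting buffer

    mystr << text;
    mystr.flush();  // push any pending converted output down to `bytes`
    return bytes.str();
}

void write_demo()
{
    assert(write_as_utf8(L"Hello\n") == "Hello\n");
    assert(write_as_utf8(std::wstring(1, wchar_t(0xE9))) == "\xC3\xA9");
}
```

The explicit flush matters in this sketch because the converting buffer may hold wide characters that have not yet been converted and forwarded.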

wbuffer_convert

namespace Dinkum {
    namespace codecvt {
template<class Codecvt,
    class Elem = wchar_t,
    class Tr = std::char_traits<Elem> >
    class wbuffer_convert
        : public std::basic_streambuf<Elem, Tr>
    {
public:
    wbuffer_convert(std::streambuf *bytebuf = 0);
    std::streambuf *rdbuf();
    std::streambuf *rdbuf(std::streambuf *bytebuf);

    // exposition only
private:
    std::streambuf *bufptr;
    };
    }  // namespace codecvt
}  // namespace Dinkum

The template class describes a stream buffer that controls the transmission of elements of type Elem, whose character traits are described by the class Tr, to and from a byte stream buffer of type std::streambuf. Conversion between a sequence of Elem values and multibyte sequences is performed by an object of class Codecvt, which must meet the requirements of the standard code-conversion facet std::codecvt<Elem, char, std::mbstate_t>.

An object of this template class stores a pointer to its underlying byte stream buffer, called bufptr here for the sake of exposition.
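Because the buffer transmits elements both to and from the byte stream buffer, it can also be used for input. The following sketch reads UTF-8 bytes back as wide characters. It assumes the C++11 forms std::wbuffer_convert and std::codecvt_utf8, and read_all_from_utf8 is a hypothetical helper.

```cpp
#include <cassert>
#include <codecvt>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Read every wide character that the converting buffer can produce from
// a UTF-8 encoded byte sequence.
std::wstring read_all_from_utf8(const std::string& utf8)
{
    std::stringbuf bytes(utf8);  // underlying byte stream buffer (source)
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > wbuf(&bytes);

    std::istreambuf_iterator<wchar_t> first(&wbuf);
    std::istreambuf_iterator<wchar_t> last;  // end-of-stream iterator
    return std::wstring(first, last);
}

void read_demo()
{
    const std::wstring e_acute(1, wchar_t(0xE9));
    assert(read_all_from_utf8("abc") == L"abc");
    assert(read_all_from_utf8("\xC3\xA9t\xC3\xA9") == e_acute + L"t" + e_acute);
}
```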

wbuffer_convert::wbuffer_convert

wbuffer_convert(std::streambuf *bytebuf = 0);

The constructor constructs a stream buffer object and initializes its stored byte stream buffer pointer to bytebuf.

wbuffer_convert::rdbuf

std::streambuf *rdbuf();
std::streambuf *rdbuf(std::streambuf *bytebuf);

The first member function returns the stored byte stream buffer pointer. The second member function stores bytebuf in the stored byte stream buffer pointer.
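One possible use of the two rdbuf overloads is to redirect output partway through. The sketch below assumes std::wbuffer_convert as later standardized in C++11, and it flushes before switching so that no pending output ends up in the new byte buffer. The name redirect_demo is hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Write "one" to a first byte buffer, switch the underlying buffer,
// then write "two" to a second one. Returns "<first>|<second>".
std::string redirect_demo()
{
    std::stringbuf first, second;
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > wbuf(&first);
    std::wostream out(&wbuf);

    assert(wbuf.rdbuf() == &first);  // getter returns the stored pointer
    out << L"one";
    out.flush();                     // drain pending output before switching

    wbuf.rdbuf(&second);             // setter replaces the stored pointer
    assert(wbuf.rdbuf() == &second);
    out << L"two";
    out.flush();

    return first.str() + "|" + second.str();
}
```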


Copyright © 2002-2006 by Dinkumware, Ltd. All rights reserved.