Re: [boost] [string] Realistic API proposal

On 2011-01-28 20:12, Joe Mucchiello <jmucchiello@yahoo.com> wrote:

> // conversion for Windows API
> std::vector<wchar_t> vec;
> vec.resize(count_codepoints<utf8>(mystring.begin(), mystring.end()));
> convert<utf8,utf16>(mystring.begin(), mystring.end(), vec.begin());
I spy with my little eye a potential crash waiting to happen. Code points != code units. vec has room for N code *units*, but 2*N code units may be written to it if mystring contains non-BMP characters. "Corrected" code:

    std::vector<wchar_t> vec;
    vec.resize(count_codeunits<wchar_encoding>(mystring.begin(), mystring.end()));
    convert<wchar_encoding>(mystring.begin(), mystring.end(), vec.begin());

I think a lot of these potential crashes could be prevented if the iterator of the new string type (chain, text, tier, yarn) would only expose (const) code points. Actual code units would be hidden, and only accessed using a facade/adapter view/iterator:

    auto u8v = make_view<utf8_encoding>(mystring);
    auto u16v = make_view<utf16_encoding>(mystring);

    for (auto codepoint : mystring) {...}
    for (auto u8codeunit : u8v) {...}
    for (auto u16codeunit : u16v) {...}

I also think there isn't a reason that the new string type *has* to be UTF-8 internally. It could be UTF-16, UTF-32, SCSU, or CESU-8 internally for that matter. Making a view from the internal encoding to an external encoding should be a no-op when both encodings are the same.

Regards,
Anders Dalvander

-- 
WWFSMD?
participants (1)
-
Anders Dalvander