Re: [boost] [string] Realistic API proposal

// UTF validation
bool is_valid_utf() const;
See, that's what makes the whole thing pointless.
Actually not, consider:
socket.read(my_string);
if (!my_string.is_valid_utf()) ....
This, and many of these functions, work much better as standalone functions:

// with a string-aware function
socket.read(my_string);
if (!is_valid_utf8(my_string.begin(), my_string.end()))
    ....

// with a C interface
std::vector<char> vec;
vec.resize(MY_BUFSIZE);
int len = recv(socket, vec.data(), MY_BUFSIZE, 0);
if (len >= 0) {
    vec.resize(len);
    if (!is_valid_utf8(vec.begin(), vec.end()))
        ....
}

// conversion for Windows API
std::vector<wchar_t> vec;
vec.resize(count_codepoints<utf8>(mystring.begin(), mystring.end()));
convert<utf8,utf16>(mystring.begin(), mystring.end(), vec.begin());
// WriteFile returns BOOL and its length argument is a byte count
BOOL ok = WriteFile(handle, vec.data(), vec.size() * sizeof(wchar_t), &dwBytesWritten, NULL);
...

Joe
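Joe's standalone is_valid_utf8(first, last) is never defined in the thread. A minimal sketch of what such a function could look like, assuming validation against RFC 3629 (the strict definition that rejects overlong encodings, surrogate code points, and values above U+10FFFF). The one-argument range overload is the convenience form Mathias suggests later in the thread. Both names are illustrative, not an existing Boost API.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical standalone validator: returns true if [first, last) is
// well-formed UTF-8 per RFC 3629. Works on any range of char-like values,
// so std::string and std::vector<char> both qualify.
template <typename Iter>
bool is_valid_utf8(Iter first, Iter last) {
    while (first != last) {
        unsigned char lead = static_cast<unsigned char>(*first++);
        int extra = 0;          // continuation bytes still expected
        unsigned long cp = 0;   // code point being decoded
        if (lead < 0x80) continue;                                     // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) { extra = 1; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { extra = 3; cp = lead & 0x07; }
        else return false;      // stray continuation byte, C0/C1, or F5..FF
        for (int i = 0; i < extra; ++i) {
            if (first == last) return false;                  // truncated
            unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80) return false;             // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return false;       // overlong 3-byte form or surrogate
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return false;       // overlong 4-byte form or beyond Unicode
    }
    return true;
}

// One-argument convenience for whole containers.
template <typename Range>
bool is_valid_utf8(const Range& r) {
    using std::begin;
    using std::end;
    return is_valid_utf8(begin(r), end(r));
}
```

Because it is a non-member template, it needs nothing from any particular string class, which is the core of Joe's argument: the same call validates a std::string read from a socket or a std::vector<char> filled by recv().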

// UTF validation
bool is_valid_utf() const;
See, that's what makes the whole thing pointless.
Actually not, consider:
socket.read(my_string);
if (!my_string.is_valid_utf()) ....
This, and many of these functions, work much better as standalone functions:
To be honest, I agree, as long as we can keep std::string. Adding these functions would give std::string more of a UTF orientation, which is the ultimate goal.
// with a string-aware function
socket.read(my_string);
if (!is_valid_utf8(my_string.begin(), my_string.end()))
    ....

// with a C interface
std::vector<char> vec;
vec.resize(MY_BUFSIZE);
int len = recv(socket, vec.data(), MY_BUFSIZE, 0);
if (len >= 0) {
    vec.resize(len);
    if (!is_valid_utf8(vec.begin(), vec.end()))
        ....
}

// conversion for Windows API
std::vector<wchar_t> vec;
vec.resize(count_codepoints<utf8>(mystring.begin(), mystring.end()));
convert<utf8,utf16>(mystring.begin(), mystring.end(), vec.begin());
// WriteFile returns BOOL and its length argument is a byte count
BOOL ok = WriteFile(handle, vec.data(), vec.size() * sizeof(wchar_t), &dwBytesWritten, NULL);
...
Notice that a vector is not a string, and using one requires an additional copy.

Artyom
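Artyom's concern is the extra copy from a std::vector<char> into a std::string. A minimal sketch, assuming C++11, where std::string storage is guaranteed contiguous (in the C++03 of this thread it was only true in practice), of reading straight into the string's own buffer. fake_recv is a hypothetical stand-in for recv() so the example runs without a socket.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API such as recv(): copies up to
// `cap` bytes into `buf` and returns the count, or -1 on error.
long fake_recv(char* buf, std::size_t cap) {
    static const char payload[] = "h\xC3\xA9llo";
    std::size_t n = sizeof(payload) - 1;   // exclude the terminating NUL
    if (n > cap) n = cap;
    std::memcpy(buf, payload, n);
    return static_cast<long>(n);
}

// Reads directly into the string's buffer, so no intermediate
// std::vector (and no extra copy) is needed.
bool read_into_string(std::string& s, std::size_t bufsize) {
    s.resize(bufsize);                         // allocate once
    long len = fake_recv(&s[0], s.size());
    if (len < 0) { s.clear(); return false; }
    s.resize(static_cast<std::size_t>(len));   // shrink to what was read
    return true;
}
```

With a real socket the call would be recv(sock, &s[0], s.size(), 0), using the same resize-then-shrink pattern; the result can then be passed to a validator without any copying.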

AMDG

On 1/28/2011 11:04 PM, Artyom wrote:
// with a string-aware function
socket.read(my_string);
if (!is_valid_utf8(my_string.begin(), my_string.end()))
    ....

// with a C interface
std::vector<char> vec;
vec.resize(MY_BUFSIZE);
int len = recv(socket, vec.data(), MY_BUFSIZE, 0);
if (len >= 0) {
    vec.resize(len);
    if (!is_valid_utf8(vec.begin(), vec.end()))
        ....
}

// conversion for Windows API
std::vector<wchar_t> vec;
vec.resize(count_codepoints<utf8>(mystring.begin(), mystring.end()));
convert<utf8,utf16>(mystring.begin(), mystring.end(), vec.begin());
// WriteFile returns BOOL and its length argument is a byte count
BOOL ok = WriteFile(handle, vec.data(), vec.size() * sizeof(wchar_t), &dwBytesWritten, NULL);
...
Notice that a vector is not a string, and using one requires an additional copy.
An additional copy compared to what? I don't see anything particularly wrong with using vector here.

In Christ,
Steven Watanabe
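Steven's point holds: a vector is a perfectly good output buffer for a conversion. One detail of the quoted Windows example is worth a sketch, though: the buffer is sized with count_codepoints, but UTF-16 needs two code units (a surrogate pair) for every code point above U+FFFF, so a code point count can undercount and the conversion would overrun the buffer. The helper names utf16_length and utf8_to_utf16 below are hypothetical, and the sketch assumes its input has already been validated as UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of UTF-16 code units needed for valid UTF-8 input: one per code
// point up to U+FFFF, two (a surrogate pair) per code point above it.
// Four-byte UTF-8 sequences are exactly those above U+FFFF.
std::size_t utf16_length(const std::string& s) {
    std::size_t units = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) == 0x80) continue;   // continuation byte
        units += (c >= 0xF0) ? 2 : 1;       // 4-byte lead -> surrogate pair
    }
    return units;
}

// Precondition: s is valid UTF-8 (check it first with a validator).
std::vector<char16_t> utf8_to_utf16(const std::string& s) {
    std::vector<char16_t> out;
    out.reserve(utf16_length(s));           // exact size, one allocation
    for (std::size_t i = 0; i < s.size();) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        unsigned long cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += extra + 1;
        if (cp > 0xFFFF) {                  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For example, "A€😀" is three code points but four UTF-16 units. On Windows, wchar_t is 16 bits, so a std::vector<wchar_t> could be filled the same way; WriteFile's length parameter counts bytes, so it would receive out.size() * sizeof(out[0]).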

On 29/01/2011 05:12, Joe Mucchiello wrote:
// UTF validation
bool is_valid_utf() const;
See, that's what makes the whole thing pointless.
Actually not, consider:
socket.read(my_string);
if (!my_string.is_valid_utf()) ....
This, and many of these functions, work much better as standalone functions:
// with a string-aware function
socket.read(my_string);
if (!is_valid_utf8(my_string.begin(), my_string.end()))
    ....

or

if (!is_valid_utf8(my_string))

On 01/29/2011 10:56 AM, Mathias Gaunard wrote:
On 29/01/2011 05:12, Joe Mucchiello wrote:
[... elision by Patrick ...]
This, and many of these functions, work much better as standalone functions:
// with a string-aware function
socket.read(my_string);
if (!is_valid_utf8(my_string.begin(), my_string.end()))
    ....

or

if (!is_valid_utf8(my_string))
I still like the idea of a utf-8_string that enforces correct encoding, i.e., it won't let you make any change to the string that would make the external is_valid_utf8 function above return false. Then, of course, you wouldn't need that function.

And please don't object by saying it would be better to have a different type of string without the internal checks. I get what you're saying. I want both at different times. But I do want this. Always knowing that the string holds valid UTF-8 would remove the burden of checking it everywhere a valid UTF-8 string is required.

Patrick
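Patrick's invariant can be sketched as a thin wrapper that validates at every mutation. This is a minimal sketch, not Chad's library or any Boost API: the class name utf8_string and its append/erase members are hypothetical. It relies on a property of UTF-8 that keeps the checks cheap: concatenating two valid UTF-8 strings always yields valid UTF-8, so append only has to validate the new piece, while erasing by byte offset is safe exactly when both ends of the erased range fall on code point boundaries.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical string type whose invariant is "always valid UTF-8".
class utf8_string {
public:
    explicit utf8_string(const std::string& s) : data_(s) {
        if (!valid(data_)) throw std::invalid_argument("invalid UTF-8");
    }
    // Valid + valid is valid, so only the new piece needs checking.
    utf8_string& append(const std::string& piece) {
        if (!valid(piece)) throw std::invalid_argument("invalid UTF-8");
        data_ += piece;
        return *this;
    }
    // Erasing is safe only if both ends fall on code point boundaries.
    utf8_string& erase(std::size_t pos, std::size_t n) {
        if (pos > data_.size()) throw std::out_of_range("pos");
        std::size_t end = (n > data_.size() - pos) ? data_.size() : pos + n;
        if (!boundary(pos) || !boundary(end))
            throw std::invalid_argument("would split a code point");
        data_.erase(pos, end - pos);
        return *this;
    }
    const std::string& str() const { return data_; }

private:
    // True if index i does not point at a continuation byte (10xxxxxx).
    bool boundary(std::size_t i) const {
        return i == data_.size() ||
               (static_cast<unsigned char>(data_[i]) & 0xC0) != 0x80;
    }
    // RFC 3629 well-formedness check.
    static bool valid(const std::string& s) {
        for (std::size_t i = 0; i < s.size();) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t extra;
            unsigned long cp;
            if (c < 0x80)                    { extra = 0; cp = c; }
            else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
            else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
            else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
            else return false;                        // invalid lead byte
            if (extra >= s.size() - i) return false;  // truncated sequence
            for (std::size_t k = 1; k <= extra; ++k) {
                unsigned char d = static_cast<unsigned char>(s[i + k]);
                if ((d & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (d & 0x3F);
            }
            if ((extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
                (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
                return false;                         // overlong, surrogate, or > U+10FFFF
            i += extra + 1;
        }
        return true;
    }
    std::string data_;
};
```

Both mutators check before touching data_, so a rejected append or erase throws and leaves the string exactly as it was.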

On Sat, 29 Jan 2011 23:20:07 -0800 Patrick Horgan <phorgan1@gmail.com> wrote:
[...]
I still like the idea of a utf-8_string that enforces correct encoding, i.e. it won't let you make a change to the string that would make the above external function is_valid_utf8 return false. [...]
Don't worry, I'm still working on that. :-) I've already modified the classes so that they're all guaranteed to preserve valid encoding at all times, and added as many std::string functions to each of them as is feasible (utf32_t handles all of them, the others only a subset). I've nearly finished integrating the policy-based stuff, so you can tell it what action to take on errors in the input data.

The only major thing left is designing a way to properly convert data that *isn't* in UTF format, and from studying Artyom's Boost.Locale, I have a few ideas on how to do that. It should be ready for public critique by the end of next week, at the latest. It won't have a true character iterator at that point, but as I envision it, that can be added on separately.

I'm not sure about the proxy class design that I've had to use for utf32_t's operator[] and at() functions in order to check any modifications for validity as they're made, but I'm sure someone here will set me straight if it's not right.

-- 
Chad Nelson
Oak Circle Software, Inc.
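A minimal sketch of the kind of proxy Chad describes, assuming a std::u32string-backed class. The names utf32_string and reference are hypothetical; this is not Chad's actual utf32_t. The non-const operator[] and at() return a small proxy object, so reads convert straight to char32_t while writes pass through a validity check (UTF-32 code units must be Unicode scalar values).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical UTF-32 string whose non-const operator[] and at() return a
// proxy, so every write through s[i] = cp is checked before it is stored.
class utf32_string {
public:
    class reference {
    public:
        explicit reference(char32_t& slot) : slot_(slot) {}
        reference(const reference&) = default;
        operator char32_t() const { return slot_; }   // reads pass straight through
        reference& operator=(char32_t cp) {           // writes are validated first
            if (!is_scalar(cp))
                throw std::invalid_argument("not a Unicode scalar value");
            slot_ = cp;
            return *this;
        }
        // s[i] = s[j] must copy the value, not rebind the proxy.
        reference& operator=(const reference& other) {
            return *this = static_cast<char32_t>(other);
        }
    private:
        char32_t& slot_;
    };

    explicit utf32_string(const std::u32string& s) : data_(s) {
        for (char32_t cp : data_)
            if (!is_scalar(cp))
                throw std::invalid_argument("not a Unicode scalar value");
    }

    reference operator[](std::size_t i) { return reference(data_[i]); }
    char32_t operator[](std::size_t i) const { return data_[i]; }
    reference at(std::size_t i) { return reference(data_.at(i)); }  // bounds-checked
    std::size_t size() const { return data_.size(); }

private:
    // Valid UTF-32 code units are the Unicode scalar values:
    // U+0000..U+10FFFF excluding the surrogate range U+D800..U+DFFF.
    static bool is_scalar(char32_t cp) {
        return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    }
    std::u32string data_;
};
```

The usual costs of this design are the same as for std::vector<bool>::reference: `auto c = s[i];` deduces the proxy type rather than char32_t, and s[i] cannot bind to a char32_t&. The const overload returns a plain value, so read-only code pays nothing.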
Participants (6):
- Artyom
- Chad Nelson
- Joe Mucchiello
- Mathias Gaunard
- Patrick Horgan
- Steven Watanabe