
On Fri, Jan 21, 2011 at 1:07 PM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
Mostly I'm interested in seeing a string class that is:
1. Immutable. No if's or but's about it. I don't want a string to be modifiable. Period. You can create it, and once it's created, that's it.
2. Has real value semantics. This means, once you've copied it, that's really copied. No funky copy-on-write reference-counting mumbo-jumbo.
I also prefer nothing too fancy. But most of these things are implementation details; let us get the interface right first and focus on the optimizations afterwards.
3. Has all the algorithms that apply to it defined externally.
[snip/]
Encoding is a matter of external interpretation and I think should not be part of a string's interface. You can have wrappers that interpret a string as a UTF-* string.
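To make the "external interpretation" idea concrete, here is a minimal sketch in which the string holds only bytes and a free function reads them as UTF-8. The name decode_utf8 is hypothetical (it is not the encoded<utf8_encoding> wrapper from this thread), and the sketch assumes well-formed input rather than diagnosing errors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical external interpreter: the string itself knows nothing about
// encodings; this free function views its bytes as UTF-8 code points.
// Assumption: input is well-formed UTF-8 (malformed input is not diagnosed).
inline std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Usage sketch: seven raw bytes are interpreted as five code points.
inline void demo_decode() {
    const std::string raw = "Mat\xC3\xBA\xC5\xA1";
    const std::vector<char32_t> cps = decode_utf8(raw);
    assert(raw.size() == 7);
    assert(cps.size() == 5);
}
```

The same bytes could be handed to a different interpreter (say, Latin-1) without touching the string type, which is the separation being argued for here.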
I am all for a generalized-*string* class in the pedantic interpretation of the word i.e. a sequence of chars, char16_ts, bytes, octets, words, dwords, etc. without any enforced encoding for use-cases that call for it, but again, the reason why I participate in this whole discussion is because I think that C++ deserves also a class focused on the "everyday", *nice* and *convenient* handling of text, without having to worry about how do I need to "view" that raw-chunk-of-binary-data in this call to an OS API function and how do I have to "view" it in that other library call, explicitly specifying to which encoding I want to convert it using *ugly* :-) tag types, etc. (as much as this is possible).

Another important concern for me is portability. I'd like (being very self-centered :-P) for example the following:

boost::string s = "Mat" + code_point(0x00FA/*u with acute*/) + code_point(0x0161/*s with caron*/);
std::cout << s << std::endl;

(everywhere where the terminal can handle it) to print:

Matúš // hope your email client can handle that :)

instead of:

Mat$#@!%

or completely upsetting the terminal.

Also, while I see that for example this

auto it = encoded<utf8_encoding>(original_string), end = encoded<utf8_encoding>();

is perfectly generic and well-designed for some use-cases the first reaction of the-average-joe-programmer-inside-me's when seeing it was, *yuck*. Sorry :-) Sometimes it is more important for the code and people writing/maintaining it to be nice and easy to understand than to be really-really-generic and smart. That said, it *is* perfectly valid if someone uses the generic version above. Let's do both.

The reason why I want to call it (std::)string is that many not-so-pedantic people would react to the question "What is your first thought when you hear 'string type'?" with "Some kind of type for handling text, eh?" and not with "Some kind of generalized sequence of elements without any intrinsic encoding having the following properties...". But if there is so much resistance to calling it that then I vote for (boost|std)::text (however this sounds a little awkward to me, I don't know why).
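For illustration only, the "Mat" + code_point(...) wish above can be approximated today on top of std::string. In this sketch code_point is a hypothetical stand-in type, to_utf8 and the operator+ overload are assumed helpers, and the internal encoding is assumed to be UTF-8; none of this is an existing Boost interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the code_point type used in the example above.
struct code_point {
    std::uint32_t value;
    explicit code_point(std::uint32_t v) : value(v) {}
};

// Encode a single code point as UTF-8.
// Assumption: the value is a valid scalar value (surrogates are not rejected).
inline std::string to_utf8(code_point cp) {
    const std::uint32_t c = cp.value;
    assert(c <= 0x10FFFF);
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}

// Appending a code point yields a new string; nothing is modified in place.
inline std::string operator+(const std::string& lhs, code_point rhs) {
    return lhs + to_utf8(rhs);
}

// Usage sketch: builds the seven-byte UTF-8 form of the name from the example.
inline std::string demo_name() {
    return "Mat" + code_point(0x00FA) + code_point(0x0161);
}
```

Streaming demo_name() to std::cout should show the intended text on a terminal that expects UTF-8; on any other terminal a conversion step would still be needed, which is exactly the portability concern raised above.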
Let us keep the basic_string<CharT> as that generalized string (I never suggested to dump it, just that std::string would be another type and not defined as typedef std::basic_string<char>).

Regarding #1 above and the following ...
x = "Hello,"; x = x ^ " World!";
... would you be against, if the interface in addition also included a few convenience/backward compatibility member functions like ...

string& append(const string& s) { *this = *this ^ s; return *this; }
string& prepend(const string& s) { *this = s ^ *this; return *this; }

... etc? For the same reasons as above: clarity, simplicity (it may not be obvious what a fancy operator expression does, it is more obvious when using names like append, prepend, ...) and people are used to that programming style.

BR, Matus
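A compilable sketch of the append/prepend suggestion, under some assumptions: the class name istring is hypothetical, it simply wraps std::string for storage, and operator^ is taken to mean concatenation as in the quoted snippet. No member changes characters in place; append and prepend only rebind the object to a newly built value.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical immutable string: the characters are never edited in place,
// but a variable may be assigned a different value.
class istring {
public:
    istring(const char* s) : data_(s) {}
    explicit istring(std::string s) : data_(std::move(s)) {}

    const std::string& str() const { return data_; }

    // Convenience members expressed purely in terms of operator^.
    istring& append(const istring& s);
    istring& prepend(const istring& s);

private:
    std::string data_;
};

// Concatenation is defined outside the class and returns a new value.
inline istring operator^(const istring& a, const istring& b) {
    return istring(a.str() + b.str());
}

inline istring& istring::append(const istring& s) {
    *this = *this ^ s;
    return *this;
}

inline istring& istring::prepend(const istring& s) {
    *this = s ^ *this;
    return *this;
}

// Usage sketch mirroring the quoted snippet, then the named alternative.
inline istring demo_concat() {
    istring x = "Hello,";
    x = x ^ " World!";
    assert(x.str() == "Hello, World!");
    istring y = "Hello,";
    y.append(" World!");
    assert(y.str() == x.str());
    return y;
}
```

Both spellings produce the same value here, so the members add no capability; whether they are worth having comes down to the readability argument made above.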