
Péter Szilágyi wrote:
Actually, it is what *you* put into it. Compiler decides what the size
of wchar_t should be. As long as your code points fit into that size, you will be fine. For example you can store UTF-16 characters in 4-byte wchar_t.
Well that's true, but wouldn't that be a waste? The other problem is that in UTF-32 every single "character" is an actual separate entity. In UTF-16, and especially in UTF-8, entities are made up of multiple "characters", so you would need to "decode" them to their 32-bit representation in order to use them correctly. (Actually, doing it this way would lead to quite a flexible lib... only the reader and the writer must be aware of the conversions and internally a wstring will suffice...)
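To make the "decode" step above concrete, here is a minimal sketch of turning UTF-16 code units into 32-bit code points. It is only an illustration, not anything proposed in this thread: the function name decode_utf16 is hypothetical, it assumes the C++11 types std::u16string and std::u32string rather than wstring, and replacing unpaired surrogates with U+FFFD is a choice made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into UTF-32 code points.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
std::u32string decode_utf16(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                char32_t low = in[++i];
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            } else {
                out.push_back(0xFFFD);
            }
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            out.push_back(0xFFFD);
        } else {
            // Code point in the Basic Multilingual Plane: one unit, one code point.
            out.push_back(unit);
        }
    }
    // Every code point consumes at least one input unit.
    assert(out.size() <= in.size());
    return out;
}
```

With something like this at the reader (and the inverse at the writer), the internal string only ever holds whole code points.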
I think you are missing the point. It's not an argument for any particular encoding. Rather, the point is that there is no pre-defined mapping between Unicode (or other) encoding and any C++ character type.
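As a rough illustration of that point: whether a given code point fits into a single wchar_t depends on the implementation, not on any encoding. The sketch below checks this against WCHAR_MAX; the helper names fits_in_wchar and to_wchar are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper: true if this implementation's wchar_t can hold cp
// as a single value. The answer differs between platforms (for example,
// 16-bit versus 32-bit wchar_t).
bool fits_in_wchar(char32_t cp) {
    return static_cast<unsigned long long>(cp)
        <= static_cast<unsigned long long>(WCHAR_MAX);
}

// Hypothetical helper: store cp in one wchar_t. The caller must check
// fits_in_wchar first; nothing in the language guarantees it.
wchar_t to_wchar(char32_t cp) {
    assert(fits_in_wchar(cp));
    return static_cast<wchar_t>(cp);
}
```

What the stored value means (UTF-16 code unit, UTF-32 code point, something else) is still up to the program.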
(I'd dare to say that those who propose to re-implement everything inside boost
either suffer the NotInventedHere syndrome, don't have a good understanding of what XML is, or grossly underestimate the required work, not only to implement it, but also to make it reasonably efficient.)
I'd second that. One middle-ground option would be to include a small XML parser
How much functionality do you mean by "small XML parser"?
That's a good question. Also, it would still be a parser only, as opposed to any in-memory representation (tree?) with assorted APIs. Such a parser may be sufficient if all you have in mind is an XMLReader-like API, but it surely isn't if what you want is a DOM, with XPath-based lookup, incremental validation, etc.

Regards,
Stefan

--
...ich hab' noch einen Koffer in Berlin...