
IMO, Unicode support goes well beyond a string template parameter. Unicode means different character sets to support, different encoding formats, different sets of encoding schemes, and different optimization tradeoffs across all of the above.
Sort of. For XML processing, the primary feature of Unicode is the extended character set. For XML 1.0, once an XML processor has decided whether a given character is whitespace, one of the special characters (such as < and &), a name start character, a name character or "other", the peculiarities of Unicode are mostly irrelevant. Obviously, there has to be code to handle the detection of the input encoding, and conversion to a stream of Unicode code points, in order to facilitate such classification. However, beyond that, the details don't matter.
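To make the classification step concrete, here is a rough sketch of what it could look like once the input has been converted to code points. The names (CharClass, classify, and the helpers) are made up for illustration, and the ranges are the NameStartChar/NameChar productions as I read them in XML 1.0 Fifth Edition; earlier editions use different tables, so treat the numbers as an assumption. It needs C++14 or later.

```cpp
#include <cassert>

// Hypothetical names for illustration; not taken from any existing parser.
enum class CharClass { Whitespace, Special, NameStart, NameChar, Other };

constexpr bool in_range(char32_t c, char32_t lo, char32_t hi) {
    return c >= lo && c <= hi;
}

// NameStartChar production (assumed edition: XML 1.0 Fifth Edition).
constexpr bool is_name_start(char32_t c) {
    return c == U':' || c == U'_' ||
           in_range(c, U'A', U'Z') || in_range(c, U'a', U'z') ||
           in_range(c, 0xC0, 0xD6) || in_range(c, 0xD8, 0xF6) ||
           in_range(c, 0xF8, 0x2FF) || in_range(c, 0x370, 0x37D) ||
           in_range(c, 0x37F, 0x1FFF) || in_range(c, 0x200C, 0x200D) ||
           in_range(c, 0x2070, 0x218F) || in_range(c, 0x2C00, 0x2FEF) ||
           in_range(c, 0x3001, 0xD7FF) || in_range(c, 0xF900, 0xFDCF) ||
           in_range(c, 0xFDF0, 0xFFFD) || in_range(c, 0x10000, 0xEFFFF);
}

// Characters allowed inside a name but not as its first character.
constexpr bool is_name_only(char32_t c) {
    return c == U'-' || c == U'.' || c == 0xB7 ||
           in_range(c, U'0', U'9') ||
           in_range(c, 0x300, 0x36F) || in_range(c, 0x203F, 0x2040);
}

// Classify one code point; the order of the checks decides ties.
constexpr CharClass classify(char32_t c) {
    if (c == 0x20 || c == 0x9 || c == 0xD || c == 0xA) return CharClass::Whitespace;
    if (c == U'<' || c == U'&') return CharClass::Special;
    if (is_name_start(c)) return CharClass::NameStart;
    if (is_name_only(c)) return CharClass::NameChar;
    return CharClass::Other;
}

// A few spot checks showing how the classifier behaves.
void classify_examples() {
    assert(classify(U'\t') == CharClass::Whitespace);
    assert(classify(U'&') == CharClass::Special);
    assert(classify(U'x') == CharClass::NameStart);
    assert(classify(U'7') == CharClass::NameChar);
}
```

Nothing here depends on how the code points were encoded on the wire, which is the point being made: after decoding, the parser only needs this one function.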
I think it's more than just that.

Scenario 1: I prefer to parse documents that use only the first plane, with UCS-2 as the encoding format and UTF-8 and UTF-16 as encoding schemes. IOW, I will always use wchar_t and std::wstring.

Scenario 2: I prefer to parse documents that use only ASCII chars, with 8-bit as the encoding format and 7-bit as the encoding scheme. IOW, I prefer to use char and std::string, and I do not want to know about any transcoding, wide chars, etc.

Scenario 3: I prefer to parse documents that use the whole Unicode set, with UTF-16 as the encoding format and UTF-8 and UTF-16 as encoding schemes, and I want the parser to be lazy. IOW, if it is a big (huge) XML document that uses UTF-8, I do not want the parser to convert any CDATA into the native encoding form until requested, but only to do the local char-by-char conversion required for markup detection. (Essentially I want to limit memory usage and unnecessary work.)

Scenario 4: I prefer to parse documents that use the whole Unicode set, with UCS-4 as the encoding format, and to support a wide variety (10 or more) of different encoding schemes. I do not care that much about performance and memory usage, but prefer a single parser that does it all.

I could list a lot of different usage schemes with different tradeoffs. Eventually this is bound to affect the XML parser interface with regard to Unicode support. (Instead of Unicode I would prefer to use the terms charsets and encoding scheme sets; Unicode is just one particular combination of a charset and encoding scheme sets.)

Gennadiy
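A minimal sketch of the lazy approach in Scenario 3, assuming UTF-8 input held in a std::string. It relies on the fact that in UTF-8 the bytes for '<' and '&' never occur inside a multi-byte sequence, so markup can be located on raw bytes and character data decoded only on request. The names Decoded, next_markup and decode_utf8_at are hypothetical, not part of any proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of decoding one code point (hypothetical type).
struct Decoded {
    char32_t code_point;  // meaningful only when length > 0
    std::size_t length;   // bytes consumed; 0 means malformed or truncated
};

// Find the next markup delimiter on raw bytes, without decoding anything.
inline std::size_t next_markup(const std::string& bytes, std::size_t from) {
    return bytes.find_first_of("<&", from);  // std::string::npos if none
}

// Decode a single UTF-8 sequence starting at pos, on demand.
inline Decoded decode_utf8_at(const std::string& bytes, std::size_t pos) {
    if (pos >= bytes.size()) return {0, 0};
    const unsigned char b0 = static_cast<unsigned char>(bytes[pos]);
    if (b0 < 0x80) return {b0, 1};  // ASCII fast path

    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min = 0;  // smallest value that legitimately needs len bytes
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return {0, 0};  // stray continuation byte or invalid lead byte

    if (bytes.size() - pos < len) return {0, 0};  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[pos + i]);
        if ((b & 0xC0) != 0x80) return {0, 0};  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {0, 0};
    return {cp, len};
}

// Usage: skip over character data untouched, decode one character on request.
void lazy_scan_example() {
    const std::string doc = "<a>\xC3\xA9</a>";          // element content is U+00E9
    const std::size_t end_of_text = next_markup(doc, 3); // start of "</a>"
    assert(end_of_text == 5);
    const Decoded d = decode_utf8_at(doc, 3);
    assert(d.code_point == 0xE9 && d.length == 2);
}
```

The same split does not carry over to every encoding scheme (in UTF-16 the scan has to step in 16-bit units, for instance), which is one way the choice of schemes ends up visible in the parser interface.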