Re: [Boost-users] regex with multi-byte characters

21 Jul 2005

      ...
> What do you think? Could Boost.Regex make use of such a traits class, or
> would you rather not include it in the distribution?
I don't know; it depends on what it does. How do you plan to handle character
classification portably for unsigned short?
...
> There are too many developers involved in the process for us to force all
> of them to recompile Xerces-C with specific settings. I don't think this
> would be an option for us. In our case it could also lead to unpredictable
> results if someone replaced Xerces-C with a freshly compiled Xerces-C
> without ICU support. I am a little bit sceptical about this.
OK, let me try one more time: if you compile Boost.Regex *only* with ICU
support, and use the iterator-based u32regex_match/u32regex_search algorithms
(or their equivalent regex iterators), then it doesn't matter what character
type Xerces or anything else uses, as long as:

It's an 8-bit type: then it'll be treated as an [unsigned] UTF-8 encoded string.
Or: it's a 16-bit type: then it'll be treated as an [unsigned] UTF-16 encoded string.
Or: it's a 32-bit type: then it'll be treated as an [unsigned] UTF-32 encoded string.

Is that generic enough for you? :-)
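To make the size-based rule above concrete, here is a minimal standalone sketch of the same idea. It is not Boost's source: `Encoding`, `encoding_for` and `decode` are hypothetical names, it assumes 8-bit bytes (CHAR_BIT == 8), and it does no validation of malformed input. It only shows how the choice between UTF-8, UTF-16 and UTF-32 can be made purely from the size of the iterator's value type, so the caller's own character typedef never matters.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical illustration, NOT Boost's source: the encoding is chosen
// purely from the size of the code-unit type (assumes CHAR_BIT == 8).
enum class Encoding { UTF8, UTF16, UTF32 };

template <typename CharT>
constexpr Encoding encoding_for() {
    static_assert(sizeof(CharT) == 1 || sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "only 8-, 16- or 32-bit code units are supported");
    return sizeof(CharT) == 1 ? Encoding::UTF8
         : sizeof(CharT) == 2 ? Encoding::UTF16
                              : Encoding::UTF32;
}

// Decode any iterator range into UTF-32 code points. Malformed input is
// not validated; this only shows the size-based dispatch.
template <typename It>
std::vector<char32_t> decode(It first, It last) {
    using CharT = typename std::iterator_traits<It>::value_type;
    std::vector<char32_t> out;
    if constexpr (encoding_for<CharT>() == Encoding::UTF8) {
        while (first != last) {
            std::uint32_t b = static_cast<unsigned char>(*first++);
            // Number of continuation bytes implied by the lead byte.
            int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
            std::uint32_t cp = extra == 0 ? b : (b & (0x3Fu >> extra));
            for (int i = 0; i < extra && first != last; ++i)
                cp = (cp << 6) | (static_cast<unsigned char>(*first++) & 0x3Fu);
            out.push_back(static_cast<char32_t>(cp));
        }
    } else if constexpr (encoding_for<CharT>() == Encoding::UTF16) {
        while (first != last) {
            std::uint32_t u = static_cast<std::uint16_t>(*first++);
            // Combine a high surrogate with a following low surrogate.
            if (u >= 0xD800 && u <= 0xDBFF && first != last) {
                std::uint32_t lo = static_cast<std::uint16_t>(*first);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    ++first;
                    u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                }
            }
            out.push_back(static_cast<char32_t>(u));
        }
    } else {
        // 32-bit code units are already code points.
        for (; first != last; ++first)
            out.push_back(static_cast<char32_t>(*first));
    }
    return out;
}
```

For example, calling `decode` on a `std::string` holding `"h\xC3\xA9"` yields U+0068 and U+00E9, while a `std::vector<unsigned short>` holding 0x0068, 0xD83D, 0xDE00 (an XMLCh-style buffer, assuming XMLCh is a 16-bit unsigned type on your platform) yields U+0068 and U+1F600. In Boost.Regex itself you would not write this yourself; the point is only that a 16-bit type is treated as UTF-16 regardless of its name.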

John.

John Maddock