
On Thu, July 21, 2005 14:54, John Maddock said:
What do you think? Could Boost.Regex make use of such a traits class, or would you rather not include it in the distribution?
I don't know, it depends on what it does: how do you plan to handle character classification in a portable manner for unsigned short?
I plan to do it the same way Xerces-C does. As I understand it, they put a 2-byte code into the short and perform various operations on it. I still have to investigate exactly how it is done.
There are too many developers involved in the process for us to force everyone to recompile Xerces-C with specific settings. I don't think this would be an option for us. In our case it could also lead to unpredictable results if someone replaces Xerces-C with a freshly compiled Xerces-C without ICU support. I am a little bit sceptical about this.
OK let me try one more time: if you compile regex *only* with ICU support, and use the iterator based u32regex_match/u32regex_search algorithms (or their equivalent regex iterators) then it doesn't matter what character type Xerces or anything else uses as long as:
It's an 8-bit type: then it'll be treated as an [unsigned] UTF-8 encoded string.
Or: It's a 16-bit type: then it'll be treated as an [unsigned] UTF-16 encoded string.
Or: It's a 32-bit type: then it'll be treated as an [unsigned] UTF-32 encoded string.
Is that generic enough for you? :-)
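To illustrate the rule above, here is a minimal sketch of selecting the encoding purely from the width of the code unit type. It is not Boost.Regex's implementation and uses no Boost or ICU calls: the namespace `sketch` and the functions `decode` and `to_code_points` are hypothetical names made up for this example, and the decoders assume well-formed input with no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of Boost

// 8-bit code units: treat the range as (unsigned) UTF-8.
template <class It>
std::vector<char32_t> decode(It first, It last,
                             std::integral_constant<std::size_t, 1>) {
    std::vector<char32_t> out;
    while (first != last) {
        // Cast through unsigned char so a signed char type behaves the same.
        std::uint32_t cp = static_cast<unsigned char>(*first++);
        int extra = 0;  // number of continuation bytes that follow
        if (cp >= 0xF0)      { extra = 3; cp &= 0x07; }
        else if (cp >= 0xE0) { extra = 2; cp &= 0x0F; }
        else if (cp >= 0xC0) { extra = 1; cp &= 0x1F; }
        while (extra-- > 0 && first != last)
            cp = (cp << 6) | (static_cast<unsigned char>(*first++) & 0x3F);
        out.push_back(static_cast<char32_t>(cp));
    }
    return out;
}

// 16-bit code units (e.g. unsigned short): treat the range as UTF-16.
template <class It>
std::vector<char32_t> decode(It first, It last,
                             std::integral_constant<std::size_t, 2>) {
    std::vector<char32_t> out;
    while (first != last) {
        std::uint32_t cp = static_cast<std::uint16_t>(*first++);
        // A high surrogate combines with the following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && first != last) {
            std::uint32_t lo = static_cast<std::uint16_t>(*first++);
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        out.push_back(static_cast<char32_t>(cp));
    }
    return out;
}

// 32-bit code units: treat the range as UTF-32, one unit per code point.
template <class It>
std::vector<char32_t> decode(It first, It last,
                             std::integral_constant<std::size_t, 4>) {
    std::vector<char32_t> out;
    for (; first != last; ++first)
        out.push_back(static_cast<char32_t>(static_cast<std::uint32_t>(*first)));
    return out;
}

// Entry point: the caller never names an encoding; sizeof picks it.
template <class It>
std::vector<char32_t> to_code_points(It first, It last) {
    using unit = typename std::iterator_traits<It>::value_type;
    return decode(first, last,
                  std::integral_constant<std::size_t, sizeof(unit)>());
}

}  // namespace sketch
```

With this, a Xerces-style buffer of unsigned short and a plain char buffer both go through the same call, for example `sketch::to_code_points(buf, buf + len)`, and only the element size decides how the data is read.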
Yes, I will do some tests. If they are OK, I will compile regex with ICU support. Otherwise I will write my own traits class for unsigned short characters. Thanks a lot for your help.
John.
With Kind Regards, Ovanes