If you're prepared to depend upon ICU,
What's ICU? == I see you?? :)
IBM's Unicode libraries: http://www-306.ibm.com/software/globalization/icu/index.jsp
then the current CVS has
(optional) support for 16- and 32-bit Unicode character types; the traits
It's like UTF-16, but I replace every character above 0xFFFF with '?', so it's UTF-16 without any 4-byte characters (no surrogate pairs).
class design is also rather simplified and better documented, so that would be the best bet if you wanted to define your own minimalist traits class.
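For illustration, the 16-bit encoding described above (UTF-16 with everything above 0xFFFF replaced by '?') could be produced roughly as follows. This is only a sketch: the function name `to_bmp_only` and the choice of a vector of 32-bit code points as input are assumptions, not anything taken from regex16 or from Boost.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions).
// Narrows Unicode code points to single 16-bit units. Anything above
// 0xFFFF would need a surrogate pair (4 bytes) in real UTF-16, so it is
// replaced with '?' instead.
std::u16string to_bmp_only(const std::vector<std::uint32_t>& code_points)
{
    std::u16string out;
    out.reserve(code_points.size());
    for (std::uint32_t cp : code_points) {
        out.push_back(cp > 0xFFFF ? u'?' : static_cast<char16_t>(cp));
    }
    return out;
}
```

Every character in the result is exactly one 16-bit unit, so a regex engine working on it never has to deal with surrogate pairs.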
I don't really understand what character_traits etc. are (or how to create them myself). I only wanted my regex16 to do the same job for chars 0-0x00FF as boost::regex does for 0-0xFF, with the rest of the chars (>= 0x0100) treated as non-word characters (\W), so that I could use only the \xXXXX-\xXXXX notation for their ranges and patterns...
Unfortunately you still have to write a traits class yourself to do that; a simple wrapper that forwards calls on to c_regex_traits<char> where appropriate would do it. The traits class design is also going to change in the next release, which is why I'm nudging you towards the current CVS state rather than the last release.
John.
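To make the forwarding idea concrete, here is a minimal sketch of just the word-character decision such a wrapper would make. It is not the Boost.Regex traits interface: `is_word_char16` is a hypothetical name, and `std::isalnum` stands in for the call that a real wrapper would forward to c_regex_traits<char>. A real traits class has to supply the full set of members required by whichever Boost.Regex version you build against.

```cpp
#include <cctype>

// Hypothetical stand-in for one classification call in a 16-bit wrapper.
// Characters 0-0xFF are forwarded to the narrow-character C routine;
// everything from 0x0100 up is treated as a non-word character (\W).
bool is_word_char16(char16_t c)
{
    if (c > 0xFF) {
        return false;  // >= 0x0100: never a word character
    }
    // \w is taken here to mean alphanumeric or underscore.
    return c == u'_' || std::isalnum(static_cast<unsigned char>(c)) != 0;
}
```

The other classification and case-conversion calls could follow the same pattern: forward when the value fits in a char, otherwise return a fixed answer.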