I'd just like to know whether you can use a std::vector<boost::uint32_t> as a source to match regular expressions against.
Yes, but not right out of the box: you would need to provide a traits class so that basic_regex
knows how to interpret uint32_t values as characters. What precisely did you want to do?
Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a means to parse XML.
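For reference, the UTF-8 half of that conversion might look roughly like the sketch below. It uses only the standard library; decode_utf8 is a hypothetical helper name, not something Boost provides, and the choice to throw on malformed input is an assumption rather than anything stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): decode a UTF-8 byte string into
// one 32-bit code point per element. Throws std::runtime_error on truncated
// sequences, bad continuation bytes, overlong forms, surrogates, and values
// above U+10FFFF.
std::vector<std::uint32_t> decode_utf8(const std::string& in)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;    // code point being assembled
        std::size_t extra;   // number of continuation bytes expected
        std::uint32_t min;   // smallest value that legitimately needs this length
        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append the low six payload bits
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("overlong or out-of-range code point");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The resulting std::vector<std::uint32_t> is the kind of sequence the question is about; UTF-16 input would need a separate surrogate-pair decoder, which is not shown here.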
If you don't mind depending upon ICU, then the regex ICU wrappers will do that for you, *and* let you operate directly on the UTF-8 byte stream as well: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non.... However, ICU is a big library to depend upon :-(

A more lightweight alternative, if you don't need true Unicode character classification and case conversion, would be to implement a lightweight traits class for basic_regex that either "does nothing" or forwards to the same methods in regex_traits<char> etc., see: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con.... This is obviously more work, but reduces the code footprint. Your call :-)

HTH, John.
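To make the lightweight option a little more concrete, here is a rough sketch of a traits class that "does nothing" for code points above 0x7F and forwards ASCII to a narrow-character traits object. Several things here are assumptions and not taken from this thread: the name u32_forwarding_traits is hypothetical; std::regex_traits<char> stands in for boost::regex_traits<char> purely so the sketch compiles without Boost; and the member list follows the traits class requirements in the Boost.Regex concepts documentation as best I recall them, so please check it against that page. The sketch has not been instantiated with boost::basic_regex.

```cpp
#include <cstddef>
#include <cstdint>
#include <locale>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch: ASCII forwards to a char traits object, and
// everything else is treated as an opaque, unclassified character.
struct u32_forwarding_traits
{
    typedef std::uint32_t                            char_type;
    typedef std::size_t                              size_type;
    typedef std::vector<char_type>                   string_type;
    typedef std::locale                              locale_type;
    typedef std::regex_traits<char>::char_class_type char_class_type;

    // Length of a zero-terminated code point sequence.
    static size_type length(const char_type* p)
    {
        size_type n = 0;
        while (p[n] != 0) ++n;
        return n;
    }

    char_type translate(char_type c) const { return c; }

    // Case folding for ASCII only; other code points pass through unchanged.
    char_type translate_nocase(char_type c) const
    {
        if (!is_ascii(c)) return c;
        return static_cast<unsigned char>(narrow_.translate_nocase(static_cast<char>(c)));
    }

    // "Does nothing" collation: the sort key is the sequence itself.
    template <class It>
    string_type transform(It first, It last) const { return string_type(first, last); }

    template <class It>
    string_type transform_primary(It first, It last) const { return string_type(first, last); }

    // Class names such as "digit" are ASCII, so narrow them and forward.
    template <class It>
    char_class_type lookup_classname(It first, It last) const
    {
        std::string name;
        if (!narrow(first, last, name)) return char_class_type();
        return narrow_.lookup_classname(name.begin(), name.end());
    }

    template <class It>
    string_type lookup_collatename(It first, It last) const
    {
        std::string name;
        if (!narrow(first, last, name)) return string_type();
        const std::string r = narrow_.lookup_collatename(name.begin(), name.end());
        string_type out;
        for (char ch : r) out.push_back(static_cast<unsigned char>(ch));
        return out;
    }

    // Non-ASCII code points belong to no class in this sketch.
    bool isctype(char_type c, char_class_type f) const
    {
        return is_ascii(c) && narrow_.isctype(static_cast<char>(c), f);
    }

    // Digit value in the given radix, or -1 if c is not such a digit.
    int value(char_type c, int radix) const
    {
        return is_ascii(c) ? narrow_.value(static_cast<char>(c), radix) : -1;
    }

    locale_type imbue(locale_type loc) { return narrow_.imbue(loc); }
    locale_type getloc() const { return narrow_.getloc(); }

private:
    static bool is_ascii(char_type c) { return c < 0x80; }

    // Copies [first, last) into out; returns false on any non-ASCII element.
    template <class It>
    static bool narrow(It first, It last, std::string& out)
    {
        for (; first != last; ++first) {
            if (!is_ascii(*first)) return false;
            out.push_back(static_cast<char>(*first));
        }
        return true;
    }

    std::regex_traits<char> narrow_;
};
```

The trade-off is the one described above: classes like [[:alpha:]] and case-insensitive matching would only behave sensibly for ASCII, in exchange for not pulling in ICU.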