Re: [Boost-users] Boost Regex: Use boost::uint32_t as charactertype.

14 Jul 2009

      ...
...
...
I'd just like to know if you can use a std::vector<boost::uint32_t> as a
source to match regular expressions against.
Yes, but... not right out of the box: you would need to provide a traits
class so that regex_traits<uint32_t> knows how to interpret uint32_t's
as characters.
What precisely did you want to do?
Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a
means to parse XML.
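For the first step of that plan, turning UTF-8 into a vector of 32-bit code points, a minimal sketch might look like the following. It assumes std::uint32_t as a stand-in for boost::uint32_t, and decode_utf8 is a hypothetical helper, not a Boost function. Validation is deliberately light.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): decode a UTF-8 byte string into
// code points. Rejects invalid lead bytes, stray or bad continuation bytes,
// and truncated sequences; does NOT check for overlong forms, surrogates,
// or values above U+10FFFF.
std::vector<std::uint32_t> decode_utf8(const std::string& in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The result is the kind of std::vector<boost::uint32_t> the question asks about; UTF-16 input would need a similar decoder that combines surrogate pairs.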
If you don't mind depending upon ICU, then the regex ICU wrappers will do
that for you, *and* let you operate directly on the UTF-8 byte stream as
well:
http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non....

However, ICU is a big library to depend upon :-(

A more lightweight alternative, if you don't need true Unicode character
classification and case conversion, would be to implement a minimal
traits class for basic_regex that either "does nothing" or forwards to the
same methods in regex_traits<char> etc.; see:
http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con....
This is obviously more work, but it reduces the code footprint; your call :-)
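To give a feel for the "forwards to regex_traits<char>" idea, here is a partial sketch covering only a few traits members. It is not a complete traits class (a real one needs every member from the traits concept page), u32_traits_fragment is a hypothetical name, and it forwards to std::regex_traits<char> from <regex> rather than Boost's regex_traits<char> so that it builds without Boost. Code points below 0x80 are forwarded; everything else is left untouched.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical, partial traits fragment (not a Boost API): forwards ASCII
// code points to std::regex_traits<char> (standing in for Boost's
// regex_traits<char>) and "does nothing" for all other code points.
struct u32_traits_fragment {
    using char_type = std::uint32_t;
    using char_class_type = std::regex_traits<char>::char_class_type;

    std::regex_traits<char> narrow;  // the narrow traits object we forward to

    static bool is_ascii(char_type c) { return c < 0x80; }

    // Case-sensitive translation: identity.
    char_type translate(char_type c) const { return c; }

    // Case-insensitive translation: only ASCII is folded, via the narrow traits.
    char_type translate_nocase(char_type c) const {
        return is_ascii(c)
            ? static_cast<unsigned char>(narrow.translate_nocase(static_cast<char>(c)))
            : c;
    }

    // Class names such as "digit" or "alpha" are resolved by the narrow traits.
    char_class_type lookup_classname(const std::string& name) const {
        return narrow.lookup_classname(name.begin(), name.end());
    }

    // Non-ASCII code points never belong to any class in this sketch.
    bool isctype(char_type c, char_class_type f) const {
        return is_ascii(c) && narrow.isctype(static_cast<char>(c), f);
    }

    // Digit value in the given radix, or -1 if not an (ASCII) digit.
    int value(char_type c, int radix) const {
        return is_ascii(c) ? narrow.value(static_cast<char>(c), radix) : -1;
    }
};
```

The trade-off is the one described: non-ASCII digits or letters are simply not classified, which may be acceptable for markup such as XML, where the syntax characters are ASCII.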

HTH, John.


John Maddock