Boost Regex: Use boost::uint32_t as character type.
Hello List, I'd just like to know whether you can use a std::vector<boost::uint32_t> as a source to match regular expressions against. Etienne
> I'd just like to know whether you can use a std::vector<boost::uint32_t> as a source to match regular expressions against.
Yes, but not right out of the box: you would need to provide a traits class so that regex_traits knows how to interpret uint32_t values as characters. What precisely did you want to do?
John Maddock wrote:
> Yes, but not right out of the box: you would need to provide a traits class so that regex_traits knows how to interpret uint32_t values as characters. What precisely did you want to do?
Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a means to parse XML: http://www.cs.sfu.ca/~cameron/REX.html (a bit outdated)
> HTH, John.
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
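For reference, the conversion step described above (UTF-8 bytes to one 32-bit value per code point) might be sketched roughly as follows. This is only an illustration and not code from the thread: the function name decode_utf8 is hypothetical, std::uint32_t (C++11) stands in for boost::uint32_t so that the sketch needs nothing beyond the standard library, and overlong encodings and surrogate values are deliberately not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into one 32-bit code point
// per element. Throws std::runtime_error on truncated or malformed input.
// Simplification: overlong encodings and surrogate values are not rejected.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (bytes.size() - i - 1 < extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The resulting vector is the kind of sequence the original question asks about; matching against it with Boost.Regex would still need a suitable traits class.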
If you don't mind depending upon ICU, then the regex ICU wrappers will do that for you, *and* let you operate directly on the UTF-8 byte stream as well: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non....
However, ICU is a big library to depend upon :-(
A more lightweight alternative, if you don't need true Unicode character classification and case conversion, would be to implement a lightweight traits class for basic_regex that either "does nothing" or forwards to the same methods in regex_traits<char> etc.; see: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con....
This is obviously more work, but reduces the code footprint. Your call :-)
HTH, John.
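To make the "does nothing or forwards" idea a little more concrete, here is a minimal sketch of the forwarding logic alone. It is not a complete traits class for basic_regex (the real concept needs more members and types, described at the link above), and it forwards to the standard library's std::ctype<char> rather than to regex_traits<char> so that it compiles without Boost. The struct name ascii_forwarding_ops and the helper is_ascii are hypothetical; translate_nocase and isctype only borrow the names of the corresponding traits members, with simplified signatures.

```cpp
#include <cstdint>
#include <locale>

// Hypothetical illustration only: NOT the full traits concept that
// basic_regex requires. It shows two operations on 32-bit code points
// that forward to char-based facilities for ASCII and otherwise do nothing.
struct ascii_forwarding_ops
{
    typedef std::uint32_t char_type;

    // True when the code point can safely be handed to char-based code.
    static bool is_ascii(char_type c) { return c < 0x80; }

    // Case folding: forward ASCII to std::ctype<char>; leave every other
    // code point unchanged (the "does nothing" branch).
    static char_type translate_nocase(char_type c)
    {
        if (!is_ascii(c)) return c;
        const std::ctype<char>& ct =
            std::use_facet<std::ctype<char> >(std::locale::classic());
        return static_cast<unsigned char>(ct.tolower(static_cast<char>(c)));
    }

    // Classification: forward ASCII; report "not in class" for the rest,
    // so no true Unicode classification is attempted.
    static bool isctype(char_type c, std::ctype_base::mask m)
    {
        if (!is_ascii(c)) return false;
        const std::ctype<char>& ct =
            std::use_facet<std::ctype<char> >(std::locale::classic());
        return ct.is(m, static_cast<char>(c));
    }
};
```

With this approach, character classes and case-insensitive matching would only ever apply to ASCII; any other code point could be matched literally but not classified, which is the trade-off John describes.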
John Maddock wrote:
> If you don't mind depending upon ICU then the regex ICU wrappers will do that for you, *and* let you operate directly on the UTF-8 byte stream as well: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non....
> However, ICU is a big library to depend upon :-(
Agreed. ICU is big.
> A more lightweight alternative if you don't need true Unicode character classification and case-conversion, would be to implement a lightweight traits class for basic_regex that either "does nothing" or forwards to the same methods in regex_traits<char> etc, see: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con.... This is obviously more work, but reduces the code footprint, your call :-)
Excellent. Just what I need. Thank you John.
participants (2)
- Etienne Philip Pretorius
- John Maddock