Boost Regex: Use boost::uint32_t as character type. - Boost-users - lists.preview.boost.org


Etienne Philip Pretorius

13 Jul 2009, 6:54 p.m.

Hello list,

I would just like to know if you can use a std::vector<boost::uint32_t> as a source to match regular expressions against.

Etienne


John Maddock

14 Jul 2009, 9:09 a.m.


> I would just like to know if you can use a std::vector<boost::uint32_t> as a source to match regular expressions against.

Yes, but... not right out of the box: you would need to provide a traits class so that regex_traits<uint32_t> knows how to interpret uint32_t values as characters.

What precisely did you want to do?

HTH, John.


Etienne Philip Pretorius

9:35 a.m.


John Maddock wrote:


> Yes, but... not right out of the box: you would need to provide a traits class so that regex_traits<uint32_t> knows how to interpret uint32_t values as characters.
>
> What precisely did you want to do?

Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a means to parse XML; see http://www.cs.sfu.ca/~cameron/REX.html (a bit outdated).

> HTH, John.

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
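Etienne's first step, turning UTF-8 or UTF-16 input into one 32-bit value per code point, needs no Boost at all. The following is a minimal sketch, assuming malformed input should be rejected with an exception; the function names `decode_utf8` and `decode_utf16` are hypothetical, and `std::uint32_t` is used as the standard equivalent of `boost::uint32_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into one uint32_t per code point.
// Throws std::runtime_error on a bad lead or continuation byte, a truncated
// sequence, an overlong form, a surrogate, or a value above U+10FFFF.
std::vector<std::uint32_t> decode_utf8(const std::string& in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
        static const std::uint32_t min_for_length[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_for_length[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid UTF-8 code point");
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: decode UTF-16 code units, combining surrogate pairs.
std::vector<std::uint32_t> decode_utf16(const std::u16string& in) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 1 == in.size()) throw std::runtime_error("truncated surrogate pair");
            const std::uint32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) throw std::runtime_error("unpaired high surrogate");
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
            ++i;  // consumed two code units
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        } else {
            out.push_back(u);
        }
    }
    return out;
}
```

The resulting `std::vector<std::uint32_t>` is the kind of sequence the original question asks about; matching against it with Boost.Regex would still need a traits class, as John explains.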


John Maddock

10:17 a.m.



> Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a means to parse XML.

If you don't mind depending upon ICU, then the regex ICU wrappers will do that for you, *and* let you operate directly on the UTF-8 byte stream as well: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non....

However, ICU is a big library to depend upon :-(

A more lightweight alternative, if you don't need true Unicode character classification and case conversion, would be to implement a lightweight traits class for basic_regex that either "does nothing" or forwards to the same methods in regex_traits<char> etc.; see: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con.... This is obviously more work, but it reduces the code footprint; your call :-)

HTH, John.
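John's lightweight alternative is a traits class whose per-character operations either do nothing or forward to the narrow-character traits. Below is a minimal sketch of just those hooks, assuming ASCII-only classification and case folding are acceptable. The class name `u32_forwarding_hooks` is hypothetical, and it forwards to `std::regex_traits<char>` instead of `boost::regex_traits<char>` so that it compiles without Boost. It is not a complete traits class: `boost::basic_regex` also expects members such as `length`, `transform`, `transform_primary`, `lookup_collatename`, `imbue` and `getloc`, plus the usual typedefs, as listed in the Boost.Regex traits-class requirements.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper (not part of Boost): the per-character hooks a uint32_t
// traits class might implement. ASCII code points are forwarded to the
// narrow-character traits; everything else is passed through unchanged.
// std::regex_traits<char> stands in for boost::regex_traits<char> here.
class u32_forwarding_hooks {
public:
    using char_type = std::uint32_t;
    using narrow_traits = std::regex_traits<char>;
    using char_class_type = narrow_traits::char_class_type;

    // Case-sensitive translation: "does nothing".
    char_type translate(char_type c) const { return c; }

    // Case folding only for ASCII; other code points are left as-is.
    char_type translate_nocase(char_type c) const {
        if (is_ascii(c))
            return static_cast<unsigned char>(narrow_.translate_nocase(static_cast<char>(c)));
        return c;
    }

    // Map a class name such as "digit" (spelled as uint32_t code points) to a mask
    // by narrowing it and asking the char traits; non-ASCII names match no class.
    template <class It>
    char_class_type lookup_classname(It first, It last, bool icase = false) const {
        std::string name;
        for (; first != last; ++first) {
            if (!is_ascii(*first)) return char_class_type();
            name.push_back(static_cast<char>(*first));
        }
        return narrow_.lookup_classname(name.begin(), name.end(), icase);
    }

    // Non-ASCII code points belong to no character class in this scheme.
    bool isctype(char_type c, char_class_type mask) const {
        return is_ascii(c) && narrow_.isctype(static_cast<char>(c), mask);
    }

    // Digit value in the given radix, or -1 if c is not such a digit.
    int value(char_type c, int radix) const {
        return is_ascii(c) ? narrow_.value(static_cast<char>(c), radix) : -1;
    }

private:
    static bool is_ascii(char_type c) { return c < 0x80; }
    narrow_traits narrow_;
};
```

The trade-off is the one John names: because every non-ASCII code point falls outside all classes, `\w` or `[[:alpha:]]` would not match accented or non-Latin letters, which is exactly what the ICU wrappers would provide.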


Etienne Philip Pretorius

10:23 a.m.


John Maddock wrote:


> If you don't mind depending upon ICU, then the regex ICU wrappers will do that for you, *and* let you operate directly on the UTF-8 byte stream as well: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non....
>
> However, ICU is a big library to depend upon :-(

Agreed. ICU is big.

> A more lightweight alternative, if you don't need true Unicode character classification and case conversion, would be to implement a lightweight traits class for basic_regex that either "does nothing" or forwards to the same methods in regex_traits<char> etc.; see: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/con.... This is obviously more work, but it reduces the code footprint; your call :-)

Excellent. Just what I need. Thank you, John.
