[boost.regex] Does boost::u32regex recognize the Unicode named blocks like "\p{IsBasicLatin}"?
Hi all,
I am using boost::u32regex for some regular expression processing, but "\p{IsBasicLatin}" does not seem to be accepted by boost::make_u32regex(tmp). Does Boost.Regex not support the named Unicode blocks, or do I have to pass some other flags to the library? I am currently using the default flags, which select Perl syntax.
Thanks & regards,
Juan
The named properties/character classes supported are here: http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/boost_regex/syntax/...
As you can see I haven't added support for language-specific blocks yet :-(
John.
I forgot to mention that \p{IsBasicLatin} is the same as [\x0-\x7f]; likewise, the other contiguous blocks can be expressed in the same way.
HTH, John.
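To illustrate the suggestion above, here is a minimal sketch that writes a contiguous Unicode block as an explicit code-point range. The helper block_class and the table kBlocks are hypothetical names introduced only for this example; the block boundaries are taken from the Unicode block list, and the \x{...} hex escape is assumed to be available in the Perl syntax that Boost.Regex uses by default.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: build a character class such as "[\x{370}-\x{3ff}]"
// for an inclusive range of code points.
std::string block_class(unsigned long first, unsigned long last)
{
    assert(first <= last);  // a block is a non-empty, ascending range
    std::ostringstream out;
    out << std::hex << "[\\x{" << first << "}-\\x{" << last << "}]";
    return out.str();
}

// A few contiguous blocks and their code-point ranges.
struct UnicodeBlock {
    const char*   name;
    unsigned long first;
    unsigned long last;
};

const UnicodeBlock kBlocks[] = {
    { "Basic Latin",        0x0000, 0x007F },
    { "Latin-1 Supplement", 0x0080, 0x00FF },
    { "Greek and Coptic",   0x0370, 0x03FF },
    { "Cyrillic",           0x0400, 0x04FF },
};
```

The resulting string could then be passed to boost::make_u32regex in place of the unsupported \p{Is...} name, for example boost::make_u32regex(block_class(0x0370, 0x03FF)).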
Thanks John,
I have used "[\x0-\x7f]" instead of "\p{IsBasicLatin}" to construct the regular expression (expression = boost::make_u32regex("[\\x0-\\x7f]")). The expression is constructed correctly, but it accepts neither the string "a" nor " " (boost::u32regex_match("a", expression) == false). I wonder whether this has something to do with Unicode. I also tried expression = boost::regex("[\\x0-\\x7f]"); then the string "a" matches (boost::regex_match("a", expression) == true) but the string " " does not, which I think is reasonable for boost::regex since it does not support Unicode. So my question is: why doesn't boost::u32regex_match work here?
Thanks & regards,
Juan
(In reply to John Maddock's message of 2008-08-08 16:48:38.)
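Juan, I have no problems with this sample code:

    #include "stdafx.h"             // MSVC precompiled header; assumes a Unicode (_UNICODE) build
    #include <tchar.h>              // _T, _TCHAR, _tmain
    #include <iostream>
    #include <string>
    #include <boost/regex/icu.hpp>  // u32regex, make_u32regex, u32regex_match (needs ICU)

    void test(const std::wstring& str)
    {
        // Accept only strings made up entirely of U+0000..U+007F (Basic Latin)
        boost::u32regex expression = boost::make_u32regex("^([\\x0-\\x7f]+)$");
        boost::wsmatch what;
        if (boost::u32regex_match(str, what, expression))
        {
            // what[0] contains the whole string
            // what[1] contains the ASCII text
            std::wcout << what[1] << _T(" is ascii") << std::endl;
        }
        else
        {
            std::wcout << str << _T(" is not ascii") << std::endl;
        }
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        test(_T("Hello World!"));
        test(_T("ôöò"));
        return 0;
    }

Output: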
Hello World! is ascii
¶÷‗ is not ascii
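For comparison, the set of strings that the pattern ^([\x0-\x7f]+)$ is meant to accept can also be described without a regex. The following is only a sketch under that reading of the pattern; is_basic_latin is a hypothetical name, and it checks wchar_t code units one by one rather than using Boost.Regex.

```cpp
#include <cassert>  // only needed when checking the function with assert
#include <string>

// Returns true when str is non-empty and every code unit lies in
// U+0000..U+007F, mirroring the intent of ^([\x0-\x7f]+)$.
bool is_basic_latin(const std::wstring& str)
{
    if (str.empty())
        return false;  // the '+' quantifier requires at least one character
    for (std::wstring::size_type i = 0; i < str.size(); ++i)
    {
        // Cast so that a signed wchar_t cannot slip under the upper bound
        if (static_cast<unsigned long>(str[i]) > 0x7F)
            return false;
    }
    return true;
}
```

With the two inputs from the sample above, this check would agree with the regex: "Hello World!" passes, and "ôöò" does not.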
(In reply to gj_uestc's message sent Monday, 11 August 2008, 5:35, to boost-users@lists.boost.org and john@johnmaddock.co.uk.)
Ok, it works for me now. Thanks a lot!
(In reply to Andrea Denzler's message of 2008-08-11 11:53:40.)
Participants (3):
- Andrea Denzler
- gj_uestc
- John Maddock