something about UTF8
data:image/s3,"s3://crabby-images/15c65/15c6562572be87648ff2aa51d01f963dff5d793c" alt=""
Hi guys,

I want to use boost::regex on Windows XP to match Japanese kanji. The kanji are encoded in UTF-8. I want to make sure: after I use the function MultiByteToWideChar to convert the UTF-8 kanji string to a wstring, can I directly use boost::wregex on that wstring to match Japanese?

I'd appreciate any help.

Worldwind
data:image/s3,"s3://crabby-images/f9ecd/f9ecdac30e0c31950c61129fa787ee2661a42e9e" alt=""
On Wed, Dec 17, 2008 at 1:54 AM, wind world wrote:
I'm not an expert on this, but if you compile Boost.Regex with ICU support, it should fully support such languages, whereas plain wide-character matching may not. Someone else may need to come along and confirm...
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
wind world wrote:
You would need to check the Windows API docs to make sure you're using the API correctly (does it accept UTF-8 as the source? No idea on that), but yes: once you have the text encoded as UTF-16, wregex will behave as you expect.

Otherwise, you could build Boost.Regex with ICU support and then match UTF-8 directly. The downside is that you then have a dependency on ICU, which is not a small library.

HTH, John.
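To illustrate John's answer, here is a minimal portable sketch of the convert-then-match approach. On Windows, MultiByteToWideChar accepts CP_UTF8 as its source code page, which is the call the original question has in mind; so that the sketch builds on any platform, a small hand-written decoder (the hypothetical helper utf8_to_wide) stands in for that call, and std::wregex stands in for boost::wregex, whose interface is similar. search_utf8 is likewise a hypothetical name. As an assumed simplification, the decoder rejects malformed lead and continuation bytes but not overlong forms or encoded surrogates.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper standing in for MultiByteToWideChar(CP_UTF8, ...).
// Decodes UTF-8 into a std::wstring. Checks lead/continuation bytes only;
// overlong forms and encoded surrogates are NOT rejected (simplification).
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (in.size() - i < len) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += len;
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t (as on Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // BMP character, or 32-bit wchar_t: one code unit per code point.
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: convert once, then search with a wide regex.
// std::wregex is used in place of boost::wregex so no Boost is needed.
// On success, copies the matched text into *match (if match is non-null).
bool search_utf8(const std::string& utf8_text, const std::wregex& re, std::wstring* match) {
    const std::wstring wide = utf8_to_wide(utf8_text);
    std::wsmatch m;
    if (!std::regex_search(wide, m, re)) return false;
    if (match) *match = m.str();
    return true;
}
```

With this sketch, search_utf8(text, std::wregex(L"\u65E5\u672C"), &found) looks for 日本 in a UTF-8 std::string. Note that where wchar_t is 16 bits, characters outside the BMP become surrogate pairs, so a single `.` in a wide pattern matches only half of such a character.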
data:image/s3,"s3://crabby-images/3cf19/3cf19e7ea13ddb8eaa665bfb1926d7db04837541" alt=""
Working with wstrings in the regex library should work without problems,
except that you cannot rely on Unicode-specific character classes. Just make sure
you convert correctly between UTF-8 and wide-character strings.
Rune
On Thu, Dec 18, 2008 at 10:39 AM, John Maddock wrote:
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
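Rune's caveat about Unicode-specific character classes can be worked around by spelling out code-point ranges explicitly. The sketch below assumes that "kanji" is approximated by the CJK Unified Ideographs block U+4E00 to U+9FFF (extension blocks are ignored) and again uses std::wregex in place of boost::wregex; find_kanji_runs is a hypothetical helper. In Boost's Perl-style syntax the same set can be written [\x{4E00}-\x{9FFF}].

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect runs of kanji from a wide string using an
// explicit code-point range rather than a Unicode-aware character class.
// Assumption: kanji ~= CJK Unified Ideographs, U+4E00..U+9FFF. Extension
// blocks (and anything needing surrogates on 16-bit wchar_t) are not covered.
std::vector<std::wstring> find_kanji_runs(const std::wstring& text) {
    // The \u escapes are C++ universal character names, so the pattern
    // contains the literal wide characters U+4E00 and U+9FFF.
    static const std::wregex kanji_run(L"[\u4E00-\u9FFF]+");
    std::vector<std::wstring> runs;
    for (std::wsregex_iterator it(text.begin(), text.end(), kanji_run), end;
         it != end; ++it) {
        runs.push_back(it->str());
    }
    return runs;
}
```

For example, for 日本語を勉強 the sketch returns the two runs 日本語 and 勉強, because the hiragana を (U+3092) lies outside the range. With default flags (no regex::collate), the range test compares code units directly, so the result does not depend on how the current locale classifies ideographs.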
participants (4)
- John Maddock
- OvermindDL1
- Rune Lund Olesen
- wind world