RE: [Boost-users] find Japanese character with boost regex++
data:image/s3,"s3://crabby-images/32cd1/32cd19442ccf0cb8ec33f8d94474fd1611c8b1de" alt=""
John Maddock wrote:
It might be best to add a facility for defining new character classes as a list of characters and ranges to include, something like:
register_character_class("myname", "d-f");
Then we could add all the Unicode block ranges as standard classes for wide-character regexes.
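A minimal C++ sketch of what such a facility might look like, assuming a hypothetical registry: register_character_class (taking wide strings here) and is_in_class are illustrative names, not part of Boost.Regex. The definition string is parsed into single characters and a-b ranges, and membership is a linear scan over those ranges.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry (NOT part of Boost.Regex): maps a class name to the
// inclusive character ranges parsed from a definition such as L"d-f".
using Range = std::pair<wchar_t, wchar_t>;

std::map<std::wstring, std::vector<Range>>& class_registry() {
    static std::map<std::wstring, std::vector<Range>> registry;
    return registry;
}

// Parse a definition made of single characters and "a-b" ranges.
// A '-' at the start or end of the definition is taken literally.
void register_character_class(const std::wstring& name, const std::wstring& def) {
    std::vector<Range> ranges;
    for (std::size_t i = 0; i < def.size(); ++i) {
        if (i + 2 < def.size() && def[i + 1] == L'-') {
            ranges.emplace_back(def[i], def[i + 2]);
            i += 2;  // skip the '-' and the upper bound of the range
        } else {
            ranges.emplace_back(def[i], def[i]);
        }
    }
    class_registry()[name] = ranges;
}

// True if c falls inside any range of the named class; unknown names match nothing.
bool is_in_class(const std::wstring& name, wchar_t c) {
    auto it = class_registry().find(name);
    if (it == class_registry().end()) return false;
    for (const Range& r : it->second)
        if (r.first <= c && c <= r.second) return true;
    return false;
}
```

For the original question, a class covering the main CJK Unified Ideographs block could be registered as register_character_class(L"han", L"\u4E00-\u9FFF"). That block (U+4E00 to U+9FFF) does not include the extension blocks, nor the kana used alongside kanji in Japanese text.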
Aside from the unified Han characters (kanji/hanzi), characters of the same category generally aren't neatly grouped together in Unicode. The blocks (which vary in size, though many are 128 characters long) tend to correspond to locales, communities of use, or specific legacy encodings rather than to categories. You need a lookup table (or, more space-efficiently, a two-level table) to check character categories.
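A sketch of the two-level table, assuming a made-up three-value Category enum and a dense per-code-point input array (real data would be generated from the Unicode Character Database). TwoStageTable is an illustrative name. The high bits of a code point select a first-stage entry, which names a 256-entry block in the second stage; identical blocks are stored only once, which is where the space saving comes from.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative category codes only; real tables would use the full set of
// general categories from the Unicode Character Database.
enum Category : std::uint8_t { Other = 0, Letter = 1, Digit = 2 };

// Two-stage table: stage1_ maps the high bits of a code point to a block
// number; blocks_ stores the distinct 256-entry blocks back to back.
class TwoStageTable {
public:
    static constexpr unsigned kShift = 8;
    static constexpr unsigned kBlockSize = 1u << kShift;

    // Build from a dense array holding one category per code point.
    explicit TwoStageTable(const std::vector<Category>& dense) {
        std::map<std::vector<Category>, std::uint16_t> seen;  // block -> block number
        for (std::size_t start = 0; start < dense.size(); start += kBlockSize) {
            std::vector<Category> block(kBlockSize, Other);
            for (std::size_t i = 0; i < kBlockSize && start + i < dense.size(); ++i)
                block[i] = dense[start + i];
            auto it = seen.find(block);
            if (it == seen.end()) {  // first time this block content appears
                auto id = static_cast<std::uint16_t>(blocks_.size() / kBlockSize);
                blocks_.insert(blocks_.end(), block.begin(), block.end());
                it = seen.emplace(block, id).first;
            }
            stage1_.push_back(it->second);
        }
    }

    // Two array reads: stage one picks the block, the low bits pick the entry.
    Category lookup(std::uint32_t cp) const {
        std::size_t hi = cp >> kShift;
        if (hi >= stage1_.size()) return Other;
        return blocks_[stage1_[hi] * kBlockSize + (cp & (kBlockSize - 1))];
    }

    std::size_t distinct_blocks() const { return blocks_.size() / kBlockSize; }

private:
    std::vector<std::uint16_t> stage1_;
    std::vector<Category> blocks_;
};
```

Because long runs of code points (unassigned areas, the Han range) share a single category, most first-stage entries point at a handful of shared blocks, while a lookup still costs only two array reads.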
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
Aside from the unified Han characters (kanji/hanzi), characters of the same category generally aren't neatly grouped together in Unicode. The 128-character blocks tend to correspond to locales, communities of use or specific legacy encodings and not to categories. You need a look-up table (or more efficiently two levels of table) to check character categories.
I was talking about the Unicode block ranges (defined in Blocks.txt on the Unicode FTP site), and you are correct that a two-stage table is required for general categories.
John
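Blocks.txt lists one block per line in the form start..end; name, with # starting a comment. A minimal parser under that assumption (parse_blocks and block_of are illustrative names, not an existing API) could look like this; its output could then feed range-based class definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// One Blocks.txt entry: an inclusive code-point range and its block name.
struct UnicodeBlock {
    std::uint32_t first;
    std::uint32_t last;
    std::string name;
};

// Parse lines such as "0000..007F; Basic Latin". Text after '#' is a
// comment; lines without the "start..end; name" shape are skipped.
std::vector<UnicodeBlock> parse_blocks(std::istream& in) {
    std::vector<UnicodeBlock> blocks;
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // drop comments
        std::size_t dots = line.find("..");
        std::size_t semi = line.find(';');
        if (dots == std::string::npos || semi == std::string::npos || semi < dots)
            continue;
        UnicodeBlock b;
        try {
            b.first = static_cast<std::uint32_t>(std::stoul(line.substr(0, dots), nullptr, 16));
            b.last = static_cast<std::uint32_t>(
                std::stoul(line.substr(dots + 2, semi - dots - 2), nullptr, 16));
        } catch (const std::exception&) {
            continue;  // not valid hex: skip the line
        }
        // Trim surrounding whitespace (including a stray '\r') from the name.
        std::string name = line.substr(semi + 1);
        std::size_t begin = name.find_first_not_of(" \t\r");
        std::size_t end = name.find_last_not_of(" \t\r");
        b.name = (begin == std::string::npos) ? "" : name.substr(begin, end - begin + 1);
        blocks.push_back(b);
    }
    return blocks;
}

// Name of the block containing cp, or an empty string if none does.
std::string block_of(const std::vector<UnicodeBlock>& blocks, std::uint32_t cp) {
    for (const UnicodeBlock& b : blocks)
        if (b.first <= cp && cp <= b.last) return b.name;
    return "";
}
```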
participants (2): Ben Hutchings, John Maddock