[boost.locale] Question about boundary rules

29 Mar 2012

I have just been exploring boost.locale, which I hadn't used before. However,
I don't quite understand some of the behaviour of the boundary rules when
segmenting text.

Fortunately, I can see this behaviour in the example code, so it's
probably my misunderstanding and easily corrected.

If I compile
http://www.boost.org/doc/libs/1_49_0/libs/locale/doc/html/boundary_8cpp-exam...

and run it, I get:

[...skipped to avoid long quote ]
Part [Linux2.6] has number(s)
Part [ ] has no word characters
Part [and] has letter(s)
Part [ ] has no word characters
Part [Windows7] has number(s) letter(s)
Part [ ] has no word characters
[...]

However, I don't understand why "Linux2.6" is detected as having number(s)
but no letters, whilst "Windows7" is detected as having both. The decimal
point doesn't appear to be the cause: "Linux26" shows the same behaviour
(whilst "Linux2" is detected as having both).

I haven't debugged this, just glanced at the code, which seems to set these
flags from all of the values returned by ICU's
RuleBasedBreakIterator::getRuleStatusVec().
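
To make the question concrete, here is a minimal sketch of how I understand the flags to be derived. The numeric ranges are the ones ICU documents for word-break status values (UWordBreak in ubrk.h). The flag names and values, and the helper functions, are my own illustrative stand-ins, not Boost.Locale's constants or code, and I haven't confirmed that the library works exactly this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative flags only; these are NOT Boost.Locale's actual constants.
enum WordFlag {
    kNone   = 1 << 0,
    kNumber = 1 << 1,
    kLetter = 1 << 2,
    kKana   = 1 << 3,
    kIdeo   = 1 << 4
};

// Map a single ICU word-break status value to one flag, using ICU's
// documented ranges: [0,100) none, [100,200) number, [200,300) letter,
// [300,400) kana, [400,500) ideographic.
unsigned flag_for_status(int status) {
    assert(status >= 0);  // ICU status values are non-negative
    if (status < 100) return kNone;
    if (status < 200) return kNumber;
    if (status < 300) return kLetter;
    if (status < 400) return kKana;
    if (status < 500) return kIdeo;
    return 0;  // outside the documented ranges
}

// OR together the flags for every status value reported at one boundary.
unsigned flags_for_statuses(const std::vector<int>& statuses) {
    unsigned mask = 0;
    for (std::size_t i = 0; i < statuses.size(); ++i)
        mask |= flag_for_status(statuses[i]);
    return mask;
}

// Produce text in the style of the example program's output.
std::string describe(unsigned mask) {
    if (mask & kNone) return "no word characters";
    std::string out;
    if (mask & kNumber) out += "number(s) ";
    if (mask & kLetter) out += "letter(s) ";
    if (mask & kKana)   out += "kana character(s) ";
    if (mask & kIdeo)   out += "ideographic character(s) ";
    if (!out.empty()) out.erase(out.size() - 1);  // drop trailing space
    return out;
}
```

Under this reading, a segment is reported with both number(s) and letter(s) only if the status vector for its boundary contains a value from each range. If only a number-range value came back for "Linux2.6", that would produce exactly the output I am seeing.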

I thought I would ask whether I'm misunderstanding something fundamental
before trying to work out what is going on and where the problem is (if
there is one).

TIA

Alex Perry

PS: In case this is a known platform/version issue, I was running this
on:
Windows7
MSVC 10
boost 1.49
icu 4.9.1
