Re: [boost] GSoC Unicode library: second preview

22 Jun 2009

      Mathias Gaunard wrote:
...
> On another note, while I do think IF_LIKELY for UTF-16 is a good idea,
> doesn't that heavily penalize certain scripts, such as Asian ones, in
> the case of UTF-8?
Not really:

- In many cases, documents that use an exotic script actually contain 
large numbers of ASCII characters; consider an HTML page, for example, 
which will be full of HTML punctuation and tags.  (I believe that I 
became aware of this after reading something written by a Mozilla 
person who had been investigating Unicode issues.)

- The penalty of a wrong branch hint is not "heavy".  We probably have 
lots of places in our code where the compiler heuristic is wrong, but 
we don't notice until we study it very carefully (as I did with this 
UTF-8 code).  This is why processors still need to implement dynamic 
branch prediction.
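
To make the discussion concrete, here is a minimal sketch of an ASCII
fast path in a UTF-8 decoder with a branch hint.  The macro definition
and the function name decode_one are assumptions for illustration, not
the code from the library preview; the sketch assumes well-formed input
and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical definition: the IF_LIKELY macro in the code under
// discussion may be defined differently.  __builtin_expect is a
// GCC/Clang builtin; other compilers fall back to a plain if.
#if defined(__GNUC__)
#  define IF_LIKELY(cond) if (__builtin_expect(!!(cond), 1))
#else
#  define IF_LIKELY(cond) if (cond)
#endif

// Decode one code point from well-formed UTF-8 starting at s[i] and
// advance i past it.  Illustrative only: no validation is performed.
inline char32_t decode_one(const std::string& s, std::size_t& i)
{
    unsigned char b0 = static_cast<unsigned char>(s[i++]);

    // Hint that single-byte (ASCII) characters are the common case.
    IF_LIKELY (b0 < 0x80) {
        return b0;
    }

    // Multi-byte sequence: the lead byte gives the number of
    // continuation bytes and the top bits of the code point.
    int extra;
    char32_t cp;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else                          { extra = 3; cp = b0 & 0x07; }

    // Each continuation byte contributes its low six bits.
    while (extra-- > 0) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}
```

For text in a script that is mostly multi-byte the hint is wrong for
most characters, but the result is still correct; only the compiler's
static guess about which path to favour is affected.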

My normal policy for using compiler branch hints like IF_LIKELY is to 
compile once with profile-driven optimisation, and then to find the 
places where it made a significant difference and add branch hints.  I 
then get close to the profile-driven-optimised performance without 
needing to actually re-do the profiling.

Regards,  Phil.

Phil Endecott