
Cory Nelson wrote:
> I finally found some time to do some optimizations of my own and have had some good progress using a small lookup table, a switch, and slightly reduced branching. See line 318:
> http://svn.int64.org/viewvc/int64/snips/unicode.hpp?view=markup
> Despite these efforts, Windows 7 still decodes UTF-8 three times faster (~750MiB/s vs ~240MiB/s on my Core 2). I assume they are either using some gigantic lookup tables or SSE.
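(A minimal sketch of the general lookup-table-plus-switch technique described above; this is not the actual code in unicode.hpp, and validation of continuation bytes, over-long forms, surrogates, and out-of-range values is omitted.)

#include <cstdint>
#include <cstddef>

// Sequence length keyed on the lead byte's top five bits; 0 marks bytes
// that cannot start a sequence (continuation bytes and invalid leads).
static const std::uint8_t seq_len[32] = {
    1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,  // 0xxxxxxx: ASCII
    0,0,0,0, 0,0,0,0,                  // 10xxxxxx: continuation byte
    2,2,2,2,                           // 110xxxxx: two-byte sequence
    3,3,                               // 1110xxxx: three-byte sequence
    4,                                 // 11110xxx: four-byte sequence
    0                                  // 11111xxx: invalid
};

// Payload mask for the lead byte, indexed by sequence length.
static const std::uint8_t lead_mask[5] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };

// Decodes one code point and advances p; returns 0xFFFFFFFF on a bad or
// truncated sequence (without advancing).
inline std::uint32_t decode_one(const unsigned char*& p, const unsigned char* last)
{
    const unsigned char lead = *p;
    const std::uint8_t len = seq_len[lead >> 3];
    if (len == 0 || last - p < len)
        return 0xFFFFFFFFu;

    std::uint32_t cp = lead & lead_mask[len];
    const unsigned char* q = p + 1;
    switch (len) {  // fallthrough accumulates the continuation bytes
    case 4: cp = (cp << 6) | (*q++ & 0x3Fu);  // fall through
    case 3: cp = (cp << 6) | (*q++ & 0x3Fu);  // fall through
    case 2: cp = (cp << 6) | (*q++ & 0x3Fu);  // fall through
    case 1: break;
    }
    p = q;
    return cp;
}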
How much cost are you incurring in the tests for whether the traits indicate that the error returns are valid? I'm wondering if there is a case for requiring that these be compile-time constants in the Traits class rather than flags in a Traits value. And why is 'last' passed in to decode_unsafe? Is there any indication that Duff's device will prevent aggressive inlining? I'm assuming you need this method to be fully inlined into the outer loop, and maybe it's not happening; ideally you'd want some loop unrolling too. I suspect that, as noted, the lack of a special case for largely 7-bit ASCII input will tend to make it slow on most Western texts, though speedups for the multi-character case will need care on alignment-sensitive hardware: you'll need to fix that in the outermost loop.
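(A minimal sketch of the kind of ASCII fast path being suggested, with alignment handled in the outermost loop; this is not the library's actual interface, it reuses the hypothetical decode_one helper from the sketch above, and it writes through a raw pointer purely for simplicity.)

#include <cstdint>
#include <cstddef>
#include <cstring>

inline bool is_aligned(const void* p, std::size_t a)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (a - 1)) == 0;
}

void decode_run(const unsigned char* first, const unsigned char* last,
                std::uint32_t* out)
{
    const std::size_t W = sizeof(std::size_t);
    // 0x80 repeated in every byte of a word, used to spot non-ASCII bytes.
    const std::size_t high_bits = static_cast<std::size_t>(~0ull / 0xFF) * 0x80;

    while (first < last) {
        // Outermost loop handles alignment: copy ASCII bytes one at a time
        // until the pointer is word-aligned or a lead byte >= 0x80 appears.
        while (first < last && !is_aligned(first, W)) {
            if (*first & 0x80u)
                break;
            *out++ = *first++;
        }
        // Fast path: word-at-a-time scan while every byte is 7-bit ASCII.
        while (is_aligned(first, W) && static_cast<std::size_t>(last - first) >= W) {
            std::size_t chunk;
            std::memcpy(&chunk, first, W);
            if (chunk & high_bits)
                break;  // a non-ASCII byte somewhere in this word
            for (std::size_t i = 0; i < W; ++i)
                *out++ = first[i];
            first += W;
        }
        // General decoder for whatever stopped the fast path.
        if (first < last) {
            const std::uint32_t cp = decode_one(first, last);
            if (cp == 0xFFFFFFFFu) {
                *out++ = 0xFFFDu;  // replacement character for a bad sequence
                ++first;           // skip the offending byte to keep moving
            } else {
                *out++ = cp;
            }
        }
    }
}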