
Cory Nelson wrote:
I finally found some time to do some optimizations of my own and have had some good progress using a small lookup table, a switch, and slightly reducing branches. See line 318:
http://svn.int64.org/viewvc/int64/snips/unicode.hpp?view=markup
Despite these efforts, Windows 7 still decodes UTF-8 roughly three times faster (~750MiB/s vs ~240MiB/s on my Core 2). I assume they are either using some gigantic lookup tables or SSE.
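
The linked code is not reproduced here, but a generic sketch of the lookup-table-plus-switch technique Cory describes might look like the following; the table values and function name are illustrative and are not taken from unicode.hpp:

    #include <cstdint>
    #include <cstddef>

    // Illustrative only: classify the sequence length from the lead byte's
    // top five bits, then switch on that length to gather continuation
    // bytes.  Validation of continuation bytes and overlong forms is omitted.
    static const unsigned char utf8_len[32] = {
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  // 0xxxxxxx : ASCII
        0,0,0,0,0,0,0,0,                  // 10xxxxxx : continuation, invalid as lead
        2,2,2,2,                          // 110xxxxx : two-byte sequence
        3,3,                              // 1110xxxx : three-byte sequence
        4,                                // 11110xxx : four-byte sequence
        0                                 // 11111xxx : invalid
    };

    // Decodes one code point from [p, end); returns bytes consumed, 0 on error.
    inline std::size_t decode_one(const unsigned char* p, const unsigned char* end,
                                  std::uint32_t& out)
    {
        std::size_t len = utf8_len[p[0] >> 3];
        if (len == 0 || static_cast<std::size_t>(end - p) < len) return 0;
        switch (len) {
        case 1: out = p[0]; break;
        case 2: out = ((p[0] & 0x1F) << 6)  |  (p[1] & 0x3F); break;
        case 3: out = ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F); break;
        case 4: out = ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
                    | ((p[2] & 0x3F) << 6)  |  (p[3] & 0x3F); break;
        }
        return len;
    }
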
Hi Cory,

What is your test input? When the input is largely ASCII, a worthwhile optimisation is to cast groups of 4 (or 8) characters to an int and AND it with 0x80808080; if the answer is zero, no further conversion is needed.

In general I'm unsure of the performance trade-offs of lookup tables compared to explicit bit-manipulation. Cache effects may be significant, and a benchmark will tend to warm up the cache better than a real application might.

I can't see how SSE could be applied to this problem, but it's not something I know much about.

I don't have much time to work on this right now, but if the algorithm plus test harness and test data were bundled up into something that I can just "make", I will try to compare it with my version.

Regards, Phil.
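
A minimal sketch of the ASCII fast path Phil suggests, assuming an unaligned 4-byte load via memcpy; the helper name is made up here, and handling of the unaligned head and the trailing bytes is left to the surrounding converter:

    #include <cstdint>
    #include <cstring>

    // Read 4 bytes at a time and test the high bit of every byte with a
    // single mask.  memcpy avoids alignment problems on strict platforms.
    inline bool next4_are_ascii(const char* p)
    {
        std::uint32_t word;
        std::memcpy(&word, p, sizeof word);   // unaligned-safe load
        return (word & 0x80808080u) == 0;     // no byte has its top bit set
    }

    // Example use inside a decode loop (src/end/dst are whatever the
    // surrounding converter provides):
    //
    //   while (src + 4 <= end && next4_are_ascii(src)) {
    //       dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
    //       src += 4; dst += 4;
    //   }

The same idea extends to 8-byte groups with a std::uint64_t and the mask 0x8080808080808080.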