On 07/10/17 19:57, Phil Endecott via Boost wrote:
Vinnie Falco wrote:
The utf-8 validation of text websocket message payloads is a critical bottleneck.
I thought I'd have a look at that. Your code has a fast-path that attempts to check 8 bytes at a time by masking with 0x8080808080808080. I agree that that is probably a useful optimisation which a compiler is unlikely to apply by itself.
I wouldn't be so quick to agree. Modern compilers tend to be able to vectorize simple loops quite efficiently.
Your implementation ( https://github.com/vinniefalco/Beast/blob/6f88f0182d461e9cabe55032f966c9ca4f... ) does this:
if((*reinterpret_cast<std::size_t const*>(in) & mask) != 0)
So you're reinterpret_casting a char* to a size_t* and then dereferencing it. Isn't that undefined behaviour?
C++ allows chars and unsigned chars (which std::uint8_t presumably is) to alias other types, so unless the compiler is able to prove that the pointers refer to an array of something else than std::size_t, it should not apply optimizations based on strict aliasing rules. The code above performs pointer alignment, which strictly speaking has unspecified behavior because it assumes pointer representation. It would be cleaner to use std::align for that. BTW, the fast loop ends on the first byte >=0x80 and then the validation continues in the slow loop until the end of input. It might be better to resume the fast loop if validation of the "vector" passes. Otherwise, for example, and early comment in the input could ruin the performance. (I don't know how probable such case would be.)