Vinnie Falco wrote:
The reinterpret_cast<> can be trivially changed to std::memcpy: ... Yes, I believe that's the right thing to do.
That hurts 32-bit ARM.
I think that's an issue with whatever compiler you're using, not the architecture; I've just done a quick test with arm-linux-gnueabihf-g++-6 6.3.0 and I get about a 5% speedup by using memcpy.
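For reference, here is roughly what I mean by the memcpy version of the word load. This is only a sketch: the helper names word_is_ascii and ascii_prefix_length are made up for illustration and are not taken from Beast, and I'm assuming unsigned long is the natural word size. A decent compiler should turn the fixed-size memcpy into a single load where the target allows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Beast code): true if the first
// sizeof(unsigned long) bytes at p all have the high bit clear.
// The caller must guarantee that many readable bytes.
inline bool word_is_ascii(const char* p)
{
    assert(p != nullptr);
    unsigned long w;
    std::memcpy(&w, p, sizeof w);  // Well-defined for any alignment of p.
    // ~0UL / 0xFF is 0x01 repeated in every byte, so this is 0x80 repeated.
    const unsigned long high_bits = (~0UL / 0xFF) * 0x80;
    return (w & high_bits) == 0;
}

// Hypothetical helper: number of leading ASCII bytes in [p, e).
// Scans a word at a time, then finishes byte by byte.
inline std::size_t ascii_prefix_length(const char* p, const char* e)
{
    const char* const start = p;
    while (static_cast<std::size_t>(e - p) >= sizeof(unsigned long)
           && word_is_ascii(p))
        p += sizeof(unsigned long);
    while (p != e && (static_cast<unsigned char>(*p) & 0x80) == 0)
        ++p;
    return static_cast<std::size_t>(p - start);
}
```

Unlike the reinterpret_cast version, this needs no alignment check before the load, so the word loop can start at the first byte.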
There's just an eensy teensy problem: the Beast validator is an "online" algorithm. It works on one chunk of the input sequence at a time, sequentially, so a code point can be split across a buffer boundary.
Yes, I did notice that but it wasn't clear that it was actually being used.
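To make the split-code-point case concrete, here is a minimal sketch of a chunked checker. Utf8ChunkChecker is a hypothetical name, not Beast's validator, and it only checks structure (lead byte followed by the right number of continuation bytes); it does not reject overlong encodings, surrogates, or values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structural checker (not Beast code). It carries the number
// of continuation bytes still expected from one chunk to the next.
struct Utf8ChunkChecker {
    std::uint8_t remaining = 0;  // Continuation bytes still owed.

    // Feed one chunk; returns false as soon as the structure is violated.
    bool feed(const char* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        for (std::size_t n = 0; n < size; ++n) {
            const std::uint8_t b = static_cast<std::uint8_t>(data[n]);
            if (remaining != 0) {
                if ((b & 0xC0) != 0x80) return false;  // Need 10xxxxxx.
                --remaining;
            } else if ((b & 0x80) == 0x00) {
                // ASCII, nothing to track.
            } else if ((b & 0xE0) == 0xC0) {
                remaining = 1;  // 110xxxxx: 2-byte character.
            } else if ((b & 0xF0) == 0xE0) {
                remaining = 2;  // 1110xxxx: 3-byte character.
            } else if ((b & 0xF8) == 0xF0) {
                remaining = 3;  // 11110xxx: 4-byte character.
            } else {
                return false;   // Stray continuation or invalid lead byte.
            }
        }
        return true;
    }

    // Call after the last chunk: true if the input did not end mid-character.
    bool complete() const { return remaining == 0; }
};
```

For example, U+20AC is E2 82 AC. Feeding "a\xe2" returns true but leaves complete() false; feeding the remaining "\x82\xac" then returns true and complete() becomes true.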
I admit that there is a surprisingly large amount of code required just to handle this case.
The following code is totally untested.
template <typename ITER>
bool is_valid_utf8(ITER i, ITER end, uint8_t& pending)
{
  // Check if range is valid and complete UTF-8.
  // pending is used to carry state about an incomplete multi-byte character
  // from one call to the next. It should be zero initially and is zero on return if
  // the input is not mid-character. After submitting the last chunk the caller
  // should check both the return value and pending==0.

  // Skip bytes pending from last buffer.
  // The number of 1s at the most significant end of the first byte of a multi-byte
  // character indicates the total number of bytes in the character. pending is
  // this byte, shifted to allow for the number of bytes already seen.
  while (pending & 0x80) {
    // Test for end of input before dereferencing, so that an empty chunk is safe
    // and pending is only left non-zero when we really are mid-character.
    if (i == end) return true;
    uint8_t b = *i++;
    pending = pending<<1;
    if ((b & 0xc0) != 0x80) return false;  // Must be a 10xxxxxx continuation byte.
  }
  pending = 0;

  while (i != end) {
    // If i is suitably aligned, do a fast word-at-a-time check for ASCII characters.
    // FIXME this only works if ITER is a contiguous iterator; it needs a "static if".
    const char* p = &(*i);
    const char* e = p + (end-i);  // I don't think &(*end) is allowed because it appears to dereference end.
    unsigned long int w;  // Should be 32 bits on 32-bit processor and 64 bits on 64-bit processor.
    if (reinterpret_cast