
Vinnie Falco wrote:
> The reinterpret_cast<> can be trivially changed to std::memcpy: ...
> Yes, I believe that's the right thing to do.
> That hurts 32-bit ARM.
I think that's an issue with whatever compiler you're using, not the architecture; I've just done a quick test with arm-linux-gnueabihf-g++-6 6.3.0 and I get about a 5% speedup by using memcpy.
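Here is a minimal sketch of the idiom under discussion. It loads a word through std::memcpy instead of dereferencing a reinterpret_cast pointer. The helper name load_word is hypothetical. The cast version is undefined behaviour if the address is misaligned for unsigned long or if it breaks the aliasing rules. The memcpy version is always well defined, and optimising compilers typically turn a fixed-size memcpy into a plain load where the target allows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: read one unsigned long from an arbitrary byte address.
// std::memcpy avoids the aliasing and alignment undefined behaviour of
// *reinterpret_cast<const unsigned long*>(p); the compiler decides how to
// lower the fixed-size copy for the target.
inline unsigned long load_word(const char* p)
{
    unsigned long w;
    std::memcpy(&w, p, sizeof w);
    return w;
}
```

How cheap this is on a given target depends on how the compiler lowers the copy, which is exactly what the two data points above disagree about.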
> There's just an eensy teensy problem: the Beast validator is an "online" algorithm. It works with chunks of the entire input sequence at a time, sequentially, so there could be a code point that is split across the buffer boundary.
Yes, I did notice that, but it wasn't clear that it was actually being used.
I admit that a surprisingly large amount of code is required just to handle this case.
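Here is a concrete boundary case. U+00E9 (é) is encoded as the two bytes C3 A9. If "café" arrives as "caf\xC3" followed by "\xA9", the first chunk ends mid-character. The sketch below uses hypothetical helper names. It shows how many bytes the next chunk must still supply, found by looking back at most three bytes for a lead byte and reading the sequence length from its leading 1 bits. It only illustrates the split and is not the pending-byte scheme used in the code that follows.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: total length of a UTF-8 sequence from its lead byte,
// given by the number of leading 1 bits (0 -> 1 byte, 110 -> 2, 1110 -> 3, 11110 -> 4).
// Returns 0 for a byte that cannot start a sequence.
inline int sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xe0) == 0xc0) return 2;
    if ((lead & 0xf0) == 0xe0) return 3;
    if ((lead & 0xf8) == 0xf0) return 4;
    return 0;  // continuation byte 10xxxxxx or invalid 11111xxx
}

// Hypothetical helper: how many more bytes the next chunk must supply to finish
// the last character of this chunk (0 if the chunk ends on a character boundary).
// A sequence is at most 4 bytes, so only the last three bytes need inspecting.
inline std::size_t bytes_still_needed(const unsigned char* p, std::size_t n)
{
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        unsigned char c = p[n - back];
        if ((c & 0xc0) == 0x80) continue;  // continuation byte: keep looking back
        int len = sequence_length(c);
        return (len > static_cast<int>(back)) ? static_cast<std::size_t>(len) - back : 0;
    }
    return 0;  // complete (or malformed) tail; leave that to the validator
}
```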
The following code is totally untested.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

template <typename ITER>
bool is_valid_utf8(ITER i, ITER end, std::uint8_t& pending)
{
  // Check if range is valid and complete UTF-8.
  // pending is used to carry state about an incomplete multi-byte character
  // from one call to the next. It should be zero initially and is zero on return if
  // the input is not mid-character. After submitting the last chunk the caller
  // should check both the return value and pending==0.
  // Only the lead/continuation byte structure is checked: overlong forms,
  // surrogates and values above U+10FFFF are not rejected.

  // Skip bytes pending from last buffer.
  // The number of 1s at the most significant end of the first byte of a multi-byte
  // character indicates the total number of bytes in the character. pending is
  // this byte, shifted to allow for the number of bytes already seen.
  while (pending & 0x80) {
    if (i == end) return true;             // Still mid-character; keep pending for the next call.
    std::uint8_t b = *i++;
    if ((b & 0xc0) != 0x80) return false;  // Must be a 10xxxxxx continuation byte.
    pending = pending << 1;
  }
  pending = 0;  // Character complete (or none was pending).

  while (i != end) {
    // If i is suitably aligned, do a fast word-at-a-time check for ASCII characters.
    // FIXME this only works if ITER is a contiguous iterator over char; it needs a "static if".
    const char* p = &(*i);
    const char* e = p + (end - i);  // I don't think &(*end) is allowed because it appears to dereference end.
    unsigned long int w;  // Should be 32 bits on 32-bit processor and 64 bits on 64-bit processor
                          // (but it is 32 bits on 64-bit Windows).
    if (reinterpret_cast<std::uintptr_t>(p) % sizeof(w) == 0) {
      while (static_cast<std::size_t>(e - p) >= sizeof(w)) {  // Avoids forming p+sizeof(w) past e.
        std::memcpy(&w, p, sizeof(w));
        if (w & 0x8080808080808080) break;  // If any of the top bits are set, fall back to the
                                            // byte-at-a-time code below.
        // (Is there a better way to write the mask value that would work
        // for e.g. 128-bit ints? Is that expression OK for 32-bit ints?)
        p += sizeof(w);
        i += sizeof(w);
      }
      if (p == e) break;
    }
    std::uint8_t b0 = *i++;
    if ((b0 & 0x80) == 0) continue;         // Single byte chars are 0xxxxxxx
    if ((b0 & 0xc0) == 0x80) return false;  // 10xxxxxx not allowed as first byte of character
    if ((b0 & 0xf8) == 0xf8) return false;  // 11111xxx is not valid
    // At this point, we know b0 is a valid first byte.
    if (i == end) {       // Incomplete input
      pending = b0 << 1;  // 1 byte seen so far, rest are pending.
      return true;
    }
    std::uint8_t b1 = *i++;
    if ((b1 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xe0) == 0xc0) continue;      // Two-byte chars start 110xxxxx
    if (i == end) {       // Incomplete input
      pending = b0 << 2;  // 2 bytes seen so far, rest are pending
      return true;
    }
    std::uint8_t b2 = *i++;
    if ((b2 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xf0) == 0xe0) continue;      // Three-byte chars start 1110xxxx
    if (i == end) {       // Incomplete input
      pending = b0 << 3;  // 3 bytes seen so far, rest are pending
      return true;
    }
    std::uint8_t b3 = *i++;
    if ((b3 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xf8) == 0xf0) continue;      // Four-byte chars start 11110xxx
    return false;  // Not reached, I think.
  }
  return true;
}
```
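On the question in the code comment about writing the mask for other word sizes, one common idiom divides the all-ones value by 0xff. That gives 0x01 in every byte, and multiplying by 0x80 puts the top bit in every byte. Here is a sketch that assumes 8-bit bytes (CHAR_BIT == 8). The names high_bit_mask and has_non_ascii are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: a value of unsigned type T with 0x80 in every byte,
// e.g. 0x80808080 for a 32-bit T and 0x8080808080808080 for a 64-bit T.
// The inner cast keeps ~T(0) at T's width even when T would be promoted to int.
template <typename T>
constexpr T high_bit_mask()
{
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    return static_cast<T>(static_cast<T>(~T(0)) / 0xff * 0x80);
}

// Hypothetical helper: true if any byte of w has its top bit set (i.e. not all ASCII).
template <typename T>
constexpr bool has_non_ascii(T w)
{
    return (w & high_bit_mask<T>()) != 0;
}
```

With this, the test in the loop could read `if (has_non_ascii(w)) break;`, and the same expression adapts to whatever unsigned type `w` is declared as.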