On 26/06/2017 16:46, Vinnie Falco via Boost wrote:
On Mon, Jun 26, 2017 at 7:47 AM, Niall Douglas via Boost wrote:
If you have a severe algorithmic flaw in your implementation, reviewers would be right to reject your library.
If you are stating that Beast has a "severe algorithmic flaw" then please back up your claims with more than opinion.
I did not state that. I did indicate surprise at the choice to use memcpy over other available techniques.
However, note the following:
* At the time Boost.Http was reviewed it used the NodeJS parser which operates in chunks [1]. No "severe algorithmic flaw" came up then.
I think I was the only person to recommend acceptance in that review? I remember people had problems with the API design and intent. So people may not have bothered thinking about implementation too much if the API design was a fail.
* PicoHTTPParser, which Beast's parser is based on, outperforms NodeJS by over 600% [2]
If used as its author intended. I didn't get the impression the author expected you to memcpy its input.
* For parsers operating on discontiguous buffers, structured elements such as request-target, field names, and field values must be flattened (linearized) to be presented to the next layer, which means temporary storage and buffer copying [3, 4].
Why? LLVM doesn't do this. So why should Beast?
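A parser can walk a discontiguous buffer sequence directly and never linearise the structured elements at all. Something like this rough sketch (not LLVM's nor Beast's actual code; the buffer type and names are invented purely to illustrate):

    // Sketch only: match a field name that may straddle chunk boundaries
    // without first copying it into contiguous storage.
    #include <cctype>
    #include <cstddef>
    #include <string_view>
    #include <vector>

    // One element per non-contiguous chunk of received input.
    using buffer_sequence = std::vector<std::string_view>;

    // Case-insensitive comparison against a needle: no flattening, no copies.
    bool iequals_across_buffers(buffer_sequence const& bufs, std::string_view needle)
    {
        std::size_t n = 0;  // position within the needle
        for (auto chunk : bufs)
            for (char c : chunk)
            {
                if (n == needle.size())
                    return false;  // input is longer than the needle
                if (std::tolower(static_cast<unsigned char>(c)) !=
                    std::tolower(static_cast<unsigned char>(needle[n++])))
                    return false;
            }
        return n == needle.size();
    }

    // e.g. "Content-Length" split across two receive buffers:
    //   iequals_across_buffers({"Content-Le", "ngth"}, "content-length") == true

Only elements that genuinely must outlive the buffers need copying out.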
So buffer copies cannot be avoided. Beast makes the decision to do one big buffer copy up front instead of many small buffer copies as it goes. The evidence shows this tradeoff is advantageous.
It may be advantageous relative to the other things you've tested against, but that doesn't make it the right tradeoff in absolute terms.

For example, another zero-copy, DMA-friendly trick is to do all socket i/o via a file so the kernel can DMA directly to/from the kernel file cache. From your perspective, you simply map a view onto that kernel file cache into your process, and indeed you can get your contiguous span of chars this way without using memcpy by doing a rolling mmap() of the span you need to see (a rough sketch follows at the end of this mail).

However, there are costs to that approach. TLB shootdown can be a real problem, though less of one than is often assumed. Your TCP window size needs to be big and your NIC not a toy NIC for it to be worthwhile; most consumer PCs are fitted with toy NICs, so any benchmarking done on those won't be realistic. You also need to bypass ASIO and go direct, because ASIO's async scatter-gather buffer implementation is crap (https://github.com/chriskohlhoff/asio/issues/203).

Now it may be that reviewers feel the memory copy is not significant relative to the rest of Beast. I may feel that way too after I've studied the code; for example, I'd likely accept memcpy of the HTTP request and response headers but not of the message body. We'll see when the review begins.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
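PS: roughly the shape of the rolling mmap() window I have in mind, as an untested POSIX sketch (names invented, error handling omitted; the point is only that the contiguous span of chars comes from remapping, not from memcpy):

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <cstddef>

    struct mapped_window
    {
        void*       base = nullptr;   // start of the mapping (page aligned)
        std::size_t length = 0;       // length of the mapping
        const char* data = nullptr;   // first byte the caller actually asked for
    };

    // Map [offset, offset + span) of fd read-only, honouring mmap()'s
    // requirement that the file offset be page aligned.
    inline bool map_span(int fd, off_t offset, std::size_t span, mapped_window& w)
    {
        const long page = ::sysconf(_SC_PAGESIZE);
        const off_t aligned = offset & ~(static_cast<off_t>(page) - 1);
        const std::size_t slack = static_cast<std::size_t>(offset - aligned);

        w.length = span + slack;
        w.base = ::mmap(nullptr, w.length, PROT_READ, MAP_SHARED, fd, aligned);
        if (w.base == MAP_FAILED)
            return false;
        w.data = static_cast<const char*>(w.base) + slack;  // contiguous view, no copy
        return true;
    }

    // "Rolling" the window forward is just unmapping and remapping the next span.
    inline void unmap_span(mapped_window& w)
    {
        ::munmap(w.base, w.length);
        w = mapped_window{};
    }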