On 26/06/2017 16:46, Vinnie Falco via Boost wrote:
On Mon, Jun 26, 2017 at 7:47 AM, Niall Douglas via Boost wrote:
If you have a severe algorithmic flaw in your implementation, reviewers would be right to reject your library.
If you are stating that Beast has a "severe algorithmic flaw" then please back up your claims with more than opinion.
I did not state that. I did indicate surprise at the choice to use memcpy over other available techniques.
However, note the following:
* At the time Boost.Http was reviewed it used the NodeJS parser which operates in chunks [1]. No "severe algorithmic flaw" came up then.
I think I was the only person to recommend acceptance in that review? I remember people had problems with the API design and intent. So people may not have bothered thinking about implementation too much if the API design was a fail.
* PicoHTTPParser, which Beast's parser is based on, outperforms NodeJS by over 600% [2]
If used as its author intended. I didn't get the impression the author expected you to memcpy its input.
* For parsers operating on discontiguous buffers, structured elements such as request-target, field names, and field values must be flattened (linearized) to be presented to the next layer, which means temporary storage and buffer copying [3, 4].
Why? LLVM doesn't do this. So why should Beast?
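A parser can walk a discontiguous buffer sequence directly and never linearise the structured elements at all. Something like this rough sketch (not LLVM's nor Beast's actual code; the buffer type and names are invented purely to illustrate):

    // Sketch only: match a field name that may straddle chunk boundaries
    // without first copying it into contiguous storage.
    #include <cctype>
    #include <cstddef>
    #include <string_view>
    #include <vector>

    // One element per non-contiguous chunk of received input.
    using buffer_sequence = std::vector<std::string_view>;

    // Case-insensitive comparison against a needle: no flattening, no copies.
    bool iequals_across_buffers(buffer_sequence const& bufs, std::string_view needle)
    {
        std::size_t n = 0;  // position within the needle
        for (auto chunk : bufs)
            for (char c : chunk)
            {
                if (n == needle.size())
                    return false;  // input is longer than the needle
                if (std::tolower(static_cast<unsigned char>(c)) !=
                    std::tolower(static_cast<unsigned char>(needle[n++])))
                    return false;
            }
        return n == needle.size();
    }

    // e.g. "Content-Length" split across two receive buffers:
    //   iequals_across_buffers({"Content-Le", "ngth"}, "content-length") == true

Only elements that genuinely must outlive the buffers need copying out.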
So buffer copies cannot be avoided. Beast makes the decision to do one big buffer copy up front instead of many small buffer copies as it goes. The evidence shows this tradeoff is advantageous.
It may be advantageous relative to the other things you've tested against, but that doesn't make it the right tradeoff in absolute terms.

For example, another zero-copy, DMA-friendly trick is to do all socket i/o via a file so the kernel can DMA directly to/from the kernel file cache. From your perspective, you simply map a view onto that kernel file cache into your process, and indeed you can get your contiguous span of chars this way without using memcpy by doing a rolling mmap() of the span you need to see (a rough sketch follows at the end of this mail).

However, there are costs to that approach. TLB shootdown can be a real problem, though less of one than is often assumed. Your TCP window size needs to be big and your NIC not a toy NIC for it to be worthwhile; most consumer PCs are fitted with toy NICs, so any benchmarking done on those won't be realistic. You also need to bypass ASIO and go direct, because ASIO's async scatter-gather buffer implementation is crap (https://github.com/chriskohlhoff/asio/issues/203).

Now it may be that reviewers feel the memory copy is not significant relative to the rest of Beast. I may feel that way too after I've studied the code; for example, I'd likely accept memcpy of the HTTP request and response headers but not of the message body. We'll see when the review begins.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
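PS: roughly the shape of the rolling mmap() window I have in mind, as an untested POSIX sketch (names invented, error handling omitted; the point is only that the contiguous span of chars comes from remapping, not from memcpy):

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <cstddef>

    struct mapped_window
    {
        void*       base = nullptr;   // start of the mapping (page aligned)
        std::size_t length = 0;       // length of the mapping
        const char* data = nullptr;   // first byte the caller actually asked for
    };

    // Map [offset, offset + span) of fd read-only, honouring mmap()'s
    // requirement that the file offset be page aligned.
    inline bool map_span(int fd, off_t offset, std::size_t span, mapped_window& w)
    {
        const long page = ::sysconf(_SC_PAGESIZE);
        const off_t aligned = offset & ~(static_cast<off_t>(page) - 1);
        const std::size_t slack = static_cast<std::size_t>(offset - aligned);

        w.length = span + slack;
        w.base = ::mmap(nullptr, w.length, PROT_READ, MAP_SHARED, fd, aligned);
        if (w.base == MAP_FAILED)
            return false;
        w.data = static_cast<const char*>(w.base) + slack;  // contiguous view, no copy
        return true;
    }

    // "Rolling" the window forward is just unmapping and remapping the next span.
    inline void unmap_span(mapped_window& w)
    {
        ::munmap(w.base, w.length);
        w = mapped_window{};
    }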