On Sat, Oct 14, 2017 at 12:03 PM, Phil Endecott via Boost <boost@lists.boost.org> wrote:
> The issue of generator<T> providing only input iterators is the most significant issue I've spotted so far. This is in some way related to the whole ASIO "buffer sequence" thing; the code I posted before read into contiguous buffers, but that was lost before the downstream code saw it, so it couldn't hope to optimise with e.g. word-sized copies or compares.

Buffer sequences are not the problem; the problem is that parsed HTTP data types are heterogeneous. For example, the series of types generated when parsing a request looks like this:

1. std::pair<verb, string>: verb enum (if known) and method string
2. string: request-target string
3. integer: HTTP-version
4. vector<tuple<field, string, string>>: field name enum (if known), name, value
5. vector<string>: body data
   OR
5. vector<pair<string, string>>: body data plus chunk-extension

An interface which presents parsed data through a function return value (for example, an iterator's operator*) is only capable of yielding one type. The only way to keep the same control flow and still produce different types is to do two things: inform the caller of the type of the next incoming object, and then provide a set of functions from which the caller chooses the one whose return type matches the next value. You can see this in the Boost.Http parser calling code:

    do {
        request_reader.next();
        switch (request_reader.code()) {
        case code::skip:
            // do nothing
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
            last_header = request_reader.value<token::field_name>();
        }
    } while(request_reader.code() != code::end_of_message);

A viable alternative, which does not preserve the same structure of calling code, is to use a type of "visitor". The parser calls a user-defined function specific to the next anticipated token, whose argument list has the correct types. This is the approach used in Beast. The parser calls a particular member function of the derived class depending on what structured element was parsed. The arguments to the member function have the correct high-level types.
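
A minimal sketch of the visitor shape in isolation may help here. It is not Beast's code: `toy_parser`, `request_collector`, `on_request`, `on_field`, the two-value `verb` enum and the simplified "name:value" line format are all hypothetical, chosen only to show a parser that owns the loop and hands each element to a differently typed member function of the derived class.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enumeration of known methods.
enum class verb { unknown, get, post };

// Toy push parser (an assumption, not a real library class). It owns the
// control flow and calls one member function of Derived per parsed element.
template <class Derived>
class toy_parser {
public:
    // Expects "<method> <target>" on the first line, then "<name>:<value>" lines.
    void parse(std::string const& input) {
        Derived& self = static_cast<Derived&>(*this);
        std::istringstream in(input);
        std::string line;

        if (!std::getline(in, line))
            return;
        std::size_t const sp = line.find(' ');
        std::string const method = line.substr(0, sp);
        std::string const target =
            sp == std::string::npos ? std::string() : line.substr(sp + 1);
        // Request line: enum, string, string.
        self.on_request(to_verb(method), method, target);

        while (std::getline(in, line)) {
            std::size_t const colon = line.find(':');
            if (colon == std::string::npos)
                continue; // toy behaviour: skip malformed lines
            // Header field: string, string. A different argument list.
            self.on_field(line.substr(0, colon), line.substr(colon + 1));
        }
    }

private:
    static verb to_verb(std::string const& s) {
        if (s == "GET") return verb::get;
        if (s == "POST") return verb::post;
        return verb::unknown;
    }
};

// The derived class receives each element with its natural types.
struct request_collector : toy_parser<request_collector> {
    verb method = verb::unknown;
    std::string method_str;
    std::string target;
    std::vector<std::pair<std::string, std::string>> fields;

    void on_request(verb v, std::string const& m, std::string const& t) {
        method = v;
        method_str = m;
        target = t;
    }

    void on_field(std::string const& name, std::string const& value) {
        fields.emplace_back(name, value);
    }
};
```

Calling `request_collector c; c.parse("GET /index.html\nHost:example.com\n");` fills `c.method`, `c.target` and `c.fields` without the caller ever switching on a token code.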

For example, when Beast parses the request-line it invokes a member function with this signature in the derived class:

    /// Called after receiving the request-line (isRequest == true).
    void on_request_impl(
        verb method,            // The method verb, verb::unknown if no match
        string_view method_str, // The method as a string
        string_view target,     // The request-target
        int version,            // The HTTP-version
        error_code& ec);        // The error returned to the caller, if any

Note the rich variety of types:

`verb` is an enumeration of known HTTP methods:
<http://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__beast__http__verb.html>

`method_str` is the exact method string extracted by the parser. This is needed when the method does not match one of the method strings known to the library, indicated by the enumeration value `verb::unknown`.

`target` is a straightforward string, while `version` is conveyed as an integer.

Since the parser owns the control flow at the time the member function is called, the `ec` output parameter allows the callee to indicate that it wishes to break out of the parser's loop and return control to the calling function.

After the request-line comes zero or more calls to a member function with field name/value pairs. That member function signature looks like this:

    /// Called after receiving a header field.
    void on_field_impl(
        field f,           // The known-field enumeration constant
        string_view name,  // The field name string.
        string_view value, // The field value
        error_code& ec);   // The error returned to the caller, if any

Note how the collection of types presented for a header field is different from the request-line. Expressing this irregular stream of different types through an iterator interface is going to be very clumsy. Furthermore, there is metadata generated during the parse which is not easily reflected in an iterator interface.
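
To make the clumsiness concrete, here is a sketch of what the two callbacks above turn into when an iterator has to yield them as values. The names `request_line`, `header_field`, `event` and `describe` are hypothetical, std::string stands in for string_view, the enums are cut down to two values each, and std::variant (C++17) is used purely for illustration; neither library offers this interface.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical, cut-down enumerations.
enum class verb { unknown, get };
enum class field { unknown, host };

// One struct per callback, holding that callback's arguments.
struct request_line {
    verb method;
    std::string method_str;
    std::string target;
    int version;
};

struct header_field {
    field f;
    std::string name;
    std::string value;
};

// operator* can return only one type, so that type has to be able to hold
// every kind of element the parser might produce.
using event = std::variant<request_line, header_field>;

// Every consumer must dispatch on the runtime alternative of each element.
inline std::string describe(event const& e) {
    return std::visit(
        [](auto const& v) -> std::string {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, request_line>)
                return "request " + v.method_str + " " + v.target;
            else
                return "field " + v.name + "=" + v.value;
        },
        e);
}
```

A consumer iterating over a `std::vector<event>` would call something like `describe(*it)` for each element. The type-specific member functions reappear inside the `std::visit` lambda, so the iterator adds a layer of dispatch without removing the need for per-type handling.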

As an example of such metadata, after the HTTP headers have been parsed, Beast calculates the "keep-alive" semantic as well as the disposition of the Content-Length, which may be in one of three states: body-to-eof, chunked, or known. The keep-alive semantics are communicated to the caller of the parser through a member function `basic_parser::is_keep_alive`:
<http://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__beast__http__basic_parser/is_keep_alive.html>

I described in a previous post how Beast's parser exposes two interfaces. The public interface is consumed by stream algorithms (e.g. read_some, async_read_some) while the derived class interface is used to store structured HTTP elements. The function `is_keep_alive` is exposed through the public interface of the parser because it is primarily of interest to the stream algorithm, which concerns itself with the connection and whether or not it should be closed afterwards.

Meanwhile, the Content-Length disposition is exposed to the derived class, since it is a piece of metadata of interest to the algorithm which stores the body in the message container. It is communicated by the parser through a call to this derived class member:

    /// Called just before processing the body, if a body exists.
    void on_body_init_impl(
        boost::optional<
            std::uint64_t> const& content_length, // Content length if known, else `boost::none`
        error_code& ec);                          // The error returned to the caller, if any

There is so much type irregularity in the information presented during the parse that I feel an iterator-based approach would be, to use informal terms, "quite ugly."

Thanks