
- Does this library bring real benefit to C++ developers for real world use-case? Yes. - Do you have an application for this library? No. - Does the API match with current best practices? See details below. - Is the documentation helpful and clear? In Overview: - s/will be easy to read/is easy to read (no need to use the future tense) - s/1.78/1.81 (can't use 1.78 if URL depends on TOT, which is stated elsewhere in the review announcement) - The bullet list is an awkward mix of verb- and noun-phrases (e.g. "Require only C++11" and "Fast compilation, few templates"). I suggest all noun-phrases, like "Requires only C++11"). - s/1.80/1.81 -- same reason as above. - I think that flow in tutorial-style documentation is important. In the Overview, you introduce what the library is at a high level, then jump into some build- and compiler-support weeds, only to then follow with the Quick Look section. IMO, you should move the bits after the bullet list to a section called Compilers and Build Considerations or similar, and place it much later in the docs. In Containers: The text at the top makes the relative/absolute distinction sound important, but the rest of the page does not define or show examples of the difference between the two. I suggest single quotes around "/" in "The reason is any decoded character / could make", since on first reading I thought it was part of the documentation text, not in code-font. Maybe do it like you do for the mention of double slashes a few lines later? When the ambiguity of the string "https://www.example.com/get-customer.php?name=John%26Doe" is mentioned, the docs say that if the query string represents parameters, the quoted "&" is not ambiguous. Then it shows an example of getting all the parameters from the string "https://www.example.com/get-customer.php?id=409&name=Joe&individual". This begs the question, what happens when there's a "%26" in the string, and you use .params()? Please show this as part of the example, or at least explain it. Examples: It would be nice if these were on separate pages. Even nicer would be some context, besides just a bunch of code. I had to Google "magnet link", and I still don't really know what it is. A brief explanation would be helpful. API Docs: The utl::params API docs refer to it as a random-access view, but it is neither random-access, nor a view. - Did you try to use it? What problems or surprises did you encounter? I did not. - What is your evaluation of the implementation? I did not look too much at the implementation. but what I did look at was somewhat hard to follow. I went looking for where a result<> is filled in during the parse, and it took me a solid 45 minutes to find it. Design: Overall, I quite like the design. Lazy, lazy, lazy. The fact that only views into the underlying sequence are returned from the parse is great, and the fact that there are both owning (url::url and fixed-capcity owning url::static_url) and view versions of the URL representations is exactly what I would need as a user. When I call parse_uri(), I get a result of type result<url_view>? Why isn't the function spelled parse_url(), or the return type spelled result<uri_view>? This seems needlessly inconsistent, given the assertion in the docs that URL is fine to use in place of URI. The assign_to() API seems awkward to me (as in std::string str; u.query().assign_to(str);). This would be better as a conversion function with a name, like string(), to_string(), or as_string(). Assignment is fundamental, and the compiler knows what to do with it (how to optimize it away, for one thing). Assignment through an out-param is awkward at best, and is actually more likely to lead to more pessimal code, since RVO cannot be applied. The append_to() API is really very nice. Most libraries don't support appending as a first-order operation, but it is essential for string perf in many use patterns. However, the containers are frankly a mess. What are they, exactly, at a high level of abstraction? For instance, is params a singly-linked list? It has a forward iterator, so it kinda looks like one. Is it an associative container? It has key/value pairs (where, sure the value might be empty, but it still exists at the level of the container), so it looks like one of those too. Is it array-like? The keys are not sorted, and the order corresponds to the order in a URL, so it sure looks array-like. This is important to users who want to use your containers in generic code, including the standard algorithms (which often have a pessimal implementation specific to forward_iteretors). Also, your containers have about 10% of the API of any of the kinds of containers listed above, meaning generic code will have to special-case use of your container if it's missing some of the SequenceContainer or AssociativeContainer API that the generic code relies on. I recommend that you implement params in terms of a std::vector<std::pair<string_type, string_type>> (using whatever string_type you're already using), and then add the API that is URL-specific on top of it, and have two members: std::vector<std::pair<string_type, string_type>> & storage(); std::vector<std::pair<string_type, string_type>> const & storage() const; Then make the iterators RA. Similar changes seem appropriate for the other containers as well. "Remove" is not really a term used in the STL. I think your "remove*()" container members should be "erase*()" instead. The replace() API is nice. I only wish it were range-capable (that is, I wish there were an overload taking two iterators to erase, and two iterators to insert). Why did you define your own find_if() and find_if_not(), instead of using the std ones? And why the hell does CharSet have these two as members? That's crazy! I find the creation of a mini-parsing library and the expectation that users will use this mini-library to write their own parsers to be somewhat problematic. No one is going to use your mini-library for serious parsing work (except perhaps for modifications to the way URL does its parsing). It does have proper error reporting, for one thing. There are other equally obvious limitations. If the parsing were an implementation detail, or if you were only providing the parsing lib for extensions to your existing parsing, what you've provided would be enough. However, you also claim that you can use URL as part of a larger parser, and I don't think that's realistic. I vote to ACCEPT. I think I've raised legitimate concerns about the design here, but none of them rises to the level of a blocker. Zach

On Sun, Aug 21, 2022 at 12:24 PM Zach Laine via Boost <boost@lists.boost.org> wrote:
Yes.
Thank you for taking the time to review the library!
- Is the documentation helpful and clear? ...
Oh I very much like feedback about documentation, especially when it is actionable (i.e. offers specific advice instead of just "should be better"). I agree with everything you wrote. I do struggle with documentation, it seems it is never as good as I want it to be no matter how much time I put into it. We will copy this into an issue and get to work on it!
IMO, you should move the bits after the bullet list to a section called Compilers and Build Considerations or similar, and place it much later in the docs.
Well, the Overview page is designed to tell prospective users of the library everything they need to know to make a determination on whether or not the library will suit their needs. And it also serves double-duty as a promotional page. We want the library to look as good as possible - showing that we have taken the time to support all those compilers and build configurations seems to me to be a signal of quality. We can probably shuffle some things around and refine the grammar though.
The text at the top makes the relative/absolute distinction sound important, but the rest of the page does not define or show examples of the difference between the two.
Yes... the docs need work.
... Please show this as part of the example, or at least explain it. ...
Agree with everything there.
It would be nice if these were on separate pages. Even nicer would be some context, besides just a bunch of code. I had to Google "magnet link", and I still don't really know what it is. A brief explanation would be helpful.
Yes, we agree with everything. And we need many more small self-contained examples of doing common things.
The utl::params API docs refer to it as a random-access view, but it is neither random-access, nor a view.
This is what we in the biz call a "bug" in the documentation :) It is a mutable BidirectionalRange which references the original url's character buffer.
When I call parse_uri(), I get a result of type result<url_view>? Why isn't the function spelled parse_url(), or the return type spelled result<uri_view>? This seems needlessly inconsistent, given the assertion in the docs that URL is fine to use in place of URI.
Yes good question, and I think this is explained in the docs. When we refer to the general family of strings we call it "URL." Containers are all capable of holding the generic syntax (i.e. ALL valid productions of the grammar). thus containers are named url, url_view, and static_url. However, when we refer to specific grammars from rfc3986 we always use the exact name of the grammar. Names of parsing functions follow a consistent pattern: "parse_" followed by the grammar which they implement: // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] parse_uri(); // absolute-URI = scheme ":" hier-part [ "?" query ] parse_absolute_uri(); // relative-ref = relative-part [ "?" query ] [ "#" fragment ] parse_relative_ref(); // URI-reference = URI / relative-ref parse_uri_reference(); // origin-form = absolute-path [ "?" query ] parse_origin_form(); // rfc7230 However, they all return the same container, which is url_view, because url_view is capable of holding all of these things. We briefly explored baking different grammars into the type system (e.g. absolute_uri_view) but it becomes totally unergonomic with no corresponding benefit. We do have a separate type, authority_view - because an authority by itself does not satisfy the generic syntax. And of course the function is suitably named: // authority = [ userinfo "@" ] host [ ":" port ] result< authority_view > rv = parse_authority( s );
The assign_to() API seems awkward to me (as in std::string str; u.query().assign_to(str);). This would be better as a conversion function with a name, like string(), to_string(), or as_string().
I've resisted an implicit conversion to std::string but of course Ramey Rule kicks in and steers you elsewhere. We are very likely going to add it, because the API impedance is too high otherwise. So you will just be able to assign it to your string. And we can trivially add member to_string() or whatever name is in fashion these days.
Assignment is fundamental, and the compiler knows what to do with it (how to optimize it away, for one thing). Assignment through an out-param is awkward at best, and is actually more likely to lead to more pessimal code, since RVO cannot be applied. The append_to() API is really very nice. Most libraries don't support appending as a first-order operation, but it is essential for string perf in many use patterns.
Hey thanks for noticing this! Yes it was entirely our intent to create an API that is suitable for working with allocators, without having to specify Allocators if you know what I mean. assign_to and append_to are for reusing the capacity in existing strings. We have assign_to because the MutableString concept doesn't specify a way of clearing the string, and we want to be able to express algorithms without a precondition "the mutable string must have a zero size." I think the MutableString concept needs some more work though, we are open to a design discussion on how to improve it. The idea is to support algorithms that require temporary buffers to operate.
However, the containers are frankly a mess. What are they, exactly, at a high level of abstraction? For instance, is params a singly-linked list? It has a forward iterator, so it kinda looks like one.
It is a lazy flat_map without random access but with location-based insertion and erase. Segments is a lazy list<string>.
This is important to users who want to use your containers in generic code, including the standard algorithms (which often have a pessimal implementation specific to forward_iteretors).
Most users should be able to use these lazy containers as-is. For example they might simply iterate over the params, ignore the ones that they don't care about, and take action on the ones they want.
Also, your containers have about 10% of the API of any of the kinds of containers listed above, meaning generic code will have to special-case use of your container if it's missing some of the SequenceContainer or AssociativeContainer API that the generic code relies on.
The containers are designed to model as closely as possible standard containers, where it makes sense to do so. Because they are lazy and not node based, some concessions have to be made of course. The priority in all cases is to be lazy (avoid allocation). The escape hatch is that users who need the exact APIs offered by a standard container, are free to copy the lazy range into their standard container and then do everything they need to do. And then they should be able to assign the result back, since these lazy ranges support assignment of the entire range.
I recommend that you implement params in terms of a std::vector<std::pair<string_type, string_type>> (using whatever string_type you're already using), and then add the API that is URL-specific on top of it, and have two members:
Uhh... well... that's not lazy :) and it allocates memory. If users want this, they have to opt-in to it. I would be reluctant to design anything that doesn't give the user control over the allocator. But if you think about it, these lazy ranges give you that. Users can use standard algorithms like copy with a back_insert_iterator into any container they want, regardless of allocator.
std::vector<std::pair<string_type, string_type>> & storage(); std::vector<std::pair<string_type, string_type>> const & storage() const;
"Remove" is not really a term used in the STL. I think your "remove*()" container members should be "erase*()" instead.
I'm not sure where you're seeing this? There are no "remove" functions in these containers.
The replace() API is nice. I only wish it were range-capable (that is, I wish there were an overload taking two iterators to erase, and two iterators to insert).
That is something we could do, but there was no obvious use-case for removing ranges of query params.
Why did you define your own find_if() and find_if_not(), instead of using the std ones? And why the hell does CharSet have these two as members? That's crazy!
They are optional members. A particular charset might know how to implement find_if much better than a generic algorithm. The free function grammar::find_if will call the member find_if on the CharSet if it exists, otherwise it uses a standard algorithm. For example grammar::lut_chars::find_if uses an SSE2 implementation when available. And it can be further optimized but we just haven't bothered to do it yet.
I find the creation of a mini-parsing library and the expectation that users will use this mini-library to write their own parsers to be somewhat problematic. No one is going to use your mini-library for serious parsing work (except perhaps for modifications to the way URL does its parsing).
I agree that this is not going to be for most users but if even 1 or 2 users find utility (for example to further validate custom scheme-specific grammars) then I would be happy with it. I am using this extensively in downstream libraries, for "serious parsing work" so I disagree that it is not serious. This is a mini-library for now so downstream libraries can use it ergonomically, but once it gets some more practical usage in more varied settings (and becomes stable API-wise) I would rather propose it as a new library.
It does have proper error reporting, for one thing.
Did you mean that it "doesn't" have proper error reporting? Where do you see this lack of reporting (if that's what you meant)?
I vote to ACCEPT. I think I've raised legitimate concerns about the design here, but none of them rises to the level of a blocker.
Thanks!! Regards
participants (2)
-
Vinnie Falco
-
Zach Laine