[boost] Boost.URL review

21 Aug 2022

      - Does this library bring real benefit to C++ developers for real
world use-case?

Yes.

- Do you have an application for this library?

No.

- Does the API match with current best practices?

See details below.

- Is the documentation helpful and clear?

In Overview:

- s/will be easy to read/is easy to read (no need to use the future tense)
- s/1.78/1.81 (can't use 1.78 if URL depends on TOT, which is stated
elsewhere in the review announcement)
- The bullet list is an awkward mix of verb- and noun-phrases (e.g.
"Require only C++11" and "Fast compilation, few templates").  I
suggest all noun-phrases, like "Requires only C++11").
- s/1.80/1.81 -- same reason as above.
- I think that flow in tutorial-style documentation is important.  In
the Overview, you introduce what the library is at a high level, then
jump into some build- and compiler-support weeds, only to then follow
with the Quick Look section.  IMO, you should move the bits after the
bullet list to a section called Compilers and Build Considerations or
similar, and place it much later in the docs.

In Containers:

The text at the top makes the relative/absolute distinction sound
important, but the rest of the page does not define or show examples
of the difference between the two.

I suggest single quotes around "/" in "The reason is any decoded
character / could make", since on first reading I thought it was part
of the documentation text, not in code-font.  Maybe do it like you do
for the mention of double slashes a few lines later?

When the ambiguity of the string
"https://www.example.com/get-customer.php?name=John%26Doe" is
mentioned, the docs say that if the query string represents
parameters, the quoted "&" is not ambiguous.  Then it shows an example
of getting all the parameters from the string
"https://www.example.com/get-customer.php?id=409&name=Joe&individual".
This begs the question, what happens when there's a "%26" in the
string, and you use .params()?  Please show this as part of the
example, or at least explain it.

Examples:

It would be nice if these were on separate pages.  Even nicer would be
some context, besides just a bunch of code.  I had to Google "magnet
link", and I still don't really know what it is.  A brief explanation
would be helpful.

API Docs:

The utl::params API docs refer to it as a random-access view, but it
is neither random-access, nor a view.

- Did you try to use it? What problems or surprises did you encounter?

I did not.

- What is your evaluation of the implementation?

I did not look too much at the implementation. but what I did look at
was somewhat hard to follow.  I went looking for where a result<> is
filled in during the parse, and it took me a solid 45 minutes to find
it.

Design:

Overall, I quite like the design.  Lazy, lazy, lazy.  The fact that
only views into the underlying sequence are returned from the parse is
great, and the fact that there are both owning (url::url and
fixed-capcity owning url::static_url) and view versions of the URL
representations is exactly what I would need as a user.

When I call parse_uri(), I get a result of type result<url_view>?  Why
isn't the function spelled parse_url(), or the return type spelled
result<uri_view>?  This seems needlessly inconsistent, given the
assertion in the docs that URL is fine to use in place of URI.

The assign_to() API seems awkward to me (as in std::string str;
u.query().assign_to(str);).  This would be better as a conversion
function with a name, like string(), to_string(), or as_string().
Assignment is fundamental, and the compiler knows what to do with it
(how to optimize it away, for one thing).  Assignment through an
out-param is awkward at best, and is actually more likely to lead to
more pessimal code, since RVO cannot be applied.

The append_to() API is really very nice.  Most libraries don't support
appending as a first-order operation, but it is essential for string
perf in many use patterns.

However, the containers are frankly a mess.  What are they, exactly,
at a high level of abstraction?  For instance, is params a
singly-linked list?  It has a forward iterator, so it kinda looks like
one.  Is it an associative container?  It has key/value pairs (where,
sure the value might be empty, but it still exists at the level of the
container), so it looks like one of those too.  Is it array-like?  The
keys are not sorted, and the order corresponds to the order in a URL,
so it sure looks array-like.  This is important to users who want to
use your containers in generic code, including the standard algorithms
(which often have a pessimal implementation specific to
forward_iteretors).  Also, your containers have about 10% of the API
of any of the kinds of containers listed above, meaning generic code
will have to special-case use of your container if it's missing some
of the SequenceContainer or AssociativeContainer API that the generic
code relies on.  I recommend that you implement params in terms of a
std::vector<std::pair<string_type, string_type>> (using whatever
string_type you're already using), and then add the API that is
URL-specific on top of it, and have two members:

std::vector<std::pair<string_type, string_type>> & storage();
std::vector<std::pair<string_type, string_type>> const & storage() const;

Then make the iterators RA.  Similar changes seem appropriate for the
other containers as well.

"Remove" is not really a term used in the STL.  I think your
"remove*()" container members should be "erase*()" instead.

The replace() API is nice.  I only wish it were range-capable (that
is, I wish there were an overload taking two iterators to erase, and
two iterators to insert).

Why did you define your own find_if() and find_if_not(), instead of
using the std ones?  And why the hell does CharSet have these two as
members?  That's crazy!

I find the creation of a mini-parsing library and the expectation that
users will use this mini-library to write their own parsers to be
somewhat problematic.  No one is going to use your mini-library for
serious parsing work (except perhaps for modifications to the way URL
does its parsing).  It does have proper error reporting, for one
thing.  There are other equally obvious limitations.  If the parsing
were an implementation detail, or if you were only providing the
parsing lib for extensions to your existing parsing, what you've
provided would be enough.  However, you also claim that you can use
URL as part of a larger parser, and I don't think that's realistic.

I vote to ACCEPT.  I think I've raised legitimate concerns about the
design here, but none of them rises to the level of a blocker.

Zach

[boost] Boost.URL review

Zach Laine