
My review of the proposed Boost.URL library follows. Unfortunately I have been away for most of the review period and I have not been able to spend as much time on this as I would have liked; in particular I have not tried to run any code. But I have looked at the source and documentation. In previous email discussions a few months ago I complained about the dangerous use of non-owning view types in the library interface, and I see that these issues are still present. Specifically, quoting the "Quick Look" docs: result<url_view> r = parse_uri( s ); url u = parse_uri( s ).value(); That's fine for C++03, but since C++11 we have auto, and this is where it becomes dangerous: auto r = parse_uri( s ); There's no indication here that r is a non-owning view and that the caller is responsible for keeping s alive for the lifetime of r. Such errors could be difficult to detect; the code may appear to work correctly much of the time if the storage for s is not re-used. auto u = parse_uri( s ).value(); That's even worse, because we're calling a method called value() but what it returns is not actually a value, it's still a view. A safer API design would, at a minimum, give the parsing function a name with "view" in it. The default, shorter-named function should return a value type. More fundamentally, I'm not convinced that this allocation-avoiding view-rich API style is best for a general purpose vocabulary type like URL. URL should be a "Keep It Simple" type, accessible to C++ beginners, with straightforward lifetime semantics. Compare with std::filesystem::path, which does not try to optimise by actually being a view. Because of these concerns about the basic API design, I believe that the library should be REJECTED. Following are some comments about other more minor issues that I have noticed: I see that there is a method called persist(): std::shared_ptr< url_view const > persist() const; This function returns a read-only copy of the URL, with shared lifetime. The returned value owns (persists) the underlying string. The algorithm used to create the value minimizes the number of individual memory allocations, making it more efficient than when using direct standard library functions. Hmm, so url_view doesn't own the underlying storage, except when it does as a result of this mysterious method. What's the rationale for this? (Have I missed some documentation?) What is sizeof(url) or sizeof(url_view)? I think it must be about 160 bytes, since you store offsets to each of the components. I'm not convinced that this is the best approach. I would be inclined to determine the offsets each time they are needed by the accessor methods. In the constructor, parse to validate the syntax but don't store any offsets. I believe that in the case of the path segments and the query parameters, you are not storing offsets (because the number is variable, and that would need allocations); why not adopt that approach for the other offsets? Making the objects smaller would probably have measurable performance benefits. In your non-view url object, you seem to allocate storage using new char[]. Why not use a std::string? One benefit of a std::string is SBO. Another is that you could provide a move constructor from a string. I find the names of the encoding/decoding views confusing. I would expect a "decoded_view" to refer to underlying encoded data and return equivalent decoded data when it is read from, but you seen to call that an "encoded_view". In your query parameter API, you expose something that looks like a standard associative container except that you call the key and value "key" and "value" respectively, rather than "first" and "second". While I think many of us would agree that those are better names, it does make it more difficult to use this container with generic code than it would be otherwise. I see that you also have a bool called "has_value" in the value_type. I think this is to handle this: ?a=1 // has_value == true && value == "1" ?a= // has_value == true && value == "" ?a // has_value == false Is that correct? Again having this third member in the container's value_type makes generic code more difficult. I would be inclined to use a std::optional for the value instead. In the past I have had situations where it was necessary to specify the case for hex letters in %-encoding; please consider adding this to pct_encode_opts. The Magnet Link example needs a paragraph at the start to say what a Magnet Link is. Regards, Phil.