On 19/10/2021 05:10, Vinnie Falco wrote:
>> - "/.//foo/bar" => { ".", "", "foo", "bar" }
>
> The list looks fine except for the above, which I think has to be { "", "foo", "bar" } for the reason that assigning the path should give you back the same results when iterated:
>
>     url u = parse_uri( "http:" ).value();
>     u.segments() = { "", "foo", "bar" };
>     assert( u.encoded_url() == "http:/.//foo/bar" );
>     assert( u.segments() == { "", "foo", "bar" } ); // same list
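The "/." in "http:/.//foo/bar" comes from RFC 3986. When there is no authority, a path may not begin with "//", because that would be re-read as "//authority". A scheme-less relative path has two further constraints: its first segment may not contain ':', and it may not start with an empty segment. The sketch below shows how a serialiser could add the shortest prefix that keeps a segment list round-tripping.

It rests on several assumptions. `encode_path` is a hypothetical name, not the Boost.URL API. The URL is assumed to have no authority, and segments are assumed to be already percent-encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the Boost.URL API): serialise already-encoded
// segments into a path for a URL with NO authority, prepending the shortest
// prefix that lets the path parse back into the same segments.
std::string encode_path(const std::vector<std::string>& segs,
                        bool absolute, bool has_scheme)
{
    std::string path = absolute ? "/" : "";
    for (std::size_t i = 0; i < segs.size(); ++i) {
        if (i != 0)
            path += '/';
        path += segs[i];
    }

    if (absolute) {
        // Without an authority, "//..." would be re-read as "//authority",
        // so insert the no-op segment "." in front.
        if (path.size() >= 2 && path[1] == '/')
            path.insert(0, "/.");
    } else if (!segs.empty()) {
        // A relative path must not look absolute (leading empty segment),
        // and in a scheme-less reference its first segment must not contain
        // ':' (it would be read as a scheme). A "./" prefix avoids both.
        bool looks_absolute = !path.empty() && path[0] == '/';
        bool colon_hazard =
            !has_scheme && segs[0].find(':') != std::string::npos;
        if (looks_absolute || colon_hazard)
            path.insert(0, "./");
    }
    return path;
}
```

Consider `encode_path({"", "foo", "bar"}, true, true)`, which serialises `{"", "foo", "bar"}` as an absolute path. It yields "/.//foo/bar", and prepending "http:" reproduces the string in the quoted assertion.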
Again, what about the case where the original input URL contained that leading dot? You can't argue "we must report it unchanged" when by definition there are conditions when you are changing it. The only mechanism that seems sane to me is that encoded_url() and friends are documented to normalise (or at least to partially normalise, limited to adding/removing the path prefix) the URL before returning a string, at which point segments() may change content. (But it's important that it doesn't break if you push_back each segment individually instead of assigning it all at once.)
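The disagreement over the leading dot can be made concrete with two hypothetical splitting functions; neither name is part of any real library. The first reports every segment exactly as it appears in the encoded path. The second hides a leading "." that only guards a following empty segment. For "/.//foo/bar" the two views differ, and the second one cannot tell whether the "." was in the original input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split an encoded path into segments exactly as they
// appear, including any "." present in the input. A leading '/' marks an
// absolute path and is not itself a segment; "" and "/" yield no segments.
std::vector<std::string> raw_segments(std::string_view path)
{
    std::vector<std::string> out;
    if (!path.empty() && path[0] == '/')
        path.remove_prefix(1);
    if (path.empty())
        return out;
    std::size_t start = 0;
    for (;;) {
        std::size_t slash = path.find('/', start);
        // substr clamps the length when slash == npos (last segment).
        out.emplace_back(path.substr(start, slash - start));
        if (slash == std::string_view::npos)
            break;
        start = slash + 1;
    }
    return out;
}

// Hypothetical alternative: hide a leading "." when it only protects a
// following empty segment, as in "/.//foo/bar". Information about whether
// that "." was typed by the user is lost.
std::vector<std::string> prefix_hidden_segments(std::string_view path)
{
    std::vector<std::string> segs = raw_segments(path);
    if (segs.size() >= 2 && segs[0] == "." && segs[1].empty())
        segs.erase(segs.begin());
    return segs;
}
```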
> If we then remove the scheme, I think the library needs to remove the prefix that it added. Not a full "normalization" (that's a separate member function that the user calls explicitly). The rationale is that if the library can add something extra to make things consistent, it should strive to remove that something extra when it can do so. Yes this means that if the user adds the prefix themselves and then performs a mutation, the library could end up removing the thing that the user added. I think that's ok though, because the (up to 2 character) path prefixes that the library can add are all semantic no-ops even in the filesystem cases.
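The add-then-remove policy described above can be sketched as a check that runs after each mutation. It asks whether a "/." or "./" prefix is still needed, given which components (authority, scheme) are now present.

The function name `strip_unneeded_prefix` and its parameters are hypothetical. As the message concedes, the check cannot tell a library-added prefix from one the user wrote.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a real Boost.URL function): after a mutation,
// drop a "/." or "./" path prefix once the component it guarded against can
// no longer cause a misparse. It cannot tell who added the prefix.
std::string strip_unneeded_prefix(const std::string& path,
                                  bool has_authority, bool has_scheme)
{
    // "/." only stops "//" from being read as an authority; once a real
    // authority precedes the path, "//..." is unambiguous.
    if (has_authority && path.compare(0, 4, "/.//") == 0)
        return path.substr(2);

    // "./" only matters for authority-less paths.
    if (!has_authority && path.compare(0, 2, "./") == 0) {
        std::string rest = path.substr(2);
        std::string first = rest.substr(0, rest.find('/'));
        bool looks_absolute = !rest.empty() && rest[0] == '/';
        bool colon_hazard =
            !has_scheme && first.find(':') != std::string::npos;
        // Keep a bare "./": as a reference it resolves differently from "".
        if (!rest.empty() && !looks_absolute && !colon_hazard)
            return rest;
    }
    return path;
}
```

For example, "./a:b" keeps its prefix while there is no scheme. It becomes "a:b" once a scheme is present, because the colon can then no longer be mistaken for a scheme delimiter.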
I don't disagree with this, but I do disagree with the iteration methods trying to "hide" elements that are actually present in the URL.

On 19/10/2021 08:37, Peter Dimov wrote:
> Segment iteration is not going to be compatible. In addition to adding an initial "/" segment for absolute paths, Filesystem also collapses consecutive / separators. So iterating "/foo//bar//baz///" produces
>
> "/" │ "foo" │ "bar" │ "baz" │ ""

Fair point; I hadn't considered that one. That's unfortunate. I agree that URL cannot collapse adjacent separators.
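The incompatibility shows up when both models iterate the same string. The sketch below assumes a POSIX system and uses the standard `std::filesystem::path` iteration; `url_segments` and `fs_elements` are hypothetical helper names.

- URL-style splitting keeps every empty segment between adjacent slashes.
- Filesystem iteration yields a root "/" element, collapses repeated separators, and ends with a single empty element for the trailing separator.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, URL model: after the leading '/', every '/' delimits
// a segment, so adjacent separators produce empty segments that must be kept.
std::vector<std::string> url_segments(std::string_view path)
{
    std::vector<std::string> out;
    if (!path.empty() && path[0] == '/')
        path.remove_prefix(1);
    if (path.empty())
        return out;
    std::size_t start = 0;
    for (;;) {
        std::size_t slash = path.find('/', start);
        out.emplace_back(path.substr(start, slash - start));
        if (slash == std::string_view::npos)
            break;
        start = slash + 1;
    }
    return out;
}

// Hypothetical helper, filesystem model: collect the elements produced by
// std::filesystem::path's own iterator (POSIX path syntax assumed).
std::vector<std::string> fs_elements(const std::string& path)
{
    std::vector<std::string> out;
    for (const auto& elem : std::filesystem::path(path))
        out.push_back(elem.string());
    return out;
}
```

For "/foo//bar//baz///" the URL model yields eight segments, {"foo", "", "bar", "", "baz", "", "", ""}. The filesystem model yields the five elements listed in the quoted message.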