On 18/10/2021 13:01, Vinnie Falco wrote:
assert( u.encoded_url() == "https:/.//index.htm" );
I assume this was intended to be "https://./index.htm"?
Nope, it was correct as I wrote it. You managed to produce an authority with a single dot :)
Yes, that was my intent, from your description of replacing the authority with a dot. Though I see the issue now. I was reading the input as:

    "https://example.com/index.htm"

for which removing the authority should result in:

    "https:/index.htm"

(Although this is unusual; typically relative URIs will omit the scheme as well.)

For the input:

    "https://example.com//index.htm"

it does make sense at a purely-URL level to transform this to:

    "https:/.//index.htm"

(Most web servers would treat either as illegal, but you could envisage some non-HTTP protocol that requires such syntax.)

Adding the authority back to this URL should result in "https://example.com/.//index.htm", however; it should not be "ignoring" the prefix once it exists. At least not until the URL is normalised. (Unless you're documenting that URLs are always stored in normalised form, or that setters will automatically normalise.)
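To make the cases above concrete, here is a minimal sketch of the recomposition I have in mind. The helper names without_authority and with_authority are hypothetical and are not Boost.URL APIs; the sketch assumes the path is kept verbatim and that no normalisation happens on assignment, which is exactly the point under discussion rather than documented library behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; these are NOT Boost.URL APIs.

// Recompose "scheme:path" with no authority component.
// A path beginning with "//" would be re-parsed as the start of an
// authority, so it is shielded with a "/." prefix.
std::string without_authority(const std::string& scheme,
                              const std::string& path)
{
    assert(!scheme.empty());
    std::string out = scheme + ":";
    if (path.compare(0, 2, "//") == 0)
        out += "/.";
    out += path;
    return out;
}

// Recompose "scheme://authority" followed by the path.
// The path is copied as written: no dot-segment removal and no other
// normalisation is applied (an assumption of this sketch).
std::string with_authority(const std::string& scheme,
                           const std::string& authority,
                           const std::string& path)
{
    // With an authority present, the path must be empty or absolute.
    assert(path.empty() || path[0] == '/');
    return scheme + "://" + authority + path;
}
```

Under these assumptions, without_authority("https", "//index.htm") gives "https:/.//index.htm", and feeding the resulting path "/.//index.htm" to with_authority("https", "example.com", ...) gives "https://example.com/.//index.htm", with the "/." prefix preserved until something explicitly normalises the URL.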
We treat a leading "/." as not appearing in the segments, to make the behavior of the library doing these syntactic adjustments transparent and satisfy the rule that assignments from segments produce the same result when iterated.
So what about the input "https://example.com/./index.htm"? Unless you're documenting automatic normalisation, this should still iterate the "." and "index.htm" path components separately.
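As a rough illustration of what I mean by iterating without normalisation, the sketch below splits a path on '/' and returns every segment exactly as written. The function raw_segments is a hypothetical name, not something from Boost.URL, and this is only one possible reading of "not normalised".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; NOT a Boost.URL API.
// Splits an absolute (or empty) path on '/' and returns every segment
// as written, including "." and empty segments. Nothing is normalised.
std::vector<std::string> raw_segments(const std::string& path)
{
    assert(path.empty() || path[0] == '/');

    std::vector<std::string> segs;
    if (path.size() <= 1)
        return segs;                     // "" and "/" have no segments

    std::string::size_type pos = 1;      // skip the leading '/'
    for (;;) {
        const std::string::size_type next = path.find('/', pos);
        if (next == std::string::npos) {
            segs.push_back(path.substr(pos));   // final segment
            break;
        }
        segs.push_back(path.substr(pos, next - pos));
        pos = next + 1;
    }
    return segs;
}
```

With this reading, the path "/./index.htm" yields the two segments "." and "index.htm", and "/.//index.htm" yields ".", an empty segment, and "index.htm". A library that hides a leading "/." would report different segments for these inputs unless it documents that it normalises.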