Re: [boost] boost.url review

21 Aug 2022

On 21.08.22 20:36, Peter Dimov via Boost wrote:
...
> Rainer Deyke wrote:
...
>> (Arbitrary percent-encoded 8-bit values are legal in URLs, but not in IRIs.)
> Aren't they? I didn't find anything prohibiting them in the RFC, although I
> might well have missed it.
>
> The sections that specify the recommended way to convert between URI
> and IRI do say how these are handled - a percent-encoded sequence in a
> URI that doesn't correspond to a valid UTF-8 encoded code point is left
> alone in the IRI. (Valid UTF-8 percent encodings are percent-decoded.)

OK, it is possible to partially decode a URL containing a mix of UTF-8
and arbitrary 8-bit values and get something that looks like a URL but
contains Unicode characters.  This half-decoded URL is an IRI, so on a
technical level you are correct.
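
To make the partial decoding concrete, here is a minimal sketch in
Python (chosen only for brevity; it is not Boost.URL code). The helper
name uri_to_iri_partial is hypothetical, and the sketch simplifies the
RFC's conversion: it decodes only percent-encoded runs that form valid
non-ASCII UTF-8 and leaves every other escape, including ASCII escapes
such as %2F, exactly as written.

```python
import re

# One or more consecutive percent-encoded octets, e.g. "%C3%A9".
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    text = match.group(0)
    raw = bytes.fromhex(text.replace("%", ""))
    out = []
    i = 0
    while i < len(raw):
        decoded = None
        length = 1
        if raw[i] >= 0x80:
            # Try each possible UTF-8 sequence length for a non-ASCII lead byte.
            for n in (2, 3, 4):
                try:
                    decoded = raw[i:i + n].decode("utf-8")
                    length = n
                    break
                except UnicodeDecodeError:
                    continue
        if decoded is not None:
            out.append(decoded)
        else:
            # ASCII escape or invalid UTF-8: keep the original "%XX" text.
            out.append(text[3 * i:3 * i + 3])
        i += length
    return "".join(out)


def uri_to_iri_partial(uri):
    """Hypothetical helper: decode only escapes that are valid UTF-8."""
    return _PCT_RUN.sub(_decode_run, uri)


print(uri_to_iri_partial("/caf%C3%A9/%FF/%2F"))  # /café/%FF/%2F
```

The result still contains %FF, so it is an IRI but not a fully
percent-decoded string.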

On the other hand, if calling the 'path' member function of a URL object
returns a fully percent-decoded path, then calling the UTF-8 equivalent
of that member function should return something that is both legal UTF-8
and fully percent-decoded.  That is only possible if the path contains
no percent-encoded values that are not valid UTF-8.
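
As a rough illustration of that constraint, again in Python rather than
with the actual Boost.URL interface: the function fully_decoded_utf8
below is a hypothetical stand-in for a "UTF-8 path" accessor. It
percent-decodes everything and then has no valid result to return when
the decoded octets are not UTF-8.

```python
from urllib.parse import unquote_to_bytes


def fully_decoded_utf8(encoded_path):
    """Hypothetical accessor: fully percent-decode, then require UTF-8.

    Returns the decoded text, or None when the decoded octets are not
    valid UTF-8 (the case where no correct answer exists).
    """
    raw = unquote_to_bytes(encoded_path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(fully_decoded_utf8("/caf%C3%A9"))  # /café
print(fully_decoded_utf8("/caf%E9"))     # None
```

Here %E9 is a lone Latin-1 octet, so the second path can be fully
decoded to bytes but not to UTF-8 text.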

-- 
Rainer Deyke (rainerd@eldwood.com)