On 21.08.22 20:36, Peter Dimov via Boost wrote:
> Rainer Deyke wrote:
>> (Arbitrary percent-encoded 8-bit values are legal in URLs, but not in IRIs.)
>
> Aren't they? I didn't find anything prohibiting them in the RFC, although I might well have missed it.
>
> The sections that specify the recommended way to convert between URI and IRI do say how these are handled: a percent-encoded sequence in a URI that doesn't correspond to a valid UTF-8 encoded code point is left alone in the IRI. (Valid UTF-8 percent encodings are percent-decoded.)
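The conversion rule described above (RFC 3987, section 3.2, "Converting URIs to IRIs") can be sketched roughly as follows. This is a simplified illustration, not Boost.URL code: the function name `uri_to_iri` is hypothetical, it only decodes escapes that form a valid multi-byte (non-ASCII) UTF-8 sequence, it leaves every ASCII escape such as `%2F` or `%20` encoded, and it ignores the RFC's further restrictions on characters such as bidi formatting marks.

```python
import re

# Matches a single percent-escape such as "%C3" (either hex case).
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _utf8_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a multi-byte lead byte, else 0."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def uri_to_iri(uri: str) -> str:
    """Hypothetical sketch: decode percent-escapes that form valid non-ASCII
    UTF-8; keep every other escape exactly as written."""
    out = []
    pos = 0
    while pos < len(uri):
        # Gather a run of consecutive escapes starting at pos.
        tokens = []  # original escape text, e.g. "%c3"
        end = pos
        while (m := _ESCAPE.match(uri, end)) is not None:
            tokens.append(m.group(0))
            end = m.end()
        if not tokens:
            out.append(uri[pos])
            pos += 1
            continue
        raw = bytes(int(t[1:], 16) for t in tokens)
        i = 0
        while i < len(raw):
            n = _utf8_length(raw[i])
            chunk = raw[i:i + n]
            if n and len(chunk) == n:
                try:
                    # Strict decoding rejects overlongs and surrogates.
                    out.append(chunk.decode("utf-8"))
                    i += n
                    continue
                except UnicodeDecodeError:
                    pass
            # Not valid UTF-8 here: leave this byte percent-encoded.
            out.append(tokens[i])
            i += 1
        pos = end
    return "".join(out)
```

For example, `uri_to_iri("/caf%C3%A9/%FF")` yields `"/café/%FF"`: the valid UTF-8 pair is decoded, while the stray `%FF` stays encoded in the IRI.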
OK, it is possible to partially decode a URL containing a mix of UTF-8 and arbitrary 8-bit values to get something that looks like a URL, but with Unicode. And this half-decoded URL is an IRI, so on a technical level you are correct. On the other hand, if calling the 'path' member function of a URL object returns a fully percent-decoded path, then calling the UTF-8 equivalent of that member function should return something that is both legal UTF-8 and fully percent-decoded. That is only possible if the path contains no percent-encoded byte sequences that are not valid UTF-8.

-- 
Rainer Deyke (rainerd@eldwood.com)
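The constraint above can be illustrated with a small Python sketch. The function `decoded_path_utf8` is hypothetical and is not Boost.URL's API; it assumes the "UTF-8 equivalent" of `path` fully percent-decodes first and then validates the resulting bytes, so it can only succeed when every escape forms valid UTF-8.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decoded_path_utf8(path: str) -> Optional[str]:
    """Hypothetical accessor: fully percent-decode `path` and return it as
    text, or None if the decoded bytes are not valid UTF-8."""
    raw = unquote_to_bytes(path)  # every %XX becomes its raw byte
    try:
        return raw.decode("utf-8")  # strict: rejects bytes such as 0xFF
    except UnicodeDecodeError:
        return None
```

Here `decoded_path_utf8("/caf%C3%A9")` gives `"/café"`, while `decoded_path_utf8("/caf%C3%A9/%FF")` gives `None`, because no string can be both fully percent-decoded and legal UTF-8 for that path. Returning `None` is only one possible design; an implementation could equally throw or fall back to a partially decoded (IRI-style) result.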