On 22/01/2020 07:39, Andrey Semashev wrote:
On 2020-01-21 18:51, Vinnie Falco wrote:
On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev wrote:
I'd be more interested in a more generic URI library, along with a few associated algorithms, e.g. those described in: https://tools.ietf.org/html/rfc3986
Yes, this library does that. I do not use the term "URI" because it is confusing and pointless. They are all URLs now. My library follows the RFC, except that I have renamed the top level production rules to reflect this preference:
    URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    URL-reference = URL / relative-ref
    absolute-URL  = scheme ":" hier-part [ "?" query ]

I didn't invent this idea; deprecating the word "URI" and using "URL" consistently in its place is recommended by WHATWG.
There is a semantic difference between URI and URL: the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat a locator as an identifier, but not the other way around. Using the term URL to refer to a URI is confusing.
Notably, all URLs are URIs, but not all URIs are URLs. Some are URNs, for example, which are structured a bit differently (e.g. "urn:oasis:names:specification:docbook:dtd:xml:4.1.2").

A program only dealing with "locations to download from" generally only needs to worry about URLs, but there are other places where all URIs (including URNs) may be encountered, even by such a program: for example, as XML namespace identifiers. (Usually these can be treated as opaque, though.)

Still, given that the same parsing rules can apply to both (URNs usually just have a long opaque path after the "urn" scheme), it doesn't seem unreasonable to call it a "URL library" anyway (despite the recommendation in RFC 3986). Some people would be confused by calling them "URIs", and those who know better will understand that.

Having said that, the docs should call out RFC support and URI compatibility explicitly, so that people aren't left wondering.
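
To illustrate the point about shared parsing rules, here is a minimal sketch in Python. It is not the API of the library under discussion. It uses the component-splitting regular expression given in RFC 3986, Appendix B; the function name split_uri is made up for this example.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits any URI reference
# into the five generic components without knowing anything about the scheme.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Components that are absent from the input are returned as None;
    the path is always a string (possibly empty).
    """
    m = URI_REFERENCE.match(text)
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


# A locator: has an authority and a hierarchical path.
print(split_uri("https://tools.ietf.org/html/rfc3986"))
# ('https', 'tools.ietf.org', '/html/rfc3986', None, None)

# A name: no authority, everything after "urn:" lands in the path.
print(split_uri("urn:oasis:names:specification:docbook:dtd:xml:4.1.2"))
# ('urn', None, 'oasis:names:specification:docbook:dtd:xml:4.1.2', None, None)
```

The same generic rule handles both inputs. The URN simply has no authority component and a long path that a caller can treat as opaque.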