On 22/07/2021 5:56 pm, Soronel Haetir wrote:
I would have thought that '-' would only get confused as a range specifier when it follows an opening atom. Here it follows a closing atom (the '9' in '0-9').
I did not think, for example, that "a-g-z" could possibly be equivalent to "a-z"; I thought it should only be able to match a, b, c, d, e, f, g, '-' and 'z'.
That's not unreasonable, but it's not how the specification is worded. So you might find that it works on a particular implementation, but it's risky.

The text of most regex specifications says that the only valid positions for a minus character that is intended to represent itself are either immediately following the [ or immediately preceding the ]. Of those, the former is a bit more traditional and hence safer. (Although if you want to include ] as well, then ] must be first and so - must be last.)

But there are lots of implementation-defined holes in regexes, so YMMV. For example, some will accept it anywhere if you escape it with a backslash. Others don't support backslash escapes inside character sets at all.

https://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html specifically calls out a construct such as "a-g-z" as undefined behaviour.
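
To make the safe placements concrete, here is a rough sketch using Python's re module. That is only one implementation, chosen for illustration, and it is not a POSIX engine, so treat it as a demonstration of the placements rather than proof of what any other library does. The helper name matches_fully is made up for this example.

```python
import re

# Assumption: Python's re module stands in for "some regex implementation".
# Other engines may differ, which is the whole point of placing '-' carefully.

# '-' immediately after '[': a literal minus, followed by the range 0-9.
leading = re.compile(r"[-0-9]+")

# '-' immediately before ']': the range 0-9, followed by a literal minus.
trailing = re.compile(r"[0-9-]+")

# To include ']' as well, ']' goes first and so '-' has to go last.
with_bracket = re.compile(r"[]0-9-]+")


def matches_fully(pattern, text):
    """Hypothetical helper: True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for pattern, text in [(leading, "-12-3"),
                          (trailing, "4-5-"),
                          (with_bracket, "]1-2")]:
        print(pattern.pattern, repr(text), matches_fully(pattern, text))
```

I have deliberately left "a-g-z" out of the sketch: whatever a given engine does with it, the result tells you nothing about the next engine.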