Hi Ben,

Ben Hutchings wrote:
A full solution will need to be a little bit more complicated. There isn't a specification of how Windows command-lines should be generated or interpreted from an array of strings, but it seems to be sensible to interpret them in the same way as Microsoft's run-time library does it:
* Outside a quoted section, a double-quote begins a quoted section.
* Outside a quoted section, a string of one or more spaces is a separator if there are non-space characters both before and after it; otherwise it's just padding.
* Inside a quoted section, a double-quote preceded by an odd number of backslashes (call this n) represents (n-1)/2 backslashes followed by a double-quote.
* Inside a quoted section, a double-quote preceded by an even number of backslashes (possibly zero; call this n) represents n/2 backslashes and ends the quoted section.
* All other characters represent themselves.
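The rules listed above can be sketched as a small hand-written state machine. The C++ below is only an illustration of those rules as stated, not Microsoft's run-time code and not part of any existing library; the function name `split_command_line` is hypothetical. It assumes that only the space character separates arguments (the rules mention nothing else) and that an empty quoted section yields an empty argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a command line using the rules listed above.
std::vector<std::string> split_command_line(const std::string& cmd)
{
    std::vector<std::string> args;
    std::string current;
    bool in_arg = false;     // true once the current argument has started
    bool in_quotes = false;  // true while inside a quoted section
    std::size_t i = 0;

    while (i < cmd.size()) {
        const char c = cmd[i];
        if (!in_quotes) {
            if (c == ' ') {
                // A run of spaces ends the current argument, if there is one;
                // leading and trailing spaces are just padding.
                if (in_arg) {
                    args.push_back(current);
                    current.clear();
                    in_arg = false;
                }
                ++i;
            } else if (c == '"') {
                // A double-quote opens a quoted section; it is not a separator.
                in_quotes = true;
                in_arg = true;
                ++i;
            } else {
                // Everything else, backslashes included, represents itself.
                current += c;
                in_arg = true;
                ++i;
            }
        } else if (c == '\\') {
            // Count the run of backslashes and look at what follows it.
            std::size_t n = 0;
            while (i + n < cmd.size() && cmd[i + n] == '\\') ++n;
            if (i + n < cmd.size() && cmd[i + n] == '"') {
                current.append(n / 2, '\\');   // n/2, or (n-1)/2 when n is odd
                if (n % 2 == 1) current += '"'; // odd: literal double-quote
                else in_quotes = false;         // even: quoted section ends
                i += n + 1;
            } else {
                current.append(n, '\\');        // not before a quote: literal
                i += n;
            }
        } else if (c == '"') {
            in_quotes = false;  // zero backslashes: quoted section ends
            ++i;
        } else {
            current += c;
            ++i;
        }
    }
    // A quoted section does not have to be terminated.
    if (in_arg) args.push_back(current);
    return args;
}
```

With this sketch, the input `"a b" c` splits into the two arguments `a b` and `c`, and `a"b c"d` stays a single argument `ab cd` because the quotes do not act as separators.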
Thanks for this explanation! I just tried with MinGW, and the argv array appears to follow the rules you describe. For Borland, however, the fancy rules about backslashes do not apply: a backslash before a double-quote blocks its special meaning, and a backslash anywhere else has no special effect. (BTW, I really don't understand the point of collapsing "\\" only before quotes.)
Note that double-quotes are not separators and backslashes are not usually escape characters. Also note that a quoted section does not have to be terminated.
tokenizer is probably not up to this job. You could probably use regular expressions, but a custom state machine might be the best solution.
Yep, the existing tokenizer facilities won't work in this case. I'll see about implementing either a new tokenizing function or a state machine.

Thanks,
Volodya