Vladimir Prus wrote:
cl_corba_at wrote: <snip>
Frankly, I missed the WinMain case and am not sure what to do. The problem is that a single string might lose important information: what happens if you have a command-line argument with an embedded space?
Embedded spaces should be placed in quotation marks, which should be removed by the tokenizer. Arguments could be separated by one or more spaces.
Do you mean that if I type
program "a b c" "C:/Program files"
then the program will receive this string with the quotes still there? (The Linux shell strips the quotes completely.)
That is correct. Windows programs are given a single string of arguments to parse, whereas Unix programs receive a vector that the startup code can pass to main unchanged. <snip>
Is this solution OK? (The only problem is that adjacent spaces are mishandled, but that's fixable.)
A full solution will need to be a little more complicated. There isn't a specification of how Windows command lines should be generated from, or interpreted as, an array of strings, but it seems sensible to interpret them the same way Microsoft's run-time library does:
* Outside a quoted section, a double-quote begins a quoted section.
* Outside a quoted section, a string of one or more spaces is a separator if there are non-space characters both before and after it; otherwise it's just padding.
* Inside a quoted section, a double-quote preceded by an odd number of backslashes (call this n) represents (n-1)/2 backslashes followed by a double-quote.
* Inside a quoted section, a double-quote preceded by an even number of backslashes (possibly zero; call this n) represents n/2 backslashes and ends the quoted section.
* All other characters represent themselves.
Note that double-quotes are not separators and backslashes are not usually escape characters. Also note that a quoted section does not have to be terminated.
tokenizer is probably not up to this job. You could probably use regular expressions, but a custom state machine might be the best solution.
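The state-machine approach Ben suggests can be sketched in a few dozen lines. This is a minimal C++ sketch that follows the rules listed above literally, with spaces as the only separators. The name `split_winmain_sketch` is hypothetical and is not part of tokenizer or any other library. Because the list gives backslashes a special meaning only inside a quoted section, the sketch copies them literally everywhere else; Microsoft's run-time library may treat some edge cases (tabs, or a backslash-quote pair outside quotes) differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a library API): splits a Windows-style command
// line into arguments using the quoting rules listed above.
std::vector<std::string> split_winmain_sketch(const std::string& line)
{
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;      // inside a quoted section?
    bool have_arg = false;       // has the current argument started?
    std::size_t backslashes = 0; // length of the pending run of backslashes

    for (char c : line) {
        if (c == '\\') {
            ++backslashes;       // its meaning depends on the next character
            have_arg = true;
            continue;
        }
        if (c == '"' && in_quotes) {
            // n backslashes before a quote become n/2 backslashes...
            current.append(backslashes / 2, '\\');
            if (backslashes % 2 == 1)
                current += '"';    // ...plus a literal quote if n is odd,
            else
                in_quotes = false; // ...or the end of the section if n is even
            backslashes = 0;
            continue;
        }
        // Any other character: the pending backslashes are literal (outside
        // a quoted section this includes a run just before a double-quote).
        current.append(backslashes, '\\');
        backslashes = 0;

        if (c == '"') {
            in_quotes = true;    // outside quotes: start a quoted section
            have_arg = true;
        } else if (c == ' ' && !in_quotes) {
            if (have_arg) {      // spaces after an argument end it;
                args.push_back(current); // other spaces are just padding
                current.clear();
                have_arg = false;
            }
        } else {
            current += c;        // everything else represents itself
            have_arg = true;
        }
    }
    current.append(backslashes, '\\'); // trailing backslashes are literal
    if (have_arg)
        args.push_back(current); // an unterminated quoted section is allowed
    return args;
}
```

For example, the command line from earlier in the thread, `program "a b c" "C:/Program files"`, splits into `program`, `a b c` and `C:/Program files`.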
Hi Ben,
Thanks for this explanation! I just tried with MinGW, and the argv array appears to follow the rules you describe. For Borland, however, the fancy rules about backslashes do not apply: a backslash before a double quote blocks its special meaning, and a backslash anywhere else has no effect. (BTW, I really don't understand the point of collapsing "\\" only before quotes.)
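One plausible reason for collapsing backslashes only before quotes shows up in the opposite direction, turning an argument vector back into a single command line. Windows paths are full of backslashes, and under these rules they can pass through untouched; only a run that precedes a literal double-quote, or that ends a quoted argument, has to be doubled. The following is a minimal C++ sketch assuming the rules Ben lists (spaces as the only separators); `quote_argument_sketch` and `join_command_line_sketch` are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: quotes one argument so that the rules listed in
// the thread would read it back unchanged.
std::string quote_argument_sketch(const std::string& arg)
{
    // No space or double-quote: pass the argument as-is, since outside
    // a quoted section backslashes are ordinary characters.
    if (!arg.empty() && arg.find_first_of(" \"") == std::string::npos)
        return arg;

    std::string out = "\"";
    std::size_t backslashes = 0;
    for (char c : arg) {
        if (c == '\\') {
            ++backslashes;
        } else if (c == '"') {
            // 2n+1 backslashes before a quote read back as n backslashes
            // plus a literal quote.
            out.append(2 * backslashes + 1, '\\');
            out += '"';
            backslashes = 0;
        } else {
            // Backslashes not before a quote are copied unchanged.
            out.append(backslashes, '\\');
            out += c;
            backslashes = 0;
        }
    }
    // The closing quote follows, so a trailing run must be doubled
    // (2n backslashes read back as n and end the quoted section).
    out.append(2 * backslashes, '\\');
    out += '"';
    return out;
}

// Hypothetical helper: joins an argument vector into one command line.
std::string join_command_line_sketch(const std::vector<std::string>& args)
{
    std::string line;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0)
            line += ' ';
        line += quote_argument_sketch(args[i]);
    }
    return line;
}
```

For example, `C:\dir` needs no quoting at all, while `C:\Program Files\` becomes `"C:\Program Files\\"`: only the final backslash, which sits right before the closing quote, is doubled.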
Yep, the existing tokenizer facilities won't work in this case. I'll see about implementing either a new tokenizing function or a state machine. Thanks, Volodya