program_options and wstrings

I had some questions about using program_options with wstrings. I'm a bit confused after reading the documentation:

http://www.boost.org/doc/libs/1_40_0/doc/html/program_options/howto.html#id1...
http://www.boost.org/doc/libs/1_40_0/doc/html/program_options/design.html#pr...

Excerpt:

"""
- Each parser should accept either char* or wchar_t*, correctly split the input into option names and option values and return the data.
- For each option, it should be possible to specify whether the conversion from string to value uses ascii or Unicode.
- The library guarantees that (1) ascii input is passed to an ascii value without change, (2) Unicode input is passed to a Unicode value without change, and (3) ascii input passed to a Unicode value, and Unicode input passed to an ascii value will be converted using a codecvt facet (which may be specified by the user).
"""

Bullet 1 tells me the system takes both char* and wchar_t*. Bullet 2 tells me that each option can request the string it's handed to be first coerced as either a char* or wchar_t*. This is the only char/wchar_t conversion that happens in the system (codecvt).

Why, then, can't I say value<wstring> (besides wvalue<wstring>)? And why do I need to specify command_line_parser vs. wcommand_line_parser?

The complexity stems from the fact that there seem to be three sets of variables in the code:

- value vs. wvalue
- [w]value<string> vs. [w]value<wstring>
- command_line_parser vs. wcommand_line_parser

I find it unclear what the consequences are of setting each of these one way vs. the other. Any explanations?

Related: I ran into this bug:

http://lists.boost.org/boost-users/2008/02/34016.php
https://svn.boost.org/trac/boost/ticket/1645

Why does this happen?

--
Yang Zhang
http://www.mit.edu/~y_z/

Yang Zhang wrote:
I had some questions about using program_options with wstrings. I'm a bit confused after reading the documentation:
http://www.boost.org/doc/libs/1_40_0/doc/html/program_options/howto.html#id1...
http://www.boost.org/doc/libs/1_40_0/doc/html/program_options/design.html#pr...
Excerpt:
""" - Each parser should accept either char* or wchar_t*, correctly split the input into option names and option values and return the data.
- For each option, it should be possible to specify whether the conversion from string to value uses ascii or Unicode.
- The library guarantees that (1) ascii input is passed to an ascii value without change, (2) Unicode input is passed to a Unicode value without change, and (3) ascii input passed to a Unicode value, and Unicode input passed to an ascii value will be converted using a codecvt facet (which may be specified by the user). """
Bullet 1 tells me the system takes both char* and wchar_t*. Bullet 2 tells me that each option can request the string it's handed to be first coerced as either a char* or wchar_t*. This is the only char/wchar_t conversion that happens in the system (codecvt).
Why, then, can't I say value<wstring> (besides wvalue<wstring>)? And why do I need to specify command_line_parser vs. wcommand_line_parser?
The complexity stems from the fact that there seem to be three sets of variables in the code:
- command_line_parser vs. wcommand_line_parser
The first is used to parse a char* command line, the second to parse a wchar_t* command line. If you use the wrong one, your program will not compile.
- [w]value<string> vs. [w]value<wstring>
Well, using 'string' or 'wstring' as option type is entirely up to you.
- value vs. wvalue
If your option type can be constructed from char* (either using a custom validator, or operator>>), you can use value. If your option type can be constructed from wchar_t, you can use wvalue. If both, wvalue is better, since you won't lose data no matter what kind of parser is used. Given that wstring cannot be constructed from char*, you have to use wvalue for wstring.

Does this clarify things?

- Volodya
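The "can be constructed from char* or from wchar_t" distinction can be sketched with plain standard streams. This is only an illustration of the idea, not program_options code: `parse_narrow` and `parse_wide` are hypothetical helper names, and the assumption is that the option type is read with operator>>.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost): build a T from narrow text
// using operator>> on a narrow stream.
template <typename T>
T parse_narrow(const std::string& text) {
    std::istringstream in(text);
    T result{};
    if (!(in >> result)) {
        throw std::invalid_argument("cannot parse narrow text");
    }
    return result;
}

// Hypothetical helper (not part of Boost): build a T from wide text
// using operator>> on a wide stream.
template <typename T>
T parse_wide(const std::wstring& text) {
    std::wistringstream in(text);
    T result{};
    if (!(in >> result)) {
        throw std::invalid_argument("cannot parse wide text");
    }
    return result;
}
```

A type such as int works with both helpers, e.g. `parse_narrow<int>("42")` and `parse_wide<int>(L"42")`. `parse_wide<std::wstring>(L"abc")` also works, whereas `parse_narrow<std::wstring>("abc")` does not compile, because the standard library has no operator>> that reads a std::wstring from a narrow stream.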

On Sun, Nov 8, 2009 at 11:52 PM, Vladimir Prus wrote:
Yang Zhang wrote:
- value vs. wvalue
If your option type can be constructed from char* (either using a custom validator, or operator>>), you can use value. If your option type can be constructed from wchar_t, you can use wvalue. If both, wvalue is better, since you won't lose data no matter what kind of parser is used.
Why would you ever lose data? UTF-8 and UTF-16 are both encodings of the same set of characters. Isn't that what codecvt converts between?
Given that wstring cannot be constructed from char*, you have to use wvalue for wstring.
You can't construct a vector<...> from a char* either, yet that's legal. See my confusion? :) This is why it's unclear to me what significance value vs. wvalue have, especially since codecvt is doing conversions anyway.

--
Yang Zhang
http://www.mit.edu/~y_z/

Yang Zhang wrote:
On Sun, Nov 8, 2009 at 11:52 PM, Vladimir Prus wrote:
Yang Zhang wrote:
- value vs. wvalue
If your option type can be constructed from char* (either using a custom validator, or operator>>), you can use value. If your option type can be constructed from wchar_t, you can use wvalue. If both, wvalue is better, since you won't lose data no matter what kind of parser is used.
Why would you ever lose data? UTF-8 and UTF-16 are both encodings of the same set of characters. Isn't that what codecvt converts between?
But char* strings are not necessarily UTF-8; they are in the local 8-bit encoding, which might well be KOI8-R or something else. So, if you have a wchar_t* argv and your final target is 'string', there are two possible transformations:

    wchar_t* -> string
    wchar_t* -> char* -> string

With 'wvalue', the first conversion will be attempted, and will fail. With 'value', the second conversion will be attempted, and might work, or might lose some data.
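The data loss in the wchar_t* -> char* step can be seen with the standard library alone. The sketch below is a stand-in, not what program_options does internally: it narrows through the ctype facet of the classic "C" locale rather than through a codecvt facet, and `narrow_lossy` is a hypothetical helper name.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not part of Boost): narrow wide text with the
// ctype<wchar_t> facet of the classic "C" locale. Any character that has
// no narrow representation in that locale is replaced by `fallback`.
std::string narrow_lossy(const std::wstring& wide, char fallback = '?') {
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
    std::string narrow(wide.size(), fallback);
    if (!wide.empty()) {
        facet.narrow(wide.data(), wide.data() + wide.size(),
                     fallback, &narrow[0]);
    }
    return narrow;
}
```

Plain ASCII input such as `narrow_lossy(L"abc")` survives unchanged, while a Cyrillic letter such as `L"\u0436"` has no narrow form in the "C" locale and is replaced by the fallback character. With a KOI8-R locale the same letter would survive, which is the "might work, or might lose some data" case.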
Given that wstring cannot be constructed from char*, you have to use wvalue for wstring.
You can't construct a vector<...> from a char* either, yet that's legal. See my confusion? :) This is why it's unclear to me what significance value vs. wvalue have - esp. since codecvt is doing conversions anyway.
Speaking of vector, you can create a custom validator to create a vector from char*, or wchar_t*, or both -- and then you still have to use value/wvalue to specify which path the data should travel.

- Volodya
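As a rough illustration of a type that can be built from either narrow or wide text, here is a standard-library-only sketch. It does not use the program_options validator interface; `parse_int_list` is a hypothetical helper, and the comma-separated input format is an assumption made for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split comma-separated text into
// integers. The same template serves char and wchar_t input, so the target
// type vector<int> is constructible from both kinds of string.
// Fields that do not parse as an integer are skipped.
template <typename CharT>
std::vector<int> parse_int_list(const std::basic_string<CharT>& text) {
    std::basic_istringstream<CharT> in(text);
    std::vector<int> values;
    std::basic_string<CharT> item;
    while (std::getline(in, item, in.widen(','))) {
        std::basic_istringstream<CharT> field(item);
        int value = 0;
        if (field >> value) {
            values.push_back(value);
        }
    }
    return values;
}
```

Both `parse_int_list(std::string("1,2,3"))` and `parse_int_list(std::wstring(L"4,5"))` produce a vector<int>. Even when both paths exist like this, something still has to choose which one the input takes, which is the role value/wvalue plays in the library.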
participants (2)
- Vladimir Prus
- Yang Zhang