[program_options] unicode_test regressions for CW8

Volodya,

I did some more investigation, and the basic reason for the failure is that the CW8 lexer doesn't do phase 1 translations of the universal character names. So for a string like L"--foo=\u044F", the "\u044F" is translated into 'u'+'0'+'4'+'4'+'F'. The workaround, which should apply to any compiler that doesn't do universal character names, is to use "\x044F" instead:

===================================================================
RCS file: /cvsroot/boost/boost/libs/program_options/test/unicode_test.cpp,v
retrieving revision 1.2
diff -u -r1.2 unicode_test.cpp
--- unicode_test.cpp 26 Jun 2004 11:33:02 -0000 1.2
+++ unicode_test.cpp 24 Jul 2004 14:49:23 -0000
@@ -32,12 +32,12 @@
     ;
     vector<wstring> args;
-    args.push_back(L"--foo=\u044F");
+    args.push_back(L"--foo=\x044F");
     variables_map vm;
     store(wcommand_line_parser(args).options(desc).run(), vm);
-    BOOST_CHECK(vm["foo"].as<wstring>() == L"\u044F");
+    BOOST_CHECK(vm["foo"].as<wstring>() == L"\x044F");
 }
 // Test that unicode input is property converted into
@@ -56,7 +56,7 @@
     ;
     vector<wstring> args;
-    args.push_back(L"--foo=\u044F");
+    args.push_back(L"--foo=\x044F");
     variables_map vm;
     store(wcommand_line_parser(args).options(desc).run(), vm);
@@ -82,7 +82,7 @@
     variables_map vm;
     store(command_line_parser(args).options(desc).run(), vm);
-    BOOST_TEST(vm["foo"].as<wstring>() == L"\u044F");
+    BOOST_TEST(vm["foo"].as<wstring>() == L"\x044F");
 }
 // Since we've already tested conversion between parser encoding and
@@ -100,7 +100,7 @@
         ("foo", po::value<string>(), "unicode option")
     ;
-    std::wstringstream stream(L"foo = \u044F");
+    std::wstringstream stream(L"foo = \x044F");
     variables_map vm;
     store(parse_config_file(stream, desc), vm);
===================================================================

Those changes let the test pass on CW8, and they still work on VC7.1.

--
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq
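
For readers who want to try the workaround outside the Boost test, here is a minimal standalone sketch in standard C++. The helper names (foo_argument, expected_value, check_workaround) are hypothetical and are not part of unicode_test.cpp; the sketch only assumes that wchar_t is at least 16 bits wide, which holds on the compilers discussed here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of unicode_test.cpp: builds the test argument
// with a hex escape, so the compiler needs no universal-character-name support.
inline std::wstring foo_argument()
{
    // \x consumes every hex digit that follows it, so the escape is kept at
    // the very end of the literal.
    return L"--foo=\x044F";
}

// Hypothetical helper: the value the parsed option is expected to hold.
inline std::wstring expected_value()
{
    return L"\x044F";
}

// Self-check that can be called from main() or from a test case.
inline void check_workaround()
{
    const std::wstring arg = foo_argument();
    assert(arg.size() == 7);                        // "--foo=" plus one character
    assert(arg.substr(6) == expected_value());      // the value part matches
    assert(arg[6] == static_cast<wchar_t>(0x044F)); // the intended code unit
}
```

One caveat with this approach: unlike \u, which takes exactly four hex digits, \x keeps reading hex digits as long as they follow, so a literal such as L"\x044Fabc" would not mean what it appears to. Keeping the escape at the end of the literal, as the patch does, or splitting the literal in two avoids that.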

Hi Rene,

> Did some more investigation and the basic reason for the failure is
> that the CW8 lexer doesn't do phase 1 translations of the universal
> character names. So for string like: L"--foo=\u044F", the "\u044F" is
> translated into 'u'+'0'+'4'+'4'+'F'. The workaround, which should apply
> to any compiler that doesn't do universal character names, is to use
> "\x044F" instead:

Thanks for investigating! I've committed your patch.

Thanks,
Volodya
participants (2)
- Rene Rivera
- Vladimir Prus