[review] Boost.Nowide is Accepted into Boost
Dear all, I would like to thank all the people who took part in the discussions during the review. All contributors admitted that Boost.Nowide addresses a real issue and solves it, at least partially. Moreover we received 8 official reviews with 6 “accept”, 1 “do not accept as is” and 1 “I am not against it”. So Boost.Nowide is accepted for inclusion in Boost. Congratulation and many thanks to Artyom Beilis for this contribution. However, 2 major issues arose in the discussions and they will have to be fixed before inclusion: 1. handling of ill-formed Unicode input should be improved/clarified 2. the documentation needs improvements In particular, this needs to be fixed: * Design: There was a lot of discussions on what to do with ill-formed UTF-16 strings on Windows, in particular coming from the file system while converting them to narrow UTF-8 strings. Basically, 3 options: 1. allow the roundtrip ill formed UTF-16 > UTF-8 > ill formed UTF-16, this is not possible with UTF-8 encoding, WTF-8 (not a standard) may be an option with some issues 2. issue an error (today’s nowide option, even if the author would like to move to 3.) 3. convert as much as possible to fully conformant UTF-8 with character replacement U+FFFD for invalid input. This is what the NT kernel does with functions like RtlUTF8ToUnicodeN. This would allow cout to continue to work after invalid output instead of failing. After reviewing what was chosen for std::filesystem in C++17, I think 3. is the right think to do: - std::filesystem::path::u8string does not convert strings on Posix. If a conversion is done (on Windows) the output is fully conformant UTF-8. - the roundtrip is not guaranteed in std::filesystem::path conversions (implementation defined) but the roundtrip of a converted paths is guaranteed to work which is the case with Boost.Nowide: once in fully conformant UTF-8, narrow > wide > narrow works fine. - the use of the replacement character U+FFFD is what is done by the NT kernel (question: do you have to reimplement RtlUTF8ToUnicodeN and RtlUnicodeToUTF8N, why not using them)? It should been made clear in the doc that this choice does not guarantees the roundtrip on Windows so that ill-formed filename may not be opened. * Documentation: - the main part of the doc (the non-reference part) should be more detailed - it should be clear to the reader that Windows and Posix platforms are not handled the same; on Windows, we get UTF-8 strings while on Posix platforms, the user should not expect UTF-8 encoding, just narrow strings, even if UTF-8 is today the most common encoding (addressed in Q/A but is probably too far away). It must be clear from the beginning that on Posix platforms, the library does nothing (including no checking of UTF-8 conformance), just forward to the standard library. - review the use of UTF-8 so that it do not let think that narrow strings are always UTF-8 as this is not necessary the case on Posix platform; maybe just use narrow string? - clarify what is converted to what and when: for each function, we need to know precisely what is the output for all possible input; in particular, better explain for each function what the library does with ill-formed input (either UTF-16 input or narrow input), in particular for args and getenv; say clearly what happens (failbit, error, exception, invalid character…) - clarify what is done for Windows and what is (not) done for Posix - explain better what is behind nowide::cout, cin, cerr, what is done? - clarify what does the filesystem integration on Windows vs Posix - check if basic_stackstring::swap needs to swap buffer_ when both strings are on the stack - The doc/ directory is missing a Jamfile, and there's also no meta/libraries.json. * Implementation: - The global BOOST_USE_WINDOWS_H macro should probably be respected. When windows.h is not used, the WinAPI prototypes are declared in the global namespace. This can cause issues when windows.h is also included before/after the Nowide headers, and the prototypes don't match exactly (e.g. short vs. wchar_t). Also it pollutes the global namespace. Nowide should do what other Boost libraries do, and declare the prototypes in a private detail namespace instead. Example in <boost/thread/win32/thread_primitives.hpp>. Need to add a namespace around your definitions. - MinGW gcc: fix warnings in c++03 mode and errors in c++11 and above - Cygwin gcc/clang (errors because of missing ::getenv and friends) - when this is fixed, an error remains: in file included from test\test_system.cpp:11:0: ..\../boost/nowide/cenv.hpp: In function 'int boost::nowide::putenv(char*)': ..\../boost/nowide/cenv.hpp:104:41: warning: comparison between pointer and zero character constant [-Wpointer-compare] while(*key_end!='=' && key_end!='\0') - Under MSYS, one test failed Fail: Error boost::nowide::cin.putback(c) in test_iostream.cpp:17 main - filesystem integration should return somehow the previous locale for potential undoing * Misc: - There are no .travis.yml and appveyor.yml files in the root. - a number of copy/paste references to Boost.System (test/Jamfile) and Boost.Locale (index.html), should be corrected This are just suggestions and are not mandatory for acceptance: - implement stat/opendir/readir - integration with boost::interprocess::file_lock - integration with boost::program_options (in particular for correct line breaks when showing options) Frédéric
On Fri, Jun 23, 2017 at 4:34 PM, Frédéric Bron <frederic.bron@m4x.org> wrote:
Dear all,
I would like to thank all the people who took part in the discussions during the review. All contributors admitted that Boost.Nowide addresses a real issue and solves it, at least partially. Moreover we received 8 official reviews with 6 “accept”, 1 “do not accept as is” and 1 “I am not against it”.
So Boost.Nowide is accepted for inclusion in Boost. Congratulation and many thanks to Artyom Beilis for this contribution.
I would like to thank all the reviewers who provided very valuable and important comments regarding the library and brought very important notes that would improve the Boost.Nowide library. I want also to express my deepest gratitude to Frédéric Bron who managed the review, collected all the data regarding the library, did a great code review on his own and finally created an excellent summary of all issues that were brought in this review. THANK YOU!
However, 2 major issues arose in the discussions and they will have to be fixed before inclusion:
1. handling of ill-formed Unicode input should be improved/clarified 2. the documentation needs improvements
In particular, this needs to be fixed: [snip]
Now its time to get my hands on the code :-) Once again thank you very much! Artyom Beilis
participants (3)
-
Artyom Beilis
-
Frédéric Bron
-
Peter Dimov