[C++0x] Emulate C++0x char16_t, char32_t, std::u16string, and std::u32string

I'm starting to fold in Boost.Filesystem support for the new C++0x character types. Support is emulated for compilers and standard libraries not currently supporting the 0x features. The emulation is working fine, with tests passing on Windows for GCC 4.5 and 4.6, and VC++ 8, 9, and 10. Haven't tested on non-Windows systems yet.
This is the same emulation approach Microsoft ships in VC++ 10.
It seems to me that all Boost libraries that want to emulate these 0x features should use a unified approach. Otherwise we could get into a situation where libraries A and B worked fine in isolation, but had symbol or ODR clashes when used together.
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
Does this make sense?
--Beman

2011/7/21 Beman Dawes <bdawes@acm.org>:
This is the same emulation approach Microsoft ships in VC++ 10.
It is a bad approach, and will produce a lot of future workarounds. We can still remember wchar_t as a typedef for some compilers and a BOOST_NO_INTRINSIC_WCHAR_T macro for that defect. However some emulation for char16_t, char32_t types is required, but it must not be a typedef!
It seems to me that all Boost libraries that want to emulate these 0x features should use a unified approach. Otherwise we could get into a situation where libraries A and B worked fine in isolation, but had symbol or ODR clashes when used together.
Totally agree.
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
Agree, but please, don't implement char16_t, char32_t as a typedef!
Best regards, Antony Polukhin

On Wed, Jul 20, 2011 at 4:45 PM, Antony Polukhin <antoshkka@gmail.com> wrote:
2011/7/21 Beman Dawes <bdawes@acm.org>:
This is the same emulation approach Microsoft ships in VC++ 10.
It is a bad approach, and will produce a lot of future workarounds. We can still remember wchar_t as a typedef for some compilers and a BOOST_NO_INTRINSIC_WCHAR_T macro for that defect.
However some emulation for char16_t, char32_t types is required, but it must not be a typedef!
It seems to me that all Boost libraries that want to emulate these 0x features should use a unified approach. Otherwise we could get into a situation where libraries A and B worked fine in isolation, but had symbol or ODR clashes when used together.
Totally agree.
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
Agree, but please, don't implement char16_t, char32_t as a typedef!
There is no choice, unless I'm missing something. Microsoft has already taken that approach to emulation, so that cat is out of the bag. You can also define BOOST_NO_STRING_0X_EMULATION and turn it off, if you would rather not have emulation.
--Beman

It is a bad approach, and will produce a lot of future workarounds. We can still remember wchar_t as a typedef for some compilers and a BOOST_NO_INTRINSIC_WCHAR_T macro for that defect.
However some emulation for char16_t, char32_t types is required, but it must not be a typedef!
Then what's the alternative? IMO there just isn't one, given that char16_t must be a literal type even on non-C++0x compilers.
John.
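The typedef question can be illustrated with a minimal sketch, assuming a C++11 compiler where char16_t is a real keyword. The names emulated_char16, describe, and check_overloads are hypothetical and stand in for a typedef-based emulation; they are not taken from the attached header.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a typedef-based emulation of char16_t.
typedef std::uint_least16_t emulated_char16;

// A typedef introduces no new type: the "character" IS the integer.
static_assert(std::is_same<emulated_char16, std::uint_least16_t>::value,
              "typedef is only an alias");

// The native C++11 char16_t is a distinct type, so it can be overloaded on.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is distinct from its underlying integer type");

inline std::string describe(std::uint_least16_t) { return "integer"; }
inline std::string describe(char16_t)            { return "character"; }
// Adding describe(emulated_char16) here would redefine the first overload
// rather than declare a third function.

inline void check_overloads()
{
    assert(describe(u'a') == "character");
    // The emulated type cannot be told apart from a plain integer.
    assert(describe(static_cast<emulated_char16>(97)) == "integer");
}
```

With the emulation, code that overloads or specializes on the character type silently treats text as integers, which is the kind of workaround burden the BOOST_NO_INTRINSIC_WCHAR_T remark refers to.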

----- Original Message ----
From: Beman Dawes <bdawes@acm.org>
To: Boost Developers List <boost@lists.boost.org>
Sent: Wed, July 20, 2011 11:27:38 PM
Subject: [boost] [C++0x] Emulate C++0x char16_t, char32_t, std::u16string, and std::u32string
I'm starting to fold in Boost.Filesystem support for the new C++0x character types. Support is emulated for compilers and standard libraries not currently supporting the 0x features. The emulation is working fine, with tests passing on Windows for GCC 4.5 and 4.6, and VC++ 8, 9, and 10. Haven't tested on non-Windows systems yet.
This is the same emulation approach Microsoft ships in VC++ 10.
It seems to me that all Boost libraries that want to emulate these 0x features should use a unified approach. Otherwise we could get into a situation where libraries A and B worked fine in isolation, but had symbol or ODR clashes when used together.
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
Does this make sense?
--Beman
No, you can't emulate them.
Emulation of char16_t/char32_t is useless for any real use.
You can't create a working
std::basic_ostringstream<char16_t> stream;
because stream << 1245 would not work due to the lack of std::locale facets.
You can't create the required facets because, for example, they are specialized in many standard libraries.
Even Microsoft's existing VC2010 does not work if you compile the application with /MD or /MDd.
Note: char, wchar_t, char16_t and char32_t are much more than basic types that can be distinguished; they bring character information with them.
If you want to represent a UTF-16 or UTF-32 code unit, just use uint16_t or uint32_t, as for example ICU does for UChar and Qt does for QChar, but this isn't something that is supposed to work with the standard library in places where characters are expected.
Also for Filesystem? Please, don't try to make it more complicated than it is now.
You want to make boost.filesystem better? Make it use UTF-8 on Windows by default and drop all "wide-crap" (sorry windows users).
All operating systems around (with one exception) use char * API and one operating system uses utf-16/wchar_t API.
So adding an arbitrary character type that no operating system uses seems to be a waste of effort.
I **personally** don't see any benefit in adding char16_t/char32_t emulation to Boost, and especially to Boost.Filesystem.
Today Boost.Filesystem has enough problems besides char16_t/char32_t.
Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
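The facet argument can be probed with a small sketch. It asks the classic "C" locale whether it carries the std::num_put facet that operator<< relies on to format numbers. The helper names has_number_formatting and check_standard_facets are hypothetical.

```cpp
#include <cassert>
#include <locale>

// Reports whether the classic "C" locale holds the num_put facet needed
// to format a number for streams whose character type is CharT.
template <class CharT>
bool has_number_formatting()
{
    return std::has_facet<std::num_put<CharT> >(std::locale::classic());
}

inline void check_standard_facets()
{
    // The standard requires this facet for char and wchar_t only.
    assert(has_number_formatting<char>());
    assert(has_number_formatting<wchar_t>());
}
```

For char16_t and char32_t the result of has_number_formatting is implementation-specific. Where it is false, inserting an integer into a std::basic_ostringstream<char16_t> cannot succeed, whether the character type is native or emulated.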

On Thu, Jul 21, 2011 at 8:12 AM, Artyom Beilis <artyomtnk@yahoo.com> wrote:
You want to make boost.filesystem better? Make it use UTF-8 on Windows by default and drop all "wide-crap" (sorry windows users).
All operating systems around (with one exception) use char * API and one operating system uses utf-16/wchar_t API.
So adding an arbitrary character type that no operating system uses seems to be a waste of effort.
I **personally** don't see any benefit in adding char16_t/char32_t emulation to Boost, and especially to Boost.Filesystem.
Today Boost.Filesystem has enough problems besides char16_t/char32_t.
Artyom Beilis -------------- CppCMS - C++ Web Framework: http://cppcms.sf.net/ CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
I think it's a bit of a stretch to say that 'no operating system' uses wide characters/strings. Sure, it's only one OS, but it's not like it's some obscure platform that very few people use. Windows is a massive target, and it needs to be given first-rate support, regardless of whether it's annoying/wasteful in terms of implementation.

On Wed, Jul 20, 2011 at 6:12 PM, Artyom Beilis <artyomtnk@yahoo.com> wrote:
... Does this make sense?
--Beman
No, you can't emulate them.
Sorry, but people have been doing this for at least ten years, although the specific type names used vary.
Emulation of char16_t/char32_t is useless for any real use.
People have been using char16_t/char32_t or equivalent to handle UTF-16/UTF-32 for years. It does work, it is useful, it is used in production code, and Boost.Filesystem users are asking for it. So there are existence proofs that this sort of emulation does work for enough purposes to be quite useful.
You can't create working
std::basic_ostringstream<char16_t> stream;
Because stream << 1245 would not work due to lack of std::locale facets.
You can't create the required facets because, for example, they are specialized in many standard libraries.
Emulation via simple uint16_t and uint32_t typedefs doesn't work for all use cases. So only use it when it does work.
Even Microsoft's existing VC2010 does not work if you compile the application with /MD or /MDd.
I'll retest just to be sure, but I'm fairly sure that some of my tests have used those switches.
Note: char, wchar_t, char16_t and char32_t are much more than basic types that can be distinguished; they bring character information with them.
If you want to represent a UTF-16 or UTF-32 code unit, just use uint16_t or uint32_t, as for example ICU does for UChar and Qt does for QChar, but this isn't something that is supposed to work with the standard library in places where characters are expected.
Ummmm? Did you look at the attachment? That's what it does. Uses uint16_t and uint32_t if the compiler does not supply the new character types, otherwise just uses the supplied standard library unchanged.
Also for Filesystem? Please, don't try to make it more complicated than it is now.
There is no change to the filesystem interface; char16_t and char32_t were designed into V3 right from the beginning. It is just a case of adding char16_t/char32_t overloads to some implementation code. (That may not be entirely correct for POSIX systems when the native char encoding for filenames is not UTF-8. I'm just about to work that out.)
You want to make boost.filesystem better? Make it use UTF-8 on Windows by default and drop all "wide-crap" (sorry windows users).
Well, that certainly would be exciting:-) But more seriously, Boost.Filesystem and the C++ standard library are designed to work with native encodings as well as UTF-8, UTF-16, and UTF-32. Users will do what best serves their interests, and that means a plethora of encodings for years and years to come. Get over it.
All operating systems around (with one exception) use char * API and one operating system uses utf-16/wchar_t API.
So adding an arbitrary character type that no operating system uses seems to be a waste of effort.
The issue isn't what the operating system uses, it is what users want and the standard library demands. We are moving from a C++ world that only supports char and wchar_t to a C++ world that supports char, wchar_t, char16_t, and char32_t.
I **personally** don't see any benefit in adding char16_t/char32_t emulation to Boost, and especially to Boost.Filesystem.
Today Boost.Filesystem has enough problems besides char16_t/char32_t.
Lack of char16_t/char32_t support is seen as a problem by some users, plus I'm working on that portion of the code now in an effort to clear tickets related to locale, codecvt, and character encoding issues.
--Beman
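The remark that support is "just a case of adding char16_t/char32_t overloads to some implementation code" can be sketched as follows. This is not Boost.Filesystem code: utf16_to_utf8 and native_name are hypothetical names, a C++11 compiler is assumed, and the sketch assumes the native narrow encoding is UTF-8, which, as noted above, may not hold on every POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
inline std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size()
                 && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (high || low) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical entry points: the existing char overload plus a new
// char16_t overload that funnels into the same narrow representation.
inline std::string native_name(const std::string& s)    { return s; }
inline std::string native_name(const std::u16string& s) { return utf16_to_utf8(s); }

inline void check_conversion()
{
    assert(native_name(u"a.txt") == "a.txt");
    assert(native_name(u"\u20AC") == "\xE2\x82\xAC");  // EURO SIGN
}
```

The public interface stays the same in this sketch; only an extra overload and a conversion step are added behind it.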

On 07/20/2011 10:27 PM, Beman Dawes wrote:
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
This header adds stuff to namespace std. That's not allowed, and doesn't even work with some standard library implementations.

On Wed, Jul 20, 2011 at 8:37 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 07/20/2011 10:27 PM, Beman Dawes wrote:
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
This header adds stuff to namespace std.
That's not allowed,
What's being added is stuff that is in fact specified in the standard, or as near to it as the compiler will permit.
and doesn't even work with some standard library implementations.
As those cases arise, the #if's will be expanded as needed. The idea is to do what is possible, allowing most non-conforming and legacy compilers and libraries to get more mileage out of Unicode encodings. It won't be perfect, and the typedef technique isn't pretty, but it does seem to solve a problem real users are asking be solved. --Beman

On 21/07/2011 04:07, Beman Dawes wrote:
The idea is to do what is possible, allowing most non-conforming and legacy compilers and libraries to get more mileage out of Unicode encodings. It won't be perfect, and the typedef technique isn't pretty, but it does seem to solve a problem real users are asking be solved.
What's wrong with defining boost::char32 as a typedef of char32_t if it is available and uint_least32_t otherwise?

On Thu, Jul 21, 2011 at 5:00 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 21/07/2011 04:07, Beman Dawes wrote:
The idea is to do what is possible, allowing most non-conforming and legacy compilers and libraries to get more mileage out of Unicode encodings. It won't be perfect, and the typedef technique isn't pretty, but it does seem to solve a problem real users are asking be solved.
What's wrong with defining boost::char32 as a typedef of char32_t if it is available and uint_least32_t otherwise?
Very little:-) It is a bit more conservative, more in line with how boost usually handles such things, and what we are likely to do in the end.
But consider this... The two most widely used compilers, GCC and VC++, are already shipping with char16_t and char32_t in the global namespace. GCC via actual keywords, VC++ via typedefs. So other compilers are bound to soon follow. Thus it seems a bit of a detour for boost to introduce boost::char16_t/boost::char32_t.
--Beman

On 07/21/2011 04:58 PM, Beman Dawes wrote:
But consider this... The two most widely used compilers, GCC and VC++, are already shipping with char16_t and char32_t in the global namespace. GCC via actual keywords, VC++ via typedefs. So other compilers are bound to soon follow. Thus it seems a bit of a detour for boost to introduce boost::char16_t/boost::char32_t.
Note that you can't have boost::char16_t, since char16_t is a keyword.

On Thu, Jul 21, 2011 at 5:35 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 07/21/2011 04:58 PM, Beman Dawes wrote:
But consider this... The two most widely used compilers, GCC and VC++, are already shipping with char16_t and char32_t in the global namespace. GCC via actual keywords, VC++ via typedefs. So other compilers are bound to soon follow. Thus it seems a bit of a detour for boost to introduce boost::char16_t/boost::char32_t.
Note that you can't have boost::char16_t, since char16_t is a keyword.
Right, instead you have to write BOOST_CHAR16_T, which is defined as boost::char16_t or char16_t depending on availability of C++0x or Microsoft char16_t. Ugly. Ugly. Ugly. The filesystem path POSIX implementation of the char16_t/char32_t support is underway, still using the header I posted. No problems so far. --Beman

On 07/22/2011 04:13 AM, Beman Dawes wrote:
Right, instead you have to write BOOST_CHAR16_T, which is defined as boost::char16_t or char16_t depending on availability of C++0x or Microsoft char16_t. Ugly. Ugly. Ugly.
Or, as I said, you use boost::char16, which is a typedef to char16_t if available. No macro, no ugliness, portable and safe.

On Fri, Jul 22, 2011 at 5:03 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 07/22/2011 04:13 AM, Beman Dawes wrote:
Right, instead you have to write BOOST_CHAR16_T, which is defined as boost::char16_t or char16_t depending on availability of C++0x or Microsoft char16_t. Ugly. Ugly. Ugly.
Or, as I said, you use boost::char16, which is a typedef to char16_t if available.
No macro, no ugliness, portable and safe.
Ah! Sorry, I missed the lack of _t. Yes, that would be a better approach. --Beman
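The alias approach agreed on here can be sketched as below. The namespace sketch, the macro SKETCH_HAS_NATIVE_UNICODE_CHARS, and the fallback integer types are assumptions for illustration; a real header would use the library's own configuration macros and uint_least16_t/uint_least32_t as suggested earlier in the thread.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical feature test for this sketch only.
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1900)
#  define SKETCH_HAS_NATIVE_UNICODE_CHARS 1
#endif

namespace sketch {
#ifdef SKETCH_HAS_NATIVE_UNICODE_CHARS
    // Native types are available: the aliases simply name them.
    typedef char16_t char16;
    typedef char32_t char32;
#else
    // Fallback stand-ins: at least 16 and at least 32 bits respectively.
    // Note that std::char_traits is not guaranteed for these types.
    typedef unsigned short char16;
    typedef unsigned long  char32;
#endif
    typedef std::basic_string<char16> u16string;
    typedef std::basic_string<char32> u32string;
}

inline void check_aliases()
{
    sketch::u16string s(3, static_cast<sketch::char16>(0x41));
    assert(s.size() == 3);
    assert(sizeof(sketch::char16) * CHAR_BIT >= 16);
    assert(sizeof(sketch::char32) * CHAR_BIT >= 32);
}
```

Because the alias names lack the _t suffix, they never collide with the char16_t keyword, nothing is added to namespace std, and no macro is needed at the point of use.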

I'm starting to fold in Boost.Filesystem support for the new C++0x character types. Support is emulated for compilers and standard libraries not currently supporting the 0x features. The emulation is working fine, with tests passing on Windows for GCC 4.5 and 4.6, and VC++ 8, 9, and 10. Haven't tested on non-Windows systems yet.
This is the same emulation approach Microsoft ships in VC++ 10.
It seems to me that all Boost libraries that want to emulate these 0x features should use a unified approach. Otherwise we could get into a situation where libraries A and B worked fine in isolation, but had symbol or ODR clashes when used together.
The header I'm using is attached. I propose to place this in <boost/string_0x.hpp> rather than, say, <boost/filesystem/detail/string_0x.hpp>, and providing a simple doc page.
Does this make sense?
Yes but..... won't there be (possibly linker) errors from those std::string specializations? My gut feeling is that at the very least, some kind of basic_string concept check is required for those new string types. John.
participants (6)
- Antony Polukhin
- Artyom Beilis
- Beman Dawes
- John Maddock
- Joshua Boyce
- Mathias Gaunard