
Don G wrote:
Hi Erik,
I thought I would jump in with some small observations:
That's what I'm here for! :)
You do care about the representation when communicating with system APIs or writing data to networks or files. For example, if UTF-32 were the chosen representation, some programmers would be constantly converting to UTF-16 to call the system, and vice versa if UTF-16 were chosen where the system wants something else.
Yes. This is correct, but conversion to and from the native string type (usually UTF-16) should be abstracted away by the library, through some get_native_string() function in the string class. The casual user should not need to do this him-/herself. That's how I feel, anyway.
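A minimal sketch of what such a get_native_string() might do, assuming a hypothetical unicode_string that stores UTF-32 code points internally and a platform whose native type is plain UTF-16 (std::u16string here, not a BSTR or CFString). Code points above U+FFFF become surrogate pairs; lone surrogates and values past U+10FFFF are rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class: stores UTF-32 internally and converts to UTF-16
// only when the caller asks for the "native" representation.
class unicode_string {
public:
    explicit unicode_string(std::u32string code_points)
        : data_(std::move(code_points)) {}

    // Length in code points, independent of the native encoding.
    std::size_t length() const { return data_.size(); }

    // Assumption: the native string type on this platform is UTF-16.
    std::u16string get_native_string() const {
        std::u16string out;
        out.reserve(data_.size());
        for (char32_t cp : data_) {
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                throw std::invalid_argument("invalid Unicode code point");
            if (cp < 0x10000) {
                // BMP code point: a single UTF-16 unit.
                out.push_back(static_cast<char16_t>(cp));
            } else {
                // Supplementary code point: encode as a surrogate pair.
                cp -= 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
        }
        return out;
    }

private:
    std::u32string data_;
};
```

With this design the caller writes something like `s.get_native_string()` right before a system call and never touches surrogates directly; the cost is one conversion per call, which is exactly the overhead Don raises next.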
Here again, the performance measure could easily be dominated by conversions to the underlying system's encoding, depending on the application.
Quite true. We will have to look into how big a problem this actually is.
Also, on some systems, particularly the Mac, the system not only has an encoding preference, it doesn't particularly like "wchar_t *" either. On the Mac, most text is a CFString (a handle of sorts to the text). On Windows, you encounter BSTRs as well.
Yep. The idea is that all of this would be wrapped in the get_native_string() function mentioned above. Implementing that function for every platform in use today will of course take some work, but I think it will be worth it.
In my not-nearly-as-thought-out work on this, I decided to make the default encoding platform-specific to eliminate the enormous number of conversions that might otherwise be needed. For example, on the Mac, I had an allocator-like strategy that allowed all unicode_strings to be backed by a CFString. There was a get_native() method that returned a platform-specific value (documented on a per-platform basis) to allow platform-specific code to work more efficiently.
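One way to picture that allocator-like strategy is a storage-policy template parameter whose default is chosen per platform, with get_native() handing back the stored representation without conversion. This is a sketch under assumptions, not Don's actual code: the names utf16_storage, utf32_storage, default_storage and basic_unicode_string are hypothetical, the _WIN32 test is just one plausible selection rule, and a Mac policy wrapping a CFStringRef is left out because it needs CoreFoundation.

```cpp
#include <string>
#include <utility>

// Hypothetical storage policies: each fixes the internal encoding and the
// type that get_native() returns to platform-specific code.
struct utf16_storage {
    using storage_type = std::u16string;
    using native_type = const std::u16string&;
};

struct utf32_storage {
    using storage_type = std::u32string;
    using native_type = const std::u32string&;
};

// Assumption: UTF-16 by default on Windows, UTF-32 elsewhere. A Mac policy
// backed by a CFStringRef would be selected here as well (not shown).
#if defined(_WIN32)
using default_storage = utf16_storage;
#else
using default_storage = utf32_storage;
#endif

template <class Storage = default_storage>
class basic_unicode_string {
public:
    using storage_type = typename Storage::storage_type;
    using native_type = typename Storage::native_type;

    explicit basic_unicode_string(storage_type s) : data_(std::move(s)) {}

    // Returns the underlying representation with no conversion; its type
    // is documented per policy, i.e. per platform.
    native_type get_native() const { return data_; }

private:
    storage_type data_;
};
```

The trade-off is the one discussed here: code that sticks to the portable interface pays no conversion cost on its home platform, while code that calls get_native() becomes tied to whichever policy is the default there.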
Yep. That's basically what the library does already, except that it doesn't use the native type behind the scenes.
Just some thoughts...
Well appreciated.
Best, Don
- Erik