
At 12:49 AM 2/16/2004, Walter Landry wrote:
Beman Dawes <bdawes@acm.org> wrote:
At 12:19 PM 2/15/2004, Paul Miller wrote:
Presumably Linux still works with multi-byte characters.
Is there progress toward a wchar_t-aware path?
Yes. I now have the outline of a design for the internationalization of
Boost.Filesystem paths.
Care to share? I'm curious how you handle some of the legacy Japanese encodings.
The framework looks something like this:

There are internal representation types such as char, wchar_t, or user-defined character types meeting std::string requirements. These are handled by path, wpath, or a basic_path class template, respectively. The encodings of char and wchar_t are, of course, defined by the compiler. The encodings of UDTs are defined by their implementations.

There is usually one external representation type, though there are exceptions. Each representation type may support multiple external path name encodings, including user-defined encodings, subject to the operating system's encoding limitations.

There will be a locale-based (i.e. codecvt) mechanism for converting between the internal representation type and encoding and the external representation type and encoding. The mechanisms for default and explicit locale operations will presumably be modeled on those of I/O streams.

So handling the legacy Japanese encodings works like this: the programmer selects an internal type and encoding that can represent those external types and encodings. This might be wchar_t, or it might be some UDT. The external type and encoding are presumably the operating system's default. The default locale mechanism will provide the codecvt facet to handle the conversions. So on a Japanese O/S, the external representation may be one of the legacy encodings. If it is, the correct conversions will take place.

Does that make sense?

--Beman
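The internal/external split described above can be illustrated with standard C++ alone. This is a minimal sketch under stated assumptions and not the Boost.Filesystem design. The names basic_path_sketch, path_sketch, wpath_sketch and to_internal are hypothetical. The only real machinery used is the std::codecvt<wchar_t, char, std::mbstate_t> facet obtained from a std::locale. The conversion assumes that the external name is a narrow multibyte string in the locale's encoding and that it never decodes to more wide characters than it has bytes.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::locale, std::codecvt, std::use_facet
#include <stdexcept>  // std::runtime_error
#include <string>
#include <utility>    // std::move

// Hypothetical sketch, not the Boost.Filesystem API: a path class template
// parameterized on its internal string (representation) type.
template <class String>
class basic_path_sketch {
public:
    using string_type = String;
    explicit basic_path_sketch(string_type s) : s_(std::move(s)) {}
    const string_type& string() const { return s_; }
private:
    string_type s_;
};

using path_sketch  = basic_path_sketch<std::string>;   // internal type: char
using wpath_sketch = basic_path_sketch<std::wstring>;  // internal type: wchar_t

using wide_codecvt = std::codecvt<wchar_t, char, std::mbstate_t>;

// Convert an external (narrow, multibyte) path name to the internal wchar_t
// form, using whatever codecvt facet the given locale supplies.
std::wstring to_internal(const std::string& ext, const std::locale& loc) {
    if (ext.empty()) return std::wstring();
    const wide_codecvt& cvt = std::use_facet<wide_codecvt>(loc);
    std::mbstate_t state{};
    // Assumption: each wide character consumes at least one external byte,
    // so ext.size() wide characters is always enough room.
    std::wstring out(ext.size(), L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    std::codecvt_base::result r =
        cvt.in(state, ext.data(), ext.data() + ext.size(), from_next,
               &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("path name is not valid in this locale's encoding");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));  // keep only converted chars
    return out;
}
```

Passing std::locale::classic() converts plain ASCII names. A locale for a legacy Japanese encoding could be passed instead, where the system provides one. Locale names such as "ja_JP.eucJP" are platform-dependent. With such a locale, the same call would decode multibyte names into wpath_sketch's internal form. The reverse direction would mirror this with codecvt::out, followed by unshift() for stateful encodings such as ISO-2022-JP.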