
On Thu, Oct 27, 2011 at 3:31 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Oct 26, 2011 at 22:13, Beman Dawes <bdawes@acm.org> wrote:
On Wed, Oct 26, 2011 at 6:24 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
[...]
Even if you fix the Unicode problems,
What Unicode problems are you running into? Although there are some locale related tickets outstanding, I'm not aware of any Unicode issues.
1) The one that was brought up in the previous thread.
That resulted in a ticket being opened, to fix a problem specific to MinGW. It will get fixed as time permits.
2) The complexity of writing portable Unicode-aware code: currently you're forcing me to either a) use wstring on Windows, or, if I prefer to use my favorite portable UTF-8 encoded strings, b) write all the boilerplate code that passes codecvt everywhere as a parameter (see below why not imbue()).
In both cases you're shifting the complexity to the higher-level code. It's not a kind thing for you as a low-level library developer to do,
I don't know of any other viable approaches. I'm sorry you find the boilerplate objectionable, but I'm not about to change to a default that would enforce UTF-8 or any other particular narrow string encoding. That's a much wider problem than Boost.Filesystem.
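To make the "boilerplate" in point 2 concrete: the Windows wide-character API expects UTF-16, so code that keeps its strings in UTF-8 has to convert them somewhere before a path reaches the OS. The following is a minimal, standard-library-only sketch of that conversion step. The name utf8_to_utf16 is hypothetical and is not part of Boost.Filesystem; a codecvt facet passed to the library would do equivalent work internally. The sketch assumes well-formed input and only performs basic checks.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Boost API):
// decode a UTF-8 byte string and re-encode it as UTF-16 code units.
// Checks lead bytes, continuation bytes, truncation and the U+10FFFF limit;
// it does NOT reject overlong encodings or encoded surrogates.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the six payload bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring before being handed to a wide-character API. The disagreement in this thread is about who should own this step: the library, behind a UTF-8 default, or the calling code.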
The library is expected to hide the platform differences by providing a uniform interface.
Initially the plan was to provide a uniform interface in terms of both syntax and semantics. User reaction to uniform syntax was very positive, and I've tried to provide that to the maximum extent possible as far as the API goes. Uniform semantics turned out to be much more complex. Paths are one of the areas where acknowledging the difference between generic paths and native paths is something that users want and need.
⇒ Myth: Using the native encoding on each platform results in portable code.
Hum... I don't recall anyone ever claiming that "native encoding on each platform results in portable code". It is way more complex than that.
⇒ Use boost::filesystem::imbue to convert b to c. ‽ Who is responsible for calling imbue()? I'm writing library code. I'm not allowed to change the global state.
Right. Library code writers have to avoid changing global state if they want to keep users happy. Nothing unusual about Boost.Filesystem in that respect.
⇒ This code will break: int main(int argc, char* argv[]) { fs::ifstream fin(argv[1]); } ‽ It works fine for ASCII characters on all sane platforms. For non-ASCII, I don't care. It's already not Unicode-aware if the native encoding is not UTF-8 (which can't be so on Windows). If the writer of this code really cares about internationalization, she can use boost::program_options (assuming it's also changed to follow the UTF-8 convention). Otherwise she's a hypocrite.
I disagree with your assertion that the above code will break. It is in all essential aspects the same as: int main(int argc, char* argv[]) { std::ifstream fin(argv[1]); } While the results may not be what the coder expected, that's an issue well beyond the scope of the standard library or Boost.Filesystem.
⇒ UTF-8 is slow. ‽ Compared to what? You haven't measured this.
Actually, I have measured it many times, and never found UTF-8 to be a bottleneck on European, North American, or South American data sets. I haven't measured UTF-8 with Asian data sets; they tended to use other encodings.
Experience shows that the small overhead (if it's an overhead at all) is not the bottleneck. Many cross-platform libraries have already switched to UTF-8 for narrow chars (see one of the previous discussions for a list), and I don't see a reason why Boost can't be the next.
Boost libraries could use UTF-8 for narrow characters as a default, but only where they aren't interfacing with existing code and/or operating systems. --Beman