
-----Original Message----- On Behalf Of Ferdinand Prantl Sent: Tuesday, August 24, 2004 4:30 AM To: boost@lists.boost.org Subject: Re: [boost] Re: Re: New design proposal for boost::filesystem
I have nothing against usage of UTF-8 if it suits the scenario well. I just say that it is not an encoding for all purposes. It is a multibyte one and so extremely inefficient for getting size, searching, etc.
[Bennett, Patrick] I fail to see how this is the case. Right now, today, filesystem supports *only* ASCII. If you continued to use ASCII, nothing would change. There is zero speed penalty for calculating the # of *bytes* in a UTF-8 string. If you want to determine the # of characters, then there is, but then only if you're actually working on a Unicode string. There is no getting around this for an international application, no matter what encoding is used.
Why prescribe it for all boost::filesystem users and force them to put recoding into their sources, when it can be achieved inside boost::filesystem as it is done in std::streams? I would like to have boost::filesystem as flexible as possible. Someone can work with filenames in std::string in the current locale, in UTF-8 or a different locale; someone else can use std::wstring with UCS-2 or UTF-16, etc. The question is whether such flexibility is so rarely needed that it rather spoils the interface. I don't think so.
[Bennett, Patrick] Absolutely no recoding would be necessary for current users of boost::filesystem. boost::filesystem has no support for Unicode today, so why would they have to recode anything?
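To make the byte-count versus character-count point above concrete, here is a minimal sketch. The function names are made up for illustration and are not part of boost::filesystem, and the second function assumes its input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Number of bytes: constant time, the same cost for ASCII and UTF-8 data.
std::size_t byte_length(const std::string& s) {
    return s.size();
}

// Number of code points: a linear scan that counts every byte that is NOT
// a UTF-8 continuation byte (continuation bytes look like 10xxxxxx).
// Assumption: s holds well-formed UTF-8; nothing is validated here.
std::size_t code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For a pure ASCII string both functions return the same value; they only diverge once the string contains multibyte sequences.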
Win32 doesn't support UTF-8 filenames natively. That's why boost::filesystem would have to convert to/from UCS-2 along Win32 interface boundaries. If you're concerned about other platforms, you shouldn't be. boost::filesystem currently works only with latin encodings in ascii strings so no functionality would be taken away.
UTF-8 is not identical with the complete iso-8859-1 (latin1) codepage. Some code could be broken by accepting UTF-8 in the new version.
[Bennett, Patrick] Hmmm, good point, but... would it break for any of the characters that are valid characters for a path or filename on an 8859-1 system? No, not that I can think of.
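A small sketch of where the two encodings agree and where they differ, assuming the input really is ISO-8859-1. The helper name is hypothetical. Bytes below 0x80 pass through unchanged, while every byte from 0x80 upward becomes a two-byte UTF-8 sequence, so a Latin-1 filename containing such a byte is not valid UTF-8 as it stands.

```cpp
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte value is also its
// Unicode code point, so only the UTF-8 encoding step is needed.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```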
Linux can be configured to support UTF-8 natively. However, it is not necessary and depends on your locale installation and configuration.
By imbuing I meant the conversion "application filename encoding" -> "machine filename encoding". Instead of wrapping platform-dependent translation code in conditionals, one could simply say "I am running in UTF-8, boost::filesystem, please understand it and do the system translation for me". Exceptions could sort out incompatibilities.
boost::filesystem::imbue("UTF-8"); // more abstract than codecvt pseudocode :-)
In this example an internal conversion into UCS-2 would be done on Windows; on Linux it would depend on the configured locale; and on other systems, which might support ASCII only, it would convert into ASCII only. However, it does not prevent the application from running wholly in wchar_t (e.g. UCS-2) or char (UTF-8 or something else), nor does it force the user to write extra code for character conversion when it is not necessary.
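As a rough sketch of the kind of conversion such a boundary layer might perform on Windows, the following decodes UTF-8 and re-encodes it as UTF-16, throwing on malformed input. Everything here is an assumption for illustration: the function name is invented, it targets UTF-16 rather than strict UCS-2, it needs C++11 for std::u16string, and it is not how boost::filesystem is implemented.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical boundary helper: UTF-8 in, UTF-16 out.
// Throws std::invalid_argument when the input is not well-formed UTF-8.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogate code points, and out-of-range values.
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The exception path is where a Latin-1 string passed off as UTF-8 would typically be caught: a lone byte such as 0xE9 looks like the start of a three-byte sequence and is rejected as truncated.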
[Bennett, Patrick] If you can think of a good way of handling this that doesn't involve a mess of codepages, locales, and facets, then I'm all for it. Frankly I think C++'s 'built-in' internationalization support is a nightmare, but that's probably just me. My (intentionally) limited exposure to it probably hasn't helped. It's hard to beat having a 'single' encoding like UTF-8 that can handle all defined characters. Unicode is definitely the way to go IMO.
My real issue with boost::filesystem is that, as currently defined, it's unusable in an application that will be used around the world. My initial response to this whole thread was just to point out to David that there *are* issues preventing people from using the library. He didn't think there were any, so I was compelled to point out what one of the issues was, for me at least.
At the company where I work we're currently just pursuing our own wrappers for what filesystem provides. I originally tried using filesystem, but once I saw that its handling of internationalization was absent, I had no choice but to dump it. I certainly have an interest in it being improved, and I could see looking at it again, but someone will have to spearhead that initiative.
That this hasn't really been brought up before tells me that people either aren't using the library, or simply don't care about internationalization (probably the latter). I, unfortunately, don't have that luxury.
Cheers... Patrick Bennett