
Hi Patrick,
Patrick Bennett wrote:
Ferdinand Prantl wrote:
What do you think about an imbuing, codecvt-like approach in boost::filesystem for the names of the files?
I don't (personally) care for it. More and more libraries and standards are using UTF-8 (for good reason) these days. It's a nice, simple, and flexible encoding.
I have nothing against using UTF-8 where it suits the scenario. I am only saying that it is not an encoding for all purposes. It is a multibyte encoding, so operations such as getting the length in characters or searching are inefficient. Why prescribe it for all boost::filesystem users and force them to put recoding into their sources, when it can be achieved inside boost::filesystem, as it is done in std::streams? I would like boost::filesystem to be as flexible as possible. Someone can work with filenames in std::string in the current locale, in UTF-8 or in a different locale; someone else can use std::wstring with UCS-2 or UTF-16, etc. The question is whether such flexibility is needed so rarely that it would only clutter the interface. I don't think so.
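To illustrate the point about getting the size: in a multibyte encoding the byte count and the character count differ, so counting characters needs a pass over the whole string. A minimal sketch, assuming valid UTF-8 input (the name utf8_length is made up for this example and is not part of boost::filesystem):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes the input is valid
// UTF-8; no validation is done.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For pure ASCII names s.size() and utf8_length(s) agree, while for a name such as "été" encoded in UTF-8 size() reports 5 bytes and utf8_length reports 3 code points.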
Win32 doesn't support UTF-8 filenames natively. That's why boost::filesystem would have to convert to/from UCS-2 along Win32 interface boundaries. If you're concerned about other platforms, you shouldn't be. boost::filesystem currently works only with latin encodings in ascii strings so no functionality would be taken away.
UTF-8 is not identical to the complete ISO-8859-1 (Latin-1) code page; only the ASCII range matches. Code that passes Latin-1 names today could be broken if the new version interpreted them as UTF-8.
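The mismatch is easy to see with a single character: 'é' is the one byte 0xE9 in Latin-1, but that byte alone is not valid UTF-8, where 'é' is the two bytes 0xC3 0xA9. A minimal sketch of the recoding an existing Latin-1 application would need (latin1_to_utf8 is a made-up name for illustration, not an existing Boost function):

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 to UTF-8.
// Bytes below 0x80 (ASCII) are copied unchanged; bytes 0x80..0xFF map to
// code points U+0080..U+00FF and become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
        }
    }
    return out;
}
```

ASCII-only names pass through unchanged, which is why only names using the upper half of Latin-1 are affected.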
The UTF-8 representation of ASCII strings is identical, so if you already use ASCII strings, nothing will change and nothing will break. If you want your application to be runnable in multiple countries, though, it would have to run on an operating system for which boost::filesystem has translations defined. Linux is UTF-8 natively (assuming the right environment variable is set), so boost::filesystem would just pass everything through as-is. The Win32 port would have to make some simple conversions (Windows even has built-in functions to perform this conversion). Other platforms might have to make other conversions to/from UTF-8, but assuming that the platform supports Unicode at all, this is a no-brainer.
Linux can be configured to support UTF-8 natively. However, this is not required; it depends on your locale installation and configuration. By imbuing I meant the conversion "application filename encoding" -> "machine filename encoding". Instead of wrapping platform-dependent translation code in conditionals, one could simply say "I am running in UTF-8; boost::filesystem, please understand it and do the system translation for me". Exceptions could sort out incompatibilities.
    boost::filesystem::imbue("UTF-8"); // more abstract than codecvt pseudocode :-)
In this example an internal conversion into UCS-2 would be done on Windows; on Linux it would depend on the configured locale; and on other systems, which might support ASCII only, it would convert into ASCII only. However, it does not constrain the application from running wholly in wchar_t (e.g. UCS-2) or char (UTF-8 or something else), nor does it force the user to write extra code for character conversion when it is not necessary.
Ferda
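The imbue idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the namespace fs_sketch, the encoding enum, imbue() and to_utf16() are all hypothetical names, not the boost::filesystem API, and the "machine" side is assumed to be a UTF-16 platform such as Win32. Incompatible input is reported through exceptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace fs_sketch {  // hypothetical, NOT boost::filesystem

enum class encoding { ascii, utf8 };

// Encoding the application declares for its file names (default: ASCII).
inline encoding& app_encoding() {
    static encoding e = encoding::ascii;
    return e;
}

// "I am running in this encoding, please translate for me."
inline void imbue(encoding e) { app_encoding() = e; }

// Converts an application file name to UTF-16 code units.
// Throws std::runtime_error on bytes that do not fit the imbued encoding.
// Simplification: overlong forms and encoded surrogates are not rejected.
inline std::u16string to_utf16(const std::string& name) {
    std::u16string out;
    for (std::size_t i = 0; i < name.size();) {
        unsigned char b = static_cast<unsigned char>(name[i]);
        if (b < 0x80) {  // ASCII is the same in both encodings
            out.push_back(static_cast<char16_t>(b));
            ++i;
            continue;
        }
        if (app_encoding() == encoding::ascii)
            throw std::runtime_error("non-ASCII byte in ASCII file name");

        std::size_t extra;  // number of continuation bytes expected
        char32_t cp;        // code point being assembled
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (name.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(name[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

}  // namespace fs_sketch
```

An application would call fs_sketch::imbue(fs_sketch::encoding::utf8) once at startup and then pass its UTF-8 names unchanged; on a platform whose native encoding is already UTF-8 the same call could make the conversion a pass-through.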
Patrick Bennett