RE: [boost] Re: Re: New design proposal for boost::filesystem

Hi,

I would prefer to use the standard types like std::string and char * as well. However, I do not much like the idea of a fixed encoding, UTF-8 or something else. It is possible to use any encoding in std::string and even in std::wstring. What do you think about an imbuing, codecvt-like approach in boost::filesystem for the names of the files? It has a standard interface for the content of streams; it could serve the filenames as well. Not all platforms support UTF-8 filenames, so sometimes it could serve as name mangling support.

Ferda
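To make the codecvt idea a little more concrete, here is a minimal sketch of how a file name could be run through the codecvt facet of an imbued locale. The helper name widen_name is hypothetical and is not part of boost::filesystem; the sketch only assumes the standard std::codecvt<wchar_t, char, std::mbstate_t> facet that stream buffers already use.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT a boost::filesystem function): widen a narrow
// file name using the codecvt facet of whatever locale the caller supplies.
std::wstring widen_name(const std::string& name, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (name.empty())
        return std::wstring();

    std::mbstate_t state = std::mbstate_t();
    // One narrow byte never yields more than one wide character here,
    // so name.size() is a safe upper bound for the output buffer.
    std::wstring out(name.size(), L'\0');
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r = cvt.in(
        state,
        name.data(), name.data() + name.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("file name is not valid in this locale's encoding");

    // With an "ok" result and a large enough buffer, all input was consumed.
    assert(from_next == name.data() + name.size());
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Called as widen_name("file.txt", std::locale::classic()), this should give back the same name as a wide string; a locale with a different codecvt facet would change how non-ASCII bytes are interpreted, which is the point of the proposal.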
[mailto:boost-bounces@lists.boost.org] On Behalf Of Patrick Bennett
Correct (kind of), but I'd far prefer that std::string be used than for some completely new type to be defined. For users of boost::filesystem, I can't personally think of a time when a user would need to iterate the paths or files a character at a time. Because of UTF-8's nature, even if a user were to search for something like '/', it would still work with find, [], etc. UTF-8 maps to std::string extremely well. I think there is also a fair amount of precedent already set for using UTF-8 internally with std::string as the storage mechanism. UTF-8 strings don't contain embedded NULs (std::string still works for that, though), ASCII characters remain ASCII characters, and you can tell if you're in the middle of a multi-byte sequence.
Since we're talking about filesystem's inability to be used with internationalized applications, and you don't think UTF-8/std::string is the way to do it, what is your recommendation?
Cheers... Patrick Bennett
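The UTF-8 properties Patrick lists can be illustrated with a few lines of C++. This is only a sketch; the helper names (is_continuation_byte, leaf_of, sequence_start) are made up for illustration and do not come from boost::filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx),
// i.e. it sits in the middle of a multi-byte sequence.
inline bool is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: return the last path element, splitting on '/'.
// A plain byte-wise search is safe because every byte of a multi-byte
// UTF-8 sequence is >= 0x80 and so can never compare equal to '/' (0x2F).
inline std::string leaf_of(const std::string& utf8_path)
{
    std::string::size_type pos = utf8_path.rfind('/');
    return pos == std::string::npos ? utf8_path : utf8_path.substr(pos + 1);
}

// Hypothetical helper: given any byte index, step back to the first byte
// of the sequence that contains it.
inline std::size_t sequence_start(const std::string& utf8, std::size_t i)
{
    assert(i < utf8.size());
    while (i > 0 && is_continuation_byte(static_cast<unsigned char>(utf8[i])))
        --i;
    return i;
}
```

For example, with the path "d\xC3\xA9p\xC3\xB4t/\xC3\xA9trange.txt" (UTF-8 bytes written as escapes), leaf_of returns the bytes of "étrange.txt" without ever decoding a character.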

Ferdinand Prantl wrote:
Hi,
I would prefer to use the standard types like std::string and char * as well.
However, I do not like much the idea about a fixed encoding, UTF-8 or something else. It is possible to use any encoding in std::string and even in std::wstring.
What do you think about imbuing and codecvt-like approach in boost::filesystem for the names of the files?
I don't (personally) care for it. More and more libraries and standards are using UTF-8 (for good reason) these days. It's a nice, simple, and flexible encoding.
It has a standard interface for the content of streams, it could serve also the filenames. Not all platforms support UTF-8 filenames, sometimes it could serve as name mangling support.
Win32 doesn't support UTF-8 filenames natively. That's why boost::filesystem would have to convert to/from UCS-2 along Win32 interface boundaries. If you're concerned about other platforms, you shouldn't be. boost::filesystem currently works only with Latin encodings in ASCII strings, so no functionality would be taken away. The UTF-8 representation of ASCII strings is identical, so if you already use ASCII strings, nothing will change, and nothing will break. If you want your application to be runnable in multiple countries, though, it would require an operating system for which boost::filesystem has translations defined. Linux is UTF-8 natively (assuming the right environment variable is set), so boost::filesystem would just pass everything through as-is. The Win32 port would have to make some simple conversions (Windows even has built-in functions to perform this conversion). Other platforms might have to make other conversions to/from UTF-8, but assuming the platform supports Unicode at all, this is a no-brainer.
Patrick Bennett
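As a rough idea of what the "simple conversions" at the Win32 boundary involve, here is a portable sketch that decodes UTF-8 into 16-bit code units. The function name utf8_to_utf16 is hypothetical. On Windows itself the built-in MultiByteToWideChar with CP_UTF8 would presumably be used instead. Note two assumptions: the sketch emits UTF-16 surrogate pairs for code points above U+FFFF, which goes slightly beyond plain UCS-2, and it does not reject overlong encodings or other malformed-but-decodable input.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical boundary helper (NOT a boost::filesystem function):
// decode a UTF-8 byte string into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split into a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    assert(i == in.size());  // every byte was consumed exactly once
    return out;
}
```

ASCII input passes through unchanged apart from the wider code unit, which matches the point that existing ASCII-only users would see no difference.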

Win32 doesn't support UTF-8 filenames natively. That's why boost::filesystem would have to convert to/from UCS-2 along Win32 interface boundaries. If you're concerned about other platforms, you shouldn't be. boost::filesystem currently works only with Latin encodings in ASCII strings, so no functionality would be taken away.
Not true: currently the narrow character strings passed to Boost.Filesystem are assumed to be in that platform's native encoding. You can, for example, pass native Windows narrow character strings (not just ASCII ones) to the lib, and actually I don't see why you can't pass UTF-8 on Linux. What you can't do is use the same encoding on all platforms, because the underlying platform APIs won't understand them. Here's my (non-portable) test code, BTW:

    #include <boost/filesystem/operations.hpp>
    #include <boost/filesystem/fstream.hpp>
    #include <cassert>
    #include <tchar.h>   // _tmain, _TCHAR (Windows only)

    namespace fs = boost::filesystem;

    int _tmain(int argc, _TCHAR* argv[])
    {
        // Narrow string in the platform's native encoding, not just ASCII.
        const char* name = "étrange.txt";
        fs::path p(name, fs::native);

        // Create the file through the library's own stream type.
        fs::ofstream os(p);
        os << "Ha! Ha! Ha!";
        os.close();

        // The file should exist, be a non-empty regular file, and be removable.
        assert(fs::exists(p));
        assert(!fs::is_directory(p));
        assert(!fs::is_empty(p));
        assert(fs::file_size(p));
        assert(fs::remove(p));
        return 0;
    }

John.
Participants (3): Ferdinand Prantl, John Maddock, Patrick Bennett