RE: [boost] Re: Re: New design proposal for boost::filesystem

24 Aug 2004

...
-----Original Message-----
On Behalf Of Ferdinand Prantl
Sent: Tuesday, August 24, 2004 4:30 AM
To: boost@lists.boost.org
Subject: Re: [boost] Re: Re: New design proposal for boost::filesystem
I have nothing against usage of UTF-8 if it suits the scenario well. I
just say that it is not an encoding for all purposes. It is a multibyte
one and so extremely inefficient for getting size, searching, etc.
...
Why prescribe it for all boost::filesystem users and force them to put
recoding into their sources, when it can be achieved inside
boost::filesystem as it is done in std::streams?

[Bennett, Patrick] I fail to see how this is the case.  Right now,
today, filesystem supports *only* ASCII.  If you continued to use ASCII,
nothing would change.  There is zero speed penalty for calculating the
number of *bytes* in a UTF-8 string.  If you want to determine the
number of characters, then there is, but only if you're actually working
on a Unicode string.  There is no getting around this for an
international application, no matter what encoding is used.

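The byte-count versus character-count distinction above can be sketched roughly as follows. The function names are made up for illustration and are not part of boost::filesystem, and the second function assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Number of bytes: constant time, exactly as for any other std::string.
std::size_t utf8_byte_length(const std::string& s) {
    return s.size();
}

// Number of code points: linear time. Assumes `s` is valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte
// that is NOT a continuation byte starts a new code point.
std::size_t utf8_code_point_count(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a pure ASCII path both functions return the same value, so the linear scan only matters once non-ASCII characters are actually present.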
I would like to have the boost filesystem as flexible as possible.
Someone can work with filenames in std::string in the current locale, in
UTF-8 or a different locale, someone can use std::wstring with UCS-2 or
UTF-16, etc. The question is whether such flexibility is so rarely
needed that it rather spoils the interface. I don't think so.

[Bennett, Patrick] Absolutely no recoding would be necessary for current
users of boost::filesystem.  boost::filesystem has no support for
Unicode today, so why would they have to recode anything?

...
Win32 doesn't support UTF-8 filenames natively. That's why
boost::filesystem would have to convert to/from UCS-2 along Win32
interface boundaries.
If you're concerned about other platforms, you shouldn't be.
boost::filesystem currently works only with Latin encodings in ASCII
strings so no functionality would be taken away.
UTF-8 is not identical to the complete ISO-8859-1 (Latin-1) code page.
Some code could be broken by accepting UTF-8 in the new version.
[Bennett, Patrick] Hmmm, good point, but... would it break for any of
the characters that are valid characters for a path or filename on an
8859-1 system?  No, not that I can think of.
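
To make the Latin-1 question concrete, here is a small sketch with hypothetical helper names (nothing here is boost::filesystem API). It relies only on two facts: 7-bit ASCII bytes are encoded identically in ISO-8859-1 and UTF-8, and Latin-1 byte values equal the Unicode code points U+0000 to U+00FF, so bytes of 0x80 and above need two bytes in UTF-8.

```cpp
#include <string>

// True if every byte is 7-bit ASCII. Such a string is byte-for-byte
// identical whether it is read as ISO-8859-1 or as UTF-8.
bool is_ascii(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) >= 0x80) {
            return false;
        }
    }
    return true;
}

// Re-encode an ISO-8859-1 string as UTF-8. Bytes below 0x80 are copied
// unchanged; bytes 0x80..0xFF become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

A filename for which is_ascii() returns true would be unaffected by a switch to UTF-8; any other Latin-1 filename would have to pass through something like latin1_to_utf8() first.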
...
Linux can be configured to support UTF-8 natively. However, it is not
necessary and depends on your locale installation and configuration.
By imbuing I meant the conversion "application filenames encoding" ->
"machine filenames encoding". Instead of putting platform-dependent
code that does the translation into conditionals, one could simply say
"I am running in UTF-8, boost::filesystem, please understand it and do
the system translation for me". Exceptions could sort out
incompatibilities.
boost::filesystem::imbue("UTF-8"); // more abstract than codecvt pseudocode :-)
In this example an internal conversion into UCS-2 would be done on
Windows, on Linux it would depend on the configured locale, and on the
other systems, which could support ASCII only, it would convert into
ASCII only. However, it does not constrain the application from running
wholly in wchar_t (e.g. UCS-2) or char (UTF-8 or something else), nor
does it force the user to write extra code for character conversion if
it is not necessary.
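
One possible reading of the imbue idea, as a rough C++11 sketch: the application names its encoding once, and conversion happens at the system boundary, with exceptions reporting incompatibilities. The class path_encoding and the function utf8_to_utf16 are hypothetical names invented here, not boost::filesystem API. The sketch emits UTF-16 (a superset of UCS-2) and only covers what a Windows-style wide-character back end might need.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-16 code units; throws on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

// Hypothetical stand-in for the proposed imbue(): the encoding is named
// once, and incompatibilities surface as exceptions.
class path_encoding {
public:
    explicit path_encoding(const std::string& name) : utf8_(name == "UTF-8") {
        if (!utf8_ && name != "ASCII")
            throw std::invalid_argument("unsupported encoding: " + name);
    }

    // What a wide-character back end might receive for `path`.
    std::u16string to_utf16(const std::string& path) const {
        if (!utf8_) {
            for (std::string::size_type i = 0; i < path.size(); ++i) {
                if (static_cast<unsigned char>(path[i]) >= 0x80)
                    throw std::invalid_argument("non-ASCII byte in ASCII path");
            }
        }
        return utf8_to_utf16(path);  // ASCII is a subset of UTF-8
    }

private:
    bool utf8_;
};
```

An application would construct path_encoding("UTF-8") once and call to_utf16() wherever a path crosses into the platform layer, so the conversion code does not leak into user sources.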
[Bennett, Patrick] If you can think of a good way of handling this that
doesn't involve a mess of codepages, locales, and facets, then I'm all
for it.  Frankly I think C++'s 'built-in' internationalization support
is a nightmare, but that's probably just me.  My (intentional) limited
exposure to them probably hasn't helped.  It's hard to beat having a
'single' encoding like UTF-8 that can handle all defined characters.
Unicode is definitely the way to go IMO.

My real issue with boost::filesystem is that as currently defined, it's
unusable in an application that will be used around the world.  My
initial response to this whole thread was just to point out to David
that there *are* issues preventing people from using the library.  He
didn't think there were any, so I was compelled to point out what one of
the issues was for me at least.

At the company where I work we're currently just pursuing our own
wrappers for what filesystem provides.  I originally tried using
filesystem, but once I saw that its handling of internationalization
was absent, I had no choice but to dump it.  I certainly have an
interest in it being improved, and I could see looking at it again, but
someone will have to spearhead that initiative.  Considering that this
hasn't really been brought up before tells me that people either aren't
using the library, or simply don't care about internationalization
(probably the latter).  I, unfortunately, don't have that luxury.

Cheers...
Patrick Bennett