[filesystem] windows/posix inconsistencies.

Hi Beman,

I noticed the following in v3 while hunting down a bug in my unit test:

    # ifdef BOOST_WINDOWS_API
        const std::string string() const { return string(codecvt()); }
        const std::string string(const codecvt_type& cvt) const;

        // string_type is std::wstring, so there is no conversion
        const std::wstring& wstring() const { return m_pathname; }
        const std::wstring& wstring(const codecvt_type&) const { return m_pathname; }
    # else // BOOST_POSIX_API
        // string_type is std::string, so there is no conversion
        const std::string& string() const { return m_pathname; }
        const std::string& string(const codecvt_type&) const;

        const std::wstring wstring() const { return wstring(codecvt()); }
        const std::wstring wstring(const codecvt_type& cvt) const;
    # endif

I can understand that it is more efficient to return a reference when a reference can be formed. However, consider

    const std::string& ext = iter->path().extension().string();

This is fine on windows, as a temporary is returned, and its lifetime is extended; on linux, extension() returns a temp path object, from where we get a reference. I'm not a 100% sure, but I think the C++ standard does not guarantee that this path object be kept alive.

The net effect is a crash on linux.

I'm not sure it's a great idea to have different reference-ness on the return types here. It could lead to other subtle differences.

kind regards
-Thorsten
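The lifetime difference described above can be made visible without touching a dangling reference. The following is only a sketch: mock_path, by_reference(), by_value(), make_extension() and demo() are hypothetical names invented for illustration, not part of Boost.Filesystem. The class counts live objects so that the destruction of the temporary path can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical instrumented stand-in for a path; it counts live objects.
struct mock_path {
    static int live;            // number of mock_path objects currently alive
    std::string m_pathname;

    explicit mock_path(const std::string& s) : m_pathname(s) { ++live; }
    mock_path(const mock_path& other) : m_pathname(other.m_pathname) { ++live; }
    ~mock_path() { --live; }

    // POSIX-style signature: a reference into the object itself.
    const std::string& by_reference() const { return m_pathname; }
    // Windows-style signature: a fresh string returned by value.
    std::string by_value() const { return m_pathname; }
};
int mock_path::live = 0;

// Returns a temporary, much as path::extension() does.
inline mock_path make_extension() { return mock_path(".txt"); }

inline void demo() {
    // By value: the returned string temporary is bound to `ext` and its
    // lifetime is extended, although the mock_path temporary is already gone.
    const std::string& ext = make_extension().by_value();
    assert(mock_path::live == 0);
    assert(ext == ".txt");

    // By reference: safe only while the temporary path still exists, that is,
    // inside the same full-expression.
    const std::size_t n = make_extension().by_reference().size();
    assert(n == 4);
    assert(mock_path::live == 0);
    // Binding `const std::string& r = make_extension().by_reference();`
    // at this point would leave `r` dangling.
}
```

A portable way to write the original line is to copy into a std::string object instead of binding a reference; that is correct with either signature.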

[Thorsten Ottosen]
However, consider const std::string& ext = iter->path().extension().string(); This is fine on windows, as a temporary is returned, and its lifetime is extended; on linux, extension() returns a temp path object, from where we get a reference. I'm not a 100% sure, but I think the C++ standard does not guarantee that this path object be kept alive.
* The Standard requires the temporary path to be destroyed at the semicolon.
* Replace string with wstring, and this becomes problematic on Windows.
* Additionally, returning const values like const string/const wstring inhibits move semantics.
STL

On 31-01-2012 14:47, Stephan T. Lavavej wrote:
[Thorsten Ottosen]
However, consider
* Replace string with wstring, and this becomes problematic on Windows.
Yep.
* Additionally, returning const values like const string/const wstring inhibits move semantics.
True, but on the specific platform it would be faster to get a reference than to copy+move.
I guess with the new & (I don't know of any compilers that have implemented it yet) feature we could have these overloads in path:
    const string& string() const;
    string string() const &&;
so when a temporary path object is returned from path::extension(), the second overload is selected.
-Thorsten

On Tue, Jan 31, 2012 at 11:33 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I guess with the new & (I don't know of any compilers that have implemented it yet) feature we could have these overloads in path:
const string& string() const; string string() const &&;
so when a temporary path object is returned from path::extension(), the second overload is selected.
Interesting! I'm clueless about that use of &&. Need to do some reading. Thanks, --Beman

Beman Dawes wrote:
On Tue, Jan 31, 2012 at 11:33 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I guess with the new & (I don't know of any compilers that have implemented it yet) feature we could have these overloads in path:
const string& string() const; string string() const &&;
so when a temporary path object is returned from path::extension(), the second overload is selected.
Interesting! I'm clueless about that use of &&. Need to do some reading.
Clang 2.9 and 3.0 have this feature (rvalue references for *this). Regards, Michel

On Thu, Feb 2, 2012 at 9:30 AM, Michel Morin <mimomorin@gmail.com> wrote:
Beman Dawes wrote:
On Tue, Jan 31, 2012 at 11:33 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I guess with the new & (I don't know of any compilers that have implemented it yet) feature we could have these overloads in path:
const string& string() const; string string() const &&;
so when a temporary path object is returned from path::extension(), the second overload is selected.
Interesting! I'm clueless about that use of &&. Need to do some reading.
Clang 2.9 and 3.0 have this feature (rvalue references for *this).
Thanks! That makes it practical to consider rvalue references for *this use in the interface. --Beman

On 02.02.2012, at 15:18, Beman Dawes wrote:
On Tue, Jan 31, 2012 at 11:33 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I guess with the new & (I don't know of any compilers that have implemented it yet) feature we could have these overloads in path:
const string& string() const; string string() const &&;
so when a temporary path object is returned from path::extension(), the second overload is selected.
Interesting! I'm clueless about that use of &&. Need to do some reading.
I'm pretty sure that's ill-formed. The first overload must be
    const string& string() const &;
because you cannot overload solely based on "one version has a ref-qualifier, the other doesn't".
Sebastian
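A minimal sketch of the corrected overload pair, with both overloads ref-qualified. mock_path and make_extension() are hypothetical stand-ins, not Boost.Filesystem's interface. One deliberate difference from the signatures quoted in the thread: the rvalue overload here is `&&` rather than `const &&`, on the assumption that the point is to move the stored string out, which a const object would not allow.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a path class.
class mock_path {
public:
    explicit mock_path(std::string s) : m_pathname(std::move(s)) {}

    // Called on lvalues: hand out a reference, no copy.
    const std::string& string() const & { return m_pathname; }

    // Called on non-const rvalues (temporaries): return by value and
    // move the stored string out of the expiring object.
    std::string string() && { return std::move(m_pathname); }

private:
    std::string m_pathname;
};

// Returns a temporary, much as path::extension() does.
inline mock_path make_extension() { return mock_path(".txt"); }

// The overload chosen depends on the value category of the object expression.
static_assert(std::is_same<decltype(std::declval<const mock_path&>().string()),
                           const std::string&>::value,
              "lvalue object yields a reference");
static_assert(std::is_same<decltype(std::declval<mock_path>().string()),
                           std::string>::value,
              "rvalue object yields a value");
```

With this pair, `const std::string& ext = make_extension().string();` binds to a string returned by value, so the usual lifetime extension applies on every platform.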

On 31/01/2012 14:47, Stephan T. Lavavej wrote:
[Thorsten Ottosen]
However, consider const std::string& ext = iter->path().extension().string(); This is fine on windows, as a temporary is returned, and its lifetime is extended; on linux, extension() returns a temp path object, from where we get a reference. I'm not a 100% sure, but I think the C++ standard does not guarantee that this path object be kept alive.
* The Standard requires the temporary path to be destroyed at the semicolon.
* Replace string with wstring, and this becomes problematic on Windows.
* Additionally, returning const values like const string/const wstring inhibits move semantics.
I know I repeat myself too many times, but I can't resist! One of the issues I find with move semantics is that we sometimes assume returning by value is free. Returning objects with dynamic allocation by value prevents reusing those dynamic resources (especially in loops). I think returning by value has a nice syntax and nice properties (you don't need a default-constructed object when you want to create a new object instead of assigning to one), but IMHO it is premature pessimization.

Although returning vectors by value might be a good idea for factories, and such things, it's always more efficient to pass a reference to a vector, and clear() + fill or assign, or even better, to reuse already constructed values (which can be vectors of vectors). Using filesystem::path in a loop might trigger a lot of allocations that could be avoided if the string could be passed as an argument, reusing already allocated memory for each path.

On this question I totally agree with Alexandrescu (skip to 41:00 if downloaded, or 1:41:00 in the online video according to my web browser): http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2011-C11-Panel-Scott-...

Apart from the arguments explained by Alexandrescu and Meyers, I'd say that you must also avoid returning by value objects without dynamic allocation but with a big sizeof(T) (say std::array<T, 1000> instead of std::vector<T>(1000)): you are going to waste a lot of stack space on that extra temporary.

Ion
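The two calling styles being compared can be sketched as follows. make_name() and demo() are hypothetical helpers invented for illustration. The sketch only shows the shape of each style and that both produce the same result; it makes no claim about which is faster, which would have to be measured for a given compiler and workload.

```cpp
#include <cassert>
#include <string>

// Return-by-value style: every call produces a fresh string.
inline std::string make_name(int i) {
    return "file" + std::to_string(i) + ".txt";
}

// Out-parameter style: the caller's string is cleared and refilled, so its
// existing buffer can be reused when it is already large enough.
inline void make_name(int i, std::string& out) {
    out.clear();
    out += "file";
    out += std::to_string(i);   // note: this still creates a small temporary
    out += ".txt";
}

inline void demo() {
    std::string reused;                          // constructed once, outside the loop
    for (int i = 0; i < 100; ++i) {
        make_name(i, reused);                    // refill the same object
        const std::string fresh = make_name(i);  // new object on each iteration
        assert(reused == fresh);
    }
}
```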

On 31-01-2012 21:37, Ion Gaztañaga wrote:
On 31/01/2012 14:47, Stephan T. Lavavej wrote:
[Thorsten Ottosen]
Although returning vectors by value might be a good idea for factories, and such things, it's always more efficient to pass a reference to a vector, and clear() + fill or assign, or even better, reusing already constructed values (which can be vectors of vectors). Using filesystem::path in a loop might trigger a lot of allocations that could be avoided if the string could be passed as an argument, reusing already allocated memory for each path.
In my case "glob" performance is not important; STLSoft's version is probably much faster.

Remark 1: with the &/&& syntax of C++11, at least the temporary strings are more efficient, as we can move the value.

Remark 2: The fact that functions like extension() return a temporary path object seems overkill (I know it's only a cheap string copy, but still). I fail to see how ".foo" can be a path. I feel that the problem is that there is no uniform handling of strings in the interface/internally. If the path objects stored/accepted utf8 strings, we would have a very nice, clean interface.

Taking extension() as an example, we have the following options:
A. return a temporary path
B. return a temporary string
C. add a function, path::has_extension( const std::string& );
D. add a free function, has_extension( const path& p, const string& );

I think the latter two are to be preferred, and I don't think there is any overhead at all with them.

-Thorsten
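Option D might look roughly like the sketch below. mini_path is a hypothetical minimal type holding a narrow pathname with '/' separators, and this has_extension() is an illustration only, not Boost.Filesystem's function. It compares the tail of the stored string in place, so no temporary path or string is created. Special cases such as "." and ".." and dot-files are deliberately ignored here.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal path: a narrow pathname with '/' separators.
struct mini_path {
    std::string pathname;
};

// True when the last path component ends in `ext`, where `ext` includes the
// leading dot (for example ".txt") and must start at the last dot.
inline bool has_extension(const mini_path& p, const std::string& ext) {
    const std::string& s = p.pathname;
    const std::string::size_type slash = s.find_last_of('/');
    const std::string::size_type name_begin =
        (slash == std::string::npos) ? 0 : slash + 1;
    const std::string::size_type dot = s.find_last_of('.');
    if (dot == std::string::npos || dot < name_begin) {
        return false;  // no dot in the last component
    }
    // Compare the characters from the last dot to the end against `ext`.
    return s.compare(dot, std::string::npos, ext) == 0;
}
```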

on Tue Jan 31 2012, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
On 31/01/2012 14:47, Stephan T. Lavavej wrote:
[Thorsten Ottosen]
However, consider const std::string& ext = iter->path().extension().string(); This is fine on windows, as a temporary is returned, and its lifetime is extended; on linux, extension() returns a temp path object, from where we get a reference. I'm not a 100% sure, but I think the C++ standard does not guarantee that this path object be kept alive.
* The Standard requires the temporary path to be destroyed at the semicolon.
* Replace string with wstring, and this becomes problematic on Windows.
* Additionally, returning const values like const string/const wstring inhibits move semantics.
I know I repeat myself too many times, but I can't resist! One of the issues I find with move semantics is that sometimes we assume returning by value is free. Returning by value objects with dynamic allocation avoids reusing those dynamic resources (specially in loops). I think returning by value has a nice syntax and properties (you don't need a default-constructed object when you need to create a new object instead of assigning it), but IMHO is premature pessimization.
Although returning vectors by value might be a good idea for factories, and such things, it's always more efficient to pass a reference to a vector, and clear() + fill or assign, or even better, reusing already constructed values (which can be vectors of vectors).
Are there any numbers to back that claim up? It seems obvious at first glance, of course, but sometimes reality surprises us. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Feb 1, 2012 at 10:21 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Tue Jan 31 2012, Ion Gaztañaga <igaztanaga-AT-gmail.com> wrote:
... Although returning vectors by value might be a good idea for factories, and such things, it's always more efficient to pass a reference to a vector, and clear() + fill or assign, or even better, reusing already constructed values (which can be vectors of vectors).
Are there any numbers to back that claim up? It seems obvious at first glance, of course, but sometimes reality surprises us.
I ran such a test a couple of years ago. While I've forgotten the details, it had to do with how to pass back the result of string encoding conversions for shortish strings. Return by value did surprisingly well in the particular application. Between optimizing compilers and aggressive CPU performance optimizations, measurement is the only real way to know which is more efficient. --Beman

On Tue, Jan 31, 2012 at 7:59 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
Hi Beman,
I noticed the following in v3 while hunting down a bug in my unit test:
    # ifdef BOOST_WINDOWS_API
        const std::string string() const { return string(codecvt()); }
        const std::string string(const codecvt_type& cvt) const;
        // string_type is std::wstring, so there is no conversion
        const std::wstring& wstring() const { return m_pathname; }
        const std::wstring& wstring(const codecvt_type&) const { return m_pathname; }
    # else // BOOST_POSIX_API
        // string_type is std::string, so there is no conversion
        const std::string& string() const { return m_pathname; }
        const std::string& string(const codecvt_type&) const;
        const std::wstring wstring() const { return wstring(codecvt()); }
        const std::wstring wstring(const codecvt_type& cvt) const;
    # endif
I can understand that it is more efficient to return a reference when a reference can be formed. However, consider
    const std::string& ext = iter->path().extension().string();
This is fine on windows, as a temporary is returned, and its lifetime is extended; on linux, extension() returns a temp path object, from where we get a reference. I'm not a 100% sure, but I think the C++ standard does not guarantee that this path object be kept alive.
The net effect is a crash on linux.
I'm not sure it's a great idea to have different reference-ness on the return types here. It could lead to other subtle differences.
This thread directly or indirectly raises a number of issues:

* Is a return type that differs between platforms too error prone?
It's a tradeoff I agonized over during the design of V3. Returning by value regardless of platform is a perceived performance penalty and may be a real performance penalty for some path intensive applications. Using a non-const reference argument results in verbose user code and has the same performance issues as return by value.

* Class path has several functions that logically should return basic_strings but return paths because paths support multiple string types. Is there a better alternative?
This one deserves some investigation. Returning a path is a cop-out. Could the functions be templatized on the string return type desired?

* Const return types inhibit move semantics.
That's a known issue, raised by Howard Hinnant and other LWG members. The plan is to address it as part of applying C++11 to the Filesystem library.

* "string string() const &&;" overload?
Will investigate.

* Request for a has_extension() function.
Ugh. Given that class path already has a has_extension() member function, is something wrong with the rest of the path interface if this one is really needed?

Thanks for bringing these to the surface. There won't be any action on them until after the C++ committee meeting next week.
--Beman
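The idea of templatizing on the desired string return type could be sketched as below. mock_path and its extension<String>() member are hypothetical, not an existing Boost.Filesystem interface. The element-wise copy is only correct for ASCII content and stands in for a real encoding conversion; directory components that contain dots are also ignored. Note that the caller has to name the string type explicitly, since it cannot be deduced from the call.

```cpp
#include <cassert>
#include <string>

// Hypothetical path-like class that stores a narrow pathname.
class mock_path {
public:
    explicit mock_path(const std::string& s) : m_pathname(s) {}

    // Returns the extension as whatever string type the caller asks for.
    // Assumption: ASCII only; a real library would convert encodings here.
    template <class String>
    String extension() const {
        const std::string::size_type dot = m_pathname.find_last_of('.');
        if (dot == std::string::npos) {
            return String();  // no extension
        }
        return String(m_pathname.begin() +
                          static_cast<std::string::difference_type>(dot),
                      m_pathname.end());
    }

private:
    std::string m_pathname;
};
```

Usage would be `p.extension<std::string>()` or `p.extension<std::wstring>()`; a plain `p.extension()` does not compile in this sketch.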

On Thu, Feb 2, 2012 at 3:14 PM, Beman Dawes <bdawes@acm.org> wrote:
* Is a return type that differs between platforms too error prone?
It's a tradeoff I agonized over during the design of V3. Returning by value regardless of platform is a perceived performance penalty and may be a real performance penalty for some path intensive applications. Using a non-const reference argument results in verbose user code and has the same performance issues as return by value.
Does it provide a performance advantage for cross-platform code or does it require platform-specific code to take advantage of it?
* Class path has several functions that logically should return basic_strings but return paths because paths support multiple string types. Is there a better alternative?
This one deserves some investigation. Returning a path is a cop-out. Could the functions be templatized on the string return type desired?
Disadvantage is that template type has to be specified explicitly then.
* Const return types inhibit move semantics.
Why's that?
That's a known issue, raised by Howard Hinnant and other LWG members. The plan is to address it as part of applying C++11 to the Filesystem library.
* "string string() const &&;" overload?
Will investigate.
* Request for a has_extension() function.
Ugh. Given that class path already has a has_extension() member function, is something wrong with the rest of the path interface if this one is really needed?
Isn't the request for a "has this given extension" check? So has_extension(const T&)? The advantage is probably better performance and simpler code.
-- Olaf

Olaf van der Spek <ml <at> vdspek.org> writes:
* Const return types inhibit move semantics.
Why's that?
Const-ness inhibits modifying an object, rvalue or not, and modifying the moved-from object is integral to move semantics. I.e., you can't move from a `T const&&` any more than you can move from a `T const&`. Regards
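This effect can be observed with a small type that records which assignment operator ran. tracer, make_value(), make_const_value() and demo() are hypothetical names used only for this sketch. Assignment is used instead of construction so that copy elision does not hide the difference.

```cpp
#include <cassert>

// Hypothetical type that records which assignment operator ran last.
struct tracer {
    enum op { none, copied, moved };
    op last;

    tracer() : last(none) {}
    tracer(const tracer&) : last(none) {}
    tracer(tracer&&) : last(none) {}
    tracer& operator=(const tracer&) { last = copied; return *this; }
    tracer& operator=(tracer&&) { last = moved; return *this; }
};

inline tracer make_value() { return tracer(); }
inline const tracer make_const_value() { return tracer(); }

inline void demo() {
    tracer t;
    t = make_value();        // non-const rvalue binds to tracer&&
    assert(t.last == tracer::moved);
    t = make_const_value();  // const rvalue cannot bind to tracer&&,
                             // so overload resolution picks const tracer&
    assert(t.last == tracer::copied);
}
```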

Hi Beman,

I have been thinking about the interface exposed by Boost.Filesystem a little. As I said earlier, I base my experience on trying to write portable code that runs on both Windows and *nix, at a minimum.

I am sympathetic to the view that the most efficient underlying representation must be used, but I'm not convinced we can have that 100% while making the interface portable and easy to use. Therefore I think the main interface should focus on being portable, and if someone wants to optimize stuff, they can dig into the native parts of the API.

Recommendation 1: Make two typedefs, string_type and native_string_type, where the first is portable and the second is not.

Recommendation 2: The default string constructor should take a std::string and assume it contains utf8 unless the system does not support it; then it simply assumes the native code page. This implies that Windows code needs to convert to utf16 in this constructor. (The same can be said about other constructors/append; the presence of these constructors makes it harder to write portable code.)

Recommendation 3: If users want to go native, which means non-portable by default, it might be more explicit to add path::fromNativeString(...) to create a path this way.

Remark 1: The interface for string is quite hard to grasp, and I fail to see why it's needed. There are 11 functions!

Recommendation 4: Remove the c_str() function; it can easily be called on a returned string.

Recommendation 5: The interface only needs 2-4 string functions:
    string_type string() const; // perhaps with different OS return type
    const native_string_type& native_string() const;
The first returns utf8, thus requiring a conversion on e.g. Windows; the second is not portable, but guaranteed to require no conversion. The first converts \ to /, the second does not.

Recommendation 6: For efficiency, comparison operators can be overloaded for one argument of the type string_type and native_string_type.

just my 2 cents

kind regards
Thorsten
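A rough sketch of the two accessors from Recommendation 5. portable_path is a hypothetical class written for illustration. To keep it self-contained, native_string_type is std::string here; on Windows it would be std::wstring, and string() would additionally have to convert to utf8, which this sketch does not attempt. Only the separator normalization and the by-value versus by-reference split are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical sketch of the proposed interface split.
class portable_path {
public:
    typedef std::string string_type;         // portable representation
    typedef std::string native_string_type;  // assumption: narrow native strings

    explicit portable_path(const native_string_type& native) : m_native(native) {}

    // Portable form: separators normalized to '/'. Returned by value on
    // every platform, so callers never hold a reference into the path.
    string_type string() const {
        string_type result(m_native);
        std::replace(result.begin(), result.end(), '\\', '/');
        return result;
    }

    // Non-portable form: the stored representation, no conversion, no copy.
    const native_string_type& native_string() const { return m_native; }

private:
    native_string_type m_native;
};
```

Treating every backslash as a separator is a Windows-style simplification; on POSIX systems a backslash is an ordinary filename character.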
participants (9)
- Adam Merz
- Beman Dawes
- Dave Abrahams
- Ion Gaztañaga
- Michel Morin
- Olaf van der Spek
- Sebastian Redl
- Stephan T. Lavavej
- Thorsten Ottosen