boost::filesystem::path operator/= is broken for char[1] in 1.48
Hi!
Please consider the code below:
=== BUG ===
boost::filesystem::path some_path("first");
char path_component[1]; /* = "second" */
some_path /= path_component;
Some_path value is expected to be "first/second" but it is "first".
I.e operator /= had no effect.
=== END BUG ===
Concerning a longer string in a char[1] buffer.
Actually path_component is the *last* field in a heap-allocated structure.
The structure is over-allocated to enable storing a longer string in a char[1] buffer.
This technique is often used in C code.
Here is operator /= callstack.
#0 boost::filesystem3::path_traits::empty
Hello Nick,
please see my answer below.
On Fri, Apr 20, 2012 at 6:08 PM, Nick Zavaritsky
Hi!
Please consider the code below:
=== BUG ===
boost::filesystem::path some_path("first"); char path_component[1]; /* = "second" */ some_path /= path_component;
Some_path value is expected to be "first/second" but it is "first". I.e operator /= had no effect.
I think the idea behind this is that any char* related string must be null terminated. So char[1] has only place for '\0'...
=== END BUG ===
Concerning a longer string in a char[1] buffer.
Actually path_component is the *last* field in a heap-allocated structure.
The structure is over-allocated to enable storing a longer string in a char[1] buffer. This technique is often used in C code.
This over-allocation is pretty much the same as reserving some space at the end of the data structure to place data together. Am I right? Why isn't it possible to just write char[128] or char[16] instead of char[1]? And if you know that this buffer represents some special case, why don't you pass it as such to filesystem? &char[0], than boost::filesystem will not assume it got char[1] and will be looking to '\0' marker...
Here is operator /= callstack.
#0 boost::filesystem3::path_traits::empty
() at path_traits.hpp:89 #1 0x0000000100001e7e in boost::filesystem3::path::append (this=0x7fff5fbff910, source=@0x100100270, cvt=@0x100100280) at path.hpp:655 #2 0x0000000100001f77 in boost::filesystem3::path::operator/= (this=0x7fff5fbff910, source=@0x100100270) at path.hpp:235 The relevant excerpt from boost::filesystem3::path_traits::empty follows:
namespace path_traits { ... template
inline bool empty(T (&)[N]) { return N <= 1; } ... } It appears that boost::filesystem::path treats char[1] as an empty path regardless of the actual content. This is definitely wrong due to so many C libraries storing long strings in char[1] buffers. In particular on Mac OS X fts(3) FTSENT::fts_name is char[1].
I believe the current operator =/ behavior is outright dangerous and should be fixed. By the way the only code currently benifiting from the boost::filesystem::path being overly smart is the code appending "" (empty string literal) to pathes.
Regards, Ovanes
On Fri, Apr 20, 2012 at 07:15:53PM +0200, Ovanes Markarian wrote:
Actually path_component is the *last* field in a heap-allocated structure.
The structure is over-allocated to enable storing a longer string in a char[1] buffer. This technique is often used in C code.
This over-allocation is pretty much the same as reserving some space at the end of the data structure to place data together. Am I right? Why isn't it possible to just write char[128] or char[16] instead of char[1]? And if you know that this buffer represents some special case, why don't you pass it as such to filesystem? &char[0], than boost::filesystem will not assume it got char[1] and will be looking to '\0' marker...
The trick in C is to malloc the struct with a larger size. A canonical example is: typedef struct S { size_t bytes; char flex[1]; } S; S* s = malloc(sizeof(S) + (amount-1)); s->bytes = amount; That is, you have bonus bytes after the struct which you can access through the last struct member. Variants on this is to use a zero-length-array in the language dialects that support it. It's a very common way of allocating a variable-size array with bundled size in a single chunk, which makes cleanup a single free() and gives you some coherence between the size and data. In that world, it's perfectly sane and rational to read past the end of an array, and some compilers have explicit exceptions to reading past the last element of an end-of-struct 1- or 0-sized array, just because of this. -- Lars Viklund | zao@acc.umu.se
On Fri, Apr 20, 2012 at 1:15 PM, Ovanes Markarian
On Fri, Apr 20, 2012 at 6:08 PM, Nick Zavaritsky
wrote:
boost::filesystem::path some_path("first"); char path_component[1]; /* = "second" */ some_path /= path_component;
Some_path value is expected to be "first/second" but it is "first". I.e operator /= had no effect.
I think the idea behind this is that any char* related string must be null terminated. So char[1] has only place for '\0'...
This technique is often used in C code.
if you know that this buffer represents some special case, why don't you pass it as such to filesystem? &char[0], than boost::filesystem will not assume it got char[1] and will be looking to '\0' marker...
I strongly agree with Ovanes.
I believe the current operator =/ behavior is outright dangerous and should be fixed.
Nope, sorry. If you pass a fixed-length array, a proper C++ library *should* be sensitive to its max. Anything else is unsafe and should be discouraged. In the early days of C, people played all sorts of tricky games because (a) the language didn't support any better alternatives and (b) there was this cowboy mentality of "structured assembler." The fact that we're still dealing with legacy C-style APIs is not a good reason for a modern C++ library to blithely run past the declared end of a char array. C++ tries very hard to bring type-safety to the party. It is far more important to try to keep new code from crashing than it is to accommodate such misleading APIs. When you know you're dealing with code that willfully lies to the compiler -- even if it's been frozen into an OS API -- use Ovanes's workaround to pass char* rather than char[size]. To protect your fellow coders from the crufty details, wrap it in a function layer that obeys language rules.
participants (4)
-
Lars Viklund
-
Nat Linden
-
Nick Zavaritsky
-
Ovanes Markarian