Re: [boost] [filesystem] Some questions about string use

6 Mar 2006

      "Ion Gaztañaga" <igaztanaga@gmail.com> wrote in message 
news:440ABAE3.8080007@gmail.com...
...
Hi Beman,
...
As Caleb points out, it is premature optimization to talk about "hurting
performance" in the absence of timings in realistic use scenarios.
That said, if you can come up with a realistic use case that really does
show significant slow-down compared to some alternate interface, it would be
worth talking about.
Ok. I see it more as "premature pessimization", but you are right about a
realistic use case. Recursively scanning a directory for files with a given
extension (say, looking for mp3 files) is, in my opinion, a realistic use
case. Obviously, looking for files will be slower than returning
"path.leaf()" (although the OS may cache directory entries in memory), but
apart from speed, I think the important point is the memory pressure you
cause by creating a temporary every time you want to obtain the name of a
file. The filesystem operations may be slower, but the OS is surely
avoiding heap fragmentation carefully with internal pools, while the user
is creating lots of temporaries. I will try to implement this use case if
you agree.
That's a use case that I would be interested in.

But also remember that the objectives of the library include ease of use for
script-like programs and, in general, valuing clean design over the last iota
of efficiency. I'd also like it to feel familiar to standard library users.

I have a long personal history of adding kinky little bits and pieces to 
designs (mostly for efficiency) and then regretting it later.
...
The "path" class is also a class not tied to disk operations (by the way,
we can mount a filesystem in memory so operations can be fast, or a path
can represent shared memory, following the "Everything is a file" UNIX
philosophy). Is it realistic to store a lot of path objects in containers
and request operations like leaf(), root(), etc.? I don't know. But I see
the path class as a pseudo-container of strings representing a hierarchy.
A path could represent a file or any other hierarchy in the operating
system, because it is quite generic.
...
Apart from this, I see that path::iterator has a string member.
Dereferencing it returns a reference to that member, but an iterator is
supposed to be a lightweight, pointer-like abstraction that is copied by
value between functions. A string member, in my opinion, turns an iterator
into a heavy class (how heavy depends on the string length, but a
small-string optimization of 16 bytes is not going to help much).
That's an implementation detail. It isn't required by the spec, although
that may be the most obvious way to implement the spec. An alternate
implementation would be to keep a pool of directory entry objects and
recycle them if performance were a concern. It would be great if Boost had a
cache library to make such a strategy trivial to implement.
You are right. I need to concentrate on the interface. As a comment on the
code: since the iterator returns a const string reference, you could also
add a vector of strings to the path class, so that the iterator could be
the const_iterator of the vector. You would avoid the string member and
have trivial increment/copy operations. You would be requiring more memory
in the path class, though.
Yes, if you are willing to spend more memory in the path class itself, you
can gain a lot of theoretical speed, particularly on non-POSIX
implementations where the portable syntax and the native syntax differ.
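The vector-of-components idea discussed above can be sketched as follows.
The class name component_path, its members, and the simple '/'-only
splitting are hypothetical and chosen for illustration; this is not the
Boost.Filesystem implementation. It shows the trade-off: the iterator
becomes a cheap std::vector const_iterator and leaf() can return a
reference, at the cost of storing every component a second time.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical component_path: keeps the full path text plus a pre-split
// vector of its components, so iteration is a plain vector walk.
class component_path {
public:
    using const_iterator = std::vector<std::string>::const_iterator;

    explicit component_path(std::string s) : str_(std::move(s)) {
        // Split on '/' only (POSIX-style generic syntax). A leading '/' is
        // kept as a root component; empty components produced by repeated
        // or trailing separators are skipped.
        std::size_t pos = 0;
        if (!str_.empty() && str_[0] == '/') {
            parts_.push_back("/");
            pos = 1;
        }
        while (pos < str_.size()) {
            std::size_t next = str_.find('/', pos);
            if (next == std::string::npos) next = str_.size();
            if (next > pos) parts_.push_back(str_.substr(pos, next - pos));
            pos = next + 1;
        }
    }

    const std::string& string() const { return str_; }
    // Pointer-like iterators: copying or incrementing never touches a string.
    const_iterator begin() const { return parts_.begin(); }
    const_iterator end() const { return parts_.end(); }
    // Returns a reference into the stored components; no temporary is built.
    const std::string& leaf() const { return parts_.empty() ? str_ : parts_.back(); }

private:
    std::string str_;                 // full path text
    std::vector<std::string> parts_;  // extra memory: one string per component
};
```

Copying a component_path now copies every component string as well, which
is the memory cost of making iteration and leaf() cheap.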

--Beman

Beman Dawes