
"Ion Gaztañaga" <igaztanaga@gmail.com> wrote in message news:440ABAE3.8080007@gmail.com...
Hi Beman,
As Caleb points out, it is premature optimization to talk about "hurting performance" in the absence of timings in realistic use scenarios.
That said, if you can come up with a realistic use case that really does show significant slow-down compared to some alternate interface, it would be worth talking about.
Ok. I see it as "premature pessimization", but you are right about needing a realistic use case. Recursively scanning a directory for files that have a given extension (for example, looking for mp3 files) is in my opinion a realistic use case. Obviously, looking for files will be slower than returning "path.leaf()" (although maybe the OS caches directory entries in memory), but apart from the speed, I think the important point is the memory stress you cause by creating a temporary every time you want to obtain the name of a file. The filesystem operations may be slower, but the OS is surely careful to avoid heap fragmentation by using internal pools, while the user is creating a lot of temporaries. I will try to implement this use case if you agree.
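As a rough sketch of the use case described above, the scan could look something like the following. It is written against std::filesystem (C++17), the standardized descendant of Boost.Filesystem, where filename() plays the role of leaf(). That substitution, and the helper names count_files_with_extension and make_sample_tree, are assumptions made for illustration and not code from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: counts the regular files under `root` whose extension
// equals `ext` (for example ".mp3"). The comparison is case-sensitive.
std::size_t count_files_with_extension(const fs::path& root, const std::string& ext) {
    std::size_t count = 0;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        // filename() returns a new path object by value, so every entry visited
        // produces a temporary. This is the cost being discussed.
        const fs::path name = entry.path().filename();
        // extension() and string() return by value as well.
        if (name.extension().string() == ext) ++count;
    }
    return count;
}

// Hypothetical scaffold: builds a small tree to try the scan on.
void make_sample_tree(const fs::path& root) {
    fs::create_directories(root / "a" / "b");
    std::ofstream{root / "one.mp3"};
    std::ofstream{root / "a" / "two.mp3"};
    std::ofstream{root / "a" / "b" / "notes.txt"};
}
```

Timing this loop against a variant that avoids the per-entry temporaries, on a large real directory tree, would be one way to obtain the measurements asked for earlier in the thread.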
That's a use case that I would be interested in. But also remember that the objectives of the library include ease-of-use for script-like programs, and in general valuing clean design over the last iota of efficiency. I'd also like it to feel familiar to standard library users. I have a long personal history of adding kinky little bits and pieces to designs (mostly for efficiency) and then regretting it later.
The "path" class is also a class that is not tied to disk operations (by the way, we can mount a filesystem in memory so that operations are fast, or a path can represent shared memory, following the "Everything is a file" UNIX philosophy). Is it realistic to store a lot of path objects in a container and request operations like leaf(), root(), etc.? I don't know. But I see the path class as a pseudo-container of strings representing a hierarchy. A path could represent a file or any other hierarchy in the operating system, because it is quite generic.
Apart from this, I see that path::iterator has a string member. Dereferencing returns a reference to that member, but an iterator is supposed to be a "lightweight" pointer-like abstraction that is copied by value between functions. A string member, in my opinion, turns an iterator into a heavy class (how heavy depends on the string length, but a small string optimization of 16 bytes is not going to help much).
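To make the concern concrete, here is a minimal sketch of an iterator that owns a copy of the current element, in the spirit of the design described above. The class name element_iterator and the '/'-only splitting are assumptions for illustration; this is not the actual Boost.Filesystem implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical iterator over the '/'-separated elements of a path string.
// It stores the current element in a std::string member, so copying the
// iterator copies that string too.
class element_iterator {
public:
    element_iterator(const std::string& source, std::size_t pos)
        : source_(&source), pos_(pos) { load(); }

    // Returns a reference to the member, as described in the message above.
    const std::string& operator*() const { return current_; }

    element_iterator& operator++() {
        pos_ = next_;
        load();
        return *this;
    }

    bool operator==(const element_iterator& other) const {
        return source_ == other.source_ && pos_ == other.pos_;
    }
    bool operator!=(const element_iterator& other) const { return !(*this == other); }

private:
    // Copies the element starting at pos_ into current_ and records where
    // the next element begins.
    void load() {
        const std::size_t size = source_->size();
        if (pos_ >= size) {
            pos_ = next_ = size;
            current_.clear();
            return;
        }
        const std::size_t slash = source_->find('/', pos_);
        const std::size_t stop = (slash == std::string::npos) ? size : slash;
        current_.assign(*source_, pos_, stop - pos_);
        next_ = (slash == std::string::npos) ? size : slash + 1;
    }

    const std::string* source_;
    std::size_t pos_;
    std::size_t next_ = 0;
    std::string current_;
};
```

Each copy of such an iterator carries a full std::string, and an element longer than the small-string buffer may trigger a heap allocation on copy, which is the weight being objected to.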
That's an implementation detail. It isn't required by the spec, although that may be the most obvious way to implement the spec. An alternate implementation would be to keep a pool of directory entry objects and recycle them if performance was a concern. It would be great if Boost had a cache library to make such a strategy trivial to implement.
You are right. I need to concentrate on the interface. As a comment from looking at the code: since the iterator returns a const string reference, you could also add a vector of strings to the path class, so that the iterator could simply be the const_iterator of that vector. You would avoid the string member and get trivial increment/copy operations. The path class would need more memory, though.
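The vector-of-strings idea could be sketched roughly as follows. The class name segmented_path, the '/'-only splitting, and the leaf() precondition are assumptions for illustration, not a proposal for the library's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical path-like class that splits its text once, at construction,
// and stores the elements. Iteration is then plain vector iteration.
class segmented_path {
public:
    using const_iterator = std::vector<std::string>::const_iterator;

    explicit segmented_path(const std::string& text) {
        std::size_t start = 0;
        while (start <= text.size()) {
            const std::size_t slash = text.find('/', start);
            const std::size_t stop = (slash == std::string::npos) ? text.size() : slash;
            // Empty elements (from leading, trailing, or doubled '/') are skipped.
            if (stop > start) elements_.emplace_back(text, start, stop - start);
            if (slash == std::string::npos) break;
            start = slash + 1;
        }
    }

    // The iterator has no string member; copying it is cheap.
    const_iterator begin() const { return elements_.begin(); }
    const_iterator end() const { return elements_.end(); }

    // Returns a reference to the stored last element, so no temporary is
    // created. Precondition: the path has at least one element.
    const std::string& leaf() const {
        assert(!elements_.empty());
        return elements_.back();
    }

private:
    std::vector<std::string> elements_;  // the extra memory paid per path
};
```

The trade-off is the one noted above: every path object carries a vector and one string per element, in exchange for cheap iterators and a leaf() that returns a reference.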
Yes, if you are willing to expend more memory in the path class itself, you can gain a lot of theoretical speed, particularly on non-POSIX implementations where the portable syntax and the native syntax differ. --Beman