
On 3/4/06, Caleb Epstein <caleb.epstein@gmail.com> wrote:
On 3/4/06, Ion Gaztañaga <igaztanaga@gmail.com> wrote:
Now that filesystem is proposed for the standard I would like to ask boosters (and Beman, of course) if they find these performance concerns serious enough.
Perhaps if they were accompanied by some comparative performance benchmarks or profile analysis?
In the interests of science, I wrote a small "file finder" using Boost.Filesystem and a comparable version using POSIX functions (e.g. stat, readdir, etc.). The POSIX version runs MANY times faster than the Boost.Filesystem version, roughly 5x in the runs below (code attached). Note that the Boost version makes use of the "status" member of the directory iterator, which is in CVS and is aimed at reducing the number of operating system calls that the library needs to make.

The test programs take an optional -R (recursive) flag and a list of directories and filename extensions on the command line. They walk each of the directories (recursively, if -R is given), searching for files with matching extensions, and tally their number and size. At the end, they generate a summary report by extension.

Here is some sample output after priming the buffer cache by running each of the programs several times (yes, I have a lot of music):

    [9:48] cae @ tela 740% time ~/finder-fs -R /raid/shn .mp3 .flac .shn      ~/src/c++
    .flac: 5550 files, 243.918 GiB
    .mp3: 30364 files, 232.695 GiB
    .shn: 152 files, 6.90744 GiB
    ~/finder-fs -R /raid/shn .mp3 .flac .shn 0.31s user 3.97s system 97% cpu 4.369 total

Once the buffer cache has been primed, the results do not vary much. All runs are on the order of 4.3 seconds. This is using an optimized version of the filesystem library and my code compiled with -g -O2 -pg. Removing the profiling options reduces the runtime to approximately 4.1 seconds, so the profiling overhead is relatively small.

Here's the output from the POSIX version:

    [9:49] cae @ tela 741% time ~/finder-posix -R /raid/shn .mp3 .flac .shn
    .flac: 5550 files, 243.918 GiB
    .mp3: 30364 files, 232.695 GiB
    .shn: 152 files, 6.90744 GiB
    ~/finder-posix -R /raid/shn .mp3 .flac .shn 0.18s user 0.64s system 99% cpu 0.832 total

Looking at the profiling output from the "finder-fs" program, it appears that the bulk of the time is spent in fs::basic_path::operator/=(), which may bear out Ion's fears.
The profiling output for "finder-posix" shows that the bulk of the time (66.6%) is in std::map::find and the finder function. In the Boost.Filesystem version, this amounts to only 12.5% of the runtime.

--
Caleb Epstein
caleb dot epstein at gmail dot com