
Hi,

A German journal recently published a small programming contest with a very simple text-processing problem: the program needs to renumber and sort footnotes in a long text. I wrote a solution in C++ using a number of Boost libraries, among them Xpressive, but also Interprocess for memory-mapping the input file.

Then I looked at the submissions page and found one C++ solution already there, an admirably short (but rather inflexible) program using libpcre.

http://www.linux-magazin.de/static/listings/magazin/2008/10/leser/cpp/footno...

Comparing the performance, my program was slightly but consistently slower than this small program, even though I felt that memory-mapping the file and scanning it as a whole, instead of going line by line, ought to give me a speed advantage.

So to test the performance I took the existing submission and replaced libpcre with Xpressive (see attached file). I believe the solutions to be functionally equivalent. However, the original takes 6 seconds to process a 55MB file, whereas my variation takes ~15 seconds. That's on the second run of each program, meaning that the entire file is in the OS cache. This seems awfully slow.

Has anyone done a proper performance comparison between Xpressive and libpcre? Boost.Regex's performance page lists PCRE, but in version 4.1, whereas 7.8 is the most recent release.

My system:
Athlon 64 2000MHz (64-bit mode)
1GB RAM
Linux 2.6.23, GCC 4.1.2
Boost trunk as of 2008-09-11
libpcre 7.7

Sebastian Redl

Sebastian Redl wrote:
> <snip>
> http://www.linux-magazin.de/static/listings/magazin/2008/10/leser/cpp/footno...
> <snip>
> So to test the performance I took the existing submission and replaced libpcre with Xpressive (see attached file). I believe the solutions to be functionally equivalent. However, the original takes 6 seconds to process a 55MB file, whereas my variation takes ~15 seconds. That's on the second run of each program, meaning that the entire file is in the OS cache. This seems awfully slow.

It does seem slow, especially considering all the IO, lexical casting and memory management it's doing, besides the regexing. (I notice that the pcre version isn't doing any lexical casting, but I can't tell if it's doing something equivalent.) Are you sure you compiled with full optimizations turned on, and NDEBUG defined?

> Has anyone done a proper performance comparison between Xpressive and libpcre?

Not really, no. I may look into this, if it doesn't turn out to be a simple matter of compiler optimization. Where can I find the input file you're testing with?

Thanks for the heads up.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
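
Since the program also does IO, conversion and memory management besides the regexing, one way to narrow things down is to time the matching phase on its own. What follows is only a minimal sketch under stated assumptions: the text is already in memory, `std::regex` stands in for the engine under test (it is neither Xpressive nor libpcre), and `ScanResult` and `timed_scan` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: number of matches found and wall-clock time spent.
struct ScanResult {
    std::size_t matches;
    double seconds;
};

// Times only the regex scan over an in-memory buffer, so that file IO and
// number conversion are excluded from the measurement.
// Assumption: std::regex is a stand-in here, not Xpressive or libpcre.
ScanResult timed_scan(const std::string& text, const std::regex& pattern) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();

    std::size_t count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        ++count;  // a real program would process the match here
    }

    const std::chrono::duration<double> elapsed = clock::now() - start;
    return ScanResult{count, elapsed.count()};
}
```

Calling it with a pattern such as `std::regex("\\[(\\d+)\\]")` (a made-up footnote marker, not necessarily the contest's format) gives a match count and a duration; running an equivalent loop for each engine on the same buffer would indicate how much of the gap comes from matching alone.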

Eric Niebler wrote:
> Sebastian Redl wrote:
>> <snip>
>> http://www.linux-magazin.de/static/listings/magazin/2008/10/leser/cpp/footno...
>> <snip>
>> So to test the performance I took the existing submission and replaced libpcre with Xpressive (see attached file). I believe the solutions to be functionally equivalent. However, the original takes 6 seconds to process a 55MB file, whereas my variation takes ~15 seconds. That's on the second run of each program, meaning that the entire file is in the OS cache. This seems awfully slow.
>
> It does seem slow, especially considering all the IO, lexical casting and memory management it's doing, besides the regexing. (I notice that the pcre version isn't doing any lexical casting, but I can't tell if it's doing something equivalent.)

It is, by passing a pointer to an int to the matcher functions. libpcre converts internally.

> Are you sure you compiled with full optimizations turned on, and NDEBUG defined?

I use bjam's release mode:

"g++" -ftemplate-depth-128 -O3 -finline-functions -Wno-inline -Wall -fPIC -march=athlon64 -DNDEBUG

The other program is compiled with g++ -O2.

>> Has anyone done a proper performance comparison between Xpressive and libpcre?
>
> Not really, no. I may look into this, if it doesn't turn out to be a simple matter of compiler optimization. Where can I find the input file you're testing with?

http://www.linux-magazin.de/static/listings/magazin/2008/10/sprachen/

It's the sample4.txt.bz2. (8MB, expands to 55MB)

Thanks for looking into this.

Sebastian
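
For illustration, converting captured digits straight from the matched character range, rather than through a general-purpose cast, might look like the sketch below. This is an assumption-laden example, not how libpcre or Boost actually implement the conversion; `parse_decimal` is a hypothetical helper and belongs to neither library.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper (not part of Boost or libpcre): converts the decimal
// digits in [first, last) to an int without building a temporary string.
// Returns false for an empty range, a non-digit character, or a value that
// does not fit in int; 'out' is only written on success.
bool parse_decimal(std::string::const_iterator first,
                   std::string::const_iterator last,
                   int& out) {
    if (first == last) {
        return false;
    }
    long long value = 0;
    for (; first != last; ++first) {
        if (*first < '0' || *first > '9') {
            return false;
        }
        value = value * 10 + (*first - '0');
        if (value > std::numeric_limits<int>::max()) {
            return false;  // checked every step, so 'value' itself never overflows
        }
    }
    out = static_cast<int>(value);
    return true;
}
```

With a regex library that exposes each capture as an iterator pair, the two iterators of the digit group could be passed in directly.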

on Sun Sep 14 2008, Sebastian Redl <sebastian.redl-AT-getdesigned.at> wrote:
> Eric Niebler wrote:
>> Sebastian Redl wrote:
>>> <snip>
>>> http://www.linux-magazin.de/static/listings/magazin/2008/10/leser/cpp/footnotes.cpp
>>> <snip>
>>> So to test the performance I took the existing submission and replaced libpcre with Xpressive (see attached file). I believe the solutions to be functionally equivalent. However, the original takes 6 seconds to process a 55MB file, whereas my variation takes ~15 seconds. That's on the second run of each program, meaning that the entire file is in the OS cache. This seems awfully slow.
>>
>> It does seem slow, especially considering all the IO, lexical casting and memory management it's doing, besides the regexing. (I notice that the pcre version isn't doing any lexical casting, but I can't tell if it's doing something equivalent.)
>
> It is, by passing a pointer to an int to the matcher functions. libpcre converts internally.
>
>> Are you sure you compiled with full optimizations turned on, and NDEBUG defined?
>
> I use bjam's release mode. "g++" -ftemplate-depth-128 -O3 -finline-functions -Wno-inline -Wall -fPIC -march=athlon64 -DNDEBUG
> The other program is compiled with g++ -O2

You should really try them with identical options, as -O3 is known to be worse than -O2 in some cases.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com