Performance and linking (boost_regex)

SD wrote:
Hi,
I've encountered a significant performance difference with boost_regex depending on whether it is linked first or last. I guess it is not an issue specific to boost_regex, but maybe someone can clarify that.
I have created the following test files and tested with gcc-4.1 and gcc-4.3 on 64-bit Linux and gcc-4.1.2 on 32-bit Linux. The results for 64-bit are further down, but the results for 32-bit are very similar.
****
# set your boost_extension (e.g. -mt)
# export BOOST_EXT=-mt
cat > test.hpp <<EOF
#ifndef TEST
#define TEST
#include <string>
int check(const std::string & test);
#endif // TEST
EOF
cat > test.cpp <<EOF
#include "test.hpp"
#include <boost/regex.hpp>
int check(const std::string & test)
{
    int result = 0;
    static boost::regex match("a*b*c*d*e*f*g*h*i*j*k*a*b*c*d*e*f*g*h*i*a*b*c*d*e*f*g*h*i*j*");
    for (int i = 0; i < 1000; ++i) {
        result += boost::regex_match(test, match);
    }
    return result;
}
EOF
cat > main.cpp <<EOF
#include "test.hpp"
#include <iostream>
int main()
{
    std::cout << check("abcdaaaaaaaaaabcdefgaaaaaaaaaaabbbbbbbbbddeef") << std::endl;
}
EOF
# Create libtest.so
g++ -shared -fPIC test.cpp -o libtest.so
# link boost_regex before libtest.so
g++ main.cpp -lboost_regex${BOOST_EXT} -ltest -L${PWD} -o main_fast
# link boost_regex after libtest.so
g++ main.cpp -ltest -lboost_regex${BOOST_EXT} -L${PWD} -o main_slow
time env LD_LIBRARY_PATH=${PWD} ./main_slow
# real 0m0.753s, user 0m0.746s, sys 0m0.004s
time env LD_LIBRARY_PATH=${PWD} ./main_fast
# real 0m0.104s, user 0m0.103s, sys 0m0.002s
As the results show, the order in which the libraries are linked to the program has a significant impact on execution time: main_fast is about 7 times faster than main_slow. .... I would appreciate it if someone could explain these surprising results.
Before we go on guessing what could cause this -- do you get the same timings on successive runs of any given executable?
- Volodya

On Mi, 2008-07-09 at 21:28 +0400, Vladimir Prus wrote:
Before we go on guessing what could cause this -- do you get the same timings on successive runs of any given executable?
Yes, the runtimes vary only minimally.
Regards,
Stefan

SD wrote:
Before we go on guessing what could cause this -- do you get the same timings on successive runs of any given executable?
Yes, the runtimes vary only minimally.
Does running ldd on both binaries report that the same boost_regex binary is being linked to? I don't see why different ones would be linked to, but still.
If this theory is wrong, can you create a self-contained archive with all the necessary files and send it, so that I (and others) can try to reproduce? This sounds truly bizarre.
- Volodya

On Do, 2008-07-10 at 19:08 +0400, Vladimir Prus wrote:
Does running ldd on both binaries report that the same boost_regex binary is being linked to? I don't see why different ones would be linked to, but still.
If this theory is wrong, can you create a self-contained archive with all the necessary files, and send it, so that I (and others) can try to reproduce. This sounds truly bizarre.
- Volodya
$ LD_LIBRARY_PATH=./ ldd ./main_slow
    linux-vdso.so.1 => (0x00007fff1bbff000)
    libtest.so => ./libtest.so (0x00007ffc136fd000)
    libboost_regex.so => /usr/lib/libboost_regex.so (0x00007ffc1346b000)
    libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/libstdc++.so.6 (0x00007ffc13168000)
    libm.so.6 => /lib/libm.so.6 (0x00007ffc12ee8000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007ffc12cd2000)
    libc.so.6 => /lib/libc.so.6 (0x00007ffc12992000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ffc13914000)
$ LD_LIBRARY_PATH=./ ldd ./main_fast
    linux-vdso.so.1 => (0x00007ffffe9fe000)
    libboost_regex.so => /usr/lib/libboost_regex.so (0x00007f88f636f000)
    libtest.so => ./libtest.so (0x00007f88f6158000)
    libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/libstdc++.so.6 (0x00007f88f5e55000)
    libm.so.6 => /lib/libm.so.6 (0x00007f88f5bd5000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f88f59bf000)
    libc.so.6 => /lib/libc.so.6 (0x00007f88f567f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f88f6601000)
As you can see, both binaries use the same /usr/lib/libboost_regex.so; the only difference seems to be the order of libtest.so and libboost_regex.so. In the fast version libboost_regex is linked in first.
I've attached 4 files. You can run ./runtest.sh in the directory to run the tests.
Regards
Stefan

stefan.demharter@gmx.net wrote:
I've attached 4 files. You can run ./runtest.sh in the directory to run the tests.
Thanks. I've reproduced this performance difference, and with the source files at hand it's easy to see why it's there.
Try running 'nm' on your libtest.so. You'll notice that it includes a bunch of boost::regex functions. I suspect those symbols are defined inline, but the compiler decides those functions are too big and generates out-of-line copies. Then your test.cpp makes a call to a boost::regex function, which in turn calls other boost::regex functions. When the first call to each function is made, the dynamic linker figures out which library contains this function. When the system boost_regex is first on the linker line, all functions come from this (optimized) library. When your libtest.so is first, some of the boost::regex functions come from your libtest.so, which is not built with optimization. Of course, if you build your library with optimization, the performance difference disappears.
I think it is unfortunate that the mere use of a single boost::regex function causes a bunch of implementation functions to be added to libtest.so -- besides the possible performance issues you've found, it also causes code bloat, and may be dangerous if different versions of internal functions are included in different libraries. Without further investigation, I don't know how easy it is to fix.
- Volodya
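The lookup rule described above can be sketched as a toy model. This is not the dynamic linker's actual code, only an illustration of "the first library in search order that defines the symbol wins". The Library struct, the resolve function and the symbol names (regex_internal_match, regex_compile) are hypothetical and chosen for illustration; they are not real boost::regex symbols.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of a loaded shared object: its name plus the symbols it defines.
struct Library {
    std::string name;
    std::set<std::string> defined;
};

// Walk the libraries in search order and return the name of the first one
// that defines `symbol`. Returns an empty string if no library defines it.
std::string resolve(const std::vector<Library>& search_order,
                    const std::string& symbol) {
    for (const Library& lib : search_order) {
        if (lib.defined.count(symbol) != 0) {
            return lib.name;
        }
    }
    return std::string();
}

// Hypothetical contents: libtest.so carries its own (unoptimized) copy of an
// internal regex function that libboost_regex.so also defines.
Library make_libtest() {
    return Library{"libtest.so", {"check", "regex_internal_match"}};
}

Library make_libboost_regex() {
    return Library{"libboost_regex.so", {"regex_internal_match", "regex_compile"}};
}
```

In this model, searching {libtest.so, libboost_regex.so} (the main_slow order) resolves the duplicated symbol to libtest.so, while the reversed order (main_fast) resolves it to libboost_regex.so. Symbols defined in only one library resolve the same way in both orders.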

Vladimir Prus wrote:
I think it is unfortunate that the mere use of a single boost::regex function causes a bunch of implementation functions to be added to libtest.so -- besides the possible performance issues you've found, it also causes code bloat, and may be dangerous if different versions of internal functions are included in different libraries. Without further investigation, I don't know how easy it is to fix.
Surely this is true of *any* template though - if it's used by two different shared libraries then both instantiate and get a complete copy of the template. If the two libraries are built with different build options, then you get ODR violations with all that means.
Unless I'm missing something I don't see anything to be fixed here, that's just how templates work?
John.

John Maddock wrote:
Surely this is true of *any* template though - if it's used by two different shared libraries then both instantiate and get a complete copy of the template. If the two libraries are built with different build options, then you get ODR violations with all that means. Unless I'm missing something I don't see anything to be fixed here, that's just how templates work?
Of course, a template function that is used gets instantiated and added to the shared library. The question, however, is whether it's OK for 173 symbols to be instantiated, bringing the size of libtest.so to 175309 bytes. That's 30% of the size of boost_regex.so itself.
There are various approaches to reduce the percentage of templated code; which of them will work here can only be decided by looking at the symbols that are now included in all client .so libraries.
- Volodya
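As one generic illustration of such an approach (nothing in this thread establishes that it applies to Boost.Regex, or whether the library already does something similar), C++11 explicit instantiation declarations let a header tell client code not to generate its own out-of-line copies of a template's members. The following is a minimal sketch under that assumption. The class basic_matcher and the function count_ab are hypothetical names invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class template standing in for a heavily templated library type.
template <typename CharT>
class basic_matcher {
public:
    explicit basic_matcher(std::basic_string<CharT> literal);
    // Counts non-overlapping occurrences of the literal in `text`.
    std::size_t count(const std::basic_string<CharT>& text) const;
private:
    std::basic_string<CharT> literal_;
};

template <typename CharT>
basic_matcher<CharT>::basic_matcher(std::basic_string<CharT> literal)
    : literal_(std::move(literal)) {}

template <typename CharT>
std::size_t basic_matcher<CharT>::count(const std::basic_string<CharT>& text) const {
    if (literal_.empty()) return 0;
    std::size_t n = 0;
    std::size_t pos = text.find(literal_);
    while (pos != std::basic_string<CharT>::npos) {
        ++n;
        pos = text.find(literal_, pos + literal_.size());
    }
    return n;
}

// Would live in the library header: an explicit instantiation declaration.
// Translation units that see it do not implicitly instantiate the non-inline
// members of basic_matcher<char>; they refer to an external definition instead.
extern template class basic_matcher<char>;

// Would live in exactly one source file of the library: the explicit
// instantiation definition. It is in the same file here only so that the
// sketch links on its own.
template class basic_matcher<char>;

// Client-side use: relies on the single instantiation provided above.
std::size_t count_ab(const std::string& text) {
    basic_matcher<char> m("ab");
    return m.count(text);
}
```

With this layout a client library would reference the library's copy of basic_matcher<char> rather than carrying its own, at the cost of the library having to decide up front which specializations it exports.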
participants (4)
- John Maddock
- SD
- stefan.demharter@gmx.net
- Vladimir Prus