Excessive dynamic allocation when using boost::split(..., boost::is_any_of(", "))?
Hi

The following program shows many allocations coming from boost::split(..., boost::is_any_of(",")):

```
$ cat split.cpp
#include <string>
#include <cstdio>
#include <boost/algorithm/string.hpp>
#include <boost/container/small_vector.hpp>

int main()
{
  std::string str = "foo,bar,foobar,quux";
  std::size_t s = 0;
  for (int i = 0; i < 100; ++i) {
    boost::container::small_vector<std::string, 4> v;
#ifdef USE_IS_ANY_OF
    boost::split(v, str, boost::is_any_of(","));
#else
    boost::split(v, str, [](const char c) { return c == ',';});
#endif
    s += v.size();
  }
  fprintf(stderr, "%zu\n", s);
}
```

When using boost::is_any_of(","), there are 7 dynamic allocations per loop iteration, whereas when using the lambda predicate there are 0 dynamic allocations per loop iteration, as shown below:

```
$ clang++ -std=c++14 -DUSE_IS_ANY_OF split.cpp
$ heaptrack ./a.out
...
allocations: 702

$ clang++ -std=c++14 split.cpp
$ heaptrack ./a.out
...
allocations: 2
```

I suspect that there are potential optimizations in Boost here to avoid so many allocations when boost::is_any_of(",") is used.

Regards
Dominique
On Jul 5, 2022, at 1:29 AM, Dominique Pellé via Boost-users <boost-users@lists.boost.org> wrote:
> The following program shows many allocations coming from boost::split(..., boost::is_any_of(",")):
> [...]
> When using boost::is_any_of(","), there are 7 dynamic allocations per loop iteration, whereas when using the lambda predicate there are 0 dynamic allocations per loop iteration.
What happens if you change your code to create the `boost::is_any_of` object outside the loop? — Marshall
Hi Dominique,

Dominique Pelle wrote:
> The following program shows many allocations coming from boost::split(..., boost::is_any_of(",")):
> When using boost::is_any_of(","), there are 7 dynamic allocations per loop iteration, whereas when using the lambda predicate there are 0 dynamic allocations per loop iteration.
See this thread from 2008:

https://lists.boost.org/Archives/boost//2008/02/133396.php

There are some benchmarks in my reply here:

https://lists.boost.org/Archives/boost//2008/02/133415.php

I observed that C's strcspn was one of the better choices, but it could be beaten for search sets known at compile time with this:

    template <char c0>
    bool is_any_of(char c) { return (c==c0); }

    template <char c0, char c1>
    bool is_any_of(char c) { return (c==c0) || is_any_of<c1>(c); }

    template <char c0, char c1, char c2>
    bool is_any_of(char c) { return (c==c0) || is_any_of<c1,c2>(c); }

etc.

(Yes, I was writing that before variadic templates!)

I don't know if boost::is_any_of has changed in the 14 years since then, but your observation suggests not.

Regards, Phil.
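Since Phil notes that his overloads predate variadic templates, here is one way the same idea might look with a parameter pack. This is only a sketch: the name `is_any_of_ct` is hypothetical (chosen to avoid confusion with `boost::is_any_of`), it is not code from the 2008 thread, and no benchmark is claimed for it.

```cpp
// Base case: the search set is a single character fixed at compile time.
template <char C0>
constexpr bool is_any_of_ct(char c)
{
    return c == C0;
}

// Recursive case: compare against the first character, then recurse on the
// remaining ones. Requiring at least two characters keeps this overload
// from competing with the base case.
template <char C0, char C1, char... Rest>
constexpr bool is_any_of_ct(char c)
{
    return c == C0 || is_any_of_ct<C1, Rest...>(c);
}
```

The search set carries no run-time state, so a stateless lambda such as `[](char c) { return is_any_of_ct<',', ' '>(c); }` could be handed to `boost::split` in the same way as the lambda in the original program.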
participants (3)

- Dominique Pellé
- Marshall Clow
- Phil Endecott