Hi!
On 12 Oct 2014, at 20:46, Joaquin M Lopez Munoz wrote:
I think you can make the test even more realistic: as it stands now, the level of value redundancy is very low (for instance, each num value is repeated only four times), which is not the ideal setting for Boost.Flyweight. If you change the creation of e to something like
Exp e = bin('*', bin('+', num(1+(j%m)), num(1+(j%m))), bin('+', num(2+(j%m)), num(2+(j%m))));
for some constant m (in the range of 1000, for instance), then you can modulate the level of redundancy (the lower m, the more duplicate values): tuning m can tell you how much Boost.Flyweight improves with respect to the shared_ptr case (which should not be affected by m at all). I bet that with m sufficiently low and n sufficiently high, #7 can beat #1.
I couldn't find such n and m. Actually, too large an n gives not-so-nice errors:

$ time ./7
Assertion failed: (pthread_mutex_lock(&m_)==0), function scoped_lock, file /opt/local/include/boost/flyweight/detail/recursive_lw_mutex.hpp, line 72.

but I'm not surprised: the tree must be too deep, and I'm not even sure my stack is in good shape. I have not tried to disable locking, as I'd be happy to go multithreaded in the future, but currently it wouldn't be a problem.

For n = 10,000 and m = 100, I get (1 is pure shared_ptr, and 7 is Flyweight):

$ time ./1
./1  0,02s user 0,01s system 93% cpu 0,030 total
$ time ./7
./7  0,05s user 0,00s system 96% cpu 0,051 total

Note that my test case does some sharing even with shared_ptr:

Bin f = bin('/', e, e);

so it's not a tree. But that's consistent with my code base, where I already have _some_ sharing, just not maximal. With no sharing at all, I get:

$ time ./1
./1  0,03s user 0,01s system 96% cpu 0,042 total
$ time ./7
./7  0,07s user 0,00s system 97% cpu 0,070 total
./1  0,02s user 0,00s system 93% cpu 0,031 total
./7  0,06s user 0,00s system 97% cpu 0,070 total
./9  46,19s user 0,09s system 99% cpu 46,424 total
./10 171,29s user 9,71s system 99% cpu 3:01,09 total
./11 17,25s user 0,04s system 99% cpu 17,296 total
Well, it seems there's a clear winner here (ruling out #1, which, from what you say, I understand is not an acceptable alternative, presumably because of memory limitations).
No, my hope is that I can get faster hashing and operator== on my expressions in all the algorithms that run on top of these expressions. I am not yet worried about memory consumption. But maybe I'm hoping for something I won't get...