
Warming this up for tr2: http://home.roadrunner.com/~hinnant/combinations.html Comments welcome. Real world use welcome. -Howard

On 1/3/2011 7:40 PM, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
-Howard
I need to read this through thoroughly (it looks interesting), but along the same vein, an alternative abstraction is to create a range adaptor that allows iteration through permutations and combinations of a given range. This would be more flexible than a for_each algorithm style of iteration, but possibly less efficient. Have you also considered multipermutations and multicombinations (where repetitions are allowed)? - Jeff

On Jan 4, 2011, at 3:21 AM, Jeffrey Lee Hellrung, Jr. wrote:
On 1/3/2011 7:40 PM, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
-Howard
I need to read this through thoroughly (it looks interesting), but along the same vein, an alternative abstraction is to create a range adaptor that allows iteration through permutations and combinations of a given range. This would be more flexible than a for_each algorithm style of iteration, but possibly less efficient.
I'm not positive but I suspect that a range adaptor approach would suffer precisely the same inefficiencies I've demonstrated in the next_permutation/combination algorithms: The increment operator begins to dominate as shown in the two figures. That being said, the two approaches do not have to compete. They can coexist.
Have you also considered multipermutations and multicombinations (where repetitions are allowed)?
Yes and no. Yes: one would have to store the repetition count for each element somewhere: perhaps a range of pairs, or perhaps a pair of ranges. Then one would need new algorithms, with new interfaces, to deal with this new data structure. I don't mind admitting that these were some of the hardest algorithms I've ever accomplished as far as getting them both correct and fast (I completely gave up on reversible_permutation several times). I don't relish the idea of complicating them with repetition counts. So no. :-) However you can get the same effect by inputting a range with your repetitions represented by multiple values for each value you want to repeat. And those repetitions do not have to be in any order: aabbccdd is just as good as: abcdabcd -Howard

On 04/01/2011 16:03, Howard Hinnant wrote:
I need to read this through thoroughly (it looks interesting), but along the same vein, an alternative abstraction is to create a range adaptor that allows iteration through permutations and combinations of a given range. This would be more flexible than a for_each algorithm style of iteration, but possibly less efficient.
I'm not positive but I suspect that a range adaptor approach would suffer precisely the same inefficiencies I've demonstrated in the next_permutation/combination algorithms: The increment operator begins to dominate as shown in the two figures. That being said, the two approaches do not have to compete. They can coexist.
You could generate the range adaptor from the for_each-style algorithm using a coroutine ; this should have minimal overhead.

On Jan 5, 2011, at 4:41 AM, Mathias Gaunard wrote:
On 04/01/2011 16:03, Howard Hinnant wrote:
I need to read this through thoroughly (it looks interesting), but along the same vein, an alternative abstraction is to create a range adaptor that allows iteration through permutations and combinations of a given range. This would be more flexible than a for_each algorithm style of iteration, but possibly less efficient.
I'm not positive but I suspect that a range adaptor approach would suffer precisely the same inefficiencies I've demonstrated in the next_permutation/combination algorithms: The increment operator begins to dominate as shown in the two figures. That being said, the two approaches do not have to compete. They can coexist.
You could generate the range adaptor from the for_each-style algorithm using a coroutine ; this should have minimal overhead.
I looked at: http://en.wikipedia.org/wiki/Coroutine and was unsuccessful in translating that idea to this application. Perhaps you could elaborate? -Howard

On 05/01/2011 15:38, Howard Hinnant wrote:
I looked at:
http://en.wikipedia.org/wiki/Coroutine
and was unsuccessful in translating that idea to this application. Perhaps you could elaborate?
Look at the first example on that page, produce/consume: your for_each-style algorithm is similar to the 'produce' coroutine, and iteration of ranges is similar to the 'consume' coroutine. Coroutines are typically implemented in C++ by storing the current context somewhere (registers, stack, PC), and switching to another context. This can be done in assembler or using OS-specific routines. It's similar to threads, except there is no scheduler, typically no kernel, and no concurrency involved. There is a Boost.Coroutine library somewhere, if you want to try it. I wrote some code with it that converts a for_each-style mechanism into a classic pair of iterators, if you're interested.

On Jan 6, 2011, at 7:40 AM, Mathias Gaunard wrote:
On 05/01/2011 15:38, Howard Hinnant wrote:
I looked at:
http://en.wikipedia.org/wiki/Coroutine
and was unsuccessful in translating that idea to this application. Perhaps you could elaborate?
Look at the first example on that page, produce/consume: your for_each-style algorithm is similar to the 'produce' coroutine, and iteration of ranges is similar to the 'consume' coroutine.
Coroutines are typically implemented in C++ by storing the current context somewhere (registers, stack, PC), and switching to another context. This can be done in assembler or using OS-specific routines.
It's similar to threads, except there is no scheduler, typically no kernel, and no concurrency involved.
There is a Boost.Coroutine library somewhere, if you want to try it. I wrote some code with it that converts a for_each-style mechanism into a classic pair of iterators, if you're interested.
Thanks for the elaboration. I'm currently lacking the time and motivation to pursue this transformation. However I would eagerly review the results if someone were to take up that mantle and run with it. -Howard

On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write: for(perm: reversible_permutation(first, mid, last) ) f(perm); Of course, before doing this, it would be nice to propose boost::range, or something similar to it, for TR2. Is anyone working on anything like that? Chris

On Jan 4, 2011, at 6:22 AM, Christopher Jefferson wrote:
On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write:
for(perm: reversible_permutation(first, mid, last) ) f(perm);
Not positive, but I believe (if one could figure out how to code it) would suffer the exponential cost of incrementation demonstrated for next_partial_permutation and next_combination in "Performance Considerations" shown in: http://home.roadrunner.com/~hinnant/combinations.html -Howard
Of course, before doing this, it would be nice to propose boost::range, or something similar to it, for TR2. Is anyone working on anything like that?
Chris _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Howard Hinnant <howard.hinnant@gmail.com> writes:
On Jan 4, 2011, at 6:22 AM, Christopher Jefferson wrote:
On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write:
for(perm: reversible_permutation(first, mid, last) ) f(perm);
Not positive, but I believe (if one could figure out how to code it) would suffer the exponential cost of incrementation demonstrated for next_partial_permutation and next_combination in "Performance Considerations" shown in:
Couldn't the range adaptor object --- or the iterators generated from it --- store the intermediate information about where in the series of swaps it was? Sort of a continuation in your for_each_permutation() functions. Anthony -- Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/ just::thread C++0x thread library http://www.stdthread.co.uk Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

On Jan 4, 2011, at 11:05 AM, Anthony Williams wrote:
Howard Hinnant <howard.hinnant@gmail.com> writes:
On Jan 4, 2011, at 6:22 AM, Christopher Jefferson wrote:
On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write:
for(perm: reversible_permutation(first, mid, last) ) f(perm);
Not positive, but I believe (if one could figure out how to code it) would suffer the exponential cost of incrementation demonstrated for next_partial_permutation and next_combination in "Performance Considerations" shown in:
Couldn't the range adaptor object --- or the iterators generated from it --- store the intermediate information about where in the series of swaps it was? Sort of a continuation in your for_each_permutation() functions.
I don't know. Show me the code. :-) Hervé Brönnimann wrote a high quality implementation of next_combination which stored this information in the sorted-but-permuted sequence (just like std::next_permutation): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2639.pdf And such an algorithm could be the operator++() for a range adaptor. But my tests are showing this approach to be very expensive for some cases (see the plots in my paper). -Howard

On 4 Jan 2011, at 16:39, Howard Hinnant wrote:
On Jan 4, 2011, at 11:05 AM, Anthony Williams wrote:
Howard Hinnant <howard.hinnant@gmail.com> writes:
On Jan 4, 2011, at 6:22 AM, Christopher Jefferson wrote:
On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write:
for(perm: reversible_permutation(first, mid, last) ) f(perm);
Not positive, but I believe (if one could figure out how to code it) would suffer the exponential cost of incrementation demonstrated for next_partial_permutation and next_combination in "Performance Considerations" shown in:
Couldn't the range adaptor object --- or the iterators generated from it --- store the intermediate information about where in the series of swaps it was? Sort of a continuation in your for_each_permutation() functions.
I don't know. Show me the code. :-)
I am currently working on the code. I believe I am getting similar performance. However, I currently have a very heavy-duty iterator (it contains a heap-allocated array, effectively storing the stack of an algorithm like yours). Very important that it doesn't get post-incremented! Chris

On Jan 4, 2011, at 5:27 PM, Christopher Jefferson wrote:
On 4 Jan 2011, at 16:39, Howard Hinnant wrote:
On Jan 4, 2011, at 11:05 AM, Anthony Williams wrote:
Howard Hinnant <howard.hinnant@gmail.com> writes:
On Jan 4, 2011, at 6:22 AM, Christopher Jefferson wrote:
On 4 Jan 2011, at 03:40, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
For TR2, it would be nice to produce a range object, so we could write:
for(perm: reversible_permutation(first, mid, last) ) f(perm);
Not positive, but I believe (if one could figure out how to code it) would suffer the exponential cost of incrementation demonstrated for next_partial_permutation and next_combination in "Performance Considerations" shown in:
Couldn't the range adaptor object --- or the iterators generated from it --- store the intermediate information about where in the series of swaps it was? Sort of a continuation in your for_each_permutation() functions.
I don't know. Show me the code. :-)
I am currently working on the code. I believe I am getting similar performance. However, I currently have a very heavy-duty iterator (it contains a heap-allocated array, effectively storing the stack of an algorithm like yours). Very important that it doesn't get post-incremented!
Cool, and kudos! :-) I suspect that the ultimate solution to this problem will be writing combine_discontinuous and permute in a non-recursive manner, and then saving state as you are doing. If they are non-recursive that will eliminate the need for storing the stack of states and thus the heap allocation. I tried but failed after running out of time on this endeavor. Warning: it's addictive. ;-) And to get reversible permutations going you need combine_discontinuous3 (or its equivalent) in a non-recursive manner. Ultimately the logic needed here is the ability to call the functor for all permutations of 3 distinct ranges, treated as if they were one continuous range. Then you could skip the combination step altogether. Again, I ran out of time to work this direction. Anyway, I'm glad you're working on this. for_each_* can easily be built out of next_* if there is no performance penalty for doing so. But the reverse is not true, at least as far as I've discovered. And if I had my druthers, I'd like to be able to choose either the next_* or for_each_* API without performance concerns, depending only upon my application (whether or not I needed to break out of the loop early). -Howard

At Mon, 3 Jan 2011 22:40:25 -0500, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Hi Howard, Neat stuff! Was there a real-world use case driving this work, or are you just going all Knuth on us in your copious spare time? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Jan 4, 2011, at 11:01 AM, Dave Abrahams wrote:
At Mon, 3 Jan 2011 22:40:25 -0500, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Hi Howard,
Neat stuff! Was there a real-world use case driving this work, or are you just going all Knuth on us in your copious spare time?
One of my first applications of for_each_combination was for writing unit tests for rvalue reference overloading. This was done in 2004 or 2005, and so those tests are now obsolete due to the language changes since then. But in a nutshell I needed to write tests that covered every combination of passing an lvalue or rvalue, not-cv-qualified, const, volatile and const volatile qualified, to a set of overloaded set of functions each taking one parameter. That one parameter was a cv-qaulified lvalue or rvalue reference (each combination). And I needed to consider all combinations of that set of 8 overloaded functions 1 at a time, 2 at time, etc. up to 8 at a time. Here is one file containing just 8 tests that considered all 8 overloads at once: // I, Howard Hinnant, hereby place this code in the public domain. // Test overload resolution among referece types // { dg-do compile } // { dg-options "-std=c++0x" } template <bool> struct sa; template <> struct sa<true> {}; struct one {char x[1];}; struct two {char x[2];}; struct three {char x[3];}; struct four {char x[4];}; struct five {char x[5];}; struct six {char x[6];}; struct seven {char x[7];}; struct eight {char x[8];}; struct A { A(); A(const volatile A&&); }; A source(); const A c_source(); volatile A v_source(); const volatile A cv_source(); // 8 at a time one sink_8_12345678( A&); two sink_8_12345678(const A&); three sink_8_12345678(volatile A&); four sink_8_12345678(const volatile A&); five sink_8_12345678( A&&); six sink_8_12345678(const A&&); seven sink_8_12345678(volatile A&&); eight sink_8_12345678(const volatile A&&); int test8_12345678() { A a; const A ca = a; volatile A va; const volatile A cva = a; sa<sizeof(sink_8_12345678(a)) == 1> t1; sa<sizeof(sink_8_12345678(ca)) == 2> t2; sa<sizeof(sink_8_12345678(va)) == 3> t3; sa<sizeof(sink_8_12345678(cva)) == 4> t4; sa<sizeof(sink_8_12345678(source())) == 5> t5; sa<sizeof(sink_8_12345678(c_source())) == 6> t6; sa<sizeof(sink_8_12345678(v_source())) == 7> t7; sa<sizeof(sink_8_12345678(cv_source())) == 8> t8; return 0; } int main() { return test8_12345678(); } If I'm not mistaken, the total number of tests that needed to be written was a little over 2000. I used for_each_combination to semi-automate the writing of these tests. This both saved manual effort, and ensured that the test suite covered every combination. Since I wrote those rvalue-ref tests I lacked the time to refine and complete this set of algorithms. I took it on as my pet project over the holidays. -Howard

At Tue, 4 Jan 2011 11:30:02 -0500, Howard Hinnant wrote:
Hi Howard,
Neat stuff! Was there a real-world use case driving this work, or are you just going all Knuth on us in your copious spare time?
One of my first applications of for_each_combination was for writing unit tests for rvalue reference overloading. This was done in 2004 or 2005, and so those tests are now obsolete due to the language changes since then. But in a nutshell I needed to write tests that covered every combination of passing an lvalue or rvalue, not-cv-qualified, const, volatile and const volatile qualified, to a set of overloaded set of functions each taking one parameter. That one parameter was a cv-qaulified lvalue or rvalue reference (each combination). And I needed to consider all combinations of that set of 8 overloaded functions 1 at a time, 2 at time, etc. up to 8 at a time.
Here is one file containing just 8 tests that considered all 8 overloads at once:
Hah, I wouldn't have guessed you did this to solve a metaprogramming problem! -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Den 04-01-2011 04:40, Howard Hinnant skrev:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Very useful. I have been using Hervé Brönnimann's extensions for years in my own work (intended for TR2 IIRC). I attach his work. regards -Thorsten

On Jan 4, 2011, at 12:00 PM, Thorsten Ottosen wrote:
Den 04-01-2011 04:40, Howard Hinnant skrev:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Very useful. I have been using Hervé Brönnimann's extensions for years in my own work (intended for TR2 IIRC). I attach his work.
I agree that Hervé did an excellent job with his algorithms. I'm curious: Which algorithms of his do you use? And for what ballpark ranges of r (distance[first, middle)) and N (distance(first, last))? -Howard

Den 04-01-2011 19:56, Howard Hinnant skrev:
On Jan 4, 2011, at 12:00 PM, Thorsten Ottosen wrote:
Den 04-01-2011 04:40, Howard Hinnant skrev:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
I'm curious: Which algorithms of his do you use? And for what ballpark ranges of r (distance[first, middle)) and N (distance(first, last))?
I can see that I have only used next_combination(), and in my case I needed to call it for all possible sizes of combinations, so I call next_combination() inside a for loop. I have no predetermined bound on N, it depends on my models, but can range from 10 to 30 to even more, although such N usually become intractable. So I haven' used all of Herve's interface, although I think they could be useful at some point. The thing is, once you need one of these functions, then it's soo nice to find one implemented and tested. In general, I believe there are much more efficient ways to visit all combinations (*) than using the swap-based algorithms provided by Herve, but they give me a fast solution I can use to unit-test other more complicated approached (at least for small N). -Thorsten (*) I have a paper lying around somewhere .... don't have time to find it ... something along "subset enumeration trees" or something

(*) I have a paper lying around somewhere .... don't have time to find it ... something along "subset enumeration trees" or something
Here it was: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1312&context=cis_reports -Thorsten

On Jan 4, 2011, at 4:37 PM, Thorsten Ottosen wrote:
Den 04-01-2011 19:56, Howard Hinnant skrev:
On Jan 4, 2011, at 12:00 PM, Thorsten Ottosen wrote:
Den 04-01-2011 04:40, Howard Hinnant skrev:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
I'm curious: Which algorithms of his do you use? And for what ballpark ranges of r (distance[first, middle)) and N (distance(first, last))?
I can see that I have only used next_combination(), and in my case I needed to call it for all possible sizes of combinations, so I call next_combination() inside a for loop.
I have no predetermined bound on N, it depends on my models, but can range from 10 to 30 to even more, although such N usually become intractable.
I ran some performance tests for this scenario. This tests compares the next_combination algorithm with the for_each_combination, with the visit for each combination doing nothing but a long long increment. The test prints out the N, the total time for each algorithm. The time/visit for each algorithm, and the time for next_combination / the time for for_each_combination. Here is the code: #include <iostream> #include <vector> #include <numeric> #include <chrono> struct F { unsigned long long count_; F() : count_(0) {} void operator()(std::vector<int>::iterator, std::vector<int>::iterator) {++count_;} }; int main() { typedef std::chrono::high_resolution_clock Clock; typedef std::chrono::duration<double> sec; typedef std::chrono::duration<double, std::nano> ns; int n = 33; std::vector<int> v(n); std::iota(v.begin(), v.end(), 0); F f; Clock::time_point t0 = Clock::now(); for (std::vector<int>::iterator r = std::next(v.begin()); r != v.end(); ++r) { do { f(v.begin(), r); } while (next_combination(v.begin(), r, v.end())); } Clock::time_point t1 = Clock::now(); sec s0 = t1 - t0; ns pvt0 = s0 / f.count_; f = F(); t0 = Clock::now(); for (std::vector<int>::iterator r = std::next(v.begin()); r != v.end(); ++r) f = for_each_combination(v.begin(), r, v.end(), f); t1 = Clock::now(); sec s1 = t1 - t0; ns pvt1 = s1 / f.count_; std::cout << "N = " << v.size() << ", visits = " << f.count_ << "\n\tnext_combination total = " << s0.count() << " seconds" << ", next_combination per visit = " << pvt0.count() << " nanoseconds" << "\n\tfor_each_combination = " << s1.count() << " seconds" << ", for_each_combination per visit = " << pvt1.count() << " nanoseconds" << "\n\tnext_combination/for_each_combination = " << s0/s1 << '\n'; } And here are the results: N = 10, visits = 1022 next_combination total = 2.292e-05 seconds, next_combination per visit = 22.4266 nanoseconds for_each_combination = 1.7091e-05 seconds, for_each_combination per visit = 16.7231 nanoseconds next_combination/for_each_combination = 1.34106 N = 20, visits = 1048574 next_combination total = 0.0268492 seconds, next_combination per visit = 25.6054 nanoseconds for_each_combination = 0.0144714 seconds, for_each_combination per visit = 13.8011 nanoseconds next_combination/for_each_combination = 1.85532 N = 30, visits = 1073741822 next_combination total = 32.2251 seconds, next_combination per visit = 30.012 nanoseconds for_each_combination = 14.4642 seconds, for_each_combination per visit = 13.4709 nanoseconds next_combination/for_each_combination = 2.22792 N = 31, visits = 2147483646 next_combination total = 65.1725 seconds, next_combination per visit = 30.3483 nanoseconds for_each_combination = 28.9122 seconds, for_each_combination per visit = 13.4633 nanoseconds next_combination/for_each_combination = 2.25415 N = 32, visits = 4294967294 next_combination total = 131.869 seconds, next_combination per visit = 30.703 nanoseconds for_each_combination = 57.8251 seconds, for_each_combination per visit = 13.4635 nanoseconds next_combination/for_each_combination = 2.28047 N = 33, visits = 8589934590 next_combination total = 266.712 seconds, next_combination per visit = 31.0493 nanoseconds for_each_combination = 115.835 seconds, for_each_combination per visit = 13.4849 nanoseconds next_combination/for_each_combination = 2.30252 In this scenario the number of visits grows rapidly with N: visits = 2^N - 2. And in this scenario for_each_combination is only 2x faster than next_combination which may not make any difference if your visit time is much greater than a few tens of nanoseconds. If your visit time is greater than that, then that will be the driving factor for performance, and not the combination algorithm.
So I haven' used all of Herve's interface, although I think they could be useful at some point. The thing is, once you need one of these functions, then it's soo nice to find one implemented and tested.
Much agreed. Though I must admit that I haven't invested enough time to really understand the use case for next_mapping. I have found other requests for the functionality offered by for_each_reversible_permutation. The suggested solution to that query was not the algorithm I've implemented, and would be horribly inefficient (though correct) if implemented.
In general, I believe there are much more efficient ways to visit all combinations (*) than using the swap-based algorithms provided by Herve, but they give me a fast solution I can use to unit-test other more complicated approached (at least for small N).
-Thorsten
(*) I have a paper lying around somewhere .... don't have time to find it ... something along "subset enumeration trees" or something
Thanks for the link! (in your next post). I've skimmed it. -Howard

Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Hi Howard, The only time I've needed to use anything like this was for some sort of search, with termination as soon as a solution is found. With these algorithms it looks like you'd need to throw an exception to do this, which is rather frowned-upon, no? I wonder if the functor could return a bool, or something. Re the improved performance compared to next_permutation, I always think of co-routines when I see things like this. A co-routine could be used to wrap your faster algorithm in the interface of next_permutation. Yet co-routines are rarely used. I wonder why? Regards, Phil.

On Jan 4, 2011, at 3:56 PM, Phil Endecott wrote:
Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Hi Howard,
The only time I've needed to use anything like this was for some sort of search, with termination as soon as a solution is found. With these algorithms it looks like you'd need to throw an exception to do this, which is rather frowned-upon, no? I wonder if the functor could return a bool, or something.
I actually did think of this. And I believe it would not be that difficult. I'm not sure if it is desirable though. It is an awful lot of if-checking added to the algorithm. All that checking would tend to penalize those clients not wanting to break out early. In a perfect world, if it didn't cost any more to use the next_* interface than the for_each_* interface, I would definitely use the next_* interface along with a break statement, in this situation. That is clean and easy to understand. Though if I didn't need to break out early I believe I would choose for_each_*, and for the same reason. However if I had to choose between clean and 10x (really even 2x) faster, I lean towards speed. As always, it is an engineering judgement where one draws a line like that. And I'm not here to tell anyone where they should draw it. Perhaps Chris' work (a few posts in the future from yours) will give us the ability to choose between next_* and for_each_* guilt-free. :-) -Howard

Hi, Howard Hinnant <howard.hinnant <at> gmail.com> writes:
I'm not sure if it is desirable though. It is an awful lot of if-checking added to the algorithm. All that checking would tend to penalize those clients not wanting to break out early.
Couldn't the checks for early breaks be removed at compile-time by using callable returning a convertible to bool instead of a bool ? At least it appears to do so when returning a boost::mpl::bool_<false> with g++ at default optimization level.[*] Best Regards, Bernard [*] http://paste.debian.net/103792/

At Wed, 5 Jan 2011 12:20:16 +0000 (UTC), bernardH wrote:
Hi,
Howard Hinnant <howard.hinnant <at> gmail.com> writes:
I'm not sure if it is desirable though. It is an awful lot of if-checking added to the algorithm. All that checking would tend to penalize those clients not wanting to break out early.
Couldn't the checks for early breaks be removed at compile-time by using callable returning a convertible to bool instead of a bool ?
At least it appears to do so when returning a boost::mpl::bool_<false> with g++ at default optimization level.[*]
yeah, for some years I've been re-thinking the "early break from BGL algorithms" problem along these lines. I think it's an approach worth pursuing. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Jan 5, 2011, at 7:20 AM, bernardH wrote:
Hi,
Howard Hinnant <howard.hinnant <at> gmail.com> writes:
I'm not sure if it is desirable though. It is an awful lot of if-checking added to the algorithm. All that checking would tend to penalize those clients not wanting to break out early.
Couldn't the checks for early breaks be removed at compile-time by using callable returning a convertible to bool instead of a bool ?
At least it appears to do so when returning a boost::mpl::bool_<false> with g++ at default optimization level.[*]
Best Regards, Bernard
Thanks for the code sample. That *always* helps. In my current code my algorithm is recursive: template <class BidirectionalIterator, class Function> void combine_discontinuous(BidirectionalIterator first1, BidirectionalIterator last1, typename std::iterator_traits<BidirectionalIterator>::difference_type d1, BidirectionalIterator first2, BidirectionalIterator last2, typename std::iterator_traits<BidirectionalIterator>::difference_type d2, Function& f, typename std::iterator_traits<BidirectionalIterator>::difference_type d = 0) { typedef typename std::iterator_traits<BidirectionalIterator>::difference_type D; using std::swap; if (d2 == 0) f(); else { switch (d1) { case 0: return; case 1: for (BidirectionalIterator i2 = first2; i2 != last2; ++i2) { f(); swap(*first1, *i2); } break; default: BidirectionalIterator f1p = std::next(first1); BidirectionalIterator i2 = first2; for (D d22 = d2; i2 != last2; ++i2, --d22) { combine_discontinuous(f1p, last1, d1-1, i2, last2, d22, f, d+1); swap(*first1, *i2); } break; } f(); if (d != 0) rotate_discontinuous(first1, last1, d1, std::next(first2), last2, d2-1); else rotate_discontinuous(first1, last1, d1, first2, last2, d2); } } Transforming this to break out early: template <class BidirectionalIterator, class Function> bool combine_discontinuous(BidirectionalIterator first1, BidirectionalIterator last1, typename std::iterator_traits<BidirectionalIterator>::difference_type d1, BidirectionalIterator first2, BidirectionalIterator last2, typename std::iterator_traits<BidirectionalIterator>::difference_type d2, Function& f, typename std::iterator_traits<BidirectionalIterator>::difference_type d = 0) { typedef typename std::iterator_traits<BidirectionalIterator>::difference_type D; using std::swap; if (d2 == 0) { if (f()) return true; } else { switch (d1) { case 0: return false; case 1: for (BidirectionalIterator i2 = first2; i2 != last2; ++i2) { if (f()) return true; swap(*first1, *i2); } break; default: BidirectionalIterator f1p = std::next(first1); BidirectionalIterator i2 = first2; for (D d22 = d2; i2 != last2; ++i2, --d22) { if (combine_discontinuous(f1p, last1, d1-1, i2, last2, d22, f, d+1)) return true; swap(*first1, *i2); } break; } if (f()) return true; if (d != 0) rotate_discontinuous(first1, last1, d1, std::next(first2), last2, d2-1); else rotate_discontinuous(first1, last1, d1, first2, last2, d2); } return false; } The problem is I can't return a compile-time constant because of the recursion. To complicate things, in some of the algorithms I wrap the Function with an adaptor on the way down in order to fancier things like reversible_permutation. That's the bad news. Now since I went to the trouble to code this up to show you my problem, I figured I might as well test it. ... It doesn't seem to be any slower! :-) Wouldn't be the first time the compiler is smarter than I give it credit for. ;-) Now all I have to decide is if I like this break-out feature enough to require all user-supplied functors to follow the convention: return false if you don't want to break out of the loop, else return true when you want to break. The answer might depend upon if we get an equally performant next_* going. If we do, then I don't think the functor should return a bool to break: instead the client should use next_*. If we don't, then putting in the break feature means, at least to me, that there is no reason to use the next_* variant since it is always slower (even if only by a little). Thanks for furthering these algorithms. -Howard

At Wed, 5 Jan 2011 15:04:04 -0500, Howard Hinnant wrote:
On Jan 5, 2011, at 2:45 PM, Howard Hinnant wrote:
It doesn't seem to be any slower! :-)
Correction: If the compiler can see that the functor always returns false it isn't any slower. If it can't (separate compilation unit, or whatever), there's about a 12% speed hit.
That is consistent with my understanding of modern optimization capabilities. I wonder if link-time optimization can quash that 12% as well? Anyway, for algorithms like I think the separate-compilation case doesn't matter much. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Howard Hinnant <howard.hinnant <at> gmail.com> writes:
On Jan 5, 2011, at 7:20 AM, bernardH wrote:
In my current code my algorithm is recursive: ...
The problem is I can't return a compile-time constant because of the recursion.
I must be missing something, but I fail to see the problem : couldn't you just use an #define EARLY_EXIT(condition) \ do{ \ result_type const result(condition); \ if(result) \ return result; \ }while(false) // to allow ; with typedef typename result_of< Function(typename std::iterator_traits<BidirectionalIterator>::value_type)
::type result_type; ? Untested code (*blush*) @http://paste.debian.net/103845/
And if this doesn't work, you could always have two separate implementations one for Callable returning void and one for Callable returning bool and use enable_if on the result_type to select the proper one, no ?
To complicate things, in some of the algorithms I wrap the Function with an adaptor on the way down in order to fancier things like reversible_permutation.
If my solution is correct, I do (want to :) )believe that it could be adapted.
The answer might depend upon if we get an equally performant next_* going. If we do, then I don't think the functor should return a bool to break: instead the client should use next_*. If we don't, then putting in the break feature means, at least to me, that there is no reason to use the next_* variant since it is always slower (even if only by a little).
Agreed.
Thanks for furthering these algorithms.
Thank *you* for your most excellent code. Best Regards, Bernard

I've updated the for_each algorithms to break out early if the functor returns true: http://home.roadrunner.com/~hinnant/combinations.html I've used a simple bool return rather than any of the fancier tricks. The speed hit didn't seem that severe and is actually zero if the compiler can see that the functor always returns false. I can easily revert this to the void-return form if that becomes necessary. -Howard

Howard Hinnant <howard.hinnant <at> gmail.com> writes:
I've used a simple bool return rather than any of the fancier tricks. The speed hit didn't seem that severe and is actually zero if the compiler can see that the functor always returns false.
Fwiw, despite my previous messages, I think using plain bool is the right thing to do. For callables that would not need the test, the call will either be : 1°) out of line : the function call will dwarf the useless test 2°) inlined : the compiler will remove the useless test. (Not even speaking of the actual processing performed by the callable). I got carried away by my fondness for boost::mpl :). Best Regards, Bernard

On 4/01/2011 2:40 PM, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Have you considered this as the implementation for the combinations routine: template <typename Iterator> inline bool next_combination(const Iterator first, Iterator k, const Iterator last) { /* Credits: Thomas Draper */ if ((first == last) || (first == k) || (last == k)) return false; Iterator itr1 = first; Iterator itr2 = last; ++itr1; if (last == itr1) return false; itr1 = last; --itr1; itr1 = k; --itr2; while (first != itr1) { if (*--itr1 < *itr2) { Iterator j = k; while (!(*itr1 < *j)) ++j; std::iter_swap(itr1,j); ++itr1; ++j; itr2 = k; std::rotate(itr1,j,last); while (last != j) { ++j; ++itr2; } std::rotate(k,itr2,last); return true; } } std::rotate(first,k,last); return false; } Sourced from: http://marknelson.us/2002/03/01/next-permutation/

On Jan 5, 2011, at 4:40 AM, Arash Partow wrote:
On 4/01/2011 2:40 PM, Howard Hinnant wrote:
Warming this up for tr2:
http://home.roadrunner.com/~hinnant/combinations.html
Comments welcome. Real world use welcome.
Have you considered this as the implementation for the combinations routine:
This looks very similar to the next_combination I'm comparing to in the paper. -Howard
template <typename Iterator> inline bool next_combination(const Iterator first, Iterator k, const Iterator last) { /* Credits: Thomas Draper */ if ((first == last) || (first == k) || (last == k)) return false; Iterator itr1 = first; Iterator itr2 = last; ++itr1; if (last == itr1) return false; itr1 = last; --itr1; itr1 = k; --itr2; while (first != itr1) { if (*--itr1 < *itr2) { Iterator j = k; while (!(*itr1 < *j)) ++j; std::iter_swap(itr1,j); ++itr1; ++j; itr2 = k; std::rotate(itr1,j,last); while (last != j) { ++j; ++itr2; } std::rotate(k,itr2,last); return true; } } std::rotate(first,k,last); return false; }
Sourced from: http://marknelson.us/2002/03/01/next-permutation/
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (11)
-
Anthony Williams
-
Arash Partow
-
bernardH
-
BernardH
-
Christopher Jefferson
-
Dave Abrahams
-
Howard Hinnant
-
Jeffrey Lee Hellrung, Jr.
-
Mathias Gaunard
-
Phil Endecott
-
Thorsten Ottosen