[iterator] std::sort on zip_iterator (repost)

newer
Bug Sprint Report for Thursday,...

Jeremiah Willcock

8 May 2009 8 May '09

6:03 p.m.

(Resubmitting to Boost list because I did not receive any answers on boost-users, nor any helpful answers on the Boost list.) I would like to sort two separate vectors of numbers, treating corresponding elements of the two vectors as tuples and sorting the tuples. In particular, I am only comparing the first elements of the tuples; that may or may not be relevant to my problem. An example program is: #include <vector> #include <algorithm> #include <iostream> #include <boost/tuple/tuple.hpp> #include <boost/iterator/zip_iterator.hpp> struct compare_first_elements_in_tuples { template <typename Tuple> bool operator()(const Tuple& a, const Tuple& b) const { return (a.template get<0>()) < (b.template get<0>()); } }; int main(int, char**) { size_t a_data[5] = {0, 2, 1, 2, 4}; size_t b_data[5] = {1, 1, 5, 3, 7}; std::vector<size_t> a(a_data, a_data + 5); std::vector<size_t> b(b_data, b_data + 5); std::cout << "Before:"; for (size_t i = 0; i < a.size(); ++i) { std::cout << " (" << a[i] << ", " << b[i] << ")"; } std::cout << std::endl; typedef std::vector<size_t>::iterator vec_iter; typedef boost::zip_iterator<boost::tuple<vec_iter, vec_iter> > zip_iter; std::sort(zip_iter(boost::make_tuple(a.begin(), b.begin())), zip_iter(boost::make_tuple(a.end(), b.end())), compare_first_elements_in_tuples()); std::cout << "After:"; for (size_t i = 0; i < a.size(); ++i) { std::cout << " (" << a[i] << ", " << b[i] << ")"; } std::cout << std::endl; return 0; } On gcc 4.0.1, the sort operation results in the element (2, 1) (a pair of corresponding elements in a and b) being duplicated, while the element (1, 5) does not appear in the result at all: Before: (0, 1) (2, 1) (1, 5) (2, 3) (4, 7) After: (0, 1) (2, 1) (2, 1) (2, 3) (4, 7) I suspect the problem is related to the fact that zip_iterator does not return a "real" reference when dereferenced. Am I mistaken about this? Is the code above supposed to work? If not, is there an alternative way to sort one sequence while swapping elements in another in a corresponding way? I am not able to join the two input sequences into a single sequence of pairs because of other factors in the use case for this code. Thank you for your help. -- Jeremiah Willcock

Show replies by date

Steven Watanabe

8 May 8 May

7:25 p.m.

AMDG Jeremiah Willcock wrote:

...

(Resubmitting to Boost list because I did not receive any answers on boost-users, nor any helpful answers on the Boost list.)

I would like to sort two separate vectors of numbers, treating corresponding elements of the two vectors as tuples and sorting the tuples. In particular, I am only comparing the first elements of the tuples; that may or may not be relevant to my problem. An example program is:

<snip>

On gcc 4.0.1, the sort operation results in the element (2, 1) (a pair of corresponding elements in a and b) being duplicated, while the element (1, 5) does not appear in the result at all:

Before: (0, 1) (2, 1) (1, 5) (2, 3) (4, 7) After: (0, 1) (2, 1) (2, 1) (2, 3) (4, 7)

I suspect the problem is related to the fact that zip_iterator does not return a "real" reference when dereferenced. Am I mistaken about this?

You are correct.

...

Is the code above supposed to work?

No.

...

If not, is there an alternative way to sort one sequence while swapping elements in another in a corresponding way? I am not able to join the two input sequences into a single sequence of pairs because of other factors in the use case for this code. Thank you for your help.

std::sort requires a Random Access Iterator. zip_iterator is not a Random Access Iterator. In Christ, Steven Watanabe

Steven Ross

8:25 p.m.

...

If not, is there an alternative way to sort one sequence while swapping

...
elements in another in a corresponding way? I am not able to join the two input sequences into a single sequence of pairs because of other factors in the use case for this code. Thank you for your help.

std::sort requires a Random Access Iterator. zip_iterator is not a Random Access Iterator.

I've proposed a sorting library here; it only works for Random Access Iterators, but there were some requests in the past for a more general sort implementation. Theoretically, anything that can be iterated and swapped can be sorted. Do you have a genuine usage case where a sort would be useful where an item can be iterated and swapped, but does not have a Random Access Iterator? If so, I could reconsider the request and see if I can find (or write) an effective algorithm and template it.

Jeremiah Willcock

8:38 p.m.

On Fri, 8 May 2009, Steven Ross wrote:

...

...
If not, is there an alternative way to sort one sequence while swapping

...
elements in another in a corresponding way? I am not able to join the two input sequences into a single sequence of pairs because of other factors in the use case for this code. Thank you for your help.

std::sort requires a Random Access Iterator. zip_iterator is not a Random Access Iterator.

I've proposed a sorting library here; it only works for Random Access Iterators, but there were some requests in the past for a more general sort implementation. Theoretically, anything that can be iterated and swapped can be sorted. Do you have a genuine usage case where a sort would be useful where an item can be iterated and swapped, but does not have a Random Access Iterator? If so, I could reconsider the request and see if I can find (or write) an effective algorithm and template it.

Yes, I do -- the case of sorting two (or more) corresponding sequences using a zip_iterator. This capability is important to save memory when explicitly zipping the sequences into a new sequence is not possible. Note that zip_iterator still has random access in constant time, but just does not return an actual reference but a tuple of references. -- Jeremiah Willcock

Steven Ross

9:37 p.m.

On Fri, May 8, 2009 at 1:38 PM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:

...

Theoretically, anything that can be iterated and swapped can be sorted. Do

...
you have a genuine usage case where a sort would be useful where an item can be iterated and swapped, but does not have a Random Access Iterator? If so, I could reconsider the request and see if I can find (or write) an effective algorithm and template it.

Yes, I do -- the case of sorting two (or more) corresponding sequences using a zip_iterator. This capability is important to save memory when explicitly zipping the sequences into a new sequence is not possible. Note that zip_iterator still has random access in constant time, but just does not return an actual reference but a tuple of references.

so if I (theoretically) were to use ++ to iterate over the array, calling std::swap(element_position, correct_position), then the zip_iterators would be sorted correctly? Can I use -- (Is it bidirectional)? That would improve runtime. I would expect begin() and end() to be passed in as the arguments to the sort method. If so, please send me a testcase you expect to work, and I can try to find an appropriate algorithm, and template it (Quicksort would work, but it's O(n*n)).

Jeremiah Willcock

11:30 p.m.

On Fri, 8 May 2009, Steven Ross wrote:

...

...
Yes, I do -- the case of sorting two (or more) corresponding sequences using a zip_iterator. This capability is important to save memory when explicitly zipping the sequences into a new sequence is not possible. Note that zip_iterator still has random access in constant time, but just does not return an actual reference but a tuple of references.

so if I (theoretically) were to use ++ to iterate over the array, calling std::swap(element_position, correct_position), then the zip_iterators would be sorted correctly? Can I use -- (Is it bidirectional)? That would improve runtime. I would expect begin() and end() to be passed in as the arguments to the sort method. If so, please send me a testcase you expect to work, and I can try to find an appropriate algorithm, and template it (Quicksort would work, but it's O(n*n)).

I think you would need an implementation of swap_tuples like in my later email (containing "[tuple] swap" in the subject). You can use ++ on the array, but there is also constant-time --, +=, -=, [], etc. The zip_iterators have full random access capability; they just do not model the old Random Access Iterator concept because operator* does not return a reference (it returns a tuple of references instead). The test code (using std::sort) is in the email that started this thread. -- Jeremiah Willcock

Steven Ross

9 May 9 May

2:51 a.m.

On Fri, May 8, 2009 at 4:30 PM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:

...

On Fri, 8 May 2009, Steven Ross wrote:

Yes, I do -- the case of sorting two (or more) corresponding sequences

...
...
using a zip_iterator. This capability is important to save memory when explicitly zipping the sequences into a new sequence is not possible. Note that zip_iterator still has random access in constant time, but just does not return an actual reference but a tuple of references.

so if I (theoretically) were to use ++ to iterate over the array, calling std::swap(element_position, correct_position), then the zip_iterators would be sorted correctly? Can I use -- (Is it bidirectional)? That would improve runtime. I would expect begin() and end() to be passed in as the arguments to the sort method. If so, please send me a testcase you expect to work, and I can try to find an appropriate algorithm, and template it (Quicksort would work, but it's O(n*n)).

I think you would need an implementation of swap_tuples like in my later email (containing "[tuple] swap" in the subject). You can use ++ on the array, but there is also constant-time --, +=, -=, [], etc. The zip_iterators have full random access capability; they just do not model the old Random Access Iterator concept because operator* does not return a reference (it returns a tuple of references instead). The test code (using std::sort) is in the email that started this thread.

You'd have to use a specialized sorting algorithm that does not dereference any elements and does not use std::swap. To make it as general as possible, I think it would be best to define a simple_sort that works on bidirectional iterators and lets the user override the swap method to allow tuple swapping, for example. The idea is a fallback sort that can sort almost any sortable data structure, though not necessarily at optimal speed. You're welcome to write this yourself, possibly using the quicksort part of the STL introsort definition, or you could wait for me to write it, which I should eventually get to.

Edouard A.

6:20 p.m.

...

You'd have to use a specialized sorting algorithm that does not dereference any elements and does not use std::swap. To make it as general as possible, I think it would be best to define a simple_sort that works on bidirectional iterators and lets the user override the swap method to allow tuple swapping, for example. The idea is a fallback sort that can sort almost any sortable data structure, though not necessarily at optimal speed. You're welcome to write this yourself, possibly using the quicksort part of the STL introsort definition, or you could wait for me to write it, which I should eventually get to.

Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate. It's very easy to implement using std::merge (or std::inplace_merge for a speed/memory tradeoff) : - If the list is small (below 30 ?) - use insert_sort - else - Part the list in two at the middle - Call merge sort on each part (recursion here) - std::merge (or std::inplace_merge) the two resulting lists -- EA

Jeremiah Willcock

7:04 p.m.

...

Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

But I do have a random access iterator. The zip_iterator of multiple random access iterators is random access; it just fails to work with std::sort because its operator* does not return a "real" reference. Therefore, quicksort or introsort should work just fine with O(N lg N) complexity, but the Standard Library implementations probably won't because they make too many assumptions about references. -- Jeremiah Willcock

Steven Watanabe

7:26 p.m.

AMDG Jeremiah Willcock wrote:

...

...
Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

But I do have a random access iterator. The zip_iterator of multiple random access iterators is random access; it just fails to work with std::sort because its operator* does not return a "real" reference. Therefore, quicksort or introsort should work just fine with O(N lg N) complexity, but the Standard Library implementations probably won't because they make too many assumptions about references.

I'm afraid that these assumptions are written into the current standard. According to 24.1.3 you do not even have a forward iterator. In Christ, Steven Watanabe

Jeremiah Willcock

7:29 p.m.

On Sat, 9 May 2009, Steven Watanabe wrote:

...

AMDG

Jeremiah Willcock wrote:

...
...
Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

But I do have a random access iterator. The zip_iterator of multiple random access iterators is random access; it just fails to work with std::sort because its operator* does not return a "real" reference. Therefore, quicksort or introsort should work just fine with O(N lg N) complexity, but the Standard Library implementations probably won't because they make too many assumptions about references.

I'm afraid that these assumptions are written into the current standard. According to 24.1.3 you do not even have a forward iterator.

I understand that it is not random access according to the standard. What I was saying is that algorithm-wise (in terms of writing them from scratch), the iterator can be treated as having actual random access (with the new iterator hierarchy), even though it does not satisfy the requirements that are part of the specification for std::sort and such. That is, it satisfies some requirements (constant-time +=, etc), and the deficiencies can be worked around in a newly-written sorting algorithm. The discussion was about whether to use quick sort or merge sort, and either of these would be efficient on zip_iterators if they were rewritten to handle the swap problem. -- Jeremiah Willcock

Edouard A.

10 May 10 May

4:56 a.m.

...

I understand that it is not random access according to the standard. What I was saying is that algorithm-wise (in terms of writing them from scratch), the iterator can be treated as having actual random access (with the new iterator hierarchy), even though it does not satisfy the requirements that are part of the specification for std::sort and such. That is, it satisfies some requirements (constant-time +=, etc), and the deficiencies can be worked around in a newly-written sorting algorithm. The discussion was about whether to use quick sort or merge sort, and either of these would be efficient on zip_iterators if they were rewritten to handle the swap problem.

Ok now I clearly see the problem. You need a sort algorithm that takes the dereference to the element and the swap as template policy parameters. One could write a specialized sort, but I think it would be even nicer to have a customizable introsort (ie you could customize the pivot selection, thresholds, fallback algorithms, comparison, dereference and maybe other things). Performances would probably be a tiny bit below std::sort (because inlining doesn't always work as expected) but you would use it when std::sort doesn't work, or when you are in a case where the default pivot (generally median of 3 or median of 5) is suboptimal for you. As for quicksort vs mergesort (I would include heapsort in the comparison as well) I would bet on quicksort, but benchmarking is the only way to know. -- EA

Steven Ross

9 May 9 May

9 p.m.

On Sat, May 9, 2009 at 11:20 AM, Edouard A. <edouard@fausse.info> wrote:

...

Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

Introsort requires a random access iterator for the partial_sort recursive call, agreed. Strictly speaking, if you're willing to iterate to the pivot point, the Quicksort algorithm only requires a bidirectional iterator and a swap method. Iterating to the pivot point doesn't change the computational complexity, nor is likely to greatly slow down the algorithm. Thanks for the merge_sort suggestion, Edouard. How about this: flex_sort (if that name isn't taken) starts with quicksort for average-case speed after X iterations (just like introsort) switches to mergesort (as opposed to partial_sort) for O(nlog(n)) performance written to be as general as possible, with these requirements: ++ and -- operations that iterate maybe support for advance if it's just as general as ++ and -- a swap method * operator that returns either a reference or a const reference (so comparisons can be made) I can't think of a reason for needing to sort something that can only be iterated forward, but if someone can give me a good argument for it, I can eliminate the -- requirement, though it will hurt performance. If I can't use the * operator, then I'd have to have an iter_compare method that takes two iterators and compares them. The extra indirection will be slower. In the zip_iterator example, does the * operator return a const reference that can be used for comparison? The calls and overloads for my suggestion would be: flex_sort(first, last) flex_sort(first, last, compare) flex_sort(first, last, compare, swap) I would expect this algorithm to be slightly slower than introsort on average, and much slower in conditions where it switches to mergesort, but provide substantially more flexibility with regards to what can be sorted, and with the same order of computational complexity. Does that sound like a useful and appropriate general solution?

Edouard A.

10 May 10 May

5:39 a.m.

...

How about this: [...] The calls and overloads for my suggestion would be: flex_sort(first, last) flex_sort(first, last, compare) flex_sort(first, last, compare, swap)

Exactly what I had in mind, except I was more thinking about something like template < typename Iterator, typename Compare, int SmallThreshold, int DepthFallback, typename MainSort, typename FallbackSort, typename Swapper

...

void flex_sort(Iterator first, Iterator last, Compare compare) Mainsort would be something like Template <typename Iterator, typename Compare, typename Swapper> struct { size_t distance(Iterator first, Iterator last); // finds pivot & partition Iterator split(Iterator first, Iterator last); }; And all MainSort and FallbackSort would share some traits and tell what kind of iterator they need (to have static assertions and stuff, at compile time you would know that you're trying to use a sort algorithm that's inefficient for the iterators you have). Swapper would be struct { Element get_elem(Iterator i); void operator()(Iterator left, Iterator right); }; So basically flex_sort would be something like (typing as I think, not valid C++): void flex_sort(Iterator first, Iterator last, Compare compare, int depth = 0) { if (depth >= DepthFallBack) { FallbackSort<...>(compare, swapper)(first, last); } else { Iterator pivot = MainSort<...>(compare, swapper).split(first, last); // pivot => last we didn't split but sort the stuff (means the data was too small to be split) if (pivot != last) { ++depth; flex_sort(first, pivot, compare, depth); advance(pivot, 1); flex_sort(pivot, last, compare, depth); } } } For quicksort, split would be: void MainSort::split(first, last) { Iterator pivot = last; if (distance(first, last) > threshold) { pivot = find_pivot(first, last); Iterator start = first; advance(start, 1); pivot = partition (start, last, std::bind2nd(compare, swapper.get_elem(pivot))); swapper(pivot, first); } else { insert_sort(first, last... ?); } return pivot; } What do you think? -- EA

Steven Ross

1:54 p.m.

That looks good to me. Passing all the functors will hurt performance, but probably not horribly so if you make sure to keep the insertion sort threshold; you might even increase the insertion sort threshold a bit. After it's written if it makes a big difference we could add overloads with less functors. The point of this isn't to make a specialized fast sort; it's to make a sort that is as general as possible and is not too much slower than introsort. Do you want to write it, Edouard? On Sat, May 9, 2009 at 10:39 PM, Edouard A. <edouard@fausse.info> wrote:

...

...
How about this: [...] The calls and overloads for my suggestion would be: flex_sort(first, last) flex_sort(first, last, compare) flex_sort(first, last, compare, swap)

Exactly what I had in mind, except I was more thinking about something like

template < typename Iterator, typename Compare, int SmallThreshold, int DepthFallback, typename MainSort, typename FallbackSort, typename Swapper

...
void flex_sort(Iterator first, Iterator last, Compare compare)

Mainsort would be something like

Template <typename Iterator, typename Compare, typename Swapper> struct { size_t distance(Iterator first, Iterator last); // finds pivot & partition Iterator split(Iterator first, Iterator last); };

And all MainSort and FallbackSort would share some traits and tell what kind of iterator they need (to have static assertions and stuff, at compile time you would know that you're trying to use a sort algorithm that's inefficient for the iterators you have).

Swapper would be

struct {

Element get_elem(Iterator i); void operator()(Iterator left, Iterator right); };

So basically flex_sort would be something like (typing as I think, not valid C++):

void flex_sort(Iterator first, Iterator last, Compare compare, int depth = 0) { if (depth >= DepthFallBack) { FallbackSort<...>(compare, swapper)(first, last); } else { Iterator pivot = MainSort<...>(compare, swapper).split(first, last);

// pivot => last we didn't split but sort the stuff (means the data was too small to be split) if (pivot != last) { ++depth;

flex_sort(first, pivot, compare, depth); advance(pivot, 1); flex_sort(pivot, last, compare, depth); } } }

For quicksort, split would be:

void MainSort::split(first, last) { Iterator pivot = last;

if (distance(first, last) > threshold) { pivot = find_pivot(first, last);

Iterator start = first; advance(start, 1); pivot = partition (start, last, std::bind2nd(compare, swapper.get_elem(pivot)));

swapper(pivot, first); } else { insert_sort(first, last... ?); }

return pivot; }

What do you think?

--

EA

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Steven Ross

2:09 p.m.

Scratch that; as template parameters they shouldn't hurt performance at all. That's a good idea; the user needs to pass lots of template parameters, but it's easy to create overloads that pass in defaults. Theoretically you should be able to have a version that is introsort, with identical performance. If someone else wants to write it, I'd be willing to maintain it in the Boost.Algorithm.Sorting library (assuming it is accepted); not that I'd try to stop the author from maintaining it if they wanted to. On Sun, May 10, 2009 at 6:54 AM, Steven Ross <spreadsort@gmail.com> wrote:

...

That looks good to me. Passing all the functors will hurt performance, but probably not horribly so if you make sure to keep the insertion sort threshold; you might even increase the insertion sort threshold a bit. After it's written if it makes a big difference we could add overloads with less functors. The point of this isn't to make a specialized fast sort; it's to make a sort that is as general as possible and is not too much slower than introsort.

Do you want to write it, Edouard?

On Sat, May 9, 2009 at 10:39 PM, Edouard A. <edouard@fausse.info> wrote:

...
...
How about this: [...] The calls and overloads for my suggestion would be: flex_sort(first, last) flex_sort(first, last, compare) flex_sort(first, last, compare, swap)

Exactly what I had in mind, except I was more thinking about something like

template < typename Iterator, typename Compare, int SmallThreshold, int DepthFallback, typename MainSort, typename FallbackSort, typename Swapper

...
void flex_sort(Iterator first, Iterator last, Compare compare)

Mainsort would be something like

Template <typename Iterator, typename Compare, typename Swapper> struct { size_t distance(Iterator first, Iterator last); // finds pivot & partition Iterator split(Iterator first, Iterator last); };

And all MainSort and FallbackSort would share some traits and tell what kind of iterator they need (to have static assertions and stuff, at compile time you would know that you're trying to use a sort algorithm that's inefficient for the iterators you have).

Swapper would be

struct {

Element get_elem(Iterator i); void operator()(Iterator left, Iterator right); };

So basically flex_sort would be something like (typing as I think, not valid C++):

void flex_sort(Iterator first, Iterator last, Compare compare, int depth = 0) { if (depth >= DepthFallBack) { FallbackSort<...>(compare, swapper)(first, last); } else { Iterator pivot = MainSort<...>(compare, swapper).split(first, last);

// pivot => last we didn't split but sort the stuff (means the data was too small to be split) if (pivot != last) { ++depth;

flex_sort(first, pivot, compare, depth); advance(pivot, 1); flex_sort(pivot, last, compare, depth); } } }

For quicksort, split would be:

void MainSort::split(first, last) { Iterator pivot = last;

if (distance(first, last) > threshold) { pivot = find_pivot(first, last);

Iterator start = first; advance(start, 1); pivot = partition (start, last, std::bind2nd(compare, swapper.get_elem(pivot)));

swapper(pivot, first); } else { insert_sort(first, last... ?); }

return pivot; }

What do you think?

--

EA

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Edouard A.

3:37 p.m.

...

Scratch that; as template parameters they shouldn't hurt performance at all. That's a good idea; the user needs to pass lots of template parameters, but it's easy to create overloads that pass in defaults. Theoretically you should be able to have a version that is introsort, with identical performance.

Yes, inlining + copy elision should make the performance impact < epsilon. Last profiling I made on sorting, I don't recall spending too much on function calls.

...

If someone else wants to write it, I'd be willing to maintain it in the Boost.Algorithm.Sorting library (assuming it is accepted); not that I'd try to stop the author from maintaining it if they wanted to.

I can do it. I can add a wrapper to spread sort as well. The idea is to give as many bricks as possible with some default constructions for the user who isn't interested in dwelling to much in the details. That's an approach I like. Simple and satisfactory default behavior with the possibility for heavy customization. I'll have some time next week and guess what, I have a new rig to benchmark! When I returned from Aspen I found on the threshold of my door an abandoned Core i7 and 12 GB of RAM with a letter saying "take good care of them". Oh yeah. >) -- EA

Steven Ross

4:34 p.m.

On Sun, May 10, 2009 at 8:37 AM, Edouard A. <edouard@fausse.info> wrote:

...

I can do it. I can add a wrapper to spread sort as well. The idea is to give as many bricks as possible with some default constructions for the user who isn't interested in dwelling to much in the details. That's an approach I like. Simple and satisfactory default behavior with the possibility for heavy customization.

I'll have some time next week and guess what, I have a new rig to benchmark!

When I returned from Aspen I found on the threshold of my door an abandoned Core i7 and 12 GB of RAM with a letter saying "take good care of them".

Oh yeah. >)

Sounds good to me. I might have to modify spreadsort a bit to make it work reasonably efficiently with bidirectional iterators, but it's doable. You may want to consider keeping track of counts instead of calculating distance.

Steven Ross

28 May 28 May

2:26 p.m.

Is there a boost::iter_swap? It would be useful to enable use of a more specialized iter_swap for sorting. This would apply both to my hybrid algorithms and spreadsort.

Steven Ross

2:26 p.m.

and flex_sort On Thu, May 28, 2009 at 7:26 AM, Steven Ross <spreadsort@gmail.com> wrote:

...

Is there a boost::iter_swap? It would be useful to enable use of a more specialized iter_swap for sorting. This would apply both to my hybrid algorithms and spreadsort.

Christopher Jefferson

3:15 p.m.

On 28 May 2009, at 15:26, Steven Ross wrote:

...

Is there a boost::iter_swap? It would be useful to enable use of a more specialized iter_swap for sorting. This would apply both to my hybrid algorithms and spreadsort.

One thing to watch out for (you might know this) is that standard libraries are not required to use iter_swap for iterators, and in general don't (they often use swap(*a,*b)). Chris

Steven Ross

5:09 p.m.

I believe the zip_iterator problem is that *a returns a const. swap(*a, *b) won't work for that situation; it needs a specialized iter_swap. From my understanding of Boost::swap, using an equivalent Boost::iter_swap would resolve the problem. Additionally, iter_swap can be be faster, and it's what SGI's std::sort uses. On Thu, May 28, 2009 at 8:15 AM, Christopher Jefferson < chris@bubblescope.net> wrote:

...

On 28 May 2009, at 15:26, Steven Ross wrote:

Is there a boost::iter_swap? It would be useful to enable use of a more

...
specialized iter_swap for sorting. This would apply both to my hybrid algorithms and spreadsort.

One thing to watch out for (you might know this) is that standard libraries are not required to use iter_swap for iterators, and in general don't (they often use swap(*a,*b)).

Chris

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

David Abrahams

4 Jun 4 Jun

6:20 p.m.

on Sat May 09 2009, "Edouard A." <edouard-AT-fausse.info> wrote:

...

...
You'd have to use a specialized sorting algorithm that does not dereference any elements and does not use std::swap. To make it as general as possible, I think it would be best to define a simple_sort that works on bidirectional iterators and lets the user override the swap method to allow tuple swapping, for example. The idea is a fallback sort that can sort almost any sortable data structure, though not necessarily at optimal speed. You're welcome to write this yourself, possibly using the quicksort part of the STL introsort definition, or you could wait for me to write it, which I should eventually get to.

Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

Quicksort really doesn't take advantage of random access at all, other than possibly for choosing better pivots. The basic structure is a recursive "partition, then sort-both-halves," which you can do nicely with bidirectional iterators. With a little extra storage I think you can even do it with forward iterators. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Steven Ross

5 Jun 5 Jun

2:05 p.m.

On Thu, Jun 4, 2009 at 11:20 AM, David Abrahams <dave@boostpro.com> wrote:

...

on Sat May 09 2009, "Edouard A." <edouard-AT-fausse.info> wrote:

...
...
You'd have to use a specialized sorting algorithm that does not dereference any elements and does not use std::swap. To make it as general as possible, I think it would be best to define a simple_sort that works on bidirectional iterators and lets the user override the swap method to allow tuple swapping, for example. The idea is a fallback sort that can sort almost any sortable data structure, though not necessarily at optimal speed. You're welcome to write this yourself, possibly using the quicksort part of the STL introsort definition, or you could wait for me to write it, which I should eventually get to.

Quick/introsort requires a random access iterator, if you only have bidirectional iterators, merge sort is more appropriate.

Quicksort really doesn't take advantage of random access at all, other than possibly for choosing better pivots. The basic structure is a recursive "partition, then sort-both-halves," which you can do nicely with bidirectional iterators. With a little extra storage I think you can even do it with forward iterators.

The partition step is more efficient with random access (unless you partition on the first or last element, which is bad for sorted data), but agreed, Quicksort is good with bidirectional iterators, once you partition. You can trivially convert a list of forward iterators into reverse iterators by holding a vector of them, using O(n) memory, but then you might as well use LSB radix sort and sort the memory externally if you're willing to sacrifice the memory. I was thinking that for general sorting, a more appropriate solution would be to create O(square_root(n)) index locations in the reverse iteration array, along with storing the local square_root(n) locations, which get refreshed (in O(square_root(n)) time) every time an iterator backs over a point in the index. This approach could be generalized in a class, but each "reverse iterator" would take O(square_root(n)) memory, though reference counting could be used to avoid duplication (to a maximum of n + square_root(n) additional locations, if there were square_root(n) reverse iterators, at least 1 for each location in the index). Would that be worth putting into boost? Is reverse iterating over a forward iterator a common problem that occurs outside sorting that would be worth creating a class for?

David Abrahams

4 Jun 4 Jun

6:18 p.m.

on Fri May 08 2009, Steven Ross <spreadsort-AT-gmail.com> wrote:

...

On Fri, May 8, 2009 at 4:30 PM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:

...
On Fri, 8 May 2009, Steven Ross wrote:

Yes, I do -- the case of sorting two (or more) corresponding sequences

...
...
using a zip_iterator. This capability is important to save memory when explicitly zipping the sequences into a new sequence is not possible. Note that zip_iterator still has random access in constant time, but just does not return an actual reference but a tuple of references.

so if I (theoretically) were to use ++ to iterate over the array, calling std::swap(element_position, correct_position), then the zip_iterators would be sorted correctly? Can I use -- (Is it bidirectional)? That would improve runtime. I would expect begin() and end() to be passed in as the arguments to the sort method. If so, please send me a testcase you expect to work, and I can try to find an appropriate algorithm, and template it (Quicksort would work, but it's O(n*n)).

I think you would need an implementation of swap_tuples like in my later email (containing "[tuple] swap" in the subject). You can use ++ on the array, but there is also constant-time --, +=, -=, [], etc. The zip_iterators have full random access capability; they just do not model the old Random Access Iterator concept because operator* does not return a reference (it returns a tuple of references instead). The test code (using std::sort) is in the email that started this thread.

You'd have to use a specialized sorting algorithm that does not dereference any elements and does not use std::swap.

The usual trick to handle proxy refs is to do everything in terms of iter_swap. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

5874

Age (days ago)

5902

Last active (days ago)

List overview

Download

24 comments

6 participants

participants (6)

Christopher Jefferson
David Abrahams
Edouard A.
Jeremiah Willcock
Steven Ross
Steven Watanabe