New algorithm in Boost.Algorithm: "gather" -- looking for comments

I have committed a new algorithm named "gather" into Boost.Algorithm. Actually, it's one I wrote for Sean Parent as part of ASL about five years ago, and it's the first (of hopefully many) of the bits oF ASL to be moved into boost. Here's a summary: gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate. template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator pivot, Pred pred ); Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 }; A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in: 1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot where first and second are the fields of the pair returned by the call. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

AMDG On 01/21/2013 04:49 PM, Marshall Clow wrote:
I have committed a new algorithm named "gather" into Boost.Algorithm.
Actually, it's one I wrote for Sean Parent as part of ASL about five years ago, and it's the first (of hopefully many) of the bits oF ASL to be moved into boost.
Here's a summary:
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator pivot, Pred pred );
Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 };
A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in:
1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot
where first and second are the fields of the pair returned by the call.
I don't understand why the matches are inserted between 3 and 5, rather than (say) 1 and 3. You need to specify exactly what the algorithm does (and does not) guarantee. In Christ, Steven Watanabe

gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator pivot, Pred pred );
Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 };
A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in:
1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot
where first and second are the fields of the pair returned by the call.
I don't understand why the matches are inserted between 3 and 5, rather than (say) 1 and 3. You need to specify exactly what the algorithm does (and does not) guarantee.
Notice the pivot is "arr + 4", which holds the same value (4) before and after the call. It seems to me that the algorithm operates independently on the sub-ranges [first, pivot) and [pivot + 1, last), for the first one moving elements satisfying the predicate to just before the pivot (while maintaining the relative order), and for the second one to just after the pivot. Am I understanding that correctly? Regards, Nate

On Jan 21, 2013, at 8:29 PM, Nathan Ridge <zeratul976@hotmail.com> wrote:
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator pivot, Pred pred );
Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 };
A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in:
1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot
where first and second are the fields of the pair returned by the call.
I don't understand why the matches are inserted between 3 and 5, rather than (say) 1 and 3. You need to specify exactly what the algorithm does (and does not) guarantee.
Notice the pivot is "arr + 4", which holds the same value (4) before and after the call.
It seems to me that the algorithm operates independently on the sub-ranges [first, pivot) and [pivot + 1, last), for the first one moving elements satisfying the predicate to just before the pivot (while maintaining the relative order), and for the second one to just after the pivot.
Am I understanding that correctly?
Yes, you are. it "gathers" the elements satisfying the predicate together at the pivot point, preserving their (relative) order. If pred(*pivot) is true, then the range returned will contain pivot. If pred(*pivot) is false, then the range returned may or may not contain pivot. (if there are no elements in [pivot, end) that satisfy the predicate, then the range returned will have pivot as it's end) I used this in an email client I worked on several years ago. It had a great feature for "gathering" related messages together. If you option-clicked on a particular message's subject, it would collect all the messages with the same subject (and "Re:"), and bring them together where you clicked. Ditto if you option-clicked on a sender - all the messages from that person would be gathered together (and selected). Really handy feature. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

On Mon, Jan 21, 2013 at 9:36 PM, Marshall Clow <mclow.lists@gmail.com>wrote:
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the
template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator
On Jan 21, 2013, at 8:29 PM, Nathan Ridge <zeratul976@hotmail.com> wrote: pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate. pivot, Pred pred );
Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 };
A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in:
1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot
where first and second are the fields of the pair returned by the call.
I don't understand why the matches are inserted between 3 and 5, rather than (say) 1 and 3. You need to specify exactly what the algorithm does (and does not) guarantee.
Notice the pivot is "arr + 4", which holds the same value (4) before and after the call.
It seems to me that the algorithm operates independently on the sub-ranges [first, pivot) and [pivot + 1, last), for the first one moving elements satisfying the predicate to just before the pivot (while maintaining the relative order), and for the second one to just after the pivot.
Am I understanding that correctly?
Yes, you are. it "gathers" the elements satisfying the predicate together at the pivot point, preserving their (relative) order.
So...isn't this basically a (stable_)partition call on each subrange? If pred(*pivot) is true, then the range returned will contain pivot.
If pred(*pivot) is false, then the range returned may or may not contain pivot. (if there are no elements in [pivot, end) that satisfy the predicate, then the range returned will have pivot as it's end)
Sounds like the returned range will never contain the pivot; in the parenthesized case, the returned range still doesn't contain the pivot, right? I used this in an email client I worked on several years ago. It had a
great feature for "gathering" related messages together. If you option-clicked on a particular message's subject, it would collect all the messages with the same subject (and "Re:"), and bring them together where you clicked. Ditto if you option-clicked on a sender - all the messages from that person would be gathered together (and selected).
Really handy feature.
Well, I guess up to you whether it necessitates a separate function, but I would think a couple (stable_)partition calls would be sufficient (I could be wrong about the gather == 2 x (stable_)partition equivalence, though). Or is there some optimization that gather provides over (stable_)partition? - Jeff

On Jan 21, 2013, at 11:56 PM, "Jeffrey Lee Hellrung, Jr." <jeffrey.hellrung@gmail.com> wrote:
On Mon, Jan 21, 2013 at 9:36 PM, Marshall Clow <mclow.lists@gmail.com>wrote:
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the
template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator
On Jan 21, 2013, at 8:29 PM, Nathan Ridge <zeratul976@hotmail.com> wrote: pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate. pivot, Pred pred );
Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 };
A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in:
1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot
where first and second are the fields of the pair returned by the call.
I don't understand why the matches are inserted between 3 and 5, rather than (say) 1 and 3. You need to specify exactly what the algorithm does (and does not) guarantee.
Notice the pivot is "arr + 4", which holds the same value (4) before and after the call.
It seems to me that the algorithm operates independently on the sub-ranges [first, pivot) and [pivot + 1, last), for the first one moving elements satisfying the predicate to just before the pivot (while maintaining the relative order), and for the second one to just after the pivot.
Am I understanding that correctly?
Yes, you are. it "gathers" the elements satisfying the predicate together at the pivot point, preserving their (relative) order.
So...isn't this basically a (stable_)partition call on each subrange?
It is - in fact, that's how it's implemented.
If pred(*pivot) is true, then the range returned will contain pivot.
If pred(*pivot) is false, then the range returned may or may not contain pivot. (if there are no elements in [pivot, end) that satisfy the predicate, then the range returned will have pivot as it's end)
Sounds like the returned range will never contain the pivot; in the parenthesized case, the returned range still doesn't contain the pivot, right?
No - the returned range will always contain the pivot (position), _except_ in the case where none of the elements in [pivot, end) satisfy the predicate.
I used this in an email client I worked on several years ago. It had a
great feature for "gathering" related messages together. If you option-clicked on a particular message's subject, it would collect all the messages with the same subject (and "Re:"), and bring them together where you clicked. Ditto if you option-clicked on a sender - all the messages from that person would be gathered together (and selected).
Really handy feature.
Well, I guess up to you whether it necessitates a separate function, but I would think a couple (stable_)partition calls would be sufficient (I could be wrong about the gather == 2 x (stable_)partition equivalence, though). Or is there some optimization that gather provides over (stable_)partition?
Nope. But I think it's sufficiently useful (and possibly non-obvious) to be given its' own name. [ If you watch Bjarne's keynote from GoingNative 2012, he relates an anecdote about how a whole bunch of code, involving loops and multiple calls to rotate, was replaced by a single call to gather (starting at about 52:50) ] -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

Hi Marshall, On Jan 21, 2013, at 5:49 PM, Marshall Clow wrote:
I have committed a new algorithm named "gather" into Boost.Algorithm.
Are scatter and assemble algorithms also forthcoming?
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
This is a good start, but I'd like to see gather generalized. One change is to lift the requirement for the pivot being internal to the element sequence as in many computational engineering applications we don't want to destroy the source elements when we perform the gather, we want to gather (some parts of) the elements into another buffer. In this mode, pivot becomes less apt as you're no longer pivoting in the source sequence. We'd also want to allow a different iterator type for the output buffer so perhaps something like:
template <typename ForwardIterator, typename OutputIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, OutputIterator result, Pred pred );
Also, could we just return the range end iterator instead of the pair since the user already told you where the range started? Something like:
template <typename ForwardIterator, typename OutputIterator, typename Pred> OutputIterator gather(ForwardIterator first, ForwardIterator last, OutputIterator result, Pred pred);
I think this would still support your use case and might be a base we could build some engineering tools on, what do you think? -- Noel

On Wed, Jan 23, 2013 at 9:12 AM, Belcourt, Kenneth <kbelco@sandia.gov> wrote:
Hi Marshall,
On Jan 21, 2013, at 5:49 PM, Marshall Clow wrote:
I have committed a new algorithm named "gather" into Boost.Algorithm.
Are scatter and assemble algorithms also forthcoming?
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
This is a good start, but I'd like to see gather generalized. One change is to lift the requirement for the pivot being internal to the element sequence as in many computational engineering applications we don't want to destroy the source elements when we perform the gather, we want to gather (some parts of) the elements into another buffer.
Isn't it the same as copy_if? Also, std::remove looks similar to gather when the gathered elements are put to the end of the sequence.

On Jan 22, 2013, at 10:27 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
Also, std::remove looks similar to gather when the gathered elements are put to the end of the sequence.
it does look similar but.. std::remove does not have the requirement to preserve all the entries in the sequence. std::remove can leave the "removed" entries in an unknown state (overwritten? moved from? copied?) Otherwise you could just write std::remove as std::stable_partition ;-) -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

On Jan 22, 2013, at 9:12 PM, "Belcourt, Kenneth" <kbelco@sandia.gov> wrote:
Hi Marshall,
On Jan 21, 2013, at 5:49 PM, Marshall Clow wrote:
I have committed a new algorithm named "gather" into Boost.Algorithm.
Are scatter and assemble algorithms also forthcoming?
I'm always open to suggestions.
gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate.
This is a good start, but I'd like to see gather generalized. One change is to lift the requirement for the pivot being internal to the element sequence as in many computational engineering applications we don't want to destroy the source elements when we perform the gather, we want to gather (some parts of) the elements into another buffer. In this mode, pivot becomes less apt as you're no longer pivoting in the source sequence. We'd also want to allow a different iterator type for the output buffer so perhaps something like:
template <typename ForwardIterator, typename OutputIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, OutputIterator result, Pred pred );
As Andrey said, I think you've just described copy_if. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

On Jan 21, 2013, at 4:49 PM, Marshall Clow <mclow.lists@gmail.com<mailto:mclow.lists@gmail.com>> wrote: I have committed a new algorithm named "gather" into Boost.Algorithm. Actually, it's one I wrote for Sean Parent as part of ASL about five years ago, and it's the first (of hopefully many) of the bits oF ASL to be moved into boost. Here's a summary: My edits: gather() takes a sequence of objects and moves the objects that satisfy a predicate to a position within the sequence. The algorithm is stable, meaning that the relative position of items satisfying the predicate, and those not satisfying the predicate are not changed. gather() returns the range of objects satisfying the predicate. Post-condition: The position is within the returned range. Sean ... [Rename pivot to position in diagram] gather() takes a collection of elements defined by a pair of iterators and moves the ones satisfying a predicate to them to a position (called the pivot) within the sequence. The algorithm is stable. The result is a pair of iterators that contains the items that satisfy the predicate. template <typename ForwardIterator, typename Pred> std::pair<ForwardIterator,ForwardIterator> gather ( ForwardIterator first, ForwardIterator last, ForwardIterator pivot, Pred pred ); Given an sequence containing: int [] arr = { 0 1 2 3 4 5 6 7 8 9 }; A call to gather ( arr, arr + 10, arr + 4, IsEven) will result in: 1 3 0 2 4 6 8 5 7 9 |---|-----| first | second pivot where first and second are the fields of the pair returned by the call. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki Sean Parent sparent@adobe.com<mailto:sparent@adobe.com> “Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.” - H. L. Mencken

On Jan 26, 2013, at 4:18 PM, Sean Parent <sparent@adobe.com> wrote:
gather() takes a sequence of objects and moves the objects that satisfy a predicate to a position within the sequence.
gather() takes a sequence of objects and moves the objects in that sequence satisfying a predicate to a specified position within that sequence.
The algorithm is stable, meaning that the relative position of items satisfying the predicate, and
"as well as" rather than "and" seems less likely to suggest the two sets are interleaved in any way.
those not satisfying the predicate are not changed. gather() returns the range of objects satisfying the predicate.
Post-condition: The position
The specified position
is within the returned range.
___ Rob

On Jan 26, 2013, at 2:53 PM, Rob Stewart <robertstewart@comcast.net> wrote:
Post-condition: The position
The specified position
is within the returned range.
Almost: If the input is int arr [] = { 1, 2, 3, 4, 5, 7, 9, 11, 13, 15 } and you call gather ( arr, arr + 10, arr + 4, IsEven ) then the output will be: { 1, 3, 2, 4, 5, 7, 9, 11, 13, 15 } and the returned pair will be { arr + 2, arr + 4 } If you add the precondition of pred(*pivot), then, yes, pivot will always be in the returned range. If you don't want to require pred(*pivot), then it might be that the returned range will have pivot as it's "end". (pair.second == pivot) in the case where none(pivot, last, pred) is true. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki
participants (8)
-
Andrey Semashev
-
Belcourt, Kenneth
-
Jeffrey Lee Hellrung, Jr.
-
Marshall Clow
-
Nathan Ridge
-
Rob Stewart
-
Sean Parent
-
Steven Watanabe