Proposed Library Boost.Algorithm

(Finally) I have uploaded to the vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library. This has been a matter of interest and discussion for a long, long time. Some of this code was written during the "Library in a week" session at BoostCon 2008. More was written last summer after this year's BoostCon. There is no way that I can write and submit all the algorithms that people want/need/are interested in, so I am proposing to set up a system by which people can submit algorithms to be included in the library, and have them reviewed, much like how new libraries are reviewed for acceptance into boost. The difference will be that a new algorithm is generally a much smaller piece of code than a new library, so hopefully we can lower the barrier to getting small pieces of code into Boost. So, what's in this proposal? 1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and Knuth-Morris-Pratt. (search.hpp) 2) A set of predicates for determining if a sequence is ordered (by some criteria) (ordered.hpp) Originally written by Grant Erickson during the "Library in a week" session at Boostcon 2008, and reworked somewhat by myself. 3) A set of predicates for determining other properties of a sequence, based on some suggestions for C++0x. (all.hpp) 4) An algorithm for restricting a value between two endpoints (clamp.hpp) And, of course, test cases and documentation. -- Marshall

AMDG On 03/30/2011 03:03 PM, Marshall Clow wrote:
(Finally) I have uploaded to the vault<http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
This has been a matter of interest and discussion for a long, long time. Some of this code was written during the "Library in a week" session at BoostCon 2008. More was written last summer after this year's BoostCon.
There is no way that I can write and submit all the algorithms that people want/need/are interested in, so I am proposing to set up a system by which people can submit algorithms to be included in the library, and have them reviewed, much like how new libraries are reviewed for acceptance into boost. The difference will be that a new algorithm is generally a much smaller piece of code than a new library, so hopefully we can lower the barrier to getting small pieces of code into Boost.
So, what's in this proposal?
1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and Knuth-Morris-Pratt. (search.hpp)
2) A set of predicates for determining if a sequence is ordered (by some criteria) (ordered.hpp) Originally written by Grant Erickson during the "Library in a week" session at Boostcon 2008, and reworked somewhat by myself.
The range library has some extra magic to control the return type of find etc. Maybe it should be used here too?
3) A set of predicates for determining other properties of a sequence, based on some suggestions for C++0x. (all.hpp)
The range overload of one_of_if should take a const R& argument. In Christ, Steven Watanabe

On Mar 30, 2011, at 3:55 PM, Steven Watanabe wrote:
2) A set of predicates for determining if a sequence is ordered (by some criteria) (ordered.hpp) Originally written by Grant Erickson during the "Library in a week" session at Boostcon 2008, and reworked somewhat by myself.
The range library has some extra magic to control the return type of find etc. Maybe it should be used here too?
I'll take a look. I went back and forth on these routines - should they return an iterator, or just a boolean? A boolean would tell you if the whole sequence is sorted, an iterator tells you where the first out of order item is; == end -> no items are out of order. I compromised - I wrote the basic routine to return an iterator, and the "wrapper" routines return bool.
3) A set of predicates for determining other properties of a sequence, based on some suggestions for C++0x. (all.hpp)
The range overload of one_of_if should take a const R& argument.
D'oh! I wrote the Doxygen comment correctly, but not the code! Thanks! -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

AMDG On 03/30/2011 03:03 PM, Marshall Clow wrote:
And, of course, test cases and documentation.
A few comments on the tests: - We usually use run instead of unit-test, since it interacts better with the regression tests. - all_test is missing tests of the _if variants. - in clamp_test, the tests with a predicate would pass even if the predicate were ignored. - copy_test doesn't need to be there. In Christ, Steven Watanabe

On Thu, Mar 31, 2011 at 6:03 AM, Marshall Clow <mclow.lists@gmail.com> wrote:
(Finally) I have uploaded to the vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Coolness! Thanks Marshall! Is there some way you can get this into either a git repo or into the sandbox? Looking forward to learning more about and using this algorithms library! :D -- Dean Michael Berris http://about.me/deanberris

On Mar 30, 2011, at 7:00 PM, Dean Michael Berris wrote:
On Thu, Mar 31, 2011 at 6:03 AM, Marshall Clow <mclow.lists@gmail.com> wrote:
(Finally) I have uploaded to the vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Coolness!
Thanks Marshall! Is there some way you can get this into either a git repo or into the sandbox?
It's already in a git repo - on my machine ;-) Seriously, though - what is the advantage (to you, to me) in having development versions in a public repo? I can understand it if there are multiple contributors- but there are not (at this point). -- Marshall

(Finally) I have uploaded to the vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Coolness!
Thanks Marshall! Is there some way you can get this into either a git repo or into the sandbox?
It's already in a git repo - on my machine ;-)
Seriously, though - what is the advantage (to you, to me) in having development versions in a public repo? I can understand it if there are multiple contributors- but there are not (at this point).
-- Marshall
I too am interested in the git repo. There is a utility function that I am considering contributing, and I think that it would be easier for me to use GitHub's pull request system to send in a patch, or at least to use git's format-patch feature. Daniel Trebbien

(re-sending to the list) On Apr 9, 2011, at 10:04 AM, Daniel Trebbien wrote:
(Finally) I have uploaded to the vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Coolness!
Thanks Marshall! Is there some way you can get this into either a git repo or into the sandbox?
It's already in a git repo - on my machine ;-)
Seriously, though - what is the advantage (to you, to me) in having development versions in a public repo? I can understand it if there are multiple contributors- but there are not (at this point).
-- Marshall
I too am interested in the git repo. There is a utility function that I am considering contributing, and I think that it would be easier for me to use GitHub's pull request system to send in a patch, or at least to use git's format-patch feature.
Daniel -- That is exactly the direction that I want to take Boost.Algorithm - once it is accepted. Let people propose/contribute new algorithms, have them reviewed (hopefully a much easier task than a full-blown library review), and then add them to the library. However, first the library needs to be accepted into boost - and that won't ever happen as long as it is being added to. So, yes - I welcome your contribution; just not yet. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

Hi Marshall, 2011/3/31 Marshall Clow <mclow.lists@gmail.com>
(Finally) I have uploaded to the vault < http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Thanks for working on this! I've taken a glimpse at your code, and I'd like to suggest to specify the requirement of the iterator explicitly, for example, RandomAccessIterator rather than 'patIter', and be sure to document these requirements clearly (I only saw them in comments). [...] 1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and
Knuth-Morris-Pratt. (search.hpp)
Recently I just implemented the KMP algorithm, and in my implementation, it requires the corpus iterator to be InputIterator for search to return a bool, and ForwardIterator to return the iterator, maybe you also want to lower down the requirement for RandomAccessIterator here? To be more flexible in searching, it could also give a way to search all or some of the results up to a threshold, not just the first one. The interface may like what Regex provides (I'm not certainly sure), or it should let users provide a visitor for the resultant corpus iterators. What do you think?

On Mar 31, 2011, at 10:08 AM, TONGARI wrote:
2011/3/31 Marshall Clow <mclow.lists@gmail.com>
(Finally) I have uploaded to the vault < http://www.boostpro.com/vault/index.php?action=downloadfile&filename=Boost.Algorithm-008.zip&directory=Algorithms&> a proposal for the Boost.Algorithm library.
Thanks for working on this!
I've taken a glimpse at your code, and I'd like to suggest to specify the requirement of the iterator explicitly, for example, RandomAccessIterator rather than 'patIter', and be sure to document these requirements clearly (I only saw them in comments).
I'll look into it. Part of the problem here is that I don't want to lose the feature that the "pattern iterator" and the "corpus iterator" can be different types (as long as they point to the same underlying data type)
1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and
Knuth-Morris-Pratt. (search.hpp)
Recently I just implemented the KMP algorithm, and in my implementation, it requires the corpus iterator to be InputIterator for search to return a bool, and ForwardIterator to return the iterator, maybe you also want to lower down the requirement for RandomAccessIterator here?
Since I want to return the match, I think that InputIterators are out, but I'll look at ForwardIterator.
To be more flexible in searching, it could also give a way to search all or some of the results up to a threshold, not just the first one. The interface may like what Regex provides (I'm not certainly sure), or it should let users provide a visitor for the resultant corpus iterators.
Since the search algorithms return an iterator to the start of the "found pattern", you can find all the matches in a loop: // uncompiled code warning! xxx search_obj ( pat_start, pat_end ); // choose your search object corpus_iterator it = corpus_start; do { if (( it = search_obj ( it, corpus_end )) == corpus_end ) break; // do something with the match it++; } while (true); -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

2011/4/1 Marshall Clow <mclow.lists@gmail.com>
On Mar 31, 2011, at 10:08 AM, TONGARI wrote:
Recently I just implemented the KMP algorithm, and in my implementation, it requires the corpus iterator to be InputIterator for search to return a bool, and ForwardIterator to return the iterator, maybe you also want to lower down the requirement for RandomAccessIterator here?
Since I want to return the match, I think that InputIterators are out, but I'll look at ForwardIterator.
Alas! I must took it wrong with the ForwardIterator one, since I only implemented the InputIterator one returning a bool. Sorry for that and it was not an April fool joke :/

On 31/03/2011 00:03, Marshall Clow wrote:
So, what's in this proposal?
1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and Knuth-Morris-Pratt. (search.hpp)
Clearly the most interesting bit of the library IMO. However, it seems it only allows searching for the substring from the beginning to the end, a shame ; it should allow the other way around too. Bad dealing of reverse string searching in Boost.StringAlgo is the reason why I personally had to rewrite big chunks of that library for my uses. The "bummer" bits are weird. Partial specialization of function templates is not possible, but SFINAE comes pretty close to being equivalent. But looking at the code, I don't think how any would help.
3) A set of predicates for determining other properties of a sequence, based on some suggestions for C++0x. (all.hpp)
It could be argued the _if variants are useless in presence of the filter iterator/range adaptors.

On Apr 9, 2011, at 1:40 PM, Mathias Gaunard wrote:
On 31/03/2011 00:03, Marshall Clow wrote:
So, what's in this proposal?
1) Three search algorithms: Boyer-Moore, Boyer-Moore-Horspool and Knuth-Morris-Pratt. (search.hpp)
Clearly the most interesting bit of the library IMO.
Thanks; I think so, too.
However, it seems it only allows searching for the substring from the beginning to the end, a shame ; it should allow the other way around too. Bad dealing of reverse string searching in Boost.StringAlgo is the reason why I personally had to rewrite big chunks of that library for my uses.
I'm not sure what you mean here. I'm assuming that a reverse_iterator adaptor doesn't do what you want.
3) A set of predicates for determining other properties of a sequence, based on some suggestions for C++0x. (all.hpp)
It could be argued the _if variants are useless in presence of the filter iterator/range adaptors.
I think that "useless" is too strong a term. You might be able to convince me that they are "redundant", though. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

On 09/04/2011 23:08, Marshall Clow wrote:
However, it seems it only allows searching for the substring from the beginning to the end, a shame ; it should allow the other way around too. Bad dealing of reverse string searching in Boost.StringAlgo is the reason why I personally had to rewrite big chunks of that library for my uses.
I'm not sure what you mean here. I'm assuming that a reverse_iterator adaptor doesn't do what you want.
I want to search for "foo" in "foo bar foo bar foo bar foo bar" starting from the end, my first match would be ^ and not ^ But I guess all the smart algorithms would need to redo preprocessing anyway, so reversing both strings seems like the way to go, even if that might have performance implications.
participants (6)
-
Daniel Trebbien
-
Dean Michael Berris
-
Marshall Clow
-
Mathias Gaunard
-
Steven Watanabe
-
TONGARI