[Review:Algorithms] Boost.Algorithms Review

Boost.Algorithms should be accepted into Boost. Many issues have been raised during the review and during the pre-review using Code Collaborator. I have confidence that Marshall will carefully weigh all of the ideas and concerns that have been raised and apply those that are sensible and appropriate. __________ Design I have a problem with the physical design. all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp. I realize that none_of must test all elements in a range in the case when no element matches the predicate, but that doesn't make me think of all.hpp as the place to find it. I think "is_ordered_until" is a better name than "is_ordered" for the reasons previously elucidated. As to your dislike of "is_ordered_until", just read it as "the range is ordered until the element referenced by the return value". As there may well be other search algorithms than those you've provided thus far, I think each should be in a separate header and documented separately. Otherwise, search.hpp may grow to be ridiculously large. __________ Implementation I find the naming of your formal parameters to be inconsistent: first, last, p, lo, hi, val, and range. A more consistent set of names would be first, last, predicate, low, high, value, and range. I'd even accept "pred" in that list. Likewise, the template parameter names are inconsistent: InputIterator, Predicate, R, V, Pred, FI, patIter, and corpusIter. A more consistent set of names would be InputIterator, Predicate, Range, Value, ForwardIterator, PatternIterator, CorpusIterator. If you'd prefer to abbreviate "Iterator" with "It" or "Iter", then I'd abbreviate the categories, too: InIt, FwdIt, RandomIt, etc. You presume that all C++ implementations with a specific value of __cplusplus provide std::all_of, std::any_of, and std::none_of. Is that portable? Indentation is inconsistent. The use of mpl::identity for clamp's arguments should be documented for future maintainers. is_ordered() uses a one-line if/return and a two-line for no apparent reason. That inconsistency increases the chances of a maintainer failing to notice one or the other. search.hpp uses BOOST_ALGO_DEBUG. "ALGO" should be spelled out. Many of the functions controlled by that manifest constant use camelcase and should be all lower case. I was expecting more algorithm documentation in search.hpp. The code is not overly complex, but there should at least be pointers to literature on the implementation of each algorithm. However, as such literature may not always be accessible, copying pertinent information to the header would be helpful to future maintainers. __________ Documentation Specific notes and corrections are listed at the end of this message. I'd like to see the complexity, behavioral, and iterator information be documented for each algorithm separately rather than for a group, such as you've done in paragraphs 3 and 4 of all.html. Note that the complexity is described below the clamp algorithms and above the all_of, any_of, and none_of algorithms. This should be consistent. The algorithms are all function templates, so "the function <algorithm-name>" is incorrect. Thus, instead of "the function all_of", write "all_of". Look for other uses of "function" and "functions" for similar terminology issues. While is_increasing, is_decreasing, is_strictly_increasing, and is_strictly_decreasing are implemented using is_ordered, that is by no means a requirement. Consequently, I'd prefer that each was documented separately. Complexities written as should be written in all upper case: O(N), O(M x N), etc. The treatment of the two internal tables used by the search algorithms is inconsistent ("first", "second", "skip", etc.). I'm not even sure they should be mentioned as you do. I realize that you want to document memory usage and customization options, but perhaps the better approach is to clearly document the traits class and its usage as well as the memory usage in terms of the pattern and corpus character sets. For example, K-M-P allocates N bytes per element in the pattern. The Reference section for each algorithm should include "#include <boost/algorithm/foo.hpp>", to permit copying and pasting, plus complexity and template parameter requirements. Template parameters should be documented just as function parameters. Thus, for example, in the reference page for knuth_morris_pratt_search(), there should be a section that describes the two iterator types including their iterator category as well as their relationship (which elsewhere you mention requiring that their value_types be identical). The boyer_moore_search() and boyer_moore_horspool_search() algorithms do not have reference pages. There are no examples in the text. There should be an example for each algorithm. The search algorithms' traits classes are not documented. __________ Usefulness This library, in its current state has certain utility. As a place to incorporate future algorithms that don't deserve a library of their own, it has enormous utility. __________ Usage I did not try to use the library due to time constraints. __________ Effort I spent many hours studying and commenting on the code during the pre-review and now, reading the documentation, etc. __________ Domain Knowledge I have no experience designing string search algorithms. I have used many algorithms. I have designed a good deal of generic code. Thus, I have broad domain knowledge, but nothing extraordinary. __________ Documentation Issues _____ Description and Rationale 2nd para, 3rd sent: missing "and" from "tested, reviewed, documented" _____ Future plans Written with this review in mind, this section is fine, but for an accepted library, it isn't right. I suggest something like this: "This library will be extended by algorithms submitted or suggested by Boost developers and users. The Adobe Source Library (http://stlab.adobe.com/), for example, contains many useful algorithms with documentation and test cases that might be culled for inclusion in this library. Knuth's _The Art of Computer Programming_ is full of algorithm descriptions, too. Various algorithms have been created and documented in magazines, online forums, etc., or even just for personal use. All of these sources, and more, can provide fodder for algorithms to add to this library. "Adding new algorithms will require some form of review, but will not be subject to the usual Boost review process. The Boost.Algorithm maintainer would have the ability to add algorithms without review, but this will not be the case for non-trivial algorithms. Non-trivial algorithms, and possibly even trivial algorithms, will be subjected to a mini-review. This will be a call for Boost developers to study the proposed algorithms with the goal to get more minds to consider their design and utility to ensure their broad utility." _____ Header: 'all.hpp' The heading can just be "all.hpp". 1st para, 1st sent: Strike "The header file". 1st para, 2nd sent: any_of does not verify that every element has a particular property. 2nd para: All three of these algorithms is in C++11. Try this wording: "The iterator-based versions are part of the standard library in C++11 (§25.2 Non-modifying sequence operations). Boost.Algorithms will use the standard library versions of these algorithms when available." 4th para: The algorithms will not work on output iterators. Therefore, you should indicate that they require input iterators. _____ Header: 'clamp.hpp' The heading can just be "clamp.hpp". 1st para, 1st sent: Strike "The header file". The description is insufficient, which you seem to understand since you quoted "clamp". I suggest this instead: "clamp.hpp contains function templates to constrain a value to be within the inclusive range formed from supplied boundary values." 2nd para, bullets: s/iff/if. "iff" is overly proscriptive on the implementation. That is, lo could be returned if v == lo without affecting the result. 3rd para: "also" this sentence refers the previous paragraph, which uses < in the bullets, but this is a bit awkard. It would be better to describe each version of clamp separately. The second would have the same bulleted list, but would use p(v, lo) and p(hi, v). _____ Header: 'ordered.hpp' The heading can just be "ordered.hpp". 1st para, 1st sent: Strike "The header file". Strike the parentheses. 2nd para, 4th sent: Surely is_ordered returns the end iterator for sequences of less than two elements. 3rd para: s/are will not/will not/ 5th para, 1st sent: I'd prefer, "There are also a number of function templates that use is_ordered to determine whether an entire sequence satisfies a particular ordering." _____ Header: 'search.hpp' The heading can just be "search.hpp". ___ Overview 1st para, 1st sent: Strike "The header file". 3rd para: merge with the 2nd para 4th para: Change from first person and quoting to "For each of these search algorithms, the sequence being searched is referred to as the <i>pattern,</i> and the sequence being searched as the <i>corpus.</i>" 5th para, 1st sent: Change to "These search algorithms generate tables to make searches faster." 5th para, 3rd sent: s/To help with this/To provide control over the overhead/ 5th para, 3rd sent: s/;/:/ 5th para, 4th sent: s/the search/the search, thereby using the same generated table(s) for multiple searches./ 7th para: s/and here are/and here is/ 8th para, last sent: Must I1::value_type be the same as I2::value_type or must they merely be comparable? ___ Boyer-Moore 1st para: The wording at the end is confusing. I think you mean, "benchmark used in the literature on practical string searching." 2nd para: merge with the 1st para s/target string (pattern) that is being search for/pattern/ s/algorithm, while still...can have/algorithm has O(N) complexity, where N is the corpus length, but can have/ s/algorithms; it doesn't/algorithms because it doesn't/ s/gets faster as the key being search for becomes longer/is faster for longer patterns/ Note: s/skip table (the second one) as stored as an/skip table (the second one); it is stored as a/ s/are (say) UTF-16 text; in that case, the/are UTF-16 text, for example, since the/ s/The B-M and B-M-H objects take a 'traits' template parameter which lets/The traits template parameter of the B-M and B-M-H class templates allow/ s/paramater/parameter/ ___ Boyer-Moore-Horspool 1st para: s/Boyer-Moore, and has/Boyer-Moore, but has/ 2nd para: s/algorithm an internal/algorithm uses an internal/ You should mention that the traits template parameter allows using a different container to store that data so the memory usage can change. Unless this reference is to the "first" table. ___ Knuth-Morris-Pratt 2nd para: s/looking for the match - instead of just at the next entry/looking next for a match rather than just advancing to the next element/ _____ Reference I don't think the all.hpp content should be in reference.html. It should be found by a link like the other three headers. The declarations for all.hpp are compressed too much. Some vertical whitespace would be helpful. Why is "Marshall Clow" in the "Header <boost/algorithm/all.hpp>" section? It also appears in the clamp.hpp reference page. _____ Function template clamp (both) Returns: Could be improved: "lo if p(val, lo) returns true, hi if p(hi, val) returns true, and val otherwise." _____ Header <boost/algorithm/ordered.hpp> The declarations are compressed too much. Some vertical whitespace would be helpful. _____ Function template is_ordered (iterators) Returns: Could be improved: "the iterator it, in the range [first, last), for which p(*it) returns false, or last if there is no such iterator." _____ Function template is_ordered (range) Returns: Could be improved: "the iterator it, in range, for which p(*it) returns false, or boost::end(range) if there is no such iterator." _____ Function template is_increasing (iterators) Returns: "true if each element in [first, last) is greater than or equal to the previous element or [first, last) has fewer than two elements." _____ Function template is_increasing (range) Returns: "true if each element in range is greater than or equal to the previous element or range has fewer than two elements." _____ Function template is_decreasing (iterators) Returns: "true if each element in [first, last) is less than or equal to the previous element or [first, last) has fewer than two elements." _____ Function template is_decreasing (range) Returns: "true if each element in range is less than or equal to the previous element or range has fewer than two elements." _____ Function template is_strictly_increasing (iterators) Returns: "true if each element in [first, last) is greater than the previous element or [first, last) has fewer than two elements." _____ Function template is_strictly_increasing (range) Returns: "true if each element in range is greater than the previous element or range has fewer than two elements." _____ Function template is_strictly_decreasing (iterators) Returns: "true if each element in [first, last) is less than the previous element or [first, last) has fewer than two elements." _____ Function template is_strictly_decreasing (range) Returns: "true if each element in range is less than the previous element or range has fewer than two elements." _____ Header <boost/algorithm/search.hpp> The declarations are compressed too much, though the horizontal whitespace does help in the latter half. Some vertical whitespace would be helpful. _____ Class template boyer_moore The constructor deserves information on memory allocated, what exceptions might be emitted, etc. The function call operator should include information on its behavior, complexity, etc. The private member functions should not be listed. _____ Class template boyer_moore_horspool Ditto _____ Class template knuth_morris_pratt Ditto _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com ________________________________ IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

On Oct 1, 2011, at 4:58 PM, Stewart, Robert wrote:
Boost.Algorithms should be accepted into Boost.
Thanks!
Many issues have been raised during the review and during the pre-review using Code Collaborator. I have confidence that Marshall will carefully weigh all of the ideas and concerns that have been raised and apply those that are sensible and appropriate.
__________ Design
I have a problem with the physical design. all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp. I realize that none_of must test all elements in a range in the case when no element matches the predicate, but that doesn't make me think of all.hpp as the place to find it.
I think "is_ordered_until" is a better name than "is_ordered" for the reasons previously elucidated. As to your dislike of "is_ordered_until", just read it as "the range is ordered until the element referenced by the return value".
As there may well be other search algorithms than those you've provided thus far, I think each should be in a separate header and documented separately. Otherwise, search.hpp may grow to be ridiculously large.
I've gone back and forth with this idea. I think that's a good idea for the searching routines, but not for any/all/none - but it's a matter of taste; because the any/of/all routines are so small. I'd like to avoid the general case where each algorithm has it's own header file, because that adds to programmer annoyance (and compile time). That being said, I don't think that all the algorithms in the library should go into the same header file, either.
Implementation
I find the naming of your formal parameters to be inconsistent: first, last, p, lo, hi, val, and range. A more consistent set of names would be first, last, predicate, low, high, value, and range. I'd even accept "pred" in that list.
Likewise, the template parameter names are inconsistent: InputIterator, Predicate, R, V, Pred, FI, patIter, and corpusIter. A more consistent set of names would be InputIterator, Predicate, Range, Value, ForwardIterator, PatternIterator, CorpusIterator. If you'd prefer to abbreviate "Iterator" with "It" or "Iter", then I'd abbreviate the categories, too: InIt, FwdIt, RandomIt, etc.
You presume that all C++ implementations with a specific value of __cplusplus provide std::all_of, std::any_of, and std::none_of. Is that portable?
I got that from n3290, which says in section 16.8 (predefined macro names): _ _ cplusplus The name _ _ cplusplus is defined to the value 201103L when compiling a C++ translation unit. [157] [157] It is intended that future versions of this standard will replace the value of this macro with a greater value. Non-conforming compilers should use a value with at most five decimal digits. On the other hand, if Boost.Config has a specific "compiling with C++11" flag, using that would be better. I couldn't find one, though. The flag "BOOST_PP_VARIADICS" triggers on that, and also BOOST_WAVE_SUPPORT_CPP0X depends on it.
Indentation is inconsistent.
OK.
The use of mpl::identity for clamp's arguments should be documented for future maintainers.
OK.
is_ordered() uses a one-line if/return and a two-line for no apparent reason. That inconsistency increases the chances of a maintainer failing to notice one or the other.
Ok.
search.hpp uses BOOST_ALGO_DEBUG. "ALGO" should be spelled out. Many of the functions controlled by that manifest constant use camelcase and should be all lower case.
Ok.
I was expecting more algorithm documentation in search.hpp. The code is not overly complex, but there should at least be pointers to literature on the implementation of each algorithm. However, as such literature may not always be accessible, copying pertinent information to the header would be helpful to future maintainers.
There are some links there. I have no problems with adding more - until it starts obscuring the code ;-) For example in search.hpp, line 147: /* A templated version of the boyer-moore searching algorithm. References: http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/ http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf Explanations: boostinspect:noascii (test tool complains) http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm http://www.movsd.com/bm.htm http://www.cs.ucdavis.edu/~gusfield/cs224f09/bnotes.pdf and so on
Documentation
Specific notes and corrections are listed at the end of this message.
I'd like to see the complexity, behavioral, and iterator information be documented for each algorithm separately rather than for a group, such as you've done in paragraphs 3 and 4 of all.html. Note that the complexity is described below the clamp algorithms and above the all_of, any_of, and none_of algorithms. This should be consistent.
The algorithms are all function templates, so "the function <algorithm-name>" is incorrect. Thus, instead of "the function all_of", write "all_of". Look for other uses of "function" and "functions" for similar terminology issues.
While is_increasing, is_decreasing, is_strictly_increasing, and is_strictly_decreasing are implemented using is_ordered, that is by no means a requirement. Consequently, I'd prefer that each was documented separately.
Complexities written as should be written in all upper case: O(N), O(M x N), etc.
The treatment of the two internal tables used by the search algorithms is inconsistent ("first", "second", "skip", etc.). I'm not even sure they should be mentioned as you do. I realize that you want to document memory usage and customization options, but perhaps the better approach is to clearly document the traits class and its usage as well as the memory usage in terms of the pattern and corpus character sets. For example, K-M-P allocates N bytes per element in the pattern.
The Reference section for each algorithm should include "#include <boost/algorithm/foo.hpp>", to permit copying and pasting, plus complexity and template parameter requirements.
Template parameters should be documented just as function parameters. Thus, for example, in the reference page for knuth_morris_pratt_search(), there should be a section that describes the two iterator types including their iterator category as well as their relationship (which elsewhere you mention requiring that their value_types be identical).
The boyer_moore_search() and boyer_moore_horspool_search() algorithms do not have reference pages.
There are no examples in the text. There should be an example for each algorithm.
The search algorithms' traits classes are not documented.
__________ Usefulness
This library, in its current state has certain utility. As a place to incorporate future algorithms that don't deserve a library of their own, it has enormous utility.
__________ Usage
I did not try to use the library due to time constraints.
__________ Effort
I spent many hours studying and commenting on the code during the pre-review and now, reading the documentation, etc.
Yes, and I appreciate it.
Documentation Issues
[ A long list snipped ] Thank you for your attention to detail, Robert! -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

on Sat Oct 01 2011, Marshall Clow <mclow.lists-AT-gmail.com> wrote:
I'd like to avoid the general case where each algorithm has it's own header file, because that adds to programmer annoyance (and compile time). That being said, I don't think that all the algorithms in the library should go into the same header file, either.
Once you start getting into a fair number of components, it's a lot less annoying for me to have one component per file because then I never have to wonder about what needs to be included. FWIWly y'rs, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Marshall Clow wrote:
On Oct 1, 2011, at 4:58 PM, Stewart, Robert wrote:
You presume that all C++ implementations with a specific value of __cplusplus provide std::all_of, std::any_of, and std::none_of. Is that portable?
I got that from n3290, which says in section 16.8 (predefined macro names):
__cplusplus The name __cplusplus is defined to the value 201103L when compiling a C++ translation unit. [157] [snip] On the other hand, if Boost.Config has a specific "compiling with C++11" flag, using that would be better. I couldn't find one, though.
A Boost.Config macro that indicates when those algorithms are supplied would be ideal. I wonder, for example, whether compilers might define __cplusplus to that value without complete language support.
I was expecting more algorithm documentation in search.hpp. [snip] There are some links there. I have no problems with adding more - until it starts obscuring the code ;-)
For example in search.hpp, line 147: /* A templated version of the boyer-moore searching algorithm.
References: http://www.cs.utexas.edu/users/moore/best-ideas/string- searching/ http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf
Explanations: boostinspect:noascii (test tool complains) http://en.wikipedia.org/wiki/Boyer- Moore_string_search_algorithm http://www.movsd.com/bm.htm http://www.cs.ucdavis.edu/~gusfield/cs224f09/bnotes.pdf
and so on
I missed that. I was further into the file before I noticed the apparent lack. Things do decay on the Internet, however, so it would be useful to capture titles for each of the URLs to make finding alternative locations easier, in the future, should that become necessary. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com ________________________________ IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Marshall Clow wrote:
On Oct 1, 2011, at 4:58 PM, Stewart, Robert wrote:
I have a problem with the physical design. all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp. I realize that none_of must test all elements in a range in the case when no element matches the predicate, but that doesn't make me think of all.hpp as the place to find it.
I've gone back and forth with this idea. I think that's a good idea for the searching routines, but not for any/all/none - but it's a matter of taste; because the any/of/all routines are so small.
I'd like to avoid the general case where each algorithm has it's own header file, because that adds to programmer annoyance (and compile time). That being said, I don't think that all the algorithms in the library should go into the same header file, either.
A generalized name for the group would work. Choosing one of the three, however, isn't helpful. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com ________________________________ IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

on Sat Oct 01 2011, "Stewart, Robert" <Robert.Stewart-AT-sig.com> wrote:
all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp.
And I'd strongly prefer any_of.hpp and none_of.hpp :-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Dave Abrahams wrote:
on Sat Oct 01 2011, "Stewart, Robert" <Robert.Stewart-AT-sig.com> wrote:
all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp.
And I'd strongly prefer any_of.hpp and none_of.hpp :-)
+1. At first I was thinking that all.hpp included all the library headers at once... --Lorenzo -- View this message in context: http://boost.2283326.n4.nabble.com/Review-Algorithms-Boost-Algorithms-Review... Sent from the Boost - Dev mailing list archive at Nabble.com.

Dave Abrahams wrote:
on Sat Oct 01 2011, "Stewart, Robert" <Robert.Stewart-AT- sig.com> wrote:
all.hpp is not the file I'd expect to find any_of or none_of. I'd prefer to see them moved to any.hpp and none.hpp.
And I'd strongly prefer any_of.hpp and none_of.hpp :-)
If all the algorithms that would be in one header are named with the "any_of" prefix, I agree that it should be any_of.hpp. I wasn't certain that would be the case, so I was being more general in my suggestion. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com ________________________________ IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.
participants (4)
-
Dave Abrahams
-
lcaminiti
-
Marshall Clow
-
Stewart, Robert