Re: [Boost-users] Interest in a Unicode library for Boost?

31 Oct 2019

      On 30/10/2019 18:03, Zach Laine wrote:
...
Returning end of range on failure is incredibly inconvenient (for the
    consumer; granted it's usually more convenient for the algorithm
    implementer), and I'd be happier if STL algorithms didn't do that
    either.
I see that as an unfortunate consequence of using generic iterators as
    input parameters and return types, and not an otherwise desirable
    design choice.
(ie. the STL algorithms do it because they couldn't do anything better.
    string doesn't do it because it can do something better [since it knows
    the iterator type and class, and can consequently choose to return
    something other than an iterator].)
I heartily disagree, but I'm also very curious about this.  As an 
example, could you take one of the simple std algorithms (std::find 
would be a very simple candidate), and show its definition in the style 
you have in mind?
Ok: wall of text incoming.

There are many options (especially when you have Outcome and/or ranges), 
but a very straightforward update using only existing STL concepts would 
be to take this:

     template<typename InputIt, typename T>
     InputIt find(InputIt first, InputIt last, T const& value);

And change its return type:

     template<typename InputIt, typename T>
     optional<InputIt> my_find(InputIt first, InputIt last, T const& value);

It now either returns an iterator in the range [first, last) or it 
returns nullopt.  It will never return last.

This simple change means that code following this can easily detect 
search failure without having to refer back to the actual input 
parameters (ie. having to know the value of last).

This can be useful when searching a subset of the container (you don't 
need to save a separate copy of "list.begin() + 5" just so that you can 
use it as both the last parameter and for failure checks -- or worse, 
write it twice and risk bugs if someone edits one but not the other).

It also can be useful when the container as a whole is a temporary 
rvalue -- which is *rarely* useful for std::find (since after all a 
successful iterator return is almost useless if the container itself is 
about to be destroyed), but not completely so.  Sometimes you only want 
to detect existence without needing to actually consume the iterator. 
Sometimes you can use the successful return iterator within the same 
whole-expression, so the container's lifetime hasn't ended yet.

As for "last" itself, while having to pass both begin and end iterators 
*usually* precludes using rvalue containers, that's also not always 
true.  Some containers/iterators will accept a default-constructed 
iterator to mean "the end of the sequence" (especially when underlying 
storage is a linked-list, but other containers/iterators use this model 
too).  This means that "first" can be passed "method().begin()" or some 
other rvalue and "last" can be passed "iterator()".  And in range-based 
algorithms, there's only a single parameter to worry about anyway, so 
it's even more likely.

As a demonstration, let's imagine a simple span-based contains check (or 
replace span with your better range type of choice) -- it doesn't care 
about the iterator other than to assure that there was a successful return:

     template<typename T>
     bool contains(std::span<T> range, T const& value)
     {
         return my_find(range.begin(), range.end(), value).has_value();
         // or you can just cast to bool if you prefer
         // alternatively, if my_find itself took a range:
         return my_find(range, value).has_value();
     }

This seems more readable (and less error prone) than an explicit "== 
range.end()" check.  And, given the range-based version of my_find, you 
could even have code that does this:

     if (my_find(method(), 42)) { /* method included 42 */ }

Or this:

    return resolve(my_find(method(), 42), 0);

Where resolve(x, y) returns "x ? **x : y" -- somewhat like value_or, but 
including the iterator dereference.

(That makes more sense with a map find, or predicate find, or where the 
element type is a class that only checks key equality (so the above 
would return a fully populated object if it exists or a default object 
if not).  Obviously it's a bit silly with plain ints, but you get the idea.)

Another extension might be to define my_find_value, which returns 
optional<T> rather than optional<InputIt>.  That doesn't allow mutating 
the found value in the collection (so it can't replace my_find) but it's 
still useful in read-only scenarios (and rvalue containers will almost 
always be read-only scenarios), so that the above no longer needs 
resolve and would become:

     return my_find_value(method(), 42).value_or(0);

Of course, there are performance consequences of using rvalue 
containers, since you're copying and throwing away data.  But sometimes 
it makes sense, for known-cheap containers or where the method wants to 
do a lock-and-return-copy or dequeue or similar, but you're only 
interested in part of the information.

Re: [Boost-users] Interest in a Unicode library for Boost?

Gavin Lambert