
On Wed, Aug 25, 2010 at 5:38 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 24/08/2010 17:11, Dean Michael Berris wrote:
1. Efficiently allocate space to contain a string being built from literals and variable-length strings.
2. Be able to build/traverse the string lazily (i.e., it doesn't matter whether the string is contiguous in memory, as in the case of C-strings, or built/backed by a stream, as in Haskell's ByteString).
It seems that what you're looking for is a range (or a slight refinement), i.e. an entity that can be iterated.
Almost... That's just one part of it.
Arrays, tuples, strings, vectors, lists, a pair of istream_iterator, a pair of pointers, a concatenation of ranges, a transformed range, etc. are all ranges.
Indeed. However, I am looking for a means of representing a string -- a collection of characters on which you can implement algorithms. They might as well be ranges, but things like pattern matching and templates (as in string templates, where you have placeholders and other things) can be applied to or used to generate them.
3. As much as possible be "automagically" network-safe (i.e. can be dealt with by Boost.Asio without having to do much acrobatics with it).
I suppose you'd have to linearize it. Sending it in multiple chunks would have different behaviour on different types of sockets.
Yes, but I'd like something that is inherently supported by the type. Why strings are important has a lot to do with being able to perform domain-specific optimization on string algorithms: things like capitalization, whitespace removal, encoding/decoding, and transformations like breaking up strings according to some pattern (tokenization, parsing, etc.). Because you can specialize the memory management of strings (as opposed to just ranges of chars), the "win" in treating a string as a separate type is practical rather than conceptual.
What I wanted to be able to do (and am reproducing at the moment) is a means of doing the following:
string_handle f = /* some means of building a string */;
string_handle s = str("Literal:") ^ f ^ str("\r\n\r\n");
std::string some_string = s; // convert to string and build lazily
How about:
boost::lazy_range<char> f = /* some means of building a string */;
boost::lazy_range<char> s = boost::adaptors::join( boost::as_literal("Literal"), f, boost::as_literal("\r\n\r\n") );
std::string some_string;
boost::copy(s, std::back_inserter(some_string));
That's fine if I can control the memory allocation of the lazy range. As it is, a lazy range just represents a collection of iterator pairs -- the data still has to live somewhere. What I'm looking for is a combined ownership+iteration mechanism.
Right now, the problem I'm trying to solve is the allocation of a chunk of memory every time you concatenate two strings. I'm attacking it with metaprogramming, by knowing at compile time how much memory I'm going to need to allocate. Of course, if you're dealing with strings of unknown length (as in my example, where we really can't tell the length of 'f' at the point 's' is defined), at least knowing the parts whose length is fixed at compile time (the literals) allows me to allocate enough space ahead of time with the compiler's help.
Maybe instead of performing multiple concatenations, I allocate a single chunk "just large enough" to hold all the concatenated strings, and just traverse the lazy string as in your example. The copy happens at run time; the allocation of a large enough buffer (maybe a boost::array), or at least the determination of its size, happens at compile time.
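For illustration only, here is a minimal sketch of that idea in standard C++17. It is not code from this thread or from Boost: the names lit, var, concat, materialize and build_header are hypothetical, and the overloads of str() are an assumption about how the str() in the example above might behave. Literal lengths are carried in the type and summed at compile time; only the variable parts are measured at run time, and the result is reserved once before copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical node for a string literal; its length N is part of the type.
template <std::size_t N>
struct lit {
    using lazy_tag = void;
    const char* data;
    static constexpr std::size_t static_size = N;  // known at compile time
    std::size_t dynamic_size() const { return 0; }
    void append_to(std::string& out) const { out.append(data, N); }
};

// Hypothetical node for a run-time string. It only views the data, so the
// viewed string must outlive the expression (do not pass temporaries).
struct var {
    using lazy_tag = void;
    std::string_view data;
    static constexpr std::size_t static_size = 0;
    std::size_t dynamic_size() const { return data.size(); }
    void append_to(std::string& out) const { out.append(data); }
};

// Lazy concatenation: nothing is copied until materialize() is called.
template <class L, class R>
struct concat {
    using lazy_tag = void;
    L lhs;
    R rhs;
    static constexpr std::size_t static_size = L::static_size + R::static_size;
    std::size_t dynamic_size() const { return lhs.dynamic_size() + rhs.dynamic_size(); }
    void append_to(std::string& out) const { lhs.append_to(out); rhs.append_to(out); }
};

// N counts the terminating '\0', hence N - 1 characters.
template <std::size_t N>
constexpr lit<N - 1> str(const char (&s)[N]) { return {s}; }

inline var str(const std::string& s) { return {s}; }

// Only participates for types that carry lazy_tag.
template <class L, class R,
          class = typename L::lazy_tag, class = typename R::lazy_tag>
constexpr concat<L, R> operator^(L lhs, R rhs) { return {lhs, rhs}; }

// Reserve once (compile-time part + run-time part), then copy every piece.
template <class E>
std::string materialize(const E& e) {
    const std::size_t total = E::static_size + e.dynamic_size();
    std::string out;
    out.reserve(total);
    e.append_to(out);
    assert(out.size() == total);
    return out;
}

inline std::string build_header(const std::string& f) {
    auto s = str("Literal:") ^ str(f) ^ str("\r\n\r\n");
    static_assert(decltype(s)::static_size == 12, "8 + 4 literal characters");
    return materialize(s);
}
```

In this sketch the buffer is still a std::string reserved at run time; when every part is a literal, static_size alone gives the full length, which is the case where a fixed-size buffer such as boost::array could be sized by the compiler.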
boost::lazy_range is not actually in Boost, but it would be a type-erased range, and I've got an implementation somewhere.
Maybe a lazy_range would be nice to have in Boost. Or even just a join iterator.
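As a rough illustration of what a join iterator could look like, here is a minimal sketch in standard C++17. The names join_iterator, joined, join_ranges and join_to_string are hypothetical and do not describe any Boost interface. The iterator walks the first range and then the second; the joined type is deduced rather than type-erased, and nothing is copied until the result is traversed.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical iterator over two ranges in sequence. It returns elements by
// value, so it only claims the input iterator category.
template <class It1, class It2>
class join_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = typename std::iterator_traits<It1>::value_type;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;

    join_iterator(It1 cur1, It1 end1, It2 cur2)
        : cur1_(cur1), end1_(end1), cur2_(cur2) {}

    // Read from the first range until it is exhausted, then from the second.
    value_type operator*() const { return cur1_ != end1_ ? *cur1_ : *cur2_; }

    join_iterator& operator++() {
        if (cur1_ != end1_) ++cur1_; else ++cur2_;
        return *this;
    }
    join_iterator operator++(int) { join_iterator old = *this; ++*this; return old; }

    friend bool operator==(const join_iterator& a, const join_iterator& b) {
        return a.cur1_ == b.cur1_ && a.cur2_ == b.cur2_;
    }
    friend bool operator!=(const join_iterator& a, const join_iterator& b) {
        return !(a == b);
    }

private:
    It1 cur1_, end1_;
    It2 cur2_;
};

// A pair of join iterators; it does not own the underlying data.
template <class It1, class It2>
struct joined {
    join_iterator<It1, It2> first, last;
    join_iterator<It1, It2> begin() const { return first; }
    join_iterator<It1, It2> end() const { return last; }
};

template <class R1, class R2>
auto join_ranges(const R1& a, const R2& b) {
    using It1 = decltype(std::begin(a));
    using It2 = decltype(std::begin(b));
    return joined<It1, It2>{
        {std::begin(a), std::end(a), std::begin(b)},
        {std::end(a), std::end(a), std::end(b)}};
}

inline std::string join_to_string(std::string_view a, std::string_view b,
                                  std::string_view c) {
    auto s = join_ranges(join_ranges(a, b), c);  // lazy: no characters copied yet
    return std::string(s.begin(), s.end());      // the copy happens here
}
```

Because the joined range only exposes input iterators, the final std::string cannot know the total length in advance and may reallocate while it is filled, which is the allocation concern raised in this thread.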
Of course, not using type erasure at all (i.e. replacing lazy_range by auto or the actual type of the expression) would allow it to be quite a bit faster.
Definitely. The hope was to be able to determine at least the length of the whole string from the concatenation of multiple strings, so that effective memory allocation can be done at the time it's needed, which is usually at the time a copy of the whole string is required.
--
Dean Michael Berris
deanberris.com