
On 26/08/10 08:09, Dean Michael Berris wrote:
Indeed.
However, I am looking for a means of representing a string -- a collection of characters on which you can implement algorithms on. They might as well be ranges, but things like pattern matching and templates (as in string templates, where you have placeholders and other things) can be applied to or used to generate them.
A string is just a range of characters; nothing more.
I suppose you'd have to linearize it. Sending it in multiple chunks would have different behaviour on different types of sockets.
Yes, but something that is inherently supported by the type.
There is no "inherently supported" or "not inherently supported". Linearizing the memory means you have to copy it in a contiguous buffer.
Why strings are important has a lot to do with being able to perform domain-specific optimization on string algorithms. Things like capitalization, whitespace removal, encoding/decoding, transformations like breaking up strings according to some pattern (tokenization, parsing, etc.). Because you can specialize the memory-management of strings (as opposed to just ranges of char's) the "win" in treating a string as a separate type are practical rather than conceptual.
I've got a system that does arbitrary conversion and segmentation of ranges in a lazy way without ever allocating memory. I can use it for things like on-the-fly character encoding conversion, Unicode normalization, or base64 encoding, for a few examples. It never requires to differentiate strings from other types of ranges, not to use special containers or anything.
What I wanted to be able to do (and am reproducing at the moment) is a means of doing the following:
string_handle f = /* some means of building a string */; string_handle s = str("Literal:") ^ f ^ str("\r\n\r\n");
std::string some_string = string_handle; // convert to string and build lazily
How about:
boost::lazy_range<char> f = /* some means of building a string */
boost::lazy_range<char> s = boost::adaptors::join( boost::as_literal("Literal"), f, boost::as_literal("\r\n\r\n") );
std::string some_string; boost::copy(s, std::back_inserter(some_string));
Just in case you haven't noticed, that's exactly the same code as you wrote, except it's real code that works and it uses different names.
That's fine if I can control the memory allocation of the lazy range.
Using type erasure means you have to use dynamic memory allocation eventually, unless you limit it to use stack storage and refuse types that are too large. (which in practice should be able to work fine)
As it is, a lazy range just represents a collection of iterator pairs
Just the one pair, actually.
-- the data has to live somewhere still.
Having it live in your new "string type" is a bad idea, as this just makes things not inter-operable. Simply allow people to use whatever data structure they see fit for their storage needs, and just deal with ranges when you want to manipulate strings.
What I'm looking for is a combined ownership+iteration mechanism. Right now the problem of allocating a chunk of memory every time a you concatenate two strings is the problem I'm trying to solve with metaprogramming and knowing at compile time how much memory I'm going to need to allocate.
What meta-programming will give you is a type that represents an expression that you can then evaluate. It doesn't own the values in any way. Range adaptors are exactly the same thing, they're a lightweight DSEL.
Of course if you're dealing with strings that have an unknown length (as in my example, we really can't tell the length of 'f' at the point 's' is defined) at least getting to know the parts that have a known length at compile time (the literals) allows me to allocate enough space ahead of time with the compiler's help.
With ranges, the way it works is depending on whether the range you want to get the size of has random access or not. If it is, getting the size is O(1). If it isn't, it's O(n). If it's a single-pass range, you can't even get the size. a joined_range has the lowest category of the ranges it's joining, so if one of the ranges isn't random access, then the concatenated range isn't.
Maybe instead of having multiple concatenations, what happens is I allocate a chunk "just large enough" to hold the multiple concatenated strings, and just traverse the lazy string as in your example. The copy happens at runtime, the allocation of a large enough buffer (maybe a boost::array) happens at compile-time (or at least the determination of the size).
You can't get the size of a range at compile-time. If you're only dealing with statically-sized stuff, I suggest you use fusion sequences, which is basically the same thing as ranges except they're statically-sized and can have heterogeneous elements. (and can make your compile-time explode) There is a boost::fusion::joint_view that behaves the same as boost::join. You can't really mix both runtime and compile-time sized things together. It's either one or the other, since compile-time code and runtime code at different beasts. (of course, you can consider a fusion sequence as a range, but then you lose the information it was statically sized when manipulating it as a range)
boost::lazy_range is not actually in boost, but that would be a type erased range, and I've got an implementation somewhere.
Maybe a lazy_range would be nice to have in Boost. Or even just a join iterator.
The join iterator is already there (since 1.43 I think? maybe before that), I wouldn't have put it in my code without saying it's not in Boost otherwise. Type erasure for ranges and iterators isn't in Boost, I suppose because there is little interest given using the real type is much more efficient, it would need allocations policies, and there is still hope that someday we'll have a generic type erasure facility.
Of course, not using type erasure at all (i.e. replacing lazy_range by auto or the actual type of the expression) would allow it to be quite faster.
Definitely.
Then why the string_handle thing? You want a type that represents the expression.
The hope was to be able to determine at least the length of the whole string from the concatenation of multiple strings
Nothing particularly hard to do there.
so that effective memory allocation can be done at the time it's needed, which is usually at the time a copy of the whole string is required.
The problem is that you seem to want to apply stateful operations on your string object and delay them until you copy it, and have it automagically deal with lifetime and ownership issues. Maybe the right solution for you is to use a rope data structure, which is a container and doesn't actually concatenate things, it just puts the fragments in a self-balancing tree. Never really understood the point of that data structure myself.