
On 27/08/2010 03:50, Dean Michael Berris wrote:
Actually, not *just* a range of characters.
If your range dealt with ownership of data and abstracted the means by which the data is actually manipulated (even through iterators) then maybe a string is just a range of characters. If your range made sure that the data is moved instead of copied in certain situations, or knew when an optimization can be made on certain operations (like concatenation), then yeah I'll agree that a string is just a range of characters.
If you think about a string as a special beast that does a lot of things:
1. Allocates memory which holds the characters
2. Offers an abstraction unique to strings (token_iterator, line_iterator, char_iterator, wchar_iterator)
3. Has value semantics similar to built-in types (copyable, movable, "regular")
Then it doesn't look like just a range.
What I call a string is just the data, not a container of said data. I don't see what 2 is doing in there. Surely those mechanisms are independent of the container.
Sure, but if the string type already does that for you at the time you need it, then it's inherently supported by the type, no?
If you can iterate through a sequence of copyable elements, then you can copy those elements into a new sequence of the structure of your choosing. This doesn't require any particular support from the first sequence type.
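To make that concrete, here is a minimal sketch. copy_into is a hypothetical helper name, not something from any library; it only assumes the target type has the usual iterator-pair constructor.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: builds a Target container from any iterator pair.
// The source sequence only has to expose iterators over copyable elements.
template <typename Target, typename Iterator>
Target copy_into(Iterator first, Iterator last)
{
    return Target(first, last);
}

// Usage: a std::list<char> knows nothing about std::string or std::vector,
// yet its elements can be copied into either.
inline void copy_into_example()
{
    const char raw[] = { 'a', 'b', 'c' };
    std::list<char> source(raw, raw + 3);

    std::string s = copy_into<std::string>(source.begin(), source.end());
    std::vector<char> v = copy_into<std::vector<char> >(s.begin(), s.end());

    assert(s == "abc");
    assert(v.size() == 3 && v[0] == 'a');
}
```

The source type needs no conversion operators or knowledge of the target; the iterator pair is the whole interface.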
That's cool, but in cases where you deal with potentially huge strings (i.e. more than a memory page's worth of data) you start looking for ways of moving some of the work out of runtime and into compile time
This doesn't make sense. If the data is too big to fit into memory, then it's going to be completely out of the range (no pun intended) of what the compile-time world can deal with.
And especially in cases where you need to abstract the string from being something that is exclusively in memory to something that refers to data that is not in memory (retrieved from a socket, from a file, from user input) you run into issues like buffer management, demand-driven/lazy-loading data, etc. that a range is not the end-all and/or best solution for.
For instance, think about a forward iterator (or a single-pass iterator?) that the string can expose to allow for one-time traversal of the data it holds or it refers to. The string doesn't have to have the data all in memory if it doesn't yet, and it can then lazily load the data and expose it through that forward iterator.
I don't get your argument. Basically you're saying "wouldn't it be cool if you could not actually store your data in memory, but generate it as you're traversing or get it from I/O on demand?". That's what ranges are. Ranges *are* iterators.
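As a rough illustration with nothing but the standard library: a pair of std::istreambuf_iterator is already a single-pass range that pulls characters from the stream on demand. count_newlines is a made-up name for the sketch, and the std::istringstream in the usage function merely stands in for a file or socket stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Counts newline characters by traversing the stream exactly once.
// Characters are read from the stream buffer as the iterator advances;
// the function never builds a copy of the whole input.
inline std::size_t count_newlines(std::istream& in)
{
    std::istreambuf_iterator<char> first(in), last;
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == '\n')
            ++n;
    return n;
}

// Usage: the string stream is only a stand-in for real I/O.
inline void count_newlines_example()
{
    std::istringstream in("one\ntwo\nthree");
    assert(count_newlines(in) == 2);
}
```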
If the string handle knew that some part of the string was a conglomeration of statically-sized literals then it can hold that data in a boost::array of the correct size
No it cannot. string_handle is a single type defined in advance, and it must have a finite size. It cannot guess how many conglomerations of statically-sized literals you're going to want to put into it, and therefore can't guarantee enough storage.
then you can correctly allocate enough space at the point where the operator= is implemented.
At runtime (operator= is a function that executes code, not a type definition that provides automatic storage), which makes the memory dynamically allocated, as I said.
Of course the type erasure might be an issue, but knowing the eventual size at compile time allows you a lot of optimizations you otherwise can't or won't do.
Here, I can't see it bringing any benefit compared to knowing it at runtime, since the allocation can only happen at runtime.
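A small sketch of what I mean. string_handle_sketch is a hypothetical class written for this mail, not the string_handle being proposed. The literal's length N is known at compile time inside operator=, but sizeof of the handle is fixed when the class is defined, so the characters still end up in storage obtained at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical handle type, for illustration only.
class string_handle_sketch
{
public:
    string_handle_sketch() : data_(0), size_(0) {}
    ~string_handle_sketch() { delete[] data_; }

    // N is a compile-time constant here, but sizeof(string_handle_sketch)
    // cannot depend on it, so the storage is still allocated dynamically.
    template <std::size_t N>
    string_handle_sketch& operator=(const char (&literal)[N])
    {
        char* fresh = new char[N];       // allocation happens at runtime
        std::memcpy(fresh, literal, N);  // N includes the terminating '\0'
        delete[] data_;
        data_ = fresh;
        size_ = N - 1;
        return *this;
    }

    std::size_t size() const { return size_; }
    const char* c_str() const { return data_ ? data_ : ""; }

private:
    // Non-copyable in this sketch (declared, never defined).
    string_handle_sketch(const string_handle_sketch&);
    string_handle_sketch& operator=(const string_handle_sketch&);

    char* data_;
    std::size_t size_;
};

// Usage: the same object holds literals of different lengths.
inline void string_handle_example()
{
    string_handle_sketch h;
    h = "abc";
    assert(h.size() == 3);
    h = "a much longer literal";
    assert(h.size() == std::strlen("a much longer literal"));
}
```

Knowing N statically saves a strlen call, but the allocation itself is not avoided.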
Yes, but then you have a left-leaning tree of iterator pairs. ;)
Just like you have a tree of bounded segments, or an AST if you go for a proto solution.
If the new string type implemented iterators (several types of iterators in fact) and manages the memory for you in a configurable manner, *and* allows you to convert it to either an std::string or an std::wstring, why wouldn't it be inter-operable?
Because your code will expect your string type, not the string type of the user, meaning he'll have to convert to it. Converting is not actually needed, since you could just treat your type, the type of the user, or any smart lazy evaluated type the same.
But the range adaptors don't solve the issue of multiple allocations
How do they not? Your problem is that you reallocate multiple times with a classic binary operator+ implementation (pseudo-code):

    buffer = a + b + c

is

    buffer = allocate(size(a))
    copy(buffer, a)
    buffer2 = allocate(size(buffer)+size(b))
    copy(buffer2, buffer)
    cat(buffer2, b)
    free(buffer)
    buffer = allocate(size(buffer2)+size(c))
    copy(buffer, buffer2)
    cat(buffer, c)
    free(buffer2)

Instead of that you could just evaluate that expression as

    r = join(a, b, c)
    buffer3 = allocate(size(r))
    copy(buffer3, r)

As you can see, it solves the problem just fine.
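In actual C++ the second form could look roughly like the following. This is a deliberately simplified sketch fixed to three std::string operands; joined3, join and evaluate are names invented here, not an existing range adaptor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lazy view over three strings: it only stores references,
// so the operands must outlive the view.
struct joined3
{
    const std::string& a;
    const std::string& b;
    const std::string& c;

    std::size_t size() const { return a.size() + b.size() + c.size(); }
};

// Building the view copies no characters and allocates nothing.
inline joined3 join(const std::string& a, const std::string& b,
                    const std::string& c)
{
    joined3 r = { a, b, c };
    return r;
}

// Evaluation reserves the final size once, then copies each piece.
inline std::string evaluate(const joined3& r)
{
    std::string buffer;
    buffer.reserve(r.size());
    buffer.append(r.a).append(r.b).append(r.c);
    return buffer;
}

// Usage.
inline void join_example()
{
    std::string a("foo"), b("bar"), c("baz");
    assert(join(a, b, c).size() == 9);
    assert(evaluate(join(a, b, c)) == "foobarbaz");
}
```

Since the total size is known before any character is copied, the appends should not need to grow the buffer.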
Ah, because 'auto' is C++0x and I was still thinking that maybe people not moving to C++0x might as well want to be able to use this new string type even at C++03.
You can write the type out yourself or use a result_of-like protocol, like Fusion does.
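For instance, a result_of-like protocol in plain C++03 might be sketched as below. joined_range and result_of::join are hypothetical names that imitate the Fusion convention; they are not Fusion's own types. The operands are stored by value only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lazy concatenation of two ranges.
template <typename Left, typename Right>
struct joined_range
{
    joined_range(const Left& l, const Right& r) : left(l), right(r) {}
    std::size_t size() const { return left.size() + right.size(); }

    Left left;
    Right right;
};

// result_of-like metafunction: names the return type of join(l, r).
namespace result_of
{
    template <typename Left, typename Right>
    struct join
    {
        typedef joined_range<Left, Right> type;
    };
}

template <typename Left, typename Right>
typename result_of::join<Left, Right>::type
join(const Left& l, const Right& r)
{
    return joined_range<Left, Right>(l, r);
}

// Usage without 'auto': the metafunction spells the type for the caller.
inline void result_of_example()
{
    std::string a("ab"), b("cde");
    result_of::join<std::string, std::string>::type ab = join(a, b);
    assert(ab.size() == 5);
}
```

Nested expressions get verbose quickly, which is the price of doing this without auto.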
Yes! Well, technically not stateful operations -- they're functionally pure operations: concatenation doesn't munge with the existing strings, it just returns a new string made of the contents of the other strings (or string generators).
If you end up re-storing the result into the same variable, it basically is stateful.