Re: [boost] [Potentially OT] String Concatenation Operator

27 Aug 2010

On 27/08/2010 03:50, Dean Michael Berris wrote:
...
Actually, not *just* a range of characters.
If your range dealt with ownership of data and abstracted the means by
which the data is actually manipulated (even through iterators) then
maybe a string is just a range of characters. If your range made sure
that the data is moved instead of copied in certain situations, or
whether there is an optimization that can be made on certain
operations (like concatenation) then yeah I'll agree that a string is
just a range of characters.
If you think about a string as a special beast that does a lot of things:
1. Allocates memory which holds the characters
2. Offers an abstraction unique to strings (token_iterator,
line_iterator, char_iterator, wchar_iterator)
3. Has value semantics similar to built-in types (copyable, movable, "regular")
Then it doesn't look like just a range.
What I call a string is just the data, not a container of said data.

I don't see what 2 is doing in there. Surely those mechanisms are 
independent of the container.
...
Sure, but if the string type already does that for you at the time you
need it, then it's inherently supported by the type, no?
If you can iterate through a sequence of copyable elements, then you can 
copy those elements into a new sequence of the structure of your choosing.
This doesn't require any particular support from the first sequence type.
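
A minimal sketch of that point, assuming only that the source exposes
begin() and end(). The helper name copy_into is made up for illustration
and is not part of any library:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: copies any iterable sequence into a container of the
// caller's choosing. The source only has to provide begin() and end().
template <typename Target, typename Source>
Target copy_into(const Source& src)
{
    return Target(src.begin(), src.end());
}

void copy_into_example()
{
    std::list<char> src;
    src.push_back('a');
    src.push_back('b');
    src.push_back('c');

    // The same source feeds two unrelated target structures.
    std::string s = copy_into<std::string>(src);
    std::vector<char> v = copy_into<std::vector<char> >(src);

    assert(s == "abc");
    assert(v.size() == 3 && v[2] == 'c');
}
```

The std::list knows nothing about std::string or std::vector here; the
iterators are the whole interface.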
...
That's cool, but in cases where you deal with potentially huge strings
(i.e. more than a memory page's worth of data) you start looking for
ways of moving some of the work out of runtime and into compile time
This doesn't make sense.
If the data is too big to fit into memory, then it's going to be 
completely out of the range (no pun intended) of what the compile-time 
world can deal with.
...
And especially in
cases where you need to abstract the string from being something that
is exclusively in memory to something that refers to data that is not
in memory (retrieved from a socket, from a file, from user input) you
run into issues like buffer management, demand-driven/lazy-loading
data, etc. that a range is not the end-all and/or best solution for.
For instance, think about a forward iterator (or a single-pass
iterator?) that the string can expose to allow for one-time traversal
of the data it holds or it refers to. The string doesn't have to have
the data all in memory if it doesn't yet, and it can then lazily load
the data and expose it through that forward iterator.
I don't get your argument.
Basically you're saying "wouldn't it be cool if you could not actually 
store your data in memory, but generate it as you're traversing or get 
it from I/O on demand?". That's what ranges are.
Ranges *are* iterators.
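
As a rough illustration, a pair of std::istreambuf_iterator already gives
a single-pass range that pulls characters from I/O only as they are
traversed. The function name first_line_length is made up, and the
std::istringstream in the example merely stands in for a file or socket:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Counts the characters before the first newline, reading them from the
// stream one at a time. Nothing is copied into a string beforehand.
std::size_t first_line_length(std::istream& in)
{
    std::istreambuf_iterator<char> it(in), end;
    std::size_t n = 0;
    for (; it != end && *it != '\n'; ++it)
        ++n;
    return n;
}

void first_line_length_example()
{
    // Stand-in for data arriving from a file or a socket.
    std::istringstream in("hello\nworld");
    assert(first_line_length(in) == 5);
}
```

Traversal stops at the newline, so the remainder of the stream is never
touched by this function.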
...
If the string handle knew that some part
of the string was a conglomeration of statically-sized literals then
it can hold that data in a boost::array of the correct size
No it cannot.
string_handle is a single type defined in advance, and it must have a 
finite size. It cannot guess how many conglomerations of 
statically-sized literals you're going to want to put into it, and 
therefore can't guarantee enough storage.
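
To make that concrete, here is a toy handle. This string_handle is a
hypothetical class written for this message, not the one under
discussion; the point is only that sizeof is fixed when the class is
defined, so the characters have to live somewhere else:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical handle type, for illustration only. Its size is fixed by the
// class definition, so the character data is stored on the heap.
class string_handle
{
public:
    explicit string_handle(const std::string& s)
        : data_(new char[s.size()]), size_(s.size())
    {
        s.copy(data_, s.size());
    }
    ~string_handle() { delete[] data_; }

    std::size_t size() const { return size_; }

private:
    // Non-copyable, only to keep the sketch short.
    string_handle(const string_handle&);
    string_handle& operator=(const string_handle&);

    char* data_;
    std::size_t size_;
};

void string_handle_example()
{
    std::string big_text(100000, 'x');
    string_handle small("ab");
    string_handle big(big_text);

    // Same type, same object size, very different amounts of data.
    assert(sizeof(small) == sizeof(big));
    assert(big.size() == 100000);
}
```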
...
then you can correctly
allocate enough space at the point where the operator= is implemented.
At runtime (operator= is a function that executes code, not a type 
definition that provides automatic storage), which makes the memory 
dynamically allocated, as I said.
...
Of course the type erasure might be an issue, but knowing the eventual
size at compile time allows you a lot of optimizations you otherwise
can't or won't do.
Here, I can't see it bringing any benefit compared to knowing it at 
runtime, since the allocation can only happen at runtime.
...
Yes, but then you have a left-leaning tree of iterator pairs. ;)
Just like you have a tree of bounded segments, or an AST if you go for 
a Proto solution.
...
If the new string type implemented iterators (several types of
iterators in fact) and manages the memory for you in a configurable
manner, *and* allows you to convert it to either an std::string or an
std::wstring, why wouldn't it be inter-operable?
Because your code will expect your string type, not the string type of 
the user, meaning he'll have to convert to it.
Converting is not actually needed, since you could just treat your type, 
the type of the user, or any smart lazily evaluated type the same.
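
Something along these lines, as a sketch: the function is written
against begin()/end() rather than against one string class. count_char
is a made-up name, and the only assumption is that the argument is
iterable and yields characters:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Works with any type that exposes begin()/end() over characters; it never
// names a concrete string class, so no conversion is needed at the call site.
template <typename Range>
std::size_t count_char(const Range& r, char ch)
{
    return static_cast<std::size_t>(std::count(r.begin(), r.end(), ch));
}

void count_char_example()
{
    std::string s("a b c");
    std::vector<char> v(s.begin(), s.end());
    std::deque<char> d(s.begin(), s.end());

    assert(count_char(s, ' ') == 2);
    assert(count_char(v, ' ') == 2);
    assert(count_char(d, ' ') == 2);
}
```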
...
But the range adaptors don't solve the issue of multiple allocations
How do they not?

Your problem is that you reallocate multiple times with a classic binary 
operator+ implementation (pseudo-code):

buffer = a + b + c

is

buffer = allocate(size(a))
copy(buffer, a)

buffer2 = allocate(size(buffer)+size(b))
copy(buffer2, buffer)
cat(buffer2, b)
free(buffer)

buffer = allocate(size(buffer2)+size(c))
copy(buffer, buffer2)
cat(buffer, c)
free(buffer2)

Instead of that you could just evaluate that expression as

r = join(a, b, c)
buffer3 = allocate(size(r))
copy(buffer3, r)

As you can see, it solves the problem just fine.
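
In C++ the same idea could look roughly like this. It is a sketch only:
joined3, join and materialize are names invented here, the view is
hard-wired to three std::string operands, and a real implementation would
be generic over ranges:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lazy view over three strings. It holds references only and
// owns no character data, so building it allocates nothing.
struct joined3
{
    const std::string& a;
    const std::string& b;
    const std::string& c;

    std::size_t size() const { return a.size() + b.size() + c.size(); }
};

joined3 join(const std::string& a, const std::string& b, const std::string& c)
{
    joined3 r = { a, b, c };
    return r;
}

// Evaluates the view: the total size is known up front, so the buffer is
// reserved once and the appends never have to grow it.
std::string materialize(const joined3& r)
{
    std::string buffer;
    buffer.reserve(r.size());
    buffer.append(r.a);
    buffer.append(r.b);
    buffer.append(r.c);
    return buffer;
}

void join_example()
{
    std::string a("foo"), b("bar"), c("baz");
    joined3 r = join(a, b, c);

    assert(r.size() == 9);
    assert(materialize(r) == "foobarbaz");
}
```

The operands must outlive the view, since it only refers to them.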
...
Ah, because 'auto' is C++0x and I was still thinking that maybe people
not moving to C++0x might as well want to be able to use this new
string type even at C++03.
You can write the type out yourself or use a result_of-like protocol, 
like Fusion does.
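
For example, in plain C++03 and without auto, a result_of-style
metafunction can name the type for you. The joined class and the
result_of::join metafunction below are invented for this sketch; they
only imitate the naming convention, not Fusion's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lazy concatenation of two ranges, holding references only.
template <typename L, typename R>
struct joined
{
    joined(const L& l, const R& r) : left(l), right(r) {}
    std::size_t size() const { return left.size() + right.size(); }

    const L& left;
    const R& right;
};

// result_of-like metafunction: computes the return type of join(l, r).
namespace result_of
{
    template <typename L, typename R>
    struct join
    {
        typedef joined<L, R> type;
    };
}

template <typename L, typename R>
typename result_of::join<L, R>::type join(const L& l, const R& r)
{
    return joined<L, R>(l, r);
}

void result_of_example()
{
    std::string a("foo"), b("bar"), c("baz");

    // The types are spelled through the metafunction instead of auto.
    typedef result_of::join<std::string, std::string>::type ab_type;
    typedef result_of::join<ab_type, std::string>::type abc_type;

    ab_type ab = join(a, b);
    abc_type abc = join(ab, c);

    assert(ab.size() == 6);
    assert(abc.size() == 9);
}
```

The typedefs get long as expressions nest, which is the price of doing
this without auto.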
...
Yes! Well, technically not stateful operations -- they're functionally
pure operations: concatenation doesn't munge with the existing
strings, it just returns a new string made of the contents of the
other strings (or string generators).
If you end up re-storing the result into the same variable, it basically 
is stateful.

Mathias Gaunard