[string_ref] type erasure to support arbitrary strings

Hi,
At first glance I expected string_ref to be some kind of type-erasing, non-owning holder of a reference to an arbitrary string class: the imperfect silver bullet for the diversity of string types.
Take a project that uses C++ libraries built on std::string, a Qt GUI, and maybe a module that works with chunked strings (strings composed of a list of usually equally sized blocks of memory). At the boundaries between the software modules, copies have to occur. Frequently an interface expects a constant reference to one particular string type. Most of the time the callee just wants to process the complete string or a part of it, so there may be no need for random access and hence no need for a contiguous copy of the string contents.
That's why I would prefer a facility that only requires the source string to be a forward range of characters in an arbitrary encoding.
I do not think that it makes sense to address only contiguous sequences of characters.
regards
Andreas Pokorny

On 30/01/13 14:16, Andreas Pokorny wrote:
That's why I would prefer a facility that only requires that the source string is a forward range of characters in an arbitrary encoding.
I do not think that it makes sense to only address contiguous sequences of characters.
In practice strings are always stored contiguously, whatever the string type (std::string, Qt or whatever). Making it work with arbitrary forward iterators would have a huge unnecessary overhead. There might be a slight use case for chunked strings, but it's definitely not necessary to go the arbitrary forward iterator route.

Hi, 2013/1/30 Mathias Gaunard <mathias.gaunard@ens-lyon.org>:
In practice strings are always stored contiguously, whatever the string type (std::string, Qt or whatever). Making it work with arbitrary forward iterators would have a huge unnecessary overhead.
Well, I expect that facility to be used only to give the callee access to the string in the expected encoding. Can the current proposal of string_ref capture QtStrings without copying?
There might be a slight use case for chunked strings, but it's definitely not necessary to go the arbitrary forward iterator route.
I thought that's all those nice string algorithms need. regards Andreas

On Jan 30, 2013, at 6:22 AM, Andreas Pokorny <andreas.pokorny@gmail.com> wrote:
Can the current proposal of string_ref capture QtStrings without copying?
I haven't actually tried it, but after checking the Qt docs, QString str; basic_string_ref<QChar> ref(str.data(), str.length()); should work. -- Marshall

Hi 2013/1/30 Marshall Clow <mclow.lists@gmail.com>:
On Jan 30, 2013, at 6:22 AM, Andreas Pokorny <andreas.pokorny@gmail.com> wrote:
Can the current proposal of string_ref capture QtStrings without copying?
I haven't actually tried it, but after checking the Qt docs, a
QString str; basic_string_ref<QChar> ref(str.data(), str.length());
QChar seems to be a class wrapping a 16-bit Unicode character. Then you can only use that for functions taking a basic_string_ref<QChar> or functions templated on the encoding type. That does fit the use case: the callee defines the signature. Andreas
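The point that the callee's signature fixes the character type can be sketched as follows. This is a rough illustration, not the proposed string_ref: basic_ref, count_upper_ascii_u16 and count_code_units are hypothetical names, and std::u16string stands in for any string type that stores contiguous 16-bit code units.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal stand-in for a string reference: a pointer plus a
// length into storage owned by someone else. Nothing is copied.
template <typename CharT>
struct basic_ref {
    const CharT* ptr;
    std::size_t len;

    basic_ref(const CharT* p, std::size_t n) : ptr(p), len(n) {}

    // Any std::basic_string with the same character type converts implicitly.
    template <typename Traits, typename Alloc>
    basic_ref(const std::basic_string<CharT, Traits, Alloc>& s)
        : ptr(s.data()), len(s.size()) {}

    const CharT* begin() const { return ptr; }
    const CharT* end() const { return ptr + len; }
    std::size_t size() const { return len; }
};

// Callee fixed to 16-bit code units: it accepts basic_ref<char16_t> only,
// so a basic_ref<char> cannot be passed here.
inline std::size_t count_upper_ascii_u16(basic_ref<char16_t> s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c >= u'A' && c <= u'Z') ++n;
    }
    return n;
}

// Callee templated on the character type: it accepts any basic_ref<CharT>.
template <typename CharT>
std::size_t count_code_units(basic_ref<CharT> s, CharT wanted) {
    std::size_t n = 0;
    for (CharT c : s) {
        if (c == wanted) ++n;
    }
    return n;
}
```

A caller holding a std::u16string can pass it straight to count_upper_ascii_u16, while the templated callee needs an explicit basic_ref<char>(s) or basic_ref<char16_t>(s) at the call site, because template argument deduction does not consider the converting constructor.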

On Wed, Jan 30, 2013 at 5:16 AM, Andreas Pokorny <andreas.pokorny@gmail.com> wrote:
Hi, At first glance I expected string_ref to be some kind of type-erasing, non-owning holder of a reference to an arbitrary string class: the imperfect silver bullet for the diversity of string types.
Take a project that uses C++ libraries built on std::string, a Qt GUI, and maybe a module that works with chunked strings (strings composed of a list of usually equally sized blocks of memory). At the boundaries between the software modules, copies have to occur. Frequently an interface expects a constant reference to one particular string type. Most of the time the callee just wants to process the complete string or a part of it, so there may be no need for random access and hence no need for a contiguous copy of the string contents.
That's why I would prefer a facility that only requires the source string to be a forward range of characters in an arbitrary encoding.
I do not think that it makes sense to address only contiguous sequences of characters.
It's somewhat of a compromise as a string_ref has very little access overhead compared to using a std::string or similar directly (I would think the only overhead, if any, is in the construction of the string_ref). A type-erased reference to a range of characters would have quite a bit of access overhead (everything would rely on dynamic dispatching). That said, there are certainly situations (e.g., module boundaries) where a type-erased string could make sense; for such situations, take a look at any_range within Boost.Range. - Jeff
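The trade-off described here can be made concrete with a small sketch of a type-erased character source. It is only an illustration under simplifying assumptions: it is not Boost.Range's any_range, it is single-pass rather than a full forward range, and the names char_source, from_string, from_chunks and count_char are hypothetical. Every character is fetched through an indirect call, which is the access overhead a pointer-plus-length reference avoids.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>

// Hypothetical type-erased character source: writes the next character into
// its argument and returns true, or returns false once the input is exhausted.
// Each character costs one indirect call through std::function.
using char_source = std::function<bool(char&)>;

// Adapter for contiguous storage. The string must outlive the source.
inline char_source from_string(const std::string& s) {
    std::size_t i = 0;
    return [&s, i](char& out) mutable -> bool {
        if (i == s.size()) return false;
        out = s[i++];
        return true;
    };
}

// Adapter for a chunked string (a list of blocks). The blocks are neither
// copied nor joined; the list must outlive the source.
inline char_source from_chunks(const std::list<std::string>& chunks) {
    auto block = chunks.begin();
    std::size_t i = 0;
    return [&chunks, block, i](char& out) mutable -> bool {
        // Skip blocks that are empty or already consumed.
        while (block != chunks.end() && i == block->size()) {
            ++block;
            i = 0;
        }
        if (block == chunks.end()) return false;
        out = (*block)[i++];
        return true;
    };
}

// The callee sees only char_source, whatever string type the caller holds.
inline std::size_t count_char(char_source src, char wanted) {
    std::size_t n = 0;
    char c;
    while (src(c)) {
        if (c == wanted) ++n;
    }
    return n;
}
```

The same count_char works for a std::string and for a std::list<std::string> of blocks without a contiguous copy, at the price of dynamic dispatch per character and, depending on the std::function implementation, a possible allocation when the source is constructed.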

On 30/01/13 18:35, Jeffrey Lee Hellrung, Jr. wrote:
It's somewhat of a compromise as a string_ref has very little access overhead compared to using a std::string or similar directly (I would think the only overhead, if any, is in the construction of the string_ref).
Which isn't in the access itself. There is no overhead compared to accessing a pointer.

Hi, 2013/1/30 Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com>:
[...]
It's somewhat of a compromise as a string_ref has very little access overhead compared to using a std::string or similar directly (I would think the only overhead, if any, is in the construction of the string_ref). A type-erased reference to a range of characters would have quite a bit of access overhead (everything would rely on dynamic dispatching).
Yes, that is the reason why I have always resisted declaring a more dynamic string interface. I never noticed or accepted the need. But now there is string_ref. Why stop halfway? Is it a case of paying for something that you don't (always) need?
That said, there are certainly situations (e.g., module boundaries) where a type-erased string could make sense; for such situations, take a look at any_range within Boost.Range.
Thank you - have not seen that yet. regards Andreas

On Thu, Jan 31, 2013 at 1:38 PM, Andreas Pokorny <andreas.pokorny@gmail.com> wrote:
Yes, that is the reason why I have always resisted declaring a more dynamic string interface. I never noticed or accepted the need. But now there is string_ref. Why stop halfway? Is it a case of paying for something that you don't (always) need?
Performance and simplicity. -- Olaf
participants (5)
- Andreas Pokorny
- Jeffrey Lee Hellrung, Jr.
- Marshall Clow
- Mathias Gaunard
- Olaf van der Spek