[program_options] Some methods take const char*, others take std::string

Hi all, Some methods in program_options accept const char*s (e.g. option_description constructor), while others take const std::string&s (e.g. option_description::key). This makes no difference to the user if he is passing string literals, but becomes annoying if he wants to pass strings, or expressions which return strings. Of course he can put (...).c_str() everywhere, but this is ugly and unnecessary, since program_options just takes its parameters and shoves them into string variables anyways. Since changing the parameter types to const std::string& should not break existing code, I do not see a downside to doing so (plus, it would allow me to get rid of .c_str()s everywhere :-). Does it seem reasonable for program_options public APIs to accept std::strings instead of const char*s? If so I would be happy to do the legwork and submit a patch. Thanks, -Gabe

Gabriel Redner wrote:
Hi all,
Some methods in program_options accept const char*s (e.g. option_description constructor), while others take const std::string&s (e.g. option_description::key). This makes no difference to the user if he is passing string literals, but becomes annoying if he wants to pass strings, or expressions which return strings. Of course he can put (...).c_str() everywhere, but this is ugly and unnecessary, since program_options just takes its parameters and shoves them into string variables anyways. Since changing the parameter types to const std::string& should not break existing code, I do not see a downside to doing so (plus, it would allow me to get rid of .c_str()s everywhere :-).
Does it seem reasonable for program_options public APIs to accept std::strings instead of const char*s? If so I would be happy to do the legwork and submit a patch.
This is: https://svn.boost.org/trac/boost/ticket/4142 The point is that using std::string would impose code size overhead when you pass string literal. I would be happy to see a patch, but such a patch would have to retain overloads taking const char*. Thanks, -- Vladimir Prus CodeSourcery / Mentor Graphics +7 (812) 677-68-40

Another option is to define class str_ref : public iterator_range<const char*> { ... }; Convertible from both, string literals and std::strings, and provide only one overload that accepts it everywhere. Just an idea. On Wed, Aug 10, 2011 at 23:13, Vladimir Prus <vladimir@codesourcery.com>wrote:
Gabriel Redner wrote:
Hi all,
Some methods in program_options accept const char*s (e.g. option_description constructor), while others take const std::string&s (e.g. option_description::key). This makes no difference to the user if he is passing string literals, but becomes annoying if he wants to pass strings, or expressions which return strings. Of course he can put (...).c_str() everywhere, but this is ugly and unnecessary, since program_options just takes its parameters and shoves them into string variables anyways. Since changing the parameter types to const std::string& should not break existing code, I do not see a downside to doing so (plus, it would allow me to get rid of .c_str()s everywhere :-).
Does it seem reasonable for program_options public APIs to accept std::strings instead of const char*s? If so I would be happy to do the legwork and submit a patch.
This is: https://svn.boost.org/trac/boost/ticket/4142
The point is that using std::string would impose code size overhead when you pass string literal. I would be happy to see a patch, but such a patch would have to retain overloads taking const char*.
Thanks,
-- Vladimir Prus CodeSourcery / Mentor Graphics +7 (812) 677-68-40
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Yakov

Hi Vladimir, I have added the std::string overloads, and my patch (against trunk) is attached to this message. I also updated relevant tests to check both overloads, and I have run the full test suite. The one thing I was not able to do was check that the new generated docs are correct - how do I regenerate the docs? Additionally, if you don't like the wording (or the content) of my changes to the 'rationale' section, feel free to rewrite it. Thanks, -Gabe

On Thu, Aug 11, 2011 at 4:45 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Another option is to define
class str_ref : public iterator_range<const char*> { ... };
Convertible from both, string literals and std::strings, and provide only one overload that accepts it everywhere.
Just an idea. -- Yakov
Hi Yakov, Thanks for your suggestion! In the end though, I decided to add separate std::string overloads. Since the interfaces involved are few and simple, it seemed better to overload in the most straightforward way, rather than add something fancier just to save a bit of typing on the implementation side. Thanks, -Gabe

It's not just for saving typing, it's for making the interface more generic. Consider the following: void f(const char*); void f(const std::string&); I can use it for null-terminated strings and for std::srings just fine. But consider the case I want to pass it a sub-string of some string, or a string stored in std::vector, or a non-null terminated string: std::string str1 = ...; std::vector<char> str2 = ...; char str3[1024]; GetBinaryPacket(str3); f(str1.substr(1, 5)); // unnecessary copy f(std::string(str2.begin(), str2.end())); // unnecessary copy f(std::string(str3, str3 + 5)); // unnecessary copy Compare it with: typedef iterator_range<const char*> str_ref; void f(str_ref); f(str_ref(str1.data()+1, str1.data()+6)); // no copy f(str_ref(&str2[0], &str2[0] + str2.size())); // no copy f(str_ref(str3, str3 + 5)); // no copy Ok, so the above is longer than the first because we didn't define a nice interface of str_ref yet, but it demonstrated the generality. Note that the most general way is to use templates accepting any Range of chars, just as boost string algorithms do. But this is not good for separate compilation, so the above is the best compromise of efficiency, separate compilation and generality we can get. Yakov On Mon, Aug 15, 2011 at 01:45, Gabriel Redner <gredner@gmail.com> wrote:
On Thu, Aug 11, 2011 at 4:45 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
Another option is to define
class str_ref : public iterator_range<const char*> { ... };
Convertible from both, string literals and std::strings, and provide only one overload that accepts it everywhere.
Just an idea. -- Yakov
Hi Yakov,
Thanks for your suggestion! In the end though, I decided to add separate std::string overloads. Since the interfaces involved are few and simple, it seemed better to overload in the most straightforward way, rather than add something fancier just to save a bit of typing on the implementation side.
Thanks, -Gabe _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Yakov

On 15 Aug 2011, at 07:57, Yakov Galka wrote:
It's not just for saving typing, it's for making the interface more generic. Consider the following:
void f(const char*); void f(const std::string&);
I can use it for null-terminated strings and for std::srings just fine. But consider the case I want to pass it a sub-string of some string, or a string stored in std::vector, or a non-null terminated string:
std::string str1 = ...; std::vector<char> str2 = ...; char str3[1024]; GetBinaryPacket(str3);
f(str1.substr(1, 5)); // unnecessary copy f(std::string(str2.begin(), str2.end())); // unnecessary copy f(std::string(str3, str3 + 5)); // unnecessary copy
Compare it with:
typedef iterator_range<const char*> str_ref; void f(str_ref);
f(str_ref(str1.data()+1, str1.data()+6)); // no copy f(str_ref(&str2[0], &str2[0] + str2.size())); // no copy f(str_ref(str3, str3 + 5)); // no copy
Is worrying about high efficiency, and lack of copying, in program_options really that important? I would prefer the simplest possible interface, even if it has some small inefficiencies. Chris

On Mon, Aug 15, 2011 at 19:49, Christopher Jefferson <chris@bubblescope.net>wrote:
[...] Is worrying about high efficiency, and lack of copying, in program_options really that important? I would prefer the simplest possible interface, even if it has some small inefficiencies.
This is not about program_options, it's about passing strings in general. It's just this issue came up in this thread, so I proposed it here. On Mon, Aug 15, 2011 at 22:33, Gabriel Redner <gredner@gmail.com> wrote:
Hi Yakov,
It's not just for saving typing, it's for making the interface more generic. Consider the following: <snip> Ok, so the above is longer than the first because we didn't define a nice interface of str_ref yet, but it demonstrated the generality.
This is all true, but it's important to keep in mind our use cases. In general, these APIs will be called with string literals. The next-most-common case is for them to be called with std::strings, and other cases are even less likely. So it seems best to do the simplest thing possible which covers the common cases and does not rule out the uncommon ones.
There are dozens of other string classes people use. If anyone could map his contiguous storage string with compatible encoding to str_ref, it could simplify things greatly.
Note that the most general way is to use templates accepting any Range of chars, just as boost string algorithms do. But this is not good for separate compilation, so the above is the best compromise of efficiency, separate compilation and generality we can get.
Your compromise is missing one factor - simplicity. This is a simple API which does a simple job. Decorating it with generic tools will make it harder to understand and to maintain, in exchange for some small benefit in uncommon use cases.
Once str_ref is implemented, it makes everything else simpler. So I think that in the end things are getting simpler. Also In any case, it's Vladimir's library, so he has the last word on this.
This is not just about this library. It's relevant to interprocess too, for example. On Mon, Aug 15, 2011 at 23:25, Olaf van der Spek <ml@vdspek.org> wrote:
[...] This isn't the only lib that takes string references. It'd be nice to solve this properly once and for all.
Exactly! Yet I must admit that I have a very little experience with my own proposal. I used it successfully in compiler application where passing references to the original strings allowed one cheaply (and lazily) compute the source of an error. -- Yakov

Hi Yakov,
It's not just for saving typing, it's for making the interface more generic. Consider the following: <snip> Ok, so the above is longer than the first because we didn't define a nice interface of str_ref yet, but it demonstrated the generality.
This is all true, but it's important to keep in mind our use cases. In general, these APIs will be called with string literals. The next-most-common case is for them to be called with std::strings, and other cases are even less likely. So it seems best to do the simplest thing possible which covers the common cases and does not rule out the uncommon ones.
Note that the most general way is to use templates accepting any Range of chars, just as boost string algorithms do. But this is not good for separate compilation, so the above is the best compromise of efficiency, separate compilation and generality we can get.
Your compromise is missing one factor - simplicity. This is a simple API which does a simple job. Decorating it with generic tools will make it harder to understand and to maintain, in exchange for some small benefit in uncommon use cases. In any case, it's Vladimir's library, so he has the last word on this. Thanks, -Gabe

On 15 August 2011 14:33, Gabriel Redner <gredner@gmail.com> wrote:
Your compromise is missing one factor - simplicity. This is a simple API which does a simple job. Decorating it with generic tools will make it harder to understand and to maintain, in exchange for some small benefit in uncommon use cases.
When the library is taken is isolation, that is true. When taken are part of a greater whole (Boost), it makes it harder, as it is now has an interface that is inconsistent with other libraries. Having to keep track of which libraries support which "similar but not quite the same" interface does not make things simpler.
In any case, it's Vladimir's library, so he has the last word on this.
Of course. -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

On Mon, Aug 15, 2011 at 9:33 PM, Gabriel Redner <gredner@gmail.com> wrote:
This is all true, but it's important to keep in mind our use cases. In general, these APIs will be called with string literals. The next-most-common case is for them to be called with std::strings, and other cases are even less likely. So it seems best to do the simplest thing possible which covers the common cases and does not rule out the uncommon ones.
Using str_ref would require only a single overload, while using const char* and const str::string& would require two. So what is the simplest thing possible?
Your compromise is missing one factor - simplicity. This is a simple API which does a simple job. Decorating it with generic tools will make it harder to understand and to maintain, in exchange for some small benefit in uncommon use cases.
This isn't the only lib that takes string references. It'd be nice to solve this properly once and for all. Olaf

On Mon, Aug 15, 2011 at 10:25 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Mon, Aug 15, 2011 at 9:33 PM, Gabriel Redner <gredner@gmail.com> wrote:
This is all true, but it's important to keep in mind our use cases. In general, these APIs will be called with string literals. The next-most-common case is for them to be called with std::strings, and other cases are even less likely. So it seems best to do the simplest thing possible which covers the common cases and does not rule out the uncommon ones.
Using str_ref would require only a single overload, while using const char* and const str::string& would require two. So what is the simplest thing possible?
Not using str_ref would require a third overload: (const char*, size_t). Even then, it still doesn't support for example vector<char>, CString (MFC) or QString (QT). Olaf

On Thu, Aug 11, 2011 at 10:45 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
class str_ref : public iterator_range<const char*> { ... };
Convertible from both, string literals and std::strings, and provide only one overload that accepts it everywhere.
Doesn't iterator_range treat literals as arrays by default? The str_ref idea is great, but IMO iterator_range isn't suitable for it. Also, doesn't iterator_range require memory allocations itself? Olaf

On Mon, Aug 15, 2011 at 19:44, Olaf van der Spek <ml@vdspek.org> wrote:
On Thu, Aug 11, 2011 at 10:45 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
class str_ref : public iterator_range<const char*> { ... };
Convertible from both, string literals and std::strings, and provide only one overload that accepts it everywhere.
Doesn't iterator_range treat literals as arrays by default? The str_ref idea is great, but IMO iterator_range isn't suitable for it. Also, doesn't iterator_range require memory allocations itself?
1) It's a proof of concept, not an implementation. iterator_range is just one of the infinity of types which can be mapped to the Range concept and used wherever Ranges expected. 2) Yes, it treats string literals as arrays, so what? It's exactly what we want. If you're talking about including or not the null terminator, then I'm still not sure on this point. We can say that our strings don't contain zeros in the middle (which is acceptable for function expecting const char*), then if end() points to a null terminator we can use it as an optimization when passing to a low-level function accepting a zero-terminated string. If end() doesn't point to zero, then we must do a copy when passing to something like fread(). The alternative is to never include the null terminator when initializing from a string. This solution is also good, since only a small fraction of our functions pass the strings to the system (where a null terminated string is needed), and even then sometimes we *must* do a copy, especially if we want to support Unicode on windows (consider all boost::interprocess which cannot be used now for opening Unicode paths on windows). 3) No, iterator_range just stores a pair of iterators, no memory allocations are done. Also see 1), it's irrelevant to the discussion. -- Yakov

On Wed, Aug 24, 2011 at 5:27 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
2) Yes, it treats string literals as arrays, so what? It's exactly what we want.
No, because it'll include the null terminator.
If you're talking about including or not the null terminator, then I'm still not sure on this point. We can say that our strings don't contain zeros in the middle (which is acceptable for function expecting const char*), then if
It'd be nice to support 'binary' strings too.
end() points to a null terminator we can use it as an optimization when
end() will point one past the terminator.
3) No, iterator_range just stores a pair of iterators, no memory allocations are done. Also see 1), it's irrelevant to the discussion.
Ah, I think I'm confused with another type then. Olaf

On Wed, Aug 24, 2011 at 22:22, Olaf van der Spek <ml@vdspek.org> wrote:
On Wed, Aug 24, 2011 at 5:27 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
2) Yes, it treats string literals as arrays, so what? It's exactly what we want.
No, because it'll include the null terminator. [...] end() will point one past the terminator.
The two are exactly opposite. So do you want to include the null in the range or not? Yes, my mistake. When I said 'end() points to' I meant 'end()-1 points to'.
If you're talking about including or not the null terminator, then I'm still
not sure on this point. We can say that our strings don't contain zeros in the middle (which is acceptable for function expecting const char*), then if
It'd be nice to support 'binary' strings too.
I remember that you brought up this topic some time ago. However I don't think that you need anything special to handle binary chunks of data: typedef iterator_range<const char*> data_ref; Unlike for str_ref, this *is* a complete, working implementation. -- Yakov

On Sat, Aug 27, 2011 at 10:26 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Aug 24, 2011 at 22:22, Olaf van der Spek <ml@vdspek.org> wrote:
On Wed, Aug 24, 2011 at 5:27 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
2) Yes, it treats string literals as arrays, so what? It's exactly what we want.
No, because it'll include the null terminator. [...] end() will point one past the terminator.
The two are exactly opposite.
No they're not. end() is not included in the range, so if end() points to one past the terminator, the terminator is part of the range. if end() points to the terminator, the terminator is not part of the range.
So do you want to include the null in the range or not?
I don't.
It'd be nice to support 'binary' strings too.
I remember that you brought up this topic some time ago. However I don't think that you need anything special to handle binary chunks of data:
typedef iterator_range<const char*> data_ref;
Unlike for str_ref, this *is* a complete, working implementation.
The parameter types are usually (const void*, size_t). Not const char*. Olaf
participants (6)
-
Christopher Jefferson
-
Gabriel Redner
-
Nevin Liber
-
Olaf van der Spek
-
Vladimir Prus
-
Yakov Galka