boost::program_options extensions

I've been working with boost::program_options, and there is one gaping hole in it that I have noticed: there is no way to use a custom converter in the default po::typed_value class. This is such a flaw that I've actually copied typed_value and created a version with a parser() function, to which you supply a:

    boost::function2<void, boost::any &, const std::vector<std::basic_string<charT> > &>

Why would I want to do this? I have created a bunch of generic converters that can be used for a number of things, from validation to translation. The ones I have so far are:

validate_oneof - used as:

    value<T>()->parser(validate_oneof<T>(first_val) + second_val + third_val + fourth_val)

This ensures that the value supplied is one of these values (obviously validate_oneof is a class with an overloaded operator+ and an operator() for calling).

validate_range - used as:

    value<T>()->parser(validate_range<T>(first, last))

This ensures that the value lies between these values. You might be able to achieve the same with notifiers, I don't know, but from here it gets more interesting.

validate_mapped - used as:

    value<T>()->parser(validate_mapped<K,T>(first_key, first_val) + std::make_pair(second_key, second_val) + std::make_pair(third_key, third_val))

This allows you to specify a value of one type, but have a different type stored. This is particularly useful to allow, say:

    LogLevel = info

and then have that actually stored as an int (say, 5) in the values.

I have others to do syntax validation too, such as validate_ipaddr, which ensures that the value is a valid IP address before storing it, and so on. Again, I'm not sure how much of this notifiers could do.

But also consider the following. A 'validator' that allows you to take an IP address as text, verify it is a valid IP address (hell, if someone wanted to add logic, verify it is within a certain range), and then store it as an in_addr_t. Or even better, a 'validator' that checks to see if 'http://' is prefixed to a web address, and either removes it or adds it, depending on what you need. A notifier might be able to catch bad values, but not change types or correct the data.

And all in all, the change to do this was minimal. Apart from adding the 'parser' function, my xparse function is simply:

    if (m_parser)
        m_parser(value_store, new_tokens);
    else
        validate(value_store, new_tokens, (T*) 0, 0);

Feel free to e-mail me if you want my code, or want me to post it.

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)
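For readers without access to the code offered above, here is a minimal stand-alone sketch of how a validate_oneof-style functor could be put together. It is not PreZ's implementation: it uses stream extraction in place of boost::lexical_cast and std::invalid_argument in place of po::invalid_option_value, works on plain std::string tokens, and the parser() hook it would be passed to is the proposal from this thread, not an existing Boost API.

```cpp
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the validate_oneof described in the message.
template <class T>
class validate_oneof {
public:
    explicit validate_oneof(const T& first) : allowed_(1, first) {}

    // operator+ returns a copy of the validator with one more permitted value.
    validate_oneof operator+(const T& next) const {
        validate_oneof copy(*this);
        copy.allowed_.push_back(next);
        return copy;
    }

    // Convert the token, then reject anything outside the permitted set.
    T operator()(const std::string& token) const {
        std::istringstream in(token);
        T value;
        char extra;
        // The second extraction succeeds only if unparsed characters remain.
        if (!(in >> value) || (in >> extra))
            throw std::invalid_argument("cannot convert: " + token);
        if (std::find(allowed_.begin(), allowed_.end(), value) == allowed_.end())
            throw std::invalid_argument("value not permitted: " + token);
        return value;
    }

private:
    std::vector<T> allowed_;
};
```

Usage would look like validate_oneof<int>(1) + 2 + 3, giving a callable that accepts "2" and throws on "7".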

Hi Preston,
> I've been working with boost::program_options, and there is one gaping hole in it that I have noticed: there is no way to use a custom converter in the default po::typed_value class.
> .......
> validate_mapped - used as: value<T>()->parser(validate_mapped<K,T>(first_key, first_val) + std::make_pair(second_key, second_val) + std::make_pair(third_key, third_val))
> This allows you to specify a value of one type, but have a different type stored. This is particularly useful to allow, say: LogLevel = info and then have that actually stored as an int (say, 5) in the values.

This is cool.

> I have others to do syntax validation too, such as validate_ipaddr, which ensures that the value is a valid IP address before storing it, and so on.
> Again, I'm not sure how much of this notifiers could do. But also consider the following.
> A 'validator' that allows you to take an IP address as text, verify it is a valid IP address (hell, if someone wanted to add logic, verify it is within a certain range), and then store it as an in_addr_t.

This indeed cannot be conveniently done in the current version. You can create a subclass of 'value_semantic', say, "ip_address_value_semantics_t", then create a function "ip_address_value_semantics", and use it instead of "value". But passing just the validator function is more convenient, so I intend to implement this functionality.

> And all in all, the change to do this was minimal. Apart from adding the 'parser' function, my xparse function is simply: if (m_parser) m_parser(value_store, new_tokens); else validate(value_store, new_tokens, (T*) 0, 0);
> Feel free to e-mail me if you want my code, or want me to post it.

I understand what your change is, so I can quickly apply it myself. I'll only need to decide whether it's better to provide a custom parser, a custom validator (called after parsing), or both. And I also need to find out what sizeof(boost::function<....>) is ;-)

Thanks,
Volodya

On Fri, 28 Jan 2005 16:09:28 +0300, Vladimir Prus wrote:
> This indeed cannot be conveniently done in the current version. You can create a subclass of 'value_semantic', say, "ip_address_value_semantics_t", then create a function "ip_address_value_semantics", and use it instead of "value". But passing just the validator function is more convenient, so I intend to implement this functionality.

Why does your vector validator not share a common ancestor with your non-vector validator? It does a lexical_cast<T> inside it, when it would make much more sense to use a structure somewhat similar to the following:

    // Implemented as a class so we can partially specialize it later on
    template<class T>
    class validate_internal
    {
    public:
        template<class charT>
        T operator()(const std::basic_string<charT> &s)
        {
            try
            {
                return boost::lexical_cast<T>(s);
            }
            catch (const bad_lexical_cast &)
            {
                throw invalid_option_value(s);
            }
        }
    };

    // Single value: exactly one token, converted by validate_internal<T>
    template<class T, class charT>
    void validate(boost::any &v, const std::vector<std::basic_string<charT> > &xs, T *, long)
    {
        validators::check_first_occurance(v);
        std::basic_string<charT> s(validators::get_single_string(xs));
        static validate_internal<T> validator;
        v = validator(s);
    }

    // Vector of values: every token converted by the same validate_internal<T>
    template<class T, class charT>
    void validate(boost::any &v, const std::vector<std::basic_string<charT> > &xs, std::vector<T> *, long)
    {
        if (v.empty())
            v = boost::any(std::vector<T>());
        std::vector<T> *tv = boost::any_cast<std::vector<T> >(&v);
        assert(tv);
        static validate_internal<T> validator;
        for (unsigned int i = 0; i < xs.size(); ++i)
            tv->push_back(validator(xs[i]));
    }

This way, if I need my own validator for a specific type, I just need to write the validate_internal specialization for my type, and it will work stand-alone OR with a vector. I have this exact issue, where I need to validate a custom type that needs more than a lexical_cast, so my validate_internal (for my 'duration' class) would look like:

    template<>
    class validate_internal<duration>
    {
    public:
        template<class charT>
        duration operator()(const std::basic_string<charT> &s)
        {
            try
            {
                date_duration date(not_a_date_time);
                time_duration time(not_a_date_time);
                StringToDuration(s, date, time);
                return duration(date, time);
            }
            catch (const invalid_duration_format &)
            {
                throw invalid_option_value(s);
            }
        }
    };

This conversion would then be used whether I used value<duration>() or value<std::vector<duration> >(). Which is what I'm pretty sure most people want :)

It also makes it easier for people writing custom parsers, since they can call the standard validation functions too. Notice, btw, how I made the instantiation of the validator static. This should save some cycles (not too many, but some) by only ever creating one instance of the validator for each type in each function calling it, as opposed to creating an instance of the validator each time it is needed :)

As an aside, I know it would be easy, but is there any reason you did not include the validators for std::set, std::multiset, std::list, std::deque, std::queue and std::stack (the other 'big' single-storage containers)? They would be very easy to do, except the first two use 'insert', the second two use 'push_back' and the third two use 'push' to add an entry to the container, respectively :)

Next - what is the use of the fourth argument to the validate() function?

Finally, as a feature request, I'd LOVE to have a way to be able to have something like:

    mykey.1.host = some_host
    mykey.1.port = some_port
    mykey.1.password = some_pass
    mykey.1.priority = some_prio
    mykey.2.host = some_host
    mykey.2.port = some_port
    mykey.2.password = some_password
    mykey.2.priority = some_prio

without having to define every possible 'middle' portion (1, 2, etc), even if this only allowed sequential numbers as the 'middle' portion. Right now, one of my data items is defined as:

    mykey = some_host1 some_port1 some_pass1 some_prio1
    mykey = some_host2 some_port2 some_pass2 some_prio2

Which is kind of ugly, when it would make much more sense to do:

    [mykey.1]
    host = some_host1
    port = some_port1
    password = some_password1
    priority = some_priority1

and so on. Anyway, let me know :)

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)
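The core idea of this message (one per-element converter shared by the single-value and vector storage paths) can be tried out without Boost. The sketch below is a simplified illustration under several assumptions: stream extraction replaces boost::lexical_cast, std::invalid_argument replaces invalid_option_value, the results are written to plain references instead of boost::any, and both the 'percent' type and the store_value name are made up for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Default element converter; specialize it for types that need more.
template <class T>
struct validate_internal {
    T operator()(const std::string& s) const {
        std::istringstream in(s);
        T value;
        char extra;
        if (!(in >> value) || (in >> extra))
            throw std::invalid_argument("invalid option value: " + s);
        return value;
    }
};

// Hypothetical custom type: an integer followed by '%'.
struct percent { int value; };

template <>
struct validate_internal<percent> {
    percent operator()(const std::string& s) const {
        if (s.empty() || s[s.size() - 1] != '%')
            throw std::invalid_argument("invalid option value: " + s);
        percent p;
        p.value = validate_internal<int>()(s.substr(0, s.size() - 1));
        return p;
    }
};

// Storage strategy for a single value: exactly one token is allowed.
template <class T>
void store_value(T& out, const std::vector<std::string>& tokens) {
    if (tokens.size() != 1)
        throw std::invalid_argument("expected exactly one token");
    out = validate_internal<T>()(tokens[0]);
}

// Storage strategy for a vector: each token goes through the same converter.
template <class T>
void store_value(std::vector<T>& out, const std::vector<std::string>& tokens) {
    for (std::size_t i = 0; i < tokens.size(); ++i)
        out.push_back(validate_internal<T>()(tokens[i]));
}
```

With this layout, the percent specialization is picked up by both overloads, so neither storage function has to know anything about the type being parsed.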

On Fri, 28 Jan 2005 22:10:21 -0500, Preston A. Elder wrote:
> It also makes it easier for people writing custom parsers, since they can call the standard validation functions too. Notice, btw, how I made the instantiation of the validator static. This should save some cycles (not too many, but some) by only ever creating one instance of the validator for each type in each function calling it, as opposed to creating an instance of the validator each time it is needed :)

Actually, a better idea would be to replace that 4th value (the 'long') and instead have the parser passed to the validate function, defaulting, of course, to validate_internal<T> (which uses lexical_cast). This way, if they want to define a new TYPE validator, they specialize validate_internal<T>. If they want to define a different way to validate a specific type (i.e. a parser), they create a new 'validator', which is basically any callable that has:

    template<typename charT>
    T operator()(const std::basic_string<charT> &s);

defined (where T is the type to be returned; it is left as an exercise for the user to ensure that it is the same T the validator is called with). So we end up with the validate function looking like:

    template<class T, class charT>
    void validate(boost::any &v, const std::vector<std::basic_string<charT> > &xs, T *,
                  const boost::function1<T, const std::basic_string<charT> &> &parser = validate_internal<T>());

You might be guessing where I'm going now. This makes the 'validate' functions basically only a way to differentiate between how data is stored (i.e. as a value directly, as a vector of values, etc). The actual details of how to retrieve the data are then left completely up to the parser, which could be a specialization of validate_internal (or the default version using lexical_cast), a stand-alone function that returns the correct type, or any class that has an operator() accepting a (w)string and returning the correct type.

As an added bonus, the parser then doesn't need to know whether it's being stored in a vector (as it currently does in my hacked up version).

More food for thought. Meanwhile, I'm going to alter my hacked up version to do the above, and I'll post a link to the code when done :)

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)
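As a rough illustration of the split proposed here (storage strategy on one side, a replaceable per-element parser on the other), the following sketch uses std::function from C++11 where the message uses boost::function1. The class name typed_value_sketch, the parse_into members and the log-level numbers other than info = 5 are assumptions made for the example, not program_options API.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Default parser: stream extraction standing in for boost::lexical_cast.
template <class T>
T default_parser(const std::string& s) {
    std::istringstream in(s);
    T value;
    char extra;
    if (!(in >> value) || (in >> extra))
        throw std::invalid_argument("invalid option value: " + s);
    return value;
}

// Hypothetical value holder with a replaceable element parser.
template <class T>
class typed_value_sketch {
public:
    typedef std::function<T(const std::string&)> parser_type;

    typed_value_sketch() : parser_(&default_parser<T>) {}

    // Swap in a custom parser; returns *this so calls can be chained.
    typed_value_sketch& parser(parser_type p) { parser_ = p; return *this; }

    // Scalar storage: exactly one token.
    void parse_into(T& out, const std::vector<std::string>& tokens) const {
        if (tokens.size() != 1)
            throw std::invalid_argument("expected exactly one token");
        out = parser_(tokens[0]);
    }

    // Vector storage: the same parser is applied to every token.
    void parse_into(std::vector<T>& out, const std::vector<std::string>& tokens) const {
        for (std::size_t i = 0; i < tokens.size(); ++i)
            out.push_back(parser_(tokens[i]));
    }

private:
    parser_type parser_;
};

// Example custom parser mapping names to ints (values are illustrative).
int parse_log_level(const std::string& s) {
    if (s == "error") return 3;
    if (s == "info")  return 5;
    if (s == "debug") return 7;
    throw std::invalid_argument("unknown log level: " + s);
}
```

The parser never learns whether its result ends up in a scalar or a vector, which is the "added bonus" the message describes.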

On Fri, 28 Jan 2005 23:26:22 -0500, Preston A. Elder wrote:
> More food for thought. Meanwhile, I'm going to alter my hacked up version to do the above, and I'll post a link to the code when done :)

As promised, here is a link to the code:

http://www.neuromancy.net/viewcvs/Mantra-I/include/mantra/core/typed_value.h?root=mantra&rev=1.1&view=auto

In this code, the validate functions are only there to handle how the value being converted is stored, NOT how to convert the value. The validate_internal class does the actual conversion by default. However, a substitute for this may be passed to the typed_value class, and it will not matter whether it's being stored in a vector or whatever.

As you can see, many of the user-provided validators simply call validate_internal to get the value, then do their own processing on it (for example, see validate_range, which uses the default validator to get the item before doing the range comparison). A validator doesn't have to do this, as with validate_space, which uses some of my own code to convert a textual space ("3k", or "5m") to a boost::uint64_t (it is left as an exercise for the user to ensure that the typed_value class is typed for boost::uint64_t).

You will also note I specialized the bool version of validate_internal, just as was done with the bool version of validate in the original. Again, this calls my own function which looks for a textual version of a boolean, and returns a tribool (the code is in 'utils.h' in the same tree if you care :P).

This is the kind of system I REALLY hope to see inside program_options, as it takes care of multiple data stores and custom validators all in one fell swoop, including 'daisy chaining' validators (such as what validate_range does).

Please note, some stuff in here (such as the 'duration' stuff, get_bool, validate_host, etc) is kind of specific to my application, but should not be too hard to either replicate or remove. I am using this in my own application right now (which is why it is in my own namespace), and I can confirm it works :)

Incidentally, why is 'arg' a global variable? This is horribly thread-unsafe. It would be better to have the argument name passed to value_semantic in its parse function or something, because global variables really suck *nod*, especially for multi-threaded applications (not that I plan to have two parses running at once, but it IS possible).

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)
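The linked validate_space code is not reproduced in this thread, so the following is only a guess at what a "3k" / "5m" converter could look like. It assumes that k, m and g mean binary multiples (1024-based), accepts a single optional suffix letter, and does no overflow checking; the function name parse_space is hypothetical and std::uint64_t stands in for boost::uint64_t.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Converts "17", "3k", "5m" or "2g" to a byte count.
// Assumption: suffixes are powers of 1024. No overflow checking is done.
std::uint64_t parse_space(const std::string& s) {
    std::size_t i = 0;
    std::uint64_t n = 0;
    // Accumulate the leading run of decimal digits.
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        n = n * 10 + static_cast<std::uint64_t>(s[i] - '0');
        ++i;
    }
    if (i == 0)
        throw std::invalid_argument("no digits in: " + s);
    if (i == s.size())
        return n;  // plain number, no suffix
    if (i + 1 != s.size())
        throw std::invalid_argument("trailing characters in: " + s);
    switch (std::tolower(static_cast<unsigned char>(s[i]))) {
        case 'k': return n << 10;
        case 'm': return n << 20;
        case 'g': return n << 30;
        default:  throw std::invalid_argument("unknown suffix in: " + s);
    }
}
```

A function with this shape fits the parser slot described above, provided the option is declared with a 64-bit unsigned value type.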

Preston A. Elder wrote:
> On Fri, 28 Jan 2005 23:26:22 -0500, Preston A. Elder wrote:
>> More food for thought. Meanwhile, I'm going to alter my hacked up version to do the above, and I'll post a link to the code when done :)
> As promised, here is a link to the code:

I'll have a closer look at this and then decide what's the best way to mix validation of a single value with validation of the whole option. This starts to resemble the output formatters library which was reviewed some time ago.

> Incidentally, why is 'arg' a global variable? This is horribly thread-unsafe. It would be better to have the argument name passed to value_semantic in its parse function or something, because global variables really suck *nod*, especially for multi-threaded applications (not that I plan to have two parses running at once, but it IS possible).

This is a variable which is never modified. What thread-safety problems do you mean?

- Volodya

On Mon, 07 Feb 2005 17:39:10 +0300, Vladimir Prus wrote:
> I'll have a closer look at this and then decide what's the best way to mix validation of a single value with validation of the whole option. This starts to resemble the output formatters library which was reviewed some time ago.

Well, as I mentioned, I'm currently using the version I pasted to you (actually, I've made a few modifications since, nothing major though; if you look at the current head version, you'll see the modifications). It works quite well, and most importantly does almost everything I want. The only thing it doesn't do is allow me to 'daisy-chain' validators. In other words, do:

    value<int>()->parse(validate_multiple_of(5, validate_min(5)))

The validate_multiple_of would use validate_min to retrieve the value, which would in turn use the default validator to do the conversion. This, obviously, would be a way to ensure that the value is both a multiple of five and not below five. Useful for, say, validate_multiple_of(512, validate_min(2048)), i.e. only take values evenly divisible by 512 (block size), with a minimum of 2k. Of course, I could write a custom validator to do just this, but it'd be nice to have a mechanism to do this built in. I've not figured out a sane way to do it without making the template syntax really ugly.
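One possible way to get this kind of chaining without heavy template syntax is to type-erase each validator behind a function object, so every stage has the same type and can wrap any other stage. The sketch below does that for int only, using std::function and lambdas from C++11 (boost::function would play the same role). It is an illustration of the idea, not the design either participant settled on, and parse_int is a stand-in for the default validator.

```cpp
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

typedef std::function<int(const std::string&)> int_parser;

// Innermost stage: plain text-to-int conversion.
int parse_int(const std::string& s) {
    std::istringstream in(s);
    int value;
    char extra;
    if (!(in >> value) || (in >> extra))
        throw std::invalid_argument("not an integer: " + s);
    return value;
}

// Wraps 'inner' and rejects results below 'minimum'.
int_parser validate_min(int minimum, int_parser inner = &parse_int) {
    return [minimum, inner](const std::string& s) -> int {
        int v = inner(s);
        if (v < minimum)
            throw std::invalid_argument("below minimum: " + s);
        return v;
    };
}

// Wraps 'inner' and rejects results that are not a multiple of 'factor'.
int_parser validate_multiple_of(int factor, int_parser inner = &parse_int) {
    return [factor, inner](const std::string& s) -> int {
        int v = inner(s);
        if (v % factor != 0)
            throw std::invalid_argument("not a multiple: " + s);
        return v;
    };
}
```

With these definitions, validate_multiple_of(512, validate_min(2048)) yields a single callable that first converts, then checks the minimum, then checks divisibility.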
>> Incidentally, why is 'arg' a global variable? This is horribly thread-unsafe. It would be better to have the argument name passed to value_semantic in its parse function or something, because global variables really suck *nod*, especially for multi-threaded applications (not that I plan to have two parses running at once, but it IS possible).
> This is a variable which is never modified. What thread-safety problems do you mean?

'arg' must be modified, else it would not work with, say, the name() function. I assume this is set to the 'currently processing value', both for help and to fill in the correct value when an exception is thrown (if I throw invalid_option_value, for example). If this is the case, and two threads run their own po::parse_* functions at the same time, then it might get confused, since from what I could tell, 'arg' is not localized to any specific parse, nor is it thread-specific storage.

I've not tested this, of course :)

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)

Preston A. Elder wrote:
>> just the validator function is more convenient, so I intend to implement this functionality.
> Why does your vector validator not share a common ancestor with your non-vector validator?
> It does a lexical_cast<T> inside it, when it would make much more sense to use a structure somewhat similar to the following:
> // Implemented as a class so we can partially specialize it later on

Sorry, a previous version of the library used partial specialisation and did not work on Borland as a result.

> template<class T, class charT>
> void validate(boost::any &v, const std::vector<std::basic_string<charT> > &xs, std::vector<T> *, long)
> {
>     if (v.empty())
>         v = boost::any(std::vector<T>());
>     std::vector<T> *tv = boost::any_cast<std::vector<T> >(&v);
>     assert(tv);
>     static validate_internal<T> validator;
>     for (unsigned int i = 0; i < xs.size(); ++i)
>         tv->push_back(validator(xs[i]));
> }

I think it would make sense to call 'validate' for the single element from 'validate' for vector. That would solve your problem.

> As an aside, I know it would be easy, but is there any reason you did not include the validators for std::set, std::multiset, std::list, std::deque, std::queue and std::stack (the other 'big' single-storage containers)? They would be very easy to do, except the first two use 'insert', the second two use 'push_back' and the third two use 'push' to add an entry to the container, respectively :)

There was no demand for that ;-)

> Next - what is the use of the fourth argument to the validate() function?

This is a workaround for compilers without partial template specialization. The generic version is defined as

    template<class T, class charT>
    void validate(boost::any& v, const std::vector< std::basic_string<charT> >& xs, T*, long)

and the specific version is

    template<class T, class charT>
    void validate(boost::any& v, const std::vector< std::basic_string<charT> >& xs, std::vector<T>*, int)

When 'validate' is called with vector<T>* and '0' as the third and fourth parameters, such compilers see that the third parameter can be passed to both functions. However, the '0' -> int conversion is a no-op and the '0' -> long conversion is an integral conversion. So the second version of the function is selected.
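The int/long trick described above can be reproduced in isolation. In this reduced sketch (the function name 'which' and its string results are invented for the demonstration), the literal 0 matches an int parameter exactly but needs a conversion to reach long, so the vector overload wins whenever both are viable, without relying on partial ordering of function templates.

```cpp
#include <string>
#include <vector>

// Generic version: the trailing 'long' makes it the worse match for a literal 0.
template <class T>
std::string which(T*, long) { return "generic"; }

// Vector version: the trailing 'int' is an exact match for a literal 0.
template <class T>
std::string which(std::vector<T>*, int) { return "vector"; }

// Helpers showing the two call shapes used by the dispatch.
inline std::string dispatch_scalar() {
    return which(static_cast<int*>(0), 0);                // only the generic overload is viable
}

inline std::string dispatch_vector() {
    return which(static_cast<std::vector<int>*>(0), 0);   // both viable, 'int' overload is better
}
```

A call with a pointer to a non-vector type only ever sees the generic overload, which is why the scalar case keeps working.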
> Finally, as a feature request, I'd LOVE to have a way to be able to have something like:
>
>     mykey.1.host = some_host
>     mykey.1.port = some_port
>     mykey.1.password = some_pass
>     mykey.1.priority = some_prio
>     mykey.2.host = some_host
>     mykey.2.port = some_port
>     mykey.2.password = some_password
>     mykey.2.priority = some_prio
>
> without having to define every possible 'middle' portion (1, 2, etc), even if this only allowed sequential numbers as the 'middle' portion. Right now, one of my data items is defined as:
>
>     mykey = some_host1 some_port1 some_pass1 some_prio1
>     mykey = some_host2 some_port2 some_pass2 some_prio2
>
> Which is kind of ugly, when it would make much more sense to do:
>
>     [mykey.1]
>     host = some_host1
>     port = some_port1
>     password = some_password1
>     priority = some_priority1
>
> and so on.
> Anyway, let me know :)

Request noted. I'll try to do something.

- Volodya
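Until something like this exists in the library, the grouping could be done as a post-processing step over raw key/value pairs. The sketch below assumes such pairs have already been collected by some means (how they are obtained is outside its scope), and the names group_indexed and record are invented for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, std::string> record;

// Groups keys of the form "<prefix>.<N>.<field>" by their numeric index N.
// Keys that do not match that shape are ignored.
std::map<int, record> group_indexed(
        const std::string& prefix,
        const std::vector<std::pair<std::string, std::string> >& options) {
    std::map<int, record> out;
    const std::string head = prefix + ".";
    for (std::size_t i = 0; i < options.size(); ++i) {
        const std::string& key = options[i].first;
        if (key.compare(0, head.size(), head) != 0)
            continue;  // different prefix
        const std::size_t dot = key.find('.', head.size());
        if (dot == std::string::npos)
            continue;  // no field part after the index
        const std::string index = key.substr(head.size(), dot - head.size());
        if (index.empty() || index.find_first_not_of("0123456789") != std::string::npos)
            continue;  // middle portion is not a number
        out[std::atoi(index.c_str())][key.substr(dot + 1)] = options[i].second;
    }
    return out;
}
```

Each resulting record then holds the host, port, password and priority for one numbered entry, whatever the numbers happen to be.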

On Mon, 31 Jan 2005 11:05:19 +0300, Vladimir Prus wrote:
> I think it would make sense to call 'validate' for the single element from 'validate' for vector. That would solve your problem.

The only problem here is that I can't, since the single-element validator has two checks in it: first to ensure the option has not been seen before, and second to ensure that there is one and only one element in the std::vector<std::basic_string<charT> > passed to it.

I don't know if you saw it, but I posted my idea, in fully working form (I know it works, because I'm actively using it for my own code) here:

http://www.neuromancy.net/viewcvs/Mantra-I/include/mantra/core/typed_value.h?root=mantra&rev=1.1&view=auto

As I suggested, the single-element and multi-element variants both call a common 'value parser', which can be passed in or defaulted. This separates the logic of how to store the value from how to parse it. And more complex parsers that I pass in (say, validate_range) can call the default parser to get the value and then do the checks necessary, and it won't matter a whit whether one or many of them are being stored.

> There was no demand for that ;-)

Well, I had need for set<>, and there is no real reason that the major single-element std containers should not be implemented by default, as a matter of course.
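For what it's worth, most of the containers mentioned can share one storage routine, because insert(end(), value) is accepted by std::vector, std::list, std::deque, std::set and std::multiset alike (for the associative containers the iterator is only a hint). The sketch below relies on that; append_tokens and convert are names made up for the example, and std::queue and std::stack are not covered since those adaptors only offer push.

```cpp
#include <cstddef>
#include <deque>
#include <list>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the default element converter.
template <class T>
T convert(const std::string& s) {
    std::istringstream in(s);
    T value;
    char extra;
    if (!(in >> value) || (in >> extra))
        throw std::invalid_argument("invalid option value: " + s);
    return value;
}

// Appends every converted token to any container supporting insert(end(), v).
template <class Container>
void append_tokens(Container& out, const std::vector<std::string>& tokens) {
    typedef typename Container::value_type value_type;
    for (std::size_t i = 0; i < tokens.size(); ++i)
        out.insert(out.end(), convert<value_type>(tokens[i]));
}
```

Sequence containers keep the tokens in the order given, while std::set silently drops duplicates, which may or may not be what an option parser should do.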
> This is a workaround for compilers without partial template specialization. The generic version is defined as
>
>     template<class T, class charT>
>     void validate(boost::any& v, const std::vector< std::basic_string<charT> >& xs, T*, long)
>
> and the specific version is
>
>     template<class T, class charT>
>     void validate(boost::any& v, const std::vector< std::basic_string<charT> >& xs, std::vector<T>*, int)

Fair enough. My code (that I pasted above) doesn't have this workaround, but it would be brain-dead simple to put in.

> Request noted. I'll try to do something.

Thanks :)

-- PreZ :) Founder. The Neuromancy Society (http://www.neuromancy.net)
participants (2): Preston A. Elder, Vladimir Prus