[boost-users] [proto]: Seeking advice on a DSL

Hello,

I want to create a DSL where users are allowed to create "free" variables and then use them as parameters and local variables in their mini-program. For example, I want to have something like

int_ a,b,c;
Program(&a,b) [c = b*2, a = c]

The parameter with the '&' prefix represents output; the ones without represent input. So, Program(&a, b)[...] should behave as a function object with prototype void foo(int&, const int), and I should be able to do the following:

int ra;
BOOST_AUTO(pr, Program(&a,b) [c = b*2, a = c]);
pr(ra, 4);

I am seeking some advice on implementing the "evaluator" for this language. In particular, where do I store the values corresponding to these free variables? To draw a parallel with Boost.Phoenix: in Phoenix, arg1, arg2, etc. are uniquely typed, and the evaluator just creates a fusion::vector<...> with the values passed in, and argN evaluation just returns fusion::at_c<N>(env.args()). In my case, I could create a unique ID for each free variable, create a std::map from ID to value, and use that as the environment. The evaluator for the free variables can then look up values in that map. But I consider this too heavyweight. In contrast, the fusion::vector<...> would incur very little overhead at run time. I will be grateful for any suggestions on other ways to implement the evaluator.

Thanks in advance,
Manjunath
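For comparison, here is a minimal sketch of the Phoenix-style environment described above, assuming C++14 and using std::tuple and std::get in place of fusion::vector and fusion::at_c. The names arg, arg1, arg2 and mul_expr are hypothetical; this is not Phoenix's own code. Because a placeholder's position is part of its type, the lookup is resolved at compile time, which is exactly what is unavailable when a, b and c all have the same type int_.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Sketch only: std::tuple/std::get stand in for fusion::vector/fusion::at_c,
// and arg, arg1, arg2, mul_expr are hypothetical names, not Phoenix's own code.

// Each placeholder is a distinct type whose position N is part of the type.
template <std::size_t N>
struct arg {
    // Evaluation is a compile-time indexed lookup into the argument pack.
    template <typename Env>
    auto operator()(const Env& env) const { return std::get<N>(env); }
};

constexpr arg<0> arg1{};  // first call argument
constexpr arg<1> arg2{};  // second call argument

// Expression node for lhs * rhs; evaluating it evaluates both children.
template <typename L, typename R>
struct mul_expr {
    L lhs;
    R rhs;
    template <typename Env>
    auto operator()(const Env& env) const { return lhs(env) * rhs(env); }
};

// Only placeholders can be multiplied in this sketch.
template <std::size_t I, std::size_t J>
mul_expr<arg<I>, arg<J>> operator*(arg<I> l, arg<J> r) { return {l, r}; }
```

For example, (arg1 * arg2) applied to std::make_tuple(3, 4) evaluates to 12, with no search at run time.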

On 3/18/2010 12:12 PM, Manjunath Kudlur wrote:
> Hello,
> I want to create a DSL where users are allowed to create "free" variables and then use them as parameters and local variables in their mini-program. For example, I want to have something like
> int_ a,b,c;

Trouble. These all have the same type.

> Program(&a,b) [c = b*2, a = c]
> The parameter with the '&' prefix represents output; the ones without represent input. So, Program(&a, b)[...] should behave as a function object with prototype void foo(int&, const int), and I should be able to do the following:
> int ra;
> BOOST_AUTO(pr, Program(&a,b) [c = b*2, a = c]);
> pr(ra, 4);
> I am seeking some advice on implementing the "evaluator" for this language. In particular, where do I store the values corresponding to these free variables? To draw a parallel with Boost.Phoenix: in Phoenix, arg1, arg2, etc. are uniquely typed, and the evaluator just creates a fusion::vector<...> with the values passed in, and argN evaluation just returns fusion::at_c<N>(env.args()). In my case, I could create a unique ID for each free variable, create a std::map from ID to value, and use that as the environment. The evaluator for the free variables can then look up values in that map.

Tricky. I can imagine a scheme where, as you build the program expression, you walk the expression and keep a map from parameter addresses (e.g. int_*) to monotonically increasing slot numbers; as you go, you can replace the parameters with their slot numbers. You end up with a program where each "a" is replaced with "slot(0)", each "b" with "slot(2)" and each "c" with "slot(3)" (where "slot" is some type that wraps a runtime int). Later, when evaluating the program, the parameters to the program are put in a std::vector, and the evaluator knows to evaluate "slot" terminals by indexing into the vector. Handling out parameters requires special handling, as would handling expressions where not all parameters have the same type. This is essentially the same as what you describe above, except the expensive mapping is done by the code that builds the program rather than the code that evaluates it.

> But I consider this too heavyweight. In contrast, the fusion::vector<...> would incur very little overhead at run time. I will be grateful for any suggestions on other ways to implement the evaluator.

Hope the above gets you moving in the right direction,

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

> Tricky. I can imagine a scheme where, as you build the program expression, you walk the expression and keep a map from parameter addresses (e.g. int_*) to monotonically increasing slot numbers; as you go, you can replace the parameters with their slot numbers. You end up with a program where each "a" is replaced with "slot(0)", each "b" with "slot(2)" and each "c" with "slot(3)" (where "slot" is some type that wraps a runtime int). Later, when evaluating the program, the parameters to the program are put in a std::vector, and the evaluator knows to evaluate "slot" terminals by indexing into the vector. Handling out parameters requires special handling, as would handling expressions where not all parameters have the same type. This is essentially the same as what you describe above, except the expensive mapping is done by the code that builds the program rather than the code that evaluates it.

Thanks, that does help. The other approach I was toying with was to hold the values "in situ", i.e., make int_ contain the datum. The biggest hurdle I am facing is the fact that Proto copies the objects held in proto::terminals whenever they have to go through a generator. For example, I need to define operator(...) for the expression returned by Program(). For that, I put it in a domain whose generator wraps the expression in a wrapper with operator(...), but during that process, the objects held at terminals get copied. So a new object is created for each occurrence of a variable in the expression, which screws up the evaluator. Any comments on trying to hold the values "in situ"?

Manjunath
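Eric's slot-numbering suggestion quoted above can be sketched without Proto as follows. Everything here is an assumption for illustration: the node and slot_node tree types, the helper functions, and the fixed void(int&, int) signature with the output in slot 0 and the input in slot 1 (slots are numbered consecutively here). The std::map from variable address to slot is consulted only while the program is being built; evaluation merely indexes a per-call std::vector<int>.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hand-rolled sketch of build-time slot numbering (no Proto). All names here
// (int_, node, slot_node, assign_slots, program, ...) are hypothetical.

struct int_ {};  // a free variable: no storage, only an address (its identity)

// Front-end tree, as built from user code; variables are referenced by address.
struct node {
    enum kind_t { k_var, k_lit, k_mul, k_assign, k_seq } kind;
    const int_* var = nullptr;       // k_var
    int value = 0;                   // k_lit
    std::shared_ptr<node> lhs, rhs;  // k_mul, k_assign (lhs is a var), k_seq
};
using node_ptr = std::shared_ptr<node>;

node_ptr make(node::kind_t k, node_ptr l = nullptr, node_ptr r = nullptr) {
    auto n = std::make_shared<node>();
    n->kind = k; n->lhs = l; n->rhs = r;
    return n;
}
node_ptr var(const int_& v) { auto n = make(node::k_var); n->var = &v; return n; }
node_ptr lit(int x) { auto n = make(node::k_lit); n->value = x; return n; }

// Back-end tree: same shape, but each variable has become slot(n).
struct slot_node {
    node::kind_t kind;
    std::size_t slot = 0;
    int value = 0;
    std::shared_ptr<slot_node> lhs, rhs;
};

// Build time: walk the tree once, numbering distinct addresses in order of first use.
std::shared_ptr<slot_node> assign_slots(const node& n,
                                        std::map<const int_*, std::size_t>& slots) {
    auto out = std::make_shared<slot_node>();
    out->kind = n.kind;
    out->value = n.value;
    if (n.kind == node::k_var)  // emplace is a no-op if the address is already numbered
        out->slot = slots.emplace(n.var, slots.size()).first->second;
    if (n.lhs) out->lhs = assign_slots(*n.lhs, slots);
    if (n.rhs) out->rhs = assign_slots(*n.rhs, slots);
    return out;
}

// Run time: a slot is just an index into this call's own vector.
int eval(const slot_node& n, std::vector<int>& env) {
    switch (n.kind) {
    case node::k_var:    return env[n.slot];
    case node::k_lit:    return n.value;
    case node::k_mul:    return eval(*n.lhs, env) * eval(*n.rhs, env);
    case node::k_assign: return env[n.lhs->slot] = eval(*n.rhs, env);
    case node::k_seq:    eval(*n.lhs, env); return eval(*n.rhs, env);
    }
    return 0;
}

// Program(&out, in)[body] for the fixed signature void(int&, int).
// Assumes out and in are distinct variables (slots 0 and 1).
struct program {
    std::shared_ptr<slot_node> body;
    std::size_t num_slots = 0;
    void operator()(int& out, int in) const {
        std::vector<int> env(num_slots, 0);  // fresh storage for every call
        env[1] = in;
        eval(*body, env);
        out = env[0];
    }
};

program make_program(const int_& out, const int_& in, const node& body) {
    std::map<const int_*, std::size_t> slots{{&out, 0}, {&in, 1}};
    program p;
    p.body = assign_slots(body, slots);
    p.num_slots = slots.size();
    return p;
}

// Usage: c = b*2, a = c, with a as the output and b as the input.
int run_example(int input) {
    int_ a, b, c;
    node_ptr body = make(node::k_seq,
                         make(node::k_assign, var(c), make(node::k_mul, var(b), lit(2))),
                         make(node::k_assign, var(a), var(c)));
    program pr = make_program(a, b, *body);
    int ra = 0;
    pr(ra, input);
    return ra;
}
```

Because each call creates its own environment vector, the built program holds no mutable state of its own, and it no longer depends on the addresses of a, b and c once it has been built.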

On 3/18/2010 1:47 PM, Manjunath Kudlur wrote:
>> Tricky. I can imagine a scheme where, as you build the program expression, you walk the expression and keep a map from parameter addresses (e.g. int_*) to monotonically increasing slot numbers; as you go, you can replace the parameters with their slot numbers. You end up with a program where each "a" is replaced with "slot(0)", each "b" with "slot(2)" and each "c" with "slot(3)" (where "slot" is some type that wraps a runtime int). Later, when evaluating the program, the parameters to the program are put in a std::vector, and the evaluator knows to evaluate "slot" terminals by indexing into the vector. Handling out parameters requires special handling, as would handling expressions where not all parameters have the same type. This is essentially the same as what you describe above, except the expensive mapping is done by the code that builds the program rather than the code that evaluates it.
>
> Thanks, that does help. The other approach I was toying with was to hold the values "in situ", i.e., make int_ contain the datum.

I don't see how. When you create your "program", the data are not yet available. They are only supplied later when you are ready to execute your "program". I must be misunderstanding what you're suggesting.

> The biggest hurdle I am facing is the fact that Proto copies the objects held in proto::terminals whenever they have to go through a generator. For example, I need to define operator(...) for the expression returned by Program().

No, you need to define operator[] for the expression returned by Program(). You need operator() on whatever is returned by Program(...)[...].

> For that, I put it in a domain whose generator wraps the expression in a wrapper with operator(...), but during that process, the objects held at terminals get copied.

They shouldn't, unless your generator is doing something extra.

> So a new object is created for each occurrence of a variable in the expression, which screws up the evaluator. Any comments on trying to hold the values "in situ"?

No comments, other than that it feels vaguely wrong to me.

I was intrigued enough by this problem to have a go at implementing my suggestion above. It seems to work (see attached). This is a rough mock-up and needs much work. In particular, see the TODO for the overloaded operator() in program_expr. But on the whole, this seems workable.

Note, you'll need to use the latest version of Proto from trunk. I had to fix two bugs to get this to work. :-(

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
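The distinction Eric draws (operator[] on the result of Program(...), operator() on the result of Program(...)[...]) can be seen in a plain C++ analogue. This is not how Proto's extension mechanism is written: program_params, program_expr and the zero-argument Program() are hypothetical, parameter recording is omitted, and a lambda stands in for the compiled DSL body.

```cpp
#include <cassert>
#include <utility>

// Plain C++ analogue of the two stages in Program(&a, b)[body](ra, 4); not Proto.
// program_params, program_expr and Program() are hypothetical; parameter recording
// is omitted, and a lambda stands in for the compiled DSL body.

// Stage 2: the type of Program(...)[...]; this is where operator() belongs.
template <typename Body>
struct program_expr {
    Body body;
    template <typename... Args>
    void operator()(Args&&... args) const { body(std::forward<Args>(args)...); }
};

// Stage 1: the type of Program(...); it only needs operator[] to accept a body.
struct program_params {
    template <typename Body>
    program_expr<Body> operator[](Body body) const { return {std::move(body)}; }
};

program_params Program() { return {}; }

// Usage: build the program, then call it with an output and an input argument.
int run_staged(int input) {
    auto body = [](int& out, int in) { out = in * 2; };
    auto pr = Program()[body];
    int ra = 0;
    pr(ra, input);
    return ra;
}
```

Keeping the two stages as different types means a half-built Program(...) cannot be called by mistake, and a finished program cannot be given a second body.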

>> Thanks, that does help. The other approach I was toying with was to hold the values "in situ", i.e., make int_ contain the datum.
>
> I don't see how. When you create your "program", the data are not yet available. They are only supplied later when you are ready to execute your "program". I must be misunderstanding what you're suggesting.

Sorry, I should have been clearer. My suggestion was to make program_variable look like this:

template<typename T> struct program_variable { T value; };

And in the operator(..) function of program_expr, copy the parameters to the values like so:

proto::value(proto::left(proto::left(proto::left(*this)))).value = a0;
proto::value(proto::left(proto::right(proto::left(*this)))).value = a1;

Then call program_eval()(*this), then copy back to a0 and a1.

> They shouldn't, unless your generator is doing something extra.

You are right, I spoke too soon. I am attaching the code I wrote with in situ storage, based on the code you sent. And it seems to work. I like the idea of in-place storage because everything is nicely packaged up in the expression object. No need for extra maps and vectors. Also, using the other way, handling different types becomes a problem. What if I want Program(&a, b) where a is int_ and b is double_? In-place storage just has the right type already. Please do let me know if you spot some inelegance or any other problem with this approach.

> I was intrigued enough by this problem to have a go at implementing my suggestion above. It seems to work (see attached). This is a rough mock-up and needs much work. In particular, see the TODO for the overloaded operator() in program_expr. But on the whole, this seems workable.
>
> Note, you'll need to use the latest version of Proto from trunk. I had to fix two bugs to get this to work. :-(

Thanks for the code; it sure would serve as an excellent use case for Proto users.

Manjunath
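A minimal sketch of the in-situ idea, again without Proto: program_variable<T> follows Manjunath's message (with a default member initializer added), while in_situ_program and its hand-written body are hypothetical stand-ins for program_expr and program_eval. The program holds references to the variables, so every use of a variable sees the same storage, and mixing int and double variables needs no extra machinery.

```cpp
#include <cassert>

// Sketch of "in situ" storage without Proto: each free variable owns its datum.
// program_variable follows the thread; in_situ_program is a hypothetical stand-in.

template <typename T>
struct program_variable {
    T value{};
};

// Program(&a, b)[c = b*2, a = c] with a: int, b: double, c: int, written by hand.
struct in_situ_program {
    program_variable<int>& a;     // output parameter
    program_variable<double>& b;  // input parameter
    program_variable<int>& c;     // local variable
    // Holding references (not copies) keeps every use of a variable
    // pointing at the same storage.

    void operator()(int& a0, double a1) const {
        b.value = a1;                             // copy the input in
        c.value = static_cast<int>(b.value * 2);  // body: c = b*2
        a.value = c.value;                        //       a = c
        a0 = a.value;                             // copy the output back
    }
};

// Usage: build the program over three variables and run it once.
int run_in_situ(double input) {
    program_variable<int> a, c;
    program_variable<double> b;
    in_situ_program pr{a, b, c};
    int ra = 0;
    pr(ra, input);
    return ra;
}
```

After a call, the results remain in a and c: the storage is part of the program's long-lived state rather than a per-call environment.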

On 3/19/2010 7:09 AM, Manjunath Kudlur wrote:
>>> Thanks, that does help. The other approach I was toying with was to hold the values "in situ", i.e., make int_ contain the datum.
>>
>> I don't see how. When you create your "program", the data are not yet available. They are only supplied later when you are ready to execute your "program". I must be misunderstanding what you're suggesting.
>
> Sorry, I should have been clearer. My suggestion was to make program_variable look like this:
>
> template<typename T> struct program_variable { T value; };
>
> And in the operator(..) function of program_expr, copy the parameters to the values like so:
>
> proto::value(proto::left(proto::left(proto::left(*this)))).value = a0;
> proto::value(proto::left(proto::right(proto::left(*this)))).value = a1;
>
> Then call program_eval()(*this), then copy back to a0 and a1.

OK, I see. There are 2 problems with this. One is fixable. The other isn't, but it may not be an issue depending on your use case. See below.

>> They shouldn't, unless your generator is doing something extra.
>
> You are right, I spoke too soon. I am attaching the code I wrote with in situ storage, based on the code you sent. And it seems to work. I like the idea of in-place storage because everything is nicely packaged up in the expression object. No need for extra maps and vectors.

Right.

> Also, using the other way, handling different types becomes a problem. What if I want Program(&a, b) where a is int_ and b is double_? In-place storage just has the right type already.

Right.

> Please do let me know if you spot some inelegance or any other problem with this approach.

That is an interesting approach. The two problems I see are:

1) Your use of BOOST_AUTO leaves a bunch of dangling references and creates undefined behavior. As you have it defined, program_generator no longer has the effect of deep-copying the expression tree. Try displaying typeid(p).name() and see, for instance, that the intermediate expressions are stored by reference, and the literal 2 is stored by int const&. Any attempt to use p will likely explode sooner or later. The fix is to deep-copy the expression when you wrap it in program_expr, but a simple deep-copy will break your scheme because every program_variable will be deep-copied too. A simple fix would be to make program_variable<T> a wrapper for a boost::shared_ptr<T>, but ...

2) Your program objects are not thread safe. Since they contain mutable data, you can't stick the program in, say, a boost::function, make a few copies and evaluate them concurrently within different threads. That may not be important to you, but I consider it a pretty serious shortcoming.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
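Eric's suggested fix for problem 1 can be sketched with std::shared_ptr standing in for boost::shared_ptr; program_variable_by_value and the two helper functions are hypothetical names used only for contrast. Copying a shared_ptr-based variable, as a deep copy of the expression tree would, copies only the handle, so every copy still refers to one value; copying the by-value layout gives each copy its own storage.

```cpp
#include <cassert>
#include <memory>

// Suggested fix for problem 1: the variable is a handle to shared storage.
// (std::shared_ptr is used here in place of boost::shared_ptr.)
template <typename T>
struct program_variable {
    std::shared_ptr<T> value = std::make_shared<T>();
};

// Hypothetical name for the original by-value layout, for contrast:
// copying it duplicates the datum, so each copy has independent storage.
template <typename T>
struct program_variable_by_value {
    T value{};
};

// Each helper simulates what a deep copy does to one variable:
// write through the copy, then read back through the original.
int shared_after_copy_write(int x) {
    program_variable<int> a;
    program_variable<int> a_copy = a;  // e.g. made while deep-copying the tree
    *a_copy.value = x;
    return *a.value;                   // same int: sees x
}

int by_value_after_copy_write(int x) {
    program_variable_by_value<int> a;
    program_variable_by_value<int> a_copy = a;
    a_copy.value = x;
    return a.value;                    // separate int: still 0
}
```

The same sharing is what problem 2 describes: two copies of a program built this way write to the same objects, so evaluating them concurrently on different threads would be a data race, whereas a per-call environment avoids shared mutable state.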

On Thu, Mar 18, 2010 at 4:23 PM, Eric Niebler wrote:
> That is an interesting approach. The two problems I see are:
>
> 1) Your use of BOOST_AUTO leaves a bunch of dangling references and creates undefined behavior. As you have it defined, program_generator no longer has the effect of deep-copying the expression tree. Try displaying typeid(p).name() and see, for instance, that the intermediate expressions are stored by reference, and the literal 2 is stored by int const&. Any attempt to use p will likely explode sooner or later. The fix is to deep-copy the expression when you wrap it in program_expr, but a simple deep-copy will break your scheme because every program_variable will be deep-copied too. A simple fix would be to make program_variable<T> a wrapper for a boost::shared_ptr<T>, but ...
>
> 2) Your program objects are not thread safe. Since they contain mutable data, you can't stick the program in, say, a boost::function, make a few copies and evaluate them concurrently within different threads. That may not be important to you, but I consider it a pretty serious shortcoming.

I did think about thread safety. The in-place approach is definitely not thread-safe. But under restricted conditions (single-threaded, evaluated in the same scope as the free variables) and with a reasonable optimizing compiler, the performance of this method should approach hand-coded C, no?

Manjunath
Participants (2): Eric Niebler, Manjunath Kudlur