[Boost.Parameter] advice needed for Boost.svg_plot

All,
As you can see below, I've gone a long way on letting users customize plots.
Link to code: http://svn.boost.org/trac/boost/wiki/soc/2007/VisualizationOfContainers#Comp...
Link to image: http://www.tcnj.edu/~voytko2/svg_complex.htm
Mathias Gaunard suggested I look at "Boost.Parameter", and I think that it could provide a lot of benefits to my library. The question I have is how much should I let it influence my interface?
There are functions that could clearly benefit from named parameters. plot_range() is the most obvious for me.. you could specify stroke and fill colors, shapes, etc along with the iterator information. There are also smaller gains that could be made, such as having a function for setting the styles of the legend, one for the plot ticks, one for the background, etc.
What is the best practice for this? Should the use of named parameters be ubiquitous, or should I only use them where there are huge benefits? I want to make sure I stick with good practices before I start rewriting vast swaths of my headers.
Thank you for your help,
Jake

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jake Voytko
Sent: 27 June 2007 03:08
To: boost@lists.boost.org
Subject: [boost] [Boost.Parameter] advice needed for Boost.svg_plot
As you can see below, I've gone a long way on letting users customize plots.
Mathias Gaunard suggested I look at "Boost.Parameter", and I think that it could provide a lot of benefits to my library. The question I have is how much should I let it influence my interface?
There are functions that could clearly benefit from named parameters.
My opinion, FWIW, is that the layout you have is nice and intuitive. I especially like the way the chaining allows you to group various features together in a way that makes it easy to read.
Although named parameters might be nice - and would have been really nice if they had been included in the original ++ when C was incremented, since the Standards people have not been persuaded to add named parameters to the language (because there are other more important faults, and there is trouble with mixing with positional parameters and defaults), named parameters remain unfamiliar to most users.
So I would not get diverted to this unless the benefits are *really* compelling, and I can't see that they are. Your time is limited and despite your excellent progress, there is much to do :-)
(There is also the potential downside of longer compile times and perhaps complications to get working with all compilers).
Paul
---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

My opinion, FWIW, is that the layout you have is nice and intuitive.
I especially like the way the chaining allows you to group various features together in a way that makes it easy to read.
Although named parameters might be nice - and would have been really nice if they had been included in the original ++ when C was incremented, since the Standards people have not been persuaded to add named parameters to the language (because there are other more important faults, and there is trouble with mixing with positional parameters and defaults), named parameters remain unfamiliar to most users.
This was one of my concerns for whether or not I should use these.
So I would not get diverted to this unless the benefits are *really* compelling, and I can't see that they are. Your time is limited and despite your excellent progress, there is much to do :-)
Here's my case for why the benefits are compelling for plot_range(), if not other functions. Each different data point could potentially be styled many different ways. They could have their own shape, their own stroke color and fill color, their own size, 2d graphs could have line styling, line thickness, line color, whether or not there is a regression line, and whether or not the curve of the data is interpolated, to name just what I can think of off the top of my head. Either I make one monolithic function that takes in all of these arguments:
plot_range(my_plot, data.begin(), data.end(), circle, blue, black, 12, dotted, 2, black, false, false);
Or I create 5 or 6 plot_range() functions to take care of the combinations I feel people will need. It looks nice so far because I've been purposely picking and choosing what I implement in order to make it nice ;). The rest of the functions I may not need it as much (or at all.. it seems to work fine without named parameters), but I definitely need to come up with something better for plot_range, because otherwise each new feature I add to plot_range() will either make the number of plot_range() overloads jump up, or make it that much harder to understand what's going on.
(There is also the potential downside of longer compile times and perhaps complications to get working with all compilers).
Noted.
Jake

On 27/06/07, Jake Voytko <jakevoytko@gmail.com> wrote:
Here's my case for why the benefits are compelling for plot_range(), if not other functions. Each different data point could potentially be styled many different ways. They could have their own shape, their own stroke color and fill color, their own size, 2d graphs could have line styling, line thickness, line color, whether or not there is a regression line, and whether or not the curve of the data is interpolated, to name just what I can think of off the top of my head.
Either I make one monolithic function that takes in all of these arguments:
plot_range(my_plot, data.begin(), data.end(), circle, blue, black, 12, dotted, 2, black, false, false);
Or I create 5 or 6 plot_range() functions to take care of the combinations I feel people will need. It looks nice so far because I've been purposely picking and choosing what I implement in order to make it nice ;). The rest of the functions I may not need it as much (or at all.. it seems to work fine without named parameters), but I definitely need to come up with something better for plot_range, because otherwise each new feature I add to plot_range() will either make the number of plot_range() overloads jump up, or make it that much harder to understand what's going on.
It seems like with that much stuff for each point, it might be useful to have a struct for representing a point style. It seems plausible to want to pass that kind of thing around, and nobody will want to forward all those options manually, named parameters or not.
That way you get the non-Boost.Parameter way of faking named parameters free:
plot_range( my_plot, data.begin(), data.end(),
            point_style().set_shape(circle)
                         .set_fill(blue)
                         .set_outline(black) );
And you get the defaults for everything unspecified. You also might be able to get rid of most of the overloads and instead have functions that return point_style objects, or have some implicit constructors for the simple cases (like from a colour, to do the same thing as the overload you use in the examples).
~ Scott McMurray
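Scott's suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions, not svg_plot code: the `shape` and `color` enums, the default values, the getters and the `plot_range_style()` placeholder are all made up for the example. Only the names `point_style`, `set_shape`, `set_fill` and `set_outline` come from the message above.

```cpp
#include <cassert>

// Hypothetical stand-ins for svg_plot's real shape and colour types.
enum shape { circle, square };
enum color { black, white, blue };

// Value type that bundles the per-point options and supplies defaults.
class point_style {
public:
    point_style() : shape_(circle), fill_(white), outline_(black) {}

    // Implicit conversion from a colour covers the simple "just a fill" case.
    point_style(color fill) : shape_(circle), fill_(fill), outline_(black) {}

    // Each setter returns *this, so calls chain in a single expression.
    point_style& set_shape(shape s)   { shape_ = s;   return *this; }
    point_style& set_fill(color c)    { fill_ = c;    return *this; }
    point_style& set_outline(color c) { outline_ = c; return *this; }

    shape get_shape() const   { return shape_; }
    color get_fill() const    { return fill_; }
    color get_outline() const { return outline_; }

private:
    shape shape_;
    color fill_;
    color outline_;
};

// Placeholder for plot_range(): it only hands back the style it was given.
inline point_style plot_range_style(const point_style& style) { return style; }

inline void usage_example() {
    // Only the shape and fill are named; the outline keeps its default.
    point_style s = plot_range_style(point_style().set_shape(square).set_fill(blue));
    assert(s.get_shape() == square && s.get_fill() == blue);
    assert(s.get_outline() == black);

    // A bare colour converts implicitly to a style.
    point_style t = plot_range_style(blue);
    assert(t.get_fill() == blue && t.get_shape() == circle);
}
```

Because the style is an ordinary value, it can be built once, stored, and passed to several calls, which is the forwarding benefit Scott points out.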

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Scott McMurray
Sent: 27 June 2007 14:21
To: boost@lists.boost.org
Subject: Re: [boost] [Boost.Parameter] advice needed for Boost.svg_plot
On 27/06/07, Jake Voytko <jakevoytko@gmail.com> wrote:
Here's my case for why the benefits are compelling for plot_range(),
<snip>
It seems like with that much stuff for each point, it might be useful to have a struct for representing a point style. It seems plausible to want to pass that kind of thing around,
and nobody will want to forward all those options manually, named parameters or not.
This seems an important point - named parameters don't fully solve the problem.
That way you get the non-Boost.Parameter way of faking named parameters free:
plot_range( my_plot, data.begin(), data.end(),
            point_style().set_shape(circle)
                         .set_fill(blue)
                         .set_outline(black) );
And you get the defaults for everything unspecified. You also might be able to get rid of most of the overloads and instead have functions that return point_style objects, or have some implicit constructors for the simple cases (like from a colour, to do the same thing as the overload you use in the examples).
Sounds well worth exploring. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

on Tue Jun 26 2007, "Jake Voytko" <jakevoytko-AT-gmail.com> wrote:
What is the best practice for this? Should the use of named parameters be ubiquitous, or should I only use them where there are huge benefits? I want to make sure I stick with good practices before I start rewriting vast swaths of my headers.
I don't have an opinion, but I thought you should be aware of some of the actual benefits over the other approach. Some are described in this posting, and others are described in that thread. http://news.gmane.org/find-root.php?message_id=%3cuekiyo6n5.fsf%40boost%2dco... Maybe that will help you to make up your mind. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

Thank you for that resource.. I took a quick look at it and it looks like it'll help dramatically. I haven't made a decision yet, and my priorities right now are such that I can defer the decision, but I think this thread will help me put my decision in proper context.
Jake
On 7/2/07, David Abrahams <dave@boost-consulting.com> wrote:
on Tue Jun 26 2007, "Jake Voytko" <jakevoytko-AT-gmail.com> wrote:
What is the best practice for this? Should the use of named parameters be ubiquitous, or should I only use them where there are huge benefits? I want to make sure I stick with good practices before I start rewriting vast swaths of my headers.
I don't have an opinion, but I thought you should be aware of some of the actual benefits over the other approach. Some are described in this posting, and others are described in that thread.
http://news.gmane.org/find-root.php?message_id=%3cuekiyo6n5.fsf%40boost%2dco...
Maybe that will help you to make up your mind.
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com

I decided that I wasn't getting anywhere without a case study to help make my decision, so I decided to model my 2D plot_range() function, with a few features I'd like to add but haven't yet. I figure there are four different ways to implement this: monolithic function without Boost.Parameter, monolithic function with Boost.Parameter, refactored function without Boost.Parameter, or refactored function with Boost.Parameter.
// human_age is a functor that converts a human object's age into a double
// multimap<double, human> data;
------------------------
METHOD 1: monolithic function:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
This is obviously unacceptable, readability wise. Even though I designed the interface an hour ago, it takes me a few seconds to realize what everything is. However, the cost of this function is no more than copying the <double, double> contents of data. default_functor represents the fact that we don't need a functor to cast a double to a double. There is no easy way to indicate to the interface that we'd like to use the default value for the first functor parameter.
-----------------------------
METHOD 2: monolithic function with Boost.Parameter:
plot_range(my_plot, data.begin(), data.end(), functor2 = human_age, point_style = circle, fill = orange, stroke = red, stroke_width = 3, size = 10);
This is a little better, as far as readability is concerned. The untrained observer can easily guess what this function is trying to do, and may need to look in the documentation to see what functor2 is, if they are unfamiliar with such things. Best of all, we are able to use a default value for functor1, and we still have roughly the same speed as METHOD 1.
----------------------
METHOD 3: Refactored function
point_style my_point(circle, orange, red);
my_point.set_stroke_width(3);
my_point.set_size(10);
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, my_point);
This is clearer, and gives the untrained eye a better idea of what is going on. However, it does have the downside of needing extra function overhead, an extra copy for two colors, two integers, and a style. We also still need to pass a placeholder (default_functor) for the first functor value. Given the infrequency with which this function will be called, as well as the amount of data that is being copied from data, I think we can consider this almost negligible.
----------------------------
METHOD 4: Refactored, using Boost.Parameter:
point_style my_point(circle, stroke=orange, fill=red, stroke_width=3, size=10);
plot_range(my_plot, data.begin(), data.end(), functor2 = human_age, style = my_point);
As far as readability is concerned, this is as good as it is going to get. It still has the added cost of the extra copies that need to be made, but we don't have to worry about defaults for the conversion functor.
----------------------------------
I'm starting to lean towards 2 or 4 as a reasonable way to implement this (though if I run into enough resistance, 3 would be my choice). I really don't like the idea of having the default functor as a placeholder for the first argument.. it feels kludgy. I may just be mistaken and not know the obvious way around that problem, and if so, please let me know :). Boost.Parameter is a part of Boost, so clearly enough people overcame the difference in syntax to give it a "yes" vote.
If anybody feels strongly about which way would be best, please speak up soon, *before* I implement my decision :)
Jake
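To make the `keyword = value` call syntax of METHOD 2 and METHOD 4 concrete, here is a small hand-rolled sketch of how such syntax can be emulated in plain C++11. It is not Boost.Parameter and does not show how Boost.Parameter is implemented; that library offers far more, such as mixing positional and named arguments. Every name here (`point_options`, `keyword`, `setter`, `make_options`, the `kw` namespace, the default values) is an assumption made up for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for svg_plot's real shape and colour types.
enum shape { circle, square };
enum color { black, white, orange, red };

// All per-point options, each with an assumed default.
struct point_options {
    shape point_shape = circle;
    color fill = white;
    color stroke = black;
    int stroke_width = 1;
    int size = 5;
};

// The result of "keyword = value": remembers which member to set and to what.
template <typename T>
struct setter {
    T point_options::* member;
    T value;
    void apply(point_options& o) const { o.*member = value; }
};

// A keyword object; its operator= builds a setter instead of assigning.
template <typename T>
struct keyword {
    T point_options::* member;
    setter<T> operator=(T value) const { return setter<T>{member, value}; }
};

namespace kw {
    const keyword<shape> point_shape  = { &point_options::point_shape };
    const keyword<color> fill         = { &point_options::fill };
    const keyword<color> stroke       = { &point_options::stroke };
    const keyword<int>   stroke_width = { &point_options::stroke_width };
    const keyword<int>   size         = { &point_options::size };
}

// Apply each setter in turn; anything not mentioned keeps its default.
inline void apply_all(point_options&) {}

template <typename First, typename... Rest>
void apply_all(point_options& o, const First& first, const Rest&... rest) {
    first.apply(o);
    apply_all(o, rest...);
}

template <typename... Args>
point_options make_options(const Args&... args) {
    point_options o;
    apply_all(o, args...);
    return o;
}

inline void usage_example() {
    // Arguments may appear in any order, and the shape is simply omitted.
    point_options o = make_options(kw::stroke = red, kw::fill = orange,
                                   kw::stroke_width = 3, kw::size = 10);
    assert(o.fill == orange && o.stroke == red);
    assert(o.stroke_width == 3 && o.size == 10);
    assert(o.point_shape == circle);  // default kept
}
```

The point of the sketch is only that the placeholder argument from METHOD 1 and METHOD 3 disappears: an option that is not named is never passed at all.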

Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
I hope that in typical usage it will automatically choose a plot style (and I would vote for allocating colours in 'resistor colour code order', though starting with brown is not often a popular choice).
Presumably the two functors are to extract the X and Y values from the object. Having 'x' and 'y' in the names would make this clear, if this is the case. I would be interested to see how the various lambda techniques could be used to write these functors inline.
Presumably in the case where the data is a sequence of numeric values the functor is unnecessary. In particular, I would hope that a sequence<numeric> would use the index as X (starting at 0 or 1?), and the value as Y, and container<std::pair<numeric,numeric>> would use pair.first as X and pair.second as Y. Right? (There are plenty of C++ users (though not on this list) who would be put off by the concept of functors, and who would benefit from having it 'just work').
So hopefully in many cases it reduces to:
plot_range(my_plot, data);
which I certainly find acceptable without any extra magic.
When you do need to use more non-default parameters, I think that I would use a style object like this:
plot_range(my_plot, data, plot_style(red, circle));
Since the types for the colours and point shapes are distinct you can save the user from having to remember a correct order by providing plot_style constructors with both parameter orderings. This can extend to the stroke width. But there is a problem with the fill colour (what is that? Is it for the area under the curve? Why would you want it to be different from the stroke colour?) and also with the 'size' (what is that?), because their types are not distinct from those of the stroke colour and stroke width.
Boost.Parameter does a good job when large numbers of parameters are unavoidable. Do check on the comprehensibility of error messages and the compile time increase, as these are the most common problems (IMO) with 'advanced' things like Boost.Parameter.
Regards, Phil.
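The "just work" defaults Phil asks about can be sketched with plain overloading, with no functors visible to the user. This is an assumption-laden illustration rather than svg_plot's design: `xy`, `to_xy` and `collect_points` are hypothetical names, and starting the index at 0 is an arbitrary choice, since the message leaves "0 or 1?" open.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<double, double> xy;

// A plain numeric element: X is its position in the sequence, Y is the value.
inline xy to_xy(std::size_t index, double value) {
    return xy(static_cast<double>(index), value);
}

// A pair element: X is .first and Y is .second; the index is ignored.
template <typename A, typename B>
xy to_xy(std::size_t, const std::pair<A, B>& p) {
    return xy(static_cast<double>(p.first), static_cast<double>(p.second));
}

// Walk any container and let overload resolution pick the right rule.
template <typename Container>
std::vector<xy> collect_points(const Container& data) {
    std::vector<xy> points;
    std::size_t index = 0;
    for (typename Container::const_iterator it = data.begin();
         it != data.end(); ++it, ++index) {
        points.push_back(to_xy(index, *it));
    }
    return points;
}

inline void usage_example() {
    std::vector<double> values;
    values.push_back(2.5);
    values.push_back(4.0);
    std::vector<xy> a = collect_points(values);
    assert(a[1].first == 1.0 && a[1].second == 4.0);  // index used as X

    std::vector<std::pair<int, int> > pairs;
    pairs.push_back(std::make_pair(3, 9));
    std::vector<xy> b = collect_points(pairs);
    assert(b[0].first == 3.0 && b[0].second == 9.0);  // first/second used
}
```

A user-supplied conversion functor would then only be needed for element types that match neither rule, such as the pair<double, human> case discussed in this thread.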

On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits. First, you can select a small subset of your data if you'd like that to be plotted (for example, plotting a single year out of 100 years of data). Second, the interface itself almost completely weeds out noncompliant functions, as functions without iterator support won't compile, and the error message will (usually) show on the line in the user's code where the error is. A minor benefit is also location of compiler errors. Looking at:
plot(my_plot, data, ...);
Instead of giving a compiler error at the call for plot(), it would report the error inside of plot(). That's a minor issue, as it's still relatively clear why the error is occurring, but the line of the error is still not at the location of the error. And granted, the iterator interface doesn't ward this off altogether (as it's possible for functions to have a begin() and end() that return the same data type, but not support a ++() operator).
I hope that in typical usage it will automatically choose a plot style (and I would vote for allocating colours in 'resistor colour code order', though starting with brown is not often a popular choice).
Absolutely.. this is a worst case example, a perfectionist user who wants to customize everything. Defaults are one of the things that I am working on this week, and they do drastically cut down what the user has to specify. Defaults would also be a benefit of named parameters.. users don't have to specify values they don't want to specify.
Presumably the two functors are to extract the X and Y values from the object. Having 'x' and 'y' in the names would make this clear, if this is the case.
Noted.. I was having trouble coming up with a clear name.
I would be interested to see how the various lambda techniques could be used to write these functors inline.
That's a good idea.
Presumably in the case where the data is a sequence of numeric values the functor is unnecessary.
That is correct. The functors (which aren't even supported yet) are/will not be necessary unless the user wants to do something like plot a pair<double, human>.
In particular, I would hope that a sequence<numeric> would use the index as X (starting at 0 or 1?), and the value as Y, and container<std::pair<numeric,numeric>> would use pair.first as X and pair.second as Y. Right?
The pair< , > part was correct, and I will soon support sequence containers in the same fashion (I only started 2D support in the past week, so its feature set is still underdeveloped).
(There are plenty of C++ users (though not on this list) who would be put off by the concept of functors, and who would benefit from having it 'just work').
These people have the ability to just use the numeric clauses. Functor conversion is a feature that has been suggested by a few people, so I know that there is at least minimal demand. I included them in this example in order to highlight another potential problem with the plot_range() function.. needing to convert the second value of a pair< , >, but not the first.
So hopefully in many cases it reduces to:
plot_range(my_plot, data);
The current signature is
plot_range(my_plot, data.begin(), data.end(), series_title, fill_color);
but this is using the monolithic approach. I'm examining my options for the best way to expand the utility of this function, which is why I started this thread :). I think that at minimum, we need my_plot, data, and series_title, because if the user wants a legend, each series has to be called something, and requiring the user to go back and change the labels after they add a function would be awkward.
When you do need to use more non-default parameters, I think that I would use a style object like this:
plot_range(my_plot, data, plot_style(red, circle));
Since the types for the colours and point shapes are distinct you can save the user from having to remember a correct order by providing plot_style constructors with both parameter orderings. This can extend to the stroke width. But there is a problem with the fill colour; what is that? Is it for the area under the curve? Why would you want it to be different from the stroke colour? - and also for the 'size'; what is that?, because they are not distinct.
To use a square as an example, the outline of the square is the stroke, and the inside of the square is the fill. These elements apply to most SVG visual objects, so I stuck with the SVG naming convention. The reason you'd want the fill to differ from the stroke color is so that it shows up more easily.. you can have a background color of white, a stroke color of black, and a fill color of white, and get empty circles that represent your data point.
Boost.Parameter does a good job when large numbers of parameters are unavoidable. Do check on the comprehensibility of error messages and the compile time increase, as these are the most common problems (IMO) with 'advanced' things like Boost.Parameter.
Good thinking!
Jake

Jake Voytko wrote:
On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits. First, you can select a small subset of your data if you'd like that to be plotted (for example, plotting a single year out of 100 years of data).
I'd recommend looking at boost::range here. The common case would be to plot an entire range, but users can specify sub-ranges as std::pair vector or whatever, and your code need not worry the slightest bit. Just call boost::begin & boost::end. :) boost::range actually works very nicely, and when you add the possibility of filtered/permutated/transformed ranges, the mind boggles quite a bit about how sexy it could be. :)
Second, the interface itself almost completely weeds out noncompliant functions, as functions without iterator support won't compile, and the error message will (usually) show on the line in the user's code where the error is.
I suppose I don't know enough about svg_plot's design. What noncompliant functions do you refer to?
A minor benefit is also location of compiler errors. Looking at:
plot(my_plot, data, ...);
Instead of giving a compiler error at the call for plot(), it would report the error inside of plot(). That's a minor issue, as it's still relatively clear why the error is occurring, but the line of the error is still not at the location of the error. And granted, the iterator interface doesn't ward this off altogether (as it's possible for functions to have a begin() and end() that return the same data type, but not support a ++() operator).
But when we get concepts, it will be ok again, so until then you just have to get used to it. :) /Marcus

On 7/4/07, Jake Voytko <jakevoytko@gmail.com> wrote:
On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits. First, you can select a small subset of your data if you'd like that to be plotted (for example, plotting a single year out of 100 years of data).
You can do the same with a range based interface, you just need to pass to boost::make_iterator_range() the iterators defining your subrange. Most of the time you might want to plot a whole container, so the range based algorithm is easier to use. Not to mention that you can chain the results of lazy algorithms:
plot_range(my_plot, filtered(data, point_selector));
It is hard to do the same with an iterator interface. In general range based interfaces are superior to iterator based ones. Range based algorithms might even appear in the next C++ standard.
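A rough sketch of the range based interface described above, using only the standard library so that it stands alone. The `plot` struct, `iterator_pair_range` and `make_range` are hypothetical stand-ins written for this example; in real code the helper's role would be played by boost::make_iterator_range(), as the message says. Lazy filtering is not covered here.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical plot type that only records the Y values it is given.
struct plot {
    std::vector<double> values;
};

// Minimal stand-in for an iterator range: just a [first, last) pair.
template <typename Iterator>
struct iterator_pair_range {
    Iterator first;
    Iterator last;
    Iterator begin() const { return first; }
    Iterator end() const { return last; }
};

template <typename Iterator>
iterator_pair_range<Iterator> make_range(Iterator first, Iterator last) {
    iterator_pair_range<Iterator> r = { first, last };
    return r;
}

// One function accepts containers, arrays and sub-ranges alike.
template <typename Range>
void plot_range(plot& p, const Range& r) {
    for (auto it = std::begin(r); it != std::end(r); ++it) {
        p.values.push_back(static_cast<double>(*it));
    }
}

inline void usage_example() {
    const double raw[] = { 1, 2, 3, 4, 5 };
    std::vector<double> data(raw, raw + 5);

    plot whole;
    plot_range(whole, data);  // common case: the whole container
    assert(whole.values.size() == 5);

    plot part;
    plot_range(part, make_range(data.begin() + 1, data.begin() + 3));
    assert(part.values.size() == 2 && part.values[0] == 2.0);

    plot from_array;
    plot_range(from_array, raw);  // built-in arrays work too
    assert(from_array.values.size() == 5);
}
```

The sub-range case keeps the "one year out of 100" ability of the iterator interface while the common case shrinks to a single argument.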
Second, the interface itself almost completely weeds out noncompliant functions, as functions without iterator support won't compile, and the error message will (usually) show on the line in the user's code where the error is.
SFINAE using a hypothetical is_range trait [1] can help here.
HTH,
gpd
[1] someone was working on a container traits library a while ago, and I believe a subset of it is already present in the detail of some boost library.
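One way such a trait could look, as a sketch only: `is_range` below is written from scratch for this example and is not the container traits library mentioned in the footnote, and `count_points` is a hypothetical stand-in for a plotting function. The trait asks whether std::begin() and std::end() can be called on the type.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

namespace detail {
    // Chosen when std::begin(t) and std::end(t) are both well-formed.
    template <typename T>
    auto test_range(int) -> decltype(std::begin(std::declval<T&>()),
                                     std::end(std::declval<T&>()),
                                     std::true_type());

    // Fallback for every other type.
    template <typename T>
    std::false_type test_range(...);
}

// Derives from std::true_type or std::false_type.
template <typename T>
struct is_range : decltype(detail::test_range<T>(0)) {};

// Takes part in overload resolution only when Range really is a range.
template <typename Range>
typename std::enable_if<is_range<Range>::value, std::size_t>::type
count_points(const Range& r) {
    return static_cast<std::size_t>(std::distance(std::begin(r), std::end(r)));
}

static_assert(is_range<std::vector<double> >::value, "vector is a range");
static_assert(is_range<int[3]>::value, "built-in array is a range");
static_assert(!is_range<int>::value, "int is not a range");

inline void usage_example() {
    std::vector<double> data(3, 1.5);
    assert(count_points(data) == 3);
    // count_points(42) would be rejected at the call site, because the
    // only overload is removed from consideration for non-range types.
}
```

With the function disabled for non-ranges, the compiler reports "no matching function" on the user's own line, which addresses the error-location concern raised earlier in the thread.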

On 7/4/07, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On 7/4/07, Jake Voytko <jakevoytko@gmail.com> wrote:
On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits. First, you can select a small subset of your data if you'd like that to be plotted (for example, plotting a single year out of 100 years of data).
You can do the same with a range based interface, you just need to pass to boost::make_iterator_range() the iterators defining your subrange. Most of the time you might want to plot a whole container, so the range based algorithm is easier to use. Not considering that you can chain the result of lazy algorithms:
plot_range(my_plot, filtered(data, point_selector));
+1 for range based plot
"I want it to do this" example:
----------------------------------------------------------------
boost::bimap<float,float> bm;
assign::insert(bm) (1,123) (2,345) (3,184) (4, 256) (5, 241);
plot_range(my_plot, bm.left.range( 2 <= _key, _key < 5 ) );
----------------------------------------------------------------
Best regards
Matias

On 7/4/07, Matias Capeletto <matias.capeletto@gmail.com> wrote:
On 7/4/07, Giovanni Piero Deretta <gpderetta@gmail.com> wrote:
On 7/4/07, Jake Voytko <jakevoytko@gmail.com> wrote:
On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits. First, you can select a small subset of your data if you'd like that to be plotted (for example, plotting a single year out of 100 years of data).
You can do the same with a range based interface, you just need to pass to boost::make_iterator_range() the iterators defining your subrange. Most of the time you might want to plot a whole container, so the range based algorithm is easier to use. Not considering that you can chain the result of lazy algorithms:
plot_range(my_plot, filtered(data, point_selector));
+1 for range based plot
"I want it to do this" example:
----------------------------------------------------------------
boost::bimap<float,float> bm;
assign::insert(bm) (1,123) (2,345) (3,184) (4, 256) (5, 241);
plot_range(my_plot, bm.left.range( 2 <= _key, _key < 5 ) );
----------------------------------------------------------------
I was not aware of the existence of Boost.Range, and it looks like it's a very nice library. I'll have to explore its use a little further.
So now I have the interface for plot_range() narrowed to
plot(my_plot, my_container, title, ...);
and
plot_range(my_plot, my_range, title, ...);
but now I'm back in the same boat I was in at the beginning of the thread.. how to display the rest of the data. There are 3 more areas of concern for the interface of the plot functions:
* conversion functors for arguments 1 and 2 of a pair< , >
* color / style information for each individual point (circle/square, stroke color, fill color, size)
* color / style information for all lines between each point (show at all?, interpolated curve?, stroke color, fill color, stroke width, dotted?)
Even if I refactor each of those into their related categories, there are 8 possible combinations of needing/not needing them. The simple case now looks simple, but I'm not any further on the complex cases. To me, this looks like a job for Boost.Parameter :)
Jake

Jake Voytko wrote:
I was not aware of the existence of Boost.Range, and it looks like it's a very nice library. I'll have to explore its use a little further
So now I have the interface for plot_range() narrowed to
plot(my_plot, my_container, title, ...); and plot_range(my_plot, my_range, title, ...);
There's no need to differentiate between the two. A container works as a range too. Cheers, /Marcus

Jake Voytko wrote:
On 7/4/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Jake Voytko wrote:
plot_range(my_plot, data.begin(), data.end(), default_functor, human_age, circle, orange, red, 3, 10);
It would be great to be able to write just 'data' in place of 'data.begin(), data.end()'. Is there anything stopping this?
The STL algorithm functions are used as the basis here, and it carries all of the same benefits.
..and the same disadvantage, namely verbosity in the common case. I'm only suggesting that passing the whole container should be available as an alternative, not instead of the begin-end pair. I seem to recall that making the STL algorithms accept whole containers has been proposed, but the only thing I can find now is the item "Container-based algorithms" at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1901.html. Maybe someone else knows what I'm thinking of? Phil.
participants (8)
- David Abrahams
- Giovanni Piero Deretta
- Jake Voytko
- Marcus Lindblom
- Matias Capeletto
- Paul A Bristow
- Phil Endecott
- Scott McMurray