
"Calum Grant" <calum@visula.org> wrote
Arkadiy Vertleyb wrote:
OTOH, writing a primitive optimizer, would, in our opinion do more harm than good.
By "harm" do you mean have worse performance?
Yes, and also (possibly) inability to handle more complicated cases.
Well I certainly agree that a library that couldn't handle complicated cases, or whose performance was poor, would need a rethink.
But I don't think RML is in that category. It has excellent performance and can handle very complex queries. I should exercise caution however - I await to see how well your library copes with 1000000 items, does a self-join, adds another 1000000 items and does a second self join. You sound fairly confident that you will beat me!
I am not confident about this. We didn't spent a lot of time on optimization. But if I wanted to compete with you on this particular example, I would not use the technique I showed -- I wouldn't use transactions, indexing, etc. -- I would just use a merge-join, and stopped after ten results, like you did, and it would take no time. I might beat you or not, but RTL is flexible enough to allow me to build an expression that would evaluate this (or any other) expression efficiently, with the minimal overhead. And since I do it by hand, I don't have to worry about my optimizer not being able to cope with this.
If you like you can come up with an extremely complex query - and we can compare again.
For starters, try to change the expression to something like abs(c1 - c2) == 1, and output (or just walk) _all_ the results. Then repeat this 1000 times. You can go back from 1000000 to 3000 :-)
If you're going to make claims about the performance of an approach, then you need to have benchmarks. You can't argue about air. For example the RML benchmarks
http://visula.org/relational/benchmarks.html
show RML to be matching or outperforming std::map.
Would you care to benchmark your RTL versus RML or the STL before making such claims? I think that would settle which approach did more "harm".
Note that I didn't say RML's approach is harmful. All I said is that a simplistic approach to implementing optimizer is IMO inacceptable. Are you saying your optimizer is simplistic? :-)
Of course not!
I believe that talks about performance are premature. I think it's time to talk about what can and what can't be done with this libraries. Then comes the optimization.
I don't think talking about performance is premature. If we are going to provide a practical library, then performance matters. If a user has a choice between using RTL in a real application, but it is 10 times slower than Boost.MultiIndex, then I expect most users would stick with Boost.MultiIndex. People need to know.
Of course. However I don't believe RTL even targets this niche. Relational approach is much more than just indexing on multiple fields.
On the other hand, I believe that RML is a much easier and more efficient (than std:: containers) way to manage collections. But I am biassed :-)
As for benchmarks, for such simple case as yours, you don't need RTL -- you can benchmark sorted std::vector or Boost.Multi_index. The performance of RTL will not be different, since it's based on these containers. To outperform STL is definitely not one of RTL's tasks.
For more complicated cases though, such as ones that I described in my original post, RTL does provide an efficient, index-based solution, while other libraries, AFAIK, don't.
In that case, could you give an example where RTL would be efficient, while "other libraries" would be inefficient. I have a case for you that might not be so efficient in RTL: What about indexing on multiple columns? This is what RML is designed for, otherwise you may as well just use std::map.
I don't know if we are much less efficient with multiple columns. Again, the matter of fine tuning. And again, relational approach is much more than just indexing on multiple columns. For just indexing on multiple columns one can use Boost.MultiIndex. IMO, relational approach is about providing multiple views into the same data model. And making these views most efficient. The constrained language of relational algebra allows one to achieve this. This is what RTL is about.
Although it sounds a little competitive, I think it's pretty instructive to be able to compare different implementations.
Nothing wrong about being competitive :) Regards, Arkadiy