Re: [boost] RTL: Relational Template Library

7 Oct 2005

      "Calum Grant" <calum@visula.org> wrote
...
Arkadiy Vertleyb wrote:
...
...
...
OTOH, writing a primitive optimizer, would, in our
opinion do more
harm than good.
By "harm" do you mean have worse performance?
Yes, and also (possibly) inability to handle more complicated cases.
Well I certainly agree that a library that couldn't handle complicated
cases, or whose performance was poor, would need a rethink.
But I don't think RML is in that category.  It has excellent performance
and can handle very complex queries.  I should exercise caution however
- I await to see how well your library copes with 1000000 items, does a
self-join, adds another 1000000 items and does a second self join.  You
sound fairly confident that you will beat me!
I am not confident about this.  We didn't spent a lot of time on
optimization.

But if I wanted to compete with you on this particular example, I would not
use the technique I showed -- I wouldn't use transactions, indexing, etc. -- 
I would just use a merge-join, and stopped after ten results, like you did,
and it would take no time.  I might beat you or not, but RTL is flexible
enough to allow me to build an expression that would evaluate this (or any
other) expression efficiently, with the minimal overhead.  And since I do it
by hand, I don't have to worry about my optimizer not being able to cope
with this.
...
If you like you can come up with an extremely complex query - and we can
compare again.
For starters, try to change the expression to something like abs(c1 - c2) ==
1, and output (or just walk) _all_ the results.  Then repeat this 1000
times.
You can go back from 1000000 to 3000 :-)
...
...
...
If you're going to make claims about the performance of an
approach,
then you need to have benchmarks.  You can't argue about air.  For
example the RML benchmarks
http://visula.org/relational/benchmarks.html
show RML to be matching or outperforming std::map.
Would you care to benchmark your RTL versus RML or the STL before
making such claims?  I think that would settle which
approach did more
"harm".
Note that I didn't say RML's approach is harmful.  All I said
is that a simplistic approach to implementing optimizer is
IMO inacceptable.  Are you saying your optimizer is simplistic? :-)
Of course not!
...
I believe that talks about performance are premature.  I
think it's time to talk about what can and what can't be done
with this libraries.  Then comes the optimization.
I don't think talking about performance is premature.  If we are going
to provide a practical library, then performance matters.  If a user has
a choice between using RTL in a real application, but it is 10 times
slower than Boost.MultiIndex, then I expect most users would stick with
Boost.MultiIndex.  People need to know.
Of course.

However I don't believe RTL even targets this niche.  Relational approach is
much more than just indexing on multiple fields.
...
On the other hand, I believe that RML is a much easier and more
efficient (than std:: containers) way to manage collections.  But I am
biassed :-)
...
As for benchmarks, for such simple case as yours, you don't
need RTL -- you can benchmark sorted std::vector or
Boost.Multi_index.  The performance of RTL will not be
different, since it's based on these containers.  To
outperform STL is definitely not one of RTL's tasks.
For more complicated cases though, such as ones that I
described in my original post, RTL does provide an efficient,
index-based solution, while other libraries, AFAIK, don't.
In that case, could you give an example where RTL would be efficient,
while "other libraries" would be inefficient.  I have a case for you
that might not be so efficient in RTL:  What about indexing on multiple
columns?  This is what RML is designed for, otherwise you may as well
just use std::map.
I don't know if we are much less efficient with multiple columns.  Again,
the matter of fine tuning.  And again, relational approach is much more than
just indexing on multiple columns.  For just indexing on multiple columns
one can use Boost.MultiIndex.

IMO, relational approach is about providing multiple views into the same
data model.  And making these views most efficient.  The constrained
language of relational algebra allows one to achieve this.  This is what RTL
is about.
...
Although it sounds a little competitive, I think it's pretty instructive
to be able to compare different implementations.
Nothing wrong about being competitive :)

Regards,
Arkadiy