
"Calum Grant" <calum@visula.org> wrote
Arkadiy Vertleyb wrote
Here is a simple example that illustrates this. The attached program does the following:
1) creates table of one column; 2) self-joins the table 3) selects records where columns are equal.
This by itself doesn't make any sence, but it closely approximates the task of finding close cities, stars, billiard balls, monsters and soldiers, etc.
Here is the equivalent in RML. Forget 3000 rows, what about adding 1000000 rows, then adding another 1000000 to it? What time does the equivalent RTL take?
Calum, I think you are missing my point. The above example was provided to illustrate my two statements: 1) RTL can index any (arbitrary complicated) expression, not just table; 2) RTL can _incrementally_ update any such index when the underlying table has been changed. Did I use the most efficient expression for this particular task? Of course not. I used cross-product where I would use a range (or merge) join otherwise. I did it _intentionally_, because I wanted to provide an analogy of a complicated condition, such condition for which optimization is simply not possible. I am pretty sure your library figured out how to do this query efficiently, without considering every possible combination (it didn't consider 10^12 combinations, did it?). However, what it would do if the condition was more complicated? Like calculating the distance, matching people by certain characteristics, etc.? RTL can build an index on even slow expression, after which subsequent queries become a few orders of magnitude faster. And the indexes are not broken when the tables are changed. Having said all this, RTL doesn't enforce this kind of usage. If the query needs to be executed only once, or if one doesn't want to consum memory for indexes, any appropriate relational expression can be built, with any number of indexes in different nodes, to achieve needed tradeof between speed, initial response, memory consumption, etc. My firm belief is that this kind of flexibility is simply impossible with SQL-like approach.
I don't think self-joins come up that often in real life however.
I think they are. I just gave examples where they could be (and are) used. This, however, is not about a self-join. This is about any query which is slightly more complicated than just a simple index-based selection. Once again, I don't believe this is a right time for any performance contests. They will not prove anything. I think RML will be faster in some cases, and RTL -- in the others. And a couple of days spent on optimization can dramatically change the picture. What I would like to be discussing instead is the _design principles_ behind the libraries. Regards, Arkadiy