[relational] Relational Model Library

Hi all, I have recently completed a library for implementing relational models in C++. Using template metaprogramming, multi-index containers can be manipulated using SQL-style queries, while retaining type-safety and efficiency. The web page is http://visula.org/relational This might be an appropriate interface for the database library that was recently discussed. This is not the same as Relational Template Library, which uses an entirely different interface. Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML? Many thanks, Calum

On 09/27/2005 03:24 PM, Calum Grant wrote: [snip]
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML? http://visula.org/relational/implementation.html contains:
The reason this has to be a macro is because it needs to work with names, not just types. It was felt that the benefits of being able to write customer.name or customers.name instead of column<1>(customers) or col<1,1>() outweigh the inherent disadvantages of using a macro.
Why not use an enumerator, column<name>(customers), instead of a number, column<1>(customers)? Would this eliminate the need for the macro? This idea of using enumerators for easier reading was one motivation for the indexed_types library in the sandbox: http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/inde...

-----Original Message-----
[mailto:boost-bounces@lists.boost.org] On Behalf Of Larry Evans
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML? http://visula.org/relational/implementation.html contains:
The reason this has to be a macro is because it needs to work with names, not just types. It was felt that the benefits of being able to write customer.name or customers.name instead of column<1>(customers) or col<1,1>() outweighs the inherent disadvantages of using a macro.
why not use an enumerator:
column<name>(customers)
instead of a number:
column<1>(customers)
? Would this eliminate the macro need?
An early RML did use tuples indexed in the way you describe. I did use enums - writing magic numbers is very bad for readability and maintainability. Personally I prefer to read customers.name instead of column<name>(customers). There is also a big problem with enums - being able to associate the correct enum with the correct type. I.e. the "name" column may be 2 for customers, and 3 for products. So you really need column<customer_name>(customers) and column<product_name>(products). Calum
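[Editorial note] To make the collision concrete, here is a small sketch (hypothetical code, not RML's actual interface) of what the per-table qualification ends up looking like once the enumerators are nested inside each row type:

// Hypothetical sketch, not RML code: the logical column "name" has a
// different index in each table, so the enumerator must live inside
// (and be qualified by) the row type it belongs to.
#include <boost/tuple/tuple.hpp>
#include <iostream>
#include <string>

struct customers_row {
    enum { id, name };                              // name == 1
    typedef boost::tuple<int, std::string> tuple_type;
};
struct products_row {
    enum { id, price, name };                       // name == 2
    typedef boost::tuple<int, double, std::string> tuple_type;
};

// column<Row::name, Row>(row) rather than customers.name -- the verbosity
// Calum is trying to avoid with the macro-generated members.
template<int Col, class Row>
typename boost::tuples::element<Col, typename Row::tuple_type>::type const &
column(const typename Row::tuple_type &row)
{
    return boost::get<Col>(row);
}

int main()
{
    customers_row::tuple_type c(1, "Alice");
    products_row::tuple_type  p(7, 9.99, "Widget");
    std::cout << column<customers_row::name, customers_row>(c) << "\n"
              << column<products_row::name,  products_row >(p) << "\n";
    return 0;
}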

On 09/27/2005 05:05 PM, Calum Grant wrote:
-----Original Message-----
[mailto:boost-bounces@lists.boost.org] On Behalf Of Larry Evans
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
http://visula.org/relational/implementation.html contains:
The reason this has to be a macro is because it needs to work with names, not just types. It was felt that the benefits of being able to write customer.name or customers.name instead of column<1>(customers) or col<1,1>() outweighs the inherent disadvantages of using a macro.
why not use an enumerator:
column<name>(customers)
instead of a number:
column<1>(customers)
? Would this eliminate the macro need?
An early RML did use tuples indexed in the way you describe. I did use enums - writing magic numbers is very bad for readibility and
^^^^^^^^^^^^^ I guess you mean an enumeration here?
maintainability. Personally I prefer to read
customers.name
instead of
column<name>(customers)
Well, how about: customers.get<name>() IMO, that's only marginally more verbose and just as readable.
There is also a big problem with enums - being able to associate the correct enum with the correct type. I.e. the "name" column may be 2 for customers, and 3 for products. So you really need column<customer_name>(customers) and column<product_name>(product).
Or, customers.get<customers::name>() OK, that's a little less readable :( , but I guess you believe the disadvantage of macro use is not as bad as the disadvantage of prefixing customers:: to the enumerator. I'm not that sure.

Larry Evans wrote:
On 09/27/2005 05:05 PM, Calum Grant wrote:
An early RML did use tuples indexed in the way you describe. I did use enums - writing magic numbers is very bad for readibility and ^^^^^^^^^^^^^ I guess you mean an enumeration here?
I read it to mean he used enums *because* writing magic numbers is bad. However, he prefers to read ...
maintainability. Personally I prefer to read
customers.name
instead of
column<name>(customers)
Or, customers.get<customers::name>()
OK, that's a little less readable :( , but I guess you believe the disadvantage of macro use is not as bad as the disadvantage of prefixing customers:: to the enumerator. I'm not that sure.
I'm very interested in this library and have some thoughts on Larry's reply.
A benefit of Domain-Specific Languages is that you can provide a more natural and expressive syntax for the domain. Since most people's experience of relational models involves SQL, I think a syntax that (even vaguely) resembles SQL is better than what is pretty exclusively C++ syntax. Try getting a developer who prefers Java to write all those angle brackets and double colons when they could just connect to an RDBMS with JDBC and run SQL queries at runtime (several orders of magnitude slower, of course :)
Then, if you wanted to refer to the column from within a template you'd have to say "customers.template get<customers::name>()", which is syntax that no library should require from its users unless really unavoidable. Also, many queries have a large number of conditions, so the C++ syntax will get very long and so be *much* harder to read (and write!)
As for avoiding macros, IMHO using macros is fine in this context. Macros are code-generators, and that's exactly what Calum's using them for. Again, there's no reason a declaration of a table in the DSL has to look like a C++ declaration. jon

Larry Evans wrote:
On 09/27/2005 05:05 PM, Calum Grant wrote:
An early RML did use tuples indexed in the way you describe. I did use enums - writing magic numbers is very bad for readibility and
^^^^^^^^^^^^^ I guess you mean an enumeration here?
On 10/02/2005 04:00 PM, Jonathan Wakely wrote: I read it to mean he used enums *because* writing magic numbers is bad. However, he prefers to read ...
Ah! Thanks for clarifying. I guess the "magic number" would be the 0 in get<0>(a_tuple) for the first element in the tuple.
maintainability. Personally I prefer to read
customers.name
[snip]
Or, customers.get<customers::name>()
OK, that's a little less readable :( , but I guess you believe the disadvantage of macro use is not as bad as the disadvantage of prefixing customers:: to the enumerator. I'm not that sure. [snip] Also, many queries have a large number of conditions, so the C++ syntax will get very long and so is *much* harder to read (and write!)
OK. You've convinced me.

"Larry Evans" <cppljevans@cox-internet.com> wrote
Larry Evans wrote:
On 09/27/2005 05:05 PM, Calum Grant wrote:
An early RML did use tuples indexed in the way you describe. I did use enums - writing magic numbers is very bad for readibility and
^^^^^^^^^^^^^ I guess you mean an enumeration here?
On 10/02/2005 04:00 PM, Jonathan Wakely wrote: I read it to mean he used enums *because* writing magic numbers is bad. However, he prefers to read ...
Ah! Thanks for clarifying. I guess the "magic number" would be the 0 in get<0>(a_tuple) for the first element in the tuple.
maintainability. Personally I prefer to read
customers.name
[snip]
Or, customers.get<customers::name>()
It's pretty easy to achieve something like customers[customers::name()]. In RTL we do just this. Regards, Arkadiy

Jonathan Wakely wrote:
I'm very interested in this library and have some thoughts on Larry's reply.
A benefit of Domain-Specific Languages is that you can provide a more natural and expressive syntax for the domain. Since most people's experience of relational models involves SQL I think a syntax that (even vaguely) resembles SQL is better than what is pretty exclusively C++ syntax. Try getting a developer who prefers java to write all those angle brackets and double colons when they could just connect to an RDBMS with JDBC and run SQL queries at runtime (several orders of magnitude slower, of course :)
I am not sure how something like this would be structured, but what about something like Boost.Spirit/Phoenix/Lambda/Xpressive for SQL?
Also, many queries have a large number of conditions, so the C++ syntax will get very long and so is *much* harder to read (and write!)
As for avoiding macros, IMHO using macros is fine in this context. Macros are code-generators, and that's exactly what Calum's using them for. Again, there's no reason a declaration of a table in the DSL has to look like a C++ declaration.
Macros are useful tools - you only have to look at Boost.PreProcessor to see how powerful it can be! However, macros/preprocessor should be, wherever possible, restricted to implementation (like Boost.Function). As I mentioned above, having a Boost.Spirit-style SQL syntax would complement the direction that C++/Boost is going with respect to describing external constructs (RegExes, BNF grammars) within C++.
The RML database and results could be kept in tuples, so CREATE TABLE would be:

typedef boost::tuple< std::string, std::string, int > people_table;
static const int first_name = 0;
...

I only have limited knowledge of SQL syntax, but we could then have something like:

rml::sql_statement results = select_( item_<first_name> && item_<last_name> )
   .where_ [ item_<age> >= 30 ]

BOOST_FOREACH( data = results( rml_database ))
{
   std::cout << get<first_name>( data ) << std::endl;
}

- Reece

On 10/02/2005 04:53 PM, Reece Dunn wrote: [snip]
The RML database and results could be kept in tuples, so CREATE TABLE would be:
typedef boost::tuple< std::string, std::string, int > people_table; static const int first_name = 0; ...
But, as Calum pointed out in: http://lists.boost.org/Archives/boost/2005/09/94348.php I.e. the "first_name" column may be 2 for customers, and 3 for products. Hence, this would have to be qualified with the table or record name, e.g. customers::first_name products::first_name and this would lead to the readability problem which Jonathan repeated in: http://lists.boost.org/Archives/boost/2005/10/94661.php

Larry Evans wrote:
On 10/02/2005 04:53 PM, Reece Dunn wrote: [snip]
The RML database and results could be kept in tuples, so CREATE TABLE would be:
typedef boost::tuple< std::string, std::string, int > people_table; static const int first_name = 0; ...
But, as Calum pointed out in:
http://lists.boost.org/Archives/boost/2005/09/94348.php
I.e. the "first_name" column may be 2 for customers, and 3 for products. Hence, this would have to be qualified with the table or record name, e.g.
customers::first_name products::first_name
and this would lead to the readability problem which Jonathan repeated in:
True. Here, macros would be needed for something like:

BOOST_CREATE_TABLE( mytable,
   BOOST_TABLE_ROW( first_name, std::string )
   ...
);

but this would no longer be a tuple :(. Unless you have something like:

struct people
{
   typedef tuple< ... > table;
   static const int first_name = 0;
};

then the macro definition breaks :(.
- Reece

On 10/02/2005 06:07 PM, Reece Dunn wrote:
Larry Evans wrote: [snip]
and this would lead to the readability problem which Jonathan repeated in:
True. Here, macros would be needed for something like:
BOOST_CREATE_TABLE( mytable, BOOST_TABLE_ROW( first_name, std::string ) ... ); Or, something like that shown in:
http://lists.boost.org/Archives/boost/2005/09/94418.php which contains:

RM_DEFINE_ROW(xxx,((int,a_int))((float,a_float)));

where the correspondence with the above BOOST_CREATE_TABLE(...) is:

  xxx      mytable
  a_int    first_name
  int      std::string
  a_float  ?
  float    ?
but this would no longer be a tuple :(.
True. What are the advantages of a tuple that would outweigh the readability disadvantage mentioned by both Jonathan and Calum?
Unless you have something like:
struct people { typedef tuple< ... > table; static const int first_name = 0; };
then the macro definition breaks :(.
Sorry, I'm not following you. Why would you need something like the above 'struct people' when RM_DEFINE_ROW would suffice?

Reece Dunn wrote:
As I mentioned above, having a Boost.Spirit-style SQL syntax would complement the direction that C++/Boost is going with respect to describing external constructs (RegExes, BNFL grammars) within C++.
Thinking about this some more, I now have: [NOTE: These are some ideas I am throwing around and have no idea yet as to their implementability or usability.]

CREATE TABLE:

typedef tuple< std::string, std::string, int > people_row;
static const int first_name = 0;
static const int second_name = 1;
static const int age = 2;

But see the discussion about people::first_name = 2 and customer::first_name = 3.

typedef sql::table< std::string, std::string, int > people_table;
people_table database;

people_table::row_type would evaluate to tuple< std::string, std::string, int > and people_table would hold a std::vector< row_type >-like object.

SELECT:

sql::statement query = sql::select
   ( sql::item< first_name > && sql::item< second_name > )
   .where[ sql::item< age >() >= 30 ];

sql::select( ... ) would create a tuple -> tuple mapping that defines the result type. This would define an sql::table<> type. .where[ ... ] would define the binary predicate functor that returns true if the given tuple matches the criteria. Thus, you could do something like:

if( query( database[ 0 ])) ...;

ROWSET:

sql::rowset results = database.query( query );

BOOST_FOREACH( row, results )
{
    std::cout << get<first_name>( row ) << std::endl;
}

JOIN:

A join command could be something like:

sql::statement query = sql::select
   ( sql::item< first_name, 0 > && sql::item< postcode, 1 > )
   .where[ sql::item< age, 0 > == 21 ];

sql::id< id_value > defines an identifier for multi-table selects. sql::item< column_index, table_id = 0 > identifies a column to apply. Then you would do:

results = query.join( id<0>( people ) * id<1>( addresses ) );

- Reece

On 10/02/2005 06:13 PM, Reece Dunn wrote:
Reece Dunn wrote: [snip] static const int first_name = 0; static const int second_name = 1; static const int age = 2;
But see the discussion about people::first_name = 2 and customer::first_name = 3. Are you referring to:
http://lists.boost.org/Archives/boost/2005/10/94666.php ? I'm unsure because there, instead of people, there was products, and instead of customer, there was customers and also some other minor differences. [snip]
sql::select( ... ) would create a tuple -> tuple mapping that defines the result type. This would define an sql::table<> type.
OK. Now I *think* I'm beginning to understand your reluctance to use macros. The macro method can be used to create the original table with meaningful row names:

struct mytable { std::string first_name; int social_security_number; };

but, obviously, there's no way to specify the tuple -> tuple mapping mentioned above using a macro, AFAICT.

On 10/02/2005 07:00 PM, Larry Evans wrote: [snip]
but, obviously, there's no way to specify the tuple -> tuple mapping mentioned above using a macro, AFAICT.
OK, I'm probably wrong about this because Calum must have a way to do that, probably by mapping the meaningful names to column numbers and then just mapping the column numbers in the inputs to column numbers in the output table. I'm just guessing though. Calum?

Also, many queries have a large number of conditions, so the C++ syntax will get very long and so is *much* harder to read (and write!)
As for avoiding macros, IMHO using macros is fine in this context. Macros are code-generators, and that's exactly what Calum's using them for. Again, there's no reason a declaration of a table in the DSL has to look like a C++ declaration.
Macros are useful tools - you only have to look at Boost.PreProcessor to see how powerful it can be! However, macros/preprocessor sould be, wherever possible, restricted to implementation (like Boost.Function).
As I mentioned above, having a Boost.Spirit-style SQL syntax would complement the direction that C++/Boost is going with respect to describing external constructs (RegExes, BNFL grammars) within C++.
The RML database and results could be kept in tuples, so CREATE TABLE would be:
No - RML results are iterated not stored.
typedef boost::tuple< std::string, std::string, int > people_table; static const int first_name = 0; ...
I think an enum would be more appropriate.
I only have limited knowledge of SQL syntax, but we could then have something like:
rml::sql_statement results = select_( item_<first_name> && item_<last_name> ) .where_ [ item_<age> >= 30 ]
BOOST_FOREACH( data = results( rml_database )) { std::cout << get<first_name>( data ) << std::endl; }
I think the misconception here is that results are stored in a temporary table - they are not. The problem with the above is the type of rml::sql_statement. It is very complicated and would be different for each query. Specifically, the iterator is generated to be able to evaluate the query whilst making use of indexes, and is a different type each time. You would need

auto statement_results = select(people, people.age>=30);

which is significantly tidier than the Spiritesque syntax. Calum
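[Editorial note] To illustrate why the type of a query is different every time, here is a hypothetical toy, not RML's real machinery: every condition and every combination of conditions is its own template instantiation, so the composite type grows with the expression and quickly becomes unnameable by hand.

// Toy sketch of expression templates producing a distinct type per query.
#include <iostream>
#include <string>

template<int Col, class T> struct equals { T value; };
template<class L, class R> struct and_   { L left; R right; };

template<int Col, class T>
equals<Col, T> col_equals(const T &v) { equals<Col, T> e = { v }; return e; }

// Only (equals && equals) is shown here; a real library would add
// overloads so arbitrary conditions compose into ever larger types.
template<int C1, class T1, int C2, class T2>
and_<equals<C1, T1>, equals<C2, T2> >
operator&&(const equals<C1, T1> &l, const equals<C2, T2> &r)
{
    and_<equals<C1, T1>, equals<C2, T2> > c = { l, r };
    return c;
}

int main()
{
    // Spelling this type out is exactly the burden described above --
    // hence the wish for 'auto', or for passing the expression straight
    // into select() so the type never needs naming.
    and_<equals<0, int>, equals<1, std::string> > q =
        col_equals<0>(30) && col_equals<1>(std::string("Reece"));
    std::cout << q.left.value << ", " << q.right.value << std::endl;
    return 0;
}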

Macros are useful tools - you only have to look at Boost.PreProcessor to see how powerful it can be! However, macros/preprocessor sould be, wherever possible, restricted to implementation (like Boost.Function).
I think the solution is to provide both syntaxes. Macro-haters can write

typedef relational::row< unique<int>, indexed<std::string> > Person;
enum { person_id, person_name };
table<Person> people;

select(people, col<person_name>() == "Reece");

while people who don't mind macros can write

RM_DEFINE_ROW(Person, unique<int>, id, indexed<std::string>, name);
table<Person> people;

select(people, people.name == "Reece");

It seems pretty clear that there are varying tastes, so the only solution is to provide both. The notation provided first was the original approach that RML used, and was free from macros, so it would not be too much trouble to put that code back in. I will do that in the next release of RML (which will probably be a couple of weeks as I collate feedback). Thanks all for the feedback, keep it coming, Calum

One idea that could make the syntax issues more palatable is to incorporate some ideas from the soci.sf.net library: http://lists.boost.org/Archives/boost/2005/10/94680.php
I have not given thought to how easy it would be to implement it, but the idea is that the queries are valid SQL queries and then the SOCI layer translates those to the more arcane syntax/es. Simplifying and unifying the syntax should be a good thing!
On 10/3/05, Calum Grant <calum@visula.org> wrote:
Macros are useful tools - you only have to look at Boost.PreProcessor to see how powerful it can be! However, macros/preprocessor sould be, wherever possible, restricted to implementation (like Boost.Function).
I think the solution is to provide both syntaxes. Macro-haters can write
typedef relational::row< unique<int>, indexed<std::string> > Person; enum { person_id, person_name }; table<Person> people;
select(people, col<person_name>() == "Reece");
while people who don't mind macros can write
RM_DEFINE_ROW(Person, unique<int>, id, indexed<std::string>, name); table<Person> people;
select(people, people.name == "Reece");
It seems pretty clear that there are varying tastes, so the only solution is to provide both. The notation provided first was the original approach that RML used, and was free from macros, so it would not be too much trouble to put that code back in. I will do that in the next release of RML (which will probably be a couple of weeks as I collate feedback).
Thanks all for the feedback, keep it coming,
Calum

"Jose" wrote:
One idea that could make the syntax issues more palatable is to incorporate some ideas from the soci.sf.net library
http://lists.boost.org/Archives/boost/2005/10/94680.php
I have not given thought to how easy it would be to implement it, but the idea is that the queries are valid sql queries and then the soci layer translates those to the more arcane syntax/es. Simplifying and unifying the syntax should be a good thing !
The SOCI library produces a string which is passed down to the SQL engine for interpretation. This approach is very different from what RML does. /Pavel

Yes. I was only wondering if it could be possible to map (at compile time) an SQL string like

select name from people where name = "Reece"
--->
select(people, col<person_name>() == "Reece");

so that the "library user" specifies standard SQL syntax (I am interested in using RML but am not an expert!)
On 10/4/05, Pavel Vozenilek <pavel_vozenilek@hotmail.com> wrote:
"Jose" wrote:
One idea that could make the syntax issues more palatable is to incorporate some ideas from the soci.sf.net library
http://lists.boost.org/Archives/boost/2005/10/94680.php
I have not given thought to how easy it would be to implement it, but the idea is that the queries are valid sql queries and then the soci layer translates those to the more arcane syntax/es. Simplifying and unifying the syntax should be a good thing !
SOCI library produces string which is passed down to SQL engine for interpretation. This approach is very different from what RML does.
/Pavel

Yes. I was only wondering if it could be possible to map (at compile time) a sql string like select name from people where name ="Reece" ---> select(people, col<person_name>() == "Reece"); so that the "library user" specifies standard sql syntax (I am interested in using RML but am not an expert!)
One could certainly map the other way, i.e. generate an SQL string from a C++ expression. So one could implement a function that transforms an RML expression into an SQL string that could be passed to a database. The question is whether a C++-based syntax would be preferable to a string-based syntax.
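[Editorial note] As a rough illustration of that direction (all of the types and the to_sql function below are hypothetical, not part of RML), a small expression tree built by operator overloading can be walked to emit the equivalent SQL text. Quoting and escaping are ignored here.

#include <iostream>
#include <sstream>
#include <string>

struct column_ref  { std::string table, name; };
struct equals_expr { column_ref col; std::string value; };

// Overloading == builds a node of the expression tree instead of comparing.
equals_expr operator==(const column_ref &c, const std::string &v)
{
    equals_expr e = { c, v };
    return e;
}

// Walk the (single-node) tree and print the corresponding SQL.
std::string to_sql(const std::string &what, const std::string &table,
                   const equals_expr &where)
{
    std::ostringstream out;
    out << "SELECT " << what << " FROM " << table
        << " WHERE " << where.col.name << " = '" << where.value << "'";
    return out.str();
}

int main()
{
    column_ref person_name = { "people", "name" };
    // Prints: SELECT name FROM people WHERE name = 'Reece'
    std::cout << to_sql("name", "people", person_name == std::string("Reece"))
              << std::endl;
    return 0;
}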
On 10/4/05, Pavel Vozenilek <pavel_vozenilek@hotmail.com> wrote:
"Jose" wrote:
One idea that could make the syntax issues more palatable is to incorporate some ideas from the soci.sf.net <http://soci.sf.net> <http://soci.sf.net> library
http://lists.boost.org/Archives/boost/2005/10/94680.php
I have not given thought to how easy it would be to implement it, but the idea is that the queries are valid SQL queries and then the SOCI layer translates those to the more arcane syntax/es. Simplifying and unifying the syntax should be a good thing!
SOCI library produces string which is passed down to SQL engine for interpretation. This approach is very different from what RML does.
/Pavel

On 10/4/05, Calum Grant <calum@visula.org> wrote:
Yes. I was only wondering if it could be possible to map (at compile time) a sql string like select name from people where name ="Reece" ---> select(people, col<person_name>() == "Reece"); so that the "library user" specifies standard sql syntax (I am interested in using RML but am not an expert!)
One could certainly map the other way, i.e. generate a SQL string from a C++ expression. So one could implement a function that could transform a RML expression to a SQL string, that could be passed to a database. The question is whether a C++-based syntax would be preferable to a string-based syntax.
Probably the reverse mapping you mention is useful for compatibility with RDBMS engines. To me the string-based syntax for queries is good because they would be validated at compile time and the syntax is simple and very well known (this is a similar argument to the macro being good to keep the syntax easy). Now the alternatives would be "SQL string" syntax and the template-based one.

Calum, Great library, I really like how it intelligently uses the operators. One thing I find a bit messy is the use of the macros RM_DEFINE_ROW_*. Is it possible to turn these into templates? Also, is it possible to have a sort of interface into a real backend database? E.g. make the data source/sink an ODBC interface instead of your load/save CSV.
Minh
Minh Phanivong +64 2 136 7118 (NZ)
"Instead of living in the shadows of yesterday, walk in the light of today and the hope of tomorrow."

Calum,
Great Library, I really like how it intelligently uses the operators. One thing I find abit messy is the use of the macros RM_DEFINE_ROW_*. Is it possible to turn these into templates? Also, is it possible to have a sort of interface into a real backend database? eg. make the data source/sink instead of your load/save csv into an ODBC interface.
Thank you! I don't like macros either - in this case they save a lot of boilerplate. The syntax looks a little worse if you write col<x,y>() or column<x>(y), but it actually works just as well. Making macros optional would be a good idea. In terms of a real database, that is future work, but I see it as a large project and RML is currently aimed at managing in-memory containers. RML can motivate the interface for access to a real DB backend, but I'm not planning to do that at the moment. Calum

On Wed, 28 Sep 2005 22:51:03 +0100, Calum Grant wrote
In terms of a real database, that is on future work but I see that as a large project and RML is currently aimed at managing in-memory containers. RML can motivate the interface for access to a real DB backend, but I'm not planning to do that at the moment.
Might be interesting to see if it could work with the proposed shared memory library: http://boost-consulting.com/vault/index.php?&direction=0&order=&directory=Memory docs online at: http://ice.prohosting.com/newfunk/boost/libs/shmem/doc/html/index.html There's a memory-mapped file primitive in this lib. Haven't really had a chance to dig into your library, but the docs look interesting... Jeff

On Tue, Sep 27, 2005 at 09:24:24PM +0100, Calum Grant <calum@visula.org> wrote:
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
You could compare the library with SQLite. Furthermore you could implement a table that can be stored in a file, and used without reading all records into memory. Andreas Pokorny

[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Tue, Sep 27, 2005 at 09:24:24PM +0100, Calum Grant <calum@visula.org> wrote:
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
You could compare the library with sqlite.
Completely different. Most database engines are controlled via text strings, and require parsing and data transformation. This cuts all that out using C++ templates, and is therefore way faster. It's a container.
Furthermore you could implement a table that can be stored in a file, and used wihout reading all records into memory.
I think the memory-mapped file route would be the most profitable, it would probably already work. Calum

On Wed, Sep 28, 2005 at 10:50:51PM +0100, Calum Grant <calum@visula.org> wrote:
[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Tue, Sep 27, 2005 at 09:24:24PM +0100, Calum Grant <calum@visula.org> wrote:
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
You could compare the library with sqlite.
Completely different. Most database engines are controlled via text strings, and require parsing and data transformation. This cuts all that out using C++ templates, and is therefore way faster. It's a container.
Oh, sorry, my request was not formulated clearly enough. I wanted to see how RML performs compared to SQLite in a benchmark. SQLite is currently a nice and fast solution if you do not need a remote database. RML could be an even better solution, if you do not require a text SQL interface, provided that there are table types available in RML which work without being fully loaded into memory.
Furthermore you could implement a table that can be stored in a file, and used wihout reading all records into memory.
I think the memory-mapped file route would be the most profitable, it would probably already work.
Interesting, could you give a short overview of how it could already work? Regards, Andreas

[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Wed, Sep 28, 2005 at 10:50:51PM +0100, Calum Grant <calum@visula.org> wrote:
[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Tue, Sep 27, 2005 at 09:24:24PM +0100, Calum Grant <calum@visula.org> wrote:
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
You could compare the library with sqlite.
Completely different. Most database engines are controlled via text strings, and require parsing and data transformation. This cuts all that out using C++ templates, and is therefore way faster. It's a container.
Oh, sorry my request was not formulated clearly enough. I wanted to see how rml performs compared to sqlite in a benchmark. Sqlite is currently a nice and fast solution if you do not need a remote database. Rml could be an even better solution, if you do not require a text sql interface, provided that there are table types available in rml, which work without being fully loaded to memory.
Furthermore you could implement a table that can be stored in a file, and used wihout reading all records into memory.
I think the memory-mapped file route would be the most profitable, it would probably already work.
Interesting, could you give a short overview, how it could already work.
You could use an allocator with RML (or indeed any STL container) that stores the data in a memory-mapped file. This basically makes the data persistent. I actually had a project a while back that did such a thing, and got some good benchmarks comparing MySQL vs file-mapped containers: http://lightwave.visula.org/persist/bench.html and I'm sure boost::shmem::allocator is similar. Performance is very good. We could devise a benchmark and I'll implement it in RML + Persist/Shmem, you implement it in SQLite and we'll see the difference. Calum
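[Editorial note] For readers unfamiliar with the idea, here is a very rough sketch (not Persist or Shmem code; all names are hypothetical) of the kind of allocator being described: it hands out memory from a region the caller has already mapped from a file, so whatever the container allocates ends up in the file. Alignment, thread safety and reuse of freed memory are all ignored, and the approach only works if raw pointers stored in the region stay valid, i.e. the file is mapped back at the same address, which is exactly what the rest of the thread goes on to discuss.

#include <cstddef>
#include <new>

template<class T>
class mapped_region_allocator {
public:
    typedef T              value_type;
    typedef T             *pointer;
    typedef const T       *const_pointer;
    typedef T             &reference;
    typedef const T       &const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template<class U> struct rebind { typedef mapped_region_allocator<U> other; };

    // 'base' points at a region obtained from mmap()/MapViewOfFile();
    // 'offset' is a bump pointer that would itself live in the file so it
    // persists between runs.
    mapped_region_allocator(char *base, std::size_t *offset, std::size_t size)
        : base_(base), offset_(offset), size_(size) {}

    template<class U>
    mapped_region_allocator(const mapped_region_allocator<U> &other)
        : base_(other.base_), offset_(other.offset_), size_(other.size_) {}

    pointer allocate(size_type n, const void * = 0)
    {
        std::size_t bytes = n * sizeof(T);      // alignment handling omitted
        if (*offset_ + bytes > size_)
            throw std::bad_alloc();
        pointer p = reinterpret_cast<pointer>(base_ + *offset_);
        *offset_ += bytes;
        return p;
    }
    void deallocate(pointer, size_type) {}      // no free list in this sketch

    void construct(pointer p, const T &v) { new (static_cast<void *>(p)) T(v); }
    void destroy(pointer p) { p->~T(); }
    size_type max_size() const { return size_ / sizeof(T); }

    char        *base_;     // start of the mapped region
    std::size_t *offset_;   // bytes handed out so far
    std::size_t  size_;     // total size of the region
};

template<class T, class U>
bool operator==(const mapped_region_allocator<T> &a, const mapped_region_allocator<U> &b)
{ return a.base_ == b.base_; }

template<class T, class U>
bool operator!=(const mapped_region_allocator<T> &a, const mapped_region_allocator<U> &b)
{ return !(a == b); }

A container instantiated with such an allocator keeps its nodes inside the mapped file, which is the sense in which the data "basically becomes persistent".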

On 9/29/05, Calum Grant <calum@visula.org> wrote: I actually had a project a while back that did such a thing, and got
some good benchmarks comparing MySQL vs file-mapped containers: http://lightwave.visula.org/persist/bench.html and I'm sure boost::shmem::allocator is similar. Performance is very good. We could devise a benchmark and I'll implement in RML + Persist/Shmem, you implement in SQLlite and we'll see the difference.
The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up. -- Caleb Epstein caleb dot epstein at gmail dot com

The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
Yes - Persist's approach is to manage a pool of memory blocks rather than to allocate one huge one. So resizing isn't an issue. This approach could of course be used with Shmem as well. In fact I think many memory allocators allocate the heap in chunks rather than assume it's contiguous. Calum

"Caleb Epstein" wrote:
The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
It is a current limitation which may be reconsidered after the current version is finished. /Pavel

Hi to all, I've missed the thread but since I've seen Shmem is mentioned I couldn't resist...
The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
The problem is quite hard to solve if you allow shared memory to be placed at different base addresses in different processes. And performance would suffer if, on every pointer access, I had to check whether the memory segment it points to is already mapped. To identify each segment, a pointer would have to store the name of the segment and an offset, and each access would imply discovering the real address of that segment in the process accessing the pointer. Really a hard task to do, and performance would suffer a lot. I think previous efforts ("A C++ Pooled, Shared Memory Allocator For The Standard Template Library" http://allocator.sourceforge.net/) with growing shared memory use fixed memory mappings in different processes. But this is an issue I would like to solve after the first version of Shmem is presented for review (I plan to do this shortly, within two months).
Memory mapped files are another thing. Disk blocks can be dispersed on the disk but the OS will give you the illusion that all the data is contiguous. Currently in Shmem, when using memory mapped files as the memory backend, if your memory mapped file is full of data, you can grow the memory mapped file and remap it, so you have more room to work with. An in-memory DB can be easily implemented using this technique: when an insertion into any object allocated in the memory mapped file throws boost::shmem::bad_alloc, you just call:

named_mfile_object->grow(1000000/*additional bytes*/);

and the file grows and you can continue allocating objects. Take care because the OS might have changed the mapping address. In Shmem you can obtain offsets to objects to recover the new address of the remapped object. You can use the same technique with heap memory. The trick in Shmem is that to achieve maximum performance, the memory space must be contiguous. For growing memory and persistent data, memory mapped files are available in Shmem. Maybe it is not enough for a relational DB, but I would be happy to work with the RML library on this.
I've downloaded RML and I've seen that the "mt_tree" class uses raw pointers in the red-black tree algorithm. If you use memory mapped files and you store raw pointers there, the file is unusable if you don't map it again exactly at the same address where you created it. All data in the memory mapped file must be base-address independent. That's why Shmem uses offset_ptr-s and containers that accept this kind of pointers. So if we want to achieve persistence with RML we must develop base-independent containers. This is not a hard task, but porting, for example, multi_index to offset_ptr-s is not a one-day issue. Regards, Ion
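[Editorial note] For readers unfamiliar with the technique, here is a minimal sketch of the self-relative pointer idea described above (hypothetical code, not Shmem's actual offset_ptr): the pointer stores the distance from its own address to the pointee, so the stored data stays valid even if the whole region is mapped at a different base address, at the cost of an extra addition on every dereference.

#include <cstddef>

template<class T>
class offset_ptr_sketch {
public:
    offset_ptr_sketch() : offset_(0) {}
    offset_ptr_sketch(T *p) { set(p); }
    offset_ptr_sketch &operator=(T *p) { set(p); return *this; }

    T *get() const
    {
        if (offset_ == 0)
            return 0;   // 0 is reserved to mean "null"; a real offset_ptr
                        // also handles the corner case of pointing at itself
        return reinterpret_cast<T *>(
            const_cast<char *>(reinterpret_cast<const char *>(this)) + offset_);
    }
    T &operator*()  const { return *get(); }
    T *operator->() const { return get(); }

private:
    void set(T *p)
    {
        offset_ = p
            ? reinterpret_cast<const char *>(p) - reinterpret_cast<const char *>(this)
            : 0;
    }
    std::ptrdiff_t offset_;   // distance from this pointer object to the pointee
};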

The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
The problem is quite hard to solve if you allow shared memory to be placed in different base addresses in different processes. And performance would suffer if every pointer access I should check if the memory segment it points is already mapped. To identify each segment, a pointer should have the name of the segment and an offset. Each access would imply discovering the real address of such segment in the current process accessing the pointer. Really a hard task to do and performance would suffer a lot. I think previous efforts ("A C++ Pooled, Shared Memory Allocator For The Standard Template Library" http://allocator.sourceforge.net/) with growing shared memory use fixed memory mappings in different processes. But this is an issue I would like to solve after the first version of Shmem is presented to a review (I plan to do this shortly, within two months)
Memory mapped files are another thing. Disk blocks can be dispersed in the disk but the OS will give you the illusion that all data is contiguous. Currently in Shmem, when using memory mapped files as memory backend, if your memory mapped file is full of data, you can grow the memory mapped file and remap it, so you have more data to work. An in-memory DB can be easily implemented using this technique: when the insertion in any object allocated in the memory mapped file throws boost::shmem::bad_alloc, you just call:
named_mfile_object->grow(1000000/*additional bytes*/);
and the file grows and you can continue allocating objects.
Couldn't the allocator do this instead of asking the user to do it? It would be better if the container did not need special code for different allocators.
Take care because the OS might have changed the mapping address. In Shmem you can obtain offsets to objects to recover the new address of the remapped object. You can use the same technique with heap memory. The trick in Shmem is that to achieve maximum performance, the memory space must be contiguous. For growing memory, and persistent data, memory mapped files are available in Shmem. Maybe is not enough for a relational DB, but I would be happy to work with RTL library on this.
I've downloaded RML and I've seen that "mt_tree" class uses raw pointers in the red-black tree algorithm. If you use memory mapped files and you store raw pointer there, this file is unusable if you don't map it again exactly in the same address where you created it. All data in the memory mapped file must be base-address independent. That's why Shmem uses offset_ptr-s and containers that accept this kind of pointers. So if we want to achieve persistence with RTL we must develop base independent containers. This is not a hard task but porting, for example, multiindex to offset_ptr-s, is not a one day issue.
If you can make the assumption that memory will not move, it makes the implementation a lot simpler. There is a certain overhead in offsetting pointers on each pointer dereference, and the red-black tree algorithms are quite pointer intensive. mt_tree could certainly use the pointer type from the allocator, and I'll put that into my next release of RML.
Persist's approach uses a pool of mapped memory, thereby avoiding the need to move memory. [To people unfamiliar with mmap(): a file does not have to be mapped contiguously into the address space.] Allocating more memory means mapping another block, and no memory needs to be moved. Although I haven't seen it in practice, it is certainly a theoretical possibility that the OS will refuse to map the file back to the same memory addresses the next time the program is run; this is the one reason why I haven't been pushing the Persist library - I just can't guarantee its safety. My feeling is that if the address space was large enough (i.e. 64-bit) and the OS could guarantee to map to a specific address, then the offset_ptr workaround would become unnecessary.
The other problem is that other threads won't be expecting objects to move. This means that you can't have concurrent access to your memory-mapped data. Also, if the file is shared between processes and you grow the file in one process, when does another process detect the change?
My feeling is that safety is paramount, and that it is better to have a safe but slower implementation using offset_ptrs than to use absolute memory addresses and risk mmap() failure. Alternatively the application could be robust to mmap() failure, for example if the memory-mapped data could be reconstructed from another data source. You could perhaps provide two allocators in Shmem: one that uses offset_ptrs and another that does not. Regards, Calum
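[Editorial note] To make the "pool of mapped blocks" idea concrete, here is a POSIX-only sketch (hypothetical code, not Persist's): when more room is needed, the file is extended and only the new extent is mapped as a separate block, so existing mappings, and the raw pointers into them, never move during a run. The block size must be a multiple of the page size, and nothing here guarantees the same addresses on the next run, which is exactly the safety concern raised above.

#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

struct block { void *addr; std::size_t size; };

class mapped_pool {
public:
    mapped_pool(int fd, std::size_t block_size)
        : fd_(fd), block_size_(block_size), file_size_(0) {}

    // Returns the start of a freshly mapped block, or 0 on failure.
    void *grow()
    {
        // Extend the file by one block...
        if (ftruncate(fd_, static_cast<off_t>(file_size_ + block_size_)) != 0)
            return 0;
        // ...and map only the new extent; earlier blocks are left untouched.
        void *p = mmap(0, block_size_, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd_, static_cast<off_t>(file_size_));
        if (p == MAP_FAILED)
            return 0;
        file_size_ += block_size_;
        block b = { p, block_size_ };
        blocks_.push_back(b);
        return p;
    }

    ~mapped_pool()
    {
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            munmap(blocks_[i].addr, blocks_[i].size);
    }

private:
    int fd_;                     // open descriptor for the backing file
    std::size_t block_size_;     // must be a multiple of the page size
    std::size_t file_size_;      // bytes of the file mapped so far
    std::vector<block> blocks_;  // every block ever mapped, never moved
};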

Hi,
Couldn't the allocator do this instead of asking the user to do it? It would be better if the container did not need special code for different allocators.
Well, the container does nothing, because just as when new fails and std::bad_alloc is thrown, the STL container doesn't manage the exception. But maybe the allocator could try to grow the file. The problem is that currently I unmap the file and after that I map it again, so this can be a problem since the allocator itself is in the mapped file. I don't know if it's safe to increase the mapping or whether I need to unmap it first, and whether this is portable.
If you can make the assumption that memory will not move, it would make the implementation a lot simpler. There is a certain overhead in offsetting pointers on each pointer dereference, and the red-black tree algorithms are quite pointer intensive. Mt_tree could certainly use the pointer type from the allocator, and I'll put that into my next release of RML.
I understand that you might want to develop your own container, but have a look at the Shmem map/multimap/set/multiset family and see if you can use those to build your multi-index container. I read somewhere that some operating systems don't even allow mappings at fixed addresses, but I don't think this is the case for the most used ones. I think some OSes reserve some virtual addresses for dll-s and shared memory, but in theory a malloc in an application can use the memory address I need to map the shared segment another process has just allocated. But this is a thing I need to investigate.
Persist's approach uses a pool of mapped memory - thereby avoiding needing to move memory. [To people unfamiliar with mmap(): a file does not have to be mapped contiguously into the address space]. Allocating more memory means mapping another block, and no memory needs to be moved.
Sorry, I don't understand this, since I have not investigated Persist's approach, but what is a pool of mapped memory and how do you use it? When your allocator is out of memory what do you do? Increase the file's size and remap it?
Although I haven't seen it in practice, it is certainly a theoretical possibility that the OS will refuse to map the file back to the same memory addresses the next time the program is run, and this is the one reason why I haven't been pushing the Persist library because I just can't guarantee its safety.
Just think that you want to open two mapped files in the same program, created separately. You couldn't open both mapped files simultaneously.
The other problem is that other threads won't be expecting objects to move. This means that you can't have concurrent access to your memory-mapped data. Also if the file is shared between processes and you grow the file in one process, when does another process detect the change?
The mapped file approach in Shmem is not for concurrent access between processes for the moment. Obviously, it is easier to notify an application that the mapped file has grown than to allocate a new shared memory segment, discover the mapping address and hope it would be mapped just where you want.
My feeling is that safety is paramount, and that it is better to have a safe slower implementation using offset_ptrs, than to use absolute memory addresses and risk mmap() failure.
I think that with few mappings you could achieve fixed addresses, but obviously, if you let the OS choose the address it can organize memory so that you can map the maximum number of bytes and the maximum number of segments.
You could perhaps provide two allocators in Shmem: one that uses offset_ptrs and another that does not.
In Shmem's STL-like allocators (which in the end call the master allocator that manages the mapped file) the pointer type is templatized, so you can use raw pointers if you want to map the memory in the same segment and use just STL containers. I think that a growing mechanism between processes is quite complicated to achieve, but maybe I'm overlooking something. Regards, Ion

Hi Calum
I am very interested in using both your RML and Persist libraries (especially RML with Persist) to avoid an SQL DB altogether!
Some questions I have for Persist:
- Your benchmark was for the 0.9 release on a 2.4 kernel. What happens with 0.95 and a 2.6 kernel?
- The 1 million row limitation looks awfully low to me. Is the code 64-bit ready? If not, do you have any plans for x86_64?
- This seems a core library that Boost should have. Do you plan to submit it?
Jose
On 9/29/05, Calum Grant <calum@visula.org> wrote:
[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Wed, Sep 28, 2005 at 10:50:51PM +0100, Calum Grant <calum@visula.org> wrote:
[mailto:boost-bounces@lists.boost.org] On Behalf Of Andreas Pokorny On Tue, Sep 27, 2005 at 09:24:24PM +0100, Calum Grant <calum@visula.org> wrote:
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
You could compare the library with sqlite.
Completely different. Most database engines are controlled via text strings, and require parsing and data transformation. This cuts all that out using C++ templates, and is therefore way faster. It's a container.
Oh, sorry my request was not formulated clearly enough. I wanted to see how rml performs compared to sqlite in a benchmark. Sqlite is currently a nice and fast solution if you do not need a remote database. Rml could be an even better solution, if you do not require a text sql interface, provided that there are table types available in rml, which work without being fully loaded to memory.
Furthermore you could implement a table that can be stored in a file, and used wihout reading all records into memory.
I think the memory-mapped file route would be the most profitable, it would probably already work.
Interesting, could you give a short overview, how it could already work.
You could use an allocator with RML (or indeed any STL container) that stores the data in a memory-mapped file. This basically makes the data persistent.
I actually had a project a while back that did such a thing, and got some good benchmarks comparing MySQL vs file-mapped containers: http://lightwave.visula.org/persist/bench.html and I'm sure boost::shmem::allocator is similar. Performance is very good. We could devise a benchmark and I'll implement in RML + Persist/Shmem, you implement in SQLlite and we'll see the difference.
Calum

I am very interested in using both your RML and persist library (especially RML with persist) to avoid sql db altogether!
I expect it would be very good in terms of performance - though not quite so convenient in terms of all the other niceties that DBMSes provide.
Some questions I have for persist:
- Your benchmark was for 0.9 release on 2.4 kernel. What happens on 0.95 and 2.6 kernel ?
The library itself is a few years old, I haven't retested it since then.
- The 1 million row limitation looks awfully low to me. Is the code 64-bit ready ? If not, do you have any plans for x86_64 ?
I don't have 64-bit hardware to play on. I expect it would work a lot better because the 32-bit limit wouldn't be hit.
- This seems a core library that boost should have. Do you plan to submit it ?
I hadn't honestly considered it. Isn't this functionality covered by Shmem anyway? Persist is completely free - so Shmem could make use of it. I would certainly be happy to change the license from LGPL to Boost. Calum

On 9/30/05, Calum Grant <calum@visula.org> wrote:
I am very interested in using both your RML and persist library (especially RML with persist) to avoid sql db altogether!
I expect it would be very good in terms of performance - though not quite so convenient in terms of all the other niceties that DBMSes provide.
If the performance advantage is clear many will contribute the niceties !
- The 1 million row limitation looks awfully low to me. Is the code 64-bit ready ? If not, do you have any plans for x86_64 ?
I don't have 64-bit hardware to play on. I expect it would work a lot better because the 32-bit limit wouldn't be hit.
Is the code 64-bit ready ? What was limiting the scalability to 1 million rows rather than say 100 million ?
- This seems a core library that boost should have. Do you
plan to submit it ?
I hadn't honestly considered it. Isn't this functionality covered by Shmem anyway? Persist is completely free - so Shmem could make use of it. I would certainly be happy to change the license from LGPL to Boost.
I have to study shmem but I think the persistent STL bits are more suited as the base library for other database libraries like RML and not as part of a shared memory library. It's only an organizational issue. My feeling is that if the persistent containers are developed by people working on database issues it will get more focus on performance and be more suited for real-world solutions.

Is the code 64-bit ready ? What was limiting the scalability to 1 million rows rather than say 100 million ?
The OS refused to map more than 2GB into my address space.
- This seems a core library that boost should have. Do you
plan to submit it ?
I hadn't honestly considered it. Isn't this functionality covered by Shmem anyway? Persist is completely free - so Shmem could make use of it. I would certainly be happy to change the license from LGPL to Boost.
I have to study shmem but I think the persistent STL bits are more suited as the base library for other database libraries like RML and not as part of
a shared memory library.
It's only an organizational issue. My feeling is that if the persistent containers are developed by people working on database issues it will get more focus on performance and be more suited for real-world solutions.
On the other hand, one would expect that the implementors of a memory-mapped library would not be completely oblivious to performance. There are many tricky OS-related issues with shared/mapped memory, that is a project in itself. The answer lies in benchmarking different implementations. Regards, Calum

On Thu, Sep 29, 2005 at 09:18:28PM +0100, Calum Grant <calum@visula.org> wrote:
Furthermore you could implement a table that can be stored in a file, and used wihout reading all records into memory.
I think the memory-mapped file route would be the most profitable, it would probably already work.
Interesting, could you give a short overview, how it could already work.
You could use an allocator with RML (or indeed any STL container) that stores the data in a memory-mapped file. This basically makes the data persistent.
That simple? I feared there was more to it. Sounds like RML is written with a lot of care.
I actually had a project a while back that did such a thing, and got some good benchmarks comparing MySQL vs file-mapped containers: http://lightwave.visula.org/persist/bench.html
Looks impressive.
and I'm sure boost::shmem::allocator is similar. Performance is very good. We could devise a benchmark and I'll implement in RML + Persist/Shmem, you implement in SQLlite and we'll see the difference.
Yeah. When do you plan to submit RML to Boost? I would really love to see such a library in Boost. Regards Andreas

and I'm sure boost::shmem::allocator is similar. Performance is very good. We could devise a benchmark and I'll implement in RML + Persist/Shmem, you implement in SQLlite and we'll see the difference.
Yeah. When do you plan to submit RML to boost? I would really love to see such a library in boost.
I will one day, there seems to be sufficient interest. The important thing is to get the library right, which is why I am asking for feedback. Also RML is not at all integrated with Boost, which would need to be addressed before submitting. Regards, Calum

This looks really promising. I'm not sure I like the preprocessor usage to declare tables.. surely it could be made to use templates? On 9/27/05, Calum Grant <calum@visula.org> wrote:
Hi all,
I have recently completed a library for implementing relational models in C++. Using template metaprogramming, multi-index containers can be manipulated using SQL-style queries, while retaining type-safety and efficiency.
The web page is http://visula.org/relational
This might be an appropriate interface for the database library that was recently discussed. This is not the same as Relational Template Library, which uses an entirely different interface.
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
Many thanks, Calum
-- Cory Nelson http://www.int64.org

This looks really promising. I'm not sure I like the preprocessor usage to declare tables.. surely it could be made to use templates?
Your vote has been counted. I can post some alternatives for review - yes I can make macros optional. Cheers, Calum
Hi all,
I have recently completed a library for implementing relational models in C++. Using template metaprogramming, multi-index containers can be manipulated using SQL-style queries, while retaining type-safety and efficiency.
The web page is http://visula.org/relational
On 9/27/05, Calum Grant <calum@visula.org> wrote:
This might be an appropriate interface for the database library that was recently discussed. This is not the same as Relational Template Library, which uses an entirely different interface.
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
Many thanks, Calum

On 09/27/2005 03:24 PM, Calum Grant wrote: [snip]
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML?
I'm having trouble understanding the purpose of table_column_number. In column_names.hpp, it appears in:

template<typename Row> struct table_columns { \
    ::relational::table_column_number<Row,0> N0; \
    ::relational::table_column_number<Row,1> N1; }; \

in which it looks like you're creating the equivalent of boost::tuple, but with more meaningful names (i.e. N0 is more meaningful than T0 of boost tuple because N0 is an argument to the RM_DEFINE_ROW_2 macro). I see table_column_number defined in expressions.hpp, but I see no member variables; hence, table_column_number<Row,0> can't be the equivalent of T0 (the argument to RM_DEFINE_ROW_2 before N0). Could you explain a little more?

On 09/28/2005 02:35 PM, Larry Evans wrote:
On 09/27/2005 03:24 PM, Calum Grant wrote: [snip] I'm having trouble understanding the purpose of table_column_number. [snip] Could you explain a little more? Never mind. I see, in column_names.hpp:
col<0>::type N0; \
col<1>::type N1; \
Name() { } \

where:

template<int Col> struct col : public rtl_gaccess_column<Name,Col> {};

and rtl_gaccess_column<Name,i>::type is typedef'ed to the Ti arg to the RM_DEFINE_ROW_2 macro.

On 09/27/2005 03:24 PM, Calum Grant wrote: [snip]
Is there any interest in providing relational containers in Boost? Are there any features/improvements you can suggest for RML? Why not use boost/preprocessor in RM_DEFINE_ROW? The following shows it can be used to define at least 2 parts of the struct Name:
<------------- RM_DEFINE_ROW.hpp -------------------
#include <boost/preprocessor/punctuation/comma_if.hpp>
#include <boost/preprocessor/seq/size.hpp>
#include <boost/preprocessor/repetition/repeat.hpp>
#include <boost/preprocessor/seq/elem.hpp>
#include <boost/preprocessor/tuple/elem.hpp>

#define RM_DECL_COL(r,d,tf_pair) \
  BOOST_PP_TUPLE_ELEM(2,0,BOOST_PP_SEQ_ELEM(d,tf_pair)) \
  BOOST_PP_TUPLE_ELEM(2,1,BOOST_PP_SEQ_ELEM(d,tf_pair)); \
/**/

#define RM_APPEND_TYPE(r,d,tf_pair) \
  BOOST_PP_COMMA_IF(d) \
  BOOST_PP_TUPLE_ELEM(2,0,BOOST_PP_SEQ_ELEM(d,tf_pair)) \
/**/

#define RM_DEFINE_ROW(Name, type_field_seq) \
  struct Name \
  { \
    BOOST_PP_REPEAT(BOOST_PP_SEQ_SIZE(type_field_seq),RM_DECL_COL,type_field_seq) \
    typedef ::relational::type_list_v< \
      BOOST_PP_REPEAT(BOOST_PP_SEQ_SIZE(type_field_seq),RM_APPEND_TYPE,type_field_seq) \
    > type_list; \
  } \
/**/
---------------------------------------------------

<------------- test.cpp -------------------
#include "RM_DEFINE_ROW.hpp"

RM_DEFINE_ROW(xxx,((int,a_int))((float,a_float)));
------------------------------------------

The resulting preprocessor output is:

<------------- code -------------------
struct xxx
{
  int a_int;
  float a_float;
  typedef ::relational::type_list_v< int, float > type_list;
};
------------- code -------------------
participants (13)
- Andreas Pokorny
- Arkadiy Vertleyb
- Caleb Epstein
- Calum Grant
- Cory Nelson
- Ion Gaztañaga
- Jeff Garland
- Jonathan Wakely
- Jose
- Larry Evans
- Mr Minh
- Pavel Vozenilek
- Reece Dunn