Re: [boost] [Boost-users] SQL library with full embedded SQL-syntax

Dmitri Lyfar wrote:
Hi I would like to participate in Google Summer of Code in this year.
Great, welcome! Note, this discussion would probably be better hosted in the development list, so I've cross-posted it there.
I reviewed boost projects for last years and found an interesting project, I think. If you go to the link below you will see a short description for that library: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Google_Summer...
The goal of this post is to define the main library features, which will be useful for users. Is that library useful for C++ community? If you are using(or have ever used) similar C++ libraries for work with SQL: which of features do you want to have? Also I want to notice one of the biggest beauty of that approach: you can get the compiler to type-check the SQL-expression, what do you think about that?
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time. So, before we get ahead of ourselves and think about 'sql embedded in C++' we need the basics of getting database access. At the point the leading candidate for boostification of basic db access is SOCI: http://soci.sourceforge.net/ As I understand the state of affairs, the main barrier to Boost submission is getting the code into the usual boost structure, writing additional tests and such. I"ve cc'ed a couple of principles on SOCI, so they may want to comment on my perception. However, if I'm correct, a great SOC submission may in fact be to Boostify SOCI. It's not as 'sexy' as the embedded SQL, but I, for one would be a big supporter. Jeff

Hi, Jeff Garland wrote:
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time.
The other reason is that in 'real-world' you might want to call stored procedures instead of manually glued pieces of ad-hoc SQL queries, in which case the support in the area of query composition is probably not very useful anyway. The database library is not really about creating queries - it's about exchanging data (but see below).
At the point the leading candidate for boostification of basic db access is SOCI:
As I understand the state of affairs, the main barrier to Boost submission is getting the code into the usual boost structure, writing additional tests and such. I"ve cc'ed a couple of principles on SOCI, so they may want to comment on my perception. However, if I'm correct, a great SOC submission may in fact be to Boostify SOCI. It's not as 'sexy' as the embedded SQL, but I, for one would be a big supporter.
Thanks! :-) Indeed, we are now working on boostification of the SOCI library. The reorganization of code and change of naming convention was already done for the core part and half of the backends - these are available for your inspection any time from CVS (linked from the project home page). I admit that we are progressing somewhat slowly, but I'm optimistic about the process itself. We keep the list of suggestions gatheres during the last wave of DB discussion on the Boost developer's list and we will work out major points. Integration with Boost in terms of boost::optional (for example) is at the top of our tasks. Coming back to static SQL composition, please note that something like this can be an add-on to the SOCI library, where the query is in fact whatever object that is IOStream-able - we have never used anything else than a string or string literal there, but there is nothing that prevents you from doing some tricks with operator<<. Statically creating the temporary object that will get streamed into intended SQL query is a non-intrusive and optional solution that will probably make everybody happy. Cheers, -- Maciej Sobczak : http://www.msobczak.com/ Programming : http://www.msobczak.com/prog/

Maciej Sobczak wrote:
Hi,
Jeff Garland wrote:
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time.
The other reason is that in 'real-world' you might want to call stored procedures instead of manually glued pieces of ad-hoc SQL queries, in which case the support in the area of query composition is probably not very useful anyway.
In the 'real-world' both things are used. Choosing C++ for a projects where databases are involved is probably rare and Java/C#/Python/Ruby is more likely. C++ doesn't exactly boost productivity, so it has to offer something else to be a condidate. And what can C++ offer? Type safety. -Thorsten

Thorsten Ottosen wrote:
Maciej Sobczak wrote:
Hi,
Jeff Garland wrote:
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time.
The other reason is that in 'real-world' you might want to call stored procedures instead of manually glued pieces of ad-hoc SQL queries, in which case the support in the area of query composition is probably not very useful anyway.
In the 'real-world' both things are used.
Choosing C++ for a projects where databases are involved is probably rare and Java/C#/Python/Ruby is more likely. C++ doesn't exactly
Wow, I couldn't disagree more. Thousands of projects use C++ for database access. And even now, there are projects for which a tightly written db access core in C++ performs better than scripted or Java equivalent. Now I'll be the first to admit the lack of good modern libraries and tools for C++ database mapping doesn't make C++ the first choice -- in fact it hinders the adoption of C++ greatly. But if your application is already in C++ you really shouldn't be 'punished' by not having good db access tools and libraries.
boost productivity, so it has to offer something else to be a condidate. And what can C++ offer? Type safety.
Raw, unadulterated speed. Jeff

On 3/15/07, Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Thorsten Ottosen wrote: [snip]
Choosing C++ for a projects where databases are involved is probably rare and Java/C#/Python/Ruby is more likely. C++ doesn't exactly
Wow, I couldn't disagree more. Thousands of projects use C++ for database access. And even now, there are projects for which a tightly written db access core in C++ performs better than scripted or Java equivalent. Now I'll be the first to admit the lack of good modern libraries and tools for C++ database mapping doesn't make C++ the first choice -- in fact it hinders the adoption of C++ greatly. But if your application is already in C++ you really shouldn't be 'punished' by not having good db access tools and libraries.
boost productivity, so it has to offer something else to be a condidate. And what can C++ offer? Type safety.
Raw, unadulterated speed.
[snip] I measured a timing of my recent Python & MySQL & file I/O application. It involved a lot of sequential file I/O read/verify/reformat/output, a lot of regexp parsing of the data, and not really much DB access (gulped several dictionaries from MySQL during startup, the biggest one with 0.5 millions rows). With substantial amount of data to process the program was somewhat slow. I thought that it spends most of time in regexp parsing and looked if it would make sense to rewrite the codes in C++. Turned to be that it ate roughly 15 seconds of CPU time during 15 minutes runs. Personally, I dropped the idea to rewrite it in C++ for its "raw, unadulterated speed" after the measurements...

Yuriy Koblents-Mishke wrote: Catching up....
On 3/15/07, Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Raw, unadulterated speed.
[snip]
I measured a timing of my recent Python & MySQL & file I/O application. It involved a lot of sequential file I/O read/verify/reformat/output, a lot of regexp parsing of the data, and not really much DB access (gulped several dictionaries from MySQL during startup, the biggest one with 0.5 millions rows). With substantial amount of data to process the program was somewhat slow. I thought that it spends most of time in regexp parsing and looked if it would make sense to rewrite the codes in C++.
Turned to be that it ate roughly 15 seconds of CPU time during 15 minutes runs. Personally, I dropped the idea to rewrite it in C++ for its "raw, unadulterated speed" after the measurements...
That's fine...I don't want to start a performance argument here. And, I agree, that for many, many apps there's no advantage. That doesn't preclude the fact that for some there is. Jeff

Maciej Sobczak wrote:
Hi,
Jeff Garland wrote:
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time.
The other reason is that in 'real-world' you might want to call stored procedures instead of manually glued pieces of ad-hoc SQL queries, in which case the support in the area of query composition is probably not very useful anyway.
True, but I wouldn't recommend it since it's inherently non-portable. Jeff

Jeff Garland wrote:
Maciej Sobczak wrote:
Hi,
Jeff Garland wrote:
Well, I'll be brutally honest -- I don't think I would go that direction. We didn't pick a database project last year because a majority of the mentors agreed that the 'dsl-based' approach was more of a toy than reality. The main reason for this is that in 'real-world' applications queries are often dynamic and must be built at run-time.
The other reason is that in 'real-world' you might want to call stored procedures instead of manually glued pieces of ad-hoc SQL queries, in which case the support in the area of query composition is probably not very useful anyway.
True, but I wouldn't recommend it since it's inherently non-portable.
Portability is another reason to go away from sql queries as strings. -Thorsten

Thorsten Ottosen wrote:
Portability is another reason to go away from sql queries as strings.
Since SQL is a standard language I don't see how using it in strings makes it non-portable. Could you explain you reasoning? -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Thorsten Ottosen wrote:
Portability is another reason to go away from sql queries as strings.
Since SQL is a standard language I don't see how using it in strings makes it non-portable. Could you explain you reasoning?
Different DBs actually use different "dialects" of SQL. There is a subset of SQL that can be used pretty portably, but also a large amount of stuff that can't. Example: quoting. Postgres (and I think Oracle) use single quotes for strings, double quotes for identifiers. MySQL uses either double or single quotes for strings and backticks for identifiers. MSSQL uses single quotes for strings (not sure about double) and square brackets for identifiers. Also, embedded quotes are escaped by a backslash in some systems, by a double quote in others. Data types have subtly different names. Oracle discourages the use of the standard VARCHAR in favour of their own VARCHAR2, even though they're aliases. ("But they might not be in the future!") DDL is very poor in terms of standard support. Table definitions can differ quite significantly between databases. Hibernate solves this problem by having a "Dialect" class for every database. This class is responsible for generating the DB-specific SQL. Sebastian Redl

Sebastian Redl wrote:
Rene Rivera wrote:
Thorsten Ottosen wrote:
Portability is another reason to go away from sql queries as strings.
Since SQL is a standard language I don't see how using it in strings makes it non-portable. Could you explain you reasoning?
Different DBs actually use different "dialects" of SQL. There is a subset of SQL that can be used pretty portably, but also a large amount of stuff that can't. Example: quoting. Postgres (and I think Oracle) use single quotes for strings, double quotes for identifiers. MySQL uses either double or single quotes for strings and backticks for identifiers. MSSQL uses single quotes for strings (not sure about double) and square brackets for identifiers. Also, embedded quotes are escaped by a backslash in some systems, by a double quote in others. Data types have subtly different names. Oracle discourages the use of the standard VARCHAR in favour of their own VARCHAR2, even though they're aliases. ("But they might not be in the future!") DDL is very poor in terms of standard support. Table definitions can differ quite significantly between databases.
Hibernate solves this problem by having a "Dialect" class for every database. This class is responsible for generating the DB-specific SQL.
Security is another reason to go away from sql queries as strings. Prevent SQL injection attacks. Best regards Jorge

Hi, On 3/16/07, Jorge Lodos <lodos@segurmatica.cu> wrote:
Security is another reason to go away from sql queries as strings. Prevent SQL injection attacks.
You bind your parameters, you don't have any problems (except when this doesn't work, but then stored procedures don't help either). Regards, Michael

Michael Walter wrote:
On 3/16/07, Jorge Lodos <lodos@segurmatica.cu> wrote:
Security is another reason to go away from sql queries as strings. Prevent SQL injection attacks.
You bind your parameters, you don't have any problems (except when this doesn't work, but then stored procedures don't help either).
Sure, but it is the programmer responsibility to bind the parameters instead of concatenating strings. Not using SQL strings avoids errors from programmers. What happens with many of the existing SQL injection attacks is that programmers didn't bind parameters even when they had the possibility to do so. Best regards Jorge

Jorge Lodos wrote:
Michael Walter wrote:
On 3/16/07, Jorge Lodos <lodos@segurmatica.cu> wrote:
Security is another reason to go away from sql queries as strings. Prevent SQL injection attacks. You bind your parameters, you don't have any problems (except when this doesn't work, but then stored procedures don't help either).
Sure, but it is the programmer responsibility to bind the parameters instead
of concatenating strings. Not using SQL strings avoids errors from programmers. What happens with many
of the existing SQL injection attacks is that programmers didn't bind parameters even when they had the possibility to do so.
Sorry, I don't see how any of this applies -- just because the SQL is a string doesn't mean it comes from an untrusted source. And, programmers that don't validate input from untrusted sources deserve what they get.... Jeff

Sebastian Redl wrote:
Rene Rivera wrote:
Thorsten Ottosen wrote:
Portability is another reason to go away from sql queries as strings.
Since SQL is a standard language I don't see how using it in strings makes it non-portable. Could you explain you reasoning?
Different DBs actually use different "dialects" of SQL. There is a subset of SQL that can be used pretty portably, but also a large amount of stuff that can't.
[snip] Sure... But we could use that same argument to propose that we write C++ programs by writing an abstraction on top of C++ compilers in Python so that we can remove the incompatibilities in the various C++ implementations. Having had the displeasure of dealing with PL/SQL there is one serious drawback to the language embedded approach, it ties ones to the capabilities of the abstraction. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Sebastian Redl wrote:
Rene Rivera wrote:
Thorsten Ottosen wrote:
Portability is another reason to go away from sql queries
as strings.
Since SQL is a standard language I don't see how using it
in strings
makes it non-portable. Could you explain you reasoning?
Different DBs actually use different "dialects" of SQL. There is a subset of SQL that can be used pretty portably, but also a large amount of stuff that can't.
[snip]
Sure... But we could use that same argument to propose that we write C++ programs by writing an abstraction on top of C++ compilers in Python so that we can remove the incompatibilities in the various C++ implementations.
I guess you could do that. Like C++, porting can be a major problem. Note, however, that is not an argument against having portible SQL DSL. Some C++ coders actually prefer to use a portable subset (STLSOFT libs for one).
Having had the displeasure of dealing with PL/SQL there is one serious drawback to the language embedded approach, it ties ones to the capabilities of the abstraction.
Right. The SQL library should provide a backdoor so you can still use string queries if you need to be non-portable or cutting egde etc. I some of our company tools, we support multiple server backends: at least oracle, mssql and mysql. I'm surpised just how diffucult it is to write portable SQL (even for very very simple queries). If you want to use different functions, there's a pretty good change they are different on different systems. I have end up writing my own abstractionlayer in php to compensate. Here's some of the functions I had to provide my own abstraction around: // // encapsulate sql differences // locateSyntax( $substr, $str ); // inverseLocateSyntax( $substr, $str ); // lastIdSyntax( $seq = "" ); // lengthSyntax(); // timeSyntax(); // timeExpiredSyntax(); // replaceWildcard( $value ); // toupperSyntax( $column ); // maxSyntax( $column ); // sumSyntax( $column ); // countSyntax( $column ); // limitSelectSyntax( $limit, $queryWithoutSelect ); // dropIfExistsSyntax( $table ); // createEmptyTableCopySyntax( $oldTable, $newTable ); // renameTableSyntax( $oldTable, $newTable ); // escapedStringSyntax( $string ); // insertBlobSyntax( $table, $column, $id, $data ); // deEscapedStringSyntax( $string ); // unixtimeToDateSyntax( $column ); // nullDateSyntax(); It's a nightmare :-) -Thorsten

Thorsten Ottosen wrote:
Rene Rivera wrote:
Sure... But we could use that same argument to propose that we write C++ programs by writing an abstraction on top of C++ compilers in Python so that we can remove the incompatibilities in the various C++ implementations.
I guess you could do that. Like C++, porting can be a major problem.
Note, however, that is not an argument against having portible SQL DSL.
True :-) But it's an argument that a portable SQL DSL is not a great advantage.
Having had the displeasure of dealing with PL/SQL there is one serious drawback to the language embedded approach, it ties ones to the capabilities of the abstraction.
Right. The SQL library should provide a backdoor so you can still use string queries if you need to be non-portable or cutting egde etc.
I some of our company tools, we support multiple server backends: at least oracle, mssql and mysql. I'm surpised just how diffucult it is to write portable SQL (even for very very simple queries).
If you want to use different functions, there's a pretty good change they are different on different systems. I have end up writing my own abstractionlayer in php to compensate.
That drives into what I think is the core problem I have with an SQL DSL. It doesn't provide an abstraction that has compelling advantages, other than syntactic sugar. What I would find way more useful would be an abstraction that removes the SQL language itself, i.e. a relational database library. That would have some advantages: * Better coherence with C++ data types. * Possibility to support non-SQL relational databases. * Make it easier to construct object/relational translation abstractions. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Thorsten Ottosen wrote:
Rene Rivera wrote:
Sure... But we could use that same argument to propose that we write C++ programs by writing an abstraction on top of C++ compilers in Python so that we can remove the incompatibilities in the various C++ implementations. I guess you could do that. Like C++, porting can be a major problem.
Note, however, that is not an argument against having portible SQL DSL.
True :-) But it's an argument that a portable SQL DSL is not a great advantage.
A couple things. Some people don't really care about database portability. So in that case, they feel free to use every non-standard feature of the database they are using. For these folks it's really hard to argue for anything other than a string interface because otherwise you'd have to emulate every feature of every each database -- basically impossible. For people that want a portable database it would be an interesting first step to provide a portable string-based interface that supports basic queries and bindings. BTW, Rogue Wave went down both of these paths 10 years ago with their DBTools library. The first versions of the library *only* provided an extensive set of classes to build up sql commands -- it supports many database back ends seamlessly. Later, they ended up adding a 'string like' interface. I wasn't there, but I can only suppose that it was customer demand for things the 'safe library' didn't offer.
Having had the displeasure of dealing with PL/SQL there is one serious drawback to the language embedded approach, it ties ones to the capabilities of the abstraction. Right. The SQL library should provide a backdoor so you can still use string queries if you need to be non-portable or cutting egde etc.
I some of our company tools, we support multiple server backends: at least oracle, mssql and mysql. I'm surpised just how diffucult it is to write portable SQL (even for very very simple queries).
If you want to use different functions, there's a pretty good change they are different on different systems. I have end up writing my own abstractionlayer in php to compensate.
That drives into what I think is the core problem I have with an SQL DSL. It doesn't provide an abstraction that has compelling advantages, other than syntactic sugar. What I would find way more useful would be an abstraction that removes the SQL language itself, i.e. a relational database library. That would have some advantages:
* Better coherence with C++ data types. * Possibility to support non-SQL relational databases. * Make it easier to construct object/relational translation abstractions.
I still think the core has to be built on the more flexible string interface. BTW, there was a relational template library for boost at one point that might provide this abstraction you're looking for. But, no doubt in my mind that you are correct that the embedded DSL is a distraction from the real issues in this domain. Jeff

Rene Rivera wrote:
Thorsten Ottosen wrote:
Rene Rivera wrote:
Sure... But we could use that same argument to propose that we write C++ programs by writing an abstraction on top of C++ compilers in Python so that we can remove the incompatibilities in the various C++ implementations.
I guess you could do that. Like C++, porting can be a major problem.
Note, however, that is not an argument against having portible SQL DSL.
True :-) But it's an argument that a portable SQL DSL is not a great advantage.
I can't see that. Can you explain? -Thorsten
participants (8)
-
Jeff Garland
-
Jorge Lodos
-
Maciej Sobczak
-
Michael Walter
-
Rene Rivera
-
Sebastian Redl
-
Thorsten Ottosen
-
Yuriy Koblents-Mishke