Re: [boost] SOCI 3.0.0 - The Database Access Library

10 Jul 2008

      Ion Gaztan~aga wrote:
...
1) I think a solid ODBC backend is a must for a Boost DB library. After 
all, ODBC is the most widely used driver. Is SOCI ODBC compatibility 
mature or still needs serious work?
Some SOCI users enjoy the existing ODBC backend in production code. As 
far as I understand, it does not require any major work and that's why 
I'm pretty sure that the next library release will include the ODBC 
backend as a supported part of the whole. Very likely the binary release 
for Windows users (lib+dll) will be available as well.
...
2) Do we know how efficient SOCI is?
One of our users performed tests comparing SOCI to other libraries, 
including the native client library for one database. The conclusion 
from the test was that with optimized build (like -O2 for g++) the 
performance of SOCI is comparable to other alternatives, with few 
percent of overhead in relation to native client library, for *these* tests.

Note also that SOCI supports bulk operations in terms of vector<T>. The 
Oracle backend is currently the only one that fully benefits from it by 
performing data transmissions in blocks (it reduces network roundtrips), 
binding to raw underlying memory buffer for direct data transport. The 
scalability potential of this solution is huge, but the actual results 
can be very sensitive to the details of the given test.
...
I understand that database 
operations might be in most cases more expensive than SOCI operations
Yes, this is true. Of course, SOCI adds its own overhead in terms of 
dynamic data structures that are created here and there, but the 
important property of SOCI is that in its "canonical" idioms the library 
binds directly to data provided by the user. For example:

int salary;
sql << "select salary from persons where id = 123", into(salary);

Above, the result of the query is put *directly* in the salary variable 
if the given backend supports binary data transmission (Oracle, ODBC) 
and there is only one conversion step if the backend operates on strings 
internally (PostgreSQL, MySQL, ...). The same is true for explicit 
statements.

In other words, the overhead of SOCI is a one-time investment for each 
statement and is necessary to keep the statement in shape and to 
associate data references with query results. After the statement is 
created, all subsequent operations use existing structures for guidance 
what to put where and if the statement is repeated many times - which is 
where performance actually matters - the copy/conversion overhead is 
minimal or just zero.

(Well, we had to add some additional data copying to ensure 
const-correctness as perceived by the user code in *some* cases, but 
this would need to be done with any other alternative approach as well.)

The only place where SOCI introduces some significant overhead is when 
complicated user-defined datatypes are involved, since data needs to be 
often copied and converted back and forth. This is the price for 
flexibility. But hey, native client libraries and even ODBC have no idea 
how to translate the query results into boost::fusion<std::string, 
boost::optional<MyIntegerWithRange>, boost::gregorian::date>, right? ;-)

There are also other idioms supported by SOCI, like dynamic rowset 
description or stream-like data extraction and they can (depending on 
the backend) introduce additional overhead. Again, it is the price for 
ultimate flexibility. For those who are performance-paranoid, the basic 
statically bound idioms are recommended.

Having said all the above, it is important to note that with databases 
the biggest opportunity for performance tuning is not in the client 
interface, but in SQL queries and database structures themselves.
The rest is... Measure It Yourself. ;-)

-- 
Maciej Sobczak * www.msobczak.com * www.inspirel.com