Re: [boost] Interest in an ODBC library?

On Behalf Of Hajo Kirchhoff
Brian Braatz wrote:
you split the lib into a few pieces:
1. Get the data from the database and stream it into X
2. A stock form of "X" (like the table you described)
3. Helpers for binding
if you built the thing that way, then it will be usable by those of us out there who have large bodies of code relying on things like ADO recordsets. I would love to replace the recordsets, but I need to be able to selectively use pieces of your library and I need to have a usage model that is similar.
Thanks, interesting thoughts.
Hajo
[Brian Braatz Writes:]
If you are interested in collaborating, I have wanted to (for some time now) build the table<> aspect of what you are working on. My goals:
1. A functional drop-in replacement for ADO recordset
2. Dynamic and static "views" and indexes
3. A binding model that does NOT force you to use the bindings supplied with the library - this is something ADO lacks, and something I have had to work around to great frustration
4. The ability to COPY from one table<> to another table<> - ADO does NOT allow you to copy recordsets. (Brilliant, no?)
Additionally - I am not sure if he is still interested, but Joaquin Munez and I exchanged emails a while back about collaborating on such a beast. The general idea was to take the functionality in multi-index container and make it work dynamically (and there is more). I am working on something unrelated to this discussion that could form a good foundation for table<>. I should have something to share by the middle of next week. Just let me know if you are interested in putting our heads together. :) Brian

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Brian Braatz
My goals:
1. A functional drop-in replacement for ADO recordset
2. Dynamic and static "views" and indexes
3. A binding model that does NOT force you to use the bindings supplied with the library - this is something ADO lacks, and something I have had to work around to great frustration
4. The ability to COPY from one table<> to another table<> - ADO does NOT allow you to copy recordsets. (Brilliant, no?)
Hi, I've spent a lot of time over the past year developing a database library and a binding language on top of it, and finally catching up on this thread I just have to throw a few cents in.

1) Rather than basing it on ODBC, it might be nice to have a boost::database library for which there could be multiple implementations such as ODBC, (native) postgresql, etc.:

   boost::database d = boost::odbc::open(...);
   d = boost::postgresql::open(...);
   d = boost::msql::open();

ODBC is not always available or desirable.

2) With this done it's not too hard to write a non-intrusive DSEL for binding on top of it. I wrote one that supports dependent tables, automatic mapping to STL containers, boost::optional, and common data types like boost::posix_time::ptime. A simple example of the language might look like:

   table("customer")
   [
       *field("id")     = &customer::id,
        field("name")   = &customer::name,
        field("number") = &customer::number,
        table("tax_exempt")
        [
            -field("customer"),
             field("state_id") = &tax_exempt_data::state_id
        ] = &customer::tax_exempt
   ];

The structure customer has members id, name, and number. The id field is its key. It also contains an optional tax_exempt structure that resides in a table called tax_exempt and whose "customer" field is related to the customer's id field. Especially when dealing with nested data structures, the queries generated by the library are far more optimized than what one could easily write by hand, both for saving and loading.

3) On a bit of a tangent, it's also handy to have a DSEL for generating SQL criteria strings. It's so easy to make typos or forget 's on strings:

   field name("name");
   field number("number");

   //s = "name = 'bob' OR number < 5";
   const string s = criteria(name == "bob" || number < 5);

If any of that sounds useful I'd love to help out.

Brock

On 7/28/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
table("customer") [ *field("id") = &customer::id, field("name") = &customer::name, field("number") = &customer::number,
table("tax_exempt")[ -field("customer"), field("state_id") = &tax_exempt_data::state_id ] = &customer::tax_exempt ];
I assume 'customer' is a class with data members id, name, number? [...]
If any of that sounds useful I'd love to help out.
It sounds very useful! Are you planning on sharing the code you've written, or would this be a start-from-scratch approach? -- Caleb Epstein caleb dot epstein at gmail dot com

Sorry for the late reply, my filter threw this message in the trash bin.
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Caleb Epstein
On 7/28/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
table("customer") [ *field("id") = &customer::id, field("name") = &customer::name, field("number") = &customer::number,
table("tax_exempt")[ -field("customer"), field("state_id") = &tax_exempt_data::state_id ] = &customer::tax_exempt ];
I assume 'customer' is a class with data members id, name, number?
Exactly. I tried to keep the example as short as I could. You can assign the above expression to a serializer:

   serializer<customer> s = table(....

   //loads all customers named Bob
   std::vector<customer> customers =
       s.load<std::vector<customer> >(database, "name = 'Bob'");

   //throws if more than one
   boost::optional<customer> c =
       s.load<boost::optional<customer> >(database, "id = 5");

Usually in a case like this customer wouldn't have an id field, it would be the key to a map:

   typedef std::map<int,customer>  customer_map;
   typedef std::pair<int,customer> pair_type;

   serializer<pair_type> s = table("customer")
   [
       *field("id")   = &pair_type::first,
        field("name") = &customer::name,
        ...
   ];

   customer_map m = s.load<customer_map>(db, "id < 50");
[...]
If any of that sounds useful I'd love to help out.
It sounds very useful! Are you planning on sharing the code you've written, or would this be a start-from-scratch approach?
Thanks! I'll share all I've learned but it might be better to start from scratch. I've made too many assumptions that are true in my world but not in general and we can probably come up with a better syntax than what I've got. Where do you think would be a good place to start? I could put together and post an abstract database interface with a postgresql implementation for starters. Brock

On 8/3/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
Where do you think would be a good place to start? I could put together and post an abstract database interface with a postgresql implementation for starters.
Sounds like a perfect start to me! -- Caleb Epstein caleb dot epstein at gmail dot com

This has taken me longer to get to than I thought, but here it is. The zip file contains three units:

1) Boost.Database - a minimalist abstract database interface. The interface (boost::db::database in database.hpp) really exposes two main methods: one performs execute queries, and one performs queries that return data sets. These are called row_sets and provide a random access container interface. The interface also provides start/commit/rollback transaction functions. These are implemented with execute queries but are included to allow reference counted, nested transactions.

2) Boost.Database.Postgresql - an implementation of the above interface for postgresql.

3) test - a small program that opens and queries a postgresql database.

I've compiled this on VC 7.1 but I don't think there's anything that won't work on another platform.

Let me know what you think,

Brock
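A rough sketch of the shape described above, for readers who don't want to open the zip just yet. Only boost::db::database, execute, the row_set idea and the transaction functions come from the description; the exact signatures here are guesses, not the contents of database.hpp:

   #include <string>

   namespace boost { namespace db {

   class row_set;   // random access container of rows, returned by queries

   class database
   {
   public:

       // statements that return no data (INSERT, UPDATE, DELETE, DDL)
       void execute(const std::string& sql);

       // statements that return a data set (SELECT)
       row_set query(const std::string& sql);

       // reference counted, so nested calls can share one real transaction
       void start_transaction();
       void commit_transaction();
       void rollback_transaction();
   };

   }} // namespace boost::db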
-----Original Message----- On Behalf Of Caleb Epstein Sent: Wednesday, August 03, 2005 4:24 PM To: boost@lists.boost.org Subject: Re: [boost] Interest in an ODBC library?
On 8/3/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
Where do you think would be a good place to start? I could put together and post an abstract database interface with a postgresql implementation for starters.
Sounds like a perfect start to me!

Brock Peabody wrote:
1) Boost.Database - a minimalist abstract database interface. The interface (boost::db::database in database.hpp) really exposes two main methods: one performs execute queries, and one performs queries that return data sets. These are called row_sets and provide a random access container interface. The interface also provides start/commit/rollback transaction functions. These are implemented with execute queries but are included to allow reference counted, nested transactions.
Hi Brock, I haven't looked at all the code yet, but I like the interface and what I've seen so far - I'm definitely interested in this library.
2) Boost.Database.Postgresql - an implementation of the above interface for postgresql
I have an implementation of your database library for MySQL. It's on my hard drive at home and I'm broadband-less right now so if anyone wants to see it let me know and I'll dial up tonight. It was very easy to adapt to use the MySQL API.
3) test - a small program that opens and queries a postgresql database.
I've compiled this on VC 7.1 but I don't think there's anything that won't work on another platform.
I tested it with GCC 3.4 and 4.1 with no real problems. postgresql/connect.cpp accidentally uses the <:: digraph in the typedefs at the top of the file. You need a space before the template args to avoid that.

How do you handle NULL values? The postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.

jon
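(For anyone who hasn't run into it: <: is the digraph for [, so a typedef written without the space fails to parse on a conforming compiler. The typedef below is made up purely to show the problem and the fix; the actual typedefs in connect.cpp are different.)

   #include <vector>

   struct global_type {};

   namespace lib
   {
       // broken: "<:" is read as the digraph for "[", so this line fails to parse
       //typedef std::vector<::global_type> values;

       // fixed: the extra space keeps "<" and "::" as separate tokens
       typedef std::vector< ::global_type> values;
   }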

Jonathan Wakely wrote:
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
I see from the docs that PQgetvalue() never returns a NULL char*. Does that mean some way of distinguishing empty strings from NULL fields is needed? i.e. a wrapper for PQgetisnull. jon

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jonathan Wakely
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
I see from the docs that PQgetvalue() never returns a NULL char*. Does that mean some way of distinguishing empty strings from NULL fields is needed? i.e. a wrapper for PQgetisnull.
We could make abstract_row_set::get return an optional<field> instead of a field. Then row::get_field could return an optional as well. I'd suggest that row::get, however, continue its current behavior which is to return a default constructed value in the presence of null fields.
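On the postgresql side that would boil down to something like the sketch below. PQgetisnull and PQgetvalue are the real libpq calls; the field typedef and the function name are placeholders rather than the sandbox code:

   #include <boost/optional.hpp>
   #include <libpq-fe.h>
   #include <string>

   // placeholder for the library's variant-based field type
   typedef std::string field;

   boost::optional<field> get_field(PGresult* result, int row, int column)
   {
       // PQgetvalue never returns a null pointer, so NULL columns have to be
       // detected explicitly before any conversion is attempted
       if (PQgetisnull(result, row, column))
           return boost::optional<field>();

       return field(PQgetvalue(result, row, column));
   }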

On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
We could make abstract_row_set::get return an optional<field> instead of a field. Then row::get_field could return an optional as well.
I'd suggest that row::get, however, continue its current behavior which is to return a default constructed value in the presence of null fields.
I think the ability to detect NULL values for columns is a pretty important one in any database interface. Boost.Optional sounds like the perfect solution from a design perspective, but might hurt performance-wise. What if there were a method like bool row::is_null (unsigned int index)? -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein wrote:
On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
We could make abstract_row_set::get return an optional<field> instead of a field. Then row::get_field could return an optional as well.
I'd suggest that row::get, however, continue its current behavior which is to return a default constructed value in the presence of null fields.
I think the ability to detect NULL values for columns is a pretty important one in any database interface. Boost.Optional sounds like the perfect solution from a design perspective, but might hurt performance-wise. What if there were a method like bool row::is_null (unsigned int index)?
I think it's reasonable to have to ask whether it's NULL before fetching the value, rather than using optional. Some queries won't return NULLs, either because the column definition is NOT NULL or because the query specified "IFNULL(..., ...)" so the user is the best qualified to know whether it is necessary to test for NULL. If they call get() then the library can assume they know there's a non-NULL value there to get() and not worry about handling a NULL. jon -- "I always keep a supply of liquor handy in case I see a snake, which I also keep handy." - W.C. Fields

On Behalf Of Jonathan Wakely
Caleb Epstein wrote:
I think the ability to detect NULL values for columns is a pretty important one in any database interface. Boost.Optional sounds like the perfect solution from a design perspective, but might hurt performance-wise. What if there were a method like bool row::is_null (unsigned int index)?
Generally, I'd lean towards the cleaner interface until a performance problem is proven. Chances are any time spent dealing with optional<>s is going to be dwarfed by the database operations.
I think it's reasonable to have to ask whether it's NULL before fetching the value, rather than using optional.
This is exactly the situation Boost.Optional was designed for.
Some queries won't return NULLs, either because the column definition is NOT NULL or because the query specified "IFNULL(..., ...)" so the user is the best qualified to know whether it is necessary to test for NULL.
The user's program might not be the only one writing to the database though. It's nice to be able to have a way to provide a default automatically. Maybe:

   template <typename T>
   T get(const std::string& field_name, T default_) const;

so you could call it like:

   i->get("field", 0);

instead of

   i->get<int>("field");
If they call get() then the library can assume they know there's a non-NULL value there to get() and not worry about handling a NULL.
Still, I hate to have an interface where the user can easily forget to call is_null() and calling get() can blow up long after the user's code has been through testing. When you return an optional<> it's clear that you need to check it. I think the concrete interface can provide:

1) a version of get that returns an optional
2) is_null(unsigned int index)
3) a version of get converting nulls to defaults

all by changing the abstract_row::get to return an optional<field> (a rough sketch follows at the end of this message). If this proves to be a performance issue we can optimize the abstract interface with an is_null() function, leaving the concrete user interface unchanged.

On another topic, what do you think of the types included in the field variant? What important types are missing? Should we try to include all types our users could need, or just the most common ones and an easy way for the users to change it?
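A rough sketch of those three accessors, assuming the abstract layer hands back an optional<field>. The names row, field and get_raw are placeholders, not what is in the sandbox:

   #include <boost/optional.hpp>
   #include <string>

   // placeholder for the library's variant-based field type
   typedef std::string field;

   struct row
   {
       // stub standing in for the concrete backend; an empty optional means SQL NULL
       boost::optional<field> get_raw(unsigned int) const
       {
           return boost::optional<field>();
       }

       // 1) NULL-aware access: the caller is forced to check
       boost::optional<field> get_field(unsigned int index) const
       {
           return get_raw(index);
       }

       // 2) explicit test, for callers who already know what the query returns
       bool is_null(unsigned int index) const
       {
           return !get_raw(index);
       }

       // 3) NULL silently becomes a caller-supplied default
       field get(unsigned int index, const field& default_ = field()) const
       {
           boost::optional<field> f = get_raw(index);
           return f ? *f : default_;
       }
   };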

On 8/18/05, Caleb Epstein <caleb.epstein@gmail.com> wrote:
On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
We could make abstract_row_set::get return an optional<field> instead of a field. Then row::get_field could return an optional as well.
I'd suggest that row::get, however, continue its current behavior which is to return a default constructed value in the presence of null fields.
I have a couple of comments/questions about your implementation so far, which is quite nice:

* Iterators. I don't think these should be random access. Of the database APIs I have experience with (Sybase, SQLite), none allows you to navigate to an arbitrary row in a result set. You must process all results in sequence.

* The transaction begin/commit/rollback methods should be moved to the abstract_database class as virtual methods. There is just too much variation in the way transactions are handled to implement this at the top level. For example, the syntax "START TRANSACTION" is not portable (isn't it just "BEGIN"?), and I believe that in general one must match each "BEGIN" with an explicit "COMMIT" or "ROLLBACK"; it's not enough to just "COMMIT" or "ROLLBACK" once the nesting depth hits 1 unless you name your transactions. Additionally, some implementations (e.g. SQLite) don't support nested transactions, so the underlying impl ought to be able to throw in the case that the user requests such an operation.

* I'd recommend a scoped_lock-like class for the transaction user interface. For the same reasons that it is not advisable to manually lock/unlock mutexes, it is not advisable to manually begin/commit/rollback transactions. I'd suggest something like:

   class transaction : boost::noncopyable
   {
       database& db_;
       bool committed_;

   public:

       transaction (database& d) : db_ (d), committed_ (false)
       {
           db_.start_transaction ();
       }

       ~transaction ()
       {
           if (!committed_)
               db_.rollback_transaction ();
       }

       void commit ()
       {
           db_.commit_transaction ();
           committed_ = true;
       }
   };

* Prepared Statements. Sending data to a database is frequently done by preparing an SQL statement with placeholder arguments (e.g. "INSERT INTO foo ( col1, col2, ... ) VALUES ( ?, ?, ... )") and then binding program variables to those arguments and executing the prepared statement. Do you think it would be good to add this ability?

* Binding. Some vendors provide APIs whereby query results can be directly bound to program variables. The general workflow would be something like:

   double price;
   std::string name;

   row_set results = database.execute (query);

   results.bind ("name", &name);
   results.bind ("price", &price);

   for (row_set::iterator i = results.begin (), e = results.end ();
        i != e && results.fetch (); ++i)
   {
       std::cout << "name=" << name << ", price=" << price << std::endl;
   }

This saves the variant<> conversion overhead and can be extended to include user-defined types and conversions. Would you be open to including this in the implementation?

-- Caleb Epstein caleb dot epstein at gmail dot com

On Behalf Of Caleb Epstein
* Iterators. I don't think these should be random access. Of the database APIs I have experience with (Sybase, SQLite), none allows you to navigate to an arbitrary row in a result set. You must process all results in sequence.
The only ones I've used (MySQL, PostgreSQL, Access, DB2) have had random access. In all the code I've written I don't think I've done anything that required random access iterators for the database though. The main reason I went with random access was because it made the implementation of the iterator and my network database proxy easier. If there's no real use for random access iterators and some widely used databases don't support them then I agree we shouldn't either.
* The transaction begin/commit/rollback methods should be moved to the abstract_database class as virtual methods.
That's actually the way I used to have it.
There is just too much variation in the way transactions are handled to implement this at the top level. For example, the syntax "START TRANSACTION" is not portable (isn't it just "BEGIN"?),
Oops! That's easy to fix at least. If we're using the correct syntax is there still a need to add extra virtual functions?
and I believe that in general one must match each "BEGIN" with an explicit "COMMIT" or "ROLLBACK"; its not enough to just "COMMIT" or "ROLLBACK" once the nesting depth hits 1 unless you name your transactions. Additionally, some implementations (e.g. SQLite) don't support nested transactions, so the underlying impl ought to be able to throw in the case that the user requests such an operation.
The way I've got this implemented, I never have to nest transactions. I don't think any of the databases I use let you nest transactions, which is why the interface is set up the way it is. Think of it as a nested transactions simulator. Without this it's almost impossible for modules to use transactions. I like your class so I'll use it in my example:

   database d = open_database();

   transaction t(d);
   save_customer(d, c);
   save_order(d, o);
   t.commit();

save_customer() may be called in other situations and it may have transactions of its own or call other functions that do. Counting transaction depth allows you to do this on databases that don't allow nesting.
* I'd recommend a scoped_lock-like class for the transaction user interface. For the same reasons that it is not advisable to manually lock/unlock mutexes, it is not adviseable to manually begin/commit/rollback transactions. I'd suggest something like
Great idea! We could also provide a named_transaction class for platforms that support them.
* Prepared Statements. Sending data to a database is frequently done by preparing an SQL statement with placeholder arguments (e.g. "INSERT INTO foo ( col1, col2, ... ) VALUES ( ?, ?, ... )" and then binding program variables to those arguments and executing the prepared statement. Do you think it would be good to add this ability?
What is the benefit of building a query this way? Personally, I'd rather see a type-safe SQL building DSEL. What do you think?
* Binding. Some vendors provide APIs whereby query results can be directly bound to program variables. The general workflow would be something like:
double price; std::string name;
row_set results = database.execute (query);
results.bind ("name", &name); results.bind ("price", &price);
for (row_set::iterator i = results.begin (), e = results.end (); i != e && results.fetch (); ++i) { std::cout << "name=" << name << ", price=" << price << std::endl; }
This saves the variant<> conversion overhead and can be extended to include user-defined types and conversions. Would you be open to including this in the implementation?
I don't know if this would need to be wired into the implementation; I think we could easily build it on top of our interface though. The second (and more exciting to me) part of this library is going to be the serializer and it should provide facilities similar to but more powerful than this.
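To give a feel for what "build it on top" might mean, here is a toy binder that never touches the database layer itself. The row typedef and every name in it are stand-ins invented for the example:

   #include <cstdlib>
   #include <map>
   #include <string>

   // stand-in for a fetched row; the real library would expose typed fields
   typedef std::map<std::string, std::string> row;

   // remembers which program variable each column feeds; nothing here needs
   // to live inside the database layer itself
   struct binder
   {
       std::map<std::string, std::string*> strings;
       std::map<std::string, double*>      doubles;

       void bind(const std::string& column, std::string* target) { strings[column] = target; }
       void bind(const std::string& column, double* target)      { doubles[column] = target; }

       // copy the bound columns of one fetched row into the program variables
       void fetch(const row& r) const
       {
           typedef std::map<std::string, std::string*>::const_iterator str_it;
           for (str_it i = strings.begin(); i != strings.end(); ++i)
           {
               row::const_iterator f = r.find(i->first);
               if (f != r.end()) *i->second = f->second;
           }

           typedef std::map<std::string, double*>::const_iterator dbl_it;
           for (dbl_it i = doubles.begin(); i != doubles.end(); ++i)
           {
               row::const_iterator f = r.find(i->first);
               if (f != r.end()) *i->second = std::atof(f->second.c_str());
           }
       }
   };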

On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
The only ones I've used (MySQL, PostgreSQL, Access, DB2) have had random access. In all the code I've written I don't think I've done anything that required random access iterators for the database though.
Hm, I wasn't aware MySQL offered random-access result iteration facilities. I have only used the mysql_fetch_result interface, but now I see mysql_data_seek. Neat. However, random access says to me that the client must receive and consume the entire result set from the server first. This is not practical for large result sets that may not fit into available memory.
The main reason I went with random access was because it made the implementation of the iterator and my network database proxy easier.
If there's no real use for random access iterators and some widely used databases don't support them then I agree we shouldn't either.
Perhaps two classes of iterators could be provided? RandomAccess for those platforms that support them and Sequential for the ones that don't. Actually, it should be a simple matter to build RandomAccess iterators on top of Sequential ones for those vendor APIs that don't natively support RA. Personally I don't see a lot of value in the RandomAccess iterators, but some may.
Oops! That's easy to fix at least. If we're using the correct syntax is there still a need to add extra virtual functions?
I think so (see below).
The way I've got this implemented, I never have to nest transactions. I don't think any of the databases I use let you nest transactions, which is why the interface is set up the way it is. Think of it as a nested transactions simulator. Without this it's almost impossible for modules to use transactions.
Agreed. FYI, the implementation in your email will emit multiple "BEGINS" ("START TRANSACTION") but only emits a single "COMMIT" or "ROLLBACK", which I think is a bug.
I like your class so I'll use it in my example:
Thanks :-)
transaction t(d); save_customer(d, c); save_order(d,o); t.commit();
save_customer() may be called in other situations and it may have transactions of its own or call other functions that do. Counting transaction depth allows you to do this on databases that don't allow nesting.
I think it would be nice to support nested transactions on those platforms where they exist. Someone *might* have reason to use sub-transactions within a larger transaction, some of which get committed and others which get rolled back before the ultimate parent transaction's commit/rollback. For example, a complex business process might involve inserts/updates to multiple tables. Perhaps one of those operations fails in a way that is deemed harmless (e.g. inserting a Customer record that already exists), and we wish to rollback this sub-operation but continue processing the larger transaction towards an ultimate COMMIT. It seems to me that if the underlying vendor product supports this, a Boost.Database ought not to stand in the way of using it. In part because supporting it seems quite simple to do.
* Prepared Statements. Sending data to a database is frequently done by preparing an SQL statement with placeholder arguments (e.g. "INSERT INTO foo ( col1, col2, ... ) VALUES ( ?, ?, ... )" and then binding program variables to those arguments and executing the prepared statement. Do you think it would be good to add this ability?
What is the benefit of building a query this way? Personally, I'd rather see a type-safe SQL building DSEL. What do you think?
That would be a very nice addition, as long as the syntax would not be too tortured :-) However, I suspect some users will come up with baroque and complex queries that may be difficult or impossible to represent in this DSEL. As with some complex grammars in Spirit, they may just be too much for the compiler to handle without choking, or they may introduce unwanted compile-time penalties.

Supporting user-generated SQL could also ease the porting of existing code that already has queries embedded in it. At the very least, Boost.Format or something like it could be used to build query strings in a type-safe and properly-escaped-string way.

As to prepared statements, whether they be created from strings or a DSEL, they can be a major win performance-wise for applications which do lots of database writes. The overhead of parsing the SQL on each insert can become surprisingly high.
The second (and more exciting to me) part of this library is going to be the serializer and it should provide facilities similar to but more powerful than this.
I stopped myself from making that same point, as there is potentially a large overlap with Boost.Serialization as Dave A. mentioned. I do think a generalized object description framework (e.g. some form of Reflection) could be put to great use by a Database library, the Serialization library, the Langbinding code I hope will come into being some day, and any number of other applications. The possibilities are near endless. -- Caleb Epstein caleb dot epstein at gmail dot com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Caleb Epstein
Perhaps two classes of iterators could be provided? RandomAccess for those platforms that support them and Sequential for the ones that don't. Actually, it should be a simple matter to build RandomAccess iterators on top of Sequential ones for those vendor APIs that don't natively support RA.
I don't mind having two classes, but then do we have to make database a template? We've got a few choices:

1) make database a template with some sort of iteration policy
2) support only sequential iteration
3) provide only random access iteration, emulate where not available
4) provide separate begin()/end() functions for each type of iteration

I'm leaning towards 2.
FYI, the implementation in your email will emit multiple "BEGINS" ("START TRANSACTION") but only emits a single "COMMIT" or "ROLLBACK", which I think is a bug.
Oops again. It should only emit one. I'll try to invest some time tonight getting something set up in the sandbox CVS and post a fix for this and the <:: digraph.
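The fix is just a matter of only talking to the server at the depth transitions, roughly as in this sketch (transaction_depth_ and the stubbed execute() are invented names, not whatever ends up in the sandbox):

   #include <iostream>

   class database
   {
       int transaction_depth_;

       void execute(const char* sql) { std::cout << sql << '\n'; }   // stand-in

   public:

       database() : transaction_depth_(0) {}

       void start_transaction()
       {
           // only the outermost caller actually issues a statement
           if (transaction_depth_++ == 0)
               execute("BEGIN");
       }

       void commit_transaction()
       {
           if (--transaction_depth_ == 0)
               execute("COMMIT");
       }

       void rollback_transaction()
       {
           if (--transaction_depth_ == 0)
               execute("ROLLBACK");
       }
   };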
I think it would be nice to support nested transactions on those platforms where they exist. Someone *might* have reason to use sub-transactions within a larger transaction, some of which get committed and others which get rolled back before the ultimate parent transaction's commit/rollback.
For example, a complex business process might involve inserts/updates to multiple tables. Perhaps one of those operations fails in a way that is deemed harmless (e.g. inserting a Customer record that already exists), and we wish to rollback this sub-operation but continue processing the larger transaction towards an ultimate COMMIT. It seems to me that if the underlying vendor product supports this, a Boost.Database ought not to stand in the way of using it. In part because supporting it seems quite simple to do.
OK, I'm beginning to see the light. We need the transactional methods to be abstract so the implementation can determine whether to allow nesting or to emulate it.
That would be a very nice addition, as long as the syntax would not be too tortured :-) However, I suspect some users will come up with baroque and complex queries that may be difficult or impossible to represent in this DSEL. As with some complex grammars in Spirit, they may just be too much for the compiler to handle without choking or they may introduce unwanted compile-time penalties.
One thing that makes it easier to manage than Spirit is that the final result is just a string so it's easy to break these expressions up without losing valuable type information. It's really just sugar for the most part and not high on my to-do list, but it gets more interesting when used in conjunction with the serialization library or with more "interesting" types like ptime.
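To show how little machinery the "sugar" needs, a toy version can be nothing more than operators that print into a thin wrapper around std::string. Escaping and quoting are deliberately omitted, and the criterion/field names below are made up for the example rather than taken from any existing code:

   #include <sstream>
   #include <string>

   struct criterion
   {
       std::string sql;
       explicit criterion(const std::string& s) : sql(s) {}
   };

   struct field
   {
       std::string name;
       explicit field(const std::string& n) : name(n) {}
   };

   criterion operator==(const field& f, const std::string& value)
   {
       return criterion(f.name + " = '" + value + "'");   // real code must escape value
   }

   criterion operator<(const field& f, int value)
   {
       std::ostringstream out;
       out << f.name << " < " << value;
       return criterion(out.str());
   }

   criterion operator||(const criterion& lhs, const criterion& rhs)
   {
       return criterion("(" + lhs.sql + ") OR (" + rhs.sql + ")");
   }

   // field name("name"); field number("number");
   // (name == "bob" || number < 5).sql  yields  "(name = 'bob') OR (number < 5)"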
Supporting user-generated SQL could also ease the porting of existing code that already has queries embedded in it. At the very least, Boost.Format or something like it could be used to build query strings in a type-safe and properly-escaped-string way.
Agreed. Keying in SQL by hand in C++ is annoying and error prone.
As to prepared statements, whether they be created from strings or a DSEL, they can be a major win performance-wise for applications which do lots of database writes. The overhead of parsing the SQL on each insert can become suprisingly high.
I'd always thought the cost of parsing SQL to be far outweighed by the cost of executing it, but maybe I'm wrong. If this is an important, portable optimization we probably need to include it.
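If we do go that way, the user-facing part could stay very small. Something along these lines, purely hypothetical - nothing like it is in the sandbox yet and the names are invented:

   #include <string>

   class database;                // the abstract interface from database.hpp
   typedef std::string field;     // placeholder for the variant field type

   // hypothetical prepared statement interface
   class statement
   {
   public:

       // the backend parses and plans the SQL once
       statement(database& d, const std::string& sql_with_placeholders);

       // bind a value to the n-th "?" placeholder
       void bind(unsigned int index, const field& value);

       // typically executed many times with different bound values
       void execute();
   };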
I stopped myself from making that same point, as there is potentially a large overlap with Boost.Serialization as Dave A. mentioned. I do think a generalized object description framework (e.g. some form of Reflection) could be put to great use by a Database library, the Serialization library, the Langbinding code I hope will come into being some day, and any number of other applications. The possibilities are near endless.
What is the Langbinding code? I should have stopped too. Excited as I am by the serialization part, I think work on it should come after (if) there is a proposal for a Boost.Database. The serialization would certainly be easier with some sort of Reflection framework. I also anticipate a lot of simplification when Boost.Fusion is released. In any case we're probably better off waiting.

On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
What is the Langbinding code?
See http://boost-consulting.com/writing/oopsla04.html I tried to get the Suits here at work to fund this project, because I think this would be the Greatest Thing Since Sliced Bread. No money has emerged as of yet, but I'm still trying (sorry, Dave). -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein <caleb.epstein@gmail.com> writes:
On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
What is the Langbinding code?
See http://boost-consulting.com/writing/oopsla04.html
I tried to get the Suits here at work to fund this project, because I think this would be the Greatest Thing Since Sliced Bread. No money has emerged as of yet, but I'm still trying (sorry, Dave).
Don't apologize; I appreciate your efforts! -- Dave Abrahams Boost Consulting www.boost-consulting.com

Having designed the database template library (a C++ STL type interface for ODBC at http://dtemplatelib.sourceforge.net/) I thought I'd throw in a few cents about some design you'll want to think through *very* carefully. Here are some key questions to ask:

Iterator design:

ODBC supports multiple cursor types which I think are pretty typical of various kinds of databases. You want to think carefully about how you map these to STL iterator types. Let's say you do a simple mapping of forward cursor -> input iterator and scrollable cursor -> random access iterator. Immediately there are "gotchas".

input iterator:
- What happens when you copy your iterator? Should it point to the same cursor? If you do, then making a copy of the iterator and incrementing will have a "side effect" on the original iterator. If you do not, then users may be surprised when they compare two copies of the same iterator and don't get the same thing. How can two cursors be equal to begin() of a container but not have the same value?

random access iterator:
- I assume your iterators will all point to the same recordset resource - but if so, how do you avoid "thrashing"? If a and b are two iterators that point to the same scrollable recordset at different positions then you don't really want to be executing multiple database positioning operations to go back and forth between the values. This implies some kind of intelligent caching, but you want to be careful or you'll end up with more than you can handle in memory, since it is not uncommon for tables to have a million rows or more (O.K., maybe the Human Genome center is a bit larger than most in this regard :-) ).
- You'll probably also want to implement size() operations intelligently; paging through every record and counting is not so good...
- Be aware of the performance cost of scrollable recordsets (see below for more).

We also had an input + output iterator that we did to allow users to loop through a recordset and modify values.

Binding design:
- In DTL we wanted to go away from the MFC model of forcing you to inject our code into your classes. Instead we defined operators that specify the mapping relationship between columns in your query and members in your class. I think this is a good model since it keeps the database mappings from getting tangled up with the class logic. This kind of design is also used very successfully in the Java world by popular projects like Hibernate which let you define your class as you need and simply ask you to provide XML type mappings. I'm also not a big fan of frameworks that force you into a particular class design. Here is a link from DTL showing this: http://dtemplatelib.sourceforge.net/dtl_introduction.htm
Here is an example of a hibernate mapping: http://www.hibernate.org/hib_docs/v3/reference/en/html/quickstart.html#quick...

Transparent SQL:
- One of the design goals in DTL was to try not to be "smart" about SQL. The fact is that folks do all kinds of strange things in SQL like different outer join syntax, hierarchical queries, analytic functions over join partitions, stored procedures etc. I think there is a real advantage to having an iterator that can "get out of the way" and NOT try to parse what the user is doing, so they can execute these kinds of complex queries. Also, SQL is well understood by most developers, so letting them use it transparently rather than creating your own join language will significantly reduce the learning curve.
- Having said this, being able to construct select, insert, update statements for the user has advantages - it lets you create read/write containers from a set of bindings transparently and can let you work around limitations of the underlying drivers.

Binding strings:
- In ODBC, at least, this is rather ugly. You can choose to either bind a fixed size buffer (implying a maximum string size) or you can use something called GetData and PutData to transfer an arbitrary length string in blocks. If you choose the latter (which is what we did so folks could bind std::string and not have to recompile every time their database field size changed) ODBC gets annoying in that many drivers then require that such string columns come last.

Think carefully about how you want to support stored procedures:
1. A stored procedure can have input parameters, output parameters and can return multiple sets of results (and will in SQL server).

How to support user defined types?
1. Users will want to define custom types that can be mapped transparently to sets of database fields. Examples:
   - A complex data type that maps to a pair of fields.
   - A "date" type that is stored as an integer (similar to boost::time) but might be written to multiple database fields.
   - Mapping of more complicated types, graphs, sounds etc. to blobs.

Transactions:
In ODBC, at least, there is not a database independent notion of a transaction. Instead, when you commit, you commit everything open against that connection. So if you want multiple transactions you need multiple connections. We went with C++ try...catch logic to handle transactions since this closely mirrors some of the ACID concepts you need for transactions in a database. So we would have

   try {
       // a set of operations here
       commit;
   }
   catch(...) {
       rollback;
   }

More on our design for this is here: http://dtemplatelib.sourceforge.net/ExceptionHandling.htm

More later... anyway I would be interested in helping on this project but don't know how much time I will have to commit to it. I definitely like the idea of defining the base layer in terms of simple operations so that it can support both ODBC and native drivers, since I think this would be a big advantage for situations where an ODBC driver simply is not available ...

Here is some background on how the ODBC cursor types compare to the C++ iterator types and a little bit on how the mapping was done in DTL.

ODBC cursors.

The process of retrieving rows from a query result set and returning them to the application is called fetching. An application fetches data with a cursor. An ODBC cursor on a result set indicates the current position in the result set and what row of data will be returned next. In ODBC there are two main cursor types.

Forward Only Cursors:
- Can only move forward through the set of records.
- Are not guaranteed to move through the records in any particular order.
- Two different cursors against the same table using the same query may not return the same rows or same number of rows due to inserts, updates and deletes.

Scrollable Cursors:
- Can move forward and backward through the result set.
- Two different cursors against the same table using the same query may not return the same rows or same number of rows due to inserts, updates and deletes.

Scrollable cursors raise two additional questions not seen in forward-only cursors. First, how does the cursor establish some kind of order for the set of records returned?
Without some kind of order for the records, moving "forward" or "backward" through the result set does not have any meaning. Second, should the cursor detect changes to rows that have been previously fetched? In other words, when we re-position the cursor should it detect changes due to inserted, updated or deleted rows? Depending on how these questions are answered one obtains different scrollable cursor types.

Scrollable Cursors:

- Static cursor type.
  o Once the cursor has been created, the set of records is fixed regardless of any subsequent changes made to the database. Any inserts, updates or deletes by other applications to the set of records are not detected by this cursor. Usually this is implemented either by locking the underlying table so that no changes can be made by other applications while the cursor is in effect, or the set of records is viewed as of a particular version.

- Dynamic cursor type.
  o Detects any changes made to the set of records due to insert, update or delete by other applications. Usually this kind of cursor is implemented by simply ordering the records with a unique index. This cursor then re-executes a fetch every time it scrolls to a new set of rows. Suppose the original set of records is defined by "SELECT * FROM EMPLOYEES ORDER BY EMPLOYEE_ID". Then, when the user requests that this cursor scroll to the next set of rows, the cursor would execute the query "SELECT * FROM EMPLOYEES WHERE EMPLOYEE_ID > (?) ORDER BY EMPLOYEE_ID" where it would pass as a parameter the EMPLOYEE_ID from the last row in the current buffer. This kind of cursor actually creates multiple result sets due to the re-execution of the query every time a scroll operation is performed. As a result, the cursor does not establish a consistent order for the records in the sense that the "second" record in the result set might change its value and position if we scroll forward and then attempt to scroll back to this "second" record.

- Keyset driven cursor type.
  o When the cursor is opened, it immediately stores a set of unique keys for all rows in the result set. Later, when the user attempts a scroll operation, the cursor simply retrieves rows corresponding to the previously stored keys. This means that updates by other applications are detected with a new value for the row (unless the update changes the key - if another application changes the key this is equivalent to a delete and then an insert as far as a keyset cursor is concerned). Deleted records show up as "holes" with a special status flag. As for inserts made by other applications, these are not detected since these inserted rows did not appear in the original set of keys that was fetched when the cursor was first opened. This cursor does establish a consistent order for the records in the sense that if we scroll forwards and backwards the same records will appear in the same position, although they may be marked as having been deleted and may contain updated values.

- Mixed cursor.
  o This is a combination of a keyset driven cursor and a dynamic cursor. This kind of cursor is typically used when the table is too large to make it practical to save keys for all the rows in the result set. In this case, the cursor fetches keys for some subset of the rows. Inside this subset, the cursor behaves like a keyset driven cursor. Outside this subset, the cursor behaves like a dynamic cursor. A mixed cursor is equivalent to a keyset-driven cursor when the size of the key subset is equal to the result set size.
A mixed cursor is equivalent to a dynamic cursor when the size of the key subset is equal to 1.

STL Iterators.

Here we summarize some key properties of certain C++ standard iterators. For this discussion we are really interested in only three types of iterators: input iterators, output iterators and random access iterators. Input and output iterators are considered primarily because these have the smallest set of requirements of all the iterator types and so can be mapped most easily to an ODBC cursor. Random access iterators are interesting since these have the largest set of requirements and correspond most closely to the type of usage that is made of scrollable ODBC cursors. Here we summarize only the iterator properties that are important to our discussion (for full details on the iterator requirements see section 24.1 of the C++ standard).

Noteworthy Input Iterator Requirements
1. Input iterators are assignable and copy constructable.
2. Let a, b be input iterators. If a==b and (a,b) is in the domain of == then *a is equivalent to *b.
3. Let r be an input iterator. If we increment r via ++r, then after the increment any copies of the previous value of r are no longer required to be dereferenceable or in the domain of ==.
4. For input iterators a==b does not imply ++a==++b. Algorithms on input iterators should never attempt to pass through the same iterator twice. They should be single pass algorithms.
5. Let r be an input iterator. The result of *r must be convertible to the value type of the iterator.

Some comments.
- Requirement two, that the dereferenced value of a and b be 'equivalent', is vague. The standard is not clear on what 'equivalent' means, but it does not mean these two must have the same value.
- Requirement three means that conceptually, when we implement an input iterator, we are allowed to code it such that only one copy of the iterator can be "active". The reason for this is that once one copy of the iterator has been incremented, other copies become invalid.

Noteworthy Output Iterator Requirements
1. Output iterators are assignable and copy constructable.
2. Let r be an output iterator. The only valid use of operator * is on the left hand side of an assignment. After an assignment, the iterator, r, must be incremented before it can be assigned to again. Algorithms on output iterators should be single pass algorithms.

Some comments.
- Typically operator * on an output iterator is defined to return a proxy class that performs a write of the data upon assignment. This proxy might be the iterator itself or might be some other class.

How DTL Abstracts ODBC cursors as iterators.

SELECT statement, forward only cursor: Input Iterator
- In order to allow multiple copies of the iterator to be active at the same time, we chose a design wherein each iterator holds its own ODBC cursor. This allows us to have logic like a=b; ++a; ++b; and have both iterators a and b be able to return a value. In fact, we did not have to do this since the standard allows only one copy of the iterator to be active. So, under the standard it would be allowable for b to become invalid once a had been incremented.
- Note that we are taking full advantage of the standard in performing this mapping. Records are not guaranteed to come back in any particular order, nor are two copies of an iterator guaranteed to retrieve the same set of records due to the fact that inserts, updates and deletes may be performed on the set of records the select statement is operating against.
INSERT, UPDATE, DELETE statement, forward only cursor: Output Iterator
- This is fairly straightforward. The only trick here is that the execute of the statement must happen on assignment. Otherwise, loops that use post-increment logic to write to the iterator will not behave correctly.

STORED PROCEDURE, forward only cursor: Input + Output Iterator
- Mapping stored procedures to C++ iterators was less straightforward since stored procedures perform both input and output.
- Stored procedures accept arguments as input, and in this way act as an output iterator.
- Stored procedures can return results from a query and can return results via their arguments, so in this way they act as an input iterator. What is more, stored procedures can return more than one result set, which means the resulting iterator may need to act over more than one range. Therefore, we added a MoreResults() method to the sql_iterator object so that the iterator can pass through multiple result sets that might be returned by the query.
- To split the input and output for stored procedures we have sql_iterators hold two buffers. One buffer we called the data object (DataObj), which stores the result sets that come back from the query. The second buffer we call the parameter object (ParamObj), which is designed to hold parameters that are passed to the query.

Binding design:
- In DTL we wanted to go away from the MFC model of forcing you to inject our code into your classes. Instead we defined operators that specify the mapping relationship between columns in your query and members in your class. I think this is a good model since it keeps the database mappings from getting tangled up with the class logic. This kind of design is also used very successfully in the Java world by popular projects like Hibernate which let you define your class as you need and simply ask you to provide XML type mappings. I'm also not a big fan of frameworks that force you into a particular class design. Here is a link from DTL showing this: http://dtemplatelib.sourceforge.net/dtl_introduction.htm
Here is an example of a hibernate mapping:
http://www.hibernate.org/hib_docs/v3/reference/en/html/quickstart.html#quickstart-mapping

Hi Corwin,

Do you think our interface should support bi-directional iteration to avoid the pitfalls you mention with input iterators, or should we only support input iterators to support the broadest number of implementations? Do you know of any databases whose native APIs don't support bi-directional iteration?

I agree with your non-intrusive philosophy. It's been suggested that any binding/serialization is going to have something in common with Boost.Serialization, and I agree. Personally, I'd like to see if it would be possible to just get a simple database library in boost first and do the serialization/binding (which could live in Boost.Serialization just as easily as Boost.Database) later.

Transactions: In ODBC, at least, there is not a database independent notion of a transaction. Instead, when you commit, you commit everything open against that connection. So if you want multiple transactions you need multiple connections. We went with C++ try...catch logic to handle transactions since this closely mirrors some of the ACID concepts you need for transactions in a database. So we would have

You might check out the "scoped" transaction management class in the sandbox (boost-sandbox/database/transaction.hpp) that Caleb suggested:

   transaction t(database);

   //do something with database

   //if commit is not called
   //rollback is triggered
   //in ~transaction
   t.commit();

More later... anyway I would be interested in helping on this project but don't know how much time I will have to commit to it. I definitely like the idea of defining the base layer in terms of simple operations so that it can support both ODBC and native drivers since I think this would be a big advantage for situations where an ODBC driver simply is not available

A modern C++ database interface implemented across several databases and operating systems will be really useful. It's pretty simple too - this is low-hanging fruit. If you get a chance you might check out the project in boost-sandbox/database. I'm working on the documentation now (and learning quickbook) but I posted a simple test program illustrating its use a few days ago.

----- Original Message ----- From: "Brock Peabody" <brock.peabody@npcinternational.com>
Hi Corwin,
Do you think our interface should support bi-directional iteration to avoid the pitfalls you mention with input iterators, or should we only support input iterators to support the broadest number of implementations?
I think we should support both an input iterator and a random access iterator. The reason for this is that, as mentioned, for many databases by far the most efficient operation is to just retrieve the set of records via a forward-only cursor and not provide any guarantee that the order of rows is repeatable. As soon as you have to maintain a specific order for the records (which is what a bi-directional iterator implies) then the database often has to take a huge performance hit. As mentioned, some drivers implement this by downloading all records to the local computer (yuck!), locking the entire table or other ugly things. Hopefully the driver is smart and can do version tracking but it is hard to be sure. That, to me, is why logically you have two C++ iterator types that make sense: an input iterator and a random access iterator.
Do you know of any databases whose native APIs don't support bi-directional iteration?
Well, for example MS Access used to not support this via ODBC, but even if they do support it I don't think it is a good idea to *require* it for the iterator since this can have big performance costs. Also there are some kinds of result sets which may be forward only by definition. Result sets from stored procedures come to mind here.
If you get a chance you might check out the project in boost-sandbox/database.
I've browsed through it somewhat online. I need to look through the boost site docs and find out how to connect up to the sandbox via CVS.

On Behalf Of Corwin Joy
I think we should support both an input iterator and a random access iterator.
I think I understand why there are problems with any iterator more powerful than an input iterator and I agree. I don't understand what benefit there is to supporting anything besides input iterators though. I've always used databases that provided random access, for instance, but I can't think of anything I've ever done with a database that couldn't have been done with just input iterators. To keep things simple I'd suggest just supporting input iterators. If we do support random access iterators, then the database library must either be templatized or have two sets of begin/end functions, one of which may fail at runtime. I think making it a template would be better.

----- Original Message ----- From: "Brock Peabody" <brock.peabody@npcinternational.com>
I think I understand why there are problems with any iterator more powerful than an input iterator and I agree. I don't understand what benefit there is to supporting anything besides input iterators though. I've always used databases that provided random access, for instance, but I can't think of anything I've ever done with a database that couldn't have been done with just input iterators. To keep things simple I'd suggest just supporting input iterators.
I think we can start with input iterators since they are easier. Eventually, though, folks want to have things like scrollable reports and forms where random access becomes important. In DTL it wasn't until much later that we did a random access container. As long as you get fundamentals right about binding, though, it should be OK.

----- Original Message ----- From: "Brock Peabody" <brock.peabody@npcinternational.com>
I think I understand why there are problems with any iterator more powerful than an input iterator and I agree. I don't understand what benefit there is to supporting anything besides input iterators though.
O.K., so if you agree that supporting forward-only cursors makes sense and that an input iterator has guarantees that are similar, then you are ready to think about the next major decision in designing an input iterator, which is how to implement the copy operation. When you implement copy I think there are two fundamental choices:

1. When you copy an iterator, continue to point to the same underlying cursor. There is precedent for this, the istream iterators do it. But I don't much like it since programmers may be "surprised" that when they copy the iterator, incrementing the copy affects the original.

2. When you copy the iterator, cache a copy of the current row value, and on increment open a new cursor (COW pattern). This means that copies won't have "side effects" on the original iterator. However, opening a new cursor also can surprise programmers since they may expect the iterator to behave like a forward iterator and produce records in a particular order. This is what we did in DTL but it does very occasionally produce confusion when folks expect our iterator to act as a forward iterator and not just an input iterator.

Below is a snippet from the DTL mailing list where I explain further. Anyway, to me this is a major design decision and I would be interested to hear what others think.

discussion from http://groups.yahoo.com/group/DatabaseTemplateLibrary/message/1559
Paul Harris Wrote
here's what I was trying to do:
   template <class In, class Out, class Compare>
   Out copy_duplicates(In begin, In end, Out out, Compare compare)
   {
       while (begin != end)
       {
           begin = adjacent_find(begin, end, compare);
           if (begin != end)
           {
               tcout << "Found duplicate " << *begin << endl;
               *out++ = *begin++;
           }
       }
       return out;
   }
I think you are expecting select_iterator to do something here that it does not provide a guarantee for. As we say in the docs, a select iterator is just an *Input Iterator*. For a definition look at the SGI STL webpage: http://www.sgi.com/tech/stl/InputIterator.html

In particular, if you look at the notes this means that:

[1] i == j does not imply ++i == ++j.
[2] Every iterator in a valid range [i, j) is dereferenceable, and j is either dereferenceable or past-the-end. The fact that every iterator in the range is dereferenceable follows from the fact that incrementable iterators must be dereferenceable.
[3] After executing ++i, it is not required that copies of the old value of i be dereferenceable or that they be in the domain of operator==.
[4] It is not guaranteed that it is possible to pass through the same input iterator twice.

So your algorithm is not following the requirements for an input iterator. Nor should you expect distance() to return something meaningful since the records are not in a particular order. So what does this mean in terms of STL - it means that a select_iterator will work with only a few of the STL algorithms, those that promise to need only an input iterator. Unfortunately there is not an easy way to enforce this check at compile time. The guys at boost did some work a while ago where you could do some meta checks for various iterator invariants but they could only detect crude gaffes.

So then why isn't everything a RandomDBView that gives more guarantees?

1. Some ODBC drivers don't support scrollable recordsets or bulk fetches.
2. For the drivers that do support scrollable recordsets, sometimes getting a scrollable cursor extracts a huge penalty. When you ask for a scrollable cursor you are requesting a snapshot of a set of records in a particular order. On some databases, the way they handle this is to dump the entire set of records to a file on the client!! This can be very slow. Most modern databases are smarter, they track a version of the table and an internal list that is the order - this is still overhead but less.

So, I think there is a place for select_iterator, it just needs to be clear what the guarantees are (which are few). I do also agree that having an incremented copy reset is not all that nice, but at least this guarantees that a copy of the iterator can give the full recordset. The other major alternative would be to have all copies of a select iterator point to the same recordset, but this gives the nastiness that incrementing a copy of an iterator can now affect the original. This may be acceptable, though, if we think about things like istream_iterator where this is exactly what happens. Anyway, I would want to think about it carefully before I went that route.

On Behalf Of Corwin Joy
O.K., so if you agree that supporting forward-only cursors makes sense
I do. If we have to templatize the whole system anyway, it wouldn't be that big of a deal to add an iteration policy. Otherwise, I'd say that if we advertise the iterators as Input Iterators they should behave like other Input Iterators - copied iterators point to the same underlying structure.

Brock, Here is another link with some details on some of the different ODBC cursor types http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8... It's a bit SQL server specific, but if you do a google on "ODBC cursor performance forward" the same considerations apply to other databases. Anyway, it is important to understand these tradeoffs before mapping to a specific C++ iterator type.

On Behalf Of Corwin Joy
Brock, Here is another link with some details on some of the different ODBC cursor types http://msdn.microsoft.com/library/default.asp?url=/library/en- us/acdata/ac_8_con_07_3tdf.asp
It's a bit SQL server specific, but if you do a google on "ODBC cursor performance forward" the same considerations apply to other databases. Anyway, it is important to understand these tradeoffs before mapping to a specific C++ iterator type.
Thanks Corwin, The information in your first post was very informative on this issue too.

More thoughts after looking at the code:

You use std::string for exception messages, names of fields, etc. I think this is a mistake. Many databases return internationalized wide error messages and use wide strings for table and column names. Also, you will ideally want your library to return internationalized error messages. In DTL we ended up converting halfway through to support this - much better to get it right in the first place.

variant: I don't think the variant type will exactly fit what you want here. I like the discriminated union, which I would consider to be a better choice than boost::any since row allocation efficiency is an issue. One major difference, though, is that you're going to want "sticky types". The idea here is that once a field is created with a particular type (by reading what type it should be from the database) it needs to stay that type. When a user assigns, say, a string into a date field, the field must not convert into a string type but instead must cast the value to a date and throw an error if that type of conversion is not legal. Ideally, this check gets done at compile time. This "sticky type" logic is what we did in DTL when we did a variant field. Also, I see that you are binding std::string as the type for character data in your variant type. Be aware that this leads you down the slippery slope of having to support arbitrary length strings, with the associated problems that I mentioned in my previous post.

You're also going to want to bind a boost date_time type (maybe the ptime type), a wstring type, a blob type and possibly a long string type. Adding new types shouldn't be a big deal, though, so this could easily be skipped for the initial design. BUT, I think you will benefit from including a date type early on since this gives an early example of what I call "complex types", i.e. types that are held in your class but do not map directly to a primitive database type, and instead require reading into some kind of intermediate buffer and then translation to give the final type. In the case of dates, the primitive is (I think) an unsigned long, but ODBC reads dates as a big {YEAR, MONTH, DAY, HH, MM, SS} type struct which has to be read and then translated. Take a look at what we did here: http://dtemplatelib.sourceforge.net/fmtUserTypes.htm to support the ability of users to write their own binding operators to build up complex field types as needed.

Your field_description only has about 1/3 of the information you need to bind a field to a database column. Really, you need three kinds of sections for a binding (maybe in 3 classes):
1. C++ data type.
2. Database data type and related information. You can't just assume this from the C++ type. Sometimes you will need to specify extra information about the SQL type to bind non-default values for use in things like mapping to larger strings, specifying non-default precision for dates, etc. The database datatype will be driver-layer specific, i.e. different for ODBC and various native driver layers.
3. Mapping between 1 <--> 2. Can include things like the address of where to put the data, intermediate buffers that may be used to manage the mappings, possibly any conversion policy.
Here is an example snippet from DTL where we ended up having all 3 of these things in a single class:

    void *addr;                    // absolute address of C++ member data
    ptrdiff_t offset;              // relative offset of this field from base address of its Row
    SDWORD sqlType;
    SDWORD cType;
    SDWORD paramType;              // input, output or input/output type of field
    tstring name;                  // column name
    int typeId;                    // enum for type
    bool IsPrimitive;              // is this a simple POD type, or one where we first need to fetch
                                   // to an intermediate buffer then initialize, such as a date or string type
    SDWORD size;                   // SQL size
    SDWORD bufferLength;           // how big a buffer do we need to transfer chunks; this is a member here
                                   // so that users can override and set it to be large for big blobs
    CountedPtr<SDWORD> bytesFetched;
    static const SDWORD lenAtExec; // needed for PutData()
    int VariantRowIdx;             // -1: it isn't a variant row
    BoundIOs_base *pBoundIOs;      // refers to collection of bindings that this object belongs to
    BoundType bindingType;         // column or param???
    int colNo;                     // column number
    MemPtr strbuf;                 // buffer needed for use with "complex types"
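As an illustration of the "sticky type" idea, here is a rough sketch only (sticky_field and convert_to are invented names, not DTL or Boost.Database classes, and the usual Boost headers are assumed):

    // Sketch of a "sticky typed" field: the stored type is fixed when the field
    // is created from the column metadata, and later assignments convert (or
    // throw) rather than silently changing the stored type the way a plain
    // boost::variant assignment would.
    class sticky_field {
    public:
        typedef boost::variant<long, double, std::string, boost::posix_time::ptime> value_type;

        explicit sticky_field(const value_type& initial) : value_(initial) {}

        sticky_field& operator=(const value_type& v)
        {
            if (v.which() == value_.which())
                value_ = v;                              // same type: plain assignment
            else
                value_ = convert_to(value_.which(), v);  // cast, or throw if the conversion is illegal
            return *this;
        }

        const value_type& get() const { return value_; }

    private:
        static value_type convert_to(int target_which, const value_type& v);   // not shown
        value_type value_;
    };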

On Behalf Of Corwin Joy
More thoughts after looking at the code: You use std::string for exception messages, names of fields, etc. I think this is a mistake. Many databases return internationalized wide error messages and use wide strings for table and column names. Also, you will ideally want your library to return internationalized error messages. In DTL we ended up converting halfway through to support this - much better to get it right in the first place.
That's a good point, though I'd rather make the whole system templated on the type of string that is used than force everyone to use a wide string type.
variant: I don't think the variant type will exactly fit what you want here. I like the discriminated union which I would consider to be a better choice than boost::any since row allocation efficiency is an issue.
Boost.Variant is a discriminated union. It's a totally different beast than Boost.Any.
One major difference, though, is that you're going to want "sticky types".
I think this is more of an issue for a binding system. Boost.Database is more of a simple uniform interface to a database. This type of functionality could be built on top of Boost.Database.
Also, I see that you are binding std::string as the type for character data in your variant type. Be aware that this leads you down the slippery slope of having to support arbitrary length strings with the associated problems that I mentioned in my previous post.
Arbitrary length strings are a common type in databases. I understand from your post that fixed length strings can be a problem, but I don't see why arbitrarily long ones are. My vision for this is to have a very simple interface that can easily be implemented on a wide range of databases and has just enough features to solve most problems. I'd say that in the vast majority of database projects the performance bottlenecks are going to be disk or network, even with a naïve database wrapper. I can see how there could also be room for a full ODBC implementation that gives users more fine-grained control, but I think there will be a large group who just want something that's easy to use.
You're also going to want to bind a boost date_time type (maybe the ptime type), a wstring type, a blob type and possibly a long string type. Adding new types shouldn't be a big deal though so this could easily be skipped for the initial design. BUT, I think you will benefit from including a date type early on since this gives an early example of what I call "complex types", i.e. types that are held in your class but do not map directly to a primitive database type but instead require reading into some kind of intermediate buffer and then translation to give the final type.
I agree. I'd probably add posix_time::time_duration and gregorian::date too. If the database is going to be templatized anyway, it would be pretty easy to allow one of the parameters to be a type list with all of the types in the variant in it.

2. Database data type and related information. You can't just assume this from the C++ type. Sometimes you will need to specify extra information about the SQL type to bind non-default values for use in things like mapping to larger strings, specifying non-default precision for dates, etc. The database datatype will be driver-layer specific, i.e. different for ODBC and various native driver layers.
If the user is using the non-C++ data type information, what they're doing won't be database-independent. I don't want to have to worry about what the underlying type is.
3. Mapping between 1 <--> 2. Can include things like the address of where to put the data, intermediate buffers that may be used to manage the mappings, possibly any conversion policy. Here is an example snippet from DTL where we ended up having all 3 of these things in a single class:
These are the types of things I'd prefer remain hidden. Aside from the complexity they add to the interface (I don't want to have to deal with buffers, I just want it to work!), it would be hard to make all this portable.

"Brock Peabody" <brock.peabody@npcinternational.com> wrote in message news:000c01c5a8bb$7b148420$a70a10ac@npci.com...
I'd rather make the whole system templated on the type of string that is used than force everyone to use a wide string type.
Agreed. Just wanted to point this out. We did everything as a "tstring" with a compile switch to say whether you wanted wide or narrow strings and whether to link to the wide or narrow ODBC libraries.
One major difference, though, is that you're going to want "sticky types".
I think this is more of an issue for a binding system. Boost.Database is more of simple uniform interface to a database. This type of functionality could be built on top of Boost.Database.
I guess you could do this in the bindings. I'm not sure how you would get compile time checks that you are assigning an illegal value to a field if you just expose a collection of boost::variant fields tho'.
Arbitrary length strings are a common type in databases. I understand from your post that fixed length strings can be a problem, but I don't see why arbitrarily long ones are.
Actually if you re-read my earlier post my point is exactly the reverse. Fixed length strings are easy to support from an implementation point of view; it is arbitrary length strings that are hard. Supporting arbitrary length strings under ODBC comes down to using SQLGetData and SQLPutData. Using these functions creates two problems:
1. Efficiency. These are slower than binding a small fixed length string.
2. More problematic, if you call SQLGetData and SQLPutData on a column then ODBC says it may not be supported unless that string column comes last.
So, a simple query like "select name, salary from employees" will fail if you want to support arbitrary length strings, because the string column does not come last. I'm not saying that supporting arbitrary length strings is not the right way to go, I'm just saying that it has some potentially unpleasant consequences down the line. Arbitrary length strings were the way we went in DTL, but it does not work so great when users want to run arbitrary queries at runtime. For this reason we have an alternate compile switch that makes the library bind to a fixed length buffer rather than call SQLGetData and SQLPutData.

2. Database data type and related information. You can't just assume this from the C++ type. Sometimes you will need to specify extra information about the SQL type to bind non-default values for use in things like mapping to larger strings, specifying non-default precision for dates, etc. The database datatype will be driver-layer specific, i.e. different for ODBC and various native driver layers.
If the user is using the non-C++ data type information, what they're doing won't be database-independent. I don't want to have to worry about what the underlying type is.
I'm not saying that you should default construct this inside private class members, but what I am saying is that you will need to allow it to be overridden. I'll give a couple of examples:

1. In Microsoft Access there are two kinds of text fields, text and memo. Text can be up to 255 bytes long; memo can be much larger (I think up to 4k). When you bind a text column, you have to specify a SQL column length. If you specify >255 for a text column you will get an error. If you specify 255 for a memo column you will get truncated. When the user binds a std::string you don't know what kind of SQL type they are binding to. The same thing goes in Oracle when you bind a column: again you have VARCHAR for short columns and LONG VARCHAR for long text columns, so again you have two different source data types.

2. Dates and precision. By default when we bind a date we set the precision to ignore milliseconds. But for some applications they want a different precision, down to the milliseconds read.

Anyway, not having a class to hold the source SQL column type is just wrong IMO, since there are all kinds of conversion rules that may be desired and the default may not be the right one and may have to be overridden. I would recommend that you take a look at this conversion table to better understand what I am talking about with SQL type versus C++ type and the choice of conversion rules: http://www.uic.rsu.ru/doc/db/DB2CLI/db2l022.htm#HDRHDDTDCN
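To illustrate why the SQL-side information has to be specifiable, here is roughly what the raw ODBC calls look like (just the calls, stripped of the surrounding setup; hstmt, text and ts are assumed to already exist, and the column sizes are illustrative):

    // The same std::string can target two different SQL types; the driver cannot
    // tell "text" from "memo" (or VARCHAR from LONG VARCHAR) from the C++ type alone.
    SQLLEN len = SQL_NTS;
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR,
                     SQL_VARCHAR,     255,   0, (SQLPOINTER)text.c_str(), 0, &len);
    SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_CHAR,
                     SQL_LONGVARCHAR, 32000, 0, (SQLPOINTER)text.c_str(), 0, &len);

    // Same TIMESTAMP_STRUCT, different precision: the DecimalDigits argument decides
    // whether fractional seconds are kept or silently dropped.
    SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_TYPE_TIMESTAMP,
                     SQL_TYPE_TIMESTAMP, 23, 3, &ts, 0, 0);   // keep milliseconds
    SQLBindParameter(hstmt, 2, SQL_PARAM_INPUT, SQL_C_TYPE_TIMESTAMP,
                     SQL_TYPE_TIMESTAMP, 19, 0, &ts, 0, 0);   // whole seconds only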
3. Mapping between 1 <--> 2. Can include things like the address of where to put the data, intermediate buffers that may be used to manage the mappings, possibly any conversion policy. Here is an example snippet from DTL where we ended up having all 3 of these things in a single class:
These are the types of things I'd prefer remain hidden. Aside from the complexity they add to the interface (I don't want to have to deal with buffers, I just want it to work!), it would be hard to make all this portable.
Yes, definitely they should be hidden as private members, I was just pointing out that this association needs to be made.

On Behalf Of Corwin Joy
I guess you could do this in the bindings. I'm not sure how you would get compile time checks that you are assigning an illegal value to a field if you just expose a collection of boost::variant fields tho'.
Unfortunately, there really isn't any way to guarantee at compile time what the data types in a database will be at run time.
Actually if you re-read my earlier post my point is exactly the reverse. Fixed length strings are easy to support from an implementation point of view, it is arbitrary length strings that are hard.
I guess I'm still thinking of it from an interface point of view. Even if you're dealing in fixed length strings in the implementation, you can still return arbitrary length strings to the user. SQLPutData won't be a problem since we're using read-only record sets.
2. More problematic, if you call SQLGetData and SQLPutData on a column then ODBC says it may not be supported unless that string column comes last.
That's odd. I've never used a native interface that cared about column order.
I'm not saying that you should default construct this inside private class members, but what I am saying is that you will need to allow it to be overriden. I'll give a couple examples: 1. In microsoft access there are two kinds of text fields, text and memo. text can be up to 255 bytes long, memo can be much larger (I think up to 4k). When you bind a text column, you have to specify a SQL column length. If you specify >255 for a text column you will get an error. If you specify 255 for a memo column you will get truncated. When the user binds a std::string you don't know what kind of SQL type they are binding to.
The interface only provides the ability to read from record sets, which is all some native interfaces allow. When returning std::string, it doesn't matter whether the source was memo or not.

Speaking of Access: I'm pretty familiar with it too. We had been using it for 5 or 6 years via DAO and just finished converting our entire system to PostgreSQL. I didn't know about memo fields when we first set up our Access databases, and the 255 byte limit was a major source of pain for us. Now all our text fields are unbounded (and, per the PostgreSQL documentation, suffer no performance penalty). I think it's just common wisdom in programming that if you put an arbitrary size limitation on something you're asking for trouble, which is part of the reason I'm not too worried about fixed length strings.
Anyway, not having a class to hold the source SQL column type is just wrong IMO since there are all kinds of conversion rules that may be desired and the default may not be the right one and have to be overriden.
I'm still thinking about the interface. The implementation will certainly need to hang on to the SQL column type. It will be the implementation's job to choose the right conversion.
Yes, definitely they should be hidden as private members, I was just pointing out that this association needs to be made.
I think we're saying the same thing then. We definitely can't convert from database type to C++ type without this information. I was thinking that it would live in the implementations: Boost.Database.PostgreSQL, Boost.Database.MySQL, etc... and not in the interface.

----- Original Message ----- From: "Brock Peabody" <brock.peabody@npcinternational.com>
Corwin Joy wrote:
I guess you could do this in the bindings. I'm not sure how you would get compile time checks that you are assigning an illegal value to a field
Brock wrote:
Unfortunately, there really isn't any way to guarantee at compile time what the data types in a database will be at run time.
Actually what I am talking about here is the situation where the user specifies the type of the target data at compile time. But you're right, in the case of a variant row the assumption is that we don't know the type until runtime. In DTL we also support binding to specific types. So, using DTL-type syntax:

struct Employee {
    string Name;
    double Salary;
};

void BindEmployee(Employee &emp, Bindings &cols)
{
    cols["Name"]   >> emp.Name;
    cols["Salary"] >> emp.Salary;
}

In the binding syntax for DTL, when we say cols["Name"] >> emp.Name; we are really specifying 3 things:

1. Bind a SQL column called "Name" to the member Name in the Employee structure (or class).

2. Use the default SQL to C mapping, i.e. assume the source SQL column is of type SQL_VARCHAR. To do this, have a class that contains a list of default mappings that we specify like this:

void ETI_Map::build()
{
    // add to these mappings as needed ...
    (*this)[typeid(short).name()] =
        TypeTranslation(typeid(short).name(), C_SHORT, SQL_INTEGER, SQL_C_SSHORT,
                        TypeTranslation::TYPE_PRIMITIVE, sizeof(short));
    (*this)[typeid(unsigned short).name()] =
        TypeTranslation(typeid(unsigned short).name(), C_USHORT, SQL_INTEGER, SQL_C_USHORT,
                        TypeTranslation::TYPE_PRIMITIVE, sizeof(unsigned short));
    ...
}

If I did it again, I would probably stay away from RTTI since some folks don't like it. When we first designed DTL we didn't support stored procedures, so I thought this was enough. Eventually, though, we came around to the point of view that supporting stored procedures with an iterator is actually important, so we generalized the binding syntax so that

3. >> binds as an input parameter, << binds as an output parameter, and == binds as an input/output parameter.
Personally, I quite like using operators for the binding syntax rather than function calls, since I think this makes it much easier to read what is going on. You can find more examples here: http://dtemplatelib.sourceforge.net/LocalBCA.htm

Here is a second quickie example:

// simple example for Local bindings ... read a map
map<string, double> ReadDataLocalBCASingleField()
{
    typedef pair<string, double> rowtype;
    rowtype row;   // prototype row object to guide binding process

    // declare our bindings locally
    DBView<rowtype> view("EMPLOYEES",
        BCA(row,
            COLS["Name"]   >> row.first &&
            COLS["Salary"] >> row.second
        )
    );

    // copy the rows from the view into the map and return
    map<string, double> results(view.begin(), view.end());
    return results;
}
Brock Wrote
That's odd. I've never used a native interface that cared about column order.
In fact, many ODBC drivers don't care either. But some do. To quote the relevant ODBC documentation: " Using SQLGetData If the driver does not support extensions to SQLGetData, the function can return data only for unbound columns with a number greater than that of the last bound column. Furthermore, within a row of data, the value of the ColumnNumber argument in each call to SQLGetData must be greater than or equal to the value of ColumnNumber in the previous call" http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/od... It's sad but true.
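For reference, chunked retrieval of a long column with SQLGetData looks roughly like this (a sketch; it assumes hstmt is a statement positioned on a fetched row and that the long column is column 3, i.e. after all the bound columns):

    #include <sql.h>
    #include <sqlext.h>
    #include <string>
    #include <cstring>

    // Read an arbitrary-length character column in pieces.
    std::string get_long_column(SQLHSTMT hstmt)
    {
        std::string value;
        char buf[4096];
        SQLLEN ind = 0;
        for (;;) {
            SQLRETURN rc = SQLGetData(hstmt, 3, SQL_C_CHAR, buf, sizeof buf, &ind);
            if (rc == SQL_NO_DATA || !SQL_SUCCEEDED(rc) || ind == SQL_NULL_DATA)
                break;                     // finished, error, or the column was NULL
            // SQL_SUCCESS_WITH_INFO here means the buffer was filled and more data
            // remains; each chunk is null-terminated, so a full buffer carries
            // sizeof buf - 1 useful bytes.
            value.append(buf, rc == SQL_SUCCESS_WITH_INFO ? sizeof buf - 1
                                                          : std::strlen(buf));
            if (rc == SQL_SUCCESS)
                break;                     // that was the final chunk
        }
        return value;
    }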

On Behalf Of Corwin Joy
Actually what I am talking about here is the situation where the user specifies the type of the target data at compile time.
I think what we have with the binding language isn't so much type safety as it is type consistency. If you specify that two fields are related with the binding language, we can ensure at compile time that they are the same types, for example. You could continue this with the query building language:

field<std::string> name("Name");
field<double> salary("Salary");

void BindEmployee(Employee &emp, Bindings &cols)
{
    cols[name]   >> emp.Name;
    cols[salary] >> emp.Salary;
}

// compile time error
std::string criteria = make_criteria(name == salary);

Of course this is a little clumsy, especially in that you lose automatic type deduction, but maybe it would be worth it in some cases.
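A minimal sketch of how that kind of type consistency could be enforced (illustrative only; quote() is an assumed helper that formats and escapes a value as SQL text):

    std::string quote(const std::string& s);   // not shown: quotes and escapes the string
    std::string quote(double d);               // not shown: formats the number

    template <class T>
    struct field {
        typedef T value_type;
        explicit field(const std::string& n) : name(n) {}
        std::string name;
    };

    // The right-hand side is a non-deduced context, so its type is dictated by the
    // field: field<std::string> == "bob" converts fine, but field<std::string> ==
    // field<double> has no conversion and fails to compile.  operator<, operator!=
    // and so on would follow the same pattern.
    template <class T>
    std::string operator==(const field<T>& f, const typename field<T>::value_type& v)
    {
        return f.name + " = " + quote(v);
    }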
Personally, I quite like using operators for the binding syntax rather than function calls since I think this makes it much easier to read what is going on.
I agree.

Here is a second quickie example

// simple example for Local bindings ... read a map
map<string, double> ReadDataLocalBCASingleField()
{
    typedef pair<string, double> rowtype;
    rowtype row;   // prototype row object to guide binding process

    // declare our bindings locally
    DBView<rowtype> view("EMPLOYEES",
        BCA(row,
            COLS["Name"]   >> row.first &&
            COLS["Salary"] >> row.second
        )
    );

    // copy the rows from the view into the map and return
    map<string, double> results(view.begin(), view.end());
    return results;
}

We've been down some of the same paths. As it turns out almost everything I read from a database is read into a map. I got so tired of doing this that I finally wrote a map_pair_type<> template.
In my language this looks like:

const serializer<rowtype> ser = table("EMPLOYEES")
[
    field("Name")   = &rowtype::first,
    field("Salary") = &rowtype::second
];

return ser.load<map<string, double> >(database);

Very similar!

Maybe I'm missing something here :
Brock Peabody wrote:

void ETI_Map::build()
{
    // add to these mappings as needed ...
    (*this)[typeid(short).name()] =
        TypeTranslation(typeid(short).name(), C_SHORT, SQL_INTEGER, SQL_C_SSHORT,
                        TypeTranslation::TYPE_PRIMITIVE, sizeof(short));
but are you using typeid(short).name() as a map key? I think the standard doesn't guarantee uniqueness for type_info::name(). -- Sebastian Galkin

----- Original Message ----- From: "Sebastian Galkin" <paraseba@gmail.com>
Maybe I'm missing something here :
Corwin Joy showed some example code that associates binding conversion rules with types from DTL using RTTI names as the key (snip)
but are you using typeid(short).name() as a map key? I think the standard doesn't guarantee uniqueness for type_info::name().
This is what I did in DTL, but I don't advocate using RTTI for maintaining the map of conversion rules here (though it worked well enough, since every compiler I have seen will consistently return the same result when called on basic types). I think it is cleaner to simply have typed binding functions. So what I would advocate is a set of binding functions like this:

class Binding {
public:
    enum CType  { /* ... */ };
    enum DBType { /* ... */ };

    CType  ctype;
    DBType dbtype;
    PhysicalBufferInfo physical;

    // ... one overload per supported C++ type (string, date, etc.)
    Binding& operator>>(double &member)
    {
        ctype    = TargetType(member);
        dbtype   = DefaultDBType(member);
        physical = MakeBuffer(member);
        return *this;
    }
};

class Bindings : public map<tstring, Binding> {
    // key = field name in the SQL table that the binding is associated with
    // tstring = wide or narrow string
};

So, as before this gets used to bind a field via

struct Employee {
    string Name;
    double Salary;
};

void MakeBindings(Employee &emp, Bindings &cols)
{
    cols["Name"]   >> emp.Name;
    cols["Salary"] >> emp.Salary;
}

Notice a couple of things about how I propose to do the binding syntax:

1. The binding functions are directly declared and do not use a template syntax. When I did DTL I first declared binding functions via a template syntax, but I found that compilers unevenly supported the "explicit" keyword and I would get unwanted conversions affecting which binding function was called. I therefore think it is better to just declare the bindings explicitly.

2. The binding stores:
- the name of the bound field in the database,
- the type of the C++ field that is being bound,
- the type of the SQL field to bind to (this type will have to be further specialized by each type of DB layer, but maybe we could start with a set of generic types in a base class that can be further specialized - or perhaps we must template this on the driver type as well),
- the address of where to write the data, and the size of the physical buffer that is available (size only comes into play if one wants to store fixed length strings - but there may be other things needed here as well).

3. The binding information is stored in an "enum" type fashion rather than templating on the member being bound, so that these bindings can be created "on-the-fly" at runtime for queries which don't know the types at compile time.
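To make point 3 concrete, here is a rough sketch of how such a Bindings collection might drive the actual ODBC binding at run time (ctype_as_odbc() and the physical.* members are invented names standing in for whatever the real binding class would store, and binding columns in map order is a simplification):

    // Walk the bindings and attach each one to its column with SQLBindCol.
    // Because the type information lives in ordinary data members rather than
    // template parameters, this loop works for rows whose layout was only
    // discovered from the database at run time.
    void bind_all(SQLHSTMT hstmt, Bindings& bindings)
    {
        SQLUSMALLINT col = 1;
        for (Bindings::iterator i = bindings.begin(); i != bindings.end(); ++i, ++col) {
            Binding& b = i->second;
            SQLBindCol(hstmt, col,
                       b.ctype_as_odbc(),     // e.g. SQL_C_DOUBLE, SQL_C_CHAR, ...
                       b.physical.address,    // where the driver writes the data
                       b.physical.size,       // size of that buffer
                       b.physical.indicator); // length / NULL indicator, filled in by the driver
        }
    }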

Caleb Epstein wrote:
On 8/18/05, Brock Peabody <brock.peabody@npcinternational.com> wrote:
The only ones I've used (MySQL, PostgreSQL, Access, DB2) have had random access. In all the code I've written I don't think I've done anything that required random access iterators for the database though.
Hm, I wasn't aware MySQL offered random-access result iteration facilities. I have only used the mysql_fetch_result interface, but
mysql_use_result() presumably?
now I see mysql_data_seek. Neat. However, random access says to me that the client must receive and consume the entire result set from the server first. This is not practical for large result sets that may not fit into available memory.
Yes, you have to use mysql_store_result() which copies the entire result set into the client's address space, and allows seeking to arbitrary rows (but I *think* mysql_data_seek() is like std::advance() on forward iterators - not true random access - don't quote me on that though. It might depend whether the rows in the result are a fixed length or not.)
Personally I don't see a lot of value in the RandomAccess iterators, but some may.
Agreed. Despite supporting it in my mysql wrappers, I've never needed it. jon
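For anyone who hasn't used the MySQL C API, the difference between the two modes looks roughly like this (error handling omitted; conn and the products table are assumed for illustration):

    #include <mysql.h>
    #include <cstdio>

    void dump_products(MYSQL* conn)
    {
        /* Streaming: rows come from the server one at a time, only forward
           iteration is possible, and very little client memory is used. */
        mysql_query(conn, "SELECT name, price FROM products");
        MYSQL_RES* res = mysql_use_result(conn);
        while (MYSQL_ROW row = mysql_fetch_row(res))
            std::printf("%s %s\n", row[0], row[1]);
        mysql_free_result(res);

        /* Buffered: the whole result set is copied into client memory, which is
           what makes mysql_data_seek() - and anything resembling random access -
           possible at all. */
        mysql_query(conn, "SELECT name, price FROM products");
        res = mysql_store_result(conn);
        mysql_data_seek(res, mysql_num_rows(res) - 1);   /* jump to the last row */
        MYSQL_ROW last = mysql_fetch_row(res);
        std::printf("last: %s %s\n", last[0], last[1]);
        mysql_free_result(res);
    }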

On Behalf Of Caleb Epstein
* Binding. Some vendors provide APIs whereby query results can be directly bound to program variables. The general workflow would be something like:
double price;
std::string name;

row_set results = database.execute(query);

results.bind("name", &name);
results.bind("price", &price);

for (row_set::iterator i = results.begin(), e = results.end();
     i != e && results.fetch(); ++i)
{
    std::cout << "name=" << name << ", price=" << price << std::endl;
}
This saves the variant<> conversion overhead and can be extended to include user-defined types and conversions. Would you be open to including this in the implementation?
I don't know if this would need to be wired into the implementation; I think we could easily build it on top of our interface though.
If you are careful about how you do the bindings class, and you keep it independent of what kind of row object you are binding to, then you can have it easily bind to either a statically defined struct to hold the row or a variant_type row that holds dynamic types. Either way, I think it is definitely important to have non-variant type binding because:
1. You often want to bind to a particular type to hold your result to the correct precision.
2. Speed. It has not been mentioned yet, but an important feature to have is bulk operations, since these can often be 100x faster than operating row-by-row. To do this, drivers typically bind to a block of known types.
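To illustrate point 2, a column-wise bulk fetch under ODBC looks roughly like this (a sketch only; hstmt and the products table are assumed for illustration):

    #include <sql.h>
    #include <sqlext.h>

    // Fetch the price column 100 rows per round trip instead of one row per call.
    double sum_prices(SQLHSTMT hstmt)
    {
        enum { ROWS = 100 };
        double  prices[ROWS];
        SQLLEN  ind[ROWS];
        SQLULEN fetched = 0;
        double  total = 0;

        SQLSetStmtAttr(hstmt, SQL_ATTR_ROW_ARRAY_SIZE, (SQLPOINTER)(SQLULEN)ROWS, 0);
        SQLSetStmtAttr(hstmt, SQL_ATTR_ROWS_FETCHED_PTR, &fetched, 0);
        SQLBindCol(hstmt, 1, SQL_C_DOUBLE, prices, 0, ind);

        SQLExecDirect(hstmt, (SQLCHAR*)"SELECT price FROM products", SQL_NTS);
        while (SQL_SUCCEEDED(SQLFetch(hstmt)))          // each SQLFetch fills up to ROWS slots
            for (SQLULEN i = 0; i != fetched; ++i)
                if (ind[i] != SQL_NULL_DATA)
                    total += prices[i];
        return total;
    }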

Jonathan Wakely wrote:
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
In DTL, Paul Harris actually just created a version of boost::optional to handle NULL values, which worked very well as a way to read and write types that could be null. He also did mappings of the boost::posix_time class, and we also used a version of the boost::variant class to provide dynamically bound types (where you don't know the types until runtime).

Corwin Joy wrote:
Jonathan Wakely wrote:
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
In DTL, Paul Harris actually just created a version of boost::optional to handle NULL values, which worked very well as a way to read and write types that could be null. He also did mappings of the boost::posix_time class, and we also used a version of the boost::variant class to provide dynamically bound types (where you don't know the types until runtime).
Hi, I'll attempt to be even a fraction as informative as Corwin was ... I'm the guy responsible for the optional variant mentioned above.

I couldn't directly use boost::optional<> because binders need to be given a memory address to write data to, even if it's going to be NULL. I wasn't sure how to do that with boost::optional, as it works with uninitialised memory and other such stuff. I assume you can do it, but I didn't have the time to figure it out. So I created dtl::nullable<>, where the underlying value is always available at some memory address for binding, but the DTL must then go back over all the nullables and set their is_null flag. This is a key part to performance - the advantage of binding to raw memory is speed, but if you have more complex binders, then you are going to have to go back over what you have retrieved and do some processing.

Another example of this is my dtl_posix_time, which is a variant on boost::posix_time::ptime. You can't bind to ptime as it doesn't have the right memory signature... so I embed both a ptime and an SQL date variable into dtl_posix_time, and then DTL will import/export the ptime to the SQL version before and after each SQL operation. This is an intrusive solution, which is why I don't really like it. I would rather use the real data types instead of "sql-enabled" types. But they work with the DTL, so mission accomplished :) An alternative is to select into a buffer area and then copy-construct (or whatever) into the real instances. I assume this would work, but now you have to deal with bulk selects into bound pre-allocated buffers, buffer management, etc.

Another problem that has cropped up is the handling of NULL or empty strings. In the DTL, a NULL VARCHAR(10) can be bound to a dtl::nullable<dtl::tcstring<10> > field; then there is a distinction between an empty string and a NULL string. But the distinction becomes a bit blurred when you talk about NULL TEXT columns. IIRC, dtl::nullable<std::string> doesn't currently work correctly, as either the DTL or the ODBC drivers assume that an empty string means NULL (I forget what the problem was). This will be fixed eventually, but it's something you will have to be careful about too.

Oh, and I'd like to say that I always select everything that's available, so I'm not terribly interested in input_iterators (it seems that everyone else is). This is because I usually do work on ALL the data in the database and produce some other set of data (that is usually not a 1-to-1 transformation) and I need to do it fast and often. I used to do more work in the SQL server, but that has evolved to the point where the SQL server is just a place to persist data. I suppose I could just serialise the data instead of using a database, however I also need to support multiple networked users working on the same dataset. Fortunately at this stage, the dataset isn't tremendous.

Hope this helps, Paul
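A rough sketch of the idea Paul describes (illustrative names only, not the actual dtl::nullable):

    #include <sql.h>   // for SQLLEN and SQL_NULL_DATA

    // The value always lives at a fixed address so the driver has somewhere to
    // write even when the column is NULL; the flag is fixed up after the fetch.
    template <class T>
    class nullable {
    public:
        nullable() : value_(), indicator_(0), is_null_(true) {}

        T*      bind_address()      { return &value_; }      // handed to the binder (e.g. SQLBindCol)
        SQLLEN* indicator_address() { return &indicator_; }  // driver writes SQL_NULL_DATA here

        // the "go back over all the nullables" pass, run once per fetched row
        void post_fetch() { is_null_ = (indicator_ == SQL_NULL_DATA); }

        bool     is_null() const { return is_null_; }
        const T& get()     const { return value_; }          // only meaningful if !is_null()

    private:
        T      value_;
        SQLLEN indicator_;
        bool   is_null_;
    };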

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jonathan Wakely
I haven't looked at all the code yet, but I like the interface and what I've seen so far - I'm definitely interested in this library,
Thanks!
2) Boost.Database.Postgresql - an implementation of the above interface for postgresql
I have an implementation of your database library for MySQL. It's on my hard drive at home and I'm broadband-less right now so if anyone wants to see it let me know and I'll dial up tonight. It was very easy to adapt to use the MySQL API.
I'd like to see it when you get a chance.
postgresql/connect.cpp accidentally uses the <:: trigraph in the typedefs at the top of the file. You need a space before the template args to avoid that.
Maybe I should put the files in the sandbox? Did you happen to make Boost.Jam files to compile the test?
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
I was thinking about this some more. Long ago, where I work we had bad experiences with once working programs crashing when they tried to read null fields (via DAO) when we really just wanted default values (0s and empty strings). Ever since then when I've written a wrapper for a database library I've always provided automatic conversion of nulls to default values. Now that you've brought this up, however, I can see that not only are nulls important to some users, but that they will also be important for the serialization library when serializing boost::optionals. The method I'm using now is not so good.

Brock Peabody wrote:
On Behalf Of Jonathan Wakely
I have an implementation of your database library for MySQL. It's on my hard drive at home and I'm broadband-less right now so if anyone wants to see it let me know and I'll dial up tonight. It was very easy to adapt to use the MySQL API.
I'd like to see it when you get a chance.
Sure, I'll post it tonight or tomorrow.
postgresql/connect.cpp accidentally uses the <:: trigraph in the typedefs at the top of the file. You need a space before the template args to avoid that.
Maybe I should put the files in the sandbox? Did you happen to make Boost.Jam files to compile the test?
No, I'm fairly proficient with make and completely ignorant of bjam so I knocked up a Makefile that would let me test your impl and mine with either "make IMPL=postgresql" or "make IMPL=mysql" (and is easily extended to other impl's).
How do you handle NULL values? the postgresql::row_set_imp::get() function doesn't seem to account for NULL, which would crash strlen(), atof() etc.
I was thinking about this some more. Long ago, where I work we had bad experiences with once working programs crashing when they tried to read null fields (via DAO) when we really just wanted default values (0s and empty strings). Ever since then when I've written a wrapper for a database library I've always provided automatic conversion of nulls to default values.
Now that you've brought this up, however, I can see that not only are nulls important to some users, but that they will also be important for the serialization library when serializing boost::optionals. The method I'm using now is not so good.
Yes, it's essential to distinguish NULL in some situations (particularly if you've got left joins in the query) jon -- "Some men are born mediocre, some men achieve mediocrity, and some men have mediocrity thrust upon them." - Joseph Heller

On 8/18/05 4:57 AM, "Jonathan Wakely" <cow@compsoc.man.ac.uk> wrote:
postgresql/connect.cpp accidentally uses the <:: trigraph in the typedefs at the top of the file. You need a space before the template args to avoid that.
I think you mean the "<:" digraph. It represents the "[" (opening square bracket) character. The official term is "alternative token" (see section 2.5 of the standard), because some of the tokens have more than two characters. (These include the bitwise and Boolean word operators, i.e. "and".) They're considered after the preprocessor (phase 3 of section 2.1), and trigraphs are done earlier than that (in phase 1). -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
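A two-line illustration of the problem (the ::db::row name is made up):

    typedef std::vector<::db::row> rows;    // "<:" is the digraph for "[", so this line is ill-formed
    typedef std::vector< ::db::row> rows;   // the extra space keeps "<" and "::" apart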

"Brock Peabody" <brock.peabody@npcinternational.com> writes:
2) With this done it's not too hard to write a non-intrusive DSEL for binding on top of it. I wrote one that supports dependent tables, automatic mapping to STL containers, boost::optional, and common data types like boost::posix_time::ptime. A simple example of the language might look like:
table("customer")
[
    *field("id")    = &customer::id,
    field("name")   = &customer::name,
    field("number") = &customer::number,

    table("tax_exempt")
    [
        -field("customer"),
        field("state_id") = &tax_exempt_data::state_id
    ] = &customer::tax_exempt
];
This feels like it has some overlap with the serialization library. No? -- Dave Abrahams Boost Consulting www.boost-consulting.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams
"Brock Peabody" <brock.peabody@npcinternational.com> writes:
2) With this done it's not too hard to write a non-intrusive DSEL for binding on top of it. I wrote one that supports dependent tables, automatic mapping to STL containers, boost::optional, and common data types like boost::posix_time::ptime. A simple example of the language might look like:
table("customer")
[
    *field("id")    = &customer::id,
    field("name")   = &customer::name,
    field("number") = &customer::number,

    table("tax_exempt")
    [
        -field("customer"),
        field("state_id") = &tax_exempt_data::state_id
    ] = &customer::tax_exempt
];
This feels like it has some overlap with the serialization library. No?
I haven't studied Boost.Serialization in detail yet, but I'd say they should overlap in at least a few ways: 1) Objects that support this interface should automatically support Boost.Serialization. This will be useful if your objects live in databases but are sent over sockets, for instance. 2) This library should use the same syntax as Boost.Serialization when doing the same things. 3) The implementation may be able to exploit Boost.Serialization for some compositions such as containers and pointers (and optional and variant!) but I'm not sure. Maybe the overlap is that the new library uses boost.serialization in its implementation and supports a superset of its syntax?
participants (9): Brian Braatz, Brock Peabody, Caleb Epstein, Corwin Joy, Daryle Walker, David Abrahams, Jonathan Wakely, Paul, Sebastian Galkin