
Having designed the database template library (a C++ STL-type interface for ODBC at http://dtemplatelib.sourceforge.net/) I thought I'd throw in a few cents about some design you'll want to think through *very* carefully. Here are some key questions to ask:

Iterator design:
ODBC supports multiple cursor types, which I think are pretty typical of various kinds of databases. You want to think carefully about how you map these to STL iterator types. Let's say you do a simple mapping of forward cursor -> input iterator and scrollable cursor -> random access iterator. Immediately there are "gotchas".

input iterator:
- What happens when you copy your iterator? Should it point to the same cursor? If it does, then making a copy of the iterator and incrementing will have a "side effect" on the original iterator. If it does not, then users may be surprised when they compare two copies of the same iterator and don't get the same thing. How can two cursors be equal to begin() of a container but not have the same value?

random access iterator:
- I assume your iterators will all point to the same recordset resource - but if so, how do you avoid "thrashing"? If a and b are two iterators that point to the same scrollable recordset at different positions, then you don't really want to be executing multiple database positioning operations to go back and forth between the values. This implies some kind of intelligent caching, but you want to be careful or you'll end up with more than you can handle in memory, since it is not uncommon for tables to have a million rows or more (O.K., maybe the Human Genome center is a bit larger than most in this regard :-) ).
- You'll probably also want to implement size() operations intelligently; paging through every record and counting is not so good...
- Be aware of the performance cost of scrollable recordsets (see below for more).

We also had an input + output iterator that we wrote to allow users to loop through a recordset and modify values.
Binding design:
- In DTL we wanted to get away from the MFC model of forcing you to inject our code into your classes. Instead we defined operators that specify the mapping relationship between columns in your query and members in your class. I think this is a good model since it keeps the database mappings from getting tangled up with the class logic. This kind of design is also used very successfully in the Java world by popular projects like Hibernate, which let you define your class as you need and simply ask you to provide XML type mappings. I'm also not a big fan of frameworks that force you into a particular class design.
Here is a link from DTL showing this: http://dtemplatelib.sourceforge.net/dtl_introduction.htm
Here is an example of a Hibernate mapping: http://www.hibernate.org/hib_docs/v3/reference/en/html/quickstart.html#quick...

Transparent SQL:
- One of the design goals in DTL was to try not to be "smart" about SQL. The fact is that folks do all kinds of strange things in SQL: different outer join syntax, hierarchical queries, analytic functions over join partitions, stored procedures etc. I think there is a real advantage to having an iterator that can "get out of the way" and NOT try to parse what the user is doing, so they can execute these kinds of complex queries. Also, SQL is well understood by most developers, so letting them use it transparently rather than creating your own join language will significantly reduce the learning curve.
- Having said this, being able to construct select, insert and update statements for the user has advantages - it lets you create read/write containers from a set of bindings transparently and can let you work around limitations of the underlying drivers.

Binding strings:
- In ODBC, at least, this is rather ugly. You can choose to either bind a fixed-size buffer (implying a maximum string size) or you can use something called GetData and PutData to transfer an arbitrary-length string in blocks.
If you choose the latter (which is what we did, so folks could bind std::string and not have to recompile every time their database field size changed), ODBC gets annoying in that many drivers then require that such string columns come last.

Think carefully about how you want to support stored procedures:
1. A stored procedure can have input parameters, output parameters and can return multiple sets of results (and will in SQL Server).

How to support user-defined types?
1. Users will want to define custom types that can be mapped transparently to sets of database fields. Examples:
- A complex data type that maps to a pair of fields.
- A "date" type that is stored as an integer (similar to boost::time) but might be written to multiple database fields.
- Mapping of more complicated types, graphs, sounds etc. to blobs.

Transactions:
In ODBC, at least, there is not a database-independent notion of a transaction. Instead, when you commit, you commit everything open against that connection. So if you want multiple transactions you need multiple connections. We went with C++ try...catch logic to handle transactions since this closely mirrors some of the ACID concepts you need for transactions in a database. So we would have

  try {
    // a set of operations here
    commit;
  } catch(...) {
    rollback;
  }

More on our design for this is here: http://dtemplatelib.sourceforge.net/ExceptionHandling.htm

More later... anyway I would be interested in helping on this project but don't know how much time I will have to commit to it. I definitely like the idea of defining the base layer in terms of simple operations so that it can support both ODBC and native drivers, since I think this would be a big advantage for situations where an ODBC driver simply is not available ...

Here is some background on how the ODBC cursor types compare to the C++ iterator types and a little bit on how the mapping was done in DTL.

ODBC cursors.
The process of retrieving rows from a query result set and returning them to the application is called fetching. An application fetches data with a cursor. An ODBC cursor on a result set indicates the current position in the result set and what row of data will be returned next. In ODBC there are two main cursor types.

Forward Only Cursors:
- Can only move forward through the set of records.
- Are not guaranteed to move through the records in any particular order.
- Two different cursors against the same table using the same query may not return the same rows or the same number of rows, due to inserts, updates and deletes.

Scrollable Cursors:
- Can move forward and backward through the result set.
- Two different cursors against the same table using the same query may not return the same rows or the same number of rows, due to inserts, updates and deletes.

Scrollable cursors raise two additional questions not seen in forward-only cursors. First, how does the cursor establish some kind of order for the set of records returned? Without some kind of order for the records, moving "forward" or "backward" through the result set does not have any meaning. Second, should the cursor detect changes to rows that have been previously fetched? In other words, when we re-position the cursor should it detect changes due to inserted, updated or deleted rows? Depending on how these questions are answered one obtains different scrollable cursor types.

Scrollable Cursors:
- Static cursor type.
  o Once the cursor has been created, the set of records is fixed regardless of any subsequent changes made to the database. Any inserts, updates or deletes by other applications to the set of records are not detected by this cursor. Usually this is implemented either by locking the underlying table so that no changes can be made by other applications while the cursor is in effect, or by viewing the set of records as of a particular version.
- Dynamic cursor type.
  o Detects any changes made to the set of records due to inserts, updates or deletes by other applications. Usually this kind of cursor is implemented by simply ordering the records with a unique index. The cursor then re-executes a fetch every time it scrolls to a new set of rows. Suppose the original set of records is defined by "SELECT * FROM EMPLOYEES ORDER BY EMPLOYEE_ID". Then, when the user requests that this cursor scroll to the next set of rows, the cursor would execute the query "SELECT * FROM EMPLOYEES WHERE EMPLOYEE_ID > (?) ORDER BY EMPLOYEE_ID", passing as a parameter the EMPLOYEE_ID from the last row in the current buffer. This kind of cursor actually creates multiple result sets due to the re-execution of the query every time a scroll operation is performed. As a result, the cursor does not establish a consistent order for the records, in the sense that the "second" record in the result set might change its value and position if we scroll forward and then attempt to scroll back to this "second" record.
- Keyset driven cursor type.
  o When the cursor is opened, it immediately stores a set of unique keys for all rows in the result set. Later, when the user attempts a scroll operation, the cursor simply retrieves rows corresponding to the previously stored keys. This means that updates by other applications are detected with a new value for the row (unless the update changes the key - if another application changes the key this is equivalent to a delete followed by an insert as far as a keyset cursor is concerned). Deleted records show up as "holes" with a special status flag. As for inserts made by other applications, these are not detected, since the inserted rows did not appear in the original set of keys that was fetched when the cursor was first opened.
This cursor does establish a consistent order for the records, in the sense that if we scroll forwards and backwards the same records will appear in the same position, although they may be marked as having been deleted and may contain updated values.
- Mixed cursor.
  o This is a combination of a keyset driven cursor and a dynamic cursor. This kind of cursor is typically used when the table is too large to make it practical to save keys for all the rows in the result set. In this case, the cursor fetches keys for some subset of the rows. Inside this subset, the cursor behaves like a keyset driven cursor. Outside this subset, the cursor behaves like a dynamic cursor. A mixed cursor is equivalent to a keyset-driven cursor when the size of the key subset is equal to the result set size, and equivalent to a dynamic cursor when the size of the key subset is equal to 1.

STL Iterators.

Here we summarize some key properties of certain C++ standard iterators. For this discussion we are really interested in only three types of iterators: input iterators, output iterators and random access iterators. Input and output iterators are considered primarily because these have the smallest set of requirements of all the iterator types and so can be mapped most easily to an ODBC cursor. Random access iterators are interesting since these have the largest set of requirements and correspond most closely to the type of usage that is made of scrollable ODBC cursors. Here we summarize only the iterator properties that are important to our discussion (for full details on the iterator requirements see section 24.1 of the C++ standard).

Noteworthy Input Iterator Requirements
1. Input iterators are assignable and copy constructible.
2. Let a, b be input iterators. If a==b and (a,b) is in the domain of == then *a is equivalent to *b.
3. Let r be an input iterator.
If we increment r via ++r, then after the increment any copies of the previous value of r are no longer required to be dereferenceable or in the domain of ==.
4. For input iterators, a==b does not imply ++a==++b. Algorithms on input iterators should never attempt to pass through the same iterator twice; they should be single-pass algorithms.
5. Let r be an input iterator. The result of *r must be convertible to the value type of the iterator.

Some comments.
- Requirement two, that the dereferenced values of a and b be 'equivalent', is vague. The standard is not clear on what 'equivalent' means, but it does not mean the two must have the same value.
- Requirement three means that, conceptually, when we implement an input iterator we are allowed to code it such that only one copy of the iterator can be "active". The reason for this is that once one copy of the iterator has been incremented, other copies become invalid.

Noteworthy Output Iterator Requirements
1. Output iterators are assignable and copy constructible.
2. Let r be an output iterator. The only valid use of operator * is on the left hand side of an assignment. After an assignment, the iterator, r, must be incremented before it can be assigned to again. Algorithms on output iterators should be single-pass algorithms.

Some comments.
- Typically operator * on an output iterator is defined to return a proxy class that performs a write of the data upon assignment. This proxy might be the iterator itself or might be some other class.

How DTL Abstracts ODBC cursors as iterators.

SELECT statement, forward only cursor: Input Iterator
- In order to allow multiple copies of the iterator to be active at the same time, we chose a design wherein each iterator holds its own ODBC cursor. This allows us to have logic like a=b; ++a; ++b; and have both iterators a and b be able to return a value. In fact, we did not have to do this, since the standard allows only one copy of the iterator to be active.
So, under the standard it would be allowable for b to become invalid once a had been incremented.
- Note that we are taking full advantage of the standard in performing this mapping. Records are not guaranteed to come back in any particular order, nor are two copies of an iterator guaranteed to retrieve the same set of records, due to the fact that inserts, updates and deletes may be performed on the set of records the select statement is operating against.

INSERT, UPDATE, DELETE statement, forward only cursor: Output Iterator
- This is fairly straightforward. The only trick here is that the execute of the statement must happen on assignment. Otherwise, loops that use post-increment logic to write to the iterator will not behave correctly.

STORED PROCEDURE, forward only cursor: Input + Output Iterator
- Mapping stored procedures to C++ iterators was less straightforward since stored procedures perform both input and output.
- Stored procedures accept arguments as input, and in this way act as an output iterator.
- Stored procedures can return results from a query and can return results via their arguments, so in this way they act as an input iterator. What is more, stored procedures can return more than one result set, which means the resulting iterator may need to act over more than one range. Therefore, we added a MoreResults() method to the sql_iterator object so that the iterator can pass through multiple result sets that might be returned by the query.
- To split the input and output for stored procedures we have sql_iterators hold two buffers. One buffer we call the data object (DataObj), which stores the result sets that come back from the query. The second buffer we call the parameter object (ParamObj), which is designed to hold parameters that are passed to the query.