Requests for comments on a (partly) hypothetical non-relational serialization library

Hi all: I've been working for a while on a variety of tools to facilitate application development in normal (cross-platform) C++, and avoid the byzantine dependency chains (including needing multiple boost versions) which so often creep in because real applications always seem to piece together disparate parts with different build systems, requirements, even how to download the source code...pretty soon you're not programming C++ anymore, you're tinkering with Python make scripts, or Perl code generators, or learning Git or Subversion ... know what I'm saying?
Anyhow, I'm a fan of the Mongo database, but it's notoriously hard to build even the drivers, and not really suited for simple SQLite-like object serialization for persistence between runs of an application (even though this is theoretically possible, it is poorly documented and still requires linking against the entire Mongo system). So I've decided to develop a serialization framework (not a database) with some "NoSQL" features based on Mongo, but a lot easier to use. I believe this framework could provide a foundation upon which useful, moderately complex C++ applications could be designed, by providing extensions to the library which are optional to use but which incorporate my work (I hope that doesn't sound pedantic) on general application development, without extra external dependencies. Specifically, these extensions would include:
1) A tool for generating GUI code -- for wxWidgets, in particular -- from archives that could be edited with a simple textual front-end, vaguely like XAML;
2) A custom language based on Clojure -- a Lisp dialect originally implemented by Rich Hickey on the JVM -- for expressing queries and importing/exporting data from/to an archive;
3) Perl6-like regular expressions for matching against textual fields in an archive;
4) AI-inspired algorithms for sorting, filtering, and in other ways operating on archives.
My academic background is in AI -- actually, to be precise, I wrote a doctoral dissertation in the philosophy of science, but I researched AI in this context -- but I'm especially interested in nonrelational database theory because it better captures the process of modeling complex systems, and, in general, nonrelational databases are more interesting from an AI perspective because the lack of a fixed schema means that operations like sorting and filtering can require some "reasoning". I'm particularly interested in application development because I think one concrete application of AI research is to make tools like IDEs smarter. A non-relational serialization library could potentially serve the application development process not only by providing an easy way to persist data, but through IDE extensions or project generators -- store lists of debug breakpoints in an archive, or parse source code for namespaces, types, etc., and store the results in an archive, or an archive to represent all the controls in a GUI...
The library I have in mind would differ from boost.serialization by providing explicit support for non-relational functionality, and also by using a restricted type system along the lines of MongoDB and JSON: any persistable data field would have to be marshalled into one of a few predefined types, although users could explicitly extend the type system if desired. Aside from writing persistence code directly in the C++ source (along the lines of, e.g., instantiating a serialize() template in namespace boost::serialization), the test or demo applications I've been writing use external files, written in the (currently very minimal) Clojure-like language I mentioned above, and an interpreter does the actual serialization -- so the persistence strategy could be altered without recompiling the application, even while it is running. I think this offers new potential for using AI-style algorithms for things like tracking usage patterns, because all of that could be implemented fully orthogonal to the application itself.
So, that's the project I've sort of assigned myself, and I would appreciate any comments and ideas on what I could do to make this the kind of library C++ programmers would consider trying out. Thanks in advance.
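To make the "restricted type system" idea concrete, here is a minimal sketch in present-day C++17. It is an illustration under assumptions, not code from the project described above: the names Value, Document, Breakpoint, to_document and to_text are all hypothetical, nesting (arrays and sub-documents, as in JSON or MongoDB) is left out for brevity, and the breakpoint record is borrowed from the IDE example in the post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <type_traits>
#include <variant>

// All names below are hypothetical; none come from the library in the post.

// The closed set of types a persistable field may take (no nesting here).
using Value = std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// A schemaless record: field names mapped to restricted values.
using Document = std::map<std::string, Value>;

// An application type that is not itself one of the predefined types.
struct Breakpoint {
    std::string file;
    int line;
    bool enabled;
};

// "Extending the type system": marshal the user type into restricted values.
Document to_document(const Breakpoint& bp) {
    assert(bp.line > 0);  // assumption: line numbers are 1-based
    return Document{
        {"file", bp.file},
        {"line", static_cast<std::int64_t>(bp.line)},
        {"enabled", bp.enabled},
    };
}

// Render a document as text; fields appear in std::map (alphabetical) order.
std::string to_text(const Document& doc) {
    std::ostringstream out;
    out << "{";
    bool first = true;
    for (const auto& [key, value] : doc) {
        if (!first) out << ", ";
        first = false;
        out << key << ": ";
        std::visit([&out](const auto& v) {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, std::monostate>) out << "null";
            else if constexpr (std::is_same_v<T, bool>) out << (v ? "true" : "false");
            else if constexpr (std::is_same_v<T, std::string>) out << '"' << v << '"';
            else out << v;  // std::int64_t or double
        }, value);
    }
    out << "}";
    return out.str();
}
```

With these definitions, to_text(to_document(Breakpoint{"main.cpp", 42, true})) yields {enabled: true, file: "main.cpp", line: 42}. Keeping the set of alternatives closed is what would let a generic tool (an interpreter, a query language) handle every field without knowing the application's own types.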

At Sat, 19 Jun 2010 15:38:14 -0500, nathaniel@photino.org wrote:
> Hi all: I've been working for a while on a variety of tools to facilitate application development in normal (cross-platform) C++, and avoid the byzantine dependency chains (including needing multiple boost versions) which so often creep in
Cool; would love to hear more about how you do that.
> because real applications always seem to piece together disparate parts with different build systems, requirements, even how to download the source code...pretty soon you're not programming C++ anymore, you're tinkering with Python make scripts, or Perl code generators, or learning Git or Subversion ... know what I'm saying?
Maybe in your view it will just amount to more tinkering, but it sounds like http://ryppl.org is designed to address many of these issues.
> Anyhow, I'm a fan of the Mongo database, but it's notoriously hard to build even the drivers, and not really suited for simple SQLite-like object serialization for persistence between runs of an application (even though this is theoretically possible, it is poorly documented and still requires linking against the entire Mongo system).
> So I've decided to develop a serialization framework (not a database) with some "NoSQL" features
From what I can find, the term “NoSQL” is so nebulous (we don't know much except that it's not a traditional SQL database) that it's hard to imagine what a “NoSQL” feature might be. More specifics could help.
> based on Mongo, but a lot easier to use. I believe this framework could provide a foundation upon which useful, moderately complex C++ applications could be designed, by providing extensions to the library which are optional to use but which incorporate my work (I hope that doesn't sound pedantic) on general application development, without extra external dependencies.
This makes me a bit nervous, because it begins to sound like a framework of its own, which tends to imply its own dependencies. I don't think you'll find much interest here in components whose use requires dragging in a dependency on some kind of database store.
> Specifically, these extensions would include:
> 1) A tool for generating GUI code -- for wxWidgets, in particular -- from archives that could be edited with a simple textual front-end, vaguely like XAML;
> 2) A custom language based on Clojure -- a Lisp dialect originally implemented by Rich Hickey on the JVM -- for expressing queries and importing/exporting data from/to an archive;
> 3) Perl6-like regular expressions for matching against textual fields in an archive;
> 4) AI-inspired algorithms for sorting, filtering, and in other ways operating on archives.
These all sound quite interesting, but they also all sound like they should be independent projects.
> My academic background is in AI -- actually, to be precise, I wrote a doctoral dissertation in the philosophy of science, but I researched AI in this context -- but I'm especially interested in nonrelational database theory because it better captures the process of modeling complex systems, and, in general, nonrelational databases are more interesting from an AI perspective because the lack of a fixed schema means that operations like sorting and filtering can require some "reasoning". I'm particularly interested in application development because I think one concrete application of AI research is to make tools like IDEs smarter. A non-relational serialization library could potentially serve the application development process not only by providing an easy way to persist data, but through IDE extensions or project generators -- store lists of debug breakpoints in an archive, or parse source code for namespaces, types, etc., and store the results in an archive, or an archive to represent all the controls in a GUI...
> The library I have in mind would differ from boost.serialization by providing explicit support for non-relational functionality,
Please be specific.
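For readers unfamiliar with the term, one possible reading of "non-relational functionality" (an illustration only, not something stated in the original post) is that records in the same archive need not share fields, so even a plain sort has to decide what to do with records that lack the sort key. A minimal C++17 sketch of that situation follows; Value, Document, field_or_null, sort_by_field and having_field are hypothetical names, and ordering nulls first is simply what std::variant's comparison gives, not a claim about the proposed library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical restricted value type and schemaless record.
using Value = std::variant<std::monostate, bool, std::int64_t, double, std::string>;
using Document = std::map<std::string, Value>;

// Look up a field; a missing field is treated as null (std::monostate).
Value field_or_null(const Document& doc, const std::string& name) {
    auto it = doc.find(name);
    return it == doc.end() ? Value{} : it->second;
}

// Stable sort by one field. std::variant's operator< compares the index of
// the held alternative first and the held value second, so nulls (index 0)
// come first and values of different types end up grouped by type.
void sort_by_field(std::vector<Document>& docs, const std::string& name) {
    assert(!name.empty());
    std::stable_sort(docs.begin(), docs.end(),
        [&name](const Document& a, const Document& b) {
            return field_or_null(a, name) < field_or_null(b, name);
        });
}

// Filter: keep only the documents that actually have the field.
std::vector<Document> having_field(const std::vector<Document>& docs,
                                   const std::string& name) {
    std::vector<Document> out;
    std::copy_if(docs.begin(), docs.end(), std::back_inserter(out),
        [&name](const Document& d) { return d.count(name) != 0; });
    return out;
}
```

A fixed-schema table never has to make the "missing field" decision, which is one way to pin down the difference being asked about.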
> and also by using a restricted type system
Boost.Serialization already uses a restricted type system AFAICT.
> along the lines of MongoDB and JSON: any persistable data field would have to be marshalled into one of a few predefined types, although users could explicitly extend the type system if desired. Aside from writing persistence code directly in the C++ source (along the lines of, e.g., instantiating a serialize() template in namespace boost::serialization), the test or demo applications I've been writing use external files, written in the (currently very minimal) Clojure-like language I mentioned above, and an interpreter does the actual serialization -- so the persistence strategy could be altered without recompiling the application, even while it is running. I think this offers new potential for using AI-style algorithms for things like tracking usage patterns, because all of that could be implemented fully orthogonal to the application itself.
That sounds pretty research-speculative at this point. Am I right?
> So, that's the project I've sort of assigned myself, and I would appreciate any comments and ideas on what I could do to make this the kind of library C++ programmers would consider trying out.
My advice: you have lots of really interesting ideas, but any one of them by itself could make for an all-consuming project. Start by decoupling them. Then, pick a small piece to implement first. Nothing kills off great ambitions faster than biting off too large a hunk at once. Also, try to be more specific and fill in more details when you describe what you're doing. Don't assume that everyone who would want to use your work knows anything about non-RDBMSes, AI, Functional Programming, etc.
HTH,
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
participants (2): David Abrahams, nathaniel@photino.org