
David Abrahams wrote:
I consider the submission a use case for archive creation and/or extension.
But I don't understand what you mean about it being a "use case."
I mean an example of how the library can be extended to achieve some specified requirement. In this case, an improved method of saving/loading certain types of data in certain types of archives.
There are some negative consequences of creating the hooks outside Boost.Serialization. Once you understand them, I'm pretty sure you will think they are significant.
I'm all ears. I can really only comment on the proposal submitted and that's what I did.
Let me explain one place where our difference lies.
Having read everything that follows, I don't see any explanation of a "place where our difference lies." The parts I understand (most of it) sound like "motherhood and apple pie" -- good, common sense that's hard to disagree with. Is it a thought that was never finished? Would you care to try to put it more succinctly?
It seemed to me that the submission didn't take these aspects of the design into account. I had presumed that this was because the separation was nowhere really made explicit, and I was trying to make up for that. I was concerned that it might not be obvious that the distribution of implementation across the class hierarchy was very deliberate and not arbitrary. I can see how someone might look at the way something was done and say, "wow - that's not necessary - we can just collapse out that layer" etc. In fact I would expect a lot of people to react that way when they first see it.
(What follows is a diversion from the question at hand for those who have some extra time or interest. Feel free to skip.)
An interesting thing is how the "implementation organization" comes about. An avid reader of the boost mail archives will notice a huge amount of discussion about the design of things: how things should be separated, what implementation techniques should be used, etc. The discussion addresses things at a finer and finer level of detail as time goes on. Most of the discussion is speculative - if one does things this way then you'll be able to do x - but who needs to do x when you can do y, etc. From looking at these discussions, one might get the impression that this is the way a largish body of code such as the serialization library is designed. The truth is - it doesn't happen this way - at least not with me. The discussions can be interesting and helpful - up to a point. But once the discussion arrives at a certain level of detail, it's truly beyond the human brain's capacity to imagine all the consequences of these design decisions.
So when I started out, I had:
a) positive experience with Microsoft's MFC serialization
b) a list of things about it that I wanted to "fix"
c) a list of other systems which attempted to address the same issues I did. Although none of these systems included all the things I wanted to fix, many had interesting ideas.
d) a fixed idea that the description of how something is serialized must be orthogonal to the archive implementation
e) a concise half page description of how it would be used (your Archive Concept)
I made the first tutorial demo and developed that in parallel with the first version of the library. It started out very simple. As time went on, more "requirements" were added. Many of these "requirements" were formulated during the first review. Lots of boost type discussion (good and bad) consumed lots of effort. All this discussion was pretty much summarized in G. Rosenthal's definitive review of the library. It was very complete and very well written. This resulted in much refactoring. After acceptance I realized we needed a polymorphic interface. Dynamic DLL loading resulted in more refactoring. Through all this the original demo tutorial application hardly ever changed.
The final design is the triumph of evolution over intelligent design. There's a very deep lesson here I'm sure. I see things such as extreme programming vs waterfall design, evolution vs creationism, market capitalism vs socialist central planning, as all related.
(End of diversion)
So, from the above, it's obvious to me that how to implement the serialization system is not at all obvious. (If it were, I would have needed only one iteration!) Like lots of things it might be obvious in retrospect. Or worse, it might LOOK obvious when it's really not. I hope that clarifies things.
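For readers who want something concrete, the idea mentioned above, that the description of how something is serialized is orthogonal to the archive implementation, can be sketched roughly as follows. This is only an illustration under stated assumptions: the two archive classes, the `gps_position` struct and `save_to_string` are hypothetical toy names, not the library's actual classes or API.

```cpp
#include <sstream>
#include <string>

// Hypothetical toy archive: writes values separated by spaces.
class text_oarchive_sketch {
public:
    template <class T>
    text_oarchive_sketch& operator&(const T& value) {
        out_ << value << ' ';
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Hypothetical toy archive: wraps every value in a tag.
class tagged_oarchive_sketch {
public:
    template <class T>
    tagged_oarchive_sketch& operator&(const T& value) {
        out_ << "<v>" << value << "</v>";
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Illustrative user type.
struct gps_position {
    int degrees;
    int minutes;
    double seconds;
};

// Written once: says WHAT is saved, knows nothing about the format.
template <class Archive>
void serialize(Archive& ar, const gps_position& p) {
    ar & p.degrees & p.minutes & p.seconds;
}

// Helper for the sketch: run the same description through any archive.
template <class Archive>
std::string save_to_string(const gps_position& p) {
    Archive ar;
    serialize(ar, p);
    return ar.str();
}
```

The same `serialize` function feeds both archives; only the archive decides how the bytes look, which is the separation the rest of this discussion relies on.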
It is only in this way that the library can be extended without growing geometrically more complicated as time goes on.
I am a bit surprised to hear you state flatly that there is only one way to extend the library that can ever work. How can you possibly know you've considered every possibility? I don't have the same confidence, even about problems I've studied for years.
Hmmm, what I meant to say is illustrated by the following. Suppose one has some library L. If it's successful, there will be demand to enhance it as time goes on. This is a "good thing" (tm). Now suppose that the introduction of enhancement E results in L', which presents an API that is a superset of L's. Of course it's internally more complex, with "global" modes and object traits etc. It does take more effort to debug than originally anticipated, but it does work, it's backward compatible, it has the new functionality, and everyone's happy. For a while. Almost everybody. Now it's a little harder for beginners to learn to use. But it's OK. The success of enhancement E stokes demand for enhancement F. Each additional enhancement is harder to implement, the resulting library can be understood by fewer and fewer people, and it's harder and harder to learn to use. This is a typical cycle which many software products suffer from. (BTW - other products suffer from this as well. It almost seems there is a thermodynamic principle at work - the conceptual integrity of all ideas decreases over time as attempts are made to apply them ever more broadly.)
Now suppose when demand for enhancement E comes up someone says - wait a minute - you have to implement E as some sort of add-on module. It seems like it's more work. But since the work doesn't make the original code more intricate, the effort to design, code, debug, test and document E is strictly proportional to the size of E.
So while there are lots of ways to extend a library, choosing an inconvenient one means the original utility of the library will suffer - even as the library gains functionality! So maybe instead of saying there's only one way to extend the library, I really meant to say there are lots of ways NOT to extend a library.
What if the enhancement can't be done as an add-on? Then you've got to refactor the library. This should happen less and less frequently as time goes on.
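To make the "add-on module" idea a little more tangible, here is a minimal sketch, assuming a toy core archive that saves collections one element at a time. All names (`binary_oarchive_sketch`, `fast_binary_oarchive_sketch`, `save_collection`) are hypothetical and are not the library's real classes; the point is only that the faster path is layered on by derivation and overloading while the "core" code stays untouched.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a "core" binary archive; not library code.
class binary_oarchive_sketch {
public:
    // Save one arithmetic value as raw bytes.
    template <class T>
    void save(const T& value) {
        static_assert(std::is_arithmetic<T>::value,
                      "sketch handles arithmetic types only");
        save_bytes(&value, sizeof(T));
    }
    // Append n raw bytes; every call counts as one write operation.
    void save_bytes(const void* data, std::size_t n) {
        const unsigned char* first = static_cast<const unsigned char*>(data);
        bytes_.insert(bytes_.end(), first, first + n);
        ++write_calls_;
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }
    std::size_t write_calls() const { return write_calls_; }
private:
    std::vector<unsigned char> bytes_;
    std::size_t write_calls_ = 0;
};

// Core behaviour: element count, then one save per element.
template <class Archive, class T>
void save_collection(Archive& ar, const std::vector<T>& v) {
    ar.save(v.size());
    for (const T& x : v) ar.save(x);
}

// ---- add-on module: nothing above this line is edited ----

// The add-on archive reuses the core archive by derivation.
class fast_binary_oarchive_sketch : public binary_oarchive_sketch {};

// Overload chosen only for this archive and this element type:
// the whole contiguous buffer is written in a single operation.
inline void save_collection(fast_binary_oarchive_sketch& ar,
                            const std::vector<double>& v) {
    ar.save(v.size());
    if (!v.empty()) ar.save_bytes(v.data(), v.size() * sizeof(double));
}
```

In this sketch both archives produce the same bytes, but the add-on needs fewer write operations, and the effort to write it is limited to the lines below the marker. Whether the real optimization can be layered on this simply is exactly what is under discussion in this thread.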
As time goes on I would hope that this can be improved. But maybe this explains my reluctance to maintain parts of the library beyond the reach of those making other archives.
Other archives? Beyond reach? I don't understand what you're saying here.
I don't remember what I meant to say here. I probably meant to say that I would hope that the library extends by adding on more and more functionality through extension and accretion rather than making the stuff that's already in there more elaborate.
we are going to start from new code that doesn't change any part of Boost.Serialization, so if possible, it might be better to try to forget about what you've seen before.
no problem - I can't remember that far back anyway.
I suspect that the job of making a portable binary archive is much harder than it first appears.
Actually it's almost trivial (I did it over 10 years ago), but I don't know what that has to do with what we're trying to accomplish.
The speedups we're proposing don't have anything in particular to do with portable binary archives.
I presumed too much then. From the thread discussion, it seemed that this was just the initial effort to adapt the serialization library to the needs of High Performance Computing. XDR compatibility (http://www.faqs.org/rfcs/rfc1014.html) was mentioned at some point, as was MPI (http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-1.1/node39.htm#Node3... think). Both of these entail a portable binary format - with attendant endian issues. Maybe the mention of this in the discussion of a submission which didn't really address it confused things in my own mind. So just to keep the pot boiling - it seems to me that gaining the 10x speed up associated with "bitwise collection" serialization in the context of portable binary archives such as XDR is going to be a tall order.
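A small sketch of the endian issue, for illustration only: XDR (RFC 1014) represents integers as 4-byte big-endian quantities, so a portable archive has to fix the byte order of each element instead of copying the in-memory buffer as is. The function names below are hypothetical and this is not a proposed implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append one 32-bit value, most significant byte first (big-endian).
inline void put_be32(std::vector<unsigned char>& out, std::uint32_t v) {
    out.push_back(static_cast<unsigned char>((v >> 24) & 0xFFu));
    out.push_back(static_cast<unsigned char>((v >> 16) & 0xFFu));
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFFu));
    out.push_back(static_cast<unsigned char>(v & 0xFFu));
}

// Read one big-endian 32-bit value from four bytes.
inline std::uint32_t get_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Encode element by element; the result is the same on any host.
inline std::vector<unsigned char>
encode_be32(const std::vector<std::uint32_t>& values) {
    std::vector<unsigned char> out;
    out.reserve(values.size() * 4);
    for (std::uint32_t v : values) put_be32(out, v);
    return out;
}

// Decode complete 4-byte groups; trailing partial bytes are ignored.
inline std::vector<std::uint32_t>
decode_be32(const std::vector<unsigned char>& bytes) {
    std::vector<std::uint32_t> values;
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4)
        values.push_back(get_be32(&bytes[i]));
    return values;
}
```

On a little-endian host a plain block copy of the array would put the bytes on the wire in the opposite order, which is why the per-element work above cannot simply be replaced by a bulk copy there.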
I didn't pursue this as I really don't want to discourage these kinds of efforts and they are (or should be) orthogonal to the library as it is currently implemented. If they can be implemented without altering the core - then I have no problem. If someone believes that modifying the core is unavoidable, then either he or I have made some sort of mistake and it will have to be resolved.
It's not unavoidable; as I've said before, it just has consequences that we don't like, and we think you probably won't like either. If you can hang on until we've presented what we think is the best design that avoids altering the core, then we can look at the consequences. Once you understand them, if you still don't want to make any changes and you're willing to accept the consequences, we're not going to press the issue any further.
Fine, I was asked to comment on what was submitted. We'll start the next round with a clean slate.
If they don't really have to alter the core, but the archive author thinks it would make his job easier - then we have a problem.
Let me be very clear about this, at least:
,----
| Ease of archive implementation is unrelated to the motivation for
| requesting core changes.
`----
I hope that allays at least one of your concerns.
It does. And I'm sure you probably deal with this on a regular basis with your own libraries.
I get a suggestion about once a month to modify the core of the library for this or that reason. Aside from bugs, it usually boils down to the suggestor looking at the code and seeing - "Oh I could fix this right there!" without considering all the repercussions and without considering the alternatives. (As you might guess, this is what I believe happened in this case).
Note that this isn't a personal criticism - it's a natural occurrence that happens all the time.
Actually Matthias' considerations went much deeper than you give him credit for. In my opinion, he just failed to communicate his rationale properly, and since the details of his code seemed to you to violate basic principles of your design, I'm sure it was all the more difficult for you to understand the problems he is trying to avoid.
LOL - I think I understood the code submitted and what it was intended to achieve. As far as I could fathom the rationale, I presented an alternative designed to achieve the same results without sprinkling bits of code throughout lots of other modules.
Working from new code that (I hope!) won't cause you any alarm, it might be easier to understand the rationale.
I guess you and Matthias were somewhat taken aback by my response. Sorry about that. Anyway, it seems you do have an understanding and even appreciation of my concerns so I'm optimistic that the next iteration will be better. The crux of my argument is that I believe that the kinds of extensions you want to implement can best be done without altering the current library. I'm willing to be proved wrong with a counter example - but the last didn't qualify in my opinion. Also it seems that lots of people are using the library in ways I haven't totally foreseen, so there have been lots of opportunities for such counter examples to be presented. (The only one that really stuck was shared_ptr serialization - and I'm still not sure about that!!)
Another common occurrence is the attempt to use the serialization system to accomplish some end for which it is not suited. A typical idea is to use it to implement some externally defined file format. I know I drag my feet, I know it drives people crazy, but I truly believe that the success of the library is due in no small part to my reluctance to add in any more than is absolutely necessary.
Understood. It might be a good idea for you to clearly define the intended scope of the library. What criteria distinguish an appropriate application from an inappropriate one? I'm interested in hearing your intention as the library author, rather than something like "an appropriate application is one that works well with the library as it is currently specified and/or implemented." Depending on your answer, we might indeed be barking up the wrong tree.
The very first sentence of the Overview of the Documentation states: "Here, we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context. Depending on the context, this might used implement object persistence, remote parameter passing or other facility. In this system we use the term "archive" to refer to a specific rendering of this stream of bytes. This could be a file of binary data, text data, XML, or some other created by the user of this library. " I'm not sure I can make a better statement than that regarding what I expected the library to be used for.
So, I look forward to seeing progress on the following:
a) better handling of special optimization opportunities which obtain for certain combinations of data-types and archives. Hopefully, an elegant implementation will serve as a model for other people's pet additions.
I hope we'll be able to show you something elegant very soon.
No need to hurry on my account.
b) A portable binary implementation suitable for such things as MPI messages.
Portable binary archives and MPI have little relationship to one another. You don't flatten your data into a portable format, ship it in an MPI message that is just a sequence of bytes, and then deserialize. MPI handles portability internally.
I've taken only the most cursory look at MPI. (turns out this may change due to some other project). So I won't dispute this. I don't see how one could pass information between heterogeneous machines without addressing all the issues related to making a portable binary archive. Perhaps MPI leaves that part undefined - but still it will have to be dealt with somewhere.
I also expect these to take some time and hope they can be subjected to the boost "process" of public criticism and refinement. This will take more time but result in a better product. Hopefully, it will be less stressful as well - though I doubt it.
I really am trying to wind down my involvement in the serialization library.
That's a bit alarming, actually. Have you got someone else lined up to maintain it?
I was thinking of Matthias, though I've never brought it up.
It's important to us and to many others that the library has a future.
As long as people continue to use it I'm sure it will have a future ...
Without the involvement of the original author, that would be in doubt.
...regardless of whether the original author is involved. Does this mean I can't die until I get a replacement? Personally, I see the idea that the viability of any piece of code is tied to the continuing involvement of the original author as a sign that the code is lacking in some dimension. It should be easy for someone to see what is going on and fix it. If it's not - it's really a failing of the original author.
So I've been personally gratified to have people send me fixes to very obscure and arcane bugs. I don't always incorporate the fix, due to design considerations, but I often do. Some of these things are devilishly hard - what happens when code implementing serialization is dynamically unloaded? - things like that. Others are obscure corners of other standards - e.g. how does one encode a string with an embedded \0 into an html string? Or what is a portable way to create a sNaN when loading a portable archive? There are probably lots of little corners with things that need fixing, and the truth is I'm already relying on people with more specialized knowledge to help with these things. So already things are moving to other people on a case by case basis.
I would hope to see the library grow and prosper by seeing things layered on top of it. Thus my personal involvement should taper off as it seems to have in other successful boost libraries - and as it should in any successful programming project.
There is one kind of change that I would like to see in the core library as time goes on. I would like to see certain things migrate out of the library and become boostified. Examples are things like strong typedef, extended typeinfo, dataflow iterators (my personal favorite). I recognize that that is a little unrealistic and I never mess with these things so it's not a big issue - it's just that I would like to see the library smaller.
Also it would be interesting to see if the boost class factory can be used to replace similar functionality implemented in the serialization library - there may be other such cases.
Robert Ramey