[modularization] Extract xml_archive from serialization
Hi there,
The biggest single problem of coupling in boost comes from the spirit dependency in the serialization library. This makes serialization itself very (and needlessly) heavy:
http://www.steveire.com/boost/2014sept16_serialization.png
Spirit is used only by the xml archiving classes.
I recommend extracting an xml_archive library from serialization. That way, serialization no longer depends on spirit, which is already an improvement:
http://www.steveire.com/boost/2014sept16_serialization-after-extract-xml_arc...
Further, the serialization classes for boost::variant and boost::array are in the serialization library. This is not appropriate, as the serialization classes for all other types are in the libraries providing the types. Move the serialization classes to those libraries.
mkdir ../variant/include/boost/serialization
mv include/boost/serialization/variant.hpp ../variant/include/boost/serialization
mkdir ../array/include/boost/serialization
mv include/boost/serialization/array.hpp ../array/include/boost/serialization
This is a large improvement:
http://www.steveire.com/boost/2014sept16_serialization-after-type-move.png
There is more that can be done. These things can be done now. I recommend doing them.
Thanks,
Steve.
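Why moving a single header changes the graph can be sketched with a small model. This is a hypothetical illustration, not boostdep's implementation. Each header is assigned to the module whose include/ directory holds it, and an include between headers owned by different modules becomes a module edge. The include lists used to exercise it are simplified assumptions, not the real contents of boost/serialization/variant.hpp. Moving the file changes only its owner, and that alone turns a serialization -> variant edge into variant -> serialization.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// header path under include/  ->  module whose include/ directory holds it
using Ownership = std::map<std::string, std::string>;
// (including header, included header) pairs, assumed already extracted
using Includes = std::vector<std::pair<std::string, std::string>>;
// (from module, to module)
using Edge = std::pair<std::string, std::string>;

// Lift header-level includes to module-level edges. Includes between headers
// of the same module are internal and produce no edge.
std::set<Edge> module_edges(const Ownership& owner, const Includes& includes) {
    std::set<Edge> edges;
    for (const auto& [from, to] : includes) {
        // precondition: every header involved has a known owner
        assert(owner.count(from) == 1 && owner.count(to) == 1);
        const std::string& x = owner.at(from);
        const std::string& y = owner.at(to);
        if (x != y) edges.insert(Edge{x, y});
    }
    return edges;
}
```

With variant.hpp including boost/variant.hpp, the edge points from serialization to variant while serialization owns the file, and from variant to serialization once the file lives in variant's include/ directory.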
Stephen Kelly-2 wrote
Hi there,
The biggest single problem of coupling in boost comes from the spirit dependency in the serialization library. This makes serialization itself very (and needlessly) heavy:
http://www.steveire.com/boost/2014sept16_serialization.png
Spirit is used only by the xml archiving classes.
I recommend extracting an xml_archive library from serialization. That way, serialization no longer depends on spirit, which is already an improvement:
http://www.steveire.com/boost/2014sept16_serialization-after-extract-xml_arc...
Further, the serialization classes for boost::variant and boost::array are in the serialization library. This is not appropriate, as the serialization classes for all other types are in the libraries providing the types. Move the serialization classes to those libraries.
mkdir ../variant/include/boost/serialization
mv include/boost/serialization/variant.hpp ../variant/include/boost/serialization
mkdir ../array/include/boost/serialization
mv include/boost/serialization/array.hpp ../array/include/boost/serialization
This is a large improvement:
http://www.steveire.com/boost/2014sept16_serialization-after-type-move.png
There is more that can be done. These things can be done now. I recommend doing them.
Thanks,
Steve.
I think the notion of "dependency" is richer than can be captured in this sort of graph. So it can't be understood in terms of this graph alone. I've written about this in the past - maybe my post was lost due to google forum issues. For anyone who's interested here it is again.
Consider another simple case - date time/serialization.hpp. Most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? Which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? Or what?
Suppose I have a simple application A which uses the text_archive and only serializable types defined within the application itself. It should be clear that I can ship that application without shipping any of the libraries or code in ../serialization/variant.hpp etc..., xml_archive etc... So one can say that A is not dependent upon anything other than the serialization library. So, at least for this application, the dependency graph referred to above is not a good indicator of what I have to ship with my app. In fact, it's misleading.
A little reflection reveals why this is so. The graph is generated by considering what it takes to build the serialization DLL and/or LIB which includes all the archive classes and perhaps a bunch more stuff. So the graph tells us something, but what?
The serialization library has several classes of components:
a) library core - implements code common to all serialization/archives
b) particular archive implementations, xml_archive, ... - dependencies according to the particular archive type being used or built
c) serialization of other library components - e.g. shared_ptr - which depends on shared_ptr itself.
d) the test suite - which depends on all the archives being tested - which is the boost build default usage
e) examples - will depend only on a small part of the serialization library.
Now if you wanted to make a series of graphs like:
a) particular archives text_archive, ...
b) serialization for each included type e.g. variant
c) all tests, or each subset per archive
d) examples
e) other libraries such as date/time which use the serialization library in some of its applications and tests but not in others.
You'd have something more accurate - but alas - more complex to interpret and hence less useful.
So - the degree of "modularization" cannot be determined or illustrated or measured by examining the graph above. The question has to be couched in more concrete terms:
a) What if I want to distribute the components required to build some particular app? I don't know if we have a tool to do this, but both boost build and CMake will build only the components required to build the app.
b) What if I want to distribute the components required to build any application which might use any of the interface in the serialization library? I'll be distributing a lot - as you point out above.
etc.
So, taken to its logical conclusion, extracting xml_archive would lead to extracting other components as well. One or more for each of the classes a-e listed above. I'm not sure we want to do this.
Traditionally, (and not just in boost) libraries have been organized around developer responsibility. This has more or less paralleled "dependency". "Dependency" has also been fostered by incremental addition to the library set.
If I had more time, I might be able to make this argument more coherent and tighter. Sorry about that.
But the real questions are:
a) what do we want modularization to accomplish and is this a feasible goal.
b) Do we want to obsolete the original concept of equivalency between module and developer responsibility?
c) Do we want to support deployment of boost subset? I think we do.
d) How should such a subset be defined - via BCP or some boost build dependency.
e) How finely should such a dependency be measured? Does importing one header make the whole other library a prerequisite, or just that header and the associated *.cpp?
My basic point is that these questions have to be addressed before the notion of decoupling can be carried much further. In concrete terms - the exclusion of xml_archive should be:
a) dropped altogether - (find by me btw)
b) created as a separate library module
c) not included in builds that don't require it? Note that boost build already does this.
d) or should the whole serialization library be subdivided into a "library group" based on consideration of the classes above. (lol - never going to happen)
I'm concerned that the movement to diminish module dependencies is failing to take into account the above considerations. At least I don't recall seeing these considerations explicitly addressed.
Robert Ramey -- View this message in context: http://boost.2283326.n4.nabble.com/modularization-Extract-xml-archive-from-s... Sent from the Boost - Dev mailing list archive at Nabble.com.
On Tuesday 16 September 2014 09:42:25 Robert Ramey wrote:
I think the notion of "dependency" is richer than can be captured in this sort of graph. So it can't be understood in terms of this graph alone. I've written about this in the past - maybe my post was lost due to google forum issues. For anyone who's interested here it is again.
Consider another simple case - date time/serialization.hpp
most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? or what?
The solution will be to separate the dependency on Serialization into an optional component. This can be a header or a git submodule or a sublib in DateTime or something else. What exactly this is is defined by a number of aspects, including maintenance convenience, access control, distribution and deployment infrastructure. I agree that many of these aspects are not defined at the moment, but from the perspective of maintenance, access permissions and modularization effort a sublib looks most feasible to me.
Suppose I have a simple application A which uses the text_archive and only serializable types defined within the application itself. It should be clear that I can ship that application without shipping any of the libraries or code in ../serialization/variant.hpp etc..., xml_archive etc... So one can say that A is not dependent upon anything other than the serialization library. So, at least for this application, the dependency graph referred to above is not a good indicator of what I have to ship with my app. In fact, it's misleading.
In an ideal world you could distribute your application with a subset of Boost selected on a per-header basis. But I think this task is not realistic at the current stage - mostly because it's difficult to correctly discover all possible dependencies on a per-header basis. At this point the most reasonable level of dependency tracking is per-library or per-sub-library. It is not optimal in that it can add dependencies you don't actually need, but it's certainly better than the monolithic Boost. Returning to your example, the application will pull in Serialization and everything it depends on, unless you extract the optional bits to sublibs or make them optional otherwise.
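The difference between per-header and per-library tracking can be sketched on a made-up header graph. In it, an application reaches only a text archive header, while another header of the same module reaches a Spirit header. The file names are illustrative assumptions, not Boost's real layout. Per-header shipping follows includes from the application's own #include lines. Per-library shipping lifts every include to a module edge and ships whole modules, so it also pulls in a module the application never touches.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;  // node -> direct dependencies
using Owner = std::map<std::string, std::string>;                // header -> owning module

// Everything reachable from `roots` in `g`, roots included (iterative DFS).
std::set<std::string> reachable(const Graph& g, const std::set<std::string>& roots) {
    std::set<std::string> seen(roots);
    std::vector<std::string> work(roots.begin(), roots.end());
    while (!work.empty()) {
        const std::string node = work.back();
        work.pop_back();
        const auto it = g.find(node);
        if (it == g.end()) continue;
        for (const auto& next : it->second)
            if (seen.insert(next).second) work.push_back(next);
    }
    return seen;
}

// Per-header shipping: exactly the headers the application's #includes can reach.
std::set<std::string> per_header(const Graph& includes, const std::set<std::string>& app) {
    return reachable(includes, app);
}

// Per-library shipping: lift every include to a module edge, close over modules,
// then ship all headers of every module in that closure.
std::set<std::string> per_library(const Graph& includes, const Owner& owner,
                                  const std::set<std::string>& app) {
    Graph modules;
    for (const auto& [header, targets] : includes)
        for (const auto& target : targets)
            if (owner.at(header) != owner.at(target))
                modules[owner.at(header)].push_back(owner.at(target));
    std::set<std::string> root_modules;
    for (const auto& header : app) root_modules.insert(owner.at(header));
    const std::set<std::string> needed = reachable(modules, root_modules);
    std::set<std::string> shipped;
    for (const auto& [header, mod] : owner)
        if (needed.count(mod)) shipped.insert(header);
    return shipped;
}
```

The per-library result is always a superset of the per-header one. Per-library tracking is simpler to maintain, and the extra headers it ships are the cost of that simplicity.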
A little reflection reveals why this is so. The graph is generated by considering what it takes to build the serialization DLL and/or LIB which includes all the archive classes and perhaps a bunch more stuff.
So the graph tells us something, but what?
The serialization library has several classes of components
a) library core - implements code common to all serialization/archives
b) particular archive implementations, xml_archive, ... - dependencies according to the particular archive type being used or built
c) serialization of other library components - e.g. shared_ptr - which depends on shared_ptr itself.
These are probably the best candidates for separating from the core.
d) the test suite - which depends on all the archives being tested - which is the boost build default usage
e) examples - will depend only on a small part of the serialization library.
Tests and examples typically use more components than the library itself (at least, most tests need some testing library or infrastructure). For this reason I consider them as a special kind of sublibs, in the sense that they are optional, and you would have to explicitly install them so that their dependencies are pulled. When you only need the library itself, you don't have to install dependencies of its tests and examples.
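One way to model tests and examples as optional parts is to keep their dependencies in a separate list. That list is consulted only for the module the user explicitly asks for, and only when requested. The sketch below assumes that model. The module names and edges used to exercise it are hypothetical and chosen only to show the shape of the computation. They are not the real Boost graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Module {
    std::vector<std::string> deps;       // needed by the library itself
    std::vector<std::string> test_deps;  // needed only by its tests and examples
};
using Repo = std::map<std::string, Module>;

// Modules to install for `root`. Test/example dependencies are added only for the
// requested module and only when asked for; everything else follows ordinary
// library dependencies transitively.
std::set<std::string> install_set(const Repo& repo, const std::string& root, bool with_tests) {
    std::set<std::string> installed{root};
    std::vector<std::string> work{root};
    if (with_tests)
        for (const auto& t : repo.at(root).test_deps)
            if (installed.insert(t).second) work.push_back(t);
    while (!work.empty()) {
        const std::string name = work.back();
        work.pop_back();
        for (const auto& d : repo.at(name).deps)
            if (installed.insert(d).second) work.push_back(d);
    }
    return installed;
}
```

Installing a module's tests pulls in the test framework and its library dependencies, but not the test dependencies of those modules in turn.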
Now if you wanted to make a series of graphs like:
a) particular archives text_archive, ...
b) serialization for each included type e.g. variant
c) all tests, or each subset per archive
d) examples
e) other libraries such as date/time which use the serialization library in some of its applications and tests but not in others.
You'd have something more accurate - but alas - more complex to interpret and hence less useful.
The reports Peter publishes show the library header dependencies - which is our main concern now and is enough to work with at the current stage. A more accurate report would also include dependencies needed to build the library from sources (i.e. the dependencies of src/*). The dependencies of tests and examples are not the issue now, but they will be when we have a deployment tool. But if we can track dependencies between libraries, I don't see a problem with doing the same for tests and examples.
If I had more time, I might be able to make this argument more coherent and tighter. Sorry about that.
But the real questions are: a) what do we want modularization to accomplish and is this a feasible goal.
Being able to download and install a subset of Boost.
b) Do we want to obsolete the original concept of equivalency between module and developer responsibility?
I don't think we're doing this. At least, not so far.
c) Do we want to support deployment of boost subset? I think we do.
I think too.
d) How should such a subset be defined - via BCP or some boost build dependency.
The instrumental question is important, and there's no definitive answer yet. Mostly because there are no prototypes, so there's nothing to choose from. I remember only one proposal that was discussed on this list, and it wasn't Boost.Build. Currently, boostdep is used to track dependencies and generate reports, but there's no modularized deployment tool.
e) How finely should such a dependency be measured? Does importing one header make the whole other library a prerequisite, or just that header and the associated *.cpp?
At this point, on the library/sublib level. I don't think the header level is feasible at this point, but it may be in the future. All the above is my opinion and understanding, of course.
I think that these questions need to be addressed before a large and disruptive effort is undertaken piecemeal.
I've always been a supporter of the boost modularization effort even though I knew it was going to be disruptive. But before investing too much effort, I would like to see us develop a more explicit overall plan and set of goals.
I've resolved to prepare a document - "the future of boost" - which I hope will be a basis for discussion. I realize I'm a TL;DR serial offender as I write many of my posts using the "stream of consciousness method". This time I'll prepare and review it separately before posting.
Robert Ramey
I think that these questions need to be addressed before a large and disruptive effort is undertaken piecemeal.
I've always been a supporter of boost modularization effort even though I knew it was going to be disruptive. But before investing too much effort, I would like to see us develop a more explicit overall plan and set of goals.
I've resolved to prepare a document - "the future of boost" - which I hope will be a basis for discussion. I realize I'm a TL;DR serial offender as I write many of my posts using the "stream of consciousness method". This time I'll prepare and review it separately before posting.
Right, I think these proposals are somewhat premature - we need to know where we're going, with concrete proposals and at least prototype tools (if there are to be any); otherwise this is all just moving stuff around for the sake of it. Just my 2c. John.
Robert Ramey wrote:
I've always been a supporter of boost modularization effort even though I knew it was going to be disruptive. But before investing too much effort, I would like to see us develop a more explicit overall plan and set of goals.
I've resolved to prepare a document - "the future of boost" - which I hope will be a basis for discussion. I realize I'm a TL;DR serial offender as I write many of my posts using the "stream of consciousness method". This time I'll prepare and review it separately before posting.
I'd like to review such a thing too if you get around to writing it. Before or after posting it. Thanks, Steve.
Andrey Semashev wrote:
On Tuesday 16 September 2014 09:42:25 Robert Ramey wrote:
most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? or what?
The solution will be to separate the dependency on Serialization into an optional component. This can be a header or a git submodule or a sublib in DateTime or something else. What exactly this is is defined by a number of aspects, including maintenance convenience, access control, distribution and deployment infrastructure. I agree that many of these aspects are not defined at the moment, but from the perspective of maintenance, access permissions and modularization effort a sublib looks most feasible to me.
Having tens of tiny 1/2/3 file 'sublibs' is not good.
c) serialization of other library components - e.g. shared_ptr - which depends on shared_ptr itself.
These are probably the best candidates for separating from the core.
However, they do little to affect the dependency graph. I keep prioritizing things that affect the dependencies the most. Currently that's the serialization->spirit edge and the range->algorithm edge. Thanks, Steve.
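Prioritizing "the edges that affect the dependencies the most" can be made precise in several ways. The sketch below assumes one of them and does not describe the tool behind the graphs in this thread. For each edge, it counts how many modules drop out of a root module's transitive closure when that single edge is cut. The graph used to exercise it is a toy: dep_a and dep_b are placeholders, and its edges are not Spirit's actual dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;  // module -> direct dependencies
using Edge = std::pair<std::string, std::string>;

// Size of root's transitive closure (root included), optionally ignoring one edge.
std::size_t closure_size(const Graph& g, const std::string& root, const Edge* cut) {
    std::set<std::string> seen{root};
    std::vector<std::string> work{root};
    while (!work.empty()) {
        const std::string node = work.back();
        work.pop_back();
        const auto it = g.find(node);
        if (it == g.end()) continue;
        for (const auto& next : it->second) {
            if (cut != nullptr && cut->first == node && cut->second == next) continue;
            if (seen.insert(next).second) work.push_back(next);
        }
    }
    return seen.size();
}

// For every edge, how many modules leave root's closure if that edge alone is cut.
std::map<Edge, std::size_t> edge_impact(const Graph& g, const std::string& root) {
    const std::size_t full = closure_size(g, root, nullptr);
    std::map<Edge, std::size_t> impact;
    for (const auto& [from, targets] : g)
        for (const auto& to : targets) {
            const Edge e{from, to};
            impact[e] = full - closure_size(g, root, &e);
        }
    return impact;
}
```

In a toy graph where serialization depends on spirit and config, and spirit itself reaches config, cutting serialization -> spirit removes everything behind spirit. Cutting serialization -> config removes nothing, because config stays reachable through spirit. This is the sense in which some edges matter far more than others.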
Hi,
Thanks for the response. It's unfortunate that you provide so much stop-energy.
Robert Ramey wrote:
I think the notion of "dependency" is richer than can be captured in this sort of graph.
No one claimed the graph was some kind of universal all-encapsulating representation of "dependency *inverted commas*!".
So it can't be understood in terms of this graph alone.
The graph is showing public module dependencies. I think that's understood.
Consider another simple case - date time/serialization.hpp
most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? or what?
The solution is to make serialization low-cost to depend on, so that depending on it is not a problem. That is exactly what I am recommending. The current problem with serialization is that it is expensive in terms of needless dependencies. My recommendation does a lot to solve that for serialization.
So the graph tells us something, but what?
Module/package dependencies.
So - the degree of "modularization" cannot be determined or illustrated or measured by examining the graph above.
Disputed.
So, taken to its logical conclusion, extracting xml_archive would lead to extracting other components as well.
Nope. No one has suggested that. Extracting xml_archive isolates the spirit dependency. There is no similar motivation to extract other parts. I looked a little bit into splitting all of the archive parts away from the serialization part, but that still ties all the rest of the archive parts needlessly to spirit. What I recommend isolates the cost of spirit to the code that uses it. There could be reason to try to split the rest of the archive stuff from serialization, but I didn't look into that, so I'm not recommending it.
Traditionally, (and not just in boost) libraries have been organized around developer responsibility.
What I recommend doesn't change anything of this.
But the real questions are: a) what do we want modularization to accomplish and is this a feasible goal.
This is where you are providing a lot of bad stop-energy. Were not these questions answered years ago? Tell me this: Why did boost migrate away from svn to 100 fractured (not modularized!) git repos?
c) Do we want to support deployment of boost subset? I think we do.
This question was answered years ago. Why did boost migrate away from svn to 100 fractured (not modularized!) git repos?
My basic point is that these questions have to be addressed before the notion of decoupling can be carried much further.
Insisting that they are not already answered is not helpful.
In concrete terms - the exclusion of xml_archive should be: a) dropped altogether - (find by me btw)
FYI, "fine by me" is what you say when you agree with a proposal. Here you state the opposite of the proposal and say it's "fine by me" with a typo. It is a very strange way to express yourself.
b) created as a separate library module
This is the proposal. Thanks, Steve.
On 09/17/2014 03:23 PM, Stephen Kelly wrote:
most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? or what?
The solution is to make serialization low-cost to depend on, so that depending on it is not a problem. That is exactly what I am recommending. The current problem with serialization is that it is expensive in terms of needless dependencies. My recommendation does a lot to solve that for serialization.
Stephen, speaking purely as a person who used serialization in the past, I'm not sure I understand what you propose *exactly*. I thought that unless you include XML archive headers, serialization does not actually pull in Spirit headers. So what's your proposal exactly, since "extracting an xml_archive library from serialization" is not entirely clear to me.
b) created as a separate library module
If that's your proposal, I would be -1 on this, if anybody bothered to care about my opinion. In particular, that would mean that any changes to serialization library that affect XML archives and other parts would happen in two different repositories, with no obvious way to relate per-repository changes. MPL maintainers took a different approach to this, by creating two submodules inside a single repository - which seems a bit better to me. - Volodya
Vladimir Prus wrote:
In particular, that would mean that any changes to serialization library that affect XML archives and other parts would happen in two different repositories, with no obvious way to relate per-repository changes.
Please explain to me how this is different to any other directed edge of dependency between different boost libraries. Thanks, Steve.
On 09/17/2014 04:52 PM, Stephen Kelly wrote:
Vladimir Prus wrote:
In particular, that would mean that any changes to serialization library that affect XML archives and other parts would happen in two different repositories, with no obvious way to relate per-repository changes.
Please explain to me how this is different to any other directed edge of dependency between different boost libraries.
There's rather high cohesion between xml archives and binary archives and base archives, therefore a coordinated change to implementation of those is more likely to be required than a coordinated change between random two libraries. I hope you're not trying to say that all dependencies between any two libraries are absolutely the same to you? - Volodya
Stephen Kelly-2 wrote
The graph is showing public module dependencies. I think that's understood.
Not by me. The definition of "module dependency" is unclear to me. I presume it's defined by the situation where to build one thing, one has to build other things. So if you start out with thing "A", that implies build/inclusion of some stuff from other libraries, and so on inductively until one defines a closed set. I could buy this. But the problem is when thing A is a module. Does building A refer to building the library, running tests, building the examples, or building one app? Clearly if I'm building something which includes text_archive I have a different set of dependent "modules" than if I'm building something that includes xml_archive. I'm questioning the whole concept of "module dependency". To me it's ill-defined and actually not definable outside of a more specific context. Hence it can't be used to determine how a large body of code should be (re)factored. We need something more precise - which has yet to be articulated.
Consider another simple case - date time/serialization.hpp
most date/time users don't use this - but a few do. Is serialization a prerequisite for date/time? which users are we talking about? One can't win here. If you distribute serialization with every use of date/time you're distributing too much. If you don't, you'll be failing to ship functionality which some users need. What is the solution here - make two libraries out of date/time? or what?
The solution is to make serialization low-cost to depend on, so that depending on it is not a problem. That is exactly what I am recommending. The current problem with serialization is that it is expensive in terms of needless dependencies. My recommendation does a lot to solve that for serialization.
I'm reluctant to propose specific courses of action too soon. It's almost certain I will get it at least partially wrong. I'm going to overcome my reluctance to address this specific case as an example and see where it goes.
A user includes the date-time library in his code. He is dependent on a set of boost headers which don't include boost serialization. He can build his app without including and/or linking serialization. He's happy about this. But he has to install the whole serialization module, which under current rules means he installs spirit and a whole lot of other stuff. He's unhappy about this. Damn! The date-time library refers to serialization even though I don't use it, and it means that my date-time DLL is a lot larger than it has to be. This is really annoying to me. Also, when someone mucks with the serialization library it might keep serialization from building, which might keep my app from building even though I don't use even one line of code from it!!!! Very annoying.
Since most of the problem is xml_archive->spirit - we can "fix" this by moving the xml_archive to ?. This will "solve" the problem above. Of course this comes at the expense of everyone who wants to ship serialization with support for all of the archive classes in the package. They will now have to link with some module other than serialization, which is pretty non-obvious. So the net improvement in utility of boost libraries is not likely to be positive.
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization. Now the original app user above has only what he wants and is not dependent upon boost serialization. Yet other users of the serialization library have what they want - serialization all in one place. To summarize - the right thing to extract is the serialization of date-time to a separate module. This kind of module has been referred to as a "helper module" (or something like that, I don't remember).
Its place in a "module dependency" graph is unclear. This means that the author of date-time has to refactor somewhat to create two modules. Add support for auto-linking and this is not quite as easy as it would first appear. I know this as I have addressed this within the serialization library itself. I did not want users to have to import the whole wide character code when they weren't going to need it. Hence I created serialization.dll for all the common code and wserialization.dll, which includes code specific to wide character functionality. wserialization.dll calls into serialization.dll for core functionality. So we have the case where applications which don't use wide character functionality don't have to pay for it. And those that do get this functionality without having to do anything special - auto-link is fully implemented. Note that this refactoring/modularity is not at all visible in the "module dependency" graph. Nevertheless, I think this approach and result are consistent with your goal of "minimizing dependencies" (don't forget I don't think this phrase is well defined).
At this point there would be a couple of things that would be possible:
a) require/encourage authors of "library helper" (bad term!) modules to build them as separate DLLs/LIBs.
b) divide the serialization library (again) so that rather than wserialization and serialization it would be four modules: serialization, serialization_with_xml_archive, wserialization, wserialization_with_xml_archive. And of course don't forget to support auto-link.
Note that while either of these options would address the "problem" faced by the user(s) above, the current "module dependency" graph would be the same in all cases. That is, this graph cannot be used to distinguish the cases where a problem exists from those where it doesn't. The graph is interesting, but can't be used to make any real decisions. Also note that neither of these options would require any changes to git module organization.
Only Boost Build scripts and module source code would change. So it's my view that the current focus on "modularization" is somewhat misguided. It needs to be considered in terms of what boost policy should be toward importing other boost modules, granularity of modules, implementation of auto-linking - things like that. And deciding these things will take a level of consideration and effort that we haven't yet been able to muster. Perhaps your advocacy will provide the necessary sense of urgency to do this.
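The "helper module" split for date-time can be sketched in plain C++. The names Date and ToyTextOArchive are hypothetical, and the archive is a toy stand-in rather than a Boost archive class. The only borrowed element is the shape of Boost.Serialization's non-intrusive free function, serialize(Archive&, T&, const unsigned int). The point is that the core type needs nothing from serialization. Only the helper part mentions archives, and that part would live in a separate header or module such as date-time-serialization.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// ---- core part (hypothetical "date-time" header): no archive types anywhere.
struct Date {
    int year;
    int month;
    int day;
};

// ---- helper part (hypothetical "date-time-serialization" header). Only code that
// includes this part depends on serialization. It uses the non-intrusive
// serialize(Archive&, T&, const unsigned int) shape.
template <class Archive>
void serialize(Archive& ar, Date& d, const unsigned int /*version*/) {
    ar & d.year;
    ar & d.month;
    ar & d.day;
}

// ---- toy output archive (not a Boost class): writes ints as space-separated
// text and forwards any other type to its serialize() overload.
class ToyTextOArchive {
public:
    ToyTextOArchive& operator&(int& v) {
        out_ << v << ' ';
        return *this;
    }
    template <class T>
    ToyTextOArchive& operator&(T& v) {
        serialize(*this, v, 0u);
        return *this;
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};
```

For example, writing Date{2014, 9, 16} through ToyTextOArchive produces the text "2014 9 16 ". An application that includes only the core part never sees serialize() or any archive type, which is the property the date-time / date-time-serialization split is meant to give.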
So the graph tells us something, but what?
Module/package dependencies.
So - the degree of "modularization" cannot be determined or illustrated or measured by examining the graph above.
Disputed.
LOL - and what does that mean? Of course this is the source of our disagreement. To you it seems clear what it means; to me it's undefined. It will take a while to reconcile this.
So, taken to its logical conclusion, extracting xml_archive would lead to extracting other components as well.
Nope. No one has suggested that. Extracting xml_archive isolates the spirit dependency. There is no similar motivation to extract other parts. I looked a little bit into splitting all of the archive parts away from the serialization part, but that still ties all the rest of the archive parts needlessly to spirit.
What I recommend isolates the cost of spirit to the code that uses it.
There could be reason to try to split the rest of the archive stuff from serialization, but I didn't look into that, so I'm not recommending it.
I think the problem is more fundamental than just moving around a few libraries/sublibraries. To me the current "problem" is an incidental side effect of the lack of implementation of certain policies that we have failed to define. So this piecemeal approach will lead to unnecessary complexity and not really fix much. If we keep going down this road there will always be something to (re)factor.
But the real questions are: a) what do we want modularization to accomplish and is this a feasible goal.
This is where you are providing a lot of bad stop-energy. Were not these questions answered years ago?
Tell me this: Why did boost migrate away from svn to 100 fractured (not modularized!) git repos?
c) Do we want to support deployment of boost subset? I think we do.
This question was answered years ago.
Why did boost migrate away from svn to 100 fractured (not modularized!) git repos?
My basic point is that these questions have to be addressed before the notion of decoupling can be carried much further.
Insisting that they are not already answered is not helpful.
Oh no!!! The reason we're having this problem is that we've never really thought about it. Before modularized Boost, there wasn't much we could do about it. Now we're looking at using modularized Boost to permit Boost to be made a lot bigger; this in turn raises the issue of deployment subsets, and for the first time we're starting to look seriously at this. Up until now it was just an occasional grumbling. You're suggesting I'm against doing anything. That's not true. I'm against doing the wrong thing. These are not the same. You're also suggesting that I don't think there is a problem. That's also not true. But I don't buy the argument "something needs to be done, this is something, therefore we must do this".
b) created as a separate library module
This is the proposal.
I'm still not quite getting what you mean by creating a separate module. Do you mean something similar to what I mentioned above as serialization_xml_archive... This wouldn't affect the "module dependency" graph but it might accidentally address the "subset deployment" issue. Do you mean creating a separate module at the git level? This would make the "module dependency" graph look more like what I think you want it to look like. But I'm not convinced it would actually address the issue of users importing code that they don't actually use - I'd have to think about this. Or do you mean something else entirely? My real point is that I believe it's premature to start investing in "minimizing module dependencies" before really considering what it is we want to achieve and the alternatives for achieving it. I believe my arguments support the proposition that this is not an unreasonable request. Robert Ramey -- View this message in context: http://boost.2283326.n4.nabble.com/modularization-Extract-xml-archive-from-s... Sent from the Boost - Dev mailing list archive at Nabble.com.
Robert Ramey wrote:
The definition of "module dependency" is unclear to me.
The definition of module dependency that the report uses is: module X depends on module Y if some header of X's includes some header of Y's where "some header of X's" refers to the headers in libs/X/include. That is, it doesn't track dependencies from libs/X/src or libs/X/test. In other words, if someone tries to *use* X by including some of its headers, he needs (in the general case) to also have Y's headers installed, or the code will not compile.
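Peter's definition is mechanical enough to sketch. The following is not the report's actual tool; it is a minimal C++ sketch that assumes someone has already scanned the public headers (only libs/X/include) and recorded, for each header, its owning module and the headers it #includes. HeaderInfo and module_dependencies are hypothetical names. It records direct edges only; transitive dependencies follow by walking these edges.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input record for one public header (a file under libs/X/include).
struct HeaderInfo {
    std::string owner;                  // module that owns this header, e.g. "serialization"
    std::vector<std::string> includes;  // headers named in its #include directives
};

// Module X depends on module Y if some header of X's includes some header of Y's.
// Only public headers are passed in, so libs/X/src and libs/X/test never add edges.
std::map<std::string, std::set<std::string>>
module_dependencies(const std::map<std::string, HeaderInfo>& headers) {
    std::map<std::string, std::set<std::string>> deps;
    for (const auto& entry : headers) {
        const HeaderInfo& info = entry.second;
        deps[info.owner];  // every module appears, even one with no dependencies
        for (const auto& inc : info.includes) {
            auto it = headers.find(inc);
            if (it == headers.end()) continue;  // not a tracked header, e.g. <vector>
            if (it->second.owner != info.owner)
                deps[info.owner].insert(it->second.owner);  // cross-module edge
        }
    }
    return deps;
}
```

Under this definition an include in a src/ file never creates an edge; only includes reachable from a user's #include of a public header do.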
Peter Dimov-2 wrote
Robert Ramey wrote:
The definition of "module dependency" is unclear to me.
The definition of module dependency that the report uses is:
module X depends on module Y if some header of X's includes some header of Y's
where "some header of X's" refers to the headers in libs/X/include.
That is, it doesn't track dependencies from libs/X/src or libs/X/test.
In other words, if someone tries to *use* X by including some of its headers, he needs (in the general case) to also have Y's headers installed, or the code will not compile.
OK - but what is the first code node? Is it the user's application? Is it one of the tests? Is it all the cpp files in library X? I think my biggest problem is the word "module". If I break date-time into two DLLs - date_core and date_time_serialization - will the "module" be one or the other or both? If date-time has a bunch of headers like gregorian.hpp etc. and one other library uses just that - is the whole other library dependent upon date-time? Does this change if one uses the popular "convenience header" date-time.hpp? To me "module" is sort of slippery and doesn't capture very well what we're concerned about. The attempt to say something which seems really natural like "library Y is a prerequisite for library X" is, I think, a big problem. I realize you've answered it for purposes of running your dependency tool, but I don't think it's the definitive answer (I don't think there is one). That's why the reliance on this tool is leading us to difficulties. As I've said, this way of characterizing it doesn't help us in deciding what the best thing to do is. Robert Ramey
Robert Ramey wrote:
Since most of the problem is xml_archive->spirit - we can "fix" this by moving the xml_archive to ?. This will "solve" the problem above. Of course this comes at the expense of everyone who wants to ship serialization with support for all of the archive classes in the package. They will now have to link with some module other than serialization, which is pretty non-obvious.
So the net improvement in utility of boost libraries is not likely to be positive.
...
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive? (putting aside that it's not clear what you mean by package, who's shipping what and to whom etc)
Now the original app user above has only what he wants and is not dependent upon boost serialization. Yet other users of the serialization library have what they want - serialization all in one place.
You just created a separate thing to (presumably separately) download. What's the 'all in one place' part?
So we have the case where applications which don't use wide character functionality don't have to pay for it. And those that do get this functionality without having to do anything special - auto-link is fully implemented.
Isn't auto-link a VC++ only thing? Trying to assess the veracity of the 'don't have to do anything special' claim.
So - the degree of "modularization" cannot be determined or illustrated or measured by examining the graph above.
Disputed.
LOL - and what does that mean?
It means that it is the source of our disagreement.
Before modularized Boost
Just so I understand why you use a phrase like this, can you tell me whether we are now 'after modularized Boost'? When did that happen? What event divides before and after? Was the modularization 'event' migration to a large number of interdependent git repos? Does that statement make any sense, given the word interdependent appears in it?
, there wasn't much we could do about it. Now we're looking at using modularized Boost to permit Boost to be made a lot bigger; this in turn raises the issue of deployment subsets, and for the first time we're starting to look seriously at this.
Yes.
You're also suggesting that I don't think there is a problem. That's also not true. But I don't buy the argument "something needs to be done, this is something, therefore we must do this".
That's not my argument/attitude/approach.
b) created as a separate library module
This is the proposal.
I'm still not quite getting what you mean by creating a separate module.
Do you mean creating a separate module at the git level?
Yes. Locally I've moved the deleted files below into a new repo, xml_archive.git
stephen@hal:~/dev/src/modular-boost/libs/serialization{(detached from 7f80632)}$ git status
HEAD detached at 7f80632
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   build/Jamfile.v2
        deleted:    include/boost/archive/basic_xml_archive.hpp
        deleted:    include/boost/archive/basic_xml_iarchive.hpp
        deleted:    include/boost/archive/basic_xml_oarchive.hpp
        deleted:    include/boost/archive/impl/basic_xml_grammar.hpp
        deleted:    include/boost/archive/impl/basic_xml_iarchive.ipp
        deleted:    include/boost/archive/impl/basic_xml_oarchive.ipp
        deleted:    include/boost/archive/impl/xml_iarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_oarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_wiarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_woarchive_impl.ipp
        deleted:    include/boost/archive/iterators/xml_escape.hpp
        deleted:    include/boost/archive/iterators/xml_unescape.hpp
        deleted:    include/boost/archive/iterators/xml_unescape_exception.hpp
        deleted:    include/boost/archive/polymorphic_xml_iarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_oarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_wiarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_woarchive.hpp
        deleted:    include/boost/archive/xml_archive_exception.hpp
        deleted:    include/boost/archive/xml_iarchive.hpp
        deleted:    include/boost/archive/xml_oarchive.hpp
        deleted:    include/boost/archive/xml_wiarchive.hpp
        deleted:    include/boost/archive/xml_woarchive.hpp
        deleted:    include/boost/serialization/array.hpp
        deleted:    include/boost/serialization/variant.hpp
        deleted:    src/basic_xml_archive.cpp
        deleted:    src/basic_xml_grammar.ipp
        deleted:    src/xml_archive_exception.cpp
        deleted:    src/xml_grammar.cpp
        deleted:    src/xml_iarchive.cpp
        deleted:    src/xml_oarchive.cpp
        deleted:    src/xml_wgrammar.cpp
        deleted:    src/xml_wiarchive.cpp
        deleted:    src/xml_woarchive.cpp
        modified:   test/Jamfile.v2
        deleted:    test/polymorphic_xml_archive.hpp
        deleted:    test/polymorphic_xml_warchive.hpp
        deleted:    test/test_mult_archive_types.cpp
        deleted:    test/xml_archive.hpp
        deleted:    test/xml_warchive.hpp
        modified:   util/test.jam
Thanks, Steve.
Stephen Kelly-2 wrote
Robert Ramey wrote:
Since most of the problem is xml_archive->spirit - we can "fix" this by moving the xml_archive to ?. This will "solve" the problem above. Of course this comes at the expense of everyone who wants to ship serialization with support for all of the archive classes in the package. They will now have to link with some module other than serialization, which is pretty non-obvious.
So the net improvement in utility of boost libraries is not likely to be positive.
...
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people. Anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll. Others shipping the serialization dlls now have to decide whether to include wide-characters or not. Now they would have to start thinking about whether to include support for xml_archive or not. This suggests that the serialization library should create a set of dlls with names like serialization-core.dll serialization-text_archive.dll serialization-xml_archive.dll ... To avoid xml_archive being a special case, which is quite confusing. Note that none of the above requires separate git repositories or include hierarchies. Moving the files around doesn't remove dependencies; it diminishes the presence of false dependencies in the dependency tracking tool.
(putting aside that it's not clear what you mean by package, who's shipping what and to whom etc)
LOL - I'm aware of that. Sorry. This is why I'm reluctant to spend too much time on speculative use cases - though it seems unavoidable.
Now the original app user above has only what he wants and is not dependent upon boost serialization. Yet other users of the serialization library have what they want - serialization all in one place.
You just created a separate thing to (presumably separately) download. What's the 'all in one place' part?
Hmmm - in the current scheme - we're creating multiple libraries or dlls within one git repo. That's what we do now. In your scheme - we also create more libraries/dlls but the source is organized in a separate git repo. That's the difference.
So we have the case where applications which don't use wide character functionality don't have to pay for it. And those that do get this functionality without having to do anything special - auto-link is fully implemented.
Isn't auto-link a VC++ only thing? Trying to assess the veracity of the 'don't have to do anything special' claim.
It's a VC thing and also a Borland thing - though I guess that's not relevant any more. I guess it's not a gcc or clang thing, so I see now that this is a red herring. There is a related issue, "code visibility": gcc creates entries in the linking table for all entry points, not just the external ones. The auto-linking macros also include the information so that VC applications can export a more limited symbol table, which has some benefits. In VC, auto-linking makes sure we pick the DLL or library with the expected ABI, since VC supports a wide variety. So the number of DLLs or libraries is some multiple of what we've been talking about. Gcc doesn't have this variety, and I think you're right that it doesn't have auto-linking.
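The visibility point can be illustrated with the usual export-macro pattern. This is a sketch with hypothetical names (SKETCH_XML_ARCHIVE_API, sketch_xml_escape); Boost.Config provides real equivalents such as BOOST_SYMBOL_EXPORT. On ELF platforms GCC and Clang export every external-linkage symbol from a shared library unless it is built with -fvisibility=hidden, while MSVC exports nothing from a DLL unless it is marked __declspec(dllexport). Auto-linking is a separate MSVC mechanism, driven by headers that emit #pragma comment(lib, ...), and is not shown here.

```cpp
// Hypothetical export macro for a separately built xml_archive library.
// MSVC: nothing leaves a DLL unless marked dllexport.
// GCC/Clang on ELF: everything is exported by default; building with
// -fvisibility=hidden flips that default, and this attribute re-exports
// only the marked entry points.
#if defined(_WIN32)
#  define SKETCH_XML_ARCHIVE_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define SKETCH_XML_ARCHIVE_API __attribute__((visibility("default")))
#else
#  define SKETCH_XML_ARCHIVE_API
#endif

#include <string>

// Internal helper: internal linkage, so it is never exported on any toolchain.
static std::string escape_amp(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Public entry point: the only symbol meant for the library's export table.
SKETCH_XML_ARCHIVE_API std::string sketch_xml_escape(const std::string& s) {
    return escape_amp(s);
}
```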
Before modularized Boost
Just so I understand why you use a phrase like this, can you tell me whether we are now 'after modularized Boost'? When did that happen? What event divides before and after? Was the modularization 'event' migration to a large number of interdependent git repos? Does that statement make any sense, given the word interdependent appears in it?
I mean before we migrated to git implementation with submodules for each library.
b) created as a separate library module
This is the proposal.
I'm still not quite getting what you mean by creating a separate module.
Do you mean creating a separate module at the git level?
Yes. Locally I've moved the deleted files below into a new repo, xml_archive.git
stephen@hal:~/dev/src/modular-boost/libs/serialization{(detached from 7f80632)}$ git status
HEAD detached at 7f80632
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   build/Jamfile.v2
        deleted:    include/boost/archive/basic_xml_archive.hpp
        deleted:    include/boost/archive/basic_xml_iarchive.hpp
        deleted:    include/boost/archive/basic_xml_oarchive.hpp
        deleted:    include/boost/archive/impl/basic_xml_grammar.hpp
        deleted:    include/boost/archive/impl/basic_xml_iarchive.ipp
        deleted:    include/boost/archive/impl/basic_xml_oarchive.ipp
        deleted:    include/boost/archive/impl/xml_iarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_oarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_wiarchive_impl.ipp
        deleted:    include/boost/archive/impl/xml_woarchive_impl.ipp
        deleted:    include/boost/archive/iterators/xml_escape.hpp
        deleted:    include/boost/archive/iterators/xml_unescape.hpp
        deleted:    include/boost/archive/iterators/xml_unescape_exception.hpp
        deleted:    include/boost/archive/polymorphic_xml_iarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_oarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_wiarchive.hpp
        deleted:    include/boost/archive/polymorphic_xml_woarchive.hpp
        deleted:    include/boost/archive/xml_archive_exception.hpp
        deleted:    include/boost/archive/xml_iarchive.hpp
        deleted:    include/boost/archive/xml_oarchive.hpp
        deleted:    include/boost/archive/xml_wiarchive.hpp
        deleted:    include/boost/archive/xml_woarchive.hpp
        deleted:    include/boost/serialization/array.hpp
        deleted:    include/boost/serialization/variant.hpp
        deleted:    src/basic_xml_archive.cpp
        deleted:    src/basic_xml_grammar.ipp
        deleted:    src/xml_archive_exception.cpp
        deleted:    src/xml_grammar.cpp
        deleted:    src/xml_iarchive.cpp
        deleted:    src/xml_oarchive.cpp
        deleted:    src/xml_wgrammar.cpp
        deleted:    src/xml_wiarchive.cpp
        deleted:    src/xml_woarchive.cpp
        modified:   test/Jamfile.v2
        deleted:    test/polymorphic_xml_archive.hpp
        deleted:    test/polymorphic_xml_warchive.hpp
        deleted:    test/test_mult_archive_types.cpp
        deleted:    test/xml_archive.hpp
        deleted:    test/xml_warchive.hpp
        modified:   util/test.jam
Hmmm - this looks like it's on your local machine. Do you plan to commit this? Do you have the authority to do so? Robert Ramey
Robert Ramey wrote:
Stephen Kelly-2 wrote
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people.
We can both think lots of things. I think there are more similarities in consequences of your plan the more you describe it.
anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll.
Amazing. What will someone who explicitly includes archive/basic_xml_archive.hpp possibly think he has to ship?
others shipping the serialization dlls now have to decide whether to include wide-characters or not. Now they would have to start thinking about whether to include support for xml_archive or not.
... or date-time-serialization or not.
This suggests that the serialization library should create a set of dlls with names like serialization-core.dll serialization-text_archive.dll serialization-xml_archive.dll ...
To avoid xml_archive being a special case which is quite confusing.
I don't think it's confusing. It's just a change.
Note that none of the above requires separate git repositories or include hierarchies. Moving the files around doesn't remove dependencies, it diminishes the presence of false dependencies in the dependency tracking tool.
The dependencies are not false if they refer to dependencies between git repos, or between modularized release tarballs (if that's a goal).
Now the original app user above has only what he wants and is not dependent upon boost serialization. Yet other users of the serialization library have what they want - serialization all in one place.
You just created a separate thing to (presumably separately) download. What's the 'all in one place' part?
Hmmm - in the current scheme - we're creating multiple libraries or dlls within one git repo. That's what we do now.
In your scheme - we also create more libraries/dlls but the source is organized in a separate git repo. That's the difference.
You write as if you think git is the only/primary way to use boost. Are releases irrelevant? And release tarballs? Would your date-time-serialization library be in a separate release tarball?
Isn't auto-link a VC++ only thing? Trying to assess the veracity of the 'don't have to do anything special' claim.
It's a VC thing and also a Borland thing - though I guess that's not relevant any more. I guess it's not a gcc or clang thing, so I see now that this is a red herring.
Yes.
Do you mean creating a separate module at the git level?
Yes. Locally I've moved the deleted files below into a new repo, xml_archive.git
Hmmm - this looks like it's on your local machine.
Yes.
Do you plan to commit this?
I'm not that rude! :) I wrote a script in July to automate this split (to be immune to merge conflicts etc). I started this thread to get support for going ahead with doing the split in develop.
Do you have the authority to do so?
No. Thanks, Steve.
On Wednesday 17 September 2014 14:24:28 Robert Ramey wrote:
Stephen Kelly-2 wrote
Robert Ramey wrote:
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people.
anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll.
Just to be closer to reality, serialization support in DateTime is header-only.
On Wed, Sep 17, 2014 at 8:14 PM, Andrey Semashev
On Wednesday 17 September 2014 14:24:28 Robert Ramey wrote:
Stephen Kelly-2 wrote
Robert Ramey wrote:
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people.
anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll.
Just to be closer to reality, serialization support in DateTime is header-only.
In principle there should be no reason for e.g. time_serialize.hpp to include Boost Serialization headers in order to define the "load" or "save" function templates needed for serialization. For example, to define: template<class Archive> void save( Archive & ar, const posix_time::time_duration& td, unsigned int /*version*/) { .... } one doesn't need to include any serialization headers. Users of time_serialize.hpp who need to save posix_time::time_duration objects should include the necessary serialization headers themselves. -- Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode
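Emil's claim can be checked with plain C++. In this minimal sketch, sketch::time_duration and string_archive are hypothetical stand-ins for posix_time::time_duration and a Boost archive; no Boost header is included anywhere. It deliberately sidesteps the BOOST_SERIALIZATION_NVP question raised later in the thread.

```cpp
#include <string>

namespace sketch {

// Hypothetical stand-in for posix_time::time_duration.
class time_duration {
public:
    explicit time_duration(long long ticks = 0) : ticks_(ticks) {}
    long long ticks() const { return ticks_; }
private:
    long long ticks_;
};

// What a time_serialize.hpp-style header could provide without including any
// serialization header: "ar << ..." depends on the template parameter Archive,
// so it is only checked when save() is instantiated with a concrete archive.
template <class Archive>
void save(Archive& ar, const time_duration& td, unsigned int /*version*/) {
    ar << td.ticks();
}

}  // namespace sketch

// Toy archive in "user" code. It plays the role of the serialization library's
// archive classes, which only the code that actually saves needs to include.
struct string_archive {
    std::string out;
    string_archive& operator<<(long long v) {
        out += std::to_string(v);
        return *this;
    }
};
```

After string_archive ar; sketch::save(ar, sketch::time_duration(42), 0); the archive holds "42" in ar.out, and the header defining save() never needed to know the archive type.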
Emil Dotchevski-3 wrote
On Wed, Sep 17, 2014 at 8:14 PM, Andrey Semashev wrote:
On Wednesday 17 September 2014 14:24:28 Robert Ramey wrote:
Stephen Kelly-2 wrote
Robert Ramey wrote:
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people.
anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll.
Just to be closer to reality, serialization support in DateTime is header-only.
In principle there should be no reason for e.g. time_serialize.hpp to include Boost Serialization headers in order to define the "load" or "save" function templates needed for serialization.
For example, to define:
template <class Archive> void save( Archive & ar, const posix_time::time_duration& td, unsigned int /*version*/) { .... }
one doesn't need to include any serialization headers.
Users of time_serialize.hpp who need to save posix_time::time_duration objects should include the necessary serialization headers themselves.
My original concern was that we really need to spend some time reaching a consensus on what we would like future deployment of boost to look like, what goals it should fulfill, and what policies/requirements we want to formulate to achieve this. I think we should get this done before we start just moving things around in order to make some dependency graph look simpler. Emil's idea is an example of the kind of idea which looks very interesting to me and I think it needs exploring. It would be great to leverage our modularization effort to create flexibility in the deployment of boost. Only if we do this will we be able to reach 500 libraries in the next 10 years. Robert Ramey
On 09/18/2014 10:29 PM, Emil Dotchevski wrote:
On Wed, Sep 17, 2014 at 8:14 PM, Andrey Semashev
wrote: On Wednesday 17 September 2014 14:24:28 Robert Ramey wrote:
Stephen Kelly-2 wrote
Robert Ramey wrote:
The "correct" solution to the above is for date-time to build two modules: date-time and date-time-serialization.
Is this "at the expense of everyone who wants to ship datetime with support for serialization in the package"? Is that 'non-obvious' too? Is this a net-positive?
I think it's a much smaller number of people.
anyone who explicitly includes date-time/serialization.hpp will know that he has to ship the date-time-serialization.dll.
Just to be closer to reality, serialization support in DateTime is header-only.
In principle there should be no reason for e.g. time_serialize.hpp to include Boost Serialization headers in order to define the "load" or "save" function templates needed for serialization.
For example, to define:
template<class Archive> void save( Archive & ar, const posix_time::time_duration& td, unsigned int /*version*/) { .... }
one doesn't need to include any serialization headers.
That's true as far as general C++ goes, but you'd very likely need to use BOOST_SERIALIZATION_NVP - and for XML archives it's required to use it - and it's a macro, so two-phase lookup won't help? - Volodya
On Thu, Sep 18, 2014 at 9:53 PM, Vladimir Prus
On 09/18/2014 10:29 PM, Emil Dotchevski wrote:
In principle there should be no reason for e.g. time_serialize.hpp to include Boost Serialization headers in order to define the "load" or "save" function templates needed for serialization.
For example, to define:
template<class Archive> void save( Archive & ar, const posix_time::time_duration& td, unsigned int /*version*/) { .... }
one doesn't need to include any serialization headers.
That's true as far as general C++ goes, but you'd very likely need to use BOOST_SERIALIZATION_NVP - and for XML archives it's required to use it - and it's a macro, so two-phase lookup won't help?
Sure, one can write a serialization library that makes this kind of definition impossible. My point is that in principle it is possible and if there is a general agreement to refactor important libraries to reduce physical coupling then it is something that can be done. This is even more important for a library like Serialization because any data type may need the ability to be serialized yet many (most?) programs that use such types will not serialize them. -- Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode
On 09/19/2014 09:20 AM, Emil Dotchevski wrote:
On Thu, Sep 18, 2014 at 9:53 PM, Vladimir Prus
wrote: On 09/18/2014 10:29 PM, Emil Dotchevski wrote:
In principle there should be no reason for e.g. time_serialize.hpp to include Boost Serialization headers in order to define the "load" or "save" function templates needed for serialization.
For example, to define:
template<class Archive> void save( Archive & ar, const posix_time::time_duration& td, unsigned int /*version*/) { .... }
one doesn't need to include any serialization headers.
That's true as far as general C++ goes, but you'd very likely need to use BOOST_SERIALIZATION_NVP - and for XML archives it's required to use it - and it's a macro, so two-phase lookup won't help?
Sure, one can write a serialization library that makes this kind of definition impossible. My point is that in principle it is possible and if there is a general agreement to refactor important libraries to reduce physical coupling then it is something that can be done.
This is even more important for a library like Serialization because any data type may need the ability to be serialized yet many (most?) programs that use such types will not serialize them.
Yes, it's possible in principle, and even with current serialization code, but it's not clear that replacing BOOST_SERIALIZATION_NVP(data_member) with boost::serialization::make_nvp("data_member", data_member) is a net win. - Volodya
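To make the trade-off concrete: in Boost, BOOST_SERIALIZATION_NVP(name) expands to roughly boost::serialization::make_nvp(BOOST_PP_STRINGIZE(name), name), so it only spares the user from spelling the name twice. The toy below (hypothetical sketch::nvp, sketch::make_nvp and SKETCH_NVP, not Boost's types) shows the same relationship without Boost. Note that the macro must be defined before its point of use, and a qualified call such as boost::serialization::make_nvp(...) is also looked up where the template is defined, so neither spelling by itself removes the include.

```cpp
#include <string>

namespace sketch {

// Toy name-value pair, loosely modelled on boost::serialization::nvp.
template <class T>
struct nvp {
    std::string name;  // the element name an XML archive would write
    T& value;          // the data member being serialized
};

template <class T>
nvp<T> make_nvp(const char* name, T& value) {
    return nvp<T>{name, value};
}

}  // namespace sketch

// #name turns the argument's spelling into a string literal. Being a macro,
// it is expanded by the preprocessor at the point of use; two-phase template
// lookup plays no part, so it has to be defined before any code that uses it.
#define SKETCH_NVP(name) ::sketch::make_nvp(#name, name)
```

SKETCH_NVP(data_member) and sketch::make_nvp("data_member", data_member) produce equivalent pairs; the macro form just cannot drift out of sync with the member's name.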
On Mon, Sep 22, 2014 at 12:05 PM, Vladimir Prus
Yes, it's possible in principle, and even with current serialization code, but it's not clear that replacing
BOOST_SERIALIZATION_NVP(data_member)
with
boost::serialization::make_nvp("data_member", data_member)
is a net win.
For users of library A who do not need its functionality that requires a library B, allowing them to compile A without B is a net win. Especially if A is a major coupling point, B is relatively large, and the need to use A is independent of the need to use B. As well, in principle the call to serialize a data member doesn't need to be qualified: read(ar,foo_,"foo") seems quite straightforward. Besides, serialization usually requires access to private data members, so breaking this dependency by refactoring read/write code into a separate header is not possible in general. -- Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode
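Emil's unqualified-call alternative can be sketched as follows. Assumptions: sketch_types::point, sketch_archive::map_iarchive and the free function read are hypothetical and not part of Boost. Because read(ar, x_, "x") is an unqualified call whose first argument depends on Archive, the name is resolved at instantiation, and argument-dependent lookup finds read in the archive's namespace, so the type's header never includes an archive header. The member template also illustrates his last point: code that touches private members has to live in, or be befriended by, the class.

```cpp
#include <map>
#include <string>

namespace sketch_types {

// A type whose header knows nothing about any archive library.
class point {
public:
    point(int x = 0, int y = 0) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }

    // Needs the private members, so it must live in (or be befriended by)
    // the class, and therefore in this header.
    template <class Archive>
    void load(Archive& ar) {
        // Unqualified calls with an Archive-dependent argument: resolved at
        // instantiation, with ADL searching Archive's namespace for read().
        read(ar, x_, "x");
        read(ar, y_, "y");
    }

private:
    int x_;
    int y_;
};

}  // namespace sketch_types

namespace sketch_archive {

// Hypothetical archive library, included only by code that actually loads data.
struct map_iarchive {
    std::map<std::string, int> fields;
};

inline void read(map_iarchive& ar, int& value, const char* name) {
    value = ar.fields.at(name);  // throws std::out_of_range if the field is missing
}

}  // namespace sketch_archive
```

Only the translation unit that calls p.load(ar) needs to see sketch_archive; code that merely uses point pays nothing for it.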
participants (8)
- Andrey Semashev
- Emil Dotchevski
- John Maddock
- Peter Dimov
- Robert Ramey
- Stephen Kelly
- Vladimir Prus
- Vladimir Prus