[modularization] spirit -> serialization
Hi,
comments about this dependency:
|
Vicente J. Botet Escriba wrote:
Hi,
comments about this dependency:
|
| * from |
| It seems that this file is not used
|boost/spirit/home/support/detail/lexer/serialise.hpp
Could this file be removed or moved to examples?
I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.

Thanks, Steve.
On 13/06/14 18:41, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:
Hi,
comments about this dependency:
|
| * from |
| It seems that this file is not used
|boost/spirit/home/support/detail/lexer/serialise.hpp
Could this file be removed or moved to examples?

I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
This corresponds to the opposite dependency. My first goal is to break the cycles.

Vicente
Vicente J. Botet Escriba wrote:
On 13/06/14 18:41, Stephen Kelly wrote:
I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
This corresponds to the opposite dependency. My first goal is to break the cycles.
I am aware of that. That is why I wrote:
I think if the intent is to remove circular dependencies [...]
Here is a graph which assumes the range->algorithm edge removal and treats math<->lexical_cast as an incidental module:

http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png

And after removing the serialization->spirit edge:

http://www.steveire.com/boost/2014_jun_after-spirit-serialization.png

Thanks, Steve.
On 14/06/14 08:56, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:
On 13/06/14 18:41, Stephen Kelly wrote:
I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
This corresponds to the opposite dependency. My first goal is to break the cycles.

I am aware of that.
That is why I wrote:
I think if the intent is to remove circular dependencies [...]
Here is a graph which assumes the range->algorithm edge removal and treats math<->lexical_cast as an incidental module:
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
And after removing the serialization->spirit edge:
http://www.steveire.com/boost/2014_jun_after-spirit-serialization.png
The local_function <-> scoped_exit cycle will be taken into account by Lorenzo.

I don't think the serialization -> spirit dependency necessarily needs to be removed. As I said in another post, the opposite edge seems useless, as the file is not used.

We can deal with the graph cycle by extracting the following submodules:

bimap.property_map -> bimap property_map
property_map.parallel -> property_map mpi

and grouping graph and disjoint_set. In the same way, extracting the serialization part of date_time into a submodule helps to break the date_time cycle. I would say that we should do the same for each module that depends on serialization: create a submodule

module.serialization -> module serialization

The dependencies on tr1 should be removed and replaced by the underlying Boost libraries. Another dependency that can be broken is chrono -> interprocess, by adding a chrono.io submodule. I'll create the chrono.io submodule myself.

Best, Vicente
Vicente J. Botet Escriba wrote:
Here is a graph which assumes the range->algorithm edge removal and treats math<->lexical_cast as an incidental module:
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
And after removing the serialization->spirit edge:
http://www.steveire.com/boost/2014_jun_after-spirit-serialization.png
I don't think the serialization -> spirit dependency necessarily needs to be removed. As I said in another post, the opposite edge seems useless, as the file is not used.
Let's try to keep the discussion to serialization and spirit. The other things you note are off topic in this thread. Try to look only at the interaction of serialization and spirit.

Look at http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png

If you remove spirit->serialization, the cycle

spirit -> pool -> thread -> date_time -> serialization [ -> spirit ]

still exists.

First question: Do you see that?

Thanks, Steve.
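[Editor's note: the remaining cycle Stephen points at can be checked mechanically. A minimal sketch in Python, where the edge list contains only the modules named in this thread, not the full Boost graph: even with the direct spirit -> serialization edge removed, a depth-first search still finds the longer cycle.]

```python
def find_cycle(edges, start):
    """Depth-first search returning one cycle through `start`, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    def dfs(node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in path:
                found = dfs(nxt, path + [nxt])
            else:
                found = None
            if found:
                return found
        return None

    return dfs(start, [start])

edges = [
    ("spirit", "pool"),
    ("pool", "thread"),
    ("thread", "date_time"),
    ("date_time", "serialization"),
    ("serialization", "spirit"),
    # ("spirit", "serialization"),  # the removed direct edge
]

print(find_cycle(edges, "spirit"))
# -> ['spirit', 'pool', 'thread', 'date_time', 'serialization', 'spirit']
```

Removing any single edge of that five-edge loop (for example date_time -> serialization, as proposed later in the thread) makes find_cycle return None for this subset.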
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:
Here is a graph which assumes the range->algorithm edge removal and treats math<->lexical_cast as an incidental module:
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
And after removing the serialization->spirit edge:
http://www.steveire.com/boost/2014_jun_after-spirit-serialization.png
I don't think the serialization -> spirit dependency necessarily needs to be removed. As I said in another post, the opposite edge seems useless, as the file is not used.

Let's try to keep the discussion to serialization and spirit. The other things you note are off topic in this thread. Try to look only at the interaction of serialization and spirit.

You could as well start a new thread for this dependency ;-)

Look at
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
If you remove spirit->serialization, the cycle
spirit -> pool -> thread -> date_time -> serialization [ -> spirit ]
still exists.
First question:
Do you see that?
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency:

date_time.serialization -> date_time serialization

Best, Vicente
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:
Here is a graph which assumes the range->algorithm edge removal and treats math<->lexical_cast as an incidental module:
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
And after removing the serialization->spirit edge:
http://www.steveire.com/boost/2014_jun_after-spirit-serialization.png
I don't think the serialization -> spirit dependency necessarily needs to be removed. As I said in another post, the opposite edge seems useless, as the file is not used.
Let's try to keep the discussion to serialization and spirit. The other things you note are off topic in this thread. Try to look only at the interaction of serialization and spirit.
You could as well start a new thread for this dependency ;-)
Look at
http://www.steveire.com/boost/2014_jun_before-spirit-serialization.png
If you remove spirit->serialization, the cycle
spirit -> pool -> thread -> date_time -> serialization [ -> spirit ]
still exists.
First question: Do you see that?
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
+1, it seems frankly bonkers to extract single headers to new modules just because it makes a dependency graph look better. IMO we need a better way of looking at dependencies, perhaps by marking up glue headers as optional. John.
John Maddock wrote:
IMO we need a better way of looking at dependencies, perhaps by marking up glue headers as optional.
This approach causes difficulties down the road.

module X
    X.hpp

module Y
    Y1.hpp
    Y2.hpp (optional) includes X.hpp

module Z
    Z.hpp includes Y2.hpp

Does Z depend, indirectly, on X? If your answer is yes, and it must be, remember that Y does not depend on X, so the secondary dependencies are no longer the transitive closure of the primary dependencies. The tool can be made to figure these things out, but to do so, it will need to create virtual submodules, one for each optional header. Either that, or scrap the whole module-level dependency approach and start tracking individual headers.
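[Editor's note: the X/Y/Z example can be made concrete with a small Python sketch. The header and module names are the hypothetical ones from the example, not real Boost headers. Once Y2.hpp is ignored as optional, the module graph says Y has no dependencies, yet Z still reaches X.hpp at the header level, so secondary dependencies are no longer the transitive closure of the primary ones.]

```python
header_includes = {
    "X.hpp": [],
    "Y1.hpp": [],
    "Y2.hpp": ["X.hpp"],   # the optional glue header of module Y
    "Z.hpp": ["Y2.hpp"],
}
header_module = {"X.hpp": "X", "Y1.hpp": "Y", "Y2.hpp": "Y", "Z.hpp": "Z"}
optional = {"Y2.hpp"}

def module_deps(ignore_optional):
    """Module-level dependency edges derived from header includes."""
    deps = {m: set() for m in header_module.values()}
    for header, includes in header_includes.items():
        if ignore_optional and header in optional:
            continue
        for inc in includes:
            a, b = header_module[header], header_module[inc]
            if a != b:
                deps[a].add(b)
    return deps

def header_closure(header, seen=None):
    """Every header transitively reachable from `header`."""
    seen = set() if seen is None else seen
    for inc in header_includes[header]:
        if inc not in seen:
            seen.add(inc)
            header_closure(inc, seen)
    return seen

# With Y2.hpp marked optional, module Y does not depend on module X:
print(sorted(module_deps(ignore_optional=True)["Y"]))   # -> []
# ...but Z, which includes Y2.hpp, still reaches X.hpp at the header
# level, so Z's real dependencies exceed the module-level closure:
print(sorted(header_closure("Z.hpp")))                  # -> ['X.hpp', 'Y2.hpp']
```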
On Sunday 15 June 2014 15:13:55 Peter Dimov wrote:
John Maddock wrote:
IMO we need a better way of looking at dependencies, perhaps by marking up glue headers as optional.
This approach causes difficulties down the road.
module X
    X.hpp

module Y
    Y1.hpp
    Y2.hpp (optional) includes X.hpp

module Z
    Z.hpp includes Y2.hpp
Does Z depend, indirectly, on X?
Yes, how could it be otherwise?
If your answer is yes, and it must be, remember that Y does not depend on X, so the secondary dependencies are no longer the transitive closure of the primary dependencies.
It's not the question of Y depending on X anymore. If Y is header-only, it's a question of Y2.hpp (from Y) depending on X.hpp (from X). If Y is not header-only, the dependencies should include all dependencies of the compiled part. Arguably, Y2.hpp may not require the compiled part, but that case could also be handled by a metadata flag.
The tool can be made to figure these things out, but to do so, it will need to create virtual submodules, one per each optional header.
Either that, or scrap the whole module-level dependency approach and start tracking individual headers.
I'm for tracking headers. Submodules just impose too many difficulties to be usable at the header level. Maintaining libraries should be fun.
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization

The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why? I don't see why I would depend on Serialization if I don't use it, even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.

What do others think?

Vicente
On Sunday 15 June 2014 13:26:52 Vicente J. Botet Escriba wrote:
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:

Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why?
Because it creates lots of tiny submodules, which creates maintainability and usability problems.
I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
You are right to want to avoid depending on Serialization if you don't use it. But this should not be achieved with submodules, IMHO.
On 15/06/14 13:40, Andrey Semashev wrote:
On Sunday 15 June 2014 13:26:52 Vicente J. Botet Escriba wrote:
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:

Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.

date_time.serialization -> date_time serialization

The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.

Why?

Because it creates lots of tiny submodules, which creates maintainability and usability problems.
Why?
I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.

You are right to desire not depending on Serialization if you don't use it. But this should not be achieved with submodules, IMHO.
I'm open to discussing any alternative that solves the issue.

Vicente
On Sunday 15 June 2014 13:49:20 Vicente J. Botet Escriba wrote:
On 15/06/14 13:40, Andrey Semashev wrote:
On Sunday 15 June 2014 13:26:52 Vicente J. Botet Escriba wrote:
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:

Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why?
Because it creates lots of tiny submodules, which creates maintainability and usability problems.
Why?
What do you mean, why? Submodules are a constant pain to deal with. They don't keep the complete history of the library in one place, they don't allow synchronized operations on them (e.g. making changes to multiple submodules in a single commit/push), and adding or removing them requires privileges. In Log I have at least 6 glue headers; I don't want to deal with 7 different repos if they are extracted.
I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
You are right to desire not depending on Serialization if you don't use it. But this should not be achieved with submodules, IMHO.
I'm open to discussing any alternative that solves the issue.
I think there was a proposal not long ago to track dependencies based on headers, pretty much like boostdep does. Then we only need to mark the optional headers in some metadata files and there you go.
Andrey Semashev wrote:
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.
I understood Vicente to mean a sub-sub-module. These don't have their own repos. They are a subdirectory in an existing repo, and have the directory structure of a module:

date_time/
    include/
    src/
    test/
    serialization/
        include/
        src/
        test/

boostdep will show this as date_time~serialization (like numeric~conversion).
I think there was a proposal not long ago to track dependencies based on headers, pretty much like boostdep does. Then we only need to mark the optional headers in some metadata files and there you go.
Tracking headers instead of modules has its own disadvantages. The module levels report, for example, would no longer make sense, as parts of the same module would need to be at level 0 and other parts at level 11.

In addition, if you include the right header of module X you'd be fine, and if you include the wrong header, you'll bring in the world. In another addition, if the right header of X is changed to include a wrong header from X, you'll suddenly start depending on the world.

The current report is more stable. You can change includes within the same module without affecting it, and you can include another header from the same module without affecting it. One can argue that it's not "correct", but it's more useful. My module is on level 7, can I use this? It's on level 4, so it'd probably not be much of a problem. That other thing? It's on 9, so perhaps not.
On Sunday 15 June 2014 15:44:49 Peter Dimov wrote:
Andrey Semashev wrote:
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.
I understood Vicente to mean a sub-sub-module.
When I read this:
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
I got the idea that Vicente meant the full-fledged git submodule, with its own repository. If we're talking about just structural changes within the same submodule then the perspective changes.
These don't have their own repos. They are a subdirectory in an existing repo, and have the directory structure of a module.
date_time/
    include/
    src/
    test/
    serialization/
        include/
        src/
        test/
boostdep will show this as date_time~serialization (like numeric~conversion).
Yes, this approach looks more interesting.
Tracking headers instead of modules has its own disadvantages. The module levels report, for example, would no longer make sense, as parts of the same module would need to be at level 0 and other parts at level 11.
Yes, although I don't really understand what a level means. It surely doesn't correspond to the number of dependencies, although there is some correlation. When deciding whether to use a library in my library I will be looking at its dependencies, not some level index.
In addition, if you include the right header of module X you'd be fine, and if you include the wrong header, you'll bring in the world. In another addition, if the right header of X is changed to include a wrong header from X, you'll suddenly start depending on the world.
True. Although you're not saved from changes when you build the dependency graph based on submodules. I'd say you're more vulnerable to dependency creep with submodules than with headers. I guess it all comes down to how we're going to use this dependency information. If our packaging/deploying tool is based on submodules then we should follow that line. In this case I would prefer that your sub-sub-module approach above is taken as a baseline; breeding repositories is not the way to go, IMHO. If the tool is header based then the submodules are just a convention to separate libraries, grant privileges, and that's about it.
Andrey Semashev wrote:
Yes, although I don't really understand what a level means. It surely doesn't correspond to the number of dependencies, although there is some correlation. When deciding whether to use a library in my library I will be looking at its dependencies, not some level index.
If we assume that the purpose of the dependency report is to be informative (it's a report, after all), the problem is how to take the raw dependency information (which is basically what header includes what) and to distill it down to a form that will be most useful to humans.

The module level is that information compressed into a single number. Modules on level N don't include anything from level N+1 and above. Obviously, a single (small) number can't hold enough information to describe the actual dependencies in full. It's merely a good proxy. It's something you quickly check to see if there's something amiss. Seeing boost::array at level 8 is enough to make one think. Of course, as with all proxies, if you make the level your sole focus the actual dependencies may suffer as a result, but suppose we are smart enough to not do that. :-)

Doing secondary dependencies by module is also a way to compress the full dependency information into something more manageable, more understandable and more actionable, if you will. It answers the question "why does my module depend on Tokenizer" by giving you a module chain, instead of 172 (say) header chains.
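[Editor's note: the level notion described here (modules on level N include nothing from level N+1 or above) can be computed for an acyclic module graph as one more than the highest level among a module's dependencies. A Python sketch with an illustrative edge list, not Boost's real graph.]

```python
# Illustrative dependency sets: each module maps to the modules it uses.
deps = {
    "config": set(),
    "core": {"config"},
    "array": {"config", "core"},
    "tokenizer": {"config", "core", "array"},
}

_levels = {}

def level(module):
    """0 for a module with no deps, else 1 + the max level among its deps."""
    if module not in _levels:
        ds = deps[module]
        _levels[module] = 0 if not ds else 1 + max(level(d) for d in ds)
    return _levels[module]

print({m: level(m) for m in deps})
# -> {'config': 0, 'core': 1, 'array': 2, 'tokenizer': 3}
```

By construction, a module's dependencies always sit on strictly lower levels, which is exactly the "quick sanity check" property described above.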
On 15/06/14 15:07, Andrey Semashev wrote:
On Sunday 15 June 2014 15:44:49 Peter Dimov wrote:
Andrey Semashev wrote:
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.

I understood Vicente to mean a sub-sub-module.

When I read this:
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.

I got the idea that Vicente meant the full-fledged git submodule, with its own repository. If we're talking about just structural changes within the same submodule then the perspective changes.
I asked on this list what the criteria were for associating a file with a module (sub-module), and Peter gave me the criteria. I'm just applying them.
These don't have their own repos. They are a subdirectory in an existing repo, and have the directory structure of a module.
date_time/
    include/
    src/
    test/
    serialization/
        include/
        src/
        test/

boostdep will show this as date_time~serialization (like numeric~conversion).

Yes, this approach looks more interesting.
Sorry if I was not clear enough. I have already extracted Stopwatches from Chrono in this way.
Tracking headers instead of modules has its own disadvantages. The module levels report, for example, would no longer make sense, as parts of the same module would need to be at level 0 and other parts at level 11.

Yes, although I don't really understand what a level means. It surely doesn't correspond to the number of dependencies, although there is some correlation. When deciding whether to use a library in my library I will be looking at its dependencies, not some level index.
See my comment on levels in the other post.

Vicente
On Sunday 15 June 2014 15:43:20 Vicente J. Botet Escriba wrote:
On 15/06/14 15:07, Andrey Semashev wrote:
On Sunday 15 June 2014 15:44:49 Peter Dimov wrote:
Andrey Semashev wrote:
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.
I understood Vicente to mean a sub-sub-module.
When I read this:
Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
I got the idea that Vicente meant the full fledged git submodule, with its own repository. If we're talking about just structural changes within the same submodule then the perspective changes.
I asked on this list what the criteria were for associating a file with a module (sub-module), and Peter gave me the criteria. I'm just applying them.
Ok, it was a misunderstanding on my side then, sorry.
On 15/06/14 14:44, Peter Dimov wrote:
Andrey Semashev wrote:
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.
I understood Vicente to mean a sub-sub-module. These don't have their own repos. They are a subdirectory in an existing repo, and have the directory structure of a module.
date_time/
    include/
    src/
    test/
    serialization/
        include/
        src/
        test/
boostdep will show this as date_time~serialization (like numeric~conversion).
Yes, this is exactly what I'm proposing.
I think there was a proposal not long ago to track dependencies based on headers, pretty much like boostdep does. Then we only need to mark the optional headers in some metadata files and there you go.
Optional headers should be associated with a sub-module, as this reduces the dependencies. This is not hard to do. A different case is a file D.hpp that conditionally includes (via the preprocessor) another file B.hpp. IMO, this makes D depend on B unless we include a context for the dependency relationship. This context could include a platform, a compiler, a standard library, and also the defines that the user puts in their user.hpp file.
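[Editor's note: the point about conditional includes can be illustrated with a naive scanner that records whether each #include sits inside an #if/#ifdef block. This is an assumed simplification (real tools such as boostdep do proper include scanning), and the guard macro below is hypothetical, not a real date_time configuration macro.]

```python
import re

def scan_includes(source):
    """Return [(header, conditional?)] for each #include in `source`."""
    depth = 0
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if re.match(r'#\s*if', stripped):        # #if, #ifdef, #ifndef
            depth += 1
        elif re.match(r'#\s*endif', stripped):
            depth -= 1
        else:
            m = re.match(r'#\s*include\s*[<"]([^>"]+)[>"]', stripped)
            if m:
                out.append((m.group(1), depth > 0))
    return out

src = """\
#include <boost/date_time/posix_time/posix_time_types.hpp>
#ifdef MY_HYPOTHETICAL_SERIALIZATION_GUARD
#include <boost/serialization/split_free.hpp>
#endif
"""
print(scan_includes(src))
# -> [('boost/date_time/posix_time/posix_time_types.hpp', False),
#     ('boost/serialization/split_free.hpp', True)]
```

Without a concrete context (platform, compiler, user defines), a tool has to count the conditional include as a dependency too, which is exactly why the dependency relationship needs a context.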
Tracking headers instead of modules has its own disadvantages. The module levels report, for example, would no longer make sense, as parts of the same module would need to be at level 0 and other parts at level 11. In addition, if you include the right header of module X you'd be fine, and if you include the wrong header, you'll bring in the world. In another addition, if the right header of X is changed to include a wrong header from X, you'll suddenly start depending on the world.
The current report is more stable. You can change includes within the same module without affecting it, and you can include another header from the same module without affecting it. One can argue that it's not "correct", but it's more useful. My module is on level 7, can I use this? It's on level 4, so it'd probably not be much of a problem. That other thing? It's on 9, so perhaps not.
I agree completely with Peter here. Tracking this level information is very useful. I would add that new dependencies on a library at level 7 should be possible, but that would need reflection and IMO should be posted on this list. Once we achieve a strict ordering we should preserve it. This doesn't mean that a library cannot change levels, but the changes must not add cycles.

Best, Vicente
On 06/15/2014 02:30 PM, Andrey Semashev wrote:
On Sunday 15 June 2014 13:49:20 Vicente J. Botet Escriba wrote:
On 15/06/14 13:40, Andrey Semashev wrote:
On Sunday 15 June 2014 13:26:52 Vicente J. Botet Escriba wrote:
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
Le 14/06/14 18:34, Stephen Kelly a écrit : > Vicente J. Botet Escriba wrote: Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why?
Because it creates lots of tiny submodules, which creates maintainability and usability problems.
Why?
What you mean why? Submodules are a constant pain to deal with. They don't allow the complete history of the library, they don't allow synchronized operations on them (e.g. do changes to multiple submodules in a single commit/push), adding or removing them requires privileges. In Log I have at least 6 glue headers, I don't want to deal with 7 different repos if they are extracted.
It is now clear to me that you are thinking about git submodules, not the mechanism supported by tools/boostdep, which simply looks for a sublibs file and, if it exists, looks in subdirectories for */include/boost and, if found, treats them as sub-modules. Boost.Build already supports this and it is in use in:

$ find libs -name 'sublibs'
libs/geometry/sublibs
libs/spirit/sublibs
libs/algorithm/sublibs
libs/functional/sublibs
libs/utility/sublibs
libs/numeric/sublibs
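[Editor's note: the sublibs convention described here can be sketched as follows. The behaviour is inferred from the description above, not taken from boostdep's actual code, and the example builds a throwaway directory tree so it is self-contained.]

```python
import os
import tempfile

def find_submodules(lib_dir):
    """If `lib_dir` has a `sublibs` marker file, list the subdirectories
    that contain their own include/boost tree (the sub-modules)."""
    if not os.path.exists(os.path.join(lib_dir, "sublibs")):
        return []
    subs = []
    for entry in sorted(os.listdir(lib_dir)):
        if os.path.isdir(os.path.join(lib_dir, entry, "include", "boost")):
            subs.append(entry)
    return subs

# Build a fake libs/numeric layout with two sub-libraries and a marker.
with tempfile.TemporaryDirectory() as root:
    lib = os.path.join(root, "libs", "numeric")
    for sub in ("conversion", "interval"):
        os.makedirs(os.path.join(lib, sub, "include", "boost"))
    open(os.path.join(lib, "sublibs"), "w").close()
    print(find_submodules(lib))
# -> ['conversion', 'interval']
```

Without the sublibs marker file the same directory would be treated as one ordinary module, which is the distinction Bjørn draws above.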
I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
You are right to desire not depending on Serialization if you don't use it. But this should not be achieved with submodules, IMHO.
I'm open to discuss any alternative solving the issue.
I think there was a proposal not long ago to track dependencies based on headers, pretty much like boostdep does. Then we only need to mark the optional headers in some metadata files and there you go.
+1. That could work, but some concrete solution is needed. However, if you reconsider how much that differs from the solution above, maybe that is good enough?

-- Bjørn
On 06/15/2014 01:40 PM, Andrey Semashev wrote:
On Sunday 15 June 2014 13:26:52 Vicente J. Botet Escriba wrote:
On 15/06/14 12:48, Andrey Semashev wrote:
On Sunday 15 June 2014 12:13:08 Vicente J. Botet Escriba wrote:
On 14/06/14 18:34, Stephen Kelly wrote:
Vicente J. Botet Escriba wrote:

Yes, and I propose to create a date_time.serialization submodule that breaks the date_time -> serialization dependency.
date_time.serialization -> date_time serialization
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why?
Because it creates lots of tiny submodules, which creates maintainability and usability problems.
Optional files with additional dependencies need to be considered as independent nodes in the dependency graph; this allows us to understand how to remove undesired dependencies and potential cycles. If that is agreed, then the rest is a question of how to achieve all that with tools.

For the proposed date_time.serialization -> date_time serialization, building, deploying and using date_time no longer requires serialization, unless you actually use date_time.serialization in your source code. The fact that the source comes along in the date_time git repository is not a concern as long as it is reasonably inexpensive.
I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
You are right to desire not depending on Serialization if you don't use it. But this should not be achieved with submodules, IMHO.
Having these optional files in separate sub-module directories within a library module's git repository is just part of one potential solution. There may be other ways; do you have anything else in mind that scales better?

Using the term module, or sub-module, for nodes in the dependency graph makes sense, as what I think we attempt is modularization. However, the term sub-module is somewhat misleading, as they are conceptually just as much independent modules as the top-level git repository (library) module. But I guess a sort of maintenance ownership is reflected by a module being a sub-module. Sub-modules such as optional headers could also be kept, as-is, as files embedded in the "parent" module's file structure. However, that may make it harder for tools to deal with the dependencies.

-- Bjørn
Vicente J. Botet Escriba wrote:

On 15/06/14 12:48, Andrey Semashev wrote:
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why? I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
What do others think?
I think that Vicente is right in this case. Moving serialization support to a submodule of DateTime will make the dependency report nicer _and_ it will actually be correct from the perspective of an automatic downloader. If you use DateTime, you'll get the DateTime repo, along with the serialization support, but you will not get the Serialization repo (and its dependencies) if you don't use Serialization. And this is exactly as it should be, unless I'm missing something subtle. It seems to me that this is a legitimate use of sub-sub-modules.
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov
Sent: 15 June 2014 12:46
To: boost@lists.boost.org
Subject: [boost] date_time -> serialization (Was: spirit -> serialization)
Vicente J. Botet Escriba wrote: On 15/06/14 12:48, Andrey Semashev wrote:
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why? I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
What do others think?
I think that Vicente is right in this case. Moving serialization support to a submodule of DateTime will make the dependency report nicer _and_ it will actually be correct from the perspective of an automatic downloader. If you use DateTime, you'll get the DateTime repo, along with the serialization support, but you will not get the Serialization repo (and its dependencies) if you don't use Serialization. And this is exactly as it should be, unless I'm missing something subtle.
It seems to me that this is a legitimate use of sub-sub-modules.
I've followed this thread with interest and general support, but there is one factor that doesn't seem to be factored in. If someone is using Serialisation, isn't there a very high probability that they are also using DateTime? So having these in the same package doesn't really matter (except for the artificial level number)? Looking at the shrink-wrap users, I have a suspicion that this applies quite widely - many people will manage to pull in a big chunk of Boost, and rearranging the modules isn't going to change this much. Sub-sub-modules sound Very Evil to me. KISS applies? Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 01539 561830
On Sun, Jun 15, 2014 at 7:16 PM, Paul A. Bristow
If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
I'd say there isn't. I'd say even if a user uses both Serialization and DateTime this doesn't mean he uses Serialization support in DateTime.
Sub-sub-modules sound Very Evil to me.
Why?
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Andrey Semashev Sent: 15 June 2014 16:42 To: boost@lists.boost.org Subject: Re: [boost] date_time -> serialization (Was: spirtit -> serialization)
On Sun, Jun 15, 2014 at 7:16 PM, Paul A. Bristow
wrote: If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
I'd say there isn't. I'd say even if a user uses both Serialization and DateTime this doesn't mean he uses Serialization support in DateTime.
Ah - I had misunderstood: DateTime does not imply Serialization support in DateTime. However, my general point is that many users are already pulling in a large part of Boost, which may mean that your efforts may not be as useful in practice as in theory?
Sub-sub-modules sound Very Evil to me.
Why?
More complicated file layout. Will there be a sub-module GIT repo? Already getting sub-modules updated is troublesome. Bound to have side-effects? I've no idea how to avoid them, but they "Sound Evil" - if not "Smell Evil" ;-) Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 01539 561830
On Sun, Jun 15, 2014 at 9:21 PM, Paul A. Bristow
On Sun, Jun 15, 2014 at 7:16 PM, Paul A. Bristow
wrote: If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
I'd say there isn't. I'd say even if a user uses both Serialization and DateTime this doesn't mean he uses Serialization support in DateTime.
Ah - I had misunderstood: DateTime does not imply Serialization support in DateTime.
I'm not sure I understand you. DateTime and its support for Serialization live in the same git repository, so when you checkout DateTime you get everything. The problem is that currently there is no way to separate core DateTime functionality from Serialization support. The proposal was to move these support headers into another subdirectory inside the DateTime git submodule. By the build system, this would be equivalent to a new submodule. (BTW, we should introduce a new term for this; otherwise we will get confused all the time). As a result you will still checkout everything of DateTime when you checkout its git repo, but the dependency graph will have two nodes for it - DateTime and DateTime.Serialization. If you don't use DateTime.Serialization part, you will not have to checkout Serialization as well to get the usable Boost subset.
However, my general point is that many users are already pulling in a large part of Boost, which may mean that your efforts may not be as useful in practice as in theory?
As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it.
Sub-sub-modules sound Very Evil to me.
Why?
More complicated file layout.
Agree, this is inconvenient. But as long as there is no better solution, this is an acceptable evil.
Will there be a sub-module GIT repo? Already getting sub-modules updated is troublesome.
No, sub-sub-modules reside in the same git repo. Unless the maintainers want otherwise, of course, and I don't remember anyone requesting that.
I just caught up with this discussion, and based on what I read I think the future automated dependency handler should indeed operate on a per-header basis. This would mean that the configuration file of a module would list all headers in the module, and for each header in the module list all headers that it directly depends on. Of course still with support for conditional (e.g. tool-dependent) dependency annotations. *Optional* dependencies however could then be detected automatically, as I'll explain in my inline reply below. The handler would still download or not download modules entirely, but it would base its decisions on detailed per-header information. Andrey Semashev wrote:
Paul A. Bristow wrote:
Andrey Semashev wrote:
Paul A. Bristow wrote:
If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
I'd say there isn't. I'd say even if a user uses both Serialization and DateTime this doesn't mean he uses Serialization support in DateTime.
Ah - I had misunderstood: DateTime does not imply Serialization support in DateTime.
I'm not sure I understand you. DateTime and its support for Serialization live in the same git repository, so when you checkout DateTime you get everything. The problem is that currently there is no way to separate core DateTime functionality from Serialization support.
That there is currently no way does not mean that a tool couldn't separate it. See below.
The proposal was to move these support headers into another subdirectory inside the DateTime git submodule. By the build system, this would be equivalent to a new submodule. (BTW, we should introduce a new term for this;
Perhaps "satellite module"? I agree with some of the other thread participants that we should not prefer this kind of construction, though.
otherwise we will get confused all the time). As a result you will still checkout everything of DateTime when you checkout its git repo, but the dependency graph will have two nodes for it - DateTime and DateTime.Serialization. If you don't use DateTime.Serialization part, you will not have to checkout Serialization as well to get the usable Boost subset.
There is a more elegant way to get the same result, as I'll detail below.
However, my general point is that many users are already pulling in a large part of Boost, which may mean that your efforts may not be as useful in practice as in theory?
As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it.
+1
Sub-sub-modules sound Very Evil to me.
Why?
More complicated file layout.
Agree, this is inconvenient. But as long as there is no better solution, this is an acceptable evil.
So I think there is a better solution. Given a module with a typical header layout

    boost/
        mymodule.hpp
        mymodule/
            core_header_1.hpp
            core_header_2.hpp
            support_other_module_1.hpp
            support_other_module_2.hpp

with e.g. the following per-header dependencies

    mymodule.hpp
        mymodule/core_header_1.hpp
        mymodule/core_header_2.hpp
        hismodule/core_header_8.hpp
        hermodule/core_header_4.hpp
    mymodule/core_header_1.hpp
        hismodule/core_header_7.hpp
        hismodule/core_header_8.hpp
        hermodule/core_header_5.hpp
    mymodule/core_header_2.hpp
        mymodule/core_header_1.hpp
        hermodule/core_header_4.hpp
        hermodule/core_header_5.hpp
    mymodule/support_other_module_1.hpp
        mymodule/core_header_1.hpp
        mymodule/core_header_2.hpp
        other_module_1.hpp
    mymodule/support_other_module_2.hpp
        mymodule/core_header_1.hpp
        other_module_2/detail/hidden_gem.hpp

the automated handler as described above could safely infer that hismodule and hermodule are core dependencies while other_module_1 and other_module_2 are optional. In other words, without any changes to directory layout, per-header dependency information should be sufficient for the handler to distinguish core dependencies from optional dependencies. It would also solve the issue of transitivity: if yourmodule depends on mymodule, the question whether it also depends on other_module_1 is simply answered by tracking whether a (core) header of yourmodule has mymodule/support_other_module_1.hpp as a (direct or indirect) dependency. The only (soft) requirement is that modules have a "catch all" core header like mymodule.hpp. Even if a library lacks such a header, it can still be emulated with an annotation in the configuration file. HTH, -Julian
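Read in code, that inference could look like the following sketch (the dependency map is the hypothetical mymodule example from this message; `module_of`, `reachable` and `classify` are invented names, not part of any existing tool):

```python
def module_of(header):
    """Map a header path to its module name (first path component)."""
    return header.split('/')[0].replace('.hpp', '')

def reachable(deps, start):
    """All headers transitively included from `start`."""
    seen, stack = set(), [start]
    while stack:
        h = stack.pop()
        for d in deps.get(h, []):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def classify(deps, catch_all, own_module):
    """Modules reachable from the catch-all header are core; the rest
    (referenced only from support headers) are optional."""
    core = {module_of(h) for h in reachable(deps, catch_all)} - {own_module}
    everything = {module_of(h) for hs in deps.values() for h in hs}
    return core, everything - core - {own_module}

deps = {
    'mymodule.hpp': ['mymodule/core_header_1.hpp', 'mymodule/core_header_2.hpp',
                     'hismodule/core_header_8.hpp', 'hermodule/core_header_4.hpp'],
    'mymodule/core_header_1.hpp': ['hismodule/core_header_7.hpp',
                                   'hismodule/core_header_8.hpp',
                                   'hermodule/core_header_5.hpp'],
    'mymodule/core_header_2.hpp': ['mymodule/core_header_1.hpp',
                                   'hermodule/core_header_4.hpp',
                                   'hermodule/core_header_5.hpp'],
    'mymodule/support_other_module_1.hpp': ['mymodule/core_header_1.hpp',
                                            'mymodule/core_header_2.hpp',
                                            'other_module_1.hpp'],
    'mymodule/support_other_module_2.hpp': ['mymodule/core_header_1.hpp',
                                            'other_module_2/detail/hidden_gem.hpp'],
}

core, optional = classify(deps, 'mymodule.hpp', 'mymodule')
```

On this example the sketch classifies hismodule and hermodule as core and the two other_module dependencies as optional, as described above.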
On Sunday 15 June 2014 23:45:37 Julian Gonggrijp wrote:
I just caught up with this discussion, and based on what I read I think the future automated dependency handler should indeed operate on a per-header basis. This would mean that the configuration file of a module would list all headers in the module, and for each header in the module list all headers that it directly depends on. Of course still with support for conditional (e.g. tool-dependent) dependency annotations. *Optional* dependencies however could then be detected automatically, as I'll explain in my inline reply below.
Umm, I don't think that manually listing all headers and their dependencies in a config file is a viable idea. I have 235 headers in Log and who knows how many dependent headers. Even if that information is filled in once, I can't realistically guarantee that the config file stays current as I work on the library. The list of headers and their dependencies should be inferred by the tool from the headers themselves. Maintainers should be able to provide only the missing information - in particular, which headers are considered optional. Although, if the tool works on a per-header basis, the whole idea of optional headers becomes irrelevant - you don't include a header => you don't need its dependencies.
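The include-scanning Andrey suggests could be sketched along these lines (an assumption for illustration, not an existing Boost tool; the regex and sample header text are invented):

```python
import re

# Match "#include <boost/...>" or '#include "boost/..."' and capture the
# path relative to boost/. Standard-library and non-boost includes are
# deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]boost/([^>"]+)[>"]', re.MULTILINE)

def scan_header(text):
    """Return the boost headers directly included by a header's text."""
    return INCLUDE_RE.findall(text)

sample = '''
#include <boost/config.hpp>
#include "boost/date_time/posix_time/posix_time.hpp"
#include <vector>
'''
```

Running such a scanner over a module's tree would yield exactly the per-header dependency map discussed above, with no manual bookkeeping.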
Andrey Semashev wrote:
On Sunday 15 June 2014 23:45:37 Julian Gonggrijp wrote:
I just caught up with this discussion, and based on what I read I think the future automated dependency handler should indeed operate on a per-header basis. This would mean that the configuration file of a module would list all headers in the module, and for each header in the module list all headers that it directly depends on. Of course still with support for conditional (e.g. tool-dependent) dependency annotations. *Optional* dependencies however could then be detected automatically, as I'll explain in my inline reply below.
Umm, I don't think that manually listing all headers and their dependencies in a config file is a viable idea.
That is not the intention. The tool generates the file for you (when you ask for it). The file is still there so you can annotate it with tool conditions. Having a single file that lists all header dependencies also speeds up automated downloading, because the tool doesn't have to traverse your 235 headers but can find all information in one place. The tool will also be able to update the file while respecting your annotations. It will also be invoked as a part of the regression tests to check that the file is complete. See also http://lists.boost.org/Archives/boost/2014/06/214429.php .
I have 235 headers in Log and who knows how many dependent headers. Even if that information is filled once, I can't realistically guarantee that the config file stays actual as I work on the library.
The list of headers and their dependencies should be inferred by the tool from the headers themselves.
Yes.
Maintainers should be able to provide only the missing information - in particular, which headers are considered optional. Although, if the tool works on the header basis, the whole idea of optional headers becomes irrelevant - you don't include a header => you don't need its dependencies.
Exactly. /Conditional/ dependencies are still relevant, however. -Julian
On 16/06/14 01:51, Julian Gonggrijp wrote:
Andrey Semashev wrote:
On Sunday 15 June 2014 23:45:37 Julian Gonggrijp wrote:
I just caught up with this discussion, and based on what I read I think the future automated dependency handler should indeed operate on a per-header basis. This would mean that the configuration file of a module would list all headers in the module, and for each header in the module list all headers that it directly depends on. Of course still with support for conditional (e.g. tool-dependent) dependency annotations. *Optional* dependencies however could then be detected automatically, as I'll explain in my inline reply below. Umm, I don't think that manually listing all headers and their dependencies in a config file is a viable idea. That is not the intention. The tool generates the file for you (when you ask for it). The file is still there so you can annotate it with tool conditions. Having a single file that lists all header dependencies also speeds up automated downloading, because the tool doesn't have to traverse your 235 headers but can find all information in one place.
The tool will also be able to update the file while respecting your annotations. It will also be invoked as a part of the regression tests to check that the file is complete.
See also http://lists.boost.org/Archives/boost/2014/06/214429.php .
Hi Julian,
I don't think that the configuration file should contain the file dependencies; only the module dependencies are really needed, and they are easy to maintain. The boostdep tool gives us the reasons why we depend on a module, so if we want to refine our module dependency files we have the needed information. What is needed is the association of a file with a sub-module (sub-sub-module). Currently this is defined by the directory structure. Could you tell us more about how you plan to manage the conditional dependencies? Which kinds of conditions could be stated: platform, compiler, user defined? Could you show an example of the file you claim the authors of a library need to maintain? Best, Vicente
I have 235 headers in Log and who knows how many dependent headers. Even if that information is filled once, I can't realistically guarantee that the config file stays actual as I work on the library.
The list of headers and their dependencies should be inferred by the tool from the headers themselves. Yes.
Maintainers should be able to provide only the missing information - in particular, which headers are considered optional. Although, if the tool works on the header basis, the whole idea of optional headers becomes irrelevant - you don't include a header => you don't need its dependencies. Exactly. /Conditional/ dependencies are still relevant, however.
-Julian
Vicente J. Botet Escriba wrote:
I don't think that the configuration file should contain the file dependencies, only the module dependencies are really needed and are easy to maintain.
Consider this problem pointed out by Peter:

    module X
        X.hpp
    module Y
        Y1.hpp
        Y2.hpp (optional), includes X.hpp
    module Z
        Z.hpp, includes Y2.hpp

If the configuration file contains only module-level dependency information, the handler will not be able to decide whether Z depends on X based on the file alone. The question is how to solve that. I can think of a few options:

(1) Also annotate reverse dependencies in the configuration file, i.e. "this dependency is optional unless you want to install some_other_module". This leads to duplicate information, so it doesn't seem like a good idea to me. It would also require the handler to have access to all of Boost in order to detect such reverse dependencies.

(2) Have the tool process all *pp files in a module live, every time it handles the module. This would slow down matters dramatically, apart from being an ugly solution in itself.

(3) Isolate all optional dependencies into separate modules. I am convinced that this would severely complicate the directory layout, and take apart headers that semantically belong together. In fact I'm not 100% convinced that it will solve the optional dependency problem, unless each individual header becomes a module of its own.

(4) Include header-level dependency information at full detail in the configuration file. It does not *need* to be maintained, because the tool can create, update and verify the file automatically. It also doesn't rule out the possibility to keep a module-level summary at the top of the file for casual readers.
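With option (4), Peter's question has a mechanical answer. A minimal sketch, using the hypothetical X/Y/Z headers from above (`depends_on` is an invented helper, not part of any proposed tool):

```python
# Header-level dependency data for Peter's example: Y2.hpp is the
# optional part of Y that pulls in X.
deps = {
    'Z.hpp': ['Y2.hpp'],
    'Y2.hpp': ['X.hpp'],
    'Y1.hpp': [],
    'X.hpp': [],
}

def depends_on(deps, start, target):
    """True if `start` directly or indirectly includes `target`."""
    seen, stack = set(), [start]
    while stack:
        header = stack.pop()
        if header == target:
            return True
        for d in deps.get(header, []):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return False
```

Here Z does depend on X (via Y2.hpp), while a client using only Y1.hpp does not; module-level data alone could not make that distinction.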
The boostdep tool give us the reasons why we depend on a module so if we want to refine our module dependency files we have the needed information.
I can think of several possible interpretations for this sentence. Could you please restate in other words what you intended to say?
What is needed is the association of a file and a sub-module (sub-sub-module). Currently this is defined by the directory structure.
This is option (3) from above. Like Andrey, I believe that basing decisions on header-level information is the better way to go.
Could you tell us more about how do you plan to manage the conditional dependencies? Which kind of conditions could be stated? platform, compiler, user defined?
All conditions that Boost.Build knows about. If I'm right that includes platform and compiler (toolset) but not anything user-defined.
Could you show an example
Sure. Suppose that the tool generates the following file for mymodule:

    # module-level information may appear at the top, not shown here
    mymodule.hpp
        mymodule/core.hpp
        mymodule/utility.hpp
    mymodule/core.hpp
        config.hpp
        mymodule/detail/god_object.hpp
    mymodule/utility.hpp
        config.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        compressed_pair.hpp
    mymodule/filesystem.hpp
        mymodule/core.hpp
        filesystem.hpp
    mymodule/haiku.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        container/vector.hpp
    # mymodule/detail/ headers are standalone

Based on just this information, the handler can tell that config and compressed_pair are fixed dependencies while filesystem and container are optional. Now we can annotate that mymodule/utility.hpp and container/vector.hpp are conditional on the compiler:

    mymodule.hpp
        mymodule/core.hpp
        <toolset>msvc-7,<toolset>msvc-8:mymodule/utility.hpp
    mymodule/core.hpp
        config.hpp
        mymodule/detail/god_object.hpp
    mymodule/utility.hpp
        config.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        compressed_pair.hpp
    mymodule/filesystem.hpp
        mymodule/core.hpp
        filesystem.hpp
    mymodule/haiku.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        <toolset>gcc-2:container/vector.hpp

While we're at it, we can make life easier for Haiku users and conditionally include mymodule/haiku.hpp into the catch-all header:

    mymodule.hpp
        mymodule/core.hpp
        <toolset>msvc-7,<toolset>msvc-8:mymodule/utility.hpp
        <os>Haiku:mymodule/haiku.hpp
    mymodule/core.hpp
        config.hpp
        mymodule/detail/god_object.hpp
    mymodule/utility.hpp
        config.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        compressed_pair.hpp
    mymodule/filesystem.hpp
        mymodule/core.hpp
        filesystem.hpp
    mymodule/haiku.hpp
        mymodule/core.hpp
        mymodule/detail/hacks.hpp
        <toolset>gcc-2:container/vector.hpp

This is the module-level information that the handler may extract from the last version:

    fixed:
        config
        <toolset>msvc-7,<toolset>msvc-8:compressed_pair
        <os>Haiku,<toolset>gcc-2:container
    optional:
        filesystem
of the file you claim the authors of a library need to maintain?
I want to emphasize that library authors would not *need* to maintain the file. Making annotations is beneficial (because it slims down dependencies under certain conditions) but not necessary. The file will be complete and valid without human intervention. -Julian
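The annotated entry syntax proposed above could be parsed along these lines (a sketch; `parse_entry` is an invented name and the format is the one proposed in this thread, not an existing Boost format):

```python
def parse_entry(entry):
    """Split a dependency entry into (conditions, header).

    Unconditional entries like 'config.hpp' yield an empty condition
    list; annotated entries carry comma-separated Boost.Build-style
    properties before the colon.
    """
    if entry.startswith('<') and ':' in entry:
        conditions, header = entry.rsplit(':', 1)
        return conditions.split(','), header
    return [], entry
```

With per-entry conditions in hand, the handler can fold them up per module, as in the fixed/optional summary shown above.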
On Monday 16 June 2014 01:51:05 Julian Gonggrijp wrote:
Andrey Semashev wrote:
On Sunday 15 June 2014 23:45:37 Julian Gonggrijp wrote:
I just caught up with this discussion, and based on what I read I think the future automated dependency handler should indeed operate on a per-header basis. This would mean that the configuration file of a module would list all headers in the module, and for each header in the module list all headers that it directly depends on. Of course still with support for conditional (e.g. tool-dependent) dependency annotations. *Optional* dependencies however could then be detected automatically, as I'll explain in my inline reply below.
Umm, I don't think that manually listing all headers and their dependencies in a config file is a viable idea.
That is not the intention. The tool generates the file for you (when you ask for it). The file is still there so you can annotate it with tool conditions. Having a single file that lists all header dependencies also speeds up automated downloading, because the tool doesn't have to traverse your 235 headers but can find all information in one place.
The tool will also be able to update the file while respecting your annotations. It will also be invoked as a part of the regression tests to check that the file is complete.
See also http://lists.boost.org/Archives/boost/2014/06/214429.php .
I have 235 headers in Log and who knows how many dependent headers. Even if that information is filled once, I can't realistically guarantee that the config file stays actual as I work on the library.
The list of headers and their dependencies should be inferred by the tool from the headers themselves.
Yes.
If the tool is able to extract the information from the headers then why do we need the config files? We should minimize the amount of information to be managed by developers - to just the "optional" annotations. I recognize that the full list of headers and dependencies might be useful for the deployment system to avoid downloading and re-parsing headers. But this doesn't mean this list has to be stored in git and managed by developers. You can employ the approach taken by most package managers (e.g. in Linux and OS X ports, I think). There are downloadable packages (which would correspond to sub-modules or links to their git repos) and auto-generated metadata to help resolving dependencies _prior_ to downloading anything. The metadata should be automatically updated when the packages are uploaded (i.e. official snapshots are uploaded or a referred git tag is added). In fact, it seems to me that there is much more infrastructural work to it than just the dependency tracking tool. It would be nice to see a proposal describing how the deployment process would look like, involving the dependency tracking tool, git, Boost users and maintainers.
Andrey Semashev wrote:
If the tool is able to extract the information from the headers then why do we need the config files? We should minimize the amount of information to be managed by developers - to just the "optional" annotations.
I agree with this sentiment, but we need a way to cache the full dependencies that doesn't require us to change the entire structure of Boost.
I recognize that the full list of headers and dependencies might be useful for the deployment system to avoid downloading and re-parsing headers.
Yes.
But this doesn't mean this list has to be stored in git and managed by developers. You can employ the approach taken by most package managers (e.g. in Linux and OS X ports, I think). There are downloadable packages (which would correspond to sub-modules or links to their git repos) and auto-generated metadata to help resolving dependencies _prior_ to downloading anything.
Why emphasise "prior"? The user requests module X so I download X anyway. Then I can look up what X depends on, whether those data are summarized within X or somewhere outside it. Where do you store the metadata if not within the module? I like the idea of taking automatically generated data out of the versioning system, but it should be minimally invasive.
The metadata should be automatically updated when the packages are uploaded (i.e. official snapshots are uploaded or a referred git tag is added).
Do you envision this in the current situation where "packages" are loaded as sub-modules of the boost super-project, or in a new situation where the boost super-project is taken away and "packages" are standalone (but with dependencies)?
In fact, it seems to me that there is much more infrastructural work to it than just the dependency tracking tool. It would be nice to see a proposal describing how the deployment process would look like, involving the dependency tracking tool, git, Boost users and maintainers.
In a nutshell, both for users and maintainers:
1. Clone the superproject non-recursively.
2. Request specific modules to be installed by Boost.Build; dependencies are tracked by the handler tool which uses git to clone more modules.
In addition, for maintainers:
3. Create/update the configuration file by running the handler tool before pushing to the public repo (this could be a git hook).
4. (Optionally) annotate the configuration file with conditional dependencies.
(3 and 4 may be swapped if generated data are taken out of git.)
In addition, for testers:
5. Additional test dependencies are also automatically tracked and cloned by the handler.
6. The handler verifies the configuration file as part of the test suite.
The scope of what I propose does not go beyond this. -Julian
On Monday 16 June 2014 11:42:58 Julian Gonggrijp wrote:
Andrey Semashev wrote:
If the tool is able to extract the information from the headers then why do we need the config files? We should minimize the amount of information to be managed by developers - to just the "optional" annotations.
I agree with this sentiment, but we need a way to cache the full dependencies that doesn't require us to change the entire structure of Boost.
Ok, I just don't think that library repos are a good place for this cache.
But this doesn't mean this list has to be stored in git and managed by developers. You can employ the approach taken by most package managers (e.g. in Linux and OS X ports, I think). There are downloadable packages (which would correspond to sub-modules or links to their git repos) and auto-generated metadata to help resolving dependencies _prior_ to downloading anything.
Why emphasise "prior"? The user requests module X so I download X anyway. Then I can look up what X depends on, whether those data are summarized within X or somewhere outside it.
This is a common courtesy of package managers and installers. The tool should state what it's going to download and install (and often this alerts the user enough so that he cancels). This is a useful protection against undesirable situations like "let me install that one library... oh, now it installs half of Boost for no apparent reason."
Where do you store the metadata if not within the module? I like the idea of taking automatically generated data out of the versioning system, but it should be minimally invasive.
I'm specifically not restricting this part, other than that metadata has to be available without downloading the submodule. As a simple example, an ftp server should be enough to set up a Boost distribution repository. The metadata is stored in a (compressed) text file or several files (better in one file though to speedup its downloading). The tool is able to download this file and build the dependency graph upon user's request before installing anything. If you feel git or another VCS suits better for this metadata, you can use it instead, but I don't see much value in version controlling for this data, and VCSs seem to add quite some overhead. Another reason I want to separate the metadata from git repos - and I'm fantasizing now - is that I can see this tool being used without git at all - to download source packages and install Boost on the user's machine. For example, if I want to install a subset of Boost 1.57 on my machine, I'd like to be able to do that easily, without dealing with git submodules and without downloading the whole git history. The tool will just resolve dependencies, download and extract a set of archives for me.
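As a concrete (entirely hypothetical) sketch of that metadata file: a one-module-per-line text format, gzip-compressed as suggested, which the tool downloads and parses before fetching anything. Neither the format nor the module names are an existing Boost facility:

```python
import gzip

def parse_metadata(text):
    """Parse 'module: dep dep ...' lines into a dependency map."""
    meta = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        module, _, deps = line.partition(':')
        meta[module.strip()] = deps.split()
    return meta

raw = "date_time: config\nlog: config date_time\n"
blob = gzip.compress(raw.encode())            # what the server would store
meta = parse_metadata(gzip.decompress(blob).decode())
```

From such a map the tool can build the full dependency graph and report what it is about to download, before installing anything.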
The metadata should be automatically updated when the packages are uploaded (i.e. official snapshots are uploaded or a referred git tag is added).
Do you envision this in the current situation where "packages" are loaded as sub-modules of the boost super-project, or in a new situation where the boost super-project is taken away and "packages" are standalone (but with dependencies)?
What I meant is that there should be some kind of "official" Boost repository which will be used by the packaging tool (let's call it boost-pkg for brevity). That repository will serve the metadata for boost-pkg. The metadata will be updated when a certain new release is published into it, whether that is a new library release through a git tag or the whole Boost release. I'm not sure if updating the metadata upon a tag creation can be automated, but at least for major Boost releases this should be doable. Continuing my fantasy, the repository may contain standalone packages for library and Boost releases, which may be useful for Boost users (in the above example, I would be downloading archives of Boost 1.57 from this repository). It may also contain references to git repositories - tags or branches. boost-pkg would offer a unified interface so that it is possible to use either of them - e.g. download official Boost 1.57 and a newer release of that one library X, which fixes a critical bug for me. Potentially, boost-pkg could replace the superproject and be used to check out the whole Boost from git on the develop or master branch, which would be useful for developers and testers. For testers this would help to check out the tested library from develop and everything else from master. Although it is also possible to do with plain git, when you have checked out everything.
In a nutshell, both for users and maintainers:
1. Clone the superproject non-recursively.
2. Request specific modules to be installed by Boost.Build; dependencies are tracked by the handler tool which uses git to clone more modules.
That would require at least Boost.Build to be checked out as well.
In addition, for maintainers: 3. Create/update the configuration file by running the handler tool before pushing to the public repo (this could be a git hook).
This would be a blocker, for me at least. I'm sure I will forget doing that and will be very much annoyed. A git hook that scans the headers on every push doesn't sound very good.
Andrey Semashev wrote:
On Monday 16 June 2014 11:42:58 Julian Gonggrijp wrote:
Where do you store the metadata if not within the module? I like the idea of taking automatically generated data out of the versioning system, but it should be minimally invasive.
I'm specifically not restricting this part, other than that metadata has to be available without downloading the submodule. As a simple example, an ftp server should be enough to set up a Boost distribution repository. The metadata is stored in a (compressed) text file or several files (better in one file though to speedup its downloading). The tool is able to download this file and build the dependency graph upon user's request before installing anything. If you feel git or another VCS suits better for this metadata, you can use it instead, but I don't see much value in version controlling for this data, and VCSs seem to add quite some overhead.
There is a very obvious value in version control: dependencies may change from one Boost commit to the next. The dependency handler should work not only for end users, but also for maintainers and testers who check out any point in Git history. Given that dependency information is directly tied to a specific commit, I think storing the information within the commit isn't such a strange idea. Maybe the full information shouldn't be exposed to the user, but I don't think taking the file completely out of the repository is the right approach. Perhaps it could be stored in a git note [1]. It is friendly to tell end users in advance what dependencies will be installed, but that can be solved by other means. A very simple solution would be to list the dependencies on the Boost website. A slightly more advanced solution would be to have the handler download only the dependency file associated with the release tag using git archive [2], before cloning the entire module (that might not work with git notes, though). For releases, the dependency information could also simply be aggregated in the superproject archive. The advantage of just storing a plain file in the module directory is that it certainly works, even if you download an archive without git history, and without a need to set up a new FTP server or other web service. I would prefer to start there and investigate prettier solutions later.
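The git archive idea can be sketched as follows - a minimal Python wrapper that mostly just builds the command line. Note that `--remote` requires server-side support (GitHub, for one, does not allow it over HTTPS), and the repository URL and file name used here are hypothetical, not an agreed convention:

```python
import subprocess

# Hypothetical: fetch a single dependency file for a tag without cloning
# the module.  Only works against servers that allow `git archive --remote`.
def archive_command(remote, tag, path="dependencies.txt"):
    return ["git", "archive", "--remote=" + remote, tag, path]

def fetch_dependency_file(remote, tag, path="dependencies.txt"):
    # The archive arrives as a tar stream; unpack just the one file.
    tar = subprocess.run(archive_command(remote, tag, path),
                         check=True, capture_output=True).stdout
    subprocess.run(["tar", "-x", path], input=tar, check=True)
```

Whether this is preferable to bundling the file in the superproject archive depends mostly on which servers the modules end up hosted on.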
Another reason I want to separate the metadata from git repos - and I'm fantasizing now - is that I can see this tool being used without git at all - to download source packages and install Boost on the user's machine. For example, if I want to install a subset of Boost 1.57 on my machine, I'd like to be able to do that easily, without dealing with git submodules and without downloading the whole git history. The tool will just resolve dependencies, download and extract a set of archives for me.
If I'm right a module can also be downloaded without all of the history.
The metadata should be automatically updated when the packages are uploaded (i.e. official snapshots are uploaded or a referred git tag is added).
Do you envision this in the current situation where "packages" are loaded as sub-modules of the boost super-project, or in a new situation where the boost super-project is taken away and "packages" are standalone (but with dependencies)?
What I meant is that there should be some kind of "official" Boost repository which will be used by the packaging tool (let's call it boost-pkg for brevity). That repository will serve the metadata for boost-pkg. The metadata will be updated when a certain new release is published into it, whether that is a new library release through a git tag or the whole Boost release. I'm not sure if updating the metadata upon a tag creation can be automated, but at least for major Boost releases this should be doable.
This seems to confirm that you are not interested in dependency handling for maintainers and testers (yet).
Continuing my fantasy, the repository may contain standalone packages for library and Boost releases, which may be useful for Boost users (in the above example, I would be downloading archives of Boost 1.57 from this repository). It may also contain references to git repositories - tags or branches. boost-pkg would offer a unified interface so that it is possible to use either of them - e.g. download official Boost 1.57 and a newer release of that one library X, which fixes a critical bug for me.
This gets more and more ambitious. Don't get me wrong, I like what you describe, but I think it will be easier to get there if we take it one step at a time.
Potentially, boost-pkg could replace the superproject and be used to checkout the whole Boost from git on develop or master branch, which would be useful for developers and testers.
I think you have now almost reinvented Ryppl, but I might be mistaken.
For testers this would help to checkout the tested library from develop and everything else from master. Although it is also possible to do with plain git, when you have checked out everything.
In a nutshell, both for users and maintainers: 1. Clone the superproject non-recursively. 2. Request specific modules to be installed by Boost.Build; dependencies are tracked by the handler tool which uses git to clone more modules.
That would require at least Boost.Build to be checked out as well.
Ah, right.
In addition, for maintainers: 3. Create/update the configuration file by running the handler tool before pushing to the public repo (this could be a git hook).
This would be a blocker, for me at least. I'm sure I will forget doing that and will be very much annoyed. A git hook that scans the headers on every push doesn't sound very good.
But you do agree that caching is a good idea, right? You seem to believe that caches should only be created for releases. I think a dependency handler would have at least as much value to maintainers and testers, if it works for any commit. -Julian ____________ [1] https://www.kernel.org/pub/software/scm/git/docs/git-notes.html [2] https://www.kernel.org/pub/software/scm/git/docs/git-archive.html
On Mon, Jun 16, 2014 at 11:02 PM, Julian Gonggrijp
There is a very obvious value in version control: dependencies may change from one Boost commit to the next. The dependency handler should work not only for end users, but also for maintainers and testers who check out any point in Git history.
Given that the dependency tracking tool is able to reconstruct that information from headers, this doesn't seem like a loss. Remember that cached metadata is only supposed to speed up things for the most frequent use cases, which are checking out the most recent version (for testers and maintainers) or installing a release (for users). Everything that cannot be reconstructed (the "optional" annotations) is still in git. Thinking about it more, the requirement to build the dependency graph before the download may require the cache to be available for any given git commit. This is probably a good reason to make the cache version controlled.
Perhaps it could be stored in a git note [1].
I'm not very familiar with git and don't know anything about git notes. Maybe they fit for this purpose. But my request would be that these notes are not required to be added by maintainers. Do git notes affect history? If yes, it would be undesirable if libraries' history is spammed with automated commits adding notes with dependency info.
It is friendly to tell end users in advance what dependencies will be installed, but that can be solved by other means. A very simple solution would be to list the dependencies on the Boost website.
That doesn't really work, for obvious reasons: (a) the advertised dependencies will get out of sync with reality sooner or later and (b) you can't realistically expect users to consult the website when they are about to install a Boost library. The tool should provide that information. It is possible that the tool is not able to do that, if the cache is not available for the given commit to be checked out. The tool should notify the user about this problem but still allow downloading the necessary components "blindly", by parsing headers for dependencies.
A slightly more advanced solution would be to have the handler download only the dependency file associated with the release tag using git archive [2], before cloning the entire module (that might not work with git notes, though). For releases, the dependency information could also simply be aggregated in the superproject archive.
Ok. As I said, I specifically did not require any particular means for delivering metadata into the tool. If this is possible with git, provided that usability is satisfactory, I'm all for it. Another alternative is to create a new git submodule to store the cache in.
The advantage of just storing a plain file in the module directory is that it certainly works, even if you download an archive without git history, and without a need to set up a new FTP server or other web service. I would prefer to start there and investigate prettier solutions later.
We're discussing a mechanism that will require mass changes to the libraries and possibly the workflow. I'd say the system should be designed without fundamental flaws from the start. It may not implement everything from the beginning, that I agree.
What I meant is that there should be some kind of "official" Boost repository which will be used by the packaging tool (let's call it boost-pkg for brevity). That repository will serve the metadata for boost-pkg. The metadata will be updated when a certain new release is published into it, whether that is a new library release through a git tag or the whole Boost release. I'm not sure if updating the metadata upon a tag creation can be automated, but at least for major Boost releases this should be doable.
This seems to confirm that you are not interested in dependency handling for maintainers and testers (yet).
What makes you think so?
Continuing my fantasy,...
This gets more and more ambitious. Don't take me wrong, I like what you describe, but I think it will be easier to get there if we take it one step at a time.
I think you have now almost reinvented Ryppl, but I might be mistaken.
Yes, I let myself get carried away a little, and I'm not asking to implement all that. I'm just assessing the possibilities.
In addition, for maintainers: 3. Create/update the configuration file by running the handler tool before pushing to the public repo (this could be a git hook).
This would be a blocker, for me at least. I'm sure I will forget doing that and will be very much annoyed. A git hook that scans the headers on every push doesn't sound very good.
But you do agree that caching is a good idea, right? You seem to believe that caches should only be created for releases.
I agree that the cache is a good idea, as long as it's just a cache. I'm just saying that its role is auxiliary and it should not be managed by developers. I did not say that the cache should only be created for releases (note that I mentioned checking out develop and master with boost-pkg). But it might be more difficult to build the cache in time for heads of branches; there will be some latency between the commit and its metadata.
I think a dependency handler would have at least as much value to maintainers and testers, if it works for any commit.
I'm not arguing with that. And it should work, even if there is no cache whatsoever.
Andrey Semashev wrote:
Thinking about it more, the requirement to build the dependency graph before the download may require the cache to be available for any given git commit. This is probably a good reason to make the cache version controlled.
Perhaps it could be stored in a git note [1].
I'm not very familiar with git and don't know anything about git notes. Maybe they fit for this purpose. But my request would be that these notes are not required to be added by maintainers.
[...]
A slightly more advanced solution would be to have the handler download only the dependency file associated with the release tag using git archive [2], before cloning the entire module (that might not work with git notes, though). For releases, the dependency information could also simply be aggregated in the superproject archive.
Ok. As I said, I specifically did not require any particular means for delivering metadata into the tool. If this is possible with git, provided that usability is satisfactory, I'm all for it.
[...]
I agree that the cache is a good idea, as long as it's just a cache. I'm just saying that its role is auxiliary and it should not be managed by developers. [...]
So how about this: we work with two files. For now, let's call them conditional_deps.txt and deps_cache.txt. Both are optional and versioned if present. The conditional_deps.txt contains only toolset/platform annotations and is maintained by humans. The deps_cache.txt contains only the "bare" header-level dependency information and is never maintained or even supposed to be read by a human (perhaps it could be hidden). A commit hook is provided that module maintainers can opt to add to their module configuration to have it generated automatically (this won't affect history or be slow; see below). Libraries that don't have the cache can still be handled "blindly", as you suggested. In release archives the cache is (automatically) bundled with the superproject. Would you find that agreeable?
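For concreteness, the deps_cache.txt generation could be sketched like this - a hypothetical Python scanner. The file name, the cache format, and especially the header-to-module mapping (first path component under boost/) are illustrative assumptions, not an agreed design; the real header-to-module mapping is part of what the tool would have to solve:

```python
import re

# Hypothetical deps_cache.txt generator: extract the Boost modules a
# source text includes directly.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]boost/([^/>"]+)', re.MULTILINE)

def boost_modules_in(source_text):
    mods = set()
    for match in INCLUDE_RE.finditer(source_text):
        first = match.group(1)
        mods.add(first.split('.')[0])  # boost/config.hpp -> "config"
    return sorted(mods)

header = '#include <boost/config.hpp>\n#include "boost/mpl/bool.hpp"\n'
print(boost_modules_in(header))  # ['config', 'mpl']
```

A commit hook would run something like this over the changed headers and write the aggregated result into deps_cache.txt.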
Do git notes affect history? If yes, it would be undesirable if libraries' history is spammed with automated commits adding notes with dependency info.
They don't. You may consider a git note a piece of custom metadata associated with a commit, although it works a bit differently under the hood. The same applies to a deps_cache.txt file: it is created as part of the commit procedure and included with the same commit object. No additional commits appear in history. The maintainer does not need to do anything to make this happen except for installing the hook, once.
It is friendly to tell end users in advance what dependencies will be installed, but that can be solved by other means. A very simple solution would be to list the dependencies on the Boost website.
That doesn't really work for obvious reasons: (a) the advertised dependencies will get out of sync with reality sooner or later
Of course they would be generated automatically (and that would only be necessary for global releases).
and (b) you can't realistically request users to consult the website when they are about to install a Boost library. The tool should provide that information.
Good point.
It is possible that the tool is not able to do that, if the cache is not available for the given commit to be checked out. The tool should notify the user about this problem but still allow downloading the necessary components "blindly", by parsing headers for dependencies.
I believe this shouldn't really be necessary because a commit hook should be transparent to the maintainer and sufficient to ensure that the cache always exists. But I agree that this would be a reasonable fallback option.
[...]
Another alternative is to create a new git submodule to store the cache in.
I think that would be a bad idea. The cache should be directly coupled to the commit. We must avoid rolling our own data structures just to match the right cache to the right commit.
The advantage of just storing a plain file in the module directory is that it certainly works, even if you download an archive without git history, and without a need to set up a new FTP server or other web service. I would prefer to start there and investigate prettier solutions later.
We're discussing a mechanism that will require mass changes to the libraries and possibly the workflow.
No, I think it shouldn't. My intention is to provide a new layer of convenience without shaking things up too much. It should make it easier to introduce other, more transformative changes; not the other way round.
[...]
[...] But it might be more difficult to build the cache in time for heads of branches; there will be some latency between the commit and its metadata.
If the cache is updated by a commit hook, this will not be true. The cache will always be 100% up-to-date. Committing by itself will not take notably longer than usual either, because in most cases only a small number of headers will be affected and this information is available to the commit hook. Even if the deps_cache.txt needs to be re-generated entirely and the module is very large, it should take less than a second. (*) Cheers, Julian ___________ (*) I just tried: $ cd PATH_TO/include/boost/math/ $ time grep -r --include="*pp" "#include" . > ~/test.txt and it took 87 ms. Disk access is orders of magnitude slower than in-memory file processing, so I expect this to be fairly representative of single-module dependency detection even on older computers.
On Tuesday 17 June 2014 01:45:29 Julian Gonggrijp wrote:
So how about this: we work with two files. For now, let's call them conditional_deps.txt and deps_cache.txt. Both are optional and versioned if present. The conditional_deps.txt contains only toolset/platform annotations and is maintained by humans.
I'm ok with conditional_deps.txt.
The deps_cache.txt contains only the "bare" header-level dependency information and is never maintained or even supposed to be read by a human (perhaps it could be hidden).
Ok.
A commit hook is provided that module maintainers can opt to add to their module configuration to have it generated automatically (this won't affect history or be slow; see below).
I see two potential problems with git hooks. 1. As I understand it, the hooks won't work with merging pull requests, unless they are merged manually, on the developer's machine with the hook installed. 2. AFAIK, the hooks need to be set up by developers on every local copy of the git repository. It is possible that someone performs a commit without running the hook (maybe not intentionally), leaving the cache outdated. Both could be solved by server-side hooks, but I don't know if this is possible to implement with GitHub. But I suspect any such server-side hooks would generate commits and the need to pull after push, which would not be acceptable, IMHO. Performance is also important. I'm not sure IO will necessarily be the limiting factor, since I'm afraid you will have to perform more elaborate preprocessing than grep to handle repeated inclusions and header path composition in the preprocessor. But it's too early to say anything about performance since there is no working prototype.
Libraries that don't have the cache can still be handled "blindly", as you suggested. In release archives the cache is (automatically) bundled with the superproject.
Ok with that.
[...]
Another alternative is to create a new git submodule to store the cache in.
I think that would be a bad idea. The cache should be directly coupled to the commit. We must avoid rolling our own datastructures just to match the right cache to the right commit.
There is a way to associate it with the commit - if the cache is stored in the superproject and generated when the superproject is updated to refer to the new commit in the submodule. The updated metadata can be committed in the same commit as the reference update.
We're discussing a mechanism that will require mass changes to the libraries and possibly the workflow.
No, I think it shouldn't. My intention is to provide a new layer of convenience without shaking things up too much. It should make it easier to introduce other, more transformative changes; not the other way round.
Well, commit hooks (the client side ones) are a change in the workflow.
Andrey Semashev wrote:
On Tuesday 17 June 2014 01:45:29 Julian Gonggrijp wrote:
A commit hook is provided that module maintainers can opt to add to their module configuration to have it generated automatically (this won't affect history or be slow; see below).
I see two potential problems with git hooks.
Thank you for taking the effort to figure this out with me. :-)
1. As I understand, the hooks won't work with merging pull requests, unless merged manually, on the developer's machine with the hook installed.
That's something I would have to look into. In principle the hook isn't necessary on a merge because the automated merge will take care of updates, but it might occasionally expose merge conflicts in the cache file to maintainers (although in that case there should always be a matching header file conflict). There might or might not exist such a thing as a merge conflict hook. Either way, it doesn't really matter anymore because you have convinced me that hooks at the module level are not the way to go.
2. AFAIK, the hooks need to be set up by developers on every local copy of the git repository. It is possible that someone performs a commit without running the hook (maybe not intentionally), leaving the cache outdated.
Yes. At first I thought this wouldn't be an issue, but I now realise it is.
[...]
[...]
Another alternative is to create a new git submodule to store the cache in.
I think that would be a bad idea. The cache should be directly coupled to the commit. We must avoid rolling our own datastructures just to match the right cache to the right commit.
There is a way to associate it with the commit - if the cache is stored in the superproject and generated when the superproject is updated to refer to the new commit in the submodule. The updated metadata can be committed in the same commit as the reference update.
I think this might be the best idea. It would be reasonably simple to implement and it covers most cases. Only if maintainers want to do custom checkouts on somebody else's module they'll have to use the "blind" fallback, but I guess that's acceptable. Note that this would still require hooks, but only for the superproject maintainers. -Julian
On Tue, Jun 17, 2014 at 1:26 PM, Julian Gonggrijp
Andrey Semashev wrote:
There is a way to associate it with the commit - if the cache is stored in the superproject and generated when the superproject is updated to refer to the new commit in the submodule. The updated metadata can be committed in the same commit as the reference update.
I think this might be the best idea. It would be reasonably simple to implement and it covers most cases. Only if maintainers want to do custom checkouts on somebody else's module they'll have to use the "blind" fallback, but I guess that's acceptable.
Note that this would still require hooks, but only for the superproject maintainers.
I'm not sure how the superproject is currently updated but I suspect it might be through polling and not hooks. I hope someone more informed can comment on this.
On 06/17/2014 01:45 AM, Julian Gonggrijp wrote:
Andrey Semashev wrote:
Thinking about it more, the requirement to build the dependency graph before the the download may require the cache to be available for any given git commit. This is probably a good reason to make the cache version controlled.
Perhaps it could be stored in a git note [1].
I'm not very familiar with git and don't know anything about git notes. Maybe they fit for this purpose. But my request would be that these notes are not required to be added by maintainers.
[...]
A slightly more advanced solution would be to have the handler download only the dependency file associated with the release tag using git archive [2], before cloning the entire module (that might not work with git notes, though). For releases, the dependency information could also simply be aggregated in the superproject archive.
Ok. As I said, I specifically did not require any particular means for delivering metadata into the tool. If this is possible with git, provided that usability is satisfactory, I'm all for it.
[...]
I agree that the cache is a good idea, as long as it's just a cache. I'm just saying that its role is auxiliary and it should not be managed by developers. [...]
So how about this: we work with two files. For now, let's call them conditional_deps.txt and deps_cache.txt. Both are optional and versioned if present.
A word of caution. I know very well the temptation to check in derived data, but I also know the pitfalls. It is not worth the risk. Find some other way to store and access the cache. The wrong version of the derived data will end up in commits, and you lose. Most people have an Internet connection, could the cache lookup be based on git commit sha1, and fetch the cache from some known base URL on the web? -- Bjørn
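Bjørn's suggestion could look roughly like this - a sketch in which the base URL is invented and each cache file is simply named after the commit sha1; on a miss the tool would fall back to scanning headers "blindly", as discussed earlier in the thread:

```python
from urllib.request import urlopen
from urllib.error import URLError

BASE_URL = "https://example.org/boost-deps"  # hypothetical cache host

def cache_url(sha1):
    # One cache file per commit, addressed by its sha1.
    return "{}/{}.txt".format(BASE_URL, sha1)

def fetch_cache(sha1, timeout=5):
    """Return the cached dependency data for a commit, or None so the
    caller can fall back to 'blind' header scanning."""
    try:
        with urlopen(cache_url(sha1), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (URLError, OSError):
        return None
```

Because the sha1 uniquely identifies the commit, this sidesteps the derived-data-in-version-control problem, at the cost of requiring a hosted service.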
Bjørn Roald wrote:
On 06/17/2014 01:45 AM, Julian Gonggrijp wrote:
So how about this: we work with two files. For now, let's call them conditional_deps.txt and deps_cache.txt. Both are optional and versioned if present.
A word of caution. I know very well the temptation to check in derived data, but I also know the pitfalls. It is not worth the risk. Find some other way to store and access the cache. The wrong version of the derived data will end up in commits, and you lose.
Andrey and you have convinced me that the cache should not be stored in the individual modules. However, I think storing the cache in the superproject is better than storing it in a separate web service. As long as all superproject maintainers (which if I'm right is a small number of people) install the commit hook, version mismatches should not occur.
Most people have an Internet connection, could the cache lookup be based on git commit sha1, and fetch the cache from some known base URL on the web?
It could, but then we would have two problems. -Julian
On 15 June 2014 19:00, Andrey Semashev
More complicated file layout.
Agree, this is inconvenient. But as long as there is no better solution, this is an acceptable evil.
We don't have any solutions, as nothing has been implemented. It's possible to imagine other alternatives. For example, a configuration file specifying subsets of headers to be counted as a unit. I don't think it's a good idea to reorganise the file structure of a module based on a possible future tool which may or may not make use of this directory structure.
Andrey Semashev-2 wrote
On Sun, Jun 15, 2014 at 9:21 PM, Paul A. Bristow <pbristow@.u-net> wrote:
-----Original Message----- From: Boost [mailto:boost-bounces@.boost] On Behalf Of Andrey Semashev Sent: 15 June 2014 16:42 To: boost@.boost Subject: Re: [boost] date_time -> serialization (Was: spirtit -> serialization)
As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it.
Sub-sub-modules sound Very Evil to me.
Why?
More complicated file layout.
Agree, this is inconvenient. But as long as there is no better solution, this is an acceptable evil. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
No, it's not acceptable. git is already a little bit past the edge of what we can stand as far as adding complexity.

"As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it."

The whole purpose of the exercise is to eliminate the requirement to check out the complete Boost tree. If we're going to assume that this is going to continue indefinitely, we can just stop right now and declare some sort of victory.

The whole "optional component" issue hasn't been properly considered. The key misstep is the idea that one library is dependent upon another library. This concept cannot be defined; it is our equivalent to an "undecidable proposition". As an example, take the date-time library. For a user who is just going to invoke the basic functions, the serialization implementation should not be considered, while for users that do use these functions it has to be.

So you say - OK - we'll just make another submodule, date-time/serialization. At this point you've basically given up on the idea that there is an unambiguous answer to the question: is the date-time library dependent upon the serialization library? The real answer is: can't say without more information.

So now you say - well, that's all theoretical BS; if we just make another submodule we'll avoid the whole problem. But what about a user who needs to run tests on one or the other of the libraries? For example, the serialization library tests depend upon System and Filesystem, and the date-time library might depend upon Boost.Test.

The upshot is that it makes no sense to argue that library X is dependent upon library Y without considering a specific application. So once you've eliminated circular dependencies - which are a bug - you should stop.
The user needs a tool (which we might or might not have) which takes a *.cpp file as an argument and returns a list of libraries which that *.cpp file requires. Of course this is more complicated than it might appear. Each of the *.cpp files in the library which are used by the application has to be checked in the same manner.

And it's even more complex. For the DLL version of the date-time library, the author might have decided to package the serialization part in a separate DLL. So one has to follow only the chain of calls according to which DLLs are actually going to get called. Even in a static library, one will only want to follow the dependencies for the *.cpp files actually used. We're now starting down a path which will never arrive at a worthy destination.

What we want to end up with is a tool which looks like

library-list <- tool *.cpp file list

so that users can download and ship the subset, only the subset, that their application requires.

Robert Ramey -- View this message in context: http://boost.2283326.n4.nabble.com/modularization-spirtit-serialization-tp46... Sent from the Boost - Dev mailing list archive at Nabble.com.
On Tue, Jun 17, 2014 at 9:08 AM, Robert Ramey
"As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it."
The whole purpose of the exercise is to eliminate the requirement to checkout the complete boost tree. If we're going to assume that this is going to continue indefinitely we can just stop right now and declare some sort of victory.
I wasn't assuming this will continue indefinitely. But obviously this will continue to be the case until (a) there are tools for partial Boost checkout and (b) there is actual benefit from it. Admittedly, most work that has been done so far targets (b). You may argue that without the tool we can't know what changes will be beneficial. I think this is a chicken and egg problem, and someone just has to start somewhere. Besides, some changes will be beneficial regardless of the tool - for example, see recent discussions about TypeTraits and TypeTraits.Core. As for moving headers to sub-sub-modules, I would agree that the benefit of such changes depends on the tool.
The user needs a tool (which we might or might not have)
If we don't have it then all the modularization effort is pointless and we should no longer waste our time with it. To my mind, its presence is crucial to the success of the undertaking.
which takes a *.cpp file as an argument and returns a list of libraries which that *.cpp file requires. Of course this is more complicated than it might appear. Each of the *.cpp files in the library which are used by the application has to be checked in the same manner. And it's even more complex. For the DLL version of the date-time library, the author might have decided to package the serialization part in a separate DLL. So one has to follow only the chain of calls according to which DLLs are actually going to get called. Even in a static library, one will only want to follow the dependencies for the *.cpp files actually used.
We're now starting down a path which will never arrive at a worthy destination.
What we want to end up with is a tool which looks like
library-list <- tool *.cpp file list.
so that users can download and ship the subset, only the subset, that their application requires.
Yes, the header-based tool (and for our purpose the cpp can be considered as a header) is being discussed now in the adjacent thread. I like the idea of generating dependencies based on the root cpp file.
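That root-cpp-file idea could be prototyped along these lines - a naive transitive include scanner in Python. It deliberately ignores preprocessor conditionals, the DLL-vs-static packaging question Robert raises, and uses the first path component under boost/ as a stand-in for the library name, so treat it as an assumption-laden sketch, not the tool itself:

```python
import re
from pathlib import Path

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^">]+)[">]', re.MULTILINE)

def required_libraries(root_files, include_dirs):
    """Follow #include chains from the given .cpp files and report which
    Boost libraries (first path component under boost/) are reached."""
    seen, queue, libs = set(), [Path(r) for r in root_files], set()
    while queue:
        f = queue.pop()
        if f in seen or not f.is_file():
            continue
        seen.add(f)
        for inc in INCLUDE_RE.findall(f.read_text(errors="ignore")):
            if inc.startswith("boost/"):
                libs.add(inc.split("/")[1].split(".")[0])
            # Try to resolve the include against the search path too.
            for d in include_dirs:
                queue.append(Path(d) / inc)
    return sorted(libs)
```

Running this over an application's source tree would approximate the "library-list <- tool *.cpp file list" interface, with all the caveats discussed above.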
What we want to end up with is a tool which looks like
library-list <- tool *.cpp file list.
so that users can download and ship the subset, only the subset, that their application requires.
Well.... that's exactly what bcp provides, but of course no one would use that ;-) John.
I was aware of bcp - but I had presumed it just went through the library headers and not through the individual *.cpp files. Now you're telling me bcp does this, and I'll want to take a more careful look at it. The fact that this hasn't come up in this discussion suggests that I'm not alone in this. Maybe the source of our problem is that John Maddock is just too modest to flog his stuff. Addressing this would lead us to another ripe topic - updating/reorganizing the Boost web presence and organization to make it a more effective tool to promote our views - to the extent we can agree what our views are. Not wanting to hijack this thread, we'll leave this to another discussion. Robert Ramey
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: 17 June 2014 06:09 To: boost@lists.boost.org Subject: Re: [boost] date_time -> serialization (Was: spirtit -> serialization)
As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it.
Sub-sub-modules sound Very Evil to me.
Why?
More complicated file layout.
Agree, this is inconvenient. But as long as there is no better solution, this is an acceptable evil.
No, it's not acceptable.
git is already a little bit past the edge of what we can stand as far as adding complexity.
"As long as people keep checking out the complete Boost tree and use monolithic Boost distribution, the effect of our work will be relatively small. But our goal is modular Boost, which includes modular distribution, as I understand it."
The whole purpose of the exercise is to eliminate the requirement to checkout
complete boost tree. If we're going to assume that this is going to continue indefinitely we can just stop right now and declare some sort of victory.
The whole "optional component issue" hasn't been properly considered. The key misstep is in the idea that one library is dependent upon another library. This concept cannot be defined. It is our equivalent to an "undecidable proposition".
+1 - And on top of that, some 'snags' with the hard/symlinks are emerging. At the very least, they make using the MS IDE (and others?) error prone.
As an example take the date-time library. For a user who is just going to invoke the basic functions, the serialization implementation should not be considered, while for users that do use these functions it has to be.
So you say - OK - we'll just make another submodule date-time/serialization. At this point you've basically given up on the idea that there is an unambiguous answer to the question - is the date-time library dependent upon the serialization library? The real answer is - can't say without more information.
So now you say - well, that's all theoretical BS; if we just make another submodule we'll avoid the whole problem.
But what about a user who needs to run tests on one or the other of the libraries? For example, the serialization library tests depend upon System and Filesystem. The date-time library might depend upon Boost.Test.
The upshot is that it makes no sense to argue that library X is dependent upon library Y without considering a specific application.
So once you've eliminated circular dependencies - which is a bug - you should stop.
+1 At least until we see some clear signs that we are going to get benefits from Modular Boost. Paul PS Meanwhile we *must* get a release out. --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 01539 561830
On 15 June 2014 16:42, Andrey Semashev
On Sun, Jun 15, 2014 at 7:16 PM, Paul A. Bristow
Sub-sub-modules sound Very Evil to me.
Why?
It's a bad name. Double prefixes are worrying, it suggests over-nesting. It's also confusing. Since we have to type 'git submodule' all the time, we're training ourselves to see 'submodule' as a git term. 'sub-sub-modules' will inevitably sound like nested git modules, even though that isn't what you mean. We actually already have a name for the concept, 'sublibs'. It's not really accurate, but it's better than 'sub-sub-module'.
On 06/15/2014 05:16 PM, Paul A. Bristow wrote:
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov Sent: 15 June 2014 12:46 To: boost@lists.boost.org Subject: [boost] date_time -> serialization (Was: spirtit -> serialization)
Vicente J. Botet Escriba wrote: Le 15/06/14 12:48, Andrey Semashev a écrit :
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why? I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module.
What others think?
I think that Vicente is right in this case. Moving serialization support to a submodule of DateTime will make the dependency report nicer _and_ it will actually be correct from the perspective of an automatic downloader. If you use DateTime, you'll get the DateTime repo, along with the serialization support, but you will not get the Serialization repo (and its dependencies) if you don't use Serialization. And this is exactly as it should be, unless I'm missing something subtle.
It seems to me that this is a legitimate use of sub-sub-modules.
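The submodule arrangement being proposed can be illustrated abstractly. The sketch below (hypothetical names, Python used only for brevity) shows the shape of the dependency: the core component and the serializer stay independent of each other, and only the optional glue piece depends on both, so users of the core never pull in the serializer:

```python
# Hypothetical stand-in for the proposed DateTime.Serialization layout.

# --- core component (no serialization dependency) ---
class Date:
    def __init__(self, y, m, d):
        self.y, self.m, self.d = y, m, d

# --- serializer component (knows nothing about Date) ---
def save(obj, fields):
    """Generic save: snapshot the named fields of any object."""
    return {k: getattr(obj, k) for k in fields}

# --- glue: the only place that depends on both components ---
def save_date(date):
    return save(date, ("y", "m", "d"))
```

A user who only constructs `Date` objects never touches `save` or `save_date`; a user who wants serialization opts into the glue and thereby into both components, which mirrors the "DateTime.Serialization sub-module" idea.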
I've followed this thread with interest and general support, but there is one factor that doesn't seem to be 'factored-in'.
If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
maybe 20% is my best guess for that probability, but it is only a guess.
So having these in the same package doesn't really matter (except for the artificial level number)?
Wouldn't that approach force all serialization users to depend on all modules that happen to have serialization support embedded in the serialization module? How will that be better?
Looking at the shrink-wrap users, I have a suspicion that this applies quite widely - many people will manage to pull in a big chunk of Boost.
Rearranging the modules isn't going to change this much.
Sub-sub-modules sound Very Evil to me.
It does not sound all that good to me either, but they are just modules. So a better name may help with the "sound Evil" part.
KISS applies?
Yes, how are you proposing to keep it simple? -- Bjørn
Le 15/06/14 17:16, Paul A. Bristow a écrit :
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov Sent: 15 June 2014 12:46 To: boost@lists.boost.org Subject: [boost] date_time -> serialization (Was: spirtit -> serialization)
Vicente J. Botet Escriba wrote: Le 15/06/14 12:48, Andrey Semashev a écrit :
The approach of extracting glue headers to separate submodules is not scalable. We have many other libraries using the same approach to optional dependencies.
Why? I don't see why I would depend on Serialization if I don't use it even if I use DateTime. IMHO, it is up to the client of the serialization of the DateTime types to use the DateTime.Serialization sub-module. What others think?
I think that Vicente is right in this case. Moving serialization support to a submodule of DateTime will make the dependency report nicer _and_ it will actually be correct from the perspective of an automatic downloader. If you use DateTime, you'll get the DateTime repo, along with the serialization support, but you will not get the Serialization repo (and its dependencies) if you don't use Serialization. And this is exactly as it should be, unless I'm missing something subtle.
It seems to me that this is a legitimate use of sub-sub-modules. I've followed this thread with interest and general support, but there is one factor that doesn't seem to be 'factored-in'.
If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
So having these in the same package doesn't really matter (except for the artificial level number)?
Looking at the shrink-wrap users, I have a suspicion that this applies quite widely - many people will manage to pull in a big chunk of Boost.
Rearranging the modules isn't going to change this much.
Sub-sub-modules sound Very Evil to me.
Why?
KISS applies?
KISS <=> without cycles Vicente
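The "without cycles" criterion is easy to check mechanically once a module graph has been extracted. A minimal sketch, run here against a made-up graph fragment rather than the real Boost dependency data:

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of modules (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current DFS path / finished
    color = {m: WHITE for m in graph}
    stack = []

    def dfs(m):
        color[m] = GREY
        stack.append(m)
        for dep in graph.get(m, ()):
            if color.get(dep, WHITE) == GREY:
                # back edge to a module on the current path: a cycle
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = dfs(dep)
                if found:
                    return found
        stack.pop()
        color[m] = BLACK
        return None

    for m in list(graph):
        if color[m] == WHITE:
            found = dfs(m)
            if found:
                return found
    return None
```

Applied to a hypothetical fragment like `{"serialization": ["spirit"], "spirit": ["serialization"]}` it reports the serialization/spirit cycle discussed in this thread; on an acyclic graph it returns None.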
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Vicente J. Botet Escriba Sent: 15 June 2014 22:42 To: boost@lists.boost.org Subject: Re: [boost] date_time -> serialization (Was: spirtit -> serialization)
Le 15/06/14 17:16, Paul A. Bristow a écrit :
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov Sent: 15 June 2014 12:46 To: boost@lists.boost.org Subject: [boost] date_time -> serialization (Was: spirtit -> serialization) If someone is using Serialisation then isn't there a very high probability that they are also using DateTime?
So having these in the same package doesn't really matter (except for the artificial level number)?
Looking at the shrink-wrap users, I have a suspicion that this applies quite widely - many people will manage to pull in a big chunk of Boost.
Rearranging the modules isn't going to change this much.
Sub-sub-modules sound Very Evil to me.
Why?
Gut feel ;-)
KISS applies?
KISS <=> without cycles
OK - I can accept this as a goal. But we need to get a release out... Yours Worryingly Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 01539 561830
Apologies if you've seen this before. I've had problems with Google groups so I don't know if anyone saw this. I'm trying Nabble now. I remember that you originally brought up the question of "bridging" headers and I thought about that. Take for example the serialization library. The multiprecision library implements the code required by the serialization library in order to save and restore multiprecision data types to an archive. So it includes some headers from the serialization library. But is multiprecision really dependent upon the serialization library? Well, most people will be using multiprecision without needing or wanting the serialization library. So the current dependency scheme is going to be found wanting for those people. Before someone comes up with the great idea - "just move all the multiprecision serialization code into the serialization library itself" - consider the question from the serialization library's point of view. Now it is "dependent" on the multiprecision library even though it doesn't use it. We've just shifted the problem from one place to another. So consider saying that multiprecision serialization is a separate module. Aside from the proliferation of modules and confusion that this would entail, there's still a problem. Running the tests for a library depends on still more modules. Since I would like to see more users run the test suite for the modules they use in their own environment, we've got some other dependencies. Basically - we just have to recognize that our "dependency minimization" efforts will never be definitive. But I think the exercise is useful, especially the breaking of cycles and eliminating gratuitous dependencies, as long as it doesn't become an end in itself. Robert Ramey
Le 13/06/14 18:41, Stephen Kelly a écrit :
Vicente J. Botet Escriba wrote:
Hi,
comments about this dependency:
|
| * from |
| It seems that this file is not used
|boost/spirit/home/support/detail/lexer/serialise.hpp
Could this file be removed or moved to examples? I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
Sorry, I didn't understand your sentence correctly. Now I agree with you; splitting serialization and archive would help a lot. I'll start a new wiki page with the dependencies to break. Best, Vicente
Vicente J. Botet Escriba wrote:
I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
Sorry, I didn't understand your sentence correctly. Now I agree with you; splitting serialization and archive would help a lot.
Great! In the future please don't be so quick to assume I don't know what I'm talking about and we can avoid wasting mails :). Once the range->algorithm and serialization->spirit edges are removed, another phase of this dependency work can begin. Those edges are the two most important ones from a cycles point of view. Thanks for all of your work, Steve.
Le 14/06/14 18:44, Vicente J. Botet Escriba a écrit :
Le 13/06/14 18:41, Stephen Kelly a écrit :
Vicente J. Botet Escriba wrote:
Hi,
comments about this dependency:
|
| * from |
| It seems that this file is not used
|boost/spirit/home/support/detail/lexer/serialise.hpp
Could this file be removed or moved to examples? I think if the intent is to remove circular dependencies, you should see if you can split the archive parts of the serialization out and make only that part depend on spirit.
Sorry, I didn't understand your sentence correctly. Now I agree with you; splitting serialization and archive would help a lot.
I'll start a new wiki page with the dependencies to break.
Done https://svn.boost.org/trac/boost/wiki/ModuleDepednecies Please feel free to update this page with any suggestion that could help to reduce the dependencies. Best, Vicente
participants (10)
-
Andrey Semashev
-
Bjørn Roald
-
Daniel James
-
John Maddock
-
Julian Gonggrijp
-
Paul A. Bristow
-
Peter Dimov
-
Robert Ramey
-
Stephen Kelly
-
Vicente J. Botet Escriba