[modularization] What is a module? What is a sub-module?

Vicente J. Botet Escriba

21 Sep 2014

8:36 a.m.

Hi all,

After the long threads concerning the modularization it seems clear to me that we are at an impasse.

While I guess breaking dependency cycles is a modularization goal for all of us, it seems that moving files from one module to a sub-module has not been well accepted and, worse yet, could have some dramatic consequences (a lot of testers were broken with the MPL split) due to the way we build Boost.

I'm wondering whether we need an alternative definition of module/sub-module. This definition could be temporary; it would just be useful to identify the modules/sub-modules that would be the good ones. We could move the files when needed by some user-oriented tools (install).

Let me try it. A module or sub-module is just a list of files. The first problem is how to identify this list. Currently, the boostdep tool requires that a module correspond to the files in the include directory of a repository, and a sub-module to the files in the include directory of a subdirectory of a repository. I don't remember which criteria Stephen Kelly is using in his tool. I propose to switch to a less restrictive and more explicit mapping that doesn't imply any change in how Boost is built/tested. The advantage is that we can change this mapping without any trouble for the regression tests :)

* If a repository doesn't have this explicit mapping, the module with the same name as the repository is composed of the files in the include, src and build directories of the repository. There could be other implicit modules for test, examples, benchmarks, ..., which are not user oriented.
* If a repository has this explicit mapping, the modules/sub-modules are the ones described in this mapping.

Open points:
* Do we want/need to be able to define modules containing files in more than one repository?
* Where should the explicit mapping be stored?
* Should we track the build dependencies? How?
* Should we rethink how Boost is built in a modularized world?
* Do we want/need to take optional dependencies into account from the beginning?

Note that this would not reduce the dependencies at the file level, but at least it would help us to break the cycles.

Peter, if we agree this is the way out of the impasse, would you like to adapt your tool to use an explicit mapping?

Any concrete proposal about how Boost could be installed in a modularized world would be welcome. Open point: is anyone interested in working on this modular installation?

Best,
Vicente
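
To make the implicit/explicit rule above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the mapping format (module name to a list of path prefixes), the function name resolve_modules and the example file names are all hypothetical, and nothing here reflects how boostdep or any other existing tool works.

```python
from pathlib import PurePosixPath

# Assumption: these are the directories that make up the implicit module.
IMPLICIT_DIRS = ("include", "src", "build")


def resolve_modules(repo_name, files, explicit_mapping=None):
    """Return {module_name: sorted list of files} for one repository.

    files            -- repository-relative POSIX paths
    explicit_mapping -- optional {module_name: [path prefixes]} (hypothetical format)
    """
    if explicit_mapping is None:
        # Implicit rule: a single module named after the repository.
        members = [f for f in files if PurePosixPath(f).parts[0] in IMPLICIT_DIRS]
        return {repo_name: sorted(members)}

    modules = {name: [] for name in explicit_mapping}
    for f in files:
        for name, prefixes in explicit_mapping.items():
            if any(f == p or f.startswith(p.rstrip("/") + "/") for p in prefixes):
                modules[name].append(f)
                break  # first matching module wins; a file is never listed twice
    return {name: sorted(members) for name, members in modules.items()}
```

Because the first matching entry wins, a more specific sub-module has to be listed before the module that contains it. Files under test, example and so on are simply left out, which matches the "not user oriented" remark above.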

Andrey Semashev

21 Sep

9:12 a.m.

On Sunday 21 September 2014 10:36:45 Vicente J. Botet Escriba wrote:

...

Hi all,

After the long threads concerning the modularization it seems clear to me that we are in an impasse.

While I guess breaking dependency cycles is a modularization goal for all of us, it seems that moving files from one module to a sub-module has not be well accepted, worst yet, could have some dramatic consequences (a lot of testers were broken with the MPL split) due to the way we build Boost.

The current problems in testing are not caused by MPL per se, but by a Boost.Build issue which manifested itself when MPL got modularized. The Boost.Build problem can appear in other contexts and should be fixed even if we revert MPL.

...

I'm wondering if we don't need an alternative definition of module/sub-module. This definition could be temporary, it would just be useful to identify the modules/sub-modules that would be the good ones. We could move the files when needed by some user oriented tools (install).

Let me try it. A module or submodule is just a list of files. The first problem is how to identify this list. Currently, the boostdep tool request that a module is associated to the files in the directory include of a repository and a sub-module to the files in the directory include in a subdirectory of a repository. I don't remember which criteria Stephen Kelly is using in his tool. I propose to switch to a less restrictive and more explicit mapping that don't implies any change on how Boost is build/tested. The advantage is that we can change this mapping without any trouble on the regressions tests :)

* If a repository don't have this explicit mapping, the module with the same name as the repository is composed of the files in the directory include, src and build of the repository. There could be other implicit modules for test, examples, benchmarks, ..., which are not user oriented. * If a repository has this explicit mapping, the module/sub-modules are the ones described in this mapping.

Open points * Do we want/need to be able to define modules containing files in more than one repository? * Where the explicit mapping should be stored? * Should we track the build dependencies? How? * Should we rethink how Boost is build in a modularized world? * Do we want/need to take in account optional dependencies from the beginning?

Note that this would not reduce the dependencies at the file level, but at least would help us to break the cycles.

Peter, if we agree this is the way to exit from the impasse, would you like to adapt your tool to use an explicit mapping ?

Any concrete proposal about how Boost could be installed in a modularized world would be welcome. Open point: Is anyone interested in working on this modular installation?

I'm not sure I understand how this explicit mapping would be stored. With git submodules and sublibs the Boost modular structure can be inferred from the directory structure. With your approach there will be files belonging to different Boost modules in the same directory. If you propose some kind of metadata describing the distribution of files between Boost modules then how this metadata is supposed to be filled and maintained? I'd really like to avoid any approach that involves us manually filling it (and git hooks aren't really a good solution either, as we discussed earlier).

Vicente J. Botet Escriba

9:41 a.m.

...

On Sunday 21 September 2014 10:36:45 Vicente J. Botet Escriba wrote:

...
Hi all,

After the long threads concerning the modularization it seems clear to me that we are in an impasse.

While I guess breaking dependency cycles is a modularization goal for all of us, it seems that moving files from one module to a sub-module has not be well accepted, worst yet, could have some dramatic consequences (a lot of testers were broken with the MPL split) due to the way we build Boost.

The current problems in testing are not caused by MPL per se, but by a Boost.Build issue which manifested itself when MPL got modularized. The Boost.Build problem can appear in other contexts and should be fixed even if we revert MPL.

Right, this is why I'm saying that maybe we need to review the way we are building modular Boost.

...
I'm wondering if we don't need an alternative definition of module/sub-module. This definition could be temporary, it would just be useful to identify the modules/sub-modules that would be the good ones. We could move the files when needed by some user oriented tools (install).

<snip>

On 21/09/14 11:12, Andrey Semashev wrote:

I'm not sure I understand how this explicit mapping would be stored. With git submodules and sublibs the Boost modular structure can be inferred from the directory structure. With your approach there will be files belonging to different Boost modules in the same directory.

As I said, this is temporary, to help identify the modules/sub-modules. I guess that in the end the files would be moved to a specific repository or directory.

...

If you propose some kind of metadata describing the distribution of files between Boost modules then how this metadata is supposed to be filled and maintained? I'd really like to avoid any approach that involves us manually filling it (and git hooks aren't really a good solution either, as we discussed earlier).

I don't see any problem. For most of the modules the implicit mapping is the right one. For those that must be explicit, I propose that the author maintain this metadata, as long as it is located at the repository level. The boostdep tool should be able (or is already able?) to catch new files that have not been mapped, and this should then be fixed. Whether we need modules whose files are now located in multiple repositories is still an open question.

Best,
Vicente
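
The "catch files that have not been mapped" check could look roughly like the following Python sketch. It is an assumption-laden illustration, not a description of what boostdep does: the function name unmapped_files, the prefix-based mapping format and the set of tracked directories are all made up for the example.

```python
def unmapped_files(files, explicit_mapping, tracked_dirs=("include", "src", "build")):
    """Return the tracked files that no entry of the explicit mapping covers.

    files            -- repository-relative POSIX paths
    explicit_mapping -- {module_name: [path prefixes]} (hypothetical format)
    tracked_dirs     -- top-level directories whose files must be mapped (assumption)
    """
    prefixes = [p.rstrip("/") for ps in explicit_mapping.values() for p in ps]

    def covered(path):
        # A file is covered if it is listed itself or lies under a listed directory.
        return any(path == p or path.startswith(p + "/") for p in prefixes)

    return sorted(
        f for f in files
        if f.split("/", 1)[0] in tracked_dirs and not covered(f)
    )
```

A non-empty result would be the signal for the author to update the metadata, for example when a new header is added but never assigned to a module.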

Bjørn Roald

3:12 p.m.

On 09/21/2014 10:36 AM, Vicente J. Botet Escriba wrote:

...

Hi all,

After the long threads concerning the modularization it seems clear to me that we are in an impasse.

Maybe most of the friction is more a case of a lack of clear communication than of real disagreement. It could be that the goals would be agreed on if they were clear to everyone. Some participants in the threads seem to have clear goals in mind for what needs to be done first, and just feel the need to proceed, while others are confused about what is going on and why. The latter may need to understand the "why", as in how we get to an end result we want and what that result looks like. The former group may be more concerned with what they "know" has to be done before we get anywhere. They need to convince the skeptics why that is the case. Neither side's statements and arguments are hard to understand if you are willing to try to shift mindset for the sake of understanding. Nevertheless there needs to be some level of consensus before this can proceed.

So how can consensus be achieved? I think starting with a more concrete meaning for the terminology used in discussions, proposals and guidelines would be very helpful. Guessing what people mean by module, sub-module, library, sub-library, repo, sub-repo, package, dependency, etc. is not helpful to understanding each other. I have tried to follow the discussions and have to say misinterpretation seems to be a major problem. If we could agree on terminology, the quality of the discussions could be improved vastly. A wiki page defining how these terms are used and not used in boost could be a normative reference.

As I have been thinking about this a bit, I offer my thoughts here for comments and elaboration. There are certainly definitions here I am not sure are the best or the right ones. However, I opt for not providing the alternatives I have been considering, with pros and cons for each, as I would rather provide a cleaner proposal for discussion. Here we go:

Library: A library is a collection of code in Boost that is reviewed and accepted/rejected by boost as a community. A library is maintained by individuals who are the library maintainers. The code is managed in a separate git repository that is included as a git submodule in the libs folder of the boost master repository. A library contains the library's main module in the subdirectories include, src, test, build, and doc. In addition, a library may contain a number of additional directories containing optional modules that depend on the main module; these are called sub-libraries.

Sub-library: A library may contain related code in sub-libraries that should be treated as separate modules to limit the dependencies that would be incurred if they were part of the library's main module. The sub-library has its own module structure containing its own include, src, test, build, and doc directories. A sub-library is part of the library and is maintained by the library's maintainers.

Package: Unit of deployment of boost source code and/or pre-build libraries, documentation etc. Typically there may be a one-to-one relationship between packages and modules, but it is possible to deploy more than one module in a package or break one module into more than one package.

Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.

Sub-Repository: I suggest we do not use this term to mean sub-library. Use the term sub-library or git submodule instead.

Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

Sub-module: I suggest we do not use this term to mean sub-library; use sub-library instead. If it is not clearly given by context, use git submodule if we have a git repository tracked using a git submodule in mind (http://git-scm.com/docs/git-submodule).

Dependencies: Handling of dependencies is where I struggle the most with seeing a clear path forward. In particular, what determines the nodes and edges in the dependency graphs we care about? And what are we going to use the dependency graph for? The naive approach is to track module dependencies alone. That is, each module is a node in a dependency graph. This does however have some major problems.

Test, Example, and Doc Dependencies: First of all, if test, example and doc code is part of the module and incurs additional requirements, we certainly do not always want to track those dependencies as the module's dependencies. A separate dependency graph node for test code seems to be a solution if there is a real need to track it at all. Documentation can also clearly be treated separately if need be. However, given this, the module as defined above is no longer the node in the dependency graph. But that is probably just the beginning.

Lib Dependencies: Modules that are not header-only have source files in the src directory that are compiled into one or more library files (ignoring variants directly supported by Boost.Build). Separate dependency graph nodes may be appropriate here to distinguish dependencies at link and compile time. But there are many possible facets of this, so I think the real use-cases for the dependency graph should drive the requirements for what the nodes and edges shall model. In addition, dependencies may vary with the configuration of the target environment. It is not clear if or how such external dependencies should be tracked; however, starting with the Jamfile lib dependencies is certainly a good start. It may be that most package management systems have what is needed for the rest, so it is a matter of bridging these worlds.

Include Dependencies: Dependencies in the include directory may cause compile and link time dependencies for the module user. These dependencies are not incurred until a header is included, directly or indirectly, that requires the specific dependency to be met. This could, as some have pointed out, be leveraged to get a very flexible and fine-grained "real" dependency graph in boost. However, as the actual dependencies are not known before the application developer changes source code, compiles and links, and then understands the cause of the resulting diagnostics, this is not very helpful for packaging minimum required sub-sets of boost. I am also afraid the diagnostics for missing headers or object file symbols will not be a very user friendly solution. However, if that could be fixed somehow to point directly at the missing package, or even better, if a package manager could be more or less automatically invoked to fix it, then this may be a path forward. Such fine-grained dependency tracking could greatly reduce the need for sub-libraries.

Separating larger chunks of code into a sub-library may seem reasonable for several reasons, but separating single headers into their own sub-library only to get a "pretty" graph may clearly be way off the reasonableness scale. Especially if it can be reasoned that we don't push internal boost structure problems onto the helpless application developer to figure out. It seems reasonable to look for something much simpler in these cases.

For the lack of a better term for what some are suggesting, I just invented bridging-header as a term, which may be a mechanism to help in these situations.

Bridging Headers: A bridging header is a C++ header file that bridges facilities in one module with facilities in another module to provide a new convenience facility to users. The bridging header is part of the include structure in one of the two modules and only depends on a minimal required set of features from the two modules to provide the new convenience facility. A bridging header is marked in a to-be-determined way that allows dependency tracking tools to track the set of bridging headers between any two modules as a separate node (a bridge) in the dependency graph. When a user includes a bridging header, it adds both bridged modules as dependencies. However, it may not be practical to have every bridge tracked by a package manager as a separate package.

--
Bjørn
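
As a rough illustration of the bridging-header idea, the Python sketch below lifts header-level includes to a module-level dependency graph and treats every header marked as bridging as belonging to a separate "bridge" node. Everything specific here is an assumption: how a header gets marked is explicitly to-be-determined above, so the sketch just takes a set of marked headers as input, and the "bridge:" node naming, function names and header names are hypothetical.

```python
def module_graph(header_deps, owner, bridging=frozenset()):
    """Build {node: set of nodes it depends on} from header-level includes.

    header_deps -- {header: set of headers it includes}
    owner       -- {header: name of the module that contains it}
    bridging    -- headers marked as bridging headers (marking scheme is assumed)
    """
    graph = {}
    for header, includes in header_deps.items():
        src = owner[header]
        if header in bridging:
            # A bridging header is accounted to a separate bridge node
            # instead of the module that physically contains it.
            bridged = sorted({owner[i] for i in includes} | {src})
            src = "bridge:" + "+".join(bridged)
        graph.setdefault(src, set())
        for inc in includes:
            dst = owner[inc]
            if dst != src:
                graph[src].add(dst)
    return graph


def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in sorted(graph.get(node, ())):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is on the current DFS path, so we closed a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical layout: module "a" ships a convenience header that uses module "b",
# while "b" itself depends on the core of "a".
owner = {"a/core.hpp": "a", "a/b_support.hpp": "a", "b/api.hpp": "b"}
header_deps = {
    "a/core.hpp": set(),
    "a/b_support.hpp": {"a/core.hpp", "b/api.hpp"},
    "b/api.hpp": {"a/core.hpp"},
}
plain = module_graph(header_deps, owner)
bridged = module_graph(header_deps, owner, bridging={"a/b_support.hpp"})
```

In this made-up layout, find_cycle(plain) reports the cycle between a and b, while find_cycle(bridged) finds none, because the only edge from a to b now originates from the bridge node. Whether a package manager should see such a bridge as a package of its own is left open, as in the text above.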

Rob Stewart

22 Sep

9:37 a.m.

On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...

On 09/21/2014 10:36 AM, Vicente J. Botet Escriba wrote:

...
After the long threads concerning the modularization it seems clear to me that we are in an impasse.

Maybe most of the friction is more of a case of lack of clear communication rather than real disagreements. It could be the goals would be agreed if they where clear to everyone. Some participants in the threads seems to have clear goals in mind for what need to be done first, and just feel need to to proceed, while others are confused about what is going on and why. The latter may need to understand the "why" as in how we get to a end result we want and what that result looks like. The former group may be more concerned with what they "know" has to be done before we get anywhere. They need to convince the skeptics why that is the case. Neither side's statements and arguments are hard to understand if you are willing to try to shift mindset for the sake of understanding. Nevertheless it need to be some level of consensus before this can proceed.

You are very likely correct.

...

So how can consensus be achieved? I think starting with more concrete meaning to terminology used in discussions, proposals and guidelines would be a very helpful. Guessing what people mean with module, sub-module, library, sub-library, repo, sub-repo, package, dependency, etc. is not helpful to understanding each other.

...

Library: A library is a collection of code in Boost that is reviewed and accepted/rejected by boost as community. A library is maintained be individuals that are the library maintainers. The code is managed in a separate git repository that is included as a git submodule in the libs folder of the boost master repository. A library contain the library's main module in subdirectories include, src, test, build, and doc. In addition a library may contain a number of additional directories containing optional modules that depend on the main module, these are called sub-libraries.

You've defined "library" in terms of "module" and "sub-library" which have not yet been defined. What is a "main module"? I need to understand that to understand what's included in a library. More on module's definition below.

...

Sub-library: A library may contain related code in sub-libraries that should be treated as separate module to limit dependencies incurred if they are part of the library's main module. The sub-library has its own module structure containing its own include, src, test, build, and doc directories. A sub-library is part of the library and is maintained by the libraries maintainers.

I need to understand "module" to understand "sublibrary" (which needn't be hyphenated, BTW).

...

Package: Unit of deployment of boost source code and/or pre-build libraries,

I assume you meant "pre-built" rather than "pre-build" here.

...

documentation etc. Typically there may be a one-to-one relationship between packages and modules, but it is possible to deploy more than one module in a package or break one module into more than one package.

The current packaging model puts all modules into one package, so it's more than possible, it's the norm.

...

Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.

The usual meaning of "repository", at least in my experience, is the managed history in a version control tool, not the files in a workspace.

...

Sub-Repository: I suggest we do not use this term mean sub-library. Use the term sub-library or git submodule instead.

If the VCS ever changes again, the tool-specific name of this entity will probably change. It would be better to provide an abstraction. That is, formalize "subrepository" and note that a git submodule is a subrepository.

...

Module: A organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directory, Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

How is a module distinct from a library? Both are defined in terms of the directories they contain. Each is defined in terms of the other.

...

Sub-module: I suggest we do not use this term to mean sub-library, use sub-library instead. If it is not clearly given by context, use git submodule if we have a git repository tracked using a git submodule in mind (http://git-scm.com/docs/git-submodule).

Until I better understand the difference between "library" and "module", I can't say whether I agree with your conclusion on submodule.

...

Dependencies: Handling of dependencies is where I struggle the most with seeing a clear path forward. In particular what determines the nodes and edges in the dependency graphs we care about. And what are we going to use the dependency graph for.

Right

...

Test Example, and Doc Dependencies: First of all, if test, example and doc code is part of the module and incur additional requirements, we certainly do not always want to track those dependencies as the modules dependencies. A separate dependency graph node for test code seems to be a solution if there is a real need to track it at all. Documentation can also clearly be treated separate if need be. However, given this, then the module as defined above is no longer the node in the dependency graph. But that is probably just the beginning.

Test and doc dependencies should certainly be tracked separately, if at all.

...

Lib Dependencies: Modules that are not header only have source files in the src directory that are compiled into one or more library files (ignoring variants directly supported by Boost.Build). Separate dependency graph nodes may be appropriate here to distinguish dependencies at link and compile time. But there are many possible facets of this, so I think the real use-cases for the dependency graph should drive requirements for what the nodes and edges shall model. In addition dependencies may vary on configuration of the target environment. It is not clear if or how such external dependencies should be tracked, however starting with the Jamfile lib dependencies is certainly a good start. It may be most package management systems has what is needed for the rest, so it is a mater of bridging these worlds.

I should think dependencies would be computed at the logical grouping represented by library or module, depending on what those terms actually mean. I presume one will choose to build components by such logical entities.

...

Include Dependencies: Dependencies in the include directory may cause compile and link time dependencies for the module user. These dependencies does not incur before a header is included directly or indirectly that require the specific dependency to be met. This could, as some have pointed out, be leveraged to get very flexible and fine-grained "real" dependency graph in boost. However, as the actual dependencies are not known before the application developer changes source code, compiles and links, and then understand cause of the resulting diagnostics, this is not very helpful for packaging of minimum required sub-sets of boost. I am also afraid the diagnostics for missing headers or object file symbols will not be a very user friendly solution. However if that could be fixed somehow to point directly at the missing package, or even better that a package manager could be more or less automatically invoked to fix it, then this may be a path forward. Such fine-grained dependency tracking could greatly reduce need for sub-libraries.

I agree that such fine-grained tracking can be a cause of confusion and hassles. I normally prefer to think in terms of libraries, not optional features. That does lead to problems managing dependencies like Date Time's optional dependency on Serialization, however.

...

Separating larger chunks of code in a sub-library may seem reasonable for several reasons, but to separate single headers into their own sub-library only to get a "pretty" graph may clearly be way off the reasonableness scale. Especially, if it can be reasoned that we don't push internal boost structure problems on the helpless application developer to figure out. It seems reasonable to look for facilitation for something much simpler in these cases.

...

For the lack of a better term for what some are suggesting, I just invented bridging-header as a term which may be a mechanism to help in this situations.

Bridging Headers: A bridging header is a C++ header files that bridges facilities in one module with facilities in another module to provide a new convenience facility to users. The bridging header is part of the include structure in one of the two modules and only depend on a minimal required set of features from the two modules to provide the new convenience facility. A bridging header is marked in a to-be-determined way that allow dependency tracking tools to track the set of bridging headers between any two modules as a separate node (a bridge) in the dependency graph. When a user include a bridging header it add both the bridged modules as dependencies, however it may not be practical to have every bridge tracked by a package manager as a separate package.

That seems like a decent approach.

___
Rob

(Sent from my portable computation engine)

Bjørn Roald

9:20 p.m.

On 09/22/2014 11:37 AM, Rob Stewart wrote:

...

On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
On 09/21/2014 10:36 AM, Vicente J. Botet Escriba wrote:

...
After the long threads concerning the modularization it seems clear to me that we are in an impasse.

Maybe most of the friction is more of a case of lack of clear communication rather than real disagreements. It could be the goals would be agreed if they where clear to everyone. Some participants in the threads seems to have clear goals in mind for what need to be done first, and just feel need to to proceed, while others are confused about what is going on and why. The latter may need to understand the "why" as in how we get to a end result we want and what that result looks like. The former group may be more concerned with what they "know" has to be done before we get anywhere. They need to convince the skeptics why that is the case. Neither side's statements and arguments are hard to understand if you are willing to try to shift mindset for the sake of understanding. Nevertheless it need to be some level of consensus before this can proceed.

You are very likely correct.

...
So how can consensus be achieved? I think starting with more concrete meaning to terminology used in discussions, proposals and guidelines would be a very helpful. Guessing what people mean with module, sub-module, library, sub-library, repo, sub-repo, package, dependency, etc. is not helpful to understanding each other.

+1

...
Library: A library is a collection of code in Boost that is reviewed and accepted/rejected by boost as community. A library is maintained be individuals that are the library maintainers. The code is managed in a separate git repository that is included as a git submodule in the libs folder of the boost master repository. A library contain the library's main module in subdirectories include, src, test, build, and doc. In addition a library may contain a number of additional directories containing optional modules that depend on the main module, these are called sub-libraries.

You've defined "library" in terms of "module" and "sub-library" which have not yet been defined.

Right, module should most likely be defined first, as its definition depends less, if at all, on the library definition.

...

What is a "main module"?

For library A, the main module lives in libs/A/include, libs/A/src, etc. Each sub-library contains a module as well; sub-library A/x lives in libs/A/x/include, libs/A/x/src, etc. All these modules are modules of library A, but the main module is a sort of focus point. It holds the boost library's primary features. Sub-libraries are there to provide optional utilities that depend on or create a bridge to other modules, boost or external. Sub-libraries could be used for other purposes than modularization, e.g. logical partitioning of a library's facilities. But even if that is useful, it is off-topic, so I leave that.
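
The layout described above can be turned into a small discovery routine. The Python sketch below is only an illustration of that convention: it assumes a module root is any directory under libs/<library> that directly contains an include directory, and the function name discover_modules and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def discover_modules(paths, libs_root="libs"):
    """Map each library to the modules found in it.

    paths -- POSIX paths relative to the boost root, e.g. "libs/A/x/include/..."
    The main module is named after the library ("A"), a sub-library after
    its directory below the library ("A/x").
    """
    modules = {}
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) < 3 or parts[0] != libs_root or "include" not in parts[2:]:
            continue
        library = parts[1]
        # Everything between the library directory and the first "include"
        # directory names the sub-library (empty for the main module).
        idx = parts.index("include", 2)
        sub = "/".join(parts[2:idx])
        name = library if not sub else library + "/" + sub
        modules.setdefault(library, set()).add(name)
    return {lib: sorted(names) for lib, names in modules.items()}
```

With this convention no extra metadata is needed: the module structure is inferred purely from where the include directories sit.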

...

I need to understand that to understand what's included in a library. More on module's definition below.

...
Sub-library: A library may contain related code in sub-libraries that should be treated as separate module to limit dependencies incurred if they are part of the library's main module. The sub-library has its own module structure containing its own include, src, test, build, and doc directories. A sub-library is part of the library and is maintained by the libraries maintainers.

I need to understand "module" to understand "sublibrary" (which needn't be hyphenated, BTW).

OK. Actually I am struggling with the temptation to use submodule rather than sublibrary as the term here, as it really is more logical to me. Then you get the "main module" vs. the "submodule(s)" inside a library. But I try to avoid using submodule due to the danger of a mixup with the git thing of the same name. One option would be to simply call both the main module and the sublibrary "modules", with no main vs. sub relationship implied. If there is more than one module in the library, we require that they live in separate subdirectories or levels in the directory tree.

...

...
Package: Unit of deployment of boost source code and/or pre-build libraries,

I assume you meant "pre-built" rather than "pre-build" here.

yes

...

...
documentation etc. Typically there may be a one-to-one relationship between packages and modules, but it is possible to deploy more than one module in a package or break one module into more than one package.

The current packaging model puts all modules into one package, so it's more than possible, it's the norm.

agreed.

...

...
Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.

The usual meaning of "repository", at least in my experience is the managed history in a certain control tool, not the files in a workspace.

Well, yes and no... in git what you are referring to is a "bare repository". But it is not important to me. We could call a repository with a working directory "dressed up" -- just kidding. I just think most developers will think of the working directory when they clone or update their repository, so that is why I put it the way I did. If we include this in a normative definition we should try to be precise. The simplest way is to leave these details out if they do not add anything to the subject at hand.

...

...
Sub-Repository: I suggest we do not use this term to mean sub-library. Use the term sub-library or git submodule instead.

If the VCS ever changes again, the tool-specific name of this entity will probably change. It would be better to provide an abstraction. That is, formalize "subrepository" and note that a git submodule is a subrepository.

Good point. But, my take here was that we do not need the term sub-repository, hence I don't really see the need for an abstraction either. If the discussion is about VCS, we have git repository and git submodule. If the discussion is about source code structure and organization we have libraries and modules. As stated above, maybe sublibrary is not needed, we can simply use module.

...

...
Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

How is a module distinct from a library?

A library can have more than one module. If it has one it is more or less the same.

...

Both are defined in terms of the directories they contain. Each is defined in terms of the other.

Module take 2: An organized set of boost code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory, which contains the sources used to build the static and dynamic library files that the user will link with.
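A minimal sketch of the directory rule in "take 2", assuming a tool already knows which subdirectories exist in a candidate module and whether the module is header-only. The function name `missing_module_dirs` and the constant `REQUIRED_DIRS` are hypothetical, not part of any existing Boost tool.

```python
# Directories every module shall contain under the "take 2" definition.
REQUIRED_DIRS = ("include", "test", "build", "doc")


def missing_module_dirs(present, header_only):
    """Return the required directory names absent from `present`.

    `present` is the set of subdirectory names found in the candidate module.
    Modules that are not header-only must also contain `src`.
    """
    required = list(REQUIRED_DIRS)
    if not header_only:
        required.append("src")
    return [name for name in required if name not in present]


# A header-only module with all four directories satisfies the rule.
print(missing_module_dirs({"include", "test", "build", "doc"}, header_only=True))
# A compiled module with only include and test is missing three directories.
print(missing_module_dirs({"include", "test"}, header_only=False))
```

An empty result means the candidate meets the definition as worded; anything else lists what a maintainer would have to add.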

...

...
Sub-module: I suggest we do not use this term to mean sub-library, use sub-library instead. If it is not clearly given by context, use git submodule if we have a git repository tracked using a git submodule in mind (http://git-scm.com/docs/git-submodule).

Until I better understand the difference between "library" and "module", I can't say whether I agree with your conclusion on submodule.

Hopefully some of this is clearer now.

...

...
Dependencies: Handling of dependencies is where I struggle the most with seeing a clear path forward. In particular, what determines the nodes and edges in the dependency graphs we care about? And what are we going to use the dependency graph for?

Right

...
Test, Example, and Doc Dependencies: First of all, if test, example and doc code is part of the module and incurs additional requirements, we certainly do not always want to track those dependencies as the module's dependencies. A separate dependency graph node for test code seems to be a solution if there is a real need to track it at all. Documentation can also clearly be treated separately if need be. However, given this, the module as defined above is no longer the node in the dependency graph. But that is probably just the beginning.

Test and doc dependencies should certainly be tracked separately, if at all.
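One way to picture separate nodes for test and doc code is a graph keyed by (module, kind) pairs. The sketch below is only an illustration: the modules `A`, `B`, `C` and all edges are made up, and `reachable` is a hypothetical helper rather than anything boostdep provides.

```python
from collections import deque

# Hypothetical graph: each node is (module, kind); kind is "lib", "test" or "doc".
EDGES = {
    ("A", "lib"): [("C", "lib")],
    ("A", "test"): [("A", "lib"), ("B", "lib")],  # tests pull in an extra module
    ("A", "doc"): [("A", "lib")],
    ("B", "lib"): [("C", "lib")],
    ("C", "lib"): [],
}


def reachable(start, edges):
    """Return every node reachable from `start` (breadth-first), excluding `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in edges.get(node, []):
            if dep not in seen and dep != start:
                seen.add(dep)
                queue.append(dep)
    return seen


# What a user of A needs, versus what someone running A's tests needs.
user_deps = reachable(("A", "lib"), EDGES)
test_deps = reachable(("A", "test"), EDGES)
print(sorted(user_deps))
print(sorted(test_deps))
```

With this split, B shows up only when starting from A's test node, so it never leaks into the dependencies reported to users of A.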

...
Lib Dependencies: Modules that are not header-only have source files in the src directory that are compiled into one or more library files (ignoring variants directly supported by Boost.Build). Separate dependency graph nodes may be appropriate here to distinguish dependencies at link and compile time. But there are many possible facets of this, so I think the real use-cases for the dependency graph should drive requirements for what the nodes and edges shall model. In addition, dependencies may vary with the configuration of the target environment. It is not clear if or how such external dependencies should be tracked; however, starting with the Jamfile lib dependencies is certainly a good start. It may be that most package management systems have what is needed for the rest, so it is a matter of bridging these worlds.

I should think dependencies would be computed at the logical grouping represented by library or module, depending on what those terms actually mean.

Yes, I do agree with that, I was just trying to point out some additional potential aspects. I was not saying we needed to care about them if they are not needed. Module has that role as in modularization.

...

I presume one will choose to build components by such logical entities.

Maybe, but we need to define "component" and what that means if we are going to use it. Actually, to me, with regard to boost, component is more or less synonymous with module. Maybe components are more about how they are deployed and re-used, and module is more about the separation of the component's sources from the sources of other components or modules in the boost source tree. But there are clearly alternative definitions of component. Nevertheless, I am not sure we need both component and module in the boost terminology dictionary, so I opted for module as it has been used more than component in discussions and it sort of fits with modularization.

...

...
Include Dependencies: Dependencies in the include directory may cause compile and link time dependencies for the module user. These dependencies are not incurred until a header that requires the specific dependency is included, directly or indirectly. This could, as some have pointed out, be leveraged to get a very flexible and fine-grained "real" dependency graph in boost. However, as the actual dependencies are not known before the application developer changes source code, compiles and links, and then understands the cause of the resulting diagnostics, this is not very helpful for packaging minimum required subsets of boost. I am also afraid the diagnostics for missing headers or object file symbols will not be a very user-friendly solution. However, if that could be fixed somehow to point directly at the missing package, or even better, if a package manager could be more or less automatically invoked to fix it, then this may be a path forward. Such fine-grained dependency tracking could greatly reduce the need for sub-libraries.

I agree that such fine-grained tracking can be a cause of confusion and hassles. I normally prefer to think in terms of libraries, not optional features. That does less to problems managing dependencies like Date Time's optional dependency on Serialization, however.

...
Separating larger chunks of code in a sub-library may seem reasonable for several reasons, but to separate single headers into their own sub-library only to get a "pretty" graph may clearly be way off the reasonableness scale. Especially, if it can be reasoned that we don't push internal boost structure problems on the helpless application developer to figure out. It seems reasonable to look for facilitation for something much simpler in these cases.

+1

...
For lack of a better term for what some are suggesting, I just invented bridging-header as a term for a mechanism that may help in these situations.

Bridging Headers: A bridging header is a C++ header file that bridges facilities in one module with facilities in another module to provide a new convenience facility to users. The bridging header is part of the include structure in one of the two modules and only depends on a minimal required set of features from the two modules to provide the new convenience facility. A bridging header is marked in a to-be-determined way that allows dependency tracking tools to track the set of bridging headers between any two modules as a separate node (a bridge) in the dependency graph. When a user includes a bridging header, it adds both the bridged modules as dependencies; however, it may not be practical to have every bridge tracked by a package manager as a separate package.
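A rough sketch of how a dependency tool might treat bridging headers as separate graph nodes, assuming the tool has a header-to-module map, a per-header include list, and some marker identifying bridging headers. The marking mechanism is to be determined, so here it is simply a set; all header names, module names and the `module_edges` function are hypothetical.

```python
# Hypothetical inputs: which module owns each header, and what each header includes.
HEADER_MODULE = {
    "a/core.hpp": "A",
    "a/serialize.hpp": "A",  # bridging header living in module A
    "b/archive.hpp": "B",
}
INCLUDES = {
    "a/core.hpp": [],
    "a/serialize.hpp": ["a/core.hpp", "b/archive.hpp"],
    "b/archive.hpp": [],
}
# Stand-in for the to-be-determined marking of bridging headers.
BRIDGING = {"a/serialize.hpp"}


def module_edges(header_module, includes, bridging):
    """Derive module-level edges, routing bridging headers through a bridge node."""
    edges = set()
    for header, deps in includes.items():
        owner = header_module[header]
        for dep in deps:
            target = header_module[dep]
            if target == owner:
                continue  # includes within one module add no graph edge
            if header in bridging:
                # The bridge depends on both modules; the owner does not
                # acquire a dependency on the target.
                bridge = "bridge(%s,%s)" % tuple(sorted((owner, target)))
                edges.add((bridge, owner))
                edges.add((bridge, target))
            else:
                edges.add((owner, target))
    return edges


print(sorted(module_edges(HEADER_MODULE, INCLUDES, BRIDGING)))
```

In this toy data, module A ends up with no edge to B; only the bridge node depends on both, which is the effect the definition above is after.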

That seems like a decent approach.

The main challenge may be that it does not fit well with the dependency tracking model used by many package managers. However, as has been pointed out in the discussions, any reasonable use of a bridging header would be in an environment where the other package would be installed, even if not by enforcement of package-manager dependency rules. E.g.: DateTime and Serialization packages would naturally be installed by a user before any attempt to serialize DateTime data types. So it may not be a big deal if there is no additional DateTimeSerialization package; it seems almost like the extra package would just create friction here, as users would have to discover it to be aware of the bridging facilities. The bridge may just as well be part of the Serialization package or the DateTime package without enforcing installation of the other. -- Bjørn

Rob Stewart

23 Sep

9:01 a.m.

On September 22, 2014 5:20:10 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...

On 09/22/2014 11:37 AM, Rob Stewart wrote:

...
On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
What is a "main module"?

For library A, the main module lives in libs/A/include libs/A/src etc.

Each sub-library contains a module as well, sub-library A/x lives in: libs/A/x/include libs/A/y/src etc.

All these modules are modules of library A, but the main module is a sort of focus point. It is the boost library's primary features. Sub-libraries are there to provide optional utilities that depend on or create a bridge to other modules, boost or external modules. Sub-libraries could be used for other purposes than modularization, e.g. logical partitioning of a library's facilities. But if that is useful,

it is off-topic, so I leave that.
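The layout described above can be sketched as a path-to-module mapping. This is only an illustration under stated assumptions: paths are relative to the Boost root, a module's name is everything between `libs/` and the first standard directory, and the function `module_of` is hypothetical rather than how boostdep actually works.

```python
from pathlib import PurePosixPath

# Directory names that mark the root of a module in this sketch.
STANDARD_DIRS = {"include", "src", "test", "build", "doc"}


def module_of(path):
    """Map a path such as libs/A/x/include/... to its module name ("A/x").

    Returns None when the path is not under libs/ or no standard directory
    is found below the library name.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "libs":
        return None
    # parts[1] is always the library name, so scanning starts at index 2.
    # This keeps a library that is itself called "test" from being misread.
    for i, part in enumerate(parts[2:], start=2):
        if part in STANDARD_DIRS:
            return "/".join(parts[1:i])
    return None


print(module_of("libs/A/include/boost/a.hpp"))   # main module of library A
print(module_of("libs/A/x/src/impl.cpp"))        # module in sub-library A/x
```

A sub-library that is itself named after one of the standard directories would be ambiguous under this rule, which is one reason an explicit mapping may be preferable.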

Did you notice how you switched from using "module" to using "library" in your discussion? I fail to see the distinction you're trying to make between the two.

...

...
...
Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.

The usual meaning of "repository", at least in my experience is the managed history in a certain control tool, not the files in a workspace.

That was supposed to be "version control tool", BTW.

...

...
...
Sub-Repository: I suggest we do not use this term to mean sub-library. Use the term sub-library or git submodule instead.

If the VCS ever changes again, the tool-specific name of this entity will probably change. It would be better to provide an abstraction. That is, formalize "subrepository" and note that a git submodule is a subrepository.

Good point. But, my take here was that we do not need the term sub-repository, hence I don't really see the need for an abstraction either. If the discussion is about VCS, we have git repository and git submodule. If the discussion is about source code structure and organization we have libraries and modules. As stated above, maybe sublibrary is not needed, we can simply use module.

You say this isn't needed, but then you say if it is, use git terminology, while acknowledging that my suggestion of defining an abstract term is a good one. If this concept is needed, it should not be couched in git terminology, but rather in more generic VCS terms.

...

...
...
Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

How is a module distinct from a library?

A library can have more than one module. If it has one it is more or less the same.

...
Both are defined in terms of the directories they contain. Each is defined in terms of the other.

Module take 2: An organized set of boost code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory, which contains the sources used to build the static and dynamic library files that the user will link with.

I don't understand how that is distinct from "library".

...

...
I should think dependencies would be computed at the logical grouping represented by library or module, depending on what those terms actually mean.

Yes, I do agree with that, I was just trying to point out some additional potential aspects. I was not saying we needed to care about them if they are not needed. Module has that role as in modularization.

...
I presume one will choose to build components by such logical entities.

Maybe, but we need to define "component" and what that means if we are going to use it. Actually, to me, with regard to boost, component is more or less synonymous with module.

I was using "component" in its usual sense while avoiding the terms you were trying to define. I wasn't necessarily trying to introduce it into the lexicon you're attempting to create. ___ Rob (Sent from my portable computation engine)

Bjørn Roald

4:54 p.m.

On 09/23/2014 11:01 AM, Rob Stewart wrote:

...

On September 22, 2014 5:20:10 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
On 09/22/2014 11:37 AM, Rob Stewart wrote:

...
On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
What is a "main module"?

For library A, the main module lives in libs/A/include libs/A/src etc.

Each sub-library contains a module as well, sub-library A/x lives in: libs/A/x/include libs/A/y/src etc.

All these modules are modules of library A, but the main module is a sort of focus point. It is the boost library's primary features. Sub-libraries are there to provide optional utilities that depend on or create a bridge to other modules, boost or external modules. Sub-libraries could be used for other purposes than modularization, e.g. logical partitioning of a library's facilities. But if that is useful,

it is off-topic, so I leave that.

Did you notice how you switched from using "module" to using "library" in your discussion?

Why do you say that? I cannot say I switched, at least not in that direction. Further down I discuss whether sublibraries should simply be called modules.

...

I fail to see the distinction you're trying to make between the two.

OK, first and most important. Library is an established boost term, I do not want to suggest anything fundamental about that term. A library contains one or more modules. If there is more than one module, then the terms sub-module and sub-library have been used for these extra modules in the modularization discussions. I suggested calling them sub-libraries, and to avoid calling them submodules due to probable confusion with git submodule, but I think maybe it is better to simply call them modules, and a Boost library may have more than one of them to facilitate modularization.

...

...
...
...
Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.

The usual meaning of "repository", at least in my experience is the managed history in a certain control tool, not the files in a workspace.

That was supposed to be "version control tool", BTW.

...
...
...
Sub-Repository: I suggest we do not use this term to mean sub-library. Use the term sub-library or git submodule instead.

If the VCS ever changes again, the tool-specific name of this entity will probably change. It would be better to provide an abstraction. That is, formalize "subrepository" and note that a git submodule is a subrepository.

Good point. But, my take here was that we do not need the term sub-repository, hence I don't really see the need for an abstraction either. If the discussion is about VCS, we have git repository and git submodule.

This is where my discussion of VCS terms ends...

...

If the discussion is about source code structure and

...
organization we have libraries and modules. As stated above, maybe sublibrary is not needed, we can simply use module.

You say this isn't needed, but then you say if it is, use git terminology, while acknowledging that my suggestion of defining an abstract term is a good one. If this concept is needed, it should not be couched in git terminology, but rather in more generic VCS terms.

The concept I have in mind after the discussion above of VCS terms ends does not have anything to do with VCS, that is what I am trying to say. I am suggesting we do not need abstractions for VCS terms.

...

...
...
...
Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

How is a module distinct from a library?

A library can have more than one module. If it has one it is more or less the same.

...
Both are defined in terms of the directories they contain. Each is defined in terms of the other.

Module take 2: An organized set of boost code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory, which contains the sources used to build the static and dynamic library files that the user will link with.

I don't understand how that is distinct from "library".

A library may contain more than one module; if not, modularization will create new boost libraries needing maintainers, reviews, etc.

...

...
...
I should think dependencies would be computed at the logical grouping represented by library or module, depending on what those terms actually mean.

Yes, I do agree with that, I was just trying to point out some additional potential aspects. I was not saying we needed to care about them if they are not needed. Module has that role as in modularization.

...
I presume one will choose to build components by such logical entities.

Maybe, but we need to define "component" and what that means if we are going to use it. Actually, to me, with regard to boost, component is more or less synonymous with module.

I was using "component" in its usual sense while avoiding the terms you were trying to define. I wasn't necessarily trying to introduce it into the lexicon you're attempting to create.

OK. When I have some time, I will draft an updated proposal. -- Bjørn

Rob Stewart

24 Sep

9:07 a.m.

On September 23, 2014 12:54:28 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...

On 09/23/2014 11:01 AM, Rob Stewart wrote:

...
On September 22, 2014 5:20:10 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
On 09/22/2014 11:37 AM, Rob Stewart wrote:

...
On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote:

...
What is a "main module"?

For library A, the main module lives in libs/A/include libs/A/src etc.

Each sub-library contains a module as well, sub-library A/x lives in: libs/A/x/include libs/A/y/src etc.

All these modules are modules of library A, but the main module is a sort of focus point. It is the boost library's primary features. Sub-libraries are there to provide optional utilities that depend on or create a bridge to other modules, boost or external modules. Sub-libraries could be used for other purposes than modularization, e.g. logical partitioning of a library's facilities. But if that is useful,

it is off-topic, so I leave that.

Did you notice how you switched from using "module" to using "library" in your discussion?

Why do you say that? I cannot say I switched, at least not in that direction. Further down I discuss whether sublibraries should simply be called modules.

My impression, when reading your explanation the first time, was that you began using "library" rather than "module". As I read it now, I see I was mistaken.

...

...
I fail to see the distinction you're trying to make between the two.

OK, first and most important. Library is an established boost term, I do not want to suggest anything fundamental about that term. A library contains one or more modules. If there is more than one module, then the terms sub-module and sub-library have been used for these extra modules in the modularization discussions. I suggested calling them sub-libraries, and to avoid calling them submodules due to probable confusion with git submodule, but I think maybe it is better to simply call them modules, and a Boost library may have more than one of them to facilitate modularization.

I see nothing that makes "module" distinct from "library" in the foregoing.

...

...
...
...
...
Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.

How is a module distinct from a library?

A library can have more than one module. If it has one it is more or less the same.

You also discuss "sublibraries" in the same way.

...

...
...
...
Both are defined in terms of the directories they contain. Each is defined in terms of the other.

Do you follow my point now?

...

...
...
Module take 2: An organized set of boost code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory, which contains the sources used to build the static and dynamic library files that the user will link with.

I don't understand how that is distinct from "library".

A library may contain more than one module; if not, modularization will create new boost libraries needing maintainers, reviews, etc.

A library can contain more than one sublibrary. Why do you need "module" and "submodule" when "library" and "sublibrary" would do as well? If you only use "library" and "sublibrary" in every case, the substance of what you've written doesn't change as I see it. If you still think there's something to a library that isn't part of a module, please explain. So far, they appear to be synonyms to me. ___ Rob (Sent from my portable computation engine)

Thijs (M.A.) van den Berg

9:25 a.m.

On Sep 24, 2014, at 11:07 AM, Rob Stewart <robertstewart@comcast.net> wrote:

...

On September 23, 2014 12:54:28 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote: If you still think there's something to a library that isn't part of a module, please explain. So far, they appear to be synonyms to me.

Maybe rgis? I can imagine one could have libraries that depend on well-defined *parts* of other libraries (but not all of it), and that those parts can be called modules. One would not call such a part a library -and also not list it on the public website as a boost library- because its functionality is too specific / narrow. For developers it should have a clear identity and not break dependent libraries when refactoring.

Rob Stewart

3:11 p.m.

On September 24, 2014 5:25:26 AM EDT, "Thijs (M.A.) van den Berg" <thijs@sitmo.com> wrote:

...

On Sep 24, 2014, at 11:07 AM, Rob Stewart <robertstewart@comcast.net> wrote:

...
On September 23, 2014 12:54:28 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote: If you still think there's something to a library that isn't part of a module, please explain. So far, they appear to be synonyms to me.

Maybe rgis?

I don't understand that.

...

I can imagine one could have libraries that depend on well-defined *parts* of other libraries (but not all of it), and that those parts can be called modules. One would not call such a part a library -and also not list it on the public website as a boost library- because its functionality is too specific / narrow. For developers it should have a clear identity and not break dependent libraries when refactoring.

Why wouldn't you refer to such things as sublibraries? ___ Rob (Sent from my portable computation engine)

Bjørn Roald

5:31 p.m.

On 09/24/2014 05:11 PM, Rob Stewart wrote:

...

On September 24, 2014 5:25:26 AM EDT, "Thijs (M.A.) van den Berg" <thijs@sitmo.com> wrote:

...
On Sep 24, 2014, at 11:07 AM, Rob Stewart <robertstewart@comcast.net> wrote:

...
On September 23, 2014 12:54:28 PM EDT, "Bjørn Roald" <bjorn@4roald.org> wrote: If you still think there's something to a library that isn't part of a module, please explain. So far, they appear to be synonyms to me.

Maybe rgis?

I don't understand that.

...
I can imagine one could have libraries that depend on well-defined *parts* of other libraries (but not all of it), and that those parts can be called modules.

exactly.

...

...
One would not call such a part a library -and also not list it on the public website as a boost library- because its functionality is too specific / narrow. For developers it should have a clear identity and not break dependent libraries when refactoring.

Why wouldn't you refer to such things as sublibraries?

We may, that was my initial suggestion in this thread. Since then I have tried to discuss whether it would be simpler just to call them modules, and define that a boost library contains one or more modules. In some ways I think that is simpler, but I am not sure it is clearer or more natural. One problem with "Boost sub-library" is that it is bound to trigger questions of whether they have separate maintainers, have been separately peer-reviewed, etc., while a boost module is simply a structured part of a boost library. -- Bjørn

Paul A. Bristow

9:22 a.m.

...

-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Bjørn Roald Sent: 23 September 2014 17:54 To: boost@lists.boost.org Subject: Re: [boost] [modularization] What is a module? What is a sub-module?

...

OK, first and most important. Library is an established boost term, I do not want to suggest anything fundamental about that term.

I've read your brave attempts at definitions - with general approval. But all the words like library, module, .. that you use are also used in other contexts. So when the context doesn't make them completely obvious to the reader (not just the writer!), it may be necessary to *qualify* them, for example: "Boost library" "Object library" "British Library" ;-) "Git sub-module" "Repo sub-module" "Boost sub-module" "C++ module" ... By trying to avoid qualifiers, I feel you may be making it more difficult for the reader. HTH Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830

Bjørn Roald

5:37 p.m.

On 09/24/2014 11:22 AM, Paul A. Bristow wrote:

...

...
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Bjørn Roald Sent: 23 September 2014 17:54 To: boost@lists.boost.org Subject: Re: [boost] [modularization] What is a module? What is a sub-module?

...
OK, first and most important. Library is an established boost term, I do not want to suggest anything fundamental about that term.

I've read your brave attempts at definitions - with general approval.

thanks.

...

But all the words like library, module, .. that you use are also used in other contexts.

So when the context doesn't make them completely obvious to the reader (not just the writer!), it may be necessary to *qualify* them, for example:

"Boost library" "Object library" "British Library" ;-)

"Git sub-module" "Repo sub-module" "Boost sub-module" "C++ module"

...

By trying to avoid qualifiers, I feel you may be making it more difficult for the reader.

Totally agreed; however, it is inevitable that they will be used without qualifiers, thus I am sort of looking for the level of confusion caused by ambiguities as well. I take your advice though and will qualify when I am not testing the ambiguities on purpose. -- Bjørn

Robert Ramey

23 Sep

6:17 p.m.

Vicente Botet wrote

...

Hi all,

After the long threads concerning the modularization it seems clear to me that we are in an impasse.

I agree with this. It just takes more work. I would like to broaden the discussion a little. I was searching for John Lakos' email address in order to ask him to offer his insights into the discussion. I'm not able to find it (of course, thanks spammers) but surely someone here knows how to send him such an invitation. John attended BoostCon and CPPCon as a speaker and has written the canonical reference on this subject. Growing Boost to 500 libraries would likely be the largest challenge ever to ideas about how to organize large, de-coupled libraries of code. I've been looking around a little and found: https://github.com/bloomberg/bde/wiki/Physical-Code-Organization which is a little helpful. I believe that we're having difficulties because we haven't stepped back enough. To lots of people it's a simple matter: eliminating cycles, simplifying dependency graphs, etc. But I think it's bigger than that. I think it will impact Boost coding standards, our deployment, our testing model, deprecation, library overlap, and who knows what else. These kinds of changes are always disruptive and I would like to spend a little extra effort to try to get this closer to being right. Robert Ramey -- View this message in context: http://boost.2283326.n4.nabble.com/modularization-What-is-a-module-What-is-a... Sent from the Boost - Dev mailing list archive at Nabble.com.

Niall Douglas

24 Sep

12:53 a.m.

On 23 Sep 2014 at 11:17, Robert Ramey wrote:

...

I was searching for John Lakos' email address in order to ask him to offer his insights into the discussion. I'm not able to find it (of course, thanks spammers) but surely someone here knows how to send him such an invitation. John attended BoostCon and CPPCon as a speaker and has written the canonical reference on this subject. Growing Boost to 500 libraries would likely be the largest challenge ever to ideas about how to organize large, de-coupled libraries of code.

Myself and John are of remarkably similar opinion on most coding philosophies, probably because we both think at large scale, so prepare yourself for that Robert. I think though you'll find he won't give any concrete opinions in writing out of habit, but he will in person, and they are not positive about recent Boost history if I remember correctly. Anyway, try him for yourself at jlakos@bloomberg.net. And tell him I sent you, and that I send him my best regards. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/

Niall Douglas

12:55 a.m.

On 24 Sep 2014 at 1:53, Niall Douglas wrote:

...

Myself and John are of remarkably similar opinion on most coding philosophies, probably because we both think at large scale, so prepare yourself for that Robert. I think though you'll find he won't give any concrete opinions in writing out of habit, but he will in person, and they are not positive about recent Boost history if I remember correctly.

Ah crappity .... this email was meant for Robert off-list. My apologies for the noise. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
