Request for comments on super-project workflow doc

Beman Dawes

2 Jan 2014

4:19 p.m.

The release managers with help from Peter Dimov have been developing super-project workflow documentation.

See https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow

Comments welcome!

Thanks,

--Beman

Peter A. Bigot

2 Jan

6:16 p.m.

On 01/02/2014 10:19 AM, Beman Dawes wrote:

...

The release managers with help from Peter Dimov have been developing super-project workflow documentation.

See https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow

Comments welcome!

A couple comments from an outsider who's done a lot of git and release process management, but is not deeply familiar with Boost history:

For branch semantics: https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow#Namingrationale has:

* Branch "latest": automatically tracks "develop" of submodules (scripted).

I don't see a need for a branch with this tracking policy. Boost imposes no stability requirements on submodule develop branches, let alone interoperability requirements. The timeline for module development may not be targeting the next super-project release (e.g., Boost.Test has four years of changes in develop that have never seen a release). Further, any remaining material in the super-project that depends on submodule content (e.g. testing) may break badly, making the branch unusable. Does anybody have a use case for this?

* Branch "develop": automatically tracks "master" of submodules (scripted).

git-flow expects develop to be used to branch releases, but an automatic update policy makes it difficult to first reach a stable point with a curated set of submodule releases. It also makes it hard to submit patches to any non-module content in the superproject (and I believe there will always be some such, if nothing more than metadata and generic documentation).

To address these issues I propose the following changes:

* Branch "latest": Merges from super-project "develop" and tracks "master" of submodules (scripted). By this I mean the script does a merge from develop to pick up non-submodule changes, followed by blindly updating all submodules to their latest release (even if the merge from develop restored an older commit). While there will still be a risk of module interoperability problems, this branch should generally work. It's this material (and not submodule develop branches) that is proposed (but not yet accepted) to be part of the next release, so it's where automated integration testing should be done.

* Branch "develop": Maintained by release managers, who merge periodically from "latest" at commits that pass integration testing, and from feature branches that change non-submodule content. This branch should be expected to be nearly-releasable at all times, since it references submodule releases that are known to interoperate and pass tests. It's the best choice for module maintainers to branch from so they can detect interoperability problems with dependent modules before updating their own master branches.

For release preparation: https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow#Releasepreparatio... specifies that a release branches from master, which is not how git-flow works: releases branch from develop, and merge to master. This becomes possible with the develop semantics I'm proposing, where develop is curated rather than scripted.

Also the comment about "branching for release early" seems inconsistent with git-flow's description that release branches are for last-minute cleanup and metadata updates. I've found that, if you branch too early, you waste time shuttling patches between the release branch and the develop branch. Branching when you're pretty sure you have a releasable system reduces rework.

Peter
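The scripted policy proposed for "latest" in this message can be sketched as a small model. This is only an illustration in Python, not the actual Boost script: the class `SuperprojectBranch`, the function `update_latest`, and the commit labels such as `c1` are all hypothetical, and a real merge could conflict where this model simply lets "develop" win.

```python
from dataclasses import dataclass, field


@dataclass
class SuperprojectBranch:
    """Toy model of a super-project branch (not real git state)."""
    content: dict = field(default_factory=dict)     # non-submodule files
    submodules: dict = field(default_factory=dict)  # module name -> pinned commit


def update_latest(latest, develop, master_heads):
    """Return a new 'latest' following the proposed two-step policy."""
    # Step 1: merge from develop to pick up non-submodule changes.
    # Assumption: develop's version wins wherever both sides differ.
    merged_content = {**latest.content, **develop.content}
    merged_pins = {**latest.submodules, **develop.submodules}
    # Step 2: blindly re-pin every submodule to the head of its master,
    # even if the merge from develop restored an older commit.
    merged_pins.update(master_heads)
    return SuperprojectBranch(merged_content, merged_pins)


# Hypothetical state: develop pins an older "config" commit (c1).
latest = SuperprojectBranch({"index.html": "v1"}, {"config": "c2", "regex": "r5"})
develop = SuperprojectBranch({"index.html": "v2"}, {"config": "c1", "regex": "r5"})
heads = {"config": "c3", "regex": "r6"}

new_latest = update_latest(latest, develop, heads)
print(new_latest.content)
print(new_latest.submodules)
```

In this model the older pin restored by the merge (`c1`) never survives, which is the "blind update" behaviour described in the proposal.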

Peter Dimov

8:58 p.m.

Peter A. Bigot wrote:

...

* Branch "latest": automatically tracks "develop" of submodules (scripted). ... Does anybody have a use case for this?

Yes. This branch is the equivalent of the old SVN trunk. It enables testing of the develop branches of the submodules. The typical Boost developer workflow is:

- make changes
- run local tests
- when those pass, commit to develop (SVN trunk)
- wait a few days for the develop (SVN trunk) testers to cycle
- when those pass, merge to master (SVN release)

In addition, testing a composition of the develop branches of the submodules enables people to detect inadvertent breaking changes in a dependent submodule and complain loudly. This ideally happens before the merge to master. (Or, alternatively, if the changes were in fact deliberate, it enables them to fix their code ahead of the merge to master.)

...

Boost imposes no stability requirements on submodule develop branches, let alone interoperability requirements.

That's not - quite - true. The develop branches are expected to work, even though this is not strictly enforced. We mostly rely on the honor system. :-)

Peter A. Bigot

9:30 p.m.

On 01/02/2014 02:58 PM, Peter Dimov wrote:

...

Peter A. Bigot wrote:

...
* Branch "latest": automatically tracks "develop" of submodules (scripted). ... Does anybody have a use case for this?

Yes.

This branch is the equivalent of the old SVN trunk. It enables testing of the develop branches of the submodules. The typical Boost developer workflow is:

- make changes
- run local tests
- when those pass, commit to develop (SVN trunk)
- wait a few days for the develop (SVN trunk) testers to cycle
- when those pass, merge to master (SVN release)

In addition, testing a composition of the develop branches of the submodules enables people to detect inadvertent breaking changes in a dependent submodule and complain loudly. This ideally happens before the merge to master. (Or, alternatively, if the changes were in fact deliberate, it enables them to fix their code ahead of the merge to master.)

Ack. FWIW, that's not how I see submodule-based git projects in the general case. If most Boost modules are independent (are they?), this process appears to introduce unnecessary coupling, rather like testing a bunch of unrelated feature branches together instead of validating and merging them one at a time as their individual developers feel they're ready.

I suspect that separate curated develop and auto-updated latest/master (latest/develop) would be more robust. E.g. that approach reduces the risk of refactoring the assert library: neither module would appear in the stable branch (develop, in my proposal) until a release manager confirms they're compatible, preventing the churn from impacting people who don't care where the capability is placed.

But as long as everybody who maintains a Boost module is clear on the expectations and process, the assumptions of non-Boost people aren't particularly relevant.

...

...
Boost imposes no stability requirements on submodule develop branches, let alone interoperability requirements.

That's not - quite - true. The develop branches are expected to work, even though this is not strictly enforced. We mostly rely on the honor system. :-)

OK. Somebody should update https://svn.boost.org/trac/boost/wiki/StartModWorkflow?version=9#Branchnames then, since no stability expectation is mentioned. In its current form, I would expect to feel free to initiate a long-term refactoring on my develop branch with no plans to push to master or maintain compatibility with other modules for several super-project release cycles, a process more consistent with my expectations of what the module master and develop branches mean under git-flow. Peter

Nat Goodspeed

9:53 p.m.

On Thu, Jan 2, 2014 at 4:30 PM, Peter A. Bigot <pab@pabigot.com> wrote:

...

that's not how I see submodule-based git projects in the general case. If most Boost modules are independent (are they?), this process appears to introduce unnecessary coupling, rather like testing a bunch of unrelated feature branches together instead of validating and merging them one at a time as their individual developers feel they're ready.

Many Boost libraries are built on other Boost libraries. I believe some of the referenced libraries are API-versioned. It would be unfortunate if Boost produced a release that doesn't work out of the box because library A has become incompatible with consuming library B.

Peter Dimov

10:19 p.m.

Peter A. Bigot wrote:

...

Ack. FWIW, that's not how I see submodule-based git projects in the general case. If most Boost modules are independent (are they?),

No. I'd say that the probability for two randomly selected submodules to be independent is not above 50% (although I haven't actually done the math). Boost libraries tend to depend heavily on other Boost libraries.

...

this process appears to introduce unnecessary coupling,

I don't see why. If the libraries were indeed independent, testing them together would not couple them to one another.

...

rather like testing a bunch of unrelated feature branches together instead of validating and merging them one at a time as their individual developers feel they're ready.

The suggested arrangement is partly shaped by what we had before - it's a gradual transition. And what we had was that testers checked out an SVN branch and ran the tests on that. (All of Boost was in one SVN repository.) So now, under the new arrangement, the testers would just check out the "latest" branch of the superproject - which would check out the "develop" branches of the submodules - and this'd be exactly equivalent to the old "trunk" test run.

I agree that if you start from scratch, you might be able to come up with something better. But I'm not sure that it will be that different. You have to test the "develop" branch of a submodule; submodules generally depend on the rest of Boost; hence, you need to check out the rest of Boost somehow to create the environment in which to test the submodule.

So the two obvious options are either to test submodule:develop against world:master, or to test submodule:develop against world:develop. The latter has the advantages that (a) it is what we've done for years and (b) it's easy to achieve without changes to the test scripts, by using a "latest" branch.
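The two options named above (submodule:develop against world:master, or against world:develop) can be written out as a mapping from submodule to the branch checked out for a test run. This is a rough Python sketch; the function name `build_test_environment` and the three-module list are made up for illustration and do not come from the Boost test scripts.

```python
def build_test_environment(module, all_modules, world_branch):
    """Map each submodule to the branch checked out when testing `module`.

    `world_branch` is "master" or "develop": the branch used for every
    submodule other than the one under test.
    """
    if world_branch not in ("master", "develop"):
        raise ValueError("world_branch must be 'master' or 'develop'")
    env = {name: world_branch for name in all_modules}
    env[module] = "develop"  # the module under test is always at develop
    return env


modules = ["config", "regex", "graph"]  # illustrative subset, not the real list

against_master = build_test_environment("regex", modules, "master")
against_develop = build_test_environment("regex", modules, "develop")
print(against_master)
print(against_develop)
```

With world:develop the mapping is identical whichever module is under test, so a single checkout of "latest" can serve every module; with world:master each module needs its own environment.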

Peter A. Bigot

4 Jan

8:14 p.m.

On 01/02/2014 04:19 PM, Peter Dimov wrote:

...

Peter A. Bigot wrote:

...
Ack. FWIW, that's not how I see submodule-based git projects in the general case. If most Boost modules are independent (are they?),

No. I'd say that the probability for two randomly selected submodules to be independent is not above 50% (although I haven't actually done the math). Boost libraries tend to depend heavily on other Boost libraries.

After some analysis I agree: Boost libraries are much more highly coupled than I had guessed. The stats for the number of modules a given module depends on are:

114 samples from 0 to 46 ; sdev 9.18514
amean 13.7105 ; gmean 11.6043 ; hmean 8.13638
median 13 ; mode 6 occurs 7 times

and the stats for the number of modules a given module supports are:

114 samples from 0 to 102 ; sdev 20.5803
amean 13.1579 ; gmean 6.82233 ; hmean 3.13355
median 5.5 ; mode 0 occurs 20 times

For example, lib/graph uses 46 other modules, lib/config is used by 102 other modules, and there are 20 modules that are unused by any Boost library (including all seven tools/* modules).

I don't want to waste anybody's time (including my own) pro-actively documenting the exact details of what I did, though I may clean up the tool and submit it for incorporation into some existing module if there's strong interest. (It uses Boost.Filesystem and Boost.Regex, and would use Boost.Graph if completed. It may be that the information could be extracted using Boost.Build, too; I didn't try to figure that out.)

Peter
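For readers who want to see how "depends on" and "supports" counts of this kind can be derived, here is a minimal Python sketch. It is not the tool described in this message: the dependency table is a made-up four-module toy, only direct edges are counted (whether the figures above are direct or transitive is not stated), and only the arithmetic mean and median are computed.

```python
import statistics

# Hypothetical direct dependencies: module -> modules it uses.
# These edges are illustrative only, not real Boost data.
DEPENDS_ON = {
    "graph": {"config", "regex"},
    "regex": {"config"},
    "config": set(),
    "bcp": {"config", "regex"},
}


def dependency_counts(depends_on):
    """Return (uses, used_by): per-module outgoing and incoming edge counts."""
    uses = {m: len(deps) for m, deps in depends_on.items()}
    used_by = {m: 0 for m in depends_on}
    for deps in depends_on.values():
        for d in deps:
            used_by[d] = used_by.get(d, 0) + 1
    return uses, used_by


def summarize(counts):
    """Summary statistics over the per-module counts."""
    values = sorted(counts.values())
    return {
        "samples": len(values),
        "min": values[0],
        "max": values[-1],
        "amean": statistics.mean(values),
        "median": statistics.median(values),
    }


uses, used_by = dependency_counts(DEPENDS_ON)
unused = sorted(m for m, n in used_by.items() if n == 0)
print(summarize(uses))
print(summarize(used_by))
print(unused)
```

Modules with a `used_by` count of zero correspond to the "unused by any Boost library" group mentioned above.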

Peter Dimov

9:19 p.m.

Peter A. Bigot wrote:

...

I don't want to waste anybody's time (including my own) pro-actively documenting the exact details of what I did, though I may clean up the tool and submit it for incorporation into some existing module if there's strong interest.

I think that a tool that - at least - outputs the dependencies of a given submodule will be highly useful. If we are serious about modularization and cutting dependencies, we'll certainly need something similar. It's possible that the existing "bcp" tool already can do this though; I'm not sure.
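As a rough idea of what a tool that outputs dependencies might do at its core, the Python sketch below scans source text for `#include` lines that name a `boost/` header. It is a heuristic under stated assumptions, not the tool mentioned in this thread and not how bcp works: the names `INCLUDE_RE` and `referenced_boost_names` are invented here, and taking the first path component as the module name is only an approximation (a top-level header such as `boost/shared_ptr.hpp` does not necessarily live in a module of that name).

```python
import re

# Matches '#include <boost/NAME...' or '#include "boost/NAME...' at line start.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]boost/([^/>"]+)', re.MULTILINE)


def referenced_boost_names(source_text):
    """Return the first path component of every boost/ include in the text."""
    names = set()
    for first in INCLUDE_RE.findall(source_text):
        # 'config.hpp' -> 'config'; directory names are kept unchanged.
        names.add(first[:-4] if first.endswith(".hpp") else first)
    return names


SAMPLE = '''\
#include <boost/config.hpp>
#include <boost/regex/v4/regex.hpp>
#  include "boost/filesystem/path.hpp"
#include <vector>
// #include <boost/graph/adjacency_list.hpp>
'''

print(sorted(referenced_boost_names(SAMPLE)))
```

The commented-out include is skipped because the pattern requires `#` to be the first non-blank character on the line; includes hidden behind macros or conditional compilation would need more than a regular expression.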

Peter Dimov

2 Jan

10:47 p.m.

Beman Dawes wrote:

...

See https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow

Comments welcome!

"Changes from original proposal" says "Releases branched from "master", per Git Flow."

But that's not how Git Flow works. Release branches are created from "develop" and merged to "master". Only hotfix branches are created from "master".

Reference: http://nvie.com/posts/a-successful-git-branching-model/

"Release branches

May branch off from: develop
Must merge back into: develop and master
Branch naming convention: release-*"

"Hotfix branches

May branch off from: master
Must merge back into: develop and master
Branch naming convention: hotfix-*"

Cox, Michael

3 Jan

8:25 a.m.

On Thu, Jan 2, 2014 at 3:47 PM, Peter Dimov <lists@pdimov.com> wrote:

...

Beman Dawes wrote:

See https://svn.boost.org/trac/boost/wiki/SuperProjectWorkflow

...
Comments welcome!

"Changes from original proposal" says "Releases branched from "master", per Git Flow."

But that's not how Git Flow works. Release branches are created from "develop" and merged to "master". Only hotfix branches are created from "master".

Reference: http://nvie.com/posts/a-successful-git-branching-model/

"Release branches

May branch off from: develop
Must merge back into: develop and master
Branch naming convention: release-*"

"Hotfix branches

May branch off from: master
Must merge back into: develop and master
Branch naming convention: hotfix-*"

I agree. The workflow may be a valid scheme, but it definitely isn't Git Flow. So either incorporate the above comments into the page or remove the reference to Git Flow. But I don't think it is a valid scheme without a develop branch in the superproject. As long as there are files in the superproject and not just submodule references, you're going to need a develop branch to isolate superproject development changes from released code. I'd drop the caution about the develop branch not being provided.

Regarding the naming conventions of the release and hotfix branches, I don't care which one you use, but be consistent, i.e. either:

release/n.n.n
hotfix/*

or

release-n.n.n
hotfix-*

Regarding the "latest" branch. Technically you really don't need it, as you can check out the develop branch and use a "git submodule foreach" to check out the HEADs of the develop branches in the scripts. Also, any changes you make to the superproject would have to be first made in the "latest" branch and then merged into the develop branch (just more work and something to forget as most work will center around the develop branches). If you decide the "latest" branch is desirable, please change the name to something like "bleeding_edge" or "integration" (as mentioned on the web page). Folks not that familiar with git and coming from other SCM systems, e.g. ClearCase, may think that "latest" contains the latest released code.

Regarding the release/n.n.n branch naming convention and tags. Normally in an X.Y.Z numbering scheme, the X.Y represents the branch and the Z is used to name the tag, e.g. tag boost-X.Y.Z on branch boost-X.Y (or in this case release/n.n). The web page mentions nothing about how/when tags are used and the naming convention for them. Also, as part of the tagging workflow I'd like to see the boost release script tag the submodules' master branches with a tag indicating the boost release, e.g. boost-X.Y.Z or whatever naming convention is picked. That way it's easy to identify what library-specific release corresponds to the boost release just by looking in the submodule. Also, it makes it easier for the libraries not adopting independent releases (e.g. CMT orphaned libraries?) not to have to worry about tags.

With the proposed X.Y.Z naming convention, what are the criteria for bumping each level? Something like:

- Bumping the X implies a release with changes breaking backward compatibility with release X-1.Y.Z.
- Bumping the Y implies a release that's backward compatible and has new features.
- Bumping the Z implies a release that's backward compatible with no new features (or very, very small ones).

Whatever it is, that should be documented somewhere. Also, will the libraries having independent releases follow the same naming convention/criteria? Having multiple versioning schemes and/or criteria for bumping each level would be confusing for the boost users.

Are the release branches long-running branches or, as in the Git Flow model, will they be removed once the release branch is tagged and merged to master?

Regarding the "Open Question". It all depends on how many previous releases Boost will support. If only the current release, then you could get away with just release. I would assume, though, that at a minimum, Boost should support the current and previous release. In that case you can do something like the Linux kernel does and have a maintenance branch. Anything more than that probably would require using the scheme you've come up with. BTW, the scheme Boost has maps fairly well to the Linux scheme (so Boost's scheme can't be too far off):

maint (updates for the previous release)
master=master (current release)
develop=next (next release)
latest=pu (proposed updates)

The "Naming Rationale" link points to the Boost Trac main page. Isn't the Naming Rationale what this page is about?

Michael
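The bump criteria and the branch/tag naming suggested in this message can be made concrete with a few lines of Python. This is a sketch of one possible reading, not an agreed Boost policy: the function names are invented, the change kinds "breaking", "feature" and "fix" are labels chosen here, and resetting the lower components to zero on a bump is an assumption the message does not spell out.

```python
def bump(version, change):
    """Return the next (X, Y, Z) for a 'breaking', 'feature' or 'fix' change."""
    x, y, z = version
    if change == "breaking":   # backward compatibility is broken
        return (x + 1, 0, 0)
    if change == "feature":    # backward compatible, new features
        return (x, y + 1, 0)
    if change == "fix":        # backward compatible, no new features
        return (x, y, z + 1)
    raise ValueError(f"unknown change kind: {change!r}")


def release_branch(version):
    """X.Y names the branch, e.g. release/1.55."""
    return "release/{}.{}".format(*version[:2])


def release_tag(version):
    """X.Y.Z names the tag, e.g. boost-1.55.1."""
    return "boost-{}.{}.{}".format(*version)


current = (1, 55, 0)  # example version number used only for illustration
patched = bump(current, "fix")
print(release_branch(patched), release_tag(patched))
print(bump(current, "feature"), bump(current, "breaking"))
```

Under this reading a fix release stays on the same `release/X.Y` branch and only the tag changes, which matches the "X.Y represents the branch and the Z is used to name the tag" convention described above.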

Daniel James

12:23 p.m.

On 3 January 2014 08:25, Cox, Michael <mhcox@bluezoosoftware.com> wrote:

...

I agree. The workflow may be a valid scheme, but it definitely isn't Git Flow. So either incorporate the above comments into the page or remove the reference to Git Flow.

Beman asked for feedback; you could try giving him a chance to incorporate it.

...

Regarding the "latest" branch. Technically you really don't need it as you can checkout the develop branch and use a "git submodule foreach" to checkout the HEADs of the develop branches in the scripts.

Then we'd have to identify each test run by over 100 hash values. It'd also complicate the regression testing script; the amount of time it's taking to get testing running illustrates why that's a bad idea. And it would limit our flexibility in the future - it's easier to make changes to how 'latest' is generated than it is to make changes to the regression script.
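To make the "over 100 hash values" point concrete: `git submodule status` prints one line per submodule, consisting of a one-character state flag (a space, `-`, `+` or `U`), the checked-out commit hash, the submodule path and, when available, a description in parentheses. Without a superproject commit that pins them, a test report would have to carry the whole mapping that the Python sketch below extracts. The sample text, hashes and paths are fabricated for illustration, and the parser assumes paths contain no spaces.

```python
def parse_submodule_status(text):
    """Map submodule path -> commit hash from `git submodule status` output."""
    pins = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Column 0 is the state flag (' ', '-', '+' or 'U'); skip it.
        fields = line[1:].split()
        sha, path = fields[0], fields[1]
        pins[path] = sha
    return pins


# Fabricated sample: 40-character placeholder hashes, three submodules.
SAMPLE_STATUS = (
    " " + "a" * 40 + " libs/config (heads/develop)\n"
    "+" + "b" * 40 + " libs/regex (heads/develop)\n"
    "-" + "c" * 40 + " tools/bcp\n"
)

pins = parse_submodule_status(SAMPLE_STATUS)
print(len(pins), "hashes needed to identify this run")
for path, sha in sorted(pins.items()):
    print(path, sha[:8])
```

A commit on a 'latest' branch records the same pins inside the superproject, so one hash is enough to identify the run.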

...

Also, any changes you make to the superproject would have to be first made in the "latest" branch and then merged into the develop branch (just more work and something to forget as most work will center around the develop branches).

As I've said elsewhere, as much content as possible should be removed from the super-project in order to reduce this problem and others. Super-project content is problematic regardless of the branching scheme. What's left should be easy to update (e.g. index.html in the root directory), or rarely updated.

Development should usually be based on a single branch; whether that's latest or develop needs to be decided. Then transferring changes should be pretty simple, especially since few people have write access to the super-project. My preference is to base content work on 'develop' and copy content from develop automatically to 'latest', so that 'latest' is almost completely automatic. We could perhaps use a custom merge from develop to latest to make the history clearer (and things like 'git annotate' would work better), but never in the opposite direction.

...

latest=pu (proposed updates)

I'd rather avoid such an unfortunate acronym. Or is it a deliberate comment on quality?

Peter A. Bigot

12:43 p.m.

On 01/03/2014 06:23 AM, Daniel James wrote:

...

On 3 January 2014 08:25, Cox, Michael <mhcox@bluezoosoftware.com> wrote:

...
latest=pu (proposed updates)

I'd rather avoid such an unfortunate acronym. Or is it a deliberate comment on quality?

It can be, but I don't think he was proposing using those names, just stating that Boost is not diverging wildly from the traditional git work-flow:

* master: validated stable; evolving release branch; permanent
* next: accepted for stable; never rebased but might still require fixes before being released
* pu: appears to have value, so let's try it in place, but if it doesn't work it gets yanked through a rebase

Some of us still follow this much simpler model for repositories where it's appropriate. It served Linux for a long time, and is still the flow used for git itself. It is not appropriate for Boost, though it might be appropriate for submodules if next is renamed develop.

Peter
