
WARNING: This is a long post. The tl;dr version: 1) follow the Linux kernel development model; 2) employ a web-of-trust system using GPG keys; 3) create a Boost Foundation akin to the Linux Foundation; 4) lower the barrier to entry for would-be contributors; 5) use Git/Mercurial for distributed development. On Fri, Dec 17, 2010 at 11:23 PM, Jim Bell <Jim@jc-bell.com> wrote:
On 1:59 PM, John Maddock wrote:
Interesting. So if I wanted to get SVN access and start working on things in a private branch [...]
It depends - if the library has an active maintainer, then yes, you just ask the maintainer or file a ticket [...]
I accept that there may be an issue with there not being enough folks for the above to work all that well though. The basic problem is that to maintain quality we've generally required all library maintainers to already have one accepted Boost library, so I guess the question we're struggling with is how to broaden the field without risking screwing things up too badly :-0
The crux of Boost.Guild's debate. And so many topics touch on this.
So how would you measure, or design a test, to determine how badly things would get screwed up under various scenarios?
I think we need to look outside the box for a solution to this. Let me cite an example of a way to broaden the contributor pool without bogging down the release process or the development cycles of individual developers. The most successful example of a large open source project with tons of contributors and an active community is the Linux kernel. There Linus, the project's BDFL, chooses to trust maintainers to make the decisions about maintaining and improving the different subsystems. Anybody -- as in absolutely *anybody* -- is encouraged to clone the repository, make changes, and submit those changes to the maintainer(s) actively responsible for that part of the kernel.

You will find different kernels released by different maintainers, but the "de facto" kernel is the one Linus releases. Note that Linus doesn't check each and every line of code that goes into the kernel; he trusts a number of maintainers to do that for the subsystems they're responsible for. This "inner circle" is a small group, around ten people, who delegate their overall responsibility across a wider number of subsystem maintainers.

For your code to reach the "mainline" kernel, you typically submit it to the maintainer of the module you're patching, who shepherds it in by signing off on it and merging it into his repository; he then asks the maintainer of the subsystem his module belongs to to pull from him, and those maintainers in turn ask Linus to pull from their repositories when it comes time to stabilize and go through the release process. This sounds like a slow process, but because development is decentralized, people with different timelines and paces can work on different parts of the kernel without any one thing bogging the work down.
The release process does impose a code/feature freeze, but that just means the higher-up maintainers focus on stabilizing the code and cutting a release -- you can keep working in your own repository, changing whatever you want, and when you feel your work is worth pulling in, you ask someone else to pull from you. This model allows for faster innovation, greater involvement, and a lower barrier to entry.

Now, that doesn't remove the maintainer dilemma -- but the beauty of the system is that even when the maintainer of a subsystem suddenly goes MIA, the community can decide to pull from a different person's repository. Being the maintainer of a subsystem then stops being the prerogative of the original maintainers and becomes mostly the contributors' choice.

Let me explain a little more. If I'm developer A, and maintainer B is supposed to be handling module X, all I do is clone B's repository locally, make my changes, and ask B to pull them. I can send him the changes via email, post them publicly (signed with my GPG key), or expose my repository publicly so that anybody -- not just B -- can get them. That should be simple enough to understand. What happens when B goes MIA or unresponsive? Also simple: I ask someone else -- maybe Linus, maybe some other higher-level maintainer, or just someone the community already trusts -- to pull my changes in. Losing maintainer B is no longer a hindrance, because the community can start pulling from each other and stamping their trust and confidence on the code. Later on, the community effectively elects, by way of pull requests, whom it trusts to maintain a subsystem.

This may sound like a pie-in-the-sky dream, but it is already the reality of Linux kernel development -- the one project I know of that spans the globe with thousands of contributors. The model is proven to scale. So how does the trust system work?
The Linux development team uses GPG heavily -- your key needs to have been signed by others, and the people who signed your key must themselves be trustworthy (meaning their keys have in turn been signed by other trustworthy people, and so on). So having your key signed by Linus Torvalds means something: he is vouching for your credibility, your "realness". That web of trust keeps people honest, because if you start screwing up or doing something bad by community standards, people can revoke their signatures on your key -- which is like a no vote in parliament.

There are a lot of lessons Boost can certainly learn from the Linux kernel development process. One is to decentralize development and maintain only a "canonical" or "official" release of each library. With some people maintaining ports of a library to different architectures, others concentrating on warnings removal, and the web of trust tying it all together, the development process stays sustainable and scalable. Encouraging people to fork and innovate, then folding their forks back into the main line, is a good and scalable way of developing a system progressively. The release process then becomes a matter of the BDFL, or the community's trusted people, pulling from the maintainers and stabilizing to get a suitable release out.

What the Linux kernel project has that Boost doesn't (yet) is a Linux Foundation that actually funds the development effort. The Linux Foundation ensures that people who want to do kernel development full-time (like Linus and others like him) get compensated for the shepherding and the innovation -- there is, of course, a process for qualifying for Linux Foundation funding.
Note that this is different from the Apache Foundation, which has a business-oriented, parliamentary involvement process (one I once thought would be a good model for Boost, but changed my mind about after a few conversations at BoostCon 2010). A Boost foundation with stakeholders funding it -- to ensure that Boost keeps going as a project and to compensate the people who would work on it full-time but otherwise can't because of their employment (or lack thereof) -- would be a Good Thing (TM), IMHO.
Thinking out loud here... one option might be for someone to say "I'm going to try and give library X a decent update" and solicit the mailing list for bug/feature requests, then carry out the work in the sandbox and ask for a mini-review - at which point you're sort of lumbered with being the new maintainer ;-)
If someone is that motivated. But could something useful happen if ten people, each 1/10th as motivated, were to apply themselves?
I think the part about having to announce it on the mailing list and ask for permission is the wrong way to go about it. If someone wants to do it, they should just be able to do it -- and if their code is good and the community (or the people the community trusts) vote with their pulls, it gets folded in appropriately. As for having ten people work on it, I don't think that changes the situation. The current system -- where the maintainers are BDFLs of the libraries they "maintain", nobody else may take a library in a different direction, and the community has no say in the matter -- is, I think, not scalable. I would much rather have ten implementations of a Bloom filter, let people choose which one is better, and then have that implementation pulled into the main release. The same goes for all the libraries already in the repository.

Just to note, what I'm driving at here is the need to lower the barrier to entry into Boost while keeping a means of ensuring quality for the "official"/"canonical" Boost release. Right now there are already a handful of release managers who, I think, don't do release management full-time, but who could manage just pulling changes from projects that move at a different pace than the release process. This of course requires that the libraries be broken up into individual pieces and that the release process become a stabilization effort rather than an active development effort.

The issue of dependency management is, I think, overblown with hypothetical situations -- the Linux kernel is one monolithic kernel whose lower-level subsystem details still change every so often (things almost everything else depends on: the scheduler, memory management APIs, etc.), and they never had to complicate the matter of dependencies among the parts.
Of course, needless to say, Boost really ought to go with either Git or Mercurial to make this kind of distributed development trivial. ;) Have a good one, guys, and I hope this helps.

-- 
Dean Michael Berris
about.me/deanberris