(Was: Boost Development Environment proposal)

2007/6/4, troy d. straszheim <troy@resophonic.com>:
No, commits don't pass into externals. The working copies created via the externals definition support are still disconnected from the primary working copy (on whose versioned directories the svn:externals property was actually set). And Subversion still only truly operates on non-disjoint working copies. So, for example, if you want to commit changes that you've made in one or more of those external working copies, you must run svn commit explicitly on those working copies; committing on the primary working copy will not recurse into any external ones. /$

On Tue, Jun 05, 2007 at 09:31:02PM +0200, Henrik Sundberg wrote:
I've read the manual, thanks. I didn't say exactly *how* I can make one commit across multiple projects. With something like emacs svn-mode you diff/tag with a few keystrokes and you make one commit across multiple projects. Or you can simply commit like this:

    svn commit *

assuming everything that '*' matches is under version control, which it is when you do out-of-source builds. If not, you can

    svn commit project1 project2 project3

There are myriad convenient ways to work with a checkout full of externals.

-t

2007/6/5, troy d. straszheim <troy@resophonic.com>:
Thanks for the info! I'm new to svn, and I'm trying to build a good structure for a family of systems built on ~200 components with 1-20 subcomponents per component. Our current (pre-svn) method handles releases at the subcomponent level. Integration teams select the subcomponents to build a system from, build it, and release the resulting binaries/configurations. I'm trying to understand what the best svn structure would be. Boost and KDE seem to have the same problems as I do. I don't understand KDE (I did read their tutorial) enough to know how it can be compiled without a clear (to me) structure of subreleases. I'm trying to understand when to use externals, among other things. They seemed to be useful for making higher-order releases based on lower ones. But I got discouraged. I didn't intend to be rude. Sorry. /$

On Tue, Jun 05, 2007 at 10:43:48PM +0200, Henrik Sundberg wrote:
Thanks for your interest.... Since this (is/will become) part of a proposal for boost I guess this is on-topic. I can give you a tour/brain-dump of how we do it. Warning: this is an unedited rambling sprint through a lot of information.

The jargon we use is 'projects' and 'meta-projects'. Meta-projects are just collections of projects. I assume the mapping would be meta-project => component, project => subcomponent. Our repository (recently outfitted with the slick new Trac browser) is here:

    http://code.icecube.wisc.edu/projects/icecube/browser

You'll see projects/ and meta-projects/. Under projects/ there are a couple hundred projects, many of them have fallen out of use, or are maintained by people that haven't yet (or never intend to) submitted them for review. Each project directory (say, project icetray) looks like this:

    branches/
        somebranch/
        otherbranch/
    release/
        V01-00-00/
        V01-00-01/
    trunk/

And each project is organized in a certain fashion:

    http://code.icecube.wisc.edu/projects/icecube/browser/projects/icetray/trunk

The toplevel directory contains a makefile, some cruft, and a public dir (headers), src dir (src and tests), and resources (misc).

We have essentially three main distributions: offline-software (core stuff), icerec (core + some algorithms), simulation (core + other algorithms).

        offline-software
          /          \
      icerec      simulation

One of the fundamental operations of our enterprise is to have the simulation group produce data that is used by the icerec group to test the performance of the components inside icerec. Looking at a release of the offline-software meta-project:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/offline-...
you can see that the metaproject contains a little bit of boilerplate, some cruft (that 'mutineeer' thing), and a list of externals, like

    ithon http://code.icecube.wisc.edu/svn/projects/ithon/releases/B01-10-00

which means that when you check out the metaproject, directory ithon/ will be populated with what's on the other end of that URL.

On a regular basis, releases of offline-software come out (this metaproject ought to be called 'core' or something), and the simulation and reconstruction groups merge, at their convenience, this released code into their metaprojects. For instance this simulation metaproject:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/simulati...

contains the same externals as offline-software version V01-07-05, as well as the 'trunks' of a bunch of simulation-specific projects that may or may not be dependent on offline-software projects. Similarly for icerec:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/icerec/t...

As simulation stabilizes, they will tag up their various projects and when all of their components are 'stable' URLs, they copy off a release:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/simulati...

One nice thing about this is that it is easy to assemble other meta-projects. For instance, I put together a visualization tool that is dependent only on a small subset of offline-software (it only needs to be able to read data, then it makes pictures) and therefore only needs to be rereleased when serialization methods change, or there are new gui features. To achieve this you simply copy off the offline-software metaproject and tweak the externals:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/glshovel...

and people can check this out and build it without having to build the entire world.
You'll notice that in the meta-projects directory:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects

there are lots of such compilations of code. I believe about 2/3rds are in active use.

Another thing that happens very frequently in our world is that somebody has to do a specific analysis, which involves writing a lot of customized code and usually hacking everything to bits. Once you get started making graphs you do *not* want to have your code broken by somebody's release, and you do *not* want somebody to reject your changes pending code review. So they simply copy off a metaproject and make it their PhD. Here's one:

    http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/string-2...

You'll notice that:

* Meta-projects don't nest. When the simulation group merges in changes from a new release of offline-software, they have to change all of the urls. This is dictated by the way svn:externals works. It seems like it isn't ideal, but having the dependencies of each metaproject laid out flat can be an advantage.

* Branching multiple projects can be tedious. If you allow your individual projects to become coupled, you will need to branch many at once, and this you have to do by hand (svn copy, change external. svn copy, change external...). Nobody has taken the time to develop tools to automate this, but SVN has good python bindings, it would probably be just an afternoon's work.

Let me stop now. Is that clear? ;)

-t
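The "svn copy, change external" loop described above could be partly automated. The following is only a minimal sketch of the "change external" half, working on the text of an svn:externals property (as one might obtain from `svn propget svn:externals` and write back with `svn propset`). It assumes the simple two-field entry form "local_dir URL" seen in this thread; the function name and the svn.example.org URLs are hypothetical, and it does not use the SVN python bindings at all.

```python
def retarget_externals(externals_text, new_urls):
    """Return svn:externals text with selected entries pointed at new URLs.

    externals_text -- property value, one "local_dir URL" entry per line
    new_urls       -- dict mapping local_dir -> replacement URL (e.g. a branch)
    """
    out = []
    for line in externals_text.splitlines():
        fields = line.split()
        # Only plain two-field entries named in new_urls are rewritten;
        # blank lines, comments and anything else pass through unchanged.
        if len(fields) == 2 and fields[0] in new_urls:
            out.append(f"{fields[0]} {new_urls[fields[0]]}")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical metaproject with two externals; only one gets branched.
    before = (
        "icetray http://svn.example.org/projects/icetray/trunk\n"
        "dataclasses http://svn.example.org/projects/dataclasses/trunk"
    )
    after = retarget_externals(
        before,
        {"dataclasses": "http://svn.example.org/projects/dataclasses/branches/mybranch"},
    )
    print(after)
```

The branch itself would still have to be created with `svn copy` before the rewritten property is committed.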

Thanks Troy! (I'm top posting and keeping the original mail as reference below, is this OK on this list?) I've got some questions after reading the mail, and following the links.

Trac: Your http://code.icecube.wisc.edu/projects/icecube/roadmap (I registered to see it) only contains 3 milestones. I would expect every project to have its own milestones, but then the roadmap page would be cluttered. Should a few selected, big or unstable, Boost libraries get their own milestones? Or should there be just one milestone per complete Boost release?

Release numbering: I noticed release numbers like V00-01-02. Is the format mandated by svn's/other tools' inability to sort lists numerically instead of alphabetically? Is this the format Boost versions ought to have?

Tags/Releases: Are all releases releases, or would the more anonymous name tags be better?

Trunk-level: Do you have a private meta-project referencing the trunks of all projects you (as a developer) are working on? Is it in such directories you can use "svn commit *" to commit over externals boundaries? Could private meta-projects be avoided by having the trunk/releases/branches at the svn root level instead of at the root of each project? Or would this force you to check out all or just one project to a local directory?

Sandbox: Is the structure inside the sandbox completely free? I.e. anyone can have internal releases there, so checking out the complete sandbox would cost a lot of disk space? Should at least the top directories in the sandbox follow some naming conventions?

Externals: There are several threads hinting at problems with svn and externals. Have you seen more problems than tedious branching, the need for Internals and non-intuitive commit (and perhaps meta-meta-problems)? Are you satisfied with the use of externals overall?

It's very encouraging to see that a proposed structure has actually been tested before.

/$

2007/6/8, troy d straszheim <troy@resophonic.com>:

on Sat Jun 09 2007, "Henrik Sundberg" <storangen-AT-gmail.com> wrote:
Generally frowned upon (http://www.boost.org/more/discussion_policy.htm#effective) but, I suppose, not strictly prohibited. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sat, Jun 09, 2007 at 11:04:33PM +0200, Henrik Sundberg wrote:
No, that's legacy stuff that I can't get rid of for political reasons. I don't have a proposal for what format version names in boost libraries should have.
Tags/Releases: Are all releases releases, or would the more anonymous name tags be better?
Eh, I don't think it is terribly important. It is easy to change.
That metaprojects don't nest is due to how svn works. There is talk of svn:internals, which sounds good, but IIUC it isn't going to happen right away. So for now metaprojects don't nest.
Branching multiple projects can be tedious: You seem to lack a tool here.
It would be easy enough to write. This isn't related to hook scripts, it is all client-side. SVN has good python bindings, you could do it all in python.
Each of the main metaprojects has a trunk. That is where we work, primarily. Simultaneously there may be arbitrary numbers of other private metaprojects floating around. I have a couple right now.
This 'inverted' structure has several disadvantages that were discussed. You wouldn't want to avoid 'private' meta-projects. They are one of the main features of the system.
You would never have a reason to check out the whole sandbox. This is a physics experiment. The sandbox is freeform. There are probably a lot of metaprojects floating around in there in private areas.
Should at least the top directories in the sandbox follow some naming conventions?
It could. Usually somebody sends out "Hey check out my such-and-such project" to the mailing list with a url in the sandbox. So you're never trolling through there randomly looking for stuff. The boost sandbox probably should have more structure.
Externals: There are several threads hinting at problems with svn and externals.
I haven't read any that said anything I didn't know already. One was about having an external embedded deep in your source, and when people branch their code, they forget to branch the external. This doesn't happen when externals are used as a primary code organization technique and everybody is aware of them. The other was about http vs. https in externals, and this is trivial to fix.
Have you seen more problems than tedious branching, the need for
Branching is tedious when different components become coupled and changes have to span multiple components. Otherwise it isn't really.
Internals and non intuitive commit (and perhaps meta-meta-problems)? Are you satisfied with the use of externals overall?
Very. I wouldn't propose it otherwise. -t

troy d straszheim wrote:
On Sat, Jun 09, 2007 at 11:04:33PM +0200, Henrik Sundberg wrote:
Could you point out what those disadvantages are? AFAIK a top-level arrangement, what you call inverted, is equivalent to a bottom-up arrangement.
I do that all the time. And I keep arguing that the common use case for the sandbox, and Boost in general, is precisely the inverse of what you think it is.
The problem with anything that developers have to be "aware of" is that they will forget. Anything that can't be enforced will break at some point. So we really want to avoid the forgotten work aspect that keeps hitting us during releases. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On Sun, Jun 17, 2007 at 10:01:07AM -0500, Rene Rivera wrote:
The proposed structure facilitates such fundamental operations as:

* renaming a project (all the branches come with you)
* deleting a project (same)
* finding out what branches exist for a project (they're right there)
* controlling commit access to individual projects (it's only one directory)
* giving descriptive names to branches (there are separate branch namespaces)

I consider it a disadvantage that the current sandbox layout significantly hinders/complicates each of these operations. In the proposed layout:

** you can easily see what projects exist

Not true for the current sandbox. A look at http://svn.boost.org/trac/boost/browser/sandbox does not reveal that a project called 'outfmt' exists. It is in sandbox-branches, but it isn't obvious there either.

** you can easily locate all the branches of a project

Example: looking at 'units' in sandbox/, you can't see that steven_watanabe has a branch of 'units' in sandbox-branches that might be interesting. If it were associated with 'units', I could.

Example 2: you want to find all the branches of 'graph'. You can't find this in the current sandbox, for starters because 'graph' isn't there at all. There's code for it in the branches area, but it is hidden in 'expaler'.

** you can easily rename/delete a project and all of its branches.

Not easy to do with the current layout.

Example: When 'outfmt' becomes 'oformat', you can't easily find the branches to rename them all. In the proposed layout, this Just Happens.

Example 2: When you delete 'units' from the sandbox you probably won't catch steven_watanabe's branch, since it is called 'steven_watanabe' and not 'units'.

** you have more namespace for your own pet branch, so you can give your branches more succinct and descriptive names.

This is because the branch namespace (directory) is private to the project. In the current layout the branch names of all projects are commingled. When branching my project I must be aware of what branch names are already taken by other projects. So you have to adopt arbitrary naming schemes... a mess.

-t

First off, your comparison against the current sandbox non-structure is just plain irrelevant. The current setup doesn't follow any structure yet, hence why I started a discussion about what the structure should be. So I'm just not going to respond to those parts below. The abstract structure that is equivalent is something like:

    /trunk
        /project1
        /project2
    /branches
        /project1
            /one (branch copy here)
            /two (branch copy here)
        /project2
            /breaking_changes
                /int_to_double (branch copy here)
            /nonbreaking_changes
                /int_to_long (branch copy here)
    /tags
        /project1
            /1_0
            /1_1
        /project2
            /prerelease
                /0_1
                /0_2
            /release
                /1_0

troy d straszheim wrote:
True.
* deleting a project (same)
Yes.
* finding out what branches exist for a project (they're right there)
Equivalent to the above.
* controlling commit access to individual projects (it's only one directory)
IIRC it should be just as easy to set a single permission with a wildcard such as "/*/project-name". So I'd say this is also equivalent.
* giving descriptive names to branches (there are separate branch namespaces)
Equivalent.
** you can easily see what projects exist
Same as above.
** you can easily locate all the branches of a project
Same as above.
** you can easily rename/delete a project and all of its branches.
OK, in the top-down layout you'd have to rename in a small number of spots, normally three. So I don't see this as a real disadvantage. Especially since renaming a project would be a rare occurrence.
** you have more namespace for your own pet branch, so you can give your branches more succinct and descriptive names.
Same as above. And as I've mentioned before the disadvantage to bottom-up: You don't have a single directory you can check out to see the current set of projects. If you check out the root you get every version of every library filling up your drive with stuff most people don't care about. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo
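The equivalence claimed in this message can be made concrete with a small sketch: every path in the top-down layout (/trunk/project1/...) has exactly one counterpart in the bottom-up layout (/project1/trunk/...), and vice versa. The function names below are hypothetical, and the sketch assumes the three top-level names trunk, branches and tags from the abstract structure above.

```python
KINDS = ("trunk", "branches", "tags")


def _split(path):
    """Split a repository path into its non-empty components."""
    return [part for part in path.split("/") if part]


def to_bottom_up(path):
    """Map /<kind>/<project>/rest... to /<project>/<kind>/rest..."""
    parts = _split(path)
    if len(parts) < 2 or parts[0] not in KINDS:
        raise ValueError(f"not a top-down project path: {path!r}")
    kind, project, rest = parts[0], parts[1], parts[2:]
    return "/" + "/".join([project, kind] + rest)


def to_top_down(path):
    """Map /<project>/<kind>/rest... to /<kind>/<project>/rest..."""
    parts = _split(path)
    if len(parts) < 2 or parts[1] not in KINDS:
        raise ValueError(f"not a bottom-up project path: {path!r}")
    project, kind, rest = parts[0], parts[1], parts[2:]
    return "/" + "/".join([kind, project] + rest)


if __name__ == "__main__":
    for p in ("/trunk/project2", "/branches/project1/one", "/tags/project2/release/1_0"):
        print(p, "<->", to_bottom_up(p))
```

Since the two functions invert each other, the layouts name the same set of things. What differs is which single directory can be checked out: /trunk in the top-down layout yields the current state of every project, which is the point made about bottom-up at the end of the message.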

On Mon, Jun 18, 2007 at 12:26:50AM -0500, Rene Rivera wrote:
[various snips]
Oh man. I was trying to interpret noise the whole time? Egh. Ok so looking at the above I agree, of course they're essentially the same thing. This arrangement also does not preclude the use of externals or other piecewise-checkout mechanism, and it is easy to change should there be a need. End.
Ah, so the 'common use case' is that you want to check out the entire sandbox. Of course you're right, you can't do this when tags/branches/trunk are together. So what happens when somebody checks a garbage project in to /sandbox/trunk/garbage? Does this ruin your day, or do you have some way to mark it as unwanted? -t

troy d straszheim wrote:
Yea, the discussions can get confusing around here sometimes :-)
Indeed, externals are a separate question. But reducing the need for them is also a goal since using them is more work for developers. With your mention of attaching the version number to the external itself I think we can use them in that form (assuming we fix the http vs https issue). So for externals we want:

* To always use version specific externals (-rNNN). Unless we have some special uses, and are very careful.
* Only use them for historical collections.
So you mean if someone checks in a project that has tags/branches below it? Or the general someone checks in a project that I'm not interested in? The former is what I'm trying to avoid, and the latter is by definition not applicable. If I'm checking out the sandbox root, I'm interested in all sandbox projects. If I wasn't interested in the whole sandbox I would check out only parts of it. As for ignoring, I tried using the svn:ignore facility but that doesn't work for directories, only for files. So there's no way to ignore something at any one level of check out. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On Mon, Jun 18, 2007 at 09:34:40AM -0500, Rene Rivera wrote:
Oh man. I was trying to interpret noise the whole time? Egh.
Yea, the discussions can get confusing around here sometimes :-)
Yeah. Sorry. :)
The issue is if the overall net benefit offsets this additional work... my experience says that it does, by a lot. In practice it has been a few people (release manager types) that deal with the externals.
The http vs. https thing is solvable (just allow anonymous https, there may be other ways), but I don't follow here... why would externals be useful for historical collections only? At what point in the process would externals be introduced?
Right.
Ok, that makes sense. Thanks. -t

on Sun Jun 17 2007, Rene Rivera <grafikrobot-AT-gmail.com> wrote:
Can you use Trac search to make it practical even with a different project structure?
I'm not sure anyone should presume to know what the common use case is :) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
To "browse" the projects in Trac you don't need any search at all. You can just browse the tree. But I prefer solutions which don't rely on extra tools. In the same vein I could use a client side repo browser to do the same, tsvn has a rather nice one. But others might not be so lucky. And I still prefer just doing a checkout and browsing around on my local drive both for what projects are there, and randomly reading other people's code. Yea... I admit it... I'm a code voyeur :-) Ultimately it's much easier to tell people to just do:

    svn co http://svn.boost.org/svn/boost/sandbox

to see the sandbox projects. It's the ease of one stop shopping ;-) Which I think we want, since it improves the general accessibility of the sandbox projects.
Good point... I was basing my statement on personal observations. There have been two or three persons who mentioned getting the whole of sandbox as a usual pattern for them. Although I can only specifically remember Victor saying that at this time. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
FWIW I checked out the whole SoC 2006 svn because I was interested in the active development and I just SVN switched the tags over to empty folders to save hard drive space. IMO an empty folder should be provided somewhere high up in the repo for this explicit purpose. Probably at ( http://svn.boost.org/svn/boost/empty ).

If you're checking out the whole sandbox because you want to see what is changing it seems you might be interested when things are branched/tagged as well. If you check out the whole sandbox with the

    sandbox/
    --project/
    ----branches/
    ----tags/
    ----trunk/

layout you get to see new tags and branches easily. After noticing it, if you decide it's not worth your drive space you can simply svn switch it away to the empty folder.

- Michael Marcin

troy d straszheim wrote:
On Sat, Jun 09, 2007 at 11:04:33PM +0200, Henrik Sundberg wrote:
One disadvantage of externals, which you might not have run into yet, is that they become invalid when you rename a project. For example you have:

    dataclasses http://code.icecube.wisc.edu/svn/projects/dataclasses/releases/B01-12-00

Say if the dataclasses project gets renamed to dbclasses you would have to go find all the externals and change them to the new name:

    dataclasses http://code.icecube.wisc.edu/svn/projects/dbclasses/releases/B01-12-00

And you would also have to consider for each one whether to rename the subdir name:

    dbclasses http://code.icecube.wisc.edu/svn/projects/dbclasses/releases/B01-12-00

So either you've never renamed projects in your set up, or you have broken releases ;-)

-- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo
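Finding the externals that a rename has invalidated is a mechanical check. The sketch below is one hypothetical way to do it on the text of an svn:externals property: it extracts the project name from each URL and reports entries whose project is not in a given set of existing names. The "/svn/projects/" prefix is taken from the example URLs above; the function names, and the idea of supplying the existing project names as a set (obtained by whatever means, for instance a repository listing), are assumptions for illustration.

```python
from urllib.parse import urlparse


def project_of(url, projects_root="/svn/projects/"):
    """Return the project name from a URL like .../svn/projects/<name>/..., else None."""
    path = urlparse(url).path
    if not path.startswith(projects_root):
        return None
    remainder = path[len(projects_root):]
    return remainder.split("/")[0] or None


def stale_externals(externals_text, existing_projects):
    """List (local_dir, url) entries whose project is not in existing_projects."""
    stale = []
    for line in externals_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and malformed entries.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        local_dir, url = fields[0], fields[-1]
        project = project_of(url)
        if project is not None and project not in existing_projects:
            stale.append((local_dir, url))
    return stale


if __name__ == "__main__":
    externals = (
        "dataclasses http://code.icecube.wisc.edu/svn/projects/dataclasses/releases/B01-12-00\n"
        "ithon http://code.icecube.wisc.edu/svn/projects/ithon/releases/B01-10-00"
    )
    # After the rename described above, 'dataclasses' no longer exists.
    for local_dir, url in stale_externals(externals, {"dbclasses", "ithon"}):
        print("stale external:", local_dir, url)
```

A check like this only detects the breakage; deciding whether the local directory name should also change remains a per-entry judgment, as noted above.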

On Mon, Jun 18, 2007 at 01:05:49AM -0500, Rene Rivera wrote:
There are two ways to handle it. One, you put revision numbers in with your externals, for instance here: http://code.icecube.wisc.edu/projects/daq/browser/meta-projects/pdaq/release... Or you can tag the release as a whole with a revision number as in here, http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/offline-... (note the extra properties at the bottom). In this second scheme, if a project does move, you'll see that things are broken at checkout and you'll have to specify that revision on the command line. Probably the first is better.
So either you've never renamed projects in your set up, or you have broken releases ;-)
Eh, no, not *necessarily*, but quite possibly. The further back you go the more haphazard our release process was, so I bet you could find something that is broken. These are physicists, not professionals. Anyhow, you'll notice that your sandbox layout proposal helps a lot with this, since tags are separate from branches/trunk. -t
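The first option mentioned above, putting revision numbers in with the externals, can be sketched as a text transformation on the svn:externals property. This is a minimal sketch assuming the "local_dir -rNNN URL" entry form; the function name is hypothetical and the revision number in the example is made up.

```python
def pin_externals(externals_text, revision):
    """Add "-r<revision>" to every unpinned "local_dir URL" entry."""
    pinned = []
    for line in externals_text.splitlines():
        fields = line.split()
        # Only plain two-field entries are pinned; entries that already
        # carry a revision, comments and blank lines are left untouched.
        if len(fields) == 2 and not fields[0].startswith("#"):
            local_dir, url = fields
            pinned.append(f"{local_dir} -r{revision} {url}")
        else:
            pinned.append(line)
    return "\n".join(pinned)


if __name__ == "__main__":
    externals = "ithon http://code.icecube.wisc.edu/svn/projects/ithon/releases/B01-10-00"
    # 1234 is a made-up revision number for illustration.
    print(pin_externals(externals, 1234))
```

Pinning every entry of a release to one revision in this way is what would make a later checkout of that release deterministic.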

troy d straszheim wrote:
OK, it's good to know one can do that.
Doesn't seem that useful without extra tools to check the properties.
Probably the first is better.
Definitely :-) It would allow for deterministic checkouts of historical releases, or any other snapshot collection. Specifically the test system could generate tags for the collections that get tested (assuming the Boost mainline uses a similar structure).
Given that the majority of Boost developers are likely using SVN for the first time that puts Boost also in the "not professionals" category in this case ;-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
I'm not sure what makes you say that. Subversion has been around for a while, and I know a lot of people actively using it. Most migrated to it from CVS, some from something else. FWIW, Stefan -- ...ich hab' noch einen Koffer in Berlin...
participants (7)
- David Abrahams
- Henrik Sundberg
- Michael Marcin
- Rene Rivera
- Stefan Seefeld
- troy d straszheim
- troy d. straszheim