
Julian Gonggrijp <j.gonggrijp@gmail.com> writes:
What is the meaning of a commit?
One possible interpretation is that a commit is a snapshot of your project. A snapshot is something that you store for future reference. Because in a sense it's a form of documentation, one will take care to submit well-crafted commits that include enough useful changes to license a new snapshot. In principle, every commit is assumed to introduce some form of progress compared to the previous. Making changes to such a history of snapshots is almost necessarily a form of fraud.
This is the kind of mental model of a commit that is stimulated by svn. You can see it from the terminology: making a commit causes the repository to move to the next revision number.
Another possible interpretation is that a commit represents a unit of work. This tends to favour many small commits over few big commits. Since anything you do before you're sure that it's the right thing is also work, shabby commits are part of the deal. The consequence is that it must be very cheap to isolate any messy state in temporary side tracks. Now the VCS is not only a collection of snapshots, but also a tool to manage your recent pieces of work before you finally commit* to some of them.
This kind of mental model is stimulated by git. It explains why git users make a fuss about amending, rebasing and efficient branching and merging.
I'm afraid I don't agree with this. The version control systems that came after CVS switched to storing repository-wide snapshots. CVS was a collection of RCS files and so completely file-centric. SVN, Mercurial, Git, ... are all repository-centric. Conceptually they work by storing a series (linear or not) of working copy snapshots. Darcs is a possible exception to this: I think it might fit your unit-of-work model more closely, since it models the repository state as the result of applying a number of patches (units-of-work).
There is no point in arguing that one mental model is superior to the other until you fully grasp both of them. I urge anyone who feels tempted to make agitated remarks to let this sink in for at least a few hours.
That said, I'm confident enough to think that I can give two solid arguments why the units-of-work model is ultimately more productive.
I also prefer that people make five units-of-work (commits) instead of one huge one. Smaller commits should be more self-contained and will be easier to review. But I much prefer that the project is in a working state after every single unit-of-work. This is because I think of each commit as a snapshot that might be checked out alone. This commonly happens when using the bisect command: the tool (Git or Mercurial) will assist you in a binary search for a faulty commit. It updates to the middle of the unchecked range and you have to build and test that commit. When doing that, it's annoying if you run into commits that fail because of problems other than the one you're investigating. Such false positives make automated bisecting impossible. Each project must make up its own policy here.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://www.aragost.com/mercurial/
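
P.S. To illustrate the bisect point, here is a minimal Python sketch of a binary search over a linear history. It is not how Git or Mercurial implement bisect (they work on a commit graph and let you skip untestable commits); the names first_bad, commits, has_bug and build_or_bug_fails are hypothetical and exist only for this example. The sketch assumes the first commit is known good and the last is known bad. It shows how one commit that fails for an unrelated reason can send the search to the wrong answer.

```python
from typing import Callable, List


def first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2      # jump to the middle of the unchecked range
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: the bug under investigation arrives in c5.
commits = [f"c{i}" for i in range(8)]


def has_bug(commit: str) -> bool:
    """Test that fails only because of the bug being hunted."""
    return int(commit[1:]) >= 5


def build_or_bug_fails(commit: str) -> bool:
    """Same test, but c3 is also broken for an unrelated reason."""
    return commit == "c3" or has_bug(commit)


clean_result = first_bad(commits, has_bug)             # finds c5
noisy_result = first_bad(commits, build_or_bug_fails)  # blames c3 instead
```

With every commit in a working state the search lands on c5. With the unrelated breakage in c3 it reports c3, which is the false positive described above.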