I frequently use the idea of “coherence” to guide how I group changes into commits and break features down for implementation. I’ve never attempt to flesh that idea out or describe it, so here’s my attempt.
We take our version control practices seriously at Super Good
. We’re consultants who often work on legacy projects without access to the original developers. Good commit histories go a long way to helping us understand the historical context around the code in the application.
Most projects have pretty useless histories. It’s not uncommon to see 3000 line commits with messages that simply read “update checkout”. We do future developers (and often ourselves) a favour by making our commits small and tightly focused and writing descriptive commit messages.
There are a few great blog posts about how to write git commit messages. My favourite is the one by Chris Beams
. It offers excellent advice for writing messages that provide context on the changes and match the conventions that git’s own automated messages follow. I think it’s very sensible.
New members on my team are pointed to that article, but it only helps with writing the commit messages. It doesn’t help you decide what changes do and don’t belong in a commit.
The common advice is “make small commits”. One rule of thumb I’ve heard is that if you feel the need to use the word “and” in the commit summary, it should be two commits. That’s a fine heuristic, but it’s trivial to come up with a change that would require “and” in its summary, but isn’t reasonable to break up.
I try to make my commits coherent and also ensure that those commits leave the project in a coherent state. “Coherent” has two main definitions; it can mean “united as or forming a whole” or “logical and consistent”. I use it here to mean both of these ideas.
There’s not one true way to evaluate if a change is logical and consistent, but we can talk about some concrete things to look at.
An easy example is the tests. Do the tests pass for this commit? If they don’t, then the production code isn’t consistent with what the tests specify. A whole change updates the tests to reflect the changes to the system.
That doesn’t mean that the tests need to change, though. A commit that refactors part of the system is still logical and consistent as long as it completes the refactoring and leaves the code in a working state.
If you introduce a class, but don’t use it anywhere yet, that’s not really complete. Wether the class is part of a refactoring or new functionality, it’s only a fragment of that broader change. You’ve introduced a new concept into the system, but not updated the system to reflect the existence of this new concept.
If I’m a reviewer, then I want to see the new class in the context of how it will be used, especially if I’m being asked to provide design feedback. If I’m digging through version control to understand why the class was added, then I need to see how it was being used. If I ever needed to revert the change, then I would never revert the addition of this class without the changes that actually use it. They are the same change.
A common one I see in Rails is in the addition of new models. I’ve seen people split up some combination of the migration, schema update, new model classes, and even addition of the associations into separate commits.
Putting the migration in a commit without the schema changes puts the repository into an inconsistent state. There are migrations that change the schema, but the schema doesn’t reflect that change. The other combinations introduce incompleteness. If a commit leaves the repository in a state where it looks like someone forgot to do something (like add the model class to go with a database table, or the associations to go with a model), then it’s probably not a complete commit.
One trouble I see people run into is how to go about adding related functionality and large features. I want to be clear that I’m not saying functionality should be added in huge swaths. Having incomplete features doesn’t put the project into an incomplete or inconsistent state. All software is in a constant state of change. Features are added, removed, and altered every day. If the software were truly “complete”, then you wouldn’t care about any of this; you’d be stopping work on it.
Software is built incrementally. The kind of guidance I’m providing here is meant to support this. Feature slicing is a really deep topic that I can’t make fit into this issue, but just understand that I’m saying the changes should be complete and leave the project in a consistent state. I am not saying that the software’s functionality or the feature you’re working on needs to be completed in one commit. That would be absurd.
A commit should be a whole change to the application that takes it from one consistent state to another consistent state. It doesn’t need to be a change that can be externally observed; it can be a simple refactor. It doesn’t need to be a whole feature from a user’s perspective, but should do everything that it sets out to do. If you use acceptance tests, they can make it clear what exactly that is.
I’m sympathetic to people who advocate for squashing pull requests when merging. I see a frequent emphasis on small commits without any guidance on what might make a commit too small. Making your commits too small makes them less useful as reviewable units and for VCS operations like browsing and reverting. Squashing can be a heavy handed solution to that.
Small commits are good, but if you’re looking to get more value from how you use your VCS, size isn’t the goal. Commits that include everything which makes them complete and coherent, but nothing more, make the most useful version control histories.