In case of a broken repository, break glass: mitigating real-life Git goofs

23/05/2018

About 8 minutes of reading

Imagine that you’re busy with a project that has become the lifeblood of your company. The release is just a few days away and the schedule is tight. You work overtime or spend your 9-5 switching back and forth between a multitude of JIRA tickets, opening one pull request after another as new issues come and go.

Focusing on the task at hand between one coffee break and another is tedious, and once you finish and push that final patch to your remote, you stop for a second and get this tingle in your chest. "Is it just some random piece of code that was not supposed to be there? What, release branch? Oh, my gosh! So, what do I do now?"

Since many of my co-workers and I have found ourselves in such a situation, I feel obligated to address this issue. Thankfully, the good folks at Git made sure that undoing something we have already pushed to a remote is not impossible. In this article, I will explore the ways of recovering from your overtime mistakes as well as their potential drawbacks.

Depending on what your workflow looks like

Merge and rebase are examples of Git workflows that are used most often within corporate projects. For those who are not familiar with either of them:

The merge workflow assumes that your team uses one or more branches that derive from the trunk (often indirectly, i.e. having been branched out from development/sprint branches) and are then merged into the parent branch using the classic Git mechanism that creates a merge commit. The advantage is that you can clearly see when a given functionality was introduced into the parent branch, and the commit hashes are preserved after the merge. It is also easier for repository hosting and tracking tools (like Bitbucket) to make sense of the progress of your repository.
The drawback: your repository tree gets polluted with merge commits and the history graph (if you are into such things) becomes quite untidy.
In a rebase workflow, the features, after being branched out of their parents, are incorporated into the trunk seamlessly by replaying their commits on top of it. The trunk log stays linear and the history is easy to navigate. This, however, does not preserve commit hashes and, unless used in conjunction with a pull request tracking system, can easily make the repository difficult to maintain.
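The difference in how the two workflows treat commit hashes can be checked in a throwaway repository. The following is a minimal sketch, not a recommended setup: the branch names main, merged and rebased, the file names and the commit identity are all made up for illustration.

```shell
set -e
repo=$(mktemp -d)                           # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # hypothetical trunk name
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo base > base.txt && git add . && git commit -q -m "base"
git checkout -q -b feature-1
echo feature > feature.txt && git add . && git commit -q -m "feature work"
feature_hash=$(git rev-parse HEAD)

git checkout -q main
echo trunk > trunk.txt && git add . && git commit -q -m "trunk work"

# Merge workflow: a merge commit joins both lines, the feature commit keeps its hash.
git checkout -q -b merged main
git merge -q --no-ff --no-edit feature-1
merged_hash=$(git rev-parse HEAD^2)         # second parent = tip of feature-1

# Rebase workflow: the feature commit is replayed on top of the trunk and gets a new hash.
git checkout -q -b rebased feature-1
git rebase -q main
rebased_hash=$(git rev-parse HEAD)

echo "original: $feature_hash"
echo "merged:   $merged_hash"
echo "rebased:  $rebased_hash"
```

The first two hashes printed are identical, while the third differs because the replayed commit has a different parent.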

As it happens, there are many ways to fix your repository. These ways depend on your workflow and the kind of mistake. Let’s go through some of the most common errors that can take place. Without further ado…

I pushed a shiny, brand new feature without a pull request

As we all know, pull requests do matter – they help us avoid subtle bugs and it never hurts to have another pair of eyes to look at your code. You can of course do the review afterwards, but this would require some discipline to maintain. It is easier to stick to pull requests, really.

Working overtime or undercaffeinated often leads one to forget to create a feature branch before an actual feature is implemented. Rushing to push it and to test it leads to a mess.

Revert to the rescue

git revert

git revert is a powerful command that safely undoes code changes without rewriting the history. Even so, many developers hesitate to use it to its full potential.

Revert works by replaying the given commits in reverse, that is, by creating new commits that remove the lines that were added, add back whatever was removed and undo modifications. It differs from git reset in that a revert only adds commits on top of the branch, so it is a fast-forward change that involves no history rewriting. Thus, if somebody has already pulled the tainted branch or introduced some changes of their own, the fix reaches them through an ordinary pull.
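As a rough sketch of the difference, the script below builds a two-commit throwaway repository, reverts the second commit and then, on a scratch branch, resets it away instead. The file name, the branch name scratch and the commit identity are assumptions made for the demo.

```shell
set -e
repo=$(mktemp -d)                            # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo User"             # hypothetical identity
git config user.email "demo@example.com"

echo one > file.txt && git add . && git commit -q -m "good commit"
echo oops >> file.txt && git commit -q -am "bad commit"
bad=$(git rev-parse HEAD)

# Revert: a new commit that applies the inverse of the bad one.
git revert --no-edit "$bad" >/dev/null
after_revert=$(git rev-list --count HEAD)    # good, bad and the revert commit
content_after_revert=$(cat file.txt)

# The bad commit is still an ancestor of HEAD, so nothing was rewritten.
git merge-base --is-ancestor "$bad" HEAD && echo "history preserved"

# Reset, for contrast, on a scratch branch: the bad commit drops out of the branch.
git checkout -q -b scratch "$bad"
git reset -q --hard HEAD~1
after_reset=$(git rev-list --count HEAD)     # only the good commit is left

echo "commits after revert: $after_revert, after reset: $after_reset"
```

Both approaches leave file.txt with the same content, but only the reverted branch can be pushed without forcing.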

Revert takes a single ref, a range of refs or several arbitrary (unrelated) refs as arguments. A ref can be anything that resolves to a commit: a branch name (local or remote), a relative commit (using the tilde operator) or the commit hash itself. The sky is the limit.

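git revert 74e0028545d52b680d9ac59edd3aff0ac4   # a single commit
git revert 74e002..9839b2   # a range: everything after 74e002, up to and including 9839b2
git revert HEAD~3..HEAD~1   # a relative range (older..newer): reverts HEAD~2 and HEAD~1
git revert origin/develop~1 origin/develop~3 origin/develop~4   # several unrelated commits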

Reverting several changes at once

Normally, the commits in a range are reverted one by one, each with its own revert commit. If you wish to keep your log a little tidier and revert them all at once, use the -n (--no-commit) flag and commit the result with a custom message:

git revert -n HEAD~3..HEAD~1
git commit -m "Revert commits x - y"
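To see the effect on the log, here is a small sketch in a throwaway repository. Each of the four commits adds its own file (hypothetical names file1.txt to file4.txt) so that the reverts cannot conflict with each other.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

for n in 1 2 3 4; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "add file$n"
done
before=$(git rev-list --count HEAD)

# Stage the inverse of the 2nd and 3rd commit (HEAD~2 and HEAD~1) without committing...
git revert -n HEAD~3..HEAD~1
# ...then record everything as a single commit.
git commit -q -m "Revert commits 2 and 3"
after=$(git rev-list --count HEAD)

echo "commits before: $before, after: $after"
ls
```

Without -n the same range would add two revert commits instead of one.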

Undoing merges – wanted and unwanted

When using a merge flow, as opposed to a rebase flow, reverting changes becomes slightly more complicated: the programmer has to explicitly choose the parent of the merge commit that counts as the mainline. The revert then undoes whatever the merge brought in relative to that parent. This usually means choosing the branch that was merged into, such as a release branch, so that its own history stays intact and only the merged feature is undone.

Let’s suppose you base your feature off the development branch and you name it feature-1. You introduce some changes into your branch while somebody merges some of their changes into development. Both your branch and the parent branch undergo some changes and then you can proceed to merging.

After a while, it turns out that the feature has to be pulled out of the release your team is still working on, which requires reverting the changes you introduced earlier. Some time has passed since the merge and many more features have been introduced into the release.

A regular merge commit has two parents. To revert the merge while keeping the rest of your team’s changes intact, you tell Git which parent is the mainline with -m. This is usually the first parent, that is, the branch you merged into:

git revert 36bca8 -m 1
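The scenario can be replayed in a throwaway repository. This is a sketch under assumed names: the branches development and feature-1 follow the example above, while the file names and the commit identity are invented for the demo.

```shell
set -e
repo=$(mktemp -d)                               # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/development
git config user.name "Demo User"                # hypothetical identity
git config user.email "demo@example.com"

echo base > base.txt && git add . && git commit -q -m "base"

git checkout -q -b feature-1
echo feature > feature.txt && git add . && git commit -q -m "feature-1 work"

git checkout -q development
echo other > other.txt && git add . && git commit -q -m "somebody else's work"

git merge -q --no-ff --no-edit feature-1
merge_commit=$(git rev-parse HEAD)

# Parent 1 is the branch we merged into, parent 2 is the tip of feature-1.
git rev-parse "$merge_commit^1" "$merge_commit^2"

# Undo what the merge brought in relative to parent 1 (the mainline).
git revert --no-edit -m 1 "$merge_commit" >/dev/null

ls    # feature.txt is gone, base.txt and other.txt are still there
```

The merge commit itself stays in the history; the revert only adds a commit that removes what feature-1 contributed.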

However, once the team decides to reintroduce the change, you will be trying to merge in commits that are already part of the target branch’s history. Git will answer with a terse message that everything is already up to date. To get around that, we can switch to our feature branch and either amend the commit (if there is only one commit to reintroduce) or use an interactive rebase.

git rebase -i HEAD~n

Here n is the number of commits to reintroduce, each marked with the edit option in the rebase todo list. Amending the commits replays them without altering the changes or the commit messages, but gives them new hashes, so Git treats them as new commits and lets us merge them in as if they were fresh.
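For the single-commit case, the whole round trip might look like the sketch below. It runs in a throwaway repository with assumed names (development, feature-1, feature.txt). The dates of the original commits are pinned in the past only so that the amended commit is guaranteed a different hash even when the script finishes within one second.

```shell
set -e
repo=$(mktemp -d)                               # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/development
git config user.name "Demo User"                # hypothetical identity
git config user.email "demo@example.com"

# Pin the original commit dates (demo-only trick, see the note above).
export GIT_AUTHOR_DATE="2018-05-01 12:00:00 +0000" GIT_COMMITTER_DATE="2018-05-01 12:00:00 +0000"
echo base > base.txt && git add . && git commit -q -m "base"
git checkout -q -b feature-1
echo feature > feature.txt && git add . && git commit -q -m "feature-1 work"
git checkout -q development
git merge -q --no-ff --no-edit feature-1
git revert --no-edit -m 1 HEAD >/dev/null       # the feature is pulled out again
unset GIT_AUTHOR_DATE GIT_COMMITTER_DATE

# Merging the same commit again changes nothing: it is already an ancestor.
head_before=$(git rev-parse HEAD)
git merge feature-1
head_after=$(git rev-parse HEAD)

# Amend the feature commit so that it gets a new hash, then merge again.
git checkout -q feature-1
old_hash=$(git rev-parse HEAD)
git commit -q --amend --no-edit
new_hash=$(git rev-parse HEAD)
git checkout -q development
git merge -q --no-ff --no-edit feature-1

ls    # feature.txt is back
```

With several commits on the feature branch, the amend step is replaced by the interactive rebase described above.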

If you use rebasing, read this

Rebasing removes the burden of keeping track of merge parents, because there are no merges to begin with. The history is linear and, as such, reverts of unwanted code are straightforward to perform.

It may be tempting to use rewriting in this case, but keep in mind the golden rule of rewriting – unless you are absolutely sure that it’s only you and nobody else using that branch, don’t rewrite the history.

How (not) to use --force

Not everybody is born a Jedi, and programmers are no different. Force pushing allows us to overwrite the remote history even if our branch is not a descendant of what is already there, but it makes it dangerously easy to discard changes somebody else has made. A rule of thumb is to only force push to your own feature branches, and only when it is absolutely necessary.
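One safeguard worth knowing in this context is git push --force-with-lease, which refuses to overwrite the remote branch if it has moved since you last fetched it. The sketch below simulates this locally with a bare repository and two clones. The directory names alice and bob, the branch name and the identities are all hypothetical.

```shell
set -e
work=$(mktemp -d)                               # throwaway playground
cd "$work"
git init -q --bare remote.git

# Alice creates the branch and publishes it.
git clone -q remote.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/feature-1
git config user.name "Alice"                    # hypothetical identity
git config user.email "alice@example.com"
echo one > file.txt && git add . && git commit -q -m "first"
git push -q origin feature-1
cd ..

# Bob clones the branch and pushes a commit of his own.
git clone -q -b feature-1 remote.git bob
cd bob
git config user.name "Bob"                      # hypothetical identity
git config user.email "bob@example.com"
echo bob >> file.txt && git commit -q -am "bob's work"
git push -q origin feature-1
bob_hash=$(git rev-parse HEAD)
cd ..

# Alice has not fetched Bob's commit. She rewrites her local history...
cd alice
git commit -q --amend -m "first (reworded)"

# ...and tries to overwrite the remote. The lease check fails because the remote
# branch no longer matches what Alice last saw in origin/feature-1.
if git push -q --force-with-lease origin feature-1 2>/dev/null; then
  lease_result="overwritten"
else
  lease_result="rejected"
fi
remote_tip=$(git --git-dir=../remote.git rev-parse feature-1)
echo "push was $lease_result"
```

A plain --force in the same spot would have silently replaced Bob's commit on the remote.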

When is it fine to rewrite Git history?

Put simply, as long as we haven’t published our branch for somebody else to use. As long as our changes stay local, we are free to do as we please, provided we take responsibility for the data we manipulate. Some commands, such as git reset --hard, can lead to data loss (which happens quite often, as one can infer from the multitude of Stack Overflow posts on the topic). It is therefore always wise to create a backup branch (or otherwise note down the ref) before attempting such operations.
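A backup branch costs one command and makes a hard reset reversible for committed work. Here is a minimal sketch in a throwaway repository. The branch name backup and the file name are arbitrary choices for the demo.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo one > file.txt && git add . && git commit -q -m "first"
echo two >> file.txt && git commit -q -am "second"
tip=$(git rev-parse HEAD)

git branch backup                        # remember where we were
git reset -q --hard HEAD~1               # "second" disappears from the current branch
after_reset=$(git rev-parse HEAD)

git reset -q --hard backup               # restore the previous state from the backup
restored=$(git rev-parse HEAD)

echo "tip: $tip"
echo "restored: $restored"
```

Note that this only protects commits: uncommitted changes in the working tree are discarded by git reset --hard and no branch will bring them back.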

Goof Prevention Patrol

Other than using software solutions, it’s best to enforce the team discipline yourself – make fellow developers responsible for their mistakes and let them learn from practice.
Some points worth going over are:

protecting main/release branches from accidental pushes using restrictions and rules
establishing a naming convention for branches and commit messages
using a CI/CD system for additional monitoring – this may help detect repository inconsistencies

Also, many Git tools and providers, such as Bitbucket, allow one to specify branch restrictions, such as not allowing developers to push to a main or release branch without a pull request. This can be extended to matching multiple branches with a glob expression or a regex, which is especially handy if your team names its branches according to a specific pattern.

Summary and further Git reads

Mistakes were made, are made and will be. Thankfully, Git is smart enough to include mechanisms for undoing them for the developers brave enough to use them.
Hopefully this article resolved some common misunderstandings about Revert and when to use it. Should that not prove enough, here are some helpful links: