Rebase: what it is and why you should use it

Abstract

Git is one of the most popular Version Control Systems (VCS) on the market right now. It is a powerful tool that can increase an IT team’s efficiency in many ways, although its distributed nature sometimes makes it hard to wrap your head around. This article explains the often misunderstood concept of rebasing. It gives you a technical overview of how rebase works and shows how it can be used within a complete workflow, so that you get a clearer picture of when and how to use it.

Introduction

When you start using Git, you will almost certainly use the merge command to bring the changes of other branches into your branch. Merge combines the work done on two branches since their histories diverged. It records the result in an additional commit, the so-called merge commit. Merging is mostly safe: you merge the respective changes into your branch and, if necessary, resolve the merge conflicts. However, this operation is responsible for some ugly commit messages. Every time a merge takes place, Git creates a merge commit with a generated message. Here are two examples of merge commit messages:

Merge branch ‘master’ into feature/feature#123
Merge branch ‘feature/feature#123’ into feature/feature#123

These messages record that changes have been merged into your branch, but they add no other value. Git creates another such merge commit every time you merge changes that a co-worker has pushed to your feature branch. It does the same every time you merge the latest changes from master into your local working copy. After many merges, these commits are everywhere and clutter your commit history.
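To see where such messages come from, here is a minimal sketch you can run in a throwaway directory. It assumes a POSIX shell and Git on the PATH. The repository, the file names, and the commit helper are hypothetical and exist only for this illustration. The script lets master and a feature branch diverge and then merges master into the feature branch with a plain git merge.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so no real repository is touched.
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master   # call the first branch "master" regardless of init.defaultBranch
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

commit base
git checkout -q -b "feature/feature#123"
commit feature

# Meanwhile master moves on, so the two histories diverge.
git checkout -q master
commit hotfix

# Bring the latest master changes into the feature branch with a plain merge.
git checkout -q "feature/feature#123"
git merge -q --no-edit master

# The branch tip is now an extra merge commit with a generated message.
git log -1 --format=%s
git log --oneline --graph
```

The first git log line prints Merge branch 'master' into feature/feature#123. The graph below it shows the junction that every such merge adds.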
To avoid this and produce a cleaner, more readable commit history, you can use another operation called rebase.
Rebase is essentially the process of moving a branch to a new base commit: all your commits are moved on top of the specified branch. Internally, however, Git does not really move the commits. It creates new commits with the same changes and applies them on top of the new base, which is why rebase rewrites the project history. The following figures show the main steps of rebasing a feature branch (branch1) onto the master branch.

Rebase branch1 onto master branch

Figure 1: Initial State

This is the initial state of the repository. There is one feature branch (branch1) that branches off from the master branch at commit ‘C1’.

Figure 2: Copy commit into temporary area

When you rebase your branch onto the master branch, the rebase operation first moves commit ‘C2’ into a temporary area.

Figure 3: branch1 points to head of master

After all commits have been moved to the temporary area, Git moves the branch1 pointer to the head of master.

Figure 4: Apply

Finally, Git applies the commits from the temporary area on top of branch1, which now points to the head of master. Commit C2 is applied with the new commit ID C2′, and branch1 points to that commit. As you can see, applying the commits from the temporary area on top of master creates new commit IDs. This is what we mean when we say that rebase rewrites the commit history.
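The four steps can be reproduced with a short script. This is a sketch that runs in a throwaway directory and assumes a POSIX shell and Git on the PATH. The commit names mirror the figures, but commit C3 on master is an assumption: the figures do not name the commit master gained after branch1 was created.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

# Figure 1: branch1 leaves master at C1; master later gains C3 (assumed).
commit C0
commit C1
git checkout -q -b branch1
commit C2
old_c2=$(git rev-parse HEAD)
git checkout -q master
commit C3

# Figures 2 to 4: set C2 aside, move branch1 to master, replay C2 as C2'.
git checkout -q branch1
git rebase -q master
new_c2=$(git rev-parse HEAD)

echo "C2  before rebase: $old_c2"
echo "C2' after rebase:  $new_c2"
git log --oneline --graph branch1
```

The two printed IDs differ even though the commit message and the change are identical. A commit ID also covers the commit's parent, and C2′ has a different parent than C2.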
The result of rebasing is a linear commit history. All changes from master are available in your branch, and the history is much easier to read. To illustrate this, here are two histories containing the same commits, one created with merge and the other with rebase.

Figure 5: Commit history with merge commits

Figure 6: Commit history after rebase

Figure 5 shows a commit history with three merge commits. The graph has a junction for every merge commit, which makes the history much harder to read. Remember that this is a simple illustration: in a project with five or six developers, there will be many more commits and merges.
Figure 6, however, shows a linear, readable history, because instead of merging, the commits of your feature branch are applied on top of the commits of the master branch. All commits up to the one with the message “master 3 ahead add f.txt” are part of both master and your feature branch. After that, there are three additional commits that exist only in your feature branch.

Push to remote after rebasing

With Git, you typically share changes with your co-workers via a remote repository. Rebasing can cause problems when you push your rebased branch to the remote. Figure 7 shows the history of a branch called “featureBranch”, with its commit IDs, before it is rebased onto master. After featureBranch has been rebased onto master, the commit “add a.txt” with ID 650c596 has been rewritten as 5c4e8c4 (Figure 8). It has the same content and the same commit message, but a new commit ID.

Figure 7: Commit history of featureBranch

Figure 8: Commit history after rebase

Why can this be a problem?

Rebasing your local feature branch onto the remote feature branch to pick up other developers’ changes is not a problem. But be careful when rebasing onto master, as shown in Figures 7 and 8. By default, Git only accepts a push if it is a fast-forward of the remote branch. In other words, every commit already on the remote branch must also be in your local branch, possibly followed by some new commits at the end.
If you have already pushed your changes, the remote repository contains your commits with their original commit IDs. After rebasing, your local commits have new IDs, so the commit IDs on the remote no longer match your local ones and Git cannot fast-forward. Figure 9 illustrates this case, based on the previous example. When you push the rebased commits, the remote branch still contains 650c596, whereas your local branch has 5c4e8c4 in its place. Git cannot tell whether 5c4e8c4 or 650c596 should follow a6772d4, so it rejects the push instead of silently discarding 650c596.

Figure 9: No fast forward after rebasing
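The rejection can be reproduced locally with a bare repository standing in for the shared remote. This is a sketch that assumes a POSIX shell and Git on the PATH. All names besides featureBranch and the commit message “add a.txt” are hypothetical, and the commit IDs will differ from those in the figures.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
root=$(pwd)
# A local bare repository stands in for the shared remote.
git init -q --bare remote.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$root/remote.git"

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

commit base
git push -q origin master

# featureBranch with "add a.txt" is pushed once (Figure 7).
git checkout -q -b featureBranch
commit a
git push -q -u origin featureBranch
old_id=$(git rev-parse HEAD)

# master moves on, and featureBranch is rebased onto it (Figure 8).
git checkout -q master
commit m
git push -q origin master
git checkout -q featureBranch
git rebase -q master
new_id=$(git rev-parse HEAD)

# The remote branch still ends in old_id, which is no longer part of the
# local history, so a plain push is not a fast-forward.
if git push -q origin featureBranch 2>/dev/null; then
  result=accepted
else
  result=rejected
fi
echo "add a.txt: $old_id -> $new_id, plain push $result"
```

Git reports this as a non-fast-forward rejection. Because nothing was pushed, origin/featureBranch still points to the old commit afterwards.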

How can you solve that problem?

One solution is to use “git push --force”. The --force option pushes your local state to the remote and overwrites the remote branch, discarding any commits that exist only there. This can also be a problem when you work together with a co-worker: if they pushed changes in the meantime, those changes are lost. A better solution is to push with the option “--force-with-lease”. It overwrites the remote branch only if the remote branch still points to the same commit as your local remote-tracking branch (for example origin/featureBranch). In other words, the push only succeeds if nobody has pushed to the branch since your remote-tracking branch was last updated. This protects you from overwriting other developers’ work that you have not yet seen.
There is one drawback, however. Suppose you fetch the remote changes into your local remote-tracking branch but do not integrate them into your local branch. In that case, “git push --force-with-lease” succeeds and overwrites those remote changes. This happens because --force-with-lease only compares the remote-tracking branch with the remote branch. If they differ, the command fails; if they are equal, your changes are pushed. So be careful: in this situation, it is generally wise to talk to the other developers before you push your rebased branch.
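Both the protection and the drawback can be observed with two local clones sharing one bare repository. The clones are called alice and bob here; these names are hypothetical. This is a sketch that assumes a POSIX shell and Git on the PATH. Bob pushes a commit to featureBranch. Alice then rebases and pushes with --force-with-lease twice: once without fetching, and once after a plain git fetch.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
root=$(pwd)
git init -q --bare remote.git

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

# Alice starts the project and shares featureBranch.
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name alice
git config user.email alice@example.com
git remote add origin "$root/remote.git"
commit base
git push -q origin master
git checkout -q -b featureBranch
commit a
git push -q -u origin featureBranch

# Bob clones, adds a commit to featureBranch, and pushes it.
cd "$root"
git clone -q -b featureBranch "$root/remote.git" bob && cd bob
git config user.name bob
git config user.email bob@example.com
commit b
git push -q origin featureBranch
bob_commit=$(git rev-parse HEAD)

# Alice, unaware of Bob's push, rebases featureBranch onto a newer master.
cd "$root/alice"
git checkout -q master
commit m
git push -q origin master
git checkout -q featureBranch
git rebase -q master

# Case 1: origin/featureBranch is stale, so the lease check fails.
if git push -q --force-with-lease origin featureBranch 2>/dev/null; then
  before_fetch=accepted
else
  before_fetch=rejected
fi

# Case 2: a plain fetch moves origin/featureBranch to Bob's commit without
# integrating it locally, so the lease matches and the push goes through.
git fetch -q origin
if git push -q --force-with-lease origin featureBranch 2>/dev/null; then
  after_fetch=accepted
else
  after_fetch=rejected
fi

# Is Bob's commit still part of the published featureBranch?
if git merge-base --is-ancestor "$bob_commit" origin/featureBranch; then
  bob=kept
else
  bob=lost
fi
echo "before fetch: $before_fetch, after fetch: $after_fetch, Bob's commit: $bob"
```

The first push is refused because Alice's origin/featureBranch is stale. After git fetch, the lease matches, and Bob's commit disappears from the remote branch even though Alice never integrated it. Git 2.30 and later also offer --force-if-includes. Combined with --force-with-lease, it additionally checks that the remote-tracking tip has been integrated locally. Whether it is available depends on your Git version.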
The following section shows a workflow that uses the two features just described: rebase and push --force-with-lease.

Workflow

Figure 10: Git workflow using rebase and push --force-with-lease

Steps 1 to 3 should be familiar: you create a new branch and keep working until you commit your changes. In the meantime, your co-workers work on the same branch, which is common when a frontend and a backend developer work on the same feature. To get their changes, you use “git pull --rebase” (Step 4). This rebases your local commits onto the commits of the remote branch, which contain your co-workers’ changes, and results in a clean history.
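Step 4 can be sketched with two clones of one local bare repository, one per developer. This sketch assumes a POSIX shell and Git on the PATH; the names alice, bob, frontend, and backend are illustrative only. Bob pushes first, then Alice runs git pull --rebase before pushing.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
root=$(pwd)
git init -q --bare remote.git

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

# Step 1: Alice creates featureBranch and shares it.
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name alice
git config user.email alice@example.com
git remote add origin "$root/remote.git"
commit base
git push -q origin master
git checkout -q -b featureBranch
git push -q -u origin featureBranch

# Bob works on the same branch and pushes first.
cd "$root"
git clone -q -b featureBranch "$root/remote.git" bob && cd bob
git config user.name bob
git config user.email bob@example.com
commit backend
git push -q origin featureBranch

# Steps 2 and 3: meanwhile Alice commits her own work locally.
cd "$root/alice"
commit frontend

# Step 4: replay Alice's commit on top of Bob's instead of merging.
git pull -q --rebase
git log --oneline --graph

# The rebased branch is a fast-forward of the remote, so a plain push works.
git push -q origin featureBranch
```

Afterwards, Alice's commit sits directly on top of Bob's and there is no merge commit. The final plain git push succeeds because it is a fast-forward.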
You repeat these steps until you decide to push your changes to the remote. You can do this with a plain git push or, if a fast-forward is not possible, with “git push --force-with-lease” (Step 6).
If you decide to pull the changes from master into your branch, use “git pull --rebase origin master”. This brings in all new commits from the master branch and reapplies all of your commits on top of them, as described above. Because this rewrites your commit history, be careful, especially when you work with other developers. It is good practice to inform them first (Step 5). After that, you continue with Step 6.
Steps 2 to 6 can be repeated until your feature is finished. You then create a pull request (Step 7), and another developer reviews your code. If there are no issues, your branch can be merged into the master branch (Step 8). Note that your branch should be merged into master, not rebased onto it. If you used rebase here, you would have no way of telling on which branch and at what point in history your features were developed; there would be no feature branches in your history at all. Altogether, this workflow can help you and your team work efficiently and produce a clean history. To make it work, all team members have to understand every step of the workflow.
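The remaining steps for a single developer can be sketched the same way. This sketch assumes a POSIX shell and Git on the PATH. Someone else's work on master is simulated from the same clone, and the branch and file names are made up. Step 8 is usually performed by your pull request tool. Here the sketch uses git merge --no-ff instead, which is an assumption: after a rebase, a plain merge would simply fast-forward master and leave no visible trace of the feature branch.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
root=$(pwd)
git init -q --bare remote.git
git init -q dev && cd dev
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$root/remote.git"

# Hypothetical helper: write <name>.txt and commit it as "add <name>.txt".
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "add $1.txt"; }

commit base
git push -q origin master

# Steps 1 to 3: a feature branch that has already been pushed once.
git checkout -q -b featureBranch
commit feature
git push -q -u origin featureBranch

# Someone else's work lands on master (simulated from this clone).
git checkout -q master
commit hotfix
git push -q origin master
git checkout -q featureBranch

# Pull master into the feature branch by replaying the feature commits on top of it.
git pull -q --rebase origin master

# Step 5 happens outside Git: tell your co-workers the branch is being rewritten.
# Step 6: a plain push would now be rejected, so push with a lease.
git push -q --force-with-lease origin featureBranch

# Step 8, once the pull request (Step 7) is approved: merge, do not rebase.
# --no-ff forces a merge commit even though a fast-forward would be possible.
git checkout -q master
git merge -q --no-ff --no-edit featureBranch
git push -q origin master
git log --oneline --graph
```

In the final graph, master ends in a merge commit whose second parent is the tip of featureBranch. The feature commits themselves sit in a straight line on top of the latest master.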

Conclusion

In the end, both operations have their place. You can use Git rebase when you develop a feature branch together with a co-worker. Coordinate with your co-worker when you rebase your branch onto the master branch, so that no commits are lost. The result is a readable commit history. Merge, on the other hand, is mostly safe and never rewrites your existing commit history. Also use merge when you bring your feature branches into master, so that you can see where your commits come from. On top of that, using these features in a structured and well-known workflow will help your team become more productive.
This article presents our ideas about how to use merge and rebase in a productive workflow, and we look forward to feedback from other developers. Don’t hesitate to comment on this blog entry; we are interested in your ideas and thoughts on this topic.
