Time for revision control
An often-heard expression about backups is that people don’t pay attention to them until it is too late. Version control systems (systems that allow you to store different versions of files within a project) can be seen as backup systems for your projects, and I hope you all use such a system. If this is not the case, keep reading to make sure you don’t fall into the trap of ignoring backups until it is too late!
When looking at version control software, there seem to be two major camps: Git and SVN. In this article, I will give a brief history of both systems, an overview of their features, and a look at which system is best geared towards which situation.
The history of SVN
It is almost impossible to detail the history of SVN without mentioning the Concurrent Versions System (CVS), as SVN can be seen as an improved version of it. Dick Grune, a Dutch computer scientist who was born in Enschede, developed CVS in 1985. He was working on a C compiler called the Amsterdam Compiler Kit together with two of his students. As the schedules of the three project members varied greatly, it made sense to use a tool that allowed each of them to upload different versions of the project files independently. Unix already had a tool to manage different versions of single files, called RCS. The problem with using RCS for projects is that you have to manage each file in the project individually. According to Dick Grune “[when using RCS for projects] you are responsible for the administration of the files, and that is what computers are for”.
To solve this problem, the tool that would later become CVS was created. It started out as a package containing 31 shell scripts, with each script representing one way in which two files can be related to each other (e.g. file A is newer than file B). In an interview, Dick Grune made it seem very easy to get the system to work: “I programmed all 31 possibilities, and then it worked”. Two years after Grune created the basis, Brian Berliner started to work on an improved version of the tool written in C, and in 1990 version 1.0 of CVS was released to the open source community.
CVS was the de facto standard version control system in the open source community for several years, but CollabNet, a company that used CVS internally, decided in 2000 that a successor to CVS had to be created. The limitations of the software were too big to warrant improving the original CVS. Therefore they decided to write a new system from scratch instead, retaining the basic principles behind CVS but fixing some bugs in the system and adding some new features in the process. Karl Fogel, the author of Open Source Development with CVS, was contacted by CollabNet in February, and was asked if he would like to work on the project. As luck would have it, he was already discussing a new version of the system with his friend Jim Blandy. Their frustration with CVS as it was at the time had led them to think about better ways to manage versioned data. Jim had even come up with a name (Subversion) and a data store model for the new system. Of course Karl decided to help work on the CVS successor, and after 14 months of coding, Subversion was used to manage versions of Subversion itself.
The history of Git
Much like SVN, Git was also born as new software that retains the basic principles of its ancestor, but unlike SVN, Git was created because the developers were forced to do so, instead of just wanting to create a better version. The developers I am talking about are the developers of the Linux kernel. From 2002 until 2005 the Linux kernel development community used BitKeeper, a proprietary system, to manage different versions of their files. In 2005, however, the company that developed BitKeeper accused the kernel developers of reverse-engineering the tool, and decided to withdraw free usage. Instead of paying for continued usage, Linus Torvalds (one of the driving forces behind the development of the Linux kernel) and others decided to develop a version control system from scratch. Some of his requirements were the ability to handle large projects like the Linux kernel efficiently, a focus on high speed and support for a distributed BitKeeper-like workflow. There was another interesting requirement that laid the foundation for the different version control software camps right from the start: take CVS as an example of what not to do; if in doubt, make the exact opposite decision. The new system was named Git, British slang that can be translated as “unpleasant person”. Torvalds said about this name that “I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’.” Development of Git began on 3 April 2005, and it was used to manage versions of itself from 7 April 2005 onwards.
How SVN works
As you now have a basic idea of the history of both SVN and Git, it is time to delve into the workings of both revision control systems, in order to be able to make a proper comparison between the two solutions. Let’s start out by talking about how SVN works.
SVN is a centralized version control system, which means that for each project you manage with SVN, one central location (the server) stores all project files. The central place in which all SVN-managed files belonging to one project are stored is called a repository. Unlike a standard folder on a file server, an SVN repository stores multiple versions of the same file, allowing you to compare changes made in one version of a file to an older version of the same file. Because of this feature, an SVN repository is not a standard folder that can be browsed by every program. Instead, you have to manage repositories through the tools SVN provides.
Next to a central location for all project files, each user has a local copy in which he or she is free to edit the files as he or she sees fit, until the user wants the changes to be communicated to, or committed to, the repository. This local copy is called the working copy of the user, and can be accessed by all programs without them being aware of SVN. You can even have several working copies on the same computer, for example allowing you to experiment with a new feature in one working copy, while fixing bugs in the other.
As you can imagine, problems arise when multiple users are working on the same file and want to commit their changes to the repository. There are two common solutions to this problem. The first solution is called the “lock-modify-unlock” model. Imagine that there are two users working on the same file, Harry and Sally. Harry starts editing the file first. When this approach is used, Harry locks the file before he edits it, meaning that Sally cannot edit this file until Harry decides to unlock it again. This model is illustrated in Figure 1.
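The locking behaviour described above can be sketched in a few lines of Python. This is only a toy, in-memory illustration: the class LockingRepository, its methods and the file name report.txt are made up for this example and do not correspond to any real version control tool.

```python
class LockError(Exception):
    """Raised when a user touches a file that someone else has locked."""


class LockingRepository:
    """Toy repository using the lock-modify-unlock model (hypothetical API)."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> current content
        self.locks = {}           # file name -> user holding the lock

    def lock(self, user, name):
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise LockError(f"{name} is locked by {holder}")
        self.locks[name] = user

    def write(self, user, name, content):
        # Only the lock holder may change the file.
        if self.locks.get(name) != user:
            raise LockError(f"{user} does not hold the lock on {name}")
        self.files[name] = content

    def unlock(self, user, name):
        if self.locks.get(name) == user:
            del self.locks[name]


repo = LockingRepository({"report.txt": "draft"})
repo.lock("harry", "report.txt")

try:
    repo.lock("sally", "report.txt")
    sally_blocked = False
except LockError:
    sally_blocked = True  # Sally has to wait until Harry unlocks the file

repo.write("harry", "report.txt", "draft, edited by Harry")
repo.unlock("harry", "report.txt")
repo.lock("sally", "report.txt")  # succeeds now that the lock is free
```

The sketch also shows the main drawback of this model: as long as Harry holds the lock, Sally cannot do anything with the file, even if her change would not touch the same part of it.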
Another solution is called the “copy-modify-merge” model, and is used by SVN. Let’s look at the situation in which Harry and Sally are both editing a file again, but now with a system that uses the copy-modify-merge model. When Harry starts editing the file, he first makes a copy of the file in the repository and stores it in his working directory. Shortly afterwards, Sally makes another copy of the file and stores that in her working directory. After Sally has made the changes that she wanted to make to the file, she commits it to the repository again. After a short while, Harry has finished editing too and wants to commit his file as well. When he tries to do so, however, he receives an error stating that his working directory is out of date, because Sally has just committed a new version. To solve this, Harry first has to update his working copy, analyze the conflicts in the file he and Sally were editing, make the necessary changes to the file to combine the two alterations (a process known as merging), and finally commit the file. The next time Sally updates her working directory, she has a copy of the file that not only contains her additions, but those of Harry as well. Using this copy-modify-merge solution means that users can work simultaneously on the same file. It does, however, require the users to discuss the changes they have made to files with each other in order to successfully merge conflicted files.
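The Harry and Sally scenario can be replayed with a small Python sketch. It assumes a repository that holds a single file as a list of lines and that rejects any commit based on an older revision. The names CentralRepository, WorkingCopy and merge_by_hand are hypothetical, and the merge function is only a stand-in for the manual merging Harry would do; real SVN behaves in a more fine-grained way.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision than the newest one."""


class WorkingCopy:
    def __init__(self, repo, base_revision, lines):
        self.repo = repo
        self.base_revision = base_revision  # revision this copy was taken from
        self.lines = lines

    def update(self, merge):
        """Fetch the newest revision and combine it with the local lines."""
        latest = self.repo.revisions[-1]
        self.lines = list(merge(self.lines, latest))
        self.base_revision = self.repo.head


class CentralRepository:
    def __init__(self, lines):
        self.revisions = [list(lines)]  # list index = revision number

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self):
        return WorkingCopy(self, self.head, list(self.revisions[-1]))

    def commit(self, working_copy):
        if working_copy.base_revision != self.head:
            raise OutOfDateError("working copy is out of date, update first")
        self.revisions.append(list(working_copy.lines))
        working_copy.base_revision = self.head


def merge_by_hand(mine, theirs):
    # Stand-in for a manual merge: keep their lines, then add mine that are missing.
    return theirs + [line for line in mine if line not in theirs]


repo = CentralRepository(["intro"])
harry = repo.checkout()
sally = repo.checkout()

sally.lines.append("Sally's paragraph")
repo.commit(sally)  # accepted: Sally's copy is based on the newest revision

harry.lines.append("Harry's paragraph")
try:
    repo.commit(harry)
    harry_rejected = False
except OutOfDateError:
    harry_rejected = True  # Sally committed in the meantime

harry.update(merge_by_hand)  # update, then merge the two alterations
repo.commit(harry)
sally.update(lambda mine, theirs: theirs)  # Sally has no local changes left
```

After the last line, both working copies and the newest revision in the repository contain the introduction followed by both paragraphs.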
How Git works
Now that you know how SVN works, you can forget a lot of what you have just learned, because Git works radically differently. In contrast to SVN, Git is a distributed version control system. This means that there is no single location that stores all versions of each file. Instead, every user has his/her own local repository. Next to that, Git makes use of a staging area, where files reside until they are committed. It is an area that sits between your working copy and your local repository. When you want to work with others on a Git-managed project, you have to add changed files to your staging area and commit those changes to your local repository first, and after that you can push these changes to a remote (shared) repository, so others can access the changes you made. Because of those extra (file) areas, a typical Git workflow, as illustrated in Figure 3, is as follows:
- You modify files in your working directory.
- You stage the files, which adds copies of them to the staging area.
- You do a commit, which stores the staged files as a new version in your Git directory (your local repository).
- When you are ready to share your changed files with the other developers on your team, you do a push, which copies the changes in your local repository to the remote repository.
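The four steps above can be mimicked with a toy model in Python. This is a sketch of the idea only, not of Git's real internals or command-line interface: the class ToyGit and its methods add, commit and push are hypothetical names, and everything lives in memory.

```python
class ToyGit:
    """Toy model of the working directory, staging area, local and remote repository."""

    def __init__(self):
        self.working_dir = {}  # file name -> content you are editing
        self.staging = {}      # file name -> content staged for the next commit
        self.commits = []      # local repository: list of (message, snapshot)
        self.remote = []       # commits already pushed to the shared repository

    def add(self, name):
        # Stage a copy of the file as it is in the working directory right now.
        self.staging[name] = self.working_dir[name]

    def commit(self, message):
        # Record everything that is currently staged as a new snapshot.
        self.commits.append((message, dict(self.staging)))

    def push(self):
        # Copy the commits the remote does not have yet.
        self.remote.extend(self.commits[len(self.remote):])


git = ToyGit()
git.working_dir["a.txt"] = "first version"   # step 1: modify files
git.working_dir["notes.txt"] = "scratch"     # edited, but never staged
git.add("a.txt")                             # step 2: stage
git.commit("Add a.txt")                      # step 3: commit locally
remote_before_push = list(git.remote)        # still empty at this point
git.push()                                   # step 4: share with the team
```

Note that notes.txt never reaches the local or the remote repository, because it was not staged, and that nothing is visible to other developers until the push.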
A big advantage of working with a distributed version control system like Git is that every user has a copy of the repository on his/her own machine. This means that, when you are comparing different versions of files, you don’t have to access the central repository, which makes this operation a lot quicker. Additionally, since you are not dependent on the central server, you can even work on your projects without access to a network.
What is also important to know about Git is that, unlike in SVN, revisions are seen as snapshots of files instead of only delta files (files containing only what changed since the last version). The advantage of this approach becomes apparent when working with branches. Branches are different paths in development you have created alongside each other. For example, you could have a production branch, in which only thoroughly tested code is located, and a testing branch, in which new code is published and worked on until it is stable. With Git, merging, or putting together, those two branches is very easy to do, because of the snapshot-based way of storing files Git employs.
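The difference between the two ways of looking at revisions can be illustrated with a small Python sketch. It is a conceptual model only and does not reflect the actual storage format of either Git or SVN; the helper names make_delta, apply_delta and version_from_deltas and the file names are invented for this example.

```python
# Three revisions of a tiny project, stored as full snapshots (file name -> content).
snapshots = [
    {"main.c": "v1"},
    {"main.c": "v2", "util.c": "v1"},
    {"main.c": "v2"},
]


def make_delta(old, new):
    """Describe only what changed between two revisions (None marks a deleted file)."""
    delta = {name: content for name, content in new.items() if old.get(name) != content}
    delta.update({name: None for name in old if name not in new})
    return delta


def apply_delta(files, delta):
    """Return a new revision by applying one delta to the previous revision."""
    result = dict(files)
    for name, content in delta.items():
        if content is None:
            result.pop(name, None)
        else:
            result[name] = content
    return result


# The same history, stored as deltas against the previous revision.
deltas = [make_delta(prev, cur) for prev, cur in zip([{}] + snapshots, snapshots)]


def version_from_deltas(deltas, n):
    """Rebuild revision n by replaying every delta up to and including n."""
    files = {}
    for delta in deltas[: n + 1]:
        files = apply_delta(files, delta)
    return files


# With snapshots, any revision is a direct lookup; with deltas it must be rebuilt.
rebuilt = [version_from_deltas(deltas, n) for n in range(len(snapshots))]
```

Both representations describe the same history, but with snapshots the complete state of any revision is available immediately, which is convenient when two branch tips have to be compared.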
When to use Git, and when to use SVN
Now you have a basic understanding of both Git and SVN, and the time has come to compare the two: which solution is suited best to which situation? It is beyond the scope of this article to list all advantages and disadvantages of each system. Instead here are some key scenarios in which one system is clearly better than the other.
When you want your version control system to be as quick as possible, Git is the system you are looking for. The speed of Git can be attributed to its distributed nature. Because every user has a local repository, typical version control system operations such as comparing files are a lot quicker than with SVN, because they are performed locally instead of over the network. Next to that, Git was designed with speed in mind, because it was created to manage the Linux kernel, which contains a lot of small files that need to be managed.
If you set up your projects in such a way that not all users have to work on the whole project, but each team or user has a dedicated subdirectory to work on, SVN might be a better choice. This is because SVN allows you to check out a subdirectory of the repository (with full version control functionality) to work on. Doing the same thing with Git is not as straightforward, since a clone normally contains a full local copy of the repository.
You might like to work with branches in your project, for example a branch for the stable version, which can be rolled out to clients without problems, and a development branch, in which all new features are implemented before they are tested for deployment. In this case, Git is the better option. Git was designed with branching in mind. Each new version stored in a Git repository links to the previous version, so it is easy to track where a project was branched. In addition, Git stores whole copies of files for each version (unlike SVN, which only stores the differences between versions), so merging different branches again is quick and usually doesn’t require user interaction.
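To see why links to previous versions make the branch point easy to find, consider the following Python sketch. It assumes a tiny, made-up history in which a development branch split off from the stable branch; the commit names and the functions ancestors and branch_point are hypothetical, and the search is a simplified version of what a real tool would do.

```python
from collections import deque

# Each commit lists the commit(s) it was based on.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # tip of the stable branch
    "d1": ["c2"],  # the development branch split off at c2
    "d2": ["d1"],  # tip of the development branch
}


def ancestors(commit):
    """Return the commit itself plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def branch_point(a, b):
    """Walk back from b until a commit that is also in the history of a is found."""
    reachable_from_a = ancestors(a)
    queue = deque([b])
    visited = set()
    while queue:
        current = queue.popleft()
        if current in reachable_from_a:
            return current
        if current not in visited:
            visited.add(current)
            queue.extend(parents[current])
    return None  # the two commits share no history


split = branch_point("c3", "d2")  # "c2", the commit where development branched off
```

Once the branch point is known, a merge only has to combine the changes made on each branch since that commit.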
When you want backup creation to be as easy as possible, SVN is a better fit for you. This comes down to the difference between a centralized system and a distributed system. In centralized systems, there is one clear location for the project files, and if you back up this location, you have backed up the whole project. By contrast, Git is distributed in nature, which means every user has his/her own repository. If there are no proper rules about pushing local repositories to the central repository, or if a user has forgotten this, there is a risk that a backup of the central repository doesn’t include the most up-to-date versions of the project files.
After reading this article, you should have a basic understanding of the history of SVN and Git and have some knowledge as to how both these systems work. With all this knowledge in mind, answering the question as to which of the two is better is still not a trivial task. What the best solution is depends entirely on the way you set up your projects and what features you want your version control system to have. This article has shed some light on the considerations you can make in picking one of these systems, but is by no means exhaustive. Still, I hope it provides a starting point in making a decision between the two, and has motivated you to start using version control for your projects if you weren’t doing so yet.