Motivation

Greg Wilson’s excellent Script for Introduction to Version Control provides a detailed motivation why you will benefit greatly from using version control systems. Here follows a shorter motivation and a quick overview of the basic concepts.

Why not Dropbox or Google Drive?

The simplest services for hosting project files are Dropbox and Google Drive. It is very easy to get started with these systems, and they allow you to share files among laptops and mobile units with as many users as you want. The systems offer a kind of version control in that the files are stored frequently (several times per minute), and you can go back to previous versions for the last 30 days. However, it is challenging to find the right version from the past when there are so many of them and when the different versions are not annotated with sensible comments. Another deficiency of Dropbox and Google Drive is that they sync all your files in a folder, a feature you clearly do not want if there are many large files (simulation data, visualizations, movies, binaries from compilations, temporary scratch files, automatically generated copies) that can easily be regenerated.

However, the most serious problem with Dropbox and Google Drive arises when several people edit files simultaneously: it can be difficult detect who did what when, roll back to previous versions, and to manually merge the edits when these are incompatible. Then one needs more sophisticated tools, which means a true version control system. The following text aims at providing you with the minimum information to started with Git, the leading version control system, combined with project hosting services for file storage.

Project hosting services

Through the last decade, several popular project hosting sites appeared: Sourceforge, Google code, Launchpad, Bitbucket, and GitHub. Today, the latter two stand out as the most successful, and we therefore concentrate on them in this quick intro. Both services have excellent introductions available at their web sites, but the recipes below are much shorter and aim at getting you started as quickly as possible by concentrating on the most important need-to-know steps.

After the recipes on how to be up and going with project hosting services, there is an introduction to Git. One can find a lot of Git tutorials on the web so you might ask why yet another one. Traditionally, Git introductions have been quite technical, aimed at professional programmers. The material on Git in this document grew out my need to provide students in course with minimal information on how they can archive and hand in all their computer-based course work in Git. The coming text therefore gets you up and going very fast. More advanced features of Git (branching, stashing, forking, merging), which enhances the reliability of your work, are introduced subsequently, but are not required for beginners. Thereafter, your Git background should be firm enough to easily access the many references I provide to material on the web.

Repositories and local copies

The mentioned services host all your files in a specific project in what is known as a repository, or repo for short. When a copy of the files are wanted on a certain computer, one clones the repository on that computer. This creates a local copy of the files. Now files can be edited, new ones can be added, and files can be deleted. These changes are then brought back to the repository. If users at different computers synchronize their files frequently with the repository, most modern version control systems will be able to merge changes in files that have been edited simultaneously on different computers. This is perhaps one of the most useful features of project hosting services. However, the merge functionality clearly works best for pure text files and less well for binary files, such as PDF files, MS Word or Excel documents, and OpenOffice documents.

What to choose?

It might seem challenging to pick the project hosting service and the version control system that are right for you. Below is some very personal advice.

  • I would recommend Git as the default choice of version control system. It is simply superior in many aspects, but for a beginner the superior merge functionality is most useful.
  • If you need private repos and you are not a student, choose Bitbucket unless you are willing to pay for private repos at GitHub.
  • If you want to make links to your programs in documentation, stay away from Bitbucket as this site embeds the version of the file into its URL (but you can get around this problem with a trick described in the section Linking to Bitbucket files).
  • If a few collaborate on a project and everyone wants to be notified in email about changes in project files, the procedure on Bitbucket is easier to use than the one on GitHub.

For the examples below, assume that you have some directory tree my-project with files that you want to host at Bitbucket or GitHub and bring under version control. The official name of the project is “My Project”.

Installing Git

The installation of Git on various systems is described on the Git website under the Download section. Git involves compiled code so it is most convenient to download a precompiled binary version of the software on Windows, Mac and other Linux computers. On Ubuntu or any Debian-based system the relevant installation command is

Terminal> sudo apt-get install git gitk git-doc

This tutorial explains Git interaction through command-line applications in a terminal window. There are numerous graphical user interfaces to Git. Three examples are git-cola, TortoiseGit, and SourceTree.

Configuring Git

Make a file .gitconfig in your home directory with information on your full name, email address, your favorite text editor, and the name of an “excludes file” which defines the file types that Git should omit when bringing new directories under version control. Here is a simplified version of the author’s .gitconfig file:

[user]
name = Hans Petter Langtangen
email = hpl@simula.no
editor = emacs

[core]
excludesfile = ~/.gitignore

The excludesfile variable is important: it points to a file called .gitignore, which must list, using the Unix Shell Wildcard notation, the type of files that you do not need to have under version control, because they represent garbage or temporary information, or they can easily be regenerated from some other source files. A suggested .gitignore file looks like

# compiled files:
*.o
*.so
*.a
# temporary files:
*.bak
*.swp
*~
.*~
*.old
tmp*
.tmp*
temp*
.#*
\#*
# tex files:
*.log
*.dvi
*.aux
*.blg
*.idx
*.nav
*.out
*.toc
*.snm
*.vrb
*.bbl
*.ilg
*.ind
*.loe
# eclipse files:
*.cproject
*.project
# misc:
.DS_Store

Carefully judge what files to bring under version control

You should be critical to what kind of files you really need a full history of. For example, you do not want to populate the repository with big graphics files of the type that can easily be regenerated by some program. The suggested .gitignore file above lists typical files that are not needed (usually because they are automatically generated by some program).

In addition to a default .gitignore file in your home directory, it may be wise to have a .gitignore file tailored your repo in the root directory of the repo.

Large data files, even when you want to version them, fill up your repo and should be taken care of through the special service Git Large File Storage.