Motivation

Greg Wilson’s excellent Script for Introduction to Version Control gives a detailed motivation for why you will benefit greatly from using version control systems. What follows is a shorter motivation and a quick overview of the basic concepts.

Why not Dropbox or Google Drive?

The simplest services for hosting project files are Dropbox and Google Drive. It is very easy to get started with these systems, and they allow you to share files among laptops and mobile units with as many users as you want. The systems offer a kind of version control in that the files are stored frequently (several times per minute), and you can go back to previous versions for the last 30 days. However, it is challenging to find the right version from the past when there are so many of them and when the different versions are not annotated with sensible comments. Another deficiency of Dropbox and Google Drive is that they sync all your files in a folder, a feature you clearly do not want if there are many large files (simulation data, visualizations, movies, binaries from compilations, temporary scratch files, automatically generated copies) that can easily be regenerated.

However, the most serious problem with Dropbox and Google Drive arises when several people edit files simultaneously: it can be difficult to detect who did what and when, to roll back to previous versions, and to manually merge the edits when these are incompatible. Then one needs more sophisticated tools, which means a true version control system. The following text aims to give you the minimum information needed to get started with Git, the leading version control system, combined with project hosting services for file storage.

Repositories and local copies

The mentioned services host all the files of a specific project in what is known as a repository, or repo for short. When a copy of the files is wanted on a certain computer, one clones the repository on that computer. This creates a local copy of the files. Now files can be edited, new ones can be added, and files can be deleted. These changes are then brought back to the repository. If users at different computers synchronize their files frequently with the repository, most modern version control systems will be able to merge changes in files that have been edited simultaneously on different computers. This is perhaps one of the most useful features of project hosting services. However, the merge functionality clearly works best for pure text files and less well for binary files, such as PDF files, MS Word or Excel documents, and OpenOffice documents.
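
The clone, edit, and bring-back cycle described above can be sketched as follows, assuming Git is already installed. To keep the sketch self-contained, a local bare repository named hosted.git stands in for a repository on a hosting service. The directory names, the file name, and the user identity are hypothetical.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the hosted repository (normally it lives on a hosting service).
git init --quiet --bare hosted.git

# Clone the repository to get a local copy of the files.
git clone --quiet hosted.git mycopy
cd mycopy

# Identify yourself for this repo only (hypothetical name and address).
git config user.name "Demo User"
git config user.email "demo@example.com"

# Add a new file and record the change locally.
echo "First notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes.txt"

# Bring the change back to the hosted repository.
git push --quiet origin HEAD

# The hosted repository now contains the commit.
git --git-dir="$workdir/hosted.git" log --all --oneline
```

With a real hosting service, only the argument to git clone changes: it becomes the address the service shows for your repository.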

Installing Git

The installation of Git on various systems is described on the Git website under the Download section. Git involves compiled code, so it is most convenient to download a precompiled binary version of the software on Windows, Mac, and Linux computers. On Ubuntu or any Debian-based system the relevant installation command is

Terminal> sudo apt-get install git gitk git-doc
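
A quick way to confirm that the installation worked, on any of the platforms above, is to ask Git for its version:

```shell
# Print the installed Git version; an error here means Git is not on your PATH.
git --version
```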

This tutorial explains Git interaction through command-line applications in a terminal window. There are numerous graphical user interfaces to Git. Three examples are git-cola, TortoiseGit, and SourceTree.

Configuring Git

Make a file .gitconfig in your home directory with information on your full name, email address, your favorite text editor, and the name of an “excludes file” which defines the file types that Git should omit when bringing new directories under version control. Here is a simplified version of the author’s .gitconfig file:

[user]
name = Hans Petter Langtangen
email = hpl@simula.no

[core]
editor = emacs
excludesfile = ~/.gitignore
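
The same settings can also be written from the command line with git config instead of editing the file by hand. Git looks up the editor under the core section, so the sketch below sets core.editor. To avoid touching your real configuration, it writes to a scratch file (the variable cfg is hypothetical); replacing --file "$cfg" with --global makes the commands write to .gitconfig in your home directory.

```shell
# Scratch file standing in for ~/.gitconfig, so the real one is left untouched.
cfg=$(mktemp)

# With --global instead of --file "$cfg", these commands edit ~/.gitconfig.
git config --file "$cfg" user.name "Hans Petter Langtangen"
git config --file "$cfg" user.email "hpl@simula.no"
git config --file "$cfg" core.editor emacs
git config --file "$cfg" core.excludesfile "~/.gitignore"

# Show what was written.
git config --file "$cfg" --list
```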

The excludesfile variable is important: it points to a file called .gitignore, which must list, using Unix shell wildcard notation, the types of files that you do not need to have under version control, because they represent garbage or temporary information, or because they can easily be regenerated from other source files. A suggested .gitignore file looks like

# compiled files:
*.o
*.so
*.a
# temporary files:
*.bak
*.swp
*~
.*~
*.old
tmp*
.tmp*
temp*
.#*
\#*
# tex files:
*.log
*.dvi
*.aux
*.blg
*.idx
*.nav
*.out
*.toc
*.snm
*.vrb
*.bbl
*.ilg
*.ind
*.loe
# eclipse files:
*.cproject
*.project
# misc:
.DS_Store
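
To see which pattern, if any, causes a given file to be ignored, git check-ignore can be used. The sketch below tries a few of the patterns from the list above in a scratch repository; the file names are hypothetical.

```shell
# Scratch repository for trying out ignore patterns.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# A few of the patterns from the list above.
printf '%s\n' '*.o' '*~' 'tmp*' '*.log' > .gitignore

# Hypothetical file names: three should be ignored, one should not.
touch solver.o notes.txt~ tmp_results.dat solver.c

# -v prints the pattern (and its line in .gitignore) that matched each path.
git check-ignore -v solver.o notes.txt~ tmp_results.dat

# solver.c matches no pattern, so it is listed as untracked,
# together with the .gitignore file itself.
git status --short
```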

Carefully judge what files to bring under version control

You should be critical about which files you really need a full history of. For example, you do not want to populate the repository with big graphics files of the type that can easily be regenerated by some program. The suggested .gitignore file above lists typical files that are not needed (usually because they are automatically generated by some program).

In addition to a default .gitignore file in your home directory, it may be wise to have a .gitignore file tailored to your repo in the root directory of the repo.

Large data files, even when you want to version them, fill up your repo and should be taken care of through the special service Git Large File Storage.
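
Git Large File Storage is a separate extension that must be installed in addition to Git. Its command git lfs track "*.h5" records a tracking rule in a file called .gitattributes in the repo. The sketch below writes a rule of that form by hand and asks Git which filter applies to a file, so it runs even without the extension installed. The *.h5 pattern and the file name are hypothetical choices.

```shell
# Scratch repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# The kind of rule that `git lfs track "*.h5"` records in .gitattributes.
echo '*.h5 filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to a (hypothetical) data file.
git check-attr filter -- simulation.h5
```

In a real repo you would run git lfs track itself and then add .gitattributes to version control, so that everyone who clones the repo gets the same rule.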