Using Git

Most Mac and Linux users prefer to work with Git via commands in a terminal window. Windows users may prefer a graphical user interface (GUI), and there are many options in this respect. There are also GUIs for Mac users. Here we concentrate on the efficient command-line interface to Git.

Basic Git commands

Cloning

You get started with your project on a new machine, or another user can get started with the project, by running

Terminal> git clone git@github.com:user/My-Project.git
Terminal> cd My-Project
ls

Recall to replace user by your real username and My-Project by the actual project name.

The pull-change-push cycle

The typical work flow with the "My Project" project starts with updating the local repository by going to the My-Project directory and writing

Terminal> git pull origin master

You may want to do git fetch and git merge instead of git pull as explained in the section Replacing pull by fetch and merge, especially if you work with branches.

You can now edit files, make new files, and make new directories. New files and directories must be added with git add. There are also Git commands for deleting, renaming, and moving files. Typical examples on these Git commands are

Terminal> git add file2.* dir1 dir2  # add files and directories
Terminal> git rm file3
Terminal> git rm -r dir2
Terminal> git mv oldname newname
Terminal> git mv oldname ../newdir

When your chunk of work is ready, it is time to commit your changes (note the -am option):

Terminal> git commit -am 'Description of changes.'

If typos or errors enter the message, the git commit --amend command can be used to reformulate the message. Running git diff prior to git commit makes it easier to formulate descriptive commit messages since this command gives a listing of all the changes you have made to the files since the last commit or pull command.

You may perform many commits (to keep track of small changes), before you push your changes to the global repository:

Terminal> git push origin master

Do pull and push often! It is recommended to pull, commit, and push frequently if the work takes place in several clones of the repo (i.e., there are many users or you work with the repo on different computers). Infrequent push and pull easily leads to merge problems (see the section Merging files with Git). Also remember that others (human and machines) cannot get your changes before they are pushed!

Do not forget to add important files. You should run git status -s frequently to see the status of files: A for added, M for modified, R for renamed, and ?? for not being registered in the repo. Pay particular attention to the ?? files and examine if all of them are redundant or easily regenerated from other files - of not, run git add.

Make sure you have a .gitignore file. The simplest way of adding files to the repo is to do

Terminal> git add .

The dot adds every file, and this is seldom what you want, since your directories frequently contain large redundant files or files that can easily be regenerated. You therefore need a .gitignore file, see the section Configuring Git, either in your home directory or in the root directory of the repo. The .gitignore file will ignore undesired files when you do git add ..

Viewing the history of files

A nice graphical tool allows you to view all changes, or just the latest ones:

Terminal> gitk --all
Terminal> gitk --since="2 weeks ago"

You can also view changes to all files, some selected ones, or a subdirectory:

Terminal> git log -p                 # all changes to all files
Terminal> git log -p filename        # changes to a specific file
Terminal> git log --stat --summary   # compact summary
Terminal> git log --stat --summary subdir

Adding --follow will print the history of file versions before the file got its present name.

To show the author who is responsible for the last modification of each line in the file, use git blame:

Terminal> git blame filename
Terminal> git blame --since="1 week" filename

A useful command to see the history of who did what, where individual edits of words are highlighted (--word-diff), is

git log -p --stat --word-diff filename

Removed words appear in brackets and added words in curly braces.

Looking for when a particular piece of text entered or left the file, say the text def myfunc, one can run

Terminal> git log -p --word-diff --stat -S'def myfunc' filename

This is useful to track down particular changes in the files to see when they occurred and who introduced them. One can also search for regular expressions instead of exact text: just replace -S by -G.

Retrieving old files

Occasionally you need to go back to an earlier version of a file, e.g., a file called f.py. Start with viewing the history:

Terminal> git log f.py

Find a commit candidate from the list that you will compare the present version to, copy the commit hash (string like c7673487...), and run

Terminal> git diff c7673487763ec2bb374758fb8e7efefa12f16dea f.py

where the long string is the relevant commit hash. You can now view the differences between the most recent version and the one in the commit you picked (see the section Replacing pull by fetch and merge for how to configure the tools used by the git diff command). If you want to restore the old file, write

Terminal> git checkout c7673487763ec2bb374758fb8e7efefa12f16dea f.py

To go back to another version (the most recent one, for instance), find the commit hash with git log f.py, and do get checkout <commit hash> f.py.

If f.py changed name from e.py at some point and you want e.py back, run git log --follow f.py to find the commit when e.py existed, and do a git checkout <commit hash> e.py.

In case f.py no longer exists, run git log -- f.py to see its history before deletion. The last commit shown does not contain the file, so you need to check out the next last to retrieve the latest version of a deleted file.

Often you just need to view the old file, not replace the current one by the old one, and then git show is handy. Unfortunately, it requires the full path from the root git directory:

Terminal> git show \
          c7673487763ec2bb374758fb8e7efefa12f16dea:dir1/dir2/f.py

Reset the entire repo to an old version

Run git log on some file and find the commit hash of the date or message when want to go back to. Run git checkout <commit hash> to change all files to this state. The problem of going back to the most recent state is that git log has no newer commits than the one you checked out. The trick is to say git checkout master to set all files to most recent version again.

If you want to reset all files to an old version and commit this state as the valid present state, you do

Terminal> git checkout c7673487763ec2bb374758fb8e7efefa12f16dea .
Terminal> git commit -am 'Resetting to ...'

Note the period at the end of the first command (without it, you only get the possibility to look at old files, but the next commit is not affected).

Going back to a previous commit

Sometimes accidents with many files happen and you want to go back to the last commit. Find the hash of the last commit and do

Terminal> git reset --hard c867c487763ec2

This command destroys everything you have done since the last commit. To push it as the new state of the repo, do

Terminal> git push origin HEAD --force

Merging files with Git

The git pull command fetches new files from the repository and tries to perform an automatic merge if there are conflicts between the local files and the files in the repository. Alternatively, you may run git fetch and git merge to do the same thing as described in the section Replacing pull by fetch and merge. We shall now address what do to if the merge goes wrong, which occasionally happens.

Git will write a message in the terminal window if the merge is unsuccessful for one or more files. These files will have to be edited manually. Merge markers of the type >>>>>, ======, and <<<<< have been inserted by Git to mark sections of a file where the version in the repository differ from the local version. You must decide which lines that are to appear in the final, merged version. When done, perform git commit and the conflicts are resolved.

Graphical merge tools may ease the process of merging text files. You can run git mergetool --tool=meld to open the merge tool meld for every file that needs to be merged (or specify the name of a particular file). Other popular merge tools supported by Git are araxis, bc3, diffuse, ecmerge, emerge, gvimdiff, kdiff3, opendiff, p4merge, tkdiff, tortoisemerge, vimdiff, and xxdiff.

Below is a Unix shell script illustrating how to make a global repository in Git, and how two users clone this repository and perform edits in parallel. There is one file myfile in the repository.

#!/bin/sh
# Demo script for exemplifying git and merge

rm -rf tmp1 tmp2 tmp_repo   # Clean up previous runs

mkdir tmp_repo   # Global repository for testing
cd tmp_repo
git --bare init --shared
cd ..

# Make a repo that can be pushed to tmp_repo
mkdir _tmp
cd _tmp
cat > myfile <<EOF
This is a little
test file for
exemplifying merge
of files in different
git directories.
EOF
git init
git add .   # Add all files not mentioned in ~/.gitignore
git commit -am 'first commit'
git push ../tmp_repo master
cd ..
rm -rf _tmp

# Make a new hg repositories tmp1 and tmp2 (two users)
git clone tmp_repo tmp1
git clone tmp_repo tmp2
# Change myfile in the directory tmp1
cd tmp1
# Edit myfile: insert a new second line
perl -pi -e 's/a little\n/a little\ntmp1-add1\n/g' myfile
# Register change in local repository
git commit -am 'Inserted a new second line in myfile.'
# Look at changes in this clone
git log -p
# or a more compact summary
git log --stat --summary
# or graphically
#gitk
# Register change in global repository tmp_repo
git push origin master
cd ..

# Change myfile in the directory tmp2 "in parallel"
cd tmp2
# Edit myfile: add a line at the end
cat >> myfile <<EOF
tmp2-add1
EOF
# Register change locally
git commit -am 'Added a new line at the end'
# Register change globally
git push origin master
# Error message: global repository has changed,
# we need to pull those changes to local repository first
# and see if all files are compatible before we can update
# our own changes to the global repository.
# git writes
#To /home/hpl/vc/scripting/manu/py/bitgit/src-bitgit/tmp_repo
# ! [rejected]        master -> master (non-fast-forward)
#error: failed to push some refs to ...

git pull origin master
# git writes:
#Auto-merging myfile
#Merge made by recursive.
# myfile |    1 +
# 1 files changed, 1 insertions(+), 0 deletions(-)
cat myfile  # successful merge!
git commit -am merge
git push origin master
cd ..

# Perform new changes in parallel in tmp1 and tmp2,
# this time causing hg merge to fail

# Change myfile in the directory tmp1
cd tmp1
# Do it all right by pulling and updating first
git pull origin master
# Edit myfile: insert "just" in first line.
perl -pi -e 's/a little/tmp1-add2 a little/g' myfile
# Register change in local repository
git commit -am 'Inserted "just" in first line.'
# Register change in global repository tmp_repo
git push origin master
cd ..

# Change myfile in the directory tmp2 "in parallel"
cd tmp2
# Edit myfile: replace little by modest
perl -pi -e 's/a little/a tmp2-replace1\ntmp2-add2\n/g' myfile
# Register change locally
git commit -am 'Replaced "little" by "modest"'
# Register change globally
git push origin master
# Not possible: need to pull changes in the global repository
git pull origin master
# git writes
#CONFLICT (content): Merge conflict in myfile
#Automatic merge failed; fix conflicts and then commit the result.
# we have to do a manual merge
cat myfile
echo 'Now you must edit myfile manually'

You may run this file git_merge.sh named by sh -x git_merge.sh. At the end, the versions of myfile in the repository and the tmp2 directory are in conflict. Git tried to merge the two versions, but failed. Merge markers are left in tmp2/myfile:

<<<<<<< HEAD
This is a tmp2-replace1
tmp2-add2

=======
This is tmp1-add2 a little
>>>>>>> ad9b9f631c4cc586ea951390d9415ac83bcc9c01
tmp1-add1
test file for
exemplifying merge
of files in different
git directories.
tmp2-add1

Launch a text editor and edit the file, or use git mergetool, so that the file becomes correct. Then run git commit -am merge to finalize the merge.

Git working style with branching and stashing

Branching and stashing are nice features of Git that allow you to try out new things without affecting the stable version of your files. Usually, you extend and modify files quite often and perform a git commit every time you want to record the changes in your local repository. Imagine that you want to correct a set of errors in some files and push these corrections immediately. The problem is that such a push will also include the latest, yet unfinished files that you have committed.

Branching

A better organization of your work would be to keep the latest, ongoing developments separate from the more official and stable version of the files. This is easily achieved by creating a separate branch where new developments takes place:

Terminal> git branch newstuff      # create new branch
Terminal> git checkout newstuff
Terminal> # extend and modify files...
Terminal> git commit -am 'Modified ... Added a file on ...'
Terminal> git checkout master      # swith back to master
Terminal> # correct errors
Terminal> git push origin master
Terminal> git checkout newstuff    # switch to other branch
Terminal> git merge master         # keep branch up-to-date w/master
Terminal> # continue development work...
Terminal> git commit -am 'More modifications of ...'

At some point, your developments in newstuff are mature enough to be incorporated in the master branch:

Terminal> git checkout newstuff
Terminal> git merge master        # synchronize newstuff w/master
Terminal> git checkout master
Terminal> git merge newstuff      # synchronize master w/newstuff

You no longer need the newstuff branch and can delete it:

Terminal> git branch -d newstuff

This command deletes the branch locally. To also delete the branch in the remote repo, run

Terminal> git push origin --delete newstuff

You can learn more in an excellent introduction and demonstration of Git branching.

Stashing

It is not possible to switch branches unless you have committed the files in the current branch. If your work on some files is in a mess and you want to change to another branch or fix other files in the current branch, a "global" commit affecting all files might be immature. Then the git stash command is handy. It records the state of your files and sets you back to the state of the last commit in the current branch. With git stash apply you will update the files in this branch to the state when you did the last git stash.

Let us explain a typical case. Suppose you have performed some extensive edits in some files and then you are suddenly interrupted. You need to fix some typos in some other files, commit the changes, and push. The problem is that many files are in an unfinished state - in hindsight you realize that those files should have been modified in a separate branch. It is not too late to create that branch! First run git stash to get the files back to the state they were at the last commit. Then run git stash branch newstuff to create a new branch newstuff containing the state of the files when you did the (last) git stash command. Stashing used this way is a convenient technique to move some immature edits after the last commit out in a new branch for further experimental work.

Warning. You can get the stashed files back by git stash apply. It is possible to multiple git stash and git stash apply commands. However, it is easy to run into trouble with multiple stashes, especially if they occur in multiple branches, as it becomes difficult to recognize which stashes that belong to which branch. A good advice is therefore to do git stash only once to get back to a clean state and then move the unfinished messy files to a separate branch with git stash branch newstuff.

Replacing pull by fetch and merge

The git pull command actually performs two steps that are sometimes advantageous to run separately. First, a get fetch is run to fetch new files from the repository, and thereafter a git merge command is run to merge the new files with your local version of the files. While git pull tries to do a lot and be smart in the merge, very often with success, the merge step may occasionally lead to trouble. That is why it is recommended to run a git merge separately, especially if you work with branches.

To fetch files from your repository at GitHub, which usually has the nickname origin, you write

Terminal> git fetch origin

You now have the possibility to check out in detail what the differences are between the new files and local ones:

Terminal> git diff origin/master

This command produces comparisons of the files in the current local branch and the master branch at origin (the GitHub repo). In this way you can exactly see the differences between branches. It also gives you an overview of what others have done with the files. When you are ready to merge in the new files from the master branch of origin with the files in the current local branch, you say

Terminal> git merge origin/master

Especially when you work with multiple branches, as outlined in the section Git working style with branching and stashing, it is wise to first do a get fetch origin and then update each branch separately. The git fetch origin command will list the branches, e.g.,

* master
  gh-pages
  next

After updating master as described, you can continue with another branch:

Terminal> git checkout next
Terminal> git diff origin/next
Terminal> git merge origin/next
Terminal> git checkout master

Configuring the `git diff` command

The git diff command launches by default the Unix diff tool in the terminal window. Many users prefer to use other diff tools, and the desired one can be specified in your ~/.gitconfig file. However, a much recommended approach is to wrap a shell script around the call to the diff program, because git diff actually calls the diff program with a series of command-line arguments that will confuse diff programs that take the names of the two files to be compared as arguments. In ~/.gitconfig you specify a script to do the diff:

[diff]
external = ~/bin/git-diff-wrapper.sh

It remains to write the git-diff-wrapper.sh script. The 2nd and 5th command-line arguments passed to this script are the name of the files to be compared in the diff. A typical script may therefore look like

#!/bin/sh

diff "$2" "$5" | less

Here we use the standard (and quite primitive) Unix diff program, but we can replace diff by, e.g., diffuse, kdiff3, xxdiff, meld, pdiff, or others. With a Python script you can easily check for the extensions of the files and use different diff tools for different types of files, e.g., latexdiff for LaTeX files and pdiff for pure text files.

Replacing all your files with those in the repo. Occasionally it becomes desirable to replace all files in the local repo with those in the repo at the file hosting service. One possibility is removing your repo and cloning again, or use the Git commands

Terminal> git fetch --all
Terminal> git reset --hard origin/master

Merging with just one file from another branch. Say you have two branches, A and B, and want to merge a file f.txt in A with the latest version in B. To merge this single file, go to the directory where f.txt resides and do

Terminal> git checkout A
Terminal> git checkout --patch B f.txt

If f.txt is not present in branch A, and if you want to include more files, drop the --patch option and specify files with full path relative to the root in the repo:

Terminal> git checkout A
Terminal> git checkout B doc/f.txt src/files/g.py

Now, f.txt and g.py from branch B will be included in branch A as well.

Team work with forking and pull requests

In small collaboration teams it is natural that everyone has push access to the repo. On GitHub this is known as the Shared Repository Model. As teams grow larger, there will usually be a few people in charge who should approve changes to the files. Ordinary team members will in this case not clone a repo and push changes, but instead fork the repo and send pull requests, which constitutes the Fork and Pull Model.

Say you want to fork the repo https://github.com/somebody/proj1.git. The first step is to press the Fork button on the project page for the somebody/proj1 project on GitHub. This action creates a new repo proj1, known as the forked repo, on your GitHub account. Clone the fork as you clone any repo:

Terminal> git clone https://github.com/user/proj1.git

When you do git push origin master, you update your fork. However, the original repo is usually under development too, and you need to pull from that one to stay up to date. A git pull origin master pulls from origin which is your fork. To pull from the original repo, you create a name upstream, either by

Terminal> git remote add upstream \
              https://github.com/somebody/proj1.git

if you cloned with such an https address, or by

Terminal> git remote add upstream \
              git@github.com:somebody/proj1.git

if you cloned with a git@github.com (SSH) address. Doing a git pull upstream master would seem to be the command for pulling the most recent files in the original repo. However, it is not recommended to update the forked repo's files this way because heavy development of the sombody/proj1 project may lead to serious merge problems. It is much better to replace the pull by a separate fetch and merge. The typical workflow is

Terminal> git fetch upstream           # get new version of files
Terminal> git merge upstream/master    # merge with yours
Terminal> # Your files are up to date - ready for editing
Terminal> git commit -am 'Description...'
Terminal> git push origin master       # store changes in your fork

At some point you would like to push your changes back to the original repo somebody/proj1. This is done by a pull request. Make sure you have selected the right branch on the project page of your forked project. Press the Pull Request button and fill out the form that pops up. Trusted people in the somebody/proj1 project will now review your changes and if they are approved, your files are merged into the original repo. If not, there are tools for keeping a dialog about how to proceed.

Also in small teams where everyone has push access, the fork and pull request model is beneficial for reviewing files before the repo is actually updated with new contributions.

Cloning a repo with multiple branches

An annoying feature of Git for beginners is the fact that if you clone a repo, you only get the master branch. There are seemingly no other branches:

Terminal> git branch
* master

To see which branches that exist in the repo, type

Terminal> git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/gh-pages
  remotes/origin/master
  remotes/origin/next

If there is only one remote repo that you pull/push from/to, you can simply switch branch with git checkout the usual way:

Terminal> git checkout gh-pages
Terminal> git branch
* gh-pages
  master
Terminal> git checkout next
Terminal> git branch
  gh-pages
  master
* next

You might need to do git fetch origin to see new branches made on other machines.

When you have more than one remote, which is usually the case if you have forked a repo, see the section Team work with forking and pull requests, you must use do a checkout with specifying the remote branch you want:

Terminal> git checkout -b gh-pages --track remote/gh-pages
Terminal> git checkout -b next --track upstream/next

Files can be edited, added, or removed as soon as you have done the local checkout.

It is possible to write a little script that takes the output of git branch -a after a git clone command and automatically check out all branches via git checkout.

Git workflows

Although the purpose of these notes is just to get the reader started with Git, it must be mentioned that there are advanced features of Git that have led to very powerful workflows with files and people, especially for software development. There is an official Git workflow model that outlines the basic principles, but it can be quite advanced for those with modest Git knowledge. A more detailed explanation of a recommended workflow for beginners is given in the developer instructions for the software package PETSc. This is highly suggested reading. The associated "quick summary" of Git commands for their workflow is also useful.

Git tips

How can I see which files are tracked by Git?

git ls-files is the command:

Terminal> git ls-files            # list all tracked files
Terminal> git ls-files -o         # list non-tracked files
Terminal> git ls-files myfile     # prints myfile if it's tracked
Terminal> git ls-files myfile --error-unmatch

The latter command prints an error message if myfile is not tracked. See man git-ls-files for the many options this utility has.

How can I reduce the size of a repo?

The command git gc can compress a git repository and should be run regularly on large repositories. Greater effect is achieved by git gc --aggressive --prune=all. You can measure the size of a repo before and after compression by git gc using du -s repodir, where repodir is the name of the root directory of the repository.

Occasionally big or sensitive files are removed from the repo and you want to permanently remove these files from the revision history. This is achieved using git filter-branch. To remove a file or directory with path doc/src/mydoc relative to the root directory of the repo, go to this root directory, make sure all branches are checked out on your computer, and run

Terminal> git filter-branch --index-filter \
         'git rm -r --cached --ignore-unmatch doc/src/mydoc' \
         --prune-empty -- --all
Terminal> rm -rf .git/refs/original/
Terminal> git reflog expire --expire=now --all
Terminal> git gc --aggressive --prune=now
Terminal> git push origin master --force  # do this for each branch
Terminal> git checkout somebranch
Terminal> git push origin somebranch --force

You must repeat the push command for each branch as indicated. If other users have created their own branches in this repo, they need to rebase, not merge, when updating the branches!

How can I restore missing files?

Sometimes you accidentally remove files from a repo, either by git rm or a plain rm. You can get the files back as long as they are in the remote repo. In case of a plain rm command, run

Terminal> git checkout `git ls-files`

to restore all missing files in the current directory.

In case of a git rm command, use git log --diff-filter=D --summary to find the commit hash corresponding to the last commit the files were in the repo. Restoring a file is then done by

Terminal> git checkout <commit hash> filename