Introduction to Git

What is Git?

Git is a database that lets you store all the successive versions of a set of files, called a project, and navigate through the version history. Its main use (but not the only one) is software development: it lets you keep track of successive versions, go back to a previous version, revert changes or manage the development of several versions in parallel. Basically, it is an alternative to keeping several directories containing different versions of the same project, or to maintaining several copies of the same file with some magic number as a suffix so that you can go back to a previous version if something goes wrong… Without a versioning system (like Git, Mercurial or SVN), you soon realize that managing a project history is a nightmare. The database holding the history of the project is called a repository.

There are different families of versioning systems. Git belongs to the family referred to as Distributed Version Control Systems (DVCS). One of the distinctive features of the tools in this family is that they enable several developers to collaborate on the same project while still allowing each developer to have their own versioned workspace, called a clone. A clone contains the full history of the project and is “private” to a developer. Developers collaborate by exchanging their contributions through operations called merges. Despite its focus on flexible collaboration between developers, Git is also well suited to personal development: it is a standalone tool that does not require the configuration of complex services. Experience shows that a project that starts as private may become a shared project: with Git (and any other DVCS), this is easily done.

This documentation is a step-by-step introduction to the main Git features. It does not claim to be exhaustive documentation of Git. The definitive documentation is the Git Book, available in several European and Asian languages.

The First Commit

To begin the Git tour, we’ll create a new, empty repository for a project called “git-test”, then add a README file to the project and to the repository. This is done with the following commands (assuming the project lives in a git-test subdirectory of your current directory):

mkdir git-test
cd git-test
git init
echo "My first Git project" >> README
git add README
git commit -m 'Adding README file'

Let’s explain what we have done:

git init: creates a new Git repository in the current directory. This results in the creation of a .git directory, which is the Git repository (database) itself: it must not be altered or removed by hand for any reason, as it cannot be recovered except from a backup. All interactions with the repository are done by means of Git commands. Everything outside the .git directory is not part of the repository, and any committed file can be recovered from the repository at any time, should something important be deleted. The directory containing the .git directory and all its contents (except .git itself) is called the working directory.
echo ...: not Git related, just an easy way to create a file called README in the current directory.
git add README: despite the command name, this does not save the file in the history of the Git repository. It only adds the file to the list of files that Git manages and marks its current contents for inclusion in the next commit (by default Git ignores every file until it is told to manage it).
git commit: this command saves to the repository everything that has been added with git add. All the modifications saved by the same command (often referred to as a change set) are called a commit. As we’ll see later, it is good practice to run git status before git commit: it displays all the modifications that will be saved to the repository when the next git commit command is executed.
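
The effect of these commands can be checked with a minimal sketch like the one below, which replays the first commit in a throwaway directory. The scratch directory created with mktemp -d, the per-repository identity settings, the -q flags and the commit_count and pending variables are assumptions added for this illustration only.

```shell
# Replay the first commit in a scratch directory (an assumption of this sketch,
# so that no existing project is touched).
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity defined for this repository only, so that the commit works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"

echo "My first Git project" >> README
git add README
git commit -q -m 'Adding README file'

# After the commit, nothing is pending and the history holds a single commit.
pending=$(git status --short)
commit_count=$(git rev-list --count HEAD)
git log --oneline
echo "commits: $commit_count"
```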

Note: it is good practice to add a README file to every new repository, as part of the first commit.

When using Git for the first time on a given machine, it is also best practice to define the name and email address to be used in commits. This is typically done with:

git config --global user.name "GivenName Surname"
git config --global user.email "user@example.com"

Demystifying the Working Directory

As a developer, you work in a directory called the working directory. You create, edit, delete and rename files and directories using the standard commands of your environment: Git does not require dedicated commands for these operations. As said before, this directory also contains the repository itself (the .git subdirectory), which must not be altered or deleted: its contents must be managed by Git commands only.

By default, there is only one working directory associated with a given repository.

Checking the state of the working directory

Before attempting to commit any change to the repository, it is recommended to check the state of the working directory with respect to the repository. This is done by the command:

git status

This command lists all the files in the working directory whose contents differ from their last version in the repository. It lists all the modified files in the working directory, whatever the current directory: modified files can be in the current directory, in one of its subdirectories or in a parent directory. Modifications can be:

A file whose contents have been modified: it can appear under either Changes not staged for commit (called Changed but not updated by old Git versions) or Changes to be committed, see below.
A new file (not yet existing in the repository): it is listed in the Untracked files section, see below.
A removed file: treated the same way as a file with modified contents.
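
As an illustration of the three kinds of modifications, here is a minimal sketch run in a scratch repository. The file names, the mktemp -d directory and the use of the compact git status --short format (one line per file, with a two-character state code) are assumptions of this example, not part of the original walkthrough.

```shell
# Scratch repository with two committed files (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > kept.txt
echo "two" > removed.txt
git add kept.txt removed.txt
git commit -q -m 'Initial files'

echo "changed" >> kept.txt    # a file whose contents have been modified
rm removed.txt                # a removed file
echo "new" > untracked.txt    # a new file, not yet known to Git

# Long format: sections "Changes not staged for commit" and "Untracked files".
git status
# Short format: " M" = modified, " D" = deleted, "??" = untracked.
status_short=$(git status --short)
printf '%s\n' "$status_short"
```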

Adding/removing files to/from Git control

Modify the README file with the following commands:

echo "" >> README
echo "This repository demonstrates how great Git is!" >> README

If you run git status after this modification, you will see the README file listed in the section Changes not staged for commit. This illustrates a key feature of the Git workflow: first you make your changes according to what you want to achieve, then you decide how to split them into one or more commits that you build from the files you changed, added or deleted. Changes are never implicitly added to commits: when you commit, only the changes listed by git status in the section Changes to be committed are committed, the others are left out (unless you use the -a option when committing, which asks that everything in the section Changes not staged for commit be added to the commit; using this option is discouraged for new Git users). As you may have noticed, git status also suggests the commands that make sense to add your changes to the commit or to revert them.

The main commands to build a commit are:

git add file_or_directory: this is the main command to prepare a commit. It adds to the commit being built (stages) any file listed in the sections Changes not staged for commit or Untracked files (untracked files are new files that have never been added to any commit; they are not added by the -a option mentioned above). If the parameter is a directory, every file and subdirectory it contains is added.
git rm file or git rm -r directory: this command is used to confirm that a file or directory removed with the usual commands has to be removed in the commit as well (it adds the removal to the commit). You can only run git rm on files/directories that were previously part of a commit. Note that if the file/directory still exists in the working directory, git rm actually deletes it.
git reset file_or_directory: this command removes from the commit being built a modification that was previously added with git add or git rm, and puts it back in either the Changes not staged for commit or the Untracked files section. As with the other commands, if the parameter is a directory, all the files from this directory previously added to the commit are removed from the commit.
git checkout file_or_directory: see the Discard changes section.
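
The commands above can be tried out with a short sketch such as the following. The file names, the scratch directory and the use of git diff --cached --name-status to list what has been staged are assumptions of this illustration.

```shell
# Scratch repository with two committed files (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > README
echo "old" > obsolete.txt
git add README obsolete.txt
git commit -q -m 'Initial files'

echo "v2" >> README
echo "draft" > notes.txt
git add README notes.txt     # stage a modification and a new file
git rm -q obsolete.txt       # stage a removal (the file is also deleted on disk)
git reset -q notes.txt       # change of mind: unstage notes.txt, the file itself is kept

# What the next commit would contain: README modified, obsolete.txt deleted.
staged=$(git diff --cached --name-status)
printf '%s\n' "$staged"
# notes.txt is back in the "Untracked files" section.
notes_state=$(git status --short -- notes.txt)
```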

Note on empty directories: Git, like most DVCSs, never stores an empty directory in the repository. There is no way (no option) to change this behaviour. If you want a directory to be in the repository, it has to contain at least one file. As a result, git status never shows an empty directory in any of the sections it reports. Doing a git add of an empty directory does nothing (and yields no error). Doing a git rm on it produces an error saying that the pathspec did not match any files (because it was never added to a previous commit).

In Git, every action on files and directories (creation, editing, renaming, deletion) can be done with the standard commands and tools of your environment: there is no need to use a specific Git command (as was the case with SVN, for example). When a file or directory is renamed, its modification history is preserved. git mv is just a convenient shortcut for renaming a file and doing the corresponding git add and git rm.

Discard changes

Another important feature of a version control system is the ability to revert changes to a previously working version. In this section, related to the management of the working directory, we describe how to discard a change or set of changes that has not been committed yet. There are two commands, depending on whether you want to discard the change permanently or would like to be able to restore it later.

git checkout -- file_or_directory: discards all the changes made to the specified file, or to all the files in the specified directory (and its subdirectories), that have not yet been added to the next commit (with the git add command). The version restored is the last version committed. There is no way to restore the change later: it is permanently lost. git status suggests a command to discard your changes (recent Git versions suggest the equivalent git restore file_or_directory).
git stash: removes all the uncommitted changes to tracked files from the working directory and puts them in a special area called the stash, from which they can be recovered later with the command git stash pop. See the help (git help stash) for all the possible options, in particular the ability to attach a message to your stash entry (a stash entry is very similar to a commit) to make it easier to identify if you have several entries in your stash.

Note that if the changes you want to discard have already been added to be part of the next commit (git add), you must first unstage your change with git reset and then discard the change using one of the commands above. Again, git status shows the exact command to execute.
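
The difference between the two ways of discarding a change can be sketched as follows, in a scratch repository. The file contents, the -q flags and the variables capturing the state of README after each step are assumptions of this example.

```shell
# Scratch repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > README
git add README
git commit -q -m 'Adding README file'

# Permanent discard: the unwanted line is lost for good.
echo "unwanted" >> README
git checkout -- README
after_checkout=$(cat README)

# Temporary discard: the change is set aside in the stash...
echo "work in progress" >> README
git stash -q
after_stash=$(cat README)
# ...and brought back later.
git stash pop -q
after_pop=$(tail -n 1 README)
echo "$after_checkout / $after_stash / $after_pop"
```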

Committing changes

When you have finished preparing your commit, you need to commit the changes you selected. This is done with the command git commit.

By default, this command opens your default editor so that you can edit the commit message. Commit messages are required in Git: if you exit the editor with an empty commit message (or a message containing only comments), the commit is aborted. This is a useful feature if you want to cancel your commit. A good commit message is:

short: a commit message is not documentation! Typically one line or a few lines.
explanatory: the commit message must make clear what has been done and the reason for doing it.

git commit accepts a lot of options: see git help commit for the details. We mention here only the most commonly used:

-m "message": an alternative to using an editor to write the commit message. With this option, you get no chance to cancel the commit from the editor…
-a: already mentioned, does an implicit git add of everything that git status lists in the Changes not staged for commit section. Both handy and dangerous… Its use is strongly discouraged for new users as there is a risk of committing inappropriate things (although this can be fixed by using the following option!).
--amend: this adds the selected changes to the previous commit rather than creating a new one. It must be used with caution when you are collaborating with others (see the relevant sections). But it is very handy when you build the solution to your problem incrementally: when you reach a first stage, you can do a first commit that protects your changes from being lost in case of a mistake, then make more modifications and add them to the initial commit. Another approach is described in the History clean up section.
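
A minimal sketch of the incremental use of --amend described above, in a scratch repository. The file name and the --no-edit option (which reuses the previous commit message instead of opening the editor) are assumptions of this example.

```shell
# Scratch repository (hypothetical file name).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First stage of the work: a first commit protects it.
echo "step 1" > work.txt
git add work.txt
git commit -q -m 'Add work file'

# More modifications, folded into the same commit rather than a new one.
echo "step 2" >> work.txt
git add work.txt
git commit -q --amend --no-edit

commit_count=$(git rev-list --count HEAD)
last_message=$(git log -1 --format=%s)
committed_lines=$(git show HEAD:work.txt | wc -l)
echo "$commit_count commit(s), last message: $last_message"
```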

Working with the History

git log

One of the most immediate benefits of using a versioning system is that you can navigate through the history of your changes. The history is organized as an ordered sequence of commits. Each commit has an identifier, a date, an author… It describes all the changes applied to the previous state of the repository: this may involve modifications to several existing files, the addition of new files and the removal of existing files. Every commit (except the first one) has a parent or ancestor, the commit before it in the history.

Note: like most DVCSs, Git doesn’t track directories directly. A directory is known to Git only if it contains files (an empty directory cannot be added to the repository). Despite this, Git properly tracks files moved to another directory.

The main command for looking at the history is git log. By default, it shows the history of the modifications in reverse order (from the most recent change to the oldest). Among all the possible options, two are particularly useful:

--name-status: in addition to the basic information about each commit, shows the list of files affected by the commit (added, deleted, modified).
file_or_directory_name: if a file or directory name is specified, only commits affecting this file/directory will be displayed.
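
As a small illustration of these two options, the sketch below builds a two-commit history in a scratch repository. The file and directory names, and the --format=%s option used to print only the commit subjects, are assumptions of this example.

```shell
# Scratch repository with one commit at the top level and one in docs/.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "readme" > README
git add README
git commit -q -m 'Add README'
mkdir docs
echo "guide" > docs/guide.txt
git add docs
git commit -q -m 'Add guide'

# Full history, most recent first, with the files affected by each commit.
git log --name-status
# Only the commits affecting the docs directory.
docs_log=$(git log --format=%s -- docs)
echo "$docs_log"
```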

Commit identifiers

Each commit has an identifier. This identifier is used by all the Git commands that need to refer to a particular commit. It is a hexadecimal number made of 40 digits (a SHA-1 hash). Unlike some other versioning systems, there is no commit number giving the relative position of the commit in the history (from 1 to n).

As typing a 40-digit number would not be very convenient, it is possible to type only the minimum number of leading digits needed to identify the commit unambiguously. Usually 7 digits are enough.

Git also allows symbolic names to be used to designate some commits. For example, HEAD can be used instead of a commit identifier and always refers to the last commit in the history. It is also possible to use a position relative to HEAD (i.e. before HEAD) with the following syntax: HEAD~n, where n is a number referring to the nth predecessor of HEAD (for example, HEAD~1 is the commit just before the last one, HEAD~3 is the third commit before the last one).
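
The different ways of naming a commit can be explored with git rev-parse, as in the sketch below. The three-commit history, the file name and the variable names are assumptions of this example; git rev-parse --short prints an abbreviated identifier.

```shell
# Scratch repository with three commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "change $n" >> file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

full_id=$(git rev-parse HEAD)              # complete identifier of the last commit
short_id=$(git rev-parse --short HEAD)     # abbreviated form of the same identifier
resolved=$(git rev-parse "$short_id")      # the short form designates the same commit
parent_msg=$(git log -1 --format=%s HEAD~1)  # message of the commit just before the last one
echo "$full_id $short_id $parent_msg"
```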

Looking at changes

In addition to looking at the commit history, it is often useful to know the changes made by a commit or a series of commits. The command allowing this is:

git diff first_commit_id..last_commit_id [-- file_or_directory]

where first_commit_id and last_commit_id are commit identifiers (see Commit identifiers). It is possible to restrict the changes displayed to those affecting some specific files by giving the file or directory names after -- (followed by a space).

git diff behaviour varies slightly depending on the options given and on the state of the working directory (whether or not changes have been staged for inclusion in the next commit):

When a range of commit identifiers is specified, it shows the differences from the first commit (not included) up to the last commit (included).
When only one commit identifier is specified, it shows the differences since the specified commit (not included), including the staged and unstaged changes in the working directory.
When the commit identifiers are omitted, the command displays the changes to tracked files that are present in the working directory but not yet included in the next commit (changes shown under Changes not staged for commit by git status, see Checking the state of the working directory).
To see the changes (compared to the last commit) that are not yet committed but already marked for inclusion in the next commit (changes shown under Changes to be committed by git status), it is necessary to add the option --cached.
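
The four cases above can be compared side by side with a sketch like the following, in which one change is staged and another one is not. The file contents and the variable names are assumptions of this example.

```shell
# Scratch repository with two commits, one staged change and one unstaged change.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "line 1" > file.txt
git add file.txt
git commit -q -m 'First line'
echo "line 2" >> file.txt
git add file.txt
git commit -q -m 'Second line'
echo "line 3 (staged)" >> file.txt
git add file.txt
echo "line 4 (unstaged)" >> file.txt

range=$(git diff HEAD~1..HEAD)   # changes made by the last commit: adds line 2
since_head=$(git diff HEAD)      # everything since the last commit: adds lines 3 and 4
unstaged=$(git diff)             # not yet staged: adds line 4 only
staged=$(git diff --cached)      # staged for the next commit: adds line 3 only
printf '%s\n' "$staged"
```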

Reverting committed changes

The usual approach to revert committed changes is to add a new commit that reverts the inappropriate changes. Apart from manually producing the (reverse) changes required, you can use Git commands. The command to use depends on whether you want to revert an entire commit or only the changes made to some files in the commit:

Reverting an entire commit: git revert commit_id. Note that, depending on what happened since the commit being reverted, this command may produce conflicts. See the sections about conflicts below. Before running this command, you need to discard any change in your working directory (see the Discard changes section).
Reverting a specific file to a previous version: git checkout commit_id -- file. This restores the file in the working directory as it was in commit commit_id. You then need to commit this version using the usual procedure.
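
Both approaches are illustrated by the sketch below, run in a scratch repository. The file names, the first and bad variables holding commit identifiers, and the --no-edit option (which keeps the default revert message without opening the editor) are assumptions of this example.

```shell
# Scratch repository: a good state, a bad commit, then an unrelated commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "good" > settings.txt
echo "v1" > notes.txt
git add settings.txt notes.txt
git commit -q -m 'Initial state'
first=$(git rev-parse HEAD)
echo "bad" > settings.txt
git add settings.txt
git commit -q -m 'Break settings'
bad=$(git rev-parse HEAD)
echo "v2" > notes.txt
git add notes.txt
git commit -q -m 'Update notes'

# Revert an entire commit: a new commit undoing 'Break settings' is created.
git revert --no-edit "$bad"
# Revert a single file to the version it had in an earlier commit, then commit it.
git checkout "$first" -- notes.txt
git commit -q -m 'Restore notes.txt'

settings_now=$(cat settings.txt)
notes_now=$(cat notes.txt)
commit_count=$(git rev-list --count HEAD)
```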

Note: another approach is called history rewriting. This is a very advanced feature whose misuse can be catastrophic. For this reason it is not described in detail in this documentation. Refer to the Git Book (http://git-scm.com/book) if you think you have a good reason for using this feature.

Discarding last commits

git reset is a command that undoes the last commits. By default, changes are not lost: the indicated commits are removed from the history but the corresponding changes are kept in the working directory (appearing as Changes not staged for commit, or as Untracked files for files that were added by these commits), so that it is possible to commit the same changes again. This command also accepts options that completely discard the changes made by the last commits, in particular --hard: these options must be considered advanced features and must not be used unless their consequences are well understood!

The command syntax for its default behaviour is:

git reset commit_id

where commit_id is a commit identifier (see Commit identifiers). commit_id is the last commit to be preserved: all the later commits are removed from the history and, after the command, HEAD refers to commit_id.
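
A minimal sketch of the default behaviour, assuming a scratch repository with three commits on a single file. Here commit_id is given as HEAD~2, so the last two commits are undone while their changes stay in the working directory; the file name and variable names are assumptions of this example.

```shell
# Scratch repository with three commits on the same file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "change $n" >> file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# Keep "Commit 1" as the last commit; commits 2 and 3 leave the history.
git reset -q HEAD~2

commit_count=$(git rev-list --count HEAD)
state=$(git status --short)          # the file shows up as modified, not staged
line_count=$(wc -l < file.txt)       # the three lines are still in the working directory
echo "$commit_count commit(s) left, status: $state"
```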

History clean up

One advanced feature of Git is the ability to modify the history, that is, to delete any commit from the history, change the order of the commits or merge several commits into one (squash commits). This is a very dangerous feature if used inappropriately: in particular, the history must never be rewritten if the changes have already been shared with others (see the Collaboration with Other Developers section).

There is one situation where modifying the history is useful, keeping in mind that it must be done before sharing the changes with others: cleaning up the modification history after completing a “development step”. Quite often, fixing a problem or adding a new feature is done incrementally, resulting in a history with a lot of related commits that reflect different attempts or wrong directions taken during the development (it is best practice to commit often in your personal repository, even things that are not working yet or not completely tested). In this situation, it is generally desirable to clean up the history and keep only a set of commits that really reflects the changes required to meet the objective of the development.

To achieve this, it is possible to use the command git rebase -i: note that the -i option is mandatory in this context (git rebase is a much more general command, with many advanced features not covered here). The command syntax is:

git rebase -i commit_id

where commit_id is a commit identifier (see Commit identifiers). This command opens an editor window listing all the commits after commit_id (not included) up to HEAD, each on a separate line. Each line starts with an action to apply to the commit; the default one, pick, means that the commit is kept as is in the rebase (history rewriting) operation. This default action can be changed to one of the other allowed actions (see the text in the editor window), and the order of the commits can be changed by reordering the lines in the editor window. When the order of the commits is changed, Git checks that this doesn’t cause conflicts (with a subsequent change in the new history modifying the same part of a file as the moved commit). One of the actions, squash, merges 2 or more consecutive commits into one. When merging them, Git asks for a new commit message for the merged commit (proposing the original commit messages as the default).

If the rebase operation results in a conflict (see the Conflicts section), the working directory is put in a special state (displayed by git status) that allows you to fix the problem and continue the rebase, or to abort it:

To abort the rebase (the easiest!), the command is git rebase --abort.
If the problems were fixed and you want to continue the rebase operation, the command is git rebase --continue.

When the editor window is exited without any change, every commit is kept as is and the history is left unchanged. To cancel the rebase from the editor, delete all the lines before exiting.
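
Since git rebase -i normally needs a person in front of the editor, the sketch below replaces the editor with a scripted stand-in so that a squash can be replayed non-interactively. This is only an illustration under several assumptions: the GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables are used to substitute the editors, the sed -i syntax is the GNU one, and the three-commit history is made up for the example. In normal use you would simply edit the list by hand.

```shell
# Scratch repository with three commits on the same file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "change $n" >> file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# The list presented by "git rebase -i HEAD~2" contains "Commit 2" and "Commit 3".
# Stand-in for the manual edit: change the action on the second line from pick
# to squash (GNU sed assumed), and accept the proposed message (GIT_EDITOR=true).
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" GIT_EDITOR=true git rebase -i HEAD~2

commit_count=$(git rev-list --count HEAD)   # "Commit 2" and "Commit 3" are now one commit
line_count=$(wc -l < file.txt)              # the file contents are unchanged
git log --oneline
```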

Branches

What is a branch?

Up to now, we have learned about the history: how it is organised, how we can navigate through it and even modify it. But a single history of all the modifications in a repository is not enough to deal with real life where, for a given project, you generally need to work on several different things in parallel, to maintain a stable version and one or several development versions, or to try new ideas without impacting other things. To address this need, Git, like all versioning systems, allows several branches to be created in the same repository. Each branch records a distinct history of changes. Each branch has its own HEAD (also called the tip of the branch) that always refers to the last commit in the branch.

Generally, a branch starts from some commit in another branch, leading to a tree-like structure of the history of the repository:

branch1    A - B - C - D    branch1
               |
branch2        F - G - H    branch2
                   |
branch3            J - K    branch3

This history representation illustrates a repository with 3 branches named branch1, branch2, branch3 which contain the following commits:

branch1: A, B, C, D
branch2: A, B, F, G, H
branch3: A, B, F, G, J, K

A commit like B, which is the start of a new branch, is referred to as the common ancestor of commits C and F. That means that at commit B, branch1 and branch2 were identical (with the same history).

Note that a branch in Git always has a name. This name can be used as a commit identifier in any command requiring one: it is interpreted as the head (i.e. the last commit) of the specified branch (whereas HEAD always refers to the head of the current branch).

Note: when a new repository is created with the git init command, a default branch called master is created. Although this branch is often used as the main branch of the repository, it is treated like any other branch. It is not required to exist and, if it exists, it can be deleted or renamed.

Creating a New Branch

Creating a new branch, i.e. forking the history line, can be done with one of the following two commands:

git branch new_branch_name commit_id: this command creates a new branch named new_branch_name from the commit identifier commit_id (see Commit identifiers). If commit_id is a branch name, it is interpreted as the head of the specified branch. If it is omitted, HEAD is used (the head of the current branch, also called the tip of the branch).
git checkout -b new_branch_name commit_id: does the same as git branch but after creating the branch, makes it the current branch and checks it out in the working directory, replacing the previous contents of the working directory.

To get the list of the existing branches, use git branch without a branch name. The branch prefixed with a * is the current branch (the one checked out in the working directory).
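
The two ways of creating a branch can be compared with the following sketch, run in a scratch repository. The branch names feature and experiment and the use of git rev-parse --abbrev-ref HEAD to print the name of the current branch are assumptions of this example.

```shell
# Scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "start" > README
git add README
git commit -q -m 'Adding README file'

git branch feature              # create a branch from HEAD, without switching to it
git checkout -q -b experiment   # create another branch and make it the current one

git branch                      # list the branches; the current one is prefixed with *
current=$(git rev-parse --abbrev-ref HEAD)
feature_tip=$(git rev-parse feature)
experiment_tip=$(git rev-parse experiment)
echo "current branch: $current"
```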

Switching between Branches

There is one working directory shared by all the branches of one Git repository. At any time, there is only one active branch, denoted by a * in git branch output and displayed by git status.

To switch the working directory from one branch to another one, the command is:

git checkout branch_name

The branch branch_name must already exist (see Creating a New Branch for creating a new branch).

This command replaces the working directory contents with the contents of the specified branch. Before doing so, Git checks that this will not result in losing uncommitted changes: if it would, it raises an error and cancels the checkout operation without modifying the working directory. One typical source of conflict for a checkout operation is a file with uncommitted changes whose contents would be modified by the checkout (in the new branch, this file has different contents than in the current branch).
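
The protection against losing uncommitted changes can be observed with a sketch like this one. The branch name other, the file contents and the result variable are assumptions of this example; the name of the initial branch is read from Git rather than assumed.

```shell
# Scratch repository with two branches holding different versions of file.txt.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m 'Base version'
first_branch=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b other
echo "other version" > file.txt
git add file.txt
git commit -q -m 'Version for the other branch'
git checkout -q "$first_branch"

# Uncommitted change to a file that differs between the two branches.
echo "local edit" >> file.txt
if git checkout other 2>/dev/null; then
    result=switched
else
    result=refused     # Git cancels the checkout and leaves everything untouched
fi
current=$(git rev-parse --abbrev-ref HEAD)
last_line=$(tail -n 1 file.txt)
echo "checkout $result, still on $current"
```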

Merging changes between branches

In addition to forking the history by creating new branches, it is often desirable to get the changes done in one branch merged into another branch. This is typically the case if you create a new branch to add (and test) a new feature to an application without impacting the production version: when it is ready and tested, you want to merge this new feature into the production version without redoing the change in the production branch. This operation is called merge and is done with the command git merge. This command is very powerful and has many options. Here we’ll cover only the most common uses of this command.

A merge is an operation that can be done only between two branches that share a common ancestor. That means that one of the branches must be a fork of the other branch, or of a branch the other branch is derived from. Below are some examples:

A - B - C - D - E - F     branch1
    |
    O - P - Q             branch2
            |
            V - W - X     branch3

In the above example, branch3 can be merged with branch2 (common ancestor: commit Q) or with branch1 (common ancestor: commit B).

The merge operation consists of adding the changes done in one branch (the source branch) since the common ancestor on top of another branch (in Git, the current branch). There are in fact two different situations illustrated by the example above:

Merging branch3 into branch2: as nothing happened in branch2 since branch3 was created from it, the merge operation only updates the HEAD (also called the tip) in branch2 to refer to the same commit as the HEAD in branch3 (commit X). This type of merge is referred to as a fast-forward merge.
Merging branch2 (or branch3) into branch1: as both branch2 and branch1 have been modified since commit B, it is necessary to create a new commit in branch1 (after F) that will have two parents (the HEAD of each branch). Thanks to this special commit, called a merge commit, branch1 will contain all the previous commits from branch1 and all the commits from branch2 (O, P, Q). This is denoted by the history figure below:

A - B - C - D - E - F - G - H   branch1
    |                   |
    O - P - Q - - - - - - - R   branch2
            |
            V - W - X           branch3

Note that, as shown by the figure above, the merge operation doesn’t close or delete branch2 and only affects branch1 (the merge commit is created in branch1 and can be reached only from this branch, branch2 doesn’t know anything about the merge commit). After the merge, commits can be added either to branch1 or to branch2. And further, branch2 can be merged again into branch1 as shown below:

A - B - C - D - E - F - G - H - I - J   branch1
    |                   |       |
    O - P - Q - - - - - - - R - - -     branch2
            |
            V - W - X                   branch3

The Git command to merge branches is the same in both situations (fast-forward merge and non fast-forward merge, the latter often referred to as recursive). The current branch when the command is executed must be the target branch (the branch where the merge commit will be created if the merge is not fast-forward). The command is:

git merge [--ff-only] [-m msg] source_branch

Option --ff-only says that the merge must be done only if it can be fast-forwarded: it is mentioned here as there are situations where a fast-forward merge is expected/wanted and where the merge should fail if this is not possible (in these situations, this is generally the sign of a messed up history).

Option -m msg defines the commit message used for the merge commit. Git generates a default message if none is specified. Generally, the default message is used.

In a merge operation, there are circumstances where the two branches involved modify the same section of the same file. This results in a conflict that prevents the automatic merge from completing. See the Conflicts section to learn how to handle and fix them.

Note: it is strongly recommended to run git status before starting a merge operation and to either commit or discard the changes in the working directory that are not yet committed. See the Discard changes section to learn how to temporarily discard uncommitted changes.
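
The two kinds of merge can be reproduced with the sketch below, which builds a reduced version of the three-branch example in a scratch repository (one commit per letter, fewer commits than in the figures). The commit_file helper, the file names and the variable names are hypothetical, introduced only for this illustration.

```shell
# Scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: create one file named after the commit and commit it.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
git branch -m branch1            # give the initial branch the name used in the text
git checkout -q -b branch2
commit_file O
git checkout -q -b branch3
commit_file V
git checkout -q branch1
commit_file C

# Fast-forward merge: nothing happened in branch2 since branch3 was created.
git checkout -q branch2
git merge -q --ff-only branch3
# Non fast-forward merge: both branches changed, so a merge commit is created.
git checkout -q branch1
git merge -q -m 'Merge branch2 into branch1' branch2

branch2_tip=$(git rev-parse branch2)
branch3_tip=$(git rev-parse branch3)
# The merge commit is printed with its two parents: three identifiers in total.
id_count=$(git rev-list --parents -n 1 HEAD | wc -w)
```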

Deleting a branch

When a branch is no longer needed, it is possible to delete it. This is typically the case when a branch used for a specific development has been merged into another branch, for example the production branch. This is done with the command:

git branch -d branch_to_delete

branch_to_delete must not be the current branch.

If branch_to_delete has not been merged into another branch, the attempt to delete it fails. This is done on purpose, as deleting the branch would make all the commits belonging only to this branch unreachable (not referenced by any branch). If you really want to delete this branch because its contents are no longer useful, option -d must be replaced by option -D.
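
The difference between -d and -D can be seen in the following sketch, where a branch holding an unmerged commit is deleted. The branch name topic, the file name and the variables recording the outcome are assumptions of this example.

```shell
# Scratch repository with an unmerged branch called topic.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "start" > README
git add README
git commit -q -m 'Adding README file'
first_branch=$(git rev-parse --abbrev-ref HEAD)
git checkout -q -b topic
echo "idea" > topic.txt
git add topic.txt
git commit -q -m 'Unmerged work'
git checkout -q "$first_branch"

# -d refuses to delete a branch whose commits would become unreachable.
if git branch -d topic 2>/dev/null; then
    safe_delete=done
else
    safe_delete=refused
fi
# -D deletes it anyway.
git branch -D topic
remaining=$(git branch --list topic)
echo "safe delete: $safe_delete"
```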

Renaming a branch

It is sometimes desirable to rename a branch to reflect its new purpose. This is easily done with:

git branch -m current_name new_name

This command can also be used on the current branch.

Rebase

Rebasing a branch is the operation that modifies the existing history of a branch: it is a destructive operation in the sense that there is no simple way to roll back a completed rebase. For this reason, it should be considered an advanced operation. In practice, it is forbidden to rebase a branch history that has already been shared with others (see the Collaboration with Other Developers section). But there are a few circumstances where this is a very useful operation: see in particular the History clean up section.

This section describes another use case for rebasing, when working with branches. The merge operation, described above, incorporates the changes made in another branch into the current branch, either by adding the commits from the other branch (branch3 in the example above) to the current one (branch2) (fast-forward merge), or by creating a new commit in the current branch (branch1) that adds the changes made in the other branch (branch3) in parallel with those made in the current branch. But in this latter case, there are circumstances where you want to do a fast-forward merge rather than a full merge.

To make this possible, it is necessary to rebuild the history of branch3 since it diverged from branch1 as a history based on the current branch1 HEAD. Rather than redoing the work manually, Git lets you do this with a history rewrite: every commit since branch3 diverged from branch1 is recreated, one by one, as a commit based on the current branch1 HEAD. That means that every commit, describing the modifications to apply to the previous commit, is recomputed based on the file contents in branch1 HEAD. At the end of the rebase operation, branch3 contains the same number of specific commits (not in branch1), with the original commit messages (and authors) preserved.

In a rebase operation, as in a merge operation, there are circumstances where the two branches involved modify the same section of the same file. This results in a conflict that prevents the rebase operation from completing. See the Conflicts section to learn how to handle and fix them and how to continue or abort the rebase operation. When a rebase operation is aborted, the repository is left in the state it was in before the rebase started, without any partial modification. Until the rebase is either completed or aborted, the repository is in a special state without a current branch defined.

Note: it is strongly recommended to run git status before starting a rebase operation and to either commit or discard the changes in the working directory that are not yet committed (in fact, Git refuses to start a rebase operation if there are uncommitted changes in the working area). See the Discard changes section to learn how to temporarily discard uncommitted changes.
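The behaviour described in this note can be checked in a throwaway repository. The following is only a sketch: the file names, commit messages and user identity are arbitrary assumptions, and rebase.autoStash is disabled explicitly so that a personal configuration cannot change the outcome.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "A"

git checkout -q -b branch2
echo "work" > other.txt
git add other.txt
git commit -q -m "O"

git checkout -q branch1
echo "second" >> notes.txt
git commit -q -a -m "C"

git checkout -q branch2
echo "not committed yet" >> other.txt      # tracked file modified, not committed
git status --short                         # lists other.txt as modified

# The rebase is expected to be refused because of the uncommitted change
if git -c rebase.autoStash=false rebase branch1; then
    rebase_status="started"
else
    rebase_status="refused"
fi
echo "rebase was $rebase_status"
```

After the refusal, the repository is untouched: branch2 is still the current branch and the modification is still in the working directory.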

The base syntax of the rebase command is:

git rebase new_upstream

where new_upstream is the name of the branch you want to rebase the current branch on.

To illustrate the result of a rebase operation, assume that we have two branches branch1 and branch2 with the following history:

A - B - C - D             branch1
    |
    O - P - Q             branch2

branch2 is the current branch and we would like to rewrite its history (O, P, Q) so that branch2 appears to be derived from D in branch1 rather than from B. This is done with:

git rebase branch1

As a branch name refers to the last commit in the branch, branch1 in this context means commit D. If the command completes successfully (i.e. there are no changes in C and D conflicting with those made in branch2), the result will be:

A - B - C - D             branch1
            |
            O' - P' - Q'  branch2
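The history above can be reproduced in a temporary repository to observe the effect of the rebase. This is a minimal sketch: the commit helper, the file names and the user identity are assumptions made for the illustration, and the two branches touch different files so that no conflict occurs.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

# Hypothetical helper: append a line to a file and commit it,
# using the same text for the line and the commit message
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit main.txt A
commit main.txt B
git checkout -q -b branch2                 # branch2 starts from B
commit other.txt O
commit other.txt P
commit other.txt Q
git checkout -q branch1
commit main.txt C
commit main.txt D

git checkout -q branch2
git rebase -q branch1                      # recreate O, P and Q on top of D

git --no-pager log --oneline --graph branch1 branch2
```

The log shows a linear history: the three commits specific to branch2 now come after D, with new commit identifiers but the original messages.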

Conflicts#

A conflict happens during a merge or rebase operation if two commits from two different branches attempt to modify the same part of a file. For example, a file test has the following contents:

Hello World!

In branch A, this file is modified and contains:

Hello World!
This is the typical test...

In branch B, the initial version of the same file is also modified and contains:

Hello World!
Anybody listening?

If you attempt to merge these two branches, Git cannot decide what the final result should be: keep all the lines in the final file, ignore one of the modifications… To avoid producing an unexpected result, Git generates a conflict, with the information required to manually select what should be done. This manual operation is called conflict resolution. For example, if the current branch is A and you try to merge B into it, you’ll get the following error message:

Auto-merging test
CONFLICT (content): Merge conflict in test
Automatic merge failed; fix conflicts and then commit the result.

The git status command also makes it very clear that there is a conflict and which files are affected by it. Based on our example, you’ll get:

# On branch A
# You have unmerged paths.
#   (fix conflicts and run "git commit")
#
# Unmerged paths:
#   (use "git add <file>..." to mark resolution)
#
#       both modified:      test
#

There are several ways to fix conflicts, including the use of specialized tools not described here. In situations that are not too complex (in collaborative projects with many developers, a conflict may not be trivial to fix if several developers edited the same section of code), a good starting point is to look at the conflicting file and search for the <<<<<<< marks: each one is the start of a conflicting section (there may be several in the same file) that ends with a matching >>>>>>>. To see only the conflicting sections (not the whole file), use the git diff command: by default it displays the conflicting sections in the same format (richer formats are available, see git help merge). Based on our example, the contents of test will be:

Hello World!
<<<<<<< HEAD
This is the typical test...
=======
Anybody listening?
>>>>>>> B

The first part, before the ======= line, is what was in branch A before the merge attempt, and the second part is what is in branch B. In this case, suppose you would like to keep the three lines: edit the file, remove the conflict markers, use git add to mark the conflict as resolved and git status to check for remaining conflicts. Then complete the merge operation with git commit without any arguments: the git merge command saved the commit message in a temporary location and git commit reuses it. In the case of a rebase, the operation is completed with git rebase --continue instead, as described below.

Note: if there are multiple changes to merge in a file, all those that don’t cause a conflict are pre-merged (integrated into the file in the working directory) and only those with a conflict are marked as such. But all the changes will be part of the same commit after the conflict resolution has been done.
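The whole scenario, from the creation of the conflict to its resolution, can be replayed in a temporary repository. This is a sketch under a few assumptions: the commit messages and the user identity are arbitrary, the file is rewritten with printf instead of being edited by hand, and --no-edit is passed to git commit only to avoid opening an editor in a non-interactive session.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A         # the initial branch is called A
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "Hello World!" > test
git add test
git commit -q -m "initial version"

git checkout -q -b B
echo "Anybody listening?" >> test
git commit -q -a -m "B: add a question"

git checkout -q A
echo "This is the typical test..." >> test
git commit -q -a -m "A: add a description"

# The merge stops on the conflict and returns a non-zero status
git merge B || echo "merge stopped on a conflict"
cat test                                   # shows the conflict markers

# Resolution: keep the three lines, without the conflict markers
printf '%s\n' "Hello World!" "This is the typical test..." "Anybody listening?" > test
git add test
git status --short                         # no unmerged path left
git commit -q --no-edit                    # reuse the saved merge message
```

The resulting commit has two parents, the previous tips of A and B, like any other merge commit.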

git rebase has to process all the commits one by one. After resolving a conflict for one commit, it is necessary to continue the rebase operation for the next commits (note that the conflict resolution for one commit may cause a conflict when rebasing the next commit, and it is sometimes preferable to abort the rebase rather than to fix the conflicts). This is done with:

git rebase --continue
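As an illustration, the sketch below rebases a branch made of two commits, the first of which conflicts with the new base. The branch names, file names and user identity are assumptions; GIT_EDITOR=true is only there to accept the saved commit message without opening an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "Hello World!" > test
git add test
git commit -q -m "B"

git checkout -q -b branch2
echo "Anybody listening?" >> test
git commit -q -a -m "O"                    # will conflict with C
echo "extra" > other.txt
git add other.txt
git commit -q -m "P"                       # independent of C

git checkout -q branch1
echo "This is the typical test..." >> test
git commit -q -a -m "C"

git checkout -q branch2
git rebase branch1 || echo "rebase stopped on a conflict while applying O"

# Resolve the conflict for O, then let Git process the remaining commit
printf '%s\n' "Hello World!" "This is the typical test..." "Anybody listening?" > test
git add test
GIT_EDITOR=true git rebase --continue

git --no-pager log --oneline branch2
```

Once the rebase is completed, branch2 is again the current branch and its two specific commits sit on top of C.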

If for some reason the conflict is too hard to solve, it is possible to abort the merge operation with:

git merge --abort

Similarly, if the conflict happened as part of a rebase operation rather than a merge, it is possible to cancel it, in which case no modification at all will be performed, with:

git rebase --abort
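The two abort commands can be tried safely on a temporary repository. In this sketch (branch names taken from the example above, user identity and commit messages assumed), a conflicting merge and then a conflicting rebase are started and aborted, and the tip of the current branch is compared before and after.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A         # the initial branch is called A
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "Hello World!" > test
git add test
git commit -q -m "initial version"

git checkout -q -b B
echo "Anybody listening?" >> test
git commit -q -a -m "B: add a question"

git checkout -q A
echo "This is the typical test..." >> test
git commit -q -a -m "A: add a description"

before=$(git rev-parse HEAD)

git merge B || echo "merge stopped on a conflict"
git merge --abort                          # back to the state before the merge

git rebase B || echo "rebase stopped on a conflict"
git rebase --abort                         # back to the state before the rebase

after=$(git rev-parse HEAD)
echo "before: $before"
echo "after:  $after"
```

Both identifiers are the same and git status reports a clean working directory on branch A.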

In the case of a merge operation (contrary to a rebase operation), if the merge is completed but you change your mind, it is possible to revert or roll back the merge with the commands described in sections Reverting committed changes and Discarding last commits.

Collaboration with Other Developers#

One of the distinguishing features of DVCS, and Git in particular, is that it allows very flexible collaboration workflows between the developers of a project. Every developer has their own repository and controls the peering of this repository with other developers' repositories, on a branch by branch basis. Each developer has locally the full history of each of their own branches and of the peered branches. Peering is controlled by a Git object called a remote and two specific operations:

fetch: this is the action of updating the history of a remote in the local repository. This requires read access to the remote repository.
push: this is the action of pushing local changes to a peered remote branch. This requires write access to the remote repository.

Remotes#

A remote defines how to connect to a remote repository. It has an arbitrary name and is added to an existing repository with the following command:

git remote add remote_name remote_url

with:

remote_name: the name used to designate the remote
remote_url: the URL used to connect to the remote. The protocol is typically ssh: or https:. A URL example is: ssh://git@gitlab.in2p3.fr/jouvin/npac-documentation.git
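To experiment without a server, a remote can also point to another repository on the local file system. In the sketch below, the directory names and the remote name upstream are arbitrary choices, and a bare repository stands in for the one that would normally be hosted on another machine.

```shell
set -e
work=$(mktemp -d)

# A bare repository (no working directory) plays the role of the server
git init -q --bare "$work/shared.git"

# The local repository in which the remote is declared
git init -q "$work/project"
cd "$work/project"
git remote add upstream "$work/shared.git"

git remote -v                              # lists the fetch and push URLs
```

With a real server, only the URL changes: the rest of the command is identical.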

It is also possible to add the remote at repository creation time, that means to create the association between the local repository and another one when the local repository is created. This operation is referred to as cloning. The command to do it is:

git clone remote_url [directory]

with remote_url having the same value as in the git remote add command. This results in the creation of a repository with one remote configured, called origin and pointing to remote_url. The name of the created repository is directory if specified; otherwise it defaults to the name of the source repository (with the .git extension removed if present in the URL).

Note: most commands working with remotes use the remote origin by default, if the remote name is not explicitly specified.
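Cloning can be tried locally by using a directory path as remote_url. The following sketch (directory names, file contents and user identity are assumptions) creates a small source repository, clones it and lists the remotes of the clone.

```shell
set -e
work=$(mktemp -d)

# Source repository with one commit
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"
echo "Hello World!" > test
git add test
git commit -q -m "initial version"

# Clone it into a directory called copy
git clone -q "$work/source" "$work/copy"
cd "$work/copy"
git remote -v                              # a single remote called origin
git --no-pager log --oneline               # the full history is available
```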

Remote Branches#

Branches of remote repositories, called remote branches, have a name starting with remote_name/ followed by the remote branch name. These branches have some specificities: in particular, they are not meant to be checked out directly and thus it is not possible to add commits to them with the git commit command. Instead, they are used to create local branches (see Creating a New Branch) that are synchronized with them using specific commands (see Synchronizing repositories). A local branch associated with a remote branch is often called a tracking branch. When such a branch is created, Git explicitly says that the local branch will be a tracking branch, as in the following example:

% git checkout -b test origin/json
Branch test set up to track remote branch json from origin.

Remote branches have other unique features. They cannot be deleted with the usual git branch -d command: to delete them, the branch in the remote repository or the remote repository definition must be deleted. Management operations on remote branches are done with the git remote command: refer to the Git documentation for details.

Apart from these management operations, their contents and history can be examined with the usual Git commands.

Note: even though a remote branch is not intended to be checked out, it is not invalid to do it. In this case, Git warns that the working directory is in a detached HEAD state: this means that the branch can be examined but that new commits will not be part of any branch (and are easily lost). To exit this state, just do a git checkout of a normal branch or create a new branch based on the remote branch.
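The relationship between a remote branch and its tracking branch can be inspected with standard commands. The sketch below reuses the branch names of the example above (json and test); the repository layout, the file and the user identity are assumptions.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User"      # arbitrary identity for this sketch
export GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.org"

# Source repository with two branches: master and json
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master
echo "{}" > data.json
git add data.json
git commit -q -m "initial version"
git branch json

git clone -q "$work/source" "$work/copy"
cd "$work/copy"
git branch -r                              # remote branches, prefixed by origin/

# Create a local tracking branch from the remote branch
git checkout -q -b test origin/json

# Ask Git which remote branch the local branch test is tracking
tracked=$(git rev-parse --abbrev-ref 'test@{upstream}')
echo "test tracks $tracked"
```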

Synchronizing repositories#

Synchronizing repositories involves 3 different operations in Git, each one implemented by its own command:

Updating remote branches in the local repository to reflect the contents of the associated branches in the remote repository. This is done by the git fetch command. The remote branch is simply moved to the commit found in the remote repository: no merge is involved.
Updating the local tracking branch with the remote branch contents. This is done by a merge operation by default (see Merging changes between branches) and thus may result in conflicts (see Conflicts).
Pushing local changes to the remote repository: by default, the local commits are added to the branch in the remote repository as a fast-forward merge (see Merging changes between branches) and the push is rejected if this is not possible. This is done by the git push command and requires the ability to connect to the remote repository and write permission on it.

When updating a local branch with a remote repository contents, it is possible to combine the git fetch command and the associated merge operation in one Git command: git pull.
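The three operations can be observed with two clones of the same shared repository. This is only a sketch: the directory names alice and bob, the file contents and the user identity are assumptions, and a local bare repository replaces the remote server.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example User"      # arbitrary identity for this sketch
export GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.org"

# Shared repository, standing in for the one hosted on a server
git init -q --bare "$work/shared.git"
git -C "$work/shared.git" symbolic-ref HEAD refs/heads/master

# First developer: create the initial history and push it
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/shared.git"
echo "Hello World!" > test
git add test
git commit -q -m "initial version"
git push -q origin master

# Second developer: clone the shared repository
git clone -q "$work/shared.git" "$work/bob"

# First developer: push a new commit
echo "Anybody listening?" >> test
git commit -q -a -m "second version"
git push -q origin master

# Second developer: fetch updates origin/master only...
cd "$work/bob"
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
echo "local master is $behind commit(s) behind origin/master"

# ...and the merge updates the local tracking branch
git merge -q --ff-only origin/master
```

Replacing the last fetch and merge by a single git pull gives the same result in this situation.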

Good Practices#

Topical branches#

A typical and recommended workflow when working with Git (in fact, applicable to most DVCS) is to use a branch for any development and not to do any direct development in the master branch or other release/production branches. In this scenario the master branch (which is not required to exist but is typically the first branch created by default) is a branch that contains tested modifications/features and is supposed to work at any revision (commit). The development is done in other branches which, after being tested and ideally reviewed by others, are merged into the master branch (or other release/production branches like v1, v2 if relevant).

Here is an example of a typical workflow in a project, possibly with a remote reference repository, often called upstream (ignore the git fetch and git push commands if working only with one local repository):

# Update information about all remote branches
git fetch --all

# Update the local master to reflect the upstream master
# Any non fast-forward merge of upstream changes is considered an error
git checkout master
git merge --ff-only upstream/master

# Create a new topical branch for feature1
# Specifying upstream/master is optional if the previous commands succeeded
git checkout -b feature1 upstream/master

# Make modifications and commit them
edit ....
git add ...
git commit -m 'feature1 first mod'
edit ....
git add ...
git commit -m 'feature1 second mod'
# Run some tests
# Fix errors
git add ...
git commit -m 'feature1: fix mistakes in previous mods'

# Merge the tested changes from feature1 into the local copy of upstream/master
# See the section about conflicts if any occur during the merge of feature1
git checkout master
git fetch --all
git merge --ff-only upstream/master
git merge feature1

# Push changes to upstream master
# First check that the result will be what is expected: in particular
# check that the right remote is used in case you have several configured.
git push -n upstream master
git push upstream master

# If the push cannot be done without a forced update, be sure that this is what
# is expected (it should normally never happen with an upstream repository):
# it generally means that the previous merge was not done on an up-to-date
# master. Try to rebase your local master on the last version of the remote one.
git fetch --all
git rebase upstream/master

# Start to work on another feature: create a new branch
# Run git fetch before, if necessary.
# Edit, commit, test, fix...
git checkout -b feature2 upstream/master
edit ....
git add ...
git commit -m 'feature2: some mod'

# Merge the tested changes from feature2 into the local copy of upstream/master
# See the section about conflicts if any occur during the merge of feature2
git checkout master
git fetch --all
git merge --ff-only upstream/master
git merge feature2

# Improvement to original feature1 branch: either create a new branch following
# previous examples or reuse the original feature1 as suggested below and
# update it with all changes done in upstream/master since the last time
# we merged it. If not able to merge fast-forward, create a new branch.
# If necessary, run git fetch before.
git checkout feature1
git merge --ff-only upstream/master

# Do modifications as demonstrated before...
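The --ff-only option used throughout this workflow acts as a safety net: the merge is performed only if it is a fast-forward and fails otherwise. The sketch below shows both outcomes in a purely local repository; the branch names follow the workflow above, while the files, commit messages and user identity are assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "one" > file.txt
git add file.txt
git commit -q -m "first version"

# Case 1: master did not move while feature1 was developed
git checkout -q -b feature1
echo "feature1" > feature1.txt
git add feature1.txt
git commit -q -m "feature1 first mod"
git checkout -q master
git merge -q --ff-only feature1            # accepted: master just moves forward

# Case 2: master received a commit while feature2 was developed
git checkout -q -b feature2
echo "feature2" > feature2.txt
git add feature2.txt
git commit -q -m "feature2: some mod"
git checkout -q master
echo "two" >> file.txt
git commit -q -a -m "direct change on master"
if git merge --ff-only feature2; then
    ff_result="merged"
else
    ff_result="refused"
fi
echo "fast-forward merge of feature2 was $ff_result"
```

In the second case, a plain git merge feature2 would create a merge commit, which is precisely what --ff-only is meant to prevent.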

Repository backup#

It is important to back up a Git repository if the file system where it resides is not centrally backed up. The easiest way to do it is to set up a remote (see Collaboration with Other Developers section) on another machine (it is also possible to use services like GitHub, BitBucket, GitLab and others to host this remote repository but this is not covered in this documentation). Then doing a backup of the Git repository is nothing more than doing a git push (with the appropriate remote name if this is not origin). If you want to be sure that the remote is an exact mirror of your local repository, disabling checks for potential conflicts and ensuring that every local branch is mirrored, use:

git push --mirror [remote_name]
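A mirror backup can be rehearsed with a bare repository on the local file system. In this sketch, the remote name backup, the directory names, the branch and the tag are all assumptions; the bare repository would normally be located on another machine.

```shell
set -e
work=$(mktemp -d)

# Destination of the backup
git init -q --bare "$work/backup.git"

# Repository to back up, with two branches and a tag
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"
echo "Hello World!" > test
git add test
git commit -q -m "initial version"
git branch feature1
git tag v1.0

git remote add backup "$work/backup.git"
git push -q --mirror backup

# Branches and tags now present in the backup
git -C "$work/backup.git" for-each-ref --format='%(refname)' refs/heads refs/tags
```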

Advanced Features#

bisect#

The git bisect command is used to quickly locate the commit that introduced a problem. The principle is that you give the first (good) and last (bad) commits of a range of commits; Git then walks through the history, selecting one commit at a time (the commit in the middle of the good-bad range) and asking you to tag it as good or bad. Each selected commit is checked out and you can perform any test you want to decide whether it is good or bad, from code inspection to code execution. This command has many options: see git help bisect for details. A typical bisect workflow is given below:

git bisect start                # Start the bisect operation
git bisect bad                  # Define the current revision as bad
git bisect good some_commit     # Define a commit as good
# Git then checks out one revision in the range of commits

After testing the checked out commit, use git bisect good or git bisect bad (without a commit identifier) to qualify the current commit: this causes Git to select another commit until the end of the commit range. When the bisect operation is completed or to abort it and start a new one, use:

git bisect reset
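When the good or bad decision can be taken by a command, the qualification of each commit can be delegated to git bisect run, which uses the exit status of the command (0 for good, 1 for bad). The sketch below builds a history of eight commits where the fifth one introduces a line containing BUG; the file name, the marker and the user identity are assumptions made for the illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

# Eight commits; the problem appears in commit 5
for i in 1 2 3 4 5 6 7 8; do
    echo "line $i" >> code.txt
    if [ "$i" -eq 5 ]; then
        echo "BUG" >> code.txt
    fi
    git add code.txt
    git commit -q -m "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD) # the very first commit, known good

git bisect start
git bisect bad                             # the current revision is bad
git bisect good "$first"

# The test succeeds (good) when the file does not contain the marker
git bisect run sh -c '! grep -q BUG code.txt'

# Remember the commit identified by the bisect before leaving it
first_bad=$(git rev-parse refs/bisect/bad)
git --no-pager log -1 --format='first bad commit: %s' "$first_bad"
git bisect reset
```

With eight commits, only a few revisions are tested instead of all of them, which is the whole point of the bisection.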

Tags#

A tag is just a name attached to a commit to facilitate further reference to it. This is typically used in conjunction with the workflow suggested in the Topical branches section, where you have one branch reflecting the application history. Tags can be used, for example, to define versions: in this case the tag name is the version number (in whatever format is chosen for the application). A tag can be used anywhere a commit identifier is expected and, even if new commits are added to the branch after the tag has been defined, the tag continues to refer to the commit it is associated with. To create a tag, the command is:

git tag tag_name [commit_id]

where tag_name is the tag name (an arbitrary string, for example v2.0.1) and commit_id is, optionally, the commit to tag. If the commit is not specified, the HEAD of the current branch is tagged.

To display the list of tags already defined, use the git tag command without any arguments.

Note: when checking out a tag, the working directory is placed in a state called detached HEAD. If you make modifications based on this tag, first create a branch before committing them, with:

git checkout -b my_new_feature
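The following sketch puts these commands together: a commit is tagged, the branch moves on, and a new branch is then started from the tagged commit. The tag and branch names are those used above; the file, the commit messages and the user identity are assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Example User"        # arbitrary identity for this sketch
git config user.email "user@example.org"

echo "Hello World!" > test
git add test
git commit -q -m "release"
git tag v2.0.1                             # tag the HEAD of the current branch

echo "Anybody listening?" >> test
git commit -q -a -m "work after the release"

git tag                                    # list the defined tags
tagged=$(git rev-parse v2.0.1)             # still the commit tagged above

# Start new work from the tagged commit
git checkout -q v2.0.1                     # detached HEAD state
git checkout -q -b my_new_feature          # back on a regular branch
```

The tag keeps pointing to the release commit although master has received a new commit since then.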

Git Vocabulary#

branch: a history of modifications (commits).

clone: a repository clone is a repository initialized with the contents of another existing repository and with a remote defined pointing to the reference repository.

commit: all the file modifications that make a changeset.

fork: synonym for a clone. The term fork is particularly used to refer to personal clones based on a project repository in platforms like GitHub.

HEAD: the symbolic name referring to the most recent commit in the current branch (the tip of the current branch). HEAD can be used wherever a commit identifier is expected.

index: an area, distinct from the working directory, where changes are staged before being committed to the repository.

remote: a name referring to a remote Git repository that can be used to fetch, pull or push modifications.

repository: a database containing the whole history of a project. This is made of a set of files, stored in a .git directory.

tag: a symbolic name associated with a specific commit.

tip: the name used to refer to the most recent commit in a branch.

working directory: the directory that contains the files checked out from the repository. It is used to edit the files before committing them to the repository. Generally, the repository is stored in the .git subdirectory of the (top-level) working directory.

More Information#

git help: Git has very complete built-in documentation, accessible with git help for the general help and git help command for a specific command.
Git Book: this is the definitive documentation about Git, available in several European and Asian languages.
