Version control allows you to keep track of what you did when, undo any changes you have decided you don't want, and collaborate at scale with other people.
Strengths:
Nothing that is saved to Git is ever lost, so you can always go back to see which results were generated by which versions of your programs.
Git automatically notifies you when your work conflicts with someone else's, so it's harder (but not impossible) to accidentally overwrite work.
Git can synchronize work done by different people on different machines, so it scales as your team does.
Basic Workflow
Where does Git store information? (.git)
Git stores its extra information in a directory called .git, located in the root directory of the repository. Git expects this information to be laid out in a very precise way, so you should never edit or delete anything in .git.
How can I check the state of a repository? (git status)
To check the state of a repository, run git status, which displays a list of the files that have been modified since the last time changes were saved.
Remark. Use ls to list the files in your current working directory.
How can I tell what I have changed? (git diff filename)
Git has a staging area that holds the changes you intend to commit. git status shows you which files are in this staging area, and which files have changes that haven't yet been put there. To compare a file as it currently is with what you last saved, use git diff filename. Running git diff without any filenames shows all the changes in your repository, while git diff directory shows the changes to the files in that directory.
What is in a diff?
The output of git diff contains the following parts:
The command used to produce the output (in this case, diff --git). In it, a and b are placeholders meaning "the first version" and "the second version".
The lines --- a/data/nothern.csv and +++ b/data/nothern.csv, which indicate that lines being removed are prefixed with -, while lines being added are prefixed with +.
A line starting with @@ that tells where the changes are being made. The pairs of numbers are start line,number of lines. Here, the diff output shows that the 3 lines starting at line 22 in the first version become 4 lines in the second version.
A line-by-line listing of the changes, with - showing deletions and + showing additions. (We have also configured Git to show deletions in red and additions in green.) Lines that haven't changed are sometimes shown before and after the ones that have in order to give context; when they appear, they don't have either + or - in front of them. In this example, only 1 line was added, and the first 3 lines are there to give context.
What's the first step in saving changes? (git add filename)
To add a file to the staging area, use git add filename.
How can I tell what's going to be committed? (git diff -r HEAD)
To compare your files, including anything already staged, with the most recent commit, use git diff -r HEAD. The -r flag means "compare to a particular revision", and HEAD is a shortcut meaning "the most recent commit". You can restrict the results to a single file or directory using git diff -r HEAD path/to/file.
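The commands in this section can be tried out in a throwaway repository. The following is a minimal sketch, not part of the original notes: the temporary directory, the file report.txt, and its contents are all made up for illustration.

```shell
# Throwaway repository; the file name and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"

# Edit the file without staging it: git status lists it, git diff shows the change.
printf 'second line\n' >> report.txt
git status
unstaged=$(git diff report.txt)

# Stage the change: plain git diff is now empty for this file...
git add report.txt
after_add=$(git diff report.txt)

# ...but comparing against the most recent commit still shows it.
against_head=$(git diff -r HEAD report.txt)
printf '%s\n' "$against_head"
```

The variables only capture output so that the three states can be compared; in day-to-day use you would simply run the commands and read what they print.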
How do I commit changes? (git commit)
To save the changes in the staging area, you use the command git commit. It always saves everything that is in the staging area as one unit: as you will see later, when you want to undo changes to a project, you undo all of a commit or none of it.
When you commit changes, Git requires you to enter a log message. This serves the same purpose as a comment in a program: it tells the next person to examine the repository why you made a change.
Writing a better log message. git commit -m "message" is good enough for very small changes, but your collaborators will appreciate more information. If you run git commit without -m "message", Git launches a text editor with a template. The lines in it starting with # are comments, and won't be saved.
How can I view a repository's history or a specific file's history? (git log)
Repository's history. The command git log is used to view the log of the project's history. Log entries are shown most recent first; each one gives the commit's hash, its author, its date, and its log message.
Specific file's history. You can do this using git log path, where path is the path to a specific file or directory.
Remark.
Passing - then a number restricts the output to that many commits. For example, git log -3 report.txt shows you the last three commits involving report.txt.
The log for a file shows the commits that changed that file; the log for a directory shows the commits that changed any file in that directory, including commits that added or deleted files there.
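As a rough illustration of restricting the log to one file, the sketch below builds a small history and then asks for the last three commits involving report.txt. The repository, file names, and log messages are invented for the example, and --format=%s (which prints only the first line of each log message) is an extra option used here to keep the output short.

```shell
# Throwaway repository; all names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

# Four commits that change report.txt...
for draft in 1 2 3 4; do
    printf 'draft %s\n' "$draft" > report.txt
    git add report.txt
    git commit -q -m "Revise report, draft $draft"
done

# ...and one commit that does not touch it.
printf 'todo\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Whole history, most recent first.
git log

# Only the last three commits involving report.txt.
recent=$(git log -3 --format=%s report.txt)
printf '%s\n' "$recent"
```

The commit that only added notes.txt does not appear in the second listing, because it did not change report.txt.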
Interlude: how can I edit a file?
Repositories
How can I view a specific commit? (git show)
To view the details of a specific commit, use git show with the first few characters of the commit's hash, for example git show 0da2f7. The first part of the output is the same as the log entry shown by git log. The second part shows the changes; as with git diff, lines that the change removed are prefixed with -, while lines that it added are prefixed with +.
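A small sketch of viewing one commit by an abbreviated hash follows. It assumes a throwaway repository with made-up contents, and it uses git rev-parse --short HEAD to obtain the abbreviated hash that you would normally copy from the log by hand.

```shell
# Throwaway repository; the file and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"

printf 'second line\n' >> report.txt
git add report.txt
git commit -q -m "Extend report"

# Abbreviated hash of the latest commit, as you would copy it from the log.
short=$(git rev-parse --short HEAD)

# Log entry first, then the changes with - and + prefixes.
details=$(git show "$short")
printf '%s\n' "$details"
```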
What is Git's equivalent of a relative path? (HEAD~1)
The special label HEAD always refers to the most recent commit. The label HEAD~1 then refers to the commit before it, while HEAD~2 refers to the commit before that, and so on.
How can I see who changed what in a file? (git annotate file)
git log displays the overall history of a project or file, but Git can give even more information: the command git annotate file shows who made the last change to each line of a file and when. For example, git annotate report.txt prints one line of output for each line of report.txt, starting with the hash of the commit that last changed it, followed by the author and the time of that change.
How can I see what changed between two commits? (git diff ID1..ID2)
git show with a commit ID shows the changes made in a particular commit. To see the changes between two commits, you can use git diff ID1..ID2, where ID1 and ID2 identify the two commits you're interested in, and the connector .. is a pair of dots. For example, git diff abc123..def456 shows the differences between the commits abc123 and def456, while git diff HEAD~1..HEAD~3 shows the differences between the state of the repository one commit in the past and its state three commits in the past.
How do I add new files?
Git does not track new files automatically: you must use git add at least once before it starts paying attention to a file.
How do I tell Git to ignore certain files? (.gitignore)
You can tell Git to ignore files by creating a file in the root directory of your repository called .gitignore and storing a list of wildcard patterns that specify the files you don't want Git to pay attention to. For example, if .gitignore contains:
build
*.mpl
then Git will ignore any file or directory called build (and, if it's a directory, anything in it), as well as any file whose name ends in .mpl.
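The effect of these two patterns can be checked in a scratch repository. In this sketch the files build/output.o, figure.mpl, and report.txt are invented for the example; only the .gitignore contents come from the text above.

```shell
# Scratch repository with the two patterns from the text.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'build\n*.mpl\n' > .gitignore

# Hypothetical files: two that match the patterns, one that does not.
mkdir build
printf 'binary\n' > build/output.o
printf 'plot\n' > figure.mpl
printf 'text\n' > report.txt

# Only files Git is paying attention to are listed.
visible=$(git status --short)
printf '%s\n' "$visible"
```

The build directory and figure.mpl are absent from the listing, while .gitignore itself and report.txt show up as untracked files.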
How can I remove unwanted files? (git clean)
Git can help you clean up files that you have told it you don't want. The command git clean -n will show you a list of files that are in the repository, but whose history Git is not currently tracking. A similar command, git clean -f, will then delete those files.
Use this command carefully: git clean only works on untracked files, so by definition, their history has not been saved. If you delete them with git clean -f, they're gone for good.
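A cautious way to use these two commands is to preview first and delete second. The sketch below assumes a throwaway repository in which report.txt is tracked and backup.log is a stray untracked file; both names are made up.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"

# An untracked file that we do not want to keep.
printf 'scratch\n' > backup.log

# Dry run: list what would be deleted without deleting anything.
preview=$(git clean -n)
printf '%s\n' "$preview"

# Actually delete the untracked file. This cannot be undone.
git clean -f
```

Tracked files such as report.txt are left alone; only backup.log is removed.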
Git configuration
To see what the settings are, you can use the command git config --list with one of three additional options:
--system: settings for every user on this computer.
--global: settings for every one of your projects.
--local: settings for one specific project.
How can I change my Git configuration? (git config --global setting.name setting.value)
To change a setting for all of your projects, run git config --global setting.name setting.value with the setting's name and value in the appropriate places. The keys that identify your name and email address are user.name and user.email respectively.
Ex. Change the email address (user.email) configured for the current user for all projects to rep.loop@datacamp.com.
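The same pattern works at every level. As a sketch that avoids touching your real settings, the example below sets user.email with --local inside a scratch repository instead of with --global; swapping the flag is the only difference from the exercise.

```shell
# Scratch repository, so that no global settings are modified.
repo=$(mktemp -d)
cd "$repo"
git init -q

# --local stores the value in this repository's .git/config only.
git config --local user.email "rep.loop@datacamp.com"

# Read the value back, then list every local setting.
email=$(git config --local user.email)
printf '%s\n' "$email"
git config --list --local
```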
Undo
How can I commit changes selectively? (git reset HEAD)
Suppose you are working in analysis.R and spot a bug in cleanup.R. After you have fixed it, you want to save your work. Since the changes to cleanup.R aren't directly related to the work you're doing in analysis.R, you should save your work in two separate commits. To stage only the changes to one file, use git add path/to/file. If you accidentally stage a file you shouldn't have, unstage it with git reset HEAD and try again.
How do I re-stage files?
If you keep editing a file after staging it, run git add periodically to save the most recent changes to that file to the staging area.
How can I undo changes to unstaged files? (git checkout)
The command git checkout -- filename will discard the changes that have not yet been staged. (The double dash -- must be there to separate the git checkout command from the names of the file or files you want to recover.)
Use this command carefully: once you discard changes in this way, they are gone forever.
Ex.
How do I undo changes to staged files?
At the start of this chapter you saw that git reset will unstage files that you previously staged using git add. By combining git reset with git checkout, you can undo changes to a file that you staged changes to. The syntax is as follows.
git reset HEAD path/to/file
git checkout -- path/to/file
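The two-step undo can be sketched end to end in a throwaway repository. The file report.txt and its contents are assumptions made for the example.

```shell
# Throwaway repository; the file and its contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"

# Make a change and stage it, then decide it was a mistake.
printf 'unwanted line\n' >> report.txt
git add report.txt

# Step 1: take the change out of the staging area (the file itself is untouched).
git reset HEAD report.txt

# Step 2: discard the now-unstaged change in the file.
git checkout -- report.txt

cat report.txt
git status --short
```

After both steps the file matches the last commit again and git status reports nothing to stage or commit.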
How do I restore an old version of a file?
git checkout can also be used to go back even further into a file's history and restore versions of that file from a commit. For example, if git log shows this:
commit ab8883e8a6bfa873d44616a0f356125dbaccd9ea
Author: Rep Loop
Date:   Thu Oct 19 09:37:48 2017 -0400

    Adding graph to show latest quarterly results.

commit 2242bd761bbeafb9fc82e33aa5dad966adfe5409
Author: Rep Loop
Date:   Thu Oct 16 09:17:37 2017 -0400

    Modifying the bibliography format.

then git checkout 2242bd report.txt would replace the current version of report.txt with the version that was committed on October 16. Notice that this is the same syntax that you used to undo the unstaged changes, except -- has been replaced by a hash.
Remark. Restoring a file doesn't erase any of the repository's history. Instead, once you commit the restored file, the act of restoring it is saved as another commit, because you might later want to undo your undoing.
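The following sketch restores an older version of a file and records the restoration as a new commit. The repository and file contents are invented; the abbreviated hash is obtained with git rev-parse --short HEAD rather than copied from the log.

```shell
# Throwaway repository; contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'old bibliography\n' > report.txt
git add report.txt
git commit -q -m "Modifying the bibliography format."
old=$(git rev-parse --short HEAD)

printf 'new graph\n' > report.txt
git add report.txt
git commit -q -m "Adding graph to show latest quarterly results."

# Bring back the version of report.txt stored in the older commit.
git checkout "$old" report.txt

# The restored version is already staged; committing records the restoration.
git commit -q -m "Restore report.txt from $old"

cat report.txt
git log --oneline
```

The history now has three commits: nothing was erased, and the restoration itself can be undone later in the same way.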
How can I undo all of the changes I have made?
So far, git reset and git checkout have been applied to one file at a time, but you will sometimes want to undo changes to many files. One way to do this is to give git reset a directory. For example, git reset HEAD data will unstage any files from the data directory. Even better, if you don't provide any files or directories, it will unstage everything. Better still, HEAD is the default commit to unstage, so you can simply write git reset to unstage everything.
Similarly, git checkout -- data will then restore the files in the data directory to their previous state. You can't leave the file argument completely blank, but recall that you can refer to the current directory as . (a single dot). So git checkout -- . will revert all files in the current directory.
Ex. We want to remove all files from the staging area and put them back in their previous state. After this, the files still exist but are unstaged.
Working with branches
Branching is one of Git's most powerful features, since it allows you to work on several things at once without tripping over yourself.
How can I see what branches my repository has? (git branch)
By default, every Git repository has a branch called master (which is why you have been seeing that word in Git's output in previous lessons). To list all of the branches in a repository, you can run the command git branch. The branch you are currently in will be shown with a * beside its name.
How can I view the differences between branches? (git diff branch-1..branch-2)
Branches and revisions are closely connected: just as git diff revision-1..revision-2 shows the difference between two versions of a repository, git diff branch-1..branch-2 shows the difference between two branches.
How can I switch from one branch to another? (git checkout)
You can use git checkout with the name of a branch to switch to that branch.
How can I create a branch? (git checkout -b branch-name)
You might expect that you would use git branch to create a branch, and indeed this is possible. However, the most common thing you want to do is to create a branch then switch to that branch.
In the previous exercise, you used git checkout branch-name to switch to a branch. To create a branch then switch to it in one step, you add a -b flag, calling git checkout -b branch-name.
The contents of the new branch are initially identical to the contents of the original. Once you start making changes, they only affect the new branch.
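The branch commands above fit together as in the sketch below. The branch name summary-statistics and the files are hypothetical, and because the name of the initial branch depends on how Git is configured, the script reads it with git symbolic-ref --short HEAD instead of assuming master.

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"

# Name of the branch we start on (master by default in many setups).
original=$(git symbolic-ref --short HEAD)

# Create a branch and switch to it in one step.
git checkout -q -b summary-statistics
printf 'mean: 4.2\n' > summary.txt
git add summary.txt
git commit -q -m "Add summary statistics"

# List branches; the current one is marked with *.
branches=$(git branch)
printf '%s\n' "$branches"

# Differences between the two branches.
git diff "$original"..summary-statistics

# Switch back: the new file exists only on the new branch.
git checkout -q "$original"
ls
```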
How can I merge two branches? (git merge)
Branching lets you create parallel universes; merging is how you bring them back together. When you merge one branch (call it the source) into another (call it the destination), Git incorporates the changes made to the source branch into the destination branch. If those changes don't overlap, the result is a new commit in the destination branch that includes everything from the source branch. (The next sections describe what happens if there are conflicts.)
To merge two branches, switch to the destination branch with git checkout destination, then run git merge source. When the merge needs a new commit, Git automatically opens an editor so that you can write a log message for the merge; you can either keep its default message or fill in something more informative.
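A merge without conflicts might look like the sketch below. The branch name source-branch and the files are made up, the destination is whatever branch the repository starts on, and -m supplies the merge's log message on the command line so that no editor opens while the script runs.

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"
destination=$(git symbolic-ref --short HEAD)

# Work on the source branch.
git checkout -q -b source-branch
printf 'mean: 4.2\n' > summary.txt
git add summary.txt
git commit -q -m "Add summary"

# Unrelated work on the destination branch.
git checkout -q "$destination"
printf 'todo\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Merge the source into the current (destination) branch.
git merge -q -m "Merge source-branch into $destination" source-branch

ls
git log --oneline
```

Because the two branches changed different files, the merge succeeds on its own and produces one extra commit that combines both lines of work.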
What are conflicts?
Sometimes the changes in two branches will conflict with each other: for example, bug fixes might touch the same lines of code, or analyses in two different branches may both append new (and different) records to a summary data file. In this case, Git relies on you to reconcile the conflicting changes.
How can I merge two branches with conflicts?
When there is a conflict during a merge, Git tells you that there's a problem, and running git status after the merge reminds you which files have conflicts that you need to resolve by printing both modified: beside the files' names.
Inside the file, Git leaves markers that look like this to tell you where the conflicts occurred:
<<<<<<< destination-branch-name
...changes from the destination branch...
=======
...changes from the source branch...
>>>>>>> source-branch-name
(In many cases, the destination branch name will be HEAD, because you will be merging into the current branch.) To resolve the conflict, edit the file to remove the markers and make whatever other changes are needed to reconcile the changes, then commit those changes.
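To see the markers and the resolution steps in practice, the sketch below deliberately creates a conflict by changing the same line on two branches. The branch name source-branch, the file, and its contents are hypothetical, and the "edit" that resolves the conflict is done by overwriting the file so that the script can run unattended.

```shell
# Throwaway repository; branch, file, and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"

printf 'status: draft\n' > report.txt
git add report.txt
git commit -q -m "Add report"
destination=$(git symbolic-ref --short HEAD)

# Change the line on the source branch...
git checkout -q -b source-branch
printf 'status: final\n' > report.txt
git add report.txt
git commit -q -m "Mark report as final"

# ...and change the same line differently on the destination branch.
git checkout -q "$destination"
printf 'status: in review\n' > report.txt
git add report.txt
git commit -q -m "Mark report as in review"

# The merge stops and reports the conflict (non-zero exit status).
git merge source-branch || echo "merge stopped on a conflict"
git status

# The file now contains the conflict markers.
conflicted=$(cat report.txt)
printf '%s\n' "$conflicted"

# Resolve by writing the reconciled content, then stage and commit.
printf 'status: final\n' > report.txt
git add report.txt
git commit -q -m "Merge source-branch, keeping the final status"
```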
Ex.
Collaborating
This chapter shows Git's other greatest feature: how you can share changes between repositories to collaborate at scale.
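How can I create a brand new repository? (git init project-name)
To create a repository for a new project, run git init project-name, where "project-name" is the name you want the new repository's root directory to have.
Do not create one Git repository inside another: updating nested repositories quickly becomes complicated, because you need to tell Git which of the two .git directories the update is to be stored in.
How can I turn an existing project into a Git repository?
You can do this by running git init in the project's root directory, or git init /path/to/project from anywhere else on your computer.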
Ex.
Remark. After initializing the folder as a repository, Git immediately notices that there are a number of changes that can be staged (and afterwards, committed).
How can I create a copy of an existing repository? (git clone URL)
Sometimes you will join a project that is already running, inherit a project from someone else, or continue working on one of your own projects on a new machine. In each case, you will clone an existing repository instead of creating a new one. Cloning a repository does exactly what the name suggests: it creates a copy of an existing repository (including all of its history) in a new directory.
To clone a repository, use the command git clone URL, where URL identifies the repository you want to clone. This will normally be something like
https://github.com/datacamp/project.git
When you clone a repository, Git uses the name of the existing repository as the name of the clone's root directory: for example,
git clone /existing/project
will create a new directory called project. If you want to call the clone something else, add the directory name you want to the command:
git clone /existing/project newprojectname
How can I find out where a cloned repository originated? (git remote)
When you clone a repository, Git remembers where the original repository was. It does this by storing a remote in the new repository's configuration.
If you use an online Git repository hosting service like GitHub or Bitbucket, a common task is to clone a repository from that site to work locally on your computer. The copy on the website is then the remote.
If you are in a repository, you can list the names of its remotes using git remote.
If you want more information, you can use git remote -v (for "verbose"), which shows the remote's URLs.
Remark. When you clone a repository, Git automatically creates a remote called origin that points to the original repository.
How can I define remotes?
You can add more remotes using:
git remote add remote-name URL
and remove existing ones using:
git remote rm remote-name
How can I pull in changes from a remote repository? (git pull remote branch)
Recall that the remote repository is often a repository in an online hosting service like GitHub. A typical workflow is that you pull in your collaborators' work from the remote repository so you have the latest version of everything, do some work yourself, then push your work back to the remote so that your collaborators have access to it.
Pulling changes is straightforward: the command git pull remote branch gets everything in branch in the remote repository identified by remote and merges it into the current branch of your local repository. For example, if you are in the quarterly-report branch of your local repository, the command:
git pull thunk latest-analysis
would get changes from the latest-analysis branch in the repository associated with the remote called thunk and merge them into your quarterly-report branch.
How can I push my changes to a remote repository? (git push)
The complement of git pull is git push, which pushes the changes you have made locally into a remote repository. The most common way to use it is:
git push remote-name branch-name
which pushes the contents of your branch branch-name into a branch with the same name in the remote repository associated with remote-name. It's possible to use different branch names at your end and the remote's end, but doing this quickly becomes confusing: it's almost always better to use the same names for branches across repositories.
What happens if my push conflicts with someone else's work?
Overwriting your own work by accident is bad; overwriting someone else's is worse.
To prevent this happening, Git does not allow you to push changes to a remote repository unless you have merged the contents of the remote repository into your own work.
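The whole clone, push, and pull cycle can be rehearsed on one machine. In this sketch a bare repository called shared.git stands in for a hosting service, and two clones named alice and bob play the collaborators; all of these names are hypothetical, and the branch name is read from the clone rather than assumed.

```shell
base=$(mktemp -d)
cd "$base"

# A bare repository stands in for a hosting service such as GitHub.
git init -q --bare shared.git

# First collaborator clones it; the remote is automatically called origin.
git clone -q shared.git alice
cd alice
git config user.name "Rep Loop"
git config user.email "rep.loop@datacamp.com"
git remote -v

printf 'first line\n' > report.txt
git add report.txt
git commit -q -m "Add report"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Second collaborator clones the shared repository.
cd "$base"
git clone -q shared.git bob

# First collaborator pushes more work...
cd "$base/alice"
printf 'second line\n' >> report.txt
git add report.txt
git commit -q -m "Extend report"
git push -q origin "$branch"

# ...and the second collaborator pulls it in.
cd "$base/bob"
git pull -q origin "$branch"
cat report.txt
```

If bob had made commits of his own before alice's second push, his push would be refused until he had pulled and merged her work, which is the safeguard described above.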