epubjs
Version:
Render ePub documents in the browser, across many devices
269 lines (209 loc) • 17 kB
HTML
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Pro Git - professional version control</title>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/>
<link href="stylesheet.css" type="text/css" rel="stylesheet"/>
<style type="text/css">
@page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style>
</head>
<body class="calibre">
<h2 class="calibre4" id="calibre_pb_54">Submodules</h2>
<p class="calibre3">It often happens that while working on one project, you need to use another project from within it. Perhaps it's a library that a third party developed or that you're developing separately and using in multiple parent projects. A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other.</p>
<p class="calibre3">Here's an example. Suppose you're developing a web site and creating Atom feeds. Instead of writing your own Atom-generating code, you decide to use a library. You're likely to have to either include this code from a shared library like a CPAN install or Ruby gem, or copy the source code into your own project tree. The issue with including the library is that it's difficult to customize the library in any way and often more difficult to deploy it, because you need to make sure every client has that library available. The issue with vendoring the code into your own project is that any custom changes you make are difficult to merge when upstream changes become available.</p>
<p class="calibre3">Git addresses this issue using submodules. Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project and keep your commits separate.</p>
<h3 class="calibre5">Starting with Submodules</h3>
<p class="calibre3">Suppose you want to add the Rack library (a Ruby web server gateway interface) to your project, possibly maintain your own changes to it, but continue to merge in upstream changes. The first thing you should do is clone the external repository into your subdirectory. You add external projects as submodules with the <code class="calibre10">git submodule add</code> command:</p>
<pre class="calibre9"><code class="calibre10">$ git submodule add git://github.com/chneukirchen/rack.git rack
Initialized empty Git repository in /opt/subtest/rack/.git/
remote: Counting objects: 3181, done.
remote: Compressing objects: 100% (1534/1534), done.
remote: Total 3181 (delta 1951), reused 2623 (delta 1603)
Receiving objects: 100% (3181/3181), 675.42 KiB | 422 KiB/s, done.
Resolving deltas: 100% (1951/1951), done.
</code></pre>
<p class="calibre3">Now you have the Rack project under a subdirectory named <code class="calibre10">rack</code> within your project. You can go into that subdirectory, make changes, add your own writable remote repository to push your changes into, fetch and merge from the original repository, and more. If you run <code class="calibre10">git status</code> right after you add the submodule, you see two things:</p>
<pre class="calibre9"><code class="calibre10">$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: .gitmodules
# new file: rack
#
</code></pre>
<p class="calibre3">First you notice the <code class="calibre10">.gitmodules</code> file. This is a configuration file that stores the mapping between the project's URL and the local subdirectory you've pulled it into:</p>
<pre class="calibre9"><code class="calibre10">$ cat .gitmodules
[submodule "rack"]
path = rack
url = git://github.com/chneukirchen/rack.git
</code></pre>
<p class="calibre3">If you have multiple submodules, you'll have multiple entries in this file. It's important to note that this file is version-controlled with your other files, like your <code class="calibre10">.gitignore</code> file. It's pushed and pulled with the rest of your project. This is how other people who clone this project know where to get the submodule projects from.</p>
<p class="calibre3">The other listing in the <code class="calibre10">git status</code> output is the rack entry. If you run <code class="calibre10">git diff</code> on that, you see something interesting:</p>
<pre class="calibre9"><code class="calibre10">$ git diff --cached rack
diff --git a/rack b/rack
new file mode 160000
index 0000000..08d709f
--- /dev/null
+++ b/rack
@@ -0,0 +1 @@
+Subproject commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433
</code></pre>
<p class="calibre3">Although <code class="calibre10">rack</code> is a subdirectory in your working directory, Git sees it as a submodule and doesn't track its contents when you're not in that directory. Instead, Git records it as a particular commit from that repository. When you make changes and commit in that subdirectory, the superproject notices that the HEAD there has changed and records the exact commit you're currently working off of; that way, when others clone this project, they can re-create the environment exactly.</p>
<p class="calibre3">This is an important point with submodules: you record them as the exact commit they're at. You can't record a submodule at <code class="calibre10">master</code> or some other symbolic reference.</p>
<p class="calibre3">When you commit, you see something like this:</p>
<pre class="calibre9"><code class="calibre10">$ git commit -m 'first commit with submodule rack'
[master 0550271] first commit with submodule rack
2 files changed, 4 insertions(+), 0 deletions(-)
create mode 100644 .gitmodules
create mode 160000 rack
</code></pre>
<p class="calibre3">Notice the 160000 mode for the rack entry. That is a special mode in Git that basically means you're recording a commit as a directory entry rather than a subdirectory or a file.</p>
<p class="calibre3">You can treat the <code class="calibre10">rack</code> directory as a separate project and then update your superproject from time to time with a pointer to the latest commit in that subproject. All the Git commands work independently in the two directories:</p>
<pre class="calibre9"><code class="calibre10">$ git log -1
commit 0550271328a0038865aad6331e620cd7238601bb
Author: Scott Chacon <schacon@gmail.com>
Date: Thu Apr 9 09:03:56 2009 -0700
first commit with submodule rack
$ cd rack/
$ git log -1
commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433
Author: Christian Neukirchen <chneukirchen@gmail.com>
Date: Wed Mar 25 14:49:04 2009 +0100
Document version change
</code></pre>
<h3 class="calibre5">Cloning a Project with Submodules</h3>
<p class="calibre3">Here you'll clone a project with a submodule in it. When you receive such a project, you get the directories that contain submodules, but none of the files yet:</p>
<pre class="calibre9"><code class="calibre10">$ git clone git://github.com/schacon/myproject.git
Initialized empty Git repository in /opt/myproject/.git/
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), done.
$ cd myproject
$ ls -l
total 8
-rw-r--r-- 1 schacon admin 3 Apr 9 09:11 README
drwxr-xr-x 2 schacon admin 68 Apr 9 09:11 rack
$ ls rack/
$
</code></pre>
<p class="calibre3">The <code class="calibre10">rack</code> directory is there, but empty. You must run two commands: <code class="calibre10">git submodule init</code> to initialize your local configuration file, and <code class="calibre10">git submodule update</code> to fetch all the data from that project and check out the appropriate commit listed in your superproject:</p>
<pre class="calibre9"><code class="calibre10">$ git submodule init
Submodule 'rack' (git://github.com/chneukirchen/rack.git) registered for path 'rack'
$ git submodule update
Initialized empty Git repository in /opt/myproject/rack/.git/
remote: Counting objects: 3181, done.
remote: Compressing objects: 100% (1534/1534), done.
remote: Total 3181 (delta 1951), reused 2623 (delta 1603)
Receiving objects: 100% (3181/3181), 675.42 KiB | 173 KiB/s, done.
Resolving deltas: 100% (1951/1951), done.
Submodule path 'rack': checked out '08d709f78b8c5b0fbeb7821e37fa53e69afcf433'
</code></pre>
<p class="calibre3">Now your <code class="calibre10">rack</code> subdirectory is at the exact state it was in when you committed earlier. If another developer makes changes to the rack code and commits, and you pull that reference down and merge it in, you get something a bit odd:</p>
<pre class="calibre9"><code class="calibre10">$ git merge origin/master
Updating 0550271..85a3eee
Fast forward
rack | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
[master*]$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: rack
#
</code></pre>
<p class="calibre3">You merged in what is basically a change to the pointer for your submodule; but it doesn't update the code in the submodule directory, so it looks like you have a dirty state in your working directory:</p>
<pre class="calibre9"><code class="calibre10">$ git diff
diff --git a/rack b/rack
index 6c5e70b..08d709f 160000
--- a/rack
+++ b/rack
@@ -1 +1 @@
-Subproject commit 6c5e70b984a60b3cecd395edd5b48a7575bf58e0
+Subproject commit 08d709f78b8c5b0fbeb7821e37fa53e69afcf433
</code></pre>
<p class="calibre3">This is the case because the pointer you have for the submodule isn't what is actually in the submodule directory. To fix this, you must run <code class="calibre10">git submodule update</code> again:</p>
<pre class="calibre9"><code class="calibre10">$ git submodule update
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 2 (delta 0)
Unpacking objects: 100% (3/3), done.
From git@github.com:schacon/rack
08d709f..6c5e70b master -> origin/master
Submodule path 'rack': checked out '6c5e70b984a60b3cecd395edd5b48a7575bf58e0'
</code></pre>
<p class="calibre3">You have to do this every time you pull down a submodule change in the main project. It's strange, but it works.</p>
<p class="calibre3">One common problem happens when a developer makes a change locally in a submodule but doesn't push it to a public server. Then, they commit a pointer to that non-public state and push up the superproject. When other developers try to run <code class="calibre10">git submodule update</code>, the submodule system can't find the commit that is referenced, because it exists only on the first developer's system. If that happens, you see an error like this:</p>
<pre class="calibre9"><code class="calibre10">$ git submodule update
fatal: reference isn't a tree: 6c5e70b984a60b3cecd395edd5b48a7575bf58e0
Unable to checkout '6c5e70b984a60b3cecd395edd5ba7575bf58e0' in submodule path 'rack'
</code></pre>
<p class="calibre3">You have to see who last changed the submodule:</p>
<pre class="calibre9"><code class="calibre10">$ git log -1 rack
commit 85a3eee996800fcfa91e2119372dd4172bf76678
Author: Scott Chacon <schacon@gmail.com>
Date: Thu Apr 9 09:19:14 2009 -0700
added a submodule reference I will never make public. hahahahaha!
</code></pre>
<p class="calibre3">Then, you e-mail that guy and yell at him.</p>
<h3 class="calibre5">Superprojects</h3>
<p class="calibre3">Sometimes, developers want to get a combination of a large project's subdirectories, depending on what team they're on. This is common if you're coming from CVS or Subversion, where you've defined a module or collection of subdirectories, and you want to keep this type of workflow.</p>
<p class="calibre3">A good way to do this in Git is to make each of the subfolders a separate Git repository and then create superproject Git repositories that contain multiple submodules. A benefit of this approach is that you can more specifically define the relationships between the projects with tags and branches in the superprojects.</p>
<h3 class="calibre5">Issues with Submodules</h3>
<p class="calibre3">Using submodules isn't without hiccups, however. First, you must be relatively careful when working in the submodule directory. When you run <code class="calibre10">git submodule update</code>, it checks out the specific version of the project, but not within a branch. This is called having a detached head - it means the HEAD file points directly to a commit, not to a symbolic reference. The issue is that you generally don't want to work in a detached head environment, because it's easy to lose changes. If you do an initial <code class="calibre10">submodule update</code>, commit in that submodule directory without creating a branch to work in, and then run <code class="calibre10">git submodule update</code> again from the superproject without committing in the meantime, Git will overwrite your changes without telling you. Technically you won't lose the work, but you won't have a branch pointing to it, so it will be somewhat difficult to retrieve.</p>
<p class="calibre3">To avoid this issue, create a branch when you work in a submodule directory with <code class="calibre10">git checkout -b work</code> or something equivalent. When you do the submodule update a second time, it will still revert your work, but at least you have a pointer to get back to.</p>
<p class="calibre3">Switching branches with submodules in them can also be tricky. If you create a new branch, add a submodule there, and then switch back to a branch without that submodule, you still have the submodule directory as an untracked directory:</p>
<pre class="calibre9"><code class="calibre10">$ git checkout -b rack
Switched to a new branch "rack"
$ git submodule add git@github.com:schacon/rack.git rack
Initialized empty Git repository in /opt/myproj/rack/.git/
...
Receiving objects: 100% (3184/3184), 677.42 KiB | 34 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.
$ git commit -am 'added rack submodule'
[rack cc49a69] added rack submodule
2 files changed, 4 insertions(+), 0 deletions(-)
create mode 100644 .gitmodules
create mode 160000 rack
$ git checkout master
Switched to branch "master"
$ git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# rack/
</code></pre>
<p class="calibre3">You have to either move it out of the way or remove it, in which case you have to clone it again when you switch back-and you may lose local changes or branches that you didn't push up.</p>
<p class="calibre3">The last main caveat that many people run into involves switching from subdirectories to submodules. If you've been tracking files in your project and you want to move them out into a submodule, you must be careful or Git will get angry at you. Assume that you have the rack files in a subdirectory of your project, and you want to switch it to a submodule. If you delete the subdirectory and then run <code class="calibre10">submodule add</code>, Git yells at you:</p>
<pre class="calibre9"><code class="calibre10">$ rm -Rf rack/
$ git submodule add git@github.com:schacon/rack.git rack
'rack' already exists in the index
</code></pre>
<p class="calibre3">You have to unstage the <code class="calibre10">rack</code> directory first. Then you can add the submodule:</p>
<pre class="calibre9"><code class="calibre10">$ git rm -r rack
$ git submodule add git@github.com:schacon/rack.git rack
Initialized empty Git repository in /opt/testsub/rack/.git/
remote: Counting objects: 3184, done.
remote: Compressing objects: 100% (1465/1465), done.
remote: Total 3184 (delta 1952), reused 2770 (delta 1675)
Receiving objects: 100% (3184/3184), 677.42 KiB | 88 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.
</code></pre>
<p class="calibre3">Now suppose you did that in a branch. If you try to switch back to a branch where those files are still in the actual tree rather than a submodule - you get this error:</p>
<pre class="calibre9"><code class="calibre10">$ git checkout master
error: Untracked working tree file 'rack/AUTHORS' would be overwritten by merge.
</code></pre>
<p class="calibre3">You have to move the <code class="calibre10">rack</code> submodule directory out of the way before you can switch to a branch that doesn't have it:</p>
<pre class="calibre9"><code class="calibre10">$ mv rack /tmp/
$ git checkout master
Switched to branch "master"
$ ls
README rack
</code></pre>
<p class="calibre3">Then, when you switch back, you get an empty <code class="calibre10">rack</code> directory. You can either run <code class="calibre10">git submodule update</code> to reclone, or you can move your <code class="calibre10">/tmp/rack</code> directory back into the empty directory.</p>
</body>
</html>