<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Pro Git - professional version control</title>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/>
<link href="stylesheet.css" type="text/css" rel="stylesheet"/>
<style type="text/css">
@page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style>
</head>
<body class="calibre">
<h2 class="calibre4" id="calibre_pb_76">Transfer Protocols</h2>
<p class="calibre3">Git can transfer data between two repositories in two major ways: over HTTP and via the so-called smart protocols used in the <code class="calibre10">file://</code>, <code class="calibre10">ssh://</code>, and <code class="calibre10">git://</code> transports. This section will quickly cover how these two main protocols operate.</p>
<h3 class="calibre5">The Dumb Protocol</h3>
<p class="calibre3">Git transport over HTTP is often referred to as the dumb protocol because it requires no Git-specific code on the server side during the transport process. The fetch process is a series of GET requests, where the client can assume the layout of the Git repository on the server. Let's follow the <code class="calibre10">http-fetch</code> process for the simplegit library:</p>
<pre class="calibre9"><code class="calibre10">$ git clone http://github.com/schacon/simplegit-progit.git
</code></pre>
<p class="calibre3">The first thing this command does is pull down the <code class="calibre10">info/refs</code> file. This file is written by the <code class="calibre10">update-server-info</code> command, which is why you need to enable that as a <code class="calibre10">post-receive</code> hook in order for the HTTP transport to work properly:</p>
<pre class="calibre9"><code class="calibre10">=> GET info/refs
ca82a6dff817ec66f44342007202690a93763949 refs/heads/master
</code></pre>
<p class="calibre3">Now you have a list of the remote references and SHAs. Next, you look for what the HEAD reference is so you know what to check out when you're finished:</p>
<pre class="calibre9"><code class="calibre10">=> GET HEAD
ref: refs/heads/master
</code></pre>
<p class="calibre3">You need to check out the <code class="calibre10">master</code> branch when you've completed the process.
At this point, you're ready to start the walking process. Because your starting point is the <code class="calibre10">ca82a6</code> commit object you saw in the <code class="calibre10">info/refs</code> file, you start by fetching that:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/ca/82a6dff817ec66f44342007202690a93763949
(179 bytes of binary data)
</code></pre>
<p class="calibre3">You get an object back - that object is in loose format on the server, and you fetched it over a static HTTP GET request. You can zlib-uncompress it, strip off the header, and look at the commit content:</p>
<pre class="calibre9"><code class="calibre10">$ git cat-file -p ca82a6dff817ec66f44342007202690a93763949
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon &lt;schacon@gmail.com&gt; 1205815931 -0700
committer Scott Chacon &lt;schacon@gmail.com&gt; 1240030591 -0700

changed the version number
</code></pre>
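<p class="calibre3">The steps so far can be sketched in Python. This is only an illustration under stated assumptions: the function names (<code class="calibre10">parse_info_refs</code>, <code class="calibre10">resolve_head</code>, <code class="calibre10">loose_object_path</code>, <code class="calibre10">decode_loose_object</code>) are hypothetical, the response bodies are pasted in as strings rather than fetched over HTTP, and the loose object is a made-up blob standing in for the real commit:</p>
<pre class="calibre9"><code class="calibre10">import zlib

def parse_info_refs(text):
    """Map each reference name to its SHA from an info/refs listing."""
    refs = {}
    for line in text.splitlines():
        sha, name = line.split()
        refs[name] = sha
    return refs

def resolve_head(head_text, refs):
    """Follow a symbolic HEAD ("ref: NAME") to the SHA it points at."""
    name = head_text.strip().split(": ", 1)[1]
    return refs[name]

def loose_object_path(sha):
    """The first two hex digits name the directory, the other 38 the file."""
    return "objects/" + sha[:2] + "/" + sha[2:]

def decode_loose_object(raw):
    """Inflate a loose object and split the "TYPE SIZE" header from the content."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    kind, size = header.split()
    assert int(size) == len(content)
    return kind.decode(), content

refs = parse_info_refs("ca82a6dff817ec66f44342007202690a93763949\trefs/heads/master\n")
start = resolve_head("ref: refs/heads/master\n", refs)
print(loose_object_path(start))

# Stand-in for the bytes a server would return (invented blob, not the real commit).
content = b"hello\n"
stored = zlib.compress(b"blob " + str(len(content)).encode() + b"\x00" + content)
print(decode_loose_object(stored))
</code></pre>
<p class="calibre3">The header is the object type, a space, and the content size, terminated by a null byte; everything after that null byte is what <code class="calibre10">git cat-file -p</code> prints for a commit.</p>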
<p class="calibre3">Next, you have two more objects to retrieve - <code class="calibre10">cfda3b</code>, which is the tree of content that the commit we just retrieved points to; and <code class="calibre10">085bb3</code>, which is the parent commit:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/08/5bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
(179 bytes of data)
</code></pre>
<p class="calibre3">That gives you your next commit object. Grab the tree object:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/cf/da3bf379e4f8dba8717dee55aab78aef7f4daf
(404 - Not Found)
</code></pre>
<p class="calibre3">Oops - it looks like that tree object isn't in loose format on the server, so you get a 404 response back. There are a couple of reasons for this - the object could be in an alternate repository, or it could be in a packfile in this repository. Git checks for any listed alternates first:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/info/http-alternates
(empty file)
</code></pre>
<p class="calibre3">If this comes back with a list of alternate URLs, Git checks for loose files and packfiles there - this is a nice mechanism for projects that are forks of one another to share objects on disk. However, because no alternates are listed in this case, your object must be in a packfile. To see what packfiles are available on this server, you need to get the <code class="calibre10">objects/info/packs</code> file, which contains a listing of them (also generated by <code class="calibre10">update-server-info</code>):</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/info/packs
P pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack
</code></pre>
<p class="calibre3">There is only one packfile on the server, so your object is obviously in there, but you'll check the index file to make sure. This is also useful if you have multiple packfiles on the server, so you can see which packfile contains the object you need:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.idx
(4k of binary data)
</code></pre>
<p class="calibre3">Now that you have the packfile index, you can see if your object is in it - because the index lists the SHAs of the objects contained in the packfile and the offsets to those objects. Your object is there, so go ahead and get the whole packfile:</p>
<pre class="calibre9"><code class="calibre10">=> GET objects/pack/pack-816a9b2334da9953e530f27bcac22082a9f5b835.pack
(13k of binary data)
</code></pre>
<p class="calibre3">You have your tree object, so you continue walking your commits. They're all also within the packfile you just downloaded, so you don't have to do any more requests to your server. Git checks out a working copy of the <code class="calibre10">master</code> branch that was pointed to by the HEAD reference you downloaded at the beginning.</p>
<p class="calibre3">The entire output of this process looks like this:</p>
<pre class="calibre9"><code class="calibre10">$ git clone http://github.com/schacon/simplegit-progit.git
Initialized empty Git repository in /private/tmp/simplegit-progit/.git/
got ca82a6dff817ec66f44342007202690a93763949
walk ca82a6dff817ec66f44342007202690a93763949
got 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Getting alternates list for http://github.com/schacon/simplegit-progit.git
Getting pack list for http://github.com/schacon/simplegit-progit.git
Getting index for pack 816a9b2334da9953e530f27bcac22082a9f5b835
Getting pack 816a9b2334da9953e530f27bcac22082a9f5b835
which contains cfda3bf379e4f8dba8717dee55aab78aef7f4daf
walk 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
walk a11bef06a3f659402fe7563abf99ad00de2209e6
</code></pre>
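<p class="calibre3">The lookup order described above (loose object, then alternates, then packfiles) can be sketched as a small simulation. Nothing here talks to a real server: <code class="calibre10">FAKE_SERVER</code>, <code class="calibre10">get</code>, and <code class="calibre10">locate</code> are hypothetical names, a missing dictionary key plays the role of a 404, and the packfile index is represented as a plain list of SHAs rather than the real binary format:</p>
<pre class="calibre9"><code class="calibre10">PACK = "pack-816a9b2334da9953e530f27bcac22082a9f5b835"

# Simulated server: request path mapped to response body (all invented).
FAKE_SERVER = {
    "objects/ca/82a6dff817ec66f44342007202690a93763949": "commit data",
    "objects/info/http-alternates": "",
    "objects/info/packs": "P " + PACK + ".pack\n",
    "objects/pack/" + PACK + ".idx": ["cfda3bf379e4f8dba8717dee55aab78aef7f4daf"],
    "objects/pack/" + PACK + ".pack": "pack data",
}

def get(path, log):
    """Record the request and return the body, or None for a 404."""
    log.append("GET " + path)
    return FAKE_SERVER.get(path)

def locate(sha, log):
    """Find an object: loose file first, then alternates, then packfiles."""
    if get("objects/" + sha[:2] + "/" + sha[2:], log) is not None:
        return "loose"
    alternates = get("objects/info/http-alternates", log) or ""
    if alternates.strip():
        return "alternate"  # a real client would repeat the search there
    packs = get("objects/info/packs", log) or ""
    for line in packs.splitlines():
        name = line.split()[1][: -len(".pack")]
        index = get("objects/pack/" + name + ".idx", log) or []
        if sha in index:
            get("objects/pack/" + name + ".pack", log)
            return name
    return None

log = []
print(locate("cfda3bf379e4f8dba8717dee55aab78aef7f4daf", log))
print("\n".join(log))
</code></pre>
<p class="calibre3">Looking up the tree object produces five requests in the simulation: the failed loose-object GET, the alternates list, the pack list, the index, and finally the packfile itself.</p>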
<h3 class="calibre5">The Smart Protocol</h3>
<p class="calibre3">The HTTP method is simple but a bit inefficient. Using smart protocols is a more common method of transferring data. These protocols have a process on the remote end that is intelligent about Git - it can read local data and figure out what the client has or needs and generate custom data for it. There are two sets of processes for transferring data: a pair for uploading data and a pair for downloading data.</p>
<h4 class="calibre14">Uploading Data</h4>
<p class="calibre3">To upload data to a remote process, Git uses the <code class="calibre10">send-pack</code> and <code class="calibre10">receive-pack</code> processes. The <code class="calibre10">send-pack</code> process runs on the client and connects to a <code class="calibre10">receive-pack</code> process on the remote side.</p>
<p class="calibre3">For example, say you run <code class="calibre10">git push origin master</code> in your project, and <code class="calibre10">origin</code> is defined as a URL that uses the SSH protocol. Git fires up the <code class="calibre10">send-pack</code> process, which initiates a connection over SSH to your server. It tries to run a command on the remote server via an SSH call that looks something like this:</p>
<pre class="calibre9"><code class="calibre10">$ ssh -x git@github.com "git-receive-pack 'schacon/simplegit-progit.git'"
0059ca82a6dff817ec66f44342007202690a93763949 refs/heads/master report-status delete-refs
003e085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic
0000
</code></pre>
<p class="calibre3">The <code class="calibre10">git-receive-pack</code> command immediately responds with one line for each reference it currently has - in this case, the <code class="calibre10">master</code> and <code class="calibre10">topic</code> branches and their SHAs. The first line also has a list of the server's capabilities (here, <code class="calibre10">report-status</code> and <code class="calibre10">delete-refs</code>).</p>
<p class="calibre3">Each line starts with a 4-character hex value giving the length of the whole line, including those four characters. Your first line starts with 0059, which is hexadecimal for 89, so 85 bytes follow the prefix on that line. The next line starts with 003e, which is 62, so you read the remaining 58 bytes. The next line is 0000, meaning the server is done with its references listing.</p>
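<p class="calibre3">This length-prefixed framing can be sketched in a few lines of Python. It is a minimal illustration, not Git's implementation: the names <code class="calibre10">pkt_line</code>, <code class="calibre10">FLUSH</code>, and <code class="calibre10">read_pkt_lines</code> are hypothetical, and the sketch assumes the 4-digit count covers the prefix itself as well as the payload:</p>
<pre class="calibre9"><code class="calibre10">def pkt_line(payload):
    """Frame a payload: 4 hex digits giving the total length, then the payload."""
    return ("%04x" % (len(payload) + 4)).encode() + payload

FLUSH = b"0000"  # a zero length marks the end of a listing

def read_pkt_lines(stream):
    """Split a byte string into payloads, stopping at the first flush packet."""
    payloads = []
    pos = 0
    while True:
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            return payloads
        payloads.append(stream[pos + 4:pos + length])
        pos += length

topic = b"085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic\n"
framed = pkt_line(topic)
print(framed[:4])
print(read_pkt_lines(framed + FLUSH) == [topic])
</code></pre>
<p class="calibre3">Because every line announces its own length, the reader never has to scan for a delimiter; it reads four characters, then exactly as many bytes as they promise.</p>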
<p class="calibre3">Now that it knows the server's state, your <code class="calibre10">send-pack</code> process determines what commits it has that the server doesn't. For each reference that this push will update, the <code class="calibre10">send-pack</code> process tells the <code class="calibre10">receive-pack</code> process that information. For instance, if you're updating the <code class="calibre10">master</code> branch and adding an <code class="calibre10">experiment</code> branch, the <code class="calibre10">send-pack</code> response may look something like this:</p>
<pre class="calibre9"><code class="calibre10">0085ca82a6dff817ec66f44342007202690a93763949 15027957951b64cf874c3557a0f3547bd83b3ff6 refs/heads/master report-status
00670000000000000000000000000000000000000000 cdfdb42577e2506715f8cfeacdbabc092bf63e8d refs/heads/experiment
0000
</code></pre>
<p class="calibre3">The SHA-1 value of all '0's means that nothing was there before - because you're adding the experiment reference. If you were deleting a reference, you would see the opposite: all '0's on the right side.</p>
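<p class="calibre3">As a rough sketch of how such update lines could be assembled, the Python below builds one line per reference, with the all-zero SHA marking a reference that did not exist before. The names <code class="calibre10">ZERO_SHA</code>, <code class="calibre10">pkt_line</code>, and <code class="calibre10">update_commands</code> are hypothetical, and the framing details (a null byte before the capabilities, a trailing newline on each line, a length count that includes its own four characters) are assumptions of this sketch:</p>
<pre class="calibre9"><code class="calibre10">ZERO_SHA = "0" * 40  # old value when creating a ref, new value when deleting one

def pkt_line(payload):
    """Frame a payload: 4 hex digits giving the total length, then the payload."""
    return ("%04x" % (len(payload) + 4)).encode() + payload

def update_commands(updates, capabilities):
    """Build one "OLD NEW REF" line per reference; capabilities ride on the first."""
    out = b""
    for i, (old, new, ref) in enumerate(updates):
        line = old + " " + new + " " + ref
        if i == 0:
            line += "\x00" + capabilities
        out += pkt_line(line.encode() + b"\n")
    return out + b"0000"

request = update_commands(
    [
        ("ca82a6dff817ec66f44342007202690a93763949",
         "15027957951b64cf874c3557a0f3547bd83b3ff6",
         "refs/heads/master"),
        (ZERO_SHA,
         "cdfdb42577e2506715f8cfeacdbabc092bf63e8d",
         "refs/heads/experiment"),
    ],
    "report-status",
)
print(request.decode().replace("\x00", " "))
</code></pre>
<p class="calibre3">Deleting a branch would use the same helper with the current SHA as the old value and <code class="calibre10">ZERO_SHA</code> as the new one.</p>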
<p class="calibre3">Git sends a line for each reference you're updating with the old SHA, the new SHA, and the reference that is being updated. The first line also has the client's capabilities. Next, the client uploads a packfile of all the objects the server doesn't have yet. Finally, the server responds with a success (or failure) indication:</p>
<pre class="calibre9"><code class="calibre10">000eunpack ok
</code></pre>
<h4 class="calibre14">Downloading Data</h4>
<p class="calibre3">When you download data, the <code class="calibre10">fetch-pack</code> and <code class="calibre10">upload-pack</code> processes are involved. The client initiates a <code class="calibre10">fetch-pack</code> process that connects to an <code class="calibre10">upload-pack</code> process on the remote side to negotiate what data will be transferred down.</p>
<p class="calibre3">There are different ways to initiate the <code class="calibre10">upload-pack</code> process on the remote repository. You can run it via SSH in the same manner as the <code class="calibre10">receive-pack</code> process. You can also initiate the process via the Git daemon, which listens on a server on port 9418 by default. The <code class="calibre10">fetch-pack</code> process sends data that looks like this to the daemon after connecting:</p>
<pre class="calibre9"><code class="calibre10">0043git-upload-pack schacon/simplegit-progit.git\0host=myserver.com\0
</code></pre>
<p class="calibre3">It starts with the 4 bytes giving the length of the request (a count that includes those 4 bytes), then the command to run followed by a null byte, and then the server's hostname followed by a final null byte. The Git daemon checks that the command can be run and that the repository exists and has public permissions. If everything is cool, it fires up the <code class="calibre10">upload-pack</code> process and hands off the request to it.</p>
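<p class="calibre3">A request of this shape could be assembled as follows. This is a sketch rather than what <code class="calibre10">fetch-pack</code> actually runs: <code class="calibre10">pkt_line</code> and <code class="calibre10">daemon_request</code> are hypothetical helpers, and the length prefix is assumed to count its own four characters:</p>
<pre class="calibre9"><code class="calibre10">def pkt_line(payload):
    """Frame a payload: 4 hex digits giving the total length, then the payload."""
    return ("%04x" % (len(payload) + 4)).encode() + payload

def daemon_request(command, path, host):
    """Frame "COMMAND PATH", a null byte, "host=HOST", and a final null byte."""
    body = command + " " + path + "\x00" + "host=" + host + "\x00"
    return pkt_line(body.encode())

request = daemon_request(
    "git-upload-pack", "schacon/simplegit-progit.git", "myserver.com"
)
print(request)
</code></pre>
<p class="calibre3">The two null bytes are part of the counted payload, so the daemon can split the command from the host parameter without any further framing.</p>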
<p class="calibre3">If you're doing the fetch over SSH, <code class="calibre10">fetch-pack</code> instead runs something like this:</p>
<pre class="calibre9"><code class="calibre10">$ ssh -x git@github.com "git-upload-pack 'schacon/simplegit-progit.git'"
</code></pre>
<p class="calibre3">In either case, after <code class="calibre10">fetch-pack</code> connects, <code class="calibre10">upload-pack</code> sends back something like this:</p>
<pre class="calibre9"><code class="calibre10">0088ca82a6dff817ec66f44342007202690a93763949 HEAD\0multi_ack thin-pack \
side-band side-band-64k ofs-delta shallow no-progress include-tag
003fca82a6dff817ec66f44342007202690a93763949 refs/heads/master
003e085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic
0000
</code></pre>
<p class="calibre3">This is very similar to what <code class="calibre10">receive-pack</code> responds with, but the capabilities are different. In addition, it sends back the HEAD reference so the client knows what to check out if this is a clone.</p>
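<p class="calibre3">On the client side, a listing like this has to be split into references and capabilities. The sketch below shows one way to do that in Python; <code class="calibre10">parse_advertisement</code> is a hypothetical name, the input is assumed to be the line payloads with their length prefixes already removed, and the capability list is shortened for readability:</p>
<pre class="calibre9"><code class="calibre10">def parse_advertisement(payloads):
    """Return (refs, capabilities) from the payloads of a reference listing."""
    refs = {}
    capabilities = []
    for i, payload in enumerate(payloads):
        line = payload.rstrip(b"\n")
        if i == 0 and b"\x00" in line:
            # Only the first line carries capabilities, after a null byte.
            line, _, caps = line.partition(b"\x00")
            capabilities = caps.decode().split()
        sha, name = line.decode().split(" ", 1)
        refs[name] = sha
    return refs, capabilities

payloads = [
    b"ca82a6dff817ec66f44342007202690a93763949 HEAD\x00multi_ack thin-pack ofs-delta\n",
    b"ca82a6dff817ec66f44342007202690a93763949 refs/heads/master\n",
    b"085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 refs/heads/topic\n",
]
refs, caps = parse_advertisement(payloads)
print(refs["HEAD"] == refs["refs/heads/master"])
print(caps)
</code></pre>
<p class="calibre3">Comparing the SHA listed for HEAD with the SHAs of the branches is one way a client can tell which branch to check out after a clone.</p>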
<p class="calibre3">At this point, the <code class="calibre10">fetch-pack</code> process looks at what objects it has and responds with the objects that it needs by sending "want" and then the SHA it wants. It sends all the objects it already has with "have" and then the SHA. At the end of this list, it writes "done" to signal the <code class="calibre10">upload-pack</code> process to begin sending the packfile of the data it needs:</p>
<pre class="calibre9"><code class="calibre10">003cwant ca82a6dff817ec66f44342007202690a93763949 ofs-delta
0032have 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
0000
0009done
</code></pre>
<p class="calibre3">That is a very basic case of the transfer protocols. In more complex cases, the client supports the <code class="calibre10">multi_ack</code> or <code class="calibre10">side-band</code> capabilities, but this example shows you the basic back and forth used by the smart protocol processes.</p>
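<p class="calibre3">The want/have exchange can be sketched the same way. The Python below follows the simplified ordering of the example above (want lines, have lines, a flush packet, then "done"); <code class="calibre10">pkt_line</code> and <code class="calibre10">negotiation</code> are hypothetical names, and the length prefix is assumed to count its own four characters:</p>
<pre class="calibre9"><code class="calibre10">def pkt_line(payload):
    """Frame a payload: 4 hex digits giving the total length, then the payload."""
    return ("%04x" % (len(payload) + 4)).encode() + payload

def negotiation(wants, haves, capabilities):
    """Build the want lines, the have lines, a flush packet, and "done"."""
    out = b""
    for i, sha in enumerate(wants):
        line = "want " + sha
        if i == 0 and capabilities:
            # Requested capabilities are attached to the first want line only.
            line += " " + capabilities
        out += pkt_line(line.encode() + b"\n")
    for sha in haves:
        out += pkt_line(("have " + sha).encode() + b"\n")
    return out + b"0000" + pkt_line(b"done\n")

message = negotiation(
    ["ca82a6dff817ec66f44342007202690a93763949"],
    ["085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7"],
    "ofs-delta",
)
print(message.decode())
</code></pre>
<p class="calibre3">A real client with a long history would send its "have" lines in batches and wait for acknowledgements, which is where capabilities such as <code class="calibre10">multi_ack</code> come in; the sketch sends everything at once.</p>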
</body>
</html>