<?xml version='1.0' encoding='utf-8'?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Pro Git - professional version control</title> <meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/> <link href="stylesheet.css" type="text/css" rel="stylesheet"/> <style type="text/css"> @page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style> </head> <body class="calibre"> <h2 class="calibre4" id="calibre_pb_60">Git Attributes</h2> <p class="calibre3">Some of these settings can also be specified for a path, so that Git applies those settings only for a subdirectory or subset of files. These path-specific settings are called Git attributes and are set either in a <code class="calibre10">.gitattributes</code> file in one of your directories (normally the root of your project) or in the <code class="calibre10">.git/info/attributes</code> file if you don't want the attributes file committed with your project.</p> <p class="calibre3">Using attributes, you can do things like specify separate merge strategies for individual files or directories in your project, tell Git how to diff non-text files, or have Git filter content before you check it into or out of Git. In this section, you'll learn about some of the attributes you can set on your paths in your Git project and see a few examples of using this feature in practice.</p> <h3 class="calibre5">Binary Files</h3> <p class="calibre3">One cool trick for which you can use Git attributes is telling Git which files are binary (in cases it otherwise may not be able to figure out) and giving Git special instructions about how to handle those files. For instance, some text files may be machine generated and not diffable, whereas some binary files can be diffed - you'll see how to tell Git which is which.</p> <h4 class="calibre14">Identifying Binary Files</h4> <p class="calibre3">Some files look like text files but for all intents and purposes are to be treated as binary data. 
For instance, Xcode projects on the Mac contain a file that ends in <code class="calibre10">.pbxproj</code>, which is basically a JSON (plain-text JavaScript data format) dataset written out to disk by the IDE that records your build settings and so on. Although it's technically a text file, because it's all ASCII, you don't want to treat it as such because it's really a lightweight database - you can't merge the contents if two people change it, and diffs generally aren't helpful. The file is meant to be consumed by a machine. In essence, you want to treat it like a binary file.</p>
<p class="calibre3">To tell Git to treat all <code class="calibre10">pbxproj</code> files as binary data, add the following line to your <code class="calibre10">.gitattributes</code> file:</p>
<pre class="calibre9"><code class="calibre10">*.pbxproj -crlf -diff
</code></pre>
<p class="calibre3">Now, Git won't try to convert or fix CRLF issues; nor will it try to compute or print a diff for changes in this file when you run <code class="calibre10">git show</code> or <code class="calibre10">git diff</code> on your project. In the 1.6 series of Git, you can also use the provided <code class="calibre10">binary</code> macro, which means <code class="calibre10">-crlf -diff</code>:</p>
<pre class="calibre9"><code class="calibre10">*.pbxproj binary
</code></pre>
<h4 class="calibre14">Diffing Binary Files</h4>
<p class="calibre3">In the 1.6 series of Git, you can use the Git attributes functionality to effectively diff binary files. You do this by telling Git how to convert your binary data to a text format that can be compared via the normal diff.</p>
<p class="calibre3">Because this is a pretty cool and not widely known feature, I'll go over a few examples. First, you'll use this technique to solve one of the most annoying problems known to humanity: version-controlling Word documents. Everyone knows that Word is the most horrific editor around; but, oddly, everyone uses it. 
If you want to version-control Word documents, you can stick them in a Git repository and commit every once in a while; but what good does that do? If you run <code class="calibre10">git diff</code> normally, you only see something like this:</p>
<pre class="calibre9"><code class="calibre10">$ git diff
diff --git a/chapter1.doc b/chapter1.doc
index 88839c4..4afcb7c 100644
Binary files a/chapter1.doc and b/chapter1.doc differ
</code></pre>
<p class="calibre3">You can't directly compare two versions unless you check them out and scan them manually, right? It turns out you can do this fairly well using Git attributes. Put the following line in your <code class="calibre10">.gitattributes</code> file:</p>
<pre class="calibre9"><code class="calibre10">*.doc diff=word
</code></pre>
<p class="calibre3">This tells Git that any file that matches this pattern (.doc) should use the "word" filter when you try to view a diff that contains changes. What is the "word" filter? You have to set it up. Here you'll configure Git to use the <code class="calibre10">strings</code> program to convert Word documents into readable text files, which it will then diff properly:</p>
<pre class="calibre9"><code class="calibre10">$ git config diff.word.textconv strings
</code></pre>
<p class="calibre3">Now Git knows that if it tries to do a diff between two snapshots, and any of the files end in <code class="calibre10">.doc</code>, it should run those files through the "word" filter, which is defined as the <code class="calibre10">strings</code> program. This effectively makes nice text-based versions of your Word files before attempting to diff them.</p>
<p class="calibre3">Here's an example. I put Chapter 1 of this book into Git, added some text to a paragraph, and saved the document. 
Then, I ran <code class="calibre10">git diff</code> to see what changed:</p>
<pre class="calibre9"><code class="calibre10">$ git diff
diff --git a/chapter1.doc b/chapter1.doc
index c1c8a0a..b93c9e4 100644
--- a/chapter1.doc
+++ b/chapter1.doc
@@ -8,7 +8,8 @@ re going to cover Version Control Systems (VCS) and Git basics
 re going to cover how to get it and set it up for the first time if you don
 t already have it on your system.
 In Chapter Two we will go over basic Git usage - how to use Git for the 80%
-s going on, modify stuff and contribute changes. If the book spontaneously
+s going on, modify stuff and contribute changes. If the book spontaneously
+Let's see if this works.
</code></pre>
<p class="calibre3">Git successfully and succinctly tells me that I added the string "Let's see if this works", which is correct. It's not perfect - it adds a bunch of random stuff at the end - but it certainly works. If you can find or write a Word-to-plain-text converter that works well enough, that solution will likely be incredibly effective. However, <code class="calibre10">strings</code> is available on most Mac and Linux systems, so it may be a good first try to do this with many binary formats.</p>
<p class="calibre3">Another interesting problem you can solve this way involves diffing image files. One way to do this is to run image files through a filter that extracts their EXIF information - metadata that is recorded with most image formats. 
If you download and install the <code class="calibre10">exiftool</code> program, you can use it to convert your images into text about the metadata, so at least the diff will show you a textual representation of any changes that happened:</p>
<pre class="calibre9"><code class="calibre10">$ echo '*.png diff=exif' &gt;&gt; .gitattributes
$ git config diff.exif.textconv exiftool
</code></pre>
<p class="calibre3">If you replace an image in your project and run <code class="calibre10">git diff</code>, you see something like this:</p>
<pre class="calibre9"><code class="calibre10">diff --git a/image.png b/image.png
index 88839c4..4afcb7c 100644
--- a/image.png
+++ b/image.png
@@ -1,12 +1,12 @@
 ExifTool Version Number : 7.74
-File Size : 70 kB
-File Modification Date/Time : 2009:04:21 07:02:45-07:00
+File Size : 94 kB
+File Modification Date/Time : 2009:04:21 07:02:43-07:00
 File Type : PNG
 MIME Type : image/png
-Image Width : 1058
-Image Height : 889
+Image Width : 1056
+Image Height : 827
 Bit Depth : 8
 Color Type : RGB with Alpha
</code></pre>
<p class="calibre3">You can easily see that the file size and image dimensions have both changed.</p>
<h3 class="calibre5">Keyword Expansion</h3>
<p class="calibre3">SVN- or CVS-style keyword expansion is often requested by developers used to those systems. The main problem with this in Git is that you can't modify a file with information about the commit after you've committed, because Git checksums the file first. However, you can inject text into a file when it's checked out and remove it again before it's added to a commit. Git attributes offers you two ways to do this.</p>
<p class="calibre3">First, you can inject the SHA-1 checksum of a blob into an <code class="calibre10">$Id$</code> field in the file automatically. If you set this attribute on a file or set of files, then the next time you check out that branch, Git will replace that field with the SHA-1 of the blob. 
It's important to notice that it isn't the SHA of the commit, but of the blob itself:</p>
<pre class="calibre9"><code class="calibre10">$ echo '*.txt ident' &gt;&gt; .gitattributes
$ echo '$Id$' &gt; test.txt
</code></pre>
<p class="calibre3">The next time you check out this file, Git injects the SHA of the blob:</p>
<pre class="calibre9"><code class="calibre10">$ rm test.txt
$ git checkout -- test.txt
$ cat test.txt
$Id: 42812b7653c7b88933f8a9d6cad0ca16714b9bb3 $
</code></pre>
<p class="calibre3">However, that result is of limited use. If you've used keyword substitution in CVS or Subversion, you know you can include a datestamp there - the SHA isn't all that helpful, because it's fairly random and you can't tell if one SHA is older or newer than another.</p>
<p class="calibre3">It turns out that you can write your own filters for doing substitutions in files on commit/checkout. These are the "clean" and "smudge" filters. In the <code class="calibre10">.gitattributes</code> file, you can set a filter for particular paths and then set up scripts that will process files just before they're checked out ("smudge", see Figure 7-2) and just before they're staged for commit ("clean", see Figure 7-3). These filters can be set to do all sorts of fun things.</p>
<p class="calibre3"><img src="18333fig0702-tn.png" alt="Figure 7-2. The &quot;smudge&quot; filter is run on checkout." title="Figure 7-2. The &quot;smudge&quot; filter is run on checkout." class="calibre6"/></p>
<p class="calibre3"><img src="18333fig0703-tn.png" alt="Figure 7-3. The &quot;clean&quot; filter is run when files are staged." title="Figure 7-3. The &quot;clean&quot; filter is run when files are staged." class="calibre6"/></p>
<p class="calibre3">The original commit message for this functionality gives a simple example of running all your C source code through the <code class="calibre10">indent</code> program before committing. 
You can set it up by setting the filter attribute in your <code class="calibre10">.gitattributes</code> file to filter <code class="calibre10">*.c</code> files with the "indent" filter:</p>
<pre class="calibre9"><code class="calibre10">*.c filter=indent
</code></pre>
<p class="calibre3">Then, tell Git what the "indent" filter does on smudge and clean:</p>
<pre class="calibre9"><code class="calibre10">$ git config --global filter.indent.clean indent
$ git config --global filter.indent.smudge cat
</code></pre>
<p class="calibre3">In this case, when you commit files that match <code class="calibre10">*.c</code>, Git will run them through the indent program before it commits them and then run them through the <code class="calibre10">cat</code> program before it checks them back out onto disk. The <code class="calibre10">cat</code> program is basically a no-op: it spits out the same data that it gets in. This combination effectively filters all C source code files through <code class="calibre10">indent</code> before committing.</p>
<p class="calibre3">Another interesting example gets <code class="calibre10">$Date$</code> keyword expansion, RCS style. To do this properly, you need a small script that reads the file content from standard input, figures out the last commit date for this project, and inserts the date into that content. Here is a small Ruby script that does that:</p>
<pre class="calibre9"><code class="calibre10">#! /usr/bin/env ruby
# Smudge filter: Git passes the file content on stdin.
data = STDIN.read
# Author date of the most recent commit in this repository.
last_date = `git log --pretty=format:"%ad" -1`
# Expand each $Date$ keyword; print (unlike puts) never appends a newline.
print data.gsub('$Date$', '$Date: ' + last_date.to_s + '$')
</code></pre>
<p class="calibre3">All the script does is get the latest commit date from the <code class="calibre10">git log</code> command, stick that into any <code class="calibre10">$Date$</code> strings it sees in stdin, and print the results - it should be simple to do in whatever language you're most comfortable in. You can name this file <code class="calibre10">expand_date</code> and put it in your path. 
Now, you need to set up a filter in Git (call it <code class="calibre10">dater</code>) and tell it to use your <code class="calibre10">expand_date</code> filter to smudge the files on checkout. You'll use a Perl expression to clean that up on commit:</p>
<pre class="calibre9"><code class="calibre10">$ git config filter.dater.smudge expand_date
$ git config filter.dater.clean 'perl -pe "s/\\\$Date[^\\\$]*\\\$/\\\$Date\\\$/"'
</code></pre>
<p class="calibre3">This Perl snippet strips out anything it sees in a <code class="calibre10">$Date$</code> string, to get back to where you started. Now that your filter is ready, you can test it by setting up a file with your <code class="calibre10">$Date$</code> keyword and then setting up a Git attribute for that file that engages the new filter:</p>
<pre class="calibre9"><code class="calibre10">$ echo '# $Date$' &gt; date_test.txt
$ echo 'date*.txt filter=dater' &gt;&gt; .gitattributes
</code></pre>
<p class="calibre3">If you commit those changes and check out the file again, you see the keyword properly substituted:</p>
<pre class="calibre9"><code class="calibre10">$ git add date_test.txt .gitattributes
$ git commit -m "Testing date expansion in Git"
$ rm date_test.txt
$ git checkout date_test.txt
$ cat date_test.txt
# $Date: Tue Apr 21 07:26:52 2009 -0700$
</code></pre>
<p class="calibre3">You can see how powerful this technique can be for customized applications. You have to be careful, though, because the <code class="calibre10">.gitattributes</code> file is committed and passed around with the project but the driver (in this case, <code class="calibre10">dater</code>) isn't; so, it won't work everywhere. 
When you design these filters, make sure they fail gracefully so that the project still works properly without them.</p>
<h3 class="calibre5">Exporting Your Repository</h3>
<p class="calibre3">Git attribute data also allows you to do some interesting things when exporting an archive of your project.</p>
<h4 class="calibre14">export-ignore</h4>
<p class="calibre3">You can tell Git not to export certain files or directories when generating an archive. If there is a subdirectory or file that you don't want to include in your archive file but that you do want checked into your project, you can mark those files with the <code class="calibre10">export-ignore</code> attribute.</p>
<p class="calibre3">For example, say you have some test files in a <code class="calibre10">test/</code> subdirectory, and it doesn't make sense to include them in the tarball export of your project. You can add the following line to your Git attributes file:</p>
<pre class="calibre9"><code class="calibre10">test/ export-ignore
</code></pre>
<p class="calibre3">Now, when you run <code class="calibre10">git archive</code> to create a tarball of your project, that directory won't be included in the archive.</p>
<h4 class="calibre14">export-subst</h4>
<p class="calibre3">Another thing you can do for your archives is some simple keyword substitution. Git lets you put the string <code class="calibre10">$Format:$</code> in any file with any of the <code class="calibre10">--pretty=format</code> formatting shortcodes, many of which you saw in Chapter 2. 
For instance, if you want to include a file named <code class="calibre10">LAST_COMMIT</code> in your project, and have the last commit date automatically injected into it when <code class="calibre10">git archive</code> runs, you can set up the file like this:</p>
<pre class="calibre9"><code class="calibre10">$ echo 'Last commit date: $Format:%cd$' &gt; LAST_COMMIT
$ echo "LAST_COMMIT export-subst" &gt;&gt; .gitattributes
$ git add LAST_COMMIT .gitattributes
$ git commit -am 'adding LAST_COMMIT file for archives'
</code></pre>
<p class="calibre3">When you run <code class="calibre10">git archive</code>, Git replaces the whole <code class="calibre10">$Format:%cd$</code> placeholder with the expanded value, so the contents of that file when people open the archive file will look like this:</p>
<pre class="calibre9"><code class="calibre10">$ cat LAST_COMMIT
Last commit date: Tue Apr 21 08:38:48 2009 -0700
</code></pre>
<h3 class="calibre5">Merge Strategies</h3>
<p class="calibre3">You can also use Git attributes to tell Git to use different merge strategies for specific files in your project. One very useful option is to tell Git to not try to merge specific files when they have conflicts, but rather to use your side of the merge over someone else's.</p>
<p class="calibre3">This is helpful if a branch in your project has diverged or is specialized, but you want to be able to merge changes back in from it, and you want to ignore certain files. Say you have a database settings file called database.xml that is different in two branches, and you want to merge in your other branch without messing up the database file. You can set up an attribute like this:</p>
<pre class="calibre9"><code class="calibre10">database.xml merge=ours
</code></pre>
<p class="calibre3">Git doesn't ship a merge driver named <code class="calibre10">ours</code>, so you also need to define one: running <code class="calibre10">git config merge.ours.driver true</code> sets up a dummy driver that always succeeds and leaves your version in place. With that done, if you merge in the other branch, instead of having merge conflicts with the database.xml file, you see something like this:</p>
<pre class="calibre9"><code class="calibre10">$ git merge topic
Auto-merging database.xml
Merge made by recursive.
</code></pre> <p class="calibre3">In this case, database.xml stays at whatever version you originally had.</p> </body> </html>