
<html> <head>  <TITLE>Introduction to FAN Language and Utilities </TITLE> <link rel="shortcut icon" href="https://www.unidata.ucar.edu/favicon.ico" type="image/x-icon" /> </head> <body>  <h1 align="center">Introduction to FAN Language and Utilities FAN Version 2.0 </h1> <h3 align="center">Harvey Davies CSIRO Division of Atmospheric Research, Private Bag No. 1, Mordialloc 3195, Australia email: hld@dar.csiro.au Scientific Visitor from January to August 1996 at UCAR/Unidata Program Center, P.O. Box 3000, Boulder, Colorado 80307-3000, USA email: hld@ucar.edu </h3> <hr /> <h1>Introduction</h1> FAN (File Array Notation) is an array-oriented language for identifying data items in files for the purpose of extraction or modification. FAN specifications consist of <ul> <li>one or more filenames</li> <li>one or more variable (array) names or ID numbers</li> <li>attribute names or ID numbers (optional)</li> <li>dimension names or ID numbers (optional)</li> <li>subscripts in various possible forms (optional)</li> </ul> NetCDF is the only format currently supported. However FAN is intended to be generic and it is hoped that there will eventually also be FAN interfaces to various other formats. This document describes the FAN language and four utilities based on FAN. The use of these utilities can greatly decrease the need for programming in Fortran or C. They can be called from the Unix command line and shell scripts. The first is <tt>nc2text</tt> which prints selected data from netCDF variables. The standard utility <tt>ncdump</tt> can also print data from netCDF variables, but only entire variables and only together with metadata in CDL form. The second is <tt>ncmeta</tt> which prints selected metadata from one or more netCDF files. This metadata can include rank, shape, file names, variable names, dimension names and attribute names. 
The third is <tt>ncrob</tt> which reads data from one or more netCDF variables, performs some process on it and then either prints the result or writes it to a netCDF array. The letters `<tt>rob</tt>' in `<tt>ncrob</tt>' stand for Reduce Or Broadcast. Reduce means to produce an array (e.g. sum, mean, maximum) with fewer dimensions than the original. Broadcast means to copy one array to another, recycling values if necessary. An example is copying the same vector to each row of a matrix. It is possible to process large volumes of data (e.g. 100 MB) using <tt>ncrob</tt>. The fourth is <tt>text2nc</tt> which can be used to read small volumes (say up to a few thousand lines) of ASCII data and copy it into netCDF variables. It is also possible to use <tt>text2nc</tt> to create, modify and delete attributes. This document does not cover other ways of using FAN. These include some local (CSIRO DAR) utilities (e.g. contouring program <tt>con_cif</tt>), the array-oriented languages IDL and J (for which there are FAN interfaces) and direct use of the C API (application programming interface).
<h2>Simple Examples</h2>
Let us start with a simple netCDF file <tt>vec.nc</tt> which is printed (in CDL) as follows:
<pre>
$ ncdump vec.nc
netcdf vec {
dimensions:
    n = UNLIMITED ; // (5 currently)
variables:
    float v(n) ;
data:
    v = 10 , 20.3 , 30.2 , 40.9 , 50 ;
}
</pre>
Here `<tt>$</tt>' is the UNIX command-line prompt. The following uses <tt>nc2text</tt> to print the whole array <tt>v</tt>:
<pre>
$ nc2text vec.nc v
10 20.3 30.2 40.9 50
</pre>
Individual elements can be selected using subscripts. For example:
<pre>
$ nc2text vec.nc 'v[0]'
10
$ nc2text vec.nc 'v[3]'
40.9
</pre>
Several can be selected using a subscript consisting of a list of indices such as:
<pre>
$ nc2text vec.nc 'v[0 3 1 3]'
10 40.9 20.3 40.9
</pre>
We can write to a netCDF file using <tt>text2nc</tt>.
The following changes the third element from 30.2 to 30.7 and then prints <tt>v</tt>:
<pre>
$ echo 30.7 | text2nc vec.nc 'v[2]'
$ nc2text vec.nc v
10 20.3 30.7 40.9 50
</pre>
Here <tt>text2nc</tt> reads ASCII text data from standard input, which in this case is a pipe connected to the standard output of <tt>echo</tt>. Since the dimension has <tt>UNLIMITED</tt> size, we can append values as follows:
<pre>
$ echo 60.5 70.2 | text2nc vec.nc 'v[5 6]'
$ nc2text vec.nc v
10 20.3 30.7 40.9 50 60.5 70.2
</pre>
Next we use <tt>ncrob</tt> to calculate and print the arithmetic mean of <tt>v</tt>.
<pre>
$ ncrob -r am vec.nc v /
40.3714
</pre>
The option <tt>-r am</tt> specifies that an arithmetic mean is to be calculated. The following example stores the mean in the same file, naming the variable <tt>v_mean</tt>:
<pre>
$ ncrob -r am vec.nc v / v_mean
$ nc2text vec.nc v_mean
40.3714
</pre>
The `<tt>/</tt>' separates the input from the output. If no output is specified then results are printed. In fact <tt>ncrob</tt> can be used in place of <tt>nc2text</tt> to print data from a netCDF file. E.g.
<pre>
$ ncrob vec.nc v /
10 20.3 30.7 40.9 50 60.5 70.2
$ ncrob vec.nc v_mean /
40.3714
</pre>
Finally we use <tt>ncmeta</tt> to print metadata. The shape (now 7, following the append above) is printed by:
<pre>
$ ncmeta v vec.nc
7
</pre>
and the following prints the variable name, dimension name and shape:
<pre>
$ ncmeta -w vds v vec.nc
v n 7
</pre>
<h2>What is New in Version 2?</h2>
The utility <tt>ncmeta</tt> is new. There are significant enhancements to the utility <tt>ncrob</tt>. It can now print results as well as write them to netCDF files. (This means that <tt>nc2text</tt> is no longer really needed.) In version 1 the output FAN specification could only be a single (final) argument. There can now be zero (implying printed output) or more output arguments following a `<tt>/</tt>' which separates input arguments from output arguments. (The old convention is deprecated but still supported.)
It is now possible to create new variables without specifying the <tt>-c</tt> option or an output filename. There is a facility for merging dimensions. There are several new options related to printing and similar to those of <tt>nc2text</tt>. A number of bugs in <tt>ncrob</tt> have been fixed, including one with a serious effect on speed. <h1>FAN Language</h1> <a id="High_level_Syntax" name="High_level_Syntax"></a> <h2>High-level Syntax</h2> A FAN specification can be either a single command-line argument or span several arguments. Use of multiple arguments decreases the need for quoting and allows use of UNIX wildcarding (a.k.a. globbing) facilities. A FAN specification can have any of the following forms: <center> <table border="1" summary="syntax meaning"> <tr> <td>Syntax </td> <td>Meaning</td> </tr> <tr> <td>fanio <tt>/</tt> fanio </td> <td>netCDF input and netCDF output</td> </tr> <tr> <td>fanio <tt>/</tt> </td> <td>netCDF input and output to <tt>stdout</tt> (i.e. printed)</td> </tr> <tr> <td>fanio </td> <td>Either netCDF input or netCDF output (but not both)</td> </tr> </table> </center> where fanio is a FAN input/output specification, which has the form: pair <tt>;</tt> pair <tt>;</tt> pair <tt>;</tt> ... A semicolon (`<tt>;</tt>') has the same effect as commencing a new argument. Any sequence of one or more whitespace characters (space, tab, newline) is equivalent to a single space. A pair can take any of the following forms: filename vas vas filename filename vas A filename must contain at least one period (`<tt>.</tt>') to distinguish it from a variable name. This will be the case if netCDF filenames have a conventional suffix such as the recommended <tt>.nc</tt>. (In any case it is always possible to prefix a redundant `<tt>./</tt>' directory as in `<tt>./unconventional</tt>' or `<tt>/usr/./IdidItMyWay</tt>'!) 
A vas is a variable or attribute specification which can have any of the following forms: var var<tt>[</tt>subscript<tt>,</tt> subscript<tt>,</tt> subscript<tt>,</tt> ...<tt>]</tt> var<tt>[</tt>subscript<tt>,</tt> subscript<tt>,</tt> subscript<tt>,</tt> ...<tt>)</tt> var<tt>(</tt>subscript<tt>,</tt> subscript<tt>,</tt> subscript<tt>,</tt> ...<tt>]</tt> var<tt>(</tt>subscript<tt>,</tt> subscript<tt>,</tt> subscript<tt>,</tt> ...<tt>)</tt> var<tt>:</tt>att <tt>:</tt>att where var is a variable name or ID number and att is an attribute name or ID number. It is usually more convenient to identify variables, attributes and dimensions by name rather than ID number. The use of ID numbers is discussed in Section <a href="#Using_ID_Numbers">Using ID Numbers</a>. Attributes are discussed in Section <a href="#Attributes">Attributes</a>. A pair without a filename or vas uses that of the previous pair. The first pair has no effect by itself unless it contains both a filename and a vas. Thus the following all access the same values: <pre> $ nc2text 'vec.nc v[0 4]' 10 50 $ nc2text 'v[0 4] vec.nc' 10 50 $ nc2text vec.nc 'v[0 4]' 10 50 $ nc2text 'v[0 4]' vec.nc 10 50 $ nc2text ' v [ 0 4 ] vec.nc ' 10 50 </pre> The following are equivalent ways of concatenating variables <tt>v</tt> and <tt>v_mean</tt>: <pre> $ nc2text 'vec.nc v' 'vec.nc v_mean' 10 20.3 30.7 40.9 50 60.5 70.2 40.3714 $ nc2text 'vec.nc v' 'v_mean' 10 20.3 30.7 40.9 50 60.5 70.2 40.3714 $ nc2text 'vec.nc v; v_mean' 10 20.3 30.7 40.9 50 60.5 70.2 40.3714 $ nc2text vec.nc v v_mean 10 20.3 30.7 40.9 50 60.5 70.2 40.3714 </pre> Now let us copy file <tt>vec.nc</tt> to <tt>vec_new.nc</tt> and then demonstrate concatenation of data from different files: <pre> $ cp vec.nc vec_new.nc $ nc2text v vec.nc vec_new.nc 10 20.3 30.7 40.9 50 60.5 70.2 10 20.3 30.7 40.9 50 60.5 70.2 $ nc2text v vec*.nc 10 20.3 30.7 40.9 50 60.5 70.2 10 20.3 30.7 40.9 50 60.5 70.2 </pre> Note the use of UNIX wildcarding facilities in the latter example 
using the metacharacter `<tt>*</tt>' in <tt>vec*.nc</tt> which matches both <tt>vec.nc</tt> and <tt>vec_new.nc</tt>. <h2>Subscripts</h2> As mentioned in Section <a href="#High_level_Syntax">High level Syntax</a>, subscripts are enclosed by either `<tt>[</tt>' or `<tt>(</tt>' on the left and either `<tt>]</tt>' or `<tt>)</tt>' on the right. A left bracket `<tt>[</tt>' implies the C convention of starting subscripts at 0; while a left parenthesis `<tt>(</tt>' implies the Fortran convention of starting at 1. This starting value of 0 or 1 is called the index origin. A mnemonic to associate left with index origin is an x-axis with the origin on the left. The right hand delimiter controls the relative significance of multiple dimensions. A `<tt>]</tt>' implies conventional row-major (or lexicographic) order in which the rightmost subscript varies fastest; while a `<tt>)</tt>' implies the Fortran convention of column-major order in which the leftmost subscript varies fastest. So far our examples have involved only a single dimension. Now consider a netCDF file <tt>mat.nc</tt> containing a 2-dimensional array (i.e. a matrix). We print it as follows: <pre> $ ncdump mat.nc netcdf mat { dimensions: row = 2 ; col = 3 ; variables: short M(row, col) ; data: M = 11, 12, 13, 21, 22, 23 ; } </pre> The following are equivalent ways of printing the final element: <pre> $ nc2text 'mat.nc M[1,2]' 23 $ nc2text 'mat.nc M(2,3]' 23 $ nc2text 'mat.nc M(3,2)' 23 $ nc2text 'mat.nc M[2,1)' 23 </pre> Subscript values can be less than the index origin and are then relative to the end. So the final element could also be accessed by: <pre> $ nc2text 'mat.nc M[-1,-1]' 23 $ nc2text 'mat.nc M(0,0)' 23 </pre> As we have seen before, a subscript can contain a list of indices. 
Thus one could use any of the following to select all rows, but exclude the middle column: <pre> $ nc2text mat.nc 'M[0 1,0 2]' 11 13 21 23 $ nc2text mat.nc 'M(1 2,1 3]' 11 13 21 23 $ nc2text mat.nc 'M(1 3,1 2)' 11 13 21 23 </pre> <h3>Triplet Notation</h3> A sequence of indices forming an arithmetic progression as in <pre> $ nc2text vec.nc 'v[0 2 4 6]' 10 30.7 50 70.2 </pre> can be specified using a generalization of Fortran 90 triplet notation, in this case: <pre> $ nc2text vec.nc 'v[0:6:2]' 10 30.7 50 70.2 </pre> The triplet <tt>0:6:2</tt> means 0 to 6 in steps of 2. A triplet can take two forms: start<tt>:</tt>finish<tt>:</tt>stride start<tt>:</tt>finish The second form implies a stride of 1. It is possible to omit start and/or finish. Let  <var>I</var>  be the index-origin (0 or 1). If the stride is positive then start defaults to  <var>I</var>  (i.e. first element) and finish to  <var>I</var>-1  (i.e. final element). These are reversed for a negative stride; start defaults to  <var>I</var>-1  and finish to  <var>I</var>. E.g. <pre> $ nc2text vec.nc v 10 20.3 30.7 40.9 50 60.5 70.2 $ nc2text vec.nc 'v[:6:2]' 10 30.7 50 70.2 $ nc2text vec.nc 'v[0::2]' 10 30.7 50 70.2 $ nc2text vec.nc 'v[::2]' 10 30.7 50 70.2 $ nc2text vec.nc 'v[0:2]' 10 20.3 30.7 $ nc2text vec.nc 'v[:2]' 10 20.3 30.7 $ nc2text vec.nc 'v[2:]' 30.7 40.9 50 60.5 70.2 $ nc2text vec.nc 'v[::-1]' 70.2 60.5 50 40.9 30.7 20.3 10 </pre> Note how the latter example reverses the order. A triplet can wrap-around the start or end. This is useful with cyclic dimensions such as longitude. Wrap-around is shown by: <pre> $ nc2text vec.nc 'v[3:1]' 40.9 50 60.5 70.2 10 20.3 $ nc2text vec.nc 'v[1:3:-1]' 20.3 10 70.2 60.5 50 40.9 </pre> But the following does not imply wrap-around: <pre> $ nc2text vec.nc 'v[0:-1:1]' 10 20.3 30.7 40.9 50 60.5 70.2 </pre> since <tt>-1</tt> means final (i.e. same as <tt>6</tt>). Each subscript can contain any number of triplets and individual values. 
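The triplet rules above (defaults, indices relative to the end, and wrap-around) can be summarised in a short Python sketch. This is only an illustrative model of the behaviour shown in the examples, not FAN's actual implementation; the function name <tt>expand_triplet</tt> and its parameters are hypothetical, and the treatment of strides that do not divide a wrapped range evenly is an assumption.

```python
from typing import List, Optional


def expand_triplet(start: Optional[int], finish: Optional[int], stride: int = 1,
                   *, size: int, origin: int = 0) -> List[int]:
    """Expand start:finish:stride into explicit indices (relative to origin).

    Hypothetical helper modelling the documented examples only.
    """
    if stride == 0:
        raise ValueError("stride must be non-zero")
    # Defaults: first..final for a positive stride, final..first for a negative one.
    if start is None:
        start = origin if stride > 0 else origin - 1
    if finish is None:
        finish = origin - 1 if stride > 0 else origin

    def to_zero_based(index: int) -> int:
        # Values below the index origin are taken relative to the end.
        offset = index - origin
        return offset + size if offset < 0 else offset

    first, last = to_zero_based(start), to_zero_based(finish)
    # Distance travelled in the direction of the stride, wrapping if needed.
    span = (last - first) % size if stride > 0 else (first - last) % size
    count = span // abs(stride) + 1
    return [(first + k * stride) % size + origin for k in range(count)]


# Index lists for the 7-element vector v used in the examples above.
print(expand_triplet(0, 6, 2, size=7))         # [0, 2, 4, 6]
print(expand_triplet(None, None, -1, size=7))  # [6, 5, 4, 3, 2, 1, 0]
print(expand_triplet(3, 1, size=7))            # [3, 4, 5, 6, 0, 1]
print(expand_triplet(1, 3, -1, size=7))        # [1, 0, 6, 5, 4, 3]
```

The last two calls reproduce the wrap-around examples <tt>v[3:1]</tt> and <tt>v[1:3:-1]</tt>.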
The colon (<tt>:</tt>) operator has higher precedence than concatenation. This is shown by the following:
<pre>
$ nc2text vec.nc 'v[2 :4]'
30.7 40.9 50
</pre>
which is equivalent to:
<pre>
$ nc2text vec.nc 'v[2:4]'
30.7 40.9 50
</pre>
However parentheses can be used to override this precedence rule. E.g.
<pre>
$ nc2text vec.nc 'v[2 (:4)]'
30.7 10 20.3 30.7 40.9 50
</pre>
<h3>Omitting Subscripts</h3>
An omitted subscript implies the whole dimension. Thus we can print the first row of matrix <tt>M</tt> as follows:
<pre>
$ nc2text mat.nc 'M[0]'
11 12 13
</pre>
and exclude the middle column by:
<pre>
$ nc2text mat.nc 'M[,0 -1]'
11 13 21 23
</pre>
<h3>Dimension Names</h3>
Dimension names play an important role in FAN. Instead of:
<pre>
$ nc2text mat.nc 'M(2 1,1 3]'
21 23 11 13
</pre>
one can use:
<pre>
$ nc2text mat.nc 'M(row=2 1,col=1 3]'
21 23 11 13
</pre>
This is clearer for human readers. But specifying dimension names also provides the important facility of transposing dimensions. For example this allows <tt>ncrob</tt> to produce statistics (e.g. means) for rows as well as the normal columns. To transpose the above matrix, one could specify:
<pre>
$ nc2text mat.nc 'M(col=1 3,row=2 1]'
21 11 23 13
</pre>
since the order in which dimensions are specified controls their order in the output. To transpose a whole matrix one need only specify the dimension names as in the following:
<pre>
$ nc2text mat.nc 'M[col,row]'
11 21 12 22 13 23
</pre>
or using column-major order:
<pre>
$ nc2text mat.nc 'M(row,col)'
11 21 12 22 13 23
</pre>
In fact only one dimension name is needed, since any not mentioned are appended in their input order. E.g.
<pre>
$ nc2text mat.nc 'M[col]'
11 21 12 22 13 23
</pre>
<h3>Indirect Indexing</h3>
So far we have located elements using direct index values. FAN also allows an indirect method using coordinate variables (i.e. variables with the same names as dimensions).
Consider the following geographic netCDF file <tt>geog.nc</tt>:
<pre>
$ ncdump geog.nc
netcdf geog {
dimensions:
    lat = 3 ;
    lon = 4 ;
variables:
    float lat(lat) ;
        lat:units = "degrees_north" ;
    float lon(lon) ;
        lon:units = "degrees_east" ;
    double tsur(lat, lon) ;
data:
    lat = -45 , 0 , 45 ;
    lon = -180 , -90 , 0 , 90 ;
    tsur = 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34 ;
}
</pre>
FAN provides several indirect indexing operators. Perhaps the most useful of these is `<tt>~</tt>', which gives the index of the coordinate value closest to its argument. Thus:
<pre>
$ nc2text geog.nc 'lat[~-40]'
-45
</pre>
prints the latitude closest to 40&deg;S and
<pre>
$ nc2text geog.nc 'tsur[~-40,~10]'
13
</pre>
prints the element of <tt>tsur</tt> closest to the point 40&deg;S, 10&deg;E. Note that FAN knows nothing about circular wrap-around and does not consider 360&deg; to be equal to 0&deg;. The following shows how indirect indexing can be used within triplets:
<pre>
$ nc2text geog.nc 'tsur[ lat = ~90:~-90:-2 , lon = ~10: ]'
33 34 13 14
</pre>
This gives every second latitude from that closest to the north pole to that closest to the south pole, and all longitudes from that closest to 10&deg;E to the final one.
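The `<tt>~</tt>' operator can be pictured as a nearest-value search over a coordinate variable. The following Python sketch reproduces the two lookups above using the data from <tt>geog.nc</tt>; the helper name <tt>closest_index</tt> is hypothetical, and resolving ties in favour of the lower index is an assumption, since the text does not say what FAN does in that case.

```python
from typing import Sequence


def closest_index(coords: Sequence[float], target: float) -> int:
    """Return the 0-based index of the coordinate value nearest to target.

    Ties go to the lower index (an assumption; not stated for FAN).
    """
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Coordinate variables and data copied from geog.nc above.
lat = [-45.0, 0.0, 45.0]
lon = [-180.0, -90.0, 0.0, 90.0]
tsur = [[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34]]

i = closest_index(lat, -40)  # models lat[~-40]
j = closest_index(lon, 10)   # models lon[~10]
print(lat[i])      # -45.0
print(tsur[i][j])  # 13
```

Because the search compares plain numeric differences, it has no notion of longitude wrap-around, which matches the note above that 360&deg; is not treated as equal to 0&deg;.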
The other indirect indexing operators are as follows:
<table border="1" summary="other @ max and min operators">
<tr> <td><tt>@ max &lt;</tt> </td> <td>index value corresponding to maximum coordinate value less than argument</td> </tr>
<tr> <td><tt>@ max &lt;=</tt> </td> <td>index value corresponding to maximum coordinate value less than or equal to argument</td> </tr>
<tr> <td><tt>@ min &gt;</tt> </td> <td>index value corresponding to minimum coordinate value greater than argument</td> </tr>
<tr> <td><tt>@ min &gt;=</tt> </td> <td>index value corresponding to minimum coordinate value greater than or equal to argument</td> </tr>
</table>
Thus the following prints the minimum longitude greater than 10&deg;E:
<pre>
$ nc2text geog.nc 'lon[@ min &gt; 10]'
90
</pre>
and the following retrieves the rows from the maximum latitude less than or equal to 30&deg;N to the closest latitude to 90&deg;N, and the columns from the second (i.e. 1 with respect to an index origin of 0) to the minimum longitude greater than 0:
<pre>
$ nc2text geog.nc 'tsur[lat= @max&lt;=30 : ~90, lon= 1 : @min &gt; 0]'
22 23 24 32 33 34
</pre>
<h3>Offsets</h3>
It is possible to specify offsets using an expression of the form index <tt>+</tt> offset where offset is an integer constant (which can be negative). The offset must be the right hand argument of `<tt>+</tt>'. Note that this `<tt>+</tt>' operator has even higher precedence than `<tt>:</tt>'. Here are some examples of the use of offsets:
<pre>
$ nc2text geog.nc 'lon[ ~-100 + -1 : ~-360 + 2 ]'
-180 -90 0
</pre>
prints the longitudes from the one before that closest to 100&deg;W to the one two beyond that closest to 360&deg;W.
Note how the negative offset is specified as `<tt>+ -1</tt>', which is not equivalent to `<tt>-1</tt>' as in: <pre> $ nc2text geog.nc 'lon[ ~-100-1 : ~-360 + 2 ]' -90 90 -180 -90 0 </pre> which is equivalent to both the following (Note the wrap-around.): <pre> $ nc2text geog.nc 'lon[ (~-100) (-1:~-360 + 2) ]' -90 90 -180 -90 0 $ nc2text geog.nc 'lon[ 1 3:2 ]' -90 90 -180 -90 0 </pre> One use for offsets is to append along the <tt>UNLIMITED</tt> dimension without needing to know its current size. The expression `<tt>-1+1</tt>' represents the index value for appending immediately after the current final record. Thus we could append a value to variable <tt>v</tt> in file <tt>vec_new.nc</tt> (whose <tt>UNLIMITED</tt> dimension <tt>n</tt> has the current size 7) by: <pre> $ echo 80 | text2nc 'vec_new.nc v[-1 + 1]' $ nc2text 'vec_new.nc v' 10 20.3 30.7 40.9 50 60.5 70.2 80 </pre> Then we could append two more values by: <pre> $ echo 90 100.1 | text2nc 'vec_new.nc v[ -1 + 1 : -1 + 2 ]' $ nc2text 'vec_new.nc v' 10 20.3 30.7 40.9 50 60.5 70.2 80 90 100.1 </pre> giving a new size of 10 for the <tt>UNLIMITED</tt> dimension. <h3>Coordinate Variable Unit Conversion</h3> In file <tt>geog.nc</tt> the <tt>units</tt> attribute is <tt>degrees_north</tt> for <tt>lat</tt> and <tt>degrees_east</tt> for <tt>lon</tt>. One may want to specify coordinate values in some other units. The following shows how this can be done by appending the unit (enclosed in braces i.e. `<tt>{}</tt>') to the value: <pre> $ nc2text geog.nc 'tsur[ lat=~0.8{radian}, lon = ~ -1.5 { radian } ]' 32 </pre> giving the value at the point closest to latitude 0.8 radians north and longitude 1.5 radians west. This unit conversion (like that during FAN input and output) is done using the Unidata units library discussed in Appendix C of <a href="/software/netcdf/guide_toc.html"> NetCDF User's Guide</a>. 
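The radian example above can be checked by hand: 0.8 radian is about 45.84&deg; and -1.5 radian is about -85.94&deg;, so the closest coordinates are 45&deg;N and 90&deg;W, which select the value 32. The Python sketch below repeats that arithmetic. It uses <tt>math.degrees</tt> as a stand-in for the Unidata units library, and the helper <tt>closest_index</tt> is hypothetical.

```python
import math
from typing import Sequence


def closest_index(coords: Sequence[float], target: float) -> int:
    """Return the 0-based index of the coordinate value nearest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Coordinate variables (in degrees) and data copied from geog.nc.
lat = [-45.0, 0.0, 45.0]
lon = [-180.0, -90.0, 0.0, 90.0]
tsur = [[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34]]

# Convert the requested values from radians to the units of the file.
lat_deg = math.degrees(0.8)   # about 45.84
lon_deg = math.degrees(-1.5)  # about -85.94

row = closest_index(lat, lat_deg)  # nearest latitude is 45
col = closest_index(lon, lon_deg)  # nearest longitude is -90
print(tsur[row][col])  # 32
```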
<a id="Attributes" name="Attributes"></a> <h2>Attributes</h2>
As noted in Section <a href="#High_level_Syntax">High level Syntax</a> an attribute vas can take two forms: var<tt>:</tt>att and <tt>:</tt>att. As in CDL, the latter denotes a global attribute. The following writes the global attribute <tt>title</tt> and then reads and prints it:
<pre>
$ echo 'Sample geographic file' | text2nc -h 'geog.nc :title'
$ nc2text 'geog.nc :title'
Sample geographic file
</pre>
(The <tt>-h</tt> flag means `Do not append a line to the global attribute <tt>history</tt>'.) Attributes cannot have subscripts, so there is no way of accessing only part of an attribute. Attributes are automatically created if they do not exist and their type and size can be changed. The following gives variable <tt>lat</tt> the new attribute <tt>valid_range</tt> (with type <tt>float</tt>) and then prints it:
<pre>
$ echo -90 90 | text2nc -h -t float 'geog.nc lat:valid_range'
$ nc2text 'geog.nc lat:valid_range'
-90 90
</pre>
The following gives variable <tt>lat</tt> another new attribute <tt>foo</tt> (by copying the first five elements of variable <tt>v</tt> from file <tt>vec.nc</tt>), then modifies it, then deletes it.
<pre>
$ nc2text 'vec.nc v[:4]' | text2nc -h -t double 'geog.nc lat:foo'
$ nc2text 'geog.nc lat:foo'
10 20.3 30.7 40.9 50
$ echo 'Hello' | text2nc -h 'geog.nc lat:foo' # Modify attribute 'lat:foo'
$ nc2text 'geog.nc lat:foo'
Hello
$ text2nc -h 'geog.nc lat:foo' &lt; /dev/null # Delete attribute 'lat:foo'
</pre>
Note how one can delete attributes by changing their size to 0. The file <tt>/dev/null</tt> is a standard UNIX pseudo-file that is empty for input.
<a id="Using_ID_Numbers" name="Using_ID_Numbers"></a> <h2>Using ID Numbers for Variables, Dimensions and Attributes</h2>
It is possible to use ID numbers in place of names for variables, dimensions and attributes. However dimension ID numbers must be followed by <tt>=</tt> so they can be distinguished from index values. ID numbers begin at 0 regardless of the index origin.
Negative values are relative to the end, which is represented by <tt>-1</tt>. There are some situations where ID numbers are more convenient than names. For example, one might adopt the convention that coordinate variables should be defined first, after which there should be only a single other (main) variable in each file. A shell-script to process such files can refer to the main variable as <tt>-1</tt>. The following shows the use of such a variable ID number: <pre> $ nc2text geog.nc -1 11 12 13 14 21 22 23 24 31 32 33 34 </pre> The following prints the first attribute of the second variable: <pre> $ nc2text geog.nc '1:0' degrees_east </pre> The following Korn shell script <tt>pratts</tt> prints all the non-global attributes in the files specified by its arguments. <pre> $ cat pratts #!/bin/ksh for FILE do integer VARID=0 # Following true if variable VARID exists while VARNAME="$(ncmeta -s -w v $FILE $VARID)"; test -n "$VARNAME" do integer ATTID=0 # Following true if attribute ATTID exists while ATTNAME="$(ncmeta -s -w a $FILE $VARID:$ATTID)"; test -n "$ATTNAME" do print -n "$FILE $VARNAME:$ATTNAME " nc2text $FILE "$VARNAME:$ATTNAME" (( ATTID += 1 )) done (( VARID += 1 )) done done </pre> We can use <tt>pratts</tt> on file <tt>geog.nc</tt> as follows: <pre> $ pratts geog.nc geog.nc lat:units degrees_north geog.nc lat:valid_range -90 90 geog.nc lon:units degrees_east </pre> <h1>FAN Utilities</h1> <h2>Introduction to Utilities</h2> This section provides a more detailed description of the four FAN utilities, <tt>nc2text</tt>, <tt>text2nc</tt>, <tt>ncmeta</tt> and <tt>ncrob</tt>, commencing with some features common to several utilities. The usage summaries in Sections <a href="#nc2text_Usage">nc2text Usage</a>, <a href="#text2nc_Usage">text2nc Usage</a>, <a href="#ncmeta_Usage">ncmeta Usage</a> and <a href="#ncrob_Usage">ncrob Usage</a> can be printed by entering the command name without any arguments. 
All netCDF types (<tt>char, byte, short, long, float</tt> and <tt>double</tt>) can be read and written. During input/output there is automatic conversion to or from type <tt>double</tt>, which is used for internal storage and processing. <h3>Options Common to several Utilities</h3> The two flags <tt>-h</tt> and <tt>-H</tt> specify what is to be written to the global attribute <tt>history</tt>. The <tt>-h</tt> flag means `Do not write any history'. The <tt>-H</tt> flag means `Exclude time-stamp and user-name (<tt>LOGNAME</tt>) from history'. This flag is useful in program testing, since it causes the same values to be written to <tt>history</tt> each time, thus facilitating comparison of actual output with that expected. Section 4.5 of <a href="/software/netcdf/guide_toc.html"> NetCDF User's Guide</a> explains the two aspects of error-handling: suppression of error messages and fatality of errors. The default mode is verbose and fatal. Non-verbose (silent) mode is set by flag <tt>-s</tt>. Non-fatal (persevere) mode is set by flag <tt>-p</tt>. The <tt>-e</tt> flag means `Write error messages to <tt>stdout</tt> not <tt>stderr</tt>'. The option `<tt>-t</tt> type' sets the data-type for new variables or attributes. Valid values are <tt>char, byte, short, long, float</tt> and <tt>double</tt>. These can be abbreviated to their first letter. The option `<tt>-u</tt> unit' sets the unit of measure for ASCII text data, providing conversion to or from those defined by netCDF <tt>units</tt> attributes. <a id="Scaling" name="Scaling"></a> <h3>Scaling and Unit Conversion</h3> All netCDF input and output values are transformed by a linear equation defined by the attributes <tt>add_offset</tt>, <tt>scale_factor</tt> and <tt>units</tt>; together with any unit defined by the <tt>-u</tt> option mentioned above. The output <tt>units</tt> attribute is defined or modified in some situations such as when it is undefined but the corresponding input attribute is defined. 
All unit conversion is done using the Units Library documented in Appendix C of <a href="/software/netcdf/guide_toc.html"> NetCDF User's Guide</a>. The environment variable <tt>UDUNITS_PATH</tt> can be used to specify a non-standard units file. (See <tt>man</tt> document <tt>udunits(3)</tt>.) <h3>Missing Values</h3> Values read from a netCDF file are considered missing if outside the valid range defined by the attribute <tt>valid_range</tt> or the attributes <tt>valid_min</tt>, and <tt>valid_max</tt>. If these do not define either the minimum or the maximum then an attempt is made to define it based on the principle that the missing value must be outside the valid range. The missing value is defined by the attribute <tt>missing_value</tt>, or if this is undefined then the fill value (defined by attribute <tt>_FillValue</tt> if defined, otherwise the default fill value for the data type). <h3>Environment Variables</h3> The environment variable <tt>UDUNITS_PATH</tt> was mentioned in Section <a href="#Scaling">Scaling</a>. The environment variable <tt>COLUMNS</tt> (default: 80) defines the page width and is used to print data of type <tt>character</tt>. <h2>nc2text</h2> This utility prints variable and attribute values from netCDF files. 
<a id="nc2text_Usage" name="nc2text_Usage"></a> <h3>Usage</h3>
<pre>
Usage: nc2text [-eps] [-f %s] [-m %s] [-n %d] [-u %s] &lt;FANI&gt;
&lt;FANI&gt; netCDF FAN specification for input
-e            Write error messages to stdout not stderr
-p            Persevere after errors
-s            Silent mode: Suppress warning messages
-f &lt;string&gt;:  Format for output (default: C_format attribute ("%G" if none))
-m &lt;string&gt;:  Missing value for output (default: _ )
-n &lt;integer&gt;: Number of fields per line of output (default: 10 if numeric)
              (Environment variable COLUMNS defines default for characters)
-u &lt;string&gt;:  Unit of measure for output (default: unit in file)
</pre>
<h3>Examples</h3>
The following prints the first three elements of variable <tt>v</tt> of file <tt>vec.nc</tt>:
<pre>
$ nc2text 'vec.nc v[0 1 2]'
10 20.3 30.7
</pre>
The following uses <tt>text2nc</tt> to
<ul>
<li>set attribute <tt>v:units</tt> to `<tt>degF</tt>'</li>
<li>set attribute <tt>v:valid_min</tt> to -460&deg;F (just below 0 K)</li>
<li>modify <tt>v[2]</tt> so it is less than this valid minimum i.e. missing.</li>
</ul>
<pre>
$ echo degF | text2nc vec.nc 'v:units'
$ echo -460 | text2nc -t float vec.nc 'v:valid_min'
$ echo -999 | text2nc vec.nc 'v[2]'
</pre>
Then we print four Celsius temperatures per line. The text `<tt>MISSING</tt>' is printed for missing values. Normal values are printed using the C format <tt>%8.4f</tt> (equivalent to the Fortran format <tt>F8.4</tt> i.e. 4 decimal places with a total field width of 8 characters).
<pre>
$ nc2text -f '%8.4f' -m ' MISSING' -n 4 -u degC vec.nc 'v[:4]'
-12.2222 -6.5000 MISSING 4.9444
10.0000
</pre>
<h2>ncmeta</h2>
This utility prints metadata from netCDF files. This metadata can include rank, shape, file names, variable names, dimension names and attribute names.
<a id="ncmeta_Usage" name="ncmeta_Usage"></a> <h3>Usage</h3>
<pre>
Usage: ncmeta [-eps] [-w &lt;LETTERS&gt;] &lt;FANI&gt;
&lt;FANI&gt; netCDF FAN specification for input
-e            Write error messages to stdout not stderr
-p            Persevere after errors
-s            Silent mode: Suppress warning messages
-w &lt;LETTERS&gt;: What to print using following (default: s)
              a: attribute names
              d: dimension names
              f: file names
              r: rank (number of dimensions)
              s: shape (dimension sizes)
              v: variable names
Example: ncmeta -w fvs abc.nc var1 var2
</pre>
<a id="ncmeta_Examples" name="ncmeta_Examples"></a> <h3>Examples</h3>
The following examples print the shape of the specified variables:
<pre>
$ ncmeta vec.nc v
7
$ ncmeta geog.nc tsur
3 4
</pre>
The following example prints the filename, variable name, rank, dimensions and shape of the specified variables:
<pre>
$ ncmeta -w fvrds vec.nc v 'geog.nc tsur' lat lon
vec.nc v 1 n 7
geog.nc tsur 2 lat lon 3 4
geog.nc lat 1 lat 3
geog.nc lon 1 lon 4
</pre>
The following example prints the variable name and attribute name of the first (0) attribute of the first (0) variable:
<pre>
$ ncmeta -w va geog.nc '0:0'
lat units
</pre>
<h2>ncrob</h2>
This utility reads data from one or more netCDF variables, performs some process on it and then either prints the result or writes it to one or more netCDF variables.
The type of process is defined by option `<tt>-r</tt> string', where string is one of the following: <center> <table border="1" summary="option meanings"> <tr> <td><tt>am</tt> </td> <td>arithmetic mean</td> </tr> <tr> <td><tt>broadcast</tt> </td> <td>cyclic copy</td> </tr> <tr> <td><tt>count</tt> </td> <td>number of non-missing values</td> </tr> <tr> <td><tt>fill</tt> </td> <td>fill with missing values</td> </tr> <tr> <td><tt>gm</tt> </td> <td>geometric mean</td> </tr> <tr> <td><tt>max</tt> </td> <td>maximum</td> </tr> <tr> <td><tt>min</tt> </td> <td>minimum</td> </tr> <tr> <td><tt>prod</tt> </td> <td>product</td> </tr> <tr> <td><tt>sd</tt> </td> <td>unadjusted standard deviation (divisor is  <var>n</var> )</td> </tr> <tr> <td><tt>sd1</tt> </td> <td>adjusted standard deviation (divisor is  <var>n</var>-1 )</td> </tr> <tr> <td><tt>sum</tt> </td> <td>sum</td> </tr> <tr> <td><tt>sum2</tt> </td> <td>sum of squares of values</td> </tr> </table> </center> A <tt>broadcast</tt> copies successive elements from input to output. Whenever the end of input is reached, reading begins again at the start of input. The whole process continues until reaching the end of output. A <tt>fill</tt> simply fills the output variable with missing values. There must be input, although it is used only to define the shape of new variables. The other processes are all reductions, in the sense that they reduce the rank (number of dimensions). The number of input elements ( <var>I</var> ) must be a multiple of both the number of output elements ( <var>N</var> ) and the number of weights ( <var>M</var> ) (if any, as specified by option <tt>-w</tt>). If the process is <tt>count</tt> and there are no weights then the result is the number of non-missing values. If there are weights then the result is the sum of the weights of the non-missing values. 
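The cyclic copy performed by <tt>broadcast</tt>, as described above, can be modelled in a few lines of Python. This is only a sketch of the stated behaviour (restart at the beginning of the input whenever it is exhausted, stop when the output is full); the function name and arguments are hypothetical and do not correspond to anything in <tt>ncrob</tt> itself.

```python
from itertools import cycle, islice
from typing import List, Sequence


def broadcast(source: Sequence[float], out_size: int) -> List[float]:
    """Copy source into an output of out_size elements, recycling values."""
    return list(islice(cycle(source), out_size))


# Copy a 3-element vector into each row of a 2 x 3 matrix (stored row-major).
print(broadcast([1, 2, 3], 6))  # [1, 2, 3, 1, 2, 3]
```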
Let vector <var>X</var><sub>0</sub>, <var>X</var><sub>1</sub>, ..., <var>X</var><sub><var>i</var></sub>, ..., <var>X</var><sub><var>I</var>-1</sub> represent the selected input data elements in the specified order. Similarly, let vector <var>Y</var><sub>0</sub>, <var>Y</var><sub>1</sub>, ..., <var>Y</var><sub><var>j</var></sub>, ..., <var>Y</var><sub><var>N</var>-1</sub> represent the resultant output data. Let <var>n</var> = <var>I</var> &divide; <var>N</var>. If the process is <tt>sum</tt> and there are no weights then
<center> <var>Y</var><sub><var>j</var></sub> = Sum<sub><var>i</var>=0,<var>n</var>-1</sub> <var>X</var><sub><var>Ni</var>+<var>j</var></sub> </center>
If weights <var>W</var><sub>0</sub>, <var>W</var><sub>1</sub>, ..., <var>W</var><sub><var>k</var></sub>, ..., <var>W</var><sub><var>M</var>-1</sub> are defined and <var>m</var> = <var>I</var> &divide; <var>M</var> then
<center> <var>Y</var><sub><var>j</var></sub> = Sum<sub><var>i</var>=0,<var>n</var>-1</sub> <var>W</var><sub>floor((<var>Ni</var>+<var>j</var>)/<var>m</var>)</sub> <var>X</var><sub><var>Ni</var>+<var>j</var></sub> </center>
where floor(<var>x</var>) represents the floor of <var>x</var> i.e. the greatest integer &lt;= <var>x</var>.
This is calculated using the following algorithm:
<pre>
<var>n</var> := <var>I</var> &divide; <var>N</var>
<var>m</var> := <var>I</var> &divide; <var>M</var>
for <var>j</var> from 0 to <var>N</var>-1
    <var>Y</var><sub><var>j</var></sub> := 0
for <var>i</var> from 0 to <var>I</var>-1
    <var>j</var> := <var>i</var> mod <var>N</var>
    <var>k</var> := floor(<var>i</var>/<var>m</var>)
    if <var>Y</var><sub><var>j</var></sub> &ne; missing_value
        if valid_min &lt;= <var>X</var><sub><var>i</var></sub> &lt;= valid_max
            <var>Y</var><sub><var>j</var></sub> := <var>Y</var><sub><var>j</var></sub> + <var>W</var><sub><var>k</var></sub> <var>X</var><sub><var>i</var></sub>
        else if suddenDeath
            <var>Y</var><sub><var>j</var></sub> := missing_value
</pre>
Note that this definition of <var>k</var> means that the first <var>m</var> elements have the first weight <var>W</var><sub>0</sub>, the next <var>m</var> have the second weight <var>W</var><sub>1</sub>, and so on. As an example consider an input array which is a matrix <var>A</var> with <var>R</var> rows and <var>C</var> columns. Thus <var>I</var> = <var>RC</var>. If we want column sums then the output vector would be of length <var>C</var>, i.e. <var>N</var> = <var>C</var>. Now <var>n</var> = <var>I</var> &divide; <var>N</var> = <var>R</var>.
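The algorithm above can be tried out with a small Python sketch. It is a hedged illustration, not the <tt>ncrob</tt> source: the name <tt>weighted_sum</tt>, the use of <tt>None</tt> as a stand-in for <tt>missing_value</tt>, and the keyword arguments are all assumptions introduced for the example.

```python
import math

MISSING = None  # hypothetical stand-in for the netCDF missing_value


def weighted_sum(x, n_out, weights=None,
                 valid_min=-math.inf, valid_max=math.inf,
                 sudden_death=False):
    """Reduce the flat input x (I elements) to n_out (N) output sums."""
    total = len(x)                                   # I
    w = weights if weights is not None else [1.0]    # no weights: one weight of 1
    if total % n_out != 0 or total % len(w) != 0:
        raise ValueError("I must be a multiple of both N and M")
    m = total // len(w)                              # m = I / M
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        j = i % n_out                                # output element fed by x[i]
        k = i // m                                   # floor(i / m): weight index
        if y[j] is MISSING:
            continue                                 # already killed (sudden death)
        if xi is not MISSING and valid_min <= xi <= valid_max:
            y[j] += w[k] * xi
        elif sudden_death:
            y[j] = MISSING                           # one bad value spoils the result
    return y


# A 2 x 3 matrix in row-major order (illustrative values).
a = [11, 12, 13, 21, 22, 23]
print(weighted_sum(a, 3))                  # [32.0, 34.0, 36.0]
print(weighted_sum(a, 3, weights=[1, 2]))  # [53.0, 56.0, 59.0]
```

With two weights and six inputs, <var>m</var> = 3, so the first row takes the first weight and the second row takes the second. Passing <tt>sudden_death=True</tt> with an input such as <tt>[11, None, 13, 21, 22, 23]</tt> makes the second column sum missing, whereas the default (filter) behaviour simply skips the missing element.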
So the unweighted sum is
<center> <var>Y</var><sub><var>j</var></sub> = &Sigma;<sub><var>i</var>=0</sub><sup><var>R</var>-1</sup> <var>X</var><sub><var>Ci</var>+<var>j</var></sub> = &Sigma;<sub><var>i</var>=0</sub><sup><var>R</var>-1</sup> <var>A</var><sub><var>ij</var></sub> </center>
and, with one weight per row (so that <var>M</var> = <var>R</var> and <var>m</var> = <var>C</var>), the weighted sum is
<center> <var>Y</var><sub><var>j</var></sub> = &Sigma;<sub><var>i</var>=0</sub><sup><var>R</var>-1</sup> <var>W</var><sub>floor((<var>Ci</var>+<var>j</var>)/<var>C</var>)</sub> <var>X</var><sub><var>Ci</var>+<var>j</var></sub> = &Sigma;<sub><var>i</var>=0</sub><sup><var>R</var>-1</sup> <var>W</var><sub><var>i</var></sub> <var>A</var><sub><var>ij</var></sub> </center>
If the process is <tt>prod</tt> and there are no weights then
<center> <var>Y</var><sub><var>j</var></sub> = &Pi;<sub><var>i</var>=0</sub><sup><var>n</var>-1</sup> <var>X</var><sub><var>Ni</var>+<var>j</var></sub> </center>
If weights are defined then each value is raised to the power of its weight:
<center> <var>Y</var><sub><var>j</var></sub> = &Pi;<sub><var>i</var>=0</sub><sup><var>n</var>-1</sup> <var>X</var><sub><var>Ni</var>+<var>j</var></sub><sup><var>W</var><sub>floor((<var>Ni</var>+<var>j</var>)/<var>m</var>)</sub></sup> </center>
In general the shape (dimension vector) of the destination should match the trailing dimensions of the source. Then the reduction process operates over those leading dimensions absent from the destination. Note that FAN allows you to transpose dimensions by specifying them in an order different from that in the file. Thus the leading source dimensions are those specified first. The order of the remaining dimensions must match those of the destination. The other reduction processes are treated similarly. However <tt>min</tt> and <tt>max</tt> do not allow weights. If the <tt>-m</tt> flag is specified then the result is missing if any of the values it depends on is missing (sudden death mode). Otherwise missing values are omitted (filter mode), i.e. essentially treated as having a weight of 0. The <tt>-b</tt> option sets the size of the input buffer. This can improve efficiency when reading very large variables. The <tt>-c</tt> option creates a new destination variable with the specified rank (number of dimensions). If the variable already exists then this option is ignored.
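The rule that a reduction runs over the leading source dimensions, and that transposing the source changes which dimensions are leading, can be illustrated with the following Python sketch. The helper names <tt>reduce_leading</tt>, <tt>transpose_2d</tt> and <tt>mean</tt>, the flat row-major lists and the sample values are assumptions for the example; they are not part of FAN.

```python
def reduce_leading(flat, shape, out_rank, op):
    """Reduce a flat row-major array over its leading dimensions,
    keeping the last out_rank dimensions as the destination shape."""
    size = 1
    for s in shape:
        size *= s
    if size != len(flat):
        raise ValueError("shape does not match the number of elements")
    n_out = 1                                  # N: number of output elements
    for s in shape[len(shape) - out_rank:]:
        n_out *= s
    # Output element j collects inputs j, N+j, 2N+j, ...
    return [op(flat[j::n_out]) for j in range(n_out)]


def transpose_2d(flat, rows, cols):
    """Reorder a row-major rows x cols matrix so that columns lead."""
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]


def mean(values):
    return sum(values) / len(values)


a = [11, 12, 13, 21, 22, 23]                   # 2 rows x 3 columns
print(reduce_leading(a, (2, 3), 1, sum))       # [32, 34, 36]: rows are reduced
print(reduce_leading(transpose_2d(a, 2, 3), (3, 2), 1, mean))  # [12.0, 22.0]
```

In the second call the column dimension has been made the leading one, so the reduction runs over columns and yields one mean per row.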
If the destination file does not exist then it is created. The variable is created with the same attributes as the (first if several) source variable, and the specified number of its trailing dimensions, together with any associated coordinate variables. However a broadcast is slightly different in that a new leading dimension is created from the leading source dimensions by taking the product of their sizes (so the total number of elements is unchanged) and concatenating their names. The data-type of the new variable is specified using option <tt>-t</tt> and defaults to the type of the source variable. <a id="ncrob_Usage" name="ncrob_Usage"></a> <h3>Usage</h3>
<pre>
Usage: ncrob [options] &lt;FANI&gt; / &lt;FANO&gt;
&lt;FANI&gt;: FAN specification for input
&lt;FANO&gt;: FAN specification for output (default: stdout)
-e Write error messages to stdout not stderr
-H Exclude time-stamp &amp; LOGNAME from history
-h Do not write history
-m If any value missing then result missing
-p Persevere after errors
-s Silent mode: Suppress warning messages
-b &lt;int&gt;: Max. buffer size (Kbytes) (default: 512)
-c &lt;int&gt;: Rank (decrement if &lt; 0) of any Created variable including stdout (default: input rank for broadcast, else -1)
-f &lt;string&gt;: Format for stdout (default: C_format attribute ("%G" if none))
-M &lt;string&gt;: Missing value for stdout (default: _ )
-n &lt;integer&gt;: Number of fields per line for stdout (default: 10 if numeric) (Environment variable COLUMNS defines default for characters)
-r &lt;string&gt;: Reduction type (am broadcast count fill gm max min prod sd sd1 sum sum2) (default: broadcast)
-t char|byte|short|long|float|double: new variable type (default: input type)
-u &lt;string&gt;: Unit of measure for stdout (default: unit in file)
-w &lt;reals&gt;: Weight vector (e.g. -w '3 1.5 .8')
</pre>
If the `<tt>/</tt>' is omitted then the final argument is taken as <tt>&lt;FANO&gt;</tt>. (This version 1 convention is deprecated.)
If <tt>&lt;FANO&gt;</tt> does not specify a filename or variable name then the first one in <tt>&lt;FANI&gt;</tt> is used. <h3>Examples</h3> The following prints the variable <tt>M</tt> in file <tt>mat.nc</tt>:
<pre>
$ ncrob mat.nc M /
11 12 13
21 22 23
</pre>
The following prints the column sums, row means and overall product:
<pre>
$ ncrob -r sum mat.nc M / # sum of each column
32 34 36
$ ncrob -r am mat.nc 'M[col]' / # arithmetic mean of each row
12 22
$ ncrob -r pro