# @gmod/gff
Read and write GFF3 data performantly. This module aims to be a complete
implementation of the [GFF3
specification](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md).
**NOTE: this module uses the NPM stream package, which requires node.js polyfills for use on the web. We also created the https://github.com/cmdcolin/gff-nostream package to allow a non-streaming version that does not require polyfills**
- streaming parsing and streaming formatting
- proper escaping and unescaping of attribute and column values
- supports features with multiple locations and features with multiple parents
- reconstructs feature hierarchies of both `Parent` and `Derives_from`
relationships
- parses FASTA sections
- does no validation except for referential integrity of `Parent` and
`Derives_from` relationships (can disable Derives_from reference checking
with `disableDerivesFromReferences`)
- only compatible with GFF3
## Compatibility
Works in the browser and with Node.js v18 and up.
## Install
$ npm install --save @gmod/gff
## Usage
### Node.js example
```js
import {
  createReadStream,
  createWriteStream,
  readFileSync,
  writeFileSync,
} from 'fs'
import { Readable, Writable } from 'stream'
import { TransformStream } from 'stream/web'
import {
  formatSync,
  parseStringSync,
  GFFTransformer,
  GFFFormattingTransformer,
} from '@gmod/gff'

// parse a file from a file name. parses only features and sequences by default,
// set options to parse directives and/or comments
;(async () => {
  const readStream = createReadStream('/path/to/my/file.gff3')
  const streamOfGFF3 = Readable.toWeb(readStream).pipeThrough(
    new TransformStream(
      new GFFTransformer({ parseComments: true, parseDirectives: true }),
    ),
  )
  for await (const data of streamOfGFF3) {
    if ('directive' in data) {
      console.log('got a directive', data)
    } else if ('comment' in data) {
      console.log('got a comment', data)
    } else if ('sequence' in data) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  }

  // parse a string of gff3 synchronously
  const stringOfGFF3 = readFileSync('/path/to/my/file.gff3', 'utf8')
  const arrayOfGFF3Items = parseStringSync(stringOfGFF3)

  // format an array of items to a string
  const newStringOfGFF3 = formatSync(arrayOfGFF3Items)
  writeFileSync('/path/to/new/file.gff3', newStringOfGFF3)

  // read a file, format it, and write it to a new file. inserts sync marks and
  // a '##gff-version 3' header if one is not already present
  await Readable.toWeb(createReadStream('/path/to/my/file.gff3'))
    .pipeThrough(
      new TransformStream(
        new GFFTransformer({ parseComments: true, parseDirectives: true }),
      ),
    )
    .pipeThrough(new TransformStream(new GFFFormattingTransformer()))
    .pipeTo(Writable.toWeb(createWriteStream('/path/to/new/file.gff3')))
})()
```
### Browser example
```js
import { GFFTransformer } from '@gmod/gff'

// parse a file from a URL. parses only features and sequences by default, set
// options to parse directives and/or comments
;(async () => {
  const response = await fetch('http://example.com/file.gff3')
  if (!response.ok) {
    throw new Error('Bad response')
  }
  if (!response.body) {
    throw new Error('No response body')
  }
  const streamOfGFF3 = response.body.pipeThrough(
    new TransformStream(
      new GFFTransformer({ parseComments: true, parseDirectives: true }),
    ),
  )
  for await (const data of streamOfGFF3) {
    if ('directive' in data) {
      console.log('got a directive', data)
    } else if ('comment' in data) {
      console.log('got a comment', data)
    } else if ('sequence' in data) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  }
})()
```
## Object format
### features
In GFF3, features can have more than one location. We parse each feature
as an array of all the lines that share that feature's ID.
Values that are `.` in the GFF3 are `null` in the output.
A simple feature that's located in just one place:
```json
[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "gene",
    "start": 1000,
    "end": 9000,
    "score": null,
    "strand": "+",
    "phase": null,
    "attributes": {
      "ID": ["gene00001"],
      "Name": ["EDEN"]
    },
    "child_features": [],
    "derived_features": []
  }
]
```
A CDS called `cds00001` located in two places:
```json
[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 1201,
    "end": 1500,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  },
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 3000,
    "end": 3902,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  }
]
```
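Because a feature is an array of line objects, per-feature quantities have to be aggregated over every element. The following is a minimal sketch that works only on the object shape shown above; `featureSpan` and `featureLength` are hypothetical helper names, not part of this package.

```js
// Hypothetical helpers operating on the array-of-lines feature shape shown above.

// Smallest start and largest end over all locations of the feature.
function featureSpan(feature) {
  const starts = feature.map(line => line.start).filter(v => v !== null)
  const ends = feature.map(line => line.end).filter(v => v !== null)
  if (starts.length === 0 || ends.length === 0) {
    return null
  }
  return { start: Math.min(...starts), end: Math.max(...ends) }
}

// Total number of bases covered. GFF3 coordinates are 1-based and inclusive,
// so each location contributes end - start + 1.
function featureLength(feature) {
  return feature.reduce(
    (total, line) =>
      line.start === null || line.end === null
        ? total
        : total + (line.end - line.start + 1),
    0,
  )
}

// Only the fields used by the helpers are included here.
const cds00001 = [
  { seq_id: 'ctg123', type: 'CDS', start: 1201, end: 1500 },
  { seq_id: 'ctg123', type: 'CDS', start: 3000, end: 3902 },
]

console.log(featureSpan(cds00001)) // { start: 1201, end: 3902 }
console.log(featureLength(cds00001)) // 1203
```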
### directives
```js
parseDirective("##gff-version 3\n")
// returns
{
  "directive": "gff-version",
  "value": "3"
}
```
```js
parseDirective('##sequence-region ctg123 1 1497228\n')
// returns
{
  "directive": "sequence-region",
  "value": "ctg123 1 1497228",
  "seq_id": "ctg123",
  "start": "1",
  "end": "1497228"
}
```
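To make the two output shapes above concrete, here is a rough sketch of how a directive line could be split into those fields. It is an illustration only and not this package's implementation; `parseDirectiveSketch` is a hypothetical name, and treating `###` sync marks as "not a directive" is an assumption of the sketch.

```js
// Hypothetical stand-in for illustration; the real parser lives in the package.
function parseDirectiveSketch(line) {
  // "##" followed by a name and an optional value. The lookahead rejects
  // "###" sync marks, which this sketch does not treat as directives.
  const match = /^\s*##(?!#)\s*(\S+)\s*(.*)/.exec(line)
  if (!match) {
    return null
  }
  const [, name, rest] = match
  const value = rest.trim()
  const parsed = { directive: name }
  if (value.length > 0) {
    parsed.value = value
  }
  if (name === 'sequence-region') {
    // start and end stay strings, matching the documented output above
    const [seq_id, start, end] = value.split(/\s+/)
    return { ...parsed, seq_id, start, end }
  }
  return parsed
}

console.log(parseDirectiveSketch('##gff-version 3\n'))
console.log(parseDirectiveSketch('##sequence-region ctg123 1 1497228\n'))
```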
### comments
```js
parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}
```
### sequences
These come from any embedded `##FASTA` section in the GFF3 file.
```js
parseSequences(`##FASTA
>ctgA test contig
ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA`)
// returns
[
  {
    id: 'ctgA',
    description: 'test contig',
    sequence: 'ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA',
  },
]
```
## API
<!-- Generated by documentation.js. Update this documentation by updating the source code. -->
#### Table of Contents
- [ParseOptions](#parseoptions)
- [disableDerivesFromReferences](#disablederivesfromreferences)
- [parseFeatures](#parsefeatures)
- [parseDirectives](#parsedirectives)
- [parseComments](#parsecomments)
- [parseSequences](#parsesequences)
- [bufferSize](#buffersize)
- [GFFTransformer](#gfftransformer)
- [Parameters](#parameters)
- [parseStringSync](#parsestringsync)
- [Parameters](#parameters-1)
- [formatSync](#formatsync)
- [Parameters](#parameters-2)
- [FormatOptions](#formatoptions)
- [minSyncLines](#minsynclines)
- [insertVersionDirective](#insertversiondirective)
- [GFFFormattingTransformer](#gffformattingtransformer)
- [Parameters](#parameters-3)
### ParseOptions
Parser options
#### disableDerivesFromReferences
Whether to disable resolving of `Derives_from` references between features
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
#### parseFeatures
Whether to parse features, default true
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
#### parseDirectives
Whether to parse directives, default false
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
#### parseComments
Whether to parse comments, default false
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
#### parseSequences
Whether to parse sequences, default true
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
#### bufferSize
Maximum number of GFF3 lines to buffer, default Infinity
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
### GFFTransformer
Parse a stream of text data into a stream of feature, directive, comment,
and sequence objects.
#### Parameters
- `options` **O?** Parser options
### parseStringSync
Synchronously parse a string containing GFF3 and return an array of the
parsed items.
#### Parameters
- `str` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** GFF3 string
- `inputOptions` **O?** Parsing options
Returns **any** array of parsed features, directives, comments and/or sequences
### formatSync
Format an array of GFF3 items (features, directives, comments) into a string
of GFF3. Does not insert synchronization (###) marks.
#### Parameters
- `items` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)\<GFF3Item>** Array of features, directives, comments and/or sequences
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** the formatted GFF3
### FormatOptions
Formatter options
#### minSyncLines
The minimum number of lines to emit between sync (###) directives, default
100
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
#### insertVersionDirective
Whether to insert a version directive at the beginning of a formatted
stream if one does not exist already, default true
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
### GFFFormattingTransformer
Transform a stream of features, directives, comments and/or sequences into a
stream of GFF3 text.
Inserts synchronization (###) marks automatically.
#### Parameters
- `options` **[FormatOptions](#formatoptions)** Formatter options (optional, default `{}`)
## About `util`
There is also a `util` module that contains super-low-level functions for dealing with lines and parts of lines.
```js
import { util } from '@gmod/gff'
const gff3Lines = util.formatItem({
  seq_id: 'ctgA',
  ...
})
```
## util
<!-- Generated by documentation.js. Update this documentation by updating the source code. -->
#### Table of Contents
- [unescape](#unescape)
- [Parameters](#parameters)
- [escape](#escape)
- [Parameters](#parameters-1)
- [escapeColumn](#escapecolumn)
- [Parameters](#parameters-2)
- [parseAttributes](#parseattributes)
- [Parameters](#parameters-3)
- [parseFeature](#parsefeature)
- [Parameters](#parameters-4)
- [parseDirective](#parsedirective)
- [Parameters](#parameters-5)
- [formatAttributes](#formatattributes)
- [Parameters](#parameters-6)
- [formatFeature](#formatfeature)
- [Parameters](#parameters-7)
- [formatDirective](#formatdirective)
- [Parameters](#parameters-8)
- [formatComment](#formatcomment)
- [Parameters](#parameters-9)
- [formatSequence](#formatsequence)
- [Parameters](#parameters-10)
- [formatItem](#formatitem)
- [Parameters](#parameters-11)
- [GFF3Attributes](#gff3attributes)
- [GFF3FeatureLine](#gff3featureline)
- [seq_id](#seq_id)
- [source](#source)
- [type](#type)
- [start](#start)
- [end](#end)
- [score](#score)
- [strand](#strand)
- [phase](#phase)
- [attributes](#attributes)
- [GFF3FeatureLineWithRefs](#gff3featurelinewithrefs)
- [child_features](#child_features)
- [derived_features](#derived_features)
- [GFF3Feature](#gff3feature)
- [BaseGFF3Directive](#basegff3directive)
- [directive](#directive)
- [value](#value)
- [GFF3SequenceRegionDirective](#gff3sequenceregiondirective)
- [value](#value-1)
- [seq_id](#seq_id-1)
- [start](#start-1)
- [end](#end-1)
- [GFF3GenomeBuildDirective](#gff3genomebuilddirective)
- [value](#value-2)
- [source](#source-1)
- [buildName](#buildname)
- [GFF3Comment](#gff3comment)
- [comment](#comment)
- [GFF3Sequence](#gff3sequence)
- [id](#id)
- [description](#description)
- [sequence](#sequence)
### unescape
Unescape a string value used in a GFF3 attribute.
#### Parameters
- `stringVal` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** Escaped GFF3 string value
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** An unescaped string value
### escape
Escape a value for use in a GFF3 attribute value.
#### Parameters
- `rawVal` **([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number))** Raw GFF3 attribute value
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** An escaped string value
### escapeColumn
Escape a value for use in a GFF3 column value.
#### Parameters
- `rawVal` **([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number))** Raw GFF3 column value
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** An escaped column value
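The difference between attribute and column escaping can be sketched as follows. This is a minimal illustration based on the escaping rules in the GFF3 specification (control characters and `%` everywhere, plus `;`, `=`, `&` and `,` inside attribute values); the function names are hypothetical, and the package's own `escape`, `escapeColumn` and `unescape` may differ in detail.

```js
// Hypothetical stand-ins, written from the GFF3 specification's escaping rules.
function percentEncode(ch) {
  return '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
}

// Columns 1-8: escape control characters (including tab and newline) and "%".
function escapeColumnSketch(rawVal) {
  return String(rawVal).replace(/[\x00-\x1f\x7f%]/g, percentEncode)
}

// Column 9 values: additionally escape the attribute delimiters ; = & ,
function escapeAttributeSketch(rawVal) {
  return String(rawVal).replace(/[\x00-\x1f\x7f%;=&,]/g, percentEncode)
}

// Reverse any %XX sequence back to the character it encodes.
function unescapeSketch(stringVal) {
  return stringVal.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)),
  )
}

console.log(escapeAttributeSketch('kinase; ATP=bound')) // kinase%3B ATP%3Dbound
console.log(escapeColumnSketch('kinase; ATP=bound')) // kinase; ATP=bound
console.log(unescapeSketch('kinase%3B ATP%3Dbound')) // kinase; ATP=bound
```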
### parseAttributes
Parse the 9th column (attributes) of a GFF3 feature line.
#### Parameters
- `attrString` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** String of GFF3 9th column
Returns **[GFF3Attributes](#gff3attributes)** Parsed attributes
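As an illustration of the shape `parseAttributes` returns, the sketch below splits a 9th-column string into a record of tag to array of values. It is not the package's implementation: `parseAttributesSketch` is a hypothetical name, and returning an empty object for `.` or an empty string is a simplification made here.

```js
// Hypothetical stand-in showing the Record<string, string[]> output shape.
function parseAttributesSketch(attrString) {
  const unescapeValue = s =>
    s.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
      String.fromCharCode(parseInt(hex, 16)),
    )

  const collected = new Map()
  const trimmed = (attrString || '').replace(/\r?\n$/, '')
  if (trimmed === '' || trimmed === '.') {
    return {}
  }
  for (const pair of trimmed.split(';')) {
    const eq = pair.indexOf('=')
    if (eq === -1) {
      continue // skip fragments without a tag=value form
    }
    const tag = pair.slice(0, eq).trim()
    const rawValues = pair.slice(eq + 1)
    if (tag === '' || rawValues === '') {
      continue
    }
    // Split on "," before unescaping so escaped commas (%2C) stay inside a value.
    const values = rawValues.split(',').map(v => unescapeValue(v.trim()))
    collected.set(tag, [...(collected.get(tag) || []), ...values])
  }
  return Object.fromEntries(collected)
}

console.log(parseAttributesSketch('ID=cds00001;Parent=mRNA00001,mRNA00002'))
// { ID: [ 'cds00001' ], Parent: [ 'mRNA00001', 'mRNA00002' ] }
```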
### parseFeature
Parse a GFF3 feature line
#### Parameters
- `line` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** GFF3 feature line
Returns **[GFF3FeatureLine](#gff3featureline)** The parsed feature
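The column-to-field mapping can be sketched roughly as below, including the rule that `.` becomes `null`. This is an illustration only: `parseFeatureLineSketch` is a hypothetical name, it skips unescaping, and it keeps the 9th column as a raw string in a made-up `rawAttributes` field rather than producing the parsed `attributes` object.

```js
// Hypothetical stand-in showing how the nine tab-separated columns map to fields.
function parseFeatureLineSketch(line) {
  const columns = line.replace(/\r?\n$/, '').split('\t')

  // "." (or a missing column) means "no value" and becomes null
  const field = i =>
    columns[i] === undefined || columns[i] === '' || columns[i] === '.'
      ? null
      : columns[i]
  const numberField = i => (field(i) === null ? null : Number(field(i)))

  return {
    seq_id: field(0),
    source: field(1),
    type: field(2),
    start: numberField(3),
    end: numberField(4),
    score: numberField(5),
    strand: field(6),
    phase: field(7), // kept as a string, as in the documented type
    rawAttributes: field(8), // simplification: not parsed in this sketch
  }
}

console.log(
  parseFeatureLineSketch(
    'ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN\n',
  ),
)
```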
### parseDirective
Parse a GFF3 directive line.
#### Parameters
- `line` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** GFF3 directive line
Returns **(GFF3Directive | [GFF3SequenceRegionDirective](#gff3sequenceregiondirective) | [GFF3GenomeBuildDirective](#gff3genomebuilddirective) | null)** The parsed directive
### formatAttributes
Format an attributes object into a string suitable for the 9th column of GFF3.
#### Parameters
- `attrs` **[GFF3Attributes](#gff3attributes)** Attributes
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** GFF3 9th column string
### formatFeature
Format a feature object or array of feature objects into one or more lines of
GFF3.
#### Parameters
- `featureOrFeatures` **([GFF3FeatureLine](#gff3featureline) | [GFF3FeatureLineWithRefs](#gff3featurelinewithrefs) | [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<([GFF3FeatureLine](#gff3featureline) | [GFF3FeatureLineWithRefs](#gff3featurelinewithrefs))>)** A feature object or array of feature objects
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** A string of one or more GFF3 lines
### formatDirective
Format a directive into a line of GFF3.
#### Parameters
- `directive` **GFF3Directive** A directive object
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** A directive line string
### formatComment
Format a comment into a GFF3 comment.
Yes I know this is just adding a # and a newline.
#### Parameters
- `comment` **[GFF3Comment](#gff3comment)** A comment object
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** A comment line string
### formatSequence
Format a sequence object as FASTA
#### Parameters
- `seq` **[GFF3Sequence](#gff3sequence)** A sequence object
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** Formatted single FASTA sequence string
### formatItem
Format a directive, comment, sequence, or feature, or array of such items,
into one or more lines of GFF3.
#### Parameters
- `item` **([GFF3FeatureLineWithRefs](#gff3featurelinewithrefs) | GFF3Directive | [GFF3Comment](#gff3comment) | [GFF3Sequence](#gff3sequence))** 
- `itemOrItems` A comment, sequence, or feature, or array of such items
Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** A formatted string or array of strings
### GFF3Attributes
A record of GFF3 attribute identifiers and the values of those identifiers
Type: Record<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String), [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>>
### GFF3FeatureLine
A representation of a single line of a GFF3 file
#### seq_id
The ID of the landmark used to establish the coordinate system for the current feature
Type: ([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | null)
#### source
A free text qualifier intended to describe the algorithm or operating procedure that generated this feature
Type: ([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | null)
#### type
The type of the feature
Type: ([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | null)
#### start
The start coordinates of the feature
Type: ([number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number) | null)
#### end
The end coordinates of the feature
Type: ([number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number) | null)
#### score
The score of the feature
Type: ([number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number) | null)
#### strand
The strand of the feature
Type: ([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | null)
#### phase
For features of type "CDS", the phase indicates where the next codon begins relative to the 5' end of the current CDS feature
Type: ([string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String) | null)
#### attributes
Feature attributes
Type: ([GFF3Attributes](#gff3attributes) | null)
### GFF3FeatureLineWithRefs
**Extends GFF3FeatureLine**
A GFF3 Feature line that includes references to other features defined in
their "Parent" or "Derives_from" attributes
#### child_features
An array of child features
Type: [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[GFF3Feature](#gff3feature)>
#### derived_features
An array of features derived from this feature
Type: [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[GFF3Feature](#gff3feature)>
### GFF3Feature
A GFF3 feature, which may include multiple individual feature lines
Type: [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[GFF3FeatureLineWithRefs](#gff3featurelinewithrefs)>
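Since every `child_features` entry is itself a `GFF3Feature` (an array of lines), a hierarchy can be walked recursively. The generator below is a minimal sketch over that documented shape; `walkFeatures` is a hypothetical helper, and de-duplicating children by object identity is a precaution taken here in case several lines of one feature reference the same child.

```js
// Hypothetical helper: depth-first walk over a feature and its descendants.
function* walkFeatures(feature, depth = 0) {
  yield { feature, depth }
  const seen = new Set()
  for (const line of feature) {
    for (const child of line.child_features ?? []) {
      if (seen.has(child)) {
        continue // same child object already visited via another line
      }
      seen.add(child)
      yield* walkFeatures(child, depth + 1)
    }
  }
}

// Hand-built example in the documented shape (only the fields used here).
const exon1 = [{ type: 'exon', child_features: [], derived_features: [] }]
const exon2 = [{ type: 'exon', child_features: [], derived_features: [] }]
const mrna = [
  { type: 'mRNA', child_features: [exon1, exon2], derived_features: [] },
]
const gene = [{ type: 'gene', child_features: [mrna], derived_features: [] }]

for (const { feature, depth } of walkFeatures(gene)) {
  console.log(`${'  '.repeat(depth)}${feature[0].type}`)
}
// gene
//   mRNA
//     exon
//     exon
```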
### BaseGFF3Directive
A GFF3 directive
#### directive
The name of the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### value
The string value of the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
### GFF3SequenceRegionDirective
**Extends BaseGFF3Directive**
A GFF3 sequence-region directive
#### value
The string value of the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### seq_id
The sequence ID parsed from the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### start
The sequence start parsed from the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### end
The sequence end parsed from the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
### GFF3GenomeBuildDirective
**Extends BaseGFF3Directive**
A GFF3 genome-build directive
#### value
The string value of the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### source
The genome build source parsed from the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### buildName
The genome build name parsed from the directive
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
### GFF3Comment
A GFF3 comment
#### comment
The text of the comment
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
### GFF3Sequence
A GFF3 FASTA single sequence
#### id
The ID of the sequence
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### description
The description of the sequence
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### sequence
The sequence
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
## License
MIT © [Robert Buels](https://github.com/rbuels)