@gmod/bam
Version:
Parser for BAM and BAM index (bai) files
164 lines (118 loc) • 5.26 kB
Markdown
[](https://npmjs.org/package/@gmod/bam)
[](https://codecov.io/gh/GMOD/bam-js/branch/master)
[](https://github.com/GMOD/bam-js/actions?query=branch%3Amaster+workflow%3APush+)
## Install
```bash
$ npm install --save /bam
```
## Usage
```typescript
const { BamFile } = require('/bam')
// or import {BamFile} from '@gmod/bam'
const t = new BamFile({
bamPath: 'test.bam',
})
// note: it's required to first run getHeader before any getRecordsForRange
var header = await t.getHeader()
// this would get same records as samtools view ctgA:1-50000
var records = await t.getRecordsForRange('ctgA', 0, 50000)
```
The `bamPath` argument only works on nodejs. In the browser, you should pass
`bamFilehandle` with a generic-filehandle2 e.g. `RemoteFile`
```typescript
const { RemoteFile } = require('generic-filehandle2')
const bam = new BamFile({
bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})
```
Input are 0-based half-open coordinates (note: not the same as samtools view
coordinate inputs!)
## Usage with htsget
Since 1.0.41 we support usage of the htsget protocol
Here is a small code snippet for this
```typescript
const { HtsgetFile } = require('/bam')
const ti = new HtsgetFile({
baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
trackId: 'BroadHiSeqX_b37/NA12878',
})
await ti.getHeader()
const records = await ti.getRecordsForRange(1, 2000000, 2000001)
```
Our implementation makes some assumptions about how the protocol is implemented,
so let us know if it doesn't work for your use case
## Documentation
### BAM constructor
The BAM class constructor accepts arguments
- `bamPath`/`bamUrl`/`bamFilehandle` - a string file path to a local file or a
class object with a read method
- `csiPath`/`csiUrl`/`csiFilehandle` - a CSI index for the BAM file, required
for long chromosomes greater than 2^29 in length
- `baiPath`/`baiUrl`/`baiFilehandle` - a BAI index for the BAM file
- `cacheSize` - limit on number of chunks to cache. default: 50
- `yieldThreadTime` - the interval at which the code yields to the main thread
when it is parsing a lot of data. default: 100ms. Set to 0 to performed no
yielding
Note: filehandles implement the Filehandle interface from
https://www.npmjs.com/package/generic-filehandle2.
This module offers the path and url arguments as convenience methods for
supplying the LocalFile and RemoteFile
### async getRecordsForRange(refName, start, end, opts)
Note: you must run getHeader before running getRecordsForRange
- `refName` - a string for the chrom to fetch from
- `start` - a 0-based half open start coordinate
- `end` - a 0-based half open end coordinate
- `opts.signal` - an AbortSignal to indicate stop processing
- `opts.viewAsPairs` - re-dispatches requests to find mate pairs. default: false
- `opts.pairAcrossChr` - control the viewAsPairs option behavior to pair across
chromosomes. default: false
- `opts.maxInsertSize` - control the viewAsPairs option behavior to limit
distance within a chromosome to fetch. default: 200kb
### async \*streamRecordsForRange(refName, start, end, opts)
This is a async generator function that takes the same signature as
`getRecordsForRange` but results can be processed using
```typescript
for await (const chunk of file.streamRecordsForRange(
refName,
start,
end,
opts,
)) {
}
```
The `getRecordsForRange` simply wraps this process by concatenating chunks into
an array
### async getHeader(opts: {....anything to pass to generic-filehandle2 opts})
This obtains the header from `HtsgetFile` or `BamFile`. Retrieves BAM file and
BAI/CSI header if applicable, or API request for refnames from htsget
### async indexCov(refName, start, end)
- `refName` - a string for the chrom to fetch from
- `start` - a 0-based half open start coordinate (optional)
- `end` - a 0-based half open end coordinate (optional)
Returns features of the form {start, end, score} containing estimated feature
density across 16kb windows in the genome
### async lineCount(refName: string)
- `refName` - a string for the chrom to fetch from
Returns number of features on refName, uses special pseudo-bin from the BAI/CSI
index (e.g. bin 37450 from bai, returning n_mapped from SAM spec pdf) or -1 if
refName not exist in sample
### async hasRefSeq(refName: string)
- `refName` - a string for the chrom to check
Returns whether we have this refName in the sample
### Returned features
Example
```typescript
feature.ref_id // numerical sequence id corresponding to position in the sam header
feature.start // 0-based half open start coordinate
feature.end // 0-based half open end coordinate
feature.name // QNAME
feature.seq // feature sequence
feature.qual // qualities
feature.CIGAR // CIGAR string
feature.tags // tags
feature.flags // flags
feature.template_length // TLEN
```
## License
MIT © [Colin Diesh](https://github.com/cmdcolin)