@gmod/bam
Version:
Parser for BAM and BAM index (bai) files
215 lines (159 loc) • 6.5 kB
Markdown
[](https://npmjs.org/package/@gmod/bam)
[](https://github.com/GMOD/bam-js/actions?query=branch%3Amain+workflow%3APush+)
## Install
```bash
$ npm install --save /bam
```
## Usage
```typescript
import { BamFile } from '/bam'
const t = new BamFile({
bamPath: 'test.bam',
})
// note: it's required to first run getHeader before any getRecordsForRange
const header = await t.getHeader()
// this would get same records as samtools view ctgA:1-50000
const records = await t.getRecordsForRange('ctgA', 0, 50000)
```
The `bamPath` argument only works on nodejs. In the browser, you should pass
`bamFilehandle` with a generic-filehandle2 e.g. `RemoteFile`
```typescript
import { RemoteFile } from 'generic-filehandle2'
import { BamFile } from '/bam'
const bam = new BamFile({
bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})
```
Input are 0-based half-open coordinates (note: not the same as samtools view
coordinate inputs!)
## Usage with htsget
Since 1.0.41 we support usage of the htsget protocol
Here is a small code snippet for this
```typescript
import { HtsgetFile } from '/bam'
const ti = new HtsgetFile({
baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
trackId: 'BroadHiSeqX_b37/NA12878',
})
await ti.getHeader()
const records = await ti.getRecordsForRange('1', 2000000, 2000001)
```
Let us know if it doesn't work for your use case.
Caveat: htsget `getRecordsForRange` does not honor `viewAsPairs`,
`pairAcrossChr`, `maxInsertSize`, or `filterBy`. The range is fetched from the
server as-is.
## Documentation
### BAM constructor
The BAM class constructor accepts arguments
- `bamPath`/`bamUrl`/`bamFilehandle` - a local file path, remote URL string, or
a class object with a read method
- `csiPath`/`csiUrl`/`csiFilehandle` - a CSI index for the BAM file, required
for long chromosomes greater than 2^29 in length
- `baiPath`/`baiUrl`/`baiFilehandle` - a BAI index for the BAM file
- `recordClass` - a custom class extending BamRecord to use for records (see
Custom BamRecord class section below)
Note: filehandles implement the Filehandle interface from generic-filehandle2.
The `path` and `url` arguments are convenience wrappers for `LocalFile` and
`RemoteFile`.
### async getRecordsForRange(refName, start, end, opts)
Note: requires calling `getHeader` first.
- `refName` - a string for the chrom to fetch from
- `start` - a 0-based half open start coordinate
- `end` - a 0-based half open end coordinate
- `opts.signal` - an AbortSignal to indicate stop processing
- `opts.viewAsPairs` - re-dispatches requests to find mate pairs. default: false
- `opts.pairAcrossChr` - control the viewAsPairs option behavior to pair across
chromosomes. default: false
- `opts.maxInsertSize` - control the viewAsPairs option behavior to limit
distance within a chromosome to fetch. default: 200kb
### async getHeader(opts?)
Fetches the header from `BamFile` or `HtsgetFile`. Must be called before
`getRecordsForRange`.
### async indexCov(refName, start, end)
- `refName` - a string for the chrom to fetch from
- `start` - a 0-based half open start coordinate (optional)
- `end` - a 0-based half open end coordinate (optional)
Returns features of the form {start, end, score} containing estimated feature
density across 16kb windows in the genome. BAI-only: derived from the linear
index, which CSI omits — calling on a CSI-indexed file returns `[]`.
### async lineCount(refName: string)
- `refName` - a string for the chrom to fetch from
Returns number of features on refName, uses special pseudo-bin from the BAI/CSI
index (e.g. bin 37450 from bai, returning n_mapped from SAM spec pdf) or 0 if
refName does not exist in the sample
### async hasRefSeq(refName: string)
- `refName` - a string for the chrom to check
Returns whether we have this refName in the sample
### BamRecord properties
```typescript
// Core alignment fields
record.fileOffset // "file offset" based id -- not a true file offset
record.ref_id // numerical sequence id from SAM header
record.start // 0-based start coordinate
record.end // 0-based end coordinate
record.name // QNAME
record.seq // sequence string
record.qual // Uint8Array of quality scores (null if unmapped)
record.CIGAR // CIGAR string e.g. "50M2I48M"
record.flags // SAM flags integer
record.mq // mapping quality (undefined if 255)
record.strand // 1 or -1
record.template_length // TLEN
// Auxiliary data
record.tags // object with all aux tags e.g. {MD: "100", NM: 0}
record.getTag('MD') // get a single tag (more efficient than record.tags when you only need one)
record.getTagRaw('MD') // get tag as Uint8Array for string tags (avoids string conversion)
record.NUMERIC_MD // MD tag as Uint8Array (for fast mismatch rendering)
record.NUMERIC_CIGAR // Uint32Array of packed CIGAR operations
record.NUMERIC_SEQ // Uint8Array of packed sequence (4-bit encoded)
// Mate info
record.next_refid // mate reference id
record.next_pos // mate position
// Flag methods
record.isPaired()
record.isProperlyPaired()
record.isSegmentUnmapped()
record.isMateUnmapped()
record.isReverseComplemented()
record.isMateReverseComplemented()
record.isRead1()
record.isRead2()
record.isSecondary()
record.isFailedQc()
record.isDuplicate()
record.isSupplementary()
// Utility
record.seqAt(idx) // get single base at position
record.toJSON() // serialize record
```
### Custom BamRecord class
You can provide your own BamRecord class to add custom properties or methods:
```typescript
import { BamFile, BamRecord } from '/bam'
class CustomBamRecord extends BamRecord {
get customProperty() {
return `custom-${this.name}`
}
getDoubleStart() {
return this.start * 2
}
}
const bam = new BamFile<CustomBamRecord>({
bamPath: 'test.bam',
recordClass: CustomBamRecord,
})
await bam.getHeader()
const records = await bam.getRecordsForRange('ctgA', 0, 50000)
// records are typed as CustomBamRecord[]
console.log(records[0].customProperty)
console.log(records[0].getDoubleStart())
```
## License
MIT © [Colin Diesh](https://github.com/cmdcolin)
## Publishing
[Trusted publishing](https://docs.npmjs.com/about-trusted-publishing) via GitHub
Actions.
```bash
pnpm version patch # or minor/major
```