UNPKG

@gmod/bam

Version:

Parser for BAM and BAM index (bai) files

215 lines (159 loc) 6.5 kB
[![NPM version](https://img.shields.io/npm/v/@gmod/bam.svg?style=flat-square)](https://npmjs.org/package/@gmod/bam) [![Build Status](https://img.shields.io/github/actions/workflow/status/GMOD/bam-js/push.yml?branch=main)](https://github.com/GMOD/bam-js/actions?query=branch%3Amain+workflow%3APush+) ## Install ```bash $ npm install --save @gmod/bam ``` ## Usage ```typescript import { BamFile } from '@gmod/bam' const t = new BamFile({ bamPath: 'test.bam', }) // note: it's required to first run getHeader before any getRecordsForRange const header = await t.getHeader() // this would get same records as samtools view ctgA:1-50000 const records = await t.getRecordsForRange('ctgA', 0, 50000) ``` The `bamPath` argument only works on nodejs. In the browser, you should pass `bamFilehandle` with a generic-filehandle2 e.g. `RemoteFile` ```typescript import { RemoteFile } from 'generic-filehandle2' import { BamFile } from '@gmod/bam' const bam = new BamFile({ bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url }) ``` Input are 0-based half-open coordinates (note: not the same as samtools view coordinate inputs!) ## Usage with htsget Since 1.0.41 we support usage of the htsget protocol Here is a small code snippet for this ```typescript import { HtsgetFile } from '@gmod/bam' const ti = new HtsgetFile({ baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads', trackId: 'BroadHiSeqX_b37/NA12878', }) await ti.getHeader() const records = await ti.getRecordsForRange('1', 2000000, 2000001) ``` Let us know if it doesn't work for your use case. Caveat: htsget `getRecordsForRange` does not honor `viewAsPairs`, `pairAcrossChr`, `maxInsertSize`, or `filterBy`. The range is fetched from the server as-is. ## Documentation ### BAM constructor The BAM class constructor accepts arguments - `bamPath`/`bamUrl`/`bamFilehandle` - a local file path, remote URL string, or a class object with a read method - `csiPath`/`csiUrl`/`csiFilehandle` - a CSI index for the BAM file, required for long chromosomes greater than 2^29 in length - `baiPath`/`baiUrl`/`baiFilehandle` - a BAI index for the BAM file - `recordClass` - a custom class extending BamRecord to use for records (see Custom BamRecord class section below) Note: filehandles implement the Filehandle interface from generic-filehandle2. The `path` and `url` arguments are convenience wrappers for `LocalFile` and `RemoteFile`. ### async getRecordsForRange(refName, start, end, opts) Note: requires calling `getHeader` first. - `refName` - a string for the chrom to fetch from - `start` - a 0-based half open start coordinate - `end` - a 0-based half open end coordinate - `opts.signal` - an AbortSignal to indicate stop processing - `opts.viewAsPairs` - re-dispatches requests to find mate pairs. default: false - `opts.pairAcrossChr` - control the viewAsPairs option behavior to pair across chromosomes. default: false - `opts.maxInsertSize` - control the viewAsPairs option behavior to limit distance within a chromosome to fetch. default: 200kb ### async getHeader(opts?) Fetches the header from `BamFile` or `HtsgetFile`. Must be called before `getRecordsForRange`. ### async indexCov(refName, start, end) - `refName` - a string for the chrom to fetch from - `start` - a 0-based half open start coordinate (optional) - `end` - a 0-based half open end coordinate (optional) Returns features of the form {start, end, score} containing estimated feature density across 16kb windows in the genome. BAI-only: derived from the linear index, which CSI omits — calling on a CSI-indexed file returns `[]`. ### async lineCount(refName: string) - `refName` - a string for the chrom to fetch from Returns number of features on refName, uses special pseudo-bin from the BAI/CSI index (e.g. bin 37450 from bai, returning n_mapped from SAM spec pdf) or 0 if refName does not exist in the sample ### async hasRefSeq(refName: string) - `refName` - a string for the chrom to check Returns whether we have this refName in the sample ### BamRecord properties ```typescript // Core alignment fields record.fileOffset // "file offset" based id -- not a true file offset record.ref_id // numerical sequence id from SAM header record.start // 0-based start coordinate record.end // 0-based end coordinate record.name // QNAME record.seq // sequence string record.qual // Uint8Array of quality scores (null if unmapped) record.CIGAR // CIGAR string e.g. "50M2I48M" record.flags // SAM flags integer record.mq // mapping quality (undefined if 255) record.strand // 1 or -1 record.template_length // TLEN // Auxiliary data record.tags // object with all aux tags e.g. {MD: "100", NM: 0} record.getTag('MD') // get a single tag (more efficient than record.tags when you only need one) record.getTagRaw('MD') // get tag as Uint8Array for string tags (avoids string conversion) record.NUMERIC_MD // MD tag as Uint8Array (for fast mismatch rendering) record.NUMERIC_CIGAR // Uint32Array of packed CIGAR operations record.NUMERIC_SEQ // Uint8Array of packed sequence (4-bit encoded) // Mate info record.next_refid // mate reference id record.next_pos // mate position // Flag methods record.isPaired() record.isProperlyPaired() record.isSegmentUnmapped() record.isMateUnmapped() record.isReverseComplemented() record.isMateReverseComplemented() record.isRead1() record.isRead2() record.isSecondary() record.isFailedQc() record.isDuplicate() record.isSupplementary() // Utility record.seqAt(idx) // get single base at position record.toJSON() // serialize record ``` ### Custom BamRecord class You can provide your own BamRecord class to add custom properties or methods: ```typescript import { BamFile, BamRecord } from '@gmod/bam' class CustomBamRecord extends BamRecord { get customProperty() { return `custom-${this.name}` } getDoubleStart() { return this.start * 2 } } const bam = new BamFile<CustomBamRecord>({ bamPath: 'test.bam', recordClass: CustomBamRecord, }) await bam.getHeader() const records = await bam.getRecordsForRange('ctgA', 0, 50000) // records are typed as CustomBamRecord[] console.log(records[0].customProperty) console.log(records[0].getDoubleStart()) ``` ## License MIT © [Colin Diesh](https://github.com/cmdcolin) ## Publishing [Trusted publishing](https://docs.npmjs.com/about-trusted-publishing) via GitHub Actions. ```bash pnpm version patch # or minor/major ```