@qgustavor/stream-audio-fingerprint
Version:
Audio landmark fingerprinting as a JavaScript module
165 lines (105 loc) • 6.23 kB
Markdown
# Audio Landmark Fingerprinting as a JavaScript module
This module converts a PCM audio signal into a series of audio fingerprints. It works with limited audio tracks (e.g. recorded audio) as well as with unlimited audio streams (e.g. broadcast radio).
It's based on [lpolito/stream-audio-fingerprint](https://github.com/lpolito/stream-audio-fingerprint) which is based [adblockradio/stream-audio-fingerprint](https://github.com/adblockradio/stream-audio-fingerprint) which is one of the foundations of the [Adblock Radio project](https://github.com/adblockradio/adblockradio).
## Credits and description
Check [the original project](https://github.com/adblockradio/stream-audio-fingerprint#credits) for credits and detailed info on the algorithm used. To be fair, even as the maintainer of this fork, I still don’t fully understand it.
## Usage
A usage demo is shown below. It requires the executable [ffmpeg](https://ffmpeg.org/download.html) and Deno to run.
```javascript
import Fingerprinter from 'npm:@qgustavor/stream-audio-fingerprint'
const decoder = (new Deno.Command('ffmpeg', {
args: [
'-i', 'pipe:0',
'-acodec', 'pcm_s16le',
'-ar', '22050',
'-ac', '1',
'-f', 's16le',
'-v', 'fatal',
'pipe:1'
],
stdout: 'piped',
stdin: 'inherit'
})).spawn()
const fingerprinter = new Fingerprinter()
const { dt } = fingerprinter.options
for await (const audioData of decoder.stdout.readable) {
const data = fingerprinter.process(audioData)
for (let i = 0; i < data.tcodes.length; i++) {
console.log(`time=${data.tcodes[i] * dt} fingerprint=${data.hcodes[i]}`)
}
}
```
and then we pipe audio data, either a stream or a file
```sh
curl http://radiofg.impek.com/fg | deno run --allow-run=ffmpeg codegen_demo.mjs
deno run --allow-run=ffmpeg codegen_demo.mjs < awesome_music.mp3
```
**Warning:** the path of the files changed on the version 2.1.0 which may affect Deno users that were importing `src/codegen_landmark.ts`, as this file is now `codegen_landmark.mts`.
## Fingerprinter options
One improvement from the [Lucas Polito](https://github.com/lpolito) fork is the ability to customize the fingerprinter options. Those are all the options available:
* `verbose`: whether to print debug information to `console.log`.
* `samplingRate`: the sampling rate of the audio input in Hz. Defaults to `22050`.
* If you change this, you must also adapt `windowDt` and `pruningDt` to match your needs.
* For more info, read the comments in the code.
* `bps`: bytes per sample. Defaults to `2` (16-bit PCM).
* Do not change this without checking the code first.
* `mnlm`: maximum number of local maxima detected in each FFT spectrum. Defaults to `5`.
* Higher values increase the number of fingerprints produced.
* `mppp`: maximum number of hashes (fingerprints) each peak can generate. Defaults to `3`.
* Useful for tuning the density of fingerprints.
* `nfft`: size of the FFT window. Defaults to `512`.
* Larger values improve *spectral precision* (frequency resolution) but reduce *temporal precision*.
* The FFT spectrum will have `nfft / 2` points.
* `step`: number of samples to advance between successive FFT windows. Defaults to `nfft / 2` (50% overlap).
* With a sampling rate of 22050 Hz, this yields ~86 windows per second (`dt ≈ 11.61 ms`).
* Typically you don’t need to change this.
* `dt`: duration of each time step in seconds.
* Defaults to `1 / (samplingRate / step)`.
* It's useful to convert `tcodes` into seconds, just multiply `tcodes` by `fingerprinter.options.dt` and you get the time in seconds, as shown in the demos.
* `hwin`: the Hann window applied to each FFT frame. Defaults to a precomputed array of size `nfft`.
* Adjusting this is rare unless experimenting with different window functions.
* `maskDecayLog`: logarithmic decay factor for the detection threshold between frames. Defaults to `Math.log(0.995)`.
* Affects how quickly old peaks become irrelevant.
* `ifMin`: minimum frequency bin (in units of `DF = samplingRate / nfft`) to consider when generating fingerprints. Defaults to `0`.
* Increase this to ignore very low frequencies.
* `ifMax`: maximum frequency bin to consider when generating fingerprints. Defaults to `nfft / 2`.
* Usually best left unchanged. To reduce processing time, lower `samplingRate` instead.
* `windowDf`: maximum allowed frequency difference between paired peaks. Defaults to `60`.
* Limits fingerprint generation to peaks within a certain frequency range of each other.
* Reducing this reduces fingerprint density.
* `windowDt`: maximum time window (in units of `dt`) for generating landmark pairs. Defaults to `96` (~1 second).
* Controls how far apart peaks can be and still generate fingerprints.
* `pruningDt`: time window (in units of `dt`) used to prune older peaks that are overshadowed by newer ones. Defaults to `24` (~250 ms).
* Also affects system latency: higher values increase latency.
* `maskDf`: decay scale of the exponential mask on the frequency axis. Defaults to `3`.
* Wider masks reduce sensitivity to small frequency variations.
* `eww`: precomputed 2D exponential mask (log-domain) matrix of size `(nfft / 2) × (nfft / 2)`. Defaults to a generated Gaussian-like mask based on `maskDf`.
* Advanced option for customizing the spectral masking behavior.
## Node.js Usage
This code also works in Node.js and is available in NPM via `npm install @qgustavor/stream-audio-fingerprint`.
The previous demo can be rewritten as this:
```javascript
import Fingerprinter from '@qgustavor/stream-audio-fingerprint'
import { spawn } from 'child_process'
const decoder = spawn('ffmpeg', [
'-i', 'pipe:0',
'-acodec', 'pcm_s16le',
'-ar', '22050',
'-ac', '1',
'-f', 's16le',
'-v', 'fatal',
'pipe:1'
])
const fingerprinter = new Fingerprinter()
const { dt } = fingerprinter.options
for await (const audioData of decoder.stdout) {
const data = fingerprinter.process(audioData)
for (let i = 0; i < data.tcodes.length; i++) {
console.log(`time=${data.tcodes[i] * dt} fingerprint=${data.hcodes[i]}`)
}
}
```
## TypeScript
This module already includes TypeScript types.
## License
See LICENSE file.