# yolo-helpers
Helper functions to use models converted from YOLO in browser and Node.js.
## Features
- Support for YOLO models:
- [Image Classification](https://docs.ultralytics.com/tasks/classify/) (classify images into one of the given classes)
- [Object Detection](https://docs.ultralytics.com/tasks/detect/) (detect objects and location of bounding boxes)
- [Keypoint Detection](https://docs.ultralytics.com/tasks/pose/) (detect objects and location of keypoints)
- [Instance Segmentation](https://docs.ultralytics.com/tasks/segment/) (detect objects and generate segmentation masks)
- TypeScript support
- Isomorphic package: works in Node.js and browsers
## Installation
```bash
npm install yolo-helpers
```
You can also install `yolo-helpers` with [pnpm](https://pnpm.io/), [yarn](https://yarnpkg.com/), or [slnpm](https://github.com/beenotung/slnpm).
## Usage Examples
For complete examples, see [examples/browser/app.ts](./examples/browser/app.ts) and [examples/nodejs/test.ts](./examples/nodejs/test.ts).
### Browser
```typescript
import * as tf from '@tensorflow/tfjs'
import { detectPose, loadYoloModel } from 'yolo-helpers/dist/browser'

async function main() {
  // Load the YOLO model
  const model = await loadYoloModel('url/to/yolo11n-pose_web_model')

  // Get the image element
  const image = document.querySelector('img')!

  // Detect poses in the image element
  const predictions = await detectPose({
    tf,
    model,
    pixels: image,
    maxOutputSize: 1,
    num_classes: 1,
    num_keypoints: 17,
  })

  // predictions[0] contains an array of detected poses with bounding boxes and keypoints
  console.log(predictions[0])
}
```
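To visualize the result, you can pair `detectPose` with the `drawBox` helper documented below. A minimal sketch continuing inside `main()` above, assuming `drawBox` is exported from the same browser entrypoint and that the canvas matches the model's input shape (box coordinates are scaled to `input_shape`, not to the original image):

```typescript
import { drawBox } from 'yolo-helpers/dist/browser'

// Sketch: draw each detected pose onto a canvas.
// Assumes the canvas is sized to the model's input_shape,
// since box coordinates are in that pixel space.
const canvas = document.querySelector('canvas')!
const context = canvas.getContext('2d')!
context.drawImage(image, 0, 0, canvas.width, canvas.height)
for (const pose of predictions[0]) {
  drawBox({
    context,
    x: pose.x,
    y: pose.y,
    width: pose.width,
    height: pose.height,
    label: { text: `person ${pose.confidence.toFixed(2)}` },
  })
}
```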
### Node.js
```typescript
import * as tf from '@tensorflow/tfjs-node'
import { detectPose, loadYoloModel } from 'yolo-helpers'

async function main() {
  // Load the YOLO model
  const model = await loadYoloModel('path/to/yolo11n-pose_web_model')

  // Detect poses in an image file
  const predictions = await detectPose({
    tf,
    model,
    file: 'path/to/image.jpg',
    maxOutputSize: 1,
    num_classes: 1,
    num_keypoints: 17,
  })

  // predictions[0] contains an array of detected poses with bounding boxes and keypoints
  console.log(predictions[0])
}
```
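If you already have the image decoded, the Node.js build also accepts a raw tensor instead of a file path (see the `ImageInput` type below). A minimal sketch, assuming `@tensorflow/tfjs-node` is used for decoding:

```typescript
import { readFileSync } from 'fs'

// Sketch: pass a decoded tensor instead of a file path.
// Shape is [height, width, channels] with pixel values in [0, 255].
const tensor = tf.node.decodeImage(readFileSync('path/to/image.jpg'), 3)
const predictions = await detectPose({
  tf,
  model,
  tensor,
  maxOutputSize: 1,
  num_classes: 1,
  num_keypoints: 17,
})
tensor.dispose()
```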
### Remark on Inference Speed
A sync version of `detectPose` is available as `detectPoseSync`, but it is not recommended: it is slower than the async version even when running in the browser or Node.js without other concurrent tasks.
If you want to speed up inference, export the model with a smaller `imgsz` (e.g. `imgsz=[256,320]` for 256px height and 320px width).
The same applies to `detectSegment`, `detectBox`, and `classifyImage`.
## TypeScript Signatures
### Model Functions
```typescript
function loadYoloModel(modelUrl: string): Promise<
  tf.InferenceModel & {
    class_names?: string[]
  }
>
```
The `modelUrl` can be specified with or without the trailing `/model.json`.
<details>
<summary> Example `modelUrl` for browser: </summary>

- `./saved_model/yolo11n-pose_web_model/model.json`
- `./saved_model/yolo11n-pose_web_model`
- `http://localhost:8100/saved_models/yolo11n-pose_web_model/model.json`
- `https://domain.net/saved_models/yolo11n-pose_web_model`
- `indexeddb://yolo11n-pose_web_model`
</details>
<details>
<summary> Example `modelUrl` for Node.js: </summary>

- `./saved_model/yolo11n-pose_web_model/model.json`
- `./saved_model/yolo11n-pose_web_model`
- `file://path/to/model.json`
- `http://localhost:8100/saved_models/yolo11n-pose_web_model`
- `https://domain.net/saved_models/yolo11n-pose_web_model/model.json`
</details>
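If the converted model ships with class metadata, `loadYoloModel` exposes it as the optional `class_names` array. A small sketch mapping a result's `class_index` back to a readable label (the fallback string is just for illustration):

```typescript
const model = await loadYoloModel('./saved_model/yolo11n-pose_web_model')

// class_names is optional, so fall back to the raw index when it is absent.
function className(class_index: number): string {
  return model.class_names?.[class_index] ?? `class ${class_index}`
}
```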
### Detection Functions and Types
<details>
<summary> classifyImage() </summary>

```typescript
/**
 * image features:
 * - confidence of all classes
 * - highest confidence, class_index
 *
 * The confidences are already normalized between 0 and 1, and sum up to 1.
 */
function classifyImage(args: ClassifyArgs): Promise<ClassifyResult>

/**
 * output shape: [batch]
 *
 * Array of batches, each containing an array of confidences for each class
 * */
type ClassifyResult = ImageResult[]

type ImageResult = {
  /** class index with highest confidence */
  class_index: number
  /** confidence of the class with highest confidence */
  confidence: number
  /** confidence of all classes */
  all_confidences: number[]
}

type ClassifyArgs = {
  model: tf.InferenceModel
  /** used for image resize when necessary, auto inferred from model shape */
  input_shape?: {
    width: number
    height: number
  }
  /** e.g. `1` for single class */
  num_classes: number
} & ImageInput
```
</details>
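A minimal Node.js sketch following the `ClassifyArgs` signature above, written inside an async function; the model path and `num_classes` value are placeholders for your own export:

```typescript
import { classifyImage, loadYoloModel } from 'yolo-helpers'

// Sketch: classify a single image and report the top class.
const model = await loadYoloModel('path/to/yolo11n-cls_web_model')
const results = await classifyImage({
  model,
  file: 'path/to/image.jpg',
  num_classes: 1000, // placeholder: match the number of classes of your model
})
const { class_index, confidence } = results[0]
console.log(`class ${class_index} with confidence ${confidence.toFixed(2)}`)
```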
<details>
<summary> detectBox() </summary>

```typescript
/**
 * box features:
 * - x, y, width, height
 * - highest confidence, class_index
 *
 * The x, y, width, height are in pixel units, NOT normalized to the range [0, 1].
 * The pixel units are scaled to the input_shape.
 *
 * The confidences are already normalized between 0 and 1.
 */
function detectBox(args: DetectBoxArgs): Promise<BoxResult>

/**
 * output shape: [batch, box]
 *
 * Array of batches, each containing an array of detected bounding boxes
 * */
type BoxResult = BoundingBox[][]

type BoundingBox = {
  /** center x of bounding box in px */
  x: number
  /** center y of bounding box in px */
  y: number
  /** width of bounding box in px */
  width: number
  /** height of bounding box in px */
  height: number
  /** class index with highest confidence */
  class_index: number
  /** confidence of the class with highest confidence */
  confidence: number
  /** confidence of all classes */
  all_confidences: number[]
}

type DetectBoxArgs = {
  model: tf.InferenceModel
  /** used for image resize when necessary, auto inferred from model shape */
  input_shape?: {
    width: number
    height: number
  }
  /**
   * tensorflow runtime:
   * - browser: `import * as tf from '@tensorflow/tfjs'`
   * - nodejs: `import * as tf from '@tensorflow/tfjs-node'`
   */
  tf: typeof tf_type
  /** e.g. `1` for single class */
  num_classes: number
  /**
   * Number of boxes to return using non-max suppression.
   * If not provided, all boxes will be returned.
   *
   * e.g. `1` to select only the bounding box with the highest confidence.
   */
  maxOutputSize?: number
  /**
   * The threshold for deciding whether boxes overlap too much with respect to IOU.
   *
   * default: `0.5`
   */
  iouThreshold?: number
  /**
   * The threshold for deciding whether a box is a valid detection.
   *
   * default: `-Infinity`
   */
  scoreThreshold?: number
} & ImageInput
```
</details>
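Because the returned coordinates are scaled to the model's `input_shape` rather than to the original image, you may need to rescale them before drawing. A sketch of the plain (non-letterboxed) case; `scaleBox` is a hypothetical helper, not part of the package:

```typescript
// Hypothetical helper: map a box from model input space back to the
// original image. Assumes the image was stretched (not letterboxed)
// to the model's input_shape during preprocessing.
function scaleBox(
  box: BoundingBox,
  input_shape: { width: number; height: number },
  image: { width: number; height: number },
): BoundingBox {
  const sx = image.width / input_shape.width
  const sy = image.height / input_shape.height
  return {
    ...box,
    x: box.x * sx,
    y: box.y * sy,
    width: box.width * sx,
    height: box.height * sy,
  }
}
```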
<details>
<summary> detectPose() </summary>

```typescript
/**
 * box features:
 * - x, y, width, height
 * - highest confidence, class_index
 * - keypoints
 *
 * keypoint features:
 * - x, y, visibility
 *
 * The x, y, width, height are in pixel units, NOT normalized to the range [0, 1].
 * The pixel units are scaled to the input_shape.
 *
 * The confidences are already normalized between 0 and 1.
 */
function detectPose(args: DetectPoseArgs): Promise<PoseResult>

/**
 * output shape: [batch, box]
 *
 * Array of batches, each containing an array of detected bounding boxes
 * */
type PoseResult = BoundingBoxWithKeypoints[][]

type BoundingBoxWithKeypoints = BoundingBox & {
  keypoints: Keypoint[]
}

type Keypoint = {
  /** x of keypoint in px */
  x: number
  /** y of keypoint in px */
  y: number
  /** confidence of keypoint */
  visibility: number
}

type DetectPoseArgs = {
  model: tf.InferenceModel
  /** used for image resize when necessary, auto inferred from model shape */
  input_shape?: {
    width: number
    height: number
  }
  /**
   * tensorflow runtime:
   * - browser: `import * as tf from '@tensorflow/tfjs'`
   * - nodejs: `import * as tf from '@tensorflow/tfjs-node'`
   */
  tf: typeof tf_type
  /** e.g. `1` for single class */
  num_classes: number
  /** e.g. `17` for 17 keypoints */
  num_keypoints: number
  /**
   * Number of boxes to return using non-max suppression.
   *
   * e.g. `1` to select only the bounding box with the highest confidence.
   */
  maxOutputSize: number
  /**
   * The threshold for deciding whether boxes overlap too much with respect to IOU.
   *
   * default: `0.5`
   */
  iouThreshold?: number
  /**
   * The threshold for deciding whether a box is a valid detection.
   *
   * default: `-Infinity`
   */
  scoreThreshold?: number
} & ImageInput
```
</details>
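Each pose carries its keypoints alongside the box, so drawing or downstream logic can filter on `visibility`. A small sketch reusing the `predictions` from the examples above; the `0.5` threshold is an arbitrary choice:

```typescript
// Sketch: skip keypoints the model is not confident about.
for (const pose of predictions[0]) {
  for (const keypoint of pose.keypoints) {
    if (keypoint.visibility < 0.5) continue // arbitrary cut-off
    console.log(`keypoint at (${keypoint.x.toFixed(0)}, ${keypoint.y.toFixed(0)})`)
  }
}
```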
<details>
<summary> detectSegment() </summary>

```typescript
/**
 * box features:
 * - x, y, width, height
 * - highest confidence, class_index
 * - mask coefficients for each channel
 *
 * mask features:
 * - [height, width, channel]: 0 for background, 1 for object
 *
 * The x, y, width, height are in pixel units, NOT normalized to the range [0, 1].
 * The pixel units are scaled to the input_shape.
 *
 * The confidences are already normalized between 0 and 1.
 */
function detectSegment(args: DetectSegmentArgs): Promise<SegmentResult>

/**
 * output shape: [batch, box]
 *
 * Array of batches, each containing an array of detected bounding boxes with mask coefficients and masks
 * */
type SegmentResult = {
  bounding_boxes: BoundingBoxWithMaskCoefficients[]
  /** e.g. [mask_height, mask_width, 32] for 32 channels of masks */
  masks: Mask[]
}[]

type BoundingBoxWithMaskCoefficients = BoundingBox & {
  /** 32 coefficients of mask */
  mask_coefficients: number[]
}

/** [height, width, num_channels] -> 0 for background, 1 for object */
type Mask = number[][]

type DetectSegmentArgs = {
  model: tf.InferenceModel
  /** used for image resize when necessary, auto inferred from model shape */
  input_shape?: {
    width: number
    height: number
  }
  /**
   * tensorflow runtime:
   * - browser: `import * as tf from '@tensorflow/tfjs'`
   * - nodejs: `import * as tf from '@tensorflow/tfjs-node'`
   */
  tf: typeof tf_type
  /** e.g. `1` for single class */
  num_classes: number
  /**
   * Number of channels in the segmentation mask.
   *
   * default: `32`
   */
  num_channels?: number
  /**
   * Number of boxes to return using non-max suppression.
   * If not provided, all boxes will be returned.
   *
   * e.g. `1` to select only the bounding box with the highest confidence.
   */
  maxOutputSize?: number
  /**
   * The threshold for deciding whether boxes overlap too much with respect to IOU.
   *
   * default: `0.5`
   */
  iouThreshold?: number
  /**
   * The threshold for deciding whether a box is a valid detection.
   *
   * default: `-Infinity`
   */
  scoreThreshold?: number
} & ImageInput
```
</details>
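To overlay a mask on a browser canvas you can convert it to `ImageData`. A sketch assuming a single two-dimensional mask of 0/1 values, as in the `Mask` type above; the red color and 50% alpha are arbitrary choices:

```typescript
// Sketch: paint mask pixels red at 50% alpha, leave background transparent.
function maskToImageData(mask: number[][]): ImageData {
  const height = mask.length
  const width = mask[0].length
  const data = new Uint8ClampedArray(width * height * 4)
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const offset = (y * width + x) * 4
      data[offset] = 255 // red channel
      data[offset + 3] = mask[y][x] ? 128 : 0 // alpha channel
    }
  }
  return new ImageData(data, width, height)
}

// Usage: context.putImageData(maskToImageData(masks[0]), 0, 0)
```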
<details>
<summary> ImageInput type for browser </summary>

```typescript
type ImageInput =
  | {
      pixels:
        | PixelData
        | ImageData
        | HTMLImageElement
        | HTMLCanvasElement
        | HTMLVideoElement
        | ImageBitmap
    }
  | {
      /**
       * input shape: [height, width, channels] or [batch, height, width, channels]
       *
       * the pixel values should be in the range of [0, 255]
       */
      tensor: tf.Tensor
    }
```
</details>
<details>
<summary> ImageInput type for Node.js </summary>

```typescript
type ImageInput =
  | {
      /** path to image file */
      file: string
    }
  | {
      /**
       * input shape: [height, width, channels] or [batch, height, width, channels]
       *
       * the pixel values should be in the range of [0, 255]
       */
      tensor: tf.Tensor
    }
```
</details>
### Helper Functions for Drawing
<details>
<summary> drawBox() </summary>

```typescript
function drawBox(args: {
  /** canvas context to draw on */
  context: CanvasRenderingContext2D
  /** x-axis of the center of the box, in pixel units */
  x: number
  /** y-axis of the center of the box, in pixel units */
  y: number
  /** width of the box, in pixel units */
  width: number
  /** height of the box, in pixel units */
  height: number
  /** color of the border of the box, default is `'red'` */
  borderColor?: string
  /** line width of the box, in pixel units, default is 5px */
  lineWidth?: number
  /** label of the box, e.g. class name, confidence score, etc. */
  label?: {
    text: string
    /** color of the text label, default is `'white'` */
    fontColor?: string
    /** background color of the text label, default is `'transparent'` */
    backgroundColor?: string
    /** font style of the text label, default is `'normal 900 14px Arial, sans-serif'` */
    font?: string
  }
}): void
```
</details>
## License
This project is licensed under the [BSD-2-Clause](./LICENSE) license.
This is free, libre, and open-source software. It comes down to four essential freedoms [[ref]](https://seirdy.one/2021/01/27/whatsapp-and-the-domestication-of-users.html#fnref:2):
- The freedom to run the program as you wish, for any purpose
- The freedom to study how the program works, and change it so it does your computing as you wish
- The freedom to redistribute copies so you can help others
- The freedom to distribute copies of your modified versions to others