fp16
Version:
Half-precision 16-bit floating point numbers
101 lines (67 loc) • 4.98 kB
Markdown
# fp16
[](https://github.com/RichardLitt/standard-readme) [](https://opensource.org/licenses/MIT) [](https://www.npmjs.com/package/fp16) 
Half-precision 16-bit floating point numbers.
[`DataView`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DataView) has APIs for getting and setting [float64s](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) and [float32s](https://en.wikipedia.org/wiki/Single-precision_floating-point_format). This library provides the analogous methods for [float16s](https://en.wikipedia.org/wiki/Half-precision_floating-point_format), and utilities for testing how a given float64 value will convert to float32 and float16 values. Conversion implements the IEEE 754 default rounding behavior ("Round-to-Nearest RoundTiesToEven").
`NaN` is always encoded as `0x7e00`, which extends the pattern of how browsers serialize `NaN` in 32 bits and is the recommendation [in the CBOR spec](https://www.rfc-editor.org/rfc/rfc8949.html#name-deterministically-encoded-c).
This library is TypeScript-native, ESM-only, and has zero dependencies. It works in Node, the browser, and Deno.
## Table of Contents
- [Install](#install)
- [Usage](#usage)
- [Testing](#testing)
- [Credits](#credits)
- [Contributing](#contributing)
- [License](#license)
## Install
```
npm i fp16
```
## Usage
### Set a 16-bit float
```typescript
declare function setFloat16(
view: DataView,
offset: number,
value: number,
littleEndian?: boolean,
): void
```
### Get a 16-bit float
```typescript
declare function getFloat16(
view: DataView,
offset: number,
littleEndian?: boolean,
): number
```
### Precision
In addition to methods for getting and setting float16s, `fp16` exports two methods for testing how a given `number` value will convert to 32-bit and 16-bit values.
```typescript
export const Precision = {
Exact: 0,
Inexact: 1,
Underflow: 2,
Overflow: 3,
} as const
export type Precision = typeof Precision[keyof typeof Precision]
declare function getFloat32Precision(value: number): Precision
declare function getFloat16Precision(value: number): Precision
```
- `Precision.Exact`: Conversion will not loose precision. The value is guaranteed to round-trip back to the same `number` value. Positive and negative zero, positive and negative infinity, and `NaN` all return `exact`. **Values that can be represented losslessly as a subnormal value in the target format will return `exact`.**
- `Precision.Overflow`: the exponent of the given value is greater than the maximum exponent of the target size (`127` for float32 or `15` for float16). Conversion is guaranteed to overflow to +/- Infinity.
- `Precision.Underflow`: the exponent of the given value is less than the minimum exponent _minus the number of fractional bits_ of the target size (`-126 - 23` for float32 or `-14 - 10` for float16). Conversion is guaranteed to underflow to +/- 0 **or** to the smallest signed subnormal value (`+/- 2^-24` for float16 or `+/- 2^-149` for float32).
- `Precision.Inexact`: the exponent is within the target range, but precision bits will be lost during rounding. The value may round to +/- 0 but will never round to +/- Infinity.
Note that the boundaries for overflow and underflow are not what you might necessarily expect; this is because values with exponents just under the minimum exponent for a format map to subnormal values.
Also note that `fp16` treats all NaN values as identical, ignoring sign and signalling bits when decoding, and encoding every `NaN` value as `0x7e00`. This means that not all 16-bit values will round-trip through `setFloat16` and `getFloat16`.
## Testing
Tests use [AVA](https://github.com/avajs/ava) and live in the [test](./test/) directory.
```
npm run test
```
Tests cover decoding all 65536 possible 16-bit values, rounding behaviour, subnormal values, underflows, and overflows. More tests are always welcome.
## Credits
[This PDF](http://fox-toolkit.org/ftp/fasthalffloatconversion.pdf) was extremely helpful as a reference for understanding the float16 format, even though `fp16` doesn't use the table-based aproach it outlines.
The Golang [github.com/x448/float16](https://github.com/x448/float16) package was used as a reference for implementing rounding. The test suite in [tests/32to16.js](./test/32to16.js) was adapted from its test file [float16_test.go](https://github.com/x448/float16/blob/master/float16_test.go).
## Contributing
I don't expect to add any additional features to this library, or change any of the exported interfaces. If you encounter a bug or would like to add more tests, please open an issue to discuss it!
## License
MIT © 2021 Joel Gustafson