# Testing strategy
A concrete, executable plan for going from **0% coverage** today to **a useful test suite running on every push** — without adding heavy dependencies or breaking the existing release pipeline.
This document is meant to be **acted on** in phases. Each phase is a single PR. The aim is to make the second-most-important thing in this repo (releasing correctly) cheaper and safer.
---
## Goals
- `npm test` runs all tests locally with one command, no setup beyond `npm install`.
- Tests run automatically in CI on every push and pull request to `master`.
- Coverage report uploads as a CI artefact so it can be inspected from any failing run.
- Coverage **thresholds** enforced by CI — a regression below the threshold fails the build.
- A clear path from the first PR (small, targeted) to comprehensive coverage of the bug-prone parts of the codebase.
## Non-goals
- 100% coverage — not a useful target. Aim at the bug-prone surface.
- Integration tests against real Shelly hardware — too operationally heavy for CI. Mock the transport.
- A Node-RED end-to-end runner — Node-RED's own test helper is heavy and pulls in the full runtime. We can get most of the value with simpler unit tests against mocked `RED`.
- Bringing in jest / vitest / mocha — the built-in `node:test` runner is enough and adds zero dependencies.
---
## Tooling choice
| Concern | Tool | Why |
|---|---|---|
| Test runner | `node:test` (built in, stable since Node 20) | Zero dependencies. We just bumped engines to `>=20`, so it's available. Real `describe` / `it` style. Watch mode, parallelism, and name filters are all built in. |
| Assertions | `node:assert/strict` (built-in) | Same — no deps. Strict-mode by default is the right default. |
| Coverage | `c8` (one dev dep) | Wraps V8's native coverage, so it pairs cleanly with `node:test` without instrumenting the code. Tiny dependency footprint. |
| HTTP mocking | `nock` (one dev dep) | Intercepts Node's outgoing HTTP requests, which is what axios uses under Node. Lets us simulate Shelly device responses (including the digest 401-retry dance) without hitting hardware. |
| CI runner | GitHub Actions | Already in use. |
**Total new dependencies:** 2 dev deps (`c8`, `nock`). Both are mature, widely used, and small.
---
## Directory layout
```
test/
├── unit/
│   ├── utils.test.js               Pure helpers from shelly/lib/utils.js
│   ├── configuration.test.js       Catalog lookups
│   ├── parsers/
│   │   ├── gen1-relay.test.js
│   │   ├── gen1-dimmer.test.js
│   │   ├── gen1-rgbw.test.js
│   │   ├── gen1-roller.test.js
│   │   ├── gen1-thermostat.test.js (would have caught the scheduleProfile bug)
│   │   ├── gen1-sensor.test.js
│   │   ├── gen1-button.test.js
│   │   ├── gen1-measure.test.js
│   │   └── gen2-generic.test.js
│   ├── status-converters.test.js   convertStatus1, convertStatus2
│   └── transport.test.js           shellyRequestAsync + digest retry (nock)
└── helpers/
    ├── fake-red.js                 Minimal Node-RED RED-object stub
    └── mock-shelly.js              nock interceptors for typical device responses
```
The split into `unit/` leaves room for `integration/` later (when / if a Shelly-device simulator emerges).
---
## What to test in what order
Each phase is one PR. Each phase produces a working `npm test` that's more capable than the last.
### Phase 1 — Tooling + low-hanging fruit (reachable: ~10% coverage)
**Wire up the runner, prove it works, cover the pure helpers.**
- Add `c8` to devDependencies.
- Add `npm test` and `npm run coverage` scripts.
- Write tests for [`shelly/lib/utils.js`](../../shelly/lib/utils.js): `isMsgPayloadValid`, `isMsgPayloadValidOrArray`, `isEmpty`, `trim`, `replace`. All five are pure and trivially testable.
- Wire CI to run `npm test` on push / PR.
- Upload coverage report as a CI artefact.
**Why first:** lowest risk, highest learning value. Establishes the patterns and CI integration before any harder testing.
### Phase 2 — Catalog and status converters (reachable: ~25%)
**Test the data layer.**
- Tests for [`shelly/lib/configuration.js`](../../shelly/lib/configuration.js): `getDevice`, `getDeviceType`, `getDeviceTypes1/2`, `isExactTypeGen1/2`, `getDeviceTypeInfos`.
- Tests for the status converters (`convertStatus1`, `convertStatus2`). They're pure functions but live inside the node files — see "extraction" note below.
**Why second:** still pure functions, no I/O. Validates `config.json` shape implicitly — if a future edit breaks the catalog, tests will catch it.
### Phase 3 — Input parsers (reachable: ~50%)
**The bug-prone surface.** This is where the value-per-test ratio is highest.
- One test file per device family, asserting on the produced URL or RPC envelope.
- Cover at least:
  - Default path (no command parameters → no route returned).
  - Each documented input field exercised.
  - **Boundary checks**: the scheduleProfile bug was an out-of-range value silently passing through. Every range guard should have a test below, at, and above the bounds.
  - The `on:true/false` ↔ `turn:'on'/'off'` translation.
**Why third:** these functions are pure but live inside the device-node IIFE. They need to be reachable for testing, which couples this phase to a small extraction (see "extraction" below).
### Phase 4 — Transport (reachable: ~65%)
**Test the auth and retry logic with nock.**
- Mock a gen 1 device: GET `/relay/0?turn=on` → 200 with status JSON. Assert returned payload.
- Mock a gen 1 device with Basic auth: 401 → after retry with auth header → 200. Assert auth header value.
- Mock a gen 2 device with Digest auth: 401 with `www-authenticate` → assert the second request carries a correct `Digest` header.
- Mock a gen 2 device returning a 400 with `{"error":{"code":-103,...}}` → assert the thrown error message includes the body (the 11.10.1 improvement).
**Why fourth:** needs `nock`. Slightly more setup than pure unit tests. Catches the entire transport layer including the auth dance — which has had no tests despite being the most security-sensitive part of the codebase.
### Phase 5 — Lifecycle (reachable: ~70-75%)
**Mock the Node-RED `RED` object and exercise the constructor.**
- Use `test/helpers/fake-red.js` to provide a minimal `RED.nodes.createNode`, `RED.nodes.registerType`, `RED.nodes.getNode`, `RED.httpAdmin`, `RED.log`.
- Assert that `new ShellyGen1Node({ hostname: '...', devicetype: 'Relay', mode: 'polling' })` sets up the expected `axiosInstance`, `pollingTimer`, and event handlers.
- Test the `close` handler — assert `node.closing === true`, timers cleared, listeners removed.
**Why fifth:** needs more scaffolding than the others. By this point we know how the codebase shapes up under tests, so the harness is informed.
The remaining ~25% (callback-mode network code paths) is genuinely hard to unit-test without a device simulator. Leave it.
---
## A note on extraction
The per-family input parsers and status converters in [`gen1-node.js`](../../shelly/nodes/gen1-node.js) and [`gen2-node.js`](../../shelly/nodes/gen2-node.js) are **nested inside** the `module.exports = function(RED) { ... }` factory. They can't be imported directly:
```js
// This doesn't work today:
const { inputParserRelay1Async } = require('../../shelly/nodes/gen1-node');
// ↑ exports a factory, not the parsers
```
Two options:
**Option A — extract to standalone files** (preferred, aligns with [R1 in refactoring recommendations](../architecture/06-recommendations-for-refactoring.md#r1-extract-the-per-device-type-input-parsers-into-one-file-each-m)):
```
shelly/nodes/gen1/parsers/
├── relay.js module.exports = async function inputParserRelay1Async(msg) { ... }
├── dimmer.js
├── ...
```
These don't need the `RED` factory because they're pure. They get imported by `gen1-node.js` _and_ by tests.
**Option B — export-for-testing hook** (faster, uglier):
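```js
// Inside the factory in gen1-node.js, just before it returns. The parsers are
// closure-scoped, so this assignment only runs once the factory has been called.
module.exports.__test = { inputParserRelay1Async, inputParserDimmer1Async /* , ... */ };
```
Tests first call the factory once with a stub `RED`, then reach in via `require('../../shelly/nodes/gen1-node').__test.inputParserRelay1Async`. Convention-marked as a testing escape hatch.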
**Recommendation:** Option A. The refactor is small, lasting, and makes the rest of the codebase easier to read. Option B is technical debt deliberately incurred to avoid a refactor that pays for itself.
This proposal **assumes Option A** for Phases 3-5. Phase 1 and 2 (utils, config, status converters) don't need any extraction — they're already in importable files.
---
## Concrete first-PR shape (Phase 1)
### `package.json` changes
```jsonc
{
  "scripts": {
    "prepare": "husky install",
    "lint": "eslint shelly --ext .js --format unix --ignore-pattern scripts",
    "postlint": "echo ✅ lint valid",
    "test": "node --test test/unit",
    "test:watch": "node --test --watch test/unit",
    "coverage": "c8 --reporter=text --reporter=html --reporter=lcov node --test test/unit",
    "coverage:check": "c8 --check-coverage --lines 10 --functions 10 --branches 8 node --test test/unit"
  },
  "devDependencies": {
    "c8": "^10.1.3",
    "eslint": "^8.20.0",
    "eslint-config-prettier": "^8.5.0",
    "eslint-plugin-prettier": "^4.2.1",
    "husky": "^8.0.1",
    "prettier": "^2.7.1"
  }
}
```
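Thresholds start **deliberately low** in Phase 1 (10% lines / 10% functions / 8% branches). Each phase ratchets them upward.

One thing to verify on the CI matrix: Node 20 accepts a directory argument to `--test` and searches it recursively, while Node 21 and later treat positional arguments as glob patterns. If `node --test test/unit` fails or finds no tests on 22.x, the scripts may need a pattern such as `test/unit/**/*.test.js` on that version.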
### `test/unit/utils.test.js` (example, fully working)
```js
const { describe, it } = require('node:test');
const assert = require('node:assert/strict');
const utils = require('../../shelly/lib/utils.js');

describe('utils.isMsgPayloadValid', () => {
  it('rejects undefined msg', () => {
    assert.equal(utils.isMsgPayloadValid(undefined), false);
  });
  it('rejects msg without payload', () => {
    assert.equal(utils.isMsgPayloadValid({}), false);
  });
  it('rejects array as msg', () => {
    assert.equal(utils.isMsgPayloadValid([{ payload: { a: 1 } }]), false);
  });
  it('rejects empty payload object', () => {
    assert.equal(utils.isMsgPayloadValid({ payload: {} }), false);
  });
  it('rejects array payload', () => {
    assert.equal(utils.isMsgPayloadValid({ payload: [1, 2] }), false);
  });
  it('accepts non-empty object payload', () => {
    assert.equal(utils.isMsgPayloadValid({ payload: { relay: 0, on: true } }), true);
  });
});

describe('utils.trim', () => {
  it('returns undefined for undefined input', () => {
    assert.equal(utils.trim(undefined), undefined);
  });
  it('returns undefined for empty string', () => {
    assert.equal(utils.trim(''), undefined);
  });
  it('strips leading and trailing whitespace', () => {
    assert.equal(utils.trim(' hello '), 'hello');
  });
});

describe('utils.replace', () => {
  it('returns undefined for falsy input', () => {
    assert.equal(utils.replace(undefined, /x/g, 'y'), undefined);
    assert.equal(utils.replace('', /x/g, 'y'), undefined);
  });
  it('does global regex replace', () => {
    assert.equal(utils.replace('a"b"c', /"/g, ''), 'abc');
  });
  it('does string-literal replace', () => {
    assert.equal(utils.replace('hello %URL% world', '%URL%', 'http://x'), 'hello http://x world');
  });
});
```
That's roughly 50 lines of tests covering ~80% of `utils.js`. Phase 1 is mostly this kind of work.
### CI workflow changes
Add a `test` step to [`.github/workflows/node.js.yml`](../../.github/workflows/node.js.yml):
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x, 22.x]
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm run lint
      - run: npm test                # ← new
      - run: npm run coverage:check  # ← new (gates on the threshold)
      - run: npm run coverage        # ← new (writes the html + lcov reports into coverage/)
        if: matrix.node-version == '22.x'
      - name: Upload coverage report # ← new
        if: matrix.node-version == '22.x'
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/
          retention-days: 14
```
`coverage:check` failing breaks the build. The HTML coverage report is uploaded as a CI artefact — downloadable from any run, useful for inspecting what's covered without running locally.
### Threshold ratchet plan
Each phase's PR updates the thresholds in `coverage:check`. The table shows the actual progression once executed; it matches the floors set in `package.json`'s `c8` block at each phase:
| Phase | Lines | Functions | Branches | What was added | Tests |
|---:|---:|---:|---:|---|---:|
| 1 | 5 | 10 | 5 | `lib/utils.js` + `lib/configuration.js`. | 48 |
| 2 | 10 | 50 | 80 | + status converters extracted to `shelly/nodes/gen{1,2}/`. | 73 |
| 3 | 28 | 73 | 87 | + 7 per-family gen1 parsers extracted to `shelly/nodes/gen1/parsers/`, gen2 generic parser to `shelly/nodes/gen2/parsers/`. The 1342-line `gen1-node.js` shrinks to 696 LOC. | 153 |
| 4 | 35 | 70 | 88 | + transport tests (`shellyRequestAsync` with nock, digest 401-retry, error-body enrichment) and `getCredentials` / `getShellyInfo`. Functions floor dipped because `shelly.js` (12 functions) entered the tested-files denominator with only 7 covered. | 175 |
| 5 | 40 | 75 | 88 | + lifecycle: `shellyPing`, `tryCheckDeviceType`, `start` against nock + a fake-node harness. `shelly.js` went from 49% → 82% lines. | 193 |
The remaining uncovered ~58% lives mostly in the four files that need a fake Node-RED `RED` object to exercise (`99-shelly.js`, `cloud-node.js`, `gen1-node.js`, `gen2-node.js` — all at 0%). Their core logic — the input parsers, status converters, transport, polling lifecycle — is already tested in isolation. What remains uncovered in those files is mostly wiring (constructor field assignment, `RED.nodes.createNode`, event-handler registration, the EM-data download side path in `inputParserMeasure1Async`). Adding fake-RED tests for the constructors would lift line coverage roughly to 55-60% but with much lower value-per-test than Phases 1-5; left as future work.
Note: line / function / branch percentages don't move in lockstep. Branch and function percentages spike early because the small tested files are well-branched; line percentage moves slowly until the large `gen1-node.js` and `gen2-node.js` get split (Phase 3); and the function metric can dip when a new larger file enters the tested set (Phase 4) before its functions get covered (Phase 5).
Each phase's PR resets the floors, normally upward; the Phase 4 dip in the functions floor is the one exception. CI then prevents regressions back below them.
---
## Optional: Codecov integration
If you want a coverage badge on the README and per-PR coverage delta comments:
```yaml
- name: Upload coverage to Codecov
  if: matrix.node-version == '22.x'
  uses: codecov/codecov-action@v4
  with:
    token: ${{ secrets.CODECOV_TOKEN }} # the repo secret described below
    files: ./coverage/lcov.info
    fail_ci_if_error: false
```
Requires signing up at [codecov.io](https://about.codecov.io/) and adding a `CODECOV_TOKEN` repo secret. Optional and additive — the local thresholds in `coverage:check` are the real gate.
---
## Why this beats the status quo
- **The 11.9.x series shipped four real logic bugs** to a published release: the missing-await in `start()`, the `urls[i]→urls[j]` index swap, the missing-await in RGBW init, and the always-true `scheduleProfile` range guard. Three of the four are pure-function bugs that would have been caught by a 5-line test in Phases 1-3.
- **Husky is now wired** (since 11.9.4) but only runs `npm run lint`. Once `npm test` exists, adding it to the pre-commit hook is a one-line change — every local commit then runs the suite.
- The maintainer's current QA strategy ("I run it against my home fleet") doesn't scale and isn't replicable by contributors. Tests give every PR the same baseline check.
---
## What I'd suggest doing first
Phase 1 only — it's small, low-risk, and you can see exactly what the developer ergonomics feel like before committing to the broader plan. Once you've used `npm test` locally a few times you'll know whether the runner choice fits your style. Subsequent phases follow the same pattern, so the first one is the highest-information one.
If you want, I can implement Phase 1 as a single PR-shaped commit:
- Add `c8` to devDeps and the four new npm scripts.
- Write the `test/unit/utils.test.js` shown above plus one more file for `configuration.js` to prove the catalog-lookup pattern.
- Update the CI workflow with the test + coverage-check + artefact-upload steps.
- Wire `npm test` into the husky pre-commit hook.
- Land at ~12-15% coverage with thresholds set at 10%.
Say the word and I'll send it.