ngraph.leiden
Version:
Leiden/Louvain community detection for ngraph.graph (JS)
302 lines (222 loc) • 11.4 kB
Markdown
Leiden/Louvain community detection for ngraph.graph. Fast, deterministic (seeded), and flexible: undirected/directed modularity, CPM with resolution, multilayer aggregation, fixed nodes, and custom weights/sizes.
Works with ESM and CommonJS. Built as a Vite library; tested with Vitest.
Install the published package in your project:
```sh
npm i ngraph.leiden
```
This repository uses Node 18+ for development. To work on the repo locally:
```sh
npm install
npm test
npm run build
```
```js
import createGraph from 'ngraph.graph'
import { detectClusters } from 'ngraph.leiden'
const g = createGraph()
// Undirected graphs: just add a single link; adapter will symmetrize for you
g.addNode('a'); g.addNode('b'); g.addNode('c'); g.addNode('d')
g.addLink('a','b')
g.addLink('c','d')
const result = detectClusters(g, { randomSeed: 42 })
// Get a community id for a node
console.log('a ->', result.getClass('a'))
// Group membership as Map<communityId, nodeId[]>
console.log(result.getCommunities())
// Partition quality for the chosen objective
console.log('Q =', result.quality())
```
CommonJS:
```js
const createGraph = require('ngraph.graph')
const { detectClusters } = require('ngraph.leiden')
```
Signature
- `detectClusters(input, options?) => Clusters`
Input
- `input`: either
- an `ngraph.graph` instance; or
- a multilayer array: `[{ graph, weight?, linkWeight?, nodeSize? }]` — all layers must share the same node ids (see Multilayer graphs).
Return value: `Clusters`
- `getClass(nodeId): number` — community id for `nodeId`.
- `getCommunities(): Map<number, string[]>` — nodes grouped by community id.
- `quality(): number` — objective value for the final partition. With `quality: 'cpm'`, see CPM on how it’s computed and `options.cpmMode`.
- `toJSON(): { membership: Record<id, number>, meta: { levels: number, quality: number, options: object } }`.
Options
Each option below is optional. Defaults are shown in backticks.
- What: Objective to optimize.
- Values: `'modularity'` | `'cpm'`. Default: `'modularity'`.
- Why: Use `'modularity'` for classic community detection; use `'cpm'` (Constant Potts Model) when you need explicit control over community granularity via `resolution` or when modularity’s resolution limit is an issue.
- When: Prefer `'cpm'` for very large graphs or when you want consistent small communities; otherwise `'modularity'` is a strong default.
- What: The `gamma` parameter for CPM.
- Value: `number`. Default: `1.0`.
- Why: Larger `resolution` yields more/smaller communities; smaller yields fewer/larger.
- When: Tune to match desired granularity; try `0.1`, `1.0`, `5.0` as starting points.
- What: Enable Leicht–Newman directed modularity.
- Value: `boolean`. Default: `false`.
- Why: For directed networks where in/out strengths matter.
- When: Set `true` for directed graphs. For undirected, keep `false` and add reciprocal links in your graph.
- What: Seed for the internal RNG used for node order shuffles.
- Value: `number`. Default: `42`.
- Why: Ensures deterministic, reproducible partitions.
- When: Set explicitly for reproducible experiments or tests.
- What: Controls which target communities are considered when moving a node.
- Values: `'neighbors'` (default), `'all'`, `'random'`, `'random-neighbor'`.
- Why: Trade-off between speed and search breadth.
- When:
- `'neighbors'`: fastest and typical; consider only adjacent communities.
- `'all'`: exhaustive but slow on large graphs.
- `'random'`: sample from all communities (broader search at bounded cost).
- `'random-neighbor'`: sample from neighbor communities (cheap, slightly more exploratory).
### allowNewCommunity
- What: Allow moves that create a fresh singleton community.
- Value: `boolean`. Default: `false`.
- Why: Can help escape local minima in specific structures.
- When: Rarely needed; try enabling if you see over-merged communities.
- What: Upper bound on a community’s total size.
- Value: `number`. Default: `Infinity`.
- How measured: Uses `nodeSize` (default `1` per node). A move that would exceed this bound is skipped.
- Why: Enforce capacity or balance constraints.
- When: Useful for constrained clustering or to avoid giant communities.
- What: Enable Leiden-style refinement between coarsening levels.
- Value: `boolean`. Default: `true`.
- Why: Improves partitions by re-optimizing within coarse communities.
- When: Keep `true` for best quality; turn off for speed-sensitive runs.
- What: Nodes that must remain in their initial communities at the finest level.
- Value: `Set` or `Array` of node ids.
- Why: Respect domain constraints or anchor communities.
- When: Use to pin landmarks, seeds, or known group members.
- What: Control how community ids are compacted after renumbering.
- Values: `false` (default), `true` (preserve old order), or `Map<oldId, order>`.
- Why: Stable ids across runs or alignment to a predefined ordering.
- When: Advanced; mostly for downstream integration/visualization.
- What: Function to read an edge’s weight.
- Signature: `(link) => number`. Default: `link.data?.weight ?? 1`.
- Why: Use custom attributes (e.g., frequency, strength) as weights.
- Example:
```js
const res = detectClusters(g, { linkWeight: l => Math.max(0, l.data?.w ?? 0) })
```
- What: Function to read a node’s size (used by CPM and `maxCommunitySize`).
- Signature: `(node) => number`. Default: `node.data?.size ?? 1`.
- Why: Make CPM size-aware or weight capacity by node importance.
- Example:
```js
const res = detectClusters(g, { nodeSize: n => n.data?.pop ?? 1, quality: 'cpm' })
```
- What: Internal performance knobs for the multi-level loop and local passes.
- Values: `number`. Defaults are conservative and usually fine.
- Why/When: Only tune if you profile and identify a need.
- Modularity: Measures how much more densely connected nodes are within communities than expected at random. Directed modularity (Leicht–Newman) uses in/out strengths. Higher is better; can suffer from a resolution limit on very large graphs.
- CPM (Constant Potts Model): Maximizes internal edge weight minus gamma × a penalty for community size. Gamma (resolution) controls granularity: larger gamma → more/smaller communities; smaller gamma → fewer/larger. Supports custom node sizes via nodeSize.
CPM is a resolution-tunable objective. During optimization this package uses node sizes (via nodeSize), so gains reflect community size; with default nodeSize=1 this matches “unit-count” CPM. If you supply custom node sizes, the optimization becomes size-aware. The quality() reporter supports:
- options.cpmMode: 'unit' | 'size-aware' (affects only how quality() is computed, not the move heuristic). If you use custom node sizes and want "unit-count" reporting, keep cpmMode='unit'.
Pass an array of layers: [{ graph, weight?, linkWeight?, nodeSize? }]. Edges are aggregated by summing weight * linkWeight(link) per layer. All layers must have the same set of node ids. Node sizes default to the first layer’s nodeSize.
Example:
```js
const result = detectClusters([
{ graph: layer1, weight: 1.0 },
{ graph: layer2, weight: 0.2 },
], { quality: 'modularity', randomSeed: 7 })
```
When to use it
- Multiplex networks: combine different relation types (e.g., friendship + collaboration).
- Temporal smoothing: blend snapshots with weights to reduce short-term noise.
- Heterogeneous signals: fuse a strong–sparse layer with a weak–dense layer.
- Denoising: add a lightly weighted prior layer to stabilize communities.
## Directed graphs
Set { directed: true } to use Leicht–Newman directed modularity. Edges are taken as-is; no reciprocal links are needed.
Undirected handling
- If directed: false (default), the adapter symmetrizes edges so you don’t need to add reciprocal links to your ngraph.graph.
- If both directions exist between two nodes with possibly different weights, the adapter averages them: w_undirected = (w_ij + w_ji) / 2, then emits symmetric edges with that weight. This preserves total weight and keeps results consistent whether you supplied one or both directions.
- Fixed nodes: keep given nodes in their initial communities at the finest level (also respected during refinement).
- Community size limit: maxCommunitySize prevents moves that would exceed the given total nodeSize.
- Determinism: randomSeed controls shuffle order; repeated runs with the same seed and inputs are deterministic.
- Negative weights and self-loops are supported; use negative weights with care as modularity assumptions may not hold theoretically.
Modularity with custom weights
```js
const g = createGraph()
g.addNode('x'); g.addNode('y'); g.addNode('z')
g.addLink('x','y', { weight: 2 }); g.addLink('y','x', { weight: 2 })
g.addLink('y','z', { weight: 0.3 }); g.addLink('z','y', { weight: 0.3 })
const res = detectClusters(g, { quality: 'modularity', randomSeed: 1, linkWeight: l => l.data?.weight ?? 1 })
```
CPM with resolution and node sizes
```js
const res = detectClusters(g, {
quality: 'cpm',
resolution: 0.5,
nodeSize: n => n.data?.size ?? 1,
randomSeed: 3,
})
```
Fix a subset of nodes
```js
const res = detectClusters(g, { fixedNodes: new Set(['x','z']), randomSeed: 11, refine: true })
```
- Build library bundles: `npm run build`
- Run tests: `npm test`
MIT © Andrei Kashcha
This package ships a small CLI for quick community detection (via npx or installed locally).
Input format is auto-detected by file extension (.dot/.gv/.json) or by content (tries JSON first, then DOT). Use --format to override.
Usage examples
- From a DOT file
```sh
npx ngraph.leiden --in graph.dot --out membership.json
```
- From stdin
```sh
cat graph.dot | npx ngraph.leiden --membership-only
```
- From a JSON edgelist (either an array of {source,target,weight?} or {nodes,links})
```sh
cat edges.json | npx ngraph.leiden > out.json
```
Options
- `--directed` — treat input as directed
- `--quality` modularity|cpm (default modularity)
- `--resolution <gamma>` — CPM resolution parameter
- `--candidate-strategy` neighbors|all|random|random-neighbor
- `--max-levels`, `--max-local-passes`, `--random-seed`
- `--max-community-size <num>`
- `--allow-new-community`, `--no-refine`
- `--fixed <file>` — newline-separated node ids to keep fixed at level 0
- `--membership-only` — print only the id->community map
Output
- Defaults to JSON on stdout; use `--out <file>` to write to a file.
- Formats:
- JSON (default): full object `{membership, meta}` or mapping only with `--membership-only`.
- CSV: `--out-format csv` prints header `nodeId,communityId`.
- DOT: `--out-format dot` overlays `community="..."` on nodes using ngraph.todot.
Quality-only evaluation
Evaluate modularity/CPM for an existing membership mapping without running detection:
```sh
npx ngraph.leiden \
--in graph.dot --format dot \
--evaluate --membership membership.json \
--quality modularity
```