# zod-enum-forge
<div align="center">
<img src="https://kybernetes.ngo/wp-content/uploads/2022/10/Logo-Kybernetes.png" alt="Kybernetes Logo" width="200"/>
<br/>
<em>Institute of Socio-Political Technologies "Kybernetes"</em>
<br/>
<a href="https://kybernetes.ngo">🌐 kybernetes.ngo</a>
</div>
<p align="center">
<a href="https://www.npmjs.com/package/zod-enum-forge">
<img src="https://img.shields.io/npm/v/zod-enum-forge.svg" alt="npm version">
</a>
<a href="https://github.com/itsp-kybernetes/zod-enum-forge/blob/master/LICENSE">
<img src="https://img.shields.io/badge/license-BSD--2--Clause-blue.svg" alt="License">
</a>
</p>
Tiny helpers to extend Zod enums for open-set/iterative classification workflows.
## Overview
`zod-enum-forge` provides utilities to dynamically extend Zod enums, making them "flexible" for scenarios where you need to handle unknown values in iterative data processing workflows, such as LLM-based classification tasks.
## Features
- 🔧 **Flexible Enums**: Create enums that can accept unknown values while preserving type safety
- 🔄 **Dynamic Schema Updates**: Automatically extend schemas based on incoming data
- 🌐 **Multi-level Support**: Handle deeply nested objects with flexible enums
- 🔗 **Universal Zod Compatibility**: Works seamlessly with both Zod v3 and v4
- 📦 **Zero Configuration**: Automatic version detection and compatibility layer
- ⚡ **Lightweight**: Minimal dependencies, focused functionality
- 🛡️ **Type Safe**: Full TypeScript support with proper type inference
## Installation
```bash
npm install zod-enum-forge
```
**Requirements:**
- Node.js 16+
- Zod v3.25.0+ or v4.0.0+
The library automatically detects and works with both Zod v3 and v4 - no configuration needed.
## Quick Start
```typescript
import { z } from 'zod';
import { flexEnum, forgeEnum } from 'zod-enum-forge';
// Create a flexible enum that can accept unknown values
const statusEnum = flexEnum(['pending', 'done']);
// Or from existing Zod enum
const baseEnum = z.enum(['a', 'b']);
const flexibleEnum = flexEnum(baseEnum);
// Extend an enum with new values
const extendedEnum = forgeEnum(['pending', 'done'], 'archived');
// Dynamic schema updates based on data
const schema = z.object({
  status: flexEnum(['pending', 'done']),
  category: z.enum(['urgent', 'normal'])
});
const data = {
  status: 'in_progress', // New value!
  category: 'urgent'
};
// Schema automatically extends to include new values
const updatedSchema = flexEnum(schema, data);
```
## API Reference
### `flexEnum`
Creates flexible enums that can accept unknown values and be dynamically extended based on data.
#### Signatures
```typescript
// Create from array of values
flexEnum(values: string[], description?: string): ZodUnion
// Create from existing ZodEnum
flexEnum(enumDef: ZodEnum, description?: string): ZodUnion
// Update schema based on data (auto-extends enums)
flexEnum(schema: ZodObject, dataJson: unknown): ZodObject
// Use specific Zod instance (for version control)
flexEnum(zodInstance: ZodType, values: string[], description?: string): ZodUnion
flexEnum(zodInstance: ZodType, enumDef: ZodEnum, description?: string): ZodUnion
```
#### Examples
```typescript
import { z } from 'zod';
import { flexEnum } from 'zod-enum-forge';
// Basic flexible enum - accepts both predefined and unknown values
const statusEnum = flexEnum(['pending', 'done']);
console.log(statusEnum.parse('pending')); // ✅ 'pending'
console.log(statusEnum.parse('in_progress')); // ✅ 'in_progress' (unknown value accepted)
// With custom description for LLM guidance
const categoryEnum = flexEnum(['spam', 'ham'], 'Custom category type for email classification');
// Dynamic schema updates - automatically extends enums when new values are encountered
const schema = z.object({
  status: flexEnum(['pending', 'done']),
  category: flexEnum(['urgent', 'normal'])
});
const data = {
  status: 'in_progress', // New value!
  category: 'low_priority' // Another new value!
};
const updatedSchema = flexEnum(schema, data);
// Schema now accepts the new values for future validations
console.log(updatedSchema.parse(data)); // ✅ Works!
// Using specific Zod instance for version control
import { z as zod4 } from 'zod/v4';
const v4FlexEnum = flexEnum(zod4, ['a', 'b'], 'Custom description');
```
### `forgeEnum`
Extends existing enums with new values, creating a new enum with combined values.
#### Signatures
```typescript
// Extend array of values
forgeEnum(values: string[], add: string | string[]): ZodEnum
// Extend existing ZodEnum
forgeEnum(enumDef: ZodEnum, add: string | string[]): ZodEnum
// Extend enum within schema object
forgeEnum(schema: ZodObject, key: string, add: string | string[]): ZodObject
```
#### Examples
```typescript
import { z } from 'zod';
import { forgeEnum } from 'zod-enum-forge';
// Extend array of values
const newEnum = forgeEnum(['a', 'b'], 'c');
// Result: enum with values ['a', 'b', 'c']
// Extend existing Zod enum
const baseEnum = z.enum(['pending', 'done']);
const extendedEnum = forgeEnum(baseEnum, ['archived', 'cancelled']);
// Result: enum with values ['pending', 'done', 'archived', 'cancelled']
// Extend enum within schema
const schema = z.object({
  status: z.enum(['pending', 'done'])
});
const newSchema = forgeEnum(schema, 'status', 'archived');
// Schema now has status enum with 'archived' value
```
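The value-merging idea behind `forgeEnum` can be sketched without Zod. The helper below is hypothetical and is not part of `zod-enum-forge`; it assumes that values already present should be skipped and that existing order should be kept, which the library may or may not do.

```typescript
// Hypothetical helper (not exported by zod-enum-forge): appends new values to
// an existing value list, keeping order and skipping values already present.
function mergeEnumValues(
  current: readonly string[],
  add: string | readonly string[],
): string[] {
  const additions: readonly string[] = typeof add === 'string' ? [add] : add;
  const result = current.slice();
  for (const value of additions) {
    if (result.indexOf(value) === -1) {
      result.push(value);
    }
  }
  return result;
}

const mergedValues = mergeEnumValues(['pending', 'done'], ['archived', 'done']);
console.log(mergedValues); // ['pending', 'done', 'archived']
```

A list produced this way could then be passed to `z.enum(...)` or `forgeEnum` as usual.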
### `addToEnum` (alias of `forgeEnum`)
Alias that forwards to `forgeEnum`.
Signatures: same as `forgeEnum`.
Example:
```typescript
import { z } from 'zod';
import { addToEnum } from 'zod-enum-forge';
const base = z.enum(['a','b']);
const extended = addToEnum(base, 'c'); // enum with a,b,c
```
### `limitEnum`
Removes values from enums or flex enums.
Supports:
- Raw string[] (creates new enum after removal)
- ZodEnum
- flexEnum union or plain-flex enum
- Schema path (object key) including optional()/nullable() wrappers
Signatures:
```typescript
limitEnum(values: string[], remove: string | string[]): ZodEnum
limitEnum(enumOrFlex: ZodEnum | ZodUnion /* flex */ , remove: string | string[]): ZodEnum | ZodUnion
limitEnum(schema: ZodObject, key: string, remove: string | string[]): ZodObject
```
Examples:
```typescript
import { z } from 'zod';
import { flexEnum, limitEnum } from 'zod-enum-forge';
// Array
const reduced = limitEnum(['a','b','c'], 'b'); // enum a,c
// Enum
const Base = z.enum(['x','y','z']);
const trimmed = limitEnum(Base, ['z']); // enum x,y
// flexEnum
const fx = flexEnum(['draft','pub','arch']);
const fxTrimmed = limitEnum(fx, 'arch'); // flex without 'arch'
// In schema path
const schema = z.object({ status: z.enum(['open','closed','archived']) });
const updated = limitEnum(schema, 'status', 'archived');
```
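One edge case worth keeping in mind is removing every value, since a Zod enum is normally declared with at least one member. How `limitEnum` itself reacts to that case is not documented here, so the sketch below is only a hypothetical pre-check on plain string arrays that you could run before calling it.

```typescript
// Hypothetical pre-check (not part of zod-enum-forge): removes values from a
// list and refuses to return an empty result.
function removeEnumValues(
  current: readonly string[],
  remove: string | readonly string[],
): string[] {
  const toRemove: readonly string[] = typeof remove === 'string' ? [remove] : remove;
  const remaining = current.filter((value) => toRemove.indexOf(value) === -1);
  if (remaining.length === 0) {
    throw new Error('Cannot remove every value: an enum needs at least one member.');
  }
  return remaining;
}

console.log(removeEnumValues(['open', 'closed', 'archived'], 'archived')); // ['open', 'closed']
```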
### `deleteFromEnum` (alias of `limitEnum`)
Same usage as `limitEnum`.
### `strictEnum`
Converts a flex enum (or an entire structure) back to a strict `z.enum(...)`, removing the string-union flexibility and metadata. Optional and nullable wrappers are preserved.
Signatures:
```typescript
strictEnum(flexEnumOrEnum: ZodUnion | ZodEnum): ZodEnum
strictEnum(schema: ZodObject): ZodObject // cleans all nested flex enums
```
Example:
```typescript
import { z } from 'zod';
import { flexEnum, strictEnum } from 'zod-enum-forge';
const fx = flexEnum(['a','b']);
const strict = strictEnum(fx); // plain z.enum(['a','b'])
const schema = z.object({ role: flexEnum(['admin','user']) });
const cleaned = strictEnum(schema); // role is now pure enum
```
### `deflexStructure`
Alias behaving like `strictEnum` when passed a structure. (You can still pass a single flex enum.)
```typescript
deflexStructure(schema: ZodObject): ZodObject
```
### `isFlexEnum`
Predicate that returns true if the value is a flex enum (either union or plain enum with metadata).
```typescript
isFlexEnum(x: unknown): boolean
```
Example:
```typescript
const fx = flexEnum(['a','b']);
console.log(isFlexEnum(fx)); // true
```
### `separateFlexibility`
Removes flex enums from a structure (turns them into strict enums) and returns a layer describing original flexibility for later reintegration.
Signature:
```typescript
separateFlexibility(schema: ZodObject): { schema: ZodObject; flexityLayer: FlexityLayer }
```
`FlexityLayer` shape:
```typescript
type FlexityLayer = {
  [path: string]: { values: string[]; description?: string }
};
```
Example:
```typescript
const schema = z.object({
  status: flexEnum(['pending','done']),
  nested: z.object({ kind: flexEnum(['a','b']).optional().nullable() })
});
const { schema: strictSchema, flexityLayer } = separateFlexibility(schema);
// strictSchema: all flex removed
// flexityLayer: { 'status': { values:['pending','done'], description: ... }, 'nested.kind': {...} }
```
### `integrateFlexibility`
Given a strict schema and a `FlexityLayer`, recreate the flex enums (original flexible form).
Signature:
```typescript
integrateFlexibility(schema: ZodObject, flexityLayer: FlexityLayer): ZodObject
```
Example:
```typescript
const { schema: strictSchema, flexityLayer } = separateFlexibility(schemaWithFlex);
const restored = integrateFlexibility(strictSchema, flexityLayer);
```
## New Utilities (v0.3.0)
Added helpers for managing enum evolution lifecycle:
- `addToEnum`: alias of `forgeEnum`
- `limitEnum`: remove values from enums / flexEnums (arrays, direct enums, wrapped optional/nullable, schema paths, or flexEnum unions)
- `deleteFromEnum`: alias of `limitEnum`
- `strictEnum`: converts flexEnum (union) or structures back to plain z.enum(...) (preserving optional/nullable wrappers) and removes metadata
- `deflexStructure`: alias behaving like strictEnum on whole structures
- `isFlexEnum`: exported predicate detecting flex enums
- `separateFlexibility`: returns `{ schema, flexityLayer }` where schema has all flexEnums converted to strict enums and flexityLayer maps paths to original values & descriptions
- `integrateFlexibility`: given a strict schema and a flexityLayer recreates the original schema with flexEnums reintegrated
### FlexityLayer Format
FlexityLayer is a simple object: `{ 'path.to.field': { values: string[], description?: string } }`
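Because the layer is plain data, it should survive a JSON round trip, which makes it easy to store next to a strict schema. The sketch below uses no library calls; the `pathsUnder` helper and the sample description text are hypothetical and only illustrate working with dotted paths.

```typescript
type FlexityLayer = {
  [path: string]: { values: string[]; description?: string };
};

// Sample layer written by hand; in practice it would come from separateFlexibility.
const sampleLayer: FlexityLayer = {
  status: { values: ['pending', 'done'] },
  'nested.kind': { values: ['a', 'b'], description: 'Kind of item' },
};

// Round trip through JSON, for example to persist the layer on disk.
const serializedLayer = JSON.stringify(sampleLayer);
const restoredLayer = JSON.parse(serializedLayer) as FlexityLayer;

// Hypothetical helper: lists the flexible paths at or below a given prefix.
function pathsUnder(layer: FlexityLayer, prefix: string): string[] {
  return Object.keys(layer).filter(
    (path) => path === prefix || path.indexOf(prefix + '.') === 0,
  );
}

console.log(pathsUnder(restoredLayer, 'nested')); // ['nested.kind']
```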
### Example
```typescript
import { z } from 'zod';
import { flexEnum, separateFlexibility, integrateFlexibility, limitEnum, addToEnum, strictEnum } from 'zod-enum-forge';
const schema = z.object({
  status: flexEnum(['pending','done']),
  nested: z.object({ kind: flexEnum(['a','b']).optional().nullable() })
});
// Extract flexibility layer
const { schema: strictSchema, flexityLayer } = separateFlexibility(schema);
// strictSchema now contains pure enums
// flexityLayer records where flex enums were
// Reintegrate later
const restored = integrateFlexibility(strictSchema, flexityLayer);
// Remove a value from an enum
const trimmed = limitEnum(restored, 'status', 'done');
// Add a new value
const extended = addToEnum(trimmed, 'status', 'archived');
// Force convert a specific field back to strict
const strictAgain = strictEnum(extended.shape.status);
```
## Advanced Usage
### Nested Objects
The library handles complex nested structures:
```typescript
const schema = z.object({
  textClassification: z.object({
    category: flexEnum(['spam', 'ham']),
    subCategory: flexEnum(['urgent', 'non-urgent']).optional().nullable(),
    features: z.object({
      sentiment: flexEnum(['positive', 'negative']),
      intent: flexEnum(['inform', 'request', 'command'])
    })
  }),
  metadata: z.object({
    source: flexEnum(['email', 'chat'])
  })
});
const newData = {
  textClassification: {
    category: 'offers', // New category
    subCategory: 'urgent',
    features: {
      sentiment: 'neutral', // New sentiment
      intent: 'inform'
    }
  },
  metadata: {
    source: 'sms' // New source
  }
};
const updatedSchema = flexEnum(schema, newData);
// All new enum values are now supported
```
### Optional Fields
The library properly handles optional and nullable fields:
```typescript
const schema = z.object({
  required: flexEnum(['a', 'b']),
  optional: flexEnum(['x', 'y']).optional().nullable()
});
const data = {
  required: 'c', // Extends required field
  optional: 'z' // Extends optional field (remains optional and nullable)
};
const updated = flexEnum(schema, data);
```
## Use Cases
### LLM-based Classification
Perfect for iterative classification workflows where you discover new categories as you process data.
This example demonstrates a real-world scenario where you're processing Wikipedia articles with an LLM (GPT-4o) and discovering new classification categories on the fly. The schema starts with basic categories but grows automatically as the LLM encounters new types of content that don't fit existing categories.
**How it works:**
1. **Initial Schema**: Start with a classification schema with predefined categories
2. **Iterative Processing**: For each article, use the current schema with OpenAI's structured output
3. **Dynamic Extension**: When the LLM outputs new enum values not in the current schema, `flexEnum` automatically extends the schema
4. **Schema Evolution**: The updated schema is used for subsequent articles, creating a self-improving classification system
```typescript
import fs from "fs";
import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";
import { z } from "zod";
import csv from 'async-csv';
import { flexEnum } from 'zod-enum-forge';
import 'dotenv/config';
// Classification schema for Wikipedia articles
const articleSchema = z.object({
  textClassification: z.object({
    category: flexEnum(['politics', 'mathematics', 'ecology']),
    subCategory: flexEnum(['international politics', 'geometry', 'climate change']).optional().nullable(),
  }),
  keyfindings: z.object({
    summary: z.string().max(500),
    importantFigures: z.array(z.string()).min(1).max(5),
    relatedArticles: z.array(z.string()).min(1).max(5)
  })
});
async function main() {
  let currArticleSchema = articleSchema;
  // Load articles from CSV file
  const articlesContent_raw = await fs.promises.readFile('./articles.csv', 'utf8');
  const articlesContent = await csv.parse(articlesContent_raw) as string[][];
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const articles = [];
  // Process first 4 articles (skipping header row)
  for (let n = 1; n < 5; n++) {
    if (!articlesContent[n]?.[0]) {
      continue; // Skip empty rows
    }
    const response = await openai.responses.parse({
      model: "gpt-4o",
      input: [
        { role: "system", content: "Write the information about article." },
        {
          role: "user",
          content: "Article content:\n" + (articlesContent[n]?.[0] ?? ''),
        },
      ],
      text: {
        format: zodTextFormat(currArticleSchema, "article"),
      },
    });
    const article = response.output_parsed;
    // Dynamically extend schema based on LLM output
    currArticleSchema = flexEnum(currArticleSchema, article);
    articles.push(article);
  }
  // Save processed articles and final schema
  fs.writeFileSync('./processed_articles.json', JSON.stringify(articles, null, 2));
  fs.writeFileSync('./last_schema.json', JSON.stringify(zodTextFormat(currArticleSchema, "article"), null, 2));
}
main().catch(err => {
  console.error(err);
  process.exit(1);
});
```
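The same loop can be dry-run without any API calls to see how the known categories grow from article to article. The sketch below is a simplified stand-in: `classifyStub` replaces the LLM call, and `KnownValues` replaces the Zod schema with one plain value list per field. All of these names are hypothetical and not part of `zod-enum-forge` or the OpenAI SDK.

```typescript
// Stand-in for the schema: one list of known values per field.
type KnownValues = Record<string, string[]>;
type Classification = Record<string, string | null>;

// Hypothetical stub replacing the LLM call; returns fixed labels per keyword.
function classifyStub(text: string): Classification {
  if (text.indexOf('election') !== -1) {
    return { category: 'politics', subCategory: 'elections' };
  }
  if (text.indexOf('virus') !== -1) {
    return { category: 'medicine', subCategory: null };
  }
  return { category: 'mathematics', subCategory: 'geometry' };
}

// Returns a new KnownValues with any unseen, non-null value appended.
function extendKnown(known: KnownValues, result: Classification): KnownValues {
  const next: KnownValues = {};
  for (const field of Object.keys(known)) {
    const values = known[field] ?? [];
    const value = result[field];
    next[field] =
      value != null && values.indexOf(value) === -1 ? [...values, value] : [...values];
  }
  return next;
}

let knownValues: KnownValues = {
  category: ['politics', 'mathematics', 'ecology'],
  subCategory: ['international politics', 'geometry', 'climate change'],
};
for (const text of ['An election recap', 'A virus outbreak', 'Triangles']) {
  knownValues = extendKnown(knownValues, classifyStub(text));
}
console.log(knownValues['category']);
// ['politics', 'mathematics', 'ecology', 'medicine']
console.log(knownValues['subCategory']);
// ['international politics', 'geometry', 'climate change', 'elections']
```

Null results leave the lists untouched, mirroring the nullable `subCategory` field in the schema above.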
**Key Benefits:**
- **Automatic Discovery**: New categories are discovered organically through LLM processing
- **No Manual Intervention**: The schema evolves without requiring manual updates
- **Consistent Structure**: All processed articles maintain the same structured format
- **Iterative Improvement**: Each processed article potentially improves the classification schema for future articles
- **Schema Persistence**: Final evolved schema can be saved and reused
This approach is particularly useful for:
- Content categorization where categories aren't known upfront
- Academic paper classification across disciplines
- News article categorization with emerging topics
- Sentiment analysis with evolving emotional categories
- Intent classification in chatbots
- Document taxonomy development
- Any scenario where classification categories emerge from data rather than being predefined
### Taxonomy Evolution
Build evolving taxonomies that grow with your data:
```typescript
let taxonomy = z.object({
  domain: flexEnum(['technology', 'business']),
  subdomain: flexEnum(['ai', 'blockchain']).optional().nullable()
});
// As you process more documents
const documents = [
  { domain: 'healthcare', subdomain: 'telemedicine' },
  { domain: 'technology', subdomain: 'quantum' },
  { domain: 'business', subdomain: null } // nullable value
];
documents.forEach(doc => {
  taxonomy = flexEnum(taxonomy, doc);
});
// Taxonomy now includes all discovered categories
```
## Zod Version Compatibility
This library (v0.2.0) automatically detects and works with both Zod v3 and v4:
- **Zod v3**: Uses `_def` property structure
- **Zod v4**: Uses `_zod.def` property structure with traits
The compatibility layer automatically:
- Detects which Zod version you're using
- Adapts internal API calls accordingly
- Maintains consistent behavior across versions
- Supports schemas created with different Zod instances
**Version Detection:**
```typescript
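// Assumes Zod 3.25+, which ships both APIs under versioned subpaths
import { z as z3 } from 'zod/v3';
import { z as z4 } from 'zod/v4';
import { flexEnum } from 'zod-enum-forge';
// Library automatically detects version from your schemas
const v3Schema = z3.enum(['a', 'b']);
const v4Schema = z4.enum(['a', 'b']);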
// Both work seamlessly
const flexV3 = flexEnum(v3Schema);
const flexV4 = flexEnum(v4Schema);
// You can also specify the Zod instance explicitly
const explicitV4 = flexEnum(z4, ['a', 'b']);
```
No configuration needed - the library handles all differences internally.
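As a rough illustration of the property-based detection described above, a check along these lines could tell the two layouts apart. This is a guess at the general approach, not the library's actual code; `detectZodMajor` is a hypothetical name, and plain objects stand in for real schemas.

```typescript
// Hypothetical detector based on the property names listed above.
function detectZodMajor(schema: unknown): 3 | 4 | null {
  if (typeof schema !== 'object' || schema === null) {
    return null;
  }
  const candidate = schema as Record<string, unknown>;
  // `_zod` is checked first: this sketch assumes a v4 schema may also expose
  // a `_def` property, so `_def` alone is not treated as proof of v3.
  const internals = candidate['_zod'];
  if (typeof internals === 'object' && internals !== null && 'def' in internals) {
    return 4;
  }
  if ('_def' in candidate) {
    return 3;
  }
  return null;
}

// Plain objects stand in for real Zod schemas here.
console.log(detectZodMajor({ _def: { typeName: 'ZodEnum' } })); // 3
console.log(detectZodMajor({ _zod: { def: { type: 'enum' } }, _def: {} })); // 4
console.log(detectZodMajor('not a schema')); // null
```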
## How It Works
### Flexible Enum Implementation
`flexEnum` creates a Zod union type that combines:
1. **Predefined enum values** - for known/expected values
2. **String schema** - for accepting unknown values
```typescript
// flexEnum(['a', 'b']) internally creates:
z.enum(['a', 'b']).or(z.string().describe("If none of the existing enum values match, provide a new appropriate value for this field."))
```
This approach provides:
- **Type safety** for known values
- **Flexibility** for unknown values
- **LLM guidance** through descriptions
- **Automatic extension** when new values are encountered
### Schema Evolution
When using `flexEnum(schema, data)`:
1. Library traverses the schema structure
2. Identifies flexible enums (marked with special metadata)
3. Checks if data contains new enum values
4. Extends enums with new values while preserving structure
5. Maintains optional/nullable wrappers
```typescript
const schema = z.object({
  status: flexEnum(['pending', 'done']).optional(),
  nested: z.object({
    category: flexEnum(['a', 'b'])
  })
});
const data = {
  status: 'in_progress', // New value
  nested: { category: 'c' } // New nested value
};
// Result: schema with extended enums, status remains optional
const newSchema = flexEnum(schema, data);
```
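The traversal steps listed above can be sketched on a simplified schema model that does not depend on Zod. `SchemaNode` and `evolve` are hypothetical names used only for illustration; the real library works on Zod schema internals and metadata rather than on this tree.

```typescript
// Simplified stand-in for a schema tree (not a Zod type).
type SchemaNode =
  | { kind: 'flexEnum'; values: string[]; optional?: boolean }
  | { kind: 'object'; fields: Record<string, SchemaNode> }
  | { kind: 'other' };

// Walks schema and data together and returns a new tree with extended enums.
function evolve(node: SchemaNode, data: unknown): SchemaNode {
  if (node.kind === 'flexEnum') {
    // Steps 2-4: a flexible enum gains the value if it is a new string.
    if (typeof data === 'string' && node.values.indexOf(data) === -1) {
      // Step 5: spreading the node keeps wrapper flags such as `optional`.
      return { ...node, values: [...node.values, data] };
    }
    return node;
  }
  if (node.kind === 'object') {
    // Step 1: recurse into each field, pairing it with the matching data key.
    const record =
      typeof data === 'object' && data !== null ? (data as Record<string, unknown>) : {};
    const fields: Record<string, SchemaNode> = {};
    for (const key of Object.keys(node.fields)) {
      const child = node.fields[key];
      if (child) {
        fields[key] = evolve(child, record[key]);
      }
    }
    return { kind: 'object', fields };
  }
  return node; // Non-enum leaves are left untouched.
}

const before: SchemaNode = {
  kind: 'object',
  fields: {
    status: { kind: 'flexEnum', values: ['pending', 'done'], optional: true },
    nested: {
      kind: 'object',
      fields: { category: { kind: 'flexEnum', values: ['a', 'b'] } },
    },
  },
};
const after = evolve(before, { status: 'in_progress', nested: { category: 'c' } });
console.log(JSON.stringify(after, null, 2));
```

In the result, `status` lists `'in_progress'` and is still marked optional, and `nested.category` lists `'c'`; the input tree is not mutated.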
## TypeScript Support
Full TypeScript support with proper type inference:
```typescript
const schema = z.object({
  status: flexEnum(['pending', 'done'])
});
type SchemaType = z.infer<typeof schema>;
// Result: { status: "pending" | "done" | string }
// The union type allows both predefined and custom values; note that TypeScript
// itself simplifies a literal-or-string union like this to plain `string`
const validData1: SchemaType = { status: 'pending' }; // ✅ Known value
const validData2: SchemaType = { status: 'custom' }; // ✅ Unknown value
```
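Since TypeScript simplifies `"pending" | "done" | string` to plain `string`, editor autocomplete for the known values can be lost. A common general-purpose TypeScript idiom is to write the open part as `string & {}` in your own type alias. The sketch below is independent of `zod-enum-forge`; `Status`, `KNOWN_STATUSES`, and the two functions are hypothetical names.

```typescript
// Hypothetical alias: known literals stay visible, any other string is allowed.
type Status = 'pending' | 'done' | (string & {});

const KNOWN_STATUSES = ['pending', 'done'] as const;
type KnownStatus = (typeof KNOWN_STATUSES)[number];

// Runtime check that separates predefined values from newly discovered ones.
function isKnownStatus(value: string): value is KnownStatus {
  return (KNOWN_STATUSES as readonly string[]).indexOf(value) !== -1;
}

function describeStatus(value: Status): string {
  return isKnownStatus(value) ? `known: ${value}` : `discovered: ${value}`;
}

console.log(describeStatus('pending')); // "known: pending"
console.log(describeStatus('in_progress')); // "discovered: in_progress"
```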
## Error Handling
The library provides clear error messages:
```typescript
const schema = z.object({
  name: z.string() // Not an enum
});
// This will throw: 'Field "name" is not a ZodEnum.'
forgeEnum(schema, 'name', 'test');
```
## Project source and contributing
Source code is available on GitHub: [itsp-kybernetes/zod-enum-forge](https://github.com/itsp-kybernetes/zod-enum-forge)
Contributions are welcome! Please feel free to submit a Pull Request.
## License
BSD-2-Clause © Mariusz Żabiński (kybernetes.ngo)
## Keywords
- zod
- enum
- taxonomy
- open-set
- llm
- structured-output
- classification
- dynamic-schemas
- typescript
- schema-evolution