# codemodctl
CLI tool and utilities for workflow engine operations, file sharding, and codeowner analysis
## Installation

```bash
npm install codemodctl
```
## Quick Start

### CLI

```bash
codemodctl codeowner --shard-size 20 --state-prop shards --rule ./rule.yaml
```
### Sharding

```typescript
import { getShardForFilename, fitsInShard, distributeFilesAcrossShards, analyzeShardScaling } from 'codemodctl/sharding';
// Get the shard index for a specific file - always deterministic!
const shardIndex = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
// Same file + same shard count = same result, every time
const shard1 = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
const shard2 = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
console.log(shard1 === shard2); // always true
// Check if a file belongs to a specific shard
const belongsToShard = fitsInShard('src/components/Button.tsx', {
shardCount: 5,
shardIndex: 2
});
// Distribute all files across shards with consistent hashing
const files = ['file1.ts', 'file2.ts', 'file3.ts'];
const distribution = distributeFilesAcrossShards(files, 5);
// Check scaling behavior - minimal reassignment when growing
const scalingAnalysis = analyzeShardScaling(files, 5, 6);
console.log(`${scalingAnalysis.stableFiles} files stay in same shard`);
console.log(`${scalingAnalysis.reassignmentPercentage}% reassignment`); // Much less than 100%
```
### Codeowners

```typescript
import { analyzeCodeowners, findCodeownersFile } from 'codemodctl/codeowners';
// Analyze codeowners and generate shard configuration
const result = await analyzeCodeowners({
shardSize: 20,
rulePath: './rule.yaml',
projectRoot: process.cwd()
});
console.log(`Generated ${result.shards.length} shards for ${result.totalFiles} files`);
result.teams.forEach(team => {
console.log(`Team "${team.team}" owns ${team.fileCount} files`);
});
```
### Default Export

```typescript
import codemodctl from 'codemodctl';
// Access all utilities through the default export
const shardIndex = codemodctl.sharding.getShardForFilename('file.ts', { shardCount: 5 });
const analysis = await codemodctl.codeowners.analyzeCodeowners({
  shardSize: 20,
  rulePath: './rule.yaml',
  projectRoot: process.cwd()
});
```
## Features

### File Sharding

The sharding algorithm uses **consistent hashing** to ensure:
- **Perfect consistency**: Same file + same shard count = same result, always
- **No external dependencies**: Result depends only on filename and shard count
- **Minimal reassignment**: When scaling up, only ~20-40% of files move (not 100%)
- **Stable scaling**: Adding a shard leaves most existing file assignments in place instead of reshuffling every file
- **Simple API**: No complex parameters or configuration needed
- **Team-aware sharding**: Works with codeowner boundaries
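To make the consistent-hashing idea concrete, here is a minimal, self-contained sketch of a hash ring. It is not codemodctl's implementation: the hash (32-bit FNV-1a followed by MurmurHash3's `fmix32` finalizer), the 100 virtual nodes per shard, and the helper names `hash32`, `buildRing`, and `shardFor` are all assumptions for illustration. Each shard gets many points on a 32-bit ring, and a file belongs to the first point at or after its own hash, wrapping around at the end.

```typescript
// Hypothetical consistent-hash ring sketch; not codemodctl's internals.

/** 32-bit FNV-1a over UTF-16 code units, then MurmurHash3's fmix32 finalizer. */
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  // The finalizer spreads similar inputs (e.g. "file-1", "file-2") around the ring.
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit position
}

interface RingPoint {
  position: number;
  shard: number;
}

/** Place `virtualNodes` points per shard on the ring, sorted by position. */
function buildRing(shardCount: number, virtualNodes = 100): RingPoint[] {
  const points: RingPoint[] = [];
  for (let shard = 0; shard < shardCount; shard++) {
    for (let v = 0; v < virtualNodes; v++) {
      points.push({ position: hash32(`shard-${shard}#${v}`), shard });
    }
  }
  // Tie-break on shard index so the ordering is fully deterministic.
  return points.sort((a, b) => a.position - b.position || a.shard - b.shard);
}

/** Binary-search the first ring point at or after the file's hash. */
function shardFor(filename: string, ring: RingPoint[]): number {
  const h = hash32(filename);
  let lo = 0;
  let hi = ring.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (ring[mid].position < h) lo = mid + 1;
    else hi = mid;
  }
  return ring[lo % ring.length].shard; // wrap around past the last point
}

// Usage: assign 1,000 files to 5 shards, then grow to 6.
const files = Array.from({ length: 1000 }, (_, i) => `src/file-${i}.ts`);
const ring5 = buildRing(5);
const ring6 = buildRing(6);
const before = files.map((f) => shardFor(f, ring5));
const after = files.map((f) => shardFor(f, ring6));
const moved = files.filter((_, i) => before[i] !== after[i]).length;
console.log(`${((moved / files.length) * 100).toFixed(1)}% of files moved`);
```

Because growing from 5 to 6 shards only adds points to the ring, a file either keeps its old shard or moves to the new shard 5; it never hops between existing shards. For a well-mixed hash the printed percentage should land in the neighborhood of 1/6, though the exact figure depends on the ring layout.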
### Codeowner Analysis
- **Automatic CODEOWNERS detection**: Searches common locations (root, .github/, docs/)
- **AST-grep integration**: Analyze files using custom rules
- **Team-based grouping**: Groups files by their assigned teams
- **Shard generation**: Creates an optimal shard configuration based on team ownership
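The CODEOWNERS lookup can be sketched as follows. This is a hypothetical helper (`locateCodeowners`), not codemodctl's `findCodeownersFile`; it checks the three locations GitHub recognizes in the order GitHub documents (`.github/`, root, `docs/`), and whether codemodctl uses the same order is an unverified assumption. The `exists` parameter is an illustrative hook so the lookup can be exercised without touching the disk.

```typescript
// Hypothetical sketch, not codemodctl's findCodeownersFile.
import * as fs from "node:fs";
import * as path from "node:path";

// Candidate locations in GitHub's documented lookup order.
const CODEOWNERS_CANDIDATES = [
  path.join(".github", "CODEOWNERS"),
  "CODEOWNERS",
  path.join("docs", "CODEOWNERS"),
];

function locateCodeowners(
  projectRoot: string = process.cwd(),
  explicitPath?: string,
  exists: (p: string) => boolean = fs.existsSync,
): string | null {
  // An explicit path (compare the CLI's --codeowners flag) takes precedence.
  if (explicitPath !== undefined) {
    const resolved = path.resolve(projectRoot, explicitPath);
    return exists(resolved) ? resolved : null;
  }
  // Otherwise return the first conventional location that exists.
  for (const candidate of CODEOWNERS_CANDIDATES) {
    const fullPath = path.join(projectRoot, candidate);
    if (exists(fullPath)) return fullPath;
  }
  return null;
}

console.log(locateCodeowners() ?? "no CODEOWNERS file found");
```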
## API Reference
### Sharding Functions
- `getShardForFilename(filename, { shardCount })` - Get shard index for a file
- `fitsInShard(filename, { shardCount, shardIndex })` - Check shard membership
- `distributeFilesAcrossShards(files, shardCount)` - Distribute files across shards
- `calculateOptimalShardCount(totalFiles, targetShardSize)` - Calculate optimal shard count
- `getFileHashPosition(filename)` - Get consistent hash position for a file
- `analyzeShardScaling(files, oldCount, newCount)` - Analyze reassignment when scaling
All functions are deterministic: same input always produces the same output.
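`calculateOptimalShardCount` is described here only by its signature. Under the assumption that it rounds `totalFiles / targetShardSize` up and never returns fewer than one shard, it reduces to the hypothetical helper below (`optimalShardCount` is an illustrative name); check the package source before relying on this rule.

```typescript
// Hypothetical sketch; the rounding rule (ceil, at least one shard) is an assumption.
function optimalShardCount(totalFiles: number, targetShardSize: number): number {
  if (!Number.isInteger(targetShardSize) || targetShardSize <= 0) {
    throw new RangeError("targetShardSize must be a positive integer");
  }
  // Round up so the average shard size stays at or below the target.
  return Math.max(1, Math.ceil(totalFiles / targetShardSize));
}

console.log(optimalShardCount(131, 20)); // 7
```

Hash-based assignment does not balance shards exactly, so individual shards can still exceed the target even when the average does not.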
**Scaling behavior**: When going from N to N+1 shards, typically only 20-40% of files get reassigned to new locations, making it ideal for incremental scaling scenarios.
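The gap between consistent hashing and naive modulo assignment follows from simple arithmetic. Assuming a uniformly distributed hash and perfectly even ring arcs (an idealization: any concrete ring deviates because arc lengths vary, and these figures are not measurements of codemodctl), the expected fraction of files that move when going from $N$ to $N+1$ shards is:

```latex
% Consistent hashing: only files on the arcs taken over by the new shard move.
P_{\text{move}}^{\text{ring}} = \frac{1}{N+1}, \qquad N = 5:\ \frac{1}{6} \approx 16.7\%

% Modulo hashing (h mod N): a file stays only if h mod N = h mod (N+1).
% Over one period of N(N+1) hash values, the Chinese remainder theorem gives
% exactly N such values (one per common residue 0, ..., N-1), so
P_{\text{move}}^{\text{mod}} = 1 - \frac{N}{N(N+1)} = \frac{N}{N+1}, \qquad N = 5:\ \frac{5}{6} \approx 83.3\%
```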
### Codeowner Functions
- `analyzeCodeowners(options)` - Complete analysis with shard generation
- `findCodeownersFile(projectRoot?, explicitPath?)` - Locate CODEOWNERS file
- `loadAstGrepRule(rulePath)` - Parse AST-grep rule from YAML
- `analyzeFilesByOwner(codeownersPath, rule, projectRoot?)` - Group files by owner
- `generateShards(filesByOwner, shardSize)` - Generate shard configuration
- `normalizeOwnerName(owner)` - Normalize owner names
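One plausible reading of team-aware shard generation is chunking each team's file list separately, so that no shard mixes owners. The sketch below is hypothetical: `chunkShardsByTeam`, the `Shard` shape, and the `shardId` naming are illustrative assumptions, not the structure `generateShards` actually returns.

```typescript
// Hypothetical sketch; the Shard shape and chunking rule are assumptions.
interface Shard {
  team: string;
  shardId: string;
  files: string[];
}

function chunkShardsByTeam(filesByOwner: Map<string, string[]>, shardSize: number): Shard[] {
  if (!Number.isInteger(shardSize) || shardSize <= 0) {
    throw new RangeError("shardSize must be a positive integer");
  }
  const shards: Shard[] = [];
  // Sort teams and files so the output is identical across runs.
  for (const team of Array.from(filesByOwner.keys()).sort()) {
    const files = (filesByOwner.get(team) ?? []).slice().sort();
    // Each team is split into chunks of at most shardSize files.
    for (let start = 0; start < files.length; start += shardSize) {
      shards.push({
        team,
        shardId: `${team}-${start / shardSize}`,
        files: files.slice(start, start + shardSize),
      });
    }
  }
  return shards;
}

const filesByOwner = new Map<string, string[]>([
  ["frontend", Array.from({ length: 45 }, (_, i) => `web/page-${i}.tsx`)],
  ["backend", ["api/users.ts", "api/orders.ts"]],
]);
const shards = chunkShardsByTeam(filesByOwner, 20);
console.log(shards.map((s) => `${s.shardId}: ${s.files.length} files`));
// 45 frontend files -> 20 + 20 + 5; 2 backend files -> 2
```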
## Usage Examples
### Simple Deterministic Sharding
```typescript
import { getShardForFilename, distributeFilesAcrossShards, analyzeShardScaling } from 'codemodctl/sharding';
// Get shard for a file - always deterministic
const shard = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
// Same input always gives same output
const shard1 = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
const shard2 = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
console.log(shard1 === shard2); // always true
// Different shard counts give different results (expected behavior)
const shard5 = getShardForFilename('src/components/Button.tsx', { shardCount: 5 });
const shard10 = getShardForFilename('src/components/Button.tsx', { shardCount: 10 });
// shard5 and shard10 will likely be different, but each is consistent
// Distribute files with consistent hashing for stable scaling
const files = ['file1.ts', 'file2.ts', 'file3.ts'];
const distribution = distributeFilesAcrossShards(files, 5);
// When you need more capacity, add a shard: most files stay in place
const analysis = analyzeShardScaling(files, 5, 6);
// Only ~20-40% of files get reassigned, not all of them!
```
- **No complex parameters**: Just filename and shard count
- **Perfectly deterministic**: Same input = same output, always
- **Stable scaling**: When adding shards, most files stay in their original shards
- **Minimal reassignment**: Only ~20-40% of files move when scaling up
- **Fast and simple**: Hash-based assignment with consistent ring placement
- **Works across runs**: A file gets the same shard on every run, regardless of which other files exist in the repository
### `codemodctl codeowner`

Analyze a CODEOWNERS file and generate a sharding configuration.
```bash
codemodctl codeowner [options]
Options:
  -s, --shard-size <size>   Number of files per shard (required)
  -p, --state-prop <prop>   Property name for state output (required)
  -c, --codeowners <path>   Path to CODEOWNERS file (optional)
  -r, --rule <path>         Path to AST-grep rule file (required)
```
Environment variables:
- `STATE_OUTPUTS`: Path to write state output file
## License
MIT