@astermind/astermind-pro
Astermind Pro - Premium ML Toolkit with Advanced RAG, Reranking, Summarization, and Information Flow Analysis
# Astermind Pro Quick Reference
Quick reference for common operations and patterns.
## Common Imports
```typescript
// Math
import { cosine, l2, normalizeL2, softmax, sigmoid } from '@astermind/astermind-pro';
import { ridgeSolvePro, OnlineRidge } from '@astermind/astermind-pro';
import { buildRFF, mapRFF } from '@astermind/astermind-pro';
// Retrieval (NEW - reusable outside workers!)
import {
  tokenize, expandQuery, toTfidf, hybridRetrieve, buildIndex,
  parseMarkdownToSections, flattenSections
} from '@astermind/astermind-pro';
// RAG
import { omegaComposeAnswer } from '@astermind/astermind-pro';
// Reranking
import { rerank, rerankAndFilter, filterMMR } from '@astermind/astermind-pro';
// Summarization
import { summarizeDeterministic } from '@astermind/astermind-pro';
// Information Flow
import { TransferEntropy, InfoFlowGraph, TEController } from '@astermind/astermind-pro';
// Auto-tuning (NEW - reusable!)
import { autoTune, sampleQueriesFromCorpus } from '@astermind/astermind-pro';
// Model serialization (NEW - reusable!)
import { exportModel, importModel } from '@astermind/astermind-pro';
```
## Common Patterns
### Basic Reranking
```typescript
const results = rerankAndFilter(query, chunks, {
  lambdaRidge: 1e-2,
  probThresh: 0.45,
  useMMR: true,
  budgetChars: 1200
});
```
### Basic Summarization
```typescript
const summary = summarizeDeterministic(query, chunks, {
  maxAnswerChars: 1000,
  includeCitations: true
});
```
### Cosine Similarity
```typescript
const similarity = cosine(vec1, vec2);
```
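For intuition, cosine similarity is the dot product divided by the product of the two vectors' L2 norms. The sketch below re-implements that definition from scratch; `cosineSketch` and `normalizeL2Sketch` are illustrative names, not the package's exports, and the library's handling of zero vectors may differ from the choice made here.

```typescript
// Illustrative re-implementation of cosine similarity; not the package's source.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Scale a vector to unit L2 norm (zero vectors are returned unchanged).
function normalizeL2Sketch(v: number[]): number[] {
  const n = Math.sqrt(dot(v, v));
  return n === 0 ? v.slice() : v.map((x) => x / n);
}

// cos(a, b) = a·b / (‖a‖ ‖b‖); this sketch returns 0 when either vector is all zeros.
function cosineSketch(a: number[], b: number[]): number {
  const na = Math.sqrt(dot(a, a));
  const nb = Math.sqrt(dot(b, b));
  if (na === 0 || nb === 0) return 0;
  return dot(a, b) / (na * nb);
}

console.log(cosineSketch([1, 2], [2, 4])); // parallel vectors: ≈ 1
console.log(cosineSketch([1, 0], [0, 1])); // orthogonal vectors: 0
```

If both inputs are already L2-normalized, cosine similarity reduces to a plain dot product, which is why normalizing once up front is a common optimization.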
### Online Learning
```typescript
const ridge = new OnlineRidge(64, 1, 1e-3);
ridge.update(features, target);
const prediction = ridge.predict(newFeatures);
```
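The `OnlineRidge(64, 1, 1e-3)` arguments look like (input dimension, output dimension, regularization), but that reading is an assumption. One standard way to update ridge regression one sample at a time is recursive least squares: keep P = (λI + XᵀX)⁻¹ and update it with the Sherman-Morrison identity. The hypothetical `RlsRidge` class below sketches that idea for a single output; it is not claimed to be how `OnlineRidge` works internally.

```typescript
// Hypothetical single-output online ridge regression via recursive least squares.
// Maintains P = (λI + XᵀX)^-1 and weights w, updated with the Sherman-Morrison identity.
class RlsRidge {
  private w: number[];
  private P: number[][];

  constructor(private dim: number, lambda: number) {
    this.w = new Array<number>(dim).fill(0);
    // P starts as (λI)^-1 = I / λ.
    this.P = Array.from({ length: dim }, (_, i) =>
      Array.from({ length: dim }, (_, j) => (i === j ? 1 / lambda : 0)),
    );
  }

  predict(x: number[]): number {
    let y = 0;
    for (let i = 0; i < this.dim; i++) y += this.w[i] * x[i];
    return y;
  }

  update(x: number[], y: number): void {
    const d = this.dim;
    // Px = P x
    const Px = new Array<number>(d).fill(0);
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) Px[i] += this.P[i][j] * x[j];
    // denom = 1 + xᵀ P x
    let denom = 1;
    for (let i = 0; i < d; i++) denom += x[i] * Px[i];
    // Gain k = P x / denom; the error uses the weights from before this update.
    const err = y - this.predict(x);
    for (let i = 0; i < d; i++) this.w[i] += (Px[i] / denom) * err;
    // P ← P − (P x)(P x)ᵀ / denom  (P is symmetric, so xᵀP = (Px)ᵀ)
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) this.P[i][j] -= (Px[i] * Px[j]) / denom;
  }
}

// Usage: learn y = 2*x0 - x1 from a few samples, one at a time.
const model = new RlsRidge(2, 1e-3);
for (const [x0, x1] of [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]) model.update([x0, x1], 2 * x0 - x1);
console.log(model.predict([3, 2])); // close to 4 (slightly shrunk by the ridge term)
```

Each update costs O(d²) instead of refitting from scratch, which is the main appeal of an online solver for streaming feedback.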
### Transfer Entropy Monitoring
```typescript
const graph = new InfoFlowGraph({ window: 256 });
graph.get('ChannelName').push(x, y);
const snapshot = graph.snapshot();
```
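Transfer entropy TE(X→Y) measures how much the past of X reduces uncertainty about the next value of Y beyond what Y's own past already explains. For jointly Gaussian signals it equals half the log ratio of two regression residual variances, TE = ½ ln(Var[yₜ | yₜ₋₁] / Var[yₜ | yₜ₋₁, xₜ₋₁]). The `ridge` and `condLags` parameters listed under Parameter Ranges suggest a regression-based estimator, but that is an assumption; the sketch below uses one lag and a tiny ridge, and `transferEntropy` is a hypothetical helper, not the package's `TransferEntropy` class.

```typescript
// Gaussian (linear) transfer entropy sketch with one lag; not the package's estimator.
// TE(X→Y) = 0.5 * ln( Var[y_t | y_{t-1}] / Var[y_t | y_{t-1}, x_{t-1}] )

// Solve (A + ridge*I) w = b by Gaussian elimination with partial pivoting.
function solve(A: number[][], b: number[], ridge: number): number[] {
  const n = b.length;
  const M = A.map((row, i) => [...row.map((v, j) => v + (i === j ? ridge : 0)), b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    [M[c], M[p]] = [M[p], M[c]];
    for (let r = c + 1; r < n; r++) {
      const f = M[r][c] / M[c][c];
      for (let k = c; k <= n; k++) M[r][k] -= f * M[c][k];
    }
  }
  const w = new Array<number>(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let s = M[r][n];
    for (let k = r + 1; k < n; k++) s -= M[r][k] * w[k];
    w[r] = s / M[r][r];
  }
  return w;
}

// Mean squared residual of a ridge regression of y on the given (centered) columns.
function residualVar(y: number[], cols: number[][], ridge: number): number {
  const k = cols.length, n = y.length;
  const A = cols.map((ci) => cols.map((cj) => ci.reduce((s, v, t) => s + v * cj[t], 0)));
  const b = cols.map((ci) => ci.reduce((s, v, t) => s + v * y[t], 0));
  const w = solve(A, b, ridge);
  let sse = 0;
  for (let t = 0; t < n; t++) {
    let pred = 0;
    for (let j = 0; j < k; j++) pred += w[j] * cols[j][t];
    sse += (y[t] - pred) ** 2;
  }
  return sse / n;
}

// Subtract the mean so the regressions need no intercept term.
function center(v: number[]): number[] {
  const m = v.reduce((a, b) => a + b, 0) / v.length;
  return v.map((x) => x - m);
}

function transferEntropy(x: number[], y: number[], ridge = 1e-6): number {
  const yNext = center(y.slice(1));
  const yPast = center(y.slice(0, -1));
  const xPast = center(x.slice(0, -1));
  const restricted = residualVar(yNext, [yPast], ridge); // Y's own past only
  const full = residualVar(yNext, [yPast, xPast], ridge); // plus X's past
  return 0.5 * Math.log(restricted / full);
}
```

If Y is driven by X's previous value, `transferEntropy(x, y)` comes out clearly positive while `transferEntropy(y, x)` stays near zero; that asymmetry is what makes TE useful as a directed information-flow signal.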
### Building an Index
```typescript
const index = buildIndex({
  chunks: yourDocuments,
  vocab: 10000,
  landmarks: 256,
  headingW: 2.0,
  useStem: true,
  kernel: 'rbf',
  sigma: 1.0
});
```
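The sparse half of an index like this is typically a TF-IDF matrix. The sketch below shows one plausible way the `vocab` cap and `headingW` multiplier could enter: heading tokens are counted `headingW` times, only the `vocab` most frequent terms are kept, and each document vector is L2-normalized. `buildTfidf`, the smoothed IDF formula, and the tokenizer are all assumptions for illustration; `buildIndex`'s actual weighting is not documented here.

```typescript
// Hypothetical TF-IDF index sketch; buildIndex's actual weighting may differ.
type Doc = { heading: string; content: string };

const toks = (s: string) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function buildTfidf(docs: Doc[], headingW: number, vocab: number) {
  // Term frequencies, with heading tokens counted headingW times.
  const tfs = docs.map((d) => {
    const tf = new Map<string, number>();
    for (const t of toks(d.heading)) tf.set(t, (tf.get(t) ?? 0) + headingW);
    for (const t of toks(d.content)) tf.set(t, (tf.get(t) ?? 0) + 1);
    return tf;
  });
  // Document frequency per term.
  const df = new Map<string, number>();
  for (const tf of tfs) for (const t of tf.keys()) df.set(t, (df.get(t) ?? 0) + 1);
  // Keep the `vocab` most frequent terms (ties broken alphabetically) and assign column ids.
  const terms = [...df.keys()]
    .sort((a, b) => df.get(b)! - df.get(a)! || a.localeCompare(b))
    .slice(0, vocab);
  const vocabMap = new Map(terms.map((t, i) => [t, i] as [string, number]));
  // Smoothed IDF: ln((N + 1) / (df + 1)) + 1.
  const N = docs.length;
  const idf = terms.map((t) => Math.log((N + 1) / (df.get(t)! + 1)) + 1);
  // Sparse, L2-normalized TF-IDF vectors as Map<columnId, weight>.
  const tfidfDocs = tfs.map((tf) => {
    const v = new Map<number, number>();
    for (const [t, c] of tf) {
      const id = vocabMap.get(t);
      if (id !== undefined) v.set(id, c * idf[id]);
    }
    const norm = Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0)) || 1;
    for (const [id, w] of v) v.set(id, w / norm);
    return v;
  });
  return { vocabMap, idf, tfidfDocs };
}
```

Terms that appear in every document get the lowest IDF, so a shared word like "api" contributes less to similarity than a distinguishing word like "map".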
### Hybrid Retrieval
```typescript
const retrieved = hybridRetrieve({
  query: 'your query',
  chunks: yourDocuments,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  denseDocs: index.denseDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  vocabSize: index.vocabMap.size,
  kernel: 'rbf',
  sigma: 1.0,
  alpha: 0.7,
  beta: 0.1,
  ridge: 0.08,
  headingW: 2.0,
  useStem: true,
  expandQuery: false,
  topK: 10
});
```
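Under Parameter Ranges, `alpha` is described as the dense/sparse mix (higher = more dense) and `beta` as a keyword bonus weight. A linear fusion like the one sketched below is one plausible reading of those two knobs; `fuseScores` is hypothetical and `hybridRetrieve`'s exact formula is not documented here.

```typescript
// Hypothetical score fusion for hybrid retrieval; hybridRetrieve's exact formula may differ.
type Scored = { id: number; score: number };

function fuseScores(
  dense: number[],   // e.g. similarity in the kernel feature space, one entry per doc
  sparse: number[],  // e.g. TF-IDF cosine similarity, one entry per doc
  keyword: number[], // e.g. fraction of query terms present in the doc, in [0, 1]
  alpha: number,     // weight on dense vs sparse (higher = more dense)
  beta: number,      // keyword bonus weight
  topK: number,
): Scored[] {
  return dense
    .map((d, id) => ({ id, score: alpha * d + (1 - alpha) * sparse[id] + beta * keyword[id] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// With alpha = 0.7, a doc that scores well on dense similarity wins;
// lowering alpha lets the strong sparse/keyword match (doc 1) rise to the top.
const dense = [0.9, 0.2, 0.5], sparse = [0.1, 0.8, 0.5], keyword = [0, 1, 0];
console.log(fuseScores(dense, sparse, keyword, 0.7, 0.1, 2).map((s) => s.id));
console.log(fuseScores(dense, sparse, keyword, 0.2, 0.1, 1).map((s) => s.id));
```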
### Tokenization
```typescript
const tokens = tokenize('Hello world', true); // with stemming
const expanded = expandQuery('map'); // expands query terms
```
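The package's stemming and expansion rules are not documented here, so the toy versions below only show the shape of the two operations: lowercase and split into alphanumeric tokens, optionally strip a few common suffixes, and append related terms from a synonym table. `stem`, `tokenizeSketch`, `expandQuerySketch`, and the `SYNONYMS` entries are all hypothetical.

```typescript
// Toy tokenizer with a crude suffix-stripping "stemmer"; the package's rules may differ.
function stem(t: string): string {
  if (t.length > 5 && t.endsWith("ing")) return t.slice(0, -3);
  if (t.length > 4 && t.endsWith("ed")) return t.slice(0, -2);
  if (t.length > 3 && t.endsWith("s") && !t.endsWith("ss")) return t.slice(0, -1);
  return t;
}

function tokenizeSketch(text: string, useStem: boolean): string[] {
  const raw = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return useStem ? raw.map(stem) : raw;
}

// Hypothetical synonym table for query expansion (a Map avoids prototype-key surprises).
const SYNONYMS = new Map<string, string[]>([["map", ["dictionary", "hashmap"]]]);

function expandQuerySketch(q: string): string {
  const extra = tokenizeSketch(q, false).flatMap((t) => SYNONYMS.get(t) ?? []);
  return [q, ...extra].join(" ");
}

console.log(tokenizeSketch("Loading maps", true)); // [ 'load', 'map' ]
console.log(expandQuerySketch("map"));             // map dictionary hashmap
```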
### Markdown Parsing
```typescript
const root = parseMarkdownToSections(markdownText);
const chunks = flattenSections(root);
```
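`parseMarkdownToSections` returns a section tree and `flattenSections` turns it into chunks. The sketch below shows one way such a two-step parse can work for ATX headings (`#` to `######`): a stack tracks the open sections so each heading nests under the nearest shallower one, and lines inside fenced code blocks are never treated as headings. `parseSections`, `flattenSketch`, and the `Section` shape are hypothetical; text before the first heading is kept on the root and dropped by the flattener here.

```typescript
// Hypothetical sketch: parse ATX headings into a section tree, then flatten it.
type Section = { heading: string; level: number; content: string; children: Section[] };
type FlatChunk = { heading: string; content: string; level: number };

const FENCE = "`".repeat(3); // a code-fence marker

function parseSections(md: string): Section {
  const root: Section = { heading: "", level: 0, content: "", children: [] };
  const stack: Section[] = [root];
  let inFence = false;
  for (const line of md.split(/\r?\n/)) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;
    const m = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      const sec: Section = { heading: m[2].trim(), level: m[1].length, content: "", children: [] };
      // Pop until the top of the stack is shallower, then nest the new section under it.
      while (stack[stack.length - 1].level >= sec.level) stack.pop();
      stack[stack.length - 1].children.push(sec);
      stack.push(sec);
    } else {
      const cur = stack[stack.length - 1];
      cur.content += (cur.content ? "\n" : "") + line;
    }
  }
  return root;
}

// Depth-first flatten: each section becomes one chunk, parents before children.
function flattenSketch(sec: Section, out: FlatChunk[] = []): FlatChunk[] {
  for (const child of sec.children) {
    out.push({ heading: child.heading, content: child.content.trim(), level: child.level });
    flattenSketch(child, out);
  }
  return out;
}
```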
### Auto-Tuning
```typescript
const result = await autoTune({
  chunks: yourDocuments,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  vocabSize: index.vocabMap.size,
  budget: 40,
  sampleQueries: 24,
  currentSettings: currentSettings
}, (trial, best, note) => {
  console.log(`Trial ${trial}: ${best} (${note})`);
});
```
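`autoTune` takes a trial `budget` and reports progress as `(trial, best, note)`. Its search strategy and objective are not documented here, so the sketch below uses plain seeded random search as a stand-in: sample each parameter uniformly from its range, keep the best-scoring candidate, and call the progress callback once per trial. `randomSearch`, `Params`, and the objective are hypothetical.

```typescript
// Hypothetical random-search tuner; autoTune's actual strategy and objective may differ.
type Params = { alpha: number; beta: number; sigma: number };
type Range = { [K in keyof Params]: [number, number] };

function randomSearch(
  objective: (p: Params) => number, // higher is better, e.g. mean retrieval quality on sampled queries
  ranges: Range,
  start: Params,
  budget: number,
  onProgress?: (trial: number, best: number, note: string) => void,
  seed = 1,
): { best: Params; score: number } {
  // Park-Miller PRNG so runs are reproducible (seed must be non-zero).
  let s = seed;
  const rnd = () => (s = (s * 16807) % 2147483647) / 2147483647;
  let best = start;
  let bestScore = objective(start);
  for (let trial = 1; trial <= budget; trial++) {
    const cand = {} as Params;
    for (const k of Object.keys(ranges) as (keyof Params)[]) {
      const [lo, hi] = ranges[k];
      cand[k] = lo + rnd() * (hi - lo);
    }
    const score = objective(cand);
    const improved = score > bestScore;
    if (improved) {
      best = cand;
      bestScore = score;
    }
    onProgress?.(trial, bestScore, improved ? "improved" : "no change");
  }
  return { best, score: bestScore };
}
```

Because the starting settings are scored first, the result is never worse than the configuration you pass in, which matches the intent of supplying `currentSettings`.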
### Model Serialization
```typescript
// Export
const model = exportModel({
  settings: yourSettings,
  vocabMap: index.vocabMap,
  idf: index.idf,
  chunks: yourDocuments,
  tfidfDocs: index.tfidfDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  denseDocs: index.denseDocs
});
// Import
// Note: buildDenseDocs is not in the import list above; it is assumed to be a helper
// that rebuilds the dense document vectors from the TF-IDF docs and landmark matrix.
const imported = importModel(model, {
  buildDense: (tfidfDocs, vocabSize, landmarkMat, kernel, sigma) =>
    buildDenseDocs(tfidfDocs, vocabSize, landmarkMat, kernel, sigma)
});
```
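Index state usually contains a `Map` (the vocabulary) and numeric arrays, neither of which survives a naive `JSON.stringify` round trip. The sketch below shows one way to serialize such state with an explicit version field; `toJSONString`, `fromJSONString`, and `IndexState` are hypothetical, and `exportModel`'s actual format is not documented here.

```typescript
// Hypothetical serialization helpers; exportModel's actual format may differ.
type IndexState = {
  vocabMap: Map<string, number>;
  idf: Float64Array;
  settings: Record<string, unknown>;
};

function toJSONString(state: IndexState): string {
  return JSON.stringify({
    version: 1,
    settings: state.settings,
    vocab: [...state.vocabMap.entries()], // Map → array of [term, id] pairs
    idf: Array.from(state.idf),           // typed array → plain array
  });
}

function fromJSONString(json: string): IndexState {
  const raw = JSON.parse(json);
  if (raw.version !== 1) throw new Error(`unsupported model version ${raw.version}`);
  return {
    settings: raw.settings,
    vocabMap: new Map<string, number>(raw.vocab),
    idf: Float64Array.from(raw.idf as number[]),
  };
}
```

Derived data such as dense vectors can be left out of the payload and rebuilt on load, which is the role the `buildDense` callback plays in the `importModel` call above.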
## Parameter Ranges
### Reranking
- `lambdaRidge`: 1e-3 to 1e-1 (lower = less regularization)
- `probThresh`: 0.3 to 0.7 (higher = more selective)
- `mmrLambda`: 0.4 to 0.9 (higher = more diversity)
- `budgetChars`: 600 to 5000 (character budget for the selected chunks)
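One plausible way `probThresh` and `budgetChars` interact is a greedy pass: take chunks in descending order of relevance probability, stop once probabilities fall below the threshold, and skip any chunk that would overflow the character budget. The hypothetical `filterByProbAndBudget` below sketches that; `rerankAndFilter`'s exact selection rules may differ.

```typescript
// Hypothetical post-filter: keep chunks above a probability threshold, best first,
// until a character budget is used up. rerankAndFilter's exact rules may differ.
type Candidate = { content: string; p_relevant: number };

function filterByProbAndBudget<T extends Candidate>(
  items: T[],
  probThresh: number,
  budgetChars: number,
): T[] {
  const out: T[] = [];
  let used = 0;
  for (const it of [...items].sort((a, b) => b.p_relevant - a.p_relevant)) {
    if (it.p_relevant < probThresh) break;                // sorted, so the rest are lower too
    if (used + it.content.length > budgetChars) continue; // skip chunks that would overflow
    out.push(it);
    used += it.content.length;
  }
  return out;
}
```

Raising `probThresh` shrinks the candidate pool (more selective), while `budgetChars` caps how much text is passed downstream regardless of how many chunks qualify.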
### Summarization
- `maxAnswerChars`: 500 to 3000 (maximum answer length in characters)
- `queryWeight`: 0.3 to 0.6 (query alignment)
- `teWeight`: 0.1 to 0.3 (transfer entropy)
- `codeBonus`: 0.0 to 0.15 (code preference)
### Transfer Entropy
- `window`: 64 to 512 (sample window)
- `condLags`: 1 to 3 (conditioning lags)
- `ridge`: 1e-6 to 1e-3 (regularization)
### Retrieval
- `alpha`: 0.4 to 0.98 (dense/sparse mix, higher = more dense)
- `beta`: 0.0 to 0.4 (keyword bonus weight)
- `ridge`: 0.02 to 0.18 (regularization)
- `sigma`: 0.12 to 1.0 (kernel bandwidth)
- `landmarks`: 128 to 384 (Nyström landmarks)
- `vocab`: 8000 to 15000 (vocabulary size)
- `headingW`: 1.5 to 4.5 (heading weight multiplier)
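`sigma` controls how quickly kernel similarity decays with distance, and `landmarks` sets how many reference points a Nyström-style approximation compares each document against. The sketch below shows an RBF kernel and the raw landmark-similarity features; whether the package divides by 2σ² or σ², and how it whitens these features, are assumptions. A full Nyström map would also multiply by the inverse square root of the landmark Gram matrix, which is omitted here.

```typescript
// RBF kernel with bandwidth sigma: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
// Whether the package uses 2σ² or σ² in the denominator is an assumption.
function rbf(x: number[], y: number[], sigma: number): number {
  let d2 = 0;
  for (let i = 0; i < x.length; i++) d2 += (x[i] - y[i]) ** 2;
  return Math.exp(-d2 / (2 * sigma * sigma));
}

// Similarities of one vector to each landmark: the raw input to a Nyström-style map.
// (A full Nyström map would also multiply by K_mm^{-1/2}; omitted in this sketch.)
function landmarkFeatures(x: number[], landmarks: number[][], sigma: number): number[] {
  return landmarks.map((l) => rbf(x, l, sigma));
}
```

A small `sigma` makes similarity fall off sharply, so only near neighbours score highly; a large `sigma` smooths scores across the corpus.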
## Type Definitions
```typescript
type Chunk = {
  heading: string;
  content: string;
  rich?: string;
  level?: number;
  secId?: number;
  score_base?: number;
};
type ScoredChunk = Chunk & {
  score_rr: number;
  p_relevant: number;
  _features?: number[];
  _feature_names?: string[];
};
type RerankOptions = {
  lambdaRidge?: number;
  useMMR?: boolean;
  mmrLambda?: number;
  probThresh?: number;
  epsilonTop?: number;
  budgetChars?: number;
  randomProjDim?: number;
  exposeFeatures?: boolean;
  attachFeatureNames?: boolean;
};
type SumOptions = {
  maxAnswerChars?: number;
  maxBullets?: number;
  preferCode?: boolean;
  includeCitations?: boolean;
  teWeight?: number;
  queryWeight?: number;
  evidenceWeight?: number;
  rrWeight?: number;
  codeBonus?: number;
  headingBonus?: number;
  jaccardDedupThreshold?: number;
  allowOffTopic?: boolean;
  minQuerySimForCode?: number;
  maxSectionsInAnswer?: number;
};
```
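`SumOptions` includes a `jaccardDedupThreshold`. Jaccard similarity of two token sets is |A ∩ B| / |A ∪ B|, and a common use is dropping near-duplicate sentences. The hypothetical `dedupSentences` below sketches that: a sentence is skipped if its similarity to any already-kept sentence reaches the threshold. How the summarizer tokenizes and applies the threshold is an assumption.

```typescript
// Hypothetical near-duplicate filter in the spirit of SumOptions.jaccardDedupThreshold.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const t of a) if (b.has(t)) inter++;
  return inter / (a.size + b.size - inter); // |A ∩ B| / |A ∪ B|
}

function dedupSentences(sentences: string[], threshold: number): string[] {
  const kept: { text: string; toks: Set<string> }[] = [];
  for (const s of sentences) {
    const toks = new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
    // Keep the sentence only if it is below the threshold against everything kept so far.
    if (kept.every((k) => jaccard(k.toks, toks) < threshold)) kept.push({ text: s, toks });
  }
  return kept.map((k) => k.text);
}
```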
## Common Workflows
### 1. Complete Retrieval Pipeline (Outside Workers)
```typescript
// Build index
const index = buildIndex({
  chunks: documents,
  vocab: 10000,
  landmarks: 256,
  headingW: 2.0,
  useStem: true,
  kernel: 'rbf',
  sigma: 1.0
});
// Retrieve
const retrieved = hybridRetrieve({
  query: query,
  chunks: documents,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  denseDocs: index.denseDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  vocabSize: index.vocabMap.size,
  kernel: 'rbf',
  sigma: 1.0,
  alpha: 0.7,
  beta: 0.1,
  ridge: 0.08,
  headingW: 2.0,
  useStem: true,
  expandQuery: false,
  topK: 10
});
// Rerank and summarize
const reranked = rerankAndFilter(query, retrieved.items);
const answer = summarizeDeterministic(query, reranked);
```
### 2. Simple Q&A (Using Pre-built Index)
```typescript
const reranked = rerankAndFilter(query, docs);
const answer = summarizeDeterministic(query, reranked);
```
### 3. Code Search
```typescript
const reranked = rerankAndFilter(query, codeChunks, {
  probThresh: 0.4,
  budgetChars: 3000
});
const summary = summarizeDeterministic(query, reranked, {
  preferCode: true,
  codeBonus: 0.15
});
```
### 4. High Precision
```typescript
const reranked = rerankAndFilter(query, docs, {
  probThresh: 0.6,
  lambdaRidge: 1e-3
});
const summary = summarizeDeterministic(query, reranked, {
  allowOffTopic: false,
  minQuerySimForCode: 0.5
});
```
### 5. High Diversity
```typescript
const reranked = rerankAndFilter(query, docs, {
  useMMR: true,
  mmrLambda: 0.8,
  budgetChars: 2000
});
```
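Maximal Marginal Relevance (MMR) picks results greedily, trading each candidate's relevance against its similarity to what has already been picked. The hypothetical `mmrSelect` below sketches that loop with a `diversity` weight on the redundancy penalty, so higher means more diverse, matching how `mmrLambda` is described under Parameter Ranges. In the classic Carbonell and Goldstein formulation λ multiplies the relevance term instead, so check which convention your version of `mmrLambda` follows.

```typescript
// Greedy Maximal Marginal Relevance selection (sketch).
// `diversity` weights the redundancy penalty: higher = more diverse results.
function mmrSelect(
  relevance: number[],                   // relevance score per candidate
  sim: (i: number, j: number) => number, // similarity between candidates i and j
  k: number,
  diversity: number,
): number[] {
  const selected: number[] = [];
  const remaining = new Set(relevance.map((_, i) => i));
  while (selected.length < k && remaining.size > 0) {
    let bestIdx = -1;
    let bestScore = -Infinity;
    for (const i of remaining) {
      // Redundancy = similarity to the closest already-selected candidate.
      const redundancy = selected.length ? Math.max(...selected.map((j) => sim(i, j))) : 0;
      const score = (1 - diversity) * relevance[i] - diversity * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(bestIdx);
    remaining.delete(bestIdx);
  }
  return selected;
}
```

With two near-duplicate top hits, a low `diversity` returns both, while a high `diversity` replaces the second duplicate with a less relevant but different chunk.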
## Performance Tips
- Use `prod-worker` for inference-only deployments
- Cache reranking results for repeated queries
- Batch queries where possible
- Reduce `randomProjDim` to speed up reranking
- Use `OnlineRidge` for incremental updates instead of refitting from scratch
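Caching reranking results only pays off if the cache stays bounded. A small least-recently-used (LRU) cache is a common choice; the sketch below relies on JavaScript `Map` iterating in insertion order, so re-inserting on every hit keeps the least recently used key first. `LruCache` and `memoize` are hypothetical helpers, not package exports.

```typescript
// Minimal LRU cache sketch for memoizing per-query results (e.g. reranked chunks).
// Map iterates in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      this.map.delete(key); // re-insert to mark as most recently used
      this.map.set(key, v);
    }
    return v;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest); // evict the least recently used entry
    }
  }
}

// Wrap any expensive per-query function (results of `undefined` are never cached).
function memoize<V>(fn: (q: string) => V, cache: LruCache<V>): (q: string) => V {
  return (q) => {
    const hit = cache.get(q);
    if (hit !== undefined) return hit;
    const v = fn(q);
    cache.set(q, v);
    return v;
  };
}
```

In practice the cache key should also encode the rerank options and a corpus version, so that changed settings or re-indexed documents do not return stale results.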
## Error Handling
```typescript
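let results;
try {
  results = rerankAndFilter(query, chunks);
} catch (error) {
  // In strict TypeScript, `error` is `unknown`, so narrow it before reading `.message`.
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('empty')) {
    // Handle empty input
  } else if (message.includes('NaN')) {
    // Handle invalid data
  } else {
    throw error; // Re-throw anything unexpected
  }
}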
```