# Astermind Pro Quick Reference

Quick reference for common operations and patterns.

## Common Imports

```typescript
// Math
import { cosine, l2, normalizeL2, softmax, sigmoid } from '@astermind/astermind-pro';
import { ridgeSolvePro, OnlineRidge } from '@astermind/astermind-pro';
import { buildRFF, mapRFF } from '@astermind/astermind-pro';

// Retrieval (NEW - reusable outside workers!)
import {
  tokenize,
  expandQuery,
  toTfidf,
  hybridRetrieve,
  buildIndex,
  parseMarkdownToSections,
  flattenSections
} from '@astermind/astermind-pro';

// RAG
import { omegaComposeAnswer } from '@astermind/astermind-pro';

// Reranking
import { rerank, rerankAndFilter, filterMMR } from '@astermind/astermind-pro';

// Summarization
import { summarizeDeterministic } from '@astermind/astermind-pro';

// Information Flow
import { TransferEntropy, InfoFlowGraph, TEController } from '@astermind/astermind-pro';

// Auto-tuning (NEW - reusable!)
import { autoTune, sampleQueriesFromCorpus } from '@astermind/astermind-pro';

// Model serialization (NEW - reusable!)
import { exportModel, importModel } from '@astermind/astermind-pro';
```

## Common Patterns

### Basic Reranking

```typescript
const results = rerankAndFilter(query, chunks, {
  lambdaRidge: 1e-2,
  probThresh: 0.45,
  useMMR: true,
  budgetChars: 1200
});
```

### Basic Summarization

```typescript
const summary = summarizeDeterministic(query, chunks, {
  maxAnswerChars: 1000,
  includeCitations: true
});
```

### Cosine Similarity

```typescript
const similarity = cosine(vec1, vec2);
```

### Online Learning

```typescript
const ridge = new OnlineRidge(64, 1, 1e-3);
ridge.update(features, target);
const prediction = ridge.predict(newFeatures);
```

### Transfer Entropy Monitoring

```typescript
const graph = new InfoFlowGraph({ window: 256 });
graph.get('ChannelName').push(x, y);
const snapshot = graph.snapshot();
```

### Building an Index

```typescript
const index = buildIndex({
  chunks: yourDocuments,
  vocab: 10000,
  landmarks: 256,
  headingW: 2.0,
  useStem: true,
  kernel: 'rbf',
  sigma: 1.0
});
```

### Hybrid Retrieval

```typescript
const retrieved = hybridRetrieve({
  query: 'your query',
  chunks: yourDocuments,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  denseDocs: index.denseDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  vocabSize: index.vocabMap.size,
  kernel: 'rbf',
  sigma: 1.0,
  alpha: 0.7,
  beta: 0.1,
  ridge: 0.08,
  headingW: 2.0,
  useStem: true,
  expandQuery: false,
  topK: 10
});
```

### Tokenization

```typescript
const tokens = tokenize('Hello world', true); // with stemming
const expanded = expandQuery('map'); // expands query terms
```

### Markdown Parsing

```typescript
const root = parseMarkdownToSections(markdownText);
const chunks = flattenSections(root);
```

### Auto-Tuning

```typescript
const result = await autoTune({
  chunks: yourDocuments,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  vocabSize: index.vocabMap.size,
  budget: 40,
  sampleQueries: 24,
  currentSettings: currentSettings
}, (trial, best, note) => {
  console.log(`Trial ${trial}: ${best} (${note})`);
});
```

### Model Serialization

```typescript
// Export
const model = exportModel({
  settings: yourSettings,
  vocabMap: index.vocabMap,
  idf: index.idf,
  chunks: yourDocuments,
  tfidfDocs: index.tfidfDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  denseDocs: index.denseDocs
});

// Import
const imported = importModel(model, {
  buildDense: (tfidfDocs, vocabSize, landmarkMat, kernel, sigma) =>
    buildDenseDocs(tfidfDocs, vocabSize, landmarkMat, kernel, sigma)
});
```

## Parameter Ranges

### Reranking

- `lambdaRidge`: 1e-3 to 1e-1 (lower = less regularization)
- `probThresh`: 0.3 to 0.7 (higher = more selective)
- `mmrLambda`: 0.4 to 0.9 (higher = more diversity)
- `budgetChars`: 600 to 5000 (content budget)

### Summarization

- `maxAnswerChars`: 500 to 3000
- `queryWeight`: 0.3 to 0.6 (query alignment)
- `teWeight`: 0.1 to 0.3 (transfer entropy)
- `codeBonus`: 0.0 to 0.15 (code preference)

### Transfer Entropy

- `window`: 64 to 512 (sample window)
- `condLags`: 1 to 3 (conditioning lags)
- `ridge`: 1e-6 to 1e-3 (regularization)

### Retrieval

- `alpha`: 0.4 to 0.98 (dense/sparse mix, higher = more dense)
- `beta`: 0.0 to 0.4 (keyword bonus weight)
- `ridge`: 0.02 to 0.18 (regularization)
- `sigma`: 0.12 to 1.0 (kernel bandwidth)
- `landmarks`: 128 to 384 (Nyström landmarks)
- `vocab`: 8000 to 15000 (vocabulary size)
- `headingW`: 1.5 to 4.5 (heading weight multiplier)

## Type Definitions

```typescript
type Chunk = {
  heading: string;
  content: string;
  rich?: string;
  level?: number;
  secId?: number;
  score_base?: number;
};

type ScoredChunk = Chunk & {
  score_rr: number;
  p_relevant: number;
  _features?: number[];
  _feature_names?: string[];
};

type RerankOptions = {
  lambdaRidge?: number;
  useMMR?: boolean;
  mmrLambda?: number;
  probThresh?: number;
  epsilonTop?: number;
  budgetChars?: number;
  randomProjDim?: number;
  exposeFeatures?: boolean;
  attachFeatureNames?: boolean;
};

type SumOptions = {
  maxAnswerChars?: number;
  maxBullets?: number;
  preferCode?: boolean;
  includeCitations?: boolean;
  teWeight?: number;
  queryWeight?: number;
  evidenceWeight?: number;
  rrWeight?: number;
  codeBonus?: number;
  headingBonus?: number;
  jaccardDedupThreshold?: number;
  allowOffTopic?: boolean;
  minQuerySimForCode?: number;
  maxSectionsInAnswer?: number;
};
```

## Common Workflows

### 1. Complete Retrieval Pipeline (Outside Workers)

```typescript
// Build index
const index = buildIndex({
  chunks: documents,
  vocab: 10000,
  landmarks: 256,
  headingW: 2.0,
  useStem: true,
  kernel: 'rbf',
  sigma: 1.0
});

// Retrieve
const retrieved = hybridRetrieve({
  query: query,
  chunks: documents,
  vocabMap: index.vocabMap,
  idf: index.idf,
  tfidfDocs: index.tfidfDocs,
  denseDocs: index.denseDocs,
  landmarksIdx: index.landmarksIdx,
  landmarkMat: index.landmarkMat,
  vocabSize: index.vocabMap.size,
  kernel: 'rbf',
  sigma: 1.0,
  alpha: 0.7,
  beta: 0.1,
  ridge: 0.08,
  headingW: 2.0,
  useStem: true,
  expandQuery: false,
  topK: 10
});

// Rerank and summarize
const reranked = rerankAndFilter(query, retrieved.items);
const answer = summarizeDeterministic(query, reranked);
```

### 2. Simple Q&A (Using Pre-built Index)

```typescript
const reranked = rerankAndFilter(query, docs);
const answer = summarizeDeterministic(query, reranked);
```

### 3. Code Search

```typescript
const reranked = rerankAndFilter(query, codeChunks, {
  probThresh: 0.4,
  budgetChars: 3000
});
const summary = summarizeDeterministic(query, reranked, {
  preferCode: true,
  codeBonus: 0.15
});
```

### 4. High Precision

```typescript
const reranked = rerankAndFilter(query, docs, {
  probThresh: 0.6,
  lambdaRidge: 1e-3
});
const summary = summarizeDeterministic(query, reranked, {
  allowOffTopic: false,
  minQuerySimForCode: 0.5
});
```

### 5. High Diversity

```typescript
const reranked = rerankAndFilter(query, docs, {
  mmrLambda: 0.8,
  budgetChars: 2000
});
```

## Performance Tips

- Use `prod-worker` for inference-only workloads
- Cache reranking results
- Batch process queries
- Reduce `randomProjDim` for speed
- Use `OnlineRidge` for incremental updates

## Error Handling

```typescript
try {
  const results = rerankAndFilter(query, chunks);
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('empty')) {
    // Handle empty input
  } else if (message.includes('NaN')) {
    // Handle invalid data
  }
}
```
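
The error-handling pattern above can be pulled into a small reusable wrapper. The following is a minimal sketch, not part of the package: `classifyRerankError`, `safeRerank`, and the `RerankFailure` type are hypothetical names, and the sketch assumes failures surface as thrown values whose messages contain `'empty'` or `'NaN'`, as in the snippet above. The rerank function is passed in as a parameter, so the wrapper makes no assumptions about the library's actual signatures.

```typescript
// Hypothetical helpers for illustration; not exported by @astermind/astermind-pro.
type RerankFailure = 'empty-input' | 'invalid-data' | 'unknown';

// Map a thrown value to a coarse failure category based on its message text.
function classifyRerankError(error: unknown): RerankFailure {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('empty')) return 'empty-input';
  if (message.includes('NaN')) return 'invalid-data';
  return 'unknown';
}

// Run any rerank-like function; on failure, return an empty list plus the category.
function safeRerank<C, R>(
  rerankFn: (query: string, chunks: C[]) => R[],
  query: string,
  chunks: C[]
): { results: R[]; failure?: RerankFailure } {
  try {
    return { results: rerankFn(query, chunks) };
  } catch (error) {
    return { results: [], failure: classifyRerankError(error) };
  }
}
```

A call might look like `safeRerank((q, c) => rerankAndFilter(q, c), query, chunks)`, with the caller checking `failure` before using `results`. Returning a category instead of rethrowing keeps the calling code free of string matching, at the cost of hiding the original error, so logging it inside the `catch` may be worthwhile.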