@n2flowjs/nbase
Neural Vector Database for efficient similarity search
# Database Reference
## Overview
The `Database` class is the main interface for NBase, providing high-level access to vector storage, search, and management. It wraps the underlying `PartitionedVectorDB` and adds result caching, monitoring, and automatic resource management on top of it.
## Constructor
```typescript
const db = new Database(options: DatabaseOptions);
```
### Options
```typescript
interface DatabaseOptions {
  // Vector configuration
  vectorSize?: number;              // Default: 1536

  // Caching and performance
  cacheSize?: number;               // Default: 1000
  maxConcurrentSearches?: number;   // Default: CPU cores - 1

  // Clustering configuration
  clustering?: {
    clusterSize: number;                // Default: 100
    newClusterThresholdFactor: number;  // Default: 1.5
    useCompression: boolean;            // Default: true
  };

  // Partitioning configuration
  partitioning?: {
    partitionsDir: string;        // Default: './database/partitions'
    partitionCapacity: number;    // Default: 100000
    autoLoadPartitions: boolean;  // Default: true
    maxActivePartitions: number;  // Default: 3
  };

  // Indexing configuration
  indexing?: {
    buildOnStart: boolean;         // Default: true
    autoRebuildThreshold: number;  // Default: 500
    hnsw: {
      M: number;               // Default: 16
      efConstruction: number;  // Default: 200
      efSearch: number;        // Default: 100
    };
  };

  // Persistence options
  persistence?: {
    dbPath?: string;          // Default: './database'
    saveIntervalMs?: number;  // Default: 300000 (5 minutes)
  };

  // Monitoring options
  monitoring?: {
    enable: boolean;        // Default: false
    intervalMs: number;     // Default: 60000
    logToConsole: boolean;  // Default: false
  };
}
```
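For illustration, a configuration for a 384-dimensional embedding model might look like the object below. The values are examples rather than recommendations; top-level fields you omit use the defaults listed above. The object is left untyped so the snippet runs without importing NBase; in a project you would annotate it as `DatabaseOptions` and pass it to `new Database(options)`.

```typescript
// Example options for a 384-dimensional embedding model (illustrative values).
// Nested objects list every field because the DatabaseOptions interface
// declares their members as required.
const options = {
  vectorSize: 384, // must match the embedding model's output length
  cacheSize: 5000, // cache more search results than the default 1000
  partitioning: {
    partitionsDir: './data/partitions',
    partitionCapacity: 50_000,
    autoLoadPartitions: true,
    maxActivePartitions: 2, // keep fewer partitions in memory at once
  },
  indexing: {
    buildOnStart: true,
    autoRebuildThreshold: 500,
    // efSearch below the default 100: faster queries, possibly lower recall
    hnsw: { M: 16, efConstruction: 200, efSearch: 64 },
  },
  persistence: {
    dbPath: './data',
    saveIntervalMs: 60_000, // autosave every minute instead of every 5
  },
  monitoring: { enable: true, intervalMs: 30_000, logToConsole: false },
};
```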
## Core Methods
### Vector Operations
#### Adding Vectors
```typescript
// Add a single vector
const result = await db.addVector(
  id: string | number | undefined,   // Optional ID
  vector: number[] | Float32Array,   // Vector data
  metadata?: Record<string, any>     // Optional metadata
): Promise<{
  partitionId: string;
  vectorId: string | number;
}>;

// Add multiple vectors in one call
const bulkResult = await db.bulkAdd(
  vectors: Array<{
    id?: string | number;
    vector: number[] | Float32Array;
    metadata?: Record<string, any>;
  }>
): Promise<{
  count: number;
  partitionIds: string[];
}>;
```
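The advice to keep vector dimensions consistent can be enforced at the point where raw records are shaped into `bulkAdd()` items. The sketch below does both. The record shape `{ id, embedding, text }` and the `toBulkItems` helper are hypothetical; only the item type is copied from the signature above.

```typescript
// Shape of one bulkAdd() item, copied from the signature above.
interface BulkItem {
  id?: string | number;
  vector: number[] | Float32Array;
  metadata?: Record<string, any>;
}

// Hypothetical helper: converts raw records into bulkAdd() items and rejects
// any embedding whose length differs from the expected vector size.
function toBulkItems(
  records: Array<{ id: string; embedding: number[]; text: string }>,
  vectorSize: number,
): BulkItem[] {
  return records.map((r) => {
    if (r.embedding.length !== vectorSize) {
      throw new Error(
        `Record ${r.id}: expected ${vectorSize} dimensions, got ${r.embedding.length}`,
      );
    }
    return {
      id: r.id,
      vector: Float32Array.from(r.embedding), // compact typed-array storage
      metadata: { text: r.text },             // kept for filtering later
    };
  });
}
```

The resulting array can be passed directly to `db.bulkAdd(items)`.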
#### Searching
```typescript
const results = await db.search(
  query: number[] | Float32Array,
  options?: {
    k?: number;                 // Number of results (default: 10)
    filter?: (id: string | number, metadata?: Record<string, any>) => boolean;
    includeMetadata?: boolean;  // Include metadata in results
    useHNSW?: boolean;          // Use the HNSW index
    efSearch?: number;          // HNSW candidate-list size: higher is more accurate but slower
    distanceMetric?: 'cosine' | 'euclidean';
    partitionIds?: string[];    // Restrict the search to these partitions
    skipCache?: boolean;        // Bypass result caching
  }
): Promise<Array<{
  id: string | number;
  score: number;
  distance: number;
  metadata?: Record<string, any>;
}>>;
```
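To make the `k`, `filter`, and `distanceMetric` options concrete, here is a brute-force (exact) nearest-neighbour search over an in-memory array. It is a reference sketch, not NBase's implementation, which can use an HNSW index for approximate search. The helper names are hypothetical, and the sketch returns only `distance` because the mapping NBase uses from `distance` to `score` is not documented here.

```typescript
type Metric = 'cosine' | 'euclidean';

interface Entry {
  id: string | number;
  vector: number[];
  metadata?: Record<string, any>;
}

// Cosine distance = 1 - cosine similarity. Assumes non-zero vectors.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Exact k-NN: drop entries rejected by the filter, measure the distance to
// every remaining entry, then keep the k closest.
function bruteForceSearch(
  entries: Entry[],
  query: number[],
  k = 10,
  metric: Metric = 'cosine',
  filter?: (id: string | number, metadata?: Record<string, any>) => boolean,
): Array<{ id: string | number; distance: number }> {
  const dist = metric === 'cosine' ? cosineDistance : euclideanDistance;
  return entries
    .filter((e) => !filter || filter(e.id, e.metadata))
    .map((e) => ({ id: e.id, distance: dist(query, e.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

A filter such as `(id, meta) => meta?.lang === 'en'` has the same signature as the `filter` option of `db.search()`.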
### Metadata Management
```typescript
// Add/update metadata
await db.addMetadata(id, metadata);
await db.updateMetadata(id, metadata);
// Get metadata
const storedMetadata = await db.getMetadata(id);
// Search by metadata
const results = await db.getMetadataWithField(
  criteria: string | string[] | Record<string, any>,
  values?: any | any[],
  options?: { limit: number }
);
```
## Advanced Features
```typescript
// Extract relationships between vectors
const relationships = await db.extractRelationships(
  threshold: number,
  options?: {
    metric?: 'cosine' | 'euclidean';
    partitionIds?: string[];
    includeMetadata?: boolean;
  }
);

// Find vector communities
const communities = await db.extractCommunities(
  threshold: number,
  options?: {
    metric?: 'cosine' | 'euclidean';
    partitionIds?: string[];
    includeMetadata?: boolean;
  }
);
```
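The reference above does not spell out how `threshold` is interpreted. The sketch below assumes one common reading: two vectors are related when their distance is at or below `threshold`, and a community is a connected component of the resulting graph, found here with union-find. Both the interpretation and the helper names (`findRelationships`, `findCommunities`) are assumptions for illustration; check NBase's source before relying on them.

```typescript
// Euclidean distance between two equal-length vectors.
function distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// ASSUMPTION: "related" means distance <= threshold. O(n^2) pairwise scan.
function findRelationships(vectors: number[][], threshold: number): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      if (distance(vectors[i], vectors[j]) <= threshold) pairs.push([i, j]);
    }
  }
  return pairs;
}

// ASSUMPTION: a community is a connected component of the relationship graph.
function findCommunities(vectors: number[][], threshold: number): number[][] {
  const parent = vectors.map((_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving keeps trees shallow
      x = parent[x];
    }
    return x;
  };
  for (const [a, b] of findRelationships(vectors, threshold)) {
    parent[find(a)] = find(b); // merge the two components
  }
  // Group vector indices by their component root.
  const groups = new Map<number, number[]>();
  for (let i = 0; i < vectors.length; i++) {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root)!.push(i);
  }
  return Array.from(groups.values());
}
```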
## Database Management
### Initialization and Status
```typescript
// Check if database is ready
const isReady = db.IsReady();
// Wait for initialization
await db.initializationPromise;
```
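If code elsewhere may receive the database before initialization finishes, checking `IsReady()` first and only then waiting for the `'ready'` event avoids waiting on an event that has already fired. This sketch assumes `Database` behaves like a Node.js `EventEmitter` (as the `db.on(...)` calls under Events suggest) and uses Node's `events.once()`, which also rejects if `'error'` is emitted first. The `ReadyAware` interface and `FakeDb` class are hypothetical stand-ins so the snippet runs on its own.

```typescript
import { EventEmitter, once } from 'node:events';

// Minimal shape this helper needs; the real Database exposes much more.
interface ReadyAware extends EventEmitter {
  IsReady(): boolean;
}

// Resolve immediately if already initialized; otherwise wait for 'ready'.
// events.once() rejects if the emitter emits 'error' before 'ready'.
async function waitUntilReady(db: ReadyAware): Promise<void> {
  if (db.IsReady()) return;
  await once(db, 'ready');
}

// Hypothetical stand-in used only to exercise the helper.
class FakeDb extends EventEmitter implements ReadyAware {
  private ready = false;
  IsReady(): boolean {
    return this.ready;
  }
  init(): void {
    setTimeout(() => {
      this.ready = true;
      this.emit('ready');
    }, 10);
  }
}
```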
### Index Management
```typescript
// Build indices
await db.buildIndexes(
  partitionId?: string,
  options?: {
    force?: boolean;
    dimensionAware?: boolean;
    progressCallback?: (progress: number) => void;
  }
);
```
### State Management
```typescript
// Save current state
await db.save();
// Close database
await db.close();
// Get statistics
const stats = await db.getStats();
```
## Events
The `Database` class emits the following events; subscribe to them with `db.on(eventName, listener)`:
```typescript
// Lifecycle events
db.on('initializing', () => {});
db.on('ready', () => {});
db.on('close', () => {});
// Operation events
db.on('vector:add', (data) => {});
db.on('vectors:bulkAdd', (data) => {});
db.on('search:complete', (data) => {});
db.on('search:error', (data) => {});
// Background task events
db.on('save:complete', (data) => {});
db.on('index:progress', (data) => {});
// Error events
db.on('error', (data) => {});
db.on('warn', (data) => {});
```
## Performance Monitoring
```typescript
const stats = await db.getStats();
console.log(stats);
/*
{
state: { isReady, isClosed, status },
database: { vectors, partitions, indices },
search: { calls, avgTime, methodCounts },
searchCache: { size, hits, misses, hitRate },
performance: { queries, avgSearchTimeMs },
system: { cpuUsage, memoryUsage },
options: { current configuration }
}
*/
```
## Best Practices
1. **Initialization**
   - Wait for initialization to complete before performing any operation.
   - Use `await db.initializationPromise`, or listen for the `'ready'` event.
2. **Vector Management**
   - Use `bulkAdd()` when adding multiple vectors.
   - Keep vector dimensions consistent within a partition.
   - Attach relevant metadata so results can be filtered.
3. **Search Optimization**
   - Enable HNSW indexing (`useHNSW: true`) for large datasets.
   - Pass `partitionIds` to restrict a search to specific partitions when possible.
   - Keep `filter` functions cheap, since they run inside every search that uses them.
   - Use metadata search (`getMetadataWithField()`) for queries that don't involve vector similarity.
4. **Resource Management**
   - Monitor memory usage via `getStats()`.
   - Call `db.close()` when you are done with the database.
   - Set `partitioning.partitionCapacity` to suit your dataset.
   - Adjust `partitioning.maxActivePartitions` based on available memory.
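A minimal sketch of closing the database properly, assuming only the documented `save()` and `close()` methods: a shutdown routine that persists state, closes the database once, and can be wired to process signals. The `Closable` interface and the `makeShutdown` and `installShutdownHooks` names are hypothetical; the explicit `save()` is a precaution in case `close()` does not flush pending writes.

```typescript
// Only the two documented methods this sketch relies on.
interface Closable {
  save(): Promise<unknown>;
  close(): Promise<unknown>;
}

// Returns an idempotent shutdown function: repeated calls (for example
// SIGINT followed by SIGTERM) share the same in-flight promise.
function makeShutdown(db: Closable): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    if (!pending) {
      pending = (async () => {
        await db.save();  // precaution: persist state explicitly
        await db.close(); // then release resources
      })();
    }
    return pending;
  };
}

// Wire the shutdown to common termination signals (call once at startup).
function installShutdownHooks(db: Closable): void {
  const shutdown = makeShutdown(db);
  for (const signal of ['SIGINT', 'SIGTERM'] as const) {
    process.once(signal, () => {
      shutdown()
        .then(() => process.exit(0))
        .catch((err) => {
          console.error('Shutdown failed:', err);
          process.exit(1);
        });
    });
  }
}
```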