# NBase API Documentation

This document provides detailed specifications for all API endpoints in the NBase vector database server.

## Table of Contents

- [Health and Status](#health-and-status)
  - [Health Check](#health-check)
  - [Database Statistics](#database-statistics)
- [Vector Management](#vector-management)
  - [Add Vector](#add-vector)
  - [Bulk Add Vectors](#bulk-add-vectors)
  - [Get Vector](#get-vector)
  - [Check Vector Exists](#check-vector-exists)
  - [Update Vector Metadata](#update-vector-metadata)
  - [Delete Vector](#delete-vector)
  - [Find Similar Vectors](#find-similar-vectors)
- [Vector Search](#vector-search)
  - [Search](#search)
  - [Extract Relationships](#extract-relationships)
  - [Extract Communities](#extract-communities)

## Health and Status

### Health Check

Get the current health status of the API server.

**URL:** `/health`

**Method:** `GET`

**Response:**

```json
{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2023-04-01T12:34:56.789Z"
}
```

### Database Statistics

Get statistics about the current database state.

**URL:** `/stats`

**Method:** `GET`

**Response:**

```json
{
  "vectorCount": 1000,
  "dimensions": {
    "768": 800,
    "1536": 200
  },
  "partitionCount": 4,
  "partitionStats": [
    {
      "id": "partition1",
      "vectorCount": 300,
      "memoryUsage": 15000000
    },
    {
      "id": "partition2",
      "vectorCount": 700,
      "memoryUsage": 35000000
    }
  ],
  "indexStats": {
    "isIndexed": true,
    "indexType": "hnsw",
    "indexingProgress": 100
  },
  "memoryUsage": {
    "total": 50000000,
    "vectors": 48000000,
    "metadata": 2000000
  },
  "timestamps": {
    "created": "2023-04-01T10:00:00.000Z",
    "lastModified": "2023-04-01T12:30:00.000Z"
  }
}
```

## Vector Management

### Add Vector

Add a single vector to the database.

**URL:** `/api/vectors`

**Method:** `POST`

**Request Body:**

```json
{
  "id": "doc123",               // Optional: Custom ID for the vector
  "vector": [0.1, 0.2, ...],    // Required: Vector embedding
  "metadata": {                 // Optional: Metadata associated with the vector
    "title": "Example Document",
    "source": "website",
    "tags": ["example", "documentation"]
  }
}
```

**Response:**

```json
{
  "success": true,
  "count": 1,
  "ids": ["doc123"],
  "dimensions": {
    "1536": 1
  },
  "partitionsAffected": {
    "partition1": 1
  },
  "duration": 15
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid vector
- `503 Service Unavailable`: Database is not ready

### Bulk Add Vectors

Add multiple vectors to the database in a single request.

**URL:** `/api/vectors`

**Method:** `POST`

**Request Body:**

```json
{
  "vectors": [
    {
      "id": "doc123",
      "vector": [0.1, 0.2, ...],
      "metadata": {
        "title": "Example Document 1",
        "source": "website"
      }
    },
    {
      "id": "doc456",
      "vector": [0.3, 0.4, ...],
      "metadata": {
        "title": "Example Document 2",
        "source": "pdf"
      }
    }
  ]
}
```

**Response:**

```json
{
  "success": true,
  "count": 2,
  "dimensions": {
    "1536": 2
  },
  "partitionsAffected": {
    "partition1": 1,
    "partition2": 1
  },
  "duration": 25
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid vector array
- `503 Service Unavailable`: Database is not ready

### Get Vector

Retrieve a vector by its ID.

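
As a rough illustration of how a client might assemble the request URL for this endpoint, the sketch below combines the `/api/vectors/:id` path with the `includeVector` and `includeMetadata` query parameters documented in this section. The helper name `buildGetVectorUrl` and the base URL `http://localhost:3000` are placeholders chosen for the example, not part of NBase.

```typescript
// Hypothetical client-side helper; not part of the NBase package.
interface GetVectorOptions {
  includeVector?: boolean;   // documented default: false
  includeMetadata?: boolean; // documented default: true
}

function buildGetVectorUrl(
  baseUrl: string,
  id: string | number,
  options: GetVectorOptions = {}
): string {
  // Drop trailing slashes from the base URL and percent-encode the ID so that
  // characters such as "/" or "?" cannot change the request path.
  const path = `${baseUrl.replace(/\/+$/, "")}/api/vectors/${encodeURIComponent(String(id))}`;

  // Only send parameters the caller set explicitly; the server applies its
  // own defaults for anything omitted.
  const query: string[] = [];
  if (options.includeVector !== undefined) {
    query.push(`includeVector=${options.includeVector}`);
  }
  if (options.includeMetadata !== undefined) {
    query.push(`includeMetadata=${options.includeMetadata}`);
  }
  return query.length > 0 ? `${path}?${query.join("&")}` : path;
}

// Placeholder base URL; substitute the address of your own server.
const getVectorUrl = buildGetVectorUrl("http://localhost:3000/", "doc 123", {
  includeVector: true,
});
// getVectorUrl === "http://localhost:3000/api/vectors/doc%20123?includeVector=true"
```

Encoding the ID matters mainly when custom string IDs are used, since those may contain spaces or slashes.
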
**URL:** `/api/vectors/:id`

**Method:** `GET`

**URL Parameters:**

- `id`: Vector ID

**Query Parameters:**

- `includeVector`: Include the vector in the response (`true` or `false`, default: `false`)
- `includeMetadata`: Include metadata in the response (`true` or `false`, default: `true`)

**Response:**

```json
{
  "success": true,
  "id": "doc123",
  "partitionId": "partition1",
  "dimension": 1536,
  "vector": [0.1, 0.2, ...],
  "metadata": {
    "title": "Example Document",
    "source": "website",
    "dimension": 1536,
    "createdAt": 1680346800000
  },
  "duration": 5
}
```

**Error Responses:**

- `404 Not Found`: Vector not found
- `503 Service Unavailable`: Database is not ready

### Check Vector Exists

Check if a vector with the specified ID exists in the database.

**URL:** `/api/vectors/:id/exists`

**Method:** `GET`

**URL Parameters:**

- `id`: Vector ID

**Response:**

```json
{
  "success": true,
  "exists": true,
  "id": "doc123",
  "foundId": "doc123",
  "dimension": 1536,
  "duration": 3
}
```

**Error Responses:**

- `503 Service Unavailable`: Database is not ready

### Update Vector Metadata

Update the metadata for a vector.

**URL:** `/api/vectors/:id/metadata`

**Method:** `PATCH`

**URL Parameters:**

- `id`: Vector ID

**Request Body:**

```json
{
  "metadata": {
    "title": "Updated Title",
    "tags": ["updated", "metadata"]
  },
  "operation": "merge"  // Optional: "merge" (default) or "replace"
}
```

**Response:**

```json
{
  "success": true,
  "id": "doc123",
  "operation": "merge",
  "dimension": 1536,
  "duration": 8
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid metadata
- `404 Not Found`: Vector not found
- `503 Service Unavailable`: Database is not ready

### Delete Vector

Delete a vector from the database.

**URL:** `/api/vectors/:id`

**Method:** `DELETE`

**URL Parameters:**

- `id`: Vector ID

**Response:**

```json
{
  "success": true,
  "id": "doc123",
  "duration": 6
}
```

**Error Responses:**

- `404 Not Found`: Vector not found
- `503 Service Unavailable`: Database is not ready

### Find Similar Vectors

Find vectors similar to a specific vector.

**URL:** `/api/vectors/:id/similar`

**Method:** `GET`

**URL Parameters:**

- `id`: Vector ID of the reference vector

**Query Parameters:**

- `k`: Number of similar vectors to return (default: `10`)
- `includeMetadata`: Include metadata in results (`true` or `false`, default: `true`)
- `includeVectors`: Include vectors in results (`true` or `false`, default: `false`)

**Response:**

```json
{
  "success": true,
  "queryId": "doc123",
  "queryDimension": 1536,
  "results": [
    {
      "id": "doc456",
      "score": 0.92,
      "partitionId": "partition2",
      "metadata": {
        "title": "Similar Document 1",
        "source": "pdf"
      }
    },
    {
      "id": "doc789",
      "score": 0.85,
      "partitionId": "partition1",
      "metadata": {
        "title": "Similar Document 2",
        "source": "website"
      }
    }
  ],
  "count": 2,
  "duration": 12
}
```

**Error Responses:**

- `404 Not Found`: Source vector not found
- `503 Service Unavailable`: Database is not ready

## Vector Search

### Search

Search for similar vectors based on a query vector.

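
A minimal sketch of how a client could build the JSON body for this endpoint, using the `query`, `k`, and array-style `filters` fields documented in this section. The type names, the `buildSearchBody` helper, and the client-side checks on `query` and `k` are assumptions made for illustration; the server may validate differently.

```typescript
// Operator names taken from the Filter Operators table in this document.
type FilterOperator =
  | "$eq" | "$ne" | "$gt" | "$gte" | "$lt" | "$lte"
  | "$in" | "$nin" | "$exists" | "$regex";

interface FilterCondition {
  field: string;
  operator: FilterOperator;
  value: unknown;
}

interface SearchBody {
  query: number[];
  k: number;
  filters?: FilterCondition[];
}

// Hypothetical helper: rejects obviously invalid input before any request is
// sent, then assembles the request body.
function buildSearchBody(
  query: number[],
  k: number = 10,
  filters: FilterCondition[] = []
): SearchBody {
  if (query.length === 0 || !query.every((x) => Number.isFinite(x))) {
    throw new Error("query must be a non-empty array of finite numbers");
  }
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error("k must be a positive integer");
  }
  const body: SearchBody = { query, k };
  // Omit `filters` entirely when no conditions were given.
  if (filters.length > 0) {
    body.filters = filters;
  }
  return body;
}

const searchBody = buildSearchBody([0.1, 0.2, 0.3], 5, [
  { field: "source", operator: "$eq", value: "website" },
  { field: "rating", operator: "$gt", value: 4 },
]);
// Serialize and send as the POST body to /api/search.
const searchPayload = JSON.stringify(searchBody);
```

Typing the operator as a string union lets the compiler catch a misspelled operator such as `"$gte "` before it reaches the server.
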
**URL:** `/api/search`

**Method:** `POST`

**Request Body:**

```json
{
  "query": [0.1, 0.2, ...],                      // Required: Query vector
  "k": 10,                                       // Optional: Number of results to return (default: 10)
  "method": "hnsw",                              // Optional: Search method ("hnsw" or default clustered)
  "partitionIds": ["partition1", "partition2"],  // Optional: Specific partitions to search
  "efSearch": 100,                               // Optional: Exploration factor for HNSW search
  "distanceMetric": "cosine",                    // Optional: Distance metric (default: cosine)
  "rerank": true,                                // Optional: Whether to rerank results
  "rerankingMethod": "cross_encoder",            // Optional: Method for reranking
  "rerankLambda": 0.5,                           // Optional: Lambda parameter for reranking
  "filters": {                                   // Optional: Metadata filters
    "source": "website",
    "tags": "example"
  },
  "includeMetadata": true,                       // Optional: Include metadata in results (default: true)
  "includeVectors": false,                       // Optional: Include vectors in results (default: false)
  "skipCache": false,                            // Optional: Bypass result caching (default: false)
  "searchTimeoutMs": 5000                        // Optional: Search timeout in milliseconds
}
```

**Alternative Filter Format:**

```json
"filters": [
  {
    "field": "source",
    "operator": "$eq",
    "value": "website"
  },
  {
    "field": "rating",
    "operator": "$gt",
    "value": 4
  }
]
```

**Response:**

```json
{
  "results": [
    {
      "id": "doc456",
      "score": 0.92,
      "partitionId": "partition2",
      "metadata": {
        "title": "Similar Document 1",
        "source": "website"
      }
    },
    {
      "id": "doc789",
      "score": 0.85,
      "partitionId": "partition1",
      "metadata": {
        "title": "Similar Document 2",
        "source": "website"
      }
    }
  ],
  "count": 2,
  "duration": 18,
  "searchOptions": {
    "k": 10,
    "method": "hnsw",
    "partitionsSearched": 2,
    "rerankApplied": true,
    "cacheUsed": true,
    "filtersApplied": true
  }
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid query vector
- `500 Internal Server Error`: Search error (with details)

### Extract Relationships

Find relationships between vectors based on a distance threshold.

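
To make the idea of a distance threshold concrete, the sketch below finds related pairs by brute force: it computes the cosine distance for every pair of vectors and keeps the pairs at or below the threshold. This only illustrates the concept and is not NBase's implementation; the names `cosineDistance` and `findRelationships`, the inclusive comparison, and the handling of zero vectors are all assumptions.

```typescript
interface StoredVector {
  id: string | number;
  vector: number[];
}

interface Relationship {
  vector1: string | number;
  vector2: string | number;
  distance: number;
}

// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / Math.sqrt(normA * normB);
}

// Brute-force O(n^2) scan over all unordered pairs.
function findRelationships(vectors: StoredVector[], threshold: number): Relationship[] {
  const relationships: Relationship[] = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      const distance = cosineDistance(vectors[i].vector, vectors[j].vector);
      if (distance <= threshold) {
        relationships.push({ vector1: vectors[i].id, vector2: vectors[j].id, distance });
      }
    }
  }
  return relationships;
}

// "a" and "b" point in nearly the same direction; "c" is orthogonal to "a".
const related = findRelationships(
  [
    { id: "a", vector: [1, 0] },
    { id: "b", vector: [1, 0.1] },
    { id: "c", vector: [0, 1] },
  ],
  0.3
);
// `related` contains a single pair: "a" and "b".
```

The quadratic cost of this scan is one reason restricting the request to specific partitions can be useful on large datasets.
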
**URL:** `/api/search/relationships`

**Method:** `POST`

**Request Body:**

```json
{
  "threshold": 0.3,             // Required: Maximum distance between vectors to consider them related
  "metric": "cosine",           // Optional: Distance metric (e.g., 'cosine', 'euclidean'). Default depends on database implementation.
  "partitionIds": ["p1", "p2"]  // Optional: Array of partition IDs to restrict the search. Searches all loaded partitions if omitted.
}
```

**Response:**

```json
{
  "relationships": [
    {
      "vector1": { "id": 123, "partitionId": "partition1" },
      "vector2": { "id": 456, "partitionId": "partition1" },
      "distance": 0.25
    },
    {
      "vector1": { "id": 789, "partitionId": "partition2" },
      "vector2": { "id": 101, "partitionId": "partition2" },
      "distance": 0.15
    }
    // ... more relationships
  ],
  "count": 2,      // Total number of relationships found
  "duration": 345  // Time taken in milliseconds
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid `threshold`
- `500 Internal Server Error`: Error during relationship extraction

### Extract Communities

Find communities (clusters) of related vectors based on a distance threshold.

**URL:** `/api/search/communities`

**Method:** `POST`

**Request Body:**

```json
{
  "threshold": 0.3,              // Required: Maximum distance between vectors to consider them related
  "metric": "cosine",            // Optional: Distance metric (e.g., 'cosine', 'euclidean'). Default depends on database implementation.
  "partitionIds": ["p1", "p2"],  // Optional: Array of partition IDs to restrict the search. Searches all loaded partitions if omitted.
  "includeMetadata": true        // Optional: Whether to include metadata for each vector in the results (default: true)
}
```

**Response:**

```json
{
  "communities": [
    [
      { "id": 123, "partitionId": "partition1", "metadata": { "label": "doc1" } },
      { "id": 456, "partitionId": "partition1", "metadata": { "label": "doc2" } }
    ],
    [
      { "id": 789, "partitionId": "partition2", "metadata": { "label": "doc3" } },
      { "id": 101, "partitionId": "partition2", "metadata": { "label": "doc4" } },
      { "id": 102, "partitionId": "partition2", "metadata": { "label": "doc5" } }
    ]
    // ... more communities
  ],
  "count": 2,         // Number of communities
  "totalVectors": 5,  // Total number of vectors across all communities
  "duration": 345     // Time taken in milliseconds
}
```

**Error Responses:**

- `400 Bad Request`: Missing or invalid `threshold`
- `500 Internal Server Error`: Error during community extraction

## Filter Operators

The following operators are supported in filter configurations:

| Operator | Description | Example |
|----------|-------------|---------|
| `$eq` | Equal to | `{ "field": "type", "operator": "$eq", "value": "article" }` |
| `$ne` | Not equal to | `{ "field": "type", "operator": "$ne", "value": "draft" }` |
| `$gt` | Greater than | `{ "field": "rating", "operator": "$gt", "value": 4 }` |
| `$gte` | Greater than or equal to | `{ "field": "rating", "operator": "$gte", "value": 4 }` |
| `$lt` | Less than | `{ "field": "rating", "operator": "$lt", "value": 3 }` |
| `$lte` | Less than or equal to | `{ "field": "rating", "operator": "$lte", "value": 3 }` |
| `$in` | In array | `{ "field": "type", "operator": "$in", "value": ["article", "post"] }` |
| `$nin` | Not in array | `{ "field": "type", "operator": "$nin", "value": ["draft", "deleted"] }` |
| `$exists` | Field exists | `{ "field": "rating", "operator": "$exists", "value": true }` |
| `$regex` | Matches regex pattern | `{ "field": "title", "operator": "$regex", "value": "^Getting Started" }` |

## Best Practices

1. **Vector Dimensions**: Ensure vectors have consistent dimensions within a partition.
2. **Metadata Fields**: Use descriptive field names and consistent types for metadata.
3. **ID Management**: Consistently use either string or number IDs to avoid type conversion costs.
4. **Bulk Operations**: Use bulk add for better performance when adding multiple vectors.
5. **Partitioning**: Consider using separate partitions for vectors with different dimensions or from different sources.
6. **Search Performance**:
   - Use HNSW method for faster search at a slight cost in recall
   - Include proper filters to reduce the search space
   - Limit the number of partitions searched when possible
7. **Rate Limiting**: Be aware of server-side rate limiting settings to avoid request failures.
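
The advice on consistent dimensions and bulk operations can be sketched on the client side as follows: check that all vectors share one dimension, then split them into request bodies shaped like the Bulk Add Vectors payload. The `toBulkAddBodies` helper, the batch size, and the decision to reject mixed dimensions outright are assumptions made for this example, which is stricter than the server requires.

```typescript
interface VectorRecord {
  id?: string | number;
  vector: number[];
  metadata?: Record<string, unknown>;
}

// Same shape as the documented bulk-add request body.
interface BulkAddBody {
  vectors: VectorRecord[];
}

// Hypothetical helper: validates dimensions, then chunks the records into
// bodies of at most `batchSize` vectors each.
function toBulkAddBodies(records: VectorRecord[], batchSize: number): BulkAddBody[] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  // Assumption: one call targets one embedding model, so every vector should
  // match the dimension of the first record.
  const dimension = records.length > 0 ? records[0].vector.length : 0;
  for (let i = 0; i < records.length; i++) {
    if (records[i].vector.length !== dimension) {
      throw new Error(
        `record at index ${i} has dimension ${records[i].vector.length}, expected ${dimension}`
      );
    }
  }
  const bodies: BulkAddBody[] = [];
  for (let start = 0; start < records.length; start += batchSize) {
    bodies.push({ vectors: records.slice(start, start + batchSize) });
  }
  return bodies;
}

// Five records with a batch size of two produce batches of 2, 2, and 1.
const sampleRecords: VectorRecord[] = [1, 2, 3, 4, 5].map((n) => ({
  id: `doc${n}`,
  vector: [n * 0.1, n * 0.2],
}));
const bulkBodies = toBulkAddBodies(sampleRecords, 2);
```

Each element of `bulkBodies` can then be serialized with `JSON.stringify` and sent as one `POST /api/vectors` request; a suitable batch size depends on vector dimension and on any request size limits the server enforces.
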