@mastra/core

# Storing embeddings in a vector database After generating embeddings, you need to store them in a database that supports vector similarity search. Mastra provides a consistent interface for storing and querying embeddings across various vector databases. ## Supported databases **MongoDB**: ```ts import { MongoDBVector } from '@mastra/mongodb' const store = new MongoDBVector({ id: 'mongodb-vector', uri: process.env.MONGODB_URI, dbName: process.env.MONGODB_DATABASE, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` ### Using MongoDB Atlas Vector search For detailed setup instructions and best practices, see the [official MongoDB Atlas Vector Search documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/?utm_campaign=devrel\&utm_source=third-party-content\&utm_medium=cta\&utm_content=mastra-docs). **PgVector**: ```ts import { PgVector } from '@mastra/pg' const store = new PgVector({ id: 'pg-vector', connectionString: process.env.POSTGRES_CONNECTION_STRING, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` ### Using PostgreSQL with pgvector PostgreSQL with the pgvector extension is a good solution for teams already using PostgreSQL who want to minimize infrastructure complexity. For detailed setup instructions and best practices, see the [official pgvector repository](https://github.com/pgvector/pgvector). **Pinecone**: ```ts import { PineconeVector } from '@mastra/pinecone' const store = new PineconeVector({ id: 'pinecone-vector', apiKey: process.env.PINECONE_API_KEY, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Qdrant**: ```ts import { QdrantVector } from '@mastra/qdrant' const store = new QdrantVector({ id: 'qdrant-vector', url: process.env.QDRANT_URL, apiKey: process.env.QDRANT_API_KEY, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Chroma**: ```ts import { ChromaVector } from '@mastra/chroma' // Running Chroma locally // const store = new ChromaVector() // Running on Chroma Cloud const store = new ChromaVector({ id: 'chroma-vector', apiKey: process.env.CHROMA_API_KEY, tenant: process.env.CHROMA_TENANT, database: process.env.CHROMA_DATABASE, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Astra**: ```ts import { AstraVector } from '@mastra/astra' const store = new AstraVector({ id: 'astra-vector', token: process.env.ASTRA_DB_TOKEN, endpoint: process.env.ASTRA_DB_ENDPOINT, keyspace: process.env.ASTRA_DB_KEYSPACE, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **libSQL**: ```ts import { LibSQLVector } from '@mastra/core/vector/libsql' const store = new LibSQLVector({ id: 'libsql-vector', url: process.env.DATABASE_URL, authToken: process.env.DATABASE_AUTH_TOKEN, // Optional: for Turso cloud databases }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Upstash**: ```ts import { UpstashVector } from '@mastra/upstash' // In upstash they refer to the store as an index const store = new UpstashVector({ id: 'upstash-vector', url: process.env.UPSTASH_URL, token: process.env.UPSTASH_TOKEN, }) // There is no store.createIndex call here, Upstash creates indexes (known as namespaces in Upstash) automatically // when you upsert if that namespace does not exist yet. await store.upsert({ indexName: 'myCollection', // the namespace name in Upstash vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Cloudflare**: ```ts import { CloudflareVector } from '@mastra/vectorize' const store = new CloudflareVector({ id: 'cloudflare-vector', accountId: process.env.CF_ACCOUNT_ID, apiToken: process.env.CF_API_TOKEN, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **OpenSearch**: ```ts import { OpenSearchVector } from '@mastra/opensearch' const store = new OpenSearchVector({ id: 'opensearch', node: process.env.OPENSEARCH_URL }) await store.createIndex({ indexName: 'my-collection', dimension: 1536, }) await store.upsert({ indexName: 'my-collection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Elasticsearch**: ```ts import { ElasticSearchVector } from '@mastra/elasticsearch' const store = new ElasticSearchVector({ id: 'elasticsearch-vector', url: process.env.ELASTICSEARCH_URL, auth: { apiKey: process.env.ELASTICSEARCH_API_KEY, }, }) await store.createIndex({ indexName: 'my-collection', dimension: 1536, }) await store.upsert({ indexName: 'my-collection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` ### Using Elasticsearch For detailed setup instructions and best practices, see the [official Elasticsearch documentation](https://www.elastic.co/docs/solutions/search/get-started). **Couchbase**: ```ts import { CouchbaseVector } from '@mastra/couchbase' const store = new CouchbaseVector({ id: 'couchbase-vector', connectionString: process.env.COUCHBASE_CONNECTION_STRING, username: process.env.COUCHBASE_USERNAME, password: process.env.COUCHBASE_PASSWORD, bucketName: process.env.COUCHBASE_BUCKET, scopeName: process.env.COUCHBASE_SCOPE, collectionName: process.env.COUCHBASE_COLLECTION, }) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` **Lance**: ```ts import { LanceVectorStore } from '@mastra/lance' const store = await LanceVectorStore.create('/path/to/db') await store.createIndex({ tableName: 'myVectors', indexName: 'myCollection', dimension: 1536, }) await store.upsert({ tableName: 'myVectors', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` ### Using LanceDB LanceDB is an embedded vector database built on the Lance columnar format, suitable for local development or cloud deployment. For detailed setup instructions and best practices, see the [official LanceDB documentation](https://lancedb.github.io/lancedb/). **S3 Vectors**: ```ts import { S3Vectors } from '@mastra/s3vectors' const store = new S3Vectors({ id: 's3-vectors', vectorBucketName: 'my-vector-bucket', clientConfig: { region: 'us-east-1', }, nonFilterableMetadataKeys: ['content'], }) await store.createIndex({ indexName: 'my-index', dimension: 1536, }) await store.upsert({ indexName: 'my-index', vectors: embeddings, metadata: chunks.map(chunk => ({ text: chunk.text })), }) ``` ## Using vector storage Once initialized, all vector stores share the same interface for creating indexes, upserting embeddings, and querying. ### Creating Indexes Before storing embeddings, you need to create an index with the appropriate dimension size for your embedding model: ```ts // Create an index with dimension 1536 (for text-embedding-3-small) await store.createIndex({ indexName: 'myCollection', dimension: 1536, }) ``` The dimension size must match the output dimension of your chosen embedding model. Common dimension sizes are: - `OpenAI text-embedding-3-small`: 1536 dimensions (or custom, e.g., 256) - `Cohere embed-multilingual-v3`: 1024 dimensions - `Google gemini-embedding-001`: 768 dimensions (or custom) > **Warning:** Index dimensions can't be changed after creation. To use a different model, delete and recreate the index with the new dimension size. ### Naming Rules for Databases Each vector database enforces specific naming conventions for indexes and collections to ensure compatibility and prevent conflicts. **MongoDB**: Collection (index) names must: - Start with a letter or underscore - Be up to 120 bytes long - Contain only letters, numbers, underscores, or dots - Cannot contain `$` or the null character - Example: `my_collection.123` is valid - Example: `my-index` is not valid (contains hyphen) - Example: `My$Collection` is not valid (contains `$`) **PgVector**: Index names must: - Start with a letter or underscore - Contain only letters, numbers, and underscores - Example: `my_index_123` is valid - Example: `my-index` is not valid (contains hyphen) **Pinecone**: Index names must: - Use only lowercase letters, numbers, and dashes - Not contain dots (used for DNS routing) - Not use non-Latin characters or emojis - Have a combined length (with project ID) under 52 characters - Example: `my-index-123` is valid - Example: `my.index` is not valid (contains dot) **Qdrant**: Collection names must: - Be 1-255 characters long - Not contain any of these special characters: - `< > : " / \ | ? *` - Null character (`\0`) - Unit separator (`\u{1F}`) - Example: `my_collection_123` is valid - Example: `my/collection` is not valid (contains slash) **Chroma**: Collection names must: - Be 3-63 characters long - Start and end with a letter or number - Contain only letters, numbers, underscores, or hyphens - Not contain consecutive periods (..) - Not be a valid IPv4 address - Example: `my-collection-123` is valid - Example: `my..collection` is not valid (consecutive periods) **Astra**: Collection names must: - Not be empty - Be 48 characters or less - Contain only letters, numbers, and underscores - Example: `my_collection_123` is valid - Example: `my-collection` is not valid (contains hyphen) **libSQL**: Index names must: - Start with a letter or underscore - Contain only letters, numbers, and underscores - Example: `my_index_123` is valid - Example: `my-index` is not valid (contains hyphen) **Upstash**: Namespace names must: - Be 2-100 characters long - Contain only: - Alphanumeric characters (a-z, A-Z, 0-9) - Underscores, hyphens, dots - Not start or end with special characters (\_, -, .) - Can be case-sensitive - Example: `MyNamespace123` is valid - Example: `_namespace` is not valid (starts with underscore) **Cloudflare**: Index names must: - Start with a letter - Be shorter than 32 characters - Contain only lowercase ASCII letters, numbers, and dashes - Use dashes instead of spaces - Example: `my-index-123` is valid - Example: `My_Index` is not valid (uppercase and underscore) **OpenSearch**: Index names must: - Use only lowercase letters - Not begin with underscores or hyphens - Not contain spaces, commas - Not contain special characters (e.g. `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, `<`) - Example: `my-index-123` is valid - Example: `My_Index` is not valid (contains uppercase letters) - Example: `_myindex` is not valid (begins with underscore) **Elasticsearch**: Index names must: - Use only lowercase letters - Not exceed 255 bytes (counting multi-byte characters) - Not begin with underscores, hyphens, or plus signs - Not contain spaces, commas - Not contain special characters (e.g. `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, `<`) - Not be "." or ".." - Not start with "." (deprecated except for system/hidden indices) - Example: `my-index-123` is valid - Example: `My_Index` is not valid (contains uppercase letters) - Example: `_myindex` is not valid (begins with underscore) - Example: `.myindex` is not valid (begins with dot, deprecated) **S3 Vectors**: Index names must: - Be unique within the same vector bucket - Be 3–63 characters long - Use only lowercase letters (`a–z`), numbers (`0–9`), hyphens (`-`), and dots (`.`) - Begin and end with a letter or number - Example: `my-index.123` is valid - Example: `my_index` is not valid (contains underscore) - Example: `-myindex` is not valid (begins with hyphen) - Example: `myindex-` is not valid (ends with hyphen) - Example: `MyIndex` is not valid (contains uppercase letters) ### Upserting Embeddings After creating an index, you can store embeddings along with their basic metadata: ```ts // Store embeddings with their corresponding metadata await store.upsert({ indexName: 'myCollection', // index name vectors: embeddings, // array of embedding vectors metadata: chunks.map(chunk => ({ text: chunk.text, // The original text content id: chunk.id, // Optional unique identifier })), }) ``` The upsert operation: - Takes an array of embedding vectors and their corresponding metadata - Updates existing vectors if they share the same ID - Creates new vectors if they don't exist - Automatically handles batching for large datasets ## Adding metadata Vector stores support rich metadata (any JSON-serializable fields) for filtering and organization. Since metadata is stored with no fixed schema, use consistent field naming to avoid unexpected query results. > **Warning:** Metadata is crucial for vector storage - without it, you'd only have numerical embeddings with no way to return the original text or filter results. Always store at least the source text as metadata. ```ts // Store embeddings with rich metadata for better organization and filtering await store.upsert({ indexName: 'myCollection', vectors: embeddings, metadata: chunks.map(chunk => ({ // Basic content text: chunk.text, id: chunk.id, // Document organization source: chunk.source, category: chunk.category, // Temporal metadata createdAt: new Date().toISOString(), version: '1.0', // Custom fields language: chunk.language, author: chunk.author, confidenceScore: chunk.score, })), }) ``` Key metadata considerations: - Be strict with field naming - inconsistencies like 'category' vs 'Category' will affect queries - Only include fields you plan to filter or sort by - extra fields add overhead - Add timestamps (e.g., 'createdAt', 'lastUpdated') to track content freshness ## Deleting vectors When building RAG applications, you often need to clean up stale vectors when documents are deleted or updated. Mastra provides the `deleteVectors` method that supports deleting vectors by metadata filters, making it straightforward to remove all embeddings associated with a specific document. ### Delete by Metadata Filter The most common use case is deleting all vectors for a specific document when a user deletes it: ```ts // Delete all vectors for a specific document await store.deleteVectors({ indexName: 'myCollection', filter: { docId: 'document-123' }, }) ``` This is particularly useful when: - A user deletes a document and you need to remove all its chunks - You're re-indexing a document and want to remove old vectors first - You need to clean up vectors for a specific user or tenant ### Delete Multiple Documents You can also use complex filters to delete vectors matching multiple conditions: ```ts // Delete all vectors for multiple documents await store.deleteVectors({ indexName: 'myCollection', filter: { docId: { $in: ['doc-1', 'doc-2', 'doc-3'] }, }, }) // Delete vectors for a specific user's documents await store.deleteVectors({ indexName: 'myCollection', filter: { $and: [{ userId: 'user-123' }, { status: 'archived' }], }, }) ``` ### Delete by Vector IDs If you have specific vector IDs to delete, you can pass them directly: ```ts // Delete specific vectors by their IDs await store.deleteVectors({ indexName: 'myCollection', ids: ['vec-1', 'vec-2', 'vec-3'], }) ``` ## Best practices - Create indexes before bulk insertions - Use batch operations for large insertions (the upsert method handles batching automatically) - Only store metadata you'll query against - Match embedding dimensions to your model (e.g., 1536 for `text-embedding-3-small`)