# n8n-nodes-ollama-embeddings
[npm](https://www.npmjs.com/package/n8n-nodes-qwen-embedding)
[MIT License](https://github.com/theseedship/deposium_n8n_embeddings_integration/blob/master/LICENSE)
[n8n Community Nodes](https://n8n.io/integrations)
n8n community nodes for generating text embeddings via Ollama in your n8n workflows. Supports multiple embedding models, including **Qwen**, **EmbeddingGemma**, **Nomic**, and more, for vector stores, similarity search, and AI applications.
## Features
- **Two Specialized Nodes**:
- **Ollama Embeddings**: For vector store integration (Supabase, Qdrant, PGVector, etc.)
- **Ollama Embeddings Tool**: For direct embedding generation in workflows
- **Multi-Model Support**: Works with Qwen, EmbeddingGemma, Nomic, Snowflake and more
- **LangChain Compatible**: Seamlessly integrates with n8n's AI ecosystem
- **Flexible Dimensions**: Auto-detection and validation based on model capabilities
- **Instruction-Aware**: Optimized embeddings for queries vs documents
- **Batch Processing**: Efficient bulk embedding generation
- **Ollama Integration**: Direct connection to Ollama for embedding generation
- **No Middleware Required**: Works directly with Ollama's API endpoints
- **Compact Format Option**: Single-line embeddings for easy copy/paste
## Installation
### In n8n
1. Go to **Settings** > **Community Nodes**
2. Search for `n8n-nodes-qwen-embedding`
3. Click **Install**
### Manual Installation
```bash
npm install n8n-nodes-qwen-embedding
```
## Prerequisites
You need Ollama installed and running with an embedding model:
1. **Install Ollama**: Visit [https://ollama.com](https://ollama.com) for installation instructions
2. **Pull an embedding model**:
```bash
# Qwen3-Embedding models (specialized for embeddings):
ollama pull qwen3-embedding:0.6b # 1024 dimensions, 32K context
# EmbeddingGemma models (Google's lightweight embeddings):
ollama pull embeddinggemma:300m # 768 dimensions, 2K context
ollama pull embeddinggemma:300m-bf16 # Higher precision variant
# Nomic Embed models (performant embeddings):
ollama pull nomic-embed-text # 768 dimensions, 8K context
# Snowflake Arctic Embed:
ollama pull snowflake-arctic-embed:110m # 1024 dimensions
```
3. **Verify Ollama is running**:
```bash
ollama list # Should show your pulled models
```
By default, Ollama listens at `http://localhost:11434`.
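As a quick cross-check outside n8n, you can call the embedding endpoint directly. The following is a minimal sketch, assuming Node.js 18+ (built-in `fetch`, run as an `.mjs` module) and the `qwen3-embedding:0.6b` model pulled above:

```javascript
// Minimal sanity check against Ollama's /api/embed endpoint.
// Assumes Node.js 18+ and that qwen3-embedding:0.6b has been pulled.
const res = await fetch('http://localhost:11434/api/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'qwen3-embedding:0.6b', input: 'hello world' }),
});
const { embeddings } = await res.json();
console.log(embeddings[0].length); // expect 1024 for qwen3-embedding:0.6b
```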
## Supported Models Comparison
| Model | Dimensions | Context | Size | Best For |
|-------|------------|---------|------|----------|
| **qwen3-embedding:0.6b** | 32-1024 (flexible) | 32K tokens | ~639MB | General purpose, multilingual |
| **embeddinggemma:300m** | 128-768 (MRL) | 2K tokens | ~338MB | Lightweight, fast inference |
| **nomic-embed-text** | 768 (fixed) | 8K tokens | ~274MB | Balanced performance |
| **snowflake-arctic-embed** | 1024 (fixed) | 512 tokens | ~332MB | Short text, high precision |
**Choosing a Model:**
- **Qwen**: Best overall flexibility with largest context window (32K)
- **EmbeddingGemma**: Best for resource-constrained environments
- **Nomic**: Good balance of performance and context size
- **Snowflake**: Optimized for short text with high dimensional output
## Setup
### ⚠️ CRITICAL: Ollama URL Configuration
**ALWAYS remove trailing slashes from your Ollama URL!**
```
✅ CORRECT: http://localhost:11434
❌ WRONG:   http://localhost:11434/
```
**Why this matters:** A trailing slash creates a double slash in the API path (`http://host:11434//api/embed`), which can cause the server or an intervening proxy to silently turn the POST request into a GET, resulting in HTTP 405 "Method Not Allowed" errors.
This is the #1 cause of 405 errors with this node. Always verify your Ollama URL format first.
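If you construct the URL in a Code node or script, a one-line normalization avoids the issue entirely (a sketch, not the node's internal code):

```javascript
// Strip any trailing slashes before appending the API path,
// so the request never hits a double-slash URL.
const baseUrl = 'http://localhost:11434/'.replace(/\/+$/, '');
const endpoint = `${baseUrl}/api/embed`; // -> http://localhost:11434/api/embed
```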
### 1. Configure Credentials (Optional)
**For self-hosted Ollama without authentication:** You can skip credential configuration. The node will connect directly to your Ollama instance.
**For authenticated Ollama instances:**
1. In n8n, go to **Credentials** > **New**
2. Select **Qwen Embedding API (Ollama)**
3. Enter:
- **Ollama URL**: `http://localhost:11434` (**NO trailing slash!**)
- **Model Name**: `qwen3-embedding:0.6b` (or your chosen model)
- **API Key**: Your authentication token (if required)
4. **IMPORTANT:** Verify your URL has NO trailing slash before saving
5. Click **Test Connection** to verify
### 2. Using the Nodes
#### Qwen Embedding (Vector Store Integration)
Connect to any vector store node:
```
[Vector Store] → [Qwen Embedding]
   (Stores)        (Provides embeddings)
```
**Use Cases:**
- RAG applications
- Semantic search
- Document indexing
- Knowledge bases
#### Qwen Embedding Tool (Direct Usage)
Use in any workflow for direct embedding generation:
```
[Trigger] → [Qwen Embedding Tool] → [Process/Store]
```
**Use Cases:**
- Similarity calculations
- Text clustering
- Anomaly detection
- Content deduplication
## Configuration Options
### Performance Mode
Controls timeout and retry behavior based on your hardware:
- **Auto-Detect** (default): Automatically detects GPU/CPU on first request
- GPU detected (<1s response): 10s timeout, 2 retries
- CPU detected (>5s response): 60s timeout, 3 retries
- Works great for dynamic environments
- **GPU Optimized**: Manual setting for GPU hardware
- 10 second timeout
- 2 retry attempts
- Best for NVIDIA GPU setups
- **CPU Optimized**: Manual setting for CPU hardware
- 60 second timeout
- 3 retry attempts
- Prevents timeout errors on CPU-only systems
- **Custom**: User-defined timeout and retry settings
- Set your own timeout (in milliseconds)
- Configure max retry attempts (0-5)
**How Auto-Detection Works:**
```
First request measures actual response time:
- Response < 1s → GPU detected → timeout = 10s
- Response > 5s → CPU detected → timeout = 60s
- 1s ≤ response ≤ 5s → keep default 30s
```
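Expressed as code, the heuristic above is essentially the following (illustrative only; the function name is hypothetical, not the node's actual internals):

```javascript
// Map a measured response time to a timeout, per the heuristic above.
function timeoutForResponse(responseMs) {
  if (responseMs < 1000) return 10_000; // GPU detected
  if (responseMs > 5000) return 60_000; // CPU detected
  return 30_000;                        // inconclusive: keep the 30s default
}
```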
### Dimensions
Adjust embedding vector size based on model capabilities:
- **Auto-detect (0)**: Uses optimal dimensions for the selected model
- **Manual adjustment**: Set specific dimensions (validated per model)
- Qwen: 32-1024 (flexible via MRL)
- EmbeddingGemma: 128-768 (flexible via MRL)
- Nomic: 768 (fixed)
- Snowflake: 1024 (fixed)
**Implementation:** Models with MRL support (Qwen, EmbeddingGemma) can adjust dimensions without retraining. Fixed-dimension models will show a warning if you try to adjust.
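Conceptually, the validation is a per-model range check, sketched below with the ranges from the comparison table (keys and helper name are illustrative, not the node's actual internals):

```javascript
// Illustrative per-model dimension ranges (from the comparison table).
const DIMENSION_RANGES = {
  'qwen3-embedding:0.6b':   { min: 32,   max: 1024 }, // MRL: flexible
  'embeddinggemma:300m':    { min: 128,  max: 768 },  // MRL: flexible
  'nomic-embed-text':       { min: 768,  max: 768 },  // fixed
  'snowflake-arctic-embed': { min: 1024, max: 1024 }, // fixed
};

function clampDimensions(model, requested) {
  const range = DIMENSION_RANGES[model];
  if (!range) return requested; // unknown model: pass through
  if (requested < range.min || requested > range.max) {
    console.warn(`${model} supports ${range.min}-${range.max} dimensions`);
  }
  return Math.min(Math.max(requested, range.min), range.max);
}
```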
### Instruction Type
Optimize embeddings for specific use cases:
- **None** (default): Standard embeddings without special instructions
- **Query**: Optimized for search queries
- Prefix: `"Instruct: Retrieve semantically similar text.\nQuery: "`
- Use for: User questions, search inputs
- **Document**: Optimized for document storage
- Prefix: `"Instruct: Represent this document for retrieval.\nDocument: "`
- Use for: Indexing documents, knowledge base entries
**Performance Impact:** 1-5% better semantic matching when query/document types match their use case.
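Applying an instruction type amounts to prepending the prefix shown above (an illustrative sketch):

```javascript
// Prepend the instruction prefix that matches the selected type.
const INSTRUCTION_PREFIXES = {
  query: 'Instruct: Retrieve semantically similar text.\nQuery: ',
  document: 'Instruct: Represent this document for retrieval.\nDocument: ',
};

function applyInstruction(text, type) {
  return (INSTRUCTION_PREFIXES[type] ?? '') + text; // "None" passes through unchanged
}
```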
### Context Prefix
Add custom context to all texts before embedding:
**Example:**
```
Context Prefix: "Medical context:"
Input text: "patient symptoms include fever"
Embedded as: "Medical context: patient symptoms include fever"
```
**Use Cases:**
- Domain-specific context (legal, medical, technical)
- Language hints
- Task-specific framing
### Return Format (Tool Only)
Controls output structure:
- **Full** (default):
```json
{
"embedding": [0.123, 0.456, ...],
"dimensions": 1024,
"text": "original text",
"model": "qwen3-embedding:0.6b",
"metadata": { ... } // if includeMetadata enabled
}
```
- **Simplified**:
```json
{
"text": "original text",
"vector": [0.123, 0.456, ...],
"dimensions": 1024
}
```
- **Embedding Only**:
```json
{
"embedding": [0.123, 0.456, ...]
}
```
### Include Metadata (Tool Only)
When enabled, adds processing metadata to output:
```json
{
"metadata": {
"prefix": "Medical context:", // Context prefix used
"instruction": "query", // Instruction type applied
"timestamp": "2025-10-07T19:44:23Z", // Processing time
"batchSize": 5 // Number of texts (batch mode only)
}
}
```
**Use Cases:**
- Debugging embedding configurations
- Tracking when embeddings were generated
- Auditing processing parameters
### Custom Timeout & Max Retries (Custom Mode Only)
Fine-tune request behavior:
- **Custom Timeout**: Milliseconds to wait before timeout (default: 30000)
- **Max Retries**: Number of retry attempts on failure (0-5, default: 2)
**Retry Logic:**
- Exponential backoff: 1s → 2s → 4s → 5s (capped)
- Clear console logging for debugging
- Automatic retry on transient failures
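The schedule above corresponds to a capped exponential backoff, roughly like this sketch (not the node's exact implementation):

```javascript
// Retry with exponential backoff: 1s -> 2s -> 4s -> 5s (capped at 5s).
async function withRetries(fn, maxRetries = 2) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = Math.min(1000 * 2 ** attempt, 5000);
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```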
### Operation (Tool Only)
- **Generate Embedding**: Single text embedding
- **Generate Batch Embeddings**: Multiple texts in one request
## Examples
### Example 1: Vector Store with RAG
```
1. Add "Supabase Vector Store" node
2. Add "Qwen Embedding" node
3. Connect Qwen to Vector Store's embedding input
4. Configure your collection and start indexing
```
### Example 2: Semantic Search
```
1. Add "Manual Trigger" node
2. Add "Qwen Embedding Tool" node (set to "Generate Embedding")
3. Add another "Qwen Embedding Tool" for documents
4. Add "Code" node to calculate similarities
```
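For step 4, the "Code" node can score the two vectors with cosine similarity (a minimal sketch; assumes both embeddings were generated with the same dimensions):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
// 1.0 = same direction, 0 = unrelated, -1 = opposite
```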
### Example 3: Batch Processing
```javascript
// Input: Array of texts
{
"texts": [
"First document",
"Second document",
"Third document"
]
}
// Qwen Embedding Tool (Batch mode) output:
{
"embeddings": [[...], [...], [...]],
"count": 3,
"dimensions": 1024
}
```
## Technical Details
### Model Information
**Qwen Models:**
- `qwen3-embedding:0.6b` - 1024d, 32K context, multilingual (100+ languages)
- MRL support for flexible dimensions (32-1024)
**EmbeddingGemma Models:**
- `embeddinggemma:300m` - 768d, 2K context, lightweight
- MRL support for flexible dimensions (128-768)
- Variants: bf16, q8, q4 for different precision/size tradeoffs
**Nomic Models:**
- `nomic-embed-text` - 768d, 8K context, balanced performance
- Fixed dimensions (no MRL)
**Snowflake Models:**
- `snowflake-arctic-embed` - 1024d, 512-token context, high precision
- Fixed dimensions (no MRL)
### Performance Characteristics
- **GPU Mode**: ~200-270ms per embedding (NVIDIA GPU)
- **CPU Mode**: 5-10 seconds per embedding
- **Auto-Detection**: Automatically adjusts timeouts based on hardware
- **Query Optimization**: 1-5% performance boost with instruction type
- **Batch Processing**: Processes texts sequentially (Ollama API limitation)
- **Timeout**: Adaptive (10s GPU, 60s CPU, or 30s default)
### API Compatibility
- **Ollama API**: `/api/embed` endpoint (POST method)
- **Request Format**: `{model: string, input: string | string[]}`
- **Response Format**: `{embeddings: number[][]}`
- **Authentication**: Optional (credentials not required for self-hosted)
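For reference, a raw batch call against the endpoint looks like this (a sketch assuming Node.js 18+; the nodes handle this for you):

```javascript
// POST /api/embed with an array input returns one vector per text.
const res = await fetch('http://localhost:11434/api/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'qwen3-embedding:0.6b',
    input: ['First document', 'Second document'],
  }),
});
const { embeddings } = await res.json(); // number[][], one row per input
```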
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [Qwen Team](https://github.com/QwenLM) for the amazing embedding models
- [n8n Community](https://community.n8n.io/) for support and feedback
- [LangChain](https://github.com/langchain-ai/langchainjs) for the embedding interface
## Troubleshooting
### HTTP 405 "Method Not Allowed" Errors
**Most Common Cause (90% of cases):** Trailing slash in Ollama URL configuration.
```bash
# Check your credential configuration:
✅ Correct: http://localhost:11434
❌ Wrong:   http://localhost:11434/
```
**Quick Fix:**
1. Go to n8n **Credentials**
2. Edit your Ollama credential
3. Remove the trailing slash from the URL
4. Save and test again
For other 405 error causes, see the comprehensive [HTTP 405 Troubleshooting Guide](docs/HTTP_405_TROUBLESHOOTING.md) which covers:
- URL formatting issues (trailing slashes)
- n8n authentication system issues
- Working patterns vs anti-patterns
- Step-by-step verification
### Model Not Found Errors
```bash
# Pull the correct model first:
ollama pull qwen3-embedding:0.6b
# Verify it's available:
ollama list
```
### Performance Issues
- **CPU mode timing out:** Use Performance Mode "CPU Optimized" or "Auto-Detect"
- **GPU not detected:** Ensure NVIDIA drivers and Docker GPU runtime are configured
- **Slow responses:** Check Ollama is running with `ollama list`
## Links
- [NPM Package](https://www.npmjs.com/package/n8n-nodes-qwen-embedding)
- [GitHub Repository](https://github.com/theseedship/deposium_n8n_embeddings_integration)
- [Troubleshooting Guide](docs/HTTP_405_TROUBLESHOOTING.md)
- [n8n Community Nodes](https://n8n.io/integrations)
- [Qwen3-Embedding Paper](https://arxiv.org/abs/2411.00156)
- [Ollama Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings)
## Support
For issues and questions:
- [GitHub Issues](https://github.com/theseedship/deposium_n8n_embeddings_integration/issues)
- [n8n Community Forum](https://community.n8n.io/)
---
Made with ❤️ for the n8n community