@bratcliffe909/mcp-server-segmind
Version:
Model Context Protocol server for Segmind API - Generate images and videos using AI models
607 lines (476 loc) • 17.5 kB
Markdown
# Segmind MCP User Guide
This guide will help you use the Segmind MCP server with your AI assistant to generate images, videos, and more.
## Table of Contents
- [Installation](#installation)
- [Configuration](#configuration)
- [Available Models](#available-models)
- [Using the Tools](#using-the-tools)
- [Prompt Examples](#prompt-examples)
- [Tips and Best Practices](#tips-and-best-practices)
## Installation
### Method 1: Zero Install (Recommended)
No installation needed! Just configure your MCP client.
#### For Claude Desktop:
Edit your configuration file:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
Add the configuration:
```json
{
"mcpServers": {
"segmind": {
"command": "npx",
"args": ["-y", "@bratcliffe909/mcp-server-segmind@latest"],
"env": {
"SEGMIND_API_KEY": "your_api_key_here"
}
}
}
}
```
#### For Claude Code:
Use the command line:
```bash
claude mcp add segmind -e SEGMIND_API_KEY=your_api_key_here -- npx -y @bratcliffe909/mcp-server-segmind@latest
```
For user scope (available across all projects):
```bash
claude mcp add segmind -s user -e SEGMIND_API_KEY=your_api_key_here -- npx -y @bratcliffe909/mcp-server-segmind@latest
```
The package will be automatically downloaded when first used.
### Method 2: Global Installation
For faster startup times, install globally:
```bash
npm install -g @bratcliffe909/mcp-server-segmind
```
#### Claude Desktop configuration:
```json
{
"mcpServers": {
"segmind": {
"command": "mcp-server-segmind",
"env": {
"SEGMIND_API_KEY": "your_api_key_here"
}
}
}
}
```
#### Claude Code command:
```bash
claude mcp add segmind -e SEGMIND_API_KEY=your_api_key_here -- mcp-server-segmind
```
### Getting Your API Key
1. Sign up at [segmind.com](https://segmind.com)
2. Go to your dashboard
3. Copy your API key (starts with `sg_`)
4. Add it to the configuration above
## Available Models
### Text-to-Image Models (4 available)
| Model ID | Model Name | Best For | Speed | Quality |
|----------|------------|----------|--------|---------|
| `sdxl` | Stable Diffusion XL | High-quality general images | Medium | High |
| `sdxl-lightning` | SDXL Lightning | Fast generation | Fast | High |
| `fooocus` | Fooocus | Advanced artistic control | Slow | Very High |
| `ssd-1b` | SSD-1B | Efficient generation | Fast | Good |
### Image-to-Image Models (1 available)
| Model ID | Model Name | Best For |
|----------|------------|----------|
| `sd15-img2img` | SD 1.5 Image-to-Image | Style transfer, image editing |
### Enhancement Models (2 available)
| Model ID | Model Name | Best For |
|----------|------------|----------|
| `esrgan` | ESRGAN | Upscaling images 2x-4x |
| `codeformer` | CodeFormer | Face restoration and enhancement |
### Video Generation Models (2 available)
| Model ID | Model Name | Best For | Duration |
|----------|------------|----------|----------|
| `veo-3` | Google Veo 3 | Cinematic quality videos | Auto |
| `seedance-v1-lite` | Seedance V1 Lite | Quick social media videos | 5-10s |
### Text-to-Speech Models (2 available)
| Model ID | Model Name | Best For | Features |
|----------|------------|----------|----------|
| `dia-tts` | Dia TTS | Multi-speaker dialogues | Emotions, nonverbal cues |
| `orpheus-tts` | Orpheus TTS 3B | Natural conversation | Emotion tags |
### Music Generation Models (2 available)
| Model ID | Model Name | Best For | Duration |
|----------|------------|----------|----------|
| `lyria-2` | Lyria 2 | Instrumental music | Auto |
| `minimax-music` | Minimax Music-01 | Songs with vocals | 10-60s |
## Using the Tools
### Generate Image
Basic usage:
```
"Generate an image of a sunset over mountains"
```
With options:
```
"Create a portrait of a robot using sdxl model with high quality"
"Generate 3 variations of a logo with seed 12345"
"Make a 16:9 widescreen image of a space station"
```
### Transform Image
**Important**: Local images must be converted to base64 first!
For local images (two-step process):
```
Step 1: "Read my image at C:\Users\YourName\Pictures\photo.jpg"
Step 2: "Transform that image into a watercolor painting"
```
Or drag & drop an image into Claude, then:
```
"Transform this image into a watercolor painting"
```
With options:
```
"Apply anime style to this photo with strength 0.5"
"Transform to oil painting style using sd15-img2img"
```
### Enhance Image
**Important**: Local images must be converted to base64 first!
For local images (two-step process):
```
Step 1: "Read image: C:\path\to\old-photo.jpg"
Step 2: "Enhance that photo and upscale it"
```
Or drag & drop an image into Claude, then:
```
"Upscale this image to 4K"
"Enhance this old photo"
```
With options:
```
"Upscale by 4x using esrgan"
"Restore faces in this photo using codeformer"
```
### Generate Video
Basic usage:
```
"Create a 5-second video of ocean waves"
"Generate a video of a butterfly emerging from cocoon"
```
With options:
```
"Create a cinematic video of a futuristic city using veo-3"
"Generate a 10-second video in 9:16 format using seedance-v1-lite"
"Make a time-lapse video of flowers blooming"
```
### Generate Speech (Text-to-Speech)
Basic usage:
```
"Convert this text to speech: Hello, welcome to our service"
"Create an audiobook narration of this paragraph"
```
With dialogue and emotions:
```
"Generate dialogue: [S1] Hello! <laugh> [S2] Hi there! How are you?"
"Create speech with emotions: I'm so excited! <gasp> This is amazing!"
"Use orpheus-tts with voice dan for natural conversation"
```
For longer audio (beyond default ~14 seconds):
```
"Generate speech with max_new_tokens 2000: [long text here]"
"Use dia-tts with max_new_tokens 4096 for maximum length narration"
"Create audiobook chapter with orpheus-tts using max_new_tokens 2000"
```
For custom speech characteristics:
```
"Generate slow narration with speed_factor 0.5 and temperature 0.5"
"Create expressive dialogue with temperature 1.5 and top_p 0.8"
"Use orpheus-tts with voice emma and repetition_penalty 1.5"
"Generate consistent speech with top_p 0.3 and cfg_scale 4"
```
Advanced TTS examples:
```
"Use dia-tts with speed_factor 0.7 for slower, clearer pronunciation"
"Create dramatic reading with temperature 1.8 and cfg_scale 3"
"Generate speech with voice cloning using input_audio [base64 audio]"
"Multi-speaker dialogue: [S1] Hello! [S2] Hi there! with cfg_scale 5"
```
### Generate Music
Basic usage:
```
"Create relaxing piano music"
"Generate upbeat electronic music"
```
With specific requirements:
```
"Create 30 seconds of jazz music using lyria-2"
"Generate a pop song with vocals about summer using minimax-music"
"Make instrumental background music for a video, peaceful and ambient"
```
## Prompt Examples
### High-Quality Portraits
```
"Generate a professional headshot of a businesswoman, studio lighting, confident smile, using sdxl model with high quality"
```
### Artistic Landscapes
```
"Create a dreamy landscape with floating islands, sunset colors, fantasy art style, using fooocus model"
```
### Quick Concepts
```
"Generate a simple icon of a coffee cup using sdxl-lightning for fast results"
```
### Logo Design
```
"Create a modern minimalist logo for a tech startup, geometric shapes, blue and white colors, generate 4 variations"
```
### Saving Images Directly
```
# For Claude Code - save image directly
"Generate a product photo of a sleek laptop, display_mode='save'"
# For viewing and saving
"Create an avatar portrait, display_mode='both'"
```
### Photo Enhancement
```
"Upscale this product photo by 4x and enhance details using esrgan"
```
### Style Transfer
```
"Transform this photo into a Van Gogh style painting with medium strength using sd15-img2img"
```
### Video Creation
```
"Create a cinematic video of a space station orbiting Earth, dramatic lighting, 4K quality using veo-3"
```
### Multi-Speaker Dialogue
```
"Generate conversation: [S1] Welcome to our podcast! (laughs) [S2] Thanks for having me! [S1] Let's dive right in..."
```
### Controlling Speech Pace
```
"Generate slow deliberate speech: Ladies and gentlemen... (pauses) What I'm about to tell you... (breathes deeply) will change everything."
"Create dramatic narration with speed_factor 0.7: Once upon a time, in a land far, far away..."
"Generate quick announcement with speed_factor 1.2: Attention passengers, the train is now departing!"
```
### Background Music
```
"Create 60 seconds of uplifting corporate background music with piano and strings using lyria-2"
```
## Tips and Best Practices
### 1. Model Selection
**Images**:
- **For speed**: Use `sdxl-lightning` or `ssd-1b`
- **For quality**: Use `sdxl` or `fooocus`
- **For faces**: Use `codeformer` for enhancement
- **For upscaling**: Use `esrgan`
**Videos**:
- **For quality**: Use `veo-3` (includes audio)
- **For speed**: Use `seedance-v1-lite`
**Audio**:
- **For dialogues**: Use `dia-tts` with speaker tags
- **For natural speech**: Use `orpheus-tts`
- **For speed control**: Use `dia-tts` (orpheus-tts does NOT support speed_factor)
- **For voice cloning**: Use `dia-tts` with input_audio
- **For music**: Use `lyria-2` (instrumental) or `minimax-music` (with vocals)
**Controlling TTS Parameters**:
**Audio Length**:
- TTS models use token limits to control audio length
- `orpheus-tts`: Default 1200 tokens (~14 seconds), max 2000 tokens (~23 seconds)
- `dia-tts`: Default 3072 tokens (~35 seconds), max 4096 tokens (~47 seconds)
- To generate longer audio, specify `max_new_tokens`: "Generate speech with max_new_tokens 2000"
**Speech Speed and Pacing**:
**Using speed_factor (dia-tts only)**:
- Control overall playback speed with `speed_factor` (0.5-1.5)
- Default: 0.94 (normal conversational speed)
- **Recommended values**:
- 0.5-0.7 = Very slow (for emphasis or clarity)
- 0.8-0.9 = Slightly slower than normal
- 0.94 = Normal speed (default)
- 1.0-1.1 = Slightly faster than normal
- 1.2-1.5 = Fast speech
- **Note**: speed_factor may also affect pitch/prosody, not just tempo
**Using Text Markup for Natural Pacing**:
- **Punctuation**:
- Period (.) = Natural sentence pause
- Comma (,) = Brief pause
- Ellipsis (...) = Longer dramatic pause
- Em-dash (—) = Medium pause with emphasis
- **Non-verbal cues (in parentheses)**:
- `(pauses)` = Explicit pause
- `(sighs)` = Natural sigh with pause
- `(hesitates)` = Natural hesitation
- `(breathes deeply)` = Audible breathing
- `(thinks)` = Thoughtful pause
- **Examples**:
```
"Generate speech: Hello everyone. (pauses) Today's topic is... (hesitates) quite sensitive."
"Create narration: The door creaked open — (pauses) — revealing nothing but darkness..."
"Generate with speed_factor 0.8: Please listen carefully to these safety instructions."
```
**Voice Quality Parameters**:
- `temperature`: Controls expressiveness (0.1-2.0)
- 0.1-0.5: Stable, consistent speech
- 0.6-1.0: Natural conversational tone
- 1.1-2.0: Expressive, dramatic voices
- `top_p`: Controls word variety (0.1-1.0)
- Lower values: More predictable speech
- Higher values: More varied vocabulary
**Model-Specific Features**:
- **Orpheus TTS**:
- Voices: tara, dan (default), josh, emma
- `repetition_penalty` (1.0-2.0): Prevents repeated phrases
- **Does NOT support**: speed_factor, cfg_scale, cfg_filter_top_k, input_audio
- **Dia TTS**:
- `speed_factor` (0.5-1.5): Speed control (ONLY available in dia-tts!)
- `cfg_scale` (1-5): How strictly to follow text
- `cfg_filter_top_k` (10-100): Token diversity
- `input_audio`: Voice cloning from audio file
- Multi-speaker support with [S1], [S2] tags
- **Note**: Dia-tts is automatically selected when using speed_factor
### 2. Prompt Writing
**Be Specific**: Include details about style, lighting, colors, and composition
```
Good: "A cozy coffee shop interior, warm lighting, wooden furniture, plants, morning sunlight through windows"
Bad: "A coffee shop"
```
**Include Quality Terms**: Add terms that improve output quality
```
"photorealistic", "high quality", "professional", "detailed", "sharp focus"
```
**Use Negative Prompts**: Specify what to avoid
```
"Generate a portrait... avoid: blurry, low quality, distorted features"
```
### 3. Image Dimensions
- Default: 1024x1024 (square)
- Portrait: 768x1024
- Landscape: 1024x768
- Widescreen: 1920x1080 (16:9)
- Social Media: 1080x1080 (Instagram), 1200x630 (Facebook)
### 4. Seed Usage
Use seeds for reproducible results:
```
"Generate a logo with seed 42"
"Create another variation with the same seed"
```
### 5. Managing Credits
- Text-to-image: ~0.2-0.4 credits per image
- Enhancement: ~0.2 credits per operation
- Video generation: ~0.45-2.0 credits per video
- Text-to-speech: ~0.1-0.15 credits per generation
- Music generation: ~0.5-0.8 credits per track
- Generate fewer outputs to save credits
- Use lower resolution/duration for drafts
### 6. File Output (Claude Desktop)
Since Claude Desktop cannot display images from MCP servers, all generated images are automatically saved to your local filesystem.
**Default Behavior:**
```
"Generate an image of a sunset"
// Result: Image saved to: /tmp/sdxl-1705783456789.png
```
**Setting a Default Save Location:**
Add this to your MCP config:
```json
{
"env": {
"FILE_OUTPUT_LOCATION": "/Users/me/Pictures/AI"
}
}
```
**Override Save Location Per Request:**
```
"Generate a logo and save to /Users/me/Desktop"
// Result: Image saved to: /Users/me/Desktop/sdxl-1705783456790.png
"Create an avatar, save to ~/Downloads"
// Result: Image saved to: /home/user/Downloads/sdxl-1705783456791.png
```
The filename format is simple: `{model}-{timestamp}.{extension}`
### 7. Cost Estimation
Use the `estimate_cost` tool to check credit usage before generating. The system now tracks actual costs from your usage and provides more accurate estimates over time.
```
"Estimate the cost of generating 5 images with sdxl"
"Show me the cost for all text-to-image models"
"What would it cost to generate a 30-second video?"
"List all model costs"
```
**Dynamic Cost Learning:**
- Costs marked with * are based on your actual usage history
- Estimates improve as you use the models more
- Shows average, min, and max costs when available
- Falls back to estimates for unused models
### 8. Common Issues
**"Model not found"**: Check the model ID spelling
**"Invalid dimensions"**: Use multiples of 8 (256, 512, 768, 1024, etc.)
**"API key error"**: Verify your API key in the config
**"Rate limit"**: Wait a moment between requests
**"Images not displaying"**: Use display_mode options or file attachments
## Advanced Usage
### Batch Generation
```
"Generate 4 different logo concepts for a fitness brand, each with unique style"
```
### Precise Control
```
"Create an image with exact dimensions 512x768, guidance scale 7.5, 30 inference steps"
```
### Style Mixing
```
"Transform this photo: 30% oil painting, keep original colors, subtle brush strokes"
```
### Professional Workflows
**Image Production**:
1. Generate rough concepts with `sdxl-lightning`
2. Refine the best one with `fooocus`
3. Upscale final result with `esrgan`
4. Enhance faces if needed with `codeformer`
**Video Production**:
1. Create storyboard images with `sdxl`
2. Generate video clips with `seedance-v1-lite`
3. Create final cinematic version with `veo-3`
**Audio Production**:
1. Generate dialogue with `dia-tts` for multi-speakers
2. Create background music with `lyria-2`
3. Add vocal tracks with `minimax-music`
### Creative Combinations
**Animated Storytelling**:
```
"First, generate a character portrait with sdxl"
"Then, create a video of the character with veo-3"
"Finally, add narration with orpheus-tts"
```
**Music Video Creation**:
```
"Generate visuals with seedance-v1-lite"
"Create matching music with minimax-music"
```
## Troubleshooting
### Saving Generated Images
The Segmind MCP server now supports a `display_mode` parameter for all image generation tools:
**Display Modes**:
- `display` (default): Shows the image in your interface
- `save`: Returns base64 data as text for easy saving
- `both`: Shows the image AND provides base64 data
**Examples**:
```
# View the image (default)
"Generate a sunset landscape"
# Get base64 data for saving
"Generate a sunset landscape with display_mode='save'"
# Both view and save
"Generate a sunset landscape with display_mode='both'"
```
**For Claude Desktop users**: Use `display` mode to see images
**For Claude Code users**: Use `save` mode to get base64 data for direct file operations
**When using save mode**:
- The base64 data is returned between `BASE64_IMAGE_START` and `BASE64_IMAGE_END` markers
- Extract this data and decode it to save as an image file
- The response includes MIME type and suggested file extension
### API Key Issues
- Ensure key starts with `sg_`
- Check for extra spaces
- Verify key is active on Segmind dashboard
### Generation Failures
- Try simpler prompts first
- Use default settings
- Check your credit balance
### Quality Issues
- Add quality terms to prompt
- Use negative prompts
- Try different models
- Adjust guidance scale
## Getting Help
- Check the [README](../README.md) for basic setup
- Visit [Segmind Docs](https://docs.segmind.com) for API details
- Report issues on [GitHub](https://github.com/yourusername/segmind-mcp/issues)