@bratcliffe909/mcp-server-segmind
Version:
Model Context Protocol server for Segmind API - Generate images and videos using AI models
363 lines (285 loc) • 14.9 kB
Markdown
# Model Parameters Reference
This document provides a comprehensive reference for all parameters available in each Segmind model.
## Table of Contents
- [Text-to-Image Models](#text-to-image-models)
- [SDXL](#sdxl-stable-diffusion-xl)
- [SDXL Lightning](#sdxl-lightning)
- [Fooocus](#fooocus)
- [SSD-1B](#ssd-1b)
- [Image-to-Image Models](#image-to-image-models)
- [SD 1.5 Image-to-Image](#sd-15-image-to-image)
- [Image Enhancement](#image-enhancement)
- [ESRGAN](#esrgan)
- [CodeFormer](#codeformer)
- [Video Generation](#video-generation)
- [Veo-3](#veo-3)
- [Seedance V1 Lite](#seedance-v1-lite)
- [Text-to-Speech](#text-to-speech)
- [Dia TTS](#dia-tts)
- [Orpheus TTS](#orpheus-tts)
- [Music Generation](#music-generation)
- [Lyria 2](#lyria-2)
- [Minimax Music](#minimax-music)
---
## Text-to-Image Models
### SDXL (Stable Diffusion XL)
High-quality image generation with SDXL 1.0.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Text description of the image to generate |
| negative_prompt | string | optional | - | What to avoid in the generated image |
| img_width | number | 1024 | 256-2048 (multiples of 8) | Image width in pixels |
| img_height | number | 1024 | 256-2048 (multiples of 8) | Image height in pixels |
| samples | number | 1 | 1-4 | Number of images to generate |
| guidance_scale | number | 7.5 | 1-20 | Controls prompt adherence (higher = stricter) |
| num_inference_steps | number | 25 | 1-100 | Number of denoising steps (more = higher quality, slower) |
| seed | number | optional | any integer | Random seed for reproducible results |
| scheduler | string | DDIM | See below | Sampling algorithm |
**Available Schedulers:**
- DDIM (default) - Fast, good quality
- DPMSolverMultistep - High quality, slower
- PNDM - Good balance
- EulerDiscreteScheduler - Fast convergence
- Others may be available
### SDXL Lightning
Fast high-quality generation - optimized for speed.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Text description of the image to generate |
| negative_prompt | string | optional | - | What to avoid in the generated image |
| img_width | number | 512 | 256-2048 (multiples of 8) | Image width in pixels |
| img_height | number | 512 | 256-2048 (multiples of 8) | Image height in pixels |
| samples | number | 1 | 1-4 | Number of images to generate |
| guidance_scale | number | 2 | 1-20 | Prompt strength (lower values work better for Lightning) |
| num_inference_steps | number | 8 | 1-20 | Optimized for 8 steps |
| seed | number | optional | any integer | Random seed for reproducible results |
### Fooocus
Advanced generation with style presets and enhancement options.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Text description of the image to generate |
| negative_prompt | string | optional | - | What to avoid in the generated image |
| styles | string | optional | See below | Comma-separated list of style presets |
| aspect_ratio | string | 1024×1024 | See below | Image dimensions preset |
| image_number | number | 1 | 1-4 | Number of images to generate |
| image_seed | number | optional | any integer | Random seed for reproducible results |
| sharpness | number | 2 | 0-30 | Image sharpness enhancement |
| guidance_scale | number | 7 | 1-30 | Prompt adherence strength |
| base_model | string | optional | See below | Base model selection |
| refiner_model | string | optional | - | Refiner model for quality enhancement |
| loras | string | optional | - | LoRA model configurations as JSON string |
**Available Styles:** (Common examples)
- Fooocus V2, Fooocus Enhance, Fooocus Sharp
- SAI Anime, SAI Digital Art, SAI Photographic
- Artstyle Impressionist, Artstyle Watercolor
- Dark Fantasy, Steampunk, Cyberpunk
- Many more available - experiment!
**Aspect Ratios:**
- 704×1408, 704×1344, 768×1344, 768×1280
- 832×1216, 832×1152, 896×1152, 896×1088
- 960×1088, 960×1024, 1024×1024 (default)
- 1024×960, 1088×960, 1088×896, 1152×896
- 1152×832, 1216×832, 1280×768, 1344×768
- 1344×704, 1408×704, 1472×704, 1536×640
- 1600×640, 1664×576, 1728×576
**Base Models:**
- juggernautXL_version6Rundiffusion
- sd_xl_base_1.0_0.9vae
- sd_xl_refiner_1.0_0.9vae
- Others may be available
### SSD-1B
Efficient billion-parameter model for fast generation.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Text description of the image to generate |
| negative_prompt | string | optional | - | What to avoid in the generated image |
| img_width | number | 1024 | 256-2048 (multiples of 8) | Image width in pixels |
| img_height | number | 1024 | 256-2048 (multiples of 8) | Image height in pixels |
| samples | number | 1 | 1-4 | Number of images to generate |
| guidance_scale | number | 7.5 | 1-20 | Prompt adherence strength |
| num_inference_steps | number | 25 | 1-100 | Number of denoising steps |
| seed | number | optional | any integer | Random seed for reproducible results |
| scheduler | string | DDIM | Various | Sampling algorithm |
---
## Image-to-Image Models
### SD 1.5 Image-to-Image
Transform existing images using Stable Diffusion 1.5.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| image | string | required | base64/URL | Input image to transform |
| prompt | string | required | - | Description of desired transformation |
| negative_prompt | string | optional | - | What to avoid in the transformation |
| width | number | 512 | 64-1024 (multiples of 8) | Output image width |
| height | number | 512 | 64-1024 (multiples of 8) | Output image height |
| samples | number | 1 | 1-4 | Number of variations to generate |
| guidance_scale | number | 7.5 | 0.1-30 | How closely to follow the prompt |
| num_inference_steps | number | 25 | 10-100 | Number of denoising steps |
| strength | number | 0.75 | 0-1 | Transformation strength (0=no change, 1=complete) |
| seed | number | optional | any integer | Random seed for reproducible results |
| scheduler | string | DDIM | Various | Sampling algorithm |
---
## Image Enhancement
### ESRGAN
AI-powered image upscaling and enhancement.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| image | string | required | base64/URL | Input image to enhance |
| scale | number | 4 | 2, 4 | Upscaling factor |
| face_enhance | boolean | false | true/false | Enhance faces specifically |
### CodeFormer
AI face restoration and enhancement.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| image | string | required | base64/URL | Input image with faces to restore |
| fidelity | number | 0.5 | 0-1 | Balance between quality and identity (0=quality, 1=identity) |
| background_enhance | boolean | false | true/false | Enhance non-face areas |
| face_upsample | boolean | false | true/false | Upscale restored faces |
| upscale | number | 2 | 1-4 | Overall upscaling factor |
---
## Video Generation
### Veo-3
Google's advanced text-to-video with cinematic quality and audio synthesis.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | max 2000 chars | Detailed scene description |
| duration | number | 10 | 5-10 | Video length in seconds |
| aspect_ratio | string | 16:9 | 16:9, 9:16, 1:1 | Video dimensions |
| resolution | string | 720p | 480p, 720p, 1080p | Video quality |
| fps | number | 24 | 24, 30 | Frames per second |
| motion | string | auto | auto, slow, medium, fast | Camera/scene motion |
| style | string | optional | - | Visual style modifier |
| seed | number | optional | any integer | Random seed |
**Note:** Veo-3 uses 2.0 credits per generation and includes audio synthesis.
### Seedance V1 Lite
Fast multi-shot video generation with dynamic scenes.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Scene descriptions (supports multi-shot with \| separator) |
| duration | number | 5 | 5-10 | Video length in seconds |
| aspect_ratio | string | 16:9 | 16:9, 9:16, 1:1, 4:3 | Video dimensions |
| resolution | string | 720p | 480p, 720p | Video quality |
| seed | number | optional | any integer | Random seed for reproducibility |
---
## Text-to-Speech
### Dia TTS
Ultra-realistic multi-speaker dialogue with emotions and nonverbal cues.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| text | string | required | - | Text with speaker tags [S1], [S2], emotions, and cues |
| speed_factor | number | 0.94 | 0.5-1.5 | Playback speed (0.94=normal, <0.94=slower, >0.94=faster) |
| top_p | number | 0.95 | 0.1-1 | Controls word variety (higher = rarer words) |
| cfg_scale | number | 4 | 1-5 | How strictly to follow text (higher = more accurate) |
| temperature | number | 1.3 | 0.1-2 | Controls randomness (higher = more variety) |
| input_audio | string | optional | base64 | Base64 audio for voice cloning (.wav, .mp3, .flac) |
| max_new_tokens | number | 3072 | 500-4096 | Controls audio length (higher = longer audio) |
| cfg_filter_top_k | number | 35 | 10-100 | Filters audio tokens (higher = more diverse sounds) |
| seed | number | optional | any integer | Random seed for reproducible results |
**Speaker Format:** `[S1] Hello! [S2] Hi there!`
**Controlling Speech Pace and Delivery:**
1. **Natural Pacing with Punctuation:**
- Periods (.) create natural sentence pauses
- Commas (,) add brief pauses
- Ellipsis (...) creates longer dramatic pauses
- Em-dashes (—) create medium pauses with emphasis
2. **Non-verbal Cues (in parentheses):**
- `(pauses)` - Insert explicit pause
- `(sighs)` - Natural sigh with pause
- `(laughs)` - Laughter sound
- `(clears throat)` - Throat clearing
- `(breathes deeply)` - Audible breathing
- `(hesitates)` - Natural hesitation
- `(thinks)` - Thoughtful pause
3. **Speed Control:**
- `speed_factor`: Affects overall playback speed
- 0.5-0.8: Significantly slower speech
- 0.8-0.9: Slightly slower than normal
- 0.94: Normal conversational speed (default)
- 1.0-1.1: Slightly faster than normal
- 1.1-1.5: Significantly faster speech
- Note: speed_factor may also affect pitch/prosody
4. **Example with Pacing Controls:**
```
[S1] Hello everyone. (pauses) Today we'll discuss... (hesitates) something very important.
[S2] Yes, (breathes deeply) this topic requires careful consideration.
[S1] Let me explain — (pauses) — it's not as simple as it seems...
```
5. **Tips for Natural Pacing:**
- Use punctuation naturally as you would in written text
- Add non-verbal cues where natural pauses would occur in speech
- Combine speed_factor with text markup for best results
- Experiment with different combinations for desired effect
### Orpheus TTS
Natural conversational speech with 4 distinct voices.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| text | string | required | - | Input text with optional emotion tags |
| voice | string | dan | tara, dan, josh, emma | Voice character selection |
| top_p | number | 0.95 | 0.1-1 | Nucleus sampling probability |
| temperature | number | 0.6 | 0.1-1.5 | Sampling temperature for variation |
| max_new_tokens | number | 1200 | 100-2000 | Maximum tokens (controls audio length) |
| repetition_penalty | number | 1.1 | 1-2 | Penalty for repeated phrases |
**Available Voices:**
- **tara**: Female voice, clear and professional
- **dan**: Male voice, warm and conversational (default)
- **josh**: Male voice, deeper tone
- **emma**: Female voice, friendly and expressive
**Emotion Tags (in text):**
- `<laugh>` - Natural laughter
- `<sigh>` - Audible sigh
- `<gasp>` - Surprised gasp
- `<pause>` - Brief pause
- `...` - Natural pause (ellipsis)
**Example:**
```
Hello everyone <pause> I'm excited to share this with you <laugh>
```
**Note:** Orpheus does NOT support speed_factor. Use Dia TTS if you need speed control.
---
## Music Generation
### Lyria 2
High-fidelity 48kHz stereo instrumental music generation.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Music style and mood description |
| duration | number | 30 | 10-60 | Length in seconds |
| negative_prompt | string | optional | - | Musical elements to avoid |
| seed | number | optional | any integer | Random seed for reproducibility |
**Prompt Examples:**
- "Upbeat electronic dance music with synth leads"
- "Calm piano melody with string accompaniment"
- "Epic orchestral soundtrack with brass section"
### Minimax Music
Generate music with vocals up to 60 seconds.
| Parameter | Type | Default | Range/Options | Description |
|-----------|------|---------|---------------|-------------|
| prompt | string | required | - | Music description including style, mood, instruments |
| duration | number | 30 | 5-60 | Length in seconds |
**Prompt Examples:**
- "Pop song with female vocals about summer love"
- "Rock ballad with male vocals and guitar solo"
- "Electronic track with vocoder vocals"
---
## Common Parameters Across Models
### Seed
- **Purpose**: Ensures reproducible results
- **Usage**: Same seed + same parameters = same output
- **Range**: Any integer value
- **Default**: Random if not specified
### Guidance Scale
- **Purpose**: Controls how closely output follows the prompt
- **Low values (1-5)**: More creative/loose interpretation
- **Medium values (5-10)**: Balanced adherence
- **High values (10-20)**: Strict prompt following
- **Note**: Very high values may reduce quality
### Negative Prompt
- **Purpose**: Specify what to avoid in generation
- **Examples**: "blurry, low quality, distorted, watermark"
- **Best Practice**: Be specific about unwanted elements
### Display Mode (for all image/video tools)
- **display**: Return for immediate viewing
- **save**: Return base64 for saving
- **both**: Return both formats
### Save Location
- **Purpose**: Override default save directory
- **Usage**: Specify full path where files should be saved
- **Example**: "/Users/username/Pictures/AI"