@lobehub/chat
Version:
Lobe Chat - an open-source, high-performance chatbot framework that supports speech synthesis, multimodal, and extensible Function Call plugin system. Supports one-click free deployment of your private ChatGPT/LLM web application.
163 lines (128 loc) • 5.64 kB
text/mdx
# Adding New Image Models
> Learn more about the AI image generation modal design in the [AI Image Generation Modal Design Discussion](https://github.com/lobehub/lobe-chat/discussions/7442)
## Parameter Standardization
All image generation models must use the standard parameters defined in `src/libs/standard-parameters/index.ts`. This ensures parameter consistency across different Providers, creating a more unified user experience.
**Supported Standard Parameters**:
- `prompt` (required): Text prompt for image generation
- `aspectRatio`: Aspect ratio (e.g., "16:9", "1:1")
- `width` / `height`: Image dimensions
- `size`: Preset dimensions (e.g., "1024x1024")
- `seed`: Random seed
- `steps`: Generation steps
- `cfg`: Guidance scale
- For other parameters, please check the source file
## OpenAI Compatible Models
These models can be requested using the OpenAI SDK, with request parameters and return values consistent with DALL-E and GPT-Image-X series.
Taking Zhipu's CogView-4 as an example, which is an OpenAI-compatible model, you can add it by adding the model configuration in the corresponding AI models file `src/config/aiModels/zhipu.ts`:
```ts
const zhipuImageModels: AIImageModelCard[] = [
// Add model configuration
// https://bigmodel.cn/dev/howuse/image-generation-model/cogview-4
{
description:
'CogView-4 is the first open-source text-to-image model from Zhipu that supports Chinese character generation, with comprehensive improvements in semantic understanding, image generation quality, and Chinese-English text generation capabilities.',
displayName: 'CogView-4',
enabled: true,
id: 'cogview-4',
parameters: {
prompt: {
default: '',
},
size: {
default: '1024x1024',
enum: ['1024x1024', '768x1344', '864x1152', '1344x768', '1152x864', '1440x720', '720x1440'],
},
},
releasedAt: '2025-03-04',
type: 'image',
},
];
```
## Non-OpenAI Compatible Models
For image generation models that are not compatible with OpenAI format, you need to implement a custom `createImage` method. There are two main implementation approaches:
### Method 1: Using OpenAI Compatible Factory
Most Providers use `openaiCompatibleFactory` for OpenAI compatibility. You can pass in a custom `createImage` function (reference [PR #8534](https://github.com/lobehub/lobe-chat/pull/8534)).
**Implementation Steps**:
1. **Read Provider documentation and standard parameter definitions**
- Review the Provider's image generation API documentation to understand request and response formats
- Read `src/libs/standard-parameters/index.ts` to understand supported parameters
- Add image model configuration in the corresponding AI models file
2. **Implement custom createImage method**
- Create a standalone image generation function that accepts standard parameters
- Convert standard parameters to Provider-specific format
- Call the Provider's image generation API
- Return a unified response format (imageUrl and optional width/height)
3. **Add tests**
- Write unit tests covering success scenarios
- Test various error cases and edge conditions
**Code Example**:
```ts
// src/libs/model-runtime/provider-name/createImage.ts
export const createProviderImage = async (
payload: ImageGenerationPayload,
options: any,
): Promise<ImageGenerationResponse> => {
const { model, prompt, ...params } = payload;
// Call Provider's native API
const result = await callProviderAPI({
model,
prompt,
// Convert parameter format
custom_param: params.width,
// ...
});
// Return unified format
return {
created: Date.now(),
data: [{ url: result.imageUrl }],
};
};
```
```ts
// src/libs/model-runtime/provider-name/index.ts
export const LobeProviderAI = openaiCompatibleFactory({
constructorOptions: {
// ... other configurations
},
createImage: createProviderImage, // Pass custom implementation
provider: ModelProvider.ProviderName,
});
```
### Method 2: Direct Implementation in Provider Class
If your Provider has an independent class implementation, you can directly add the `createImage` method in the class (reference [PR #8503](https://github.com/lobehub/lobe-chat/pull/8503)).
**Implementation Steps**:
1. **Read Provider documentation and standard parameter definitions**
- Review the Provider's image generation API documentation
- Read `src/libs/standard-parameters/index.ts`
- Add image model configuration in the corresponding AI models file
2. **Implement createImage method in Provider class**
- Add the `createImage` method directly in the class
- Handle parameter conversion and API calls
- Return a unified response format
3. **Add tests**
- Write comprehensive test cases for the new method
**Code Example**:
```ts
// src/libs/model-runtime/provider-name/index.ts
export class LobeProviderAI {
async createImage(
payload: ImageGenerationPayload,
options?: ChatStreamCallbacks,
): Promise<ImageGenerationResponse> {
const { model, prompt, ...params } = payload;
// Call native API and handle response
const result = await this.client.generateImage({
model,
prompt,
// Parameter conversion
});
return {
created: Date.now(),
data: [{ url: result.url }],
};
}
}
```
### Important Notes
- **Testing Requirements**: Add comprehensive unit tests for custom implementations, ensuring coverage of success scenarios and various error cases
- **Error Handling**: Use `AgentRuntimeError` consistently for error wrapping to maintain error message consistency