@flatfile/improv
A powerful TypeScript library for building AI agents with multi-threaded conversations, tool execution, and event handling capabilities
# Structured Output with the Gemini Driver
This document explains how to use the structured output feature of the Gemini driver to get responses in a specific JSON format.
## Overview
The structured output feature allows you to constrain the Gemini model to respond with JSON that follows a specific schema. This is useful when you need consistent, parseable responses for processing by your application.
## Usage
To use structured output, you need to:
1. Create a `GeminiThreadDriver` with a `responseSchema`.
2. Create a `Thread` that uses the driver, send it, and parse the content of the last message as JSON.
### Response Schema Format
The response schema follows a subset of the OpenAPI 3.0 Schema object format. Here's the basic structure:
```typescript
interface Schema {
  type: "string" | "integer" | "number" | "boolean" | "array" | "object";
  format?: string;
  description?: string;
  nullable?: boolean;
  enum?: string[];
  maxItems?: string;
  minItems?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  propertyOrdering?: string[];
  items?: Schema;
}
```
Different fields are valid for different types:
- `string` → `enum`, `format`
- `integer` → `format`
- `number` → `format`
- `boolean` → no type-specific fields
- `array` → `minItems`, `maxItems`, `items`
- `object` → `properties`, `required`, `propertyOrdering`, `nullable`
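The mapping above can be checked mechanically before a schema is handed to the driver. The following is a minimal sketch, not part of the library: `findMisplacedFields`, `FIELDS_BY_TYPE`, and `COMMON_FIELDS` are hypothetical names, and treating `description` as valid on every type is an assumption based on how the examples in this document use it.

```typescript
type SchemaType = "string" | "integer" | "number" | "boolean" | "array" | "object";

// Same shape as the Schema interface shown above, repeated so the sketch is self-contained.
interface Schema {
  type: SchemaType;
  format?: string;
  description?: string;
  nullable?: boolean;
  enum?: string[];
  maxItems?: string;
  minItems?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  propertyOrdering?: string[];
  items?: Schema;
}

// Type-specific fields, copied from the list above.
const FIELDS_BY_TYPE: Record<SchemaType, string[]> = {
  string: ["enum", "format"],
  integer: ["format"],
  number: ["format"],
  boolean: [],
  array: ["minItems", "maxItems", "items"],
  object: ["properties", "required", "propertyOrdering", "nullable"],
};

// Assumption: these fields are accepted on every type.
const COMMON_FIELDS = ["type", "description"];

// Hypothetical helper: walks the schema and reports fields that are not listed for their type.
function findMisplacedFields(schema: Schema, path = "$"): string[] {
  const allowed = new Set([...COMMON_FIELDS, ...FIELDS_BY_TYPE[schema.type]]);
  const problems = Object.keys(schema)
    .filter((key) => !allowed.has(key))
    .map((key) => `${path}: "${key}" is not listed for type "${schema.type}"`);

  // Recurse into array items and object properties.
  if (schema.items) {
    problems.push(...findMisplacedFields(schema.items, `${path}.items`));
  }
  for (const [name, child] of Object.entries(schema.properties ?? {})) {
    problems.push(...findMisplacedFields(child, `${path}.properties.${name}`));
  }
  return problems;
}
```

Running a check like this in a unit test is one way to catch a misplaced field, such as `enum` on an `integer`, before a request is sent.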
## Examples
### Basic Example: Recipe List
```typescript
import { GeminiThreadDriver, Type } from "./model.drivers/gemini";
import { Thread } from "./thread";
import { Message } from "./message";
// Create a Gemini driver with a structured output schema
const driver = new GeminiThreadDriver({
  apiKey: process.env.GOOGLE_API_KEY,
  model: "gemini-2.5-flash",
  temperature: 0.2,
  responseSchema: {
    type: "array",
    items: {
      type: "object",
      properties: {
        recipeName: {
          type: "string",
          description: "Name of the recipe"
        },
        ingredients: {
          type: "array",
          items: {
            type: "string"
          },
          description: "List of ingredients needed"
        }
      },
      required: ["recipeName", "ingredients"],
      propertyOrdering: ["recipeName", "ingredients"]
    }
  }
});

// Create messages
const systemMessage = new Message({
  role: "system",
  content: "You are a helpful cooking assistant."
});
const userMessage = new Message({
  role: "user",
  content: "Give me 3 simple cookie recipes."
});

// Create and send the thread
const thread = new Thread({
  messages: [systemMessage, userMessage],
  driver
});
await thread.send();

// Get the structured response; its content is a JSON string
const response = thread.last();
if (response) {
  try {
    const recipes = JSON.parse(response.content || "[]");
    console.log(recipes);
  } catch (error) {
    // The content could not be parsed as JSON
    console.error("Response was not valid JSON:", error);
  }
}
```
Example output:
```json
[
  {
    "recipeName": "Chocolate Chip Cookies",
    "ingredients": ["flour", "sugar", "butter", "chocolate chips", "eggs", "vanilla extract", "baking soda", "salt"]
  },
  {
    "recipeName": "Peanut Butter Cookies",
    "ingredients": ["flour", "sugar", "butter", "peanut butter", "eggs", "vanilla extract", "baking soda", "salt"]
  },
  {
    "recipeName": "Oatmeal Raisin Cookies",
    "ingredients": ["flour", "brown sugar", "butter", "oats", "raisins", "eggs", "cinnamon", "baking soda", "salt"]
  }
]
```
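`JSON.parse` returns an untyped value, so it can help to confirm that the parsed data has the shape the schema describes before using it. The following is a minimal sketch of a runtime check for the recipe example; `Recipe`, `isRecipe`, and `parseRecipes` are hypothetical names that are not part of the library.

```typescript
// Shape described by the responseSchema in the basic example.
interface Recipe {
  recipeName: string;
  ingredients: string[];
}

// Type guard: true only if the value has both required fields with the expected types.
function isRecipe(value: unknown): value is Recipe {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.recipeName === "string" &&
    Array.isArray(candidate.ingredients) &&
    candidate.ingredients.every((item) => typeof item === "string")
  );
}

// Parses the response content and throws if it is not an array of recipes.
// JSON.parse itself also throws on malformed input.
function parseRecipes(content: string): Recipe[] {
  const parsed: unknown = JSON.parse(content);
  if (!Array.isArray(parsed) || !parsed.every(isRecipe)) {
    throw new Error("Response does not match the recipe schema");
  }
  return parsed;
}
```

With a helper like this, `parseRecipes(response.content ?? "[]")` could replace the bare `JSON.parse` call, so the rest of the application works with a typed `Recipe[]`.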
### Advanced Example: Enum and Required Fields
```typescript
// Imports are the same as in the basic example.
// Create a driver with enum and required fields
const driver = new GeminiThreadDriver({
  apiKey: process.env.GOOGLE_API_KEY,
  responseSchema: {
    type: "object",
    properties: {
      analysis: {
        type: "string",
        description: "Analysis of the sentiment"
      },
      sentiment: {
        type: "string",
        enum: ["positive", "negative", "neutral"],
        description: "The sentiment of the text"
      },
      confidence: {
        type: "number",
        format: "float",
        description: "Confidence score between 0 and 1"
      }
    },
    // `analysis` is not listed here, so it is optional
    required: ["sentiment", "confidence"],
    propertyOrdering: ["sentiment", "confidence", "analysis"]
  }
});

const thread = new Thread({
  messages: [
    new Message({
      role: "user",
      content: "Analyze the sentiment: 'I really enjoyed the movie, it was fantastic!'"
    })
  ],
  driver
});
await thread.send();

const response = thread.last();
if (response) {
  try {
    const analysis = JSON.parse(response.content || "{}");
    console.log(`Sentiment: ${analysis.sentiment}`);
    console.log(`Confidence: ${analysis.confidence}`);
    // `analysis` is optional, so check before using it
    if (analysis.analysis) {
      console.log(`Analysis: ${analysis.analysis}`);
    }
  } catch (error) {
    // The content could not be parsed as JSON
    console.error("Response was not valid JSON:", error);
  }
}
```
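The `description` on `confidence` states a range of 0 to 1, but the schema in this example has no field that enforces it, so the application may want to check the range itself. The following is a minimal sketch that also derives a TypeScript union type from the enum values; `SENTIMENTS`, `SentimentResult`, and `toSentimentResult` are hypothetical names that are not part of the library.

```typescript
// Enum values from the schema; `as const` lets TypeScript derive a union type from them.
const SENTIMENTS = ["positive", "negative", "neutral"] as const;
type Sentiment = (typeof SENTIMENTS)[number];

interface SentimentResult {
  sentiment: Sentiment;
  confidence: number;
  analysis?: string; // optional, because it is not in `required`
}

// Returns a typed result, or null if the parsed value breaks the enum or range constraints.
function toSentimentResult(value: unknown): SentimentResult | null {
  if (typeof value !== "object" || value === null) return null;
  const { sentiment, confidence, analysis } = value as Record<string, unknown>;

  const sentimentOk = SENTIMENTS.some((allowed) => allowed === sentiment);
  const confidenceOk =
    typeof confidence === "number" && confidence >= 0 && confidence <= 1;
  const analysisOk = analysis === undefined || typeof analysis === "string";
  if (!sentimentOk || !confidenceOk || !analysisOk) return null;

  const result: SentimentResult = {
    sentiment: sentiment as Sentiment,
    confidence: confidence as number,
  };
  if (typeof analysis === "string") result.analysis = analysis;
  return result;
}
```

Returning `null` rather than throwing is a design choice for this sketch; it lets the caller decide whether to retry the request or fall back to a default.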
## Best Practices
1. **Property Ordering**: Always specify `propertyOrdering` when using structured output so that properties appear in a consistent order in the JSON.
2. **Use Descriptions**: Add a clear `description` to each field to help the model understand what it should contain.
3. **Required Fields**: Use the `required` array to specify which fields must be present in the response. Fields that are not listed are optional and may be omitted.
4. **Test**: Test your schema with different prompts to ensure the model produces the expected JSON structure.
5. **Error Handling**: Always wrap JSON parsing in try/catch blocks to handle potential parsing errors.
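
The error-handling practice above can be wrapped in a small helper so that call sites do not repeat the try/catch. The following is a minimal sketch; `ParseResult` and `safeParseJson` are hypothetical names that are not part of the library, and the cast to `T` is unchecked, so it does not validate the shape of the data.

```typescript
// Result type: either the parsed value or a description of what went wrong.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Parses response content without throwing; empty content is reported as an error.
function safeParseJson<T>(content: string | undefined | null): ParseResult<T> {
  if (!content) {
    return { ok: false, error: "Response content is empty" };
  }
  try {
    // Unchecked cast: the caller is trusted to pass the type that matches the schema.
    return { ok: true, value: JSON.parse(content) as T };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON: ${message}` };
  }
}
```

A caller can then branch on `result.ok` and log `result.error` in the failure case instead of handling an exception.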
## Limitations
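- The structured output feature works best with clear, specific prompts that produce consistent results; a low `temperature`, such as the 0.2 used in the basic example, can help.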
- Very complex nested schemas might not be followed perfectly in all cases.
- Streaming responses with structured output might not work reliably.
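
One reason streaming can be awkward with structured output is that a partial chunk is usually not valid JSON by itself. The following is a minimal sketch of buffering all chunks and parsing once at the end. It assumes the streamed text is available as an `AsyncIterable<string>`; how this library exposes streamed chunks is not covered here, and `collectJson` and `fakeStream` are hypothetical names.

```typescript
// Collects every chunk into one string and parses it only after the stream ends.
async function collectJson<T>(chunks: AsyncIterable<string>): Promise<T> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
  }
  // Throws if the complete text is still not valid JSON.
  return JSON.parse(buffer) as T;
}

// Stand-in for a real stream: the JSON is split in the middle of a string value.
async function* fakeStream(): AsyncGenerator<string> {
  yield '{"sentiment": "pos';
  yield 'itive", "confidence": 0.9}';
}
```

Calling `collectJson(fakeStream())` resolves to the full object, whereas parsing either chunk individually would throw.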
## Reference
For more details, refer to the [Google AI Structured Output documentation](https://ai.google.dev/gemini-api/docs/structured-output).