# Structured Output with the Gemini Driver

This document explains how to use the structured output feature of the Gemini driver to get responses in a specific JSON format.

## Overview

The structured output feature lets you constrain the Gemini model to respond with JSON that follows a specific schema. This is useful when your application needs consistent, parseable responses.

## Usage

To use structured output, you need to:

1. Create a `GeminiThreadDriver` with a `responseSchema`
2. Create and send a thread to get a structured response

### Response Schema Format

The response schema follows a subset of the OpenAPI 3.0 Schema object format. Here's the basic structure:

```typescript
interface Schema {
  type: "string" | "integer" | "number" | "boolean" | "array" | "object";
  format?: string;
  description?: string;
  nullable?: boolean;
  enum?: string[];
  maxItems?: string;
  minItems?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  propertyOrdering?: string[];
  items?: Schema;
}
```

Different fields are valid for different types:

- `string` → enum, format
- `integer` → format
- `number` → format
- `boolean`
- `array` → minItems, maxItems, items
- `object` → properties, required, propertyOrdering, nullable

## Examples

### Basic Example: Recipe List

```typescript
import { GeminiThreadDriver, Type } from "./model.drivers/gemini";
import { Thread } from "./thread";
import { Message } from "./message";

// Create a Gemini driver with structured output schema
const driver = new GeminiThreadDriver({
  apiKey: process.env.GOOGLE_API_KEY,
  model: "gemini-2.5-flash",
  temperature: 0.2,
  responseSchema: {
    type: "array",
    items: {
      type: "object",
      properties: {
        recipeName: {
          type: "string",
          description: "Name of the recipe"
        },
        ingredients: {
          type: "array",
          items: { type: "string" },
          description: "List of ingredients needed"
        }
      },
      required: ["recipeName", "ingredients"],
      propertyOrdering: ["recipeName", "ingredients"]
    }
  }
});

// Create messages
const systemMessage = new Message({
  role: "system",
  content: "You are a helpful cooking assistant."
});

const userMessage = new Message({
  role: "user",
  content: "Give me 3 simple cookie recipes."
});

// Create and send thread
const thread = new Thread({
  messages: [systemMessage, userMessage],
  driver
});

await thread.send();

// Get the structured response
const response = thread.last();
if (response) {
  const recipes = JSON.parse(response.content || "[]");
  console.log(recipes);
}
```

Example output:

```json
[
  {
    "recipeName": "Chocolate Chip Cookies",
    "ingredients": ["flour", "sugar", "butter", "chocolate chips", "eggs", "vanilla extract", "baking soda", "salt"]
  },
  {
    "recipeName": "Peanut Butter Cookies",
    "ingredients": ["flour", "sugar", "butter", "peanut butter", "eggs", "vanilla extract", "baking soda", "salt"]
  },
  {
    "recipeName": "Oatmeal Raisin Cookies",
    "ingredients": ["flour", "brown sugar", "butter", "oats", "raisins", "eggs", "cinnamon", "baking soda", "salt"]
  }
]
```

### Advanced Example: Enum and Required Fields

```typescript
import { GeminiThreadDriver } from "./model.drivers/gemini";
import { Thread } from "./thread";
import { Message } from "./message";

// Create a driver with enum and required fields
const driver = new GeminiThreadDriver({
  apiKey: process.env.GOOGLE_API_KEY,
  responseSchema: {
    type: "object",
    properties: {
      analysis: {
        type: "string",
        description: "Analysis of the sentiment"
      },
      sentiment: {
        type: "string",
        enum: ["positive", "negative", "neutral"],
        description: "The sentiment of the text"
      },
      confidence: {
        type: "number",
        format: "float",
        description: "Confidence score between 0 and 1"
      }
    },
    required: ["sentiment", "confidence"],
    propertyOrdering: ["sentiment", "confidence", "analysis"]
  }
});

const thread = new Thread({
  messages: [
    new Message({
      role: "user",
      content: "Analyze the sentiment: 'I really enjoyed the movie, it was fantastic!'"
    })
  ],
  driver
});

await thread.send();

const response = thread.last();
if (response) {
  const analysis = JSON.parse(response.content || "{}");
  console.log(`Sentiment: ${analysis.sentiment}`);
  console.log(`Confidence: ${analysis.confidence}`);
  // `analysis` is not listed in `required`, so it may be absent
  if (analysis.analysis) {
    console.log(`Analysis: ${analysis.analysis}`);
  }
}
```

## Best Practices

1. **Property Ordering**: Always specify `propertyOrdering` when using structured output to ensure a consistent JSON structure.
2. **Use Descriptions**: Add a clear `description` to each field to help the model understand what it should contain.
3. **Required Fields**: Use the `required` array to specify which fields must be present in the response.
4. **Test**: Test your schema with different prompts to ensure the model produces the expected JSON structure.
5. **Error Handling**: Always wrap JSON parsing in try/catch blocks to handle potential parsing errors.

## Limitations

- The structured output feature works best with deterministic prompts.
- Very complex nested schemas might not be followed perfectly in all cases.
- Streaming responses with structured output might not work reliably.

## Reference

For more details, refer to the [Google AI Structured Output documentation](https://ai.google.dev/gemini-api/docs/structured-output).
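
### Sketch: Defensive Parsing

The error-handling recommendation under Best Practices can be sketched as follows. This is a minimal illustration, not part of the library: `parseStructured`, `isSentimentResult`, `SentimentResult`, and `ParseResult` are hypothetical names introduced here. The sketch assumes the raw string comes from `thread.last()?.content`, as in the examples above, and that the schema is the one from the sentiment example.

```typescript
// Hypothetical helpers for illustration only; none of these names
// are exported by the library.
type SentimentResult = {
  sentiment: "positive" | "negative" | "neutral";
  confidence: number;
  analysis?: string;
};

type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Runtime check mirroring the schema's `required` and `enum` constraints.
function isSentimentResult(value: unknown): value is SentimentResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const sentiments = ["positive", "negative", "neutral"];
  return (
    typeof v.sentiment === "string" &&
    sentiments.indexOf(v.sentiment) !== -1 &&
    typeof v.confidence === "number" &&
    (v.analysis === undefined || typeof v.analysis === "string")
  );
}

// Parses the response text and validates its shape, returning a
// result object instead of throwing.
function parseStructured<T>(
  content: string | undefined,
  guard: (value: unknown) => value is T
): ParseResult<T> {
  if (!content) return { ok: false, error: "empty response" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `invalid JSON: ${message}` };
  }
  return guard(parsed)
    ? { ok: true, value: parsed }
    : { ok: false, error: "response does not match the expected shape" };
}

// Usage: `raw` stands in for `thread.last()?.content`.
const raw = '{"sentiment":"positive","confidence":0.97}';
const result = parseStructured(raw, isSentimentResult);
if (result.ok) {
  console.log(`Sentiment: ${result.value.sentiment}`);
} else {
  console.error(result.error);
}
```

Returning a result object rather than throwing keeps the calling code linear and makes the two failure modes (unparseable text and valid JSON with the wrong shape) explicit. A schema-validation library could replace the hand-written guard if the schema grows.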