@flatfile/improv

# Controlling Reasoning in AI Models This guide explains how to enable, disable, and control reasoning behavior across different AI model providers in Improv. ## Overview Reasoning models use "thinking" processes to provide more thoughtful, step-by-step responses. Different providers offer varying levels of control over this behavior. --- ## OpenAI Models ### o1-preview and o1-mini (Reasoning Models) **Current Status**: Not yet supported in OpenAIModel type definition in Improv **Key Characteristics**: - Reasoning **cannot be disabled** - it's built into the model architecture - Reasoning tokens are generated internally but not visible in responses - Reasoning tokens are billable as output tokens (4x more expensive than input tokens) **API Parameters**: ```typescript // When o1 models are added to Improv: const driver = new OpenAIThreadDriver({ model: "o1-preview", // or "o1-mini" temperature: 1, // Fixed at 1 for o1 models maxCompletionTokens: 4000, // Controls total output including reasoning // Note: system messages not supported // Note: streaming not supported in beta }); ``` **Reasoning Control**: - ❌ **Cannot disable reasoning** - it's fundamental to how o1 models work - ✅ **Control reasoning depth** via `reasoning_effort` parameter (for full o1 model, not preview) - Values: `"low"`, `"medium"`, `"high"` - ✅ **Control output length** via `maxCompletionTokens` **Best Practices**: - Keep prompts simple and direct - Avoid "think step by step" instructions (model does this automatically) - Set appropriate `maxCompletionTokens` to manage costs - Don't attempt to extract internal reasoning (violates usage policies) ### GPT-4o and GPT-4o-mini (Regular Models) **Reasoning Control**: - ✅ **Enable via system prompts**: Ask model to use `<think>...</think>` tags - ✅ **Disable**: Don't mention thinking in prompts ```typescript // Enable reasoning via system prompt const driver = new OpenAIThreadDriver({ model: "gpt-4o", // System prompt requests thinking tags }); const solo = new Solo({ driver, systemPrompt: "Use <think>...</think> tags to show your reasoning process before answering." }); ``` --- ## Anthropic Claude ### Claude 3.5 Sonnet (with Extended Thinking) **Reasoning Control Methods**: #### 1. Extended Thinking Mode (Pro users only) - Available in web interface with "thinking" toggle - Budget: Up to 16k tokens for reasoning - **Cannot be disabled via API** - this is a ChatGPT interface feature #### 2. Chain-of-Thought via System Prompts ```typescript const driver = new AnthropicThreadDriver({ model: "claude-3-5-sonnet-20241022", }); // Enable reasoning const soloWithThinking = new Solo({ driver, systemPrompt: "Use <thinking>...</thinking> tags to show your reasoning process." }); // Disable reasoning const soloWithoutThinking = new Solo({ driver, systemPrompt: "Provide direct answers without showing your thinking process." }); ``` #### 3. Trigger Words for Thinking Budget - `"think"` - Basic thinking budget - `"think hard"` - Increased thinking budget - `"think harder"` - Higher thinking budget - `"ultrathink"` - Maximum thinking budget **Example Implementation**: ```typescript // In user prompts: await solo.ask("Think hard about this complex problem: ..."); await solo.ask("Ultrathink this philosophical question: ..."); ``` --- ## Google Gemini ### Gemini 2.5 Series (Pro, Flash, Flash-Lite) **Full API Control Available**: ```typescript interface GeminiConfigWithThinking extends GeminiConfig { thinkingConfig?: { thinkingBudget?: number; // -1 = dynamic, 0 = disabled, >0 = specific budget }; includeThoughts?: boolean; // Access thought summaries } ``` **Implementation Example**: ```typescript // Disable thinking completely const driverNoThinking = new GeminiThreadDriver({ model: "gemini-2.5-flash", // Need to add thinkingConfig support to driver generationConfig: { thinkingConfig: { thinkingBudget: 0 // Disable thinking } } }); // Enable dynamic thinking const driverDynamicThinking = new GeminiThreadDriver({ model: "gemini-2.5-flash", generationConfig: { thinkingConfig: { thinkingBudget: -1 // Dynamic thinking based on complexity } } }); // Set specific thinking budget const driverBudgetThinking = new GeminiThreadDriver({ model: "gemini-2.5-flash", generationConfig: { thinkingConfig: { thinkingBudget: 1024 // Specific token budget for thinking } } }); ``` **Important Notes**: - ✅ **Gemini 2.5 Flash & Flash-Lite**: Thinking can be disabled with `thinkingBudget: 0` - ❌ **Gemini 2.5 Pro**: Thinking cannot be disabled - ✅ **Access thought summaries**: Use `includeThoughts: true` ### Gemini 2.0 Flash Thinking Experimental **Key Characteristics**: - ❌ **Cannot disable thinking** - built into the model architecture - ✅ **Access thought summaries** via `includeThoughts: true` ```typescript const driver = new GeminiThreadDriver({ model: "gemini-2.0-flash-thinking-exp-01-21", // Thinking always enabled - cannot be disabled }); ``` --- ## Cerebras (Qwen Models) ### Qwen3-32B (Reasoning Capable) **API Control Available**: ```typescript interface CerebrasConfigWithThinking extends CerebrasConfig { enableThinking?: boolean; // Enable/disable thinking mode chatTemplateKwargs?: { enable_thinking?: boolean; // Alternative control method }; } ``` **Implementation Examples**: ```typescript // Enable thinking (default for Qwen3) const driverWithThinking = new CerebrasThreadDriver({ model: "qwen-3-32b", // Need to add thinking control to driver extraBody: { enable_thinking: true } }); // Disable thinking (align with Qwen2.5-Instruct behavior) const driverNoThinking = new CerebrasThreadDriver({ model: "qwen-3-32b", extraBody: { enable_thinking: false } }); ``` **Soft Control via User Instructions**: ```typescript // Enable thinking for specific request await agent.ask("/think Analyze this complex problem..."); // Disable thinking for specific request await agent.ask("/no_think Give me a quick answer..."); ``` ### QwQ-32B (Pure Reasoning Model) **Key Characteristics**: - ❌ **Cannot disable reasoning** - it's the core purpose of QwQ models - Always generates `<think>...</think>` blocks - Designed specifically for reasoning tasks ```typescript const driver = new CerebrasThreadDriver({ model: "qwq-32b-preview", // If/when available maxCompletionTokens: 64000, // Recommended for verbose reasoning output }); ``` --- ## Implementation Status in Improv ### Currently Supported ✅ - **ReasoningExtractor utility** - Extracts thinking content from all providers - **Basic reasoning extraction** - All drivers extract `<think>`, `<thinking>`, etc. tags - **Message.reasoning field** - Stores extracted reasoning content ### Needs Implementation ❌ - **OpenAI o1 model support** - Add to OpenAIModel type - **OpenAI reasoning_effort parameter** - Add to OpenAIConfig - **Gemini thinkingConfig parameter** - Add to GeminiConfig - **Cerebras enable_thinking parameter** - Add to CerebrasConfig - **API parameter forwarding** - Pass reasoning control to underlying APIs --- ## Recommendations ### For Maximum Control 1. **Use Gemini 2.5 Flash** - Most granular control with `thinkingBudget` 2. **Use Qwen3-32B** - Good reasoning with `enable_thinking` toggle ### For Always-On Reasoning 1. **Use OpenAI o1 models** - Built-in reasoning (when supported) 2. **Use QwQ models** - Pure reasoning focus 3. **Use Gemini 2.0 Flash Thinking** - Experimental reasoning mode ### For System Prompt Control 1. **Use any model with thinking tags** - `<think>`, `<thinking>`, etc. 2. **Use Anthropic trigger words** - "think", "think hard", "ultrathink" --- ## Future Enhancements To fully support reasoning control in Improv, these driver updates are needed: 1. **Add reasoning control parameters to all driver configs** 2. **Implement API parameter forwarding** 3. **Add o1 model support to OpenAI driver** 4. **Add soft/hard reasoning toggles** 5. **Add reasoning budget management** 6. **Add reasoning token tracking and cost estimation**