@flatfile/improv
Version:
A powerful TypeScript library for building AI agents with multi-threaded conversations, tool execution, and event handling capabilities
282 lines (221 loc) • 8.15 kB
Markdown
# Controlling Reasoning in AI Models
This guide explains how to enable, disable, and control reasoning behavior across different AI model providers in Improv.
## Overview
Reasoning models use "thinking" processes to provide more thoughtful, step-by-step responses. Different providers offer varying levels of control over this behavior.
## OpenAI Models
### o1-preview and o1-mini (Reasoning Models)
**Current Status**: Not yet supported in OpenAIModel type definition in Improv
**Key Characteristics**:
- Reasoning **cannot be disabled** - it's built into the model architecture
- Reasoning tokens are generated internally but not visible in responses
- Reasoning tokens are billable as output tokens (4x more expensive than input tokens)
**API Parameters**:
```typescript
// When o1 models are added to Improv:
const driver = new OpenAIThreadDriver({
model: "o1-preview", // or "o1-mini"
temperature: 1, // Fixed at 1 for o1 models
maxCompletionTokens: 4000, // Controls total output including reasoning
// Note: system messages not supported
// Note: streaming not supported in beta
});
```
**Reasoning Control**:
- ❌ **Cannot disable reasoning** - it's fundamental to how o1 models work
- ✅ **Control reasoning depth** via `reasoning_effort` parameter (for full o1 model, not preview)
- Values: `"low"`, `"medium"`, `"high"`
- ✅ **Control output length** via `maxCompletionTokens`
**Best Practices**:
- Keep prompts simple and direct
- Avoid "think step by step" instructions (model does this automatically)
- Set appropriate `maxCompletionTokens` to manage costs
- Don't attempt to extract internal reasoning (violates usage policies)
### GPT-4o and GPT-4o-mini (Regular Models)
**Reasoning Control**:
- ✅ **Enable via system prompts**: Ask model to use `<think>...</think>` tags
- ✅ **Disable**: Don't mention thinking in prompts
```typescript
// Enable reasoning via system prompt
const driver = new OpenAIThreadDriver({
model: "gpt-4o",
// System prompt requests thinking tags
});
const solo = new Solo({
driver,
systemPrompt: "Use <think>...</think> tags to show your reasoning process before answering."
});
```
## Anthropic Claude
### Claude 3.5 Sonnet (with Extended Thinking)
**Reasoning Control Methods**:
#### 1. Extended Thinking Mode (Pro users only)
- Available in web interface with "thinking" toggle
- Budget: Up to 16k tokens for reasoning
- **Cannot be disabled via API** - this is a ChatGPT interface feature
#### 2. Chain-of-Thought via System Prompts
```typescript
const driver = new AnthropicThreadDriver({
model: "claude-3-5-sonnet-20241022",
});
// Enable reasoning
const soloWithThinking = new Solo({
driver,
systemPrompt: "Use <thinking>...</thinking> tags to show your reasoning process."
});
// Disable reasoning
const soloWithoutThinking = new Solo({
driver,
systemPrompt: "Provide direct answers without showing your thinking process."
});
```
#### 3. Trigger Words for Thinking Budget
- `"think"` - Basic thinking budget
- `"think hard"` - Increased thinking budget
- `"think harder"` - Higher thinking budget
- `"ultrathink"` - Maximum thinking budget
**Example Implementation**:
```typescript
// In user prompts:
await solo.ask("Think hard about this complex problem: ...");
await solo.ask("Ultrathink this philosophical question: ...");
```
## Google Gemini
### Gemini 2.5 Series (Pro, Flash, Flash-Lite)
**Full API Control Available**:
```typescript
interface GeminiConfigWithThinking extends GeminiConfig {
thinkingConfig?: {
thinkingBudget?: number; // -1 = dynamic, 0 = disabled, >0 = specific budget
};
includeThoughts?: boolean; // Access thought summaries
}
```
**Implementation Example**:
```typescript
// Disable thinking completely
const driverNoThinking = new GeminiThreadDriver({
model: "gemini-2.5-flash",
// Need to add thinkingConfig support to driver
generationConfig: {
thinkingConfig: {
thinkingBudget: 0 // Disable thinking
}
}
});
// Enable dynamic thinking
const driverDynamicThinking = new GeminiThreadDriver({
model: "gemini-2.5-flash",
generationConfig: {
thinkingConfig: {
thinkingBudget: -1 // Dynamic thinking based on complexity
}
}
});
// Set specific thinking budget
const driverBudgetThinking = new GeminiThreadDriver({
model: "gemini-2.5-flash",
generationConfig: {
thinkingConfig: {
thinkingBudget: 1024 // Specific token budget for thinking
}
}
});
```
**Important Notes**:
- ✅ **Gemini 2.5 Flash & Flash-Lite**: Thinking can be disabled with `thinkingBudget: 0`
- ❌ **Gemini 2.5 Pro**: Thinking cannot be disabled
- ✅ **Access thought summaries**: Use `includeThoughts: true`
### Gemini 2.0 Flash Thinking Experimental
**Key Characteristics**:
- ❌ **Cannot disable thinking** - built into the model architecture
- ✅ **Access thought summaries** via `includeThoughts: true`
```typescript
const driver = new GeminiThreadDriver({
model: "gemini-2.0-flash-thinking-exp-01-21",
// Thinking always enabled - cannot be disabled
});
```
## Cerebras (Qwen Models)
### Qwen3-32B (Reasoning Capable)
**API Control Available**:
```typescript
interface CerebrasConfigWithThinking extends CerebrasConfig {
enableThinking?: boolean; // Enable/disable thinking mode
chatTemplateKwargs?: {
enable_thinking?: boolean; // Alternative control method
};
}
```
**Implementation Examples**:
```typescript
// Enable thinking (default for Qwen3)
const driverWithThinking = new CerebrasThreadDriver({
model: "qwen-3-32b",
// Need to add thinking control to driver
extraBody: {
enable_thinking: true
}
});
// Disable thinking (align with Qwen2.5-Instruct behavior)
const driverNoThinking = new CerebrasThreadDriver({
model: "qwen-3-32b",
extraBody: {
enable_thinking: false
}
});
```
**Soft Control via User Instructions**:
```typescript
// Enable thinking for specific request
await agent.ask("/think Analyze this complex problem...");
// Disable thinking for specific request
await agent.ask("/no_think Give me a quick answer...");
```
### QwQ-32B (Pure Reasoning Model)
**Key Characteristics**:
- ❌ **Cannot disable reasoning** - it's the core purpose of QwQ models
- Always generates `<think>...</think>` blocks
- Designed specifically for reasoning tasks
```typescript
const driver = new CerebrasThreadDriver({
model: "qwq-32b-preview", // If/when available
maxCompletionTokens: 64000, // Recommended for verbose reasoning output
});
```
## Implementation Status in Improv
### Currently Supported ✅
- **ReasoningExtractor utility** - Extracts thinking content from all providers
- **Basic reasoning extraction** - All drivers extract `<think>`, `<thinking>`, etc. tags
- **Message.reasoning field** - Stores extracted reasoning content
### Needs Implementation ❌
- **OpenAI o1 model support** - Add to OpenAIModel type
- **OpenAI reasoning_effort parameter** - Add to OpenAIConfig
- **Gemini thinkingConfig parameter** - Add to GeminiConfig
- **Cerebras enable_thinking parameter** - Add to CerebrasConfig
- **API parameter forwarding** - Pass reasoning control to underlying APIs
## Recommendations
### For Maximum Control
1. **Use Gemini 2.5 Flash** - Most granular control with `thinkingBudget`
2. **Use Qwen3-32B** - Good reasoning with `enable_thinking` toggle
### For Always-On Reasoning
1. **Use OpenAI o1 models** - Built-in reasoning (when supported)
2. **Use QwQ models** - Pure reasoning focus
3. **Use Gemini 2.0 Flash Thinking** - Experimental reasoning mode
### For System Prompt Control
1. **Use any model with thinking tags** - `<think>`, `<thinking>`, etc.
2. **Use Anthropic trigger words** - "think", "think hard", "ultrathink"
## Future Enhancements
To fully support reasoning control in Improv, these driver updates are needed:
1. **Add reasoning control parameters to all driver configs**
2. **Implement API parameter forwarding**
3. **Add o1 model support to OpenAI driver**
4. **Add soft/hard reasoning toggles**
5. **Add reasoning budget management**
6. **Add reasoning token tracking and cost estimation**