Some providers expose a separate “thinking” channel that streams the model’s reasoning metadata alongside the normal message output. This lets you display or inspect the model’s inner monologue without feeding it back to the model or surfacing it to end users who just want the final answer. OpenAI Responses, Anthropic, and Google all support extended thinking. The generated output is provider-agnostic, so additional providers can adopt it later without requiring any changes to your code.

Enable Thinking

Enable thinking at the Agent level using the enableThinking parameter:
// Simple configuration - uses provider defaults
final agent = Agent(
  'anthropic:claude-sonnet-4-5',
  enableThinking: true,
);

final result = await agent.send('In one sentence, how does quicksort work?');

// Access thinking via result.thinking
if (result.thinking != null) {
  print('[[${result.thinking}]]');
}
print(result.output);

Provider-Specific Defaults

Each provider has sensible defaults when enableThinking: true:
  • OpenAI Responses: Uses reasoningSummary: detailed automatically
  • Anthropic: Uses 4096 token budget for extended thinking
  • Google: Uses dynamic token budget (model decides based on task complexity)
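With these defaults, no options class is needed. A minimal sketch, using the same model strings as the examples in this guide:

```dart
// With enableThinking: true and no chatModelOptions, each provider
// applies its default reasoning configuration.
final openai = Agent('openai-responses:gpt-5', enableThinking: true); // detailed summaries
final anthropic = Agent('anthropic:claude-sonnet-4-5', enableThinking: true); // 4096-token budget
final google = Agent('google:gemini-2.5-flash', enableThinking: true); // dynamic budget
```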

Advanced Configuration

For fine-tuning provider-specific behavior, use the options classes:

OpenAI Responses

final agent = Agent(
  'openai-responses:gpt-5',
  enableThinking: true,
  chatModelOptions: const OpenAIResponsesChatModelOptions(
    reasoningSummary: OpenAIReasoningSummary.brief,  // Override default
    reasoningEffort: OpenAIReasoningEffort.high,
  ),
);

Anthropic

final agent = Agent(
  'anthropic:claude-sonnet-4-5',
  enableThinking: true,
  chatModelOptions: const AnthropicChatOptions(
    thinkingBudgetTokens: 8192,  // Override default 4096
  ),
);
Anthropic recommends starting with smaller budgets (4k-10k) and scaling up based on task complexity.

Google

final agent = Agent(
  'google:gemini-2.5-flash',
  enableThinking: true,
  chatModelOptions: const GoogleChatModelOptions(
    thinkingBudgetTokens: 8192,  // Override dynamic default
  ),
);

// Or use explicit dynamic mode
final agentDynamic = Agent(
  'google:gemini-2.5-flash',
  enableThinking: true,
  chatModelOptions: const GoogleChatModelOptions(
    thinkingBudgetTokens: -1,  // Model decides optimal budget
  ),
);

Key Points

  • Access thinking via result.thinking for non-streaming calls, or chunk.thinking on each stream chunk
  • Thinking is also stored as ThinkingPart in consolidated messages for history
  • You control where (or if) thinking is displayed
  • Important: When using Anthropic with tool calls, thinking blocks are automatically preserved in conversation history as required by their API. This increases token costs on subsequent turns.
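Because thinking is stored as ThinkingPart in consolidated messages, you can recover it from history after the fact. A hedged sketch — the `parts` accessor on ChatMessage and the `text` field on ThinkingPart are assumptions, not confirmed above:

```dart
// Assumes ChatMessage exposes its content as a `parts` list (assumption).
for (final message in history) {
  for (final part in message.parts) {
    if (part is ThinkingPart) {
      print('stored thinking: ${part.text}'); // `text` field is an assumption
    }
  }
}
```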

Streaming Thinking

import 'dart:io';

final history = <ChatMessage>[];
var stillThinking = true;
stdout.write('[[');

await for (final chunk in agent.sendStream(
  'In one sentence: how does quicksort work?',
)) {
  // Display thinking in real-time via chunk.thinking field
  if (chunk.thinking != null) {
    stdout.write(chunk.thinking);
  }

  // Display response text
  if (chunk.output.isNotEmpty) {
    if (stillThinking) {
      stillThinking = false;
      stdout.writeln(']]\n');
    }
    stdout.write(chunk.output);
  }

  history.addAll(chunk.messages);
}

stdout.writeln('\n');
The stream delivers reasoning deltas incrementally through chunk.thinking, so you can render a live “thought bubble” while the model is working. Each chunk may include text output (chunk.output), thinking (chunk.thinking), or both.

Examples