Some providers expose a separate “thinking” channel that streams the model’s reasoning alongside the normal message output. This lets you display or inspect the model’s reasoning without sending it back to the model or showing it to end users who just want the final answer. OpenAI Responses, Anthropic, and Google all support extended thinking. The thinking surface is provider-agnostic, so additional providers can adopt it later without changes to your code.

Enable Thinking

Enable thinking at the Agent level using the enableThinking parameter:
// Simple configuration - uses provider defaults
final agent = Agent(
  'anthropic:claude-sonnet-4-5',
  enableThinking: true,
);

final result = await agent.send('In one sentence, how does quicksort work?');

// Access thinking as a first-class field
if (result.thinking != null && result.thinking!.isNotEmpty) {
  print('[[${result.thinking!.trim()}]]');
}
print(result.output);

Provider-Specific Defaults

Each provider applies sensible defaults when enableThinking: true is set:
  • OpenAI Responses: Uses reasoningSummary: detailed automatically
  • Anthropic: Uses 4096 token budget for extended thinking
  • Google: Uses dynamic token budget (model decides based on task complexity)
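
If you prefer the defaults spelled out explicitly (for example, so they are visible in code review), the simple configuration can be written with the same options classes used throughout this page. The values in this sketch mirror the defaults listed above; the class and enum names are taken from the examples on this page:

```dart
// Explicit equivalents of the per-provider defaults for enableThinking: true.

// OpenAI Responses: detailed reasoning summaries.
final openai = Agent(
  'openai-responses:gpt-5',
  enableThinking: true,
  chatModelOptions: const OpenAIResponsesChatModelOptions(
    reasoningSummary: OpenAIReasoningSummary.detailed,
  ),
);

// Anthropic: 4096-token extended-thinking budget.
final anthropic = Agent(
  'anthropic:claude-sonnet-4-5',
  enableThinking: true,
  chatModelOptions: const AnthropicChatOptions(
    thinkingBudgetTokens: 4096,
  ),
);

// Google: dynamic budget (-1 lets the model choose based on task complexity).
final google = Agent(
  'google:gemini-2.5-flash',
  enableThinking: true,
  chatModelOptions: const GoogleChatModelOptions(
    thinkingBudgetTokens: -1,
  ),
);
```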

Advanced Configuration

For fine-tuning provider-specific behavior, use the options classes:

OpenAI Responses

final agent = Agent(
  'openai-responses:gpt-5',
  enableThinking: true,
  chatModelOptions: const OpenAIResponsesChatModelOptions(
    reasoningSummary: OpenAIReasoningSummary.brief,  // Override default
    reasoningEffort: OpenAIReasoningEffort.high,
  ),
);

Anthropic

final agent = Agent(
  'anthropic:claude-sonnet-4-5',
  enableThinking: true,
  chatModelOptions: const AnthropicChatOptions(
    thinkingBudgetTokens: 8192,  // Override default 4096
  ),
);
Anthropic recommends starting with smaller budgets (4k-10k) and scaling up based on task complexity.

Google

final agent = Agent(
  'google:gemini-2.5-flash',
  enableThinking: true,
  chatModelOptions: const GoogleChatModelOptions(
    thinkingBudgetTokens: 8192,  // Override dynamic default
  ),
);

// Or use explicit dynamic mode
final agentDynamic = Agent(
  'google:gemini-2.5-flash',
  enableThinking: true,
  chatModelOptions: const GoogleChatModelOptions(
    thinkingBudgetTokens: -1,  // Model decides optimal budget
  ),
);

Key Points

  • The reasoning stream is emitted through ChatResult.thinking (a String?)
  • Thinking doesn’t appear in ChatMessage.parts (not sent back to model)
  • You control where (or if) thinking is displayed
  • Important: When using Anthropic with tool calls, thinking blocks are automatically preserved in conversation history as required by their API. This increases token costs on subsequent turns.
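
As a sketch of the multi-turn behavior: because thinking never lands in ChatMessage.parts, replaying a conversation’s messages re-sends only the regular content. Passing prior messages via a history: parameter is an assumption in this sketch:

```dart
final agent = Agent('anthropic:claude-sonnet-4-5', enableThinking: true);

final first = await agent.send('In one sentence, how does quicksort work?');
print(first.thinking); // reasoning text, surfaced only on the result
print(first.output);   // final answer

// first.messages carries only the regular content (no thinking parts), so
// continuing the conversation does not re-send the reasoning.
// (The `history:` parameter name is an assumption in this sketch.)
final second = await agent.send(
  'Now compare it to merge sort.',
  history: first.messages,
);
print(second.output);
```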

Streaming Thinking

import 'dart:io';

final history = <ChatMessage>[];
var stillThinking = true;
stdout.write('[[');

final thinkingBuffer = StringBuffer(); // accumulates the full reasoning text
await for (final chunk in agent.sendStream(
  'In one sentence: how does quicksort work?',
)) {
  // Access thinking from ChatResult.thinking
  final thinking = chunk.thinking;
  final hasThinking = thinking != null && thinking.isNotEmpty;
  final hasText = chunk.output.isNotEmpty;

  if (hasThinking) {
    thinkingBuffer.write(thinking);
    stdout.write(thinking);
  }

  if (hasText) {
    if (stillThinking) {
      stillThinking = false;
      stdout.writeln(']]\n');
    }
    stdout.write(chunk.output);
  }

  history.addAll(chunk.messages);
}

stdout.writeln('\n');
The stream delivers reasoning deltas incrementally through ChatResult.thinking, so you can render a live “thought bubble” while the model is working. Each chunk may include text output, thinking, or both.

Examples