Dartantic supports multi-modal input, including text, images, PDFs and other
binary attachments. You can attach local files, download files from URLs,
attach raw bytes, attach links, or mix and match all of the above.
Local Files
You can attach local files to prompts you send to your agent:
// Using cross_file for cross-platform support
import 'dart:io';

import 'package:cross_file/cross_file.dart';
import 'package:dartantic_ai/dartantic_ai.dart';

final agent = Agent('openai');

// Text file
final bioFile = XFile.fromData(
  await File('bio.txt').readAsBytes(),
  path: 'bio.txt',
);
final bioResult = await agent.send(
  'Can you summarize the attached file?',
  attachments: [await DataPart.fromFile(bioFile)],
);

// Image file (the moment of truth)
final fridgeFile = XFile.fromData(
  await File('fridge.png').readAsBytes(),
  path: 'fridge.png',
);
final fridgeResult = await agent.send(
  'What food do I have on hand?',
  attachments: [await DataPart.fromFile(fridgeFile)],
);
// "I see leftover pizza, expired milk, and... is that a science experiment?"

// The Responses API can request richer vision detail
final responsesAgent = Agent(
  'openai-responses',
  chatModelOptions: const OpenAIResponsesChatModelOptions(
    imageDetail: ImageDetail.high,
  ),
);
await responsesAgent.send(
  'Describe the fridge image with extra detail',
  attachments: [await DataPart.fromFile(fridgeFile)],
);
Download from URL
You can download data from a URL and attach it to your prompt:
// Download and include file from URL
final urlData = await DataPart.url(
  Uri.parse('https://example.com/document.pdf'),
);
final result = await agent.send(
  'Summarize this document',
  attachments: [urlData],
);
Raw Bytes
You can attach bytes you’ve already got in memory:
import 'dart:typed_data';

// Include raw bytes with a mime type
final bytes = Uint8List.fromList([/* your data */]);
final rawData = DataPart(
  bytes,
  mimeType: 'application/pdf',
);
final result = await agent.send(
  'Process this data',
  attachments: [rawData],
);
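Raw bytes don't have to come from disk. Here is a minimal sketch, assuming you want to attach an in-memory string as a CSV document (the `csv` contents and file name are placeholders), continuing with the same `agent`:
import 'dart:convert';
import 'dart:typed_data';

// Encode an in-memory string and attach it as a text/csv document
const csv = 'item,price\npizza,9.99\nmilk,3.49';
final csvPart = DataPart(
  Uint8List.fromList(utf8.encode(csv)),
  mimeType: 'text/csv',
  name: 'data.csv',
);
final csvResult = await agent.send(
  'Which item is the most expensive?',
  attachments: [csvPart],
);
print(csvResult.output);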
Web URLs
You can attach links without downloading them:
// Direct URL reference (OpenAI)
final result = await agent.send(
  'Describe this image',
  attachments: [
    LinkPart(Uri.parse('https://example.com/image.jpg')),
  ],
);
Mix and Match Attachments
You can mix and match:
// Mix text and images
final result = await agent.send(
  'Based on the bio and fridge contents, suggest a meal',
  attachments: [
    await DataPart.fromFile(bioFile),
    await DataPart.fromFile(fridgeFile),
  ],
);
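Different attachment kinds can be combined in a single call as well. A minimal sketch, assuming the target provider accepts both downloaded data and direct URL references (the link below is a placeholder):
// Mix a local file with a direct URL reference
final mixedResult = await agent.send(
  'Compare the attached bio with the photo at the link',
  attachments: [
    await DataPart.fromFile(bioFile),
    LinkPart(Uri.parse('https://example.com/photo.jpg')),
  ],
);
print(mixedResult.output);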
Audio Transcription
Google Gemini models support audio transcription natively through the chat interface. Simply attach an audio file and request transcription in your prompt.
Text Transcription
For simple text transcription, attach an audio file and request the transcription:
import 'dart:io';

import 'package:dartantic_ai/dartantic_ai.dart';

final agent = Agent('google');
final audioBytes = await File('audio.m4a').readAsBytes();

final result = await agent.send(
  'Transcribe this audio file word for word.',
  attachments: [
    DataPart(audioBytes, mimeType: 'audio/mp4', name: 'audio.m4a'),
  ],
);

// Get transcription text
final transcription = result.messages
    .expand((m) => m.parts)
    .whereType<TextPart>()
    .map((p) => p.text)
    .join('\n');
print(transcription);
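If you only need the reply text, `result.output` exposes it directly (the OCR example below uses the same accessor); walking `result.messages` as above is useful when you want to inspect individual message parts:
// The response text directly, without walking message parts
print(result.output);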
Transcription with Timestamps
For word-level timestamps and structured output, use typed responses with a JSON schema:
import 'dart:io';

import 'package:dartantic_ai/dartantic_ai.dart';

final agent = Agent('google');
final audioBytes = await File('audio.m4a').readAsBytes();

final schema = Schema.fromMap({
  'type': 'object',
  'properties': {
    'transcript': {'type': 'string'},
    'words': {
      'type': 'array',
      'items': {
        'type': 'object',
        'properties': {
          'word': {'type': 'string'},
          'start_time': {'type': 'number'},
          'end_time': {'type': 'number'},
        },
      },
    },
  },
});

final result = await agent.sendFor<Map<String, dynamic>>(
  'Transcribe this audio with word-level timestamps (in seconds).',
  outputSchema: schema,
  attachments: [
    DataPart(audioBytes, mimeType: 'audio/mp4', name: 'audio.m4a'),
  ],
);

final transcription = result.output;
print('Transcript: ${transcription['transcript']}');

// Access word-level timestamps
for (final word in transcription['words'] as List) {
  final w = word as Map<String, dynamic>;
  print('${w['start_time']}s - ${w['end_time']}s: ${w['word']}');
}
Provider Support
| Provider | Audio Transcription | Timestamps |
|---|---|---|
| Google | ✅ Native support | ✅ Word-level |
| OpenAI Responses | ❌ Not supported | ❌ |
| Anthropic | ❌ Not supported | ❌ |
Note: Only Google Gemini models currently support audio transcription through the chat interface.
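Since support is provider-specific, it can help to fail fast before uploading audio. A minimal sketch that gates on the provider name you configured; the allow-list mirrors the table above and is an assumption you should keep in sync with it:
// Guard against providers that can't transcribe audio (see table above)
const transcribingProviders = {'google'};

Future<String> transcribe(String providerName, DataPart audio) async {
  if (!transcribingProviders.contains(providerName)) {
    throw UnsupportedError(
      '$providerName does not support audio transcription',
    );
  }
  final agent = Agent(providerName);
  final result = await agent.send(
    'Transcribe this audio file word for word.',
    attachments: [audio],
  );
  return result.output;
}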
OCR (Optical Character Recognition)
Google Gemini models support OCR for extracting text from images. Simply attach an image containing text and request extraction in your prompt.
Extract text from images while preserving formatting and structure:
import 'dart:io';

import 'package:cross_file/cross_file.dart';
import 'package:dartantic_ai/dartantic_ai.dart';

final agent = Agent('google');

const imagePath = 'document.png';
final imageFile = XFile.fromData(
  await File(imagePath).readAsBytes(),
  path: imagePath,
);

final result = await agent.send(
  'Extract all text from this image. Preserve the formatting and structure.',
  attachments: [await DataPart.fromFile(imageFile)],
  history: [ChatMessage.system('Be precise and preserve formatting.')],
);

print(result.output);
// Extracted text with formatting preserved
Use Cases
OCR is useful for:
- Extracting text from scanned documents
- Reading text from screenshots
- Processing forms and receipts
- Analyzing documents with complex layouts
- Converting images of text to editable format
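For structured documents such as receipts, OCR combines naturally with typed output. A minimal sketch that reuses the `Schema`/`sendFor` pattern from the transcription example, assuming a receipt image at `receipt.png` (a hypothetical path) and the same `agent` as above; the schema fields are illustrative:
// Extract structured fields from a receipt image
final receiptFile = XFile.fromData(
  await File('receipt.png').readAsBytes(),
  path: 'receipt.png',
);

final receiptSchema = Schema.fromMap({
  'type': 'object',
  'properties': {
    'merchant': {'type': 'string'},
    'total': {'type': 'number'},
    'items': {
      'type': 'array',
      'items': {
        'type': 'object',
        'properties': {
          'name': {'type': 'string'},
          'price': {'type': 'number'},
        },
      },
    },
  },
});

final receipt = await agent.sendFor<Map<String, dynamic>>(
  'Extract the merchant, line items, and total from this receipt.',
  outputSchema: receiptSchema,
  attachments: [await DataPart.fromFile(receiptFile)],
);
print(receipt.output);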
Provider Support
| Provider | OCR Support | Complex Layouts |
|---|---|---|
| Google | ✅ Native support | ✅ Tables, multi-column |
| OpenAI Responses | ✅ Vision models | ✅ General layout |
| Anthropic | ✅ Vision models | ✅ General layout |
Note: For specialized OCR tasks requiring extremely high accuracy or specific document types, consider using dedicated OCR services. Mistral also offers a specialized OCR model (mistral-ocr-3-25-12) for document processing, which will be supported once the SDK adds vision capabilities.