Computer Use
Enable models to control browsers and desktop applications with OpenAI's Computer Use tool
The Computer Use tool allows models to interact with computer interfaces, including web browsers and desktop applications. This enables automation of tasks that require visual interaction with software interfaces.
Note: Computer Use is an advanced feature that requires special setup and permissions. It may not be available in all environments or account types.
Prerequisites
Computer Use requires:
- Special permissions from OpenAI (may require enterprise account)
- Secure environment configuration
- Appropriate safety measures and monitoring
- Understanding of security implications
How It Works
When enabled, the model can:
- Take screenshots of the current screen
- Identify UI elements and their locations
- Perform mouse clicks and keyboard input
- Navigate web pages and applications
- Extract information from visual interfaces
Basic Usage
import 'package:dartantic_ai/dartantic_ai.dart';

final agent = Agent(
  'openai-responses:gpt-4o',
  chatModelOptions: const OpenAIResponsesChatOptions(
    serverSideTools: {OpenAIServerSideTool.computerUse},
  ),
);

// Note: This will only work if Computer Use is properly configured
// and you have the necessary permissions
final response = await agent.send(
  'Navigate to example.com and take a screenshot of the homepage',
);
Monitoring Computer Use Activity
Computer Use provides detailed metadata about actions:
await for (final chunk in agent.sendStream(prompt)) {
  // Display the response
  if (chunk.output.isNotEmpty) print(chunk.output);

  // Monitor computer use activity
  final computerUse = chunk.metadata['computer_use'];
  if (computerUse != null) {
    final stage = computerUse['stage'];
    print('Computer use: $stage');

    if (computerUse['data'] != null) {
      final data = computerUse['data'];

      // Show action being performed
      if (data['action'] != null) {
        print('Action: ${data['action']}');
      }

      // Show target element or coordinates
      if (data['target'] != null) {
        print('Target: ${data['target']}');
      }

      // Check for screenshots
      if (data['screenshot'] != null) {
        print('Screenshot captured');
      }
    }
  }
}
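The nested checks above can be folded into one helper that turns a metadata map into a single log line. This is a minimal sketch in plain Dart: `describeComputerUse` is a hypothetical name, and the keys it reads (`stage`, `data`, `action`, `target`, `screenshot`) are taken from the loop above, so the real payload may differ.

```dart
/// Builds a one-line summary from a `computer_use` metadata map.
/// Hypothetical helper: the key names mirror the monitoring loop above
/// and are assumptions about the payload shape.
String describeComputerUse(Map<String, dynamic> computerUse) {
  final parts = <String>['stage=${computerUse['stage'] ?? 'unknown'}'];
  final data = computerUse['data'];
  if (data is Map) {
    if (data['action'] != null) parts.add('action=${data['action']}');
    if (data['target'] != null) parts.add('target=${data['target']}');
    if (data['screenshot'] != null) parts.add('screenshot=yes');
  }
  return parts.join(' ');
}

void main() {
  // Sample map shaped like the metadata read in the loop above.
  print(describeComputerUse({
    'stage': 'started',
    'data': {'action': 'click', 'target': 'Submit'},
  }));
  // prints: stage=started action=click target=Submit
}
```

Inside the stream loop, the helper would be called with `chunk.metadata['computer_use']` once that value is known to be non-null.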
Web Automation
final response = await agent.send('''
1. Open a web browser
2. Navigate to a news website
3. Find today's top headline
4. Take a screenshot of the article
''');
Form Filling
final response = await agent.send('''
Fill out the contact form with:
- Name: John Doe
- Email: john@example.com
- Message: Testing automated form submission
''');
Data Extraction
final response = await agent.send('''
Open the sales dashboard and extract this month's
revenue figures from the chart
''');
UI Testing
final response = await agent.send('''
Test the login flow:
1. Click on "Sign In" button
2. Enter test credentials
3. Verify successful login
4. Take screenshot of dashboard
''');
Security Considerations
Warning: Computer Use has significant security implications. Always:
- Isolate Environment: Run in sandboxed or virtual environments
- Limit Permissions: Restrict access to sensitive systems
- Monitor Activity: Log all actions for audit purposes
- Validate Actions: Review automated actions before execution
- Use Test Accounts: Never use production credentials
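One way to approach "Validate Actions" and "Limit Permissions" is an allowlist gate in your own code that decides whether an observed or proposed action should proceed or be held for review. The sketch below is plain Dart and not part of dartantic_ai or the OpenAI API; `reviewAction`, the action names, and the host list are all illustrative assumptions.

```dart
/// Hypothetical action names; the real tool's action vocabulary may differ.
const allowedActions = {'screenshot', 'click', 'type', 'scroll'};

/// Hosts the sandbox is allowed to visit (illustrative values only).
const allowedHosts = {'example.com'};

/// Returns null when the action may proceed, or a reason string when it
/// should be held for human review.
String? reviewAction(String action, {Uri? url}) {
  if (!allowedActions.contains(action)) {
    return 'action "$action" is not on the allowlist';
  }
  if (url != null && !allowedHosts.contains(url.host)) {
    return 'host "${url.host}" is not on the allowlist';
  }
  return null;
}

void main() {
  // Allowed action on an allowed host: prints null.
  print(reviewAction('click', url: Uri.parse('https://example.com/login')));
  // Allowed action on an unlisted host: prints the rejection reason.
  print(reviewAction('click', url: Uri.parse('https://bank.example.org/')));
}
```

Denying by default keeps the gate conservative: anything not explicitly listed is flagged rather than silently allowed.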
Configuration Requirements
Computer Use typically requires additional, environment-specific configuration beyond enabling the tool:
// Hypothetical configuration (actual requirements may vary)
final agent = Agent(
  'openai-responses:gpt-4o',
  chatModelOptions: OpenAIResponsesChatOptions(
    serverSideTools: {OpenAIServerSideTool.computerUse},
    // Additional configuration may be required
    // This is environment-specific
  ),
);
Limitations
- Availability: Not available to all accounts or regions
- Performance: Actions may be slow compared to direct automation
- Accuracy: Visual recognition may not be 100% accurate
- Security: Significant security risks if not properly configured
- Cost: May incur substantial additional costs
- Environment: Requires specific runtime environment setup
Best Practices
- Start Simple: Begin with read-only tasks like screenshots
  'Take a screenshot of the current desktop'
- Be Specific: Provide clear, detailed instructions
  'Click the blue "Submit" button in the bottom right corner'
- Verify State: Check current state before actions
  'First, verify we are on the login page, then enter credentials'
- Error Recovery: Plan for failure scenarios
  "If the page doesn't load, refresh and try again"
- Audit Trail: Keep detailed logs
  // Log all computer use actions
  print('Computer Use Action: $action at $timestamp');
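For the audit trail, structured entries are usually easier to search than free-form print statements. The following is a minimal sketch that serializes each action as one JSON line using only `dart:convert`; `auditLine` and its field names are assumptions for illustration, not a format defined by Computer Use.

```dart
import 'dart:convert';

/// Serializes one audit entry as a single JSON line.
/// Hypothetical helper: field names are illustrative only.
String auditLine(String action, {String? target, DateTime? at}) {
  final entry = <String, Object?>{
    // Default to the current time in UTC so entries sort consistently.
    'timestamp': (at ?? DateTime.now().toUtc()).toIso8601String(),
    'action': action,
    if (target != null) 'target': target,
  };
  return jsonEncode(entry);
}

void main() {
  // Each call yields one line that can be appended to a log file.
  print(auditLine('click', target: 'Submit button'));
}
```

Passing `at` explicitly makes the helper deterministic, which helps when testing the logging path.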
Error Handling
try {
  final response = await agent.send('Automate browser task...');
} catch (e) {
  // Possible errors:
  // - Feature not enabled for account
  // - Security restrictions
  // - Environment not configured
  // - Target element not found
  // - Action failed
  print('Computer use error: $e');
}
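Some of the failures listed above (a page that did not load, an action that failed) may be transient, so a retry wrapper with exponential backoff is one possible recovery strategy. This is a generic sketch in plain Dart; `withRetry` is a hypothetical helper, not part of dartantic_ai, and it assumes failures surface as `Exception` subtypes.

```dart
/// Runs [task] up to [maxAttempts] times, doubling the delay after each
/// failure. Hypothetical helper; only `Exception`s are retried.
Future<T> withRetry<T>(
  Future<T> Function() task, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(seconds: 1),
}) async {
  var delay = initialDelay;
  for (var attempt = 1; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } on Exception {
      await Future<void>.delayed(delay);
      delay *= 2; // exponential backoff
    }
  }
  // Final attempt: any error propagates to the caller.
  return task();
}

Future<void> main() async {
  var calls = 0;
  final result = await withRetry(() async {
    calls++;
    if (calls < 3) throw Exception('transient failure');
    return 'ok after $calls attempts';
  }, initialDelay: const Duration(milliseconds: 10));
  print(result); // prints: ok after 3 attempts
}
```

In practice the task would be something like `() => agent.send('Automate browser task...')`. Errors such as a feature not being enabled for the account are unlikely to resolve on retry, so a real implementation would probably filter which errors are retried.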
Alternative Approaches
If Computer Use is not available, consider:
- Browser Automation Tools: Puppeteer, Selenium, Playwright
- Desktop Automation: AutoHotkey, AppleScript
- API Integration: Direct API calls instead of UI automation
- RPA Tools: UiPath, Automation Anywhere
Metadata Events
Computer Use emits events for:
- computer_use/started: Automation initiated
- computer_use/screenshot: Screen captured
- computer_use/click: Mouse click performed
- computer_use/type: Keyboard input sent
- computer_use/completed: Action finished
Cost Considerations
Computer Use may incur significant costs:
- Per-action pricing
- Screenshot processing costs
- Extended processing time charges
- Additional compute resources
Check with OpenAI for current pricing and availability.
Getting Access
To enable Computer Use:
- Contact OpenAI support or your account manager
- Discuss your use case and security measures
- Complete any required agreements or compliance checks
- Receive configuration instructions specific to your setup
Feature Not Available
// Error: Computer use is not enabled for this account
// Solution: Contact OpenAI to request access
Environment Issues
// Error: Computer use environment not configured
// Solution: Follow OpenAI's setup guide for your platform