Computer Use

The Computer Use tool allows models to interact with computer interfaces, including web browsers and desktop applications. This enables automation of tasks that require visual interaction with software interfaces.

Note: Computer Use is an advanced feature that requires special setup and permissions. It may not be available in all environments or account types.

Prerequisites

Computer Use requires:

Special permissions from OpenAI (may require an enterprise account)
Secure environment configuration
Appropriate safety measures and monitoring
Understanding of security implications

How It Works

When enabled, the model can:

Take screenshots of the current screen
Identify UI elements and their locations
Perform mouse clicks and keyboard input
Navigate web pages and applications
Extract information from visual interfaces

Basic Usage

import 'package:dartantic_ai/dartantic_ai.dart';

Future<void> main() async {
  final agent = Agent(
    'openai-responses:gpt-4o',
    chatModelOptions: const OpenAIResponsesChatOptions(
      serverSideTools: {OpenAIServerSideTool.computerUse},
    ),
  );

  // This only works if Computer Use is properly configured
  // and your account has the necessary permissions.
  final response = await agent.send(
    'Navigate to example.com and take a screenshot of the homepage',
  );
  print(response.output);
}

Monitoring Computer Use Activity

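Each streamed chunk may carry a computer_use entry in its metadata, describing the current stage and, when available, the action being performed:

// Assumes `agent` is configured as in Basic Usage and `prompt` holds your instruction
await for (final chunk in agent.sendStream(prompt)) {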
  // Display the response
  if (chunk.output.isNotEmpty) print(chunk.output);
  
  // Monitor computer use activity
  final computerUse = chunk.metadata['computer_use'];
  if (computerUse != null) {
    final stage = computerUse['stage'];
    print('Computer use: $stage');
    
    if (computerUse['data'] != null) {
      final data = computerUse['data'];
      
      // Show action being performed
      if (data['action'] != null) {
        print('Action: ${data['action']}');
      }
      
      // Show target element or coordinates
      if (data['target'] != null) {
        print('Target: ${data['target']}');
      }
      
      // Check for screenshots
      if (data['screenshot'] != null) {
        print('Screenshot captured');
      }
    }
  }
}
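
The nested null checks in the loop above can be pulled into a small helper that turns one metadata entry into a single log line. This is a minimal sketch, not part of dartantic_ai: the function name `describeComputerUse` is hypothetical, the sample stage value is invented for illustration, and the map layout (`stage`, `data.action`, `data.target`, `data.screenshot`) is assumed to match what the loop above reads.

```dart
/// Hypothetical helper: builds a one-line summary from a `computer_use`
/// metadata entry. Returns null when the entry is absent or not a map.
String? describeComputerUse(Object? entry) {
  if (entry is! Map) return null;
  final parts = <String>['stage=${entry['stage'] ?? 'unknown'}'];
  final data = entry['data'];
  if (data is Map) {
    if (data['action'] != null) parts.add('action=${data['action']}');
    if (data['target'] != null) parts.add('target=${data['target']}');
    if (data['screenshot'] != null) parts.add('screenshot=yes');
  }
  return parts.join(' ');
}

void main() {
  // Sample entry with an assumed shape; real stage names may differ.
  final sample = {
    'stage': 'in_progress',
    'data': {'action': 'click', 'target': 'Submit'},
  };
  print(describeComputerUse(sample));
  print(describeComputerUse(null)); // null: nothing to report
}
```

Inside the streaming loop, the call would be `describeComputerUse(chunk.metadata['computer_use'])`, printing the result only when it is not null.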

Example Use Cases

Web Automation

final response = await agent.send('''
  1. Open a web browser
  2. Navigate to a news website
  3. Find today's top headline
  4. Take a screenshot of the article
''');

Form Filling

final response = await agent.send('''
  Fill out the contact form with:
  - Name: John Doe
  - Email: john@example.com
  - Message: Testing automated form submission
''');

Data Extraction

final response = await agent.send('''
  Open the sales dashboard and extract this month's 
  revenue figures from the chart
''');

UI Testing

final response = await agent.send('''
  Test the login flow:
  1. Click on "Sign In" button
  2. Enter test credentials
  3. Verify successful login
  4. Take screenshot of dashboard
''');

Security Considerations

Warning: Computer Use has significant security implications. Always:

Isolate Environment: Run in sandboxed or virtual environments
Limit Permissions: Restrict access to sensitive systems
Monitor Activity: Log all actions for audit purposes
Validate Actions: Review automated actions before execution
Use Test Accounts: Never use production credentials
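
One way to make "Limit Permissions" and "Validate Actions" concrete is an allowlist that every reported action is checked against. The sketch below is illustrative only: `ActionPolicy` and its fields are hypothetical names, not dartantic_ai or OpenAI APIs, and the action names are assumed to match whatever your monitoring code receives. Whether you can block an action before it runs, or only flag it afterwards, depends on how your environment executes actions.

```dart
/// Hypothetical policy: which actions and hosts the automation may touch.
class ActionPolicy {
  const ActionPolicy({
    required this.allowedActions,
    required this.allowedHosts,
  });

  final Set<String> allowedActions;
  final Set<String> allowedHosts;

  /// Returns true only if the action is allowlisted and, when a URL is
  /// given, its host is allowlisted too.
  bool permits(String action, {String? url}) {
    if (!allowedActions.contains(action)) return false;
    if (url == null) return true;
    final host = Uri.tryParse(url)?.host ?? '';
    return allowedHosts.contains(host);
  }
}

void main() {
  const policy = ActionPolicy(
    allowedActions: {'screenshot', 'click'},
    allowedHosts: {'example.com'},
  );
  print(policy.permits('screenshot')); // true
  print(policy.permits('click', url: 'https://example.com/login')); // true
  print(policy.permits('type')); // false: action not allowlisted
  print(policy.permits('click', url: 'https://bank.example.org')); // false
}
```

Denying by default keeps the policy safe when a new or unexpected action name appears.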

Configuration Requirements

At minimum, Computer Use requires enabling the tool on the agent. Further settings depend on your environment:

// Hypothetical configuration (actual requirements may vary)
final agent = Agent(
  'openai-responses:gpt-4o',
  chatModelOptions: const OpenAIResponsesChatOptions(
    serverSideTools: {OpenAIServerSideTool.computerUse},
    // Additional configuration may be required.
    // This is environment-specific.
  ),
);

Limitations

Availability: Not available to all accounts or regions
Performance: Actions may be slow compared to direct automation
Accuracy: Visual recognition may not be 100% accurate
Security: Significant security risks if not properly configured
Cost: May incur substantial additional costs
Environment: Requires specific runtime environment setup

Best Practices

Start Simple: Begin with read-only tasks like screenshots

```
'Take a screenshot of the current desktop'
```

Be Specific: Provide clear, detailed instructions

'Click the blue "Submit" button in the bottom right corner'

Verify State: Check current state before actions

'First, verify we are on the login page, then enter credentials'

Error Recovery: Plan for failure scenarios

"If the page doesn't load, refresh and try again"

Audit Trail: Keep detailed logs

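// Log every computer use action with a timestamp.
// `action` is the value read from the computer_use metadata.
final timestamp = DateTime.now().toIso8601String();
print('Computer Use Action: $action at $timestamp');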
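
For anything beyond a quick experiment, a print statement is easy to lose. The following is a minimal sketch of an in-memory audit trail that keeps each action with its timestamp. `AuditEntry` and `AuditLog` are hypothetical names introduced here for illustration; a real deployment would likely persist entries to a file or logging service instead.

```dart
/// One audit record: what happened and when.
class AuditEntry {
  AuditEntry(this.action, this.timestamp);

  final String action;
  final DateTime timestamp;

  @override
  String toString() =>
      'Computer Use Action: $action at ${timestamp.toUtc().toIso8601String()}';
}

/// Hypothetical in-memory audit trail.
class AuditLog {
  final List<AuditEntry> _entries = [];

  /// Records [action], stamped with [at] or the current time.
  void record(String action, {DateTime? at}) {
    _entries.add(AuditEntry(action, at ?? DateTime.now()));
  }

  /// Read-only view so callers cannot rewrite history.
  List<AuditEntry> get entries => List.unmodifiable(_entries);
}

void main() {
  final log = AuditLog()
    ..record('screenshot', at: DateTime.utc(2025, 1, 1, 12))
    ..record('click');
  log.entries.forEach(print);
}
```

Storing timestamps in UTC avoids ambiguity when logs from several machines are compared.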

Error Handling

try {
  final response = await agent.send('Automate browser task...');
  print(response.output);
} catch (e) {
  // Possible errors:
  // - Feature not enabled for account
  // - Security restrictions
  // - Environment not configured
  // - Target element not found
  // - Action failed
  print('Computer use error: $e');
}
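
Some of these failures are transient (a page that did not load, an action that failed once), so it can help to wrap the call in a retry with a growing delay. The sketch below uses only the Dart core libraries. The `retry` function is a hypothetical helper, not part of dartantic_ai, and the failing task in `main` is a stand-in for `agent.send(...)`.

```dart
import 'dart:async';

/// Runs [task], retrying up to [maxAttempts] times with a doubling delay.
/// Rethrows the last error once the attempts are exhausted.
Future<T> retry<T>(
  Future<T> Function() task, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(milliseconds: 200),
}) async {
  var delay = initialDelay;
  for (var attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (e) {
      if (attempt >= maxAttempts) rethrow;
      print('Attempt $attempt failed: $e');
      await Future<void>.delayed(delay);
      delay *= 2;
    }
  }
}

Future<void> main() async {
  var calls = 0;
  // Stand-in for agent.send(...): fails twice, then succeeds.
  final result = await retry(() async {
    calls++;
    if (calls < 3) throw Exception('Environment not ready');
    return 'done';
  }, initialDelay: const Duration(milliseconds: 10));
  print('$result after $calls calls');
}
```

Retrying only makes sense for transient errors. A missing permission or a feature that is not enabled for the account will fail the same way every time, so those are better reported immediately.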

Alternative Approaches

If Computer Use is not available, consider:

Browser Automation Tools: Puppeteer, Selenium, Playwright
Desktop Automation: AutoHotkey, AppleScript
API Integration: Direct API calls instead of UI automation
RPA Tools: UiPath, Automation Anywhere

Metadata Events

Computer Use emits events for:

computer_use/started: Automation initiated
computer_use/screenshot: Screen captured
computer_use/click: Mouse click performed
computer_use/type: Keyboard input sent
computer_use/completed: Action finished
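
If you want to branch on these events in code, mapping the names to an enum keeps the handling exhaustive. This is a sketch under assumptions: `ComputerUseEvent` and `parseEvent` are hypothetical names, and how the event name reaches your code (for example as a metadata value) is not specified here, so the function simply takes a string.

```dart
/// Event kinds listed above, plus a fallback for anything unrecognized.
enum ComputerUseEvent { started, screenshot, click, type, completed, unknown }

/// Maps a name such as `computer_use/click` to a [ComputerUseEvent].
ComputerUseEvent parseEvent(String name) {
  const prefix = 'computer_use/';
  if (!name.startsWith(prefix)) return ComputerUseEvent.unknown;
  final suffix = name.substring(prefix.length);
  for (final event in ComputerUseEvent.values) {
    if (event.name == suffix) return event;
  }
  return ComputerUseEvent.unknown;
}

void main() {
  const names = ['computer_use/started', 'computer_use/click', 'other/event'];
  for (final name in names) {
    print('$name -> ${parseEvent(name)}');
  }
}
```

The `unknown` fallback means new event names do not crash existing handlers.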

Cost Considerations

Computer Use may incur significant costs:

Per-action pricing
Screenshot processing costs
Extended processing time charges
Additional compute resources

Check with OpenAI for current pricing and availability.

Getting Access

To enable Computer Use:

Contact OpenAI support or your account manager
Discuss your use case and security measures
Complete any required agreements or compliance checks
Receive configuration instructions specific to your setup

Troubleshooting

Feature Not Available

// Error: Computer use is not enabled for this account
// Solution: Contact OpenAI to request access

Environment Issues

// Error: Computer use environment not configured
// Solution: Follow OpenAI's setup guide for your platform

Permission Denied

// Error: Insufficient permissions for computer use
// Solution: Ensure proper security configuration
