Getting Started
Everything you need to connect an AI agent to your browser through Scout, from installation to your first automation in under five minutes.
Install
Install the Scout MCP server globally with npm or pnpm.
Your first session
Open Chrome, install the Scout extension, then run this script. The agent will attach to your open tab, navigate to Hacker News, and extract the top stories as structured JSON.
Integration guides
Pick your environment and follow the setup guide. All paths use the same MCP tool interface.
Use Scout directly in Cursor's chat. Ask your AI assistant to navigate pages, extract data, and interact with web apps — all from inside your editor.
Architecture
How Scout connects AI agents to browsers. Two connection paths, one unified tool interface.
Two Connection Paths
Scout offers two distinct paths for browser control, each optimized for different use cases.
CDP Path — Uses the Scout Chrome extension installed in your personal browser. The extension opens a WebSocket connection to the Scout server and relays Chrome DevTools Protocol commands. You keep your cookies, sessions, and authenticated state. Ideal for controlling your real browser with all your logged-in accounts.
MCP Path — Launches headless Playwright instances on-demand. No extension needed. Clean browser state every time, fully isolated. Perfect for batch processing, scraping, CI/CD pipelines, and tasks that don't need existing auth.
Both paths expose the same 60 MCP tools — the only difference is how the browser session is established. Switch between them by choosing browser-attach (CDP) vs browser-launch (MCP).
MCP Server Architecture
Scout exposes a standard Model Context Protocol server that any MCP-compatible client can connect to.
The MCP server runs locally via the scout-mcp command. It communicates with clients over stdio (standard input/output), the standard MCP transport.
Clients discover available tools by calling listTools(). Each tool has a typed schema describing its parameters and return type. No API documentation needed at runtime — the schema IS the documentation.
Tool calls are stateless from the client's perspective. The server maintains session state internally using the @-ref system. Clients just pass refs between tool calls.
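As a rough illustration of the flow above, the sketch below builds the JSON-RPC 2.0 messages an MCP client sends for tool discovery (`tools/list`, which SDK helpers such as listTools() wrap) and for a tool call (`tools/call`). The method names come from the MCP specification. The `buildListTools` and `buildToolCall` helpers are hypothetical, and the `browser-attach` arguments simply follow the tool reference in this guide. Nothing is sent over stdio here.

```typescript
// Minimal JSON-RPC 2.0 request shape used by MCP clients.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRequestId = 1;

// Hypothetical helper: ask the server for its tool catalog.
function buildListTools(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method: "tools/list" };
}

// Hypothetical helper: call a tool by name with its arguments.
function buildToolCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// The client only passes refs; the server resolves them to real sessions.
const listRequest = buildListTools();
const attachRequest = buildToolCall("browser-attach", { tabRef: "@t1" });
```

In practice an MCP SDK client would produce these messages for you; the point is that the only session-related data the client carries is the short ref.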
The @ Ref System
Every browser object in Scout has a short, typed identifier that agents pass between tool calls.
@s — Playwright MCP session. Created by browser-launch for headless automation.
@x — CDP session. Created by browser-attach when connecting to a real browser tab via the extension.
@t — Tab reference. Returned by browser-tabs and browser-launch. Identifies which browser tab to operate on.
@e — DOM element. Returned by browser-snapshot. Valid for a single snapshot — re-snapshot after DOM changes.
@c — Connection. Identifies a connected Chrome instance (extension).
@n — Network request.
@w — WebSocket.
@d — Download.
@l — Log entry.
@p — Error.
@g — Dialog.
@f — File ref. Registered when files are picked (system-picker), downloaded (browser-download), or written (system-write). Use @fN in browser-upload or system-read instead of repeating file paths.
This ref system eliminates brittle CSS selectors and long WebSocket URLs. Agents work with short, memorable identifiers that the server resolves internally.
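To make the ref format concrete, here is a minimal sketch of a client-side parser for these identifiers. It assumes every ref is "@", one lowercase letter, then a number (as in @s1 or @e42). The kind labels and the `parseRef` helper are illustrative names, not part of Scout, and the `a` prefix is taken from the agent ref example (@a2) in the tool reference.

```typescript
// Prefix-to-kind table, based on the ref list above (labels are illustrative).
const REF_KINDS = {
  s: "playwright-session",
  x: "cdp-session",
  t: "tab",
  e: "element",
  c: "connection",
  n: "network-request",
  w: "websocket",
  d: "download",
  l: "log-entry",
  p: "page-error",
  g: "dialog",
  f: "file",
  a: "agent",
} as const;

type RefPrefix = keyof typeof REF_KINDS;

interface ParsedRef {
  prefix: RefPrefix;
  kind: (typeof REF_KINDS)[RefPrefix];
  index: number;
}

// Parse a ref such as "@e42"; return null for anything that does not match.
function parseRef(ref: string): ParsedRef | null {
  const match = /^@([a-z])(\d+)$/.exec(ref);
  if (!match) return null;
  const letter = match[1];
  if (!(letter in REF_KINDS)) return null;
  const prefix = letter as RefPrefix;
  return { prefix, kind: REF_KINDS[prefix], index: Number(match[2]) };
}
```

A check like this can catch a CSS selector or a raw WebSocket URL passed where a ref is expected before the tool call is made.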
Multi-Agent Coordination
Multiple agents can operate simultaneously, each controlling different tabs with independent sessions.
Each agent gets its own session ID and tab assignment. Agents cannot interfere with each other's sessions.
Use agent-roster to discover peer agents. Use agent-message for fire-and-forget communication between agents. Use agent-request / agent-respond for synchronous RPC between agents.
Common pattern: an orchestrator agent that spawns worker agents for parallel data collection. The orchestrator uses agent-request to assign URLs to workers, and workers use agent-respond to return results.
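The orchestrator pattern can be sketched as follows. This is only an illustration of the assignment step: it spreads URLs across workers round-robin and builds one agent-request call per URL. The `agentRef` and `request` parameters come from the tool reference; the JSON payload format and the `assignUrls` helper are assumptions.

```typescript
// Shape of an agent-request call; agentRef and request are the documented params.
interface AgentRequestCall {
  tool: "agent-request";
  params: { agentRef: string; request: string };
}

// Distribute URLs across workers round-robin and build one agent-request per URL.
function assignUrls(workerRefs: string[], urls: string[]): AgentRequestCall[] {
  if (workerRefs.length === 0) throw new Error("at least one worker is required");
  return urls.map((url, i) => ({
    tool: "agent-request" as const,
    params: {
      agentRef: workerRefs[i % workerRefs.length],
      // Assumed payload format: a JSON string the worker knows how to parse.
      request: JSON.stringify({ task: "collect", url }),
    },
  }));
}

const workerAssignments = assignUrls(["@a2", "@a3"], [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
]);
```

Each worker would answer with agent-respond, and the orchestrator would match results back to URLs using the request payload.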
Guardrails & Limits
Scout provides guardrails to prevent runaway automation and unexpected costs.
MAX_AGENT_STEPS = 25 — Maximum number of tool calls an autonomous agent (browser-task) can make before stopping. Prevents infinite loops.
MAX_TOOL_RESULT_CHARS = 30,000 — Tool results are truncated beyond this limit. Prevents context window overflow.
FREE_MONTHLY_CREDITS = 500: trial credits on the Free plan. Pro ($9.99/month) unlocks all browser tools and the MCP surface (tooling only). AI inference is a separate addon from $5/month with user-set budgets, so the minimum paid setup is $14.99/month (Pro + $5 inference). BYOK (free) lets you bring your own API key. Voice ($9.99/month) and System ($14.99/month) are optional addons. The Max bundle ($39.99/month) includes Pro, all addons, and a $15 inference budget.
Snapshot filtering reduces token usage by up to 75%. Use excludeDecorative, role filters, maxDepth, and selector scoping to minimize payload sizes.
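The two runtime limits above can be mirrored on the client side. The sketch below is not Scout's implementation: it shows one way to cap a result string at the character limit and to stop a loop after a fixed number of steps. Whether Scout appends a truncation marker is not stated here, so none is added.

```typescript
const MAX_AGENT_STEPS = 25;
const MAX_TOOL_RESULT_CHARS = 30000;

// Cut a tool result down to the character limit.
function truncateResult(text: string, limit: number = MAX_TOOL_RESULT_CHARS): string {
  return text.length <= limit ? text : text.slice(0, limit);
}

// Run a step function until it reports completion or the step budget is spent.
// `step` stands in for "make one tool call" and returns true when the task is done.
function runWithStepLimit(
  step: (stepNumber: number) => boolean,
  maxSteps: number = MAX_AGENT_STEPS,
): { steps: number; completed: boolean } {
  for (let n = 1; n <= maxSteps; n++) {
    if (step(n)) return { steps: n, completed: true };
  }
  return { steps: maxSteps, completed: false };
}
```

A task that never finishes stops after 25 steps with `completed: false`, which is the behavior the step limit is meant to guarantee.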
Extension settings
Settings you configure in the Scout extension. All values are stored locally in your browser and never shared with Scout servers.
SCOUT API Key: Your Scout API key. Required to authenticate with the Scout server and access browser automation tools. Generate one from your account settings at scout.i.ng.
AI Provider Key (BYOK): Your personal AI provider key for BYOK mode. When set, AI inference is billed directly to your provider account instead of routing through Scout. Configure in the extension settings.
Wallet Private Key: Your wallet private key for on-chain payments (x402 protocol). Required only if you use the payment tools. Stored locally in the extension and never sent to Scout servers.
Wallet Network (default: base-sepolia): Blockchain network for payment transactions. Defaults to Base Sepolia (testnet). Set to Base for mainnet USDC.
The ref system
Scout uses short @-prefixed identifiers to reference browser objects across tool calls. Refs are returned by tools and passed back as arguments.
@s (e.g. @s1): Playwright MCP session. Returned by browser-launch.
@x (e.g. @x1): CDP session. Returned by browser-attach.
@c (e.g. @c1): Extension connection. Identifies a connected Chrome instance.
@t (e.g. @t1): Browser tab. Returned by browser-tabs and browser-launch.
@e (e.g. @e42): DOM element. Valid for the lifetime of a single snapshot. Re-snapshot after DOM changes.
@n (e.g. @n7): Network request. Captured by browser-network and browser-route.
@w (e.g. @w1): WebSocket connection. Tracked by browser-websocket.
@d (e.g. @d1): Download. Returned by browser-download list.
@l (e.g. @l5): Console log entry. Returned by browser-console.
@p (e.g. @p2): Page error. Returned by browser-console.
@g (e.g. @g1): Browser dialog (alert/confirm/prompt). Captured automatically.
@f (e.g. @f1): File ref. Registered from system-picker, downloads, and file-write. Accepted by browser-upload and system-read.
Tool reference
Complete MCP tool catalog with all parameters and return types. 70+ tools across 12 categories.
Manage browser sessions, tab attachment, and multi-agent coordination.
browser-tabs: List all available browser tabs with their titles, URLs, and tab IDs. Returns @t refs for use with browser-attach.
browser-attach: Attach a CDP session to a specific browser tab and create a session ID (@x ref). Required before most browser operations.
  tabRef (string): Tab ref from browser-tabs (e.g., @t1). Identifies which tab to attach to.
browser-detach: Detach a CDP session from a tab and clean up all associated state. Always call when done with a tab.
  sessionId (string): Session ID to detach.
browser-launch: Open a new browser tab, optionally navigating to a URL immediately.
  url? (string, default: about:blank): URL to open in the new tab.
browser-close: Close a browser tab and clean up its session. The tab is removed from the browser.
  sessionId (string): Session ID of the tab to close.
browser-connect: Connect to an externally launched Chromium instance via CDP WebSocket URL. For headless automation or remote browsers.
  wsUrl (string): CDP WebSocket URL (e.g., ws://localhost:9222/devtools/browser/...).
browser-disconnect: Disconnect from an externally connected browser instance.
  sessionId (string): Session ID to disconnect.
browser-window: List all active sessions managed by the current Scout instance. Shows session IDs, tab refs, URLs, and titles.
agent-message: Send a message to a peer agent controlling a different tab. Used for multi-agent coordination.
  agentRef (string): Target agent ref (e.g., @a2).
  message (string): Message content to send.
agent-roster: List all peer agents currently active in the Scout session. Returns agent refs, their tab assignments, and session IDs.
agent-request: Submit a request to a peer agent and await its response. Synchronous cross-agent RPC.
  agentRef (string): Target agent ref.
  request (string): Request payload.
agent-respond: Reply to a pending agent-request. Used by specialized agents to respond to orchestrator requests.
  requestId (string): Request ID to respond to.
  response (string): Response payload.
browser-discover: Find available browsers: Scout sessions and external Chrome (CDP). Returns wsEndpoints for browser-debug. Call before browser-launch to reuse existing browsers.
  ports? (number[], default: [9222, 9223, 9224]): Array of ports to scan for CDP-enabled browsers.
  timeout? (number, default: 1000): Connection timeout in milliseconds for each port scan.
Snapshot optimization
Reduce token usage by up to 75% using snapshot filters. Smaller snapshots mean faster agent responses and lower costs.
Exclude Decorative Elements
Reduction: varies by page; contributes to the combined reduction of up to 75% with other filters.
Strip decorative elements (icons, separators, generic containers) that add tokens without useful information.
excludeDecorative: true
Role-Based Filtering
Reduction: 20–40%, depending on page structure.
Filter the accessibility tree by ARIA roles. Remove roles that aren't relevant to your task; for example, exclude 'img' and 'separator' when extracting text content.
excludeRoles: string[]
Depth Limiting
Reduction: configurable; deeper pages benefit most.
Limit how deep the accessibility tree is traversed. Shallow depths (2–4) capture top-level navigation and headings. Deeper depths (6–8) capture interactive elements inside nested components.
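To show what role exclusion and depth limiting do to a tree, here is a small local model. It is not how Scout filters snapshots internally; the `AxNode` shape and `pruneTree` helper are hypothetical, and the root is counted as depth 1 by assumption.

```typescript
// Simplified accessibility node; real snapshots carry more fields.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

// Drop nodes whose role is excluded (with their subtrees) and stop descending past maxDepth.
function pruneTree(node: AxNode, excludeRoles: string[], maxDepth: number, depth: number = 1): AxNode | null {
  if (excludeRoles.indexOf(node.role) !== -1 || depth > maxDepth) return null;
  const children = (node.children ?? [])
    .map((child) => pruneTree(child, excludeRoles, maxDepth, depth + 1))
    .filter((child): child is AxNode => child !== null);
  const pruned: AxNode = { role: node.role };
  if (node.name !== undefined) pruned.name = node.name;
  if (children.length > 0) pruned.children = children;
  return pruned;
}

// Count the nodes in a (possibly pruned) tree.
function countNodes(node: AxNode | null): number {
  if (!node) return 0;
  return 1 + (node.children ?? []).reduce((sum, child) => sum + countNodes(child), 0);
}

const samplePage: AxNode = {
  role: "main",
  children: [
    { role: "heading", name: "Top stories" },
    { role: "img", name: "logo" },
    { role: "list", children: [{ role: "listitem", children: [{ role: "link", name: "Story 1" }] }] },
  ],
};

// Excluding images and capping depth at 3 keeps main, heading, list, and listitem.
const prunedPage = pruneTree(samplePage, ["img", "separator"], 3);
```

Note the trade-off visible in the sample: at depth 3 the link inside the list item is cut, which is why deeper limits are suggested when you need nested interactive elements.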
maxDepth: number
Element Scoping
Reduction: 50–75% when scoping to the content region.
Scope the snapshot to a specific element using a CSS selector. Only the subtree rooted at the matched element is included. Ideal when you know the content region (e.g., 'main', '#content', 'article').
selector: string
Stacked Filtering
Reduction: up to 75% combined.
Combine multiple filters for maximum reduction. The recommended starting point for most extraction tasks: scope to main content, remove decorative nodes, and limit depth.
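A stacked-filter call might be assembled like this. The four option names are the ones used in this guide; the `SnapshotFilters` interface, the `snapshotCall` helper, and the default `maxDepth` of 6 are assumptions chosen for illustration, and any session argument the tool may need is left out.

```typescript
// Filter options named in this guide; the interface itself is illustrative.
interface SnapshotFilters {
  selector?: string;
  excludeDecorative?: boolean;
  excludeRoles?: string[];
  maxDepth?: number;
}

// Starting point: scope to main content, drop decorative nodes, limit depth.
const EXTRACTION_DEFAULTS: SnapshotFilters = {
  selector: "main",
  excludeDecorative: true,
  maxDepth: 6, // assumed value within the 6-8 range mentioned for nested components
};

// Merge per-call overrides onto the defaults and build a browser-snapshot call.
function snapshotCall(overrides: SnapshotFilters = {}): { tool: "browser-snapshot"; params: SnapshotFilters } {
  return { tool: "browser-snapshot", params: { ...EXTRACTION_DEFAULTS, ...overrides } };
}

const articleSnapshot = snapshotCall({ selector: "article", excludeRoles: ["img", "separator"] });
```

Keeping the defaults in one place makes it easy to loosen a single filter (for example, raising `maxDepth`) when an expected element is missing from the snapshot.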
Recipes
Copy-paste patterns for common automation tasks. Each recipe shows the full tool call sequence.
Extract Structured Data
Navigate to a page and extract content using DOM property extraction.
Steps
Navigate to the target URL
Snapshot the main content area with decorative filtering
Use browser-extract with property: 'article' for clean content or 'text' for element text
Fill and Submit Forms
Fill form fields and submit using element refs from the accessibility snapshot.
Steps
Snapshot the page to discover form field refs
Fill each field using browser-action with action: fill
Click the submit button
Re-snapshot to verify the result
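The steps above can be sketched as a call sequence. Only `action: fill` is documented in this recipe; the `ref` and `value` parameter names, the `click` action name, and the `formSubmitCalls` helper are assumptions, so check the tool schema before relying on them.

```typescript
interface ToolCall {
  tool: string;
  params: Record<string, unknown>;
}

// Build the fill, click, re-snapshot sequence from element refs found in a snapshot.
// `ref`, `value`, and the "click" action are assumed names.
function formSubmitCalls(fields: Array<{ ref: string; value: string }>, submitRef: string): ToolCall[] {
  const fills: ToolCall[] = fields.map((field) => ({
    tool: "browser-action",
    params: { action: "fill", ref: field.ref, value: field.value },
  }));
  return fills.concat([
    { tool: "browser-action", params: { action: "click", ref: submitRef } },
    // Re-snapshot: the refs used above are stale once the form submits.
    { tool: "browser-snapshot", params: {} },
  ]);
}

const loginCalls = formSubmitCalls(
  [
    { ref: "@e3", value: "ada@example.com" },
    { ref: "@e4", value: "correct horse" },
  ],
  "@e5",
);
```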
Batch Operations Pipeline
Chain navigate → snapshot → extract in a single batch call to minimize round-trips.
Steps
Define all operations as an array of { tool, params } objects
Send them all at once with browser-batch (max 10 actions)
Process the ordered result array
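Because browser-batch accepts at most 10 actions, longer pipelines need to be split. The sketch below chunks an ordered action list into batch calls using the `actions` parameter. The `browser-navigate` tool name and its `url` parameter are assumptions (the navigation tool is not named in this section), as are the helper names.

```typescript
interface BatchAction {
  tool: string;
  params: Record<string, unknown>;
}

interface BatchCall {
  tool: "browser-batch";
  params: { actions: BatchAction[] };
}

const MAX_BATCH_ACTIONS = 10;

// Split an action list into browser-batch calls of at most 10 actions each,
// preserving order so results can be matched back by position.
function toBatchCalls(actions: BatchAction[]): BatchCall[] {
  const calls: BatchCall[] = [];
  for (let i = 0; i < actions.length; i += MAX_BATCH_ACTIONS) {
    calls.push({ tool: "browser-batch", params: { actions: actions.slice(i, i + MAX_BATCH_ACTIONS) } });
  }
  return calls;
}

// navigate, snapshot, extract for one page ("browser-navigate" and `url` are assumed names).
function pagePipeline(url: string): BatchAction[] {
  return [
    { tool: "browser-navigate", params: { url } },
    { tool: "browser-snapshot", params: { selector: "main", excludeDecorative: true } },
    { tool: "browser-extract", params: { property: "article" } },
  ];
}
```

A single page fits in one batch of three actions. With several pages, you may prefer to chunk by whole pipelines (three pages, nine actions) so a page's steps never straddle two batches.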
Efficient Multi-Page Crawling
Block unnecessary resources and crawl multiple pages for bulk content extraction.
Steps
Set up route interception to block images, fonts, and CSS
Navigate to the starting page
Use browser-crawl with an extraction expression
Mobile Device Testing
Set viewport size, user agent, and network throttling to test responsive design and performance.
Steps
Resize viewport to mobile dimensions with browser-resize
Set user agent and locale with browser-emulate
Apply network throttling with custom throughput and latency values
Navigate and capture a full-page screenshot
Collect performance metrics
File & Clipboard Operations
Handle file downloads, uploads, and clipboard operations in automated workflows.
Steps
Use browser-download to wait for and capture downloads
Use browser-upload to select files for <input type='file'> elements
Use browser-evaluate for clipboard access
FAQ
Common questions about Scout setup, tool usage, and troubleshooting.
Does Scout work with Claude, GPT, and Gemini?
Yes. Scout exposes a standard MCP tool interface that works with any MCP-compatible client: Claude Desktop, Cursor, Cline, and any agent built with @modelcontextprotocol/sdk.
What do I need installed to use Scout?
The CDP path requires the Scout Chrome extension running in your browser and the Scout MCP server running locally. The MCP path requires only the server — it launches headless Playwright instances automatically.
How do I target elements reliably?
Use the @e ref system. browser-snapshot returns a YAML accessibility tree in which every interactive element has a short @eN identifier. Pass these directly to browser-action instead of CSS selectors. Refs are stable within a snapshot but are regenerated after DOM changes, so always snapshot before acting.
An element doesn't appear in the snapshot. What now?
Call browser-snapshot and check the returned tree. If your element is absent, try: (1) disable excludeDecorative, (2) remove role filters, (3) use browser-evaluate to check if the element exists in the DOM, (4) check if it's inside an iframe with browser-frames.
browser-action returns 'element intercepts pointer events'. How do I fix it?
Pass force: true to browser-action. This bypasses Playwright's actionability checks (visibility, enabled, not-intercepted). Use only when you're certain the element exists but is covered by an overlay.
How do I minimize round-trips for agent efficiency?
Use the browser-batch tool. It accepts an array of { tool, params } objects (via the 'actions' param) and returns an array of results in order. One round-trip for up to 10 operations. Ideal for navigation → snapshot → extract pipelines.
Snapshots are too large and consuming too many tokens. What should I do?
Use browser-snapshot filtering: set excludeDecorative: true, add role filters with excludeRoles, use maxDepth, or scope with selector: 'main'. Snapshot a sub-element instead of the full page when you know where your content is.
Can Scout access my authenticated sessions (like Gmail, GitHub)?
Yes, on the CDP path. Authenticated sessions are isolated per tab. Session cookies and tokens are accessed through the Chrome extension directly and are never transmitted to the Scout server. The server only sees CDP commands and DOM snapshots.
How long are @e refs valid?
An @eN ref is valid for the lifetime of a single snapshot. After any DOM mutation or navigation, refs are invalidated. Always call browser-snapshot again before calling browser-action if you're unsure whether the DOM has changed.
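One way to follow this rule in agent code is to track which snapshot each ref came from. The class below is a client-side sketch under that assumption; `SnapshotRefs` is a hypothetical helper and does not query Scout for validity.

```typescript
// Tracks which snapshot generation an element ref belongs to.
class SnapshotRefs {
  private generation = 0;
  private refs: Record<string, number> = {};

  // Call after every browser-snapshot with the @e refs it returned.
  recordSnapshot(elementRefs: string[]): void {
    this.generation += 1;
    this.refs = {};
    for (let i = 0; i < elementRefs.length; i++) {
      this.refs[elementRefs[i]] = this.generation;
    }
  }

  // Call after navigation or any action that may have mutated the DOM.
  invalidate(): void {
    this.refs = {};
  }

  // True only if the ref came from the latest snapshot and nothing has changed since.
  isFresh(ref: string): boolean {
    return this.refs[ref] === this.generation;
  }
}
```

Before each browser-action call, an agent would check `isFresh(ref)` and take a new snapshot when it returns false.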
How do I crawl an entire website?
Use browser-crawl. It launches a Crawlee-powered crawler that handles concurrency, deduplication, resource blocking, and infinite scroll automatically. Pass enqueueStrategy: 'same-hostname' for full-site crawls.
How do I connect to an already-running browser?
Use browser-discover to scan local ports (default: 9222, 9223, 9224) for Chromium instances launched with --remote-debugging-port. It returns WebSocket endpoints you can pass directly to browser-connect. No manual URL copying needed.
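As a sketch of that flow, the helpers below build a browser-discover call with the documented defaults and turn a list of discovered endpoints into browser-connect calls using the documented `wsUrl` parameter. The exact shape of the browser-discover result is not specified here, so the endpoints are taken as a plain string array by assumption; the helper names are hypothetical.

```typescript
interface DiscoverParams {
  ports: number[];
  timeout: number; // milliseconds per port scan
}

// Defaults documented for browser-discover.
const DISCOVER_DEFAULTS: DiscoverParams = { ports: [9222, 9223, 9224], timeout: 1000 };

// Build a browser-discover call, overriding defaults only where requested.
function discoverCall(overrides: Partial<DiscoverParams> = {}): { tool: "browser-discover"; params: DiscoverParams } {
  return { tool: "browser-discover", params: { ...DISCOVER_DEFAULTS, ...overrides } };
}

// Turn discovered WebSocket endpoints into browser-connect calls, skipping non-WebSocket URLs.
function connectCalls(wsEndpoints: string[]): Array<{ tool: "browser-connect"; params: { wsUrl: string } }> {
  return wsEndpoints
    .filter((url) => url.indexOf("ws://") === 0 || url.indexOf("wss://") === 0)
    .map((url) => ({ tool: "browser-connect" as const, params: { wsUrl: url } }));
}
```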
How do the payment tools work?
Scout supports three payment tools: payment-balance (check your USDC wallet), payment-pay (fetch x402-gated resources with automatic USDC payment), and payment-transfer (send USDC to any EVM address). Configure your wallet private key in the Scout extension settings.
What environment variables do I need for payments?
Configure your wallet private key in the Scout extension settings. Set walletNetworkId in scout.config.json — defaults to 'base-sepolia' for testnet. For production payments, set it to 'base' for mainnet USDC.
What API keys does Scout require?
Apart from your Scout API key (set in the extension settings), no AI provider key is required: Scout provides AI inference out of the box on all plans. You can optionally bring your own AI provider key (BYOK) via the extension settings for direct billing or model preference. The wallet private key for payments is also configured in the extension settings.
What is the difference between CDP and MCP paths?
Scout uses two paths: the CDP path connects to your real browser via the Chrome extension (your cookies, your sessions, your authenticated state). The MCP path launches disposable headless Playwright instances (clean state, no cookies, fully isolated). Choose based on whether you need existing auth or clean automation.
How do I coordinate multiple agents?
Use agent-roster to list all active peer agents, agent-message to send fire-and-forget messages, and agent-request/agent-respond for synchronous RPC between agents. Each agent controls its own tab and session independently.
How do I test on mobile viewports?
Use browser-emulate to set device and environment parameters (user agent, color scheme, locale, geolocation, timezone). Combine it with browser-resize to set specific viewport dimensions, and use browser-throttle with downloadThroughput, uploadThroughput, and latency values for network simulation.
How do I intercept and modify network requests?
browser-route lets you intercept and modify network requests before they reach the server. You can block resources (images, fonts, trackers), modify headers, mock API responses, or redirect URLs. Use browser-unroute to remove interceptions.
How do I export a page as PDF or screenshot?
Use browser-pdf to generate a PDF of the current page. It supports custom paper sizes, margins, headers/footers, page ranges, and scale. For screenshots, use browser-screenshot with target 'screen' (viewport), 'page' (full document), 'element' (specific DOM element), or 'clip' (rectangle via x, y, width, height).
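The four screenshot targets map naturally onto a discriminated union. The sketch below models them and checks the clip rectangle before a call is made. The `target` values and the clip fields come from the answer above; the `ref` field for element targets and the validation rule are assumptions.

```typescript
// The four documented targets; only "clip" takes a rectangle.
type ScreenshotParams =
  | { target: "screen" }
  | { target: "page" }
  | { target: "element"; ref: string } // `ref` is an assumed parameter name
  | { target: "clip"; x: number; y: number; width: number; height: number };

// Return a list of problems; an empty list means the params look usable.
function checkScreenshotParams(params: ScreenshotParams): string[] {
  const problems: string[] = [];
  if (params.target === "clip" && (params.width <= 0 || params.height <= 0)) {
    problems.push("clip width and height must be positive");
  }
  return problems;
}

const fullPage: ScreenshotParams = { target: "page" };
const headerClip: ScreenshotParams = { target: "clip", x: 0, y: 0, width: 1280, height: 200 };
```

With this type, the compiler rejects a clip without dimensions or a full-page capture that carries a stray rectangle.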
How does data extraction work?
browser-extract reads DOM properties directly — it does not use AI. Use the 'property' parameter to specify what to extract: 'article' (main content as clean markdown via Readability), 'text', 'html', 'markdown', 'value', 'attribute', 'title', 'url', 'count', 'box', 'visible', 'enabled', 'checked', or 'focused'. Combine with a CSS selector or @ref to target specific elements.
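Since the property list is fixed, a client can validate it before calling the tool. This sketch encodes the values listed above as a union type. The `target` parameter name (for the CSS selector or @ref) and the `extractCall` helper are assumptions.

```typescript
// Property values listed above for browser-extract.
const EXTRACT_PROPERTIES = [
  "article", "text", "html", "markdown", "value", "attribute", "title",
  "url", "count", "box", "visible", "enabled", "checked", "focused",
] as const;

type ExtractProperty = (typeof EXTRACT_PROPERTIES)[number];

interface ExtractParams {
  property: ExtractProperty;
  target?: string; // assumed name for the CSS selector or @ref
}

// Type guard: is this string one of the documented property values?
function isExtractProperty(value: string): value is ExtractProperty {
  return (EXTRACT_PROPERTIES as readonly string[]).indexOf(value) !== -1;
}

// Build a browser-extract call, rejecting unknown property names early.
function extractCall(property: string, target?: string): { tool: "browser-extract"; params: ExtractParams } {
  if (!isExtractProperty(property)) throw new Error("unknown extract property: " + property);
  const params: ExtractParams = { property };
  if (target !== undefined) params.target = target;
  return { tool: "browser-extract", params };
}
```

For example, `extractCall("article")` targets the main content, while `extractCall("text", "@e7")` reads the text of one element.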
How does the credit system work?
Scout separates tooling from inference. Pro ($9.99/month) covers all browser tools, MCP surface, and CDP streaming — no inference included. AI inference is a separate addon from $5/month where you set your own budget with no ceiling. Minimum paid experience is $14.99/month (Pro + $5 inference). BYOK (free) lets you bring your own API key and skip Scout’s inference entirely. Voice ($9.99/month) adds STT/TTS/STS. System ($14.99/month) adds OS-level automation. The Max bundle ($39.99/month) includes Pro and all addons with a $15 inference budget at a ~$10/month discount.
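The figures above can be checked with a little arithmetic. The sketch works in cents to avoid floating-point drift and assumes "all addons" means Voice and System plus the $15 inference budget; that reading is an interpretation of the text, not a published breakdown.

```typescript
// Monthly prices in cents, as listed above.
const PRICE_CENTS = {
  pro: 999,
  voice: 999,
  system: 1499,
  minInference: 500,
  maxBundle: 3999,
  maxBundleInferenceBudget: 1500,
};

// Minimum paid setup: Pro plus the smallest inference addon.
const minimumPaidCents = PRICE_CENTS.pro + PRICE_CENTS.minInference;

// Buying the Max bundle's contents separately: Pro + Voice + System + a $15 inference budget.
const separateCents =
  PRICE_CENTS.pro + PRICE_CENTS.voice + PRICE_CENTS.system + PRICE_CENTS.maxBundleInferenceBudget;
const bundleSavingCents = separateCents - PRICE_CENTS.maxBundle;

// Format cents as a dollar string.
function dollars(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}
```

Under that assumption the separate purchases total $49.97, so the bundle saves $9.98 per month, which matches the stated discount of roughly $10.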
How do I use Scout in headless/CI mode?
Run your Chromium with --remote-debugging-port=9222, then use browser-connect with the WebSocket URL, or just call browser-discover to auto-detect it. This gives you full CDP control without the Chrome extension. Works with any Chromium-based browser.
How do I handle file uploads?
Use browser-upload to select files for file input elements. Pass the file path and the selector of the <input type='file'> element. For drag-and-drop uploads, use browser-action to simulate the drop event on the drop zone.
How do I handle browser dialogs and popups?
Scout automatically detects browser dialogs (alert, confirm, prompt, beforeunload). Use browser-dialog with action 'accept', 'dismiss', 'dismissAll', or 'get' to manage pending dialogs. You can provide promptText when accepting prompt dialogs.
How do I manage cookies and storage?
Use browser-cookies to get, set, or clear cookies for the current session. For set, pass an array of cookie objects with name, value, and optional domain/path/httpOnly/secure properties. Use browser-storage to manage localStorage and sessionStorage with get, set, remove, clear, and keys actions.
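A cookie payload for the set action might look like the sketch below. The cookie fields are the ones named above; the `action` and `cookies` parameter names and the `setCookiesCall` helper are assumptions, so confirm them against the tool schema.

```typescript
// Cookie fields named in the answer above.
interface CookieInput {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  httpOnly?: boolean;
  secure?: boolean;
}

// Build a browser-cookies "set" call; `action` and `cookies` are assumed parameter names.
function setCookiesCall(cookies: CookieInput[]): { tool: "browser-cookies"; params: { action: "set"; cookies: CookieInput[] } } {
  for (let i = 0; i < cookies.length; i++) {
    if (cookies[i].name.length === 0) throw new Error("cookie " + i + " has an empty name");
  }
  return { tool: "browser-cookies", params: { action: "set", cookies } };
}

const consentCookie = setCookiesCall([
  { name: "consent", value: "accepted", path: "/", secure: true },
]);
```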