How Scout works
A technical deep-dive into Scout's layered architecture — from MCP protocol through the extension's WebSocket bridge to Chrome DevTools Protocol and your browser's rendering engine.
The four-layer stack
Scout is organized around a clean separation of concerns. Each layer has a single responsibility and communicates through well-defined protocols.
01 AI Agent
Protocol: MCP (stdio transport)
Any MCP-compatible AI agent. Discovers browser tools through the MCP protocol and issues structured tool calls.
The AI agent has no awareness of CDP or WebSocket internals. It sees Scout as a tool server exposing named functions with typed inputs and outputs. This means any agent — regardless of model or framework — can use Scout without modification.
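From the agent's side, calling Scout is an ordinary MCP tool call. The sketch below shows the JSON-RPC 2.0 `tools/call` request an MCP client writes to the server over the stdio transport, where each message occupies one newline-terminated line. `browser-navigate` is one of the Scout tools listed further down this page; its `url` argument name and the two helper functions are assumptions for illustration, not Scout's published schema.

```typescript
// Sketch only: the JSON-RPC 2.0 request an MCP client sends to call a tool.
// The tool name comes from Scout's tool list; the `url` argument name is assumed.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The MCP stdio transport carries one JSON-RPC message per line. JSON.stringify
// without indentation escapes newlines inside strings, so the only raw newline
// is the terminator appended here.
function frameForStdio(message: object): string {
  return JSON.stringify(message) + "\n";
}

const request = buildToolCall(1, "browser-navigate", { url: "https://example.com" });
const framed = frameForStdio(request);
console.log(framed.slice(0, -1));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"browser-navigate","arguments":{"url":"https://example.com"}}}
```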
Examples: Claude Desktop, Cursor, Cline, custom agents, MCP SDK clients
02 MCP Server
Protocol: WebSocket (WSS)
Scout's orchestration layer. Translates MCP tool calls into CDP commands, manages session state, enforces governance policies, and routes commands to the correct extension connection.
The orchestration layer runs a multi-agent graph — an orchestrator agent with six specialist sub-agents (interactor, network manager, storage manager, media controller, debugger, emulator). Dynamic model selection means you can route different tasks to different AI providers at runtime.
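This page doesn't describe how model selection is configured. One minimal way to get per-task routing at runtime is a table of default models per agent plus overrides that can change while the server keeps running, as in this hypothetical sketch. The agent names follow this page; the provider and model identifiers, and the `ModelSelector` class, are placeholders.

```typescript
// Hypothetical sketch of runtime model selection per agent.
// Agent names follow this page; provider and model IDs are placeholders.
type AgentName =
  | "orchestrator"
  | "interactor"
  | "network-manager"
  | "storage-manager"
  | "media-controller"
  | "debugger"
  | "emulator";

interface ModelChoice {
  provider: string;
  model: string;
}

class ModelSelector {
  private readonly defaults: Record<AgentName, ModelChoice>;
  private readonly overrides = new Map<AgentName, ModelChoice>();

  constructor(defaults: Record<AgentName, ModelChoice>) {
    this.defaults = defaults;
  }

  // Swap the model for one agent while the server keeps running.
  setOverride(agent: AgentName, choice: ModelChoice): void {
    this.overrides.set(agent, choice);
  }

  clearOverride(agent: AgentName): void {
    this.overrides.delete(agent);
  }

  // An override wins; otherwise the agent's default applies.
  select(agent: AgentName): ModelChoice {
    return this.overrides.get(agent) ?? this.defaults[agent];
  }
}

const fast: ModelChoice = { provider: "provider-a", model: "small-model" };
const strong: ModelChoice = { provider: "provider-b", model: "large-model" };

const selector = new ModelSelector({
  orchestrator: strong,
  interactor: fast,
  "network-manager": fast,
  "storage-manager": fast,
  "media-controller": fast,
  "debugger": strong,
  emulator: fast,
});

// Route the interactor to a stronger model for a difficult form.
selector.setOverride("interactor", strong);
console.log(selector.select("interactor").model); // large-model
```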
Components: tool registry, session manager, connection manager, governance engine, multi-agent graph
03 Chrome Extension
Protocol: CDP (chrome.debugger)
The Scout Chrome extension. Maintains a persistent WebSocket connection to the MCP server. Manages CDP sessions per tab using the chrome.debugger API. Routes commands and events bidirectionally.
The extension initiates the WebSocket connection (the server cannot reach into a user's browser). Each attached tab gets an isolated CDPSession with its own command queue, event capture pipeline, and memory store. Sessions survive extension updates and WebSocket reconnects.
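A per-tab session pool with an ordered command queue could look roughly like this. To keep it runnable outside Chrome, the debugger is passed in through a small `DebuggerApi` interface modeled on the promise form of `chrome.debugger.sendCommand(target, method, params)`; inside the extension, that parameter would be an adapter over the real `chrome.debugger` API. The class names are hypothetical, and event capture and per-session memory stores are left out.

```typescript
// Hypothetical sketch: per-tab CDP sessions with an ordered command queue.
// DebuggerApi mirrors the promise form of chrome.debugger.sendCommand so the
// code can run (and be tested) outside Chrome with a fake implementation.
interface Debuggee {
  tabId: number;
}

interface DebuggerApi {
  sendCommand(target: Debuggee, method: string, params?: object): Promise<unknown>;
}

class TabSession {
  readonly tabId: number;
  private readonly api: DebuggerApi;
  // Promise for the most recently queued command; new commands chain onto it.
  private tail: Promise<unknown> = Promise.resolve();

  constructor(tabId: number, api: DebuggerApi) {
    this.tabId = tabId;
    this.api = api;
  }

  send(method: string, params?: object): Promise<unknown> {
    const run = () => this.api.sendCommand({ tabId: this.tabId }, method, params);
    // Start only after the previous command settles, whether it succeeded or failed.
    const result = this.tail.then(run, run);
    // Swallow this command's error in the chain so one failure doesn't block the
    // queue; the caller still sees the rejection through `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Sessions are keyed by tab ID and held independently of any WebSocket object.
class SessionPool {
  private readonly api: DebuggerApi;
  private readonly sessions = new Map<number, TabSession>();

  constructor(api: DebuggerApi) {
    this.api = api;
  }

  get(tabId: number): TabSession {
    let session = this.sessions.get(tabId);
    if (!session) {
      session = new TabSession(tabId, this.api);
      this.sessions.set(tabId, session);
    }
    return session;
  }
}
```

Chaining each command onto the previous one's promise keeps commands for a single tab strictly ordered while different tabs proceed independently. Because the pool is not owned by the socket, a reconnect does not have to tear sessions down.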
Components: service worker, WebSocket client, CDP session pool, tab tracker, event router
04 Chrome Browser
Protocol: none (final layer)
The user's actual Chrome browser. Scout controls it through the native Chrome DevTools Protocol, the same protocol Chrome DevTools itself uses. Your authenticated sessions, cookies, and extensions remain intact.
No traffic routing through third-party infrastructure. No SSL termination. No remote browser farm. Scout runs where your browser runs, with full access to the authenticated state that makes web automation actually work.
Components: Chrome tabs, DOM and accessibility tree, network stack, JavaScript runtime, cookie store
CDP domain coverage
Scout exposes the key Chrome DevTools Protocol domains below through its MCP tool interface, giving direct protocol access rather than a watered-down abstraction.
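As an example of what protocol-level access looks like, a single left click can be expressed as three raw `Input.dispatchMouseEvent` commands: a move, a press, and a release at the same viewport coordinates. The method name and the `type`, `x`, `y`, `button`, and `clickCount` parameters come from the CDP Input domain; the `CdpCommand` type and `clickCommands` helper are hypothetical illustrations, not Scout's implementation.

```typescript
// Hypothetical helper: the raw CDP commands for one left click.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// x and y are viewport coordinates in CSS pixels, relative to the main frame.
function clickCommands(x: number, y: number): CdpCommand[] {
  const press = { x, y, button: "left", clickCount: 1 };
  return [
    // Move the pointer first so hover and mousemove handlers fire before the press.
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...press } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...press } },
  ];
}

for (const command of clickCommands(120, 48)) {
  console.log(command.method, JSON.stringify(command.params));
}
```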
Page
Core page lifecycle. Navigate URLs, capture screenshots, generate PDFs, manage navigation history.
Key calls: navigate(), reload(), captureScreenshot(), printToPDF(), getNavigationHistory()
Runtime
JavaScript execution context. Run code in the page, capture console output, handle exceptions.
Key calls: evaluate(), callFunctionOn(), consoleAPICalled event, exceptionThrown event
Network
Full network stack access. Intercept requests, capture responses, manage cookies at the protocol level.
Key calls: requestIntercepted event (deprecated in CDP in favor of the Fetch domain's requestPaused), setCookies(), getCookies(), getResponseBody()
DOM
Direct DOM access. Query nodes, read and modify attributes, extract HTML without JavaScript evaluation.
Key calls: querySelector(), getAttributes(), getOuterHTML(), setAttributeValue()
Accessibility
Accessibility tree snapshot. The foundation of Scout's snapshot tool: provides structured semantic content without parsing HTML.
Key calls: getFullAXTree(), queryAXTree()
Input
Programmatic input injection. Simulate clicks, keystrokes, and touch gestures at the protocol level.
Key calls: dispatchMouseEvent(), dispatchKeyEvent(), dispatchTouchEvent(), synthesizePinchGesture()
Emulation
Device and environment overrides. Emulate mobile devices, spoof geolocation, override the timezone.
Key calls: setDeviceMetricsOverride(), setGeolocationOverride(), setTimezoneOverride()
Target
Tab management. Attach sessions, list targets, create new tabs, manage the CDP session lifecycle.
Key calls: attachToTarget(), detachFromTarget(), getTargets(), createTarget()
Multi-agent graph
Scout's orchestration layer runs a multi-agent graph. One orchestrator owns session and navigation; six specialist sub-agents handle domain-specific work.
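The page doesn't specify how the orchestrator decides which specialist receives a call, and delegation may well be model-driven. As a simpler hypothetical sketch, a static table built from the tool lists below is enough to route a tool name to its owning agent. The table, the agent identifiers, and `routeTool` are assumptions; only the tool names come from this page.

```typescript
// Hypothetical static routing table; tool names are copied from this page.
const SPECIALIST_TOOLS: Record<string, string[]> = {
  interactor: ["browser-action", "browser-select", "browser-upload", "browser-dialog", "browser-highlight", "browser-scroll"],
  "network-manager": ["browser-network", "browser-route", "browser-unroute", "browser-har", "browser-security", "browser-websocket"],
  "storage-manager": ["browser-cookies", "browser-storage", "browser-clipboard"],
  "media-controller": ["browser-screenshot", "browser-screencast", "browser-pdf", "browser-media", "browser-download", "browser-transcribe"],
  "debugger": ["browser-console", "browser-metrics", "browser-memory"],
  emulator: ["browser-emulate", "browser-resize", "browser-vision", "browser-cpu", "browser-throttle"],
};

// Tools the orchestrator handles itself (session, navigation, content).
const ORCHESTRATOR_TOOLS = new Set([
  "browser-tabs", "browser-attach", "browser-navigate", "browser-snapshot", "browser-extract", "browser-evaluate",
]);

// Invert the table once so each lookup is a single map access.
const TOOL_OWNER = new Map<string, string>();
for (const agent of Object.keys(SPECIALIST_TOOLS)) {
  for (const tool of SPECIALIST_TOOLS[agent]) {
    TOOL_OWNER.set(tool, agent);
  }
}

function routeTool(toolName: string): string {
  if (ORCHESTRATOR_TOOLS.has(toolName)) return "orchestrator";
  const owner = TOOL_OWNER.get(toolName);
  if (owner === undefined) throw new Error(`Unknown tool: ${toolName}`);
  return owner;
}

console.log(routeTool("browser-har")); // network-manager
```

Unknown names throw here; in the execution lifecycle described later on this page, the tool registry would normally reject them before routing.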
Orchestrator
Scope: session + navigation + content
Owns session management, navigation, and content tools directly. Delegates multi-step domain work to specialists.
Example: Navigate to a URL, take a snapshot, extract data, attach a session
Tools: browser-tabs, browser-attach, browser-navigate, browser-snapshot, browser-extract, browser-evaluate
Interactor
Scope: interaction
Handles all user input simulation. Observes snapshots before acting and retries with corrected selectors on failure.
Example: Fill a multi-step checkout form with dynamic field validation
Tools: browser-action, browser-select, browser-upload, browser-dialog, browser-highlight, browser-scroll
Network Manager
Scope: network
Monitors and manipulates network traffic. Intercepts API calls, records HAR files, and manages certificates.
Example: Intercept GraphQL responses and mock them for testing
Tools: browser-network, browser-route, browser-unroute, browser-har, browser-security, browser-websocket
Storage Manager
Scope: storage
Manages all browser storage. Reads and writes cookies, localStorage, sessionStorage, and the clipboard.
Example: Seed auth tokens into localStorage before running a test
Tools: browser-cookies, browser-storage, browser-clipboard
Media Controller
Scope: media
Controls all media output: screenshots, screencasts, PDF generation, video frame capture, and downloads.
Example: Record a full user flow as a video for documentation
Tools: browser-screenshot, browser-screencast, browser-pdf, browser-media, browser-download, browser-transcribe
Debugger
Scope: debug
Diagnoses page health. Captures console output, reads performance timings, and monitors DOM memory.
Example: Detect memory leaks in a long-running SPA session
Tools: browser-console, browser-metrics, browser-memory
Emulator
Scope: emulation
Controls device and environment simulation. Emulates mobile devices, throttles CPU and network, and emulates vision deficiencies.
Example: Test a checkout flow with custom viewport, mobile user agent, and 3G-throttled connection
Tools: browser-emulate, browser-resize, browser-vision, browser-cpu, browser-throttle
Memory architecture
Scout maintains structured memory across tabs and sessions. Each layer has distinct scope and persistence semantics.
Working Memory
Structured per-tab state updated by the agent. Tracks the current URL, page title, active session ID, user goal, and key observations.
Example: Tab A tracks that the user is on step 3 of a checkout flow. Tab B tracks a dashboard URL independently.
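A hypothetical shape for per-tab working memory, using the fields listed above. The field names, the `updateWorkingMemory` helper, and its merge-style update are assumptions; the point is that state is keyed by tab ID, so updating one tab never touches another.

```typescript
// Hypothetical per-tab working memory; field names are assumptions based on
// the fields this page lists (URL, title, session ID, goal, observations).
interface WorkingMemory {
  url: string;
  title: string;
  sessionId: string | null;
  goal: string | null;
  observations: string[];
}

const workingMemory = new Map<number, WorkingMemory>();

// Merge a partial update into one tab's state; other tabs are untouched.
function updateWorkingMemory(tabId: number, patch: Partial<WorkingMemory>): WorkingMemory {
  const current: WorkingMemory = workingMemory.get(tabId) ?? {
    url: "",
    title: "",
    sessionId: null,
    goal: null,
    observations: [],
  };
  const next: WorkingMemory = { ...current, ...patch };
  workingMemory.set(tabId, next);
  return next;
}

// Tab 1 is mid-checkout; tab 2 tracks a dashboard independently.
updateWorkingMemory(1, {
  url: "https://shop.example/checkout",
  goal: "Complete checkout",
  observations: ["On step 3 of the checkout flow"],
});
updateWorkingMemory(2, { url: "https://app.example/dashboard" });
console.log(workingMemory.get(1)?.goal); // Complete checkout
```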
Observational Memory
Auto-extracted semantic observations shared across all tabs for the same extension. The agent in Tab B can read discoveries made by Tab A.
Example: Tab A discovers the API base URL, and Tab B can use it without rediscovering it.
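Observational memory differs from working memory mainly in its key: observations are scoped to the extension rather than the tab. This sketch covers only that scoping (the automatic extraction step is not shown); the `ObservationalMemory` class, the `extensionId` key, and the last-write-wins behavior are assumptions.

```typescript
// Hypothetical sketch: observations are pooled per extension, not per tab,
// so every tab served by the same extension reads the same entries.
interface Observation {
  key: string;
  value: string;
  sourceTabId: number;
}

class ObservationalMemory {
  private readonly byExtension = new Map<string, Map<string, Observation>>();

  record(extensionId: string, observation: Observation): void {
    let pool = this.byExtension.get(extensionId);
    if (!pool) {
      pool = new Map<string, Observation>();
      this.byExtension.set(extensionId, pool);
    }
    // Last write wins for a given key.
    pool.set(observation.key, observation);
  }

  lookup(extensionId: string, key: string): Observation | undefined {
    return this.byExtension.get(extensionId)?.get(key);
  }
}

const shared = new ObservationalMemory();
// The agent in tab 1 discovers the API base URL...
shared.record("ext-1", { key: "apiBaseUrl", value: "https://api.example.com/v2", sourceTabId: 1 });
// ...and the agent in tab 2 of the same extension reuses it.
console.log(shared.lookup("ext-1", "apiBaseUrl")?.value); // https://api.example.com/v2
```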
Message History
The last 20 conversation turns per tab. Each tab maintains an independent conversation history that survives WebSocket reconnects.
Example: Resuming a long research session after reconnecting.
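A bounded per-tab history can live in a map keyed by tab ID that is held outside the WebSocket connection, so reconnecting doesn't clear it. In this in-memory sketch, one "turn" is assumed to be one message, and the `MessageHistory` class is hypothetical; the page doesn't say how Scout counts turns or whether it persists history elsewhere.

```typescript
// Hypothetical sketch: a bounded, per-tab message history.
// Assumes one "turn" is one message; Scout may count turns differently.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

const MAX_TURNS = 20;

class MessageHistory {
  // Keyed by tab ID and held outside any connection object, so reconnecting
  // the extension does not discard it.
  private readonly byTab = new Map<number, Turn[]>();

  append(tabId: number, turn: Turn): void {
    const turns = this.byTab.get(tabId) ?? [];
    turns.push(turn);
    // Keep only the most recent MAX_TURNS entries.
    if (turns.length > MAX_TURNS) {
      turns.splice(0, turns.length - MAX_TURNS);
    }
    this.byTab.set(tabId, turns);
  }

  get(tabId: number): readonly Turn[] {
    return this.byTab.get(tabId) ?? [];
  }
}

const chat = new MessageHistory();
for (let i = 0; i < 25; i++) {
  chat.append(1, { role: i % 2 === 0 ? "user" : "assistant", content: `turn ${i}` });
}
console.log(chat.get(1).length, chat.get(1)[0].content); // 20 turn 5
```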
Execution lifecycle
What happens between an AI agent issuing a tool call and Scout returning a result.
01 MCP tool call received
The AI agent sends a tool call over MCP (stdio). The request includes the tool name and typed arguments. Scout validates the arguments against the tool's Zod schema and rejects malformed calls immediately.
02 Routing and dispatch
The tool registry looks up the handler. Based on the tool's execution path (CDP or Playwright), the orchestrator either routes the call to the appropriate sub-agent or handles it directly.
03 CDP command emission
The handler translates the MCP arguments into one or more CDP commands (or Playwright API calls). Commands are queued per session and sent over the WebSocket connection to the extension.
04 Extension execution
The Chrome extension receives the CDP command, executes it against the correct tab's DevTools session, and streams the response back over WebSocket.
05 Result return
The response is deserialized, validated, and shaped into the MCP result format. Working memory is updated with any new observations. The structured result is returned to the AI agent.
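The five steps can be condensed into one function, as a hypothetical sketch. A hand-written type guard stands in for the Zod schema so the example has no dependencies, the transport is an injected callback instead of a real WebSocket, and sub-agent routing, Playwright paths, and working-memory updates are omitted. The `registry`, `handleToolCall`, and `Transport` names are assumptions; the result follows the MCP `content` array shape, with `isError` set when a call is rejected.

```typescript
// Hypothetical sketch of the five lifecycle steps for a single tool.
// A hand-written type guard stands in for Scout's Zod schema, and the transport
// is an injected callback rather than a real WebSocket connection.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

interface McpResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ToolHandler {
  validate(args: unknown): string | null; // error message, or null when valid
  toCdp(args: Record<string, unknown>): CdpCommand[];
}

type Transport = (tabId: number, command: CdpCommand) => Promise<unknown>;

function isNavigateArgs(args: unknown): args is { url: string } {
  return typeof args === "object" && args !== null && typeof (args as { url?: unknown }).url === "string";
}

const registry = new Map<string, ToolHandler>();
registry.set("browser-navigate", {
  validate: (args) => (isNavigateArgs(args) ? null : "browser-navigate expects { url: string }"),
  toCdp: (args) => [{ method: "Page.navigate", params: { url: args.url } }],
});

function errorResult(text: string): McpResult {
  return { content: [{ type: "text", text }], isError: true };
}

async function handleToolCall(tabId: number, name: string, args: unknown, send: Transport): Promise<McpResult> {
  // 01-02: look up the handler and reject malformed calls before any browser work.
  const handler = registry.get(name);
  if (!handler) return errorResult(`Unknown tool: ${name}`);
  const problem = handler.validate(args);
  if (problem !== null) return errorResult(problem);

  // 03: translate the arguments into CDP commands.
  const commands = handler.toCdp(args as Record<string, unknown>);

  // 04: send commands in order and wait for each response from the extension.
  const responses: unknown[] = [];
  for (const command of commands) {
    responses.push(await send(tabId, command));
  }

  // 05: shape the raw responses into an MCP result.
  return { content: [{ type: "text", text: JSON.stringify(responses) }] };
}
```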