Architecture

How Scout works

A technical deep-dive into Scout's layered architecture: from MCP, through the extension's WebSocket bridge, to the Chrome DevTools Protocol (CDP) and your browser's rendering engine.

The four-layer stack

Scout separates concerns across four layers. Each layer has a single responsibility and talks to its neighbors through a well-defined protocol.

01

AI Agent

MCP (stdio transport)

Any MCP-compatible AI agent. Discovers browser tools through the MCP protocol and issues structured tool calls.

The AI agent has no awareness of CDP or WebSocket internals. It sees Scout as a tool server exposing named functions with typed inputs and outputs. This means any agent — regardless of model or framework — can use Scout without modification.

Claude Desktop, Cursor, Cline, Custom agents, MCP SDK clients
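
For illustration, a tool call as the agent would issue it is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds one. The tool name `browser-navigate` comes from the tool list on this page; the `url` argument name and the `buildToolCall` helper are assumptions made for the example.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Over the stdio transport, each message is a single line of JSON.
const request = buildToolCall(1, "browser-navigate", { url: "https://example.com" });
const wireLine = JSON.stringify(request) + "\n";
console.log(wireLine);
```

Nothing in the request mentions CDP or WebSocket, which is the point of this layer: the agent only sees a named tool and its arguments.
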
02

MCP Server

WebSocket (WSS)

Scout's orchestration layer. Translates MCP tool calls into CDP commands, manages session state, enforces governance policies, and routes commands to the correct extension connection.

The orchestration layer runs a multi-agent graph — an orchestrator agent with six specialist sub-agents (interactor, network manager, storage manager, media controller, debugger, emulator). Dynamic model selection means you can route different tasks to different AI providers at runtime.

Tool registry, Session manager, Connection manager, Governance engine, Multi-agent graph
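
As a rough sketch of the translation step, a registry can map each tool name to a function that returns the CDP commands to emit. The registry shape, the `translateToolCall` helper, and the `url` and `expression` argument names are assumptions for illustration; `Page.navigate` and `Runtime.evaluate` are real CDP methods.

```typescript
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => CdpCommand[];

// Hypothetical registry: tool name -> translator into CDP commands.
const toolRegistry: Record<string, ToolHandler> = {
  "browser-navigate": (args) => [
    { method: "Page.navigate", params: { url: String(args["url"]) } },
  ],
  "browser-evaluate": (args) => [
    {
      method: "Runtime.evaluate",
      params: { expression: String(args["expression"]), returnByValue: true },
    },
  ],
};

function translateToolCall(tool: string, args: Record<string, unknown>): CdpCommand[] {
  const handler = toolRegistry[tool];
  if (!handler) throw new Error(`Unknown tool: ${tool}`);
  return handler(args);
}

console.log(translateToolCall("browser-navigate", { url: "https://example.com" }));
```

A real handler would also consult session state and governance policy before emitting anything; this sketch covers only the name-to-command mapping.
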
03

Chrome Extension

CDP (chrome.debugger)

The Scout Chrome extension. Maintains a persistent WebSocket connection to the MCP server. Manages CDP sessions per tab using the chrome.debugger API. Routes commands and events bidirectionally.

The extension initiates the WebSocket connection (the server cannot reach into a user's browser). Each attached tab gets an isolated CDPSession with its own command queue, event capture pipeline, and memory store. Sessions survive extension updates and WebSocket reconnects.

Service worker, WebSocket client, CDP session pool, Tab tracker, Event router
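
One way to picture the per-tab isolation described above is a pool keyed by tab ID, where each entry owns its own command queue, event buffer, and memory store. All names here (`TabSession`, `SessionPool`) are hypothetical. In the extension itself the attach step would go through the `chrome.debugger` API, which this sketch leaves out so that it runs outside a browser.

```typescript
interface TabSession {
  tabId: number;
  commandQueue: string[];        // pending CDP method names
  events: string[];              // captured CDP event names
  memory: Map<string, unknown>;  // per-tab key/value store
}

class SessionPool {
  private sessions = new Map<number, TabSession>();

  // Returns the existing session for a tab, or creates an isolated one.
  attach(tabId: number): TabSession {
    let session = this.sessions.get(tabId);
    if (!session) {
      session = { tabId, commandQueue: [], events: [], memory: new Map() };
      this.sessions.set(tabId, session);
    }
    return session;
  }

  // Drops the session; returns false if the tab was never attached.
  detach(tabId: number): boolean {
    return this.sessions.delete(tabId);
  }

  get size(): number {
    return this.sessions.size;
  }
}

const pool = new SessionPool();
pool.attach(1).memory.set("goal", "checkout");
pool.attach(2).events.push("Page.loadEventFired");
console.log(pool.attach(1).memory.get("goal"), pool.attach(2).memory.size);
```

Because `attach` is idempotent per tab, a reconnect can call it again and get the same session back rather than a fresh one.
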
04

Chrome Browser

The user's actual Chrome browser. Scout controls it through the native Chrome DevTools Protocol — the same protocol Chrome DevTools uses internally. Your authenticated sessions, cookies, and extensions remain intact.

No traffic routing through third-party infrastructure. No SSL termination. No remote browser farm. Scout runs where your browser runs, with full access to the authenticated state that makes web automation actually work.

Chrome tabs, DOM and accessibility tree, Network stack, JavaScript runtime, Cookie store

CDP domain coverage

Scout exposes the following Chrome DevTools Protocol domains through the MCP tool interface. Within each domain, agents get direct protocol access rather than a watered-down abstraction.

Page

Core page lifecycle. Navigate URLs, capture screenshots, generate PDFs, manage navigation history.

navigate(), reload(), captureScreenshot(), printToPDF(), getNavigationHistory()

Runtime

JavaScript execution context. Run code in the page, capture console output, handle exceptions.

evaluate(), callFunctionOn(), consoleAPICalled event, exceptionThrown event

Network

Full network stack access. Intercept requests, capture responses, manage cookies at protocol level.

requestIntercepted event, setCookies(), getCookies(), getResponseBody()

DOM

Direct DOM access. Query nodes, read and modify attributes, extract HTML without JavaScript evaluation.

querySelector(), getAttributes(), getOuterHTML(), setAttributeValue()

Accessibility

Accessibility tree snapshot. This is the foundation of Scout's snapshot tool: it provides structured semantic content without parsing HTML.

getFullAXTree(), queryAXTree()

Input

Programmatic input injection. Simulate clicks, keystrokes, and touch gestures at the protocol level.

dispatchMouseEvent(), dispatchKeyEvent(), dispatchTouchEvent(), synthesizePinchGesture()

Emulation

Device and environment overrides. Emulate mobile devices, spoof geolocation, override timezone.

setDeviceMetricsOverride(), setGeolocationOverride(), setTimezoneOverride()

Target

Tab management. Attach sessions, list targets, create new tabs, manage the CDP session lifecycle.

attachToTarget(), detachFromTarget(), getTargets(), createTarget()
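
On the wire, each method listed above is addressed as `Domain.method` inside a JSON message carrying an `id`, a `method`, and optional `params`. The helper below builds such a message. The `cdpMessage` name and the module-level ID counter are made up for this sketch; the method and parameter names used in the calls are real CDP ones.

```typescript
type CdpDomain =
  | "Page" | "Runtime" | "Network" | "DOM"
  | "Accessibility" | "Input" | "Emulation" | "Target";

interface CdpMessage {
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let lastMessageId = 0;

// Hypothetical helper: qualifies the method with its domain and assigns an ID.
function cdpMessage(
  domain: CdpDomain,
  method: string,
  params?: Record<string, unknown>,
): CdpMessage {
  lastMessageId += 1;
  const message: CdpMessage = { id: lastMessageId, method: `${domain}.${method}` };
  if (params !== undefined) message.params = params;
  return message;
}

console.log(JSON.stringify(cdpMessage("Page", "captureScreenshot", { format: "png" })));
console.log(JSON.stringify(cdpMessage("Accessibility", "getFullAXTree")));
```

The `id` is what lets a response be matched to the command that caused it, since CDP replies echo the same value.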

Multi-agent graph

Scout's orchestration layer runs a multi-agent graph. One orchestrator owns the session, navigation, and content tools; six specialist sub-agents handle domain-specific work.

Main

Orchestrator

session + navigation + content

Owns session management, navigation, and content tools directly. Delegates multi-step domain work to specialists.

Example: Attach a session, navigate to a URL, take a snapshot, extract data

browser-tabs, browser-attach, browser-navigate, browser-snapshot, browser-extract, browser-evaluate

Sub 1

Interactor

interaction

Handles all user input simulation. Observes snapshots before acting, retries with corrected selectors on failure.

Example: Fill a multi-step checkout form with dynamic field validation

browser-action, browser-select, browser-upload, browser-dialog, browser-highlight, browser-scroll

Sub 2

Network Manager

network

Monitors and manipulates network traffic. Intercepts API calls, records HAR files, manages certificates.

Example: Intercept GraphQL responses and mock them for testing

browser-network, browser-route, browser-unroute, browser-har, browser-security, browser-websocket

Sub 3

Storage Manager

storage

Manages all browser storage. Reads and writes cookies, localStorage, sessionStorage, and the clipboard.

Example: Seed auth tokens into localStorage before running a test

browser-cookies, browser-storage, browser-clipboard

Sub 4

Media Controller

media

Controls all media output. Screenshots, screencasts, PDF generation, video frame capture, and downloads.

Example: Record a full user flow as a video for documentation

browser-screenshot, browser-screencast, browser-pdf, browser-media, browser-download, browser-transcribe

Sub 5

Debugger

debug

Diagnoses page health. Captures console output, reads performance timings, monitors DOM memory.

Example: Detect memory leaks in a long-running SPA session

browser-console, browser-metrics, browser-memory

Sub 6

Emulator

emulation

Controls device and environment simulation. Emulates mobile devices, throttles CPU/network, emulates vision deficiencies.

Example: Test a checkout flow with custom viewport, mobile user agent, and 3G-throttled connection

browser-emulate, browser-resize, browser-vision, browser-cpu, browser-throttle
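
The ownership described above amounts to a lookup from tool name to agent. A minimal sketch follows, using the tool names listed on this page. The agent identifiers, the `routeTool` function, and the choice to throw on an unknown tool are assumptions, not documented behavior.

```typescript
type AgentName =
  | "orchestrator" | "interactor" | "network-manager" | "storage-manager"
  | "media-controller" | "debugger" | "emulator";

const toolsByAgent: Record<AgentName, string[]> = {
  "orchestrator": ["browser-tabs", "browser-attach", "browser-navigate",
    "browser-snapshot", "browser-extract", "browser-evaluate"],
  "interactor": ["browser-action", "browser-select", "browser-upload",
    "browser-dialog", "browser-highlight", "browser-scroll"],
  "network-manager": ["browser-network", "browser-route", "browser-unroute",
    "browser-har", "browser-security", "browser-websocket"],
  "storage-manager": ["browser-cookies", "browser-storage", "browser-clipboard"],
  "media-controller": ["browser-screenshot", "browser-screencast", "browser-pdf",
    "browser-media", "browser-download", "browser-transcribe"],
  "debugger": ["browser-console", "browser-metrics", "browser-memory"],
  "emulator": ["browser-emulate", "browser-resize", "browser-vision",
    "browser-cpu", "browser-throttle"],
};

// Invert the table once so each routing decision is a single map lookup.
const agentByTool = new Map<string, AgentName>();
for (const agent of Object.keys(toolsByAgent) as AgentName[]) {
  for (const tool of toolsByAgent[agent]) agentByTool.set(tool, agent);
}

function routeTool(tool: string): AgentName {
  const agent = agentByTool.get(tool);
  if (agent === undefined) throw new Error(`No agent owns tool: ${tool}`);
  return agent;
}

console.log(routeTool("browser-har"), routeTool("browser-snapshot"));
```

Keeping the table keyed by agent mirrors how the page presents it, while the inverted map is what the router would actually consult.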

Memory architecture

Scout maintains structured memory across tabs and sessions. Each layer has distinct scope and persistence semantics.

Per tab (thread-scoped)

Working Memory

Structured per-tab state updated by the agent. Tracks the current URL, page title, active session ID, user goal, and key observations.

Tab A tracks that the user is on step 3 of a checkout flow. Tab B tracks a dashboard URL independently.

Per extension (resource-scoped)

Observational Memory

Auto-extracted semantic observations shared across all tabs for the same extension. An agent in Tab B can read discoveries made by Tab A.

Tab A discovers the API base URL. Tab B can use it without re-discovery.

Per tab (thread-scoped, last 20)

Message History

The last 20 conversation turns per tab. Each tab maintains an independent conversation history that survives WebSocket reconnects.

Resuming a long research session after reconnecting.
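
The three scopes can be modeled as two per-tab structures plus one shared store. This is a sketch under assumptions: the class, field, and key names are invented for the example, and the cap of 20 turns is the only number taken from the text above.

```typescript
interface WorkingMemory {
  url: string;
  title: string;
  sessionId: string;
  goal: string;
  observations: string[];
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

const MAX_TURNS = 20;

class ScoutMemory {
  private working = new Map<number, WorkingMemory>();  // per tab
  private history = new Map<number, Turn[]>();         // per tab, capped
  readonly observational = new Map<string, string>();  // shared across tabs

  setWorking(tabId: number, state: WorkingMemory): void {
    this.working.set(tabId, state);
  }

  getWorking(tabId: number): WorkingMemory | undefined {
    return this.working.get(tabId);
  }

  addTurn(tabId: number, turn: Turn): void {
    const turns = this.history.get(tabId) ?? [];
    turns.push(turn);
    // Keep only the most recent MAX_TURNS entries.
    this.history.set(tabId, turns.slice(-MAX_TURNS));
  }

  getHistory(tabId: number): Turn[] {
    return this.history.get(tabId) ?? [];
  }
}

const mem = new ScoutMemory();
for (let i = 1; i <= 25; i++) mem.addTurn(1, { role: "user", text: `turn ${i}` });
mem.observational.set("apiBaseUrl", "https://api.example.com"); // found while working in tab 1
console.log(mem.getHistory(1).length, mem.getHistory(2).length, mem.observational.get("apiBaseUrl"));
```

Tab 2 starts with an empty history but can already read `apiBaseUrl`, which matches the Tab A / Tab B example given for observational memory.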

Execution lifecycle

What happens between an AI agent issuing a tool call and Scout returning a result.

01

MCP tool call received

The AI agent sends a tool call over MCP (stdio). The request includes the tool name and typed arguments. Scout validates the arguments against the tool's Zod schema and rejects malformed calls immediately.
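
To show what rejecting a malformed call can look like, the sketch below validates the arguments of a navigation tool. It deliberately uses a hand-written check as a stand-in for a Zod schema so the example has no dependencies. The `url` and `timeoutMs` fields and the result shape are assumptions, not Scout's actual schema.

```typescript
interface NavigateArgs {
  url: string;
  timeoutMs?: number;
}

type Validation<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Stand-in for schema validation: accepts unknown input, returns a typed value or an error.
function validateNavigateArgs(input: unknown): Validation<NavigateArgs> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "arguments must be an object" };
  }
  const args = input as Record<string, unknown>;

  const url = args["url"];
  if (typeof url !== "string" || url.length === 0) {
    return { ok: false, error: "url must be a non-empty string" };
  }

  let timeoutMs: number | undefined;
  const rawTimeout = args["timeoutMs"];
  if (rawTimeout !== undefined) {
    if (typeof rawTimeout !== "number" || rawTimeout <= 0) {
      return { ok: false, error: "timeoutMs must be a positive number" };
    }
    timeoutMs = rawTimeout;
  }

  return { ok: true, value: timeoutMs === undefined ? { url } : { url, timeoutMs } };
}

console.log(validateNavigateArgs({ url: "https://example.com" }));
console.log(validateNavigateArgs({ url: 42 }));
```

Validating before dispatch means a bad call fails at this step, before any CDP command is queued.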

02

Routing and dispatch

The tool registry looks up the handler. Based on the execution path (CDP or Playwright), the orchestrator either routes the call to the appropriate sub-agent or handles it directly.

03

CDP command emission

The handler translates the MCP arguments into one or more CDP commands (or Playwright API calls). Commands are queued per session and sent over the WebSocket connection to the extension.
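
The per-session queuing can be sketched as a small class that sends one command at a time and matches replies by `id`. This is an illustration under assumptions: the class name, the strictly serial policy, and the JSON frame format between server and extension are not taken from Scout's source, and the `send` callback stands in for the WebSocket write.

```typescript
interface CdpCommand { id: number; method: string; params: Record<string, unknown>; }
interface CdpResponse { id: number; result?: unknown; error?: { message: string }; }

interface Pending {
  command: CdpCommand;
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

class SessionCommandQueue {
  private nextId = 1;
  private waiting: Pending[] = [];
  private inFlight: Pending | null = null;
  private send: (frame: string) => void;

  constructor(send: (frame: string) => void) {
    this.send = send;
  }

  // Queues a command; the promise settles when the matching response arrives.
  enqueue(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    return new Promise<unknown>((resolve, reject) => {
      const command: CdpCommand = { id: this.nextId++, method, params };
      this.waiting.push({ command, resolve, reject });
      this.pump();
    });
  }

  // Called when the extension replies; ignores ids that are not in flight.
  handleResponse(response: CdpResponse): void {
    const current = this.inFlight;
    if (!current || current.command.id !== response.id) return;
    this.inFlight = null;
    if (response.error) current.reject(new Error(response.error.message));
    else current.resolve(response.result);
    this.pump();
  }

  // Sends the next waiting command if nothing is currently in flight.
  private pump(): void {
    if (this.inFlight) return;
    const next = this.waiting.shift();
    if (!next) return;
    this.inFlight = next;
    this.send(JSON.stringify(next.command));
  }
}

const sentFrames: string[] = [];
const queue = new SessionCommandQueue((frame) => { sentFrames.push(frame); });
queue.enqueue("Page.navigate", { url: "https://example.com" }).then((r) => console.log("navigated", r));
queue.enqueue("Page.captureScreenshot", { format: "png" });
console.log(sentFrames.length); // only the first command has been sent so far
queue.handleResponse({ id: 1, result: { frameId: "F1" } });
console.log(sentFrames.length); // the second command follows once the first settles
```

Serial execution keeps ordering simple (a screenshot never races the navigation before it), at the cost of throughput; a real queue might allow independent commands to overlap.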

04

Extension execution

The Chrome extension receives the CDP command, executes it against the correct tab's DevTools session, and streams the response back over WebSocket.

05

Result return

The response is deserialized, validated, and shaped into the MCP result format. Working memory is updated with any new observations. The structured result is returned to the AI agent.
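
As a final illustration, the shaping step can be written as a function from a CDP outcome to an MCP tool result, which carries a `content` array and an optional `isError` flag. The `CdpOutcome` type and `toToolResult` name are hypothetical, and serializing the CDP result as JSON text is an assumption about how Scout presents data to the agent.

```typescript
// Subset of the MCP tool result shape: text content plus an error flag.
interface TextContent { type: "text"; text: string; }
interface CallToolResult { content: TextContent[]; isError?: boolean; }

// Hypothetical internal representation of what came back from the extension.
type CdpOutcome =
  | { ok: true; result: unknown }
  | { ok: false; message: string };

function toToolResult(outcome: CdpOutcome): CallToolResult {
  if (!outcome.ok) {
    // Failures are reported inside the result so the agent can react to them.
    return { content: [{ type: "text", text: outcome.message }], isError: true };
  }
  // Fall back to null so JSON.stringify always yields a string.
  return { content: [{ type: "text", text: JSON.stringify(outcome.result ?? null) }] };
}

console.log(toToolResult({ ok: true, result: { frameId: "F1" } }));
console.log(toToolResult({ ok: false, message: "No tab attached" }));
```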