Browser

A real automated browser your agent can drive — navigate, click, type, screenshot, execute JS.

ToShop ships with a managed automated browser. Your agent can drive it end-to-end: navigate to a URL, read the page's accessibility tree, click elements by reference, fill forms, take screenshots, and even run JavaScript.

It's a real browser, not a web fetch

The browser_* tools spin up a full browser instance: JavaScript runs, login state persists, and network requests can be inspected and recorded. It's not a stripped-down HTTP client.

Three web-shaped capabilities

`browser_*`: managed automated browser. Full DOM, JS, cookies, sessions.

Use when: your agent needs to interact — click, type, scrape rendered content, log in.

Cost: higher latency, more context (each snapshot is large).

`web_fetch`: HTTP-only fetch, no rendering. Returns markdown-extracted page text.

Use when: your agent only needs raw content and the page renders server-side.

Cost: lowest — fast, cheap context.

`open_url`: hands a URL to the system default browser.

Use when: your agent's recommendation includes a link you want to read next.

Cost: none for your agent — it stops being involved once the link opens.

The `browser_*` toolkit

Name	Description
`browser_navigate`	Load a URL in the automated browser.
`browser_snapshot`	Return the page's accessibility tree (interactive elements with refs like @e5, @e12).
`browser_screenshot`	Capture a full-page or viewport image.
`browser_click`	Click an element by accessibility ref.
`browser_type`	Type into a form field.
`browser_press`	Send a keyboard key (Enter, Tab, Esc, etc.).
`browser_scroll`	Scroll the page.
`browser_evaluate`	Run JavaScript in the page context.
`browser_wait_for`	Block until text appears, selector matches, URL changes, or a load state.
`browser_network`	Inspect or record network requests (HAR-style).
`browser_download`	Track downloads triggered by page actions.
`browser_tabs`	List, focus, or close tabs.
`browser_advanced`	PDF export, file upload, dialog handling, hover, drag, select option, fill.
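`browser_network` records requests HAR-style. Assuming its output follows the HAR 1.2 JSON layout (`log.entries`, each entry with a `request` and a `response`), which is an assumption and not documented ToShop behavior, a few lines of Python can turn a recording into a quick health check. `summarize_har` is a hypothetical helper, not part of ToShop.

```python
from collections import Counter
from typing import Any


def summarize_har(har: dict[str, Any]) -> dict[str, Any]:
    """Summarize a HAR-style capture: request count, status codes, failed requests.

    Assumes the HAR 1.2 layout: har["log"]["entries"], each entry carrying
    "request" (with "method" and "url") and "response" (with "status").
    """
    entries = har.get("log", {}).get("entries", [])
    statuses: Counter[int] = Counter()
    failed: list[str] = []
    for entry in entries:
        status = entry.get("response", {}).get("status", 0)
        statuses[status] += 1
        # Count 4xx/5xx as failures, plus status 0 (commonly used when no response arrived).
        if status == 0 or status >= 400:
            failed.append(f'{entry["request"]["method"]} {entry["request"]["url"]}')
    return {"requests": len(entries), "statuses": dict(statuses), "failed": failed}
```

Scanning the `failed` list is often quicker than reading a full screenshot when a page "loads" but a cart or login call silently errors.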

Typical flow

Navigate

browser_navigate("https://example.com")

Snapshot the page

browser_snapshot()

Returns the accessibility tree with refs your agent can target.

Decide which element to interact with

Your agent reads the tree and picks an element.

Act

browser_click(ref="@e5")
browser_type(ref="@e12", text="...")

Wait for the page to settle

browser_wait_for(state="networkidle")

Verify visually if needed

browser_screenshot()
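Put together, the flow looks roughly like the sketch below. It is written in Python against a hypothetical `BrowserTools` interface that mirrors the `browser_*` calls; the method names, the flat list-of-dicts snapshot shape, and the `search_site` task are illustrative assumptions, and `choose_ref` stands in for the agent's own judgment in the "decide" step.

```python
from typing import Callable, Protocol


class BrowserTools(Protocol):
    """Hypothetical Python stand-in for the browser_* tools (not a real ToShop API)."""

    def navigate(self, url: str) -> None: ...
    def snapshot(self) -> list[dict[str, str]]: ...
    def click(self, ref: str) -> None: ...
    def type(self, ref: str, text: str) -> None: ...
    def wait_for(self, state: str) -> None: ...
    def screenshot(self) -> bytes: ...


def search_site(
    tools: BrowserTools,
    url: str,
    query: str,
    choose_ref: Callable[[list[dict[str, str]], str], str],
) -> bytes:
    """Navigate, snapshot, pick refs, act, wait, then screenshot to verify."""
    tools.navigate(url)                   # Navigate
    tree = tools.snapshot()               # Snapshot: nodes carry refs like "@e12"
    box = choose_ref(tree, "searchbox")   # Decide (the agent's judgment)
    button = choose_ref(tree, "button")
    tools.type(box, query)                # Act
    tools.click(button)
    tools.wait_for(state="networkidle")   # Wait for the page to settle
    return tools.screenshot()             # Verify visually
```

Both refs are picked from the same snapshot before acting; after the click triggers navigation, a fresh `snapshot()` would be needed before targeting anything on the new page.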

Snapshot vs Screenshot

Snapshot — to interact

The accessibility tree gives stable refs your agent can click or type into. Refs survive layout changes, and a snapshot is cheaper to keep re-reading than a screenshot.

Screenshot — to verify

For showing the user, verifying visually, or handling things like CAPTCHAs, where the tree alone isn't enough.

Your agent uses both: snapshot to act, screenshot to confirm.
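Acting from a snapshot comes down to finding the right node and using its ref. A minimal sketch, assuming the snapshot is a nested tree of nodes with `role`, `name`, optional `ref`, and `children` keys (the real format may differ); `find_ref` is a hypothetical helper. Because it matches on role and accessible name rather than on screen position, a layout change does not change which node it picks.

```python
from typing import Any, Optional


def find_ref(node: dict[str, Any], role: str, name_contains: str) -> Optional[str]:
    """Depth-first search of a (hypothetical) snapshot tree for an element's ref.

    Returns the ref of the first node whose role matches and whose accessible
    name contains `name_contains` (case-insensitive), or None if nothing matches.
    """
    if (
        node.get("ref")
        and node.get("role") == role
        and name_contains.lower() in node.get("name", "").lower()
    ):
        return node["ref"]
    for child in node.get("children", []):
        found = find_ref(child, role, name_contains)
        if found is not None:
            return found
    return None
```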

Browser sessions

A browser session persists across multiple browser_* calls within a task — so your agent can log in once and operate inside the authenticated state. Sessions end when the task ends, unless you've set up a persistent profile.

Persistent profiles

Configure under Settings → Browser → Profiles — useful for "always-signed-in to Shopify admin" patterns. Profiles survive across tasks.
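The session rules can be summed up in a toy model: one session per task, discarded when the task ends, while a persistent profile carries state into the next task. Everything here (`SessionModel`, cookies as the stand-in for "state", writing the session back into its profile at the end of the task) is an illustrative assumption, not how ToShop actually stores profiles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    profile: Optional[str] = None
    cookies: dict[str, str] = field(default_factory=dict)


class SessionModel:
    """Toy model of session lifetimes; all names are hypothetical."""

    def __init__(self) -> None:
        self.profiles: dict[str, dict[str, str]] = {}

    def start_task(self, profile: Optional[str] = None) -> Session:
        # With a profile, start from its saved state; otherwise start empty.
        saved = self.profiles.get(profile, {}) if profile else {}
        return Session(profile=profile, cookies=dict(saved))

    def end_task(self, session: Session) -> None:
        # Only a profiled session's state outlives the task; the rest is dropped.
        if session.profile:
            self.profiles[session.profile] = dict(session.cookies)
```

In this model, logging in once inside a profiled task is enough for later tasks that use the same profile, which is the "always-signed-in" pattern described above.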

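Permissions and safety

First-time navigation prompts
Allowlist for remote-triggered tasks
No saved credentials shared without consent
Audit log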

When not to use the browser

Don't escalate prematurely

If your agent only needs the page's content (article text, JSON API response, static HTML), web_fetch is faster, lighter, and uses less context. Your agent picks web_fetch by default and escalates to browser_* only when JavaScript, login, or interaction is needed.

If you are going to read the page next, prefer open_url to open it in your normal browser (with your cookies, extensions, etc.).
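The escalation rule in this section can be written as a small decision function. This is a sketch of the rule as stated, not ToShop's actual selection logic; `pick_web_tool`, `WebTool`, and the flag names are hypothetical.

```python
from enum import Enum


class WebTool(Enum):
    WEB_FETCH = "web_fetch"
    BROWSER = "browser_*"
    OPEN_URL = "open_url"


def pick_web_tool(
    *,
    user_reads_next: bool = False,
    needs_js: bool = False,
    needs_login: bool = False,
    needs_interaction: bool = False,
) -> WebTool:
    """Default to web_fetch; escalate to browser_* only when it is required."""
    if user_reads_next:
        # The user reads the page in their own browser, with their cookies and extensions.
        return WebTool.OPEN_URL
    if needs_js or needs_login or needs_interaction:
        return WebTool.BROWSER
    return WebTool.WEB_FETCH
```

For example, `pick_web_tool(needs_login=True)` returns `WebTool.BROWSER`, while a plain article with no flags set stays on `WebTool.WEB_FETCH`.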

System Tools — clipboard, screenshot of the desktop (not the browser).
Search — when you want a result, not a specific URL.
Skills — agent-browser skill composes these into larger workflows.
