Browser
A real automated browser your agent can drive — navigate, click, type, screenshot, execute JS.
ToShop ships with a managed automated browser. Your agent can drive it end-to-end: navigate to a URL, read the page's accessibility tree, click elements by reference, fill forms, take screenshots, and even run JavaScript.
It's a real browser, not a web fetch
The browser_* tools spin up a full browser instance — JavaScript runs, login state persists, network requests are intercepted. It's not a stripped-down HTTP client.
Three web-shaped capabilities
browser_*
Managed automated browser. Full DOM, JS, cookies, sessions.
Use when: your agent needs to interact — click, type, scrape rendered content, log in.
Cost: higher latency, more context (each snapshot is large).
web_fetch
HTTP-only fetch, no rendering. Returns markdown-extracted page text.
Use when: your agent only needs raw content and the page renders server-side.
Cost: lowest — fast, cheap context.
open_url
Hand a URL to the system default browser.
Use when: your agent's recommendation includes a link you want to read next.
Cost: none for your agent — it stops being involved once the link opens.
The browser_* toolkit
| Name | Description |
|---|---|
| browser_navigate | Load a URL in the automated browser. |
| browser_snapshot | Return the page's accessibility tree (interactive elements with refs like @e5, @e12). |
| browser_screenshot | Capture a full-page or viewport image. |
| browser_click | Click an element by accessibility ref. |
| browser_type | Type into a form field. |
| browser_press | Send a keyboard key (Enter, Tab, Esc, etc.). |
| browser_scroll | Scroll the page. |
| browser_evaluate | Run JavaScript in the page context. |
| browser_wait_for | Block until text appears, a selector matches, the URL changes, or a load state is reached. |
| browser_network | Inspect or record network requests (HAR-style). |
| browser_download | Track downloads triggered by page actions. |
| browser_tabs | List, focus, or close tabs. |
| browser_advanced | PDF export, file upload, dialog handling, hover, drag, select option, fill. |
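As an illustration of how the refs returned by browser_snapshot might be picked out, here is a minimal Python sketch. It assumes the snapshot arrives as plain text and that refs follow the "@e" plus number shape shown in the table; the sample text and the extract_refs helper are hypothetical, and the real snapshot format may differ.

```python
import re

# Assumption: refs look like the examples in the table (@e5, @e12).
REF_PATTERN = re.compile(r"@e\d+")


def extract_refs(snapshot_text: str) -> list:
    """Return the unique refs in the order they first appear."""
    seen = {}
    for ref in REF_PATTERN.findall(snapshot_text):
        seen.setdefault(ref, None)  # dict keys keep insertion order
    return list(seen)


# Hypothetical snapshot text, made up for this example.
sample = """
button "Sign in" @e5
textbox "Email" @e12
link "Sign in again" @e5
"""

print(extract_refs(sample))
```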
Typical flow
Navigate
browser_navigate("https://example.com")
Decide which element to interact with
Your agent reads the accessibility tree returned by browser_snapshot and picks an element.
Act
browser_click(ref="@e5")
browser_type(ref="@e12", text="...")
Wait for the page to settle
browser_wait_for(state="networkidle")
Verify visually if needed
browser_screenshot()
Snapshot vs Screenshot
Snapshot — to interact
The accessibility tree gives stable refs your agent can click or type into. Survives layout changes. Cheap to keep using.
Screenshot — to verify
For showing the user, verifying visually, or solving things like CAPTCHA where a tree isn't enough.
Your agent uses both: snapshot to act, screenshot to confirm.
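The typical flow, with a snapshot used to act and a screenshot used to confirm, might look like the Python sketch below. FakeBrowser is a stand-in that only records calls: its method signatures mirror the call examples in this page but are assumptions, as are the snapshot text format and the find_ref helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Stand-in that records calls; not the real ToShop tool interface."""
    calls: list = field(default_factory=list)

    def browser_navigate(self, url: str) -> None:
        self.calls.append(f"navigate {url}")

    def browser_snapshot(self) -> str:
        self.calls.append("snapshot")
        # Hypothetical tree text; the real format may differ.
        return 'textbox "Search" @e12\nbutton "Go" @e5'

    def browser_type(self, ref: str, text: str) -> None:
        self.calls.append(f"type {ref}")

    def browser_click(self, ref: str) -> None:
        self.calls.append(f"click {ref}")

    def browser_wait_for(self, state: str) -> None:
        self.calls.append(f"wait_for {state}")

    def browser_screenshot(self) -> bytes:
        self.calls.append("screenshot")
        return b""


def find_ref(tree: str, label: str) -> str:
    """Return the ref at the end of the first tree line mentioning label."""
    for line in tree.splitlines():
        if label in line:
            return line.split()[-1]
    raise LookupError(f"no element labelled {label!r}")


def search(browser: FakeBrowser, url: str, query: str) -> bytes:
    browser.browser_navigate(url)                       # navigate
    tree = browser.browser_snapshot()                   # read the tree
    browser.browser_type(ref=find_ref(tree, "Search"), text=query)  # act
    browser.browser_click(ref=find_ref(tree, "Go"))
    browser.browser_wait_for(state="networkidle")       # let the page settle
    return browser.browser_screenshot()                 # verify visually


browser = FakeBrowser()
search(browser, "https://example.com", "red shoes")
print(browser.calls)
```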
Browser sessions
A browser session persists across multiple browser_* calls within a task — so your agent can log in once and operate inside the authenticated state. Sessions end when the task ends, unless you've set up a persistent profile.
Persistent profiles
Configure under Settings → Browser → Profiles — useful for "always-signed-in to Shopify admin" patterns. Profiles survive across tasks.
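The difference between a task-scoped session and a persistent profile can be modelled with a toy Python sketch. Everything here is hypothetical (the Session class, the profiles store, run_task, and the profile name "shop-admin"); it only illustrates the lifetime rules described above, not how ToShop implements them.

```python
from typing import Callable, Dict, List, Optional


class Session:
    """Toy session: one cookie jar shared by every step in a task."""

    def __init__(self, cookies: Optional[Dict[str, str]] = None) -> None:
        self.cookies = dict(cookies or {})


# Hypothetical store standing in for Settings → Browser → Profiles.
profiles: Dict[str, Dict[str, str]] = {}


def run_task(steps: List[Callable[[Session], None]],
             profile: Optional[str] = None) -> None:
    """Run all steps in one session; keep cookies only if a profile is named."""
    session = Session(profiles.get(profile) if profile else None)
    for step in steps:
        step(session)
    if profile:
        profiles[profile] = session.cookies  # survives across tasks
    # Without a profile, the session state is dropped when the task ends.


def log_in(session: Session) -> None:
    session.cookies["auth"] = "demo-token"


seen: List[bool] = []


def record_auth(session: Session) -> None:
    seen.append("auth" in session.cookies)


run_task([log_in, record_auth])                # same task: logged in
run_task([record_auth])                        # new task, no profile: logged out
run_task([log_in], profile="shop-admin")
run_task([record_auth], profile="shop-admin")  # profile: still logged in
print(seen)  # [True, False, True]
```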
Permissions and safety
When not to use the browser
Don't escalate prematurely
If your agent only needs the page's content (article text, JSON API response, static HTML), web_fetch is faster, lighter, and uses less context. Your agent picks web_fetch by default and escalates to browser_* only when JavaScript, login, or interaction is needed.
If you are going to read the page next, prefer open_url to open it in your normal browser (with your cookies, extensions, etc.).
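The escalation rule above can be written down as a small decision function. This is a minimal Python sketch: pick_tool and its flag names are hypothetical and only restate the guidance in this section, not the agent's actual selection logic.

```python
def pick_tool(needs_js: bool = False,
              needs_login: bool = False,
              needs_interaction: bool = False,
              user_reads_next: bool = False) -> str:
    """Return the lightest capability that fits the request."""
    if user_reads_next:
        # The user will read the page themselves, in their normal browser.
        return "open_url"
    if needs_js or needs_login or needs_interaction:
        # Escalate only when rendering, a session, or interaction is required.
        return "browser_*"
    # Default: plain content fetch, cheapest in latency and context.
    return "web_fetch"


print(pick_tool())                  # web_fetch
print(pick_tool(needs_login=True))  # browser_*
```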
Related
- System Tools — clipboard, screenshot of the desktop (not the browser).
- Search — when you want a result, not a specific URL.
- Skills — the agent-browser skill composes these into larger workflows.