The essentials of web automation by prompting with Stagehand
Stagehand is a browser automation framework for controlling browsers with natural language and code. It combines the flexibility of AI with the precision and reliability of traditional code, and it is built directly on the Chrome DevTools Protocol (CDP) for speed.
Stagehand is built on 4 primitives. You use these like building blocks, adding step by step to your automation script:
agent()
Autonomous multi-step execution. “Find the cheapest flight and book it”.
observe()
See the current page state.
act()
Perform a single action. “Click login”.
extract()
Pull structured data.
Act vs Agent
act() → Best for an atomic action, like “type ‘hello’ in the search box”.
agent() → Best for a multi-step workflow or unknown page structure, like “log in and download my latest invoice”.
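The rule of thumb above can be written down as a small decision helper. This is only a sketch of the reasoning, not part of Stagehand: the `Task` shape, its `multiStep` and `pageStructureKnown` flags, and the `choosePrimitive` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not a Stagehand API: encodes the act-vs-agent rule of thumb.
type Primitive = "act" | "agent";

interface Task {
  instruction: string;
  multiStep: boolean;          // does the task need more than one action?
  pageStructureKnown: boolean; // do we know what the page looks like in advance?
}

// Multi-step work or an unknown page goes to agent(); everything else is a single act().
function choosePrimitive(task: Task): Primitive {
  return task.multiStep || !task.pageStructureKnown ? "agent" : "act";
}

const typeHello: Task = {
  instruction: "type 'hello' in the search box",
  multiStep: false,
  pageStructureKnown: true,
};

const downloadInvoice: Task = {
  instruction: "log in and download my latest invoice",
  multiStep: true,
  pageStructureKnown: true,
};

console.log(choosePrimitive(typeHello));       // "act"
console.log(choosePrimitive(downloadInvoice)); // "agent"
```

In a real script the result would decide whether to pass the instruction to act() or hand it to an agent; the flags themselves are a judgment call made by whoever writes the automation.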
Example — all four primitives in one script (TypeScript)
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  model: {
    modelName: "google/gemini-3-flash-preview",
    apiKey: process.env.MODEL_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];
await page.goto("https://news.ycombinator.com");

// observe(): preview what's actionable on the page
const actions = await stagehand.observe("find the link to the top story");
console.log(actions[0]);

// act(): perform a single action
await stagehand.act("click on the comments link for the top story");

// extract(): pull structured data
const data = await stagehand.extract("extract the title and points of the story");
console.log(data);

// agent(): autonomous multi-step execution
const agent = stagehand.agent({
  mode: "hybrid",
  model: "google/gemini-3-flash-preview",
});
await agent.execute("go back to the homepage and find the post with the most comments");

await stagehand.close();
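Because extracted data comes from a model, it can be worth checking its shape before using it. The sketch below is a plain TypeScript type guard for the title and points requested in the script above. The `StoryData` interface and `isStoryData` function are hypothetical names, and the sample values are stand-ins; the exact shape returned by extract() depends on the Stagehand version and on whether a schema is passed, so treat the field names as assumptions.

```typescript
// Hypothetical shape for the story data requested in the script above.
interface StoryData {
  title: string;
  points: number;
}

// Type guard: narrows an unknown value to StoryData only if both fields look right.
function isStoryData(value: unknown): value is StoryData {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const points = record["points"];
  return (
    typeof record["title"] === "string" &&
    typeof points === "number" &&
    Number.isFinite(points)
  );
}

// Stand-in values; in the script these would come from `await stagehand.extract(...)`.
const wellFormed: unknown = { title: "Example story", points: 128 };
const malformed: unknown = { title: "Example story", points: "128 points" };

console.log(isStoryData(wellFormed)); // true
console.log(isStoryData(malformed));  // false
```

A guard like this keeps a malformed extraction from silently flowing into later steps; a schema library would serve the same purpose with less hand-written code.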