Overview
Web scraping lets you extract structured data from websites. Browserbase provides reliable browser infrastructure that helps you build scrapers that can:
Scale without infrastructure management
Maintain consistent performance
Avoid bot detection and CAPTCHAs with Browserbase’s stealth mode
Provide debugging and monitoring tools with session replays and live views
This guide will help you get started with web scraping on Browserbase and highlight best practices.
Scraping a website
Using books.toscrape.com, a sample website designed for scraping practice, we’ll scrape the title, price, and other details of the books listed on the page.
Follow along: the step-by-step code for this web scraping example is shown below.
Code Example
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
import dotenv from "dotenv";

dotenv.config();

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  verbose: 0,
});

async function scrapeBooks() {
  await stagehand.init();
  const page = stagehand.page;

  await page.goto("https://books.toscrape.com/");

  // Extract structured data from the page using a Zod schema
  const scrape = await page.extract({
    instruction: "Extract the books from the page",
    schema: z.object({
      books: z.array(
        z.object({
          title: z.string(),
          price: z.string(),
          image: z.string(),
          inStock: z.string(),
          link: z.string(),
        })
      ),
    }),
  });

  console.log(scrape.books);
  await stagehand.close();
  return scrape.books;
}

scrapeBooks().catch(console.error);
import os

from browserbase import Browserbase
from dotenv import load_dotenv
from playwright.sync_api import sync_playwright

load_dotenv()

def create_session():
    """Creates a Browserbase session."""
    bb = Browserbase(api_key=os.environ["BROWSERBASE_API_KEY"])
    session = bb.sessions.create(
        project_id=os.environ["BROWSERBASE_PROJECT_ID"],
        # Add configuration options here if needed
    )
    return session

def web_scrape():
    """Scrapes book data using Playwright with Browserbase."""
    session = create_session()
    print(f"View session replay at https://browserbase.com/sessions/{session.id}")

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(session.connect_url)

        # Get the default browser context and page
        context = browser.contexts[0]
        page = context.pages[0]

        # Navigate to the target page
        page.goto("https://books.toscrape.com/")

        # Extract the books from the page
        items = page.locator("article.product_pod")
        books = items.all()

        book_data_list = []
        for book in books:
            book_data = {
                "title": book.locator("h3 a").get_attribute("title"),
                "price": book.locator("p.price_color").text_content(),
                "image": book.locator("div.image_container img").get_attribute("src"),
                "inStock": book.locator("p.instock.availability").text_content().strip(),
                "link": book.locator("h3 a").get_attribute("href"),
            }
            book_data_list.append(book_data)

        print("Shutting down...")
        page.close()
        browser.close()

    return book_data_list

if __name__ == "__main__":
    books = web_scrape()
    print(books)
Example output
[
{
title: 'A Light in the Attic',
price: '£51.77',
image: 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg',
inStock: 'In stock',
link: 'catalogue/a-light-in-the-attic_1000/index.html'
},
...
]
Best Practices for Web Scraping
Follow these best practices to build reliable, efficient, and ethical web scrapers with Browserbase.
Ethical Scraping
Respect robots.txt: Check the website’s robots.txt file for crawling guidelines
Rate limiting: Implement reasonable delays between requests (2-5 seconds); a sketch of both practices follows this list
Terms of Service: Review the website’s terms of service before scraping
Data usage: Only collect and use data in accordance with the website’s policies
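For instance, here is a minimal sketch of the robots.txt check and randomized rate limiting, using Python’s standard-library urllib.robotparser; the catalogue URLs and the 2-5 second delay range are illustrative, not prescriptive:

import random
import time
from urllib.robotparser import RobotFileParser

BASE_URL = "https://books.toscrape.com"

# Fetch and parse the site's robots.txt before crawling
robots = RobotFileParser(f"{BASE_URL}/robots.txt")
robots.read()

urls = [f"{BASE_URL}/catalogue/page-{n}.html" for n in range(1, 4)]

for url in urls:
    if not robots.can_fetch("*", url):
        print(f"Skipping {url}: disallowed by robots.txt")
        continue
    print(f"Scraping {url}")
    # ... fetch and extract data here ...
    # Randomized 2-5 second delay between requests
    time.sleep(random.uniform(2, 5))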
Performance and Efficiency
Batch processing: Process multiple pages in batches with concurrent sessions
Selective scraping: Only extract the data you need
Resource management: Close browser sessions promptly after use
Connection reuse: Reuse browsers for sequential scraping tasks, as shown in the sketch after this list
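As an illustration of connection reuse and selective scraping, this sketch (which assumes the create_session helper from the Python example above) loads several catalogue pages over a single Browserbase session instead of opening one session per page:

from playwright.sync_api import sync_playwright

def scrape_titles(urls):
    """Reuses one Browserbase session for a batch of sequential page loads."""
    session = create_session()  # helper defined in the example above
    titles = []
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(session.connect_url)
        page = browser.contexts[0].pages[0]
        for url in urls:
            page.goto(url)
            # Selective scraping: extract only the titles we need
            for link in page.locator("h3 a").all():
                titles.append(link.get_attribute("title"))
        # Resource management: close the browser promptly when done
        browser.close()
    return titles

pages = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 4)]
print(scrape_titles(pages))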
Stealth and Anti-Bot Avoidance
Enable Browserbase Advanced Stealth mode: Helps avoid bot detection
Randomize behavior: Add variable delays between actions
Use proxies: Rotate IPs to distribute requests
Mimic human interaction: Add realistic mouse movements and delays
Handle CAPTCHAs: Enable Browserbase’s automatic CAPTCHA solving; a session-configuration sketch covering stealth, proxies, and CAPTCHA solving follows this list
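Putting these together, here is a hedged sketch of a stealth-oriented session configuration with the Python SDK. proxies=True enables Browserbase’s managed proxies; the browser_settings keys shown for Advanced Stealth and CAPTCHA solving are assumptions to verify against the current API reference:

import os
from browserbase import Browserbase

bb = Browserbase(api_key=os.environ["BROWSERBASE_API_KEY"])

session = bb.sessions.create(
    project_id=os.environ["BROWSERBASE_PROJECT_ID"],
    proxies=True,  # route traffic through Browserbase's managed proxies
    browser_settings={
        "advanced_stealth": True,  # assumed key for Advanced Stealth mode
        "solve_captchas": True,    # assumed key for automatic CAPTCHA solving
    },
)
print(f"Stealth session ready: {session.id}")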
Next Steps
Now that you understand the basics of web scraping with Browserbase, explore features such as Advanced Stealth, proxies, session replays, and live views in the rest of the documentation.