1

Get your API Key

Go to the Dashboard’s Settings tab, then copy your API Key directly from the input field.
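Rather than pasting the key into source files, you may prefer to keep it in an environment variable. The sketch below assumes the variable is named `BROWSERBASE_API_KEY`; both that name and the `get_browserbase_api_key` helper are illustrative assumptions, not part of the Browserbase SDK.

```python
import os


def get_browserbase_api_key(var_name: str = "BROWSERBASE_API_KEY") -> str:
    """Hypothetical helper: read the Browserbase API key from the environment.

    The variable name BROWSERBASE_API_KEY is an assumption; use whatever
    name you export in your shell or deployment.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a hint instead of sending an empty key to the API.
        raise RuntimeError(
            f"{var_name} is not set; copy the key from the Dashboard's Settings tab"
        )
    return key
```

After running `export BROWSERBASE_API_KEY=...` in your shell, you could write `BROWSERBASE_API_TOKEN = get_browserbase_api_key()` instead of a hard-coded string.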

2

Install the Browserbase SDK

pip install browserbase

3

Load documents or images

Load documents

from langchain_community.document_loaders import BrowserbaseLoader

# Paste the API Key copied from the Dashboard's Settings tab
BROWSERBASE_API_TOKEN = "<Your Browserbase API Key goes here>"

loader = BrowserbaseLoader(
    api_token=BROWSERBASE_API_TOKEN,
    urls=[
        # load multiple pages
        "https://www.espn.com",
        "https://lilianweng.github.io/posts/2023-06-23-agent/"
    ],
    text_content=True,  # return page text instead of raw HTML
)

documents = loader.load()

The default value, text_content=False, returns each page's HTML as a LangChain Document.

Setting text_content=True returns a LangChain Document that contains only the page's text.
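To illustrate the difference between the two modes, the sketch below strips the markup from an HTML string using only the standard library's `html.parser`. It approximates the idea behind text_content=True; it is not the loader's actual extraction logic, and `html_to_text` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the text of an HTML page, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<html><head><style>h1{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(html_to_text(page))  # prints Title, Hello, world on separate lines
```

The HTML form keeps structure (useful for custom parsing), while the text form is usually what you want to embed or pass to an LLM.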

Load images

from browserbase import Browserbase
from browserbase.helpers.gpt4 import GPT4VImage, GPT4VImageDetail
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

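# Vision-capable chat model; ChatOpenAI reads OPENAI_API_KEY from the environment
chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=256)
browser = Browserbase()

# Capture a screenshot of the page's visible viewport
screenshot = browser.screenshot("https://browserbase.com")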

result = chat.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What color is the logo?"},
                GPT4VImage(screenshot, GPT4VImageDetail.auto),
            ]
        )
    ]
)

print(result.content)
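If you would rather not depend on the `GPT4VImage` helper (for example, with a different SDK version), an equivalent message part can be built by hand. This minimal sketch assumes the screenshot is raw PNG bytes and uses the OpenAI-style `image_url` content part with a base64 `data:` URL; `image_content_part` is a hypothetical helper, not part of Browserbase or LangChain.

```python
import base64


def image_content_part(png_bytes: bytes, detail: str = "auto") -> dict:
    """Hypothetical helper: wrap PNG bytes as an OpenAI-style image_url part.

    Assumes the bytes are PNG-encoded; change the MIME type for other formats.
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}", "detail": detail},
    }
```

In the example above, `image_content_part(screenshot)` would take the place of `GPT4VImage(screenshot, GPT4VImageDetail.auto)` inside the `content` list.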

By default, the screenshot() method takes a screenshot of the visible viewport.

To take a full-page screenshot, pass the full_page=True option.
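One way to check which kind of capture you received is to read the image size from the PNG header: a full-page screenshot of a long page is usually much taller than the viewport. The sketch below assumes the screenshot is returned as PNG bytes; `png_size` is a hypothetical helper that parses the PNG IHDR chunk with `struct`.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Hypothetical helper: return (width, height) from a PNG's IHDR chunk."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG image")
    # After the 8-byte signature come a 4-byte chunk length and the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # Width and height are big-endian unsigned 32-bit ints right after "IHDR".
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

For example, `png_size(browser.screenshot("https://browserbase.com", full_page=True))` should report a larger height than the default viewport capture of the same long page.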

The reference for the browserbase package is available on GitHub.