Introduction
BrowseGPT is a tool that allows you to search the web using a chat interface.
It is built on top of the Vercel AI SDK and Browserbase.
What this tutorial covers
- Access and scrape website posts and content using Browserbase
- Use the Vercel AI SDK to create a chat interface
- Stream the results from the LLM
Usage
To build BrowseGPT, you need the Vercel AI SDK and a few supporting libraries installed (Browserbase itself is accessed over its REST API, so no separate SDK is required).
We recommend installing the following packages:
npm install ai zod playwright @ai-sdk/openai @ai-sdk/anthropic @mozilla/readability jsdom
Getting Started
For this tutorial, you’ll need:
- Browserbase credentials: an API key and a project ID (used below as the BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables)
- An LLM API key: this tutorial uses OpenAI for tool calling and Anthropic for generating responses
Browserbase sessions often run longer than 15 seconds. By signing up for the Pro Plan on Vercel, you can increase the Vercel function duration limit.
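If you are running this locally, a minimal .env.local sketch might look like the following. The Browserbase variable names come from the route handler code below; OPENAI_API_KEY and ANTHROPIC_API_KEY are the default names read by the @ai-sdk/openai and @ai-sdk/anthropic providers (placeholder values shown).
BROWSERBASE_API_KEY=your-browserbase-api-key
BROWSERBASE_PROJECT_ID=your-browserbase-project-id
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key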
Imports and Dependencies
Next.js uses Route Handlers to handle API requests. These support methods such as GET, POST, PUT, DELETE, and so on.
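For instance, a minimal Route Handler (a hypothetical hello route, not part of BrowseGPT) looks like this:
// app/api/hello/route.ts (hypothetical example path)
export async function GET() {
  return Response.json({ message: "Hello from a Route Handler" });
}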
To create a new route handler, add a route.ts file under the app/api directory. In this example, the chat route lives at app/api/chat/route.ts.
From here, we’ll import the necessary dependencies.
import { openai } from "@ai-sdk/openai";
import { streamText, convertToCoreMessages, tool, generateText } from "ai";
import { z } from "zod";
import { chromium } from "playwright";
import { anthropic } from "@ai-sdk/anthropic";
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
This section imports necessary libraries and modules for the application.
It includes the Vercel AI SDK, Zod for schema validation, Playwright for web automation, and libraries for content extraction and processing.
Helper Functions
These are utility functions used throughout the application.
getDebugUrl fetches debug information for a Browserbase session, while createSession initializes a new Browserbase session for web interactions.
async function getDebugUrl(id: string) {
const response = await fetch(
`https://www.browserbase.com/v1/sessions/${id}/debug`,
{
method: "GET",
headers: {
"x-bb-api-key": process.env.BROWSERBASE_API_KEY,
"Content-Type": "application/json",
},
},
);
const data = await response.json();
return data;
}
async function createSession() {
const response = await fetch(`https://www.browserbase.com/v1/sessions`, {
method: "POST",
headers: {
"x-bb-api-key": process.env.BROWSERBASE_API_KEY,
"Content-Type": "application/json",
},
body: JSON.stringify({
projectId: process.env.BROWSERBASE_PROJECT_ID,
keepAlive: true,
}),
});
const data = await response.json();
return { id: data.id, debugUrl: data.debugUrl };
}
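To make the flow concrete, here is a rough sketch (not part of the final route handler) of how these helpers combine with Playwright's connectOverCDP, which the tools below rely on. It assumes the debug response includes a debuggerFullscreenUrl field, as the tools below do.
// Sketch: create a Browserbase session and attach Playwright to it over CDP.
async function connectToNewSession() {
  const session = await createSession();
  const debug = await getDebugUrl(session.id);
  const browser = await chromium.connectOverCDP(debug.debuggerFullscreenUrl);
  const page = browser.contexts()[0].pages()[0];
  return { browser, page, sessionId: session.id };
}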
Main API Route Handler
This section sets up the main API route handler.
It sets a maximum duration for the API call and defines the POST method that handles incoming requests.
You can see we use the Vercel AI SDK’s streamText function to process messages and stream responses.
We set the maximum duration to 300 seconds (5 minutes), since our Browserbase sessions often run longer than 15 seconds (Vercel’s default timeout).
export const maxDuration = 300;
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
experimental_toolCallStreaming: true,
model: openai("gpt-4-turbo"),
messages: convertToCoreMessages(messages),
tools: {
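      // Tool definitions (createSession, googleSearch, askForConfirmation, getPageContent) go here; they are covered one by one below.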
},
});
return result.toDataStreamResponse();
}
Next, we’ll create the tools needed for this Route Handler. These tools will be used depending on the user’s request.
For example, if they want to search the web, we’ll use the googleSearch tool. If they want to get the content of a page, we’ll use the getPageContent tool.
Keep in mind that you can choose any LLM model that is compatible with the Vercel AI SDK.
We found that gpt-4-turbo worked best for tool calling, and claude-3-5-sonnet-20241022 worked best for generating responses.
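For example, swapping the tool-calling model is a one-line change in the streamText call shown above (a hypothetical variation, not the configuration used in the demo):
// Hypothetical variation: use Claude for tool calling instead of gpt-4-turbo.
const result = await streamText({
  experimental_toolCallStreaming: true,
  model: anthropic("claude-3-5-sonnet-20241022"),
  messages: convertToCoreMessages(messages),
  tools: { /* same tools as above */ },
});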
This tool creates a new Browserbase session. It’s used when a fresh browsing context is needed for web interactions.
The tool returns the session ID and debug URL, which are used in subsequent operations.
createSession: tool({
description: 'Create a new Browserbase session',
parameters: z.object({}),
execute: async () => {
const session = await createSession();
const debugUrl = await getDebugUrl(session.id);
return { sessionId: session.id, debugUrl: debugUrl.debuggerFullscreenUrl, toolName: 'Creating a new session'};
},
}),
As you can see, we use the createSession() and getDebugUrl() helpers we wrote earlier to create a new Browserbase session and fetch its debug URL.
We embed the debug URL in the tool result so the frontend can later use it to view the live Browserbase session.
This tool performs a Google search inside the Browserbase session. It takes a search query and a session ID as input, scrapes the results page, and returns an LLM evaluation of the results.
googleSearch: tool({
description: 'Search Google for a query',
parameters: z.object({
  query: z.string().describe('The search query to run on Google'),
  sessionId: z.string().describe('The Browserbase session ID to use'),
}),
execute: async ({ query, sessionId }) => {
  // Connect to the existing Browserbase session over CDP
  const debugUrl = await getDebugUrl(sessionId);
  const browser = await chromium.connectOverCDP(debugUrl.debuggerFullscreenUrl);
  const defaultContext = browser.contexts()[0];
  const page = defaultContext.pages()[0];
await page.goto(`https://www.google.com/search?q=${encodeURIComponent(query)}`);
await page.waitForTimeout(500);
await page.keyboard.press('Enter');
await page.waitForLoadState('load', { timeout: 10000 });
await page.waitForSelector('.g');
const results = await page.evaluate(() => {
const items = document.querySelectorAll('.g');
return Array.from(items).map(item => {
const title = item.querySelector('h3')?.textContent || '';
const description = item.querySelector('.VwiC3b')?.textContent || '';
return { title, description };
});
});
const text = results.map(item => `${item.title}\n${item.description}`).join('\n\n');
const response = await generateText({
model: anthropic('claude-3-5-sonnet-20241022'),
prompt: `Evaluate the following web page content: ${text}`,
});
return {
toolName: 'Searching Google',
content: response.text,
dataCollected: true,
};
},
}),
This tool asks the user for confirmation before performing a specific action.
It takes a confirmation message as input and returns the user’s response. Note that it has no execute function: the tool call is streamed to the client, and the frontend supplies the result (see the sketch after the tool definition).
askForConfirmation: tool({
description: 'Ask the user for confirmation.',
parameters: z.object({
message: z.string().describe('The message to ask for confirmation.'),
}),
}),
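Because askForConfirmation has no execute function, the model’s tool call reaches the client, and the frontend answers it with addToolResult from the ai/react useChat hook. Here is a minimal sketch of one way to do that (a simplified illustration, not the full UI from the demo):
'use client';
// Sketch: render confirmation buttons for the askForConfirmation tool on the client
// and send the user's answer back as the tool result via addToolResult.
import { useChat } from 'ai/react';

export default function ConfirmationDemo() {
  const { messages, addToolResult } = useChat({ maxSteps: 5 });
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.toolInvocations?.map((invocation) =>
            invocation.toolName === 'askForConfirmation' && invocation.state === 'call' ? (
              <div key={invocation.toolCallId}>
                <p>{invocation.args.message}</p>
                <button
                  onClick={() =>
                    addToolResult({ toolCallId: invocation.toolCallId, result: 'Yes, confirmed.' })
                  }
                >
                  Yes
                </button>
                <button
                  onClick={() =>
                    addToolResult({ toolCallId: invocation.toolCallId, result: 'No, denied.' })
                  }
                >
                  No
                </button>
              </div>
            ) : null,
          )}
        </div>
      ))}
    </div>
  );
}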
Get Page Content tool
The last tool we’ll create is the getPageContent tool.
This tool retrieves the content of a web page using Playwright. It then uses jsdom to parse the HTML content into a DOM structure and Readability to extract the main content of the page.
Finally, it uses the Anthropic Claude model to generate a summary of the page’s content.
getPageContent: tool({
description: 'Get the content of a page using Playwright',
parameters: z.object({
url: z.string().describe('The URL of the page to fetch content from'),
sessionId: z.string().describe('The Browserbase session ID to use'),
}),
execute: async ({ url, sessionId }) => {
const debugUrl = await getDebugUrl(sessionId);
const browser = await chromium.connectOverCDP(debugUrl.debuggerFullscreenUrl);
const defaultContext = browser.contexts()[0];
const page = defaultContext.pages()[0];
await page.goto(url, { waitUntil: 'networkidle' });
const content = await page.content();
const dom = new JSDOM(content);
const reader = new Readability(dom.window.document);
const article = reader.parse();
let extractedContent = '';
if (article) {
extractedContent = article.textContent;
} else {
extractedContent = await page.evaluate(() => document.body.innerText);
}
const response = await generateText({
model: anthropic('claude-3-5-sonnet-20241022'),
prompt: `Summarize the following web page content: ${extractedContent}`,
});
return {
toolName: 'Getting page content',
content: response.text,
dataCollected: true,
};
},
}),
Frontend
Now that we have our tools and route handler set up, we can create our frontend.
We’ll use the useChat hook to create a chat interface.
Here’s a simple example of how to use BrowseGPT in a Next.js frontend application:
'use client';
import { useChat } from 'ai/react';
import { useState, useEffect } from 'react';
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
maxSteps: 5,
});
const [showAlert, setShowAlert] = useState(false);
const [statusMessage, setStatusMessage] = useState('');
const [sessionId, setSessionId] = useState<string | null>(null);
useEffect(() => {
if (isLoading) {
setShowAlert(true);
setStatusMessage('The AI is currently processing your request. Please wait.');
setSessionId(null);
} else {
setShowAlert(false);
}
}, [isLoading, messages]);
useEffect(() => {
const lastMessage = messages[messages.length - 1];
if (lastMessage?.toolInvocations) {
for (const invocation of lastMessage.toolInvocations) {
if ('result' in invocation && invocation.result?.sessionId) {
setSessionId(invocation.result.sessionId);
break;
}
}
}
}, [messages]);
return (
<div className="flex flex-col min-h-screen">
<div className="flex-grow flex flex-col w-full max-w-xl mx-auto py-4 px-4">
{messages.map((m) => (
<div key={m.id} className="whitespace-pre-wrap">
<strong>{m.role === 'user' ? 'User: ' : 'AI: '}</strong>
<p>{m.content}</p>
</div>
))}
{showAlert && (
<div className="my-4">
<p>{statusMessage}</p>
</div>
)}
</div>
<div className="w-full max-w-xl mx-auto px-4 py-4">
<form onSubmit={handleSubmit} className="flex">
<input
className="flex-grow p-2 border border-gray-300"
value={input}
placeholder="Ask anything..."
onChange={handleInputChange}
/>
<button type="submit" disabled={!input.trim()}>
Send
</button>
</form>
</div>
</div>
);
}
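The createSession tool also returns a debugUrl alongside the sessionId, so the frontend can surface the live Browserbase session. Here is a minimal sketch of one way to do that, assuming you capture invocation.result.debugUrl into state the same way sessionId is captured above:
// Sketch: a small component that embeds the Browserbase live session viewer
// once a debugUrl has been captured from a tool result (hypothetical prop,
// filled from invocation.result.debugUrl).
function SessionViewer({ debugUrl }: { debugUrl: string | null }) {
  if (!debugUrl) return null;
  return (
    <iframe
      src={debugUrl}
      title="Browserbase session"
      className="w-full h-96 border border-gray-300"
      sandbox="allow-same-origin allow-scripts"
    />
  );
}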
Conclusion
You’ve now seen how to use the Vercel AI SDK to create a chat interface that can search the web using Browserbase.
You can view a demo of this tutorial here.
We’ve also open-sourced the code for this tutorial here.