Vector search with Next.js and OpenAI
Learn how to build a ChatGPT-style doc search powered by Next.js, OpenAI, and Supabase.
While our Headless Vector search provides a toolkit for generative Q&A, this tutorial goes more in depth: we'll build a custom ChatGPT-like search experience from the ground up using Next.js. You will:
- Convert your markdown into embeddings using OpenAI.
- Store your embeddings in Postgres using pgvector.
- Deploy a function for answering your users' questions.
You can read our Supabase Clippy blog post for a full example.
We assume that you have a Next.js project with a collection of .mdx files nested inside your pages directory. We will start developing locally with the Supabase CLI and then push our local database changes to our hosted Supabase project. You can find the full Next.js example on GitHub.
Create a project
- Create a new project in the Supabase Dashboard.
- Enter your project details.
- Wait for the new database to launch.
Prepare the database
Let's prepare the database schema. We can use the "OpenAI Vector Search" quickstart in the SQL Editor, or you can copy/paste the SQL below and run it yourself.
- Go to the SQL Editor page in the Dashboard.
- Click OpenAI Vector Search.
- Click Run.
Pre-process the knowledge base at build time
With our database set up, we need to process and store all .mdx files in the pages directory. You can find the full script here, or follow the steps below:
Generate Embeddings
Create a new file lib/generate-embeddings.ts and copy the code over from GitHub.
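To give a feel for the pre-processing step, here is a minimal sketch of how a markdown or MDX file could be split into heading-delimited sections before each section is embedded. It is not the script from GitHub: the names `DocSection` and `splitIntoSections` are hypothetical, and the heading-based chunking strategy is an assumption made for illustration.

```typescript
// Hypothetical helper, not the script from the GitHub example.
// Splits markdown source into sections, starting a new section at every heading.
interface DocSection {
  heading: string | null; // null for content that appears before the first heading
  content: string;
}

// Built with repeat() so this example does not contain a literal code fence.
const FENCE_MARKER = "`".repeat(3);

function splitIntoSections(markdown: string): DocSection[] {
  const sections: DocSection[] = [];
  let current: DocSection = { heading: null, content: "" };
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Track fenced code blocks so "# comment" lines inside code are not read as headings.
    if (line.trimStart().startsWith(FENCE_MARKER)) inFence = !inFence;

    const match = inFence ? null : /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      // A heading closes the previous section (if it has any content).
      if (current.content.trim() !== "") sections.push(current);
      current = { heading: (match[1] ?? "").trim(), content: line + "\n" };
    } else {
      current.content += line + "\n";
    }
  }
  if (current.content.trim() !== "") sections.push(current);

  return sections.map((s) => ({ heading: s.heading, content: s.content.trim() }));
}
```

Each resulting section would then be sent to the embeddings API and stored alongside its vector. Smaller, topic-focused sections tend to make the later similarity search more precise than embedding whole pages.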
Set up environment variables
We need some environment variables to run the script. Add them to your .env file and make sure your .env file is not committed to source control!
You can get your local Supabase credentials by running supabase status.
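Because a missing variable usually only surfaces as a confusing API error later on, it can help to validate the environment up front. The sketch below is one way to do that. The variable names in `REQUIRED_ENV_VARS` are assumptions for illustration, so substitute whatever names your script actually reads.

```typescript
// Assumed variable names, for illustration only; adjust to match your script.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "OPENAI_KEY",
] as const;

type RequiredEnv = Record<(typeof REQUIRED_ENV_VARS)[number], string>;

// Returns the required variables, or throws one error listing everything that is missing.
function loadRequiredEnv(env: Record<string, string | undefined>): RequiredEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const result = {} as RequiredEnv;
  for (const name of REQUIRED_ENV_VARS) {
    result[name] = env[name] as string;
  }
  return result;
}
```

In a Node.js script you would call `loadRequiredEnv(process.env)` once at startup, after the .env file has been loaded.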
Run script at build time
Include the script in your package.json script commands to enable Vercel to automatically run it at build time.
Create text completion with OpenAI API
Any time a user asks a question, we need to create an embedding for the question, perform a similarity search, and then send a text completion request to the OpenAI API with the query and the matching context merged into a single prompt.
All of this is glued together in a Vercel Edge Function, the code for which can be found on GitHub.
Create Embedding for Question
In order to perform similarity search we need to turn the question into an embedding.
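As a rough sketch of this step, the function below posts the question to the OpenAI embeddings endpoint and returns the resulting vector. It is not the code from the GitHub example: `createQuestionEmbedding` and `FetchLike` are hypothetical names, the default model `text-embedding-ada-002` is an assumption (use the same model that generated your stored embeddings), and the fetch implementation is passed in so the function can run in an Edge Function, in Node.js, or against a stub in tests.

```typescript
// Minimal shape of the fetch function this sketch needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function createQuestionEmbedding(
  question: string,
  apiKey: string,
  fetchImpl: FetchLike,
  model = "text-embedding-ada-002" // assumed model; must match the stored embeddings
): Promise<number[]> {
  // Collapse newlines and repeated whitespace before embedding the question.
  const input = question.replace(/\s+/g, " ").trim();

  const response = await fetchImpl("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, input }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  // The embeddings endpoint returns the vector under data[0].embedding.
  const payload = (await response.json()) as { data?: { embedding?: number[] }[] };
  const embedding = payload.data?.[0]?.embedding;
  if (!embedding) {
    throw new Error("Embedding response did not contain a vector");
  }
  return embedding;
}
```

In a runtime with a global `fetch`, you could call it as `createQuestionEmbedding(query, apiKey, fetch)`.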
Perform similarity search
Using the embeddingResponse we can now perform similarity search by making a remote procedure call (RPC) to the database function we created earlier.
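The actual search runs inside Postgres, but it can help to see what a similarity search does conceptually. The sketch below ranks stored sections against a query embedding in memory, using cosine similarity. It is an illustration only: the names `StoredSection`, `cosineSimilarity`, and `matchSections` are hypothetical, the `matchThreshold` and `matchCount` parameters are assumptions, and the database function may use a different pgvector distance operator.

```typescript
interface StoredSection {
  id: number;
  content: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for the database function: score, filter, sort, and limit.
function matchSections(
  queryEmbedding: number[],
  sections: StoredSection[],
  matchThreshold: number,
  matchCount: number
): { id: number; content: string; similarity: number }[] {
  return sections
    .map((s) => ({
      id: s.id,
      content: s.content,
      similarity: cosineSimilarity(queryEmbedding, s.embedding),
    }))
    .filter((m) => m.similarity >= matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```

Doing this in the database instead of in application code means the vectors never leave Postgres, and pgvector can use an index rather than scanning every row.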
Perform text completion request
With the relevant content for the user's question identified, we can now build the prompt and make a text completion request via the OpenAI API.
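One way to assemble such a prompt is sketched below. The function name `buildPrompt`, the instruction wording, and the `maxContextChars` budget are all assumptions for illustration. In particular, the sketch limits context by character count as a simple stand-in for counting tokens, which a production implementation would likely do with a tokenizer.

```typescript
// Hypothetical prompt builder: merges matched sections and the question into one prompt.
function buildPrompt(
  question: string,
  contextSections: string[],
  maxContextChars = 6000 // rough budget; characters are only a proxy for tokens
): string {
  let context = "";
  for (const section of contextSections) {
    const next = section.trim() + "\n---\n";
    // Stop adding sections once the budget would be exceeded.
    if (context.length + next.length > maxContextChars) break;
    context += next;
  }

  return [
    "You are a helpful assistant that answers questions using only the documentation below.",
    'If the answer is not in the documentation, say "Sorry, I don\'t know how to help with that."',
    "",
    "Context sections:",
    context.trimEnd(),
    "",
    `Question: """${question.trim()}"""`,
    "",
    "Answer as markdown:",
  ].join("\n");
}
```

Because the sections arrive sorted by similarity, truncating from the end drops the least relevant content first. Telling the model to refuse when the context lacks an answer helps reduce made-up responses.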
If successful, the OpenAI API will respond with a text/event-stream response that we can forward to the client where we'll process the event stream to smoothly print the answer to the user.
Display the answer on the frontend
As a final step, we need to process the event stream from the OpenAI API and print the answer to the user. The full code for this can be found on GitHub.
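To illustrate what processing the event stream involves, here is a minimal sketch of an incremental parser. It is not the frontend code from GitHub: `createCompletionStreamParser` is a hypothetical name, and the sketch assumes each event carries a JSON payload with the generated text under `choices[0].text` (the text completion format), with a final `[DONE]` event marking the end. The chat completion format places text under `choices[0].delta.content` instead, so adjust the field if you use that API.

```typescript
interface StreamParseResult {
  text: string; // text decoded from the events completed by this chunk
  done: boolean; // true once the [DONE] sentinel has been seen
}

// Returns a push function that accepts raw chunks of the event stream.
// Chunks may end mid-line, so incomplete lines are buffered until the next call.
function createCompletionStreamParser(): (chunk: string) => StreamParseResult {
  let buffer = "";

  return function push(chunk: string): StreamParseResult {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last element may be an incomplete line

    let text = "";
    let done = false;
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line.startsWith("data:")) continue; // skip blank separators and other fields
      const data = line.slice("data:".length).trim();
      if (data === "[DONE]") {
        done = true;
        break;
      }
      try {
        const parsed = JSON.parse(data) as { choices?: { text?: string }[] };
        text += parsed.choices?.[0]?.text ?? "";
      } catch {
        // Ignore malformed payloads rather than aborting the whole stream.
      }
    }
    return { text, done };
  };
}
```

On the frontend, you would feed each decoded chunk from the response body into `push` and append the returned `text` to the displayed answer until `done` is true.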
Learn more
Want to learn more about the awesome tech that is powering this?
- Read about how we built ChatGPT for the Supabase Docs.
- Read the pgvector Docs for Embeddings and vector similarity.
- Watch Greg's video for a full breakdown: