For most software projects, search used to be one of those features that sat in the backlog until the team got around to it. But now that teams are integrating AI, or starting out AI-native, search is the most critical aspect of the platform: AI is driven by good context, and a search engine is the best way to provide a fast and accurate tool for retrieving that context.
In this post, we'll walk through a straightforward search implementation for teams using Render to deploy their projects. We'll use a rich example dataset (the Project Gutenberg catalog of public domain books), and provide a starting point for teams using agents like Claude Code to help build their application.
Spec driven 10 blue links development
I took a different approach to this project. Traditionally I would have started from scratch, but this time I assembled the dataset, design, and application relying entirely on existing assets and a coding agent. The workflow was straightforward:
- Get the dataset:
  - Find a repo that already pre-processes the dataset: https://github.com/garethbjohnson/gutendex
  - Provide baseline search mappings, settings, and QueryDSL from another project as context
  - Have the agent work with the repo to add a process command that outputs bulk NDJSON files for indexing
- Design the experience:
  - Use a design agent to make the wireframes and polished page assets
  - Manually fix some of the little things to fill in details
- Build the application:
  - Hand off the design to a coding agent, then plan and generate the first version
  - Iterate with the agent until I got what I wanted
Bake it
These days, when I use agents to automate work, I like to use the prep time and cook time recipe analogy:
- Prep time was about 1 hour total
- Cook time was about 4 hours total
Prep time includes anything I need to do: permissions approval, manual interventions, iteration prompts, etc.
The result is a batteries included search application ready to deploy on Render with a Bonsai search cluster:
- Example dataset of 75,000 book records and URLs for full-context download
- Reference OpenSearch index and query for decent baseline relevance
- Clean search UI and application stack
I also had the application developed with expansion to more datasets in mind. This violated YAGNI but I couldn't resist since my agent was doing the work!
At the end of this post you'll have a working search application deployed on Render, backed by a Bonsai managed OpenSearch cluster. You'll understand the index schema, the relevance strategy behind the query, and how to extend it with your own data.
What we're building
We're building a classic "10 blue links" search experience that lets you search the Project Gutenberg books catalog. Don't be fooled, though: a classic search experience isn't an obsolete one. It's still the foundation for delivering search results in millions of products.
The app is a Node.js Express server with EJS templates. It has a clean landing page with a search box, and a results page with faceted sidebar filters (subjects, authors, bookshelves, languages, popularity, author era), pagination, and book cover thumbnails linking out to Gutenberg. The search uses function_score to blend text relevance with popularity, two custom English analyzers for both broad recall and precise phrase matching, and aggregations that power the sidebar. Minimal dependencies, no build step, and a codebase you can easily continue tinkering with.
Get your ingredients
You'll need three things:
A Bonsai cluster. Sign up at bonsai.io. Create an OpenSearch cluster and grab your cluster URL from the credentials page. It'll look something like https://user:pass@your-cluster.us-east-1.bonsaisearch.net. You can also copy it from the connect dialog for your cluster.
A Render account. Sign up at render.com. Follow their instructions to create a web application. Be sure to add the Bonsai connect URL as an environment variable named BONSAI_URL.
The code. Clone the example repo:
git clone https://github.com/omc/bonsai-training-example.git
cd bonsai-training-example
Then create a .env file at the repo root with your Bonsai cluster URL copied from above:
BONSAI_URL=https://user:pass@your-cluster.us-east-1.bonsaisearch.net
That's the only config. The app reads BONSAI_URL for the OpenSearch connection.
Our search application
Now to the fun part. We have books. Lots of books. 75,000 of them from the Project Gutenberg catalog. The dataset has a rich set of text and metadata, and because the data quality is high, not much needs to be done to build a high-quality search experience from it.
The index schema
The schema lives in books/books-index.json. Aside from the fields mapped to appropriate datatypes, I included an old trick to cover both Precision and Recall for search use cases. What is this trick, you ask? Well, it's quite easy: we define two text analyzers with different strategies:
{
  "analyzer": {
    "analyze_english": {
      "type": "custom",
      "tokenizer": "standard",
      "char_filter": ["html_strip"],
      "filter": ["lowercase", "english_stop", "english_stem"]
    },
    "analyze_english_precise": {
      "type": "custom",
      "tokenizer": "standard",
      "char_filter": ["html_strip"],
      "filter": ["lowercase", "english_stem_light"]
    }
  }
}
analyze_english is the standard pipeline: HTML stripping, lowercasing, stop words removal, aggressive stemming. This analysis chain is great for recall. "Dining", "Dinner", "Dine" will all normalize and match as the same concept.
analyze_english_precise uses only possessive stemming (strips the 's, and that's about it). Good for phrase matching where you want "dining" to match "dining" and not drift into stems.
Having both of the above lets us cast a wide net for general matching while still rewarding exact phrase hits. We apply this to the titles and summaries by leveraging a copy_to with separate fields. For entity fields such as authors, we only use precise.
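One way to see the difference between the two analyzers is to run the same text through both with the _analyze API. The sketch below only builds the request descriptions and does not call the cluster. The helper name buildAnalyzeRequests and the sample text are illustrative, and it assumes the index is named books, as in the count check used after loading the data.

```javascript
// Sketch: build _analyze requests that run one text through both analyzers.
// buildAnalyzeRequests is a hypothetical helper, not part of the example repo.
function buildAnalyzeRequests(index, text) {
  const analyzers = ["analyze_english", "analyze_english_precise"];
  return analyzers.map((analyzer) => ({
    method: "POST",
    path: `/${index}/_analyze`,
    body: { analyzer, text },
  }));
}

// Sample text is made up; swap in any title or summary from the dataset.
const requests = buildAnalyzeRequests("books", "Dining with the King's cooks");
for (const req of requests) {
  console.log(req.method, req.path, JSON.stringify(req.body));
}
```

Each body can be sent to the cluster at the printed path, for example with curl against BONSAI_URL. Comparing the tokens returned for the two analyzers shows how much each pipeline normalizes the same input.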
Supporting both Precision and Recall
The document catalog has title and summaries fields. We use copy_to to feed each one into a separate field that uses the precise analyzer, and the query uses those fields later:
{
  "title": {
    "type": "text",
    "analyzer": "analyze_english",
    "copy_to": ["title_precise"]
  },
  "summaries": {
    "type": "text",
    "analyzer": "analyze_english",
    "copy_to": ["summaries_precise"]
  },
  "title_precise": {
    "type": "text",
    "analyzer": "analyze_english_precise"
  },
  "summaries_precise": {
    "type": "text",
    "analyzer": "analyze_english_precise"
  }
}
copy_to does the heavy lifting. When a document gets indexed, title content is analyzed by analyze_english for the title field, and also copied to title_precise where it goes through analyze_english_precise. One write, two analysis pipelines, no duplication in your indexing code. Same for summaries.
The rest of the schema includes keyword sub-fields on subjects, bookshelves, and author_names for exact-match aggregations, a download_count integer for popularity scoring, and boolean/keyword fields for filtering.
You'll also spot a summaries_embedding KNN vector field in the schema, pre-configured for 768-dimensional vectors on FAISS HNSW. We won't use it in this tutorial, but it's there for when you're ready to upgrade to hybrid search.
Load the data
The dataset is 75,000 Project Gutenberg books, pre-split into 32 NDJSON shard files for manageable bulk requests.
cd books
source ../.env
bash index.sh
The script deletes any existing index, creates a fresh one from our schema, then bulk-loads each shard file sequentially. The 32-file split keeps individual bulk requests at a reasonable size (about 6MB each), totaling around 190MB.
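For reference, the bulk API expects NDJSON: an action line followed by the document source line, with a newline after the last line. Here is a minimal sketch of producing that format. It assumes each record carries an id field used as the document _id, which may not match how the repo's shard files are built, and the sample records and download counts are made up.

```javascript
// Sketch: turn an array of records into a _bulk NDJSON payload.
// Assumes each record has an `id` used as the document _id (hypothetical).
function toBulkNdjson(index, records) {
  const lines = [];
  for (const record of records) {
    const { id, ...source } = record;
    // Action line: tells the bulk API which index and _id to write.
    lines.push(JSON.stringify({ index: { _index: index, _id: String(id) } }));
    // Source line: the document body itself.
    lines.push(JSON.stringify(source));
  }
  // The bulk API requires a trailing newline after the last line.
  return lines.join("\n") + "\n";
}

// Made-up sample records for illustration only.
const payload = toBulkNdjson("books", [
  { id: 1, title: "First Sample Book", download_count: 120 },
  { id: 2, title: "Second Sample Book", download_count: 4500 },
]);
console.log(payload);
```

A payload like this is what gets POSTed to the _bulk endpoint with a Content-Type of application/x-ndjson.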
Once the load finishes, verify:
curl "$BONSAI_URL/books/_count"
You should see a count around 75,000.
The search query
I guided the agent to implement a function_score wrapping a bool.should with multiple clauses, ordered by specificity. The way the agent approached this is interesting. Normally I'd just straight up code the QueryDSL, but since I asked for the ability to add more datasets in the future it created an abstraction around the query constructor. My human expert critique is that this is a waste of an abstraction, but I figured I'd let it stand and save the tokens, rather than fighting the decision.
The QueryDSL lives in app/datasets/books.js, and the query builder is in app/search.js. The config defines what to search and how to score it. The builder translates that config into OpenSearch DSL. You can easily tune the query by changing the config file.
// app/datasets/books.js
query: {
  clauses: [
    { type: "match_phrase", field: "title_precise", boost: 2.0 },
    { type: "match_phrase", field: "summaries_precise", boost: 1.4 },
    { type: "match_phrase", field: "author_names", boost: 1.4, slop: 1 },
    {
      type: "multi_match",
      matchType: "cross_fields",
      fields: [
        "title_precise", "summaries_precise", "author_names",
        "editor_names", "translator_names", "subjects",
        "bookshelves", "languages", "media_type",
      ],
      boost: 1.2,
    },
    {
      type: "multi_match",
      matchType: "cross_fields",
      fields: ["title", "summaries"],
      boost: 1.0,
    },
  ],
  scoreFunction: {
    field: "download_count",
    modifier: "log1p",
    factor: 1.0,
  },
  boostMode: "sum",
}
As far as the boost numbers go, I had another query that I used for seed context and it copied the strategy quite well: exact phrase matches on titles get the highest boost (2.0). Summary phrases and author names follow at 1.4, with a slop of 1 on author names to tolerate a one-position shift between name tokens. Note that a fully transposed name ("Mark Twain" vs. "Twain, Mark" in the index data) costs two position moves in a phrase query, so with a slop of 1 those hits come from the cross_fields clause rather than the phrase clause. Cross-field matching across all precise fields at 1.2 provides broader coverage, and the stemmed title/summaries fields at 1.0 are the widest recall net. These are layered as should clauses, so a document matching multiple levels accumulates score from each.
The field_value_factor on download_count with log1p gives popular books a gentle relevance bump. log1p compresses the popularity signal: every tenfold increase in downloads adds about one point, so the jump from 100 to 1,000 downloads counts about as much as the jump from 10,000 to 100,000. Without the log, a mega-popular book would steamroll the text relevance score entirely. With it, well-known titles surface when text relevance is close, but a precise match on an obscure title still wins.
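To make the compression concrete, here is a small sketch of the arithmetic. It assumes the documented behavior of the log1p modifier, which adds 1 to the (factor-scaled) field value and takes the base-10 logarithm. The function name popularityBoost is illustrative.

```javascript
// Sketch: the score contribution of field_value_factor with
// modifier "log1p" and factor 1.0, i.e. log10(1 + factor * value).
function popularityBoost(downloadCount, factor = 1.0) {
  return Math.log10(1 + factor * downloadCount);
}

for (const count of [0, 100, 1000, 10000, 100000]) {
  console.log(count, popularityBoost(count).toFixed(2));
}
// prints:
// 0 0.00
// 100 2.00
// 1000 3.00
// 10000 4.00
// 100000 5.00
```

With boostMode set to "sum", this value is added to the text relevance score, so a book with 100,000 downloads gets about five extra points rather than 100,000.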
Facets & Filters
Every search request computes aggregations that populate the sidebar filters:
aggregations: [
  { name: "subjects", type: "terms", field: "subjects.keyword", size: 20 },
  { name: "authors", type: "terms", field: "author_names.keyword", size: 20 },
  {
    name: "bookshelves",
    type: "terms",
    field: "bookshelves.keyword",
    size: 20,
  },
  { name: "languages", type: "terms", field: "languages", size: 20 },
  { name: "copyright", type: "terms", field: "copyright" },
  {
    name: "popularity",
    type: "range",
    field: "download_count",
    ranges: [
      { key: "low", to: 100 },
      { key: "moderate", from: 100, to: 1000 },
      { key: "popular", from: 1000, to: 10000 },
      { key: "very_popular", from: 10000 },
    ],
  },
  {
    name: "author_era",
    type: "histogram",
    field: "author_birth_years",
    filterInterval: 100,
  },
];
Terms aggs for categorical data, range buckets for popularity tiers, and a century-wide histogram for author birth years. When a user clicks a filter checkbox, the selected values get sent back as query parameters and applied as bool.filter clauses that narrow results without touching the relevance scores. This is standard practice and it works well.
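As a rough sketch of how that config could translate into OpenSearch DSL, the helpers below build the aggs object and turn sidebar selections into bool.filter clauses. Both function names are hypothetical, the repo's getQuery may do this differently, and histogram selections are left out of the filter helper to keep it short. The sample language code "fr" is illustrative.

```javascript
// Sketch: aggregation config -> OpenSearch "aggs" object.
function buildAggs(aggregations) {
  const aggs = {};
  for (const agg of aggregations) {
    if (agg.type === "terms") {
      aggs[agg.name] = { terms: { field: agg.field, size: agg.size || 10 } };
    } else if (agg.type === "range") {
      aggs[agg.name] = { range: { field: agg.field, ranges: agg.ranges } };
    } else if (agg.type === "histogram") {
      aggs[agg.name] = {
        histogram: { field: agg.field, interval: agg.filterInterval },
      };
    }
  }
  return aggs;
}

// Sketch: selected sidebar values -> bool.filter clauses (terms and range only).
function buildFilters(aggregations, selected) {
  const filters = [];
  for (const agg of aggregations) {
    const values = selected[agg.name];
    if (!values || values.length === 0) continue;
    if (agg.type === "terms") {
      filters.push({ terms: { [agg.field]: values } });
    } else if (agg.type === "range") {
      // Map each selected bucket key back to its bounds (from inclusive, to exclusive).
      const ranges = agg.ranges
        .filter((r) => values.includes(r.key))
        .map(({ from, to }) => {
          const bounds = {};
          if (from !== undefined) bounds.gte = from;
          if (to !== undefined) bounds.lt = to;
          return { range: { [agg.field]: bounds } };
        });
      if (ranges.length > 0) {
        filters.push({ bool: { should: ranges, minimum_should_match: 1 } });
      }
    }
  }
  return filters;
}

const sampleAggs = [
  { name: "languages", type: "terms", field: "languages", size: 20 },
  {
    name: "popularity",
    type: "range",
    field: "download_count",
    ranges: [
      { key: "popular", from: 1000, to: 10000 },
      { key: "very_popular", from: 10000 },
    ],
  },
];
const sampleFilters = buildFilters(sampleAggs, {
  languages: ["fr"],
  popularity: ["popular"],
});
console.log(JSON.stringify(buildAggs(sampleAggs), null, 2));
console.log(JSON.stringify(sampleFilters, null, 2));
```

Because the clauses go into bool.filter, they run in filter context and do not contribute to the score, which is what keeps the relevance ranking unchanged as filters are applied.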
If you're applying aggregations at a larger scale with higher query rates, watch your CPU. I covered some of the performance footguns with aggregations in the autocomplete scaling post, and they apply here too.
The application
We only need three dependencies: the OpenSearch client, Express, and EJS:
{
  "dependencies": {
    "@opensearch-project/opensearch": "^3.5.0",
    "ejs": "^5.0.2",
    "express": "^5.1.0"
  }
}
Routes
GET /                 Home page, lists available datasets
GET /:dataset         Landing page with search box
GET /:dataset/search  Search results with filters and pagination
GET /health           Cluster health check (JSON)
The :dataset parameter makes this extensible. Each dataset is a JS config file in app/datasets/ that exports its index name, query clauses, aggregation definitions, and display rules (title field, link template, image sources, snippet logic, tags). Right now there's just books.js, but you could drop in a products.js or articles.js and the app would pick it up on the home page automatically. This is the YAGNI I mentioned before but I like having the option.
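A minimal sketch of how that kind of auto-discovery can work: read every .js file in a directory and key each config by its file name. The function name loadDatasetConfigs is hypothetical and the repo's loader may differ.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Sketch: load every .js file in a directory as a dataset config,
// keyed by file name ("books.js" -> "books"). Hypothetical loader.
function loadDatasetConfigs(dir) {
  const configs = {};
  for (const file of fs.readdirSync(dir)) {
    // Skip anything that is not a JS module (READMEs, data files, etc.).
    if (path.extname(file) !== ".js") continue;
    const name = path.basename(file, ".js");
    configs[name] = require(path.resolve(dir, file));
  }
  return configs;
}

// Usage, assuming the configs live in a datasets/ directory next to this file:
//   const configs = loadDatasetConfigs(path.join(__dirname, "datasets"));
//   configs.books.index  // index name exported by books.js
```

With a loader like this, the route parameter becomes a plain object lookup, and an unknown dataset name falls through to a 404.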
Server
The Express server is about 85 lines. It loads dataset configs from app/datasets/, sets up EJS templating, and defines the four routes. The search route parses filter parameters from the query string, passes them to the search function, and hands everything to the template.
app.get("/:dataset/search", async (req, res) => {
  const dataset = req.params.dataset;
  const config = configs[dataset];
  if (!config) return res.status(404).send("Dataset not found");

  const query = req.query.q || "";

  // Collect active filters; [].concat normalizes a single value to an array.
  const filters = {};
  config.aggregations.forEach(function (agg) {
    if (req.query[agg.name]) {
      filters[agg.name] = [].concat(req.query[agg.name]);
    }
  });

  // Pages are 1-based; convert to a zero-based offset at 10 results per page.
  const page = Math.max(1, parseInt(req.query.page, 10) || 1);
  const from = (page - 1) * 10;
  const results = await search(config, query, 10, filters, from);

  res.render("results", {
    query,
    results,
    filters,
    page,
    perPage: 10,
    dataset,
    config,
  });
});
The query builder
search.js takes the dataset config and assembles the OpenSearch query body. It's data-driven: add a clause or aggregation to the config, and the builder picks it up automatically.
// app/search.js
const { Client } = require("@opensearch-project/opensearch");

const client = new Client({ node: process.env.BONSAI_URL });

// k is the number of results per page; from is the zero-based result offset.
const search = async function (config, querystring, k, filters, from) {
  const body = getQuery(config, querystring, k, filters);
  body.from = from || 0;
  const resp = await client.search({
    index: config.index,
    body: body,
  });
  return resp;
};
The getQuery function maps config clauses to OpenSearch query DSL (match_phrase, multi_match), builds the aggregations object, wraps everything in function_score if a score function is configured, and attaches active sidebar selections as bool.filter clauses. About 130 lines of straightforward mapping code that you probably won't need to touch unless you're adding a new clause type.
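To give a feel for that mapping, here is a condensed sketch of the clause and score-function part. buildQueryBody is a hypothetical stand-in, not the repo's getQuery, and setting minimum_should_match to 1 is my own choice so that a document must match at least one text clause even when filters are present.

```javascript
// Sketch: map config clauses to bool.should and wrap them in function_score.
function buildQueryBody(config, text, size, filterClauses = []) {
  const should = config.query.clauses.map((clause) => {
    if (clause.type === "match_phrase") {
      const params = { query: text, boost: clause.boost };
      if (clause.slop !== undefined) params.slop = clause.slop;
      return { match_phrase: { [clause.field]: params } };
    }
    if (clause.type === "multi_match") {
      return {
        multi_match: {
          query: text,
          type: clause.matchType,
          fields: clause.fields,
          boost: clause.boost,
        },
      };
    }
    throw new Error(`Unsupported clause type: ${clause.type}`);
  });

  let query = {
    bool: { should, minimum_should_match: 1, filter: filterClauses },
  };

  // Wrap in function_score only when the config defines a score function.
  const fn = config.query.scoreFunction;
  if (fn) {
    query = {
      function_score: {
        query,
        field_value_factor: {
          field: fn.field,
          modifier: fn.modifier,
          factor: fn.factor,
        },
        boost_mode: config.query.boostMode,
      },
    };
  }
  return { size, query };
}

// Trimmed-down version of the books config, for illustration.
const sampleConfig = {
  query: {
    clauses: [
      { type: "match_phrase", field: "title_precise", boost: 2.0 },
      {
        type: "multi_match",
        matchType: "cross_fields",
        fields: ["title", "summaries"],
        boost: 1.0,
      },
    ],
    scoreFunction: { field: "download_count", modifier: "log1p", factor: 1.0 },
    boostMode: "sum",
  },
};
const sampleBody = buildQueryBody(sampleConfig, "pride and prejudice", 10);
console.log(JSON.stringify(sampleBody, null, 2));
```

Printing the body like this is also a handy way to debug relevance: you can paste the JSON straight into a _search request and compare scores as you change boosts.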
I think this is the main drawback of having the agent code this up. Queries are way more expressive than this when you get into deep relevance tuning for info needs using actual judgements. Assuming a function_score wrapper with some bools is making strong assumptions here. But the good news is that this can be torn down and rebuilt in an instant with the agent, like castles of sand on the beach.
Templates
The results template renders a two-column layout: sidebar filters on the left, search results on the right. Each result shows a book cover thumbnail, the title linked to the Gutenberg page, author name, language, download count, a summary snippet truncated to 300 characters, and subject tags. Filter checkboxes trigger a page reload with updated query parameters. Pagination uses a sliding window of up to 7 page numbers with prev/next links. On mobile (under 880px), the sidebar collapses into a toggleable panel. All vanilla JavaScript, no client-side framework.
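The sliding pagination window is a small piece of logic that is easy to get subtly wrong at the edges. Here is one way to compute it, as a sketch. The helper name pageWindow is hypothetical and the template's own logic may differ.

```javascript
// Sketch: sliding window of up to `width` page numbers around the current page.
function pageWindow(current, totalPages, width = 7) {
  if (totalPages <= 0) return [];
  const size = Math.min(width, totalPages);
  // Center on the current page, then clamp so the window stays in range.
  let start = current - Math.floor(size / 2);
  start = Math.max(1, Math.min(start, totalPages - size + 1));
  return Array.from({ length: size }, (_, i) => start + i);
}

console.log(pageWindow(1, 20));  // [1, 2, 3, 4, 5, 6, 7]
console.log(pageWindow(10, 20)); // [7, 8, 9, 10, 11, 12, 13]
console.log(pageWindow(20, 20)); // [14, 15, 16, 17, 18, 19, 20]
console.log(pageWindow(2, 3));   // [1, 2, 3]
```

The clamp keeps the window at a constant width near the first and last pages, so the page links do not jump around as the user moves through results.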
This is the part I'm honestly most satisfied with. The design agent made a beautiful site here. I attribute this to the absolute mountains of training data for "10 blue links" SERP experiences out there. I only made a few small manual changes, since copy/pasting the logo and colors from our website was faster than trying to get the agent to do it.
Run it locally
cd app
npm install
npm run dev
The dev script uses Node 24's --env-file flag to pull BONSAI_URL from ../.env, and --watch for auto-restart on file changes:
"dev": "node --env-file=../.env --watch server.js"
Open http://localhost:4444, type "shakespeare" into the search box, and you should see results with faceted filters on the left. Try clicking some subject or language filters to narrow things down. Check "popular" under Popularity to see the well-known titles float up.
Deploy to Render
Render supports Infrastructure as Code through a render.yaml blueprint at the repo root. Here's ours:
services:
  - type: web
    name: bonsai-search
    runtime: node
    plan: starter
    buildCommand: npm install
    startCommand: npm start
    healthCheckPath: /health
    envVars:
      - key: NODE_ENV
        value: production
      - key: BONSAI_URL
        sync: false
The sync: false on BONSAI_URL tells Render not to read the value from render.yaml. You set it in the dashboard instead, which keeps the cluster credentials out of the repo.
Deploy steps
- Push your code to a GitHub or GitLab repository.
- In the Render dashboard, click New and select Blueprint. Point it at your repo.
- Render detects the render.yaml and shows what it will create. You'll see the bonsai-search web service on the Starter plan.
- Go to the service's Environment tab and set BONSAI_URL to your full Bonsai cluster URL with credentials: https://user:pass@your-cluster.us-east-1.bonsaisearch.net
- Deploy. Render runs npm install, starts the server with npm start, and hits /health to verify the cluster connection.
- Your search app is live at the .onrender.com URL Render assigns.
Try it out
Once deployed, run some searches to exercise the relevance tuning:
- "pride and prejudice" - Title phrase matching should put Austen at the top.
- "mark twain" - Author name matching should surface Twain's works whether the index stores the name as "Twain, Mark" or "Mark Twain".
- "adventure" - Broad match across titles and summaries, then narrow with the Bookshelves filter on the sidebar.
- "french" - Filter by Language to see French-language texts.
Check the sidebar filter counts, combine multiple filters, and page through results. If something looks off, hit /health to check the cluster connection, and look at the Render logs for errors.
What's next
You've got a real search application running on Render with a Bonsai managed cluster, and a solid relevance foundation underneath it.
The index schema already has a summaries_embedding KNN vector field configured and ready for 768-dimensional vectors. Generate embeddings for the book summaries, bulk-load them, and you can run hybrid lexical + vector queries. We have a walkthrough for this: Adding semantic search to Elasticsearch and OpenSearch with Mistral AI embeddings.
A search box without autocomplete feels like it's missing something. We've covered the approach in depth if you want to add it: How to really do autocomplete covers the fundamentals with significant_terms aggregations, and How to really scale autocomplete takes it to millions of documents. The Gutenberg dataset at 75k books is a good starting size.
The app/datasets/ config pattern makes it straightforward to bring your own data. Create a new config file with your query clauses, aggregations, and display rules, create a matching index on your cluster, bulk-load, and the app picks it up on restart. No routing changes.
Once you're moving beyond exploration and search is becoming central to your product, your Sandbox cluster will start to show its limits. Our Sandbox tier is meant for testing, not scaling. Upgrading to a Standard plan removes those constraints and gives you a production-ready home for your search infrastructure. Everything you've built here carries forward without changes. And if search is mission-critical for your business, we're happy to talk through what plan makes sense for your scale.
If you're building AI features and need a search backend for RAG or agent tool use, this is a strong starting point. The search engine handles the retrieval side: fast, relevant results from your data that you feed into an LLM as context. Bonsai clusters are AI-ready out of the box.
We also have a similar deployment guide for teams on Fly.io, if that's your platform.
If you want to talk through your search architecture or relevance tuning, reach out at support@bonsai.io. We're real people and we like talking about search.