Every Bonsai cluster comes with a set of pre-built search pipelines for hybrid search. These pipelines handle the scoring and ranking step that combines lexical (BM25) and vector (KNN) results into a single result set. You don't need to create or manage them yourself; they're ready to use out of the box.
## How Search Pipelines Fit In
When you run a hybrid query in OpenSearch, you're sending two sub-queries: a lexical match and a KNN search. Each sub-query produces its own set of scored results. A search pipeline's job is to normalize those scores (since BM25 and KNN use completely different scales) and combine them into one ranked list.
You reference a pipeline by name in your query using the search_pipeline parameter. The pipeline runs server-side after both sub-queries return, so there's no extra round trip from your application.
## Available Pipelines
Bonsai provisions two types of hybrid search pipelines on every cluster: weighted pipelines and an RRF pipeline.
### Weighted Pipelines
These pipelines normalize scores using min-max normalization, then combine them with an arithmetic mean weighted toward either the lexical or vector side. The name tells you the split:
| Pipeline Name | Lexical (BM25) Weight | Vector (KNN) Weight |
|---|---|---|
| hybrid-pipeline-lexical90-knn10 | 0.9 | 0.1 |
| hybrid-pipeline-lexical80-knn20 | 0.8 | 0.2 |
| hybrid-pipeline-lexical70-knn30 | 0.7 | 0.3 |
| hybrid-pipeline-lexical60-knn40 | 0.6 | 0.4 |
| hybrid-pipeline-lexical50-knn50 | 0.5 | 0.5 |
| hybrid-pipeline-lexical40-knn60 | 0.4 | 0.6 |
| hybrid-pipeline-lexical30-knn70 | 0.3 | 0.7 |
| hybrid-pipeline-lexical20-knn80 | 0.2 | 0.8 |
| hybrid-pipeline-lexical10-knn90 | 0.1 | 0.9 |
A higher lexical weight favors exact-match results. A higher KNN weight favors semantically similar results. If you're not sure where to start, hybrid-pipeline-lexical50-knn50 gives both sides equal say, and you can adjust from there based on what your users actually search for.
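To make the weighted combination concrete, here's an illustrative Python sketch of min-max normalization followed by a weighted arithmetic mean, using made-up scores. The real work happens inside OpenSearch's normalization processor, so treat this as a model of the math, not the implementation:

```python
def min_max(scores):
    """Scale a list of scores into [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw scores for three documents from each sub-query.
bm25 = {"doc1": 12.4, "doc2": 8.1, "doc3": 2.3}    # BM25 scores are unbounded
knn = {"doc1": 0.41, "doc2": 0.88, "doc3": 0.62}   # similarity already in [0, 1]

lex_w, knn_w = 0.6, 0.4  # mirrors hybrid-pipeline-lexical60-knn40

docs = list(bm25)
bm25_norm = dict(zip(docs, min_max([bm25[d] for d in docs])))
knn_norm = dict(zip(docs, min_max([knn[d] for d in docs])))

# Weighted arithmetic mean of the two normalized scores per document.
combined = {d: lex_w * bm25_norm[d] + knn_w * knn_norm[d] for d in docs}
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)  # → ['doc2', 'doc1', 'doc3']
```

Note how doc2 wins the combined ranking despite doc1 having the best BM25 score: normalization puts both sub-queries on the same scale before the weights are applied.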
### RRF Pipeline
The rrf-pipeline uses Reciprocal Rank Fusion instead of weighted scoring. Rather than normalizing and averaging scores, RRF looks at the rank position of each result in the two sub-queries and merges them based on that.
This is a good option when you don't want to tune weights at all. RRF doesn't care about the magnitude of scores, only the ordering, so it's naturally resistant to one sub-query dominating the other. The tradeoff is that you give up fine-grained control over how much lexical vs. vector results influence the final ranking.
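As an illustration, here is a minimal RRF sketch: each document's fused score is the sum of 1 / (k + rank) across the sub-query rankings. The constant k = 60 below is the conventional value from the original RRF paper; the constant the rrf-pipeline actually uses isn't shown here.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Only the order of each sub-query's results matters, not the raw scores.
lexical_order = ["doc1", "doc2", "doc3"]
vector_order = ["doc2", "doc3", "doc1"]
print(rrf([lexical_order, vector_order]))  # → ['doc2', 'doc1', 'doc3']
```

Because only rank positions feed the formula, a sub-query with wildly larger raw scores can't drown out the other one.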
## Using a Pipeline in a Query
Reference the pipeline by name in your hybrid search request:
```json
GET /my-index/_search?search_pipeline=hybrid-pipeline-lexical60-knn40
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "waterproof hiking jacket"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [0.12, -0.34, ...],
              "k": 10
            }
          }
        }
      ]
    }
  }
}
```
Swap the pipeline name to change the weighting without touching your query logic.
### Node.js
Using the @opensearch-project/opensearch client:
```javascript
const { Client } = require("@opensearch-project/opensearch");

const client = new Client({
  node: process.env.OPENSEARCH_URL, // Your Bonsai cluster URL
});

const response = await client.search({
  index: "my-index",
  search_pipeline: "hybrid-pipeline-lexical60-knn40",
  body: {
    query: {
      hybrid: {
        queries: [
          {
            match: {
              title: "waterproof hiking jacket",
            },
          },
          {
            knn: {
              embedding: {
                vector: [0.12, -0.34 /* ... your query embedding */],
                k: 10,
              },
            },
          },
        ],
      },
    },
  },
});

console.log(response.body.hits.hits);
```
### Python
Using the opensearch-py client:
```python
import os

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[os.environ["OPENSEARCH_URL"]],  # Your Bonsai cluster URL
)

response = client.search(
    index="my-index",
    params={"search_pipeline": "hybrid-pipeline-lexical60-knn40"},
    body={
        "query": {
            "hybrid": {
                "queries": [
                    {
                        "match": {
                            "title": "waterproof hiking jacket"
                        }
                    },
                    {
                        "knn": {
                            "embedding": {
                                "vector": [0.12, -0.34],  # your query embedding
                                "k": 10
                            }
                        }
                    }
                ]
            }
        }
    },
)

print(response["hits"]["hits"])
```
## Choosing a Pipeline
There's no universally correct weight split. It depends on your content and your users. A few rules of thumb:
- Catalog or e-commerce search where users mix SKUs with natural-language descriptions: start with lexical60-knn40 and see how it feels.
- Documentation or knowledge base search where queries tend to be full questions: lean toward the KNN side, something like lexical30-knn70.
- Don't want to think about it? Use rrf-pipeline. It's a solid default that doesn't require tuning.
The best approach is to try a few pipelines against real queries from your users and see which one produces the most useful rankings. Since the pipeline is just a query parameter, switching between them costs nothing.
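One way to run that comparison: send the same query body through each candidate pipeline and diff the top hits side by side. The helper below follows the same opensearch-py call shape as the Python example above; compare_pipelines and the rest of the names are illustrative, not part of any client API.

```python
def compare_pipelines(client, index, body, pipelines, top_n=5):
    """Run one query body through each named pipeline and collect the top hit IDs.

    `client` is any object exposing the opensearch-py style
    `search(index=..., params=..., body=...)` call.
    """
    results = {}
    for name in pipelines:
        resp = client.search(
            index=index,
            params={"search_pipeline": name},
            body=body,
        )
        results[name] = [h["_id"] for h in resp["hits"]["hits"][:top_n]]
    return results
```

Call it with a handful of real user queries and the pipelines you're weighing, e.g. `["hybrid-pipeline-lexical60-knn40", "hybrid-pipeline-lexical30-knn70", "rrf-pipeline"]`, then compare which ordering looks most useful.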