Adding Semantic Search to Elasticsearch/OpenSearch with MistralAI Embeddings
This is the next installment in a series intended to help you deploy your own search engine using Elasticsearch.
Schedule a free consultation to see how we can create a customized plan to meet your search needs.
We'll be using the NestJS application from our Supercharge Your NestJS App with Hosted Search article, which you can grab at https://github.com/omc/bonsai-examples ! In this article, we'll start on the `main` branch, and work towards parity of this application on the `nestjs/semantic-search-basics` branch.
Once you've cloned the repository, follow these steps to get started:
- Navigate to the `nodejs/nestjs` directory.
- Install the dependencies with `npm install`.
- Copy the development environment file: `cp ./env-example ./.env`
- Start the `postgres` and `elasticsearch` services with docker compose: `docker compose up postgres elasticsearch`
Important
If you've been following along with our previous blog post, make sure to clean up your
PostgreSQL and Elasticsearch databases by running the following commands:
npm run seed:revert:search
npm run seed:revert:relational
It's also possible to update the index and data, but we're going to save some time by truncating and dropping the previous data for this demonstration!
Now you're ready to open the project in your favorite text editor or IDE, and dive into the next section!
What is semantic search, anyway?
We'll get to this section's namesake question in a moment, but first, it's worth discussing what an Elasticsearch or OpenSearch index is!
What is an Elasticsearch / OpenSearch index, anyway?
Elasticsearch indexes are data structures known as inverted indexes, which map things like words to a document or set of documents. It can help to think of one abstractly as a key-value store, in which the keys are, say, words like "element" and the values are arrays of document identifiers.
So, when I searched for, "The third element," at the end of our previous post, the results returned were found by comparing the individual search terms "the", "third", and "element" against the inverted index, which then returned the documents containing those words. That result set was sorted by relevancy, which in that particular case simply meant which documents matched the highest number of terms with the query that was input, so it's no surprise that our first result was a document with the title, "the fifth element"!
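As a sketch of that mental model, here's a toy inverted index in TypeScript. This is purely illustrative — the document ids and titles are made up, and real Lucene postings lists are far more sophisticated:

```typescript
// A toy inverted index: each term maps to the set of document ids containing it.
type DocId = number;

const documents: Record<DocId, string> = {
  1: "the fifth element",
  2: "the third man",
  3: "gattaca",
};

// Build the index: each lowercased term points at the documents containing it.
function buildInvertedIndex(docs: Record<DocId, string>): Map<string, Set<DocId>> {
  const index = new Map<string, Set<DocId>>();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of text.toLowerCase().split(/\s+/)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term)!.add(Number(id));
    }
  }
  return index;
}

// Score each document by how many query terms it matches, highest first.
function search(index: Map<string, Set<DocId>>, query: string): DocId[] {
  const scores = new Map<DocId, number>();
  for (const term of query.toLowerCase().split(/\s+/)) {
    for (const id of index.get(term) ?? []) {
      scores.set(id, (scores.get(id) ?? 0) + 1);
    }
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const index = buildInvertedIndex(documents);
// Docs 1 and 2 each match two query terms; doc 3 matches none.
console.log(search(index, "the third element")); // [1, 2]
```

Note that docs 1 and 2 tie here on term count — exactly the kind of situation where a real relevancy formula (like BM25, below) breaks the tie more intelligently.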
Info
The "standard" index created in our previous post is pretty basic, providing a direct mapping between terms contained in all documents to which documents they were contained in. This type of index supports keyword search powered by the Okapi BM25 algorithm at time of writing.
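For intuition about that BM25 scoring, here's a sketch of its per-term formula in TypeScript. It's a simplification — Lucene's implementation differs in details — and the parameter defaults are just the conventional k1 = 1.2 and b = 0.75:

```typescript
// A sketch of the BM25 term-scoring formula, for intuition only.
function bm25TermScore(
  tf: number,        // term frequency within the document
  df: number,        // number of documents containing the term
  numDocs: number,   // total documents in the index
  docLen: number,    // length of this document, in terms
  avgDocLen: number, // average document length across the index
  k1 = 1.2,          // term-frequency saturation parameter
  b = 0.75,          // length-normalization parameter
): number {
  // Rare terms get a higher inverse-document-frequency weight.
  const idf = Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
  // Repeated terms help, but with diminishing returns; long docs are normalized down.
  const norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * docLen) / avgDocLen));
  return idf * norm;
}

// A rare term (low df) scores higher than a common one at the same term frequency:
const rare = bm25TermScore(2, 3, 1000, 100, 100);
const common = bm25TermScore(2, 900, 1000, 100, 100);
console.log(rare > common); // true
```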
Thankfully, many community contributors have worked on extending this functionality with various "analyzers," which add additional layers of, well, analysis to both the index and the search functionality the index enables.
For example, the "english" language analyzer applies a number of such analysis transformations. One of these, the "stemming" algorithm, reduces words to their common root form. That is, the words "walks," "walking," and "walked" will all be reduced down to their root word, "walk," in the index. So, when you search for "walk taco" on an index of incredible snack food recipes which makes use of the "english" analyzer, one of your top results will probably be a highly relevant "walking tacos" recipe. Assuming that the recipe index is truly incredible, that is.
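To make the stemming idea concrete, here's a deliberately naive suffix-stripping stemmer in TypeScript. Real analyzers use much more careful algorithms (Porter stemming, for example), so treat this purely as an illustration:

```typescript
// A toy stemmer: strip a few common suffixes, keeping at least 3 root characters.
// Real stemmers handle far more cases (e.g. "ponies" -> "poni", doubled consonants).
function naiveStem(word: string): string {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

// All three inflections collapse to a single index term:
console.log(["walks", "walking", "walked"].map(naiveStem)); // ["walk", "walk", "walk"]
```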
What is semantic search, anyway? Take two.
If the default search mechanism is keyword search, then you can think of semantic search as a way to perform a sort of spatial search over text.
Info
Below, we use an analogy associating vector embeddings as "coordinates," which isn't strictly accurate, but certainly is helpful!
In our spatial semantic search, the "coordinates" of each item are denoted by a vector; that is, the embeddings we interact with will be represented as arrays of numbers. When we review a result set from a query on a semantic search index, we're looking at results ranked by the relatedness of our query to the words and phrases associated with each document in our index, based on the "coordinates" of the query we entered and the "coordinates" of the documents.
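To make that "relatedness of coordinates" idea concrete, here's a small TypeScript sketch using cosine similarity, one common way vector closeness is measured. The tiny 3-dimensional vectors are invented for illustration; real embeddings, like mistral-embed's, have 1024 dimensions:

```typescript
// Cosine similarity between two vectors: 1.0 means "pointing the same direction"
// (very related, under the coordinates analogy), near 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Tiny made-up 3-dimensional "embeddings":
const movieAboutSpace = [0.9, 0.1, 0.0];
const movieAboutAliens = [0.8, 0.2, 0.1];
const movieAboutCooking = [0.0, 0.1, 0.9];

// The space movie sits much closer to the alien movie than to the cooking one:
console.log(
  cosineSimilarity(movieAboutSpace, movieAboutAliens) >
    cosineSimilarity(movieAboutSpace, movieAboutCooking),
); // true
```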
Building Semantic Search with MistralAI Embeddings in Elasticsearch/OpenSearch
Note
We pre-processed the Movie Dialog Corpus used in our previous post to create a denormalized dataset of movie titles along with the script of utterances/lines mentioned by characters in each movie.
You can do this yourself by performing an `INNER JOIN` on the `movie_characters_metadata`, `movie_lines`, and `movie_titles_metadata` files/datasets.
Now that we have a rough understanding of how semantic search is different from keyword search, the question we might ask ourselves is, "can we implement semantic search in Elasticsearch/OpenSearch?" As you might imagine, given the leading question, yes - we definitely can implement semantic search in both Elasticsearch and OpenSearch!
Whether the semantic search index on its own is "better" depends on the relevancy of the returned results given our expected query workload!
Hint
Semantic search can be a wonderful tool in the search-practitioner's tool-box, and is often used to augment a search result-set alongside other optimization and search techniques, creating what is known as a "hybrid" search implementation.
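As a taste of what "hybrid" can mean in practice, here's a sketch of reciprocal rank fusion (RRF), one common technique for merging a keyword ranking and a semantic ranking without directly comparing their incompatible scores. The document titles here are invented, and k = 60 is just the customary constant:

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// so documents that rank well in several lists float to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Invented result lists from a keyword search and a semantic search:
const keywordResults = ["the fifth element", "elemental", "element of crime"];
const semanticResults = ["invaders from mars", "the fifth element", "brazil"];

// "the fifth element" appears high in both lists, so it fuses to the top:
console.log(reciprocalRankFusion([keywordResults, semanticResults])[0]);
// "the fifth element"
```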
Updating the Movie data model to contain the movie script
The first thing we need to do is add a `script` column to our `Movie` entity:
- Open up the `src/movies/infrastructure/persistence/relational/entities/movie.entity.ts` file.
- Add the following lines to the `MovieEntity` class:
@ApiProperty({
type: String,
example: '...\nUTAPAN: Chief says...\n...',
})
@Column({ type: String, nullable: true })
script: string;
Now, we're ready to generate the database migration for the new `script` column:
npm run migration:generate src/database/migrations/MovieAddScript
Which we can apply with this command:
npm run migration:run
Supporting search over Movie scripts
Enable selection of which Elasticsearch/OpenSearch index field to target
Since we might want to continue searching by movie title, we'll give users the option to select which field(s) they want to search over. This currently allows for `title`, `script`, and combinations of the two.
1. First, open up the `src/movies/dto/query-movie.dto.ts` file, and:
1a) Add these dependencies up top:
import {
IsArray,
IsEnum
} from 'class-validator';
...
1b) In the same `query-movie.dto.ts` file, define the `SearchTarget` enum:
// src/movies/dto/query-movie.dto.ts
...
export enum SearchTarget {
Title = 'title',
Script = 'script',
}
...
1c) And finally, in the same file, inside the `QueryMovieDto` class, add the `targets` query parameter:
...
export class QueryMovieDto {
...
@ApiProperty({
type: [String],
default: [SearchTarget.Title],
required: false,
})
@IsEnum(SearchTarget, { each: true })
@IsOptional()
@IsArray()
targets: SearchTarget[];
}
...
2. Next, open up our search document interface, located at `src/movies/interfaces/movie-search-document.interface.ts`, and add the `script` field. This is what your file should look like after the change:
export interface MovieSearchDocument {
id: number;
title: string;
script: string;
}
3. Next, update `src/movies/movies-search.service.ts`, which sends the search query over to Elasticsearch, to make use of our new `targets` parameter. There are a couple of major changes here, described below. Make sure your `search` function looks like the one in the snippet!
3a) Update the function parameters to include `targets`.
3b) Change the query type from `match` to `multi_match` (over our targets).
export class MoviesSearchService {
...
async search(
text: string,
targets: string[],
offset?: number,
limit?: number,
startId = 0,
) {
let separateCount = 0;
if (startId) {
separateCount = await this.count(text, targets);
}
const params: RequestParams.Search = {
index: this.index,
from: offset,
size: limit,
body: {
query: {
multi_match: {
query: text,
fields: targets,
},
},
},
};
const { body: result } = await this.elasticsearchService.search(params);
const count = result.hits.total.value;
const results = result.hits.hits.map((item) => item._source);
return {
count: startId ? separateCount : count,
results,
};
}
...
}
4. Accordingly, we need to pass our new `targets` query parameter to the `MoviesService` from within the `findAll` function of `src/movies/movies.controller.ts`. It's a big file and function, so I've only included the code for the existing `search` if conditional. Update this section of code to match the snippet below:
export class MoviesController {
...
async findAll(...): ... {
...
if (search) {
return await this.moviesService.search(
search,
query.targets,
query?.offset,
query?.limit,
query?.startId,
);
}
...
}
}
5. And, finally, we can update our top-level `src/movies/movies.service.ts` to make use of the new parameter:
5a) Import the `SearchTarget` enum:
import { SearchTarget } from './dto/query-movie.dto';
5b) Update the `search` method's signature and its use of the `moviesSearchService.search` method:
export class MoviesService {
constructor(
private readonly movieRepository: MovieRepository,
private readonly moviesSearchService: MoviesSearchService,
) {}
...
async search(
text: string,
targets: SearchTarget[], // Add this line here!
offset?: number,
limit?: number,
startId?: number,
) {
const { results, count } = await this.moviesSearchService.search(
text,
targets,
offset,
limit,
startId,
);
...
}
...
}
To test out our new functionality, start the development server:
npm run start:dev
Optionally, check out a few requests, with different arguments for our `targets` query parameter:
1. Seed the database:
npm run seed:run:relational
npm run seed:run:search
2. Query away!
# Default value (['title'])
curl "localhost:3000/api/v1/movies?search=element"
# Specify ['script'] (singular array entry, should return a count of 0)
curl "localhost:3000/api/v1/movies?search=element&targets[]=script"
# Specify both ['title', 'script'] (multiple array entries)
curl "localhost:3000/api/v1/movies?search=element&targets[]=script&targets[]=title"
# Specify an invalid entry, 'script_embedding_vector'
curl "localhost:3000/api/v1/movies?search=element&targets[]=script_embedding_vector"
Tip
This feature is a prime target for unit and integration tests!
Warning
Don't forget to revert those changes before continuing!
npm run seed:revert:relational
npm run seed:revert:search
Seed Elasticsearch/OpenSearch with the Movie script content for search
Our current seeding setup looks like this:
- Seed the relational database with data.
- Seed the search index with data from the relational database, and reference the relational database by using its generated unique identifiers (the `id` column).
To keep the same simple seed workflow, we're going to first push the script data into the relational database, and let the search database pull that data in via the search seed scripts!
The new format of our seed data is CSV. Grab it here, and place it at `nodejs/nestjs/src/database/seeds/relational/movie/movie_titles_and_script.csv`!
We'll first need to install a parser; the csv-parse package will do the trick:
npm add csv-parse
Then, update our `src/database/seeds/relational/movie/movie-seed.service.ts` file to use the new seed data. There aren't many changes, but it's a big file and the changes aren't very interesting, so it might be easiest to copy/paste this content into your file once you've reviewed it:
// movie-seed.service.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { MovieEntity } from '../../../../movies/infrastructure/persistence/relational/entities/movie.entity';
import { createReadStream } from 'fs';
import { parse } from 'csv-parse';
import path from 'path';
@Injectable()
export class MovieSeedService {
constructor(
@InjectRepository(MovieEntity)
private repository: Repository<MovieEntity>,
) {}
async run() {
const countMovies = await this.repository.count({});
if (!countMovies) {
const parser = createReadStream(
path.join(__dirname, 'movie_titles_and_script.csv'),
).pipe(parse({ delimiter: ',', from_line: 2 }));
/* In a real application, you might consider using a bulk command, like PostgreSQL's
COPY (https://www.postgresql.org/docs/current/sql-copy.html)
*/
for await (const data of parser) {
await this.saveCsvRow(
this.repository,
...(data as [string, string, string, string, string, string, string]),
);
}
}
}
async revert() {
await this.repository.clear();
}
/* eslint-disable @typescript-eslint/no-unused-vars */
async saveCsvRow(
repository: Repository<MovieEntity>,
id: string,
title: string,
year: string,
rating: string,
numVotes: string,
categoriesUnprocessed: string,
script: string,
): Promise<void> {
await repository.save(
repository.create({
title: title,
script: script,
}),
);
}
}
With that, we can seed both our relational and search databases:
npm run seed:run:relational
npm run seed:run:search
Running this query for the word "element" against the `script` field in the Elasticsearch index should return 21 movies:
curl "localhost:3000/api/v1/movies?search=element&targets[]=script"
Warning
Don't forget to revert those changes before continuing!
npm run seed:revert:relational
npm run seed:revert:search
Adding semantic search over text documents with Bonsai Elasticsearch, OpenSearch, and MistralAI
Swap Elasticsearch 7.10.2 for OpenSearch 2.17.1 in development
For our next trick, we'll be implementing semantic search on OpenSearch.
Warning
While Bonsai.io's version of Elasticsearch supports a k-NN plugin similar to OpenSearch's, it's not, for practical purposes, possible to quickly install the plugin on our local Elasticsearch 7.10.2 container.
What's a soul to do? Well, OpenSearch 2.17.1 will act as a drop-in replacement for our purposes! We won't even need to change our `package.json` to swap in a different client for this exercise!
1. First, stop the running `postgres` and `elasticsearch` services:
docker compose stop postgres elasticsearch
# OR, hit ctrl + C on the terminal window where the docker compose services are running
2. Next, remove the `elasticsearch` service from our `docker-compose.yml`.
3. Then, add in the `opensearch` service:
services:
...
# Apache License v2.0, Oct 01, 2024
# Courtesy of https://opensearch.org/downloads.html
opensearch:
image: opensearchproject/opensearch:2.17.1
environment:
- discovery.type=single-node
- plugins.security.disabled=true
- http.host=0.0.0.0
- transport.host=127.0.0.1
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=passRT%^#234
- DISABLE_SECURITY_PLUGIN=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- opensearch-data:/usr/share/opensearch/data
ports:
- "9200:9200"
# End Apache License v2.0
...
4. And add the `opensearch-data` volume to the `volumes` section:
...
volumes:
boilerplate-db:
opensearch-data:
5. And, let's update our `.env` file accordingly. Here's the diff; we're really just changing the username and password our app will connect with:
-ELASTICSEARCH_USERNAME=elastic
-ELASTICSEARCH_PASSWORD=
+ELASTICSEARCH_USERNAME=admin
+ELASTICSEARCH_PASSWORD=passRT%^#234
Now, we can start the new `opensearch` service alongside `postgres`:
docker compose up postgres opensearch
In another terminal window, run the database seed scripts:
npm run seed:run:relational
npm run seed:run:search
Start or restart the application server:
npm run start:dev
And, you should see a count of 21 results when running this query:
curl "localhost:3000/api/v1/movies?search=element&targets[]=script"
Warning
Don't forget to revert those changes before continuing!
npm run seed:revert:relational
npm run seed:revert:search
Add a vector field to our Elasticsearch/OpenSearch index
With OpenSearch installed via docker, we have access to the `k-NN` plugin, which will let us build an index optimized for what's known as "nearest-neighbor" search. Think back to our coordinate system analogy at the top of the post: this is how we'll find documents which are similar to each other!
To get this done, we'll need to update our `movies` index definition. Open up the file at `src/movies/movies-search.service.ts`, and update the `createIndex` method so that the index enables k-NN and maps the new vector field.
Recall that our embedding is a vector, which will be an array of floating point numbers. The number of dimensions, or entries, in our array depends on the model we use. We'll be using the mistral-embed MistralAI model, which returns 1024 dimensions per embedding. Going back to our earlier analogy, that means that when we look for similar documents, our search will be able to use all 1024 entries as coordinates to hone in on the correct results.
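Here's a sketch of what the updated index body might look like. Treat this as an assumption-laden outline rather than the exact code from the branch: the `index.knn` setting and the `knn_vector` field type come from OpenSearch's k-NN plugin documentation, and the 1024 dimension matches mistral-embed's output as discussed above:

```typescript
// A sketch of an index body enabling k-NN with a 1024-dimensional vector field.
// Field names mirror the ones used elsewhere in this post.
const createIndexBody = {
  settings: {
    index: {
      knn: true, // enable the k-NN plugin for this index
    },
  },
  mappings: {
    properties: {
      id: { type: 'integer' },
      title: { type: 'text' },
      script: { type: 'text' },
      script_embedding_vector: {
        type: 'knn_vector',
        dimension: 1024, // mistral-embed returns 1024-dimensional embeddings
      },
    },
  },
};

// Inside createIndex, a body like this would be passed to the client's
// indices.create call, e.g.:
// await this.elasticsearchService.indices.create({ index: this.index, body: createIndexBody });
console.log(createIndexBody.mappings.properties.script_embedding_vector.dimension); // 1024
```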
Integrate MistralAI into the NestJS application for semantic search with Elasticsearch/OpenSearch
Important
For this step, you'll need a MistralAI API key. The MistralAI API costs money to use, and so the steps below may incur charges against your account.
See MistralAI's documentation for details on how to create an API key and the associated pricing.
If the movie embeddings that we'll store are akin to coordinates in an n-dimensional space, then in order to search against those coordinates - by way of finding nearby (that is, similar) documents - we'll need both the documents and the search query to be converted into a set of coordinates by the same model.
Creating MistralAI Embeddings for existing data and documents, and storing them in Elasticsearch/OpenSearch
For our documents that already exist, and will be searched against, we'll need to store their "coordinate" (that is, the embedding representation) form in Elasticsearch/OpenSearch.
- First, add your MistralAI API key to your `.env` file:
# .env
MISTRALAI_API_KEY="..."
- Install the official `mistralai` library:
npm add @mistralai/mistralai
- Then update our seed script to add embeddings into our search database (`src/database/seeds/search/movie/movie-seed.service.ts`):
3a) Import `mistralai`'s typescript package:
// movie-seed.service.ts
import { Mistral } from "@mistralai/mistralai";
3b) Replace the `run` method of our `MovieSeedService` with the implementation below. This new version will reach out to the MistralAI embedding service for each movie, and store the embedding result in a non-database-backed field of the `MovieEntity` called `script_embedding_vector`. Once all movies have been processed, it'll perform a bulk insert of these updated movies into Elasticsearch/OpenSearch!
// movie-seed.service.ts
// Note: this snippet also relies on `Logger` (from '@nestjs/common') and on
// `RateLimiterMemory` and `RateLimiterRes` (from the 'rate-limiter-flexible'
// package, installable with `npm add rate-limiter-flexible`).
async run() {
// If index doesn't exist, create it and seed the database
const exists = await this.moviesSearchService.existsIndex();
if (!exists.body) {
await this.moviesSearchService.createIndex();
await this.moviesSearchService.statusIndex();
// Fetch movie titles from our relational database
const insertedMovies = await this.repository.find({});
// Add MistralAI Embeddings
//
// Note: In a production app, this should probably be done in the background,
// along with an update of the document in Elasticsearch/OpenSearch.
const aiClient = new Mistral({
apiKey: process.env.MISTRALAI_API_KEY,
});
const embeddingRequests: Array<Promise<void>> = [];
const moviesLen = insertedMovies.length;
const rateLimiter = new RateLimiterMemory({
points: 6, // 6 points
duration: 1, // Per second
});
for (const movie of insertedMovies) {
const index: number = insertedMovies.indexOf(movie);
let shouldRetry = true;
while (shouldRetry) {
await rateLimiter
.consume('aiEmbedding', 3)
.then(() => {
shouldRetry = false;
Logger.log(
'AI Embedding: processing ' +
movie.title +
' (' +
(index + 1).toString() +
' of ' +
moviesLen +
')',
);
embeddingRequests.push(
this.generateScriptEmbeddings(aiClient, movie),
);
})
.catch(async (rlres: RateLimiterRes) => {
// Not enough points, wait for the specified duration
await new Promise((resolve) =>
setTimeout(resolve, rlres.msBeforeNext),
);
});
}
}
Promise.all(embeddingRequests)
.then(async () => {
// Index the movies discovered
await this.moviesSearchService.indexMovies(insertedMovies);
Logger.log(
'Completed indexing of movies (' +
insertedMovies.length.toString() +
' total)',
);
})
.catch((err) => {
Logger.log('Encountered fatal error during AI embedding: ', err);
});
Logger.log('Search seeding complete!');
}
}
Note
At the time of writing, MistralAI's embedding model (`mistral-embed`) is limited to a maximum of 8192 input tokens, which is roughly 6000 words of text. More details at MistralAI's website!
While there are strategies around this challenge, for our demonstration purposes, we'll be truncating our input.
We've also used a few magic numbers to rate-limit our requests to the MistralAI Embeddings API, because it defines its limits based on `tokens`. There are ways to accurately estimate/calculate the number of tokens per document, but we'll leave that as an exercise for the reader!
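If you do want a rough guard, a common heuristic assumes about 4 characters per token for English text. Here's a small sketch — an approximation only, with helper names of our own invention; an accurate count would require the model's actual tokenizer:

```typescript
// Rough token estimate using the ~4-characters-per-token heuristic for English.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Truncate a script so its estimated token count stays within a budget
// (mistral-embed's documented limit is 8192 tokens).
function truncateToTokenBudget(text: string, maxTokens = 8192): string {
  const maxChars = maxTokens * 4;
  return text.slice(0, maxChars);
}

const longScript = "a".repeat(50000);
console.log(estimateTokens(longScript)); // 12500
console.log(estimateTokens(truncateToTokenBudget(longScript)) <= 8192); // true
```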
3c) And, add this method, which will handle the actual interaction with MistralAI's Embedding service, to the `MovieSeedService`:
// movie-seed.service.ts
export class MovieSeedService {
...
async generateScriptEmbeddings(
aiClient: Mistral,
movie: MovieEntity,
numCharsAllowed: number = 15000,
) {
const script_embedding = await aiClient.embeddings.create({
model: 'mistral-embed',
inputs: movie.script.slice(0, numCharsAllowed),
});
// We're only requesting one embedding; so there should only be one entry!
if (script_embedding !== undefined && script_embedding.data.length === 1) {
if (Array.isArray(script_embedding.data[0].embedding)) {
movie.script_embedding_vector = script_embedding.data[0].embedding;
}
}
}
}
Transform all incoming search queries into MistralAI Embeddings for Semantic search in Elasticsearch/OpenSearch
Tip
As you might intuit, there are a number of optimizations that can be made in this part of
the search experience flow. From query-embedding caching to result caching, and all sorts
of indirection in-between, feel free to add optimizations as you go!
All we need to do now is implement the ability for our search endpoint to target the `script_embedding_vector` field of our Elasticsearch/OpenSearch index!
1. First things first: we need to add `script_embedding_vector` to our `SearchTarget` enum in `src/movies/dto/query-movie.dto.ts`. Add the new field - here's the diff:
// query-movie.dto.ts
export enum SearchTarget {
Title = 'title',
Script = 'script',
+ ScriptEmbeddingVector = 'script_embedding_vector',
}
2. Next, we'll need to handle these new types of queries a little differently: we want to use the `knn` plugin to search our "coordinate" space for the nearest relevant items! We'll be updating the `MoviesSearchService` defined in `src/movies/movies-search.service.ts` for this:
2a) First, add our new `import`s:
import { Mistral } from '@mistralai/mistralai';
import { SearchTarget } from './dto/query-movie.dto';
2b) Next, update the `search` method to handle queries against the `script_embedding_vector` a little differently, by first transforming the incoming query into a MistralAI embedding, and then sending that newly generated embedding to Elasticsearch/OpenSearch using the `knn` query type. We're also excluding the `script` and `script_embedding_vector` fields from our search result set.
Tip
This sort of transformation can go wherever makes the most sense to you. For NestJS, it might also make sense to move all interactions with the MistralAI API to an injectable service via a Provider!
...
@Injectable()
export class MoviesSearchService {
...
async search(
text: string,
targets: string[],
offset?: number,
limit?: number,
startId = 0,
) {
let separateCount = 0;
if (startId) {
separateCount = await this.count(text, targets);
}
// Default query handling is to perform a text query
let query: Record<string, any> = {
multi_match: {
query: text,
fields: targets,
},
};
// If this is a query targeting the script embedding vector, adjust
// the query accordingly!
if (
targets.length === 1 &&
targets.includes(SearchTarget.ScriptEmbeddingVector)
) {
const aiClient = new Mistral({
apiKey: process.env.MISTRALAI_API_KEY,
});
query = {
knn: {},
};
query['knn'][SearchTarget.ScriptEmbeddingVector] = {
vector: await this.generateQueryEmbeddings(aiClient, text),
k: 5,
};
}
const params: RequestParams.Search = {
index: this.index,
from: offset,
size: limit,
_source_excludes: [
SearchTarget.Script,
SearchTarget.ScriptEmbeddingVector,
],
body: {
query: query,
},
};
const { body: result } = await this.elasticsearchService.search(params);
const count = result.hits.total.value;
const results = result.hits.hits.map((item) => item._source);
return {
count: startId ? separateCount : count,
results,
};
}
}
2c) Lastly for this file, we'll add our MistralAI interaction helper inside of the `MoviesSearchService` class:
...
@Injectable()
export class MoviesSearchService {
...
async generateQueryEmbeddings(
aiClient: Mistral,
query: string,
numCharsAllowed: number = 15000,
): Promise<number[] | null> {
const query_embedding = await aiClient.embeddings.create({
model: 'mistral-embed',
inputs: query.slice(0, numCharsAllowed),
});
// We're only requesting one embedding; so there should only be one entry!
if (query_embedding !== undefined && query_embedding.data.length === 1) {
if (Array.isArray(query_embedding.data[0].embedding)) {
return Promise.resolve(query_embedding.data[0].embedding);
}
}
return Promise.resolve(null);
}
}
3. OPTIONAL: To clean things up a bit, we'll also update our data hydration in the `MoviesService` at `src/movies/movies.service.ts`. Currently, once we receive our search result set, we fetch a number of fields from the relational database, based on a match against the `id` property for documents in both databases. We'll specify that we only want the `id` and `title` database columns in our results. Here's an abbreviated diff of what that looks like:
// movies.service.ts
export class MoviesService {
async search(...) {
...
const data = await this.movieRepository.find({
+ select: ['id', 'title'],
where: { id: In(ids) },
});
}
}
And we're done!
Let's test out our semantic search with a query against the same term, "element", but targeting the new `script_embedding_vector` field of the index:
curl "localhost:3000/api/v1/movies?search=element&targets[]=script_embedding_vector"
Our results should include more than just the only movie with "element" in the title (that is, "The Fifth Element") but fewer results than a lexical/keyword search against all of the movie scripts (recall, there were 21 results earlier):
{
"data": [
{
"id": 1240,
"title": "the fifth element"
},
{
"id": 1329,
"title": "invaders from mars"
},
{
"id": 1444,
"title": "the time machine"
},
{
"id": 1513,
"title": "brazil"
},
{
"id": 1684,
"title": "the nightmare before christmas"
}
],
"count": 5
}
Huzzah! We're getting more relevant results to our query than we were getting with a keyword match!
Next steps for your newly semantic NestJS application
We've successfully enhanced our NestJS application with powerful semantic search capabilities! Through the integration of Elasticsearch and vector embeddings, we've transformed basic text matching into an intelligent search system that truly understands the meaning behind queries. Our implementation demonstrates how modern search technology can significantly improve the user experience by returning contextually relevant results.
Ready to elevate your search experience? Try Elasticsearch Enterprise Search to implement semantic search in your own applications, or reach out to our team to learn how we can help you build more intelligent search solutions.
If you're interested in a deeper dive on some of the topics we touched on in this post, and a primer on topics we'll dive into as we implement a hybrid search model, check out our post on What AI Engineers Should Know about Search!