
LLM RAG for Beginners: A Practical Guide with Elasticsearch and OpenSearch


Have you ever tried searching for a movie quote with just a vague phrase or feeling? We've all been there, and sometimes, finding exactly what we're looking for can be tough. Retrieval-Augmented Generation (RAG) offers a more intuitive approach, allowing us to search with the fluidity of human memory.

RAG blends the power of Large Language Models (LLMs) with the precision of information retrieval systems like Elasticsearch and OpenSearch. It moves beyond simple keyword matching, using LLMs to understand the nuances of our search intent and deliver relevant results along with their context.

For example, imagine trying to recall that iconic line from The Fifth Element where Zorg says, "Time not important, only life important." Even if you only remember the phrase "life important," RAG can pinpoint the exact quote and provide context.

Pre-requisites

Before we dive into building our RAG pipeline, let's get our tech stack in order. We'll be using the following:

Bonsai.io Sandbox

Bonsai.io provides fully managed OpenSearch clusters, making it incredibly easy to get started without any complex installation or configuration. We'll leverage a free Bonsai Sandbox for this tutorial. You can sign up for an account and launch a cluster at bonsai.io.

Once your Bonsai sandbox cluster is created, you'll see your credentials in the cluster overview page.

Cornell Movie-Dialogs Corpus

This rich dataset contains conversations extracted from movie scripts. We'll use this corpus to populate our OpenSearch indexes.

The Cornell Movie-Dialogs Corpus is part of Cornell's ConvoKit project, a toolkit for analyzing conversations. You can find the dataset and learn more about ConvoKit at github.com/CornellNLP/ConvoKit.

Download the movie-corpus.zip file and extract it to a location that can be referenced by our code later on.
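Once extracted, the corpus stores its conversation data as JSON Lines files (one JSON object per line). A small helper like the one below can parse them generically; the exact field names can differ between ConvoKit releases, so inspect your download before mapping records onto index fields:

```typescript
// Parse a JSON Lines string (as found in the extracted corpus files) into an
// array of records. Field names are left untouched here on purpose, since
// they vary by ConvoKit release.
function parseJsonLines(jsonl: string): Record<string, unknown>[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}
```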

OpenAI Text API

OpenAI's GPT-4o mini model is perfect for our small, focused prompts, and it's quite affordable!

For this tutorial, you'll need an OpenAI API key. The OpenAI API is a paid service, so the steps below may incur charges against your account.

See OpenAI's documentation for details on how to create an OpenAI API Key and associated pricing.

Understanding Retrieval-Augmented Generation (RAG)

RAG enhances LLMs by connecting them to external knowledge sources. Think of it as giving your LLM a library card to access a vast collection of information, allowing it to generate responses grounded in factual data.

But while you might remember details about all of the books you've read at the library, the LLM has a limited ability to keep information in its "working memory" (or context) - so we need to help it by filtering the external knowledge down to what is most relevant to the task at hand.

To that end, RAG involves two key steps:

  1. Retrieval: Finding the most relevant information from your knowledge base.
  2. Generation: Feeding in the most relevant information to the LLM, in order for it to generate a comprehensive response with the added context.
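The two steps above can be sketched as plain functions. Note that `retrieve` and `buildPrompt` are our own placeholder names, not library APIs - `retrieve` stands in for the OpenSearch query and `buildPrompt` for the context-stuffed LLM prompt we build later in this tutorial:

```typescript
// Minimal sketch of the retrieve-then-generate flow.
type Doc = { id: string; text: string };

// Retrieval: naive term-overlap scoring as a stand-in for OpenSearch ranking.
function retrieve(query: string, docs: Doc[], topK = 1): Doc[] {
  const terms = query.toLowerCase().split(/\s+/);
  return docs
    .map((doc) => ({
      doc,
      score: terms.filter((t) => doc.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.doc);
}

// Generation: in the real pipeline, this prompt goes to the LLM, which
// answers using only the retrieved context.
function buildPrompt(query: string, context: Doc[]): string {
  const lines = context.map((d) => d.text).join("\n");
  return `Answer using only this context:\n${lines}\n\nQuestion: ${query}`;
}
```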

Setting Up Your OpenSearch Environment

Now that you have a Bonsai Sandbox cluster up and running, let's get our movie data indexed in OpenSearch. We'll be using the Cornell Movie-Dialogs Corpus, which we downloaded in our prerequisites. But first, let's visualize how we'll organize this data.

Understanding the Data Structure

Since this particular dataset is a bit denormalized, we'll create and use two indexes:

  • speakers: Details about the speaking characters in each movie.
  • utterances: A detailed index of all the conversations within the movies, line by line, with speaker and movie identified.
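For concreteness, here's the shape of a document in each index. The values follow the corpus's identifier conventions (`u*` speakers, `m*` movies, `L*` lines) but are examples for illustration, not guaranteed corpus records:

```typescript
// Illustrative documents matching the index mappings we create below.
const speakerDoc = {
  speakerId: "u0",
  movieId: "m0",
  gender: "f",
  script: "BIANCA", // the character's name as credited in the script
  movieName: "10 things i hate about you",
};

const utteranceDoc = {
  id: "L1045",
  conversationId: "L1044",
  text: "They do not!",
  speaker: "u0",    // references speakerDoc.speakerId
  movieId: "m0",
  replyTo: "L1044", // the utterance this line responds to
};
```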

Creating and Indexing the Movie Data

Remember to set the BONSAI_CLUSTER_URL environment variable to safely access your Bonsai Cluster's credentials within your code!

Now, let's create the index mappings for our speakers and utterances indexes:

// Deno-style npm import; set BONSAI_CLUSTER_URL in your environment first.
import { Client } from "npm:@opensearch-project/opensearch";

const client = new Client({
    node: Deno.env.get("BONSAI_CLUSTER_URL"),
});

// --- Speakers Index ---
const speakersIndexName = "speakers";
const speakersIndexBody = {
    settings: {
        number_of_shards: 1,
        number_of_replicas: 0,
    },
    mappings: {
        properties: {
            speakerId: { type: "keyword" },
            movieId: { type: "keyword" },
            gender: { type: "keyword" },
            script: { type: "text" },
            movieName: { type: "text" },
        },
    },
};

await client.indices.create({ 
    index: speakersIndexName, 
    body: speakersIndexBody 
});

// --- Utterances Index ---
const utterancesIndexName = "utterances";
const utterancesIndexBody = {
    settings: {
        number_of_shards: 1,
        number_of_replicas: 0,
    },
    mappings: {
        properties: {
            id: { type: "keyword" },
            conversationId: { type: "keyword" },
            text: { type: "text" },
            speaker: { type: "text" },
            movieId: { type: "keyword" },
            replyTo: { type: "keyword" },
        },
    },
};

await client.indices.create({ 
    index: utterancesIndexName, 
    body: utterancesIndexBody 
});

For the purposes of this demonstration, we're only indexing a handful of movies' utterance data, filtered by a regular expression on their corpus ID.
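A sketch of how that filtered bulk indexing might look. The regular expression here (movies m0 through m9) is just an example filter, and the document shape assumes our utterances mapping:

```typescript
// Build an OpenSearch _bulk body: an action line paired with each document,
// keeping only utterances whose movieId matches the corpus-ID pattern.
const corpusIdPattern = /^m[0-9]$/;

function buildBulkBody(
  utterances: { id: string; movieId: string }[],
  indexName = "utterances",
): object[] {
  return utterances
    .filter((u) => corpusIdPattern.test(u.movieId))
    .flatMap((u) => [{ index: { _index: indexName, _id: u.id } }, u]);
}

// Usage with the client from earlier:
// await client.bulk({ body: buildBulkBody(allUtterances) });
```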

Building the RAG Pipeline

With our OpenSearch environment set up and movie data indexed, we're ready to assemble the pieces of our RAG pipeline. This involves three main steps:

Step 1: Query Parsing with an LLM

The first step is to understand what the user is asking. We'll use an LLM to analyze their natural language query and extract key information:

import OpenAI from "npm:openai";

// The SDK reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

const userQuery = "what's that line from The Fifth Element about life being important?";

const prompt = `
You are a helpful AI assistant that can analyze search queries related to movies.

Here's a user query: ${userQuery}

Based on this query, identify the following:
- Category: Choose from the following categories: "quote recall", "significant event",
  "plot explanation", "character information". If none of these fit, choose "unknown".
- Content: Extract the specific phrase or words related to the identified category.
- Movie: If the query contains a probable movie title, extract it into this field.
- Quote: If the query contains part of a quote, extract it into this field.

Provide your answer in JSON format.`;

const response = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o-mini',
    // Ask for a JSON object so the answer parses reliably.
    response_format: { type: 'json_object' },
});

const parsedQuery = JSON.parse(response.choices[0].message.content ?? '{}');

Step 2: Retrieving Relevant Documents

Now that we understand the user's request, let's search our OpenSearch indexes:

// possibleMovieId is assumed to be resolved beforehand, e.g. by matching
// parsedQuery.movie against movieName in the speakers index.
const query = {
    query: {
        bool: {
            must: {
                match: {
                    text: {
                        query: parsedQuery.quote
                    }
                }
            },
            should: {
                match: {
                    movieId: {
                        query: possibleMovieId
                    }
                }
            }
        }
    }
};

const response = await client.search({
    index: "utterances",
    body: query,
});

Step 3: Generating the Response

Finally, we'll format our response using another LLM prompt:

// Take the top-scoring hit from the search response in Step 2.
const bestQuoteResult = response.body.hits.hits[0];
// lookupMovieName is a hypothetical helper that resolves a movieId to its
// title, e.g. via the speakers index.
const bestQuoteMovie = await lookupMovieName(bestQuoteResult._source.movieId);

const finalPrompt = `
You are a helpful AI assistant that can provide information about movies.

A user is looking for a movie quote that contains the following phrase: "${parsedQuery.quote}". 
Their original query was for the content: "${parsedQuery.content}".

Here is the most relevant utterance: ${bestQuoteResult._source.text}
That utterance is from the movie: "${bestQuoteMovie}"

Based on this utterance, provide a natural language response that includes:
- The exact quote
- The movie title
- Brief context about the quote's significance

Format your response in a clear, concise way that directly addresses the user's query.`;

const finalResponse = await openai.chat.completions.create({
    messages: [{ role: 'user', content: finalPrompt }],
    model: 'gpt-4o-mini',
});

Next Steps and Future Enhancements

We can fine-tune the utterance search by adjusting the number of results (the size parameter), adding additional filters (like character), or combining multiple fields for a more refined search.
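One possible refinement, putting those levers together: cap the result count with `size`, filter to a known movie, and search across multiple fields with a boost on the line text. Field names mirror our utterances mapping; the `movieId` value is illustrative:

```typescript
// A refined search body for the utterances index.
const refinedQuery = {
  size: 3, // return the top three candidate lines instead of just one
  query: {
    bool: {
      must: [
        {
          multi_match: {
            query: "life important",
            fields: ["text^2", "speaker"], // weight the line itself higher
          },
        },
      ],
      filter: [{ term: { movieId: "m100" } }], // restrict to one movie
    },
  },
};
```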

Our current RAG implementation effectively retrieves movie quotes, but there's always room for improvement! Consider:

  • Denormalizing our data to include more context
  • Expanding search capabilities to handle different types of questions
  • Implementing user feedback mechanisms
  • Adding synonym support and related terms
  • Including surrounding context for better scene understanding

This exploration of RAG with OpenSearch is just the beginning. Ready to dive deeper and build even more intelligent search experiences? Explore the power of bonsai.io to implement RAG in your own applications.

Find out how we can help you.

Schedule a free consultation to see how we can create a customized plan to meet your search needs.