
LLM RAG for Beginners: A Practical Guide with Elasticsearch and OpenSearch


Have you ever tried searching for a movie quote with just a vague phrase or feeling? We've all been there, and sometimes, finding exactly what we're looking for can be tough. Retrieval-Augmented Generation (RAG) offers a more intuitive approach, allowing us to search with the fluidity of human memory.

RAG blends the power of Large Language Models (LLMs) with the precision of information retrieval systems like Elasticsearch and OpenSearch. It moves beyond simple keyword matching, using LLMs to understand the nuances of our search intent and deliver relevant results along with their context.

For example, imagine trying to recall that iconic line from The Fifth Element where Zorg says, "Time not important, only life important." Even if you only remember the phrase "life important," RAG can pinpoint the exact quote and provide context.

Pre-requisites

Before we dive into building our RAG pipeline, let's get our tech stack in order. We'll be using the following:

Bonsai.io Sandbox

Bonsai.io provides fully managed OpenSearch clusters, making it incredibly easy to get started without any complex installation or configuration. We'll leverage a free Bonsai Sandbox for this tutorial. You can sign up for an account and launch a cluster at bonsai.io.

Once your Bonsai sandbox cluster is created, you'll see your credentials in the cluster overview page.

Cornell Movie-Dialogs Corpus

This rich dataset contains conversations extracted from movie scripts. We'll use this corpus to populate our OpenSearch indexes.

The Cornell Movie-Dialogs Corpus is part of Cornell's ConvoKit project, a toolkit for analyzing conversations. You can find the dataset and learn more about ConvoKit at github.com/CornellNLP/ConvoKit.

Download the movie-corpus.zip file and extract it to a location that can be referenced by our code later on.
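Once extracted, the corpus stores its conversation data as JSON Lines files (one JSON object per line). A small helper like the one below can parse them generically; the exact field names can differ between ConvoKit releases, so inspect your download before mapping records onto index fields:

```typescript
// Parse a JSON Lines string (as found in the extracted corpus files) into an
// array of records. Field names are left untouched here on purpose, since
// they vary by ConvoKit release.
function parseJsonLines(jsonl: string): Record<string, unknown>[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}
```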

OpenAI Text API

OpenAI's GPT-4o mini model is perfect for our small, focused prompts, and it's quite affordable!

For this tutorial, you'll need an OpenAI API key. The OpenAI API is a paid service, so the steps below may incur charges against your account.

See OpenAI's documentation for details on how to create an OpenAI API Key and associated pricing.

Understanding Retrieval-Augmented Generation (RAG)

RAG enhances LLMs by connecting them to external knowledge sources. Think of it as giving your LLM a library card to access a vast collection of information, allowing it to generate responses grounded in factual data.

But while you might remember details about all of the books you've read at the library, the LLM has a limited ability to keep information in its "working memory" (or context) - so we need to help it by filtering the external knowledge down to what is most relevant to the task at hand.

To that end, RAG involves two key steps:

  1. Retrieval: Finding the most relevant information from your knowledge base.
  2. Generation: Feeding in the most relevant information to the LLM, in order for it to generate a comprehensive response with the added context.
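The two steps above can be sketched as plain functions. Note that `retrieve` and `buildPrompt` are our own placeholder names, not library APIs - `retrieve` stands in for the OpenSearch query and `buildPrompt` for the context-stuffed LLM prompt we build later in this tutorial:

```typescript
// Minimal sketch of the retrieve-then-generate flow.
type Doc = { id: string; text: string };

// Retrieval: naive term-overlap scoring as a stand-in for OpenSearch ranking.
function retrieve(query: string, docs: Doc[], topK = 1): Doc[] {
  const terms = query.toLowerCase().split(/\s+/);
  return docs
    .map((doc) => ({
      doc,
      score: terms.filter((t) => doc.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.doc);
}

// Generation: in the real pipeline, this prompt goes to the LLM, which
// answers using only the retrieved context.
function buildPrompt(query: string, context: Doc[]): string {
  const lines = context.map((d) => d.text).join("\n");
  return `Answer using only this context:\n${lines}\n\nQuestion: ${query}`;
}
```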

Setting Up Your OpenSearch Environment

Now that you have a Bonsai Sandbox cluster up and running, let's get our movie data indexed in OpenSearch. We'll be using the Cornell Movie-Dialogs Corpus, which we downloaded in our prerequisites. But first, let's visualize how we'll organize this data.

Understanding the Data Structure

Since this particular dataset is a bit denormalized, we'll create and use two indexes:

  • speakers: Details about the speaking characters in each movie.
  • utterances: A detailed index of all the conversations within the movies, line by line, with speaker and movie identified.
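For concreteness, here's the shape of a document in each index. The values follow the corpus's identifier conventions (`u*` speakers, `m*` movies, `L*` lines) but are examples for illustration, not guaranteed corpus records:

```typescript
// Illustrative documents matching the index mappings we create below.
const speakerDoc = {
  speakerId: "u0",
  movieId: "m0",
  gender: "f",
  script: "BIANCA", // the character's name as credited in the script
  movieName: "10 things i hate about you",
};

const utteranceDoc = {
  id: "L1045",
  conversationId: "L1044",
  text: "They do not!",
  speaker: "u0",    // references speakerDoc.speakerId
  movieId: "m0",
  replyTo: "L1044", // the utterance this line responds to
};
```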

Creating and Indexing the Movie Data

Remember to set the BONSAI_CLUSTER_URL environment variable to safely access your Bonsai Cluster's credentials within your code!

Now, let's create the index mappings for our speakers and utterances indexes:

// Deno-style npm import; set BONSAI_CLUSTER_URL in your environment first.
import { Client } from "npm:@opensearch-project/opensearch";

const client = new Client({
    node: Deno.env.get("BONSAI_CLUSTER_URL"),
});

// --- Speakers Index ---
const speakersIndexName = "speakers";
const speakersIndexBody = {
    settings: {
        number_of_shards: 1,
        number_of_replicas: 0,
    },
    mappings: {
        properties: {
            speakerId: { type: "keyword" },
            movieId: { type: "keyword" },
            gender: { type: "keyword" },
            script: { type: "text" },
            movieName: { type: "text" },
        },
    },
};

await client.indices.create({ 
    index: speakersIndexName, 
    body: speakersIndexBody 
});

// --- Utterances Index ---
const utterancesIndexName = "utterances";
const utterancesIndexBody = {
    settings: {
        number_of_shards: 1,
        number_of_replicas: 0,
    },
    mappings: {
        properties: {
            id: { type: "keyword" },
            conversationId: { type: "keyword" },
            text: { type: "text" },
            speaker: { type: "text" },
            movieId: { type: "keyword" },
            replyTo: { type: "keyword" },
        },
    },
};

await client.indices.create({ 
    index: utterancesIndexName, 
    body: utterancesIndexBody 
});

For the purposes of this demonstration, we're only indexing a handful of movies' utterance data, filtered by a regular expression on their corpus ID.
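A sketch of how that filtered bulk indexing might look. The regular expression here (movies m0 through m9) is just an example filter, and the document shape assumes our utterances mapping:

```typescript
// Build an OpenSearch _bulk body: an action line paired with each document,
// keeping only utterances whose movieId matches the corpus-ID pattern.
const corpusIdPattern = /^m[0-9]$/;

function buildBulkBody(
  utterances: { id: string; movieId: string }[],
  indexName = "utterances",
): object[] {
  return utterances
    .filter((u) => corpusIdPattern.test(u.movieId))
    .flatMap((u) => [{ index: { _index: indexName, _id: u.id } }, u]);
}

// Usage with the client from earlier:
// await client.bulk({ body: buildBulkBody(allUtterances) });
```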

Building the RAG Pipeline

With our OpenSearch environment set up and movie data indexed, we're ready to assemble the pieces of our RAG pipeline. This involves three main steps:

Step 1: Query Parsing with an LLM

The first step is to understand what the user is asking. We'll use an LLM to analyze their natural language query and extract key information:

import OpenAI from "npm:openai";

// The SDK reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

const userQuery = "what's that line from The Fifth Element about life being important?";

const prompt = `
You are a helpful AI assistant that can analyze search queries related to movies.

Here's a user query: ${userQuery}

Based on this query, identify the following:
- Category: Choose from the following categories: "quote recall", "significant event",
  "plot explanation", "character information". If none of these fit, choose "unknown".
- Content: Extract the specific phrase or words related to the identified category.
- Movie: If the query contains a probable movie title, extract it into this field.
- Quote: If the query contains part of a quote, extract it into this field.

Provide your answer in JSON format.`;

const response = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o-mini',
    // Ask for a JSON object so the answer parses reliably.
    response_format: { type: 'json_object' },
});

const parsedQuery = JSON.parse(response.choices[0].message.content ?? '{}');

Step 2: Retrieving Relevant Documents

Now that we understand the user's request, let's search our OpenSearch indexes:

// possibleMovieId is assumed to be resolved beforehand, e.g. by matching
// parsedQuery.movie against movieName in the speakers index.
const query = {
    query: {
        bool: {
            must: {
                match: {
                    text: {
                        query: parsedQuery.quote
                    }
                }
            },
            should: {
                match: {
                    movieId: {
                        query: possibleMovieId
                    }
                }
            }
        }
    }
};

const response = await client.search({
    index: "utterances",
    body: query,
});

Step 3: Generating the Response

Finally, we'll format our response using another LLM prompt:

// Take the top-scoring hit from the search response in Step 2.
const bestQuoteResult = response.body.hits.hits[0];
// lookupMovieName is a hypothetical helper that resolves a movieId to its
// title, e.g. via the speakers index.
const bestQuoteMovie = await lookupMovieName(bestQuoteResult._source.movieId);

const finalPrompt = `
You are a helpful AI assistant that can provide information about movies.

A user is looking for a movie quote that contains the following phrase: "${parsedQuery.quote}". 
Their original query was for the content: "${parsedQuery.content}".

Here is the most relevant utterance: ${bestQuoteResult._source.text}
That utterance is from the movie: "${bestQuoteMovie}"

Based on this utterance, provide a natural language response that includes:
- The exact quote
- The movie title
- Brief context about the quote's significance

Format your response in a clear, concise way that directly addresses the user's query.`;

const finalResponse = await openai.chat.completions.create({
    messages: [{ role: 'user', content: finalPrompt }],
    model: 'gpt-4o-mini',
});

Next Steps and Future Enhancements

We can fine-tune the utterance search by adjusting the number of results (the size parameter), adding additional filters (like character), or combining multiple fields for a more refined search.
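One possible refinement, putting those levers together: cap the result count with `size`, filter to a known movie, and search across multiple fields with a boost on the line text. Field names mirror our utterances mapping; the `movieId` value is illustrative:

```typescript
// A refined search body for the utterances index.
const refinedQuery = {
  size: 3, // return the top three candidate lines instead of just one
  query: {
    bool: {
      must: [
        {
          multi_match: {
            query: "life important",
            fields: ["text^2", "speaker"], // weight the line itself higher
          },
        },
      ],
      filter: [{ term: { movieId: "m100" } }], // restrict to one movie
    },
  },
};
```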

Our current RAG implementation effectively retrieves movie quotes, but there's always room for improvement! Consider:

  • Denormalizing our data to include more context
  • Expanding search capabilities to handle different types of questions
  • Implementing user feedback mechanisms
  • Adding synonym support and related terms
  • Including surrounding context for better scene understanding

This exploration of RAG with OpenSearch is just the beginning. Ready to dive deeper and build even more intelligent search experiences? Explore the power of bonsai.io to implement RAG in your own applications.

Find out how we can help you.

Schedule a free consultation to see how we can create a customized plan to meet your search needs.