How To Test Your Elasticsearch Integration with RSpec

We've recently kicked off a push to overhaul our documentation, and make our Quickstart Guides more useful to users. The goal has been to make them easier to use, add more real-world examples, and take the reader through the process from start to finish.

We started with Ruby on Rails, using the official Elasticsearch gems. As someone with Rails experience, but not with the elasticsearch-rails gem specifically, it was a great opportunity for me to get some hands-on experience with that code. To give readers of our documentation some concrete examples, we made a basic Rails app and went through the process of making one of the models searchable with Elasticsearch.

Tested code is better code. You can execute several helpful tests to ensure your Elasticsearch code is cleaner, more reliable, and easier to maintain. Like many concepts in Elasticsearch, this process may seem conceptually easy, but there are a lot of hidden pitfalls. I'll call them out as we go.

Pitfall 1: Version Management

Wow, a pitfall before we even start! In Elasticsearch, there are often breaking changes between major versions, and sometimes even between minor versions. The official gems are targeted to specific versions of Elasticsearch, and it's important to specify the version of the gem(s) that is compatible with the version of Elasticsearch you're running in production.
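If you install released gems rather than tracking a branch, one way to express "stay on the compatible major version" is a pessimistic constraint such as '~> 7.0' in the Gemfile. Here is a minimal sketch of what that constraint accepts, using RubyGems' own classes. The constraint and the version strings are illustrative assumptions; the gem's README remains the authority on which gem versions pair with which Elasticsearch versions.

```ruby
require 'rubygems'

# Assumption for this sketch: the 7.x line of the gem is the one we want.
GEM_CONSTRAINT = Gem::Requirement.new('~> 7.0')

# Returns true when a gem version string falls inside the constraint.
def gem_version_allowed?(version, constraint = GEM_CONSTRAINT)
  constraint.satisfied_by?(Gem::Version.new(version))
end

%w[6.8.3 7.4.0 8.0.0].each do |version|
  puts "#{version}: #{gem_version_allowed?(version) ? 'allowed' : 'rejected'}"
end
```

Only 7.4.0 is accepted here, because '~> 7.0' means "at least 7.0 and below 8.0".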

A similar problem exists for Rails developers, in that different projects are built using different versions of Ruby. When switching between projects, you need to ensure that the version of Ruby in your console is exactly what the current project requires. There are some tools for this, namely rbenv and rvm. With this in mind, I looked around to see if something similar exists for Elasticsearch.

There are a couple of solutions out there. I think evm is one of the better ones, although it's not quite as flexible as rvm or rbenv. evm has a familiar set of command-line options, and it downloads Elasticsearch binaries to ~/.evm. The one thing I'm not a fan of is that it doesn't set or override a system-wide Elasticsearch binary. Instead, it uses a link: ~/.evm/elasticsearch. So you have to alias elasticsearch to ~/.evm/elasticsearch/bin/elasticsearch yourself. evm also doesn't support dotfiles.

We're heavy users of rbenv at OMC, and I love how it uses dotfiles to ensure the correct version of Ruby is automatically loaded for any given project. I decided to follow this pattern and created a dotfile for the Elasticsearch version we need for the app. This way we can write tests to ensure our other tests are running against the correct version of Elasticsearch.
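Here is a minimal sketch of how such a dotfile could be read. It borrows rbenv's habit of searching parent directories, so the lookup also works from a subdirectory of the project. That search behavior and the helper name are assumptions made for illustration, not part of any gem.

```ruby
require 'pathname'

DOTFILE_NAME = '.elasticsearch-version'

# Walk from start_dir up to the filesystem root and return the version
# string from the first dotfile found, or nil if there is none.
def find_elasticsearch_version(start_dir = Dir.pwd)
  Pathname.new(start_dir).expand_path.ascend do |dir|
    dotfile = dir.join(DOTFILE_NAME)
    return dotfile.read.strip if dotfile.file?
  end
  nil
end

version = find_elasticsearch_version
puts(version ? "Targeting Elasticsearch #{version}" : "No #{DOTFILE_NAME} found")
```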

Pitfall 2: System Configuration

Another pitfall involves where Elasticsearch is installed in your system. I wanted to test against an actual running instance of Elasticsearch, but wanted some control over how the cluster is started, stopped and configured. That's a pretty complicated task to manage by hand.

This is made much easier with the excellent elasticsearch-extensions gem. This gem is an extension of the elasticsearch-ruby gem, and it offers some really neat features. The one we're going to make use of here is the ability to automatically spin up and tear down local test clusters.

Something to keep in mind about the elasticsearch-extensions gem: it uses your system version of Elasticsearch by default. In other words, it will try to run the first elasticsearch program it finds in your $PATH. This is a problem if (A) you don't have Elasticsearch installed on your system, (B) your system-wide Elasticsearch is not in the $PATH of your test suite, or (C) you want to test a version of Elasticsearch that is different from the one installed on your system. The same goes for any number of other configuration eccentricities you might have.

Fortunately, you can override this setting with a command parameter. We'll cover that in the examples below.
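To see which binary a "first match in $PATH" lookup would pick on your machine, a short sketch like this walks the directories in order. It only mimics the shell's lookup rule and is not the gem's own code; the helper name is made up for this example.

```ruby
# Return the full path of the first executable named `program` found in
# the given PATH string, or nil when no directory contains one.
def first_in_path(program, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, program)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

binary = first_in_path('elasticsearch')
puts(binary ? "Default binary: #{binary}" : 'No elasticsearch binary in $PATH')
```

If this prints a path you didn't expect, or nothing at all, that is a good sign you'll want to pass an explicit command.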

Getting Started

For this project, we're targeting Elasticsearch 7.4.0. In our Gemfile, we already have this:

# As of this writing, the master branch is for Elasticsearch 7.x.
# Check the README.md for current compatibility:
# https://github.com/elastic/elasticsearch-rails/blob/master/README.md#compatibility

gem 'elasticsearch-model', github: 'elastic/elasticsearch-rails', branch: 'master'
gem 'elasticsearch-rails', github: 'elastic/elasticsearch-rails', branch: 'master'

Those gems will ensure the app is able to instantiate a client that is compatible with Elasticsearch 7.x. I then created a dotfile called .elasticsearch-version:

echo '7.4.0' > .elasticsearch-version

This means that my app should expect to work with Elasticsearch 7.x, and we have explicitly defined a version we're targeting. Next, you need to add gem 'elasticsearch-extensions' to your Gemfile inside the :test block, and run bundle install in your command line. That should ensure we have all the gems that we need.

Next, we need to configure spec_helper.rb to make the elasticsearch-extensions test features available to RSpec. Add require 'elasticsearch/extensions/test/cluster' at the top of your spec/spec_helper.rb. Then inside the RSpec.configure block, add:

cluster = Elasticsearch::Extensions::Test::Cluster::Cluster.new(port: 9250, number_of_nodes: 1, timeout: 120)

If you are using evm, or have a specific binary you want to test with, then you'll want something like this:

cluster = Elasticsearch::Extensions::Test::Cluster::Cluster.new(port: 9250, number_of_nodes: 1, timeout: 120, command: '~/.evm/elasticsearch/bin/elasticsearch')

This line creates a data structure representing an Elasticsearch cluster with one node, running on port 9250 (if you have more nodes, you will get multiple instances of Elasticsearch running on ports incrementing from 9250). It assigns this to a variable, which we'll use to start and stop the cluster.
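If some developers on the team use evm and others don't, it can help to build the options hash in one place. The sketch below is one possible way to do that. ES_TEST_COMMAND is a hypothetical variable name invented for this example, and node_ports simply restates the port numbering described above.

```ruby
# Build the options hash for the test cluster. ES_TEST_COMMAND is a
# hypothetical variable name used only in this sketch.
def test_cluster_options(env = ENV)
  options = { port: 9250, number_of_nodes: 1, timeout: 120 }
  command = env['ES_TEST_COMMAND']
  # Expand a leading "~" ourselves instead of relying on a shell to do it.
  options[:command] = File.expand_path(command) unless command.to_s.empty?
  options
end

# One port per node, counting up from the base port.
def node_ports(options)
  Array.new(options[:number_of_nodes]) { |i| options[:port] + i }
end

puts node_ports(test_cluster_options).inspect
```

The resulting hash could then be passed along as Elasticsearch::Extensions::Test::Cluster::Cluster.new(test_cluster_options).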

Pitfall 3: Client Configuration

Rails apps implementing Elasticsearch are probably using either the elasticsearch-rails gem, or Searchkick. Each of these gems will default to looking for Elasticsearch on localhost:9200, unless specifically instructed not to. If your production app is not running on the same machine(s) as your Elasticsearch cluster, then you probably want to override this default. Otherwise your tests will fail (or they will be run against a different cluster from the one we are creating explicitly for testing).

If you're a Bonsai customer, and especially if you are using the Searchkick or bonsai-elasticsearch-rails gems, then add this line:

ENV['BONSAI_URL'] = "localhost:#{cluster.arguments[:port]}"

This sets an environment variable that is used to correctly set up the Elasticsearch client to point to the test cluster.

Otherwise, you might want an initializer or setting in config/environments/test.rb to specify that the Elasticsearch cluster is located at localhost:9250 in a test context. Do the thing that makes the most sense in your code base.
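As a rough sketch of that initializer approach, the helper below picks a URL per environment. The ELASTICSEARCH_URL variable name and the fallback URLs are assumptions for illustration; substitute whatever your app already uses.

```ruby
# Pick the Elasticsearch URL for the current environment. The
# ELASTICSEARCH_URL name and the fallback values are assumptions.
def elasticsearch_url(rails_env, env = ENV)
  # Tests always talk to the local test cluster on port 9250.
  return 'http://localhost:9250' if rails_env == 'test'

  env.fetch('ELASTICSEARCH_URL', 'http://localhost:9200')
end

puts elasticsearch_url('test')

# In a Rails initializer, the result could be handed to the client:
#   Elasticsearch::Model.client =
#     Elasticsearch::Client.new(url: elasticsearch_url(Rails.env))
```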

The last step here is to configure RSpec to stand up and tear down Elasticsearch automatically. While researching this process, I came across a great post by Rowan Oulton that takes a neat approach. It injects before and after blocks into any RSpec block that is tagged with elasticsearch: true.

So now, in spec_helper.rb, inside the RSpec.configure block, I have this code:

# Create a data structure representing the cluster to build up and tear down:
cluster = Elasticsearch::Extensions::Test::Cluster::Cluster.new(port: 9250, number_of_nodes: 1, timeout: 120)

# Make sure the Elasticsearch client can find the cluster we want to use. The
# BONSAI_URL variable is used by the bonsai-elasticsearch-rails gem to automatically
# configure the Elasticsearch client:
ENV['BONSAI_URL'] = "localhost:#{cluster.arguments[:port]}"

# Create a local cluster for all tests within an RSpec block tagged with
# `elasticsearch: true`. Credit to Rowan Oulton:
# https://medium.com/@rowanoulton/testing-elasticsearch-in-rails-22a3296d989
config.before :all, elasticsearch: true do
  cluster.start unless cluster.running?
  ActiveRecord::Base.descendants.each do |model|
    if model.respond_to?(:__elasticsearch__)
      begin
        model.__elasticsearch__.create_index!
        model.__elasticsearch__.refresh_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        # This kills "Index does not exist" errors being written to console
        # by this: https://github.com/elastic/elasticsearch-rails/blob/738c63efacc167b6e8faae3b01a1a0135cfc8bbb/elasticsearch-model/lib/elasticsearch/model/indexing.rb#L268
      rescue => e
        STDERR.puts "There was an error creating the elasticsearch index for #{model.name}: #{e.inspect}"
      end
    end
  end
end

# Stop elasticsearch cluster after test run
config.after :suite do
  ActiveRecord::Base.descendants.each do |model|
    if model.respond_to?(:__elasticsearch__)
      begin
        model.__elasticsearch__.delete_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        # This kills "Index does not exist" errors being written to console
        # by this: https://github.com/elastic/elasticsearch-rails/blob/738c63efacc167b6e8faae3b01a1a0135cfc8bbb/elasticsearch-model/lib/elasticsearch/model/indexing.rb#L268
      rescue => e
        STDERR.puts "There was an error removing the elasticsearch index for #{model.name}: #{e.inspect}"
      end
    end
  end
  cluster.stop if cluster.running?
end

Finally, I added elasticsearch: true to the RSpec block in the file spec/models/user_spec.rb. This is needed so that the code above can be invoked:

# spec/models/user_spec.rb
require 'rails_helper'

# Note the `elasticsearch: true` tag in this block:
RSpec.describe User, elasticsearch: true, type: :model do
  # ... tests go here
end

Now let's get to testing!

Writing the Tests

I used RSpec for my tests. I also integrated the FactoryBot gem for Rails, which makes creating test data much less time-consuming. Setting it up takes a few easy steps:

Add gem 'factory_bot_rails' to the test and development blocks in your Gemfile.
Run bundle install from the console.
Add this line of code to your rails_helper file:

RSpec.configure do |config|
  config.include FactoryBot::Syntax::Methods
end

Create spec/factories/user.rb in your file directory and add this code to the file:

FactoryBot.define do
  factory :user do
    sequence(:first_name) { |n| "Sam#{n}" }
    sequence(:last_name) { |n| "Smith#{n}" }
    sequence(:email) { |n| "samsmith#{n}@gmail.com" }
    address { '123 Happy St' }
    city { 'Denver' }
    zip_code { 80_015 }
    sequence(:company) { |n| "Smith&Co#{n}" }
    sequence(:company_description) { |n| "Smith&Co#{n} dry goods" }
  end
end

Now that your factories are set up, let's write the first test!

I decided to start with a test that would help me make sure that the version of the test cluster and the version in the dotfile aligned, so that we know we are testing the same version in test and in production.

This is what I added in the user_spec file inside a describe block:

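it 'should have the right version' do
  es_version = User.__elasticsearch__.client.info['version']['number']
  # Read the dotfile relative to the app root so the test does not depend
  # on the directory RSpec was started from:
  dotfile_version = File.read(Rails.root.join('.elasticsearch-version')).strip
  expect(es_version).to eq(dotfile_version)
end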

This is a great sanity test for us. If we start getting errors, we'll be able to see very plainly that it's due to a version mismatch between the cluster we're testing on, and the version of Elasticsearch we're targeting.

After I got this out of the way, I wanted to be able to know that the Elasticsearch::Model::Callbacks library is working as expected. This library injects Elasticsearch calls into the ActiveRecord lifecycle. The User model in our demo app uses it so that when a User record is created/updated/destroyed, the corresponding Elasticsearch document is automatically changed too.

Initially, the cluster should have no User records in it. In the user_spec file, inside a describe block, I tested this by expecting that a search would return 0 results:

it 'should initially have no User records' do
  expect(User.search('*:*').records.length).to eq(0)
end

To test what happens when we try and add a record, I created a different test that creates a user, refreshes the index, and then searches for the user that was just created:

it 'should update ES when the object is created' do
  user = create(:user)
  User.__elasticsearch__.refresh_index!
  expect(User.search("id:#{user.id}").records.length).to eq(1)
end

When this test passes, it means that we know that our indexing works. Yay for us!
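The refresh call in that test matters: Elasticsearch only makes newly indexed documents visible to search after a refresh, which is why a search issued right after a write can come back empty. The toy class below is a deliberately simplified model of that behavior, written for illustration only; it is not how Elasticsearch is implemented.

```ruby
# A toy stand-in for an index: writes land in a buffer and only become
# searchable once refresh! moves them into the visible set.
class ToyIndex
  def initialize
    @buffer = {}
    @searchable = {}
  end

  # Accept a document, but do not expose it to search yet.
  def index(id, doc)
    @buffer[id] = doc
  end

  # Make everything written so far visible to search.
  def refresh!
    @searchable.merge!(@buffer)
    @buffer.clear
    self
  end

  # Return the matching documents that are currently visible.
  def search(id)
    @searchable.key?(id) ? [@searchable[id]] : []
  end
end

index = ToyIndex.new
index.index(1, { first_name: 'Sam1' })
hits_before = index.search(1).length
index.refresh!
hits_after = index.search(1).length
puts "hits before refresh: #{hits_before}, after refresh: #{hits_after}"
# prints "hits before refresh: 0, after refresh: 1"
```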

We can also validate that deleting users in Rails will also delete the indexed record:

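it 'should update ES when the object is destroyed' do
  user = create(:user)
  user.destroy!
  # Refresh so the deletion is visible to search, and count raw hits
  # (`results`), since `records` is loaded back from the database:
  User.__elasticsearch__.refresh_index!
  expect(User.search("id:#{user.id}").results.size).to eq(0)
end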

That's a fairly reasonable set of tests to ensure that the records are created and destroyed as expected. Let's test searches by creating a few users and make sure that we can find them in Elasticsearch:

it 'should return correct results when queried' do
  user_1 = create(:user)
  user_2 = create(:user)
  user_3 = create(:user)
  
  User.__elasticsearch__.refresh_index!
  expect(User.search("first_name:#{user_1.first_name}").records.length).to eq(1)
  expect(User.search("first_name:#{user_1.first_name}").records.first.first_name).to eq(user_1.first_name)
end

Before version 7.0, Elasticsearch created indices with 5 primary shards by default; 7.0 lowered the default to 1. Five shards is significantly over-provisioned for most users, and we recommend setting a single primary shard explicitly so the index layout doesn't depend on the default of whichever version you run. In our demo app, we configured the User model with:

settings index: { number_of_shards: 1 }

We can check this setting is being respected with a test like this:

it 'creates the right number of primary shards' do
  model_shards = User.settings.to_hash.dig(:index, :number_of_shards)
  settings = User.__elasticsearch__.client.perform_request(:get, 'users/_settings').body.to_hash
  es_shards = settings['users']['settings']['index']['number_of_shards'].to_i
  expect(model_shards).to eq(es_shards)
end

You can use a similar approach to test your mappings (or number of replicas).
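As a sketch of that similar approach, the helper below pulls any integer index setting out of a hash shaped like the _settings response used in the test above. The helper name and the sample hash are made up for illustration; in a real test the hash would come from the cluster.

```ruby
# Pull an integer index setting out of a hash shaped like the body of a
# GET <index>/_settings response. The values arrive as strings.
def index_setting(settings_body, index_name, key)
  value = settings_body.dig(index_name, 'settings', 'index', key)
  value && Integer(value)
end

# Sample body for illustration only; a real one comes from the cluster.
sample = {
  'users' => {
    'settings' => {
      'index' => { 'number_of_shards' => '1', 'number_of_replicas' => '1' }
    }
  }
}

puts index_setting(sample, 'users', 'number_of_replicas')
```

Returning nil for a missing index or key keeps the failure message in a spec readable, instead of raising a NoMethodError halfway through a chain of hash lookups.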

This seems like a pretty reasonable time to run RSpec. If all tests are green, your Elasticsearch implementation is working. Go you!

Wrapping up

In closing, I found the testing process fun but also challenging. I now understand why there aren't a lot of resources on testing Elasticsearch: the process involves quite a few "gotchas". I hope this tutorial sheds some light on those pitfalls and helps people in their testing journey.

I can't wait to write more tests that allow me to test my customization of Elasticsearch for my projects. If you have any settings you are excited to see tested, just let me know at [email protected]. I love a challenge. If you'd like to see this in action, check out the demo app I used for writing these tests.
