Search Snippets

Updates from the Bonsai Elasticsearch team, from One More Cloud: the first and best cloud search platform, since 2009.

Mar 04, 2021

Why Elasticsearch should not be your Primary Data Store

Maddie Jones, Customer Success Engineer • in Search 8 min read

In day-to-day conversations with customers, we often meet people who either want to use Elasticsearch as their primary data store or have already decided to use it that way. This is something we discourage. Below, I will explain a few of the reasons why.

It is a search engine, not a database

Search engines serve a fundamentally different purpose than databases. Most databases are ACID compliant. Elasticsearch is not, which makes it inherently riskier to use like a database. Among other idiosyncrasies, Elasticsearch offers atomicity only on a per-document basis, not on a transaction basis.

To understand the problem, let’s look at a real-world scenario—a transaction with your bank account. A customer makes a purchase and the amount is debited (removed) from their account balance and then credited (added) to the vendor’s account balance. If one of these operations fails, say, because the customer doesn’t have enough funds, then neither account should be modified. Otherwise, this could lead to the vendor being credited with money that wasn’t debited from anywhere, which would be a problem (unless you’re the lucky vendor!).

With an ACID-compliant data store, each transaction ensures that all of its operations succeed or fail together, keeping the database in a consistent state. Elasticsearch doesn’t provide this option. A single Bulk request can decrement a count on a customer record and increment a count on a vendor record, but each action in that request succeeds or fails independently. If one fails, the other may still succeed, leaving the two records out of sync with each other.
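The all-or-nothing behavior of a transaction can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `accounts` table, the `transfer` helper, and the `CHECK` constraint standing in for "insufficient funds" are all hypothetical, not taken from any real banking system.

```python
import sqlite3

def transfer(conn, customer_id, vendor_id, amount):
    """Credit the vendor and debit the customer as one transaction.

    Returns True if both updates were committed, False if both were rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, vendor_id),
            )
            # Violates the CHECK constraint when the customer lacks funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("customer", 50), ("vendor", 0)]
)
conn.commit()

# The customer only has 50, so the debit fails and the credit is undone too.
ok = transfer(conn, "customer", "vendor", 80)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The credit is deliberately issued first here: it succeeds on its own, and the rollback is what keeps the vendor from holding money that was never debited. A Bulk request has no equivalent of that rollback.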

Search engines and databases perform differently

Elasticsearch focuses on making data available in “near real-time.” To do that, it makes engineering choices that favor speed over perfectly reliable results. This means there are a number of tradeoffs under the hood where consistency is sacrificed for expediency. Inconsistent and partial results are much more of a possibility with Elasticsearch than with a database.

Think of a database as a practical sedan and a search engine as a high-end motorcycle. They’ll both get you where you’re going, but one is slow, practical, and safe, while the other is fast, esoteric, and more dangerous. Some jobs need safety, some need speed; it’s important to be mindful of your risk tolerance for different parts of your application and pick a data store accordingly.

Elasticsearch is the high-end motorcycle. It’s better to use Elasticsearch to host only the data you’ll need to search quickly, and let a database host anything that needs permanence, transactions, consistency, etc.

It Reduces Fault Tolerance

A fault-tolerant system is one that continues operating properly when one or more of its components fail. When your entire data storage system is simply an Elasticsearch cluster, you have minimal fault tolerance. Clustering offers some mitigation against absolute failure, but a data storage system composed of multiple stores is far more reliable and fault tolerant.

I’ve mentioned a few points so far about how Elasticsearch differs from a traditional database, and how this can affect the integrity of your data. Elasticsearch can safely tolerate the loss of a node, and maybe two. That depends on how many nodes make up the cluster, how many replicas are set up, etc. But there are conditions where a cluster can be online and operational, but still fail to serve requests. If this happens, then your application can become virtually unusable because it’s not able to reliably access the data store system.

As an example, we had a user who built their entire application and sales pipeline around Elasticsearch — and only Elasticsearch. Originally, they put everything on a single replicated shard. Shortly after launch, their app grew quickly and the disks on 2 of their 3 nodes filled up. One of those disks hit a high watermark threshold and began actively moving a massive shard to the remaining node. This ended up creating a cascading failure that more or less blocked them from using the cluster until we were able to get them on to larger hardware.

A consequence of this design? They were unable to record any sales for a couple of hours, right as they experienced the fastest growth they’d ever had. The lost revenue far outweighed any savings they achieved by avoiding Postgres or MySQL. If the data store system had been Elasticsearch backed by Postgres, MySQL, or Microsoft SQL Server, then they could have continued to record sales even while Elasticsearch was experiencing a disruption.
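One way to picture that arrangement is a write path that commits to the database first and treats search indexing as best effort. The sketch below is a simplified assumption, not the customer's actual design: the `sales` table, the `indexed` flag, the `record_sale` helper, and the `SearchUnavailable` exception are all hypothetical, and `failing_index` stands in for a search client call made while the cluster is disrupted.

```python
import sqlite3

class SearchUnavailable(Exception):
    """Raised by the hypothetical search client when the cluster cannot serve requests."""

def record_sale(conn, index_sale, sale):
    """Commit the sale to the database, then index it on a best-effort basis."""
    with conn:  # the sale is durable once this block exits
        cur = conn.execute(
            "INSERT INTO sales (item, amount_cents, indexed) VALUES (?, ?, 0)",
            (sale["item"], sale["amount_cents"]),
        )
        sale_id = cur.lastrowid
    try:
        index_sale(sale_id, sale)
    except SearchUnavailable:
        # Leave indexed = 0 so a later job can retry; the sale itself is safe.
        return sale_id
    with conn:
        conn.execute("UPDATE sales SET indexed = 1 WHERE id = ?", (sale_id,))
    return sale_id

def failing_index(sale_id, sale):
    """Stand-in for an indexing call during a cluster disruption."""
    raise SearchUnavailable("cluster is busy relocating shards")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, item TEXT, amount_cents INTEGER, indexed INTEGER)"
)

sale_id = record_sale(conn, failing_index, {"item": "widget", "amount_cents": 1999})
pending = conn.execute("SELECT id, item FROM sales WHERE indexed = 0").fetchall()
```

With this shape, a search outage delays searchability rather than blocking sales: rows left with `indexed = 0` can be pushed to Elasticsearch once the cluster recovers.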

Changing Data Structures is Complicated

Elasticsearch uses mappings and settings to define how data will be ingested and stored on disk. It’s possible to change these on the fly, but with a caveat: changes to a field mapping are not retroactive. So, if you define a field as a “string,” then index some data, then redefine that field as an “int,” Elasticsearch will gladly oblige. And then randomly vomit up errors with bizarre and inscrutable exception messages.

Unlike a database that can validate a migration before committing any changes, Elasticsearch usually requires a full reindex from the data source to safely apply those changes. The Reindex API is designed to help with this, but there are a number of caveats. The whole process becomes pretty involved if you have a large amount of data, especially if Elasticsearch is your only data store.

For example, if there is a significant amount of data, you might not even be able to perform a full reindex without first migrating to larger servers. You may also need to coordinate a maintenance window for your app. And even if you don’t need to vertically scale, using the Reindex API can lead to temporary performance issues in production.
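When a database is the source of truth, one alternative is to create a new index with the corrected mapping and repopulate it from the database. As a rough sketch, the code below builds the newline-delimited JSON body that the Bulk API expects (an action line followed by a source line for each document). The `users` table, the `users_v2` index name, and the `bulk_body` helper are hypothetical, and actually sending the body to the cluster is left out.

```python
import json
import sqlite3

def bulk_body(rows, index_name):
    """Build an NDJSON Bulk API body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        doc = dict(row)
        doc_id = doc.pop("id")  # use the database key as the document ID
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Grace", "grace@example.com")],
)

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
body = bulk_body(rows, "users_v2")
```

Because the documents are regenerated from the database, the old index can keep serving searches until the new one is ready. For a large table, the rows would need to be read and sent in batches rather than all at once.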

Where Elasticsearch Shines

I’ve written a lot here about what Elasticsearch doesn’t do, and how it doesn’t stack up to an ACID-compliant data store. But I’m really not trying to throw shade. Elasticsearch is an amazing technology and fantastic at what it does. So, where does Elasticsearch really shine?

The engineering choices that make Elasticsearch a poor substitute for a traditional database also make it excellent at the things databases don’t do well. For example, suppose you have an application that has a lot of users, and you want a search feature that allows you to search them by name or by email.

With a database, you might need to do something like:

SELECT *
FROM users
WHERE first_name LIKE '%<user-query>%'
  OR last_name LIKE '%<user-query>%'
  OR name LIKE '%<user-query>%'
  OR email LIKE '%<user-query>%'

With Elasticsearch, this query could simply be:

GET /users/_search?q=<user-query>

In a test environment using 100k randomly generated records in Postgres, indexed into a local Elasticsearch cluster, the SQL query executed in 48 milliseconds, compared to 2 milliseconds for the Elasticsearch query. A 46-millisecond difference may seem small, but that is 24 times faster. From a computer’s standpoint, it’s the difference between a ’78 Gremlin and a Ducati Panigale.

Elasticsearch also excels at real-time analytics. It is able to leverage this speed and efficiency to perform a significant number of calculations in a very short time. This allows developers to shift their paradigm from writing queries to asking questions. Questions like: “How many non-admin users have created an account and logged in at least 3 times over the past week, by geographic region, and by MRR?”

When there are millions (or billions) of records involved, Elasticsearch can answer these kinds of questions in seconds or minutes. Most traditional databases can answer them much later.
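A question like the one above might translate into a search request body along these lines. This is a sketch built on assumptions: the field names (`is_admin`, `created_at`, `logins_last_7d`, `region`, `mrr`), the MRR bucket boundaries, and the reading of "over the past week" are all hypothetical, and `region` is assumed to be mapped as a keyword field so that it can be aggregated on.

```python
import json

def weekly_active_by_region(min_logins=3):
    """Build a search body that counts matching users per region and MRR bucket."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"is_admin": False}},
                    {"range": {"created_at": {"gte": "now-7d/d"}}},
                    {"range": {"logins_last_7d": {"gte": min_logins}}},
                ]
            }
        },
        "aggs": {
            "by_region": {
                "terms": {"field": "region"},
                "aggs": {
                    "by_mrr": {
                        "range": {
                            "field": "mrr",
                            "ranges": [
                                {"to": 100},
                                {"from": 100, "to": 1000},
                                {"from": 1000},
                            ],
                        }
                    }
                },
            }
        },
    }

body = weekly_active_by_region()
request_json = json.dumps(body)  # would be sent as the body of a _search request
```

Each bucket in the response carries a document count, so the answer comes back as a count per region, broken down again by MRR range, from a single request.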

Conclusion

Though Elasticsearch doesn’t do well as a primary data store, it shines in so many other ways. It’s important to understand when to use Elasticsearch and when to look somewhere else, so that a disruption in your cluster never takes your whole application down.

At Bonsai, we offer the industry’s only Evergreen Promise to keep your clusters up and running. Our suite of disciplined automation can keep your team focused on high-value projects without the operational distractions of Elasticsearch. All Bonsai Elasticsearch clusters are deployed with a 3-node cluster by default — this ensures cost-effective plans to keep your clusters available, redundant, and performing well. 24/7 operational coverage and support is included with all subscriptions.

Have you had any good or bad experiences using Elasticsearch as a primary data store? Tweet your hot takes to us at @bonsaisearch. And as always, happy searching!
