Jul 25, 2019
Elasticsearch is a popular tool for application search. It has a short learning curve and a plethora of tools and clients to help integrate it into new projects and existing apps. That’s probably why it now boasts tens of millions of users, including tech superstars such as Netflix, Slack, SoundCloud, StackOverflow, and many more.
But building app search is kind of like snowboarding. It’s easy to get up and running, but not necessarily easy to master. It’s easy to overlook many of the features within Elasticsearch that can help you deliver a great search experience. Since most developers are using a database as their mental model for search, this document explores:
Search engines and databases perform similar functions on the surface: you push data in, and then it can be queried later. But these technologies were designed to address very different problems and constraints. With a database, the primary goal is “query millions of records by id or a range of numbers”, whereas a search engine’s primary goal is “query millions of records for relevant results by a value in the record.”
Databases were designed to access data by a “key”, and they do this extremely well. Looking up a record by its primary key is a heavily optimized operation. We routinely ask a database to find the row in the persons table where the id equals 42. Databases are also very good at returning all records whose value falls within a given numeric range.
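To make the two access patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module. The persons table, the id of 42, and the notes field come from the text; the name and age columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the schema beyond `id` and `notes` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO persons (id, name, age, notes) VALUES (?, ?, ?, ?)",
    [
        (41, "Ada", 36, "prefers tea"),
        (42, "Grace", 45, "buys milk every Monday"),
        (43, "Linus", 29, "allergic to peanuts"),
    ],
)


def find_by_id(person_id):
    """Key lookup: the primary key lets the database jump straight to the row."""
    return conn.execute(
        "SELECT id, name FROM persons WHERE id = ?", (person_id,)
    ).fetchone()


def find_by_age_range(low, high):
    """Range query: all rows whose age falls between two numbers (inclusive)."""
    return conn.execute(
        "SELECT id, name FROM persons WHERE age BETWEEN ? AND ? ORDER BY id",
        (low, high),
    ).fetchall()


print(find_by_id(42))
print(find_by_age_range(30, 50))
```

In a production database the range query would normally rely on an index over the age column; it is omitted here to keep the sketch short.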
However, databases are terrible at finding records where the word “milk” appears somewhere in the notes field of the persons table. That job calls for a tool optimized for it: the search engine. Search engines use a special data structure called an inverted index to answer this kind of query quickly. Because this is such a different way of accessing data, we have to design our data with a different mindset.
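The idea behind an inverted index can be sketched in a few lines of Python. This is a toy illustration, not how Elasticsearch is implemented: a real engine also analyzes text (stemming, stop words), stores term positions, and scores results by relevance. The function names and the sample notes below are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical contents of the notes field, keyed by person id.
notes = {
    41: "prefers tea",
    42: "Buys milk every Monday",
    43: "Oat milk only, no dairy",
}
index = build_inverted_index(notes)
print(search(index, "milk"))
```

The lookup for “milk” is a single dictionary access that returns the matching ids directly, instead of scanning the notes field of every row. That is the shift in mindset: the work moves from query time to indexing time.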
Since Elasticsearch treats our data so differently from the way a database does, we also have to change the way we think about that data. In addition, to provide a great search experience we have to handle concerns, such as misspelled queries, that simply do not arise with a database. A good search engine should:
And the hardest part? All of this has to happen in milliseconds.
A good search experience doesn’t just involve your Elasticsearch index though. Content and UX also come into play. Are you planning to use faceting to surface search categories, or are you keeping things simple with just a search box? You’ll have to choose what data will be included for searching and what will be included for drawing the UI. None of these are factors when designing a database.
So how do you choose a data store for your app that is reliable, but is also fast and can handle a variety of search queries? We recommend using both a database and a search engine. In this arrangement, the database acts as a reliable “source of truth,” from which the Elasticsearch cluster is populated. This approach has a couple of benefits:
Now that we’ve established how your search requirements differ from database requirements, what should your team decide before digging too deep into the Elasticsearch API?
Understanding these requirements can take time, but will ultimately help your team in the long run.