What to consider when designing your application’s search functionality
Elasticsearch is a popular tool for application search. It has a short learning curve and a plethora of tools and clients to help integrate it into new projects and existing apps. That’s probably why it now boasts tens of millions of users, including tech superstars such as Netflix, Slack, SoundCloud, Stack Overflow, and many more.
But building app search is a bit like snowboarding: easy to get up and running, but not necessarily easy to master. Many of the Elasticsearch features that help deliver a great search experience are easy to overlook. Since most developers use a database as their mental model for search, this document explores:
- why the requirements for app search are different from the requirements for a database, and
- what to consider when designing your requirements.
Different tools for different requirements
Search engines and databases perform similar functions on the surface: you push data in, and then it can be queried later. But these technologies were designed to address very different problems and constraints. With a database, the primary goal is “query millions of records by id or a range of numbers”, whereas a search engine’s primary goal is “query millions of records for relevant results by a value in the record.”
Why are databases not the best search engines?
Databases were designed to access data by a “key”, and they do this extremely well. Looking up a record by its primary key is a heavily optimized operation. A typical request is to find the row in the persons table where the id equals 42. Databases are also very good at returning all records whose values fall within a given range of numbers.
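As a small illustration of the kind of lookup databases excel at, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The persons table and its id column come from the example above; the name column and the sample rows are assumptions made up for the demonstration.

```python
import sqlite3

# In-memory database with a hypothetical persons table (sample data is invented)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO persons (id, name) VALUES (?, ?)",
    [(i, f"person-{i}") for i in range(1, 101)],
)

# Point lookup by primary key
row = conn.execute(
    "SELECT id, name FROM persons WHERE id = ?", (42,)
).fetchone()

# Range lookup on the same key
in_range = conn.execute(
    "SELECT id FROM persons WHERE id BETWEEN ? AND ?", (10, 20)
).fetchall()

# Ask SQLite how it plans to run the point lookup; the last column of each
# plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, name FROM persons WHERE id = 42"
).fetchall()
plan_text = " ".join(step[-1] for step in plan)
print(row, len(in_range), plan_text)
```

In SQLite the plan text typically reports a search using the integer primary key rather than a scan of the whole table, although the exact wording varies between versions.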
However, databases are terrible at finding records where the word “milk” appears somewhere in the notes field of the persons table. That task calls for a tool optimized for it: the search engine. Search engines use a special data structure called an inverted index, which maps each term to the records that contain it, to answer such queries quickly. Because this is such a different way of accessing data, we have to design our data with a different mindset.
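The idea behind an inverted index can be sketched in a few lines of Python. This is a toy version for illustration only, not how Elasticsearch stores its data: the sample notes are invented, and the names build_inverted_index and search are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical contents of the notes field, keyed by record id
notes_by_id = {
    1: "allergic to milk",
    2: "prefers tea with lemon",
    3: "buys oat milk weekly",
}

def build_inverted_index(docs):
    """Map each term to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(notes_by_id)

def search(term):
    """Look the term up directly instead of scanning every record."""
    return sorted(index.get(term.lower(), set()))

print(search("milk"))
```

The work of splitting text into terms happens once, when the index is built. Answering a query is then a single dictionary lookup, regardless of how many records do not contain the word.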
What are the requirements for a good search experience?
Since Elasticsearch must think about our data so differently from the way a database does, we also have to change the way we think about our data. Additionally, to provide a great search experience we have to consider concepts, such as misspellings, that a database never has to handle.
A good search engine should:
- Understand that we are dealing with fallible humans and correct for typos.
- Understand that different people will use different words to find the same concepts. For instance, a “crisp” and a “chip” describe the same thing.
- Effectively rank results with little real context, which is a fascinating world unto itself.
- Understand that the plural and singular forms of a word should return the same documents, a process known as stemming that is handled when the text is analyzed.
- Do as little work as possible at search time, which means front-loading the work at write time rather than at query time.
- Remove low-value words (often called stop words) to improve the quality of search results.
- Handle ambiguities that are otherwise hard for computers, like “milk chocolate” and “chocolate milk.”
- Understand that we use different words when we search for things versus when we describe them.
And the hardest part? All of this has to happen in milliseconds.
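Several items in the list above (stop words, plurals, synonyms, typos) can be illustrated with a toy text-analysis pipeline. Elasticsearch handles these concerns with its own analyzers and token filters; the sketch below is plain Python under simplifying assumptions. The word lists are invented, the plural handling is deliberately naive, and analyze and correct are hypothetical helper names.

```python
import difflib
import re

STOP_WORDS = {"a", "an", "and", "of", "the", "with"}  # tiny sample list
SYNONYMS = {"crisp": "chip"}  # map variants onto one canonical term

def stem(token):
    """Naive plural folding; real stemmers handle far more cases."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text):
    """Lowercase, split into tokens, drop stop words, stem, apply synonyms."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        token = stem(token)
        terms.append(SYNONYMS.get(token, token))
    return terms

def correct(term, vocabulary):
    """Replace a likely typo with the closest known term, if one is close enough."""
    matches = difflib.get_close_matches(term, sorted(vocabulary), n=1, cutoff=0.7)
    return matches[0] if matches else term

print(analyze("A bag of crisps"))
print(correct("mlik", {"milk", "chocolate", "chip"}))
```

Running the same analyze function over documents at write time and over queries at search time is what lets “crisps” and “chips” land on the same term without extra work per query.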
A good search experience doesn’t just involve your Elasticsearch index though. Content and UX also come into play. Are you planning to use faceting to surface search categories, or are you keeping things simple with just a search box? You’ll have to choose what data will be included for searching and what will be included for drawing the UI. None of these are factors when designing a database.
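To make the split between search fields and display fields concrete, here is a hedged sketch of an index mapping and a faceted search request, written as Python dictionaries. The index layout and field names (title, description, category, thumbnail_url) are assumptions invented for illustration. The text and keyword field types, the index mapping parameter, and the terms aggregation are standard Elasticsearch features, but check the details against the version you run.

```python
import json

# Hypothetical product index; field names are invented for illustration.
mapping = {
    "properties": {
        "title": {"type": "text"},            # analyzed, full-text searchable
        "description": {"type": "text"},      # analyzed, full-text searchable
        "category": {"type": "keyword"},      # exact values, usable for facets
        # Needed only to draw the UI, so it is not indexed for search
        "thumbnail_url": {"type": "keyword", "index": False},
    }
}

# A search request that also asks for category counts to render as facets
search_body = {
    "query": {"match": {"title": "chocolate milk"}},
    "aggs": {"by_category": {"terms": {"field": "category"}}},
}

print(json.dumps({"mappings": mapping}, indent=2))
print(json.dumps(search_body, indent=2))
```

Deciding up front which fields are searched, which are faceted, and which are only displayed keeps the index smaller and the mapping easier to tune later.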
Database + Elasticsearch = <3
So how do you choose a data store to use for your app that is reliable, but is also fast and can handle a variety of search queries? We recommend using both. In this arrangement, the database acts as a reliable “source of truth,” from which the Elasticsearch cluster is populated.
This approach has several benefits:
- The transactions that make it into Elasticsearch have already been validated by the database;
- If anything happens to the cluster, it can be repopulated from a more durable source; and
- You can conduct blue-green deployments much more easily.
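The source-of-truth arrangement and the blue-green switch can be sketched without a real cluster. In the toy below, SQLite stands in for the database and plain dictionaries stand in for search indices. The names persons_blue, persons_green, build_index, and search are hypothetical, and the alias variable only imitates the idea of pointing queries at whichever index is currently live.

```python
import sqlite3

def build_index(conn):
    """Build a fresh term -> set of ids index from the database rows."""
    index = {}
    for row_id, notes in conn.execute("SELECT id, notes FROM persons"):
        for term in notes.lower().split():
            index.setdefault(term, set()).add(row_id)
    return index

# The database is the durable source of truth (sample rows are invented)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, notes TEXT)")
conn.executemany(
    "INSERT INTO persons (id, notes) VALUES (?, ?)",
    [(1, "likes milk"), (2, "likes tea")],
)

indices = {"persons_blue": build_index(conn)}
alias = "persons_blue"  # queries always go through the alias

def search(term):
    return sorted(indices[alias].get(term.lower(), set()))

# New data is written to the database first
conn.execute("INSERT INTO persons (id, notes) VALUES (3, 'oat milk only')")

# Build green next to blue, switch the alias, then retire blue
indices["persons_green"] = build_index(conn)
alias = "persons_green"
del indices["persons_blue"]

print(search("milk"))
```

Because every index is derived from the database, losing one is an inconvenience rather than a data-loss event: it can be rebuilt with the same function that created it.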
Questions to consider when designing your requirements
Now that we’ve established how your search requirements differ from database requirements, what should your team decide before digging too deep into the Elasticsearch API?
- What will our search UX look like?
- What fields will be stored in the index for showing on the UI vs what fields are just needed to support search?
- How often will we test and tweak our mappings to improve search relevance?
Working through these requirements takes time, but it will pay off for your team in the long run.