Jul 25, 2019

Designing app search != designing a database

Dru Sellers


What to consider when designing your application’s search functionality

Elasticsearch is a popular tool for application search. It has a short learning curve and a plethora of tools and clients to help integrate it into new projects and existing apps. That’s probably why it now boasts tens of millions of users, including tech superstars such as Netflix, Slack, SoundCloud, StackOverflow, and many more.

But building app search is kind of like snowboarding. It’s easy to get up and running, but not necessarily easy to master. It’s easy to overlook many of the features within Elasticsearch that can help you deliver a great search experience. Since most developers are using a database as their mental model for search, this document explores:

why the requirements for app search are different than requirements for a database and
what to consider when designing your requirements.

Different tools for different requirements

Search engines and databases perform similar functions on the surface: you push data in, and then it can be queried later. But these technologies were designed to address very different problems and constraints. With a database, the primary goal is “query millions of records by id or a range of numbers”, whereas a search engine’s primary goal is “query millions of records for relevant results by a value in the record.”

Why are databases not the best search engines?

Databases were designed to access data by a “key”, and they do this extremely well. Looking up a record by its primary key is one of the most heavily optimized operations in a database. We constantly ask a database to find the row in the persons table where the id equals 42. Databases are also really good at returning all records whose value falls within a given range of numbers.
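To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The persons table, its columns, and the sample rows are made up for illustration. It asks SQLite how it plans to run a key lookup, a range query, and, for contrast, a substring match on a text column.

```python
import sqlite3

# Hypothetical persons table, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO persons (id, name, notes) VALUES (?, ?, ?)",
    [(i, f"person {i}", "buys milk weekly" if i % 10 == 0 else "") for i in range(1, 101)],
)


def query_plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


by_key = query_plan("SELECT * FROM persons WHERE id = 42")
by_range = query_plan("SELECT * FROM persons WHERE id BETWEEN 10 AND 20")
by_text = query_plan("SELECT * FROM persons WHERE notes LIKE '%milk%'")

for label, plan in [("key", by_key), ("range", by_range), ("text", by_text)]:
    print(label, "->", plan)
```

The first two plans should report a SEARCH that uses the primary key, while the LIKE query falls back to a SCAN of every row. The exact wording of the plan varies between SQLite versions.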

However, databases are terrible at finding records where the word “milk” appears somewhere in the notes field of the persons table. For this we need a tool that is optimized for the task: the search engine. Search engines use a special data structure called an inverted index, which maps each term to the records that contain it, to achieve this record-breaking speed. Because this is such a different way of accessing data, we have to design our data with a different mindset.
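As a rough illustration of the idea (not how Elasticsearch implements it internally), an inverted index can be sketched in a few lines of Python. The sample notes and the build_inverted_index helper are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            token = raw.strip(".,!?")  # crude punctuation handling
            if token:
                index[token].add(doc_id)
    return index


# Hypothetical contents of the notes field, keyed by person id.
persons_notes = {
    1: "Allergic to milk.",
    2: "Prefers oat milk in coffee",
    3: "No dietary notes",
}

index = build_inverted_index(persons_notes)

# Finding every record that mentions "milk" is now a single dictionary
# lookup instead of a scan over every row.
matches = sorted(index.get("milk", set()))
print(matches)
```

All of the work happens when the index is built, so the query itself stays cheap no matter how many documents there are.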

What are the requirements for a good search experience?

Since Elasticsearch must think about our data so differently than our databases do, we also have to change the way we think about our data. Additionally, to provide a great search experience we have to consider concepts, like misspellings, that simply don’t come up in a database. A good search engine should:

Understand that we are dealing with fallible humans and correct for typos.
Understand that different people will use different words to find the same concepts. For instance, a “crisp” and a “chip” describe the same thing.
Effectively rank results with little real context, which is a fascinating world unto itself.
Understand that the plural and singular forms of a word should return the same documents, which is typically handled by stemming during text analysis.
Do as little work as possible at query time, which means front-loading the work when documents are written to the index.
Remove low value words to improve the quality of search results.
Handle ambiguities that are otherwise hard for computers like “milk chocolate” and “chocolate milk.”
Understand that we use different words when we search for things versus when we describe them.
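Several of the items above (low value words, plurals, synonyms, typos) come down to how text is analyzed before it is indexed or searched. The sketch below is a toy version of that pipeline in plain Python. The stop word list, the synonym table, the one-rule stemmer, and the use of difflib as a stand-in for edit-distance fuzziness are all simplifying assumptions, not what Elasticsearch ships with.

```python
import difflib
import re

# Illustrative word lists only; real analyzers use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "with"}
SYNONYMS = {"crisp": "chip"}
VOCABULARY = ["chocolate", "milk", "chip", "bag"]


def naive_stem(token):
    """Fold simple English plurals. A real stemmer handles far more cases."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def correct_typo(token):
    """Swap a token for a close vocabulary match, if one exists."""
    close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return close[0] if close else token


def analyze(text):
    """Lowercase, split, drop stop words, stem, fix typos, map synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [naive_stem(t) for t in tokens]
    tokens = [correct_typo(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


print(analyze("A bag of crisps"))
print(analyze("The chips"))
print(analyze("choclate milk"))
print(analyze("milk chocolate"))
```

Running the same analysis on documents at write time and on queries at search time is what lets “crisps” find “chips”. Token order is preserved, which is what a phrase query would need to tell “milk chocolate” apart from “chocolate milk”.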

And the hardest part? All of this has to happen in milliseconds.

A good search experience doesn’t just involve your Elasticsearch index though. Content and UX also come into play. Are you planning to use faceting to surface search categories, or are you keeping things simple with just a search box? You’ll have to choose what data will be included for searching and what will be included for drawing the UI. None of these are factors when designing a database.

Database + Elasticsearch = <3

So how do you choose a data store for your app that is reliable, but is also fast and can handle a variety of search queries? We recommend using both. In this arrangement, the database acts as a reliable “source of truth,” from which the Elasticsearch cluster is populated. This approach has several benefits:

The transactions that make it into Elasticsearch have already been validated by the database;
If anything happens to the cluster, it can be repopulated from a more durable source; and
You can conduct blue-green deployments much more easily.
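Here is a minimal sketch of that arrangement, with SQLite standing in for the database and a plain dictionary standing in for the search cluster. The table, the rebuild_index helper, and the "blue" and "green" names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def rebuild_index(conn):
    """Build a fresh term -> ids index from the source-of-truth database."""
    index = defaultdict(set)
    for row_id, notes in conn.execute("SELECT id, notes FROM persons"):
        for token in notes.lower().split():
            index[token].add(row_id)
    return index


# The database is the durable source of truth.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, notes TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO persons (id, notes) VALUES (?, ?)",
    [(1, "likes chocolate milk"), (2, "prefers tea")],
)
conn.commit()

# "blue" is the index currently serving searches.
indexes = {"blue": rebuild_index(conn)}
live = "blue"

# Data changes land in the database first.
conn.execute("INSERT INTO persons (id, notes) VALUES (?, ?)", (3, "milk allergy"))
conn.commit()

# Blue-green: build "green" from the database while "blue" keeps serving,
# then switch the pointer once the new index is ready.
indexes["green"] = rebuild_index(conn)
live = "green"

print(sorted(indexes[live]["milk"]))
```

Because the index can always be rebuilt from the database, losing or replacing it is an inconvenience rather than data loss. In Elasticsearch the pointer swap is commonly done with an index alias.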

Questions to Consider When Designing your Requirements

Now that we’ve established how your search requirements are different than database requirements, what should your team decide before digging too deep into the Elasticsearch API?

What will our search UX look like?
What fields will be stored in the index for showing on the UI vs what fields are just needed to support search?
How often will we test and tweak our mappings to improve search relevance?
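For the second question, one way to record the decision is in the index mapping itself. The sketch below builds a mapping body as a Python dictionary. The field names are hypothetical, and it assumes two Elasticsearch mapping options: "index": false for a field that is returned for display but not searchable, and _source excludes for a field that is searchable but left out of the returned document.

```python
import json

# Hypothetical product index. Adjust field names and types to your data.
product_mapping = {
    "mappings": {
        # Searchable, but not returned with hits.
        "_source": {"excludes": ["search_keywords"]},
        "properties": {
            "title": {"type": "text"},             # searched and displayed
            "category": {"type": "keyword"},       # filtered on and displayed
            "search_keywords": {"type": "text"},   # search support only
            "thumbnail_url": {"type": "keyword", "index": False},  # display only
        },
    }
}


def display_only_fields(mapping):
    """Fields kept for drawing the UI but not indexed for search."""
    props = mapping["mappings"]["properties"]
    return sorted(name for name, spec in props.items() if spec.get("index") is False)


def search_only_fields(mapping):
    """Fields indexed for search but excluded from the returned source."""
    return sorted(mapping["mappings"].get("_source", {}).get("excludes", []))


print(json.dumps(product_mapping, indent=2))
print(display_only_fields(product_mapping))
print(search_only_fields(product_mapping))
```

Writing the split down this way makes it easy to review with the people designing the UI before any data is indexed.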

Working through these requirements takes time, but it will pay off for your team in the long run.


