Dec 7, 2021
The most efficient way to create anything complex is to begin with the first principles.
Creating a search engine is a complex task, and the larger the dataset, the more complex your search experience is likely to become. Fortunately, the first principles of a search engine have been known for decades, since long before digital search engines like Google existed.
It all comes down to an indexing philosophy and tool called precomputation. This article will define precomputation and explain how to use it to build a scalable and user-friendly search engine.
Definition: Precomputation is the work of organizing and categorizing data ahead of time, before any query arrives, so that the search algorithm can sort and retrieve it quickly later. For search engines, which usually exist to sort and process massive amounts of data, precomputation is about feature categorization. The more work you do upfront to precompute the features of your index, the faster, simpler, and more user-friendly your search experience will be.
In other words, the more data your search engine is expected to process and sort through, the more work you should put into precomputation. Precomputation is the foundation of a scalable search experience.
In precomputation, more happens below the surface than above. The simpler, faster, and clearer your search experience, the more precomputation was performed ahead of time. It takes a lot of work to make something look easy.
Precomputation—and indexing in general—has been around for a long time. As long as there have been books, there have been ways of categorizing information so that people can quickly find the features or content they’re looking for.
If you’re trying to process or store lots of information, you need a way to surface the relevant parts quickly, such as an index or a catalog. As information has moved from the physical domain (libraries and books) into the digital domain, we have needed tools to match. Enter search.
Precomputation starts with finding the features that need to be categorized and listed within the search experience. In most search engines, this means creating a list of keywords. These keywords become terms in your index that can be surfaced during a relevant query.
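To make the keyword idea concrete, here is a minimal sketch of an inverted index in Python. It illustrates the general technique under our own assumptions and is not how any particular search engine is implemented: the sample documents, the tokenizing rule (lowercased alphanumeric runs), and the function names `build_index` and `search` are all hypothetical. The expensive pass over every document happens once, in `build_index`; a query then only touches the precomputed sets for its own terms.

```python
import re
from collections import defaultdict

# Hypothetical sample documents; IDs and text are illustrative only.
DOCS = {
    1: "A field guide to birds of North America",
    2: "Birds, bees, and the garden",
    3: "A history of North American railroads",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Precompute: map every keyword (term) to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Query time: intersect the precomputed ID sets for each query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

INDEX = build_index(DOCS)
print(sorted(search(INDEX, "north birds")))  # [1]
```

Because the documents are read only when the index is built, adding more queries costs almost nothing; adding more documents means extending the index, not rewriting the queries.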
While keywords are the best-known feature in most indexes, they are not the only one. Some precomputation work requires indexing features such as color or file size. Which features you index depends on the type of data being searched. (That’s why a powerful search engine starts with the search experience, not the data.)
What are you trying to sort or filter your content by? On a bookshelf, your books can be organized in dozens of different ways: by author, color, genre, and so on. You choose a sorting method based on how you prefer to look for your favorite books.
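The bookshelf example maps onto what many search tools call facets. The sketch below is a hypothetical illustration, not any product’s API: the book records, the `FACETS` tuple, and the helpers `build_facet_index` and `filter_by` are invented for this example. It precomputes, for each chosen feature, which books have which value, so a filter such as "blue sci-fi books" becomes a set intersection rather than a scan of the whole shelf.

```python
from collections import defaultdict

# Hypothetical bookshelf records; the attribute names are illustrative.
BOOKS = [
    {"id": 1, "title": "Dune", "author": "Herbert", "genre": "sci-fi", "color": "orange"},
    {"id": 2, "title": "Emma", "author": "Austen", "genre": "classic", "color": "blue"},
    {"id": 3, "title": "Hyperion", "author": "Simmons", "genre": "sci-fi", "color": "blue"},
]

# The features chosen for precomputation; only these can be filtered on.
FACETS = ("author", "genre", "color")

def build_facet_index(records, facets):
    """Precompute facet -> value -> set of record IDs."""
    index = {facet: defaultdict(set) for facet in facets}
    for record in records:
        for facet in facets:
            index[facet][record[facet]].add(record["id"])
    return index

def filter_by(index, **criteria):
    """Return IDs matching every facet=value pair by intersecting precomputed sets.

    Raises KeyError for a facet that was never precomputed.
    """
    matches = None
    for facet, value in criteria.items():
        ids = index[facet].get(value, set())
        matches = set(ids) if matches is None else matches & ids
    return matches if matches is not None else set()

FACET_INDEX = build_facet_index(BOOKS, FACETS)
print(sorted(filter_by(FACET_INDEX, genre="sci-fi", color="blue")))  # [3]
```

Note the design choice in `filter_by`: a feature that was not selected during precomputation simply cannot be queried, which is why deciding on the features comes first.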
To summarize: Curating the content is a lot of the work that goes into precomputation. Which features of the data you’re working with are you turning into an index?
The work of precomputation is not glamorous. It can be time-consuming. It requires you to do some hard thinking and deep research into what features matter most to people who will be using your search engine.
So, is precomputation actually necessary? What happens when companies skip or gloss over the laborious precomputation step?
As a rule of thumb: the worse your starting index, the more complex your queries need to be to surface relevant results.
Complex queries have many problems. They are hard to change later, especially if the employee who wrote them moves on to another company. If you intend to scale your index at all, your problems multiply, because the query complexity grows with every new batch of data. The complexity also hurts performance, resulting in a slower and generally worse search experience.
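Here is a small sketch of that rule of thumb, built entirely on invented assumptions: the catalog records, the messy color strings, and the `CANONICAL` mapping are hypothetical. In the first function the cleanup logic lives inside the query, so it runs over every record on every search, and each newly discovered variant means editing query code. In the second, the same cleanup runs once when the index is built, and the query shrinks to a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical raw catalog with inconsistently entered colors.
RAW = [
    {"id": 1, "color": "Blue"},
    {"id": 2, "color": " navy blue"},
    {"id": 3, "color": "BLUE "},
    {"id": 4, "color": "red"},
]

# Hypothetical normalization table mapping raw variants to canonical colors.
CANONICAL = {"blue": "blue", "navy blue": "blue", "red": "red"}

def query_without_precomputation(records, wanted):
    """Every query re-cleans every record: a full scan plus normalization logic."""
    hits = set()
    for record in records:
        cleaned = record["color"].strip().lower()
        if CANONICAL.get(cleaned) == wanted:
            hits.add(record["id"])
    return hits

def precompute_color_index(records):
    """Run the same cleaning once, at index time."""
    index = defaultdict(set)
    for record in records:
        canonical = CANONICAL.get(record["color"].strip().lower(), "unknown")
        index[canonical].add(record["id"])
    return index

COLOR_INDEX = precompute_color_index(RAW)

def query_with_precomputation(wanted):
    """Query time is a single lookup; return a copy so callers can't mutate the index."""
    return set(COLOR_INDEX.get(wanted, set()))

print(sorted(query_with_precomputation("blue")))  # [1, 2, 3]
```

Both approaches return the same results here; the difference is where the complexity lives and how often it runs.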
In short: glossing over precomputation means getting a worse search engine, faster. Skipping this step offers a short-term benefit, with problems that will haunt your business as you scale the index or search experience later.
Better to do it right from the beginning because...
Before deciding which search engine to use, you must determine the information you’d like to include in your index. Precomputation is the ground floor. It’s the foundation on which the rest of your search experience is built.
Once you know the features that matter to your users and organize them according to user needs, you can begin to build a scalable search experience.