Nov 19, 2015

Can I have multiple indices on a shard?

Bonsai • Best Practices • 5 min read

We sometimes hear from users concerned about their shard counts. They ask how they can instruct Elasticsearch to add multiple indices to a single shard. This is a surprisingly difficult question to answer in full because the terminology can confuse the issue quite a bit.

In Elasticsearch, every index maps to one or more shards, but the reverse is not true for some pretty sound technical reasons. However, changes to data mapping do afford users the ability to logically separate data on a single shard, thus producing the effect of multiple indices per shard for many use cases.

If that explanation sounds confusing, don’t worry. In this post, we’re going to lay out the issue and the solution as simply as possible.

The problem

Lucene is the Java-based information retrieval software that forms the backbone of Elasticsearch. Elasticsearch adds a few layers of abstraction on top of Lucene to provide features that are extremely useful to production applications: aggregations, analysis, distributed search, and more, all controlled through a well-documented and easy-to-use RESTful API. However, with these features comes a richer glossary of terms, with slightly more subtle definitions.

The biggest point of confusion for users is the word “index.” This word means different things, depending on the context. It can be used:

As a noun in the context of Lucene
As a noun in the context of Elasticsearch
As a verb in the context of either Lucene or Elasticsearch

Catch that? A Lucene index and an Elasticsearch index are very different things, even though Elasticsearch is built on Lucene. In the context of Lucene, an index is basically a Lucene instance. In the context of Elasticsearch, an index is… well, read on.

I mentioned earlier that Elasticsearch offers a rich glossary, with terms like “shard,” “index,” and “type,” which are closely related but distinct concepts. So what distinguishes a shard from an index from a type in Elasticsearch? In the Elasticsearch lexicon:

A shard is a single Lucene instance
An index is a logical namespace that points to one or more shards
A type is a piece of metadata used to logically distinguish between documents of different types within an index
A document in Elasticsearch is essentially a Lucene document with some injected metadata

To distill this all down:

An Elasticsearch shard is a Lucene index. An Elasticsearch index is a collection of one or more Lucene indices. An Elasticsearch type is a bit of metadata injected into the document which allows users to organize different types of documents within a single collection of Lucene indices.
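To make the vocabulary concrete, here is a toy Python model of those definitions. It is only a sketch: the class names `LuceneIndex` and `ElasticsearchIndex`, the sample field values, and the shard-placement rule are all invented for illustration and say nothing about how Elasticsearch is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class LuceneIndex:
    """Stands in for one Lucene instance, i.e. one Elasticsearch shard."""
    documents: list = field(default_factory=list)


@dataclass
class ElasticsearchIndex:
    """A logical namespace that points to one or more shards."""
    name: str
    shards: list

    def add(self, doc_type: str, doc_id: str, fields: dict) -> None:
        # The type is just metadata stored alongside the document's fields.
        doc = {"_index": self.name, "_type": doc_type, "_id": doc_id, **fields}
        # Toy placement rule (an assumption, not Elasticsearch's real routing).
        shard_number = sum(doc_id.encode("utf-8")) % len(self.shards)
        self.shards[shard_number].documents.append(doc)

    def search(self, doc_type=None) -> list:
        # Look in every shard; optionally keep only documents of one type.
        hits = []
        for shard in self.shards:
            for doc in shard.documents:
                if doc_type is None or doc["_type"] == doc_type:
                    hits.append(doc)
        return hits


# One Elasticsearch index backed by a single shard (one Lucene index).
accounts = ElasticsearchIndex("accounts", [LuceneIndex()])
accounts.add("administrator", "1", {"name": "Ada"})
accounts.add("customer", "1", {"name": "Carl"})

print(len(accounts.shards))                              # 1
print([d["name"] for d in accounts.search("customer")])  # ['Carl']
```

Both documents live in the same shard, and the type is the only thing that tells them apart at search time.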

Pedantry! Just tell me what to do!!

Let’s say I have an application and I want to be able to search all of my users. My application has three kinds of users: administrators, moderators and customers. There are a couple of ways I could organize my data. I could create an Elasticsearch index for each type:

<pre><code>PUT localhost:9200/administrators/administrator/1 -d <some fields>
PUT localhost:9200/moderators/moderator/1 -d <some fields>
PUT localhost:9200/customers/customer/1 -d <some fields>

# Search for customers:
GET localhost:9200/customers/_search
</code></pre>

In this configuration, I have three indices, one for each type of user. Elasticsearch therefore requires a minimum of three shards, one for each index. This is kind of a waste, because each index only contains documents of a single type, and Elasticsearch offers the ability to have multiple types in an index.

If I want my index to use fewer shards, I could instead create a single index and use types to distinguish between different users. Maybe my index could be called accounts, and my types would be the same as my user types:

<pre><code>PUT localhost:9200/accounts/administrator/1 -d <some fields>
PUT localhost:9200/accounts/moderator/1 -d <some fields>
PUT localhost:9200/accounts/customer/1 -d <some fields>

# Search for customers:
GET localhost:9200/accounts/customer/_search
</code></pre>

In the second configuration, I have one index containing all three types of users. Elasticsearch therefore only requires a minimum of one shard, and uses the types to scope search results to a particular class of user. My goal of reducing the shard count has been achieved.
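The shard arithmetic for the two layouts can be checked with a few lines of Python. This is a rough sketch: `minimum_shards` is a hypothetical helper that assumes request paths of the form `/index/type/id`, as in the examples above, and assumes each index needs at least one shard.

```python
def minimum_shards(paths):
    """Count the distinct indices in /index/type/id paths.

    Each index needs at least one shard, so the number of distinct
    indices is also the minimum number of shards for the layout.
    """
    indices = {path.strip("/").split("/")[0] for path in paths}
    return len(indices)


# First configuration: one index per kind of user.
per_type_layout = [
    "/administrators/administrator/1",
    "/moderators/moderator/1",
    "/customers/customer/1",
]

# Second configuration: one index, with types separating the users.
single_index_layout = [
    "/accounts/administrator/1",
    "/accounts/moderator/1",
    "/accounts/customer/1",
]

print(minimum_shards(per_type_layout))      # 3
print(minimum_shards(single_index_layout))  # 1
```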

In closing

In this post, we focused more on the basic problem and solution to a particular concern without really addressing the underlying question of why you can’t have multiple indices on a shard. The short answer is that Elasticsearch is magic, and you should just accept this as a reality.

A slightly longer, more technical answer is that Elasticsearch spins up fresh Lucene instances (shards) when an index is created, and handles the coordination of requests between those shards. Elasticsearch takes this approach to make sure that data is spread out for both performance and integrity reasons. That’s why it doesn’t make sense to have multiple indices per shard: asking for that is essentially asking for multiple Lucene instances per Lucene instance, when the Lucene instance is the lowest possible worker unit. That’s also why types exist. Types are used to achieve the functionality of multiple indices per shard, but the terminology can confuse the issue by masking what’s going on internally.
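For a feel of what "spreading data out" and "coordinating requests" might look like, here is a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, `zlib.crc32` is only a stand-in for whatever hash Elasticsearch uses internally, and a real search merges scored results rather than sorted IDs.

```python
import zlib


def pick_shard(doc_id: str, shard_count: int) -> int:
    # Stable stand-in hash, so the same ID always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


def index_documents(doc_ids, shard_count):
    # Each inner list plays the role of one shard (one Lucene instance).
    shards = [[] for _ in range(shard_count)]
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, shard_count)].append(doc_id)
    return shards


def search_all(shards):
    # Fan the request out to every shard, then merge the partial results.
    hits = []
    for shard in shards:
        hits.extend(shard)
    return sorted(hits)


shards = index_documents([f"user-{n}" for n in range(10)], shard_count=3)
for number, shard in enumerate(shards):
    print(number, shard)
print(search_all(shards))
```

Each document lives in exactly one shard, and a search has to visit all of them. That is the sense in which the shard is the lowest-level worker: there is nothing smaller to put an index inside.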

Hopefully this has all helped to clarify the issue. If you have questions, concerns, hate mail or songs of praise, feel free to give us a shout!
