The Problem
One of the biggest hurdles a search developer faces is getting data from one cluster into a new one. In a perfect world we would have fast, reliable reindexing scripts to quickly tear down and rebuild indices. A good example of this pattern is in the elasticsearch-rails gem's import tasks. See also a more in-depth example of an Indexer class in searchyll, the search gem for Jekyll.
That best case isn't always possible, whether due to accumulated tech debt or contextual constraints. For those on our Sandbox and Standard plans, the problem is compounded: in an effort to keep these plans accessible, the Snapshot API is not available on demand. Read more in our write up on this here. In particular, backups aren't taken regularly on non-production plans such as our Sandbox plan. In that case, what options are available? Let's explore a couple of strategies.
Possible Solutions
There are two solutions for reindexing or migrating your cluster when the Snapshot API isn't available. The first is to use the elasticsearch-dump library, and the second is to manage it with a custom solution. Regardless of which way you choose, you'll need to follow this larger process:
- Download your mappings.
- Download a copy of your old cluster data, or design and implement indexing scripts to do it on your own.
- Re-create your indices and the mappings on your new cluster.
- Index your data on the new cluster.
elasticsearch-dump
elasticsearch-dump is a mature JavaScript library that has been around through nearly every release of Elasticsearch. It can download data and mappings, migrate between clusters directly, and handle all sorts of imports and exports in the search engineer's workflow.
The process for getting started is simple:
Download the library:
npm install elasticdump
Copy over mappings and data into the new cluster, either through a download and reindex or directly via URLs.
Here's an example of what a migration might look like:
# Back up index mappings to a file:
elasticdump \
--input=https://key:[email protected]:443/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
# Back up index data to a file:
elasticdump \
--input=https://key:[email protected]:443/my_index \
--output=/data/my_index.json \
--type=data
# Index the data into your new cluster from the file:
elasticdump \
--input=/data/my_index.json \
--output=https://key:[email protected]:443/my_index \
--type=data
You'll need to use your cluster credentials to access your index from a terminal session. See our docs on Cluster Credentials here.
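If both the old and new clusters are reachable from your terminal, elasticdump can also copy directly between them, with no intermediate file, by giving URLs for both `--input` and `--output`. A sketch, assuming placeholder cluster URLs and an index named my_index:

```shell
# Copy the mappings directly from the old cluster to the new one:
elasticdump \
  --input=https://key:[email protected]:443/my_index \
  --output=https://key:[email protected]:443/my_index \
  --type=mapping

# Then copy the documents themselves:
elasticdump \
  --input=https://key:[email protected]:443/my_index \
  --output=https://key:[email protected]:443/my_index \
  --type=data
```

Running `--type=mapping` first matters: if the data arrives before the mappings, Elasticsearch will infer field types dynamically and your carefully tuned analyzers won't apply.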
Managing your own reindex
Much of what elasticdump does can be manually written if necessary, using curl or whatever language you prefer. For example, downloading mappings can be done using curl:
curl -XGET "https://key:[email protected]:443/_mapping?pretty=true" > mappings.json
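Downloading the documents themselves takes a bit more work, since search responses are paginated. One option is the Scroll API, which pages through an entire index. Here's a rough sketch using curl and jq; the cluster URL and index name are placeholders, and error handling is omitted:

```shell
#!/bin/sh
# Placeholder credentials and index name -- substitute your own.
CLUSTER="https://key:[email protected]:443"
INDEX="my_index"

# Open a scroll context and grab the first page of documents.
RESP=$(curl -s "$CLUSTER/$INDEX/_search?scroll=1m" \
  -H 'Content-Type: application/json' \
  -d '{"size": 500, "query": {"match_all": {}}}')
echo "$RESP" | jq -c '.hits.hits[]' > "$INDEX.ndjson"

# Keep paging until a request comes back with no hits.
while [ "$(echo "$RESP" | jq '.hits.hits | length')" -gt 0 ]; do
  SCROLL_ID=$(echo "$RESP" | jq -r '._scroll_id')
  RESP=$(curl -s "$CLUSTER/_search/scroll" \
    -H 'Content-Type: application/json' \
    -d "{\"scroll\": \"1m\", \"scroll_id\": \"$SCROLL_ID\"}")
  echo "$RESP" | jq -c '.hits.hits[]' >> "$INDEX.ndjson"
done
```

Each line of the resulting my_index.ndjson holds one document, so a companion script can read it back and feed the new cluster's Bulk API.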
And later, with a new cluster, you can PUT your new mappings to its corresponding index:
curl -XPUT "https://key:[email protected]:443/index_name/_mapping" \
-H 'Content-Type: application/json' \
-d @mappings.json
It's important to note that the downloaded mappings file nests each index's mappings under the index name, so it will need to be edited or pieced apart before PUTting to the new indices. If you're managing the reindex yourself, either dump the index data with elasticdump as above, or write scripts to reindex directly from your database.
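For example, if you dumped every mapping into a single mappings.json as above, a tool like jq can pull out the body for one index before the PUT. A small sketch, where index_name is a placeholder for one of your indices (this shape suits recent Elasticsearch versions, where PUT _mapping takes the mappings object directly):

```shell
# The _mapping dump nests each index's mappings under the index name,
# e.g. {"index_name": {"mappings": {...}}}. Extract just the mappings
# body for the index you want to recreate:
jq '.index_name.mappings' mappings.json > index_name_mapping.json
```

The resulting index_name_mapping.json can then be PUT to the corresponding index on the new cluster, as in the curl example above.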
Further Resources
Depending on how many versions you are upgrading, you'll need to navigate breaking changes between versions, like the deprecation of mapping types in 6.x and their removal in 7.x. There is extensive coverage of breaking changes in the Elasticsearch documentation. See also our guides on moving between major versions:
- Guide on Upgrading Major Versions with Search
- Upgrading to ES 7
- Upgrading to ES 6
- Upgrading to ES 5
- Upgrading to ES 2
We've seen it all and are here to help. Please reach out to [email protected] and we'll point you in the right direction. Cheers!