Search Snippets

Updates from the Bonsai Elasticsearch team, from One More Cloud: the first and best cloud search platform, since 2009.

Migrating from Elasticsearch to OpenSearch

This post is the third installment of our series on OpenSearch. In this post, we’re going to explore some different ways of migrating data from Elasticsearch to OpenSearch.

If you want to skip around to the other posts in this series, they can be found here:

  1. Welcome to OpenSearch
  2. Up and Running With OpenSearch
  3. How to migrate from an Elasticsearch cluster to OpenSearch

Note: While drafting this document, I tested against the OpenSearch 1.0.0 release candidate and two Elasticsearch versions: 7.10.2 (the last Apache 2.0 release from Elastic, from which OpenSearch is forked) and 7.13.3 (the latest version available at the time, released under the Elastic License 2.0 and SSPL). OpenSearch can restore Elasticsearch snapshots taken on versions 6.0.0 through 7.11.2; if your Elasticsearch data predates 6.0.0, you’ll need to use the Reindex API or some other workaround.
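
The compatibility window in the note above can be written down as a small check. This is only a sketch: the function names are hypothetical, and the bounds are copied from the range quoted in this post rather than read from any OpenSearch API.

```python
# Hypothetical helper; bounds are the range quoted in this post (6.0.0 through 7.11.2).
OLDEST_RESTORABLE = (6, 0, 0)
NEWEST_RESTORABLE = (7, 11, 2)


def parse_version(version: str) -> tuple:
    """Turn '7.10.2' into (7, 10, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def snapshot_restorable(es_version: str) -> bool:
    """True if a snapshot from this Elasticsearch version falls inside the quoted range."""
    return OLDEST_RESTORABLE <= parse_version(es_version) <= NEWEST_RESTORABLE
```

Anything that fails this check (for example 5.x or 7.13.3) would go through the Reindex API instead.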

Conventions

For simplicity, the examples in this post assume that both clusters are running on a local machine. My local Elasticsearch cluster was running on the standard port 9200, while my local OpenSearch cluster was running on port 9090.

There are some minor configuration changes needed to accommodate this setup, but I won’t go into them here because they aren’t germane to getting data out of Elasticsearch and into OpenSearch. If you do need OpenSearch to run on a different port, add http.port: 9090 (or whatever port you want) to config/opensearch.yml and restart the server.

Using Snapshots

The fastest, easiest, and most efficient way to migrate data out of Elasticsearch is via the Snapshot API. The OpenSearch Snapshot API is fully compatible with the Elasticsearch Snapshot API for snapshots taken on versions before Elasticsearch 7.12.0. (Sadly, snapshots taken in Elasticsearch 7.12.0 and beyond are not yet compatible with OpenSearch 1.0.0, due to a change Elastic shipped related to repository UUIDs.)

Migrating from Elasticsearch to OpenSearch only requires a snapshot repository that is shared between the two clusters. A snapshot of the Elasticsearch data stored in this repository will then be visible to, and restorable by, the OpenSearch cluster!

There are a number of different ways to set up an Elasticsearch snapshot repository, and this article won’t go into that in depth. For the purposes of illustration, it’s simply assumed that there is a local directory set up and registered in the Elasticsearch cluster with the repository name “backups”.

Calling /_snapshot/backups in the Elasticsearch cluster returns something that looks like this:

GET localhost:9200/_snapshot/backups
{
  "backups" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/snapshots"
    }
  }
}

These exact same settings can be used to register the same repository in OpenSearch:

PUT localhost:9090/_snapshot/backups
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots"
  }
}
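
If you would rather script this step, the registration body can be derived straight from the Elasticsearch response. The sketch below uses only the Python standard library; the function name is hypothetical, and it builds the request without sending it.

```python
import json
import urllib.request


def registration_request(get_response: dict, repo: str, target: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that registers `repo` on the target cluster."""
    body = {
        "type": get_response[repo]["type"],
        "settings": get_response[repo]["settings"],
    }
    return urllib.request.Request(
        url=f"{target}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# The response shown earlier for GET localhost:9200/_snapshot/backups
elasticsearch_repo = {"backups": {"type": "fs", "settings": {"location": "/mnt/snapshots"}}}
request = registration_request(elasticsearch_repo, "backups", "http://localhost:9090")
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```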

Now, both clusters have access to the same repository! Suppose you want to migrate an index called production_data. You can take a snapshot in Elasticsearch like this:

PUT localhost:9200/_snapshot/backups/elasticsearch_backup
{
  "indices": "production_data"
}

And it will then be visible in OpenSearch:

GET localhost:9090/_snapshot/backups/elasticsearch_backup
{
  "snapshots" : [
    {
      "snapshot" : "elasticsearch_backup",
      "uuid" : "1-A-67XcRAOV-opwNi8YqA",
      "version_id" : 7100299,
      "version" : "7.10.2",
      "indices" : [
        "production_data"
      ],
      "data_streams" : [ ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2021-07-08T16:32:52.404Z",
      "start_time_in_millis" : 1625761972404,
      "end_time" : "2021-07-08T16:33:40.032Z",
      "end_time_in_millis" : 1625762020032,
      "duration_in_millis" : 47628,
      "failures" : [ ],
      "shards" : {
        "total" : 6,
        "failed" : 0,
        "successful" : 6
      }
    }
  ]
}
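
Before restoring, it is worth confirming that the snapshot actually finished cleanly. A minimal sketch of that check, assuming the response shape shown above (the function name is hypothetical):

```python
def ready_to_restore(response: dict, snapshot_name: str) -> bool:
    """True if the named snapshot finished successfully with no failed shards."""
    for snapshot in response.get("snapshots", []):
        if snapshot["snapshot"] == snapshot_name:
            return snapshot["state"] == "SUCCESS" and snapshot["shards"]["failed"] == 0
    return False  # The snapshot is not visible in this repository


# Trimmed copy of the response shown above
listing = {
    "snapshots": [
        {
            "snapshot": "elasticsearch_backup",
            "state": "SUCCESS",
            "shards": {"total": 6, "failed": 0, "successful": 6},
        }
    ]
}
```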

Because OpenSearch can now see the snapshot in the shared repository, it can restore the data to itself. The request below needs no body; the JSON shown is the acknowledgement OpenSearch returns:

POST localhost:9090/_snapshot/backups/elasticsearch_backup/_restore
{
  "accepted": true
}
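
The restore endpoint also accepts an optional request body, which is handy if you want to restore only some indices, skip the cluster-wide state, or rename indices on the way in. Here is a sketch of building such a body; the helper name and the "_restored" suffix are just illustrations.

```python
import json


def restore_body(indices, suffix="", include_global_state=False) -> dict:
    """Build a restore request body limited to `indices`, optionally renaming them."""
    body = {
        "indices": ",".join(indices),
        "include_global_state": include_global_state,
    }
    if suffix:
        # Capture the whole index name and append the suffix to it
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = f"$1{suffix}"
    return body


print(json.dumps(restore_body(["production_data"], suffix="_restored"), indent=2))
```

Renaming on restore avoids a clash if an open index with the same name already exists on the OpenSearch cluster.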

Now, simply monitor the recovery with the /_cat/recovery endpoint:

GET localhost:9090/_cat/recovery?active_only=true
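
If you want to poll recovery from a script, adding format=json to the request returns rows that are easy to filter. The sketch below assumes those rows carry the index, shard and stage columns; the sample rows are made up for illustration, not captured output.

```python
def shards_still_recovering(rows: list) -> list:
    """Return 'index/shard' labels for every recovery whose stage is not 'done'."""
    return [f"{row['index']}/{row['shard']}" for row in rows if row["stage"] != "done"]


# Illustrative rows in the shape of GET /_cat/recovery?format=json (not real output)
sample = [
    {"index": "production_data", "shard": "0", "stage": "done"},
    {"index": "production_data", "shard": "1", "stage": "index"},
]
print(shards_still_recovering(sample))
```

An empty list means no shard is still mid-recovery.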

That’s it! Your Elasticsearch data has been successfully migrated to OpenSearch!

Using the Reindex API

Well, what if you want to migrate your Elasticsearch data from a 5.x or 7.12.0+ cluster to OpenSearch? Or what if you only want to migrate a small piece of your data just to experiment with OpenSearch? Reindex API to the rescue!

The Reindex API works by reading documents from one index in batches and writing them into another index. It is possible for the source data to be on another cluster, and, it turns out, even on another search platform.

While not required, it’s usually a good idea to pull down your mappings and settings from the source index and use them to create the target index before running the Reindex API. This ensures that the data will be treated exactly the same in each index.

There are a few ways to do this. If you’re working in a terminal, you can create the new index in OpenSearch with curl and jq:

curl -s -XPUT -H 'Content-Type: application/json' "localhost:9090/production_data" -d "$(
  curl -s localhost:9200/production_data | jq '
    del(
      .production_data.settings.index.uuid,
      .production_data.settings.index.creation_date,
      .production_data.settings.index.version,
      .production_data.settings.index.provided_name
    ) | .production_data'
)"
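
If jq isn’t available, the same clean-up can be done in a few lines of Python. This is a sketch of the equivalent transformation, with a hypothetical function name; it removes the same four settings as the command above and leaves everything else alone.

```python
import copy

# Settings that describe the source index itself and should not be sent on creation
SOURCE_ONLY_SETTINGS = ("uuid", "creation_date", "version", "provided_name")


def creation_body(get_index_response: dict, index: str) -> dict:
    """Return the index definition with source-specific settings removed."""
    body = copy.deepcopy(get_index_response[index])  # leave the caller's dict untouched
    index_settings = body.get("settings", {}).get("index", {})
    for key in SOURCE_ONLY_SETTINGS:
        index_settings.pop(key, None)
    return body
```

The returned dictionary can be serialized with json.dumps and sent as the body of the PUT to the OpenSearch cluster.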

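That will create the new index in the OpenSearch cluster with settings and mappings identical to those in the Elasticsearch cluster. Before reindexing from a remote cluster, the source host has to be whitelisted on the OpenSearch side by adding reindex.remote.whitelist: "localhost:9200" to config/opensearch.yml and restarting. After that, data can be moved from Elasticsearch to OpenSearch with a simple call to the Reindex API: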

curl -s -XPOST -H 'Content-Type: application/json' "http://localhost:9090/_reindex?wait_for_completion=false" -d '{
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "production_data",
    "size": 1000
  },
  "dest": {
    "index": "production_data"
  }
}'
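
The same request body can be built programmatically, which makes it easy to add an optional query so that only matching documents are copied. In this sketch the helper name and the created_at field are hypothetical; substitute a field that exists in your own mapping.

```python
def reindex_body(remote_host, source_index, dest_index, query=None, batch_size=1000) -> dict:
    """Build a remote reindex body, optionally restricted to documents matching `query`."""
    source = {
        "remote": {"host": remote_host},
        "index": source_index,
        "size": batch_size,  # documents fetched per batch
    }
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


# Hypothetical example: copy only documents created in 2021 or later
recent_only = reindex_body(
    "http://localhost:9200",
    "production_data",
    "production_data",
    query={"range": {"created_at": {"gte": "2021-01-01"}}},
)
```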

That will return a Task ID that you can use to monitor the progress. Migrating data this way is much slower than using the Snapshot API, but there are some benefits. You can pass a query into the request so that only a subset of your data is pushed into OpenSearch. You can also create the target index with a different number of primary shards so the new index has a better shard scheme. And of course, you can migrate data from Elasticsearch versions outside the 6.0.0 to 7.11.2 range that snapshots support.
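
To turn the Task ID into a progress figure, you can fetch GET localhost:9090/_tasks/<task_id> and compare the counters in its status block. The sketch below assumes the response carries completed, task.status.total and the created/updated/deleted counters; the function name is hypothetical.

```python
def reindex_progress(task_response: dict) -> float:
    """Fraction of documents processed so far, from a GET /_tasks/<task_id> response."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # Nothing counted yet, or nothing to copy at all
        return 1.0 if task_response.get("completed") else 0.0
    processed = status.get("created", 0) + status.get("updated", 0) + status.get("deleted", 0)
    return processed / total
```

Skipped or conflicting documents are not counted here, so treat the result as a rough indicator rather than an exact measure.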

Wrapping Up

Migrating data in production can be a complex issue, and everyone is going to have different constraints, risk tolerances and sensitivity to downtime. Fortunately, moving data from Elasticsearch to OpenSearch is a conceptually straightforward process. It’s not much different from moving production data between clusters, or even between versions of Elasticsearch. There are well-tested approaches to Elasticsearch data migrations in production, most of which will also facilitate a migration to OpenSearch.

If you have any questions about migrating to OpenSearch on Bonsai, please feel free to reach out at support@bonsai.io or share your thoughts on Twitter: @bonsaisearch.
