
When a single node on a Bonsai cluster fails, what is the likely impact on system availability, and what are the possible recovery steps?

Last updated
June 17, 2023

All production Bonsai clusters are deployed to a minimum of three nodes, both for redundancy and to prevent stalemates in leader election. Each node in the cluster is deployed to a separate AWS Availability Zone, which gives the cluster data center isolation as well.

Because of this layout, a Bonsai cluster can lose an entire AWS data center and still continue to operate. This makes Bonsai clusters extremely fault-tolerant.
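The reasoning behind the three-node minimum can be sketched with simple majority arithmetic. The snippet below is illustrative only and is not Bonsai's implementation: the function names `quorum` and `can_elect_leader` are hypothetical, and it assumes that a leader is elected by a strict majority of the cluster's nodes.

```python
def quorum(total_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority (assumed election rule)."""
    return total_nodes // 2 + 1


def can_elect_leader(total_nodes: int, failed_nodes: int) -> bool:
    """True if the surviving nodes can still form a majority."""
    return total_nodes - failed_nodes >= quorum(total_nodes)


# Compare a two-node and a three-node cluster after losing one node.
for total in (2, 3):
    print(f"{total} nodes, 1 lost -> leader electable: {can_elect_leader(total, 1)}")
```

Under this assumption, a two-node cluster that loses one node cannot form a majority, while a three-node cluster that loses one node still can, which is why three is the smallest size that tolerates a single failure.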

When a Bonsai cluster loses a node, Elasticsearch or OpenSearch automatically reroutes the affected primary and replica shards to the nodes that are still up and running. In the background, an AWS Auto Scaling Group immediately begins launching a replacement instance, which bootstraps itself with your cluster's Elasticsearch or OpenSearch configuration and version. Once the replacement node has been provisioned, it joins the cluster, and Elasticsearch or OpenSearch then moves the relocated shards back onto the new, empty node.
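If you want to observe these stages from the client side, one possible approach is to read the cluster health endpoint (`GET /_cluster/health`), which both Elasticsearch and OpenSearch expose. The sketch below is a rough illustration, not an official Bonsai tool: the function name `recovery_phase`, the phase labels, and the sample payloads are all hypothetical, and fetching the response (including authentication) is left out.

```python
def recovery_phase(health: dict, expected_nodes: int) -> str:
    """Roughly classify a _cluster/health response body during a node loss.

    `health` is the parsed JSON body of GET /_cluster/health.
    `expected_nodes` is the normal node count for the cluster (an assumption
    the caller supplies; it is not part of the API response).
    """
    moving = health["relocating_shards"] + health["initializing_shards"]
    if health["number_of_nodes"] < expected_nodes:
        return "degraded"      # a node is missing; survivors carry the shards
    if health["unassigned_shards"] > 0 or moving > 0:
        return "rebalancing"   # replacement has joined; shards are moving onto it
    if health["status"] == "green":
        return "recovered"
    return "unknown"


# Illustrative payloads only; the values are made up for this example.
during_outage = {"status": "yellow", "number_of_nodes": 2,
                 "unassigned_shards": 4, "relocating_shards": 0,
                 "initializing_shards": 0}
after_rejoin = {"status": "green", "number_of_nodes": 3,
                "unassigned_shards": 0, "relocating_shards": 2,
                "initializing_shards": 0}

print(recovery_phase(during_outage, expected_nodes=3))
print(recovery_phase(after_rejoin, expected_nodes=3))
```

The classification is deliberately coarse. A real monitor would also want to handle request failures and a `red` status, which this sketch lumps into `unknown` or `degraded`.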

An event like this is handled as a Severity 1 incident.
