All production Bonsai clusters are deployed to a minimum of three nodes for redundancy and to prevent stalemates in leader election. Each node in the cluster is deployed to a separate AWS Availability Zone, which gives us data center isolation as well.
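The reason three nodes avoid an election stalemate comes down to majority arithmetic. The sketch below is a general illustration of majority quorum, not Bonsai's actual configuration. It assumes every node is eligible to vote in the election, and the function names are hypothetical.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return voting_nodes // 2 + 1


def can_elect_leader(total_nodes: int, nodes_lost: int) -> bool:
    """True if the surviving nodes still form a majority of the original set."""
    surviving = total_nodes - nodes_lost
    return surviving >= quorum_size(total_nodes)


# Compare a two-node and a three-node cluster after losing a single node.
for total in (2, 3):
    print(
        f"{total} nodes: quorum={quorum_size(total)}, "
        f"survives one loss={can_elect_leader(total, 1)}"
    )
```

With two nodes, losing either one leaves a single voter that cannot form a majority. With three nodes, the two survivors can still agree on a leader.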
When a Bonsai cluster loses a node, Elasticsearch or OpenSearch automatically reroutes the primary and replica shards to the nodes that are still up and running. In the background, AWS Auto Scaling Groups immediately begin spinning up a replacement instance, which auto-bootstraps with your cluster's Elasticsearch or OpenSearch configuration and version. Once the node has been provisioned, it joins the cluster, and Elasticsearch or OpenSearch then moves the relocated shards back onto the new, empty node.
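If you want to observe this recovery from the client side, one option is to poll the standard `_cluster/health` endpoint that both Elasticsearch and OpenSearch expose. The following is a minimal sketch under stated assumptions: `base_url` is a placeholder for your cluster URL, authentication is omitted, and the labels returned by `summarize_health` are illustrative rather than anything Bonsai reports.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> str:
    """Turn a _cluster/health response into a coarse recovery label."""
    status = health.get("status")
    # Shards that are currently being moved or rebuilt on a node.
    moving = health.get("relocating_shards", 0) + health.get("initializing_shards", 0)
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        # Red means at least one primary shard is not allocated.
        return "degraded"
    if status == "green" and moving == 0 and unassigned == 0:
        return "recovered"
    # Yellow status, or shards still relocating or initializing.
    return "recovering"


def fetch_health(base_url: str) -> dict:
    """Fetch cluster health. base_url is a hypothetical placeholder."""
    with urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


# Illustrative response shape while replicas are being rebuilt after a node loss.
sample = {
    "status": "yellow",
    "number_of_nodes": 2,
    "relocating_shards": 0,
    "initializing_shards": 3,
    "unassigned_shards": 1,
}
print(summarize_health(sample))
```

In practice you would call `summarize_health(fetch_health(base_url))` on an interval and stop once it returns `"recovered"`.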
We treat a node loss like this as a Severity 1 incident.
If a Bonsai cluster completely loses two AWS data centers, it will experience downtime until the primary shards are restored on the last remaining node. To mitigate this risk on an Enterprise cluster, we can discuss a setup that includes multi-region deployment.