March was a hard month for our availability. We suffered two major incidents, which between them caused multiple hours of degraded performance and four full cluster restarts, each contributing nearly 20 minutes of hard downtime.
Our March uptime, as reported by Pingdom health checks, was 99.73%. That is the second-lowest monthly figure in our history, and the worst since we launched into production in January.
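To put that percentage in perspective, here is a minimal sketch that converts a monthly uptime figure into minutes of downtime. The function name `downtime_minutes` is ours, for illustration only, and the result is approximate, since it depends on how often the health checks sample the service.

```python
def downtime_minutes(uptime_percent: float, days_in_month: int) -> float:
    """Convert a monthly uptime percentage into minutes of downtime."""
    total_minutes = days_in_month * 24 * 60  # minutes in the month
    return total_minutes * (100.0 - uptime_percent) / 100.0


# March has 31 days; 99.73% is the uptime figure reported above.
march_downtime = downtime_minutes(99.73, 31)
print(f"{march_downtime:.1f} minutes")  # prints "120.5 minutes"
```

By this arithmetic, 99.73% over a 31-day month works out to roughly two hours in which health checks failed.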
We are humbled, both by the scope of responsibility that comes with managing infrastructure and by the trust and support shown by our customers. Thank you for bearing with us as we redouble our efforts to build the best possible hosted Elasticsearch service for you.
For more information, read our previous post-mortem blog post detailing the March 4th outage. We suffered a similar incident on March 9th, but have since mitigated most of the underlying causes.