Dashboard Metrics

A breakdown of Metrics in the Bonsai App: navigating metrics, metric utilities and metrics overview.
Last updated
June 16, 2023

Bonsai cluster dashboard’s Metrics is the place to troubleshoot cluster traffic issues and view performance metrics. This article will cover:

  • Navigating to Metrics
  • Metrics Utilities
  • Metrics Overview

Navigating to Metrics

Metrics is located in each cluster’s dashboard. Log into Bonsai, click on your cluster, and click on Metrics within the left sidebar:

Metrics Utilities

Time Window Selector

Use this selector to choose between four window sizes for metrics in the:

  • (1h) last 1 hour
  • (1d) last 24 hours
  • (7d) last 7 days
  • (28d) last 28 days

Time Scrubber

Click the left and right arrows to go back or forth in time within the selected window size.

UTC and Local Timezone Toggle

Click on the timezone to toggle between displaying the graph timestamps in UTC time or your local browser timezone.


You can drill down to smaller windows of time on any graph by clicking and dragging to select a time range.

Metrics Overview

More information doesn’t necessarily mean more clarity. When something happens to your traffic and cluster responses, it’s important to know how to see your metrics and draw conclusions.

We’ll cover what each graph displays and some examples of what they will look like given certain use cases (such as periods of high-traffic, or clusters in a normal state compared to ones that are experiencing downtime). We’ll start with the most information-dense graph: the Requests heat map.

Requests (Counts & Duration) heat map

This graph reveals how fast requests are. Each column in the graph represents a “slice” of time. Each row, or “bucket”, in the slice represents duration speed. The "hotter" a bucket is colored, the more requests there are in that bucket. To further help visualize the difference in the quantity of requests for each bucket, every slice of time can be viewed as a histogram on hover.

Example 1

This heat map displays a cluster with consistent traffic with most requests in the 200-300ms range to complete.

Example 2

This cluster has light, sporadic traffic. It’s important to note that the "heat" color of every bucket is determined relative to the other data in the graph - so a side-by-side comparison of two request heat maps using color won’t be accurate.

Request Counts

This graph shows the number of requests handled by the cluster at a given time.

Request Duration Percentiles

This graph, similar to the Requests heat map, shows a distribution of request speed based on 3 percentiles of the requests in that time slice: p50 (50%), p95 (95%), and p99 (99%). This is helpful in determining where the bulk of your requests sit in terms of speed and how slow the outliers are.

Proxy Queue time

Proxy Queue time is the total amount of time requests were queued (or paused) at our load balancing layer. Queue time is ideally 0; however, in the event that you send many requests in parallel, our load balancer will queue up requests while waiting for executing requests to finish. This is part of our Quality of Service layer.


Concurrency shows the number of requests that are happening at the same time. Since clusters are limited on concurrency, this can be an important one to keep an eye on. When you reach your plan’s max concurrency, you will notice queue time start to consistently increase.


This graph shows the amount of data crossing the network - going into the cluster (shown in green) and coming from the cluster (in blue).

We expect most bandwidth graphs to look something like the graph below — a relatively higher count of "From Client" data (read requests) compared to "To Client" data (write or indexing requests).

The relationship between green to blue bars in this graph really depends on your use-case. A staging cluster, for example, might see a larger ratio of Write:Read data. It’s important to note that this graph deals exclusively in data - a high-traffic cluster will probably see a lot of data coming “From” the cluster, but a low-traffic cluster with very complicated queries and large request bodies will also have a larger “From Client” data than would otherwise be expected. Therefore, it’s helpful to look at request counts to get a feeling for the average "size" of a request.

Response Codes

This graph can do two things:

  • It will confirm that responses are successful: 2xx responses. This means that everything is moving along well and requests are formed correctly.
  • In the less positive case, it can be a debugging tool that can help figure out where any buggy behavior is coming from. In general, 4xx requests are the result of a malformed query from your app or some client, whilst a 5xx request is from our end.

It’s important to note while reading this graph, that 5xx requests don’t necessarily mean that your cluster is down. A common situation on multitenant plans is a cluster that’s getting throttled by a noisy neighbor who’s taking up a lot of resources on the server. This can interrupt some(but not all) normal behavior on your cluster, resulting in a mix of 2xx and 5xx requests.

Tolerance for a few 5xx requests every now and then should be expected with any cloud service. We’re committed to getting all production clusters a 99.99% uptime (i.e. expected 0.001% 5xx responses), and we often have a track record of four 9’s and higher.

We have a lot of people that are very sensitive to 5xx requests. In these cases, it’s usually best to be on a higher plan or a single tenant plan. Reach out to us at if this is something your team needs.

Business and Enterprise Plans - Additional Metrics

Clusters running on Business and Enterprise grade clusters will have access to some additional metrics. If you would like to get access to these metrics for your cluster, please reach out and our team will walk you through the process.

System Load

System load is the average number of processes waiting on a resource within a given period of time. This is often reported in 1, 5 and 15 minute windows. The Bonsai dashboard shows the system load average over the past minute.

It is helpful to think of system load as how saturated a node is with tasks; as long as the node's load average is lower than the number of its available CPUs, the node is able to handle all of its work without getting backed up. If the load average is larger than the number of its available CPUs, that means that some tasks are being scheduled. When tasks are delayed like this, performance suffers, and the performance impact is correlated to how high the load average gets.

Elasticsearch Queue

Elasticsearch utilizes a number of thread pools for handling various tasks. These pools are also backed by a queue, so that if an tasks is created and a thread is not available to execute it, the task is queued until a thread becomes available. This metric shows the total number of tasks sitting in an Elasticsearch queue.

It's important to note that this is not the same as a request queue. A single request can result in multiple tasks being created within Elasticsearch. It is also important to note that these queues have a finite length. If the queue is full and an additional task is created, Elasticsearch will reject it with a message like "rejected execution (queue capacity 50)" and an HTTP 429 response.

Elasticsearch Bulk

This metric shows the number of _bulk requests that have been processed over time. Bulk requests are an efficient way to insert or update data in your cluster. Naturally, your payload sizes need to be more than 1-2 documents in order to get the benefits of bulk updates. Usually batches of 50-500 are ideal.

Elasticsearch Search

This metric shows the number of search requests that have been processed by Elasticsearch. It's important to distinguish between user searches and shard searches. A user search is performed by your application (often in response to some user action or search in the app), and may translate into multiple shard searches. This is true if your indices have multiple primary shards, or if you're searching across multiple indices. A query for the top X results will be passed to all relevant shards; each shard will perform the search and return the top X results. The coordinating node is then responsible for collating those results, sorting, and returning the top X.

Elasticsearch does utilize a thread pool specifically for searches, so depending on the types of searches you're running, and the volume of search traffic, it's possible to get a message like "rejected execution (queue capacity 50)" and an HTTP 429 response.

JVM Heap

Elasticsearch is a Java-based search engine that runs in the Java Virtual Machine (JVM). The JVM has a special area of memory where objects are stored, called "heap space." This space is periodically garbage-collected, meaning objects that are no longer in use are destroyed to free up space in memory.

This metric shows the percentage of heap space that is currently occupied by Elasticsearch's objects and arrays. Lower is ideal, because high heap usage generally means more frequent, and longer-lasting garbage collection pauses, which manifests as slower performance and higher latency.

JVM Young GC Time

This metric shows the amount of time in milliseconds that the JVM spent reclaiming memory from new or short-lived objects. This type of garbage collection is expected and shouldn't lead to HTTP 503 or 504 errors. However, if it is chronic and frequent, it may lead to slow performance.

JVM Old GC Time

This metrics shows the amount of time in milliseconds the JVM spent reclaiming memory from long-surviving objects. The JVM periodically pauses the application so it can free up heap space, which means some operations are stopped for a period of time. This can result in perceived slow response times, and in some extreme cases can lead to system restarts and HTTP 503 and 504 responses.


This metric shows the percentage of the time that the CPU(s) waited on IO operations. This means that an operation requested IO (like reading or writing to disk) and then had to wait for the system to complete the request. A certain amount of IOWait is expected for any IO operation, and usually it's on the order of nanoseconds. This isn't indicative of a problem on its own.

Excessive wait times are problematic though. It is usually correlated to high system load and a high volume of updates. Sometimes that can be addressed through hardware scaling or better throttling of updates. In rare cases it can indicate a problem with the hardware itself, like an SSD drive failing.

CPU User

This metric shows the percentage of the time that the CPU(s) were executing code in user space. The user space is where all code runs, outside of the operating system's kernel. On Bonsai, this space is primarily dedicated to Elasticsearch, so the metric is roughly the amount of time the CPU spent processing instructions by the Elasticsearch code.

The metric can vary widely between clusters, based on application, hardware and use case. There is not necessarily an ideal value or range for this metric. However, large spikes or long periods of high processing times can manifest as poor performance. It often indicates that the hardware is not able to keep up with the demands of the application, although it can sometimes indicate a problem with the hardware itself.

View code snippet
Close code snippet