Good news, everyone! Elasticsearch 0.90.0 was released today. Version 0.90 brings some excellent improvements for Elasticsearch users everywhere, and represents a large step forward toward its eventual 1.0 release.
We are getting ready to upgrade to 0.90.0 very soon. More on our plans in a bit. First, I’d like to give a sneak peek at some of 0.90.0’s biggest improvements, as well as a quick overview of the state of Bonsai itself.
What’s new with Bonsai the business?
It’s been a quiet month on the Bonsai blog. So what have we been up to?
First of all, we’ve been keeping up with you. The last few months have humbled us with very solid growth. We are rapidly approaching 1,000 active accounts, having long since passed 1,000 total (hello, freemium pricing plans!).
This is not our first time building a hosted search service, but it’s still encouraging to see a service bootstrap itself to profitability in the span of a few short months! With all signs pointing up, both for us and for Elasticsearch proper, you can bet that we’ll be around for a while.
Over the last few weeks we have quietly released a lot of bug fixes and performance improvements to our systems, as well as support for a couple of Elasticsearch APIs that we initially launched without. Our thanks to the customers who helped us identify odd edge-case bugs, particularly in places where our account management and multitenancy intersect with the Elasticsearch API!
What’s new in Elasticsearch 0.90?
Lucene is the underlying full-text indexing library used by Elasticsearch, and is itself the industry standard for open source full-text search. Last fall, Lucene released version 4.0, a major upgrade that brought with it a huge number of new features and performance improvements.
Lucene 4 offers a number of notable improvements in performance and efficiency, along with a slew of faster and better abstractions for search engine developers to build upon. The bottom line: you get faster search results in a more robust cluster.
And Elasticsearch 0.90 is shipping with the latest Lucene 4.2.1. That’s over six months of new features and improvements building on top of Lucene 4.0, putting all of those powerful new abstractions to work.
Probably the biggest improvement that our users will notice is much better memory usage when loading fielddata for faceting or sorting on a field. Fielddata uses less memory and makes it easier for the garbage collector to do its work, resulting in more stable clusters. This change alone makes it worth upgrading.
Indeed, the most common cause of query slowdowns that we see is memory pressure from queries with heavy faceting and sorting, along with its negative impact on garbage collection behavior. We’re particularly happy to see these kinds of improvements coming to Lucene and Elasticsearch.
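For a concrete picture of the kind of request that loads fielddata, here is a minimal sketch in Python that builds a search body with a terms facet and a sort. The field names `tags` and `created_at` are hypothetical, and the helper `faceted_search_body` is ours for illustration, not part of any client library.

```python
import json


def faceted_search_body(facet_field, sort_field, size=10):
    """Build a search request body that facets on one field and sorts on another.

    Both the facet and the sort need per-document field values,
    which is what Elasticsearch loads into memory as fielddata.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            # A terms facet counts the most frequent values of the field.
            facet_field: {"terms": {"field": facet_field, "size": size}}
        },
        "sort": [{sort_field: {"order": "desc"}}],
    }


# Hypothetical field names, for illustration only.
body = faceted_search_body("tags", "created_at")
print(json.dumps(body, indent=2))
```

A body like this would be sent to an index’s `_search` endpoint. The more distinct values the faceted or sorted field holds, the more fielddata has to be kept in memory.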
Here are a few other examples, among many, of the improvements in Lucene 4, via Lucene contributor Mike McCandless:
Before concurrent flushing, whenever IndexWriter needed to flush a new segment, it would stop all indexing threads and hijack one thread to perform the rather compute intensive flush… But with concurrent flushing, each thread freely flushes its own segment even while other threads continue indexing. No more bottleneck!
Fuzzy queries are great for flexibly matching terms which may have spelling mistakes or variations. This is particularly helpful for matching proper nouns — something I can personally appreciate given my last name, “Zadrozny.”
Rather than brute-forcing their way through every term in the index, as they used to, fuzzy queries in Lucene 4 use a pre-generated finite-state automaton to quickly and efficiently find the terms within a given Levenshtein distance of the query term. Be sure to read Mike’s post in its entirety to see one of the many herculean feats of engineering that went into Lucene 4.
Efficiency improvements here also show up in suggesting similarly-spelled terms for spelling correction, and for autocomplete term suggestion.
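To get a feel for why the old brute-force approach to fuzzy matching was slow, here is a rough sketch of it in Python: compute the edit distance between the query and every single term, and keep the close ones. This is an illustration of the idea only, not Lucene’s code, and the term list and the `max_edits` default are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def brute_force_fuzzy(query, terms, max_edits=2):
    """Scan every term and keep those within max_edits of the query."""
    return [term for term in terms if levenshtein(query, term) <= max_edits]


# A made-up term dictionary, for illustration only.
terms = ["zadrozny", "zadrozni", "zadrosny", "zebra", "madrona"]
print(brute_force_fuzzy("zadrozny", terms))
```

Every lookup here touches every term, so the cost grows with the size of the index. The automaton approach avoids that full scan by walking only the parts of the term dictionary that can still match.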
Elasticsearch 0.90 webinar
There is a lot to unpack in Elasticsearch 0.90, and this Thursday, at 9:00am PST (18:00 CET), Elasticsearch developer Clinton Gormley will be holding a webinar to talk more about what’s new. We’ll be there, and if you use Elasticsearch, you should be too!
Register here for the webinar. (It’s free!)
Scheduled maintenance window
Upgrading from Elasticsearch 0.20 to 0.90 will unfortunately require a full cluster shutdown and restart. While we are doing everything in our power to minimize the impact of this restart, you should expect a brief window of downtime.
We are scheduling our upgrade for next Monday, May 6th, at 16:00 PDT / 23:00 UTC.
Will you need to reindex your data?
No, you do not need to reindex.
Fortunately, each major version of Lucene can read indexes written by the previous major version. This means that Lucene 4 can read the Lucene 3 index format used by Elasticsearch 0.20, with a negligible performance decrease. Upgrading to the Lucene 4 format happens in the background, over time, as older Lucene 3 segments of your index are merged into newer-format Lucene 4 segments.
Other changes you need to make to your applications
You should read the Elasticsearch 0.90 download and changelog page for full details on bug fixes, new features and one breaking change in 0.90.
We encourage all of our customers to take some time this week to try Elasticsearch 0.90 in their local development environment to ensure that everything is working as expected.
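If you want a quick sanity check that your local node really is running 0.90, something along these lines should do. It is a sketch that assumes a node on the default `http://localhost:9200` and reads the `version.number` field from the root endpoint’s JSON response; the function names are our own.

```python
import json
from urllib.request import urlopen


def fetch_cluster_info(base_url="http://localhost:9200"):
    """Fetch the JSON document served at the node's root endpoint."""
    with urlopen(base_url) as response:
        return json.loads(response.read().decode("utf-8"))


def is_version_090(info):
    """Return True if the reported version number is in the 0.90 series."""
    number = info.get("version", {}).get("number", "")
    return number == "0.90" or number.startswith("0.90.")
```

With a local node running, `is_version_090(fetch_cluster_info())` should return `True` once you have upgraded your development environment.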
Any other questions? Let us know!
We’re here to help. If you have questions, just let us know with a quick email to firstname.lastname@example.org.