Elasticsearch and the IllegalArgumentException (docID must be >= 0)

Rob Sears · April 06, 2015
4 minute read

We sometimes get support tickets from users asking about this error message. They report that some queries, like hotdog, work just fine while others, like hamburger, return an error like IllegalArgumentException[docID must be >= 0 and < maxDoc=... (got docID=2147483647)]. They're confused because it seems to happen randomly and without an underlying cause. They ask: if the index is broken, why do some queries work? If the index is operating normally, why do some queries fail?

The tl;dr answer here is (usually) that the index is operating normally, but that it's populated with bad data. This issue is not a bug per se; it has been known for a while now and should hopefully be fixed in Elasticsearch 2.0+.

To elaborate a bit, users normally see this error when they're using custom scoring scripts to order results. For example, if you have an index of restaurants and want to score results based on the number of reviews each restaurant has, you might have something like this:

curl localhost:9200/restaurants/_search -d '{
    "query": {
        "function_score": {
            "query": {
                 "match_all": {}
            },
            "script_score": {
                "script":"1.0 * log(doc[\"review_count\"].value + 2.718281828)"
            }
        }
    }
}
'

This query uses a script to boost the score of matching documents, based on their number of reviews.

Can you spot the problem? Given our error message, which asks for a value that's greater than or equal to zero, can we find something in here that would be unhappy with a negative number? Here's a hint: think back to your high school algebra classes.

Logarithms are only defined over the positive real numbers. Taking the log of zero or a negative number is undefined; in floating-point arithmetic it yields negative infinity (for zero) or NaN (for negative inputs). The problem with the script above is that there's no logic to deal with negative values of review_count: any review_count of -3 or less pushes the argument of log below zero.
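You can see the arithmetic in miniature below. One caveat: Java (and therefore Lucene) follows the IEEE 754 convention of quietly returning NaN from Math.log for negative inputs, while Python's math.log raises an error instead; the underlying problem is the same either way. The numbers here are just illustrative:

```python
import math

# The script computes 1.0 * log(review_count + e). With a healthy
# review count, the result is an ordinary positive float:
score = 1.0 * math.log(25 + 2.718281828)
print(score)  # a normal, positive score

# With review_count = -3, the argument of log() goes negative.
# Java's Math.log would return NaN here; Python raises instead:
try:
    math.log(-3 + 2.718281828)
except ValueError as err:
    print("undefined:", err)
```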

What happens if there's an edge case in your application that assigns a negative value to the number of reviews a restaurant has received? Elasticsearch will happily update the review_count field with a negative number. Then, at query time, if a document with the negative value in this field is matched by the query, Elasticsearch will try to calculate a score for it using the custom script. It ends up with an undefined number and complains.

But why can't the error message just say that, instead of babbling incoherently? And what's with the whole "got docID=2147483647" thing? I don't even have that many documents!

Elasticsearch doesn't really have a say in the matter, as the error is actually raised by Lucene (in BaseCompositeReader.java). The short explanation here is summarized by a comment deep in the Lucene source code:

NOTE: The values Float.NaN, Float.NEGATIVE_INFINITY and Float.POSITIVE_INFINITY are not valid scores. Certain collectors (eg TopScoreDocCollector) will not properly collect hits with these scores.

When NaN or infinite values are used as scores, Lucene has no way to compare them: in Java, every ordered comparison against NaN evaluates to false. TopScoreDocCollector's priority queue is pre-filled with sentinel entries whose docID is Integer.MAX_VALUE (2147483647, the largest value a Java int can hold), and a NaN score slips past the comparisons that would normally replace them, so a sentinel can leak into the results and trigger the strange error message. Interestingly, Lucene does impose some checks that raise a proper error complaining about invalid scores, but those checks are not present in the Lucene expressions module, which is why the error is raised when using expressions as custom scoring scripts.
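The failure mode is easy to reproduce in miniature. Python's float comparisons follow the same IEEE 754 rules as Java's, so a few lines are enough to show why a NaN score breaks score-ordered collection, and where that suspicious docID comes from:

```python
nan = float("nan")

# Every ordered comparison against NaN is False -- in both directions.
# A priority queue that asks "should this hit replace the current top?"
# with such a comparison can never reach a sane decision:
print(nan > 0.0)   # False
print(nan < 0.0)   # False
print(nan >= nan)  # False

# Even a plain max() is fooled when NaN shows up first, because no
# later value ever "beats" it in a comparison:
print(max([nan, 3.0, 1.0]))  # nan

# And the mysterious docID is just Java's Integer.MAX_VALUE:
print(2**31 - 1)  # 2147483647
```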

To work around this error, at least one of the following has to happen: 1) Elasticsearch needs to include validity checks with custom scoring scripts; 2) Lucene needs to test expressions for invalid scores and raise a proper warning; or 3) users need to be more cognizant of what values they're indexing. Guess which one of these solutions is available today :-)

What can I do about it?

Elasticsearch doesn't currently support unsigned integers (to be fair, neither does Java), so there isn't a direct way to prevent negative numbers from being indexed. Ideally, your application should handle the logic of indexing sane values into Elasticsearch. If you see this error coming up in your logs, the first thing to do is make sure you're indexing good values.
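Since the guard has to live in your application, a minimal sketch might look like the following. This is just one possible policy (the function name and the decision to clamp bad values to 0 are our own assumptions; your app might prefer to reject the document instead):

```python
def sanitize_review_count(value):
    """Coerce review_count to a non-negative int before indexing.

    Anything that isn't a sane count (None, negatives, garbage
    strings) is clamped to 0, so the scoring script's log() argument
    stays positive.
    """
    try:
        count = int(value)
    except (TypeError, ValueError):
        return 0
    return max(count, 0)

# A hypothetical document on its way to the indexer:
doc = {"name": "Soup Kitchen", "review_count": -7}
doc["review_count"] = sanitize_review_count(doc["review_count"])
print(doc["review_count"])  # 0
```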

Try running a curl command against the cluster and see what happens:

curl "localhost:9200/<index>/_search?pretty" -d '
{
  "filter":{
    "range":{
      "review_count":{
        "lt":0
      }
    }
  }
}'

This search will return any and all documents in your index that have a negative review_count. That will help identify problem documents and aid in troubleshooting any regressions on your end.

If you can’t otherwise purge those documents or fix their values, you can always just filter the original search itself to ensure only documents with valid field values are scored.

curl localhost:9200/restaurants/_search -d '{
    "query": {
        "function_score": {
            "query": {
                 "match_all": {}
            },
            "script_score": {
                "script":"1.0 * log(doc[\"review_count\"].value + 2.718281828)"
            },
            "filter":{
              "range":{
                "review_count":{
                  "gte":0
                }
              }
            }
        }
    }
}
'

Got a tricky query? Shoot us an email! We'll help you dig into it and identify a fix. We get questions about funky queries and responses all the time, so maybe we’ve already found an answer to your question.