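ES7: wrong total returned when a query matches more than 10,000 documents
ES version: 7.10.2
When Java code queries ES through the REST API, total always comes back as 10000 once more than 10,000 documents match. The code is as follows: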
JSONObject query = new JSONObject();
JSONObject q = new JSONObject();
q.put("bool", new JSONObject());
query.put("query", q);
query.put("from", 0);
query.put("size", 1);
while (true) {
HttpPost post = new HttpPost(sourceUrl + "/" + indexName + "/_search");
HttpEntity entity = new StringEntity(query.toString());
post.setEntity(entity);
post.addHeader("Authorization", "Basic " + getAuthorization(sourceUsername, sourcePassword));
post.addHeader("Content-Type", "application/json");
HttpResponse response = httpClient.execute(post);
JSONObject ret = JSONObject.parseObject(EntityUtils.toString(response.getEntity()));
int total = ret.getJSONObject("hits").getJSONObject("total").getInteger("value");
// Print total here: once more than 10,000 documents match, total is always 10000
}
Solution: add the "track_total_hits" parameter to the search request body.
The code is as follows:
JSONObject query = new JSONObject();
JSONObject q = new JSONObject();
q.put("bool", new JSONObject());
query.put("query", q);
query.put("from", 0);
query.put("size", 1);
query.put("track_total_hits", true); // Added line: return the exact count even when more than 10,000 documents match
while (true) {
HttpPost post = new HttpPost(sourceUrl + "/" + indexName + "/_search");
HttpEntity entity = new StringEntity(query.toString());
post.setEntity(entity);
post.addHeader("Authorization", "Basic " + getAuthorization(sourceUsername, sourcePassword));
post.addHeader("Content-Type", "application/json");
HttpResponse response = httpClient.execute(post);
JSONObject ret = JSONObject.parseObject(EntityUtils.toString(response.getEntity()));
int total = ret.getJSONObject("hits").getJSONObject("total").getInteger("value");
}
Brief explanation:
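This behavior arrived with Elasticsearch 7.0, which is built on Lucene 8. The track_total_hits parameter controls how ES counts the total number of matching documents when it returns search results.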
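When track_total_hits is not set, ES counts hits accurately only up to a threshold of 10,000. Once a query matches more documents than that, ES no longer returns the exact hit count; it returns the threshold as a lower bound, and hits.total.relation is "gte" instead of "eq". Setting track_total_hits to false is a different thing: it turns total tracking off entirely.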
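In ES 7 the hits.total object in the response carries a relation field next to value: "eq" means the value is exact, "gte" means it is only a lower bound. A caller can check it instead of trusting value blindly. The following is a minimal sketch using only the JDK; the class and method names (TotalHitsRelation, isExact, describe) are made up for illustration and are not part of any ES client.

```java
class TotalHitsRelation {

    // "eq"  -> hits.total.value is the exact number of matches
    // "gte" -> hits.total.value is only a lower bound
    static boolean isExact(String relation) {
        return "eq".equals(relation);
    }

    // Turns the two fields of hits.total into a readable message.
    static String describe(long value, String relation) {
        return isExact(relation)
                ? "exactly " + value + " hits"
                : "at least " + value + " hits";
    }

    public static void main(String[] args) {
        // Values as they would be read from hits.total.value and hits.total.relation
        System.out.println(describe(10000, "gte"));
        System.out.println(describe(523, "eq"));
    }
}
```

With the fastjson code above, the relation could be read the same way as value, for example with ret.getJSONObject("hits").getJSONObject("total").getString("relation").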
Setting track_total_hits to true forces ES to count every matching document, at the cost of slower queries.
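Besides true and false, track_total_hits also accepts an integer, which makes ES count accurately up to that number and stop there. That can be a middle ground when an exact count is only needed up to some ceiling. The sketch below builds both request bodies as plain strings with the JDK only; the class TrackTotalHitsBody and its methods are hypothetical helpers, and the empty bool query mirrors the one used above.

```java
class TrackTotalHitsBody {

    // Assembles the JSON search body; trackTotalHits is inserted as a raw JSON literal.
    private static String body(int from, int size, String trackTotalHits) {
        return "{\"query\":{\"bool\":{}},\"from\":" + from
                + ",\"size\":" + size
                + ",\"track_total_hits\":" + trackTotalHits + "}";
    }

    // Always count every matching document (slowest, exact).
    static String exact(int from, int size) {
        return body(from, size, "true");
    }

    // Count accurately only up to the given threshold.
    static String upTo(int from, int size, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must not be negative");
        }
        return body(from, size, Integer.toString(threshold));
    }

    public static void main(String[] args) {
        System.out.println(exact(0, 1));
        System.out.println(upTo(0, 1, 50000));
    }
}
```

Either string can be passed to new StringEntity(...) in place of query.toString() in the code above.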