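Elasticsearch Aggregation Troubleshooting
Incomplete data
ES aggregation: time zone issue
When ES runs a date aggregation, it buckets and formats timestamps in UTC by default, not in UTC+8. Data recorded in UTC+8 therefore lands in day buckets shifted by eight hours, and the aggregation result appears to be missing data. The fix is to set time_zone to +08:00.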
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "createTime": {
                  "from": 1577462400,
                  "to": 1577548799,
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1.0
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "_source": false,
      "aggregations": {
        "enterpriseIdTerms": {
          "date_histogram": {
            "field": "createTime",
            "format": "yyyy-MM-dd",
            "interval": "1d",
            "offset": 0,
            "order": { "_key": "asc" },
            "time_zone": "+08:00",
            "keyed": false,
            "min_doc_count": 0
          },
          "aggregations": {
            "callCount": {
              "value_count": { "field": "uniqueId" }
            }
          }
        }
      }
    }
    // Jest client
    DateHistogramAggregationBuilder dateAggregation = AggregationBuilders.dateHistogram("dateAggregations")
            // field to aggregate on
            .field("createTime")
            // bucket granularity
            .dateHistogramInterval(DateHistogramInterval.DAY)
            // format of the bucket keys
            .format("yyyy-MM-dd")
            // minimum document count; defaults to 0
            .minDocCount(0L)
            // ascending by time
            .order(Histogram.Order.KEY_ASC)
            // time zone used for bucketing
            .timeZone(DateTimeZone.forTimeZone(TimeZone.getTimeZone("GMT+8")))
            // sub-aggregation
            .subAggregation(AggregationBuilders.count(StatFieldEnum.CALL_COUNT.getKey()).field(CtiCloudCdrField.UNIQUE_ID));
    // Java example taken from a blog post; not sure whether it targets the REST client.
    AggregationBuilder dateAggs = AggregationBuilders
            .dateHistogram("dateAggs")              // aggregation name
            .field("@timestamp")                    // time field to aggregate on
            .interval(DateHistogramInterval.DAY)    // bucket by day
            .minDocCount(0L)                        // defaults to 0
            .order(Histogram.Order.KEY_ASC)         // ascending by time
            .timeZone("+08:00")                     // time zone
            .subAggregation(                        // sub-aggregation
                    AggregationBuilders
                            .sum("sumAggs")
                            .field("tx_count"));
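The eight-hour shift can be reproduced outside ES. The sketch below uses only java.time and is not ES code; the class and method names (DayBucketDemo, dayBucket) are made up for illustration. It assigns the two bounds of the range filter in the query above to a calendar day, once in UTC and once in +08:00.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class DayBucketDemo {
    /** Calendar day that an epoch-second timestamp falls on in the given zone. */
    static LocalDate dayBucket(long epochSeconds, ZoneOffset zone) {
        return Instant.ofEpochSecond(epochSeconds).atOffset(zone).toLocalDate();
    }

    public static void main(String[] args) {
        long from = 1577462400L; // lower bound of the range filter
        long to = 1577548799L;   // upper bound of the range filter
        ZoneOffset utc = ZoneOffset.UTC;
        ZoneOffset plus8 = ZoneOffset.ofHours(8);

        System.out.println("UTC:    " + dayBucket(from, utc) + " .. " + dayBucket(to, utc));
        System.out.println("+08:00: " + dayBucket(from, plus8) + " .. " + dayBucket(to, plus8));
    }
}
```

In UTC the range straddles two days (2019-12-27 and 2019-12-28), so a single local day is split across two buckets; in +08:00 both bounds fall on 2019-12-28.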
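Basic aggregations
ES: compute on a document field first, then aggregate
Straight to the query.
Requirement: for documents where personName = hero, sum value1, rounding each value up to the next multiple of 100 (anything short of a full hundred counts as a hundred).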
    {
      "size": 0,
      "query": {
        "match": { "personName": "hero" }
      },
      "aggregations": {
        "duration": {
          "sum": {
            "script": "(params._source.value1/100 + (params._source.value1%100!=0?1:0))*100"
          }
        }
      }
    }
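The arithmetic inside the script can be checked on its own. This is a plain-Java sketch of the same expression, assuming value1 is a non-negative integer; RoundUpDemo and its methods are hypothetical names, not part of any ES API.

```java
class RoundUpDemo {
    /** Rounds a non-negative value up to the next multiple of 100. */
    static long roundUpToHundred(long value) {
        // Same expression as the script: whole hundreds, plus one more if there is a remainder.
        return (value / 100 + (value % 100 != 0 ? 1 : 0)) * 100;
    }

    /** Sums the rounded values, as the sum aggregation does over the matching documents. */
    static long sumRounded(long[] values) {
        long total = 0;
        for (long v : values) {
            total += roundUpToHundred(v);
        }
        return total;
    }

    public static void main(String[] args) {
        // 30 -> 100, 100 -> 100, 250 -> 300
        System.out.println(sumRounded(new long[] {30, 100, 250}));
    }
}
```

The expression relies on integer division, so it only behaves this way when value1 is stored as an integer.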
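ES: getting the distinct total of aggregation results
ES distinct count: cardinality (the equivalent of count(distinct))
To count the number of different values that occur in a field of an ES index, use a cardinality aggregation: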
    GET http://localhost:9200/cdr_202103*/_search
    Content-Type: application/json

    {
      "aggregations": {
        "count": {
          "cardinality": { "field": "enterpriseId" }
        },
        "count2": {
          "terms": { "field": "enterpriseId" }
        }
      }
    }
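The request runs two aggregations on the same field, and they answer different questions. The following in-memory sketch (plain Java, hypothetical names and sample IDs) shows the distinction: cardinality corresponds to the number of different values, terms to the document count per value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DistinctCountDemo {
    /** Number of different values: the figure a cardinality aggregation estimates. */
    static int distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    /** Documents per value: what a terms aggregation reports as bucket doc counts. */
    static Map<String, Integer> countPerValue(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical enterpriseId values, one per document.
        List<String> ids = List.of("e1", "e2", "e1", "e3", "e1");
        System.out.println(distinctCount(ids));
        System.out.println(countPerValue(ids));
    }
}
```

ES computes cardinality with an approximate algorithm, so on large indices its result can differ slightly from an exact count like this one.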
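ES nested field aggregations
ES nested sub-aggregations
When aggregating on a property of a nested field, you sometimes need the number of parent documents rather than the number of times a value occurs in the nested array. The reverse_nested aggregation covers this: placed inside a nested aggregation, it steps back out to the parent document, so the sub-aggregations under it can use fields that live outside the nested object. The query below aggregates on a nested sub-field and then aggregates on the parent documents: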
    curl -XGET 'localhost:9200/ticket_202001*/_search' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggregations": {
        "record": {
          "nested": { "path": "record" },
          "aggregations": {
            "recordNo": {
              "terms": {
                "field": "record.no",
                "size": 10
              },
              "aggregations": {
                "thinkCount": {
                  "reverse_nested": {},
                  "aggregations": {
                    "ids": {
                      "terms": { "field": "uniqueId" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }'
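The difference between the two counts is easiest to see on a small data set. The sketch below is an in-memory analogy in plain Java, not ES code: each hypothetical ticket ID maps to the record.no values of its nested records, and the two methods mirror the doc count of the nested terms bucket and the doc count after reverse_nested.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReverseNestedDemo {
    /** Hypothetical sample: ticket t1 has three nested records, ticket t2 has one. */
    static Map<String, List<String>> sample() {
        Map<String, List<String>> tickets = new HashMap<>();
        tickets.put("t1", List.of("A", "A", "B"));
        tickets.put("t2", List.of("A"));
        return tickets;
    }

    /** Occurrences of each record.no across all nested objects (nested + terms). */
    static Map<String, Integer> nestedOccurrences(Map<String, List<String>> recordNosByTicket) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> nos : recordNosByTicket.values()) {
            for (String no : nos) {
                counts.merge(no, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Parent tickets that contain each record.no (the count after reverse_nested). */
    static Map<String, Integer> parentDocs(Map<String, List<String>> recordNosByTicket) {
        Map<String, Set<String>> ticketsByNo = new HashMap<>();
        for (Map.Entry<String, List<String>> e : recordNosByTicket.entrySet()) {
            for (String no : e.getValue()) {
                ticketsByNo.computeIfAbsent(no, k -> new HashSet<>()).add(e.getKey());
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        ticketsByNo.forEach((no, ids) -> counts.put(no, ids.size()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(nestedOccurrences(sample()));
        System.out.println(parentDocs(sample()));
    }
}
```

For the value A, the nested count is 3 because it appears in three nested records, while the parent count is 2 because only two tickets contain it.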
ES: getting the size of a nested field
    curl -XGET 'localhost:9200/ticket_*/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggregations": {
        "ticketIds": {
          "terms": {
            "field": "uniqueId",
            "size": 100
          },
          "aggregations": {
            "taskCount": {
              "sum": {
                "script": "params._source.record.size()"
              }
            }
          }
        }
      }
    }'
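ES bucket aggregations and paginated aggregations
Filtering, paginating, and sorting aggregation results in ES: bucket_selector (filtering; named bucket_filter in the query below) and bucket_sort (sorting and truncation).
Version constraint: requires ES later than 6.0.0 (bucket_sort was added in 6.1.0).
Example query: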
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            { "term": { "type": "my_record" } }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "_source": false,
      "aggregations": {
        "dateTerm": {
          "terms": {
            "field": "uniqueId",
            "size": 3000,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false
          },
          "aggregations": {
            "recordCount": {
              "value_count": { "field": "uniqueId" }
            },
            "bucket_filter": {
              "bucket_selector": {
                "buckets_path": { "recordCount": "recordCount" },
                "script": "params.recordCount < 100"
              }
            },
            "bucket_sort": {
              "bucket_sort": {
                "sort": [
                  { "recordCount": { "order": "desc" } }
                ],
                "size": 10,
                "from": 0
              }
            }
          }
        }
      }
    }
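The two pipeline steps in the query can be read as filter, sort, then slice. Below is a rough in-memory equivalent in plain Java, assuming the buckets are already available as a map from key to recordCount; BucketPipelineDemo is a hypothetical name, and the tie-break on the key is an assumption added only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BucketPipelineDemo {
    /** Returns the bucket keys left after filtering, sorting by count descending, and slicing. */
    static List<String> filterSortPage(Map<String, Long> recordCountByKey,
                                       long maxExclusive, int from, int size) {
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(recordCountByKey.entrySet());
        // bucket_selector: keep only buckets where recordCount < maxExclusive.
        buckets.removeIf(e -> e.getValue() >= maxExclusive);
        // bucket_sort "sort": recordCount descending (key ascending on ties, an assumption).
        buckets.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // bucket_sort "from" / "size": skip `from` buckets, then take at most `size`.
        int start = Math.min(from, buckets.size());
        int end = Math.min(start + size, buckets.size());
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> e : buckets.subList(start, end)) {
            keys.add(e.getKey());
        }
        return keys;
    }

    /** Hypothetical buckets: key -> recordCount. */
    static Map<String, Long> sample() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("a", 120L);
        m.put("b", 50L);
        m.put("c", 99L);
        m.put("d", 7L);
        m.put("e", 50L);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(filterSortPage(sample(), 100, 0, 2));
        System.out.println(filterSortPage(sample(), 100, 2, 2));
    }
}
```

With the sample data, bucket a is filtered out (120 is not below 100), the first page of two is c and b, and the second page is e and d.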
Paginating aggregation results in ES
Techniques: bucket_sort (pagination) and cardinality (total count)
Prerequisites:
    Document structure: {city, humanCount}
    Requirement: page through the population statistics for each city
    {
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggregations": {
        "dateTerm": {
          "terms": {
            "field": "city",
            "size": 3000,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false
          },
          "aggregations": {
            "humanCount": {
              "value_count": { "field": "city" }
            },
            "bucket_sort": {
              "bucket_sort": {
                "sort": [
                  { "humanCount": { "order": "desc" } }
                ],
                "size": 10,
                "from": 0
              }
            }
          }
        },
        "totalCount": {
          "cardinality": { "field": "city" }
        }
      }
    }
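On the client side, the page number has to be turned into the from value of bucket_sort, and the totalCount result into a page count. A minimal sketch of that arithmetic, assuming 1-based page numbers; AggPagingDemo and its methods are hypothetical helpers, not part of any ES client.

```java
class AggPagingDemo {
    /** Zero-based offset for bucket_sort "from", given a 1-based page number. */
    static int fromOffset(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    /** Number of pages needed for totalCount buckets (ceiling division). */
    static long totalPages(long totalCount, int pageSize) {
        return (totalCount + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 10;    // bucket_sort "size"
        long totalCount = 25; // hypothetical value of the totalCount cardinality result
        System.out.println(fromOffset(3, pageSize));
        System.out.println(totalPages(totalCount, pageSize));
    }
}
```

Note that bucket_sort only sees the buckets the parent terms aggregation returns, so from plus size has to stay within the terms size (3000 in this query).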
References