Elasticsearch Aggregation Troubleshooting

Incomplete data

ES aggregation: time zone issue

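When aggregating, ES formats and buckets dates in UTC by default, not in China Standard Time (UTC+8). Daily buckets are therefore cut at 08:00 Beijing time, so documents land in the neighbouring day and seem to be missing from the bucket you expect. Fix: set time_zone to +08:00.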
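Here is a minimal plain-Java sketch of the effect. It uses only java.time, with no Elasticsearch client, and the class DayBucketDemo is hypothetical. It assumes createTime holds epoch seconds, as the range in the example query below suggests, and it shows which calendar day the two range boundaries fall on. In UTC the range straddles 2019-12-27 and 2019-12-28. With +08:00, both ends fall on 2019-12-28, so there is a single daily bucket.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration: which calendar day an epoch-second timestamp falls into,
// depending on the time zone used to cut day buckets (conceptually like a 1d date_histogram).
final class DayBucketDemo {

    // Returns the day key (yyyy-MM-dd) of an epoch-second timestamp in the given zone.
    static String dayKey(long epochSecond, ZoneId zone) {
        LocalDate day = Instant.ofEpochSecond(epochSecond).atZone(zone).toLocalDate();
        return day.toString(); // LocalDate's ISO format is yyyy-MM-dd
    }

    public static void main(String[] args) {
        long from = 1577462400L; // range start used in the example query
        long to = 1577548799L;   // range end used in the example query
        ZoneId utc = ZoneOffset.UTC;            // date_histogram default time zone
        ZoneId beijing = ZoneOffset.ofHours(8); // time_zone "+08:00"
        System.out.println("UTC:    " + dayKey(from, utc) + " .. " + dayKey(to, utc));
        System.out.println("+08:00: " + dayKey(from, beijing) + " .. " + dayKey(to, beijing));
    }
}
```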

  • Example
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"createTime": {
"from": 1577462400,
"to": 1577548799,
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": false,
"aggregations": {
"enterpriseIdTerms": {
"date_histogram": {
"field": "createTime",
"format": "yyyy-MM-dd",
"interval": "1d",
"offset": 0,
"order": {
"_key": "asc"
},
"time_zone": "+08:00",
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"callCount": {
"value_count": {
"field": "uniqueId"
}
}
}
}
}
}
  • Corresponding Java code
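// Used with the Jest client; imports assume a pre-7.x ES version whose builders take a Joda DateTimeZone
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.joda.time.DateTimeZone;
import java.util.TimeZone;

DateHistogramAggregationBuilder dateAggregation = AggregationBuilders.dateHistogram("dateAggregations")
// Field to aggregate on
.field("createTime")
// Bucket interval
.dateHistogramInterval(DateHistogramInterval.DAY)
// Bucket key format
.format("yyyy-MM-dd")
// Default is 0
.minDocCount(0L)
// Ascending by time
.order(Histogram.Order.KEY_ASC)
// Aggregation time zone
.timeZone(DateTimeZone.forTimeZone(TimeZone.getTimeZone("GMT+8")))
// Sub-aggregation (StatFieldEnum and CtiCloudCdrField are project-specific constants)
.subAggregation(AggregationBuilders.count(StatFieldEnum.CALL_COUNT.getKey()).field(CtiCloudCdrField.UNIQUE_ID));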
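// Java example taken from another blog; it is unclear whether it was written for the REST client.
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.joda.time.DateTimeZone;

AggregationBuilder dateAggs = AggregationBuilders
.dateHistogram("dateAggs") // aggregation name
.field("@timestamp") // date field to aggregate on
.dateHistogramInterval(DateHistogramInterval.DAY) // one bucket per day
.minDocCount(0L) // default is 0
.order(Histogram.Order.KEY_ASC) // ascending by time
.timeZone(DateTimeZone.forID("+08:00")) // time zone (pre-7.x builders take a Joda DateTimeZone)
.subAggregation( // sub-aggregation
AggregationBuilders
.sum("sumAggs")
.field("tx_count"));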

Basic aggregations

ES: compute on a document field first, then aggregate

Straight to the query.

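Requirement: for documents with personName = hero, total up value1, rounding each value up to the next multiple of 100 (anything under 100 counts as 100).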

{
"size": 0,
"query": {
"match": {
"personName": "hero"
}
},
"aggregations": {
"duration": {
"sum": {
"script": "(params._source.value1/100 + (params._source.value1%100!=0?1:0))*100"
}
}
}
}
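The script uses integer arithmetic to round each value1 up to a multiple of 100. The hypothetical helper class RoundUpSum below reproduces the same expression in plain Java, so you can check the rounding before putting it into a query. It assumes value1 is a non-negative integer. If value1 were stored as a floating-point number, / in Painless would not truncate and the values would not be rounded.

```java
import java.util.List;

// Hypothetical helper mirroring the Painless expression
// (v/100 + (v%100 != 0 ? 1 : 0)) * 100, evaluated with integer arithmetic.
final class RoundUpSum {

    // Rounds a non-negative value up to the next multiple of 100 (0 stays 0).
    static long roundUpToHundred(long v) {
        return (v / 100 + (v % 100 != 0 ? 1 : 0)) * 100;
    }

    // Equivalent of the sum aggregation over the per-document script values.
    static long sumRounded(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += roundUpToHundred(v);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1 -> 100, 150 -> 200, 200 -> 200
        System.out.println(sumRounded(List.of(1L, 150L, 200L)));
    }
}
```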

ES: getting the distinct count of aggregation results

ES distinct count: cardinality, the equivalent of SQL count(distinct ...)

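To count how many distinct values a field takes across an index, use a cardinality aggregation. For comparison, the request below also runs a terms aggregation (count2) on the same field. terms lists each value with its document count (10 buckets by default). cardinality returns only the number of distinct values, as an approximation that is expected to be close to exact below the default precision_threshold of 3000.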

request
GET http://localhost:9200/cdr_202103*/_search
Content-Type: application/json

{
"aggregations": {
"count": {
"cardinality": {
"field": "enterpriseId"
}
},
"count2": {
"terms": {
"field": "enterpriseId"
}
}
}
}
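The next sketch makes the difference between the two aggregations concrete in plain Java. The class DistinctVsTerms is hypothetical and no ES is involved. distinctCount gives the exact number that cardinality approximates. topTerms roughly mimics the terms buckets: it orders values by document count descending, breaks ties by key, and keeps only size buckets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogue of cardinality vs. terms on one field.
final class DistinctVsTerms {

    // Exact number of distinct values (what cardinality approximates).
    static int distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Value -> document count, top `size` by count desc, ties by key asc.
    static List<Map.Entry<String, Long>> topTerms(List<String> values, int size) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toList());
    }
}
```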

ES nested field aggregation

ES nested sub-aggregation

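When aggregating on a property of a nested field, you sometimes need the number of parent documents (records) rather than the number of times a value occurs in the nested arrays. reverse_nested handles this. Inside a nested sub-aggregation, it joins back from the nested objects to their parent documents. The aggregations below it then count parent documents and can aggregate on the parent's other, non-nested fields.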

  • Example
curl -XGET 'localhost:9200/ticket_202001*/_search' -H 'Content-Type:application/json' -d'
{
"size":0,
"aggregations":{
"record":{
"nested": {
"path": "record"
},
"aggregations":{
"recordNo":{
"terms": {
"field": "record.no",
"size": 10
},
"aggregations": {
"thinkCount":{
"reverse_nested":{},
"aggregations":{
"ids":{
"terms":{
"field":"uniqueId"
}
}
}
}
}
}
}
}
}
}'
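The following plain-Java sketch models what the two levels count. Ticket and ReverseNestedDemo are hypothetical types, not an ES API. Under the nested terms bucket, doc_count counts nested record objects. The reverse_nested bucket's doc_count counts the distinct parent documents those objects belong to.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of parent documents with nested record.no values.
final class ReverseNestedDemo {

    static final class Ticket {
        final String uniqueId;
        final List<String> recordNos; // record.no of each nested record object

        Ticket(String uniqueId, List<String> recordNos) {
            this.uniqueId = uniqueId;
            this.recordNos = recordNos;
        }
    }

    // Like terms(record.no) inside nested: counts nested objects per value.
    static Map<String, Integer> nestedCounts(List<Ticket> tickets) {
        Map<String, Integer> counts = new HashMap<>();
        for (Ticket t : tickets) {
            for (String no : t.recordNos) {
                counts.merge(no, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Like reverse_nested under each terms bucket: counts distinct parent documents.
    static Map<String, Integer> parentCounts(List<Ticket> tickets) {
        Map<String, Set<Ticket>> parents = new HashMap<>();
        for (Ticket t : tickets) {
            for (String no : t.recordNos) {
                // Ticket does not override equals(), so the set holds distinct document objects.
                parents.computeIfAbsent(no, k -> new HashSet<>()).add(t);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        parents.forEach((no, docs) -> counts.put(no, docs.size()));
        return counts;
    }
}
```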

ES: getting the size of a nested field

  • Example
curl -XGET 'localhost:9200/ticket_*/_search?pretty' -H 'Content-Type:application/json' -d'
{
"size": 0,
"aggregations": {
"ticketIds": {
"terms": {
"field": "uniqueId",
"size": 100
},
"aggregations": {
"taskCount": {
"sum": {
"script": "params._source.record.size()"
}
}
}
}
}
}'
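A plain-Java analogue of this query can help you check the expected numbers. The class RecordSizeDemo is hypothetical, and each map stands in for a document's _source. For each uniqueId, it sums the length of the record array. Unlike the Painless script, the sketch treats a missing record array as size 0. Like terms, it skips documents that have no uniqueId.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical analogue of terms(uniqueId) > sum(script: params._source.record.size()).
final class RecordSizeDemo {

    static Map<String, Integer> taskCount(List<Map<String, Object>> sources) {
        Map<String, Integer> perId = new TreeMap<>();
        for (Map<String, Object> src : sources) {
            String id = (String) src.get("uniqueId");
            if (id == null) {
                continue; // terms ignores documents without the field
            }
            Object record = src.get("record");
            // Missing array counts as 0 here; the Painless script above has no such guard.
            int size = (record instanceof List) ? ((List<?>) record).size() : 0;
            perId.merge(id, size, Integer::sum);
        }
        return perId;
    }
}
```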

ES bucket aggregations and paged aggregations

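Filtering, paging and sorting ES aggregation results: bucket_selector handles filtering (it is named bucket_filter in the example below), and bucket_sort handles sorting and truncation.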

Version constraint: requires ES later than 6.0.0 (bucket_sort is not available in earlier versions).

Example query:

{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"type": "my_record"
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": false,
"aggregations": {
"dateTerm": {
"terms": {
"field": "uniqueId",
"size": 3000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false
},
"aggregations": {
"recordCount": {
"value_count": {
"field": "uniqueId"
}
},
"bucket_filter": {
"bucket_selector": {
"buckets_path": {
"recordCount": "recordCount"
},
"script": "params.recordCount < 100"
}
},
"bucket_sort": {
"bucket_sort": {
"sort": [
{
"recordCount": {
"order": "desc"
}
}
],
"size": 10,
"from": 0
}
}
}
}
}
}
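The following plain-Java sketch imitates the two pipeline steps on a list of buckets that has already been computed. Bucket and BucketPipelineDemo are hypothetical classes. The sketch keeps the buckets whose recordCount is below 100, orders them by recordCount descending, and then applies from/size. It assumes the selector runs before the sort, in the order the two are declared in the request.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical imitation of bucket_selector followed by bucket_sort.
final class BucketPipelineDemo {

    // Stand-in for one terms bucket and its recordCount metric.
    static final class Bucket {
        final String key;
        final long recordCount;

        Bucket(String key, long recordCount) {
            this.key = key;
            this.recordCount = recordCount;
        }
    }

    static List<Bucket> selectAndSort(List<Bucket> buckets, long maxExclusive, int from, int size) {
        return buckets.stream()
                // bucket_selector: "params.recordCount < 100"
                .filter(b -> b.recordCount < maxExclusive)
                // bucket_sort: recordCount desc
                .sorted(Comparator.comparingLong((Bucket b) -> b.recordCount).reversed())
                // bucket_sort: from / size
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }
}
```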

ES aggregation result pagination

Key techniques: bucket_sort (paging) and cardinality (total count)

Prerequisites:

Document structure: {city, humanCount}
Requirement: paginated statistics of the population of each city
  • Query:
{
"size": 0,
"query": {
"match_all": {}
},
"aggregations": {
"dateTerm": {
"terms": {
"field": "city",
"size": 3000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false
},
"aggregations": {
"humanCount": {
"value_count": {
"field": "city"
}
},
"bucket_sort": {
"bucket_sort": {
"sort": [
{
"humanCount": {
"order": "desc"
}
}
],
"size": 10,
"from": 0
}
}
}
},
"totalCount": {
"cardinality": {
"field": "city"
}
}
}
}
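To turn this result into pages, the caller can derive the bucket_sort offset and the page count from the totalCount (cardinality) value. Below is a minimal sketch with hypothetical helper names. Keep two limits in mind. First, bucket_sort can only sort and cut the buckets that the parent terms aggregation returned, so terms.size (3000 here) caps how far you can page. Second, cardinality is approximate for large numbers of distinct values, so the page count may be slightly off.

```java
// Hypothetical paging helpers for terms + bucket_sort + cardinality.
final class AggPaging {

    // Number of pages for the given total (ceiling division).
    static long totalPages(long totalCount, int pageSize) {
        return (totalCount + pageSize - 1) / pageSize;
    }

    // bucket_sort "from" for a 1-based page number.
    static int fromFor(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // terms must return at least from + size buckets for bucket_sort to fill the page.
    static boolean termsSizeSufficient(int termsSize, int page, int pageSize) {
        return termsSize >= fromFor(page, pageSize) + pageSize;
    }
}
```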

References