ElasticSearch聚合和OR条件查询

在使用ElasticSearch中经常会遇到统计、查询需求,实现类似sql分组计算、条件查询的语法,ES在这些方面都支持的较不错,用起来也比较方便。笔者就自己开发中用到的Java API举例如下。

分组聚合查询

实现类似select avg(executionTime) from t WHERE executionDate between 'A' AND 'B' group by executionDate ORDER BY executionDate ASC的查询后聚合。ES分组计算的逻辑是先将executionDate分组,然后在每个分组内在进行聚合,在API中是两个聚合aggregation包含的逻辑关系。

  • HTTP请求
    POST请求体
{
	"from":0,
	"size":2,
  "query": {
    "range": {
      "executionDate": {
      	"gte":"2019-01-20",
      	"lte":"2019-01-25"
      }
    }
  },
  "aggregations":{
  	"executionDateGroup":{
  		"terms":{
  			"field":"executionDate"
  		},
  		"aggregations":{
	  		"executionTimeAvg":{
		  		"avg":{
		  			"field":"executionTime"
		  		}
	  		}
  		}
  	}
  	
  }
}

POST返回结果

{
    "took": 1510,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 52,
        "max_score": 1,
        "hits": [
            //....
        ]
    },
    "aggregations": {
        "executionDateGroup": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 2,
            "buckets": [
                {
                    "key": 1548425669000,
                    "key_as_string": "2019-01-25 14:14:29",
                    "doc_count": 6,
                    "executionTimeAvg": {
                        "value": 72.33333333333333
                    }
                },
                {
                    "key": 1548425670000,
                    "key_as_string": "2019-01-25 14:14:30",
                    "doc_count": 6,
                    "executionTimeAvg": {
                        "value": 67.66666666666667
                    }
                },
                //.....
            ]
        }
    }
}

JAVA代码API其实与上面HTTP请求一样封装和获取数据,整体思路变化不大。
代码示例

    //1.先对executionDate分组,取名executionDateGroup,即实现group by executionDate,如果不设置size,则聚合返回默认10组数据
    TermsAggregationBuilder groupTerms = AggregationBuilders.terms("executionDateGroup").field("executionDate").size(Integer.MAX_VALUE);
    //设置排序 true为正序、flase为倒序,实现ORDER BY executionDate ASC
    groupTerms.order(BucketOrder.key(true));
    
    //2.聚合avg(executionTime),取名executionTimeAvg
    AvgAggregationBuilder timeAvg = AggregationBuilders.avg("executionTimeAvg").field("executionTime");
    
    //3.两个aggregation父子关系
    groupTerms.subAggregation(timeAvg);
    
    //查询条件
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
    RangeQueryBuilder rangeDateQuery = QueryBuilders.rangeQuery("executionDate").gte(DateFormatUtils.format(fromDate, "yyyy-MM-dd HH:mm:ss"))
            .lte(DateFormatUtils.format(toDate, "yyyy-MM-dd HH:mm:ss"));
    queryBuilder.must(rangeDateQuery);
    //如果方法名不为空,则添加查询条件
    if (StringUtils.isNotBlank(condition.getMethodName())) {
        queryBuilder.must(new TermQueryBuilder("methodName", condition.getMethodName()));
    }
    SearchResponse searchRes = esClient.prepareSearch(CommonConstants.EHR_EXCUTION_AOP_LOG_INDEX)
            .setTypes(CommonConstants.EHR_EXCUTION_AOP_LOG_TYPE)
            .setQuery(queryBuilder)
            .addAggregation(groupTerms)  //设置聚合
            .addSort("executionDate", SortOrder.ASC)
            .get();
    
    // 获取结果,提取结果的顺序与aggregation组合关系一致,先提取外层executionDateGroup,再提取多个分组的子层executionTimeAvg
    Aggregation executionDateGroup = searchRes.getAggregations().get("executionDateGroup");
    Terms timeAvgTerms = null;
    if (executionDateGroup instanceof Terms) {
        //外层aggregation
        timeAvgTerms = (Terms) executionDateGroup;
        List<? extends Terms.Bucket> buckets = timeAvgTerms.getBuckets();
        for (Terms.Bucket elem : buckets) {
            //子级aggregation
            InternalAvg executionTimeAvg = (InternalAvg) elem.getAggregations().get("executionTimeAvg");
            ExecutionInfoDto executionInfoDto = new ExecutionInfoDto();
            executionInfoDto.setExecutionDate(elem.getKeyAsString());
            executionInfoDto.setExecutionTime((int)executionTimeAvg.getValue());
            executionInfoDtoList.add(executionInfoDto);
        }
    }

OR条件查询

ES条件查询中有MUST/MUSTNOT/SHOULD逻辑,其中MUST/MUSTNOT与sql中的AND/NOT理解和用法基本一致,但SHOULD则与sql中的OR有些不一样,在ES中如果要表示OR查询,则需要配合MUST一起使用,即MUST(SHOULD A, SHOULD B),表示A OR B
HTTP POST请求体

{
    "query": {
        "bool": {
            "must": {
                //or条件组装
                "bool" : { 
                    "should": [
                        { "match": { "about": "music" }},
                        { "match": { "about": "climb" }} ] 
                }
            },
            "must": {
                "match": { "first_nale": "John" }
            },
            "must_not": {
                "match": {"last_name": "Smith" }
            }
        }
    }
}

JAVA代码示例

    BoolQueryBuilder pinyinQuery = QueryBuilders.boolQuery();
    if (!keyword.contains("-")) {
        pinyinQuery.should(QueryBuilders.matchQuery("userName.pinyin", pinyin));
    }
    pinyinQuery.should(QueryBuilders.termQuery("userId", keyword));
          
    BoolQueryBuilder queryEs = QueryBuilders.boolQuery()
            .must(pinyinQuery)
            .must(new TermQueryBuilder("officeStatus", "1"));
    // 构造highlight
    HighlightBuilder hiBuilder= new HighlightBuilder();
    hiBuilder.preTags("

").postTags("

"
).field("userName.pinyin"); SearchResponse scrollRes = client.prepareSearch(ES_INDEX_NAME) .setTypes(ES_INDEX_USER_TYPE) .setQuery(queryEs) .highlighter(hiBuilder) .setScroll("10s") .setSize(1000) .get();

你可能感兴趣的:(ElasticSearch聚合和OR条件查询)