ES queries
Full data set
- GET wkl_test/_search
- {
- "query": {
- "match_all": {}
- }
- }
Result:
- {
- "took" : 123,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : 1.0,
- "hits" : [
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aK0tFpABTkLj5j4c34pE",
- "_score" : 1.0,
- "_source" : {
- "name" : "zhangsan",
- "aa" : 1
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aa0uFpABTkLj5j4cFYrJ",
- "_score" : 1.0,
- "_source" : {
- "name" : "lisi",
- "aa" : 2
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aq0uFpABTkLj5j4cKYqF",
- "_score" : 1.0,
- "_source" : {
- "name" : "wangwu",
- "aa" : 2
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "a60uFpABTkLj5j4c2IoF",
- "_score" : 1.0,
- "_source" : {
- "name" : "maliu",
- "aa" : 2
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "bK1IFpABTkLj5j4cqYop",
- "_score" : 1.0,
- "_source" : {
- "name" : "gouqi",
- "aa" : 3
- }
- }
- ]
- }
- }
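1: collapse (field collapsing) - querying the deduplicated list of documents (supported since ES 5.3)
- Why it is recommended: high performance and low memory usage.
- Note: when deduplicating this way, documents that do not have the collapse field are not removed from the results.
- The collapse field must be a single-valued numeric field (such as long) or a keyword field.
- Field collapsing cannot be used together with scroll or rescore, and in older ES versions not with search_after either (newer releases allow search_after only when sorting and collapsing on the same field).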
- GET wkl_test/_search
- {
- "query": {
- "match_all": {}
- },
- "collapse": {
- "field": "aa"
- }
- }
Result: total under hits is still 5, but only the 3 deduplicated documents are returned.
- {
- "took" : 2,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : null,
- "hits" : [
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aK0tFpABTkLj5j4c34pE",
- "_score" : 1.0,
- "_source" : {
- "name" : "zhangsan",
- "aa" : 1
- },
- "fields" : {
- "aa" : [
- 1
- ]
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aa0uFpABTkLj5j4cFYrJ",
- "_score" : 1.0,
- "_source" : {
- "name" : "lisi",
- "aa" : 2
- },
- "fields" : {
- "aa" : [
- 2
- ]
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "bK1IFpABTkLj5j4cqYop",
- "_score" : 1.0,
- "_source" : {
- "name" : "gouqi",
- "aa" : 3
- },
- "fields" : {
- "aa" : [
- 3
- ]
- }
- }
- ]
- }
- }
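2: cardinality - querying the number of documents after deduplication
- Aggregation + cardinality: a distinct count, similar to count(distinct) in SQL. Values are deduplicated first and then counted.
- Note: when counting this way, documents that do not have the field are left out of the count.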
- GET wkl_test/_search
- {
- "query": {
- "match_all": {}
- },
- "size": 0,
- "aggs": {
- "distinct_count": {
- "cardinality": {
- "field": "aa"
- }
- }
- }
- }
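Result: distinct_count = 3, meaning there are 3 distinct values. The aggregations section returns the number of distinct values of the aa field, but only that count, not the documents themselves.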
- {
- "took" : 2,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : null,
- "hits" : [ ]
- },
- "aggregations" : {
- "distinct_count" : {
- "value" : 3
- }
- }
- }
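The two notes about documents that lack the deduplication field (collapse keeps them, cardinality leaves them out) are easier to see on a small in-memory model. The sketch below is plain Java, not Elasticsearch code: `DedupSemantics`, `sampleDocs`, and the extra `noField` document are hypothetical names added for illustration, and it assumes that all documents lacking the field end up in a single group keyed by null.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DedupSemantics {

    /** The five documents from the post plus one hypothetical document without "aa". */
    static List<Map<String, Object>> sampleDocs() {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Map.of("name", "zhangsan", "aa", 1));
        docs.add(Map.of("name", "lisi", "aa", 2));
        docs.add(Map.of("name", "wangwu", "aa", 2));
        docs.add(Map.of("name", "maliu", "aa", 2));
        docs.add(Map.of("name", "gouqi", "aa", 3));
        docs.add(Map.of("name", "noField")); // hypothetical: no "aa" field
        return docs;
    }

    /** Collapse-like: keep the first document per field value; a missing value forms its own null group (assumption). */
    static List<Map<String, Object>> collapse(List<Map<String, Object>> docs, String field) {
        Map<Object, Map<String, Object>> firstPerKey = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            firstPerKey.putIfAbsent(doc.get(field), doc);
        }
        return new ArrayList<>(firstPerKey.values());
    }

    /** Cardinality-like: count distinct values, skipping documents without the field. */
    static long cardinality(List<Map<String, Object>> docs, String field) {
        Set<Object> distinct = new HashSet<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (value != null) {
                distinct.add(value);
            }
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = sampleDocs();
        System.out.println("collapsed docs: " + collapse(docs, "aa").size());
        System.out.println("distinct count: " + cardinality(docs, "aa"));
    }
}
```

Under these assumptions the collapsed list can be one entry longer than the distinct count whenever some documents lack the field, which is worth keeping in mind before using the two numbers together for paging.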
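3: Combined query
- With collapse, the deduplicated documents are returned, but total is still the count of all matching documents.
- With the cardinality aggregation, the aggs result holds the correct deduplicated count, but hits still refers to all documents.
- So the two need to be used together, as follows: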
- GET wkl_test/_search
- {
- "query": {
- "match_all": {}
- },
- "collapse": {
- "field": "aa"
- },
- "aggs": {
- "distinct_count": {
- "cardinality": {
- "field": "aa"
- }
- }
- }
- }
Result:
- {
- "took" : 3,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : null,
- "hits" : [
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aK0tFpABTkLj5j4c34pE",
- "_score" : 1.0,
- "_source" : {
- "name" : "zhangsan",
- "aa" : 1
- },
- "fields" : {
- "aa" : [
- 1
- ]
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "aa0uFpABTkLj5j4cFYrJ",
- "_score" : 1.0,
- "_source" : {
- "name" : "lisi",
- "aa" : 2
- },
- "fields" : {
- "aa" : [
- 2
- ]
- }
- },
- {
- "_index" : "wkl_test",
- "_type" : "_doc",
- "_id" : "bK1IFpABTkLj5j4cqYop",
- "_score" : 1.0,
- "_source" : {
- "name" : "gouqi",
- "aa" : 3
- },
- "fields" : {
- "aa" : [
- 3
- ]
- }
- }
- ]
- },
- "aggregations" : {
- "distinct_count" : {
- "value" : 3
- }
- }
- }
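Note: use distinct_count from the cardinality aggregation as the total after deduplication, and the collapsed hits list as the result set.
Notes on pagination:
- 1. total under hits is actually the count before deduplication, i.e. the number of original documents. It is enough to be aware of this; pagination does not use it. In this example the hits array has exactly as many entries as the distinct_count aggregation value (3, which fits within the default size of 10), and its entries are the deduplicated documents.
- 2. distinct_count under aggregations is the actual number of results after deduplication, and is the total to use for pagination.
- 3. from is the query offset, i.e. where to start reading.
- 4. size is the number of results to fetch per request.
- From here it can be treated as a simple paginated query: just pass from and size.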
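The pagination notes above can be sketched as a small helper. This is a minimal plain-Java sketch, not part of the original post: `DedupPaging`, `totalPages`, `fromOffset`, and `requestBody` are hypothetical names, pages are assumed to be 1-based, and the field name is assumed to need no JSON escaping.

```java
final class DedupPaging {

    /** Number of pages, based on the distinct_count aggregation value rather than hits.total. */
    static long totalPages(long distinctCount, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return (distinctCount + size - 1) / size; // ceiling division
    }

    /** Offset ("from") of the first collapsed result on a 1-based page. */
    static int fromOffset(int page, int size) {
        if (page < 1 || size <= 0) {
            throw new IllegalArgumentException("page must be >= 1 and size must be positive");
        }
        return (page - 1) * size;
    }

    /** Search body combining from/size with collapse and the cardinality aggregation. */
    static String requestBody(String field, int page, int size) {
        return "{\n"
                + "  \"query\": { \"match_all\": {} },\n"
                + "  \"from\": " + fromOffset(page, size) + ",\n"
                + "  \"size\": " + size + ",\n"
                + "  \"collapse\": { \"field\": \"" + field + "\" },\n"
                + "  \"aggs\": { \"distinct_count\": { \"cardinality\": { \"field\": \"" + field + "\" } } }\n"
                + "}";
    }

    public static void main(String[] args) {
        // Page 2 of the example index, two deduplicated documents per page.
        System.out.println(requestBody("aa", 2, 2));
        System.out.println("pages: " + totalPages(3, 2));
    }
}
```

Two caveats apply when using it: the cardinality aggregation is approximate for large counts, so the page total may be slightly off, and from + size is bounded by the index's max_result_window setting (10000 by default).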
Using the Java API
1: collapse - querying the deduplicated result set
- // Use collapse to specify the field to deduplicate on, e.g. "your_distinct_field"
- CollapseBuilder collapseBuilder = new CollapseBuilder("your_distinct_field");
- searchSourceBuilder.collapse(collapseBuilder);
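2: cardinality - querying the number of documents after deduplication
- // Add a cardinality aggregation to count the unique values of the deduplication field
- CardinalityAggregationBuilder aggregation = AggregationBuilders
- .cardinality("distinct_count") // name of the aggregation in the result
- .field("your_distinct_field") // field to aggregate on
- .precisionThreshold(40000); // adjust the precision threshold as needed (40000 is the maximum)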
- searchSourceBuilder.aggregation(aggregation);
3: Putting it together
- package com.wenge.system.utils;
- import org.apache.http.HttpHost;
- import org.elasticsearch.action.search.SearchRequest;
- import org.elasticsearch.action.search.SearchResponse;
- import org.elasticsearch.client.RequestOptions;
- import org.elasticsearch.client.RestClient;
- import org.elasticsearch.client.RestHighLevelClient;
- import org.elasticsearch.index.query.QueryBuilders;
- import org.elasticsearch.search.SearchHit;
- import org.elasticsearch.search.aggregations.AggregationBuilders;
- import org.elasticsearch.search.aggregations.metrics.CardinalityAggregationBuilder;
- import org.elasticsearch.search.aggregations.metrics.ParsedCardinality;
- import org.elasticsearch.search.builder.SearchSourceBuilder;
- import org.elasticsearch.search.collapse.CollapseBuilder;
- import java.io.IOException;
- import java.util.Map;
- /**
-  * @author wangkanglu
-  * @version 1.0
-  * @description
-  * @date 2024-06-17 16:48
-  */
- public class TestES {
-     public static void main(String[] args) throws IOException {
-         // Create the ES client
-         RestHighLevelClient esClient = new RestHighLevelClient(
-                 RestClient.builder(new HttpHost("localhost", 9200, "http"))
-         );
-         try {
-             // Create a search request and set the index name
-             SearchRequest searchRequest = new SearchRequest("your_index");
-             // Build the search source
-             SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
-             // Set the query, e.g. match all documents; change this to fit your own business logic
-             searchSourceBuilder.query(QueryBuilders.matchAllQuery());
-             // Use collapse to specify the field to deduplicate on, e.g. "your_distinct_field"
-             CollapseBuilder collapseBuilder = new CollapseBuilder("your_distinct_field");
-             searchSourceBuilder.collapse(collapseBuilder);
-             // Add a cardinality aggregation to count the unique values of the deduplication field
-             CardinalityAggregationBuilder aggregation = AggregationBuilders
-                     .cardinality("distinct_count") // name of the aggregation in the result
-                     .field("your_distinct_field") // field to aggregate on
-                     .precisionThreshold(40000); // adjust the precision threshold as needed
-             searchSourceBuilder.aggregation(aggregation);
-             // Attach the search source to the request
-             searchRequest.source(searchSourceBuilder);
-             // Execute the search
-             SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
-             SearchHit[] hits = searchResponse.getHits().getHits();
-             for (SearchHit hit : hits) {
-                 Map<String, Object> sourceAsMap = hit.getSourceAsMap();
-                 System.out.println("Deduplicated result: " + sourceAsMap);
-             }
-             // Process the search result and read the deduplicated count
-             ParsedCardinality parsedCardinality = searchResponse.getAggregations().get("distinct_count");
-             long distinctCount = parsedCardinality.getValue();
-             System.out.println("Deduplicated result count: " + distinctCount);
-         } finally {
-             // Close the client
-             esClient.close();
-         }
-     }
- }