Elasticsearch query error: Text fields are not optimised for operations that require per-document field data like aggreg

1. Error message

Querying Elasticsearch returned the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [cmdb.ci_area] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "alarm_anaysis_2024",
        "node" : "k4ujQE7LSOyuWU_qUkeUqQ",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [cmdb.ci_area] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [cmdb.ci_area] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [cmdb.ci_area] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status" : 400
}

2. What the exception means

The following explanation comes from ERNIE Bot (文心一言):

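  • In Elasticsearch, text fields are not optimised for operations that need per-document field data, such as aggregations and sorting, so these operations are disabled on text fields by default. Text fields are designed for full-text search: at index time their values are analyzed into terms and stored in an inverted index, which is very efficient for searching but unsuitable for aggregating or sorting directly.

  • For fields that need to be aggregated or sorted, Elasticsearch recommends keyword fields. A keyword value is not analyzed; it is indexed as a single term exactly as supplied, which makes it well suited to aggregations and sorting.

  • If you really must aggregate or sort on a text field and cannot change its type to keyword, you can set fielddata=true to let Elasticsearch load field data for that field. There are some important caveats, though: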

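    • Memory usage: loading field data increases heap usage significantly, because Elasticsearch has to build an in-memory structure holding the field's terms for every document. With a large data set this can cause serious memory pressure and may even destabilise the cluster.
    • Performance: field data is built by uninverting the inverted index the first time it is needed, so the first aggregation or sort on the field can be slow even when enough memory is available.
    • Result semantics: field data holds the analyzed terms rather than the original strings, so a terms aggregation on a text field buckets by individual tokens, not by whole values.

  Before setting fielddata=true, weigh these factors carefully and make sure the cluster has enough memory for the extra load. Where possible, it is better to change the field type to keyword, or to redesign the data model and queries so that text fields are never aggregated or sorted on.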
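The trade-off described above can be illustrated with a toy model. This is only a sketch, not how Elasticsearch is implemented: the whitespace `analyze` function stands in for a real analyzer, and the three documents are hypothetical. It shows that an inverted index answers "which documents contain this term", that aggregations need the opposite view (document to terms), and that rebuilding that view from a text field yields tokens rather than whole values.

```python
from collections import defaultdict


def analyze(text):
    """Stand-in for a text analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it (term -> docs)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def uninvert(index):
    """Rebuild the doc -> terms view that aggregations and sorting need."""
    fielddata = defaultdict(set)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            fielddata[doc_id].add(term)
    return fielddata


# Hypothetical documents, not taken from the real index.
docs = {1: "East China", 2: "North China", 3: "East China"}

inverted = build_inverted_index(docs)
fielddata = uninvert(inverted)

# Bucketing over the uninverted text field counts individual tokens.
text_buckets = defaultdict(int)
for terms in fielddata.values():
    for term in terms:
        text_buckets[term] += 1

# A keyword field keeps each whole value as a single term.
keyword_buckets = defaultdict(int)
for value in docs.values():
    keyword_buckets[value] += 1

print(dict(text_buckets))
print(dict(keyword_buckets))
```

The extra in-memory `fielddata` structure is the part that costs heap in the real system, which is why the keyword route is usually preferred.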

3. Root cause analysis

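Take a look at the DSL query being used. The nested terms aggregation at the bottom runs on cmdb.ci_area, which the error message identifies as a text field, and that is what triggers the exception: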

GET alarm_anaysis_*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "cmdb.ci_area": [
              "小米",
              "华为",
              "苹果",
              "oppo"
            ]
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "time_zone": "Asia/Shanghai",
              "format":"yyyy-MM-dd HH:mm:ss",
              "gte": "now-2d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "series": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "10m",
        "time_zone": "Asia/Shanghai",
        "format":"yyyy-MM-dd HH:mm:ss"
      },
      "aggs": {
        "ci_area": {
          "terms": {
            "field": "cmdb.ci_area",
            "size": 10
          }
        }
      }
    }
  }
}
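To confirm the cause, it helps to check how the field is actually mapped, for example by running GET alarm_anaysis_2024/_mapping in the console. The sketch below walks the "properties" object of such a response to find the type of a dotted field name. The helper `field_mapping` and the sample mapping are hypothetical: the sample is shaped like what default dynamic mapping typically produces for string values, not copied from the real index.

```python
def field_mapping(properties, path):
    """Walk a dotted path through nested "properties"; return the leaf mapping or None."""
    node = None
    for part in path.split("."):
        if properties is None or part not in properties:
            return None
        node = properties[part]
        # Object fields keep their children under their own "properties" key.
        properties = node.get("properties")
    return node


# Hypothetical mapping, assumed to resemble the index in this article.
sample_properties = {
    "@timestamp": {"type": "date"},
    "cmdb": {
        "properties": {
            "ci_area": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    },
}

leaf = field_mapping(sample_properties, "cmdb.ci_area")
field_type = leaf["type"]
sub_fields = sorted(leaf.get("fields", {}))

print(field_type, sub_fields)
```

If the type comes back as text and a keyword sub-field is listed, aggregating on the sub-field is the straightforward way out.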

4. Solution

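Appending .keyword to the cmdb.ci_area field name fixes the problem. This relies on the mapping having a keyword sub-field under cmdb.ci_area, which default dynamic mapping creates for string values: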

GET alarm_anaysis_*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "cmdb.ci_area.keyword": [
              "小米",
              "华为",
              "苹果",
              "oppo"
            ]
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "time_zone": "Asia/Shanghai",
              "format":"yyyy-MM-dd HH:mm:ss",
              "gte": "now-2d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "series": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "10m",
        "time_zone": "Asia/Shanghai",
        "format":"yyyy-MM-dd HH:mm:ss"
      },
      "aggs": {
        "ci_area": {
          "terms": {
            "field": "cmdb.ci_area.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
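If the index has no keyword sub-field, the other route named in the error message is to set fielddata=true on the text field, with the memory cost already discussed. The sketch below builds a request body for the update mapping API (PUT alarm_anaysis_2024/_mapping) from a dotted field name. The helper `fielddata_mapping_body` is hypothetical, and the body assumes the field uses default text settings; a field with a custom analyzer or other non-default parameters would probably need those repeated in the body. It has not been run against the cluster in this article.

```python
import json


def fielddata_mapping_body(path):
    """Build a mapping-update body that sets fielddata=true on a dotted text field."""
    body = {"type": "text", "fielddata": True}
    # Wrap from the innermost field outwards: every level nests under "properties".
    for part in reversed(path.split(".")):
        body = {"properties": {part: body}}
    return body


body = fielddata_mapping_body("cmdb.ci_area")
print(json.dumps(body, indent=2))
```

Treat this as a fallback only: the .keyword approach above avoids the heap cost and buckets by whole values.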
