Elasticsearch index pre-sorting

Index sorting overview

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules-index-sorting.html

When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted. By default, Lucene applies no sorting. The index.sort.* settings define which fields should be used to sort the documents within each segment. Normally, a top-N query has to visit every document in a segment to find all relevant matches. With index sorting configured, if the search sort matches the index sort, each shard only needs to examine the first N documents of each segment; the query can then terminate early, which reduces computation and improves performance.

The following example shows how to define a sort on a single field:

PUT twitter
{
    "settings" : {
        "index" : {
            "sort.field" : "date", 
            "sort.order" : "desc" 
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "date": {
                    "type": "date"
                }
            }
        }
    }
}

It is also possible to sort the index on multiple fields:

PUT twitter
{
    "settings" : {
        "index" : {
            "sort.field" : ["username", "date"], 
            "sort.order" : ["asc", "desc"] 
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "username": {
                    "type": "keyword",
                    "doc_values": true
                },
                "date": {
                    "type": "date"
                }
            }
        }
    }
}

Assuming an index sorted by timestamp in descending order, you can retrieve the 10 most recent documents with:

GET /events/_search
{
    "size": 10,
    "sort": [
        { "timestamp": "desc" }
    ]
}
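Note that by default Elasticsearch still counts all matching documents to report the total hit count, which defeats early termination. The 6.x reference documentation suggests disabling hit counting when the search sort matches the index sort, along the following lines:

```
GET /events/_search
{
    "size": 10,
    "sort": [
        { "timestamp": "desc" }
    ],
    "track_total_hits": false
}
```

With track_total_hits set to false, each segment can stop collecting after the first 10 documents in index-sort order.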

A concrete example

Create a new index essort, pre-sorted by clickcount:

PUT essort

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "sort.field": "clickcount",
      "sort.order": "desc"
    },
    "index.write.wait_for_active_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "logtime": {
          "type": "date"
        },
        "clickcount": {
          "type": "integer"
        },
        "title": {
          "type": "text"
        },
        "docid": {
          "type": "keyword"
        },
        "desc": {
          "type": "text"
        }
      }
    }
  }
}

Insert n records:

PUT essort/doc/1003

{
  "title": "redis",
  "clickcount": 150,
  "docid": "1003",
  "desc": "电脑",
  "logtime": "2018-12-24T08:12:12Z"
}

.....

PUT essort/doc/2003

{
  "title": "redis",
  "clickcount": 950,
  "docid": "2003",
  "desc": "电脑",
  "logtime": "2019-12-24T08:12:12Z"
}
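Instead of indexing documents one at a time, the remaining records can also be loaded with a single _bulk request. A sketch (the IDs and field values below are made up for illustration; the bulk body is newline-delimited JSON):

```
POST essort/doc/_bulk
{"index":{"_id":"1004"}}
{"title":"kafka","clickcount":420,"docid":"1004","desc":"电脑","logtime":"2018-12-25T08:12:12Z"}
{"index":{"_id":"1005"}}
{"title":"nginx","clickcount":730,"docid":"1005","desc":"电脑","logtime":"2018-12-26T08:12:12Z"}
```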

Force merge the index

Force-merging down to a single segment leaves the whole shard as one segment, fully sorted by clickcount.
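A sketch of the force-merge call against this index, merging down to one segment:

```
POST essort/_forcemerge?max_num_segments=1
```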

Reading the Lucene files

Copy the raw Lucene files of the ES index from

D:\soft\elasticsearch-6.2.4\elasticsearch-6.2.4\data\nodes\0\indices\LG5iHCmeTkqrhJ-bzp3UOA\0\index

to the F:\index directory.


Use the Lucene API to read the contents of the Lucene files:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SearchEsIndex {

    private String dir;

    public SearchEsIndex(String path) {
        this.dir = path;
    }

    /**
     * Create an IndexWriter for the index directory.
     *
     * @return
     * @throws IOException
     */
    public IndexWriter getWriter() throws IOException {
        // Path of the Lucene index files
        Directory directory = FSDirectory.open(Paths.get(dir));
        // Chinese analyzer
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Holds all configuration used to create the IndexWriter
        IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
        return new IndexWriter(directory, iwConfig);
    }

    /**
     * Open an IndexReader on the index directory.
     *
     * @return
     * @throws Exception
     */
    public IndexReader getReader() throws Exception {
        // Path of the Lucene index files
        Directory directory = FSDirectory.open(Paths.get(dir));
        return DirectoryReader.open(directory);
    }


    /**
     * Query by field and value.
     *
     * @param field
     * @param q
     * @throws Exception
     */
    public void searchForField(String field, String q) throws Exception {
        IndexReader reader = getReader();
        // Build the index searcher
        IndexSearcher is = new IndexSearcher(reader);
        // Chinese analyzer (the query analyzer must match the one used at index time)
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Build the query parser
        QueryParser parser = new QueryParser(field, analyzer);
        // Parse the query string q ("*" with field "*" yields a MatchAllDocsQuery)
        Query query = parser.parse(q);
        // Query start time
        long start = System.currentTimeMillis();
        // Run the search
        TopDocs hits = is.search(query, 10);
        // Query end time
        long end = System.currentTimeMillis();

        System.out.println("Query " + q + " took " + (end - start) + " ms and matched " + hits.totalHits + " documents");

        // Iterate over hits.scoreDocs to get each ScoreDoc
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = is.doc(scoreDoc.doc);
            System.out.println("docId:" + scoreDoc.doc + "," + doc);
            // The _source stored field holds the original JSON document indexed by ES
            BytesRef source = doc.getBinaryValue("_source");
            System.out.println("source:" + source.utf8ToString());
        }

        // Close the reader
        reader.close();
    }

    public static void main(String[] args) {
        SearchEsIndex searchEsIndex = new SearchEsIndex("F:\\index");
        try {
            searchEsIndex.searchForField("*", "*");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

The program prints the data stored in the Lucene files.


This shows that after pre-sorting on clickcount, the documents in Lucene are stored in descending clickcount order: the document with Lucene doc ID 0 has the largest clickcount.
