Index pre-sorting overview
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules-index-sorting.html
When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted. By default, Lucene applies no sorting. The index.sort.* settings define which fields should be used to sort the documents within each segment. When fetching the top N documents, a search normally has to visit every matching document. With index sorting configured, if the search sort matches the index sort, each segment only needs to examine its first N documents, so the query can terminate early, reducing computation and improving performance.
The following example shows how to define a sort on a single field:
PUT twitter
{
  "settings": {
    "index": {
      "sort.field": "date",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}
An index can also be sorted on multiple fields:
PUT twitter
{
  "settings": {
    "index": {
      "sort.field": ["username", "date"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "doc_values": true
        },
        "date": {
          "type": "date"
        }
      }
    }
  }
}
For an index sorted on a timestamp field in descending order (the events index from the official example), you can fetch the 10 most recent documents with:
GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ]
}
A concrete example
Create a new index essort, pre-sorted on clickcount:
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "sort.field": "clickcount",
      "sort.order": "desc"
    },
    "index.write.wait_for_active_shards": 1
  },
  "mappings": {
    "doc": {
      "properties": {
        "logtime": {
          "type": "date"
        },
        "clickcount": {
          "type": "integer"
        },
        "title": {
          "type": "text"
        },
        "docid": {
          "type": "keyword"
        },
        "desc": {
          "type": "text"
        }
      }
    }
  }
}
Insert n records:
PUT essort/doc/1003
{
  "title": "redis",
  "clickcount": 150,
  "docid": "1003",
  "desc": "电脑",
  "logtime": "2018-12-24T08:12:12Z"
}
.....
PUT essort/doc/2003
{
  "title": "redis",
  "clickcount": 950,
  "docid": "2003",
  "desc": "电脑",
  "logtime": "2019-12-24T08:12:12Z"
}
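As a sketch, the same records can also be indexed in a single request with the bulk API (NDJSON body, one action line per document; the IDs mirror the PUT examples above):

```json
POST /essort/doc/_bulk
{ "index": { "_id": "1003" } }
{ "title": "redis", "clickcount": 150, "docid": "1003", "desc": "电脑", "logtime": "2018-12-24T08:12:12Z" }
{ "index": { "_id": "2003" } }
{ "title": "redis", "clickcount": 950, "docid": "2003", "desc": "电脑", "logtime": "2019-12-24T08:12:12Z" }
```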
Force-merge the index
Run a force merge so the shard's segments are merged.
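Merging down to a single segment makes the on-disk sort order easy to inspect with Lucene directly. A minimal call against the essort index:

```json
POST /essort/_forcemerge?max_num_segments=1
```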
Read the Lucene files
Copy the raw Lucene files of the ES index from
D:\soft\elasticsearch-6.2.4\elasticsearch-6.2.4\data\nodes\0\indices\LG5iHCmeTkqrhJ-bzp3UOA\0\index
to the F:\index directory.
Then read the contents of the Lucene files with the Lucene API:
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SearchEsIndex {

    private String dir;

    public SearchEsIndex(String path) {
        this.dir = path;
    }

    /**
     * Get a writer for the index directory.
     *
     * @return the IndexWriter
     * @throws IOException on I/O failure
     */
    public IndexWriter getWriter() throws IOException {
        // Path of the index directory
        Directory directory = FSDirectory.open(Paths.get(dir));
        // Chinese analyzer
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Holds all configuration used to create the IndexWriter
        IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
        return new IndexWriter(directory, iwConfig);
    }

    /**
     * Get a reader for the index directory.
     *
     * @return the IndexReader
     * @throws Exception on failure to open the directory
     */
    public IndexReader getReader() throws Exception {
        // Path of the index directory
        Directory directory = FSDirectory.open(Paths.get(dir));
        return DirectoryReader.open(directory);
    }

    /**
     * Query by field and value.
     *
     * @param field default field to search
     * @param q     query string
     * @throws Exception on parse or I/O failure
     */
    public void searchForField(String field, String q) throws Exception {
        IndexReader reader = getReader();
        // Build the searcher
        IndexSearcher is = new IndexSearcher(reader);
        // Chinese analyzer (the query analyzer must match the one used at index time)
        Analyzer analyzer = new SmartChineseAnalyzer();
        // Build the query parser
        QueryParser parser = new QueryParser(field, analyzer);
        // Parse the query string (field "*" with query "*" yields a match-all query)
        Query query = parser.parse(q);
        // Record the start time
        long start = System.currentTimeMillis();
        // Run the search
        TopDocs hits = is.search(query, 10);
        // Record the end time
        long end = System.currentTimeMillis();
        System.out.println("Query " + q + " took " + (end - start) + " ms and matched "
                + hits.totalHits + " documents");
        // Iterate over hits.scoreDocs to get each ScoreDoc
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = is.doc(scoreDoc.doc);
            System.out.println("docId:" + scoreDoc.doc + "," + doc);
            BytesRef source = doc.getBinaryValue("_source");
            System.out.println("source:" + source.utf8ToString());
        }
        // Close the reader
        reader.close();
    }

    public static void main(String[] args) {
        SearchEsIndex searchEsIndex = new SearchEsIndex("F:\\index");
        try {
            searchEsIndex.searchForField("*", "*");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Printing the Lucene file data
As expected, after pre-sorting on clickcount, the documents inside Lucene are stored in descending clickcount order: the document with Lucene docId 0 has the largest clickcount.