[Elasticsearch] 20. Function score query, term & phrase suggesters, autocomplete and context suggestions

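Combined ranking: tuning scores with function score query

Scoring and sorting

  • By default, Elasticsearch sorts results by each document's relevance score
  • You can also sort by one or more specified fields
  • Sorting by the relevance score (_score) alone cannot satisfy some specific requirements
    • It gives no finer control over how relevance shapes the final ordering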

function score query

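  • function score query
    • After the query finishes, it re-scores every matching document with a series of functions and sorts by the newly generated score
  • It provides several built-in scoring functions
    • weight: sets a simple, non-normalized weight for each document
    • field value factor: uses a field's value to modify _score, for example treating "popularity" or "number of likes" as scoring factors
    • random score: gives each user a different, randomly scored result ordering
    • decay functions: take a field's value as the reference; the closer it is to a given value, the higher the score
    • script score: a custom script with full control over the scoring logic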
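
The requests below cover field value factor and random score but not the decay functions, so here is a minimal Python sketch of the Gaussian variant. It assumes the formula given in the Elasticsearch documentation for gauss decay; the function name and the sample numbers are illustrative only.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 near the origin, `decay` at distance offset + scale."""
    # sigma^2 is chosen so the result equals `decay` exactly `scale` past the offset.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Values within `offset` of the origin are not penalized at all.
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

print(gauss_decay(100, origin=100, scale=10))  # at the origin
print(gauss_decay(110, origin=100, scale=10))  # one scale away
print(gauss_decay(130, origin=100, scale=10))  # three scales away
```

The score falls off slowly near the origin and quickly further out, which is why this shape is commonly used for "closer is better" fields such as dates, prices, or distances.
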
DELETE blogs
PUT /blogs/_doc/1
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   0
}

PUT /blogs/_doc/2
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   100
}

PUT /blogs/_doc/3
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   1000000
}


POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}


POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      }
    }
  }
}
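
To see why the three requests above rank the same documents so differently, the sketch below reproduces the field value factor arithmetic in Python. It assumes the factor is multiplied into the field value before the modifier is applied, and that log1p means log10(value + 1); only a few modifiers are modelled, and the helper name is made up for illustration.

```python
import math

# A subset of the available modifiers, enough for the requests above.
MODIFIERS = {
    "none": lambda v: v,
    "log1p": lambda v: math.log10(v + 1),
    "sqrt": math.sqrt,
}

def field_value_factor(value, factor=1.0, modifier="none"):
    """Value that the query score is combined with (multiplied by default)."""
    return MODIFIERS[modifier](factor * value)

for votes in (0, 100, 1000000):
    plain = field_value_factor(votes)
    smooth = field_value_factor(votes, modifier="log1p")
    damped = field_value_factor(votes, factor=0.1, modifier="log1p")
    print(votes, plain, round(smooth, 3), round(damped, 3))
```

Without a modifier the document with 1000000 votes dominates completely, and the document with 0 votes scores 0. With log1p the gap between 100 and 1000000 votes shrinks to roughly 2 versus 6, and the factor of 0.1 dampens it a little further.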


POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}
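
The last request adds boost_mode and max_boost. The Python sketch below shows how the two could interact, assuming max_boost caps the function value before it is combined with the query score; the function and the sample scores are hypothetical.

```python
def combine(query_score, function_score, boost_mode="multiply", max_boost=float("inf")):
    """Combine the original query score with the function score."""
    # Assumption: the cap applies to the function value, not to the final score.
    capped = min(function_score, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "replace":
        return capped
    if boost_mode == "avg":
        return (query_score + capped) / 2.0
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "min":
        return min(query_score, capped)
    raise ValueError(f"unknown boost_mode: {boost_mode}")

# Hypothetical query score 0.5 and function value 5.0 (about log10(0.1 * 1000000 + 1)).
print(combine(0.5, 5.0))                                # default multiply
print(combine(0.5, 5.0, boost_mode="sum"))              # sum, no cap
print(combine(0.5, 5.0, boost_mode="sum", max_boost=3))  # sum, capped at 3
```

With "sum" the votes add to the relevance score instead of scaling it, and max_boost keeps a very popular document from pulling too far ahead.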

POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}
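
The point of the seed in random_score is that the same seed yields the same ordering, so a user whose ID is used as the seed sees a stable but personal ranking. The Python sketch below only illustrates that property; the hashing scheme is an assumption made for the example and is not how Elasticsearch computes the value.

```python
import hashlib

def seeded_score(seed, doc_id):
    """Deterministic pseudo-random score in [0, 1] derived from seed and document ID."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64

def ranked(seed, doc_ids):
    """Order documents by their seeded score, highest first."""
    return sorted(doc_ids, key=lambda doc_id: seeded_score(seed, doc_id), reverse=True)

docs = ["1", "2", "3"]
print(ranked(911119, docs))  # repeating this call gives the same order
print(ranked(911120, docs))  # a different seed may give a different order
```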

term&phrase suggester

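What are search suggestions

  • Modern search engines generally offer a "suggest as you type" feature
  • It helps users with autocompletion or spelling correction while they type. By guiding users toward more precise keywords, it improves how well documents match in the subsequent search
  • On Google, autocompletion kicks in first; once the input reaches a certain length and can no longer be completed, for example because of a misspelled word, it starts suggesting similar words or phrases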

elasticsearch suggester api

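  • In Elasticsearch, this kind of search-engine feature is implemented through the suggester API
  • How it works: the input text is split into tokens, then similar terms are looked up in the index's term dictionary and returned
  • Elasticsearch provides four kinds of suggesters for different use cases
    • term & phrase suggester
    • completion & context suggester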

term suggester

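  • A suggester is a special kind of search. "text" holds the text supplied with the call, usually whatever the user typed in the UI
  • The user's input "lucen" is a misspelling
  • The suggester looks up the specified field ("body"); when a term cannot be found there, it returns suggested terms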



term suggester - missing mode

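  • Searching for "lucen rock"
    • Each suggestion carries a score. Similarity is computed with the Levenshtein edit distance algorithm, whose core idea is how many characters must be changed for one word to become another. Many optional parameters control how fuzzy the matching is, for example "max_edits"
  • Suggestion modes
    • missing - no suggestions if the term already exists in the index
    • popular - suggest terms that occur more frequently
    • always - provide suggestions whether or not the term exists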
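
The edit distance and the three modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Elasticsearch implementation: it uses plain Levenshtein distance, a hypothetical in-memory term dictionary with made-up document frequencies, and parameters named after the suggester options (max_edits, prefix_length).

```python
def levenshtein(a, b):
    """Single-character inserts, deletes, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def suggest(token, term_freqs, mode="missing", max_edits=2, prefix_length=1):
    """Return candidate terms for one token, closest and most frequent first."""
    own_freq = term_freqs.get(token, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the token exists in the index, so nothing is suggested
    found = []
    for term, freq in term_freqs.items():
        # Candidates must share the first `prefix_length` characters with the token.
        if term == token or term[:prefix_length] != token[:prefix_length]:
            continue
        if mode == "popular" and freq <= own_freq:
            continue  # only terms more frequent than the token itself
        distance = levenshtein(token, term)
        if distance <= max_edits:
            found.append((distance, -freq, term))
    return [term for _, _, term in sorted(found)]

# Hypothetical document frequencies for a small index.
TERMS = {"lucene": 2, "rock": 1, "rocks": 2, "elasticsearch": 3}
print(suggest("lucen", TERMS))                                  # ['lucene']
print(suggest("rock", TERMS, mode="missing"))                   # []
print(suggest("rock", TERMS, mode="popular"))                   # ['rocks']
print(suggest("hocks", TERMS, mode="always", prefix_length=0))  # ['rocks', 'rock']
```

The last call shows why prefix_length matters: with the default of 1, "hocks" and "rocks" differ in their first character, so no candidate would be considered at all.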



phrase suggester

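  • The phrase suggester adds some extra logic on top of the term suggester
  • Some parameters
    • suggest mode: missing, popular, always
    • max errors: the maximum number of terms that may be misspelled
    • confidence: a threshold that limits which results are returned; the default is 1, meaning only suggestions that score higher than the input phrase are returned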
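
The effect of confidence can be sketched as a simple threshold filter. This Python example assumes, following the Elasticsearch documentation, that the confidence value is multiplied by the score of the input phrase and that only candidates scoring above the result are kept; the phrases and scores are invented for illustration.

```python
def filter_by_confidence(input_score, candidates, confidence=1.0):
    """Keep candidate phrases whose score beats confidence * input_score."""
    threshold = confidence * input_score
    return [phrase for phrase, score in candidates if score > threshold]

# Hypothetical scores for the input phrase and two corrected candidates.
input_score = 0.010
candidates = [("lucene and elasticsearch rock", 0.025),
              ("lucne and elasticsearch rock", 0.008)]

print(filter_by_confidence(input_score, candidates))                # default 1.0
print(filter_by_confidence(input_score, candidates, confidence=0))  # keep everything
```

Setting confidence to 0, as the phrase suggester request further down does, returns the top candidates even when they do not score better than what the user typed.
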

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}


POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack  rocks rock"]
}
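
The _analyze call above shows which terms end up in the index. As a rough Python approximation (the real standard analyzer follows Unicode text segmentation rules, so this regular expression is only a simplification), the text is lowercased and split into word tokens without any stemming:

```python
import re

def simple_standard_analyze(text):
    """Lowercase the text and split it into word tokens; no stemming is applied."""
    return re.findall(r"\w+", text.lower())

print(simple_standard_analyze("Elk stack  rocks rock"))
# ['elk', 'stack', 'rocks', 'rock']
```

Because "rocks" and "rock" remain two separate terms, the term suggester can offer one as a correction for the other.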

POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}


POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}


POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}


POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length":0,
        "sort": "frequency"
      }
    }
  }
}


POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Autocomplete and context-based suggestions

the completion suggester

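  • The completion suggester provides autocomplete: every character the user types triggers an immediate query to the backend to find matches
  • This places strict demands on performance. Elasticsearch uses a different data structure here instead of the inverted index: the analyzed data is encoded into an FST (finite state transducer) and stored alongside the index. Elasticsearch loads the whole FST into memory, so lookups are very fast
  • An FST can only be used for prefix lookups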
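
The prefix-only behaviour is easy to see with a stand-in data structure. The Python sketch below uses a sorted list with binary search rather than a real FST, and simple lowercasing in place of an analyzer, so it is only an illustration of the lookup semantics; the class and method names are made up.

```python
from bisect import bisect_left

class PrefixIndex:
    """In-memory prefix lookup over complete input strings."""

    def __init__(self, inputs):
        # Sort by the normalized form so matching entries sit next to each other.
        self._entries = sorted((text.lower(), text) for text in inputs)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, size=5):
        key = prefix.lower()
        start = bisect_left(self._keys, key)  # first entry >= prefix
        matches = []
        for normalized, original in self._entries[start:]:
            if not normalized.startswith(key) or len(matches) == size:
                break
            matches.append(original)
        return matches

index = PrefixIndex([
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
])
print(index.complete("elk"))    # ['Elk stack rocks']
print(index.complete("stack"))  # [] because "stack" never starts an input
```

Note that "elastic is the company behind ELK stack" is not returned for "elk": matching starts at the beginning of the whole input, not at each word.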

Steps for using the completion suggester

  • Define a mapping that uses the "completion" type
  • Index the data
  • Run a "suggest" query to get search suggestions



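What is the context suggester

  • An extension of the completion suggester
  • It lets you add more contextual information to the search, for example for the input "star"
    • Coffee-related: suggest "starbucks"
    • Movie-related: suggest "star wars"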

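Implementing the context suggester

  • Two types of context can be defined
    • category - an arbitrary string
    • geo - geographic location information
  • Steps to implement a context suggester
    • Define a custom mapping
    • Index the data, adding context information to each document
    • Run suggestion queries combined with a context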
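
Conceptually, a category context is an extra filter applied on top of the prefix match. The Python sketch below illustrates that idea with a hypothetical in-memory list of entries; it is not how Elasticsearch stores contexts internally.

```python
def suggest_with_context(entries, prefix, category, size=5):
    """Prefix completion restricted to entries tagged with the given category."""
    key = prefix.lower()
    matches = [
        entry["input"]
        for entry in entries
        if category in entry["contexts"] and entry["input"].lower().startswith(key)
    ]
    return matches[:size]

# Hypothetical entries mirroring the "star" example above.
ENTRIES = [
    {"input": "star wars", "contexts": {"movies"}},
    {"input": "starbucks", "contexts": {"coffee"}},
]

print(suggest_with_context(ENTRIES, "sta", "coffee"))  # ['starbucks']
print(suggest_with_context(ENTRIES, "sta", "movies"))  # ['star wars']
```

The same prefix yields different suggestions depending on the category supplied with the query.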

Precision and recall

  • Precision
    • completion > phrase > term
  • Recall
    • term > phrase > completion
  • Performance
    • completion > phrase > term
DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}


POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}


DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete":{
      "type": "completion",
      "contexts":[{
        "type":"category",
        "name":"comment_category"
      }]
    }
  }
}

POST comments/_doc
{
  "comment":"I love the star war movies",
  "comment_autocomplete":{
    "input":["star wars"],
    "contexts":{
      "comment_category":"movies"
    }
  }
}

POST comments/_doc
{
  "comment":"Where can I find a Starbucks",
  "comment_autocomplete":{
    "input":["starbucks"],
    "contexts":{
      "comment_category":"coffee"
    }
  }
}


POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion":{
        "field":"comment_autocomplete",
        "contexts":{
          "comment_category":"coffee"
        }
      }
    }
  }
}
