ElasticSearch Study Notes: Adding a Custom Dictionary to the ik Analyzer

Prerequisite: the ik analysis plugin is already installed.

1. Add a dictionary file under the ik analyzer's config directory

~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic

mydic.dic

Its content is:

我给祖国献石油
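The step above can be sketched as a few shell commands. This is a minimal sketch, not the author's exact procedure: `IK_CONF_DIR` is a hypothetical variable that defaults to a local `./analysis-ik` directory, so point it at your real `config/analysis-ik` directory before running it. The ik plugin expects the dictionary to be UTF-8 text with one word or phrase per line.

```shell
# Hypothetical default; point IK_CONF_DIR at your real analysis-ik directory.
IK_CONF_DIR="${IK_CONF_DIR:-./analysis-ik}"
mkdir -p "$IK_CONF_DIR"

# One word or phrase per line, saved as UTF-8.
# Note: '>' overwrites the file; use '>>' to append to an existing dictionary.
printf '%s\n' '我给祖国献石油' > "$IK_CONF_DIR/mydic.dic"

# Show the file so the entry can be checked by eye.
cat "$IK_CONF_DIR/mydic.dic"
```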

2. Configure the dictionary path: edit the IKAnalyzer.cfg.xml configuration file and add the new dictionary to it
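The post does not show the edited file, so the following is only a sketch of what the entry might look like, modeled on the stock `IKAnalyzer.cfg.xml` shipped with the plugin. It writes to a scratch directory (`SKETCH_DIR`, a hypothetical name) rather than overwriting the real config, and it assumes the `ext_dict` path is resolved relative to the directory that holds `IKAnalyzer.cfg.xml`. Compare the result with your real file and carry over only the `ext_dict` entry.

```shell
# Scratch location (hypothetical) so the real config is not overwritten.
SKETCH_DIR="${SKETCH_DIR:-./ik-config-sketch}"
mkdir -p "$SKETCH_DIR"

cat > "$SKETCH_DIR/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary; path assumed relative to this file's directory -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- custom stop-word dictionary, left empty here -->
    <entry key="ext_stopwords"></entry>
</properties>
EOF

# Confirm the entry that matters for this step.
grep 'ext_dict' "$SKETCH_DIR/IKAnalyzer.cfg.xml"
```

If several dictionaries are needed, the stock config separates their paths with semicolons inside the same `ext_dict` entry.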

3. Restart es so that the new dictionary is loaded
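How to restart depends on how Elasticsearch was installed. For a tarball install like the one used here, a restart might look like the sketch below. The `ES_HOME` default is taken from the path shown in step 1; `PID_FILE` and the `restart_es` function name are assumptions for illustration, and the stop branch only works if the node was previously started with the same `-p` pid file.

```shell
# Assumed locations; adjust to your installation.
ES_HOME="${ES_HOME:-$HOME/software/apache/elasticsearch-6.2.4}"
PID_FILE="${PID_FILE:-$ES_HOME/es.pid}"

restart_es() {
  # Stop the running node if a pid file from a previous start exists.
  if [ -f "$PID_FILE" ]; then
    pid="$(cat "$PID_FILE")"
    kill "$pid" 2>/dev/null
    # Wait until the old process has exited before starting a new one.
    while kill -0 "$pid" 2>/dev/null; do sleep 1; done
  fi
  # -d runs Elasticsearch as a daemon, -p records its pid in PID_FILE.
  "$ES_HOME/bin/elasticsearch" -d -p "$PID_FILE"
}
```

Call `restart_es` once the dictionary file and `IKAnalyzer.cfg.xml` are both in place. On a package install managed by systemd, restarting the service is the usual equivalent.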

4. Test

data.json

{
  "analyzer": "ik_max_word",
  "text": "我给祖国献石油"
}

ik tokenization result before adding the dictionary

curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "给",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

ik tokenization result after adding the dictionary; the tokens now include "我给祖国献石油"

curl -H 'Content-Type: application/json' 'http://localhost:9200/_analyze?pretty=true' -d@data.json

{
  "tokens" : [
    {
      "token" : "我给祖国献石油",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
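The full responses are easier to compare when reduced to a bare token list. The helper below is a rough sketch that does this with `grep` and `sed`. `extract_tokens` is a hypothetical name, and the `sample` string is a trimmed two-token fragment of the response above, used so the snippet runs without a live cluster. It assumes token values contain no double quotes.

```shell
# Print one token per line from an _analyze response read on stdin.
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}

# Trimmed fragment of the response above (two tokens only).
sample='{"tokens":[{"token" : "我给祖国献石油","position" : 0},{"token" : "祖国","position" : 1}]}'

printf '%s\n' "$sample" | extract_tokens

# grep -qx succeeds only if the custom phrase came back as a whole token.
if printf '%s\n' "$sample" | extract_tokens | grep -qx '我给祖国献石油'; then
  echo "custom word found"
fi
```

Against a running node, the same helper can be fed from the request used in this post: `curl -s -H 'Content-Type: application/json' 'http://localhost:9200/_analyze' -d@data.json | extract_tokens`.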
