1. Installing the IK Chinese analyzer in Elasticsearch
(1) git clone https://github.com/medcl/elasticsearch-analysis-ik
(2) Inside the cloned elasticsearch-analysis-ik directory: git checkout tags/v5.2.0
(3) mvn package
(4) Copy target/releases/elasticsearch-analysis-ik-5.2.0.zip into the es/plugins/ik directory
(5) Unzip elasticsearch-analysis-ik-5.2.0.zip inside es/plugins/ik
(6) Restart es

A prebuilt elasticsearch-analysis-ik-5.2.0.zip is also available. Link: https://pan.baidu.com/s/1LOHynjm7dPjCldw3MGMOCQ Password: aeym

2. IK analyzer basics
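IK provides two analyzers. Choose whichever fits your needs, although ik_max_word is the usual choice.

ik_max_word: splits the text at the finest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination.

ik_smart: splits the text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".

3. Using the IK analyzer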
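To see how the two analyzers differ on a given string before committing to one in a mapping, the _analyze API can be called once per analyzer name. The following is a minimal Python sketch rather than part of the original steps. It assumes a node reachable at http://localhost:9200 (a hypothetical address) with the IK plugin loaded; the helper names build_analyze_request and analyze are illustrative only.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node with the IK plugin installed.
ES_URL = "http://localhost:9200"


def build_analyze_request(analyzer, text, base_url=ES_URL):
    """Build (without sending) a POST /_analyze request for one analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, base_url=ES_URL):
    """Send the request and return the token strings from the response."""
    request = build_analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The _analyze response carries a "tokens" list; each entry has a "token" field.
    return [entry["token"] for entry in payload["tokens"]]
```

Against a running node, calling analyze("ik_max_word", "中华人民共和国国歌") and analyze("ik_smart", "中华人民共和国国歌") should make the difference in granularity described above visible.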
Create the mapping, then bulk-index the data:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}

POST /my_index/my_type/_bulk
{ "index": { "_id": "1"} }
{ "text": "男子偷上万元发红包求交女友 被抓获时仍然单身" }
{ "index": { "_id": "2"} }
{ "text": "16岁少女为结婚“变”22岁 7年后想离婚被法院拒绝" }
{ "index": { "_id": "3"} }
{ "text": "深圳女孩骑车逆行撞奔驰 遭索赔被吓哭(图)" }
{ "index": { "_id": "4"} }
{ "text": "女人对护肤品比对男票好?网友神怼" }
{ "index": { "_id": "5"} }
{ "text": "为什么国内的街道招牌用的都是红黄配?" }
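The _bulk body is newline-delimited JSON: one action line followed by one source line per document, with a newline after the last line. When the documents come from a program instead of being typed by hand, the body can be assembled as in this rough sketch. The helper name build_bulk_body is hypothetical, and the "text" field name simply mirrors the mapping above.

```python
import json


def build_bulk_body(docs):
    """Turn {doc_id: text} into a newline-delimited _bulk body.

    Each document yields an "index" action line and a source line;
    the body ends with a trailing newline, as the _bulk API requires.
    """
    lines = []
    for doc_id, text in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}, ensure_ascii=False))
        lines.append(json.dumps({"text": text}, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Two of the sample documents from this post.
sample = {
    "4": "女人对护肤品比对男票好?网友神怼",
    "5": "为什么国内的街道招牌用的都是红黄配?",
}
print(build_bulk_body(sample), end="")
```

The resulting string would be sent as the body of POST /my_index/my_type/_bulk.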
Search for documents matching "16岁少女":
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "text": "16岁少女"
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 2.0870597,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 2.0870597,
        "_source": {
          "text": "16岁少女为结婚“变”22岁 7年后想离婚被法院拒绝"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.2699054,
        "_source": {
          "text": "深圳女孩骑车逆行撞奔驰 遭索赔被吓哭(图)"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.25124598,
        "_source": {
          "text": "男子偷上万元发红包求交女友 被抓获时仍然单身"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": 0.19100356,
        "_source": {
          "text": "女人对护肤品比对男票好?网友神怼"
        }
      }
    ]
  }
}
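The query returns four hits even though only document 2 contains the whole phrase. A match query combines the analyzed terms with OR by default, so any document sharing at least one term with the query is returned, with a lower score. If only documents containing every term are wanted, one option (not used in the original example) is to set the match query's operator to "and". The sketch below only builds the request body; the helper name build_match_query is hypothetical.

```python
import json


def build_match_query(field, text, operator="or"):
    """Return a search body for a match query.

    operator="or" mirrors the default behaviour; operator="and"
    requires every analyzed term of the query text to be present.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Body for GET /my_index/my_type/_search that requires all query terms to match.
strict_body = build_match_query("text", "16岁少女", operator="and")
print(json.dumps(strict_body, ensure_ascii=False, indent=2))
```

Whether this narrows the result to document 2 alone depends on the tokens ik_max_word produces for the query, which can be checked with the _analyze API.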