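I. Elasticsearch Overview
Official website: https://www.elastic.co/cn/downloads/elasticsearch
Elasticsearch (ES for short) is an open-source, highly scalable, distributed full-text search engine that can store and retrieve data in near real time. It scales well, up to hundreds of servers, and can handle petabytes of data (the big-data era). ES is written in Java and uses Lucene at its core for all indexing and searching. Its goal, however, is to hide Lucene's complexity behind a simple RESTful API, which makes full-text search easy.
According to DB-Engines, an authoritative database ranking site, Elasticsearch overtook Solr and others in January 2016 to become the top-ranked search engine.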
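Summary: Elasticsearch vs. Solr
1. ES is essentially ready to use out of the box (just unzip it) and very simple. Installing Solr is slightly more involved.
2. Solr uses ZooKeeper for distributed management, whereas Elasticsearch has built-in distributed coordination.
3. Solr supports more data formats, such as JSON, XML and CSV, whereas Elasticsearch primarily works with JSON.
4. Solr offers more features officially, while Elasticsearch focuses on core functionality. Its advanced features mostly come from plugins or companion tools; for example, a graphical interface relies on Kibana.
5. Solr is fast at querying but slower at updating the index (i.e. inserts and deletes are slower), so it suits query-heavy applications such as e-commerce;
- ES builds indexes quickly, so newly written data becomes searchable sooner (better real-time search), at some cost to raw query speed. It is used for search at companies such as Facebook and Sina.
- Solr is a strong solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.
6. Solr is more mature and has a larger, more established community of users, developers and contributors. Elasticsearch has relatively fewer maintainers and is updated very quickly, so it costs more to learn and use.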
II. Installing Elasticsearch
Installing on Windows
1. Install
Download: https://www.elastic.co/cn/downloads/
Previous releases: https://www.elastic.co/cn/downloads/past-releases/
Just unzip it (ideally keep all Elasticsearch-related tools in one directory).
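2. Directory layout
- bin: startup scripts
- config: configuration files, including:
- log4j2.properties: logging configuration
- jvm.options: JVM settings (the heap defaults to 1 GB; adjust it if the machine is short on memory)
- elasticsearch.yml: the Elasticsearch configuration file (default port 9200; CORS is also configured here)
- lib: related JAR files
- modules: feature modules
- plugins: plugins directory
- e.g. the IK analyzer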
3. Start
Run elasticsearch.bat in the bin directory.
Open localhost:9200 in a browser. The response looks like this:
- {
- "name" : "TIANYH",
- "cluster_name" : "elasticsearch",
- "cluster_uuid" : "IOHRCRK6TKibMGdNZq4YtA",
- "version" : {
- "number" : "7.6.1",
- "build_flavor" : "default",
- "build_type" : "zip",
- "build_hash" : "aa751e09be0a5072e8570670309b1f12348f023b",
- "build_date" : "2020-02-29T00:15:25.529771Z",
- "build_snapshot" : false,
- "lucene_version" : "8.4.0",
- "minimum_wire_compatibility_version" : "6.8.0",
- "minimum_index_compatibility_version" : "6.0.0-beta1"
- },
- "tagline" : "You Know, for Search"
- }
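Installing a visual interface (elasticsearch-head)
Prerequisite: Node.js must be installed.
1. Download
https://github.com/mobz/elasticsearch-head
2. Install
Just unzip it (ideally keep all Elasticsearch-related tools in one directory).
3. Start
- cd elasticsearch-head
- # install dependencies
- npm install
- # start
- npm run start
- # then open http://localhost:9100/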
Enable CORS (add the following to config/elasticsearch.yml in the Elasticsearch directory):
- # enable CORS
- http.cors.enabled: true
- # allow access from any origin
- http.cors.allow-origin: "*"
Then restart Elasticsearch.
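How to think about it:
- If you are a beginner:
- an index can be seen as a "database"
- a type can be seen as a "table"
- a document can be seen as "the data in the database (a row in a table)"
- We only use head as a data-viewing tool; all queries from here on are run in Kibana.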
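Installing Kibana
Kibana is an open-source analytics and visualization platform for Elasticsearch. It is used to search, view and interact with data stored in Elasticsearch indices, and it can present advanced data analysis through a variety of charts. Kibana makes large volumes of data easier to understand. It is easy to use: its browser-based interface lets you quickly build dashboards that show Elasticsearch queries in real time. Setting up Kibana is simple. With no coding or extra infrastructure, you can install it and start monitoring Elasticsearch indices within minutes.
1. Download:
The Kibana version must match the Elasticsearch version.
https://www.elastic.co/cn/downloads/
Previous releases: https://www.elastic.co/cn/downloads/past-releases/
2. Install
Just unzip it (ideally keep all Elasticsearch-related tools in one directory).
3. Start
Run kibana.bat in the bin directory.
Open localhost:5601.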
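4. Switch Kibana to Chinese
Open config/kibana.yml in the Kibana directory with an editor and add the locale setting (in Kibana 7.x: i18n.locale: "zh-CN"). Then restart Kibana.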
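About ELK
- ELK is an acronym formed from the initials of three open-source projects: Elasticsearch, Logstash and Kibana. It is also known as the Elastic Stack.
- Elasticsearch is a Lucene-based, distributed, near-real-time search platform that you interact with through a RESTful API.
- Scenarios like Baidu or Google, which need full-text search over big data, can use Elasticsearch as the underlying framework. This shows how capable its search is. Elasticsearch is often simply called ES.
- Logstash is the central data-flow engine of ELK. It collects data in different formats from various sources (files, data stores, MQ), filters it, and outputs it to different destinations (files, MQ, Redis, Elasticsearch, Kafka, etc.).
- Kibana presents Elasticsearch data in a friendly UI and provides real-time analysis.
- Many developers will say that ELK is the name of a log-analysis stack. In fact, ELK is not limited to log analysis; it supports any data collection and analysis scenario. Log collection and analysis is simply the most representative use, not the only one.
- Collect and clean data (Logstash) ==> search and store (Elasticsearch) ==> visualize (Kibana)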
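III. Elasticsearch Core Concepts
Overview
1. Index
2. Field types (mapping)
3. Document
4. Shard (a Lucene index, i.e. an inverted index)
Elasticsearch is document-oriented. Here is an objective comparison of a relational database and Elasticsearch (everything is JSON):
| Relational DB | Elasticsearch |
|---|---|
| database | index (indices) |
| tables | types |
| rows | documents |
| columns | fields |
An Elasticsearch cluster can contain multiple indices (databases). Each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).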
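Physical design:
Behind the scenes, Elasticsearch splits each index into multiple shards, and each shard can be moved between different servers in the cluster.
A single node is already a cluster: a running Elasticsearch instance forms a cluster by default, and the default cluster name is elasticsearch.
Logical design:
An index type contains multiple documents, such as document 1 and document 2. To find a document after indexing it, follow the order index => type => document ID; this combination identifies one specific document. Note that the ID need not be an integer; it is actually a string.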
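Document ("row")
As mentioned, Elasticsearch is document-oriented, which means the smallest unit of indexing and searching is the document. In Elasticsearch, documents have several important properties:
- Self-contained: a document contains both the fields and their values, i.e. key:value pairs.
- Hierarchical: a document can contain sub-documents, which is how complex logical entities are built.
- Flexible structure: documents do not depend on a predefined schema. In a relational database, columns must be defined before they can be used. In Elasticsearch, fields are very flexible: we can sometimes omit a field, or add a new one dynamically.
Although we can freely add or omit fields, each field's type matters. For example, an age field could be stored as a string or as an integer. Elasticsearch therefore stores the mapping between fields and types, along with other settings. The mapping is specific to each type of each index, which is why types in Elasticsearch are sometimes called mapping types.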
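Type ("table")
A type is a logical container of documents, just as a table is a container of rows in a relational database. The definition of the fields in a type is called the mapping; for example, name is mapped to a string type. We say documents are schema-less: they need not contain every field defined in the mapping. So what does Elasticsearch do when a new field is added?
- Elasticsearch adds the new field to the mapping automatically. Because it does not know the field's type, it guesses: if the value is 18, it assumes an integer type (long). Elasticsearch can guess wrong, though, so the safest approach is to define the mapping you need up front. In this respect it ends up like a relational database: define the fields first, then use them, with no surprises.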
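Index ("database")
An index is a container of mapping types, and an Elasticsearch index is a very large collection of documents. The index stores the fields and other settings of its mapping types, which are then stored on individual shards. Let's look at how shards work.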
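A cluster has at least one node, and a node is one Elasticsearch process; a node can hold multiple indices. Before Elasticsearch 7.0, a newly created index consisted of 5 primary shards by default. From 7.0 on, the default is 1 primary shard (the test2 settings later in this article show number_of_shards: 1). By default, each primary shard has one replica shard.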
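The shard and replica rules above can be made concrete with a toy layout. This is a minimal sketch under loudly stated assumptions. Elasticsearch routes a document to `hash(_routing) % number_of_primary_shards` (where `_routing` defaults to the document ID), but it uses a Murmur3 hash, and `String.hashCode` below is only a stand-in. Its shard allocator is also far smarter than the round-robin placement used here. The one rule the sketch does keep is that a replica never sits on the same node as its primary, so on a single-node cluster the replica stays unassigned.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of document routing and shard placement (not Elasticsearch's real code).
class ToyShardLayout {
    // Simplified routing: ES hashes the routing value (default: _id) with Murmur3;
    // String.hashCode is only a stand-in here.
    static int routeToShard(String docId, int primaryShards) {
        return Math.floorMod(docId.hashCode(), primaryShards);
    }

    // Round-robin placement: primary p on node p % nodes, its replica on the next node.
    // Returns shard -> {primaryNode, replicaNode}; replicaNode is -1 (unassigned) when
    // there is only one node, because a replica never shares a node with its primary.
    static Map<Integer, int[]> layout(int primaryShards, int nodes) {
        Map<Integer, int[]> result = new TreeMap<>();
        for (int p = 0; p < primaryShards; p++) {
            int primaryNode = p % nodes;
            int replicaNode = nodes > 1 ? (p + 1) % nodes : -1;
            result.put(p, new int[]{primaryNode, replicaNode});
        }
        return result;
    }

    public static void main(String[] args) {
        layout(5, 3).forEach((shard, n) -> System.out.printf(
                "shard %d: primary on node %d, replica on node %d%n", shard, n[0], n[1]));
        System.out.println("doc \"1\" -> shard " + routeToShard("1", 5)); // doc "1" -> shard 4
    }
}
```

Because the shard is chosen by taking the hash modulo the number of primaries, the primary shard count cannot simply be changed on an existing index without re-routing its documents.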
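In a 3-node cluster, a primary shard and its replica are never placed on the same node, so data is not lost if one node goes down. A shard is in fact a Lucene index (one Elasticsearch index contains several Lucene indices): a directory of files that contains an inverted index. The inverted index lets Elasticsearch tell you which documents contain a given keyword without scanning every document. But wait, what exactly is an inverted index?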
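Inverted index (the structure underlying a Lucene index)
Simply put, the index is built as (keyword, documents containing it) pairs. Given a keyword, you can go straight to the documents that contain it without checking every document, as shown in the figure below.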
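To make the (keyword, documents) idea concrete, here is a minimal in-memory sketch. It assumes a naive tokenizer that lowercases and splits on non-word characters, and the sample sentences are hypothetical. A real Lucene index also stores term frequencies, positions and much more. A lookup is a single map access, and an AND search intersects the posting lists, so no document text is scanned at query time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of IDs of the documents containing that term.
// Assumption: text is lowercased and split on non-word characters (a naive tokenizer).
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexing: record the document ID under every term the document contains.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Single-term lookup: one map access, no document is scanned.
    SortedSet<Integer> lookup(String term) {
        return Collections.unmodifiableSortedSet(
                postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    // AND search: intersect the posting lists of all terms.
    SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            if (result == null) {
                result = new TreeSet<>(lookup(term));
            } else {
                result.retainAll(lookup(term));
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "study every day, good good up to forever");
        index.add(2, "to forever, study every day, good good up");
        index.add(3, "elasticsearch is a search engine");
        System.out.println(index.lookup("forever"));             // [1, 2]
        System.out.println(index.searchAll("search", "engine")); // [3]
    }
}
```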
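IV. IK Analyzer (an Elasticsearch plugin)
IK analyzer: a Chinese word segmenter
Tokenization (analysis) splits a piece of Chinese or other text into individual keywords. During a search, the query is tokenized, the data in the index is tokenized, and the resulting tokens are matched against each other. Without IK, the default analyzer treats every Chinese character as a separate word. For example, "我爱狂神" would be split into "我", "爱", "狂", "神", which is clearly not what we want, so we install the IK Chinese analyzer to solve this.
IK provides two tokenization algorithms: ik_smart and ik_max_word. ik_smart produces the coarsest split (fewest tokens), while ik_max_word produces the finest-grained split.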
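The difference between the two modes can be illustrated with a toy dictionary-based segmenter. This is only an analogy and not IK's implementation: IK uses its own dictionaries and disambiguation rules, and the five-word dictionary below is hypothetical. The "smart"-style method uses greedy forward maximum matching, which takes the longest dictionary word at each position. The "max_word"-style method emits every dictionary word at every position, plus any character that no word covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy segmenter contrasting "fewest tokens" with "every dictionary word" splitting.
// NOT IK's algorithm; the dictionary passed in is a hypothetical stand-in.
class ToySegmenter {
    private final Set<String> dict;
    private final int maxLen;

    ToySegmenter(Set<String> dict) {
        this.dict = dict;
        this.maxLen = dict.stream().mapToInt(String::length).max().orElse(1);
    }

    // Greedy forward maximum matching: take the longest dictionary word at each
    // position; characters not covered by any word become single-character tokens.
    List<String> smart(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    // Exhaustive split: every dictionary word at every start position (longest first),
    // plus single characters that no dictionary word covers.
    List<String> maxWord(String text) {
        int n = text.length();
        boolean[] covered = new boolean[n];
        for (int i = 0; i < n; i++) {
            for (int len = 2; len <= maxLen && i + len <= n; len++) {
                if (dict.contains(text.substring(i, i + len))) {
                    Arrays.fill(covered, i, i + len, true);
                }
            }
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!covered[i]) {
                tokens.add(text.substring(i, i + 1));
            }
            for (int len = Math.min(maxLen, n - i); len >= 2; len--) {
                String candidate = text.substring(i, i + len);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToySegmenter seg = new ToySegmenter(
                new HashSet<>(Arrays.asList("白日", "黄河", "入海", "海流", "入海流")));
        System.out.println(seg.smart("白日依山尽黄河入海流"));   // [白日, 依, 山, 尽, 黄河, 入海流]
        System.out.println(seg.maxWord("白日依山尽黄河入海流")); // [白日, 依, 山, 尽, 黄河, 入海流, 入海, 海流]
    }
}
```

With this particular dictionary the toy output happens to resemble the Kibana results shown later in this section. That resemblance comes from the chosen dictionary, not from sharing IK's logic.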
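1. Download
The IK version must match the Elasticsearch version.
Download: https://github.com/medcl/elasticsearch-analysis-ik/releases
2. Install
Just unzip it into a folder named ik under Elasticsearch's plugins directory (create the ik folder yourself).
4. Check: after restarting Elasticsearch, run bin/elasticsearch-plugin list in the Elasticsearch installation directory to see the installed plugins.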
- E:\ElasticSearch\elasticsearch-7.6.1\bin>elasticsearch-plugin list
5. Test in Kibana
ik_smart: fewest tokens
- GET _analyze
- {
- "analyzer": "ik_smart",
- "text": "白日依山尽黄河入海流"
- }
- {
- "tokens" : [
- {
- "token" : "白日",
- "start_offset" : 0,
- "end_offset" : 2,
- "type" : "CN_WORD",
- "position" : 0
- },
- {
- "token" : "依",
- "start_offset" : 2,
- "end_offset" : 3,
- "type" : "CN_CHAR",
- "position" : 1
- },
- {
- "token" : "山",
- "start_offset" : 3,
- "end_offset" : 4,
- "type" : "CN_CHAR",
- "position" : 2
- },
- {
- "token" : "尽",
- "start_offset" : 4,
- "end_offset" : 5,
- "type" : "CN_CHAR",
- "position" : 3
- },
- {
- "token" : "黄河",
- "start_offset" : 5,
- "end_offset" : 7,
- "type" : "CN_WORD",
- "position" : 4
- },
- {
- "token" : "入海流",
- "start_offset" : 7,
- "end_offset" : 10,
- "type" : "CN_WORD",
- "position" : 5
- }
- ]
- }
ik_max_word: finest-grained split (exhausts every possibility in the dictionary)
- GET _analyze
- {
- "analyzer": "ik_max_word",
- "text": "白日依山尽黄河入海流"
- }
- {
- "tokens" : [
- {
- "token" : "白日",
- "start_offset" : 0,
- "end_offset" : 2,
- "type" : "CN_WORD",
- "position" : 0
- },
- {
- "token" : "依",
- "start_offset" : 2,
- "end_offset" : 3,
- "type" : "CN_CHAR",
- "position" : 1
- },
- {
- "token" : "山",
- "start_offset" : 3,
- "end_offset" : 4,
- "type" : "CN_CHAR",
- "position" : 2
- },
- {
- "token" : "尽",
- "start_offset" : 4,
- "end_offset" : 5,
- "type" : "CN_CHAR",
- "position" : 3
- },
- {
- "token" : "黄河",
- "start_offset" : 5,
- "end_offset" : 7,
- "type" : "CN_WORD",
- "position" : 4
- },
- {
- "token" : "入海流",
- "start_offset" : 7,
- "end_offset" : 10,
- "type" : "CN_WORD",
- "position" : 5
- },
- {
- "token" : "入海",
- "start_offset" : 7,
- "end_offset" : 9,
- "type" : "CN_WORD",
- "position" : 6
- },
- {
- "token" : "海流",
- "start_offset" : 8,
- "end_offset" : 10,
- "type" : "CN_WORD",
- "position" : 7
- }
- ]
- }
6. Add custom words to an extension dictionary
- <elasticsearch directory>/plugins/ik/config/IKAnalyzer.cfg.xml
Open IKAnalyzer.cfg.xml and configure the extension dictionary:
- <?xml version="1.0" encoding="UTF-8"?>
- <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
- <properties>
- <comment>IK Analyzer extension configuration</comment>
- <entry key="ext_dict">my.dic</entry>
- <entry key="ext_stopwords"></entry>
- </properties>
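Create my.dic next to IKAnalyzer.cfg.xml, with one word per line (here 白日依山尽 and 黄河入海流). Then restart Elasticsearch and run the analysis again:
- GET _analyze
- {
- "analyzer": "ik_smart",
- "text": "白日依山尽黄河入海流"
- }
- {
- "tokens" : [
- { "token" : "白日依山尽", "start_offset" : 0, "end_offset" : 5, "type" : "CN_WORD", "position" : 0 },
- { "token" : "黄河入海流", "start_offset" : 5, "end_offset" : 10, "type" : "CN_WORD", "position" : 1 }
- ]
- }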
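V. REST Style
REST is a software architectural style, not a standard; it only provides a set of design principles and constraints. It is mainly used for software in which clients interact with servers. Software designed in this style can be simpler and more layered, and it makes mechanisms such as caching easier to implement.
Basic REST commands:
| method | URL | description |
|---|---|---|
| PUT (create/modify) | localhost:9200/index_name/type_name/document_id | create a document (with the given ID) |
| POST (create) | localhost:9200/index_name/type_name | create a document (random ID) |
| POST (modify) | localhost:9200/index_name/type_name/document_id/_update | update a document |
| DELETE (delete) | localhost:9200/index_name/type_name/document_id | delete a document |
| GET (query) | localhost:9200/index_name/type_name/document_id | get a document by its ID |
| POST (query) | localhost:9200/index_name/type_name/_search | query all data |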
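The table maps directly onto plain HTTP requests. The sketch below builds (but does not send) the corresponding `java.net.http.HttpRequest` objects, so each verb and path is explicit. It assumes Java 11+, a node at `http://localhost:9200`, and index/type names that need no URL escaping. The helper names are hypothetical, and the search helper uses the typeless `/index/_search` path with a `match_all` body.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Builds (but does not send) HTTP requests mirroring the REST table above.
// Assumptions: Java 11+, a node at localhost:9200, names that need no URL escaping.
class RestRequestsDemo {
    static final String BASE = "http://localhost:9200";

    private static HttpRequest.Builder to(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json");
    }

    // PUT /index/type/id: create (or replace) a document with a chosen ID
    static HttpRequest putDocument(String index, String type, String id, String json) {
        return to("/" + index + "/" + type + "/" + id).PUT(BodyPublishers.ofString(json)).build();
    }

    // POST /index/type: create a document with a generated ID
    static HttpRequest postDocument(String index, String type, String json) {
        return to("/" + index + "/" + type).POST(BodyPublishers.ofString(json)).build();
    }

    // GET /index/type/id: fetch a document by ID
    static HttpRequest getDocument(String index, String type, String id) {
        return to("/" + index + "/" + type + "/" + id).GET().build();
    }

    // DELETE /index/type/id: delete a document
    static HttpRequest deleteDocument(String index, String type, String id) {
        return to("/" + index + "/" + type + "/" + id).DELETE().build();
    }

    // POST /index/_search with match_all: return all documents (first page of hits)
    static HttpRequest searchAll(String index) {
        return to("/" + index + "/_search")
                .POST(BodyPublishers.ofString("{\"query\":{\"match_all\":{}}}"))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest put = putDocument("test", "type", "1", "{\"name\":\"测试\",\"age\":18}");
        System.out.println(put.method() + " " + put.uri()); // PUT http://localhost:9200/test/type/1
    }
}
```

To actually send one of these requests against a running node, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.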
1. Test: create an index and add a document
- PUT /test/type/1
- {
- "name": "测试",
- "age": 18
- }
- {
- "_index" : "test",
- "_type" : "type",
- "_id" : "1",
- "_version" : 1,
- "result" : "created",
- "_shards" : {
- "total" : 2,
- "successful" : 1,
- "failed" : 0
- },
- "_seq_no" : 0,
- "_primary_term" : 1
- }
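2. Field data types
- String types
- text, keyword
- text: analyzed (tokenized); supports full-text search and fuzzy/exact queries, but (by default) not aggregations or sorting. text values have no fixed maximum length, so the type suits large fields;
- keyword: not analyzed, indexed as-is; supports fuzzy and exact matching as well as aggregations and sorting. A single keyword term can be at most 32766 bytes of UTF-8 (a Lucene limit). ignore_above sets a maximum length in characters; longer values are not indexed, so a term query will not find them.
- Numeric types
- long, integer, short, byte, double, float, half_float, scaled_float
- Date type: date
- Boolean type: boolean
- Binary type: binary
- etc.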
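The two keyword limits above count different things: ignore_above counts characters, while Lucene's 32766 limit counts UTF-8 bytes. The distinction matters for Chinese text, where a common character takes 3 bytes in UTF-8. The sketch below mimics both checks. The helper names are hypothetical, and `String.length()` (UTF-16 units) stands in for the character count.

```java
import java.nio.charset.StandardCharsets;

// Mimics the two length checks on keyword values (helper names are hypothetical).
class KeywordLimitDemo {
    // Lucene rejects a single indexed term longer than this many UTF-8 bytes.
    static final int MAX_TERM_BYTES = 32766;

    // ignore_above counts characters: longer values stay in _source but are not
    // indexed, so term queries cannot find them.
    static boolean isIndexed(String value, int ignoreAbove) {
        return value.length() <= ignoreAbove;
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fitsLuceneTerm(String s) {
        return utf8Bytes(s) <= MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("中"));                       // 3
        System.out.println(fitsLuceneTerm("中".repeat(10922)));    // true  (32766 bytes)
        System.out.println(fitsLuceneTerm("中".repeat(10923)));    // false (32769 bytes)
        System.out.println(isIndexed("a".repeat(257), 256));       // false
    }
}
```

So a keyword field holding Chinese text hits the Lucene byte limit at roughly a third of 32766 characters. A moderate ignore_above (the dynamic default is 256) avoids indexing errors for long values.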
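3. Specify field types (with PUT)
This is similar to creating a database (defining the index and each field's type); you can also think of it as setting up the rules.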
- PUT /test2
- {
- "mappings": {
- "properties": {
- "name": {
-
- "type": "text"
- },
- "age":{
-
- "type": "long"
- },
- "birthday":{
-
- "type": "date"
- }
- }
- }
- }
- {
- "acknowledged" : true,
- "shards_acknowledged" : true,
- "index" : "test2"
- }
4. Get the rules (mapping) created in step 3
- GET test2
- {
- "test2" : {
- "aliases" : { },
- "mappings" : {
- "properties" : {
-
- "age" : {
-
- "type" : "long"
-
- },
-
- "birthday" : {
-
- "type" : "date"
-
- },
-
- "name" : {
-
- "type" : "text"
-
- }
- }
- },
- "settings" : {
- "index" : {
-
- "creation_date" : "1676438148562",
-
- "number_of_shards" : "1",
-
- "number_of_replicas" : "1",
-
- "uuid" : "d-qUkOZKQJKzd68KHiN_pw",
-
- "version" : {
-
- "created" : "7060199"
-
- },
-
- "provided_name" : "test2"
- }
- }
- }
- }
5. Get the default information
_doc is the default type. Mapping types are deprecated in 7.x and removed in later versions, so _doc stands in for them.
- PUT /test3/_doc/1
- {
- "name": "黄河",
- "age": 18
- }
- {
- "_index" : "test3",
- "_type" : "_doc",
- "_id" : "1",
- "_version" : 1,
- "result" : "created",
- "_shards" : {
- "total" : 2,
- "successful" : 1,
- "failed" : 0
- },
- "_seq_no" : 0,
- "_primary_term" : 1
- }
- GET test3
- {
- "test3" : {
- "aliases" : { },
- "mappings" : {
- "properties" : {
-
- "age" : {
-
- "type" : "long"
-
- },
-
- "name" : {
-
- "type" : "text",
-
- "fields" : {
-
- "keyword" : {
-
- "type" : "keyword",
-
- "ignore_above" : 256
-
- }
-
- }
-
- }
- }
- },
- "settings" : {
- "index" : {
-
- "creation_date" : "1676438576004",
-
- "number_of_shards" : "1",
-
- "number_of_replicas" : "1",
-
- "uuid" : "QmHErZuzSvmczgtgyzC7oA",
-
- "version" : {
-
- "created" : "7060199"
-
- },
-
- "provided_name" : "test3"
- }
- }
- }
- }
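If you do not specify a type for a field, Elasticsearch assigns it a default field type (dynamic mapping). Above, age became long, and name became text with a keyword sub-field (ignore_above 256).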
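The default types in the test3 output follow simple rules based on the JSON value. This sketch is a rough approximation of the Elasticsearch 7.x defaults, not the real mapper. It deliberately leaves out date detection on strings, arrays, null values and the numeric_detection option. The type labels it returns, such as `"text+keyword"`, are hypothetical shorthand.

```java
import java.util.Map;
import java.util.TreeMap;

// Rough approximation of Elasticsearch 7.x default dynamic mapping for JSON values.
// Omitted on purpose: date detection on strings, arrays, nulls, numeric_detection.
class ToyDynamicMapping {
    static String guessType(Object jsonValue) {
        if (jsonValue instanceof Integer || jsonValue instanceof Long) {
            return "long";          // every JSON integer becomes long
        }
        if (jsonValue instanceof Float || jsonValue instanceof Double) {
            return "float";         // JSON floating-point numbers become float
        }
        if (jsonValue instanceof Boolean) {
            return "boolean";
        }
        if (jsonValue instanceof String) {
            return "text+keyword";  // text plus a keyword sub-field (ignore_above 256)
        }
        if (jsonValue instanceof Map) {
            return "object";        // nested JSON objects become object fields
        }
        throw new IllegalArgumentException("not handled in this sketch: " + jsonValue);
    }

    static Map<String, String> guessMapping(Map<String, Object> doc) {
        Map<String, String> mapping = new TreeMap<>();
        doc.forEach((field, value) -> mapping.put(field, guessType(value)));
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("name", "黄河", "age", 18);
        System.out.println(guessMapping(doc)); // {age=long, name=text+keyword}
    }
}
```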
Tip: GET _cat/ returns a lot of information about the current state of Elasticsearch:
- =^.^=
- /_cat/allocation
- /_cat/shards
- /_cat/shards/{index}
- /_cat/master
- /_cat/nodes
- /_cat/tasks
- /_cat/indices
- /_cat/indices/{index}
- /_cat/segments
- /_cat/segments/{index}
- /_cat/count
- /_cat/count/{index}
- /_cat/recovery
- /_cat/recovery/{index}
- /_cat/health
- /_cat/pending_tasks
- /_cat/aliases
- /_cat/aliases/{alias}
- /_cat/thread_pool
- /_cat/thread_pool/{thread_pools}
- /_cat/plugins
- /_cat/fielddata
- /_cat/fielddata/{fields}
- /_cat/nodeattrs
- /_cat/repositories
- /_cat/snapshots/{repository}
- /_cat/templates
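6. Update
There are two approaches.
① Old way (PUT overwrites the existing value)
- _version is incremented by 1
- But if you leave a field out, that field disappears after the update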
- PUT /test/type/1
- {
- "name": "测试",
- "age": 19
- }
- GET /test/_doc/1
- {
- "_index" : "test",
- "_type" : "_doc",
- "_id" : "1",
- "_version" : 2,
- "_seq_no" : 1,
- "_primary_term" : 1,
- "found" : true,
- "_source" : {
- "name" : "测试",
- "age" : 19
- }
- }
- PUT /test/type/1
- {
- "age": 20
- }
- GET /test/_doc/1
- {
- "_index" : "test",
- "_type" : "_doc",
- "_id" : "1",
- "_version" : 3,
- "_seq_no" : 2,
- "_primary_term" : 1,
- "found" : true,
- "_source" : {
- "age" : 20
- }
- }
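② New way (POST with _update)
- _version still changes, as the output below shows; it only stays the same when the update changes nothing (a no-op)
- Note that the fields must be wrapped in "doc"
- Fields that are not mentioned are kept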
- POST /test/_doc/1/_update
- {
- "doc":{
- "age":11
- }
- }
- GET /test/_doc/1
- {
- "_index" : "test",
- "_type" : "_doc",
- "_id" : "1",
- "_version" : 5,
- "_seq_no" : 4,
- "_primary_term" : 1,
- "found" : true,
- "_source" : {
- "name" : "测试",
- "age" : 11
- }
- }
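The difference between the two approaches comes down to replacing versus merging the stored `_source`. The toy in-memory store below is hypothetical and not an Elasticsearch client. `put` replaces the whole document, `update` merges the given fields into it, and an update that changes nothing leaves the version alone (the no-op case).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy document store contrasting PUT (replace) with _update (merge). Hypothetical, in-memory.
class ToyDocStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    // Like PUT /index/_doc/id: the whole _source is replaced, omitted fields are gone.
    void put(String id, Map<String, Object> source) {
        docs.put(id, new HashMap<>(source));
        versions.merge(id, 1, Integer::sum);
    }

    // Like POST /index/_update/id with {"doc": {...}}: fields are merged into _source.
    void update(String id, Map<String, Object> partialDoc) {
        Map<String, Object> current = docs.get(id);
        if (current == null) {
            throw new NoSuchElementException("document " + id + " not found");
        }
        Map<String, Object> merged = new HashMap<>(current);
        merged.putAll(partialDoc);
        if (!merged.equals(current)) { // an update that changes nothing is a no-op
            docs.put(id, merged);
            versions.merge(id, 1, Integer::sum);
        }
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }

    int version(String id) {
        return versions.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        ToyDocStore store = new ToyDocStore();
        store.put("1", Map.of("name", "测试", "age", 19));
        store.put("1", Map.of("age", 20)); // full replace: name is gone
        System.out.println(store.get("1") + " v" + store.version("1")); // {age=20} v2
    }
}
```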
7. Delete (DELETE /test removes the whole test index)
- DELETE /test
- {
- "acknowledged" : true
- }
8. Query (simple condition)
- GET /test/_doc/_search?q=age:19
- {
- "took" : 1,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 1,
- "relation" : "eq"
- },
- "max_score" : 1.0,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "1",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "测试",
-
- "age" : 19
-
- }
- }
- ]
- }
- }
9. Complex queries
① Match query
- match: matching (the query text is analyzed first, then searched)
- _source: selects which fields to return
- sort: sorting
- from, size: pagination
- GET /test/_doc/_search
- {
-
- }
- {
- "took" : 0,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : 1.0,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "1",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "测试",
-
- "age" : 19
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "2",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小李",
-
- "age" : 19
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "3",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小张",
-
- "age" : 18
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "5",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "明明",
-
- "age" : 16
-
- }
- }
- ]
- }
- }
- GET /test/_doc/_search
- {
- "query":{
- "match":{
- "name":"明"
- }
- },
- "_source":["age","name"],
- "sort":[{"age":{"order":"asc"}}],
- "from":0,
- "size":20
- }
- {
- "took" : 0,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 2,
- "relation" : "eq"
- },
- "max_score" : null,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : null,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- },
-
- "sort" : [
-
- 16
-
- ]
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "5",
-
- "_score" : null,
-
- "_source" : {
-
- "name" : "明明",
-
- "age" : 16
-
- },
-
- "sort" : [
-
- 16
-
- ]
- }
- ]
- }
- }
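In the query above, from is an offset counted in documents, not a page number. The sketch converts a page number into from/size. It assumes 1-based page numbers, and it checks the default `index.max_result_window` of 10000, which caps from + size for ordinary from/size paging.

```java
// Converts a 1-based page number into the from/size pair used by _search.
// Assumptions: pages start at 1; 10000 is the default index.max_result_window.
class Pagination {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size; // offset in documents, not a page number
    }

    // from + size beyond the result window is rejected by default
    static boolean withinResultWindow(int from, int size) {
        return from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int size = 20;
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " -> \"from\": " + from(page, size)
                    + ", \"size\": " + size);
        }
    }
}
```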
② Compound query (bool)
- must: like AND
- should: like OR
- must_not: like NOT; a document is excluded if it matches any must_not clause
- filter: filters results without affecting the score
- GET /test/_doc/_search
- {
- "query":{
- "bool":{
- "must":[{"match":{"age":16}},{"match":{"name":"小"}}],
- "filter":{
-
- "range":{
-
- "age":{
-
- "gte":15,
-
- "lte":17
-
- }
-
- }
- }
- }
- }
- }
- {
- "took" : 1,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 4,
- "relation" : "eq"
- },
- "max_score" : 1.2940125,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 1.2940125,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "6",
-
- "_score" : 1.2940125,
-
- "_source" : {
-
- "name" : "小黄",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "7",
-
- "_score" : 1.2940125,
-
- "_source" : {
-
- "name" : "小黑",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "9",
-
- "_score" : 1.2940125,
-
- "_source" : {
-
- "name" : "小花",
-
- "age" : 16
-
- }
- }
- ]
- }
- }
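The matching rules of the bool query above can be summarized in a few lines. In this toy evaluator, each clause is a plain Java predicate, which is an assumption: real clauses are queries run against the index, and must/should also contribute to the relevance score. One rule is easy to miss. should clauses are required (at least one must match) only when there is no must or filter clause; otherwise they only raise the score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy evaluator of bool-query matching rules over in-memory documents (no scoring).
class ToyBoolQuery<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> filter = new ArrayList<>();

    ToyBoolQuery<T> must(Predicate<T> p) { must.add(p); return this; }
    ToyBoolQuery<T> should(Predicate<T> p) { should.add(p); return this; }
    ToyBoolQuery<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }
    ToyBoolQuery<T> filter(Predicate<T> p) { filter.add(p); return this; }

    boolean matches(T doc) {
        // must and filter: every clause has to match (AND); filter just skips scoring
        if (!must.stream().allMatch(p -> p.test(doc))) return false;
        if (!filter.stream().allMatch(p -> p.test(doc))) return false;
        // must_not: matching ANY clause excludes the document
        if (mustNot.stream().anyMatch(p -> p.test(doc))) return false;
        // should: at least one is required only when there is no must/filter clause
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) {
            return should.stream().anyMatch(p -> p.test(doc));
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBoolQuery<Integer> ages = new ToyBoolQuery<Integer>()
                .must(age -> age == 16)
                .filter(age -> age >= 15 && age <= 17);
        System.out.println(ages.matches(16) + " " + ages.matches(19)); // true false
    }
}
```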
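③ Matching multiple terms
- A match query targets one field; to search several fields, combine queries with bool (or use multi_match)
- Several keywords can be given, separated by spaces; a document that matches any of them is returned
- match analyzes the query text first (searching by terms), then queries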
- GET /test/_doc/_search
- {
- "query":{
- "match":{
- "name":"明 黑"
- }
- }
- }
- {
- "took" : 1,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 3,
- "relation" : "eq"
- },
- "max_score" : 1.9388659,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "7",
-
- "_score" : 1.9388659,
-
- "_source" : {
-
- "name" : "小黑",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "5",
-
- "_score" : 1.4651942,
-
- "_source" : {
-
- "name" : "明明",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 1.0729234,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- }
- }
- ]
- }
- }
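④ Exact query (term)
- term looks up the given term directly in the inverted index (the query value is not analyzed)
- Suitable for number, date and keyword fields; not suitable for text fields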
- GET /test/_doc/_search
- {
- "query":{
- "term":{
- "age":16
- }
- }
- }
- {
- "took" : 0,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 5,
- "relation" : "eq"
- },
- "max_score" : 1.0,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "5",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "明明",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "6",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小黄",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "7",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小黑",
-
- "age" : 16
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "9",
-
- "_score" : 1.0,
-
- "_source" : {
-
- "name" : "小花",
-
- "age" : 16
-
- }
- }
- ]
- }
- }
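Why term suits keyword fields but not text fields becomes clear once you see which tokens end up in the index. In the toy below, `standardAnalyze` only approximates the standard analyzer (an assumption): every Han character becomes its own token, as in the `_analyze` output later in this section, and other letter/digit runs become lowercase words. A term query compares the raw value against the indexed tokens, whereas a match query first runs the query through the field's analyzer.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Toy contrast between term and match queries on text vs. keyword fields.
// Assumption: standardAnalyze only approximates Elasticsearch's standard analyzer.
class TermVsMatch {
    static Set<String> standardAnalyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(String.valueOf(c)); // each Han character is its own token
            } else if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // keyword fields index the whole value as a single token
    static Set<String> keywordAnalyze(String value) {
        return Set.of(value);
    }

    private static void flush(StringBuilder word, Set<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // term: the value is NOT analyzed; it must equal one indexed token exactly
    static boolean term(Set<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    // match: the query goes through the field's analyzer; any token may hit (OR)
    static boolean match(Set<String> indexedTokens, String query,
                         Function<String, Set<String>> analyzer) {
        for (String token : analyzer.apply(query)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "测试keyword和text是否支持分词";
        Set<String> textField = standardAnalyze(value);
        Set<String> keywordField = keywordAnalyze(value);
        System.out.println(match(textField, "测试", TermVsMatch::standardAnalyze));    // true
        System.out.println(term(textField, "测试"));                                  // false
        System.out.println(match(keywordField, "测试", TermVsMatch::keywordAnalyze)); // false
    }
}
```

The last line mirrors the empty result of the keyword match query shown later in this section.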
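⑤ text vs. keyword
- text:
- analyzed (tokenized); supports full-text search and fuzzy/exact queries, but (by default) not aggregations or sorting;
- has no fixed maximum length, so it suits large fields;
- keyword:
- not analyzed, indexed as-is; supports fuzzy and exact matching as well as aggregations and sorting.
- a single keyword term can be at most 32766 bytes of UTF-8; ignore_above sets a maximum length in characters, and longer values are not indexed, so a term query cannot find them.
- // define the field types (if the test2 index created earlier still exists, delete it first with DELETE /test2, otherwise this PUT fails)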
- PUT /test2
- {
- "mappings": {
- "properties": {
- "text":{
-
- "type":"text"
- },
- "keyword":{
-
- "type":"keyword"
- }
- }
- }
- }
- // index a document
- PUT /test2/_doc/1
- {
- "text":"测试keyword和text是否支持分词",
- "keyword":"测试keyword和text是否支持分词"
- }
- GET /test2/_doc/_search
- {
- "query":{
- "match":{
- "text":"测试"
- }
- }
- }
- {
- "took" : 426,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 1,
- "relation" : "eq"
- },
- "max_score" : 0.5753642,
- "hits" : [
- {
-
- "_index" : "test2",
-
- "_type" : "_doc",
-
- "_id" : "1",
-
- "_score" : 0.5753642,
-
- "_source" : {
-
- "text" : "测试keyword和text是否支持分词",
-
- "keyword" : "测试keyword和text是否支持分词"
-
- }
- }
- ]
- }
- }
- GET /test2/_doc/_search
- {
- "query":{
- "match":{
- "keyword":"测试"
- }
- }
- }
- {
- "took" : 0,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 0,
- "relation" : "eq"
- },
- "max_score" : null,
- "hits" : [ ]
- }
- }
- GET _analyze
- {
- "analyzer": "keyword",
- "text": ["白日依山尽"]
- }
- {
- "tokens" : [
- {
- "token" : "白日依山尽",
- "start_offset" : 0,
- "end_offset" : 5,
- "type" : "word",
- "position" : 0
- }
- ]
- }
- GET _analyze
- {
- "analyzer": "standard",
- "text": ["白日依山尽"]
- }
- {
- "tokens" : [
- {
- "token" : "白",
- "start_offset" : 0,
- "end_offset" : 1,
- "type" : "<IDEOGRAPHIC>",
- "position" : 0
- },
- {
- "token" : "日",
- "start_offset" : 1,
- "end_offset" : 2,
- "type" : "<IDEOGRAPHIC>",
- "position" : 1
- },
- {
- "token" : "依",
- "start_offset" : 2,
- "end_offset" : 3,
- "type" : "<IDEOGRAPHIC>",
- "position" : 2
- },
- {
- "token" : "山",
- "start_offset" : 3,
- "end_offset" : 4,
- "type" : "<IDEOGRAPHIC>",
- "position" : 3
- },
- {
- "token" : "尽",
- "start_offset" : 4,
- "end_offset" : 5,
- "type" : "<IDEOGRAPHIC>",
- "position" : 4
- }
- ]
- }
- GET _analyze
- {
- "analyzer": "ik_max_word",
- "text": ["白日依山尽"]
- }
- {
- "tokens" : [
- {
- "token" : "白日依山尽",
- "start_offset" : 0,
- "end_offset" : 5,
- "type" : "CN_WORD",
- "position" : 0
- },
- {
- "token" : "白日",
- "start_offset" : 0,
- "end_offset" : 2,
- "type" : "CN_WORD",
- "position" : 1
- },
- {
- "token" : "依",
- "start_offset" : 2,
- "end_offset" : 3,
- "type" : "CN_CHAR",
- "position" : 2
- },
- {
- "token" : "山",
- "start_offset" : 3,
- "end_offset" : 4,
- "type" : "CN_CHAR",
- "position" : 3
- },
- {
- "token" : "尽",
- "start_offset" : 4,
- "end_offset" : 5,
- "type" : "CN_CHAR",
- "position" : 4
- }
- ]
- }
⑥ Highlighted query
- GET /test/_doc/_search
- {
- "query":{
-
- "match":{"name":"小"}
- },
-
- "highlight":{
- "fields":{
-
- "name":{}
- }
- }
-
- }
- {
- "took" : 89,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 6,
- "relation" : "eq"
- },
- "max_score" : 0.18681718,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "2",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小李",
-
- "age" : 19
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>李"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "3",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小张",
-
- "age" : 18
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>张"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>明"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "6",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小黄",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>黄"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "7",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小黑",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>黑"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "9",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小花",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<em>小</em>花"
-
- ]
-
- }
- }
- ]
- }
- }
- GET /test/_doc/_search
- {
- "query":{
-
- "match":{"name":"小"}
- },
-
- "highlight": {
- "pre_tags": "<p class='key' style='color:red'>",
- "post_tags": "</p>",
- "fields": {
- "name": {}
- }
- }
-
- }
- {
- "took" : 2,
- "timed_out" : false,
- "_shards" : {
- "total" : 1,
- "successful" : 1,
- "skipped" : 0,
- "failed" : 0
- },
- "hits" : {
- "total" : {
- "value" : 6,
- "relation" : "eq"
- },
- "max_score" : 0.18681718,
- "hits" : [
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "2",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小李",
-
- "age" : 19
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>李"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "3",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小张",
-
- "age" : 18
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>张"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "4",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小明",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>明"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "6",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小黄",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>黄"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "7",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小黑",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>黑"
-
- ]
-
- }
- },
- {
-
- "_index" : "test",
-
- "_type" : "_doc",
-
- "_id" : "9",
-
- "_score" : 0.18681718,
-
- "_source" : {
-
- "name" : "小花",
-
- "age" : 16
-
- },
-
- "highlight" : {
-
- "name" : [
-
- "<p class='key' style='color:red'>小</p>花"
-
- ]
-
- }
- }
- ]
- }
- }
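In both responses, each matched term is wrapped in pre_tags/post_tags (`<em>`/`</em>` by default). The toy function below reproduces that wrapping with plain substring matching, which is a simplification: Elasticsearch's highlighters work on analyzed token offsets. The function name and parameters are hypothetical.

```java
import java.util.Collection;
import java.util.List;

// Toy highlighter: wraps matched terms in pre/post tags (plain substring matching).
class ToyHighlighter {
    static String highlight(String text, Collection<String> terms, String preTag, String postTag) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // prefer the longest term that starts at position i
            String hit = null;
            for (String term : terms) {
                if (!term.isEmpty() && text.startsWith(term, i)
                        && (hit == null || term.length() > hit.length())) {
                    hit = term;
                }
            }
            if (hit != null) {
                out.append(preTag).append(hit).append(postTag);
                i += hit.length();
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("小明", List.of("小"), "<em>", "</em>")); // <em>小</em>明
    }
}
```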
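VI. Spring Boot Integration
1. Add dependencies
Add the Elasticsearch starter. The client version it brings in should match your Elasticsearch server; with Spring Boot 2.x you can override it with the elasticsearch.version property in the pom.
- <dependency>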
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
- </dependency>
Also add fastjson and lombok up front:
- <dependency>
- <groupId>com.alibaba</groupId>
- <artifactId>fastjson</artifactId>
- <version>1.2.70</version>
- </dependency>
- <dependency>
- <groupId>org.projectlombok</groupId>
- <artifactId>lombok</artifactId>
- <optional>true</optional>
- </dependency>
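2. Create the configuration class
- import org.apache.http.HttpHost;
- import org.elasticsearch.client.RestClient;
- import org.elasticsearch.client.RestHighLevelClient;
- import org.springframework.context.annotation.Bean;
- import org.springframework.context.annotation.Configuration;
-
- @Configuration
- public class ElasticSearchConfig {
- // register the REST high-level client as a singleton bean;
- // Spring calls its close() method automatically when the context shuts down
- @Bean
- public RestHighLevelClient restHighLevelClient(){
- return new RestHighLevelClient(
- RestClient.builder(new HttpHost("localhost", 9200, "http"))
- );
- }
- }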
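3. Create the entity class
- import java.io.Serializable;
- import lombok.AllArgsConstructor;
- import lombok.Data;
- import lombok.NoArgsConstructor;
-
- @Data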
- @NoArgsConstructor
- @AllArgsConstructor
- public class User implements Serializable {
- private static final long serialVersionUID = -3843548915035470817L;
- private String name;
- private Integer age;
- }
4. Test
Inject the RestHighLevelClient:
- @Autowired
- public RestHighLevelClient restHighLevelClient;
Index operations
1. Create an index
- public void createIndex() throws IOException {
- // request to create the index "test6"
- CreateIndexRequest request = new CreateIndexRequest("test6");
- CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
- // true if the cluster acknowledged the creation
- System.out.println(response.isAcknowledged());
- System.out.println(response);
- // the client is a shared Spring bean, so it is not closed here; closing it would break later calls
- }
2. Get an index and check whether it exists
- public void indexExists() throws IOException {
- GetIndexRequest request = new GetIndexRequest("test6");
- boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
- System.out.println(exists);
- }
3. Delete an index
- public void deleteIndex() throws IOException {
- DeleteIndexRequest request = new DeleteIndexRequest("test6");
- AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
- System.out.println(response.isAcknowledged());
- }
Document operations
1. Add a document
- public void addDocument() throws IOException {
- User user = new User("笑笑", 25);
- IndexRequest request = new IndexRequest("test");
- request.id("16"); // document ID
- request.timeout(TimeValue.timeValueMillis(1000)); // wait at most 1s
- // serialize the entity to JSON with fastjson
- request.source(JSON.toJSONString(user), XContentType.JSON);
- IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
- System.out.println(response.status()); // CREATED for a new document, OK when it overwrote one
- System.out.println(response);
- }
2. Get a document
- public void getDocument() throws IOException {
- GetRequest request = new GetRequest("test", "1");
- GetResponse response = restHighLevelClient.get(request, RequestOptions.DEFAULT);
- // print the document's _source as a JSON string
- System.out.println(response.getSourceAsString());
- }
3. Get a document and check whether it exists
- public void documentExists() throws IOException {
- GetRequest request = new GetRequest("test", "1111");
- // only existence matters, so skip fetching _source and stored fields
- request.fetchSourceContext(new FetchSourceContext(false));
- request.storedFields("_none_");
- boolean exists = restHighLevelClient.exists(request, RequestOptions.DEFAULT);
- System.out.println(exists);
- }
4. Update a document
- public void updateDocument() throws IOException {
- UpdateRequest request = new UpdateRequest("test", "16");
- User user = new User("黑黑", 18);
- // partial update: these fields are merged into the existing _source
- request.doc(JSON.toJSONString(user), XContentType.JSON);
- UpdateResponse response = restHighLevelClient.update(request, RequestOptions.DEFAULT);
- System.out.println(response.status());
- }
5. Delete a document
- public void deleteDocument() throws Exception {
- DeleteRequest request = new DeleteRequest("test", "1");
- request.timeout("1s");
- DeleteResponse response = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
- System.out.println(response.status());
- }
6. Query documents
- public void search() throws Exception {
- SearchRequest request = new SearchRequest("test");
- SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
- // exact term query; "明" matches because the standard analyzer indexes each Chinese character as its own token
- TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "明");
- // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
- // highlighting needs at least one field to highlight
- searchSourceBuilder.highlighter(new HighlightBuilder().field("name"));
- searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
- searchSourceBuilder.query(termQueryBuilder);
- // searchSourceBuilder.query(matchAllQueryBuilder);
- searchSourceBuilder.from(0);
- searchSourceBuilder.size(100);
- request.source(searchSourceBuilder);
- SearchResponse search = restHighLevelClient.search(request, RequestOptions.DEFAULT);
- SearchHits hits = search.getHits();
- System.out.println(JSON.toJSONString(hits));
- System.out.println("++++++++++++++++++++++++++++++++++++++++");
- for (SearchHit documentFields : hits.getHits()) {
- System.out.println(documentFields.getSourceAsMap());
- System.out.println(documentFields.getHighlightFields());
- }
- }
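Incorrect batch insert: each call to source() replaces the previous source, so this single IndexRequest indexes only the last user (小7)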
- public void test() throws Exception {
- IndexRequest request = new IndexRequest("bulk");
- request.source(JSON.toJSONString(new User("小1",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小2",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小3",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小4",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小5",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小6",12)),XContentType.JSON);
- request.source(JSON.toJSONString(new User("小7",12)),XContentType.JSON);
- IndexResponse indexResponse = restHighLevelClient.index(request,RequestOptions.DEFAULT);
- System.out.println(indexResponse.status());
- }
7. Bulk insert
- public void testBulk() throws Exception {
- BulkRequest bulkRequest = new BulkRequest();
- bulkRequest.timeout("10s");
- ArrayList<User> users = new ArrayList<>();
- users.add(new User("小1",12));
- users.add(new User("小2",12));
- users.add(new User("小3",12));
- users.add(new User("小4",12));
- users.add(new User("小5",12));
- users.add(new User("小6",12));
- // one IndexRequest per document, all sent in a single bulk call
- for (User user : users) {
- bulkRequest.add(new IndexRequest("bulk").source(JSON.toJSONString(user), XContentType.JSON));
- }
- BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
- System.out.println(response.status());
- // a bulk call can fail partially, so check the individual items too
- System.out.println(response.hasFailures() ? response.buildFailureMessage() : "all documents indexed");
- }
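Over REST, a bulk request like the one above is a newline-delimited body sent to `POST /_bulk`. Each document contributes an action line followed by its source line, and every line, including the last, ends with `\n`. The sketch builds that body by hand to show the format. It assumes each source is already single-line JSON (for example from `JSON.toJSONString(user)`) and that the index name needs no JSON escaping. The class and method names are hypothetical.

```java
import java.util.List;

// Builds the newline-delimited body of POST /_bulk for "index" actions.
// Assumptions: sources are single-line JSON; the index name needs no escaping.
class BulkBodyBuilder {
    static String indexActions(String index, List<String> jsonSources) {
        StringBuilder body = new StringBuilder();
        for (String source : jsonSources) {
            // action line: which operation, on which index (no _id, so one is generated)
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // source line: the document itself
            body.append(source).append('\n');
        }
        return body.toString(); // the body must end with a newline
    }

    public static void main(String[] args) {
        System.out.print(indexActions("bulk", List.of(
                "{\"name\":\"小1\",\"age\":12}",
                "{\"name\":\"小2\",\"age\":12}")));
    }
}
```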
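VII. Elasticsearch in Practice
Imitating JD.com product search (with highlighting)
1. Add dependencies
- <dependency>
- <groupId>org.jsoup</groupId>
- <artifactId>jsoup</artifactId>
- <version>1.10.2</version>
- </dependency>
- <dependency>
- <groupId>com.alibaba</groupId>
- <artifactId>fastjson</artifactId>
- <version>1.2.70</version>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-starter-thymeleaf</artifactId>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-starter-web</artifactId>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-devtools</artifactId>
- <scope>runtime</scope>
- <optional>true</optional>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-configuration-processor</artifactId>
- <optional>true</optional>
- </dependency>
- <dependency>
- <groupId>org.projectlombok</groupId>
- <artifactId>lombok</artifactId>
- <optional>true</optional>
- </dependency>
- <dependency>
- <groupId>org.springframework.boot</groupId>
- <artifactId>spring-boot-starter-test</artifactId>
- <scope>test</scope>
- </dependency>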
2. Import the front-end assets
- ES materials: link: https://pan.baidu.com/s/1qdvSk7SdVnlI8QzeK5gxaA
- extraction code: ldrh
3. Write the application.properties configuration file
- # change the port to avoid conflicts
- server.port=9999
- # disable the Thymeleaf cache
- spring.thymeleaf.cache=false
4. Test the controller and view
- @Controller
- public class DemoApi {
- @GetMapping({"/","index"})
- public String index(){
- return "index";
- }
- }
5. Write the service
ContentService:
- @Service
- public class ContentService {
- @Autowired
- private RestHighLevelClient restHighLevelClient;
- // 1、解析数据放入 es 索引中
- public Boolean parseContent(String keyword) throws IOException {
- // Fetch the content
- List<Content> contents = HtmlParseUtil.parseJD(keyword);
- // Put the content into es
- BulkRequest bulkRequest = new BulkRequest();
- bulkRequest.timeout("2m"); // set according to your actual workload
- for (int i = 0; i < contents.size(); i++) {
- bulkRequest.add(
- new IndexRequest("jd_goods")
- .id(""+(i+1))
- .source(JSON.toJSONString(contents.get(i)), XContentType.JSON)
- );
- }
- BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
- // Do not close the shared client bean here (see item 9 at the end)
- return !bulk.hasFailures();
- }
- // 2. Paged search by keyword (pageIndex is zero-based)
- public List<Map<String, Object>> search(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
- if (pageIndex < 0){
- pageIndex = 0;
- }
- SearchRequest jd_goods = new SearchRequest("jd_goods");
- // Create the search source builder
- SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
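- // Condition: exact (term) query on the name field. A term query is not analyzed, so against
- // a text field it only matches one indexed token exactly (e.g. lowercase "java"); for mixed-case,
- // multi-word or multi-character Chinese keywords, QueryBuilders.matchQuery("name", keyword) usually fits better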
- TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
- searchSourceBuilder.query(termQueryBuilder);
- searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));// 60s
- // Pagination: from() is a document offset, not a page number
- searchSourceBuilder.from(pageIndex * pageSize);
- searchSourceBuilder.size(pageSize);
- // Highlighting: not applied here, see method 3
- // ....
- // Put the search source into the search request
- jd_goods.source(searchSourceBuilder);
- // Execute the query and get the response
- SearchResponse searchResponse = restHighLevelClient.search(jd_goods, RequestOptions.DEFAULT);
- // restHighLevelClient.close();
- // Parse the results
- SearchHits hits = searchResponse.getHits();
- List<Map<String,Object>> results = new ArrayList<>();
- for (SearchHit documentFields : hits.getHits()) {
- Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
- results.add(sourceAsMap);
- }
- // Return the query results
- return results;
- }
- // 3. Highlighted search, built on method 2
- public List<Map<String, Object>> highlightSearch(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
- SearchRequest searchRequest = new SearchRequest("jd_goods");
- SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
- // Exact (term) query: add the query condition (same caveat as in method 2)
- TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
- searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
- searchSourceBuilder.query(termQueryBuilder);
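- // Pagination: clamp the zero-based page index and convert it to a document offset
- if (pageIndex < 0){
- pageIndex = 0;
- }
- searchSourceBuilder.from(pageIndex * pageSize);
- searchSourceBuilder.size(pageSize);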
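- // Highlighting =========
- HighlightBuilder highlightBuilder = new HighlightBuilder();
- highlightBuilder.field("name");
- // Markup wrapped around each matched term; the red span is only an example, any tags work
- highlightBuilder.preTags("<span style='color:red'>");
- highlightBuilder.postTags("</span>");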
- searchSourceBuilder.highlighter(highlightBuilder);
- // Execute the query
- searchRequest.source(searchSourceBuilder);
- SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
- // Parse the results ==========
- SearchHits hits = searchResponse.getHits();
- List<Map<String, Object>> results = new ArrayList<>();
- for (SearchHit documentFields : hits.getHits()) {
- // Overwrite the original field value with the highlighted one
- Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
- // Highlighted fields
- Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
- HighlightField name = highlightFields.get("name");
- // Replace
- if (name != null){
- Text[] fragments = name.fragments();
- StringBuilder new_name = new StringBuilder();
- for (Text text : fragments) {
- new_name.append(text);
- }
- sourceAsMap.put("name",new_name.toString());
- }
- results.add(sourceAsMap);
- }
- return results;
- }
- }
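The parsing loop in highlightSearch replaces the _source value of name with the concatenated highlight fragments whenever a hit has any. A plain-Java sketch of just that merge step, with a Map standing in for the hit's source and a List of strings for the fragments; HighlightMergeSketch is a hypothetical name and not part of the client API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of the "overwrite the _source field with its highlight
 * fragments" step, using plain collections instead of SearchHit/HighlightField.
 */
public class HighlightMergeSketch {

    /**
     * Returns a copy of source in which field is replaced by its concatenated
     * highlight fragments. Without fragments, the original value is kept.
     */
    static Map<String, Object> merge(Map<String, ?> source, String field, List<String> fragments) {
        Map<String, Object> result = new HashMap<String, Object>(source);
        if (fragments != null && !fragments.isEmpty()) {
            // Equivalent to the StringBuilder loop over name.fragments() in the service
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("name", "java book");
        source.put("price", "70.00");
        List<String> fragments = Arrays.asList("<span style='color:red'>java</span> book");
        System.out.println(merge(source, "name", fragments).get("name"));
    }
}
```

Copying the map is a choice made for the sketch; the service mutates the map returned by getSourceAsMap() directly, which is harmless there because each hit is used only once.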
6. Write the controller
- @Controller
- public class DemoApi {
- @GetMapping({"/","index"})
- public String index(){
- return "index";
- }
- @Autowired
- private ContentService contentService;
- @ResponseBody
- @GetMapping("/parse/{keyword}")
- public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
- return contentService.parseContent(keyword);
- }
- @ResponseBody
- @GetMapping("/search/{keyword}/{pageIndex}/{pageSize}")
- public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
- @PathVariable("pageIndex") Integer pageIndex,
- @PathVariable("pageSize") Integer pageSize) throws IOException {
- return contentService.search(keyword,pageIndex,pageSize);
- }
- @ResponseBody
- @GetMapping("/h_search/{keyword}/{pageIndex}/{pageSize}")
- public List<Map<String, Object>> highlightParse(@PathVariable("keyword") String keyword,
- @PathVariable("pageIndex") Integer pageIndex,
- @PathVariable("pageSize") Integer pageSize) throws IOException {
- return contentService.highlightSearch(keyword,pageIndex,pageSize);
- }
- }
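The {pageIndex}/{pageSize} path variables have to be turned into the from/size pair that SearchSourceBuilder expects, and from() is a document offset rather than a page number. A minimal sketch of that conversion, assuming zero-based page indexes (matching the pageIndex < 0 guard in search()) and Elasticsearch's default index.max_result_window of 10,000; PageOffsetSketch is a hypothetical helper.

```java
/**
 * Illustrative sketch (hypothetical helper): converts a zero-based page index
 * into the document offset passed to SearchSourceBuilder.from().
 */
public class PageOffsetSketch {

    /** Assumed default index.max_result_window; from + size may not exceed it. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static int fromOffset(int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageIndex, 0);   // clamp negative indexes to the first page
        long from = (long) page * pageSize;  // long avoids int overflow for huge indexes
        if (from + pageSize > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds index.max_result_window");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        // /search/java/0/10 -> offset 0; /search/java/2/10 -> offset 20
        System.out.println(fromOffset(0, 10) + " " + fromOffset(2, 10));
    }
}
```

With this convention, GET /search/java/0/10 returns the first ten hits and /search/java/2/10 returns hits 21 to 30. Pages beyond the result window need search_after or the scroll API instead of from/size.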
7. Crawler (jsoup)
HtmlParseUtil
- public class HtmlParseUtil {
- public static void main(String[] args) throws IOException {
- // Requires network access
- // Request URL
- String url = "http://search.jd.com/search?keyword=java";
- // 1. Parse the page (jsoup returns a Document, like the browser's document object)
- Document document = Jsoup.parse(new URL(url), 30000);
- // Document supports most of the operations you would perform on document in JavaScript
- // 2. Get the element by id
- Element j_goodsList = document.getElementById("J_goodsList");
- // 3. Get every li inside the J_goodsList ul
- Elements lis = j_goodsList.getElementsByTag("li");
- // 4. Get the img, price and name under each li
- for (Element li : lis) {
- String img = li.getElementsByTag("img").eq(0).attr("src");// first image under the li
- String name = li.getElementsByClass("p-name").eq(0).text();
- String price = li.getElementsByClass("p-price").eq(0).text();
- System.out.println("=======================");
- System.out.println("img : " + img);
- System.out.println("name : " + name);
- System.out.println("price : " + price);
- }
- }
- public static List<Content> parseJD(String keyword) throws IOException {
- // Requires network access
- // Request URL
- String url = "http://search.jd.com/search?keyword=" + keyword;
- // 1. Parse the page (jsoup returns a Document, like the browser's document object)
- Document document = Jsoup.parse(new URL(url), 30000);
- // Document supports most of the operations you would perform on document in JavaScript
- // 2. Get the element by id
- Element j_goodsList = document.getElementById("J_goodsList");
- // No J_goodsList (e.g. JD served a login or verification page): return an empty list
- if (j_goodsList == null) {
- return new ArrayList<Content>();
- }
- // 3. Get every li inside the J_goodsList ul
- Elements lis = j_goodsList.getElementsByTag("li");
- // Debug output
- System.out.println(lis);
- // 4. Get the img, price and name under each li
- // The list holds the content of every li
- List<Content> contents = new ArrayList<Content>();
- for (Element li : lis) {
- // The site lazy-loads images, so read data-lazy-img instead of src
- String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img");// first image under the li
- String name = li.getElementsByClass("p-name").eq(0).text();
- String price = li.getElementsByClass("p-price").eq(0).text();
- // Wrap into a Content object
- Content content = new Content(name,img,price);
- // Add to the list
- contents.add(content);
- }
-
- System.out.println(contents);
- // 5. Return the list
- return contents;
- }
- }
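parseJD appends keyword to the URL as-is, so a keyword containing spaces, "&", "+" or Chinese characters may not reach the server intact. A hedged sketch that percent-encodes the keyword with the JDK's URLEncoder first; SearchUrlSketch is a hypothetical helper, and whether JD's search page expects exactly this form encoding is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Illustrative sketch (hypothetical helper): builds the search URL with the
 * keyword percent-encoded as UTF-8 form data.
 */
public class SearchUrlSketch {

    static final String BASE = "http://search.jd.com/search?keyword=";

    static String searchUrl(String keyword) {
        try {
            // application/x-www-form-urlencoded: spaces become '+', '+' becomes %2B
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("java book"));
    }
}
```

In parseJD, the string concatenation would then be replaced by a call like searchUrl(keyword).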
Content
- @Data
- @AllArgsConstructor
- @NoArgsConstructor
- public class Content implements Serializable {
- private static final long serialVersionUID = -8049497962627482693L;
- private String name;
- private String img;
- private String price;
- }
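Content relies on Lombok: @Data generates getters, setters, equals, hashCode and toString, while @AllArgsConstructor and @NoArgsConstructor add the two constructors. The getters matter here because fastjson's JSON.toJSONString serializes the object through them. For readers not using Lombok, a rough hand-written equivalent; PlainContent is a hypothetical name, and Lombok's actual generated code differs in detail.

```java
import java.io.Serializable;
import java.util.Objects;

/** Rough hand-written equivalent of the Lombok-annotated Content class (illustrative). */
public class PlainContent implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private String img;
    private String price;

    // @NoArgsConstructor
    public PlainContent() {
    }

    // @AllArgsConstructor: parameters in field declaration order
    public PlainContent(String name, String img, String price) {
        this.name = name;
        this.img = img;
        this.price = price;
    }

    // @Data: getters (used by JSON serializers such as fastjson) and setters
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getImg() { return img; }
    public void setImg(String img) { this.img = img; }
    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }

    // @Data: value-based equals/hashCode over all fields
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PlainContent)) return false;
        PlainContent other = (PlainContent) o;
        return Objects.equals(name, other.name)
                && Objects.equals(img, other.img)
                && Objects.equals(price, other.price);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, img, price);
    }

    // @Data: toString in Lombok's ClassName(field=value, ...) style
    @Override
    public String toString() {
        return "PlainContent(name=" + name + ", img=" + img + ", price=" + price + ")";
    }
}
```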
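8. Front-end/back-end separation
Include the JS files
The modified index.html
9. Open issues
- Calling restHighLevelClient.close() causes java.lang.RuntimeException: Request execution cancelled. The likely reason is that RestHighLevelClient is injected as a shared Spring bean: once one request closes it, later requests through the same client fail. Leave closing it to the Spring container at application shutdown rather than calling close() after each request.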