Elasticsearch: environment setup and basic operations
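References
- Elasticsearch quick hands-on tutorial for backend developers (适合后端编程人员的elasticsearch快速实战教程)
- ElasticSearch latest hands-on tutorial (ElasticSearch最新实战教程)
- ElasticSearch companion notes (ElasticSearch配套笔记)
- Build Your Own Search Engine (自制搜索引擎)
- https://www.elastic.co/guide/en/elasticsearch/reference/7.17/setup.html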
RESTful-style APIs
The REST design style
For example, the following Spring Boot controller:
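import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// User is assumed to be a simple POJO with a (String name, int age) constructor.
@RestController
@RequestMapping("rest")
public class RestfulController {
    // GET /rest/getOne/{id}/{name}: the resource is identified by the URL path.
    @GetMapping("getOne/{id}/{name}")
    public User getOne(@PathVariable("id") String id, @PathVariable("name") String name) {
        System.out.println("id: " + id);
        return new User("demo", 12);
    }
}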
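REST stands for Representational State Transfer. An architecture that follows the REST principles is called RESTful.
- Resource: an entity on the network.
- Representation: the concrete form in which a resource is presented.
- State transfer: to operate on the server, the client must cause a state change on the server side. That change is expressed through representations, hence "representational state transfer".
Under REST, each URL identifies exactly one resource, and four HTTP verbs (GET, POST, PUT, DELETE) map to the four basic server-side operations:
- GET retrieves a resource
- POST creates a resource (it can also be used to update one)
- PUT updates a resource
- DELETE deletes a resource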
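The verb-to-operation mapping can be sketched with a tiny in-memory dispatcher. This is only an illustration: the `resources` dictionary and the `handle` function are hypothetical names, not part of any framework, and a real server would also return status codes.

```python
# Hypothetical in-memory "server": each URL path names exactly one resource.
resources = {}

def handle(method, path, body=None):
    """Dispatch one request according to the four REST verbs."""
    if method == "GET":                   # read the resource
        return resources.get(path)
    if method in ("POST", "PUT"):         # create or update the resource
        resources[path] = body
        return body
    if method == "DELETE":                # remove the resource
        return resources.pop(path, None)
    raise ValueError(f"unsupported method: {method}")
```

The same path is used for every operation; only the verb changes what happens to the resource.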
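Full-text search
In full-text search, a program scans every word in a document and builds an index entry for each word, recording how many times and where the word occurs. User queries are then answered from that index.
Full-text search takes text as its search target and finds the texts that contain the given words. Completeness, accuracy and speed are the key measures of a full-text search system.
- Handles text only
- Does not interpret semantics
- English searches are case-insensitive
- Results are ranked by relevance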
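The idea of indexing every word with its count and positions can be sketched as a small inverted index. This is a simplified illustration, not how Lucene stores its index: the function names are made up for this example, and the occurrence count is only a crude stand-in for real relevance scoring.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search(index, word):
    """Return (doc_id, count) pairs, most occurrences first."""
    postings = index.get(word.lower(), {})
    return sorted(((d, len(p)) for d, p in postings.items()),
                  key=lambda item: (-item[1], item[0]))

docs = {
    1: "Spring Cloud relies on Spring Boot",
    2: "Redis is a key-value database",
    3: "The Spring framework is layered",
}
index = build_index(docs)
```

Because words are lowercased both when indexing and when searching, `search(index, "SPRING")` finds the same documents as `search(index, "spring")`.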
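ElasticSearch (ES for short) is an open-source search engine built on Apache Lucene. Lucene itself is arguably the best-performing open-source search engine toolkit to date, but its API is relatively complex and using it well requires solid knowledge of search theory. ES is written in Java and exposes a simple RESTful API, so developers can build search features through that API and avoid Lucene's complexity.
ES stores data mainly as lightweight JSON documents, which makes it somewhat similar to MongoDB, though it is said to outperform MongoDB on reads and writes. It also supports geo queries, including queries that combine geo and text conditions, as well as statistics, log storage and analysis, and visualization.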
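Installing the Elastic Stack
https://www.elastic.co/guide/en/elasticsearch/reference/8.6/setup.html
The official documentation describes several installation methods:
- Archive download (a zip package on Windows)
- Package manager
- Docker
A Java environment must be configured beforehand for 6.x; the 8.6.2 package listed below ships its own JDK under /usr/share/elasticsearch/jdk.
Installing Elasticsearch
Installing version 8.6.2 from the RPM package produces the following directory layout: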
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.6.2-x86_64.rpm
$ sudo rpm --install elasticsearch-8.6.2-x86_64.rpm
$ rpm -ql elasticsearch
/etc/elasticsearch
/etc/elasticsearch/elasticsearch-plugins.example.yml
/etc/elasticsearch/elasticsearch.yml
/etc/elasticsearch/jvm.options
/etc/elasticsearch/jvm.options.d
/etc/elasticsearch/log4j2.properties
/etc/elasticsearch/role_mapping.yml
/etc/elasticsearch/roles.yml
/etc/elasticsearch/users
/etc/elasticsearch/users_roles
/etc/sysconfig/elasticsearch
/usr/lib/sysctl.d/elasticsearch.conf
/usr/lib/systemd/system/elasticsearch.service
/usr/share/elasticsearch/bin
/usr/share/elasticsearch/bin/elasticsearch
...
/usr/share/elasticsearch/lib/...
/usr/share/elasticsearch/jdk/...
/usr/share/elasticsearch/modules/...
...
/usr/share/elasticsearch/plugins
/var/lib/elasticsearch
/var/log/elasticsearch
# Uninstall
$ sudo rpm -e elasticsearch
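ES has since moved on to 7.x and 8.x. To match the video tutorial, the rest of these notes install Elasticsearch 6.8.0 from the tarball.
Elasticsearch release list: https://www.elastic.co/cn/downloads/past-releases#elasticsearch
Elasticsearch 6.8.0: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-8-0
Note: ES cannot be run as the root user.
Extract and install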
tar -zxvf elasticsearch-6.8.0.tar.gz
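Directory layout, which matches the layout of the RPM package:
- bin: executables
- config: configuration files
- lib: libraries needed at runtime
- logs: runtime log files
- modules: modules needed at runtime
- plugins: official and third-party plugins are installed here
Start the service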
./bin/elasticsearch
ES does not ship with a web UI. To verify that it started, request the root endpoint, which returns JSON:
$ curl http://localhost:9200
{
"name" : "9cWvHvU",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "TLtGg3X5QjiK4-5-7XVBMg",
"version" : {
"number" : "6.8.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "65b6179",
"build_date" : "2019-05-15T20:06:13.172855Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
To make ES reachable from other hosts, change the listen address (not the port) in the configuration file to 0.0.0.0
$ vim config/elasticsearch.yml
network.host: 0.0.0.0
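Binding to a non-loopback address makes ES enforce its production-mode bootstrap checks, which demand more system resources. The restart may then fail with errors such as:
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[2]: max number of threads [3082] for user [ec2-user] is too low, increase to at least [4096]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Error 1: raise the open-file limit in limits.conf. The change applies to PAM login sessions of the affected users, so log out and back in for it to take effect.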
$ sudo vim /etc/security/limits.conf
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
* soft nofile 65536
* hard nofile 65536
* soft nproc 4096
* hard nproc 4096
Error 2: raise the per-user thread limit
$ sudo vim /etc/security/limits.d/20-nproc.conf
ec2-user soft nproc 4096
Error 3: raise the kernel limit on memory-mapped areas
$ sudo vim /etc/sysctl.conf
vm.max_map_count=655360
$ sudo sysctl -p
fs.inotify.max_user_watches = 524288
vm.max_map_count = 655360
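Before restarting, the three limits can be checked up front. The following is a minimal sketch for Linux only: the thresholds are copied from the error messages quoted in these notes, the function names are made up for this example, and the real bootstrap checks cover more than these three values.

```python
import resource

# Thresholds copied from the three error messages quoted in these notes.
REQUIRED = {"nofile": 65535, "nproc": 4096, "max_map_count": 262144}

def failing_checks(current, required=REQUIRED):
    """Return the names of limits that are below the required value."""
    return sorted(name for name, need in required.items()
                  if current.get(name, 0) < need)

def read_current_limits():
    """Read this process's soft limits and the kernel mmap setting (Linux only)."""
    def soft(limit):
        value = resource.getrlimit(limit)[0]
        # RLIM_INFINITY means "unlimited", which always passes.
        return float("inf") if value == resource.RLIM_INFINITY else value
    with open("/proc/sys/vm/max_map_count") as fh:
        max_map_count = int(fh.read())
    return {"nofile": soft(resource.RLIMIT_NOFILE),
            "nproc": soft(resource.RLIMIT_NPROC),
            "max_map_count": max_map_count}
```

Calling `failing_checks(read_current_limits())` as the user that will run ES lists whichever limits are still too low. Limits are per user and per session, so a result obtained as another user says little.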
Test remote access: the same JSON response should now be reachable from another machine through the server's address on port 9200.
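Installing Kibana
The installation steps are the same as for Elasticsearch, and the Kibana version should match the ES version. The relationship between the two feels similar to that between Grafana and Prometheus.
Download: https://www.elastic.co/cn/downloads/past-releases/kibana-6-8-0
Install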
tar -xzvf kibana-6.8.0-linux-x86_64.tar.gz
Edit the configuration file to point Kibana at the ES host
$ vim kibana-6.8.0-linux-x86_64/config/kibana.yml
server.host: "0.0.0.0" # address Kibana listens on
elasticsearch.hosts: ["http://127.0.0.1:9200"] # Elasticsearch address
Start
./bin/kibana
Open in a browser
http://127.0.0.1:5601/app/kibana
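Basic concepts
- Near real time (NRT): there is a slight delay, usually under one second, between indexing a document and that document becoming searchable.
- Index: a collection of documents with similar characteristics. An index is identified by a name (which must be all lowercase), and a cluster can hold any number of indices.
- Type: one or more types can be defined within an index. A type is a logical category or partition of the index whose semantics are entirely up to the user.
- Mapping: similar to a table schema in a relational database; it defines the structure of the data in a type of an index. Types (comparable to tables) and mappings (comparable to schemas) can be created manually. With the default configuration, ES creates the type and its mapping automatically from the inserted data. A mapping mainly covers field names, field data types and how each field is indexed.
- Document: the basic unit of information that can be indexed, comparable to a row in a table, expressed as JSON.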
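The data model differs between 5.x and 6.x as follows.
NOTE: Up to and including 5.x, an index could contain multiple types. Indices created in 6.x allow only a single type (multi-type indices created in 5.x keep working but are discouraged). 7.x deprecates types in the APIs and 8.x removes them.
[Figure: data model in 5.x versus 6.x; image not available]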
Creating test datasets
Elastic provides sample data, which can be downloaded as follows:
wget https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
wget https://download.elastic.co/demos/kibana/gettingstarted/accounts.zip
wget https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json
Create the indices and their mappings
PUT /shakespeare
{
"mappings": {
"doc": {
"properties": {
"speaker": {"type": "keyword"},
"play_name": {"type": "keyword"},
"line_id": {"type": "integer"},
"speech_number": {"type": "integer"}
}
}
}
}
PUT /bank
{
"mappings": {
"account": {
"properties": {
"account_number": {"type": "integer"},
"balance": {"type": "integer"},
"firstname": {"type": "keyword"},
"lastname": {"type": "keyword"},
"age": {"type": "integer"},
"gender": {"type": "keyword"},
"address": {"type": "text"},
"employer": {"type": "keyword"},
"email": {"type": "keyword"},
"city": {"type": "keyword"},
"state": {"type": "keyword"}
}
}
}
}
List the indices
GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 yLxMUBLfT3GHGnVuvhXbJg 1 0 4 0 14.5kb 14.5kb
yellow open bank jJujmI8WQnGgKbtutfhkvw 5 1 1000 0 327.2kb 327.2kb
green open .kibana_task_manager 5itOVO27SIuSc9qE73dabw 1 0 2 0 12.6kb 12.6kb
yellow open shakespeare Mv98eNoUTpaGIBJXkBmR3Q 5 1 111396 0 21.8mb 21.8mb
Load the data and view the documents (accounts.zip must be unzipped to accounts.json first)
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
GET /shakespeare/doc/_search
{
"query": {
"match_all": {}
}
}
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
GET /bank/account/_search
{
"query": {
"match_all": {}
}
}
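A simple test dataset (note that the bulk documents use the field name bir while the mapping declares birth, so the date mapping is not applied to them and bir is created by dynamic mapping)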
PUT /ems
{
"mappings":{
"emp":{
"properties":{
"name":{ "type":"text" },
"age":{ "type":"integer" },
"birth":{ "type":"date" },
"content":{ "type":"text" },
"address":{ "type":"keyword" }
}
}
}
}
PUT /ems/emp/_bulk
{"index":{}}
{"name":"apple","age":23,"bir":"2019-8-22","content":"Choosing a good MVC framework for the development team is a difficult task, and it takes a high level of experience to choose among the many possible solutions","address":"beijing"}
{"index":{}}
{"name":"banana","age":24,"bir":"2021-6-6","content":"The Spring framework is a layered architecture consisting of seven well-defined modules. The Spring module is built on top of a core container that defines how beans are created, configured, and managed","address":"shandong"}
{"index":{}}
{"name":"cat","age":8,"bir":"2023-1-16","content":"As the micro-services framework of the Java language, Spring Cloud relies on Spring Boot for fast development, continuous delivery, and easy deployment. Spring Cloud has a lot of components covering all aspects of micro-services, and it's getting better and better with the open source community of Spring and companies like Netflix and Pivotal","address":"shanghai"}
{"index":{}}
{"name":"dog","age":9,"bir":"2022-9-24","content":"Spring's goal is to simplify Java development in all its aspects. This leads to more explanations of how Spring simplifies Java Development?","address":"nanjing"}
{"index":{}}
{"name":"ears","age":43,"bir":"2018-12-12","content":"Redis is an open-source, web-enabled, memory-based, persistent, Key-Value database written in ANSI C, and provides apis in multiple languages","address":"hangzhou"}
{"index":{}}
{"name":"fat","age":59,"bir":"2017-12-23","content":"Elastic search is a Lucene-based search server. It provides a distributed, multi-user, full-text search engine based on the RESTful web interface","address":"fujian"}
GET /ems/emp/_search
{
"query": {
"match_all": {}
}
}
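Basic operations
The commands below are run mainly from the Kibana Dev Tools console.
Index operations
Creating, deleting and listing indices
# Create an index
PUT /indexname/
# Delete an index
DELETE /indexname
# Delete all indices [this includes the two indices Kibana creates for itself; once they are gone the Kibana client stops working until it is restarted]
DELETE /*
# List index information
GET /_cat/indices?v
Type operations
Create an index with an explicit mapping; the index must not already exist.
- Types: an index created in 6.x or later holds exactly one type
- Field types: text, keyword, date, integer, long, double, boolean and ip are among the available choices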
PUT /book
{
"mappings": {
"musicbook": {
"properties": {
"name":{"type":"keyword"},
"price":{"type":"double"},
"desc":{"type":"text"}
}
}
}
}
output:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "book"
}
View the type mapping
GET /book/_mapping
output:
{
"book" : {
"mappings" : {
"musicbook" : {
"properties" : {
"desc" : {
"type" : "text"
},
"name" : {
"type" : "keyword"
},
"price" : {
"type" : "double"
}
}
}
}
}
}
Document operations
Insert a document
PUT /book/musicbook/1
{
"name":"ClassicMusic",
"price":"22.0",
"desc":"This is a book about classic music"
}
output:
{
"_index" : "book",
"_type" : "musicbook",
"_id" : "1",
"_version" : 3,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
Get a document
GET /book/musicbook/1
output:
{
"_index" : "book",
"_type" : "musicbook",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "ClassicMusic",
"price" : "22.0",
"desc" : "This is a book about classic music"
}
}
Delete a document
DELETE /book/musicbook/1
Update a document
POST /book/musicbook/1
{
"name":"PopMusic",
"price":"17.0"
}
output:
{
"_index" : "book",
"_type" : "musicbook",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
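Querying the document again shows that the earlier fields were overwritten: this request replaces the whole document. To keep the fields that are not being changed, update it as follows instead: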
POST /book/musicbook/1/_update
{
"doc": {
"price":"40"
}
}
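The difference between the two update styles can be sketched with plain dictionaries. This is an illustration under simplifying assumptions: the helper names are hypothetical, and the merge here is shallow, whereas the real `_update` API merges nested objects recursively.

```python
def index_document(store, doc_id, body):
    """Full replace, like POST /book/musicbook/1: the old document is discarded."""
    store[doc_id] = dict(body)

def partial_update(store, doc_id, doc):
    """Merge, like POST /book/musicbook/1/_update with a "doc" body."""
    store[doc_id] = {**store[doc_id], **doc}

store = {}
index_document(store, "1", {"name": "ClassicMusic", "price": "22.0",
                            "desc": "This is a book about classic music"})
index_document(store, "1", {"name": "PopMusic", "price": "17.0"})  # desc is lost
partial_update(store, "1", {"price": "40"})                        # name is kept
```

After these three calls the stored document has only `name` and `price`: the second call dropped `desc`, and the partial update changed `price` without touching `name`.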
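Bulk operations
A bulk request merely packs several operations into one request. It is not atomic: the operations still run one by one, and a failure of one operation does not stop the ones after it.
Add two documents at once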
PUT /book/musicbook/_bulk
{"index":{"_id": "3"}}
{ "name":"America's Music", "price":"57","desc":"An ear-opening exploration of music's New World, from Puritan psalmody to Hamilton"}
{"index":{"_id": "4"}}
{"name":"Basic Music Theory","price":"48","desc":"The text covers various concepts in music theory, some of which are fundamental, and others are advanced and complex"}
Update and delete documents in one request
PUT /book/musicbook/_bulk
{"update":{"_id":"3"}}
{"doc":{"name":"after change"}}
{"delete":{"_id":"4"}}
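A bulk body is newline-delimited JSON: one action line, optionally followed by one source line. When such a body is generated by a program rather than typed into the console, it might be assembled like this. The `bulk_body` helper is a hypothetical name for this sketch; only the standard `json` module is used.

```python
import json

def bulk_body(operations):
    """Serialize (action, metadata, source) triples into an NDJSON bulk body.

    `source` is None for actions such as delete that carry no second line.
    """
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body([
    ("update", {"_id": "3"}, {"doc": {"name": "after change"}}),
    ("delete", {"_id": "4"}, None),
])
```

Each JSON object must stay on a single line, which is why the body is built line by line instead of pretty-printed.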
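Advanced search (Query)
ES officially provides two ways to search:
- Search through URL parameters
GET /<index>/<type>/_search?<parameters>
- Search through the query DSL (Domain Specific Language)
GET /<index>/<type>/_search {}
Searching through the URL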
GET /ems/emp/_search?q=*&sort=age:desc&size=5&from=0&_source=name,age,bir
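In this URL, `q` is the query string, `sort` gives the field and order, `size` and `from` control paging, and `_source` selects the returned fields. A small sketch of building such a URL from parameters follows; `search_url` is a hypothetical helper, and the index and type names are simply the ones used in these notes.

```python
from urllib.parse import urlencode

def search_url(index, doc_type, **params):
    """Build a URI-search path; '*', ':' and ',' are left unescaped for readability."""
    return f"/{index}/{doc_type}/_search?" + urlencode(params, safe="*:,")

# `from` is a Python keyword, so it is passed through a dict.
url = search_url("ems", "emp", q="*", sort="age:desc", size=5,
                 **{"from": 0, "_source": "name,age,bir"})
```

Without the `safe` argument, `urlencode` would percent-encode `*`, `:` and `,`, which is equally valid but harder to read.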
Searching through the DSL
Match all documents
GET /ems/emp/_search
{
"query": {
"match_all": {}
}
}
Limit the number of returned hits
GET /ems/emp/_search
{
"query": {
"match_all": {}
},
"size": 5
}
Paged query
GET /ems/emp/_search
{
"query": {
"match_all": {}
},
"size": 5,
"from": 0
}
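`from` is the number of hits to skip, not a page number, so a page-based interface has to convert. A minimal sketch, assuming 1-based page numbers and a hypothetical `page_query` helper:

```python
def page_query(page, page_size):
    """Build a match_all body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "size": page_size,
        "from": (page - 1) * page_size,   # number of hits to skip
    }
```

For example, `page_query(3, 5)` skips the first 10 hits. By default ES rejects requests where `from + size` exceeds 10000 (the `index.max_result_window` setting), so this style suits shallow paging only.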
Select the returned fields
GET /ems/emp/_search
{
"query": {
"match_all": {}
},
"_source": ["name","age"]
}
Term (exact keyword) query
GET /ems/emp/_search
{
"query": {
"term": {
"address": {
"value": "fujian"
}
}
}
}
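Notes
- A term query does not analyze the search value. text fields are analyzed at index time, by default with the standard analyzer (StandardAnalyzer), which splits English into lowercased words and Chinese into single characters, so the value must equal one of the indexed tokens exactly
- keyword, date, integer, long, double, boolean and ip fields are not analyzed; only text fields are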
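The effect of analysis on term queries can be sketched as follows. The analyzer here is a rough stand-in for the standard analyzer that only handles English (it splits on non-word characters and lowercases); the function names are made up for this example.

```python
import re

def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def term_matches(indexed_tokens, value):
    """A term query compares the raw value with the indexed tokens, unanalyzed."""
    return value in indexed_tokens

# A text field is split into lowercased tokens at index time.
text_tokens = standard_like_analyze("Spring Cloud relies on Spring Boot")
# A keyword field stores the whole value as a single token.
keyword_tokens = ["Spring Cloud"]
```

On the text field, the term `spring` matches and `Spring` does not; on the keyword field, only the complete original value `Spring Cloud` matches.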
Range query
GET /ems/emp/_search
{
"query": {
"range": {
"age": {
"gte": 5,
"lte": 10
}
}
}
}
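Prefix query (the value is not analyzed, so it must be lowercase to match the tokens indexed for a text field)
GET /ems/emp/_search
{
"query": {
"prefix": {
"name": {
"value": "f"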
}
}
}
}
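Wildcard query (? matches one character, * matches any number of characters; the value is not analyzed either)
GET /ems/emp/_search
{
"query": {
"wildcard": {
"name": {
"value": "ca?"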
}
}
}
}
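Prefix and wildcard queries both skip analysis, so the value is compared as typed with the lowercased tokens in the index. The sketch below imitates this on the `name` values of the test dataset. It is an approximation: `fnmatchcase` also understands `[...]` character sets, which ES wildcard queries do not, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Lowercased tokens, as the standard analyzer would index the `name` values.
name_tokens = ["apple", "banana", "cat", "dog", "ears", "fat"]

def prefix_query(tokens, value):
    """Tokens that start with `value`; the value itself is not lowercased."""
    return [t for t in tokens if t.startswith(value)]

def wildcard_query(tokens, pattern):
    """Tokens matching `pattern`, where ? is one character and * is any run."""
    return [t for t in tokens if fnmatchcase(t, pattern)]
```

An uppercase value such as `F` or `Ca?` finds nothing here, while `f` and `ca?` find `fat` and `cat`.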
Query by multiple IDs
GET /ems/emp/_search
{
"query": {
"ids": {
"values": ["AlSspHYBh-o7eO8i7bUf","BVSspHYBh-o7eO8i7bUf"]
}
}
}
Fuzzy query
GET /ems/emp/_search
{
"query": {
"fuzzy": {
"content": "sprin"
}
}
}
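A fuzzy query matches tokens within a small edit distance of the search term, which is why `sprin` can find documents containing `spring`. The sketch below uses plain Levenshtein distance and the default AUTO thresholds. It is a simplification: ES by default also counts a swap of two adjacent characters as a single edit and has further options such as `prefix_length`; the function names are made up for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Edits allowed by the AUTO setting (0-2 chars: 0, 3-5 chars: 1, longer: 2)."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

def fuzzy_matches(term, token):
    """True if `token` is within the allowed edit distance of `term`."""
    return edit_distance(term, token) <= auto_fuzziness(term)
```

`sprin` has five characters, so one edit is allowed, and adding a `g` turns it into `spring`.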
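Bool query
The bool keyword combines multiple conditions into a complex query
- must: like &&, every clause must match
- should: like ||, one matching clause is enough (when must clauses are also present, should clauses become optional and only affect scoring)
- must_not: like !, none of the clauses may match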
GET /ems/emp/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gte": 5,
"lte": 10
}
}
}
],
"must_not": [
{
"term": {
"address": {
"value": "nanjing"
}
}
}
]
}
}
}
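How the three clause kinds combine can be sketched with a small evaluator over in-memory documents. This is an illustration only: it supports just `range` and `term` leaf clauses, ignores scoring and `filter`, and its function names are hypothetical. The sample documents are three entries from the test dataset.

```python
def matches(doc, clause):
    """Evaluate one leaf clause; only `range` and `term` are sketched here."""
    (kind, body), = clause.items()
    (field, cond), = body.items()
    if kind == "range":
        low = cond.get("gte", float("-inf"))
        high = cond.get("lte", float("inf"))
        return low <= doc[field] <= high
    if kind == "term":
        return doc[field] == cond["value"]
    raise ValueError(f"unsupported clause: {kind}")

def bool_filter(docs, must=(), should=(), must_not=()):
    """must: all match; must_not: none match; should: required only without must."""
    need_should = bool(should) and not must
    return [d for d in docs
            if all(matches(d, c) for c in must)
            and (not need_should or any(matches(d, c) for c in should))
            and not any(matches(d, c) for c in must_not)]

docs = [{"name": "cat", "age": 8, "address": "shanghai"},
        {"name": "dog", "age": 9, "address": "nanjing"},
        {"name": "fat", "age": 59, "address": "fujian"}]
hits = bool_filter(
    docs,
    must=[{"range": {"age": {"gte": 5, "lte": 10}}}],
    must_not=[{"term": {"address": {"value": "nanjing"}}}],
)
```

Here `hits` contains only the `cat` document: `dog` is in the age range but excluded by `must_not`, and `fat` is outside the range.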
Multi-field query
An analyzer can be specified for the query string
GET /ems/emp/_search
{
"query": {
"query_string": {
"query": "spring",
// "analyzer": "ik_max_word",
"fields": ["name","content"]
}
}
}
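Points to note
(1) Case sensitivity
When Elasticsearch builds the inverted index for a text field, the analyzer has already lowercased the tokens before they are written to the index.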
GET /_analyze
{
"text": "Fat",
"analyzer": "standard"
}
output:
{
"tokens" : [
{
"token" : "fat",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
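Index-building process: tokenization -> linguistic processing (for example reducing words to their base form) -> sorting -> index creation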
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/normalizer.html