
Elasticsearch: environment setup and basic operations

article 2025/1/11 10:07:40

References

A quick hands-on Elasticsearch tutorial for backend developers
The latest hands-on ElasticSearch tutorial
ElasticSearch companion notes
Build Your Own Search Engine
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/setup.html

RESTful-style APIs

The REST design style

For example, consider the following Spring Boot controller:

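import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// User is an application-defined class with a (String, int) constructor (not shown).
@RestController
@RequestMapping("rest")
public class RestfulController {
    // Handles GET /rest/getOne/{id}/{name}; both path segments are bound to parameters.
    @GetMapping("getOne/{id}/{name}")
    public User getOne(@PathVariable("id") String id, @PathVariable("name") String name) {
        System.out.println("id: " + id);
        return new User("demo", 12);
    }
}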

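REST stands for Representational State Transfer. An architecture that follows the REST principles is called a RESTful architecture.

Resource: an entity on the network.
Representation: the concrete form in which a resource is presented.
State transfer: for a client to operate on the server, it must somehow cause a state change (State Transfer) on the server side. Because that change is expressed through representations, the full term is "representational state transfer".

Under the REST principles, each URL identifies exactly one resource, and four HTTP verbs (GET, POST, PUT, DELETE) map to the four basic server-side operations:

GET retrieves a resource
POST adds a resource (it can also be used to update one)
PUT updates a resource
DELETE removes a resource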
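
The verb-to-operation mapping can be sketched with a small in-memory store. This is only an illustration: the `handle` function and the `/users/<id>` path layout are hypothetical and not part of Spring Boot or Elasticsearch.

```python
# Minimal in-memory sketch: one URL per resource, four HTTP verbs as CRUD.
# `handle` and the /users/<id> layout are hypothetical, for illustration only.

def handle(store, method, path, body=None):
    """Apply an HTTP-style request to `store` and return (status, payload)."""
    resource_id = path.rstrip("/").split("/")[-1]
    if method == "GET":  # read the resource
        if resource_id in store:
            return 200, store[resource_id]
        return 404, None
    if method == "POST":  # add a new resource
        store[resource_id] = dict(body)
        return 201, store[resource_id]
    if method == "PUT":  # update (replace) an existing resource
        if resource_id not in store:
            return 404, None
        store[resource_id] = dict(body)
        return 200, store[resource_id]
    if method == "DELETE":  # remove the resource
        if store.pop(resource_id, None) is None:
            return 404, None
        return 204, None
    return 405, None  # verb not supported


users = {}
print(handle(users, "POST", "/users/1", {"name": "demo", "age": 12}))
print(handle(users, "GET", "/users/1"))
print(handle(users, "PUT", "/users/1", {"name": "demo", "age": 13}))
print(handle(users, "DELETE", "/users/1"))
print(handle(users, "GET", "/users/1"))
```

The same URL is used throughout; only the verb decides which operation runs.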

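Full-text search

In full-text search, a program scans every word in a document and builds an index entry for each word, recording how many times and at which positions the word occurs. When a user issues a query, the lookup goes through this index instead of the raw text.

Full-text search takes text as its search object and finds the texts that contain the specified words. Completeness, accuracy, and speed are the key metrics for judging a full-text search system.

It handles text only
It does not handle semantics
English searches are case-insensitive
The result list is sorted by relevance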
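
The idea of indexing each word with its count and positions can be sketched as a toy inverted index. This is a simplification under stated assumptions: the function names are made up for illustration, tokenization is a plain regular expression, and "relevance" is reduced to the occurrence count, which is far cruder than the scoring Lucene actually uses.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, word):
    """Return (doc_id, count) pairs, most occurrences first."""
    postings = index.get(word.lower(), {})
    hits = [(doc_id, len(positions)) for doc_id, positions in postings.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


docs = {
    1: "Spring Cloud relies on Spring Boot",
    2: "The Spring framework is a layered architecture",
    3: "Redis is a key-value database",
}
index = build_index(docs)
print(index["spring"])          # positions of "spring" per document
print(search(index, "SPRING"))  # case-insensitive lookup, ranked by count
```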

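ElasticSearch (ES for short) is an open-source search engine built on Apache Lucene. Lucene itself is widely regarded as one of the best-performing open-source search engine toolkits, but its API is relatively complex and using it well requires a solid grounding in search theory. ES is written in Java and exposes a simple, easy-to-use RESTful API, so developers can build search features through that API and avoid Lucene's complexity.

ES stores data mainly as lightweight JSON documents, which makes it somewhat similar to MongoDB; it is often credited with better read/write performance than MongoDB, though this depends on the workload. It also supports geolocation queries, including queries that combine geolocation with text, as well as statistics, storage and analysis of log-type data, and visualization.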

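Installing the ES stack

https://www.elastic.co/guide/en/elasticsearch/reference/8.6/setup.html

The official documentation describes several installation methods:

Installing from a downloaded archive (a zip package on Windows)
Installing with a package manager
Installing with Docker

A Java environment must be configured in advance (Elasticsearch 6.x relies on an installed JDK, while 7.x and later ship a bundled one).

Installing Elasticsearch

Installing version 8.6.2 from the RPM package gives the following directory layout: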

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.6.2-x86_64.rpm
$ sudo rpm --install elasticsearch-8.6.2-x86_64.rpm
$ rpm -ql elasticsearch
/etc/elasticsearch
/etc/elasticsearch/elasticsearch-plugins.example.yml
/etc/elasticsearch/elasticsearch.yml
/etc/elasticsearch/jvm.options
/etc/elasticsearch/jvm.options.d
/etc/elasticsearch/log4j2.properties
/etc/elasticsearch/role_mapping.yml
/etc/elasticsearch/roles.yml
/etc/elasticsearch/users
/etc/elasticsearch/users_roles
/etc/sysconfig/elasticsearch
/usr/lib/sysctl.d/elasticsearch.conf
/usr/lib/systemd/system/elasticsearch.service
/usr/share/elasticsearch/bin
/usr/share/elasticsearch/bin/elasticsearch
...
/usr/share/elasticsearch/lib/...
/usr/share/elasticsearch/jdk/...
/usr/share/elasticsearch/modules/...
...
/usr/share/elasticsearch/plugins
/var/lib/elasticsearch
/var/log/elasticsearch

# Uninstall
$ sudo rpm -e elasticsearch

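ES has since moved on to 7.x and 8.x, but to stay consistent with the video tutorial this article installs Elasticsearch 6.8.0 from the archive.

List of Elasticsearch releases: https://www.elastic.co/cn/downloads/past-releases#elasticsearch

Elasticsearch 6.8.0: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-8-0

Note: ES refuses to start when run as the root user.

Extract and install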

tar -zxvf elasticsearch-6.8.0.tar.gz

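- bin         executable scripts and binaries
- config      configuration files
- lib         libraries needed at runtime
- logs        runtime log files
- modules     modules needed at runtime
- plugins     install location for official and third-party plugins

Start the service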

./bin/elasticsearch

ES does not provide a web UI. To verify that it started, request port 9200, which returns a JSON document:


$ curl http://localhost:9200
{
  "name" : "9cWvHvU",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "TLtGg3X5QjiK4-5-7XVBMg",
  "version" : {
    "number" : "6.8.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "65b6179",
    "build_date" : "2019-05-15T20:06:13.172855Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

To make the node reachable from other machines, set the listen address (not the port) in the configuration file to 0.0.0.0:

$ vim config/elasticsearch.yml
network.host: 0.0.0.0

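Binding to a non-loopback address makes ES enforce its production bootstrap checks, which have stricter resource requirements. After a restart you may see errors like these:

[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[2]: max number of threads [3082] for user [ec2-user] is too low, increase to at least [4096]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Error 1: edit the limits file. The change applies to logged-in users and takes effect after logging in again.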

$ sudo vim /etc/security/limits.conf
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
*               soft    nofile            65536
*               hard	nofile            65536
*               soft	nproc             4096
*               hard	nproc             4096

Error 2: raise the thread limit for the user that runs ES

$ sudo vim /etc/security/limits.d/20-nproc.conf
ec2-user soft nproc 4096

Error 3: raise vm.max_map_count in the kernel settings

$ sudo vim /etc/sysctl.conf
vm.max_map_count=655360
$ sudo sysctl -p
fs.inotify.max_user_watches = 524288
vm.max_map_count = 655360
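
The three checks can be expressed as a small comparison against the minimums quoted in the error messages. This is a sketch only: `failed_checks` and `read_current_limits` are hypothetical helper names, and the reader function assumes a Linux host with `/proc` mounted.

```python
# Minimum values quoted in the three bootstrap errors above.
REQUIRED = {"nofile": 65535, "nproc": 4096, "max_map_count": 262144}


def failed_checks(current, required=REQUIRED):
    """Return the names of the limits in `current` that are below the minimum."""
    return [name for name, minimum in required.items()
            if current.get(name, 0) < minimum]


def read_current_limits():
    """Read the live values; assumes Linux with /proc mounted."""
    import resource

    def soft(kind):
        value = resource.getrlimit(kind)[0]
        return float("inf") if value == resource.RLIM_INFINITY else value

    with open("/proc/sys/vm/max_map_count") as handle:
        max_map_count = int(handle.read())
    return {
        "nofile": soft(resource.RLIMIT_NOFILE),
        "nproc": soft(resource.RLIMIT_NPROC),
        "max_map_count": max_map_count,
    }


# Values reported in the error messages: all three checks fail.
print(failed_checks({"nofile": 4096, "nproc": 3082, "max_map_count": 65530}))
# Values after the fixes: nothing is left to fix.
print(failed_checks({"nofile": 65536, "nproc": 4096, "max_map_count": 655360}))
# On the ES host itself: print(failed_checks(read_current_limits()))
```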

Test remote access by requesting port 9200 from another machine


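Installing Kibana

The installation steps are the same as for Elasticsearch, and the Kibana version must match the ES version. The relationship between the two resembles that of Grafana and Prometheus.

Download: https://www.elastic.co/cn/downloads/past-releases/kibana-6-8-0

Install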

tar -xzvf kibana-6.8.0-linux-x86_64.tar.gz

Edit the configuration file and point it at the ES host:

$ vim kibana-6.8.0-linux-x86_64/config/kibana.yml
server.host: "0.0.0.0"                           # address the Kibana server binds to
elasticsearch.hosts: ["http://127.0.0.1:9200"]   # address of the ES server

Start

./bin/kibana

Open in a browser

http://127.0.0.1:5601/app/kibana

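Basic concepts

Near real time (NRT): there is a slight delay (usually under 1 second) between indexing a document and that document becoming searchable.
Index: a collection of documents with similar characteristics. An index is identified by a name, which must be all lowercase. Any number of indices can be defined in a cluster.
Type: a logical category or partition inside an index whose meaning is entirely up to the user. Older versions allow one or more types per index; indices created in 6.x allow only one.
Mapping: similar to a table schema in a traditional relational database; it defines the structure of the data of a type within an index. You can create the type (comparable to a table) and its mapping (comparable to a schema) manually. With the default configuration, ES creates the type and its mapping automatically from the inserted data. A mapping mainly consists of field names, field data types, and field index settings.
Document: the basic unit of information that can be indexed, comparable to a row in a table; it is expressed as JSON.

The data models of 5.x and 6.x compare as follows.

NOTE: In 5.x and earlier, a single index could hold multiple types. Indices created in 6.x may contain only one type, although multi-type indices created in 5.x keep working. 7.x deprecates types in the APIs, and 8.x removes them entirely.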

[Figure: data model in 5.x versus 6.x (image unavailable)]

Creating test datasets

Elastic provides sample data, which can be downloaded as follows:

wget https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
wget https://download.elastic.co/demos/kibana/gettingstarted/accounts.zip
wget https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json

Create the indices and their mappings

PUT /shakespeare
{
 "mappings": {
  "doc": {
   "properties": {
    "speaker": {"type": "keyword"},
    "play_name": {"type": "keyword"},
    "line_id": {"type": "integer"},
    "speech_number": {"type": "integer"}
   }
  }
 }
}

PUT /bank
{
 "mappings": {
  "account": {
   "properties": {
    "account_number": {"type": "integer"},
    "balance": {"type": "integer"},
    "firstname": {"type": "keyword"},
    "lastname": {"type": "keyword"},
    "age": {"type": "integer"},
    "gender": {"type": "keyword"},
    "address": {"type": "text"},
    "employer": {"type": "keyword"},
    "email": {"type": "keyword"},
    "city": {"type": "keyword"},
    "state": {"type": "keyword"}
   }
  }
 }
}

List the indices

GET /_cat/indices?v

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1            yLxMUBLfT3GHGnVuvhXbJg   1   0          4            0     14.5kb         14.5kb
yellow open   bank                 jJujmI8WQnGgKbtutfhkvw   5   1       1000            0    327.2kb        327.2kb
green  open   .kibana_task_manager 5itOVO27SIuSc9qE73dabw   1   0          2            0     12.6kb         12.6kb
yellow open   shakespeare          Mv98eNoUTpaGIBJXkBmR3Q   5   1     111396            0     21.8mb         21.8mb

Load the data and view the documents

$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json

GET /shakespeare/doc/_search
{
  "query": {
    "match_all": {}
  }
}

$ unzip accounts.zip
$ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json

GET /bank/account/_search
{
  "query": {
    "match_all": {}
  }
}

A simple test dataset

PUT /ems
{
  "mappings":{
    "emp":{
      "properties":{
        "name":{ "type":"text" },
        "age":{ "type":"integer" },
        "birth":{ "type":"date" },
        "content":{ "type":"text" },
        "address":{ "type":"keyword" }
      }
    }
  }
}

PUT /ems/emp/_bulk
{"index":{}}
  {"name":"apple","age":23,"bir":"2019-8-22","content":"Choosing a good MVC framework for the development team is a difficult task, and it takes a high level of experience to choose among the many possible solutions","address":"beijing"}
{"index":{}}
  {"name":"banana","age":24,"bir":"2021-6-6","content":"The Spring framework is a layered architecture consisting of seven well-defined modules. The Spring module is built on top of a core container that defines how beans are created, configured, and managed","address":"shandong"}
{"index":{}}
  {"name":"cat","age":8,"bir":"2023-1-16","content":"As the micro-services framework of the Java language, Spring Cloud relies on Spring Boot for fast development, continuous delivery, and easy deployment. Spring Cloud has a lot of components covering all aspects of micro-services, and it's getting better and better with the open source community of Spring and companies like Netflix and Pivotal","address":"shanghai"}
{"index":{}}
  {"name":"dog","age":9,"bir":"2022-9-24","content":"Spring's goal is to simplify Java development in all its aspects. This leads to more explanations of how Spring simplifies Java Development?","address":"nanjing"}
{"index":{}}
  {"name":"ears","age":43,"bir":"2018-12-12","content":"Redis is an open-source, web-enabled, memory-based, persistent, Key-Value database written in ANSI C, and provides apis in multiple languages","address":"hangzhou"}
{"index":{}}
  {"name":"fat","age":59,"bir":"2017-12-23","content":"Elastic search is a Lucene-based search server. It provides a distributed, multi-user, full-text search engine based on the RESTful web interface","address":"fujian"}


GET /ems/emp/_search
{
  "query": {
    "match_all": {}
  }
}

Basic operations

The commands below are run mainly from the Kibana Dev Tools console.

Index operations

Creating, deleting, and listing indices

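# Create an index
PUT /indexname
# Delete an index
DELETE /indexname
# Delete all indices [this also removes the two built-in .kibana indices, after which the Kibana client stops working until it is restarted]
DELETE /*
# List index information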
GET /_cat/indices?v
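
Outside Dev Tools, the same console lines can be issued from any HTTP client. A minimal sketch using only the Python standard library follows; `es_request` is a hypothetical helper, and the base URL assumes the local node started earlier. The requests are built but not sent, so the snippet runs without a live cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: the local node started earlier


def es_request(method, path, body=None, base_url=BASE_URL):
    """Build (but do not send) a request equivalent to a Dev Tools console line."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


create = es_request("PUT", "/indexname")
delete = es_request("DELETE", "/indexname")
listing = es_request("GET", "/_cat/indices?v")
print(create.get_method(), create.full_url)

# To actually send a request against a running node:
# with urllib.request.urlopen(listing) as response:
#     print(response.read().decode("utf-8"))
```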

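Type operations

Create an index with an explicit mapping; the index must not exist yet.

Types: an index created in ES 6.x or later can hold only one type.
Field types: a property can be declared as text, keyword, date, integer, long, double, boolean, or ip.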

PUT /book
{
  "mappings": {
    "musicbook": { 
      "properties": {
        "name":{"type":"keyword"},
        "price":{"type":"double"},
        "desc":{"type":"text"}
      }
    }
  }
}
output:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "book"
}

View the type mapping

GET /book/_mapping
output:
{
  "book" : {
    "mappings" : {
      "musicbook" : {
        "properties" : {
          "desc" : {
            "type" : "text"
          },
          "name" : {
            "type" : "keyword"
          },
          "price" : {
            "type" : "double"
          }
        }
      }
    }
  }
}

Document operations

Insert a document

PUT /book/musicbook/1
{
  "name":"ClassicMusic",
  "price":"22.0",
  "desc":"This is a book about classic music"
}
output:
{
  "_index" : "book",
  "_type" : "musicbook",
  "_id" : "1",
  "_version" : 3,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

Get a document

GET /book/musicbook/1
output:
{
  "_index" : "book",
  "_type" : "musicbook",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "ClassicMusic",
    "price" : "22.0",
    "desc" : "This is a book about classic music"
  }
}

Delete a document

DELETE /book/musicbook/1

Update a document

POST /book/musicbook/1
{
  "name":"PopMusic",  
  "price":"17.0"
}
output:
{
  "_index" : "book",
  "_type" : "musicbook",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

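Querying the document again shows that the whole document was replaced, so fields omitted from the request (here desc) are gone. To change only some fields, use the _update endpoint instead: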

POST /book/musicbook/1/_update
{
  "doc": {
    "price":"40"
  }
}
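
The difference between the two update styles can be modelled with plain dictionaries. This is a simplified sketch, not how ES is implemented: the helper names are made up, and only flat documents are handled.

```python
def index_document(store, doc_id, body):
    """PUT/POST /index/type/id: the stored document is replaced wholesale."""
    store[doc_id] = dict(body)
    return store[doc_id]


def update_document(store, doc_id, partial):
    """POST /index/type/id/_update with {"doc": ...}: fields are merged in."""
    store[doc_id] = {**store[doc_id], **partial}
    return store[doc_id]


book = {"name": "ClassicMusic", "price": "22.0",
        "desc": "This is a book about classic music"}

replaced = index_document({"1": dict(book)}, "1", {"name": "PopMusic", "price": "17.0"})
print(sorted(replaced))  # desc is no longer present

merged = update_document({"1": dict(book)}, "1", {"price": "40"})
print(sorted(merged))    # desc and name survive, only price changed
```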

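Bulk operations

A bulk request simply packs several operations into one command. It is not atomic: the operations still run one by one, and a failure of one operation does not stop the ones after it.

Index two documents at once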

PUT /book/musicbook/_bulk
{"index":{"_id": "3"}}
  { "name":"America's Music", "price":"57","desc":"An ear-opening exploration of music's New World, from Puritan psalmody to Hamilton"}
{"index":{"_id": "4"}}
  {"name":"Basic Music Theory","price":"48","desc":"The text covers various concepts in music theory, some of which are fundamental, and others are advanced and complex"}

Update and delete documents in one request

PUT /book/musicbook/_bulk
{"update":{"_id":"3"}}
	{"doc":{"name":"after change"}}
{"delete":{"_id":"4"}}
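
Because a bulk request is not atomic, a client usually has to build the newline-delimited body itself and then inspect the per-item results. The following sketch assumes the usual bulk response shape (an `items` list with one entry per action); the helper names and the sample response are illustrative, not captured from a real cluster.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) triples into an NDJSON bulk body."""
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def failed_items(response):
    """Collect the items of a bulk response that reported an error."""
    failures = []
    for item in response.get("items", []):
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures


body = bulk_body([
    ("update", {"_id": "3"}, {"doc": {"name": "after change"}}),
    ("delete", {"_id": "4"}, None),
])
print(body, end="")
```

Checking every item matters because the HTTP request as a whole can succeed even when individual actions fail.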

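Advanced search (Query)

ES officially provides two ways to search:

Search with URL parameters
```
GET /<index>/<type>/_search?<parameters>
```
Search with the DSL (Domain Specific Language)
```
GET /<index>/<type>/_search {}
```

Searching via URL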

GET /ems/emp/_search?q=*&sort=age:desc&size=5&from=0&_source=name,age,bir
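
When such a URL is assembled in code, the parameters are easier to manage as a dictionary. A small sketch with the standard library follows; `search_url` is a hypothetical helper, and the characters `*`, `:` and `,` are deliberately left unescaped so the result matches the URL above.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, params):
    """Build a URI-search path such as /ems/emp/_search?q=...&size=..."""
    return f"/{index}/{doc_type}/_search?" + urlencode(params, safe="*:,")


params = {
    "q": "*",               # match everything
    "sort": "age:desc",     # sort by age, descending
    "size": 5,              # number of hits to return
    "from": 0,              # offset of the first hit
    "_source": "name,age,bir",  # fields to include in each hit
}
print(search_url("ems", "emp", params))
```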

Searching via the DSL

Match all documents

GET /ems/emp/_search
{
  "query": {
    "match_all": {}
  }
}

Limit the number of results

GET /ems/emp/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5
}

Pagination

GET /ems/emp/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5,
  "from": 0
}
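
In the request above, from is the offset of the first hit and size is the page length, so a page number has to be converted before it is sent. A minimal sketch, assuming 1-based page numbers (`page_query` is a hypothetical helper):

```python
def page_query(page, page_size):
    """Translate a 1-based page number into a from/size search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }


print(page_query(1, 5))  # first page: skips nothing
print(page_query(3, 5))  # third page: skips the first ten hits
```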

Select the returned fields

GET /ems/emp/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["name","age"]
}

Term (keyword) query

GET /ems/emp/_search
{
  "query": {
    "term": {
      "address": {
        "value": "fujian"
      }
    }
  }
}

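Notes

A term query does not analyze the search text; it is compared as-is with the indexed terms. By default, text fields are indexed with the standard analyzer (StandardAnalyzer), which splits English text into words and Chinese text into single characters.
Fields of type keyword, date, integer, long, double, boolean, or ip are not analyzed; only text fields are.

Range query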

GET /ems/emp/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 5,
        "lte": 10
      }
    }
  }
}

Prefix query

GET /ems/emp/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "F"
      }
    }
  }
}

Wildcard query

GET /ems/emp/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "Ca?"
      }
    }
  }
}

Multi-ID query

GET /ems/emp/_search
{
  "query": {
    "ids": {
      "values": ["AlSspHYBh-o7eO8i7bUf","BVSspHYBh-o7eO8i7bUf"]
    }
  }
}

Fuzzy query

GET /ems/emp/_search
{
  "query": {
      "fuzzy": {
        "content": "sprin"
      }
  }
}
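
A fuzzy query matches terms within a small edit distance of the search term. The sketch below uses plain Levenshtein distance and the length thresholds of AUTO fuzziness; it ignores transpositions and other options, so it only approximates what ES does. All function names are illustrative.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def allowed_edits(term):
    """AUTO fuzziness: 0 edits for 1-2 chars, 1 for 3-5, 2 for longer terms."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def fuzzy_match(term, candidates):
    """Keep the candidates that lie within the allowed number of edits."""
    limit = allowed_edits(term)
    return [c for c in candidates if edit_distance(term, c) <= limit]


print(fuzzy_match("sprin", ["spring", "springs", "redis", "string"]))
```

With a five-character term only one edit is allowed, which is why "sprin" reaches "spring" but not "springs" or "string".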

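Bool query

The bool keyword combines multiple conditions into a complex query:

must: like &&, every clause must match
should: like ||, at least one clause should match
must_not: like !, none of the clauses may match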

GET /ems/emp/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "age": {
              "gte": 5,
              "lte": 10
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "address": {
              "value": "nanjing"
            }
          }
        }
      ]
    }
  }
}
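
The clause semantics can be checked with a tiny in-memory evaluator. This is a sketch covering only term, range, and bool; it ignores scoring and filter clauses, and it treats should as required only when there is no must clause, which is a simplification of the real default.

```python
def matches(doc, clause):
    """Evaluate a tiny subset of the DSL (term, range, bool) against one dict."""
    (kind, spec), = clause.items()
    if kind == "term":
        (field, cond), = spec.items()
        return doc.get(field) == cond["value"]
    if kind == "range":
        (field, bounds), = spec.items()
        value = doc.get(field)
        if value is None:
            return False
        return bounds.get("gte", value) <= value <= bounds.get("lte", value)
    if kind == "bool":
        must = all(matches(doc, c) for c in spec.get("must", []))
        must_not = not any(matches(doc, c) for c in spec.get("must_not", []))
        should = spec.get("should", [])
        # Simplification: should is only required when there is no must clause.
        should_ok = (bool(spec.get("must")) or not should
                     or any(matches(doc, c) for c in should))
        return must and must_not and should_ok
    raise ValueError(f"unsupported clause: {kind}")


docs = [
    {"name": "cat", "age": 8, "address": "shanghai"},
    {"name": "dog", "age": 9, "address": "nanjing"},
    {"name": "ears", "age": 43, "address": "hangzhou"},
]
query = {"bool": {
    "must": [{"range": {"age": {"gte": 5, "lte": 10}}}],
    "must_not": [{"term": {"address": {"value": "nanjing"}}}],
}}
print([d["name"] for d in docs if matches(d, query)])
```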

Multi-field query

An analyzer can optionally be specified (see the commented-out line).

GET /ems/emp/_search
{
  "query": {
    "query_string": {
      "query": "spring",
      // "analyzer": "ik_max_word", 
      "fields": ["name","content"]
    }
  }
}

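Points to note

(1) Case sensitivity

When building the inverted index for an analyzed text field, Elasticsearch converts uppercase letters to lowercase before writing the terms to the index. A term query, which is not analyzed, therefore has to use the lowercase form on such fields.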

GET /_analyze
{
  "text": "Fat",
  "analyzer": "standard"
}
output:
{
  "tokens" : [
    {
      "token" : "fat",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
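
The effect on term queries can be reproduced with a rough stand-in for the analyzer. This sketch only splits on word characters and lowercases, so it handles simple English text and does not reproduce the standard analyzer's single-character handling of Chinese; the function names are illustrative.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


def term_hits(indexed_text, term):
    """A term query compares the raw term with the indexed tokens, unanalyzed."""
    return term in {t["token"] for t in analyze(indexed_text)}


print(analyze("Fat"))
print(term_hits("Fat", "Fat"), term_hits("Fat", "fat"))
```

Searching for the original spelling "Fat" misses because only the lowercased token "fat" was stored.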