当前位置：首页 > article >正文

Elasticsearch-索引的批量操作

article 2024/12/27 13:40:22

索引的批量操作

批量查询和批量增删改
- 批量查询

#批量查询
GET product/_search
GET /_mget
{
  "docs": [
    {
      "_index": "product",
      "_id": 2
    },
    {
      "_index": "product",
      "_id": 3
    }
  ]
}

GET product/_mget
{
  "docs": [
    {
      "_id": 2
    },
    {
      "_id": 3
    }
  ]
}
#SELECT * FROM TABLE WHERE id in()
GET product/_mget
{
  "ids": [
    2,
    3,
    4
  ]
}

GET product/_mget
{
  "docs": [
    {
      "_id": 2,
      "_source": [
        "name",
        "price"
      ]
    },
    {
      "_id": 3,
      "_source": {
        "include": [
          "name",
          "price"
        ],
        "exclude": [
          "price",
          "type"
        ]
      }
    }
  ]
}

GET /_mget
```

批量写入：

POST /_bulk
POST /<index>/_bulk
{"action": {"metadata"}}
{"data"}

POST /_bulk
{ "create": { "_index": "product2",  "_id": "2" }}
{ "name":    "_bulk create 2" }
{ "create": { "_index": "product2",  "_id": "12" }}
{ "name":    "_bulk create 12" }
{ "index":  { "_index": "product2",  "_id": "3" }}
{ "name":    "index product2 "}
{ "index":  { "_index": "product2",  "_id": "13" }}
{ "name":    "index product2" }
{ "update": { "_index": "product2",  "_id": "4","retry_on_conflict" : "3"} }
{ "doc" : {"test_field2" : "bulk test1"} }

#加?filter_path=items.*.error  只显示失败的
POST /_bulk?filter_path=items.*.error
{ "delete": { "_index": "product2",  "_id": "1" }}
{ "create": { "_index": "product2",  "_id": "23" }}
{ "name":    "_bulk create 2" }
{ "create": { "_index": "product2",  "_id": "123" }}
{ "name":    "_bulk create 12" }
{ "index":  { "_index": "product2",  "_id": "3" }}
{ "name":    "index product2 " }
{ "index":  { "_index": "product2",  "_id": "13" }}
{ "name":    "index product2" }
{ "update": { "_index": "product2",  "_id": "4","retry_on_conflict" : "3"} }
{ "doc" : {"test_field2" : "bulk test1"} }

注意：

    bulk api对json的语法有严格的要求，除了delete外，每一个操作都要两个json串（metadata和business data），且每个json串内不能换行，非同一个json串必须换行，否则会报错；

    bulk操作中，任意一个操作失败，是不会影响其他的操作的，但是在返回结果里，会告诉你异常日志

索引的操作类型
- create：如果在PUT数据的时候当前数据已经存在，则数据会被覆盖，如果在PUT的时候加上操作类型create，此时如果数据已存在则会返回失败，因为已经强制指定了操作类型为create，ES就不会再去执行update操作。比如：PUT /pruduct/_create/1/ （老版本的语法为 PUT /pruduct/_doc/1/_create ）指的就是在索引product中强制创建id为1的数据，如果id为1的数据已存在，则返回失败。
- delete：删除文档，ES对文档的删除是懒删除机制，即标记删除。（lazy delete原理）
- index：在ES中，写入操作被称为Index，这里Index为动词，即索引数据为将数据创建在ES中的索引，写入数据亦可称之为“索引数据”。可以是创建，也可以是全量替换
- update：执行partial update（全量替换，部分替换）
  
  以上四种操作类型均为写操作。ES中的数据写入均发生在Primary Shard，当数据在Primary写入完成之后会同步到相应的Replica Shard。ES的数据写入有两种方式：单个数据写入和批量写入，ES为批量写入数据提供了特有的API：_bulk。底层原理在我的《Elasticsearch底层原理》有详细介绍
优缺点
- 优点：相较于普通的Json格式的数据操作，不会产生额外的内存消耗，性能更好，常用于大数据量的批量写入
- 缺点：可读性差，可能会没有智能提示。
使用场景

大数据量的批量操作，比如数据从MySQL中一次性写入ES，批量写入减少了对es的请求次数，降低了内存开销以及对线程的占用。