
Advanced Elasticsearch

article 2025/2/11 7:50:59

Two ways to search

Query DSL

match_all

match

match_phrase

multi_match

bool

filter

term & .keyword

aggregations

Two ways to search

URL + query-string parameters

GET /bank/_search?q=*&sort=account_number:asc

URL + request body

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": "asc"
    }
  ]
}
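Both requests above describe the same search. As a small sketch using only the Python standard library (no Elasticsearch client is assumed), the query-string form can be built with `urllib.parse.urlencode` and the request-body form is plain JSON. `build_url` and `build_body` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

def build_url(index: str, q: str, sort: str) -> str:
    # Keep '*' and ':' unescaped so the URL reads like the one typed in the console;
    # the percent-encoded form (%2A, %3A) would decode to the same parameters.
    params = urlencode({"q": q, "sort": sort}, safe="*:")
    return f"/{index}/_search?{params}"

def build_body(field: str, order: str = "asc") -> str:
    # Equivalent request body: match every document, sort by one field.
    body = {"query": {"match_all": {}}, "sort": [{field: order}]}
    return json.dumps(body)

print(build_url("bank", "*", "account_number:asc"))
print(build_body("account_number"))
```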

hits: the search results

hits.hits: the array of matching documents (10 by default)
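To make the response layout concrete, here is a small sketch that walks `hits.hits` in an already-parsed response. The `response` dictionary is a hand-written, trimmed illustration of the shape (fields such as `took` and `_shards` are omitted and the values are invented), not real output from the bank index. Elasticsearch 7.x and later report `hits.total` as an object; when results are sorted by a field, `_score` is null.

```python
# Hand-written illustration of a trimmed search response; values are made up.
response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "0", "_score": None, "_source": {"account_number": 0}, "sort": [0]},
            {"_id": "1", "_score": None, "_source": {"account_number": 1}, "sort": [1]},
        ],
    }
}

def sources(resp: dict) -> list:
    """Return the _source of every hit, in the order Elasticsearch returned them."""
    return [hit["_source"] for hit in resp["hits"]["hits"]]

total = response["hits"]["total"]["value"]
print(total, sources(response))
```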

Query DSL

Query DSL (Domain Specific Language) is the JSON-based language Elasticsearch uses for building complex queries.

Basic structure

{
  QUERY_NAME: {
    ARGUMENT: VALUE,
    ARGUMENT: VALUE, ...
  }
}

When targeting a specific field

{
  QUERY_NAME: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ARGUMENT: VALUE, ...
    }
  }
}
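The two templates can be mirrored by two tiny functions that build the nested dictionaries before they are serialized to JSON. `query` and `field_query` are hypothetical helpers for illustration, not part of any Elasticsearch client; the `match` example uses the long form with `query` and `operator` arguments under the field name.

```python
import json

def query(name: str, **args) -> dict:
    # {QUERY_NAME: {ARGUMENT: VALUE, ...}}
    return {name: dict(args)}

def field_query(name: str, field: str, **args) -> dict:
    # {QUERY_NAME: {FIELD_NAME: {ARGUMENT: VALUE, ...}}}
    return {name: {field: dict(args)}}

body = {"query": field_query("match", "address", query="mill lane", operator="and")}
print(json.dumps(body))
```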

match_all

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {
        "order": "asc"
      }
    }
  ],
  "from": 0,
  "size": 5,
  "_source": ["account_number","balance"]
}
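`match_all` matches every document; `from` is the number of hits to skip, `size` the number to return, and `_source` limits which stored fields come back. Page-based paging therefore reduces to simple arithmetic. A minimal sketch, assuming 1-based page numbers; `page_body` is a hypothetical helper. By default Elasticsearch rejects requests where `from + size` exceeds the index setting `index.max_result_window` (10,000), so the sketch checks that too.

```python
MAX_RESULT_WINDOW = 10_000  # default of the index.max_result_window setting

def page_body(page: int, size: int, fields: list) -> dict:
    """Build the match_all request body for a 1-based page number (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size  # number of hits to skip
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size would exceed index.max_result_window")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": {"order": "asc"}}],
        "from": offset,
        "size": size,
        "_source": fields,  # only these fields are returned in each hit's _source
    }

body = page_body(3, 5, ["account_number", "balance"])
print(body["from"], body["size"])  # 10 5
```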

match

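Targets a single field. Full-text search: the query text is analyzed (tokenized) into terms, which are looked up in the inverted index. By default a document matches if it contains any of the terms, and results are ranked by relevance score, so "mill lane" also returns addresses that contain only "mill" or only "lane".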

GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  },
  "_source": ["account_number","address"]
}
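A toy model of what `match` does: analyze the query text into terms, look each term up in an inverted index, and return documents containing any of them (the default `operator` is `or`). The analyzer here only lowercases and splits on whitespace, a simplification of Elasticsearch's `standard` analyzer; relevance scoring is not modeled, and the addresses are made-up examples rather than bank data.

```python
from collections import defaultdict

def analyze(text: str) -> list:
    # Simplified analyzer: lowercase + whitespace split.
    return text.lower().split()

def build_index(docs: dict) -> dict:
    """Inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def match(index: dict, text: str, operator: str = "or") -> set:
    postings = [index.get(term, set()) for term in analyze(text)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)  # every term required
    return set.union(*postings)  # any term is enough

docs = {1: "198 Mill Lane", 2: "244 Columbus Place", 3: "990 Mill Road", 4: "11 Hope Lane"}
index = build_index(docs)
print(sorted(match(index, "mill lane")))         # [1, 3, 4]
print(sorted(match(index, "mill lane", "and")))  # [1]
```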

match_phrase

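Phrase matching. The query text is still analyzed, but the resulting terms must all appear in the field in the same order and next to each other, so "mill lane" is treated as one complete phrase rather than two independent words.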

GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  },
  "_source": ["account_number","address"]
}
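Phrase matching needs term positions, not just term membership. This sketch extends a toy inverted index with positions and checks adjacency, assuming the default slop of 0 and the same simplified lowercase/whitespace analyzer (not Elasticsearch's real analyzer or scoring). The documents are invented.

```python
from collections import defaultdict

def analyze(text: str) -> list:
    # Simplified analyzer: lowercase + whitespace split.
    return text.lower().split()

def build_positional_index(docs: dict) -> dict:
    """term -> {doc_id: [positions of the term in that doc]}"""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term][doc_id].append(pos)
    return index

def match_phrase(index: dict, text: str) -> set:
    terms = analyze(text)  # the phrase itself is analyzed, too
    if not terms:
        return set()
    postings = [index.get(term, {}) for term in terms]
    # Candidate documents must contain every term of the phrase.
    candidates = set.intersection(*(set(p) for p in postings))
    hits = set()
    for doc_id in candidates:
        # Slop 0: term i must sit exactly i positions after the first term.
        for start in postings[0][doc_id]:
            if all(start + i in postings[i][doc_id] for i in range(1, len(terms))):
                hits.add(doc_id)
                break
    return hits

docs = {1: "198 Mill Lane", 3: "990 Mill Road", 5: "Lane near the Mill"}
index = build_positional_index(docs)
print(match_phrase(index, "mill lane"))  # {1}
```

Document 5 contains both words but not adjacently and in order, so only document 1 matches.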

multi_match

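Multi-field matching. The query text is analyzed and matched against every listed field; a document matches if any of the fields matches. Here, documents whose address or city contains "mill" or "Movico" are returned.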

GET /bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill Movico", 
      "fields": ["address","city"]
    }
  },
  "_source": ["account_number","address","city"]
}
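By default `multi_match` uses the `best_fields` type: each field is scored separately and the document takes the score of its best field, rather than the sum. The toy below imitates that with a deliberately crude score (number of query terms a field contains, instead of BM25) and lets you swap in `sum` to contrast with a summing strategy similar in spirit to `most_fields`. The analyzer, scoring and documents are all assumptions of this sketch.

```python
def analyze(text: str) -> list:
    # Simplified analyzer: lowercase + whitespace split.
    return text.lower().split()

def multi_match(docs: list, text: str, fields: list, combine=max) -> list:
    """Return (doc_id, score) pairs, best score first.

    Toy per-field score = number of query terms the field contains.
    combine=max imitates best_fields; combine=sum adds the fields up.
    """
    terms = set(analyze(text))
    results = []
    for doc in docs:
        per_field = [len(terms & set(analyze(doc.get(f, "")))) for f in fields]
        score = combine(per_field)
        if score > 0:
            results.append((doc["id"], score))
    return sorted(results, key=lambda r: (-r[1], r[0]))

# Invented documents, not the real bank data.
docs = [
    {"id": 1, "address": "198 Mill Lane", "city": "Lumberton"},
    {"id": 2, "address": "55 Oak Street", "city": "Movico"},
    {"id": 3, "address": "7 Mill Road", "city": "Movico"},
]
print(multi_match(docs, "mill Movico", ["address", "city"]))            # [(1, 1), (2, 1), (3, 1)]
print(multi_match(docs, "mill Movico", ["address", "city"], combine=sum))  # [(3, 2), (1, 1), (2, 1)]
```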

bool

Compound query: combines multiple query clauses into one.

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "gender": "M"
        }},
        {"match": {
          "address": "mill"
        }}
      ],
      "must_not": [
        {"match": {
          "age": "28"
        }}
      ],
      "should": [
        {"match": {
          "firstname": "winnie"
        }}
      ]
    }
  }
}

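must: the document must match every listed clause; matching clauses contribute to the relevance score.

must_not: the document must not match any listed clause; these clauses do not affect scoring.

should: the document may or may not match; each matching clause raises the relevance score. If a bool query has no must or filter clause, at least one should clause must match.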
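The rules above can be sketched as a toy bool evaluator in which every clause is a Python predicate on a document dict. The score is simply one point per matching must/should clause, a stand-in for Elasticsearch's summed BM25 clause scores; the `minimum_should_match` default follows the rule stated above. The function name, documents and predicates are invented for illustration.

```python
def bool_query(docs, must=(), must_not=(), should=(), filters=()):
    """Toy bool query; returns (doc_id, score) pairs, best first.

    `filters` is named that way because `filter` is a Python built-in.
    """
    # Default minimum_should_match: 1 when there is no must/filter clause, else 0.
    min_should = 0 if (must or filters) else 1
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filters):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        should_hits = sum(1 for clause in should if clause(doc))
        if should and should_hits < min_should:
            continue
        # Toy score: one point per matching must/should clause (filters never score).
        hits.append((doc["id"], len(must) + should_hits))
    return sorted(hits, key=lambda h: (-h[1], h[0]))

docs = [
    {"id": 1, "gender": "M", "address": "990 Mill Road", "age": 28, "firstname": "Parker"},
    {"id": 2, "gender": "M", "address": "198 Mill Lane", "age": 32, "firstname": "Winnie"},
    {"id": 3, "gender": "M", "address": "7 Mill Street", "age": 40, "firstname": "Bob"},
    {"id": 4, "gender": "F", "address": "12 Mill Avenue", "age": 30, "firstname": "Winnie"},
]
must = [lambda d: d["gender"] == "M", lambda d: "mill" in d["address"].lower().split()]
must_not = [lambda d: d["age"] == 28]
should = [lambda d: d["firstname"].lower() == "winnie"]
print(bool_query(docs, must=must, must_not=must_not, should=should))  # [(2, 3), (3, 2)]
print(bool_query(docs, should=should))                                # [(2, 1), (4, 1)]
```

With must clauses present, should only reorders results (document 3 still matches); with should alone, it becomes a requirement.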

filter

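A clause that documents must match but that does not contribute to the score: effectively a must clause without scoring. Filter clauses are also eligible for caching.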

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "28"
          }
        }
      ],
      "should": [
        {
          "match": {
            "firstname": "winnie"
          }
        }
      ],
      "filter": {
        "range": {
          "balance": {
            "gte": 40000,
            "lte": 50000
          }
        }
      }
    }
  }
}

With the filter added, fewer documents are returned than before, but the relevance scores of the remaining hits are unchanged.
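A minimal sketch of why that happens: the filter only decides membership, while the score is computed from the must/should clauses alone. The scoring (one point per matching clause), helper names and documents are toy assumptions, not Elasticsearch internals.

```python
def toy_score(doc, must, should):
    # One point per matching must/should clause (stand-in for BM25).
    return sum(1 for c in must if c(doc)) + sum(1 for c in should if c(doc))

def search(docs, must, should, filters=()):
    """Return {doc_id: score}; filter clauses only decide membership."""
    return {
        doc["id"]: toy_score(doc, must, should)
        for doc in docs
        if all(c(doc) for c in must) and all(f(doc) for f in filters)
    }

docs = [
    {"id": 1, "gender": "M", "balance": 45000, "firstname": "Winnie"},
    {"id": 2, "gender": "M", "balance": 12000, "firstname": "Bob"},
    {"id": 3, "gender": "M", "balance": 48000, "firstname": "Carl"},
]
must = [lambda d: d["gender"] == "M"]
should = [lambda d: d["firstname"].lower() == "winnie"]
in_range = [lambda d: 40000 <= d["balance"] <= 50000]  # like range gte/lte

before = search(docs, must, should)
after = search(docs, must, should, filters=in_range)
print(before)  # {1: 2, 2: 1, 3: 1}
print(after)   # {1: 2, 3: 1}
```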

term & .keyword

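Exact matching: the search value is not analyzed; it is looked up as-is among the terms stored in the index.

Intended for non-text fields (numbers, dates, keyword fields, etc.), where it is slightly faster than match.

"Because ES analyzes text fields when it stores them, using term to exactly match a complete text value is very difficult."

Use term for non-text fields: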

GET /bank/_search
{
  "query": {
    "term": {
      "account_number": {
        "value": "136"
      }
    }
  }
}

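For exact matching on a text field, query its .keyword sub-field (which Elasticsearch's default dynamic mapping adds to string fields):

GET /bank/_search
{
  "query": {
    "match": {
      "address.keyword": "198 Mill Lane"
    }
  }
}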
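The difference between the two fields can be sketched with two toy indexes: the text field `address` stores analyzed tokens, while `address.keyword` stores the whole original string as a single term, and `term` looks its value up verbatim. The lowercase/whitespace analyzer stands in for the `standard` analyzer, and `index_document` and `term` are hypothetical helpers.

```python
def analyze(text: str) -> list:
    # Simplified stand-in for the standard analyzer: lowercase + whitespace split.
    return text.lower().split()

def index_document(doc_id, address, text_index, keyword_index):
    # "address" (text): analyzed tokens go into the inverted index.
    for token in analyze(address):
        text_index.setdefault(token, set()).add(doc_id)
    # "address.keyword" (keyword): the whole original string is one term.
    keyword_index.setdefault(address, set()).add(doc_id)

def term(index, value):
    # term does not analyze the search value: it is looked up verbatim.
    return index.get(value, set())

text_index, keyword_index = {}, {}
index_document(1, "198 Mill Lane", text_index, keyword_index)

print(term(text_index, "198 Mill Lane"))     # set(): no such token in the text field
print(term(text_index, "mill"))              # {1}: a single lowercase token matches
print(term(text_index, "Mill"))              # set(): the stored token is "mill"
print(term(keyword_index, "198 Mill Lane"))  # {1}: exact whole-value match
```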

aggregations

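Aggregations compute statistics over the data and group it, similar to GROUP BY and aggregate functions (SUM, AVG, COUNT, etc.) in SQL.

Bucket aggregations: sort documents into buckets, each bucket representing one group.
Metric aggregations: compute statistics such as sum, average, maximum and minimum.
Pipeline aggregations: perform further calculations on the output of other aggregations.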

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    },
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        },
        "genderAgg": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
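The request above computes the overall average balance, buckets documents by age (top 10 ages), and within each age bucket computes the average balance and sub-buckets by gender; `"size": 0` suppresses the hits so only aggregation results come back. Roughly, it is `SELECT age, gender, AVG(balance) ... GROUP BY age, gender` plus the higher-level averages. The toy below reproduces that nesting in plain Python, with buckets shaped like `{"key", "doc_count", ...}` and metrics like `{"value": ...}`. The tie-break order, the plain `gender` field name (instead of `gender.keyword`) and the sample documents are assumptions of this sketch.

```python
from collections import defaultdict

def avg_metric(docs, field):
    # Shaped like an avg aggregation result: {"value": ...}, None for no documents.
    values = [d[field] for d in docs]
    return {"value": sum(values) / len(values) if values else None}

def terms_buckets(docs, field, size=10):
    """Toy terms aggregation: group docs by field value, biggest groups first."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    # Order by doc count descending; ties broken by key (a choice of this sketch).
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return ordered[:size]

def run_aggs(docs):
    """Mirror the request: overall avg, then age buckets with gender sub-buckets."""
    age_buckets = []
    for age, age_docs in terms_buckets(docs, "age"):
        gender_buckets = [
            {"key": g, "doc_count": len(g_docs), "balanceAvg": avg_metric(g_docs, "balance")}
            for g, g_docs in terms_buckets(age_docs, "gender")
        ]
        age_buckets.append({
            "key": age,
            "doc_count": len(age_docs),
            "balanceAvg": avg_metric(age_docs, "balance"),
            "genderAgg": {"buckets": gender_buckets},
        })
    return {"balanceAvg": avg_metric(docs, "balance"), "ageAgg": {"buckets": age_buckets}}

# Invented sample documents, not the real bank data.
docs = [
    {"age": 31, "gender": "M", "balance": 1000},
    {"age": 31, "gender": "F", "balance": 3000},
    {"age": 31, "gender": "M", "balance": 2000},
    {"age": 22, "gender": "F", "balance": 4000},
]
result = run_aggs(docs)
print(result["balanceAvg"])  # {'value': 2500.0}
```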