当前位置：首页 > article >正文

SQL优化 | OceanBase是否遵循最左匹配原则？(三)

article 2025/3/31 10:52:48

1、问题背景

在MySQL数据库中，如果不满足最左匹配原则，索引会失效。那么在OceanBase数据库中：

是否有同样的问题？
如果不满足最左匹配原则，如何走上索引？

创建测试表和联合索引：

-- 创建测试表
CREATE TABLE tt (
    c1 INT,
    c2 INT,
    c3 INT
);
-- 添加联合索引
CREATE INDEX idx_tt ON tt(c2,c3);

2、模拟不同查询场景并查看执行计划

测试版本：oceanbase 4.2.1.3

2.1 满足最左匹配原则的查询

// 场景1 全字段匹配，符合索引前缀原则，可以使用索引
mysql> EXPLAIN BASIC SELECT * FROM tt  where  c2=1  and  c3=2;
+--------------------------------------------------------------------------+
| Query Plan                                                               |
+--------------------------------------------------------------------------+
| ===========================================================
|
| |ID|OPERATOR        |NAME      |
| --------------------------------
| |0 |TABLE RANGE SCAN|tt(idx_tt)|
| ===========================================================
|
| Outputs & filters:
| -------------------------------------
|   0 - output([tt.c1], [tt.c2], [tt.c3]), filter(nil), rowset=16
|       access([tt.__pk_increment], [tt.c2], [tt.c3], [tt.c1]), partitions(p0)
|       is_index_back=true, is_global_index=false,
|       range_key([tt.c2], [tt.c3], [tt.__pk_increment]), range(1,2,MIN ; 1,2,MAX),
|       range_cond([tt.c2 = 1], [tt.c3 = 2])
+--------------------------------------------------------------------------+
12 rows in set (0.00 sec)
// 场景2 过滤条件是c2，符合索引前缀原则，可以使用索引
mysql> EXPLAIN BASIC SELECT * FROM tt  where  c2=1;
+--------------------------------------------------------------------------+
| Query Plan                                                               |
+--------------------------------------------------------------------------+
| ===========================================================
    |
| |ID|OPERATOR        |NAME      |
| --------------------------------
| |0 |TABLE RANGE SCAN|tt(idx_tt)|
| ===========================================================
    |
| Outputs & filters:
| -------------------------------------
|   0 - output([tt.c1], [tt.c2], [tt.c3]), filter(nil), rowset=16
|       access([tt.__pk_increment], [tt.c2], [tt.c1], [tt.c3]), partitions(p0)
|       is_index_back=true, is_global_index=false,
|       range_key([tt.c2], [tt.c3], [tt.__pk_increment]), range(1,MIN,MIN ; 1,MAX,MAX),
|       range_cond([tt.c2 = 1])
+--------------------------------------------------------------------------+
12 rows in set (0.01 sec)

2.2 不满足最左匹配原则的查询

// 场景3 过滤条件是c3，不符合索引前缀原则，不能使用索引
mysql> EXPLAIN BASIC SELECT * FROM tt  where  c3=2;
+--------------------------------------------------------------------------+
| Query Plan                                                               |
+--------------------------------------------------------------------------+
| =================================================
| |ID|OPERATOR       |NAME|
| -------------------------
| |0 |TABLE FULL SCAN|tt  |
| =================================================
| Outputs & filters:
| -------------------------------------
|   0 - output([tt.c1], [tt.c2], [tt.c3]), filter([tt.c3 = 2]), rowset=16
|       access([tt.c3], [tt.c1], [tt.c2]), partitions(p0)
|       is_index_back=false, is_global_index=false, filter_before_index_back[false],
|       range_key([tt.__pk_increment]), range(MIN ; MAX)always true
+--------------------------------------------------------------------------+
11 rows in set (0.00 sec)

结论：OceanBase数据库同样需要遵循最左匹配原则。

3、优化思路

3.1 Index Skip Scan

在OceanBase数据库中，对于不满足最左匹配原则的场景，有相应的优化手段。OceanBase从4.1版本开始，支持Index Skip Scan的能力，可以通过Hint使用该能力，使不满足最左匹配原则的查询也能走上索引。

测试版本：oceanbase 4.2.1.3

// 场景4 使用Index Skip Scan能力
mysql> EXPLAIN BASIC SELECT /*+ index_ss(tt idx_tt) */ * FROM tt  where  c3=2;
+--------------------------------------------------------------------------+
| Query Plan                                                               |
+--------------------------------------------------------------------------+
| ==========================================================
               |
| |ID|OPERATOR       |NAME      |
| -------------------------------
| |0 |TABLE SKIP SCAN|tt(idx_tt)|
| ==========================================================
               |
| Outputs & filters:
| -------------------------------------
|   0 - output([tt.c1], [tt.c2], [tt.c3]), filter([tt.c3 = 2]), rowset=16
|       access([tt.__pk_increment], [tt.c3], [tt.c1], [tt.c2]), partitions(p0)
|       is_index_back=true, is_global_index=false, filter_before_index_back[true],
|       range_key([tt.c2], [tt.c3], [tt.__pk_increment]), range(MIN,MIN,MIN ; MAX,MAX,MAX)always true
|       prefix_columns_cnt = 1 , skip_scan_range(2,MIN ; 2,MAX)
+--------------------------------------------------------------------------+
12 rows in set (0.01 sec)

注意：index_ss是固定写法，不能修改。

可以看到，执行计划从TABLE RANGE SCAN变为TABLE SKIP SCAN，TABLE SKIP SCAN虽然属于范围扫描，但采用了一种更优化的扫描方式，能在不满足最左匹配的条件下利用索引。

如果希望通过index_ss这个Hint来使用Index Skip Scan进行优化，需要关注表中列的NDV（Number of Distinct Values，不同值的数量）。
从索引的实现原理上来说，当表中存在一个联合索引idx (c2, c3)，并且查询条件是where c3 = ?且c2列的NDV远小于c3列的NDV时，适合使用Index Skip Scan。

3.2 Index Skip Scan 优化触发条件

表上有收集过统计信息。
查询条件中包含联合索引的后缀列，并且不是其他索引最左前缀。
优化器比较 TABLE SKIP SCAN 和全表扫描的代价，发现 TABLE SKIP SCAN 代价更低

3.3 不同版本是否全表扫描，执行计划算子的差异

可以查看3.x版本执行计划中OPERATOR字段,对比4.x版本的区别：

测试版本：oceanbase 3.2.3.3

// 使用索引
MySQL [wms]> EXPLAIN BASIC SELECT * FROM tt  where  c2=1  and  c3=2\G;
*************************** 1. row ***************************
Query Plan:
================================================
|ID|OPERATOR  |NAME      |
--------------------------
|0 |TABLE SCAN|tt(idx_tt)|
================================================
Outputs & filters:
-------------------------------------
  0 - output([tt.c1], [tt.c2], [tt.c3]), filter(nil),
      access([tt.c2], [tt.c3], [tt.c1]), partitions(p0)
1 row in set (0.00 sec)
// 未使用索引
MySQL [wms]> EXPLAIN BASIC SELECT * FROM tt  where  c3=2\G;
*************************** 1. row ***************************
Query Plan:
================================================
|ID|OPERATOR  |NAME|
--------------------
|0 |TABLE SCAN|tt  |
================================================
Outputs & filters:
-------------------------------------
  0 - output([tt.c1], [tt.c2], [tt.c3]), filter([tt.c3 = 2]),
      access([tt.c3], [tt.c1], [tt.c2]), partitions(p0)
1 row in set (0.01 sec)