MySQL联合索引中不同区分度列的顺序对查询性能的影响
当构建联合索引时,需要考虑索引的顺序问题,除了考虑使用场景,索引的顺序是如何影响查询性能的呢?先来做个小实验。
首先构建以下表:
create table if not exists test.flow
(
id bigint auto_increment primary key,
random_string varchar(255) not null,
binary_int int not null
);
插入100w条数据,其中random_string字段为随机字符串,区分度高,而binary_int为0和1,区分度低,数据采样如下
id,random_string,binary_int
229594,00000dc2-cba2-4991-a53a-1c5dda022b5d,0
826489,000010dc-3b23-4e9d-b10d-658b5830d09f,1
331127,00002c6e-f375-4c17-81ef-7cf446386a5b,1
548640,000031c1-7f99-4793-96e4-99212b3938c2,0
181729,000031f9-d23c-4ff0-929f-d5ecb5adb6fb,1
211099,00003b8b-f832-41be-bc1c-e0b4ffc77db7,1
669026,00004837-47cb-4bf8-9b67-bffb2919bd98,0
156546,00004cf6-5b5a-46c4-9f5c-64bbf58c15a2,0
154805,00004e1e-2b6b-4d08-90f2-f63881d5b96c,1
191821,00005155-575e-4e3c-b08f-2ed901940ca3,1
809912,0000550b-f255-4c29-b0ae-850fed20943b,0
585914,00005c98-16e3-4f29-b674-5094946fdb2a,0
222754,00006c08-3a8e-4a33-be34-3e86d1c64253,0
260696,00007c12-a386-4567-8d01-333fb6a35e1a,0
分别对该表添加联合索引:
alter table flow add index idx_rstr_bint (binary_int, random_string)
alter table flow add index idx_bint_rstr (random_string, binary_int)
分别在不同联合索引情况下执行以下等值查询的explain语句:
explain select *
from flow where binary_int = 0 and random_string = '0000dc2f-4b56-4e54-910c-a88ba2511ec9';
执行结果如下:
select_type | table | partitions | type | possible_keys | key | ken_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
SIMPLE | flow | null | ref | idx_rstr_bint | idx_rstr_bint | 1022 | const,const | 1 | 100 | Using index |
select_type | table | partitions | type | possible_keys | key | ken_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
SIMPLE | flow | null | ref | idx_bint_rstr | idx_bint_rstr | 1026 | const,const | 1 | 100 | Using index |
可以看到等值查询的执行扫描行数都是一样的,即无影响。
如果对random_string进行范围查询呢?
分别在不同联合索引情况下执行以下等值查询的explain语句:
explain select *
from flow where binary_int = 0 and random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9';
select_type | table | partitions | type | possible_keys | key | ken_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
SIMPLE | flow | null | range | idx_rstr_bint | idx_rstr_bint | 1022 | null | 21 | 10 | Using where; Using index |
select_type | table | partitions | type | possible_keys | key | ken_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
SIMPLE | flow | null | range | idx_bint_rstr | idx_bint_rstr | 1026 | null | 10 | 100 | Using index |
可以看到当索引顺序是 (random_string, binary_int)时,扫描的行数会多些,是先利用索引筛选出 random_string < '0000dc2f-4b56-4e54-910c-a88ba2511ec9'
的数据行,然后使用where binary_int = 0
进行过滤;当索引顺序是(binary_int, random_string)是,直接走索引查出返回的数据行。
可以看到等值查询时,列的顺序不影响联合索引的查询性能,而范围查询时,范围查询列应放在联合索引的列的最后。
explain
结合15.8.2 EXPLAIN Statement,回顾一下explain语句。
explain可以对 SELECT, DELETE, INSERT, REPLACE, UPDATE, TABLE的执行过程进行分析展示,不过最常用来分析SELECT的执行过程。
看下对SELECT语句explain的字段含义:
Column | Meaning |
---|---|
select_type | The SELECT type |
table | The table for the output row |
partitions | The matching partitions ,非分区表为null |
type | The join type |
possible_keys | The possible indexes to choose |
key | The index actually chosen |
key_len | The length of the chosen key |
ref | The columns compared to the index |
rows | Estimate of rows to be examined |
filtered | Percentage of rows filtered by table condition |
Extra | Additional information |
关于字段具体内容可参考:10.8.2 EXPLAIN Output Format
explain可指定输入格式为TREE:
explain format = TREE select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';
结果:
-> Filter: (flow.random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b') (cost=36116 rows=99568)
-> Covering index skip scan on flow using idx_binaryint_randomstring over random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b' (cost=36116 rows=99568)
explain可指定输入格式为JSON:
explain format = JSON select * from flow where random_string = '00002c6e-f375-4c17-81ef-7cf446386a5b';
结果:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "36116.17"
},
"table": {
"table_name": "flow",
"access_type": "range",
"possible_keys": [
"idx_binaryint_randomstring"
],
"key": "idx_binaryint_randomstring",
"used_key_parts": [
"binary_int",
"random_string"
],
"key_length": "1026",
"rows_examined_per_scan": 99568,
"rows_produced_per_join": 99568,
"filtered": "100.00",
"using_index_for_skip_scan": true,
"cost_info": {
"read_cost": "26159.38",
"eval_cost": "9956.80",
"prefix_cost": "36116.18",
"data_read_per_join": "98M"
},
"used_columns": [
"id",
"random_string",
"binary_int"
],
"attached_condition": "(`test`.`flow`.`random_string` = '00002c6e-f375-4c17-81ef-7cf446386a5b')"
}
}
}
参考
- 15.8.2 EXPLAIN Statement
- Which column to put first in index? Higher or lower cardinality?
- For a composite index of columns of different cardinality, does order matter?