如何用SQL语句来查询表或索引的行存/列存存储方式|OceanBase 用户问题集锦
一、问题背景
自OceanBase 4.3.0版本起,支持了列存引擎,允许表和索引以行存、纯列存或行列冗余的形式创建,且这些存储方式可以自由组合。除了使用 show create table
命令来查看表和索引的存储类型外,也有用户询问如何通过SQL语句来查询表或索引的存储方式。那么,具体该如何操作呢?
二、测试表
说明:这里仅列举了部分组合,还有其他的组合应该也是类似的,不再赘述,欢迎测试,拍砖。
-- 行存表,行存索引
create table t1(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t1_c2(c2)) partition by hash(c1) partitions 3;
create table t2(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t2_c2(c2)) partition by hash(c1) partitions 3 with column group(all columns);
create table t3(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t3_c2(c2) with column group(all columns)) partition by hash(c1) partitions 3 with column group(all columns);
-- 行存表,纯列存索引
create table t4(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t4_c2(c2) with column group(each column)) partition by hash(c1) partitions 3;
-- 行存表,行列混合索引
create table t5(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t5_c2(c2) with column group(each column,all columns)) partition by hash(c1) partitions 3;
-- 纯列存表,行存索引
create table t6(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t6_c2(c2) with column group(all columns)) partition by hash(c1) partitions 3 with column group(each column);
create table t7(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t7_c2(c2)) partition by hash(c1) partitions 3 with column group(each column);
-- 行列混合表,行列混合索引
create table t8(c1 int,c2 int,c3 int,c4 int,c5 int,primary key(c1),key idx_t8_c2(c2) with column group(each column,all columns)) partition by hash(c1) partitions 3 with column group(each column,all columns);
三、摸索
从列存相关的语法上可以看出,引入列存后新增加了 with column group (xxx) 的关键字,可以尝试搜一下哪些表的列上涉及了 column_group 相关的字段,从下面的结果看目前并没有标准表或者视图提供这样的信息。
MySQL [oceanbase]> select distinct table_name from __all_virtual_table
where table_id in (select distinct table_id from __all_virtual_column
where column_name like '%column_group%');
+--------------------------------------------+
| table_name |
+--------------------------------------------+
| __all_table_history |
| __all_column_group |
| __all_column_group_history |
| __all_column_group_mapping |
| __all_column_group_mapping_history |
| __all_virtual_core_all_table |
| __all_virtual_table |
| __all_virtual_table_history |
| __all_virtual_column_group |
| __all_virtual_column_group_mapping |
| __all_virtual_column_group_history |
| __all_virtual_column_group_mapping_history |
+--------------------------------------------+
12 rows in set (0.97 sec)
从 __all_virtual_column_group 表的 column_group_type 字段,凭感觉可以标识。
MySQL [oceanbase]> desc __all_virtual_column_group;
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| tenant_id | bigint(20) | NO | PRI | NULL | |
| table_id | bigint(20) | NO | PRI | NULL | |
| column_group_id | bigint(20) | NO | PRI | NULL | |
| gmt_create | timestamp(6) | NO | | NULL | |
| gmt_modified | timestamp(6) | NO | | NULL | |
| column_group_name | varchar(389) | NO | | | |
| column_group_type | bigint(20) | NO | | NULL | |
| block_size | bigint(20) | NO | | NULL | |
| compressor_type | bigint(20) | NO | | NULL | |
| row_store_type | bigint(20) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
10 rows in set (0.02 sec)
从代码 src/share/schema/ob_schema_struct.h 找到 column_group 类型的枚举值。
enum ObColumnGroupType : uint8_t
{
DEFAULT_COLUMN_GROUP = 0,
ALL_COLUMN_GROUP,
ROWKEY_COLUMN_GROUP,
SINGLE_COLUMN_GROUP,
NORMAL_COLUMN_GROUP,
MAX_COLUMN_GROUP
};
四、结论
经过测试发现如何标识 表或者索引是行存/纯列存/行列冗余的方式存储,这里之所以给 "结论"使用引号扩起来,原因:
1、受限于自己测试的 case 可能不完善,存在错误的情况,欢迎一起测试,交流。
2、随着版本的迭代,针对枚举值可能会有调整/比如增删等,应以实际版本中的枚举值为准。
- 某个租户下当同一个 table_id 的 column_group_type 包含3 但是不包含1,输出:纯列存表
- 某个租户下当同一个 table_id 的 column_group_type 包含1 和 3,输出:行列混合表
- 其他情况均输出 :纯行存表
五、查询sql和结果
说明:可以按需调整 tenant_name/database_name
sql_1(混合查询)
select
t1.tenant_name,
t2.database_name,
case
when t2.table_type = 'user table' then t2.table_name
when t2.table_type = 'index' then t2.index_name
end as table_name,
t2.table_id,
t2.data_table_id,
t2.table_type,
case
when t2.table_type = 'user table' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
else '纯行存表'
end
when t2.table_type = 'index' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
else '纯行存索引'
end
end as storage_type,
coalesce(t3.table_name, null) as main_table_name
from
__all_virtual_column_group cg
join
dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
join
cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
left join
cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
where
t1.tenant_name in ('test1','test7')
and t2.database_name in ('row_column_db','db100','db600')
and t2.table_type in ('user table', 'index')
group by
t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
order by t1.tenant_name,t2.database_name, t2.table_name;
sql1_查询结果
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| test1 | row_column_db | t1 | 596454 | NULL | USER TABLE | 纯行存表 | NULL |
| test1 | row_column_db | t2 | 596462 | NULL | USER TABLE | 纯行存表 | NULL |
| test1 | row_column_db | t3 | 596470 | NULL | USER TABLE | 纯行存表 | NULL |
| test1 | row_column_db | t4 | 596478 | NULL | USER TABLE | 纯行存表 | NULL |
| test1 | row_column_db | idx_t1_c2 | 596455 | 596454 | INDEX | 纯行存索引 | t1 |
| test1 | row_column_db | idx_t2_c2 | 596463 | 596462 | INDEX | 纯行存索引 | t2 |
| test1 | row_column_db | idx_t3_c2 | 596471 | 596470 | INDEX | 纯行存索引 | t3 |
| test1 | row_column_db | idx_t4_c2 | 596479 | 596478 | INDEX | 纯列存索引 | t4 |
| test7 | db100 | t8 | 500070 | NULL | USER TABLE | 行列混合表 | NULL |
| test7 | db100 | idx_t8_c2 | 500071 | 500070 | INDEX | 行列混合索引 | t8 |
| test7 | db600 | t5 | 500045 | NULL | USER TABLE | 纯行存表 | NULL |
| test7 | db600 | t6 | 500053 | NULL | USER TABLE | 纯列存表 | NULL |
| test7 | db600 | t7 | 500061 | NULL | USER TABLE | 纯列存表 | NULL |
| test7 | db600 | idx_t5_c2 | 500046 | 500045 | INDEX | 行列混合索引 | t5 |
| test7 | db600 | idx_t6_c2 | 500054 | 500053 | INDEX | 纯行存索引 | t6 |
| test7 | db600 | idx_t7_c2 | 500062 | 500061 | INDEX | 纯行存索引 | t7 |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
16 rows in set (3.45 sec)
sql2_查询纯列存的表/索引
select
t1.tenant_name,
t2.database_name,
case
when t2.table_type = 'user table' then t2.table_name
when t2.table_type = 'index' then t2.index_name
end as table_name,
t2.table_id,
t2.data_table_id,
t2.table_type,
case
when t2.table_type = 'user table' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
else '纯行存表'
end
when t2.table_type = 'index' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
else '纯行存索引'
end
end as storage_type,
coalesce(t3.table_name, null) as main_table_name
from
__all_virtual_column_group cg
join
dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
join
cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
left join
cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
where
t1.tenant_name in ('test1','test7')
and t2.database_name in ('row_column_db','db100','db600')
and t2.table_type in ('user table', 'index')
group by
t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
having
(t2.table_type = 'user table' and sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0)
or
(t2.table_type = 'index' and sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0 and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0)
order by
t1.tenant_name, t2.database_name, t2.table_name;
sql2_查询结果
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
| test1 | row_column_db | idx_t4_c2 | 596479 | 596478 | INDEX | 纯列存索引 | t4 |
| test7 | db600 | t6 | 500053 | NULL | USER TABLE | 纯列存表 | NULL |
| test7 | db600 | t7 | 500061 | NULL | USER TABLE | 纯列存表 | NULL |
+-------------+---------------+------------+----------+---------------+------------+-----------------+-----------------+
3 rows in set (3.48 sec)
sql3_查询行列冗余的表/索引
SELECT
tenant_name,
database_name,
table_name,
table_id,
data_table_id,
table_type,
storage_type,
main_table_name
FROM (
select
t1.tenant_name,
t2.database_name,
case
when t2.table_type = 'user table' then t2.table_name
when t2.table_type = 'index' then t2.index_name
end as table_name,
t2.table_id,
t2.data_table_id,
t2.table_type,
case
when t2.table_type = 'user table' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存表'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合表'
else '纯行存表'
end
when t2.table_type = 'index' then
case
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) = 0 then '纯列存索引'
when sum(case when cg.column_group_type = 3 then 1 else 0 end) > 0
and sum(case when cg.column_group_type = 1 then 1 else 0 end) > 0 then '行列混合索引'
else '纯行存索引'
end
end as storage_type,
coalesce(t3.table_name, null) as main_table_name
from
__all_virtual_column_group cg
join
dba_ob_tenants t1 on cg.tenant_id = t1.tenant_id
join
cdb_ob_table_locations t2 on cg.tenant_id = t2.tenant_id and cg.table_id = t2.table_id
left join
cdb_ob_table_locations t3 on t2.data_table_id = t3.table_id
where
t1.tenant_name in ('test1','test7')
and t2.database_name in ('row_column_db','db100','db600')
and t2.table_type in ('user table', 'index')
group by
t1.tenant_name, t2.database_name, t2.table_name, t2.table_id, t2.data_table_id, t2.table_type, main_table_name
) subquery
WHERE
storage_type IN ('行列混合表', '行列混合索引')
ORDER BY tenant_name, database_name, table_name;
sql3_查询结果
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| tenant_name | database_name | table_name | table_id | data_table_id | table_type | storage_type | main_table_name |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
| test7 | db100 | idx_t8_c2 | 500071 | 500070 | INDEX | 行列混合索引 | t8 |
| test7 | db100 | t8 | 500070 | NULL | USER TABLE | 行列混合表 | NULL |
| test7 | db600 | idx_t5_c2 | 500046 | 500045 | INDEX | 行列混合索引 | t5 |
+-------------+---------------+------------+----------+---------------+------------+--------------------+-----------------+
3 rows in set (3.33 sec)