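OceanBase 4.3.3 Feature Walkthrough: Columnstore Replicas
Starting with version 4.3.0, OceanBase supports columnar storage. Depending on business needs, users can create columnstore tables, rowstore tables, or hybrid row-column tables. Whichever table type is chosen, the tenant's replicas use the same storage mode in every zone. See the official documentation: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001429675
To achieve strict physical isolation between TP and AP resources, OceanBase 4.3.3.0 introduces a new deployment mode: a separate zone can be added to an existing cluster to hold columnstore replicas (C replicas for short). In versions 4.3.3.0 and 4.3.3.1, however, the columnstore replica feature is experimental and is not recommended for production use.
For a description of the replica types, see the official documentation: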
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001431874
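Replica type | Election voting | Log voting | SSTable | clog | MemTable | Replica type conversion |
---|---|---|---|---|---|---|
F | Yes | Yes | Yes; major SSTable is rowstore | Yes | Yes | Can be converted to an R replica |
R | No | No | Yes; major SSTable is rowstore | Yes | Yes | Can be converted to an F replica |
C | No | No | Yes; major SSTable is columnstore | Yes | Yes | Cannot be converted to any other replica type |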
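The table can also be restated as a small lookup structure, which makes the differences between the replica types easy to check in a script. The Python sketch below is illustrative only: the `ReplicaType` dataclass and the `can_convert` helper are hypothetical names, not part of any OceanBase client, and the values simply mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaType:
    code: str
    votes_in_election: bool
    votes_on_log: bool
    major_sstable_format: str  # "row" or "column"
    convertible_to: frozenset  # replica type codes this type can be converted to


# Values copied from the replica type table; nothing here queries a cluster.
REPLICA_TYPES = {
    "F": ReplicaType("F", True, True, "row", frozenset({"R"})),
    "R": ReplicaType("R", False, False, "row", frozenset({"F"})),
    "C": ReplicaType("C", False, False, "column", frozenset()),
}


def can_convert(src: str, dst: str) -> bool:
    """Return True if the table lists dst as a valid conversion target for src."""
    return dst in REPLICA_TYPES[src].convertible_to
```

For example, `can_convert("F", "R")` is true, while any conversion to or from `"C"` is rejected.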
Environment before creating the columnstore replica
# Cluster topology
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP | SVR_PORT | ID | ZONE | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME | MODIFY_TIME | BUILD_VERSION | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 | 12882 | 1 | zone1 | 12881 | YES | ACTIVE | 2024-11-04 10:27:09.942001 | NULL | NULL | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
| 11.xxx.xxx.191 | 22882 | 2 | zone2 | 22881 | NO | ACTIVE | 2024-11-04 10:28:31.472704 | NULL | NULL | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
| 11.xxx.xxx.192 | 32882 | 3 | zone3 | 32881 | NO | ACTIVE | 2024-11-04 10:29:29.111769 | NULL | NULL | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
3 rows in set (0.01 sec)
# Simulate an existing tenant
create resource unit u1 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;
create resource pool p1_1 unit='u1',zone_list=('zone1'),unit_num=1;
create resource pool p1_2 unit='u1',zone_list=('zone2'),unit_num=1;
create resource pool p1_3 unit='u1',zone_list=('zone3'),unit_num=1;
create tenant test1 resource_pool_list=('p1_1','p1_2','p1_3'),
primary_zone='zone1,zone2,zone3',locality='F@zone1, F@zone2, F@zone3',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';
mysql -h127.0.0.1 -P12881 -uroot@test1 -p -A
alter user root identified by 'xxx';
Add zone4 for the columnstore replica
See the obd cluster scale-out guide: https://www.oceanbase.com/docs/community-obd-cn-1000000001477803
oceanbase-ce:
  servers:
    - name: server4
      ip: 11.xxx.xxx.192
  server4:
    zone: zone4
    obshell_port: 45881
    mysql_port: 42881
    rpc_port: 42882
    local_ip: 11.xxx.xxx.192
    home_path: /home/heshun.lxd/observer4
    data_dir: /obdata/data/data4
    redo_dir: /obdata/log/log4
obd cluster scale_out ob433 -c ob433_scale_out_zone4.yaml -v
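In this environment zone3 and zone4 run on the same host (11.xxx.xxx.192), so the new observer must use ports that the existing one does not. The following Python sketch shows one way to check this before running the scale-out. It is a hypothetical helper, not an obd feature. The existing ports are taken from the dba_ob_servers output earlier (SVR_PORT as `rpc_port`, SQL_PORT as `mysql_port`); the obshell ports of the existing servers are not shown in this article, so they are left out.

```python
# Existing observers, taken from the dba_ob_servers output.
existing_servers = [
    {"ip": "11.xxx.xxx.191", "rpc_port": 12882, "mysql_port": 12881},
    {"ip": "11.xxx.xxx.191", "rpc_port": 22882, "mysql_port": 22881},
    {"ip": "11.xxx.xxx.192", "rpc_port": 32882, "mysql_port": 32881},
]

# The server described in the scale-out YAML file.
new_server = {
    "ip": "11.xxx.xxx.192",
    "obshell_port": 45881,
    "mysql_port": 42881,
    "rpc_port": 42882,
}


def ports_of(server):
    """Collect every value whose key ends in '_port'."""
    return {value for key, value in server.items() if key.endswith("_port")}


def find_port_conflicts(existing, new):
    """Return the sorted ports of `new` already used by a server on the same IP."""
    used = set()
    for server in existing:
        if server["ip"] == new["ip"]:
            used |= ports_of(server)
    return sorted(used & ports_of(new))


conflicts = find_port_conflicts(existing_servers, new_server)
print(conflicts)  # an empty list means no conflict was found
```

The same idea extends to `home_path`, `data_dir`, and `redo_dir`, which also need to differ between observers that share a host.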
Cluster topology after the scale-out
MySQL [oceanbase]> select * from dba_ob_servers order by zone;
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| SVR_IP | SVR_PORT | ID | ZONE | SQL_PORT | WITH_ROOTSERVER | STATUS | START_SERVICE_TIME | STOP_TIME | BLOCK_MIGRATE_IN_TIME | CREATE_TIME | MODIFY_TIME | BUILD_VERSION | LAST_OFFLINE_TIME |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
| 11.xxx.xxx.191 | 12882 | 1 | zone1 | 12881 | YES | ACTIVE | 2024-11-04 10:27:09.942001 | NULL | NULL | 2024-10-22 20:07:13.974171 | 2024-11-04 10:27:22.872264 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
| 11.xxx.xxx.191 | 22882 | 2 | zone2 | 22881 | NO | ACTIVE | 2024-11-04 10:28:31.472704 | NULL | NULL | 2024-10-22 20:07:13.986746 | 2024-11-04 10:28:31.882765 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
| 11.xxx.xxx.192 | 32882 | 3 | zone3 | 32881 | NO | ACTIVE | 2024-11-04 10:29:29.111769 | NULL | NULL | 2024-10-22 20:07:13.995302 | 2024-11-04 10:29:30.161822 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
| 11.xxx.xxx.192 | 42882 | 4 | zone4 | 42881 | NO | ACTIVE | 2024-11-04 11:48:24.538274 | NULL | NULL | 2024-11-04 11:09:44.030541 | 2024-11-04 11:48:26.306543 | 4.3.3.1_101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43(Oct 22 2024 17:46:45) | NULL |
+----------------+----------+----+-------+----------+-----------------+--------+----------------------------+-----------+-----------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------+-------------------+
4 rows in set (0.00 sec)
Add a columnstore replica to an existing tenant
1. Replica distribution of the tenant before the change
MySQL [oceanbase]> select tenant_id,tenant_name,primary_zone,locality from dba_ob_tenants where tenant_type='user';
+-----------+-------------+-------------------+---------------------------------------------+
| tenant_id | tenant_name | primary_zone | locality |
+-----------+-------------+-------------------+---------------------------------------------+
| 1010 | test1 | zone1,zone2,zone3 | FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3 |
+-----------+-------------+-------------------+---------------------------------------------+
1 row in set (0.03 sec)
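2. Before adding the replica, check whether the tenant already has a resource pool in the target zone, and note the name of the tenant's resource pool in each zone.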
MySQL [oceanbase]> select * from dba_ob_resource_pools where tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1');
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| RESOURCE_POOL_ID | NAME | TENANT_ID | CREATE_TIME | MODIFY_TIME | UNIT_COUNT | UNIT_CONFIG_ID | ZONE_LIST | REPLICA_TYPE |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
| 1008 | p1_1 | 1010 | 2024-11-04 11:01:36.377693 | 2024-11-04 11:02:00.918615 | 1 | 1004 | zone1 | FULL |
| 1009 | p1_2 | 1010 | 2024-11-04 11:01:36.395700 | 2024-11-04 11:02:01.221993 | 1 | 1004 | zone2 | FULL |
| 1010 | p1_3 | 1010 | 2024-11-04 11:01:36.410597 | 2024-11-04 11:02:01.224139 | 1 | 1004 | zone3 | FULL |
+------------------+------+-----------+----------------------------+----------------------------+------------+----------------+-----------+--------------+
3 rows in set (0.02 sec)
3. Identify the unit config used by each resource pool by matching it against the unit_config_id column of dba_ob_resource_pools.
MySQL [oceanbase]> select * from dba_ob_unit_configs;
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
| UNIT_CONFIG_ID | NAME | CREATE_TIME | MODIFY_TIME | MAX_CPU | MIN_CPU | MEMORY_SIZE | LOG_DISK_SIZE | DATA_DISK_SIZE | MAX_IOPS | MIN_IOPS | IOPS_WEIGHT | MAX_NET_BANDWIDTH | NET_BANDWIDTH_WEIGHT |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
| 1 | sys_unit_config | 2024-10-22 20:07:12.701353 | 2024-10-22 20:07:12.701353 | 2 | 2 | 2147483648 | 3221225472 | NULL | 9223372036854775807 | 9223372036854775807 | 2 | 9223372036854775807 | 2 |
| 1004 | u1 | 2024-11-04 11:01:30.256177 | 2024-11-04 11:01:30.256177 | 3 | 3 | 4294967296 | 12884901888 | NULL | 10000 | 10000 | 0 | 9223372036854775807 | 3 |
+----------------+-----------------+----------------------------+----------------------------+---------+---------+-------------+---------------+----------------+---------------------+---------------------+-------------+---------------------+----------------------+
2 rows in set (0.01 sec)
4. Create a resource pool in zone4 for tenant test1
create resource pool p1_4 unit='u1' ,unit_num=1,zone_list=('zone4');
5. Modify the resource_pool_list of tenant test1
alter tenant test1 resource_pool_list=('p1_1','p1_2','p1_3','p1_4');
6. Modify the locality of tenant test1
alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4';
7. Check the status of the locality change for tenant test1
select * from dba_ob_tenant_jobs
where job_type='alter_tenant_locality'
and tenant_id=(select tenant_id from dba_ob_tenants where tenant_name='test1')
order by start_time desc limit 1 \G
*************************** 1. row ***************************
JOB_ID: 2
JOB_TYPE: ALTER_TENANT_LOCALITY
JOB_STATUS: SUCCESS
RESULT_CODE: 0
PROGRESS: 100
START_TIME: 2024-11-04 12:01:55.851907
MODIFY_TIME: 2024-11-04 12:02:26.819124
TENANT_ID: 1010
SQL_TEXT: alter tenant test1 locality='f@zone1,f@zone2,f@zone3,c@zone4'
EXTRA_INFO: FROM: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3', TO: 'FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4'
RS_SVR_IP: 11.xxx.xxx.191
RS_SVR_PORT: 12882
1 row in set (0.02 sec)
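Steps 4 to 6 follow a fixed pattern, so they can be generated for other tenants. The Python sketch below only builds the SQL text and does not connect to a cluster. The function name `add_columnstore_replica_sql` and its parameters are hypothetical, and it assumes every existing zone holds an F replica, as in this article. It does no quoting or validation, so it should only be fed trusted names.

```python
def add_columnstore_replica_sql(tenant, unit, existing_pools, full_zones,
                                new_pool, new_zone):
    """Build the three statements used in steps 4 to 6.

    existing_pools: resource pools the tenant already has (from step 2)
    full_zones:     zones that currently hold an F replica
    """
    pool_list = ",".join(f"'{p}'" for p in existing_pools + [new_pool])
    # Replica type letters are written in upper case here; the article uses
    # both 'f@zone1' and 'F@zone1' forms.
    locality = ",".join([f"F@{z}" for z in full_zones] + [f"C@{new_zone}"])
    return [
        f"create resource pool {new_pool} unit='{unit}',unit_num=1,"
        f"zone_list=('{new_zone}');",
        f"alter tenant {tenant} resource_pool_list=({pool_list});",
        f"alter tenant {tenant} locality='{locality}';",
    ]


statements = add_columnstore_replica_sql(
    tenant="test1",
    unit="u1",
    existing_pools=["p1_1", "p1_2", "p1_3"],
    full_zones=["zone1", "zone2", "zone3"],
    new_pool="p1_4",
    new_zone="zone4",
)
for statement in statements:
    print(statement)
```

The statements are meant to be run in order from the sys tenant, followed by the dba_ob_tenant_jobs query from step 7 to confirm that the locality change finished with JOB_STATUS SUCCESS.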
Create a columnstore replica when creating a new tenant
create resource unit u2 min_cpu=3,max_cpu=3,memory_size='4g',log_disk_size='12g',max_iops=10000;
create resource pool p2_1 unit='u2',zone_list=('zone1'),unit_num=1;
create resource pool p2_2 unit='u2',zone_list=('zone2'),unit_num=1;
create resource pool p2_3 unit='u2',zone_list=('zone3'),unit_num=1;
create resource pool p2_4 unit='u2',zone_list=('zone4'),unit_num=1;
create tenant test2
resource_pool_list=('p2_1','p2_2','p2_3','p2_4'),
primary_zone='zone1,zone2,zone3;zone4',
locality='F@zone1, F@zone2, F@zone3, C@zone4',
charset=utf8mb4,collate=utf8mb4_bin
set ob_tcp_invited_nodes='%';
mysql -h127.0.0.1 -P12881 -uroot@test2 -p -A
alter user root identified by 'xxx';
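Configure obproxy
Log in to the corresponding obproxy as root@proxysys.
Dedicated obproxy
Create a separate obproxy for the columnstore replica, log in to it, and apply the following configuration: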
alter proxyconfig set obproxy_read_consistency='1';
alter proxyconfig set init_sql = 'set @@ob_route_policy="COLUMN_STORE_ONLY";';
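Shared obproxy
If no dedicated machine resources are available for the columnstore replica and an existing obproxy environment has to be reused, obproxy multi-level configuration can be used instead. For details on obproxy multi-level configuration, see the official documentation: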
https://www.oceanbase.com/docs/common-odp-doc-cn-1000000001409917
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test1', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'obproxy_read_consistency', 1, 'LEVEL_TENANT');
replace into proxy_config(cluster_name, tenant_name, name, value, config_level) values ('obcluster', 'test2', 'init_sql', 'set @@ob_route_policy="COLUMN_STORE_ONLY";', 'LEVEL_TENANT');
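The tenant-level statements above differ only in the tenant name, so they can be generated for any list of tenants. The Python sketch below reproduces the same statement text. The function name `tenant_proxy_config_sql` is hypothetical, and the cluster name `obcluster` is the one used in this article. It only builds strings and performs no escaping.

```python
# Each setting is (name, SQL literal for the value), matching the article:
# the read consistency value is an unquoted 1, the init SQL is a quoted string.
TENANT_SETTINGS = [
    ("obproxy_read_consistency", "1"),
    ("init_sql", "'set @@ob_route_policy=\"COLUMN_STORE_ONLY\";'"),
]


def tenant_proxy_config_sql(cluster, tenants):
    """Build one 'replace into proxy_config' statement per tenant and setting."""
    statements = []
    for tenant in tenants:
        for name, value_literal in TENANT_SETTINGS:
            statements.append(
                "replace into proxy_config"
                "(cluster_name, tenant_name, name, value, config_level) "
                f"values ('{cluster}', '{tenant}', '{name}', "
                f"{value_literal}, 'LEVEL_TENANT');"
            )
    return statements


statements = tenant_proxy_config_sql("obcluster", ["test1", "test2"])
for statement in statements:
    print(statement)
```

Because the settings are scoped with LEVEL_TENANT, other tenants served by the same obproxy keep their existing routing behavior.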
Test access to the columnstore replica
Log in through the obproxy configured above and run the tests.
# sys tenant
MySQL [oceanbase]> select zone,tenant_id,name,value,default_value from gv$ob_parameters where tenant_id=1010 and name='default_table_store_format';
+-------+-----------+----------------------------+-------+---------------+
| zone | tenant_id | name | value | default_value |
+-------+-----------+----------------------------+-------+---------------+
| zone1 | 1010 | default_table_store_format | row | row |
| zone4 | 1010 | default_table_store_format | row | row |
| zone3 | 1010 | default_table_store_format | row | row |
| zone2 | 1010 | default_table_store_format | row | row |
+-------+-----------+----------------------------+-------+---------------+
4 rows in set (0.03 sec)
# test1 tenant
MySQL [test]> show create table t1 \G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) DEFAULT NULL
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_bin ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
partition by hash(id)
(partition `p0`,
partition `p1`,
partition `p2`)
1 row in set (0.01 sec)
MySQL [test]> explain select * from t1;
+----------------------------------------------------------------------+
| Query Plan |
+----------------------------------------------------------------------+
| ================================================================ |
| |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
| ---------------------------------------------------------------- |
| |0 |PX COORDINATOR | |1 |7 | |
| |1 |└─EXCHANGE OUT DISTR |:EX10000|1 |7 | |
| |2 | └─PX PARTITION ITERATOR | |1 |7 | |
| |3 | └─COLUMN TABLE FULL SCAN|t1 |1 |7 | |
| ================================================================ |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16 |
| 1 - output([INTERNAL_FUNCTION(t1.id)]), filter(nil), rowset=16 |
| dop=1 |
| 2 - output([t1.id]), filter(nil), rowset=16 |
| force partition granule |
| 3 - output([t1.id]), filter(nil), rowset=16 |
| access([t1.id]), partitions(p[0-2]) |
| is_index_back=false, is_global_index=false, |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+----------------------------------------------------------------------+
19 rows in set (0.01 sec)
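- The table definition has no with column group clause and default_table_store_format is the default row, yet the execution plan shows COLUMN TABLE FULL SCAN, which indicates that the columnstore range scan was used.
- The test table t1 belongs to tenant test1, whose topology is 3F-1C, that is, 4 replicas. However, replica_num is 3 in the output of both show create table and show create tenant, because it counts only the full-featured replicas.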
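The replica_num observation can be reproduced from the locality string alone. The Python sketch below parses the expanded locality format shown in the EXTRA_INFO field of the locality job (for example `FULL{1}@zone1`) and counts replicas per type. The helper names `parse_locality` and `count_by_type` are hypothetical, and the regular expression covers only the format seen in this article.

```python
import re
from collections import Counter

# One locality item such as "FULL{1}@zone1" or "COLUMNSTORE{1}@zone4".
LOCALITY_ITEM = re.compile(r"\s*([A-Z_]+)\{(\d+)\}@(\w+)\s*")


def parse_locality(locality):
    """Split a locality string into (zone, replica_type, count) tuples."""
    replicas = []
    for item in locality.split(","):
        match = LOCALITY_ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized locality item: {item!r}")
        replica_type, count, zone = match.group(1), int(match.group(2)), match.group(3)
        replicas.append((zone, replica_type, count))
    return replicas


def count_by_type(locality):
    """Sum the replica counts for each replica type."""
    counts = Counter()
    for _zone, replica_type, count in parse_locality(locality):
        counts[replica_type] += count
    return counts


locality = "FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3, COLUMNSTORE{1}@zone4"
counts = count_by_type(locality)
print(counts["FULL"], sum(counts.values()))
```

For the 3F-1C tenant this yields 3 full-featured replicas out of 4 in total, which matches the REPLICA_NUM = 3 shown by show create table.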
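Notes
1. observer 4.3.3.0 or later is required.
2. ocp 4.3.3 or later is required (ocp 4.3.3 had not been released at the time of writing).
3. obd 2.10.1-1 or later is required.
4. obproxy 4.3.2 or later is required.
5. Deploying two or more columnstore replicas is not recommended.
6. Full-featured and read-only replicas cannot be converted to columnstore replicas, and columnstore replicas cannot be converted to full-featured or read-only replicas.
7. Physical restore does not support restoring columnstore replicas.
8. If the primary database has no columnstore replica deployed, deploying one on the standby database is not recommended either.
9. A columnstore table is a table whose partition Leader and Followers all use a columnstore schema, and queries against it can be strong reads.
A columnstore replica keeps the schema of the partition Leader and Followers in rowstore format, while the read-only Learner replica is in columnstore format, and OLAP queries against it can only be weak reads.
For more details, see the official documentation:
Columnstore replicas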
https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000001428590