当前位置: 首页 > article >正文

MYSQL PARTITIONING分区操作和性能测试

PARTITION OR NOT PARTITION IN MYSQl

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to RICK JAMES: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
let’s find out what’s going on by building a test case

TWO TABLES READY

How many partitions? views from Rick James: Have 20-50 partitions; no more.
In this page, we do 10 partitions
Remember: Always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition 
(id INT NOT NULL AUTO_INCREMENT,
 x FLOAT,
 y FLOAT,
 z FLOAT,
 created_time DATETIME,
 PRIMARY KEY(id, created_time))
PARTITION BY RANGE( YEAR(created_time) ) (
	PARTITION p16 VALUES less than (2016),
	PARTITION p17 VALUES less than (2017),
    PARTITION p18 VALUES less than (2018),
	PARTITION p19 VALUES less than (2019),
    PARTITION p20 VALUES less than (2020),
	PARTITION p21 VALUES less than (2021),
    PARTITION p22 VALUES less than (2022),
	PARTITION p23 VALUES less than (2023),
    PARTITION p24 VALUES less than (2024),
	PARTITION p25 VALUES less than (2025)
) ;

  1. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,
 x FLOAT,
 y FLOAT,
 z FLOAT,
 created_time DATETIME,
 PRIMARY KEY(id, created_time));

Create millions of rows

For test case, each table holds 10 millions of rows
If using mysql to insert, example 2 is better than example 1

-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2");

Add large data with tools

from faker import Faker
import random

def insert_large_data(nums=10):
    fake = Faker()

    data = [(random.random(), random.random(), random.random(),
             str(fake.date_time_between(start_date='-10y', end_date='now'))) for i in range(nums)]
             
    cursor = connection.cursor()

    sql = f"INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"

    # execute sql with your idea tool

DB-status

partition table take extra files to preserve data, also, extra disk space
请添加图片描述
partition table
请添加图片描述

TEST RESULTS WITHOUT EXTRA INDEX(created_time)

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where

FROM:mysqlslap

# partition_table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.156 seconds
	Minimum number of seconds to run all queries: 0.156 seconds
	Maximum number of seconds to run all queries: 0.156 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full_table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.172 seconds
	Minimum number of seconds to run all queries: 0.172 seconds
	Maximum number of seconds to run all queries: 0.172 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

In general, it is expected that fewer touched rows would result in less time for query execution.
since this query only required limit rows under condition without order, mysql optimizer is doing a good job here.
the worse case for the full table is that do a full table scan, but to get just 100 target rows from random data, much less time is needed.

however, if we put a order by in where clause, things will be a huge different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where; Using filesort
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where; Using filesort

FROM:mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 4.931 seconds
	Minimum number of seconds to run all queries: 4.931 seconds
	Maximum number of seconds to run all queries: 4.931 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 54.652 seconds
	Minimum number of seconds to run all queries: 54.652 seconds
	Maximum number of seconds to run all queries: 54.652 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

A huge time gap between two queries.
what’ going on?
under condition of “order by”
a full table needs a full table-field sort, that’s cost a lot,
a partition table only need to sort a partition after located target partition.
we always say: test your real case, by this way, you find your circumstance to do a partition table.

WHY:In most circumstances, you’re better off using indexes instead of partitioning

the test are not done yet
From mysql explain, the extra field print a message: “Using filesort”
normally, you should considering a index here to improve performance: MYSQL: explain-extra-information

let’s add a index

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition; Using MRR

FROM: mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.168 seconds
	Minimum number of seconds to run all queries: 0.168 seconds
	Maximum number of seconds to run all queries: 0.168 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.368 seconds
	Minimum number of seconds to run all queries: 0.368 seconds
	Maximum number of seconds to run all queries: 0.368 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

again: In general, it is expected that fewer touched rows would result in less time for query execution.
new queries cost a little more time than without extra index.
what happens? explain shows “condition index” are being used here.
stop here, it’s not how indexes are introduced.
sometimes, index is not help if the goal was retrieve 100 target rows. the worst case, yes, but not all.

let’s put a “order by” to see the magic

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition

FROM: mysqlslap

# partition table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.162 seconds
	Minimum number of seconds to run all queries: 0.162 seconds
	Maximum number of seconds to run all queries: 0.162 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10
# full table
Benchmark
	Running for engine innodb
	Average number of seconds to run all queries: 0.185 seconds
	Minimum number of seconds to run all queries: 0.185 seconds
	Maximum number of seconds to run all queries: 0.185 seconds
	Number of clients running queries: 10
	Average number of queries per client: 10

same touched rows as no “order by”.
but the time cost of queries are getting really closed.
makes sense “In this circumstance, you’re better off using indexes instead of partitioning”.
after all, there are different types of queries were influenced and Maintenance of PARTITION is also a big thing.
For example: select count() is much slower for partition tables. unless doing a partition count()

more tests?
let’s stop here

table vs (better view)

key/typepartitionnormalpartition+ordernormal+orderpartition+indexnormal+indexpartition+order+indexnormal+order+index
diskspace~590m~540m~590m~540m~750m~700m~750m~700m
mysqlslap-benchmark0.156s0.172s4.931s54.652s0.168s0.368s0.162s0.185s
mysql-explain-touched-rows9116259747207911625974720745581226417844558122641784
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

POINTS BASED ON TEST(mysqlslap & mysql workbench)

  1. Index works good without partitioning, most of cases even better
  2. Under condition of range query by partition field, partitioning tables works good indeed
  3. drop partitions is much more efficient when doing a big delete
  4. if queries use specific partition, performance will better

Other Points Related & documents & Links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning without million rows, Only BY RANGE provides any performance…
  3. index order(DESC or ASC) is also important
  4. mysqlslap–benchmark tool
  5. questions about partition

http://www.kler.cn/a/427437.html

相关文章:

  • 志愿服务管理系统设计与实现
  • 网络安全基本原则
  • 原型模式(Prototype Pattern)——对象克隆、深克隆与浅克隆及适用场景
  • 排序算法入门:分类与基本概念详解
  • 单链表---回文结构
  • 静态路由与交换机配置实验
  • springboot的 nacos 配置获取不到导致启动失败及日志不输出问题
  • Java实现三种排序方式
  • 微信小程序px和rpx单位互转方法
  • 【JavaEE】多线程(5)
  • 爆肝Android JNI - 延展Android蓝牙JNI学习
  • HTTPS的工作过程
  • MySQL Group Replication
  • 【GESP】C++一级练习 luogu-P1035, [NOIP2002 普及组] 级数求和
  • 【opencv入门教程】9.视频加载
  • SecrureCRT设置每行的长度:
  • MySQL数据库(4)-基础->高阶查询
  • 乾元通渠道商中标福州市人防信息化建设项目
  • 魔改版kali分享(新增50多种渗透工具)
  • docker学习笔记(四)--DockerFile