Q2 电商订单数据分析优化
场景描述
某电商平台有3张核心表:
-- 用户表(1亿条数据)
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(50),
reg_date DATE
);
-- 订单表(10亿条数据)
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
user_id INT,
order_date DATE,
status VARCHAR(20)
;
-- 订单明细表(100亿条数据)
CREATE TABLE order_details (
order_detail_id BIGINT PRIMARY KEY,
order_id BIGINT,
product_id INT,
quantity INT,
price DECIMAL(10,2)
);
原始查询需求
获取2023年注册用户的:
-
总订单金额
-
最近一次下单日期
-
订单完成率(completed订单数/总订单数)
现有低效查询:
SELECT
u.user_id,
(SELECT SUM(od.price)
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE o.user_id = u.user_id) AS total_amount,
(SELECT MAX(order_date)
FROM orders
WHERE user_id = u.user_id) AS last_order_date,
(SELECT COUNT(*) FILTER (WHERE status = 'completed')::FLOAT / COUNT(*)
FROM orders
WHERE user_id = u.user_id) AS completion_rate
FROM users u
WHERE u.reg_date BETWEEN '2023-01-01' AND '2023-12-31';
题目要求
-
分析原始查询的性能瓶颈
-
重写优化后的SQL查询
-
建议需要创建的索引
-
解释优化思路
参考答案
优化后查询
WITH order_aggs AS (
SELECT
o.user_id,
SUM(od.price) AS total_amount,
MAX(o.order_date) AS last_order_date,
COUNT(*) FILTER (WHERE o.status = 'completed')::FLOAT / COUNT(*) AS completion_rate
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
GROUP BY o.user_id
)
SELECT
u.user_id,
oa.total_amount,
oa.last_order_date,
oa.completion_rate
FROM users u
LEFT JOIN order_aggs oa ON u.user_id = oa.user_id
WHERE u.reg_date BETWEEN '2023-01-01' AND '2023-12-31';
建议索引
CREATE INDEX idx_users_regdate ON users(reg_date);
CREATE INDEX idx_orders_userid ON orders(user_id);
CREATE INDEX idx_orders_userid_status ON orders(user_id, status);
CREATE INDEX idx_orderdetails_orderid ON order_details(order_id);
优化思路
-
消除重复子查询:将三个关联子查询合并为一次CTE聚合查询,减少orders表的扫描次数
-
JOIN优化:
-
使用LEFT JOIN替代相关子查询
-
先对orders和order_details进行JOIN后再聚合
-
-
索引优化:
-
users.reg_date索引快速定位2023年用户
-
orders.user_id索引加速用户订单关联
-
复合索引orders(user_id, status)覆盖状态过滤
-
order_details.order_id索引加速订单明细关联
-
-
执行计划优化:
-
避免Nested Loop导致的大量重复扫描
-
减少临时中间表的生成
-
充分利用Hash Join和GroupAggregate
-
-
计算优化:
-
使用一次COUNT(*)配合FILTER条件,避免多次扫描订单表
-
使用CTE物化中间结果集
-
性能对比
优化项 | 原始查询 | 优化后查询 |
orders表扫描次数 | 3次 | 1次 |
order_details扫描次数 | 3次 | 1次 |
JOIN类型 | Nested Loop | Hash Join |
临时表生成 | 多次 | 1次CTE |
该优化可使执行时间从小时级降低到分钟级,特别是在海量数据场景下效果显著。