Doris通过时间字段,按照周分组统计的sql
通过时间字段按周分组统计的 SQL 写法需结合日期函数和分组逻辑,以下是不同场景下的实现方案及注意事项:
数据初始化参考:https://doris.apache.org/zh-CN/docs/benchmark/ssb
一、基础分组方法
1. 使用 WEEK()
函数
SELECT
YEAR(LO_ORDERDATE) AS year,
WEEK(LO_ORDERDATE, 1) AS week_num,
COUNT(*) AS record_count
FROM
lineorder_flat
where
YEAR(LO_ORDERDATE) = '1998'
GROUP BY
year,
week_num
ORDER BY
year,
week_num;
- 说明:
WEEK(date, mode)
的mode
参数定义周起始日(如mode=1
表示周从周一开始,mode=0
为周日开始)。 - 适用场景:简单按周统计记录数,需结合年份避免跨年周混淆。
2. 使用 DATE_FORMAT()
函数
SELECT
DATE_FORMAT(LO_ORDERDATE, '%Y%u') AS year_week,
COUNT(*) AS total_cnt
FROM
lineorder_flat
where
YEAR(LO_ORDERDATE) = '1998'
GROUP BY
year_week
order by
year_week;
- 说明:
%Y%u
格式将日期转换为“年份+周数”(如202413
表示2024年第13周)。
二、进阶场景处理
1. 显示周起始日期
若需展示每周的起始日期,需结合日期计算:
SELECT
MIN(LO_ORDERDATE) AS week_start,
MAX(LO_ORDERDATE) AS week_end,
COUNT(*) AS order_count
FROM
lineorder_flat
where
YEAR(LO_ORDERDATE) = '1998'
GROUP BY
WEEK(LO_ORDERDATE, 1),
YEAR(LO_ORDERDATE)
order by
week_start;
- 说明:通过
MIN/MAX
函数动态计算每周起止日期。
2. 处理跨年周
使用 YEARWEEK()
函数自动处理跨年周:
SELECT
YEARWEEK(LO_ORDERDATE, 1) AS year_week,
COUNT(*) AS total_sales
FROM
lineorder_flat
GROUP BY
year_week
order by
year_week;
- 说明:
YEARWEEK
返回年份和周数的组合(如202413
),避免跨年周的分组错误。
三、性能优化建议
-
索引优化
为时间字段添加索引以加速分组查询:ALTER TABLE lineorder_flatADD INDEX idx_date (LO_COMMITDATE);
-
分区表
对按周频繁查询的大表进行分区:
CREATE TABLE `lineorder_flat` (
`LO_ORDERDATE` int NOT NULL,
`LO_ORDERKEY` int NOT NULL,
`LO_LINENUMBER` tinyint NOT NULL,
`LO_CUSTKEY` int NOT NULL,
`LO_PARTKEY` int NOT NULL,
`LO_SUPPKEY` int NOT NULL,
`LO_ORDERPRIORITY` varchar(100) NOT NULL,
`LO_SHIPPRIORITY` tinyint NOT NULL,
`LO_QUANTITY` tinyint NOT NULL,
`LO_EXTENDEDPRICE` int NOT NULL,
`LO_ORDTOTALPRICE` int NOT NULL,
`LO_DISCOUNT` tinyint NOT NULL,
`LO_REVENUE` int NOT NULL,
`LO_SUPPLYCOST` int NOT NULL,
`LO_TAX` tinyint NOT NULL,
`LO_COMMITDATE` date NOT NULL,
`LO_SHIPMODE` varchar(100) NOT NULL,
`C_NAME` varchar(100) NOT NULL,
`C_ADDRESS` varchar(100) NOT NULL,
`C_CITY` varchar(100) NOT NULL,
`C_NATION` varchar(100) NOT NULL,
`C_REGION` varchar(100) NOT NULL,
`C_PHONE` varchar(100) NOT NULL,
`C_MKTSEGMENT` varchar(100) NOT NULL,
`S_NAME` varchar(100) NOT NULL,
`S_ADDRESS` varchar(100) NOT NULL,
`S_CITY` varchar(100) NOT NULL,
`S_NATION` varchar(100) NOT NULL,
`S_REGION` varchar(100) NOT NULL,
`S_PHONE` varchar(100) NOT NULL,
`P_NAME` varchar(100) NOT NULL,
`P_MFGR` varchar(100) NOT NULL,
`P_CATEGORY` varchar(100) NOT NULL,
`P_BRAND` varchar(100) NOT NULL,
`P_COLOR` varchar(100) NOT NULL,
`P_TYPE` varchar(100) NOT NULL,
`P_SIZE` tinyint NOT NULL,
`P_CONTAINER` varchar(100) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
COMMENT 'OLAP'
PARTITION BY RANGE(`LO_ORDERDATE`)
(PARTITION p1992 VALUES [("-2147483648"), ("19930101")),
PARTITION p1993 VALUES [("19930101"), ("19940101")),
PARTITION p1994 VALUES [("19940101"), ("19950101")),
PARTITION p1995 VALUES [("19950101"), ("19960101")),
PARTITION p1996 VALUES [("19960101"), ("19970101")),
PARTITION p1997 VALUES [("19970101"), ("19980101")),
PARTITION p1998 VALUES [("19980101"), ("19990101")))
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 48
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"min_load_replica_num" = "-1",
"is_being_synced" = "false",
"colocate_with" = "groupxx1",
"storage_medium" = "hdd",
"storage_format" = "V2",
"inverted_index_storage_format" = "V1",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false",
"group_commit_interval_ms" = "10000",
"group_commit_data_bytes" = "134217728"
);
- 缓存中间结果
对高频统计使用物化视图或定期汇总表。
四、注意事项
- 时区问题:确保数据库服务器与应用时区一致,避免周数偏移。
- 周定义一致性:业务需求可能要求周从周日或周一开始,需统一
mode
参数。 - 数据清洗:检查时间字段是否包含空值或异常值(如未来日期)。
示例场景
假设统计订单表 orders
的每周销售额:
SELECT
YEARWEEK(LO_ORDERDATE, 1) AS year_week,
CONCAT(
MIN(LO_ORDERDATE),
' 至 ',
MAX(LO_ORDERDATE)
) AS date_range,
SUM(p_size) AS total_sales,
COUNT(DISTINCT p_name) AS active_users
FROM
lineorder_flat
where
YEAR(LO_ORDERDATE) = '1998'
GROUP BY
year_week
ORDER BY
year_week;
- 输出效果:
通过上述方法,可灵活应对不同业务场景下的按周统计需求,推荐优先使用 YEARWEEK
或 DATE_FORMAT
简化跨年处理。若需更直观的日期范围展示,可结合动态周起止日期计算。