Hive SQL业务场景:连续5天涨幅超过5%股票
一、需求描述
现有一张股票价格表 dwd_stock_trade_dtl 有3个字段分别是:
股票代码(stock_code),
日期(trade_date),
收盘价格(closing_price) 。
请找出满足连续5天以上(含)每天上涨超过5%的股票,并给出连续满足天数及开始和结束日期。
备注:不考虑停牌或其他情况,仅仅关注每天连续5天上涨超过5%的股票。
二、数据准备
1、建Hive表
DROP TABLE IF EXISTS dwd_stock_trade_dtl;
CREATE TABLE IF NOT EXISTS dwd_stock_trade_dtl (
stock_code STRING,
trade_date DATE,
closing_price DECIMAL(10,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
2、插入测试数据
--样例数据插入
INSERT INTO TABLE dwd_stock_trade_dtl
VALUES
('AAPP', '2024-02-26', 100.00),
('RAAB', '2024-02-27', 105.00),
('RAAB', '2024-02-28', 110.25),
('RAAB', '2024-03-01', 115.78),
('RAAB', '2024-03-02', 121.59),
('RAAB', '2024-03-03', 128.73),
('RAAB', '2024-03-04', 137.00),
('RAAB', '2024-03-05', 144.67),
('RAAB', '2024-03-06', 147.64),
('EWXN', '2024-02-26', 2000.00),
('EWXN', '2024-02-27', 2100.00),
('EWXN', '2024-02-28', 2205.00),
('EWXN', '2024-03-01', 2313.25),
('EWXN', '2024-03-02', 2431.01),
('EWXN', '2024-03-03', 2547.56),
('EWXN', '2024-03-04', 2680.19),
('EWXN', '2024-03-05', 2814.20),
('EWXN', '2024-03-06', 2955.91);
三、需求分析
用lag函数列出前一天的交易价格
算出每日涨幅:(今天交易价格 / 前一天交易价格)- 1
判断是否满足涨幅大于5%,满足打个flag
用row_number 函数算出连续
最后用min,max,count 函数求出连续上涨的最小日期,最大日期和天数
四、需求实现
1、用lag函数列出前一天的交易价格并算出每日涨幅:(今天交易价格 / 前一天交易价格)- 1
SELECT stock_code,
trade_date,
closing_price, -- -- 交易价格
LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) as a_closing_price, -- 前一天交易价格
(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) ) - 1 AS daily_return -- 涨幅
FROM dwd_stock_trade_dtl;
2、判断是否满足涨幅大于5%,满足标记1,不满足0
SELECT stock_code,
trade_date,
closing_price, -- -- 交易价格
LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) as a_closing_price, -- 前一天交易价格
(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) ) - 1 AS daily_return, -- 涨幅
if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1,
0) AS flag
FROM dwd_stock_trade_dtl
3、用row_number 函数算出每只股票日期排序,和每只股票是否满足条件下的日期排序,然后相减,相同则满足连续
SELECT stock_code,
trade_date,
flag,
row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) AS a_rn,
row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS flag_rn,
row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS diff_rn
FROM (
SELECT stock_code,
trade_date,
if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1, 0) AS flag
FROM dwd_stock_trade_dtl
) a
ORDER BY stock_code, trade_date
4、最后,求出连续涨幅超过5%的开始日期,结束日期,天数。
SELECT stock_code,
min(trade_date) AS min_trade_date, -- 开始日期
max(trade_date) AS max_trade_date, -- 结束日期
count(1) AS day_cnt -- 天数
FROM (
SELECT stock_code,
trade_date,
flag,
row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) AS a_rn,
row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS flag_rn,
row_number() OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - row_number() OVER (PARTITION BY stock_code, flag ORDER BY trade_date ASC) AS diff_rn
FROM (
SELECT stock_code,
trade_date,
if(closing_price / LAG(closing_price) OVER (PARTITION BY stock_code ORDER BY trade_date ASC) - 1 >= 0.05, 1, 0) AS flag
FROM dwd_stock_trade_dtl
) a
) b
WHERE flag = 1
GROUP BY stock_code,
diff_rn
HAVING count(1) >= 5