SQL进阶实战技巧:如何分析浏览到下单各步骤转化率及流失用户数?
目录
0 问题描述
1 数据准备
2 问题分析
3 问题拓展
3.1 跳出率计算
3.2 计算从浏览商品到支付订单的不同路径的用户数,并按照用户数降序排列。
往期精彩
0 问题描述
统计从浏览商品到最终下单的各个步骤的用户数和流失用户数,并计算转化率
用户表结构和数据
假设我们有一个名为user_behavior_log的用户行为日志表,包含以下字段:
字段名 | 数据类型 | 描述 |
---|---|---|
user_id | INT | 用户ID |
behavior | STRING | 用户行为,例如’view’,‘add_to_cart’,‘submit_order’,‘pay_order’ |
timestamp | BIGINT | 行为发生的时间戳 |
item_id | INT | 商品ID |
用户行为数据示例:
user_id | behavior | timestamp | item_id |
---|---|---|---|
1 | view | 1678886400 | 101 |
1 | add_to_cart | 1678886460 | 101 |
2 | view | 1678886520 | 201 |
1 | submit_order | 1678886580 | 101 |
2 | view | 1678886640 | 301 |
2 | pay_order | 1678886700 | 301 |
3 | view | 1678886760 | 401 |
3 | add_to_cart | 1678886820 | 401 |
3 | view | 1678886880 | 501 |
4 | view | 1678886900 | 601 |
4 | add_to_cart | 1678886960 | 601 |
4 | submit_order | 1678887020 | 601 |
4 | pay_order | 1678887080 | 601 |
行为步骤定义:将用户从浏览商品到最终下单的步骤定义为浏览商品(view)、加入购物车(add_to_cart)、提交订单(submit_order)、支付订单(pay_order。
1 数据准备
(1)建表语句
CREATE TABLE user_behavior_log (
user_id INT,
behavior STRING,
`timestamp` BIGINT,
item_id INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
(2)插入数据
INSERT INTO user_behavior_log VALUES
(1, 'view', 1678886400, 101),
(1, 'add_to_cart', 1678886460, 101),
(2, 'view', 1678886520, 201),
(1,'submit_order', 1678886580, 101),
(2, 'view', 1678886640, 301),
(2, 'pay_order', 1678886700, 301),
(3, 'view', 1678886760, 401),
(3, 'add_to_cart', 1678886820, 401),
(3, 'view', 1678886880, 501),
(4, 'view', 1678886900, 601),
(4, 'add_to_cart', 1678886960, 601),
(4,'submit_order', 1678887020, 601),
(4, 'pay_order', 1678887080, 601);
2 问题分析
步骤1:用户行为序列构建:按照用户 ID 和时间戳对行为进行排序,构建每个用户的行为序列。
行为序列构建核心操作:
collect_list(behavior) OVER (PARTITION BY user_id ORDER BY timestamp)
具体SQL如下:
WITH user_behavior_sequence AS (
SELECT
user_id,
collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
FROM
user_behavior_log
)
SELECT
user_id,
behavior_sequence
FROM
user_behavior_sequence;
步骤2:按照用户行为顺序,构建转换漏斗矩阵
核心思路:将用户在各个阶段的行为状态用0-1进行标记,构建漏斗转换矩阵
目的:将用户在各个关键行为步骤的状态以简洁的 0 - 1 标记形式呈现出来,方便后续对用户行为漏斗转化率等指标的计算和分析。
关键操作:array_contains()函数进行判断
CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
具体SQL如下:
WITH user_behavior_sequence AS (
SELECT
user_id,
collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
FROM
user_behavior_log
),
user_funnel_matrix AS (
SELECT
user_id,
-- 判断用户是否浏览过商品
CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
-- 判断用户是否加入过购物车(无论之前是否浏览过商品)
CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
-- 判断用户是否提交过订单(无论之前是否浏览过商品或加入过购物车)
CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
-- 判断用户是否支付过订单(无论之前是否浏览过商品、加入过购物车或提交过订单)
CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
FROM
user_behavior_sequence
)
SELECT
user_id,
step_view,
step_add_to_cart,
step_submit_order,
step_pay_order
FROM
user_funnel_matrix;
步骤3:转化率和流失用户数计算
WITH user_behavior_sequence AS (
SELECT
user_id,
collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
FROM
user_behavior_log
),
user_funnel_matrix AS (
SELECT
user_id,
-- 判断用户是否浏览过商品
CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
-- 判断用户是否加入过购物车(无论之前是否浏览过商品)
CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
-- 判断用户是否提交过订单(无论之前是否浏览过商品或加入过购物车)
CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
-- 判断用户是否支付过订单(无论之前是否浏览过商品、加入过购物车或提交过订单)
CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
FROM
user_behavior_sequence
)
, funnel_step_map AS (
SELECT user_id,
max('step_view') AS step_name,
max(step_view) AS step_value
FROM
user_funnel_matrix
GROUP BY user_id
UNION ALL
SELECT user_id,
max('step_add_to_cart') AS step_name,
max(step_add_to_cart) AS step_value
FROM
user_funnel_matrix
WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤
GROUP BY user_id
UNION ALL
SELECT user_id,
max('step_submit_order') AS step_name,
max(step_submit_order) AS step_value
FROM
user_funnel_matrix
WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤
GROUP BY user_id
UNION ALL
SELECT user_id,
max('step_pay_order') AS step_name,
max(step_pay_order) AS step_value
FROM
user_funnel_matrix
WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤
GROUP BY user_id
)
SELECT
step_name,
COUNT(DISTINCT user_id) AS total_user,
sum(step_value) user_count,
COUNT(DISTINCT user_id) - sum(step_value) AS lost_user_count,
cast(sum(step_value) * 1.0 / COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate
FROM
funnel_step_map
group by step_name ;
CET funnel_step_map 表示step_name与step_value进行key-value映射的过程组成map集合便于后续计算(标 1 成对)具体结果如下:
计算的最终结果如下:
上述代码,行转列部分也可以用stack()语句进行优化,优化后更简洁优雅
WITH user_behavior_sequence AS (
SELECT
user_id,
collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
FROM
user_behavior_log
),
user_funnel_matrix AS (
SELECT
user_id,
CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
FROM
user_behavior_sequence
),
flattened_steps AS (
SELECT
user_id,
step_name,
max(step_value) step_value
FROM
(select user_id
, step_view
, step_add_to_cart
, step_submit_order
, step_pay_order
from user_funnel_matrix
WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤(非法数据过滤)
) t1
LATERAL VIEW
stack(
4,
'view', step_view,
'add_to_cart', step_add_to_cart,
'submit_order', step_submit_order,
'pay_order', step_pay_order
) sub_table AS step_name, step_value
group by user_id,step_name
)
SELECT
step_name,
SUM(step_value) AS user_count,
COUNT(DISTINCT user_id) AS total_user,
COUNT(DISTINCT user_id) - SUM(step_value) AS lost_user_count,
cast(SUM(step_value) * 1.0 / COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate
FROM
flattened_steps
GROUP BY
step_name;
中间结果如下
最终结果如下:
通过stack
函数和LATERAL VIEW
简化了数据的重组过程,相比多个UNION ALL
的方式更加简洁紧凑。
3 问题拓展
3.1 跳出率计算
- 跳出率:流失用户数 / 到达该步骤的用户数
- 即:(总用户数 - 到达该步骤的用户数) / 到达该步骤的用户数
SQL如下:
SELECT
step_name,
SUM(step_value) AS user_count,
COUNT(DISTINCT user_id) AS total_user,
COUNT(DISTINCT user_id) - SUM(step_value) AS lost_user_count,
cast(SUM(step_value) * 1.0 / COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate,
--计算跳出率
cast((COUNT(DISTINCT user_id) - SUM(step_value)) * 1.0 / SUM(step_value) as decimal(18,2)) AS bounce_rate
FROM
flattened_steps
GROUP BY
step_name;
3.2 计算从浏览商品到支付订单的不同路径的用户数,并按照用户数降序排列。
- 路径构建:使用
concat_ws
函数将每个用户的行为路径拼接成一个字符串。 - 路径统计:统计每个路径的用户数,并按照用户数降序排列。
步骤1:构建用户浏览路径
WITH user_behavior_sequence AS (
SELECT
user_id,
concat_ws('->',collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`)) AS user_path
FROM
user_behavior_log
)
步骤2:统计每个路径下的用户数,并按用户数降序排列
WITH user_behavior_sequence AS (
SELECT
user_id,
concat_ws('->',collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`)) AS user_path
FROM
user_behavior_log
)
SELECT
user_path,
count(DISTINCT user_id) AS user_count
FROM
user_behavior_sequence
GROUP BY
user_path
ORDER BY
user_count DESC;
往期精彩
别再为用户流失头疼啦!掌握SQL秘籍,从零构建用户流失风险评估模型
SQL进阶技巧:如何分析连续签到领金币数问题?
SQL进阶技巧:如何计算商品需求与到货队列表进出计划?
解锁SQL无限可能:如何利用HiveSQL实现0-1背包问题?
数据科学与SQL:组距分组分析 | 区间分布问题
SQL进阶技巧:如何分析互逆记录?| 相互关注为例分析
~~SQL进阶实战技巧系列~~
SQL很简单,可你却写不好?每天一点点,收获不止一点点_sql断点-CSDN博客文章浏览阅读1.3k次,点赞54次,收藏19次。在写本文之前,我需要跟大家探讨以下几个话题。SQL进阶技巧:车辆班次问题分析SQL 进阶技巧:断点重分组应用求连续段的最后一个数及每段的个数【拼多多面试题】SQL进阶技巧-:字符串时间序列分析法应用之用户连续签到天数及历史最大连续签到天数问题【腾讯面试题】SQL进阶技巧:断点重分组算法应用之用户订单日期间隔异常问题分析SQL进阶技巧:如何对连续多条记录进行合并?【GroingIO 面试题】SQL进阶技巧:断点重分组算法应用之相邻时间间隔跳变问题分析。_sql断点https://flyingsql.blog.csdn.net/article/details/143609283https://flyingsql.blog.csdn.net/article/details/143609283