当前位置: 首页 > article >正文

SQL进阶实战技巧:如何分析浏览到下单各步骤转化率及流失用户数?

目录

0 问题描述

1 数据准备 

2 问题分析

3 问题拓展

3.1 跳出率计算 

3.2  计算从浏览商品到支付订单的不同路径的用户数,并按照用户数降序排列。

往期精彩


0 问题描述

统计从浏览商品到最终下单的各个步骤的用户数和流失用户数,并计算转化率

用户表结构和数据
假设我们有一个名为user_behavior_log的用户行为日志表,包含以下字段: 

字段名数据类型描述
user_idINT用户ID
behaviorSTRING用户行为,例如’view’,‘add_to_cart’,‘submit_order’,‘pay_order’
timestampBIGINT行为发生的时间戳
item_idINT商品ID

 用户行为数据示例

user_idbehaviortimestampitem_id
1view1678886400101
1add_to_cart1678886460101
2view1678886520201
1submit_order1678886580101
2view1678886640301
2pay_order1678886700301
3view1678886760401
3add_to_cart1678886820401
3view1678886880501
4view1678886900601
4add_to_cart1678886960601
4submit_order1678887020601
4pay_order1678887080601

行为步骤定义:将用户从浏览商品到最终下单的步骤定义为浏览商品(view)、加入购物车(add_to_cart)、提交订单(submit_order)、支付订单(pay_order。 

1 数据准备 

(1)建表语句

CREATE TABLE user_behavior_log (
    user_id INT,
    behavior STRING,
    `timestamp` BIGINT,
    item_id INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

(2)插入数据

INSERT INTO user_behavior_log VALUES
(1, 'view', 1678886400, 101),
(1, 'add_to_cart', 1678886460, 101),
(2, 'view', 1678886520, 201),
(1,'submit_order', 1678886580, 101),
(2, 'view', 1678886640, 301),
(2, 'pay_order', 1678886700, 301),
(3, 'view', 1678886760, 401),
(3, 'add_to_cart', 1678886820, 401),
(3, 'view', 1678886880, 501),
(4, 'view', 1678886900, 601),
(4, 'add_to_cart', 1678886960, 601),
(4,'submit_order', 1678887020, 601),
(4, 'pay_order', 1678887080, 601);

2 问题分析

步骤1:用户行为序列构建:按照用户 ID 和时间戳对行为进行排序,构建每个用户的行为序列。

行为序列构建核心操作:

collect_list(behavior) OVER (PARTITION BY user_id ORDER BY timestamp) 

 具体SQL如下:

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
    FROM
        user_behavior_log
)
SELECT
    user_id,
    behavior_sequence
FROM
    user_behavior_sequence;

步骤2:按照用户行为顺序,构建转换漏斗矩阵

核心思路:将用户在各个阶段的行为状态用0-1进行标记,构建漏斗转换矩阵

 目的:将用户在各个关键行为步骤的状态以简洁的 0 - 1 标记形式呈现出来,方便后续对用户行为漏斗转化率等指标的计算和分析。

关键操作:array_contains()函数进行判断

CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,

具体SQL如下:

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
    FROM
        user_behavior_log
),
user_funnel_matrix AS (
    SELECT
        user_id,
        -- 判断用户是否浏览过商品
        CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
        -- 判断用户是否加入过购物车(无论之前是否浏览过商品)
        CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
        -- 判断用户是否提交过订单(无论之前是否浏览过商品或加入过购物车)
        CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
        -- 判断用户是否支付过订单(无论之前是否浏览过商品、加入过购物车或提交过订单)
        CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
    FROM
        user_behavior_sequence
)
SELECT
    user_id,
    step_view,
    step_add_to_cart,
    step_submit_order,
    step_pay_order
FROM
    user_funnel_matrix;

 步骤3:转化率和流失用户数计算

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
    FROM
        user_behavior_log
),
user_funnel_matrix AS (
    SELECT
        user_id,
        -- 判断用户是否浏览过商品
        CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
        -- 判断用户是否加入过购物车(无论之前是否浏览过商品)
        CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
        -- 判断用户是否提交过订单(无论之前是否浏览过商品或加入过购物车)
        CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
        -- 判断用户是否支付过订单(无论之前是否浏览过商品、加入过购物车或提交过订单)
        CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
    FROM
        user_behavior_sequence
)
    , funnel_step_map AS (
    SELECT  user_id,
        max('step_view') AS step_name,
        max(step_view) AS step_value
    FROM
        user_funnel_matrix
    GROUP BY user_id
    UNION ALL
    SELECT  user_id,
        max('step_add_to_cart') AS step_name,
        max(step_add_to_cart) AS step_value
    FROM
        user_funnel_matrix
    WHERE step_view = 1  --只有浏览过商品的用户才会统计后续步骤
    GROUP BY user_id
    UNION ALL
    SELECT  user_id,
        max('step_submit_order') AS step_name,
        max(step_submit_order) AS step_value
    FROM
        user_funnel_matrix
    WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤
    GROUP BY user_id
    UNION ALL
    SELECT user_id,
        max('step_pay_order') AS step_name,
        max(step_pay_order) AS step_value
    FROM
        user_funnel_matrix
    WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤
        GROUP BY user_id
)

SELECT
    step_name,
    COUNT(DISTINCT user_id) AS total_user,
    sum(step_value) user_count,
    COUNT(DISTINCT user_id) - sum(step_value) AS lost_user_count,
    cast(sum(step_value) * 1.0 /  COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate
FROM
    funnel_step_map
group by step_name ;

 CET funnel_step_map 表示step_name与step_value进行key-value映射的过程组成map集合便于后续计算(标 1 成对)具体结果如下:

计算的最终结果如下: 

上述代码,行转列部分也可以用stack()语句进行优化,优化后更简洁优雅

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`) AS behavior_sequence
    FROM
        user_behavior_log
),
user_funnel_matrix AS (
    SELECT
        user_id,
        CASE WHEN array_contains(behavior_sequence, 'view') THEN 1 ELSE 0 END AS step_view,
        CASE WHEN array_contains(behavior_sequence, 'add_to_cart') THEN 1 ELSE 0 END AS step_add_to_cart,
        CASE WHEN array_contains(behavior_sequence,'submit_order') THEN 1 ELSE 0 END AS step_submit_order,
        CASE WHEN array_contains(behavior_sequence, 'pay_order') THEN 1 ELSE 0 END AS step_pay_order
    FROM
        user_behavior_sequence
),
flattened_steps AS (
    SELECT
        user_id,
        step_name,
        max(step_value) step_value
    FROM
        (select user_id
              , step_view
              , step_add_to_cart
              , step_submit_order
              , step_pay_order
         from user_funnel_matrix
          WHERE step_view = 1 --只有浏览过商品的用户才会统计后续步骤(非法数据过滤)
         ) t1
    LATERAL VIEW
        stack(
            4,
            'view', step_view,
            'add_to_cart', step_add_to_cart,
           'submit_order', step_submit_order,
            'pay_order', step_pay_order
        ) sub_table AS step_name, step_value
  group by user_id,step_name
)

SELECT
    step_name,
    SUM(step_value) AS user_count,
    COUNT(DISTINCT user_id) AS total_user,
    COUNT(DISTINCT user_id) - SUM(step_value) AS lost_user_count,
    cast(SUM(step_value) * 1.0 / COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate
FROM
    flattened_steps
GROUP BY
    step_name;

 中间结果如下

最终结果如下:

通过stack函数和LATERAL VIEW简化了数据的重组过程,相比多个UNION ALL的方式更加简洁紧凑。 

3 问题拓展

3.1 跳出率计算 

  •  跳出率:流失用户数 / 到达该步骤的用户数
  • 即:(总用户数 - 到达该步骤的用户数) / 到达该步骤的用户数

 SQL如下:

SELECT
    step_name,
    SUM(step_value) AS user_count,
    COUNT(DISTINCT user_id) AS total_user,
    COUNT(DISTINCT user_id) - SUM(step_value) AS lost_user_count,
    cast(SUM(step_value) * 1.0 / COUNT(DISTINCT user_id) as decimal(18,2)) AS conversion_rate,
    --计算跳出率
    cast((COUNT(DISTINCT user_id) - SUM(step_value)) * 1.0 / SUM(step_value) as decimal(18,2)) AS bounce_rate
FROM
    flattened_steps
GROUP BY
    step_name;

3.2  计算从浏览商品到支付订单的不同路径的用户数,并按照用户数降序排列。

  1. 路径构建:使用concat_ws函数将每个用户的行为路径拼接成一个字符串。
  2. 路径统计:统计每个路径的用户数,并按照用户数降序排列。

步骤1:构建用户浏览路径 

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        concat_ws('->',collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`)) AS user_path
    FROM
        user_behavior_log
)

步骤2:统计每个路径下的用户数,并按用户数降序排列

WITH user_behavior_sequence AS (
    SELECT
        user_id,
        concat_ws('->',collect_list(behavior) OVER (PARTITION BY user_id ORDER BY `timestamp`)) AS user_path
    FROM
        user_behavior_log
)

SELECT
    user_path,
    count(DISTINCT user_id) AS user_count
FROM
    user_behavior_sequence
GROUP BY
    user_path
ORDER BY
    user_count DESC;


往期精彩

别再为用户流失头疼啦!掌握SQL秘籍,从零构建用户流失风险评估模型

SQL进阶技巧:如何分析连续签到领金币数问题?

SQL进阶技巧:如何计算商品需求与到货队列表进出计划?

解锁SQL无限可能:如何利用HiveSQL实现0-1背包问题?

数据科学与SQL:组距分组分析 | 区间分布问题

SQL进阶技巧:如何分析互逆记录?| 相互关注为例分析

 ~~SQL进阶实战技巧系列~~  

SQL很简单,可你却写不好?每天一点点,收获不止一点点_sql断点-CSDN博客文章浏览阅读1.3k次,点赞54次,收藏19次。在写本文之前,我需要跟大家探讨以下几个话题。SQL进阶技巧:车辆班次问题分析SQL 进阶技巧:断点重分组应用求连续段的最后一个数及每段的个数【拼多多面试题】SQL进阶技巧-:字符串时间序列分析法应用之用户连续签到天数及历史最大连续签到天数问题【腾讯面试题】SQL进阶技巧:断点重分组算法应用之用户订单日期间隔异常问题分析SQL进阶技巧:如何对连续多条记录进行合并?【GroingIO 面试题】SQL进阶技巧:断点重分组算法应用之相邻时间间隔跳变问题分析。_sql断点https://flyingsql.blog.csdn.net/article/details/143609283https://flyingsql.blog.csdn.net/article/details/143609283


http://www.kler.cn/a/526883.html

相关文章:

  • 【解决方案】MuMu模拟器移植系统进度条卡住98%无法打开
  • OSCP:发送钓鱼电子邮件执行客户端攻击
  • 网络爬虫学习:应用selenium获取Edge浏览器版本号,自动下载对应版本msedgedriver,确保Edge浏览器顺利打开。
  • 基于Springboot的智能学习平台系统【附源码】
  • NLP模型大对比:Transformer > RNN > n-gram
  • spring中解决循环依赖的方法
  • 【C++语言】卡码网语言基础课系列----7. 摆平积木
  • Learning Vue 读书笔记 Chapter 4
  • DDD - 领域事件_解耦微服务的关键
  • char和varchar的区别、varchar(?)中问号部分的含义、索引的作用
  • 使用Pygame制作“俄罗斯方块”游戏
  • Spring Boot项目如何使用MyBatis实现分页查询及其相关原理
  • AJAX案例——图片上传个人信息操作
  • C++中vector追加vector
  • elasticsearch的常见面试题?
  • 亚博microros小车-原生ubuntu支持系列:15 激光雷达巡逻
  • 机器学习7-全连接神经网络3-过拟合与超参数
  • 信号模块--simulink操作
  • [Effective C++]条款53-55 杂项讨论
  • Linux第104步_基于AP3216C之I2C实验
  • Python学习之旅:进阶阶段(七)数据结构-计数器(collections.Counter)
  • TCP编程
  • 【Linux】日志设计模式与实现
  • DeepSeek Janus-Pro:多模态AI模型的突破与创新
  • 集群部署时的分布式 Session 如何实现?
  • VUE组件如何开发