Doris 游戏数据分析实战,计算留存
基于Doris BitMap函数计算留存率,Doris拥有丰富的BitMap函数,计算留存使用到的有:bitmap_union、intersect_count、bitmap_intersect等;
其中核心思路就是:第一天与第二天的交集就是第二天的留存;第一天与第三天的交集就是第三天留存;以此类推,采用BitMap函数交集可以快速计算留存
例如:
1、已有埋点数据表
CREATE TABLE rt_dwd_app_event (
user_uid varchar(40) NOT NULL COMMENT '用户id',
create_time DATETIMEV2 NOT NULL COMMENT '时间',
event varchar(40) NOT NULL COMMENT '事件名称'
)
2、定义用户埋点BitMap表
CREATE TABLE `rt_ads_app_event_1105_union` (
event VARCHAR(50),
create_time DATETIME,
user_id BITMAP BITMAP_UNION
) ENGINE=OLAP
AGGREGATE KEY(`event`, `create_time`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`create_time`) BUCKETS 2;
3、将埋点数据表导入BitMap表
INSERT INTO rt_ads_app_event_bigmap (event, create_time, user_id)
SELECT
event,
create_time,
bitmap_union(BITMAP_HASH(user_uid))
FROM
rt_dwd_app_event
group by event, create_time
4、假如初始事件是A,回访事件是A, 计算目标留存
select
'2024-11-01' actday,
intersect_count(user_id,DATE_FORMAT(create_time,'%Y-%m-%d'),'2024-11-01') as login_num,
intersect_count(user_id,DATE_FORMAT(create_time,'%Y-%m-%d'),'2024-11-01','2024-11-02') as retention_1101_1,
intersect_count(user_id,DATE_FORMAT(create_time,'%Y-%m-%d'),'2024-11-01','2024-11-03') as retention_1101_2,
intersect_count(user_id,DATE_FORMAT(create_time,'%Y-%m-%d'),'2024-11-01','2024-11-04') as retention_1101_3
from rt_ads_app_event_bigmap where event='login'
5、假如初始事件是A,回访事件是B, 计算目标留存
SELECT
a.event AS a_start_event,
a.create_time AS a_start_time,
b.event AS b_start_event,
b.create_time AS b_start_time,
bitmap_count(bitmap_and(a.user_id, b.user_id)) AS retention
FROM
(SELECT *
FROM rt_ads_app_event_1105_union
WHERE event = 'login'
AND DATE_FORMAT(create_time, '%Y-%m-%d') = '2024-11-01') AS a
CROSS JOIN
(SELECT *
FROM rt_ads_app_event_1105_union
WHERE event = 'logout') AS b
ORDER BY
a.create_time,
b.create_time;
小结:自此完成哈!3000w的埋点数据,形成68个bigmap列;留存计算耗时在70~100ms左右