当前位置: 首页 > article >正文

二百七十九、ClickHouse——用Kettle对DWD层清洗数据进行增量补全

一、目的

由于ODS层表数据会因为各种原因缺失部分,所以对缺失的数据进行补全

二、实施步骤

2.1 确认补全策略

比如使用使用前一周同期的历史数据进行补齐

2.2 SQL语句

select
generateUUIDv4()  as  id,
a2.device_no, t4.source_device_type, t4.sn, t4.model, a2.miss_time create_time,t4.cycle,
t4.volume_sum,t4.speed_avg, t4.volume_left,t4.speed_left,t4.volume_straight,
t4.speed_straight,t4.volume_right, t4.speed_right,t4.volume_turn,t4.speed_turn,
cast(a2.day as String) day
from (
select
       a1.device_no,a1.day, a1.all_time  miss_time,
       (all_time - interval 7 day) create_time_7
from (
select
t1.device_no,t1.day,t2.all_time
from hurys_jw.dwd_turnratio as t1
cross join(
select
frequency_rate,
toDateTime('2024-12-16 12:00:00') new_time,
toDateTime(concat(toString(toDate('2024-12-16 12:00:00')),' ', frequency_time)) all_time,
(toDateTime(concat(toString(toDate('2024-12-16 12:00:00')),' ', frequency_time))  + interval 5 minute) all_time_5
from hurys_jw.dwd_frequency_time
) as t2
where t2.frequency_rate='300' and  toDate(t2.all_time)=t1.day and all_time <= new_time  and all_time_5 > new_time
group by t1.device_no, t1.day, t2.all_time
) as a1
left join hurys_jw.dwd_turnratio as t3
on a1.device_no=t3.device_no and a1.all_time=t3.create_time  and a1.day=t3.day
where toYear(t3.create_time)=1970
    ) as a2
left join hurys_jw.dwd_turnratio as t4
on a2.device_no=t4.device_no  and a2.create_time_7 = t4.create_time
where t4.cycle is not null
;

最核心的是红色部分,由于每个任务是5分钟执行一次,因此每次时段是前5分钟的数据。

2.3 Kettle任务

2.3.1 newtime

select(
select
toDateTime(create_time)
from  hurys_jw.dwd_statistics
order by create_time desc limit 1) as new_time

2.3.2 替换NULL值

2.3.3 表输入

select
generateUUIDv4()  as  id,
a2.device_no, t4.source_device_type, t4.sn, t4.model, a2.miss_time create_time,t4.cycle,
a2.lane_no , t4.lane_type, a2.section_no,a2.coil_no,t4.volume_sum, t4.volume_person,
t4.volume_car_non,t4.volume_car_small,t4.volume_car_middle,t4.volume_car_big, t4.speed_avg,
t4.speed_85,t4.time_occupancy,t4.average_headway , t4.average_gap, cast(a2.day as String) day
from (
select
       a1.device_no,a1.day, a1.all_time  miss_time,a1.lane_no , a1.section_no,a1.coil_no,
       (all_time - interval 7 day) create_time_7
from (
select
t1.device_no,t1.day,t1.lane_no,t1.section_no,t1.coil_no,t2.all_time
from hurys_jw.dwd_statistics as t1
cross join(
select
frequency_rate,
toDateTime(?) new_time,
toDateTime(concat(toString(toDate(new_time)),' ', frequency_time)) all_time,
(all_time + interval 5 minute) all_time_5
from hurys_jw.dwd_frequency_time ) as t2
where t2.frequency_rate=t1.cycle  and  toDate(t2.all_time)=t1.day and all_time <= new_time  and all_time_5 > new_time
group by t1.device_no, t1.day, t1.lane_no, t1.section_no, t1.coil_no, t2.all_time

) as a1
left join hurys_jw.dwd_statistics as t3
on a1.device_no=t3.device_no and a1.all_time=t3.create_time and a1.lane_no=t3.lane_no
and a1.section_no=t3.section_no and a1.coil_no=t3.coil_no and a1.day=t3.day
where toYear(t3.create_time)=1970
    ) as a2
left join hurys_jw.dwd_statistics as t4
on a2.device_no=t4.device_no  and  a2.lane_no=t4.lane_no  and a2.section_no=t4.section_no
and a2.coil_no=t4.coil_no and a2.create_time_7 = t4.create_time
where t4.cycle is not null
;

最核心的是红色部分,怎么实现1个5分钟周期内的增量补全

2.3.4 字段选择

2.3.5 clickhouse输出

2.3.6 运行Kettle任务

搞定!!!


http://www.kler.cn/a/442968.html

相关文章:

  • 【AniGS】论文阅读
  • 2024AAAI SCTNet论文阅读笔记
  • Flink概念知识讲解之:Restart重启策略配置
  • FastAPI 的进阶应用与扩展技术:异步编程与协程、websocket、celery
  • Invalid bound statement (not found) 错误解决
  • BTP Integration Suite CPI Apache Camel
  • 太速科技-501-基于TMS320C6670的软件无线电核心板
  • 分布式事务seata(AT)与nacos整合-笔记2
  • Vue入门到精通:运行环境
  • CTFHUB 信息泄露 -phpinfo
  • Scratch教学作品 | 圣诞节平台游戏——在节日中挑战冒险,收集礼物吧! ✨
  • 基于Spring Boot的社区药房系统
  • STM32坑分享——擦写单片机内部Flash时影响串口通信
  • 在Linux系统中, 查询mysql
  • Linux高性能服务器编程 | 读书笔记 | 10. 高性能I/O框架库Libevent
  • 【SpringBoot中MySQL生成唯一ID的常见方法】
  • 服务器运行Vue项目
  • /:087启动游戏时提示丢失”d3dx···.dll””VCOMP···.dll”
  • React 第十七节 useMemo用法详解
  • [NOIP2016 普及组] 海港 -STL-队列queue
  • 剑指Offer|LCR 002. 二进制求和
  • vue3+ant design vue实现日期选择器不展示清除按钮
  • java微服务中,对分布式锁、分布式事务处理建议