将复杂类型列展开成多行,附带json解析
针对的问题:将数仓表中存储的json数组,展开多行后进行分析计算。
--HiveSQL
with tmp_score_table as
(
select '202401' as part_id,'2024级1班' as part_name
,'[{"stu_id":"20240101","score_1":99,"score_2":78},{"stu_id":"20240102","score_1":89,"score_2":100}]' as score_array
union all
select '202402' as part_id,'2024级2班' as part_name
,'[{"stu_id":"20240201","score_1":88,"score_2":59},{"stu_id":"20240202","score_1":100,"score_2":71}]' as score_array
)
select part_id,part_name,scorealias.stu_id,scorealias.score_1,scorealias.score_2
from
(
select part_id, part_name
,split(regexp_replace(regexp_replace(score_array,'\\[|\\]',''),'\\}\\,\\{','\\}\\;\\{'),'\\;') as score
from tmp_score_table
) t1
lateral view explode(t1.score) adtable as scoreview
lateral view json_tuple(scoreview,'stu_id','score_1','score_2') scorealias as stu_id,score_1,score_2
;
--SparkSQL
with tmp_score_table as
(
select '202401' as part_id,'2024级1班' as part_name
,'[{"stu_id":"20240101","score_1":99,"score_2":78},{"stu_id":"20240102","score_1":89,"score_2":100}]' as score_array
union all
select '202402' as part_id,'2024级2班' as part_name
,'[{"stu_id":"20240201","score_1":88,"score_2":59},{"stu_id":"20240202","score_1":100,"score_2":71}]' as score_array
)
select part_id,part_name,json_tuple(score,'stu_id','score_1','score_2') as (stu_id,score_1,score_2)
from
(
select part_id, part_name
,explode(split(regexp_replace(regexp_replace(score_array, '\\[|\\]',''),'\\}\\,\\{','\\}\\;\\{'),'\\;')) as score
from tmp_score_table
) t1
;
1、关于explode函数:将数组或映射结构拆分为多个单独的行
SparkSQL支持explode函数,但是Hive不直接支持explode函数。这是因为explode是一个非标准的Hive函数,它是属于Hive的内置Lateral View语法的一部分。Hive有自己的内置LATERAL VIEW EXPLODE功能,可以用来解决类似的问题。
2、关于json_tuple函数:用于一个标准的JSON字符串中,按照输入的一组键(key1,key2,...)抽取各个键指定的字符串。
json_tuple()是输入一行,输出一行。SparkSQL支持json_tuple函数,但是其属于UDTF,而Hive对于UDTF是不能直接查询表字段的,要使用侧视图才行。
3、关于Hive函数lateral view
主要是将原本汇总在一条(行)的数据拆分成多条(行)成虚拟表,再与原表进行笛卡尔积,从而得到明细表。配合UDTF函数使用,一般情况下经常与explode函数搭配