Hive alter table add columns: whether to use cascade

article 2025/2/21 3:03:59

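Conclusion

Adding cascade to alter table xxx add columns applies the new column to every existing partition. Without cascade, only partitions created afterwards get the column. Existing partitions do not have it, so even if their data files contain the corresponding data, it is not displayed.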

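If every partition is produced by insert overwrite and the old partitions will not be regenerated, add columns can be run without cascade. The new column shows NULL in old partitions and displays normally in new ones.
If every partition is produced by insert overwrite and the old partitions must be regenerated, there are two options: 1. Run add columns without cascade, then for each partition run drop partition followed by insert overwrite. 2. Run add columns with cascade, then run insert overwrite. If option 2 fails, fall back to option 1.
If the files are produced outside Hive and placed in the partition location, and they already contain data for the new column, cascade is required. If cascade fails, check whether the table is external. If it is not, convert it to an external table first, so that dropping a partition does not delete its data files. Once the table is external, run drop partition and then re-register the partition with add partition location.
If the files are produced outside Hive and placed in the partition location, and they do not contain data for the new column, cascade is not needed. The new column shows NULL in old partitions and displays normally in new ones.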
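
The fallback for externally produced files (when cascade fails) can be sketched as follows. This is a minimal sketch, not a verified procedure: the table name t_ext and the path /warehouse/t_ext/pt=1 are hypothetical, and it assumes the files under that path already contain the c2 data.

```sql
-- Hypothetical table and path, for illustration only.
-- 1. If the table is managed, make it external so that DROP PARTITION keeps the files.
alter table t_ext set tblproperties('EXTERNAL'='TRUE');

-- 2. Add the column at table level, without cascade.
alter table t_ext add columns(c2 string);

-- 3. Drop only the partition metadata, then re-register the same directory.
--    The re-added partition is created from the table's current schema, which includes c2.
alter table t_ext drop partition(pt='1');
alter table t_ext add partition(pt='1') location '/warehouse/t_ext/pt=1';
```

Steps 1 and 3 follow the order described above. Step 1 can be skipped when the table is already external.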

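Notes:

To check whether a table is external, run 'show create table xxx'. If the output starts with CREATE TABLE, it is a managed (internal) table; if it starts with CREATE EXTERNAL TABLE, it is an external table.
Convert an external table to a managed table: ALTER TABLE xxx SET TBLPROPERTIES('EXTERNAL'='FALSE');
Convert a managed table to an external table: ALTER TABLE xxx SET TBLPROPERTIES('EXTERNAL'='TRUE');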
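
As an alternative to reading the show create table output, the table type can also be read from describe formatted. A small sketch, with xxx standing in for the table name as above:

```sql
-- Look for the "Table Type:" row in the output:
-- MANAGED_TABLE means a managed (internal) table, EXTERNAL_TABLE means an external table.
describe formatted xxx;
```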

Testing the effect of cascade

Prepare the file
data.txt

key1,value1
key2,value2

1. Test without cascade

create table t_no_cascade(c1 string) partitioned by (pt string) row format delimited
FIELDS TERMINATED BY ',' stored as textfile;

Add partition pt=1

load data local inpath 'data.txt' overwrite into table t_no_cascade partition(pt=1);

Query the result. The c1 and pt columns are shown.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1	t_no_cascade.pt
key1	1
key2	1

Add a column

alter table t_no_cascade add columns(c2 string) ;

Query partition pt=1 again. Because the partition's own metadata does not define c2, c2 is NULL.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1	t_no_cascade.c2	t_no_cascade.pt
key1	NULL	1
key2	NULL	1
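
One way to see why c2 comes back as NULL is to compare the table-level schema with the schema stored for the old partition. A minimal sketch using the test table above; the exact output layout depends on the Hive version, so none is shown here:

```sql
-- Table-level schema: should list c1, c2 and the partition column pt.
describe t_no_cascade;

-- Partition-level schema for the old partition: without cascade,
-- c2 is expected to be missing here.
describe t_no_cascade partition(pt='1');
```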

Add a new partition pt=2
The new column takes effect for newly added partitions.

load data local inpath 'data.txt' overwrite into table t_no_cascade partition(pt=2);

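Query the partition
Because the new partition is created from the table's current schema, which already includes c2, the value of c2 is displayed.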

select * from t_no_cascade where pt=2;
OK
t_no_cascade.c1	t_no_cascade.c2	t_no_cascade.pt
key1	value1	2
key2	value2	2

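Overwrite the old partition
Overwriting the old partition does not make the new column's data visible. If it is needed, drop the old partition first and then run insert overwrite.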

insert overwrite table t_no_cascade partition(pt=1) select c1,c2 from t_no_cascade where pt=2;

insert overwrite table xxx partition reuses the existing partition (same partition id), so the partition still does not have the new column.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1	t_no_cascade.c2	t_no_cascade.pt
key1	NULL	1
key2	NULL	1

Drop the old partition, then insert overwrite
After dropping the old partition and running insert overwrite, the new column is visible.

alter table t_no_cascade drop partition(pt=1);
insert overwrite table t_no_cascade partition(pt=1) select c1,c2 from t_no_cascade where pt=2;

This time partition(pt=1) is a new partition with a new partition id, so the new data is visible.

select * from t_no_cascade where pt=1;
OK
t_no_cascade.c1	t_no_cascade.c2	t_no_cascade.pt
key1	value1	1
key2	value2	1

2. Test with cascade

create table t_cascade(c1 string) partitioned by (pt string) row format delimited
FIELDS TERMINATED BY ',' stored as textfile;

Add partition pt=1

load data local inpath 'data.txt' overwrite into table t_cascade partition(pt=1);

Query the result. The c1 and pt columns are shown.

select * from t_cascade where pt=1;
OK
t_cascade.c1	t_cascade.pt
key1	1
key2	1

Add a column
With cascade, the new column is propagated to every existing partition.

alter table t_cascade add columns(c2 string) cascade;

Query partition pt=1 again.
After cascade, the old partition also has the new column.

select * from t_cascade where pt=1;
OK
t_cascade.c1	t_cascade.c2	t_cascade.pt
key1	value1	1
key2	value2	1
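
Option 2 from the conclusion (cascade first, then insert overwrite) could continue from this point roughly as follows. This is only a sketch: t_src is a hypothetical staging table with columns c1 and c2 and is not part of the test above.

```sql
-- Hypothetical source table t_src(c1 string, c2 string), for illustration only.
-- The partition already carries c2 after the cascade, so no drop partition is needed first.
insert overwrite table t_cascade partition(pt='1')
select c1, c2 from t_src;

-- Check that the regenerated partition shows values in c2.
select * from t_cascade where pt='1';
```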
