当前位置: 首页 > article >正文

Hive msck 描述

MSCK SQL 语法如下:

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];

1. 背景

先创建3个分区,把分区文件删除。这时 metastore 有这个3个分区,文件上不存在。再在文件系统上创建其他两个分区,这两个分区在 metastore 不存在。

1.1 初始化SQL

drop table if exists t_drop_partitions;
create table  t_drop_partitions(c1 string)partitioned by (pt string,hr string) stored as textfile location '/tmp/t_drop_partitions';
alter table t_drop_partitions add partition(pt='20230101',hr='00') location '/tmp/t_drop_partitions/pt=20230101/hr=00';
alter table t_drop_partitions add partition(pt='20230101',hr='01') location '/tmp/t_drop_partitions/pt=20230101/hr=01';
alter table t_drop_partitions add partition(pt='20230101',hr='02') location '/tmp/t_drop_partitions/pt=20230101/hr=02';
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=00;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=01;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=02;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=03;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=04;
  • show partitions t_drop_partitions;
    显示有 3 个分区
hive>  show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
  • ls 文件系统显示2个目录
 ! hadoop fs -ls /tmp/t_drop_partitions/pt=20230101;
 Found 2 items
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=03
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=04

2. MSCK TABLE

2.1 MSCK TABLE 没有选项

执行初始化SQL。

hive> MSCK TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Time taken: 0.094 seconds, Fetched: 2 row(s)

MSCK TABLE 不做任何操作,仅仅显示 metastore 和 文件系统中不匹配的分区。
Partitions not in metastore: 这些分区存在于文件系统,但是不在 Metastore 。
Partitions missing from filesystem: 这些存在于Metastore,但是不在文件系统 。

  • show partitions 还是显示3个分区
hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
Time taken: 0.105 s

2.2 MSCK REPAIR TABLE

MSCK [REPAIR] TABLE 对表的分区进行修复。后面可以根修复的选项,默认是 ADD。
ADD PARTITIONS: 把文件系统上存在,metastore 上不存在的分区添加到 metastore。
DROP PARTITIONS: 把 metastore 上存在,文件系统上不存在的分区从 metastore 删除。
SYNC PARTITIONS: 代表 ADD PARTITIONS 和 DROP PARTITIONS 同时选中。

2.2. 1ADD PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE。

hive>  MSCK REPAIR TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04

Repair: Added partition to metastore: 部分显示了 metastore 添加了哪些分区。

hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
pt=20230101/hr=03
pt=20230101/hr=04

现在有5个分区。

2.2. DROP PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx DROP PARTITIONS。

hive>  MSCK REPAIR TABLE t_drop_partitions DROP PARTITIONS;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00

Repair: Dropped partition: 部分显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。

2.3. SYNC PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx SYNC PARTITIONS。

Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Time taken: 0.225 seconds, Fetched: 7 row(s)

Repair: Added partition to metastore: 显示了在 metastore 添加了哪些分区。
Repair: Dropped partition from metastore : 显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。


http://www.kler.cn/a/13448.html

相关文章:

  • matlab数据归一化与反归一化处理
  • 信息安全复习四:置换密码乘积密码隐写术
  • Baumer工业相机中偏振相机如何使用Baumer堡盟GAPI SDK来进行偏振数据的计算转换输出(C#)
  • 【NestJs】数据库重构
  • 德赛西威上海车展重磅发布Smart Solution 2.0,有哪些革新点?
  • 2023年全国最新高校辅导员精选真题及答案60
  • 外链是什么意思,什么是外链
  • educoder实训——函数【1】
  • 应用程序性能监控系统(APM) :SkyWalking 简介
  • AI智能课程第一讲:chatgpt介绍
  • 快手社招Java后端开发岗面试,被问麻了
  • Python 机器人学习手册:6~10
  • 2023的指纹浏览器盘点,指纹浏览器使用全攻略
  • 告别StringUtil:使用Java的全新String API优化你的代码
  • python并行运算
  • Locust 压力测试helloworld
  • 从promise到await
  • Vue表单基本操作-收集表单数据
  • python json数据写入excel
  • 代码随想录Day58