Loading data from HDFS into Doris with Broker Load
Prerequisite: Broker Load in Doris requires a separately deployed broker; the deployment tutorials I followed did not cover the broker deployment step, so make sure a broker process is installed and registered with the FE before continuing.
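The following is a minimal sketch of how to verify that a broker is registered, and how to add one if it is not. The broker name broker1 matches the load statement further down; the host and port are placeholders for your own environment.
SHOW BROKER;
-- If no broker is listed, register the already-deployed broker process with the FE
-- (replace broker_host:8000 with the host and IPC port of your broker).
ALTER SYSTEM ADD BROKER broker1 "broker_host:8000";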
1. Create the table
The Hive data in my test environment is partitioned by date, and the load kept failing with:
type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
Running SHOW PARTITIONS FROM demo.`table1` showed that the partitions corresponding to the data being loaded did not exist.
Solution: create the partitions up front with the CREATE TABLE statement below. The key property is "dynamic_partition.create_history_partition" = "true". Because it defaults to false, historical partitions are never created no matter how far back you set "dynamic_partition.start" = "-300". If the partitions of the data you are loading are named after historical dates, this property must be set to true for those historical partitions to be created.
CREATE TABLE statement
CREATE TABLE demo.`table1` (
`a` decimal(10, 0) NULL COMMENT "a",
`b` varchar(65533) NULL COMMENT "b",
`c` varchar(65533) NULL COMMENT "c",
`date` int NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`a`)
COMMENT "test table"
PARTITION BY RANGE(`date`)
(
PARTITION `p20240301` VALUES LESS THAN ("20240302"),
PARTITION `p20240302` VALUES LESS THAN ("20240303"),
PARTITION `p20240303` VALUES LESS THAN ("20240304")
)
DISTRIBUTED BY RANDOM
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-300",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "1",
"dynamic_partition.replication_num" = "1",
"dynamic_partition.create_history_partition" = "true"
)
;
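Before starting the load, it is worth confirming that the historical partitions really exist; a quick check (the second statement assumes your Doris version supports SHOW DYNAMIC PARTITION TABLES):
SHOW PARTITIONS FROM demo.`table1`;
-- Expect p20240301, p20240302, p20240303 plus the future partitions created
-- by the dynamic partition scheduler ("dynamic_partition.end" = "3").
SHOW DYNAMIC PARTITION TABLES FROM demo;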
2. Load the data
Broker Load statement
use demo;
LOAD LABEL load_table1
(
DATA INFILE("hdfs://ip:port/user/hive/warehouse/table1/*/*")
INTO TABLE table1
FORMAT AS "ORC"
(
a,b,c
)
COLUMNS FROM PATH AS (date)
)
WITH BROKER broker1
(
"hadoop.security.authentication" = "kerberos",
"kerberos_principal" = "xxx/xxx@HADOOP.COM",
"kerberos_keytab" = "/opt/keytabs/xxx.keytab"
)
PROPERTIES
(
"timeout" = "72000"
);
3. Check the load result
SHOW LOAD ORDER BY CreateTime DESC;
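To track a single job instead of the whole list, you can filter by the label used above, and cancel it by label if it gets stuck. A small sketch using the label load_table1 from the load statement:
SHOW LOAD WHERE LABEL = "load_table1";
-- The State column goes PENDING -> ETL -> LOADING -> FINISHED; in my case a
-- CANCELLED job with ETL_QUALITY_UNSATISFIED meant rows fell outside the existing partitions.
CANCEL LOAD WHERE LABEL = "load_table1";  -- only if the job needs to be aborted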