Integrating Hudi with CDP in Practice: Build and Deployment

[0] About this article

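Hudi 1.0.0 is a major milestone release. It focuses on improvements to the data format, performance, and concurrent-write support, and introduces more flexible index and file-format management, giving users better scalability and ease of use. This article covers how to build and deploy Hudi 1.0.0 in a CDP 7.3.1 environment.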

Software    Version
Hudi        1.0.0
Hadoop      3.1.1.7.3.1.0-197
Hive        3.1.3000.7.3.1.0-197
Spark       3.4.1.7.3.1.0-197

[1] Building Hudi

1-Install Maven

[root@cdp73-1 software]# ls -al
total 45872
drwxr-xr-x.  3 root root        87 Jan  1 04:08 .
drwxr-xr-x.  4 root root        38 Jan  1 01:53 ..
-rw-r--r--.  1 root root   9102945 Jan  1 04:09 apache-maven-3.9.9-bin.tar.gz
-rwx------.  1 root root  37863424 Jan  1 02:15 hudi-1.0.0.src.tar
[root@cdp73-1 software]# tar -xf apache-maven-3.9.9-bin.tar.gz
[root@cdp73-1 software]# mv apache-maven-3.9.9 maven
[root@cdp73-1 software]# export MAVEN_HOME=/opt/software/maven
[root@cdp73-1 software]# export PATH=$PATH:$MAVEN_HOME/bin
[root@cdp73-1 software]# mvn -v
Apache Maven 3.9.9 (8e8579a9e76f7d015ee5ec7bfcdc97d260186937)
Maven home: /opt/software/maven
Java version: 1.8.0_372, vendor: Temurin, runtime: /usr/java/jdk1.8u372-b07-cloudera/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.18.0-553.el8_10.x86_64", arch: "amd64", family: "unix"
[root@cdp73-1 software]#

2-Extract the source

[root@cdp73-1 software]# tar -xf hudi-1.0.0.src.tar
[root@cdp73-1 software]# cd hudi-1.0.0/
[root@cdp73-1 hudi-1.0.0]#

3-Build

[root@cdp73-1 hudi-1.0.0]#  mvn clean package -DskipTests -Dspark3.4 -Dflink1.14 -Dscala-2.12 -Dhadoop.version=3.1.1 -Pflink-bundle-shade-hive3

Error 1

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile (default-testCompile) on project hudi-hive-sync: Compilation failure
[ERROR] /home/opt/software/hudi-1.0.0/hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java:[250,17] method shutdown in class org.apache.zookeeper.server.ZooKeeperServer cannot be applied to given types;
[ERROR]   required: no arguments
[ERROR]   found: boolean
[ERROR]   reason: actual and formal argument lists differ in length
[ERROR]

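Fix: the ZooKeeperServer class on the build classpath only offers a no-argument shutdown(). In hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java, change line 250 from zkServer.shutdown(true); to zkServer.shutdown();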
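
The same edit can be scripted. The following is a minimal sketch, assuming the call sits on a single line exactly as quoted in the fix; the helper name patch_zk_shutdown is hypothetical. It matches by content rather than by line number and leaves a .bak copy next to the file.

```shell
# Hypothetical helper: rewrite zkServer.shutdown(true); to zkServer.shutdown();
# in the given file. sed -i.bak edits in place and keeps a backup copy.
patch_zk_shutdown() {
  sed -i.bak 's/zkServer\.shutdown(true);/zkServer.shutdown();/' "$1"
}

# Usage, from the root of the Hudi source tree:
#   patch_zk_shutdown hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java
```

Running grep -n 'zkServer.shutdown' on the file afterwards shows whether the replacement took effect.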

Error 2

[INFO] --- enforcer:3.0.0-M1:enforce (enforce-logging) @ hudi-spark-client ---
[WARNING] The artifact org.slf4j:slf4j-log4j12:jar:2.0.7 has been relocated to org.slf4j:slf4j-reload4j:jar:2.0.7
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message:
Found Banned Dependency: ch.qos.logback:logback-classic:jar:1.2.10
Use 'mvn dependency:tree' to locate the source of the banned dependencies.

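Fix: remove the <exclude>ch.qos.logback:logback-classic</exclude> entry from pom.xml, so that the enforcer's BannedDependencies rule no longer rejects this artifact.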
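
This edit can also be scripted. The following is a rough sketch, assuming the entry appears on its own line in the pom.xml being edited; the helper name unban_logback is hypothetical. It deletes every line containing that exact entry and keeps a .bak copy.

```shell
# Hypothetical helper: delete the logback-classic entry from the list of
# banned dependencies in the given pom.xml. A backup is written to <file>.bak.
unban_logback() {
  sed -i.bak '/<exclude>ch\.qos\.logback:logback-classic<\/exclude>/d' "$1"
}

# Usage, from the root of the Hudi source tree:
#   grep -n 'logback-classic' pom.xml   # inspect the matching lines first
#   unban_logback pom.xml
```

Inspecting the matches with grep before deleting is worthwhile, because the sed expression removes every line that contains the entry, not only the first.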

Error 3

[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:1.0.0
[ERROR] dependency: io.confluent:kafka-avro-serializer:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:kafka-avro-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:common-config:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:common-config:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:common-utils:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:common-utils:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-schema-registry-client:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:kafka-schema-registry-client:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-protobuf-serializer:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:kafka-protobuf-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-json-schema-serializer:jar:5.5.0 (compile)
[ERROR] 	Could not find artifact io.confluent:kafka-json-schema-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR]
[ERROR] -> [Help 1]

Fix: install the dependencies into the local Maven repository manually

[root@cdp73-1 software]# wget http://packages.confluent.io/archive/5.5/confluent-5.5.0-2.12.zip
[root@cdp73-1 software]# unzip confluent-5.5.0-2.12.zip
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/confluent-common/common-config-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/confluent-common/common-utils-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-avro-serializer-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-schema-registry-client-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-json-schema-serializer -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-json-schema-serializer-5.5.0.jar
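
The repeated install commands can be folded into a loop. The sketch below rests on several assumptions: the archive is unpacked at the path in CONFLUENT_HOME (a hypothetical variable), each jar is named <artifactId>-5.5.0.jar somewhere under share/java, and the helper names are made up for illustration. The error output also lists kafka-protobuf-serializer, which the commands above do not install; the loop searches for it by file name and only warns if the archive does not contain it.

```shell
# Assumed location of the unpacked Confluent archive; override as needed.
CONFLUENT_HOME="${CONFLUENT_HOME:-/opt/software/confluent-5.5.0}"
VERSION=5.5.0

# Print the path of the first jar named <artifactId>-<version>.jar, if any.
find_jar() {
  find "$CONFLUENT_HOME/share/java" -name "$1-$VERSION.jar" 2>/dev/null | head -n 1
}

# Install one artifact into the local repository, or warn if no jar is found.
install_artifact() {
  jar_file="$(find_jar "$1")"
  if [ -z "$jar_file" ]; then
    echo "WARN: $1-$VERSION.jar not found under $CONFLUENT_HOME" >&2
    return 0
  fi
  mvn install:install-file -DgroupId=io.confluent -DartifactId="$1" \
    -Dversion="$VERSION" -Dpackaging=jar -Dfile="$jar_file"
}

# Run only when Maven and the unpacked archive are both present.
if command -v mvn >/dev/null 2>&1 && [ -d "$CONFLUENT_HOME" ]; then
  for artifact in common-config common-utils kafka-avro-serializer \
      kafka-schema-registry-client kafka-protobuf-serializer \
      kafka-json-schema-serializer; do
    install_artifact "$artifact"
  done
fi
```

Looking the jar up by name avoids hard-coding the subdirectory of each artifact, at the cost of picking the first match if the archive ships the same jar in several places.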

Error 4

[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:1.0.0
[ERROR] dependency: com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 (compile)
[ERROR] 	Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in Mulesoft-Repository (https://repository.mulesoft.org/nexus/content/repositories/public/)
[ERROR] 	Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in confluent (http://packages.confluent.io/maven/)
[ERROR] 	Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR]
[ERROR] -> [Help 1]

Fix: install the file into the local Maven repository manually

mvn install:install-file   -DgroupId=com.github.everit-org.json-schema   -DartifactId=org.everit.json.schema   -Dversion=1.12.1   -Dpackaging=jar   -Dfile=/home/opt/software/confluent-5.5.0/share/java/ksqldb/org.everit.json.schema-1.12.1.jar
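
To confirm that the jar landed where Maven will look for it, the expected path can be computed from the coordinates. This is a small sketch that assumes the default local repository at ~/.m2/repository; M2_REPO is a hypothetical override variable for setups that configure a different location, and m2_jar_path is an illustrative helper name.

```shell
# Print the path Maven's repository layout uses for a jar:
# <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
m2_jar_path() {
  repo="${M2_REPO:-$HOME/.m2/repository}"
  group_dir="$(printf '%s' "$1" | tr . /)"
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$repo" "$group_dir" "$2" "$3" "$2" "$3"
}

jar="$(m2_jar_path com.github.everit-org.json-schema org.everit.json.schema 1.12.1)"
if [ -f "$jar" ]; then
  echo "installed: $jar"
else
  echo "missing: $jar"
fi
```

Only the groupId has its dots turned into directory separators; the artifactId org.everit.json.schema stays a single directory name.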

Error 5 (at runtime, when saving a DataFrame from PySpark)

25/01/03 00:56:55 WARN  hudi.AutoRecordKeyGenerationUtils$: [Thread-6]: Precombine field ts will be ignored with auto record key generation enabled
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/pyspark/sql/readwriter.py", line 1398, in save
    self._jwrite.save(path)
  File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o104.save.
: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V

Fix: exclude the older Jetty version and add the Jetty version that Hudi specifies

vim /opt/software/hudi-1.0.0/packaging/hudi-spark-bundle/pom.xml

vim /opt/software/hudi-1.0.0/packaging/hudi-utilities-bundle/pom.xml

Add the following configuration inside <dependencies></dependencies> in each of these pom.xml files:

<!-- Add the Jetty version configured by Hudi -->
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>${jetty.version}</version>
    </dependency>

    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <version>${jetty.version}</version>
    </dependency>

    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-webapp</artifactId>
      <version>${jetty.version}</version>
    </dependency>

    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-http</artifactId>
      <version>${jetty.version}</version>
    </dependency>
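
After rebuilding, one way to check the result is to look inside the bundle for the relocated class named in the stack trace and list its setHttpOnly method. This is only a sketch: it assumes unzip and the JDK's javap are on the PATH and that it is run from the Hudi source root, and the helper names class_entry and jar_has_class are hypothetical.

```shell
# Convert a fully qualified class name into its entry path inside a jar.
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Succeed if the jar's file listing contains the class entry (needs unzip).
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_entry "$2")"
}

BUNDLE=packaging/hudi-spark-bundle/target/hudi-spark3.4-bundle_2.12-1.0.0.jar
CLASS=org.apache.hudi.org.apache.jetty.server.session.SessionHandler

if [ -f "$BUNDLE" ] && jar_has_class "$BUNDLE" "$CLASS"; then
  # javap prints the member signatures of the class found in the bundle.
  javap -cp "$BUNDLE" "$CLASS" | grep setHttpOnly \
    || echo "setHttpOnly not found in $CLASS"
fi
```

If the method is listed, the bundle contains a Jetty build that provides it; if not, an older Jetty is probably still being shaded in.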

Success

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 1.0.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [  1.786 s]
[INFO] hudi-tests-common .................................. SUCCESS [  2.230 s]
[INFO] hudi-io ............................................ SUCCESS [  7.334 s]
[INFO] hudi-common ........................................ SUCCESS [ 26.870 s]
[INFO] hudi-hadoop-common ................................. SUCCESS [  9.127 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  6.489 s]
[INFO] hudi-sync-common ................................... SUCCESS [  2.384 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  7.011 s]
[INFO] hudi-aws ........................................... SUCCESS [  4.778 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  2.520 s]
[INFO] hudi-client ........................................ SUCCESS [  0.071 s]
[INFO] hudi-client-common ................................. SUCCESS [ 14.514 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 32.544 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.062 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [ 36.964 s]
[INFO] hudi-spark3-common ................................. SUCCESS [ 14.524 s]
[INFO] hudi-spark3.5.x_2.12 ............................... SUCCESS [ 20.656 s]
[INFO] hudi-java-client ................................... SUCCESS [  5.514 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [01:00 min]
[INFO] hudi-gcp ........................................... SUCCESS [ 32.993 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 53.566 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [02:49 min]
[INFO] hudi-cli ........................................... SUCCESS [ 29.945 s]
[INFO] hudi-flink-client .................................. SUCCESS [ 50.970 s]
[INFO] hudi-datahub-sync .................................. SUCCESS [  9.944 s]
[INFO] hudi-adb-sync ...................................... SUCCESS [  9.229 s]
[INFO] hudi-sync .......................................... SUCCESS [  0.055 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [ 50.699 s]
[INFO] hudi-datahub-sync-bundle ........................... SUCCESS [ 36.165 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [ 30.287 s]
[INFO] hudi-aws-bundle .................................... SUCCESS [ 37.358 s]
[INFO] hudi-gcp-bundle .................................... SUCCESS [ 38.007 s]
[INFO] hudi-spark3.5-bundle_2.12 .......................... SUCCESS [ 48.464 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 30.763 s]
[INFO] hudi-utilities-slim-bundle_2.12 .................... SUCCESS [01:43 min]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [ 36.132 s]
[INFO] hudi-trino-bundle .................................. SUCCESS [01:22 min]
[INFO] hudi-examples ...................................... SUCCESS [  0.058 s]
[INFO] hudi-examples-common ............................... SUCCESS [  1.550 s]
[INFO] hudi-examples-spark ................................ SUCCESS [ 10.365 s]
[INFO] hudi-flink-datasource .............................. SUCCESS [  0.055 s]
[INFO] hudi-flink1.20.x ................................... SUCCESS [ 46.349 s]
[INFO] hudi-flink ......................................... SUCCESS [ 39.761 s]
[INFO] hudi-examples-flink ................................ SUCCESS [  3.662 s]
[INFO] hudi-examples-java ................................. SUCCESS [  2.409 s]
[INFO] hudi-flink1.20-bundle .............................. SUCCESS [ 51.771 s]
[INFO] hudi-examples-k8s .................................. SUCCESS [01:55 min]
[INFO] hudi-flink1.14.x ................................... SUCCESS [ 39.660 s]
[INFO] hudi-flink1.15.x ................................... SUCCESS [ 34.491 s]
[INFO] hudi-flink1.16.x ................................... SUCCESS [ 25.597 s]
[INFO] hudi-flink1.17.x ................................... SUCCESS [ 28.441 s]
[INFO] hudi-flink1.18.x ................................... SUCCESS [ 25.675 s]
[INFO] hudi-flink1.19.x ................................... SUCCESS [ 24.259 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [  9.929 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [01:03 min]
[INFO] hudi-cli-bundle_2.12 ............................... SUCCESS [01:00 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  28:09 min
[INFO] Finished at: 2025-01-01T23:33:07-05:00
[INFO] ------------------------------------------------------------------------

After a successful build, the output jars are in the target directory of each module, for example:

[root@cdp73-1 hudi-1.0.0]# ls -al packaging/hudi-spark-bundle/target/
total 132928
drwxr-xr-x. 8 root root       4096 Jan  2 19:23 .
drwxr-xr-x. 4  501 games        46 Jan  2 19:22 ..
drwxr-xr-x. 4 root root         33 Jan  2 19:22 classes
-rw-r--r--. 1 root root      15319 Jan  2 19:22 dependency-reduced-pom.xml
drwxr-xr-x. 3 root root         25 Jan  2 19:22 generated-sources
-rw-r--r--. 1 root root  109618399 Jan  2 19:23 hudi-spark3.4-bundle_2.12-1.0.0.jar
-rw-r--r--. 1 root root   26437029 Jan  2 19:23 hudi-spark3.4-bundle_2.12-1.0.0-sources.jar
drwxr-xr-x. 2 root root         28 Jan  2 19:22 maven-archiver
drwxr-xr-x. 3 root root         22 Jan  2 19:22 maven-shared-archive-resources
drwxr-xr-x. 3 root root         35 Jan  2 19:22 maven-status
-rw-r--r--. 1 root root      14786 Jan  2 19:22 original-hudi-spark3.4-bundle_2.12-1.0.0.jar
-rw-r--r--. 1 root root      10730 Jan  2 19:22 original-hudi-spark3.4-bundle_2.12-1.0.0-sources.jar
-rw-r--r--. 1 root root         30 Jan  2 19:22 .plxarc
-rw-r--r--. 1 root root        843 Jan  2 19:22 rat.txt
drwxr-xr-x. 3 root root         22 Jan  2 19:22 test-classes
[root@cdp73-1 hudi-1.0.0]#

