CDP Integration with Hudi in Practice: Build and Deployment
[0] About This Article
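Hudi 1.0.0 is a major milestone release. It improves the data format, performance, and support for concurrent writes, and it introduces more flexible management of indexes and file formats, which makes Hudi more scalable and easier to use. This article describes how to build and deploy Hudi 1.0.0 in a CDP 7.3.1 environment.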
| Software | Version |
| --- | --- |
| Hudi | 1.0.0 |
| Hadoop | 3.1.1.7.3.1.0-197 |
| Hive | 3.1.3000.7.3.1.0-197 |
| Spark | 3.4.1.7.3.1.0-197 |
[1] Building Hudi
1 - Install Maven
[root@cdp73-1 software]# ls -al
total 45872
drwxr-xr-x. 3 root root 87 Jan 1 04:08 .
drwxr-xr-x. 4 root root 38 Jan 1 01:53 ..
-rw-r--r--. 1 root root 9102945 Jan 1 04:09 apache-maven-3.9.9-bin.tar.gz
-rwx------. 1 root root 37863424 Jan 1 02:15 hudi-1.0.0.src.tar
[root@cdp73-1 software]# tar -xf apache-maven-3.9.9-bin.tar.gz
[root@cdp73-1 software]# mv apache-maven-3.9.9 maven
[root@cdp73-1 software]# export MAVEN_HOME=/opt/software/maven
[root@cdp73-1 software]# export PATH=$PATH:$MAVEN_HOME/bin
[root@cdp73-1 software]# mvn -v
Apache Maven 3.9.9 (8e8579a9e76f7d015ee5ec7bfcdc97d260186937)
Maven home: /opt/software/maven
Java version: 1.8.0_372, vendor: Temurin, runtime: /usr/java/jdk1.8u372-b07-cloudera/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.18.0-553.el8_10.x86_64", arch: "amd64", family: "unix"
[root@cdp73-1 software]#
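The two `export` commands above only affect the current shell. Here is a minimal sketch for persisting them so that new login shells also find `mvn`. It assumes Maven stays at /opt/software/maven and that a file under /etc/profile.d (the name maven.sh is hypothetical) is acceptable on this host.

```shell
#!/usr/bin/env bash
# Sketch: persist MAVEN_HOME and PATH for future login shells.
# Assumptions: Maven lives in /opt/software/maven; /etc/profile.d/maven.sh
# is a hypothetical file name chosen for this example.
write_maven_profile() {
  local maven_home="$1" profile_file="$2"
  # Unquoted heredoc: $maven_home is expanded now, while \$PATH and
  # \$MAVEN_HOME stay literal so they are evaluated at login time.
  cat > "$profile_file" <<EOF
export MAVEN_HOME=$maven_home
export PATH=\$PATH:\$MAVEN_HOME/bin
EOF
}

# Only write the system-wide profile when running as root and Maven exists.
if [ "$(id -u)" -eq 0 ] && [ -d /opt/software/maven ]; then
  write_maven_profile /opt/software/maven /etc/profile.d/maven.sh
fi
```

Afterwards, `source /etc/profile.d/maven.sh && mvn -v` should print the same version information as shown above.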
2 - Extract the source
[root@cdp73-1 software]# tar -xf hudi-1.0.0.src.tar
[root@cdp73-1 software]# cd hudi-1.0.0/
[root@cdp73-1 hudi-1.0.0]#
3 - Build
[root@cdp73-1 hudi-1.0.0]# mvn clean package -DskipTests -Dspark3.4 -Dflink1.14 -Dscala-2.12 -Dhadoop.version=3.1.1 -Pflink-bundle-shade-hive3
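A full build takes close to half an hour, and the first real failure is often far above the end of the output. Below is a minimal sketch that reruns the same build command while keeping a copy of the output and then lists the first `[ERROR]` lines. The log file name `hudi-build.log` and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the build from step 3 with its output saved to a log file.
# BUILD_LOG, run_build and first_errors are hypothetical names.
BUILD_LOG=${BUILD_LOG:-hudi-build.log}

# Print the first N "[ERROR]" lines of a Maven log, prefixed by line number.
first_errors() {
  local log="$1" n="${2:-20}"
  grep -n '^\[ERROR\]' "$log" | head -n "$n"
}

# The body runs in a subshell, so pipefail (report mvn's exit status
# instead of tee's) does not leak into the calling shell.
run_build() (
  set -o pipefail
  mvn clean package -DskipTests -Dspark3.4 -Dflink1.14 -Dscala-2.12 \
      -Dhadoop.version=3.1.1 -Pflink-bundle-shade-hive3 2>&1 | tee "$BUILD_LOG"
)

# Usage, inside the hudi-1.0.0 source directory:
#   run_build || first_errors "$BUILD_LOG"
```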
Error 1: hudi-hive-sync fails to compile its test sources
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile (default-testCompile) on project hudi-hive-sync: Compilation failure
[ERROR] /home/opt/software/hudi-1.0.0/hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java:[250,17] method shutdown in class org.apache.zookeeper.server.ZooKeeperServer cannot be applied to given types;
[ERROR] required: no arguments
[ERROR] found: boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR]
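Fix: in hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java, change line 250 from zkServer.shutdown(true); to zkServer.shutdown();. As the error message shows, the ZooKeeperServer class on the build classpath only provides a no-argument shutdown().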
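You can apply the same edit non-interactively with `sed`. Here is a minimal sketch, run from the hudi-1.0.0 source root. It rewrites every `zkServer.shutdown(true);` call in the file, not only the one on line 250, so it still works if the line number shifts. It assumes GNU sed (as on this EL8 host) for `-i`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: replace the boolean overload with the no-argument shutdown().
# patch_zk_shutdown is a hypothetical helper name; GNU sed is assumed.
patch_zk_shutdown() {
  local file="$1"
  # In a sed basic regex "(" and ")" are literal; "\." matches a dot.
  sed -i 's/zkServer\.shutdown(true);/zkServer.shutdown();/' "$file"
  # Show the affected line(s) so the change can be checked.
  grep -n 'zkServer\.shutdown' "$file"
}

HIVE_TEST_UTIL=hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java
# Only patch when run from the source root.
if [ -f "$HIVE_TEST_UTIL" ]; then
  patch_zk_shutdown "$HIVE_TEST_UTIL"
fi
```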
Error 2: the enforce-logging rule rejects a banned logback dependency
[INFO] --- enforcer:3.0.0-M1:enforce (enforce-logging) @ hudi-spark-client ---
[WARNING] The artifact org.slf4j:slf4j-log4j12:jar:2.0.7 has been relocated to org.slf4j:slf4j-reload4j:jar:2.0.7
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message:
Found Banned Dependency: ch.qos.logback:logback-classic:jar:1.2.10
Use 'mvn dependency:tree' to locate the source of the banned dependencies.
Fix: remove <exclude>ch.qos.logback:logback-classic</exclude> from the banned-dependency list of the enforce-logging rule (maven-enforcer-plugin) in pom.xml.
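As the enforcer message suggests, `mvn dependency:tree -Dincludes=ch.qos.logback:logback-classic` shows which dependency brings logback-classic in. Below is a minimal sketch of the edit itself. It first shows the matching line and then deletes it, keeping a `pom.xml.bak` backup. It assumes the `<exclude>` entry sits on its own line in pom.xml and that GNU sed is available. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop logback-classic from the enforcer's banned list.
# Assumes the <exclude> element is on its own line; GNU sed for -i.bak.
show_ban() {
  # Print the matching line(s) with line numbers before editing.
  grep -n '<exclude>ch\.qos\.logback:logback-classic</exclude>' "$1"
}

remove_ban() {
  # Delete the whole line holding the ban entry; the original is kept as <file>.bak.
  sed -i.bak '/<exclude>ch\.qos\.logback:logback-classic<\/exclude>/d' "$1"
}

# Run from the hudi-1.0.0 source root; do nothing if the entry is absent.
if [ -f pom.xml ] && show_ban pom.xml; then
  remove_ban pom.xml
fi
```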
Error 3: the io.confluent dependencies of hudi-utilities cannot be resolved
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:1.0.0
[ERROR] dependency: io.confluent:kafka-avro-serializer:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:kafka-avro-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:common-config:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:common-config:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:common-utils:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:common-utils:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-schema-registry-client:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:kafka-schema-registry-client:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-protobuf-serializer:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:kafka-protobuf-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR] dependency: io.confluent:kafka-json-schema-serializer:jar:5.5.0 (compile)
[ERROR] Could not find artifact io.confluent:kafka-json-schema-serializer:jar:5.5.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR]
[ERROR] -> [Help 1]
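Fix: the aliyun mirror does not provide these io.confluent artifacts, so download the Confluent 5.5.0 package and install the jars into the local Maven repository manually: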
[root@cdp73-1 software]# wget http://packages.confluent.io/archive/5.5/confluent-5.5.0-2.12.zip
[root@cdp73-1 software]# unzip confluent-5.5.0-2.12.zip
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/confluent-common/common-config-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/confluent-common/common-utils-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-avro-serializer-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-schema-registry-client-5.5.0.jar
[root@cdp73-1 software]# mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-json-schema-serializer -Dversion=5.5.0 -Dpackaging=jar -Dfile=./confluent-5.5.0/share/java/kafka-rest/kafka-json-schema-serializer-5.5.0.jar
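The error lists six artifacts. The commands above install five of them; kafka-protobuf-serializer is missing, and each jar sits in a different sub-directory of share/java. Below is a minimal sketch that locates each jar with `find` instead of hard-coding its path and installs all six. It assumes the package was unpacked to ./confluent-5.5.0, which can be overridden with the hypothetical `CONFLUENT_HOME` variable. If a jar is not found, the script reports it rather than guessing a path.

```shell
#!/usr/bin/env bash
# Sketch: install the io.confluent 5.5.0 jars named in Error 3.
# CONFLUENT_HOME and the helper names are hypothetical.
CONFLUENT_HOME=${CONFLUENT_HOME:-./confluent-5.5.0}
CONFLUENT_VERSION=5.5.0
CONFLUENT_ARTIFACTS="common-config common-utils kafka-avro-serializer
kafka-schema-registry-client kafka-protobuf-serializer kafka-json-schema-serializer"

# Print the first <artifact>-<version>.jar found under a directory, if any.
find_jar() {
  find "$1" -type f -name "$2-$3.jar" 2>/dev/null | head -n 1
}

# Install every artifact that can be found; return 1 if any were missing.
install_confluent_jars() {
  local artifact jar missing=0
  for artifact in $CONFLUENT_ARTIFACTS; do
    jar=$(find_jar "$CONFLUENT_HOME/share/java" "$artifact" "$CONFLUENT_VERSION")
    if [ -z "$jar" ]; then
      echo "missing: $artifact-$CONFLUENT_VERSION.jar" >&2
      missing=1
      continue
    fi
    mvn -q install:install-file -DgroupId=io.confluent -DartifactId="$artifact" \
        -Dversion="$CONFLUENT_VERSION" -Dpackaging=jar -Dfile="$jar" || return 1
  done
  return "$missing"
}

if [ -d "$CONFLUENT_HOME/share/java" ]; then
  install_confluent_jars
fi
```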
Error 4: the everit JSON Schema dependency cannot be resolved
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:1.0.0
[ERROR] dependency: com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 (compile)
[ERROR] Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in Mulesoft-Repository (https://repository.mulesoft.org/nexus/content/repositories/public/)
[ERROR] Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in confluent (http://packages.confluent.io/maven/)
[ERROR] Could not find artifact com.github.everit-org.json-schema:org.everit.json.schema:jar:1.12.1 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public)
[ERROR]
[ERROR] -> [Help 1]
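Fix: install the jar manually. It is included in the ksqldb directory of the Confluent package downloaded above: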
mvn install:install-file -DgroupId=com.github.everit-org.json-schema -DartifactId=org.everit.json.schema -Dversion=1.12.1 -Dpackaging=jar -Dfile=/home/opt/software/confluent-5.5.0/share/java/ksqldb/org.everit.json.schema-1.12.1.jar
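Before rerunning the build, you can confirm that a manual `install:install-file` actually landed in the local repository. Here is a minimal sketch, assuming the default local repository at ~/.m2/repository (overridable through the hypothetical `M2_REPO` variable). It maps the coordinates to the standard repository layout, where the dots in the groupId become directories but the artifactId stays intact.

```shell
#!/usr/bin/env bash
# Sketch: check that groupId:artifactId:version exists in the local repo.
# M2_REPO, m2_jar_path and check_installed are hypothetical names.
M2_REPO=${M2_REPO:-$HOME/.m2/repository}

# Standard layout: <repo>/<groupId with / for .>/<artifactId>/<version>/<artifactId>-<version>.jar
m2_jar_path() {
  local group_dir
  group_dir=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$M2_REPO" "$group_dir" "$2" "$3" "$2" "$3"
}

check_installed() {
  local path
  path=$(m2_jar_path "$1" "$2" "$3")
  if [ -f "$path" ]; then
    echo "ok: $path"
  else
    echo "missing: $path" >&2
    return 1
  fi
}

# Usage:
#   check_installed com.github.everit-org.json-schema org.everit.json.schema 1.12.1
#   check_installed io.confluent kafka-avro-serializer 5.5.0
```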
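Error 5: at runtime, writing a Hudi table from PySpark with the built bundle fails with a Jetty NoSuchMethodError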
25/01/03 00:56:55 WARN hudi.AutoRecordKeyGenerationUtils$: [Thread-6]: Precombine field ts will be ignored with auto record key generation enabled
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/pyspark/sql/readwriter.py", line 1398, in save
self._jwrite.save(path)
File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o104.save.
: java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
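Fix: exclude the older Jetty version and explicitly add Jetty at the version Hudi specifies (${jetty.version}). Edit both bundle poms: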
vim /opt/software/hudi-1.0.0/packaging/hudi-spark-bundle/pom.xml
vim /opt/software/hudi-1.0.0/packaging/hudi-utilities-bundle/pom.xml
Add the following to the <dependencies></dependencies> section of each pom.xml:
<!-- Add Jetty at the version configured by Hudi -->
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>${jetty.version}</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
<version>${jetty.version}</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
<version>${jetty.version}</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-http</artifactId>
<version>${jetty.version}</version>
</dependency>
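After rebuilding, a quick sanity check is to confirm that the bundle contains the relocated class from the stack trace and that the class declares `setHttpOnly`. Here is a minimal sketch that assumes `unzip` is installed and uses `javap` from the JDK when it is available. A passing check only shows that the rebuilt jar is correct. It does not rule out another jar on the Spark classpath shipping an older copy of the same relocated class.

```shell
#!/usr/bin/env bash
# Sketch: inspect the rebuilt Spark bundle for the relocated Jetty class.
# has_entry/has_method are hypothetical helper names.
SESSION_HANDLER=org/apache/hudi/org/apache/jetty/server/session/SessionHandler.class
BUNDLE=packaging/hudi-spark-bundle/target/hudi-spark3.4-bundle_2.12-1.0.0.jar

# True if the archive contains exactly this entry path.
has_entry() {
  unzip -Z1 "$1" 2>/dev/null | grep -qxF "$2"
}

# True if javap lists a member whose signature mentions the given name.
has_method() {
  javap -cp "$1" "$2" 2>/dev/null | grep -q "$3"
}

# Run from the hudi-1.0.0 source root after the rebuild.
if [ -f "$BUNDLE" ]; then
  has_entry "$BUNDLE" "$SESSION_HANDLER" && echo "relocated SessionHandler present"
  if command -v javap >/dev/null; then
    has_method "$BUNDLE" org.apache.hudi.org.apache.jetty.server.session.SessionHandler setHttpOnly \
      && echo "setHttpOnly found"
  fi
fi
```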
The build succeeds:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 1.0.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [ 1.786 s]
[INFO] hudi-tests-common .................................. SUCCESS [ 2.230 s]
[INFO] hudi-io ............................................ SUCCESS [ 7.334 s]
[INFO] hudi-common ........................................ SUCCESS [ 26.870 s]
[INFO] hudi-hadoop-common ................................. SUCCESS [ 9.127 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [ 6.489 s]
[INFO] hudi-sync-common ................................... SUCCESS [ 2.384 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [ 7.011 s]
[INFO] hudi-aws ........................................... SUCCESS [ 4.778 s]
[INFO] hudi-timeline-service .............................. SUCCESS [ 2.520 s]
[INFO] hudi-client ........................................ SUCCESS [ 0.071 s]
[INFO] hudi-client-common ................................. SUCCESS [ 14.514 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 32.544 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [ 0.062 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [ 36.964 s]
[INFO] hudi-spark3-common ................................. SUCCESS [ 14.524 s]
[INFO] hudi-spark3.5.x_2.12 ............................... SUCCESS [ 20.656 s]
[INFO] hudi-java-client ................................... SUCCESS [ 5.514 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [01:00 min]
[INFO] hudi-gcp ........................................... SUCCESS [ 32.993 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 53.566 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [02:49 min]
[INFO] hudi-cli ........................................... SUCCESS [ 29.945 s]
[INFO] hudi-flink-client .................................. SUCCESS [ 50.970 s]
[INFO] hudi-datahub-sync .................................. SUCCESS [ 9.944 s]
[INFO] hudi-adb-sync ...................................... SUCCESS [ 9.229 s]
[INFO] hudi-sync .......................................... SUCCESS [ 0.055 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [ 50.699 s]
[INFO] hudi-datahub-sync-bundle ........................... SUCCESS [ 36.165 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [ 30.287 s]
[INFO] hudi-aws-bundle .................................... SUCCESS [ 37.358 s]
[INFO] hudi-gcp-bundle .................................... SUCCESS [ 38.007 s]
[INFO] hudi-spark3.5-bundle_2.12 .......................... SUCCESS [ 48.464 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 30.763 s]
[INFO] hudi-utilities-slim-bundle_2.12 .................... SUCCESS [01:43 min]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [ 36.132 s]
[INFO] hudi-trino-bundle .................................. SUCCESS [01:22 min]
[INFO] hudi-examples ...................................... SUCCESS [ 0.058 s]
[INFO] hudi-examples-common ............................... SUCCESS [ 1.550 s]
[INFO] hudi-examples-spark ................................ SUCCESS [ 10.365 s]
[INFO] hudi-flink-datasource .............................. SUCCESS [ 0.055 s]
[INFO] hudi-flink1.20.x ................................... SUCCESS [ 46.349 s]
[INFO] hudi-flink ......................................... SUCCESS [ 39.761 s]
[INFO] hudi-examples-flink ................................ SUCCESS [ 3.662 s]
[INFO] hudi-examples-java ................................. SUCCESS [ 2.409 s]
[INFO] hudi-flink1.20-bundle .............................. SUCCESS [ 51.771 s]
[INFO] hudi-examples-k8s .................................. SUCCESS [01:55 min]
[INFO] hudi-flink1.14.x ................................... SUCCESS [ 39.660 s]
[INFO] hudi-flink1.15.x ................................... SUCCESS [ 34.491 s]
[INFO] hudi-flink1.16.x ................................... SUCCESS [ 25.597 s]
[INFO] hudi-flink1.17.x ................................... SUCCESS [ 28.441 s]
[INFO] hudi-flink1.18.x ................................... SUCCESS [ 25.675 s]
[INFO] hudi-flink1.19.x ................................... SUCCESS [ 24.259 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [ 9.929 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [01:03 min]
[INFO] hudi-cli-bundle_2.12 ............................... SUCCESS [01:00 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 28:09 min
[INFO] Finished at: 2025-01-01T23:33:07-05:00
[INFO] ------------------------------------------------------------------------
After a successful build, each module's artifacts are in its target directory. For example, the Spark bundle:
[root@cdp73-1 hudi-1.0.0]# ls -al packaging/hudi-spark-bundle/target/
total 132928
drwxr-xr-x. 8 root root 4096 Jan 2 19:23 .
drwxr-xr-x. 4 501 games 46 Jan 2 19:22 ..
drwxr-xr-x. 4 root root 33 Jan 2 19:22 classes
-rw-r--r--. 1 root root 15319 Jan 2 19:22 dependency-reduced-pom.xml
drwxr-xr-x. 3 root root 25 Jan 2 19:22 generated-sources
-rw-r--r--. 1 root root 109618399 Jan 2 19:23 hudi-spark3.4-bundle_2.12-1.0.0.jar
-rw-r--r--. 1 root root 26437029 Jan 2 19:23 hudi-spark3.4-bundle_2.12-1.0.0-sources.jar
drwxr-xr-x. 2 root root 28 Jan 2 19:22 maven-archiver
drwxr-xr-x. 3 root root 22 Jan 2 19:22 maven-shared-archive-resources
drwxr-xr-x. 3 root root 35 Jan 2 19:22 maven-status
-rw-r--r--. 1 root root 14786 Jan 2 19:22 original-hudi-spark3.4-bundle_2.12-1.0.0.jar
-rw-r--r--. 1 root root 10730 Jan 2 19:22 original-hudi-spark3.4-bundle_2.12-1.0.0-sources.jar
-rw-r--r--. 1 root root 30 Jan 2 19:22 .plxarc
-rw-r--r--. 1 root root 843 Jan 2 19:22 rat.txt
drwxr-xr-x. 3 root root 22 Jan 2 19:22 test-classes
[root@cdp73-1 hudi-1.0.0]#
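Each packaging module's target directory also contains `original-*` jars (the pre-shade copies) and `*-sources.jar` files. Those are not needed for deployment. Below is a minimal sketch that gathers only the shaded bundle jars from every `packaging/*/target` directory into one folder. The destination directory and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy the shaded bundle jars out of packaging/*/target.
# collect_bundles and the destination directory are hypothetical.
collect_bundles() {
  local src_root="$1" dist_dir="$2" jar name
  mkdir -p "$dist_dir"
  for jar in "$src_root"/packaging/*/target/*.jar; do
    [ -e "$jar" ] || continue            # the glob matched nothing
    name=$(basename "$jar")
    case "$name" in
      original-*|*-sources.jar) continue ;;  # skip pre-shade and source jars
    esac
    cp "$jar" "$dist_dir/"
    echo "$name"
  done
}

# Usage, from the directory that contains hudi-1.0.0:
#   collect_bundles hudi-1.0.0 /opt/software/hudi-dist
```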