HDFS Application: CephFS as Backend Storage, Bidirectional Data Migration Between File Storage and Object Storage
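DistCp (distributed copy) is a tool for large-scale copying within and between clusters. It uses MapReduce for file distribution, error handling and recovery, and reporting. It turns the list of source files and directories into the input of map tasks, and each map task copies a portion of the files in that list.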
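To make "each map task copies a portion of the list" concrete, here is a simplified Python illustration that splits an ordered file listing into byte-balanced groups, one group per map task. It is loosely modeled on DistCp's default uniform-size strategy (maps balanced on the total bytes each one copies), but it is not DistCp's source code; the function name split_listing and the sample paths are hypothetical.

```python
from typing import List, Tuple


def split_listing(listing: List[Tuple[str, int]], num_maps: int) -> List[List[str]]:
    """Split an ordered (path, size_in_bytes) listing into at most num_maps
    contiguous groups with roughly equal byte totals (simplified illustration)."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    total = sum(size for _, size in listing)
    target = total / num_maps  # ideal number of bytes per map task
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in listing:
        # Start a new group when this file would push the current one past the
        # target, unless the last allowed group is already open.
        if current and current_bytes + size > target and len(splits) < num_maps - 1:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)
    return splits
```

For example, split_listing([("/tmp/a", 100), ("/tmp/b", 100), ("/tmp/c", 200)], 2) returns [["/tmp/a", "/tmp/b"], ["/tmp/c"]]: each group carries 200 bytes, so both map tasks get a similar amount of work even though they copy different numbers of files.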
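Configure /usr/local/hadoop/etc/hadoop/core-site.xml by adding the following properties inside its <configuration> element. The first group sets CephFS as Hadoop's default file system; the second group configures the S3A connector to reach the S3-compatible object storage endpoint.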
<!-- CephFS file system -->
<property>
<!-- fs.defaultFS replaces the deprecated name fs.default.name -->
<name>fs.defaultFS</name>
<value>ceph://192.168.252.12:6789/</value>
</property>
<property>
<name>fs.ceph.impl</name>
<value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>
<property>
<name>ceph.mon.address</name>
<value>192.168.252.12:6789</value>
</property>
<property>
<name>ceph.auth.id</name>
<value>admin</value>
</property>
<property>
<name>ceph.conf.file</name>
<value>/etc/ceph/ceph.conf</value>
</property>
<property>
<name>ceph.auth.keyfile</name>
<value>/etc/ceph/ceph.client.admin.keyring</value>
</property>
<!-- S3A connector (S3-compatible object storage) -->
<property>
<name>fs.s3a.access.key</name>
<value>CNGYG74H0F3QRSV4NGE4</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>ospPWGpPkVfnjIpgFKatQhkezsORmRQ95XF3597D</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://192.168.252.12:7480</value>
</property>
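Before running any commands, it can help to confirm that core-site.xml actually contains these settings. The sketch below uses Python's standard xml.etree.ElementTree to read the <property> entries and report which of the keys shown above are missing. The helper names and the REQUIRED_KEYS checklist are assumptions of this sketch, and it only checks that each key is present and non-empty, not that its value is correct.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Keys taken from the configuration above; this checklist is an assumption of the sketch.
REQUIRED_KEYS = [
    "fs.ceph.impl", "ceph.mon.address", "ceph.auth.id",
    "ceph.conf.file", "ceph.auth.keyfile",
    "fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.endpoint",
]


def parse_hadoop_properties(xml_text: Union[str, bytes]) -> Dict[str, str]:
    """Return {name: value} for every <property> element.
    XInclude and ${var} substitutions are not resolved."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_keys(props: Dict[str, str]) -> List[str]:
    """List the required keys that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not props.get(key)]
    # The default file system may be set under the current name or its deprecated alias.
    if not (props.get("fs.defaultFS") or props.get("fs.default.name")):
        missing.append("fs.defaultFS")
    return missing
```

To check the real file, read it in binary mode so the XML declaration's encoding is honored, e.g. missing_keys(parse_hadoop_properties(open("/usr/local/hadoop/etc/hadoop/core-site.xml", "rb").read())); an empty list means every key in the checklist is set.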
Commands
List the root directory of the file storage (CephFS):
[root@3819a71f55f8 /]# hadoop fs -ls /
Loading libcephfs-jni from default path: /usr/local/hadoop/lib/native
Loading libcephfs-jni: Success!
Found 7 items
drw-r--r-- - root 0 2025-01-21 03:16 /benchmarks
drwxrwxrwx - root 1652 2025-01-20 13:28 /hdfs
drwxrwxrwx - root 9 2025-01-13 08:12 /http
drwxr-xr-x - root 4460 2025-01-17 08:34 /nfs
drwxrw-r-x - root 200962 2025-01-21 04:43 /tmp
drwxrwxrw- - root 0 2025-01-08 09:28 /user
drwxr--r-x - root 0 2025-01-20 06:44 /web
List the contents of the object storage bucket "new":
[root@3819a71f55f8 /]# hadoop fs -ls s3a://new/
2025-01-21 05:06:03,713 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2025-01-21 05:06:03,816 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2025-01-21 05:06:03,816 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
Found 2 items
drwxrwxrwx - root root 0 2025-01-21 05:06 s3a://new/tmp
-rw-rw-rw- 1 root root 195792 2025-01-20 14:07 s3a://new/vmware.log
2025-01-21 05:06:05,516 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2025-01-21 05:06:05,516 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2025-01-21 05:06:05,516 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete
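The two listings use slightly different column layouts: the CephFS listing has no group column, while the S3A listing shows both owner and group. If you want to compare listings before and after a migration, a minimal Python sketch like the one below can parse either form by reading size, date, time, and path from the right-hand end of each line, skipping log lines and the "Found N items" header. The function names are hypothetical, and paths containing spaces are not handled.

```python
from typing import Dict, List, Optional, Union

Entry = Dict[str, Union[bool, int, str]]


def parse_ls_line(line: str) -> Optional[Entry]:
    """Parse one entry line of `hadoop fs -ls` output, or return None for
    any other line (log messages, "Found N items", blank lines)."""
    fields = line.split()
    # Entry lines begin with a permission string such as drwxr-xr-x or -rw-rw-rw-
    # (an optional trailing "+" marks an ACL).
    if len(fields) < 6 or fields[0][:1] not in ("d", "-") or len(fields[0]) not in (10, 11):
        return None
    # Read from the right so layouts with or without a group column both work.
    size_text, date, time, path = fields[-4], fields[-3], fields[-2], fields[-1]
    if not size_text.isdigit():
        return None
    return {
        "is_dir": fields[0].startswith("d"),
        "permissions": fields[0],
        "size": int(size_text),
        "modified": f"{date} {time}",
        "path": path,
    }


def parse_ls(output: str) -> List[Entry]:
    """Return the parsed entries of a whole `hadoop fs -ls` output."""
    entries = []
    for line in output.splitlines():
        entry = parse_ls_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

For example, sum(e["size"] for e in parse_ls(text) if not e["is_dir"]) totals the file bytes in one listing, which gives a quick sanity check to run on the source and the target after a copy.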
Migrate data from object storage to file storage:
hadoop distcp s3a://new/ /tmp/
Migrate data from file storage to object storage:
hadoop distcp /tmp s3a://new/
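Both directions are the same DistCp invocation with source and target swapped, so they can be wrapped in one helper. The Python sketch below only assembles the argument list and runs it with subprocess; -update and -m are documented DistCp options, while the function names distcp_command and run_distcp are hypothetical. Be aware that with -update, DistCp copies the contents of the source directory into the target rather than the directory itself, so the resulting paths can differ from the plain commands above.

```python
import shlex
import subprocess
from typing import List, Optional


def distcp_command(src: str, dst: str, update: bool = False,
                   num_maps: Optional[int] = None) -> List[str]:
    """Assemble a `hadoop distcp` argument list (hypothetical helper)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing at the target or differ from it.
        cmd.append("-update")
    if num_maps is not None:
        if num_maps < 1:
            raise ValueError("num_maps must be at least 1")
        # Upper limit on the number of simultaneous copy (map) tasks.
        cmd += ["-m", str(num_maps)]
    cmd += [src, dst]
    return cmd


def run_distcp(src: str, dst: str, update: bool = False,
               num_maps: Optional[int] = None) -> int:
    """Run the command and return its exit status (0 means success)."""
    cmd = distcp_command(src, dst, update=update, num_maps=num_maps)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

For instance, run_distcp("s3a://new/", "/tmp/") mirrors the first command above. For a repeatable re-sync of /tmp that should keep the same layout as the plain /tmp to s3a://new/ copy, pointing -update at the subdirectory, as in run_distcp("/tmp", "s3a://new/tmp", update=True), is the safer choice.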