Hadoop Cluster High Availability (HA), Part 2: Setting Up ResourceManager High Availability
Step 1: Check mapred-site.xml. It contains only the YARN setting and the history server settings, so it needs no changes.
Step 2: Edit yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/installs/hadoop/etc/hadoop:/opt/installs/hadoop/share/hadoop/common/lib/*:/opt/installs/hadoop/share/hadoop/common/*:/opt/installs/hadoop/share/hadoop/hdfs:/opt/installs/hadoop/share/hadoop/hdfs/lib/*:/opt/installs/hadoop/share/hadoop/hdfs/*:/opt/installs/hadoop/share/hadoop/mapreduce/*:/opt/installs/hadoop/share/hadoop/yarn:/opt/installs/hadoop/share/hadoop/yarn/lib/*:/opt/installs/hadoop/share/hadoop/yarn/*</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long aggregated logs are kept on HDFS, in seconds -->
<!-- The default is -1, which keeps them forever -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://bigdata01:19888/jobhistory/logs</value>
</property>
<!-- ResourceManager HA configuration -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- RM cluster ID -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<!-- List of logical RM IDs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of RM1 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>bigdata01</value>
</property>
<!-- Web UI address of RM1 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>bigdata01:8088</value>
</property>
<!-- Hostname of RM2 -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>bigdata02</value>
</property>
<!-- Web UI address of RM2 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>bigdata02:8088</value>
</property>
<!-- Addresses of the ZooKeeper ensemble -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
</property>
<!-- Enable RM state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- State store class that persists RM state in ZooKeeper for failover -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Disable the virtual memory check -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
Leave the existing YARN and log service settings untouched; only the HA-related properties are added.
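To double-check the edit, the HA values can be read back from the file. This is a minimal sketch, assuming every <name> and <value> element sits on its own line as in the listing above. yarn_site_prop is a hypothetical helper name, not something Hadoop ships.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumption: <name> and <value> are each on their own line. This is a
# line-oriented sketch, not a real XML parser.
yarn_site_prop() {
  local file="$1" name="$2"
  awk -v want="<name>${name}</name>" '
    index($0, want) { found = 1; next }   # remember that the name matched
    found && /<value>/ {                  # the next <value> line belongs to it
      sub(/^.*<value>/, "")
      sub(/<\/value>.*$/, "")
      print
      exit
    }
  ' "$file"
}
```

For the configuration above, yarn_site_prop /opt/installs/hadoop/etc/hadoop/yarn-site.xml yarn.resourcemanager.ha.rm-ids should print rm1,rm2.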
Step 3: Sync the configuration to the other nodes
[root@bigdata01 ~]# cd /opt/installs/hadoop/etc/hadoop/
[root@bigdata01 hadoop]# xsync.sh yarn-site.xml
Only yarn-site.xml needs to be synced, because mapred-site.xml was not modified.
Step 4: Start YARN
start-yarn.sh
How can you tell which ResourceManager is active and which is standby?
yarn rmadmin -getAllServiceState
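When this check is scripted, the active RM can be picked out of the command's output. A small sketch, assuming the output has the "host:port state" shape shown later in this post; active_rm is a hypothetical helper name.

```shell
# Read `yarn rmadmin -getAllServiceState` output on stdin and print the
# host:port of every RM reported as active. Lines that are not exactly
# "<host:port> <state>" (log lines, connection errors) are ignored.
# The exit status is non-zero when no RM is active, which is the
# "both standby" symptom.
active_rm() {
  awk 'NF == 2 && $2 == "active" { print $1; n++ } END { exit (n ? 0 : 1) }'
}

# Usage on a cluster node:
#   yarn rmadmin -getAllServiceState 2>/dev/null | active_rm
```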
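If both ResourceManagers come up but both stay in standby, the likely cause is a version mismatch between Hadoop and ZooKeeper:
Hadoop 3.1.6 needs ZooKeeper 3.4.10.
Hadoop 3.3.1 needs ZooKeeper 3.6.4; other combinations can cause problems.
With a mismatch, NameNode high availability works fine, but ResourceManager high availability does not.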
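To see which ZooKeeper version a server is actually running, one option is the srvr four-letter command, whose reply starts with a "Zookeeper version:" line. This is a sketch, assuming nc is installed and the srvr command is enabled on the server; zk_version is a hypothetical helper name.

```shell
# Read the reply of ZooKeeper's `srvr` command on stdin and print only the
# numeric version (for example 3.4.10) from its "Zookeeper version:" line.
zk_version() {
  sed -n 's/^Zookeeper version: \([0-9][0-9.]*\).*/\1/p'
}

# Usage (assumes nc is available and port 2181 is reachable):
#   echo srvr | nc bigdata01 2181 | zk_version
```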
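Solution
Switch to a ZooKeeper cluster of the matching version.
First stop the existing ZooKeeper cluster.
Download the ZooKeeper release that matches your Hadoop version and extract it into /opt/installs.
Rename the old zookeeper directory on every node: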
xcall.sh mv /opt/installs/zookeeper /opt/installs/zookeeper-tmp
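Rename the newly extracted ZooKeeper directory to zookeeper, then distribute it:
xsync.sh zookeeper
Copy the old zoo.cfg into the new installation on every node:
xcall.sh cp /opt/installs/zookeeper-tmp/conf/zoo.cfg /opt/installs/zookeeper/conf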
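Next, create the data directory on all three servers:
xcall.sh mkdir /opt/installs/zookeeper/zkData
Then copy each machine's own myid into its new zkData directory:
xcall.sh cp /opt/installs/zookeeper-tmp/zkData/myid /opt/installs/zookeeper/zkData
Start the ZooKeeper cluster:
zk.sh start
Once it is up, remember to initialize the HDFS HA state again, because the new ZooKeeper data directory starts out empty:
hdfs zkfc -formatZK
Then run start-all.sh.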
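Before starting the new ZooKeeper in the steps above, it can be worth checking on each node that the copied files really are in place. A minimal sketch, assuming the directory layout used in this post (conf/zoo.cfg and zkData/myid); check_zk_dir is a hypothetical helper name.

```shell
# Verify that a ZooKeeper install directory has the files copied above:
# conf/zoo.cfg must exist, and zkData/myid must exist and hold a number.
check_zk_dir() {
  local dir="$1" myid
  [ -f "$dir/conf/zoo.cfg" ] || { echo "missing $dir/conf/zoo.cfg" >&2; return 1; }
  [ -f "$dir/zkData/myid" ] || { echo "missing $dir/zkData/myid" >&2; return 1; }
  myid=$(tr -d '[:space:]' < "$dir/zkData/myid")
  case "$myid" in
    ''|*[!0-9]*) echo "myid is not a number: '$myid'" >&2; return 1 ;;
  esac
  echo "ok: $dir (myid=$myid)"
}

# Usage on each node:
#   check_zk_dir /opt/installs/zookeeper
```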
Test ResourceManager failover:
[root@bigdata01 installs]# yarn rmadmin -getAllServiceState
bigdata01:8033 active
bigdata02:8033 standby
Stop the RM on bigdata01:
yarn --daemon stop resourcemanager
Then check again:
[root@bigdata01 installs]# yarn rmadmin -getAllServiceState
2023-08-23 14:40:15,547 INFO ipc.Client: Retrying connect to server: bigdata01/192.168.233.128:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
bigdata01:8033 Failed to connect: Call From bigdata01/192.168.233.128 to bigdata01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
bigdata02:8033 active
Start the RM on bigdata01 again:
yarn --daemon start resourcemanager
Then check once more:
[root@bigdata01 installs]# yarn rmadmin -getAllServiceState
bigdata01:8033 standby
bigdata02:8033 active
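The manual failover test above can also be scripted by polling until some RM reports active. This is a sketch under the same output-format assumption as before; wait_for_active is a hypothetical helper, and the retry count is an arbitrary choice.

```shell
# wait_for_active TRIES CMD...
# Run CMD up to TRIES times, one second apart, until its output contains
# a line of the form "<host:port> active". Prints that host:port and
# returns 0 on success; returns 1 if no RM became active in time.
wait_for_active() {
  local tries="$1" i=0 out
  shift
  while [ "$i" -lt "$tries" ]; do
    out=$("$@" 2>/dev/null | awk 'NF == 2 && $2 == "active" { print $1; exit }')
    if [ -n "$out" ]; then
      echo "$out"
      return 0
    fi
    i=$((i + 1))
    [ "$i" -lt "$tries" ] && sleep 1   # pause only between attempts
  done
  return 1
}

# Usage after stopping the active RM:
#   wait_for_active 30 yarn rmadmin -getAllServiceState
```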
Run a job on the highly available YARN cluster:
hadoop jar WordCount01-1.0-SNAPSHOT.jar com.bigdata.WordCountDriver /wc.txt /output3
Open the web UI: http://bigdata02:8088/
If you open the standby node's address, the browser is redirected automatically to the active node's web UI.
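The HA state is also available without a browser: the ResourceManager REST API has a cluster info endpoint whose JSON response includes an haState field. A rough sketch, assuming curl is installed and the response arrives as compact JSON; ha_state is a hypothetical helper name and uses plain text matching rather than a JSON parser.

```shell
# Read the JSON from /ws/v1/cluster/info on stdin and print the value of
# its "haState" field (for example ACTIVE or STANDBY).
ha_state() {
  sed -n 's/.*"haState"[[:space:]]*:[[:space:]]*"\([A-Za-z_]*\)".*/\1/p'
}

# Usage:
#   curl -s http://bigdata02:8088/ws/v1/cluster/info | ha_state
```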