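RabbitMQ High-Availability Cluster Setup
Requirements Analysis: Background
Before building RabbitMQ, we analyzed our current connection data and business requirements in detail. Current statistics show 631 connections and 80,418 queues. To meet the business requirements, we had to choose between a managed cloud product and a self-hosted RabbitMQ message queue service.
The comparison showed that even Tencent Cloud's highest-spec tier cannot meet our queue count (at most 800 queues against our 80,418), and its cost is comparatively high. We therefore decided to self-host. The plan is to use three server nodes, each with 8 cores / 16 GB RAM / 100 GB disk / 5 Mbps (Standard SA5), to build a highly reliable cluster that keeps the system stable.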
Tencent Cloud:
Node spec | 2 cores / 4 GB | 4 cores / 12 GB | 8 cores / 24 GB | 16 cores / 32 GB |
---|---|---|---|---|
Message TPS (produce + consume) | 600~1000 | 2100~3500 | 4200~7000 | 9000~15000 |
Max queues | 100 | 200 | 300 | 800 |
Max connections | 500 | 2500 | 4000 | 8000 |
Cost/month | 2028 | 3537 | 6930 | 13434 |
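Self-hosted service:
Item | Node 1 | Node 2 | Node 3 | Cost/month |
---|---|---|---|---|
New selection | 8 cores / 16 GB / 100 GB / 5 Mbps / Standard SA5 | 8 cores / 16 GB / 100 GB / 5 Mbps / Standard SA5 | 8 cores / 16 GB / 100 GB / 5 Mbps / Standard SA5 | 2485.2 |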
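The sizing decision comes down to a simple comparison of our observed load against each managed tier's limits. The sketch below only restates the numbers from the tables above; the helper name `fits_tier` and the short tier labels are made up for illustration.

```shell
# Observed load, taken from the requirements analysis.
required_queues=80418
required_connections=631

# fits_tier MAX_QUEUES MAX_CONNECTIONS
# Succeeds only if a tier's limits cover both observed figures.
# (Hypothetical helper, for illustration only.)
fits_tier() {
  [ "$required_queues" -le "$1" ] && [ "$required_connections" -le "$2" ]
}

# Tier limits from the Tencent Cloud table: label, max queues, max connections.
while read -r name max_q max_c; do
  if fits_tier "$max_q" "$max_c"; then
    echo "$name: fits"
  else
    echo "$name: does not fit"
  fi
done <<'EOF'
2C4G 100 500
4C12G 200 2500
8C24G 300 4000
16C32G 800 8000
EOF
```

Every tier reports "does not fit": the connection count would be covered from the second tier upward, but the queue limit of the largest tier (800) is far below 80,418.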
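Requirement change:
Because the business volume is small in the early stage, the node spec was scaled down based on the current situation, without affecting future expansion, to 4 cores / 8 GB RAM / 150 GB disk (50 GB system disk + 100 GB data disk) / Standard SA5. The build and tuning requirements are:
1. Build the cluster
2. Achieve high availability
3. The nodes run only RabbitMQ, so raise the memory threshold to 70% of total memory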
RabbitMQ cluster setup
All systems run CentOS 7.9.
Node name | Node IP | RabbitMQ version | docker/compose | Spec | Data disk |
---|---|---|---|---|---|
pos_rabbitmq_1 | 172.17.80.27 | 3.8-manageme | 18.03.1/1.29.2 | 4 cores / 8 GB / 50 GB | 100GB |
pos_rabbitmq_2 | 172.17.80.32 | 3.8-manageme | 18.03.1/1.29.2 | 4 cores / 8 GB / 50 GB | 100GB |
pos_rabbitmq_3 | 172.17.80.6 | 3.8-manageme | 18.03.1/1.29.2 | 4 cores / 8 GB / 50 GB | 100GB |
Request three instance nodes from Tencent Cloud
Initialize the three instance hosts
hostnamectl set-hostname POS_Rabbitmq_1
bash init.sh
Contents of init.sh. Tencent Cloud instances ship with their own yum mirror, so there is no need to replace it.
yum clean all && yum makecache
yum install telnet curl wget lrzsz net-tools vim unzip zip htop tree -y
echo "===== System initialization script ====="
echo "1. Disable the firewall and SELinux"
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
echo "2. Raise the maximum number of open files"
if ! grep -q "^\* soft nofile 65535" /etc/security/limits.conf; then
cat >> /etc/security/limits.conf << EOF
# soft limit
* soft nofile 65535
# hard limit
* hard nofile 65535
EOF
fi
echo "3. Kernel tuning"
# Comments sit on their own lines: sysctl.conf only treats lines starting with # or ; as comments.
cat >> /etc/sysctl.conf << EOF
# Protect against SYN flood attacks (0 disables)
net.ipv4.tcp_syncookies = 1
# Maximum number of TIME_WAIT sockets, so the server is not dragged down by them
net.ipv4.tcp_max_tw_buckets = 20480
# Length of the SYN queue; a longer queue holds more pending connections
net.ipv4.tcp_max_syn_backlog = 20480
# Maximum packets queued when an interface receives faster than the kernel can process them
net.core.netdev_max_backlog = 262144
# Timeout for the FIN-WAIT-2 state
net.ipv4.tcp_fin_timeout = 20
EOF
# Load the new values now instead of waiting for a reboot
sysctl -p
echo "4. Reduce swap usage"
echo "0" > /proc/sys/vm/swappiness
echo "5. Install performance analysis tools and others"
yum install -y gcc make autoconf vim sysstat net-tools lrzsz
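After init.sh has run, it is worth confirming that the kernel parameters actually took effect, since values appended to /etc/sysctl.conf only apply once they are loaded (for example with `sysctl -p`) and a malformed line can be rejected. The following is a minimal sketch: `verify_sysctl` is a hypothetical helper name, and its optional third argument exists only so it can be tried against a scratch directory instead of /proc/sys.

```shell
# verify_sysctl KEY EXPECTED [PROC_ROOT]
# Reads a kernel parameter from PROC_ROOT (default /proc/sys) and compares
# it with the expected value. Returns 0 on match, 1 on mismatch, 2 if the
# parameter file cannot be read.
verify_sysctl() {
  local key="$1" expected="$2" root="${3:-/proc/sys}"
  local path actual
  # net.ipv4.tcp_syncookies -> net/ipv4/tcp_syncookies
  path="$root/$(printf '%s' "$key" | tr . /)"
  if ! actual=$(cat "$path" 2>/dev/null); then
    echo "MISSING $key"
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK      $key = $actual"
  else
    echo "DIFF    $key = $actual (expected $expected)"
    return 1
  fi
}

# Example, using the values set by init.sh:
#   verify_sysctl net.ipv4.tcp_syncookies 1
#   verify_sysctl net.ipv4.tcp_fin_timeout 20
#   verify_sysctl vm.swappiness 0
```

Note that writing to /proc/sys/vm/swappiness does not survive a reboot, so a DIFF on `vm.swappiness` after a restart would be expected with the script as written.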
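Format the data disk
The data disk is delivered blank, so it has to be formatted before it can be mounted. In environments with strict data-safety requirements you can build a RAID array; here we simply format and mount it.
mkfs.ext4 /dev/vdb
mkdir -p /data
mount /dev/vdb /data
echo "/dev/vdb /data ext4 defaults 0 0" >> /etc/fstab
mount -a
mkdir -p /data/{apd,logs,prog,setup,backup,www}
tee /data/README.md << EOF
/data/
|-- apd data directory entry point
|-- backup data cache directory
|-- logs log directory
|-- prog application directory
|-- setup download directory
|-- www website directory
EOF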
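Two details of the mount step can bite on a rerun or after a disk change: appending to /etc/fstab with `echo >>` adds a duplicate line each time, and device names such as /dev/vdb are not guaranteed to stay stable. As a hedged alternative, not what the steps above do, the entry can reference the filesystem UUID and be appended only once. `fstab_entry` and `append_once` are hypothetical helper names.

```shell
# fstab_entry UUID MOUNTPOINT FSTYPE
# Prints an fstab line that identifies the filesystem by UUID.
fstab_entry() {
  printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already present.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (requires root; /dev/vdb and /data as in the steps above):
#   uuid=$(blkid -s UUID -o value /dev/vdb)
#   append_once "$(fstab_entry "$uuid" /data ext4)" /etc/fstab
#   mount -a
```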
Install Docker and Compose
Install Docker 18.03.1 and docker-compose 1.29.2 on all three instance hosts.
# Step 1: install the required system tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2 git htop
# Step 2: add the package repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: point the repository URLs at the Aliyun mirror
sudo sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
# Step 4: refresh the cache and install Docker CE
sudo yum makecache fast
yum -y install docker-ce-18.03.1.ce
# To install a specific Docker CE version:
# Step 1: list the available Docker CE versions:
# yum list docker-ce.x86_64 --showduplicates | sort -r
# Step 2: install the chosen version (VERSION is e.g. 17.03.0.ce.1-1.el7.centos from the list above)
# sudo yum -y install docker-ce-[VERSION]
# Step 5: enable Docker at boot and start the service
systemctl enable --now docker
Configure a Docker registry mirror
mkdir -p /etc/docker
tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://rbmo5xql.mirror.aliyuncs.com"],
"log-driver":"json-file",
"bip": "192.168.1.5/24",
"log-opts": { "max-size": "50m", "max-file": "1" }
}
EOF
systemctl daemon-reload && systemctl restart docker
Download docker-compose
cd /data/setup
wget https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 docker-compose
chmod +x docker-compose
cp /data/setup/docker-compose /usr/local/bin/
ln -sf /usr/local/bin/docker-compose /usr/bin/docker-compose
docker-compose -v
# docker-compose version 1.29.2, build 5becea4c
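Deploy the RabbitMQ cluster
Use the rabbitmq:3.8-management image; the rabbitmq:3.8-management-alpine image has high-severity vulnerabilities (see [the tag listing on hub.docker.com](https://hub.docker.com/_/rabbitmq/tags?page=&page_size=&ordering=&name=3.8-managemen)).
Step 1: pull the RabbitMQ image on all three hosts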
[root@pos_rabbitmq_1 /data/setup/public/rabbitmq/mq_1] eth0 = 172.17.80.27
# docker pull rabbitmq:3.8-management
[root@pos_rabbitmq_2 /data/setup/public/rabbitmq/mq_2] eth0 = 172.17.80.32
# docker pull rabbitmq:3.8-management
[root@pos_rabbitmq_3 /data/setup/public/rabbitmq/mq_3] eth0 = 172.17.80.6
# docker pull rabbitmq:3.8-management
Step 2: obtain the Erlang cookie
Previously the cookie could be defined through the environment in docker-compose, but that approach has been deprecated, so the cookie file is mounted instead. All three nodes must use the same cookie, so generate it once and copy the resulting file to the other nodes.
[root@pos_rabbitmq_3 /data/setup/public/rabbitmq/mq_3] eth0 = 172.17.80.6
# cat > rabbitmq-cookie.sh << eof
docker run -d --name mq rabbitmq:3.8-management
sleep 10
docker exec mq cat /var/lib/rabbitmq/.erlang.cookie > .erlang.cookie
chmod 600 .erlang.cookie
docker rm -f mq
docker volume prune -f
eof
[root@pos_rabbitmq_3 /data/setup/public/rabbitmq/mq_3] eth0 = 172.17.80.6
# sh rabbitmq-cookie.sh
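Nodes can only cluster when they share an identical Erlang cookie, so the file generated here has to reach the other two hosts unchanged. The sketch below assumes root SSH access between the nodes, which the article does not state; `same_cookie` is a hypothetical helper for checking that two copies are byte-for-byte identical.

```shell
# same_cookie FILE_A FILE_B
# Succeeds when both files have identical content.
same_cookie() {
  cmp -s -- "$1" "$2"
}

# Example: copy the cookie from mq_3 to the other nodes (assumes root SSH
# access; target directories follow the layout used in this article).
#   scp -p .erlang.cookie root@172.17.80.27:/data/setup/public/rabbitmq/mq_1/
#   scp -p .erlang.cookie root@172.17.80.32:/data/setup/public/rabbitmq/mq_2/
#
# Example: after fetching a copy back for comparison
#   same_cookie .erlang.cookie /tmp/mq_1.cookie && echo "cookies match"
```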
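Step 3: RabbitMQ cluster configuration file
The guest user is not used. Nodes join the cluster by node name, for example rabbit@pos_rabbitmq_1, so docker-compose must map pos_rabbitmq_1 and the other host names to their IPs; otherwise the names cannot be resolved and the nodes cannot be found.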
[root@pos_rabbitmq_3 /data/setup/public/rabbitmq/mq_3] eth0 = 172.17.80.6
# cat > rabbitmq.conf << eof
loopback_users.guest = false
listeners.tcp.default = 5672
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@pos_rabbitmq_1
cluster_formation.classic_config.nodes.2 = rabbit@pos_rabbitmq_2
cluster_formation.classic_config.nodes.3 = rabbit@pos_rabbitmq_3
eof
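Because the node list is the only part of this file that depends on the cluster members, it can be generated from a list of host names rather than typed by hand. This is only a sketch: `gen_rabbitmq_conf` is a hypothetical helper, and it emits the same keys as the file above.

```shell
# gen_rabbitmq_conf HOST...
# Prints a rabbitmq.conf with one classic_config node entry per host name.
gen_rabbitmq_conf() {
  local i=1 host
  echo "loopback_users.guest = false"
  echo "listeners.tcp.default = 5672"
  echo "cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config"
  for host in "$@"; do
    echo "cluster_formation.classic_config.nodes.$i = rabbit@$host"
    i=$((i + 1))
  done
}

# Prints the configuration for the three nodes used in this article.
gen_rabbitmq_conf pos_rabbitmq_1 pos_rabbitmq_2 pos_rabbitmq_3
```

Redirecting the output to rabbitmq.conf in each mq_N directory keeps the three copies identical, which classic-config peer discovery expects.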
Step 4: docker-compose
By default RabbitMQ's memory high watermark is 40% of total memory. Here it is raised to 70% through the RABBITMQ_VM_MEMORY_HIGH_WATERMARK environment variable.
docker-compose.yml on pos_rabbitmq_1:
version: "3.6"
services:
  pos_rabbitmq_1:
    image: rabbitmq:3.8-management
    restart: always
    container_name: pos_rabbitmq_1 # change this name on each node
    network_mode: host
    extra_hosts:
      - "pos_rabbitmq_1:172.17.80.27"
      - "pos_rabbitmq_2:172.17.80.32"
      - "pos_rabbitmq_3:172.17.80.6"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/apd/rabbitmq:/var/lib/rabbitmq
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
      - ./.erlang.cookie:/var/lib/rabbitmq/.erlang.cookie
      - ../enabled_plugins:/etc/rabbitmq/enabled_plugins
      - /data/logs/rabbitmq:/var/log/rabbitmq
    environment:
      - LANG=C.UTF-8
      - RABBITMQ_DEFAULT_USER=root
      - RABBITMQ_DEFAULT_PASS=xxxxxx
      - RABBITMQ_VM_MEMORY_HIGH_WATERMARK=0.7 # memory high watermark
docker-compose.yml on pos_rabbitmq_2:
version: "3.6"
services:
  pos_rabbitmq_2:
    image: rabbitmq:3.8-management
    restart: always
    container_name: pos_rabbitmq_2 # change this name on each node
    network_mode: host
    extra_hosts:
      - "pos_rabbitmq_1:172.17.80.27"
      - "pos_rabbitmq_2:172.17.80.32"
      - "pos_rabbitmq_3:172.17.80.6"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/apd/rabbitmq:/var/lib/rabbitmq
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
      - ./.erlang.cookie:/var/lib/rabbitmq/.erlang.cookie
      - ../enabled_plugins:/etc/rabbitmq/enabled_plugins
      - /data/logs/rabbitmq:/var/log/rabbitmq
    environment:
      - LANG=C.UTF-8
      - RABBITMQ_DEFAULT_USER=root
      - RABBITMQ_DEFAULT_PASS=xxxxxx
      - RABBITMQ_VM_MEMORY_HIGH_WATERMARK=0.7
docker-compose.yml on pos_rabbitmq_3:
version: "3.6"
services:
  pos_rabbitmq_3:
    image: rabbitmq:3.8-management
    restart: always
    container_name: pos_rabbitmq_3 # change this name on each node
    network_mode: host
    extra_hosts:
      - "pos_rabbitmq_1:172.17.80.27"
      - "pos_rabbitmq_2:172.17.80.32"
      - "pos_rabbitmq_3:172.17.80.6"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/apd/rabbitmq:/var/lib/rabbitmq
      - ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
      - ./.erlang.cookie:/var/lib/rabbitmq/.erlang.cookie
      - ../enabled_plugins:/etc/rabbitmq/enabled_plugins
      - /data/logs/rabbitmq:/var/log/rabbitmq
    environment:
      - LANG=C.UTF-8
      - RABBITMQ_DEFAULT_USER=root
      - RABBITMQ_DEFAULT_PASS=xxxxxx
      - RABBITMQ_VM_MEMORY_HIGH_WATERMARK=0.7
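The three compose files differ only in the node name, so one way to avoid copy-paste slips is to keep a single template and render it per node. This is a sketch under assumptions: the template file name `docker-compose.tpl.yml` and the `@NODE@` placeholder are invented for illustration and are not part of the setup described here.

```shell
# render_compose NODE_NAME
# Reads a template on stdin and replaces every @NODE@ placeholder with the
# given node name. Node names here contain only letters, digits and
# underscores, so they are safe to use in the sed replacement.
render_compose() {
  sed "s/@NODE@/$1/g"
}

# Example: render one file per node directory (hypothetical template name).
#   for n in 1 2 3; do
#     render_compose "pos_rabbitmq_$n" < docker-compose.tpl.yml > "mq_$n/docker-compose.yml"
#   done
```

In the template, both the service key and `container_name` would be written as `@NODE@`; everything else stays exactly as in the files above.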
Step 5: start the cluster. Start mq1 first; once it has finished starting, start mq2 and mq3 one after the other.
[root@pos_rabbitmq_1 /data/setup/public/rabbitmq/mq_1] eth0 = 172.17.80.27
# docker-compose up -d
Creating pos_rabbitmq_1 ... done
[root@pos_rabbitmq_1 /data/setup/public/rabbitmq/mq_1] eth0 = 172.17.80.27
# docker logs pos_rabbitmq_1 -f
2024-07-30 10:57:54.440 [info] <0.596.0> Server startup complete; 9 plugins started.
* rabbitmq_federation_management
* rabbitmq_federation
* rabbitmq_web_stomp
* rabbitmq_stomp
* rabbitmq_web_mqtt
* rabbitmq_mqtt
* rabbitmq_management
* rabbitmq_web_dispatch
* rabbitmq_management_agent
completed with 9 plugins.
2024-07-30 10:57:54.440 [info] <0.596.0> Resetting node maintenance status
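Rather than watching the log by hand before starting the next node, the wait can be scripted with a small retry loop around a health probe. The `retry` helper below is a generic sketch, not part of RabbitMQ; the commented usage relies on `rabbitmq-diagnostics ping` and `rabbitmqctl cluster_status`, which ship with RabbitMQ 3.8.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, 1 otherwise.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait up to about a minute for node 1, then inspect the cluster.
#   retry 30 2 docker exec pos_rabbitmq_1 rabbitmq-diagnostics -q ping
#   docker exec pos_rabbitmq_1 rabbitmqctl cluster_status
```

Once all three nodes are up, `cluster_status` on any of them should list the three rabbit@pos_rabbitmq_N nodes as running.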
Step 6: set the HA mirroring policy for the cluster; run this on any one node
[root@pos_rabbitmq_1 /data/setup/public/rabbitmq/mq_1] eth0 = 172.17.80.27
# docker exec -it pos_rabbitmq_1 /bin/bash
root@pos_rabbitmq_1:/# rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'
Setting policy "ha-all" for pattern "^" to "{"ha-mode":"all"}" with priority "0" for vhost "/" ...
root@pos_rabbitmq_1:/# exit
exit
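The policy definition is a small JSON document, and quoting it by hand inside `docker exec` is easy to get wrong. The sketch below builds it with a hypothetical helper, `ha_policy`. The optional `ha-sync-mode` key is a standard classic mirrored queue setting but is not used in this article; it appears here only as an optional variation.

```shell
# ha_policy MODE [SYNC_MODE]
# Prints a mirrored-queue policy definition as JSON.
ha_policy() {
  local mode="$1" sync="${2:-}"
  if [ -n "$sync" ]; then
    printf '{"ha-mode":"%s","ha-sync-mode":"%s"}\n' "$mode" "$sync"
  else
    printf '{"ha-mode":"%s"}\n' "$mode"
  fi
}

# Example: the same policy as above, applied without an interactive shell,
# followed by a check that it is registered.
#   docker exec pos_rabbitmq_1 rabbitmqctl set_policy ha-all "^" "$(ha_policy all)"
#   docker exec pos_rabbitmq_1 rabbitmqctl list_policies
```

The pattern "^" matches every queue name, and since no vhost is given the policy applies to the default vhost "/", as the command output above shows.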
The directory layout on each node and the plugins enabled through enabled_plugins are as follows:
Directory structure
[root@pos_rabbitmq_1 /data/setup/public/rabbitmq] eth0 = 172.17.80.27
# tree -a
.
|-- enabled_plugins
|-- mq_1
| |-- docker-compose.yml
| |-- .erlang.cookie
| |-- .rabbitmq.conf
| `-- rabbitmq.conf
|-- mq_2
| |-- docker-compose.yml
| |-- .erlang.cookie
| `-- rabbitmq.conf
|-- mq_3
| |-- docker-compose.yml
| |-- .erlang.cookie
| `-- rabbitmq.conf
`-- README.md
# cat enabled_plugins
[rabbitmq_federation_management,rabbitmq_management,rabbitmq_mqtt,rabbitmq_web_mqtt,rabbitmq_stomp,rabbitmq_web_stomp].
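Testing
1. Cluster build
pos_rabbitmq_1, pos_rabbitmq_2, and pos_rabbitmq_3 have formed a cluster.
2. High availability
The cluster uses HA mirroring: a newly created queue is mirrored to mq2 and mq3.
3. The nodes run only RabbitMQ, so the memory threshold is set to 70% of total memory
Total memory is 8 GB; at 70%, 5.2 GB is available to RabbitMQ.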
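The expected limit can be cross-checked with a little arithmetic on the memory the OS actually reports. The 5.2 GB figure is somewhat below the nominal 0.7 × 8 GB = 5.6 GB, presumably because the instance reports less than 8 GB of total memory; that explanation is an assumption, not something measured here. `watermark_mib` is a hypothetical helper.

```shell
# watermark_mib TOTAL_KB RATIO
# Prints the memory limit in MiB (rounded down) for a total given in kB
# and a relative watermark such as 0.7.
watermark_mib() {
  awk -v kb="$1" -v r="$2" 'BEGIN { printf "%d\n", kb * r / 1024 }'
}

# Example on a node, using the total reported by the kernel:
#   total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   watermark_mib "$total_kb" 0.7
```

The result can then be compared with the memory limit shown for each node in the management UI.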