Kubernetes Control Plane Components: Setting Up a Highly Available etcd Cluster
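- This article shows how to set up a highly available etcd cluster and demonstrates data backup, data restore, cluster shutdown, and cluster restart.
- Reference: https://github.com/cncamp/101/blob/master/module5/etcd-ha-demo/install-ha-etcd.MD
1.Setting up a highly available etcd cluster
Recommended reading first: "Kubernetes Control Plane Components: Common etcd Configuration Parameters". Knowing the common etcd flags makes this section much easier to follow.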
1.1.Install cfssl
# Debian/Ubuntu
apt install golang-cfssl
# Or install directly with Go (the binaries land in $(go env GOPATH)/bin, which must be on PATH)
go install github.com/cloudflare/cfssl/cmd/cfssl@latest
go install github.com/cloudflare/cfssl/cmd/cfssljson@latest
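- Purpose: install the cfssl tools, which are used to generate the TLS certificates.
- Why: the etcd cluster needs TLS certificates to encrypt traffic between members and keep the data secure.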
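Before moving on, it can help to confirm that the required tools are actually on PATH. The helper below is a minimal sketch; the function name missing_tools is made up for this article and is not part of cfssl or etcd.

```shell
# Print the name of every command in "$@" that cannot be found on PATH.
# missing_tools is an illustrative helper, not part of cfssl or etcd.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}
```

Running missing_tools cfssl cfssljson git make prints nothing when everything is installed, and otherwise lists the commands that still need to be installed.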
1.2.Generate tls certs and clone etcd code
mkdir -p /root/go/src/github.com/etcd-io
cd /root/go/src/github.com/etcd-io
git clone https://github.com/etcd-io/etcd.git
cd etcd/hack/tls-setup
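- Purpose:
- Create the Go working directory.
- Clone the official etcd repository.
- Enter the directory that holds the TLS certificate generation scripts. The certificates must exist before etcd can be started.
- Why: the etcd project ships scripts and configuration files for generating TLS certificates, which makes it quick to produce them.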
1.3.Edit req-csr.json and keep 127.0.0.1 and localhost only for single cluster setup.
vi config/req-csr.json
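- Purpose: edit the certificate signing request (CSR) configuration file. Once it is edited, the certificates can be generated.
- Why:
- The default etcd CSR file lists some preset IPs in its hosts field; change them to the IPs of your own etcd cluster.
- Although this is a 3-node etcd cluster, all three members run on one local machine, so only 127.0.0.1 and localhost need to be kept. This avoids issuing the certificate for hosts that are not needed.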
1.4.Generate certs
export infra0=127.0.0.1
export infra1=127.0.0.1
export infra2=127.0.0.1
make
mkdir /tmp/etcd-certs
mv certs /tmp/etcd-certs
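- Purpose:
- Set the environment variables first to specify the IP address of each cluster node. The three etcd members will be named infra0, infra1, and infra2.
- Run make to generate the TLS certificates. By default they are written to ./certs under the current directory.
- Create a certificate directory and move the generated certificates into it.
- Why:
- The environment variables specify the IP addresses of the cluster nodes. The certificate files used later are named after this value (127.0.0.1.pem and 127.0.0.1-key.pem).
- make invokes cfssl to generate the certificates.
- Keeping the certificates in one place makes them easy to reference later. Every etcdctl call below has to point at this certs directory.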
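To check that the generated certificate really covers the hosts kept in req-csr.json, its Subject Alternative Names can be printed with openssl. This is a sketch that assumes openssl is installed; cert_sans is a hypothetical helper name that only wraps the openssl CLI.

```shell
# Print the Subject Alternative Name section of a PEM certificate.
# cert_sans is an illustrative helper; it only wraps the openssl CLI.
cert_sans() {
  cert_file=$1
  if [ ! -f "$cert_file" ]; then
    echo "certificate not found: $cert_file" >&2
    return 1
  fi
  openssl x509 -in "$cert_file" -noout -text \
    | grep -A1 'Subject Alternative Name'
}
```

For example, cert_sans /tmp/etcd-certs/certs/127.0.0.1.pem should list entries for localhost and 127.0.0.1 if the CSR file was edited as described in 1.3.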
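1.5.Start the etcd cluster members
- Create a file named start-all.sh and copy the commands below into it.
- The script declares 3 etcd instances with --initial-cluster-state set to new, and specifies the certificate paths, member name, and data-dir for each.
- Because all 3 instances start on the same machine, each instance uses its own ports.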
#
# each etcd instance name needs to be unique
# x380 is for peer communication
# x379 is for client communication
# data-dir cannot be shared
# stdout is redirected to the log file first, then 2>&1, so stderr lands in the same file
#
nohup etcd --name infra0 \
--data-dir=/tmp/etcd/infra0 \
--listen-peer-urls https://127.0.0.1:3380 \
--initial-advertise-peer-urls https://127.0.0.1:3380 \
--listen-client-urls https://127.0.0.1:3379 \
--advertise-client-urls https://127.0.0.1:3379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra0.log 2>&1 &

nohup etcd --name infra1 \
--data-dir=/tmp/etcd/infra1 \
--listen-peer-urls https://127.0.0.1:4380 \
--initial-advertise-peer-urls https://127.0.0.1:4380 \
--listen-client-urls https://127.0.0.1:4379 \
--advertise-client-urls https://127.0.0.1:4379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra1.log 2>&1 &

nohup etcd --name infra2 \
--data-dir=/tmp/etcd/infra2 \
--listen-peer-urls https://127.0.0.1:5380 \
--initial-advertise-peer-urls https://127.0.0.1:5380 \
--listen-client-urls https://127.0.0.1:5379 \
--advertise-client-urls https://127.0.0.1:5379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-state new \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra2.log 2>&1 &
- Run the script to create the cluster:
chmod +x start-all.sh
./start-all.sh
- Once the script has run, the cluster is up. Run
ps -ef | grep etcd
to confirm that the 3 etcd processes exist.
- Common errors
- If the script fails with
nohup: failed to run command 'etcd': No such file or directory
the etcd binary is not installed yet. Install it first:
# on CentOS
yum install etcd
# use the etcdctl v3 API
export ETCDCTL_API=3
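The three launch commands in start-all.sh differ only in the member name and the two ports. The sketch below builds the per-member flags from those three values, which makes the pattern easier to see. The helper names etcd_flags and start_member are hypothetical; the flags, ports, and paths are the ones used in this article.

```shell
CERT_DIR=/tmp/etcd-certs/certs
CLUSTER=infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380

# Print the etcd flags for one member, one flag per line.
# Arguments: member name, peer port, client port.
etcd_flags() {
  name=$1 peer_port=$2 client_port=$3
  cat <<EOF
--name=$name
--data-dir=/tmp/etcd/$name
--listen-peer-urls=https://127.0.0.1:$peer_port
--initial-advertise-peer-urls=https://127.0.0.1:$peer_port
--listen-client-urls=https://127.0.0.1:$client_port
--advertise-client-urls=https://127.0.0.1:$client_port
--initial-cluster-token=etcd-cluster-1
--initial-cluster=$CLUSTER
--initial-cluster-state=new
--client-cert-auth
--trusted-ca-file=$CERT_DIR/ca.pem
--cert-file=$CERT_DIR/127.0.0.1.pem
--key-file=$CERT_DIR/127.0.0.1-key.pem
--peer-client-cert-auth
--peer-trusted-ca-file=$CERT_DIR/ca.pem
--peer-cert-file=$CERT_DIR/127.0.0.1.pem
--peer-key-file=$CERT_DIR/127.0.0.1-key.pem
EOF
}

# Start one member in the background, logging stdout and stderr to one file.
start_member() {
  # Word splitting of the flag list is intended: no value contains spaces.
  nohup etcd $(etcd_flags "$1" "$2" "$3") > "/var/log/$1.log" 2>&1 &
}
```

With these helpers, the cluster would be started with start_member infra0 3380 3379, start_member infra1 4380 4379, and start_member infra2 5380 5379. Calling etcd_flags on its own is a safe way to preview the flags before starting anything.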
1.6.Member list: verify etcd
etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem member list
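- Purpose: list the members of the etcd cluster.
- Why: verify that the cluster is running normally and that all nodes have joined it.
- If the command fails with
flag provided but not defined: -cert
the etcdctl API version has not been set. Older etcdctl builds default to the v2 API, so run:
export ETCDCTL_API=3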
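Every etcdctl call in this article repeats the same endpoint and TLS flags. A small wrapper keeps later commands short. This is only a sketch: ectl, ENDPOINTS, and ETCDCTL_BIN are names invented here, while member list and endpoint health are standard etcdctl v3 subcommands.

```shell
export ETCDCTL_API=3
CERT_DIR=/tmp/etcd-certs/certs
# ETCDCTL_BIN is a hypothetical override, handy for dry runs (ETCDCTL_BIN=echo).
ETCDCTL_BIN=${ETCDCTL_BIN:-etcdctl}

# Run etcdctl with the endpoint and TLS flags used throughout this article.
# ENDPOINTS is a hypothetical variable; it defaults to the infra0 client URL.
ectl() {
  "$ETCDCTL_BIN" \
    --endpoints "${ENDPOINTS:-https://127.0.0.1:3379}" \
    --cert "$CERT_DIR/127.0.0.1.pem" \
    --key "$CERT_DIR/127.0.0.1-key.pem" \
    --cacert "$CERT_DIR/ca.pem" \
    "$@"
}
```

ectl member list is then equivalent to the long command above. Setting ENDPOINTS=https://127.0.0.1:3379,https://127.0.0.1:4379,https://127.0.0.1:5379 before ectl endpoint health checks all three members instead of only infra0.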
2.Data backup
2.1.Insert some data
- Insert some data to simulate normal use of etcd:
- key=a value=b
- key=/a value=/b
- key=/a/f value=ok
# Insert 3 entries
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put a b
OK
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put /a /b
OK
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put /a/f ok
OK
# List all data
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem get --prefix ""
/a
/b
/a/f
ok
a
b
2.2.Backup
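- Run the backup command to take a full snapshot of the current etcd cluster. The snapshot is written to the file snapshot.db.
etcdctl --endpoints https://127.0.0.1:3379 \
--cert /tmp/etcd-certs/certs/127.0.0.1.pem \
--key /tmp/etcd-certs/certs/127.0.0.1-key.pem \
--cacert /tmp/etcd-certs/certs/ca.pem snapshot save snapshot.db
- The cluster is now backed up. Run
ls
to list the current directory; a new snapshot.db file should be present.
- If the cluster fails or data is lost, the data can be restored from this backup.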
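A backup is only useful if the file is intact, and a fixed name such as snapshot.db is overwritten by the next backup. The sketch below covers both points. snapshot_name is a hypothetical helper that produces a timestamped file name. The snapshot status subcommand prints the hash, revision, key count, and size of a snapshot file; in newer etcd releases it is provided by etcdutl rather than etcdctl, so adjust this to the version you have.

```shell
# Build a timestamped snapshot file name, e.g. etcd-20240101-120000.db.
# snapshot_name is an illustrative helper; the prefix defaults to "etcd".
snapshot_name() {
  printf '%s-%s.db\n' "${1:-etcd}" "$(date +%Y%m%d-%H%M%S)"
}

# Usage (needs the running cluster, so it is left commented out):
#   file=$(snapshot_name infra0)
#   etcdctl --endpoints https://127.0.0.1:3379 \
#     --cert /tmp/etcd-certs/certs/127.0.0.1.pem \
#     --key /tmp/etcd-certs/certs/127.0.0.1-key.pem \
#     --cacert /tmp/etcd-certs/certs/ca.pem snapshot save "$file"
#   etcdctl snapshot status "$file" -w table
```

If the timestamped name is used for the backup, the same file name has to be passed to the restore commands later in place of snapshot.db.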
3.Destroy the etcd cluster to simulate a failure
ps -ef | grep "/tmp/etcd/infra" | grep -v grep | awk '{print $2}'|xargs kill
- Purpose: terminate all etcd member processes.
- Why: all etcd instances must be stopped before the data is restored.
rm -rf /tmp/etcd
- Purpose: delete the etcd data directories.
- Why: simulate data loss in order to test backup and restore.
4.Restore the etcd cluster data from the snapshot
- Create a file named restore.sh and copy the commands below into it.
- The script restores the 3 etcd instances from the snapshot; --data-dir specifies where each instance's data is restored to.
export ETCDCTL_API=3
etcdctl snapshot restore snapshot.db \
--name infra0 \
--data-dir=/tmp/etcd/infra0 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://127.0.0.1:3380
etcdctl snapshot restore snapshot.db \
--name infra1 \
--data-dir=/tmp/etcd/infra1 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://127.0.0.1:4380
etcdctl snapshot restore snapshot.db \
--name infra2 \
--data-dir=/tmp/etcd/infra2 \
--initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://127.0.0.1:5380
- Run the script to restore the cluster data, then list /tmp/etcd to check that the data directories are back:
chmod +x restore.sh
./restore.sh
ls /tmp/etcd
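The three restore commands differ only in the member name and the peer port, and all of them must read the same snapshot file. A sketch of the same steps as two small functions follows. restore_member, restore_all, and RESTORE_BIN are hypothetical names; the flags are the ones used in restore.sh.

```shell
export ETCDCTL_API=3
CLUSTER=infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380
# RESTORE_BIN is a hypothetical override so the commands can be previewed
# with RESTORE_BIN=echo before any data is touched.
RESTORE_BIN=${RESTORE_BIN:-etcdctl}

# Restore one member's data directory from a snapshot file.
# Arguments: snapshot file, member name, peer port.
restore_member() {
  "$RESTORE_BIN" snapshot restore "$1" \
    --name "$2" \
    --data-dir="/tmp/etcd/$2" \
    --initial-cluster "$CLUSTER" \
    --initial-cluster-token etcd-cluster-1 \
    --initial-advertise-peer-urls "https://127.0.0.1:$3"
}

# Restore all three members from the same snapshot, stopping at the first failure.
restore_all() {
  restore_member "$1" infra0 3380 &&
  restore_member "$1" infra1 4380 &&
  restore_member "$1" infra2 5380
}
```

restore_all snapshot.db performs the same work as restore.sh. Setting RESTORE_BIN=echo first prints the three commands instead of running them.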
5.Restart the etcd cluster
- Create a file named restart-all.sh and copy the commands below into it.
- The script restarts the 3 etcd instances; --data-dir points each one at its restored data directory.
nohup etcd --name infra0 \
--data-dir=/tmp/etcd/infra0 \
--listen-peer-urls https://127.0.0.1:3380 \
--listen-client-urls https://127.0.0.1:3379 \
--advertise-client-urls https://127.0.0.1:3379 \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra0.log 2>&1 &

nohup etcd --name infra1 \
--data-dir=/tmp/etcd/infra1 \
--listen-peer-urls https://127.0.0.1:4380 \
--listen-client-urls https://127.0.0.1:4379 \
--advertise-client-urls https://127.0.0.1:4379 \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra1.log 2>&1 &

nohup etcd --name infra2 \
--data-dir=/tmp/etcd/infra2 \
--listen-peer-urls https://127.0.0.1:5380 \
--listen-client-urls https://127.0.0.1:5379 \
--advertise-client-urls https://127.0.0.1:5379 \
--client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
--peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
--peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
--peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra2.log 2>&1 &
- Run the script to restart the cluster, then check that all 3 etcd processes are running again:
chmod +x restart-all.sh
./restart-all.sh
ps -ef | grep etcd
6.Verify that the data was restored
- List the etcd members to check that the nodes are healthy:
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem member list
1701f7e3861531d4, started, infra0, https://127.0.0.1:3380, https://127.0.0.1:3379
6a58b5afdcebd95d, started, infra1, https://127.0.0.1:4380, https://127.0.0.1:4379
84a1a2f39cda4029, started, infra2, https://127.0.0.1:5380, https://127.0.0.1:5379
- Read all etcd data to verify that it was restored:
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem get --prefix ""
/a
/b
/a/f
ok
a
b
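The manual check above can also be turned into a small assertion per key, which is easier to rerun after each restore. This is a sketch under the same paths and endpoint as the rest of the article. check_key and ETCDCTL_BIN are hypothetical names, while get with --print-value-only is a standard etcdctl v3 invocation.

```shell
export ETCDCTL_API=3
CERT_DIR=/tmp/etcd-certs/certs
# ETCDCTL_BIN is a hypothetical override, useful for testing without a cluster.
ETCDCTL_BIN=${ETCDCTL_BIN:-etcdctl}

# Read one key from infra0 and compare its value with the expected one.
# Arguments: key, expected value. Returns 1 on a mismatch.
check_key() {
  actual=$("$ETCDCTL_BIN" --endpoints https://127.0.0.1:3379 \
    --cert "$CERT_DIR/127.0.0.1.pem" \
    --key "$CERT_DIR/127.0.0.1-key.pem" \
    --cacert "$CERT_DIR/ca.pem" \
    get "$1" --print-value-only)
  if [ "$actual" = "$2" ]; then
    echo "OK   $1"
  else
    echo "FAIL $1: expected '$2', got '$actual'"
    return 1
  fi
}
```

For the three entries inserted in 2.1, the check would be check_key a b && check_key /a /b && check_key /a/f ok.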