
[Module 1] Advanced Kubernetes Container Orchestration in Practice: Introduction to and Use of etcd

etcd Advanced

Introduction to etcd:

etcd is an open-source project started by the CoreOS team in June 2013, with the goal of building a highly available distributed key-value database. Internally, etcd uses the Raft protocol as its consensus algorithm, and it is implemented in Go.

Official website: https://etcd.io/
GitHub: https://github.com/etcd-io/etcd
Official hardware recommendations: https://etcd.io/docs/v3.5/op-guide/hardware/
Official documentation: https://etcd.io/docs/v3.5/op-guide/maintenance/

etcd has the following properties:

Fully replicated: every node in the cluster has access to the complete data store
Highly available: etcd can be used to avoid single points of failure in hardware or network problems
Consistent: every read returns the most recent write across multiple hosts
Simple: includes a well-defined, user-facing API (gRPC)
Secure: implements automatic TLS with optional client-certificate authentication
Fast: benchmarked at 10,000 writes per second
Reliable: uses the Raft algorithm to keep storage sensibly distributed

The etcd configuration file

root@k8s-etcd1:~# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/ # data directory
ExecStart=/usr/bin/etcd \ # path to the binary
  --name=etcd1 \ # name of the current node
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://172.31.7.105:2380 \ # peer URL advertised to the rest of the cluster
  --listen-peer-urls=https://172.31.7.105:2380 \ # port for communication between cluster members
  --listen-client-urls=https://172.31.7.105:2379,http://127.0.0.1:2379 \ # client access addresses
  --advertise-client-urls=https://172.31.7.105:2379 \ # client URL advertised to clients
  --initial-cluster-token=etcd-cluster-0 \ # token used when bootstrapping the cluster; must match on all nodes
  --initial-cluster=etcd1=https://172.31.7.105:2380,etcd2=https://172.31.7.106:2380,etcd3=https://172.31.7.107:2380 \ # all members of the cluster
  --initial-cluster-state=new \ # "new" when creating a cluster, "existing" when joining one
  --data-dir=/var/lib/etcd # data directory path
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
 
 
# My configuration file
 
 [Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/local/bin/etcd \
  --name=etcd-10.0.0.116 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://10.0.0.116:2380 \
  --listen-peer-urls=https://10.0.0.116:2380 \
  --listen-client-urls=https://10.0.0.116:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://10.0.0.116:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-10.0.0.116=https://10.0.0.116:2380,etcd-10.0.0.117=https://10.0.0.117:2380,etcd-10.0.0.118=https://10.0.0.118:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd \
  --wal-dir= \
  --snapshot-count=50000 \
  --auto-compaction-retention=1 \
  --auto-compaction-mode=periodic \
  --max-request-bytes=10485760 \
  --quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
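
After editing the unit file, reload systemd and restart the service to apply the changes; this is the standard systemd workflow (the service name etcd is taken from the unit file above):

root@k8s-etcd1:~# systemctl daemon-reload
root@k8s-etcd1:~# systemctl restart etcd
root@k8s-etcd1:~# systemctl status etcd --no-pager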

etcd Advanced: Election Overview

Node roles: every node in the cluster is in exactly one of three states: Leader, Follower, or Candidate.
follower: a follower (comparable to a Slave node in Redis Cluster)
candidate: a candidate node, used during an election
leader: the leader node (comparable to a Master node in Redis Cluster)

 
 
After startup, nodes vote for one another based on the term ID. The term ID is an integer that defaults to 0. In the Raft algorithm, a term represents one leader's period in office: whenever a node becomes leader, a new term begins, and every node increments its term ID by 1 to distinguish the new election round from the previous one.
etcd Advanced: Elections

Initial election:
1. After startup, each etcd node defaults to the follower role with a term ID of 0. If a node finds that the cluster has no leader, it switches to the candidate role and starts a leader election.

2. Each candidate sends vote requests (RequestVote) to the other candidates, voting for itself by default.

3. The candidates receive one another's vote requests (for example, A receives B's and C's, B receives A's and C's, C receives A's and B's) and check whether the requester's log is more up to date than their own. If it is, they grant it their vote and reply with a response containing their own latest log information. If C's log is the most up to date, C receives the votes of A, B, and C and is elected unanimously; if B is down, C receives the votes of A and C and is still elected with more than half the votes.

4. C sends leader heartbeats to the other nodes to maintain its leadership (heartbeat-interval, 100 ms by default).

5. The other nodes switch to the follower role and synchronize data from the leader.

6. If the election times out (election-timeout), a new election is held; if two leaders emerge, the one backed by more than half of the cluster wins.


Subsequent elections:
When a follower does not hear from the leader within the expected time, it switches to the candidate state, sends vote requests (carrying its own term ID and log records) to the other nodes, and waits for their responses. If that candidate's log is the most up to date, it wins the majority of votes and becomes the new leader.
The new leader increments its term ID by 1 and announces it to the other nodes.
If the old leader recovers and finds that a new leader already exists, it joins the existing leader and updates its own term ID to match the leader's; within a single term, all nodes share the same term ID.
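
Both timing parameters are tunable via real etcd flags; the values below are the defaults (in milliseconds) and are shown only as a sketch of where they would be added in etcd.service:

--heartbeat-interval=100 # interval at which the leader sends heartbeats (default 100 ms)
--election-timeout=1000 # how long a follower waits without a heartbeat before starting an election (default 1000 ms)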

etcd Advanced: Viewing Member Information

Configuration tuning:
--max-request-bytes=10485760 # request size limit (a single key defaults to a maximum of 1.5 MiB; the official recommendation is not to exceed 10 MiB)
--quota-backend-bytes=8589934592 # storage size limit (backend storage quota; the default is 2 GB, and values above 8 GB produce a warning at startup)
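
When the backend quota is exceeded, etcd raises a NOSPACE alarm and rejects further writes until it is cleared. A minimal sketch using etcdctl's alarm subcommands (endpoints and TLS flags as in the examples below):

# list active alarms (a NOSPACE alarm blocks all writes)
ETCDCTL_API=3 /usr/local/bin/etcdctl alarm list --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem
# after compacting and defragmenting, clear the alarm
ETCDCTL_API=3 /usr/local/bin/etcdctl alarm disarm --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem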


Cluster defragmentation:
[root@k8s-etcd1 ~]#ETCDCTL_API=3 /usr/local/bin/etcdctl defrag --cluster   --endpoints=https://10.0.0.116:2379    --cacert=/etc/kubernetes/ssl/ca.pem   --cert=/etc/kubernetes/ssl/etcd.pem   --key=/etc/kubernetes/ssl/etcd-key.pem
Finished defragmenting etcd member[https://10.0.0.116:2379]
Finished defragmenting etcd member[https://10.0.0.117:2379]
Finished defragmenting etcd member[https://10.0.0.118:2379]
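
Defragmentation only reclaims space that compaction has already released, so the key-value history is usually compacted first. A hedged sketch that compacts up to the current revision (extracting the revision with grep assumes the "revision" field name in the endpoint status JSON output):

# read the current revision from the endpoint status JSON
rev=$(ETCDCTL_API=3 /usr/local/bin/etcdctl endpoint status --write-out=json --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
# compact the history up to that revision, then defragment as shown above
ETCDCTL_API=3 /usr/local/bin/etcdctl compaction ${rev} --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem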





etcd exposes several API versions. v1 is deprecated; etcd v2 and v3 are essentially two independent applications that share the same Raft protocol code but have different interfaces and different storage, with their data isolated from each other. In other words, after upgrading from etcd v2 to etcd v3, the old v2 data can still only be accessed through the v2 API, and data created through the v3 API can only be accessed through the v3 API.
WARNING:
Environment variable ETCDCTL_API is not set; defaults to etcdctl v2. # v2 is the default
Set environment variable ETCDCTL_API=3 to use v3 API or ETCDCTL_API=2 to use v2 API. # selects the API version
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl --help
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl member --help
NAME:
    etcdctl member - member add, remove and list subcommands
USAGE:
    etcdctl member command [command options] [arguments...]
COMMANDS:
    list    enumerate existing cluster members
    add     add a new member to the etcd cluster
    remove  remove an existing member from the etcd cluster
    update  update an existing member in the etcd cluster
OPTIONS:
    --help, -h  show help
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl member list



[root@k8s-etcd1 ~]#ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table member list   --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem   --cert=/etc/kubernetes/ssl/etcd.pem   --key=/etc/kubernetes/ssl/etcd-key.pem
+------------------+---------+-----------------+-------------------------+-------------------------+------------+
|        ID        | STATUS  |      NAME       |       PEER ADDRS        |      CLIENT ADDRS       | IS LEARNER |
+------------------+---------+-----------------+-------------------------+-------------------------+------------+
| ad494a8abe2cd50c | started | etcd-10.0.0.116 | https://10.0.0.116:2380 | https://10.0.0.116:2379 |      false |
| b5f9408145d08046 | started | etcd-10.0.0.117 | https://10.0.0.117:2380 | https://10.0.0.117:2379 |      false |
| eb9b6ed34d1464a0 | started | etcd-10.0.0.118 | https://10.0.0.118:2380 | https://10.0.0.118:2379 |      false |
+------------------+---------+-----------------+-------------------------+-------------------------+------------+
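
Members can be added and removed at runtime with the subcommands listed earlier; a hedged sketch (the 10.0.0.119 node is hypothetical, and the member ID is the hex ID printed by member list):

# register a new member before starting etcd on the new node
ETCDCTL_API=3 /usr/local/bin/etcdctl member add etcd-10.0.0.119 --peer-urls=https://10.0.0.119:2380 --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem
# remove a member by its hex ID
ETCDCTL_API=3 /usr/local/bin/etcdctl member remove eb9b6ed34d1464a0 --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem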

etcd Advanced: Verifying Node Health


[root@k8s-etcd1 ~]#export NODE_IPS="10.0.0.116 10.0.0.117 10.0.0.118"

[root@k8s-etcd1 ~]#for ip in ${NODE_IPS}; do   ETCDCTL_API=3 /usr/local/bin/etcdctl   --endpoints=https://${ip}:2379    --cacert=/etc/kubernetes/ssl/ca.pem   --cert=/etc/kubernetes/ssl/etcd.pem   --key=/etc/kubernetes/ssl/etcd-key.pem   endpoint health; done
https://10.0.0.116:2379 is healthy: successfully committed proposal: took = 18.725626ms
https://10.0.0.117:2379 is healthy: successfully committed proposal: took = 18.984525ms
https://10.0.0.118:2379 is healthy: successfully committed proposal: took = 26.566695ms
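
Instead of looping over every IP, newer etcdctl releases can discover all endpoints from the cluster membership via the --cluster flag (same TLS flags assumed):

[root@k8s-etcd1 ~]#ETCDCTL_API=3 /usr/local/bin/etcdctl endpoint health --cluster --endpoints=https://10.0.0.116:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem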

etcd Advanced: Detailed Endpoint Status

[root@k8s-etcd1 ~]#export NODE_IPS="10.0.0.116 10.0.0.117 10.0.0.118"


[root@k8s-etcd1 ~]#for ip in ${NODE_IPS}; do   ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status   --endpoints=https://${ip}:2379    --cacert=/etc/kubernetes/ssl/ca.pem   --cert=/etc/kubernetes/ssl/etcd.pem   --key=/etc/kubernetes/ssl/etcd-key.pem; done
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.116:2379 | ad494a8abe2cd50c |   3.5.5 |  623 kB |     false |      false |        59 |     194500 |             194500 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.117:2379 | b5f9408145d08046 |   3.5.5 |  623 kB |      true |      false |        59 |     194500 |             194500 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.118:2379 | eb9b6ed34d1464a0 |   3.5.5 |  623 kB |     false |      false |        59 |     194500 |             194500 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

etcd Advanced: Viewing etcd Data

root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only # list all keys, path-style

View pod information:
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only | grep pod

View namespace information:
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only | grep namespaces

View Deployment controller information:
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only | grep deployment

View calico component information:
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl get / --prefix --keys-only | grep calico

# Decode with auger (Kubernetes stores objects in etcd as protobuf, so raw values are not human-readable)
[root@k8s-etcd1 ~]#etcdctl get /registry/pods/kube-system/calico-node-phl9w | auger decode

etcd Advanced: etcd CRUD Operations

# Add data
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /name "tom"
OK
# Query data
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get /name
/name
tom
# Modify data: writing to an existing key overwrites (updates) it
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /name "jack"
OK
# Verify the change
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get /name
/name
jack
# Delete data
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl del /name
1
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get /name
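
Beyond single keys, the v3 API also supports prefix queries; a small sketch (the /app/* keys are hypothetical):

root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /app/db "mysql"
OK
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /app/web "nginx"
OK
# return every key under the /app prefix
root@etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl get --prefix /app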

etcd Advanced: The etcd Data Watch Mechanism

The watch mechanism continuously monitors data and proactively notifies the client whenever the data changes. The etcd v3 watch supports watching a single fixed key as well as watching a range.

# Watch a key on etcd node1; the key does not need to exist yet and can be created later:
root@k8s-etcd1:~# ETCDCTL_API=3 /usr/local/bin/etcdctl watch /data
# Modify the data on etcd node2 and verify that node1 sees the change:
root@k8s-etcd2:~# ETCDCTL_API=3 /usr/local/bin/etcdctl put /data "data v1"
OK

# A full session: watch on etcd node2 while node1 writes and deletes the key (watch output below):
[root@k8s-etcd2 ~]#ETCDCTL_API=3  /usr/local/bin/etcdctl watch  /data
 [root@k8s-etcd1 ~]#ETCDCTL_API=3  /usr/local/bin/etcdctl put  /data "data v1"
OK
[root@k8s-etcd1 ~]#ETCDCTL_API=3  /usr/local/bin/etcdctl put  /data "data v2"
OK
[root@k8s-etcd1 ~]#ETCDCTL_API=3  /usr/local/bin/etcdctl del  /data 
1


[root@k8s-etcd2 ~]#ETCDCTL_API=3  /usr/local/bin/etcdctl watch  /data
PUT
/data
data v1
PUT
/data
data v2
DELETE
/data
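
A range watch covers many keys at once; a minimal sketch using the --prefix flag:

# watch every key under the /data prefix
[root@k8s-etcd2 ~]#ETCDCTL_API=3 /usr/local/bin/etcdctl watch --prefix /data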

etcd Advanced: Data Backup and Restore with the v3 API

WAL stands for write-ahead log: a log that is written before the actual write operation is performed.
wal: stores the write-ahead logs; their most important role is to record the complete history of data changes. In etcd, every data modification must be written to the WAL before it is committed.

Backing up data (v3):
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl snapshot save snapshot.db

Restoring data (v3):
root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir=/opt/etcd-testdir # restore the data into a new, non-existent directory
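
Before restoring, a snapshot's integrity can be checked with snapshot status (in etcd 3.5+ the same check is also available through etcdutl):

root@k8s-etcd1:~# ETCDCTL_API=3 etcdctl snapshot status snapshot.db --write-out=table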

# Automated backup
root@k8s-etcd1:~# mkdir -p /data/etcd-backup-dir/
root@k8s-etcd1:~# cat etcd-backup.sh
#!/bin/bash
source /etc/profile
# timestamped snapshot file, one per run
DATE=$(date +%Y-%m-%d_%H-%M-%S)
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot save /data/etcd-backup-dir/etcd-snapshot-${DATE}.db
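
To run the backup on a schedule, the script can be registered with cron; a sketch (the 02:00 schedule and script path are assumptions):

# crontab -e entry: take a snapshot every day at 02:00
0 2 * * * /bin/bash /root/etcd-backup.sh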

etcd Advanced: Cluster-Level v3 Backup and Restore

Update the cluster-restore role's main.yml to the etcd 3.5.3 version

[root@k8s-deploy tasks]#pwd
/etc/kubeasz/roles/cluster-restore/tasks

# the parameters in main.yml must correspond to those in etcd.service
[root@k8s-deploy tasks]#vim main.yml
- name: etcd data restore
  shell: "cd /etcd_backup && \
	ETCDCTL_API=3 {{ bin_dir }}/etcdctl snapshot restore snapshot.db \
	--name etcd-{{ inventory_hostname }} \
	--initial-cluster {{ ETCD_NODES }} \
	--initial-cluster-token etcd-cluster-0 \
	--initial-advertise-peer-urls https://{{ inventory_hostname }}:2380"


[root@k8s-etcd1 ~]#cat /etc/systemd/system/etcd.service 
[Unit]
... (omitted) ...
[Service]
... (omitted) ...

  --name=etcd-10.0.0.116 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://10.0.0.116:2380 \
  --listen-peer-urls=https://10.0.0.116:2380 \
  --listen-client-urls=https://10.0.0.116:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://10.0.0.116:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-10.0.0.116=https://10.0.0.116:2380,etcd-10.0.0.117=https://10.0.0.117:2380,etcd-10.0.0.118=https://10.0.0.118:2380 \
  --initial-cluster-state=new \
... (omitted) ...

[root@k8s-master1 ~]#kubectl apply -f nginx.yaml
[root@k8s-master1 ~]#kubectl get pod -n myserver -o wide
NAME                                         READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
myserver-nginx-deployment-6dc97c87d7-zgtkp   1/1     Running   0          36m   10.200.36.71   10.0.0.111   <none>           <none>

root@k8s-deploy:/etc/kubeasz# ./ezctl backup  k8s-cluster1
root@k8s-deploy:/etc/kubeasz# kubectl  get deployment -n myserver
root@k8s-deploy:/etc/kubeasz# kubectl  delete deployment -n myserver myserver-nginx-deployment
root@k8s-deploy:/etc/kubeasz# ./ezctl backup  k8s-cluster1
While the data is being restored, the API server is unavailable, so the restore must be performed during off-peak hours or in other emergency scenarios:
root@k8s-deploy:/etc/kubeasz# grep db_to_restore ./roles/ -R # choose the snapshot file to restore
./roles/cluster-restore/defaults/main.yml:db_to_restore: "snapshot.db"
./roles/cluster-restore/tasks/main.yml:    src: "{{ cluster_dir }}/backup/{{ db_to_restore }}"

# copy the first full backup over snapshot.db
[root@k8s-deploy backup]#cp snapshot_202411081550.db snapshot.db 
[root@k8s-deploy backup]#ll
total 7808
drwxr-xr-x 2 root root    4096 Nov  8 16:00 ./
drwxr-xr-x 5 root root    4096 Oct 19 06:23 ../
-rw------- 1 root root 2658336 Nov  8 15:50 snapshot_202411081550.db
-rw------- 1 root root 2658336 Nov  8 15:53 snapshot_202411081553.db
-rw------- 1 root root 2658336 Nov  8 16:02 snapshot.db

[root@k8s-deploy kubeasz]#./ezctl restore k8s-cluster1

Verify the cluster state after the restore

[root@k8s-master1 ~]#kubectl  get deployment -n myserver
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
myserver-nginx-deployment   1/1     1            1           37m

etcd Advanced: etcd Data Recovery Workflow

When more than half of the etcd cluster's nodes are down (for example, two out of three), the entire cluster fails and the data has to be recovered. The recovery workflow is as follows (a sketch of step 5 follows the list):

1. Restore the server operating systems
2. Redeploy the etcd cluster
3. Stop kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
4. Stop the etcd cluster
5. Restore the same backup data on every etcd node
6. Start all nodes and verify the etcd cluster
7. Start kube-apiserver/controller-manager/scheduler/kubelet/kube-proxy
8. Verify the Kubernetes master state and pod data
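
A hedged sketch of step 5, reusing the restore parameters from the kubeasz task above (the snapshot path and the scp distribution step are assumptions; --name and the peer URL must match each node's own etcd.service):

# copy the same snapshot to every etcd node
for ip in 10.0.0.116 10.0.0.117 10.0.0.118; do
  scp /data/etcd-backup-dir/snapshot.db root@${ip}:/tmp/snapshot.db
done
# then on each node (shown for 10.0.0.116), restore with that node's identity;
# the target --data-dir must not exist yet, so move any old data directory aside first
ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot restore /tmp/snapshot.db \
  --name etcd-10.0.0.116 \
  --initial-cluster etcd-10.0.0.116=https://10.0.0.116:2380,etcd-10.0.0.117=https://10.0.0.117:2380,etcd-10.0.0.118=https://10.0.0.118:2380 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-advertise-peer-urls https://10.0.0.116:2380 \
  --data-dir=/var/lib/etcd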


