Deploying K8S on CentOS 7 in an Internet-Connected Environment
1. Environment
Operating system: CentOS Linux release 7.9.2009 (Core)
Specs: 2 vCPUs, 2 GB RAM, 30 GB disk (VMware virtual machine)
This guide sets up a one-master, two-worker K8S cluster, with containerd as the container runtime.
2. Preparation (common to master and worker nodes)
The steps in this chapter should be run on both the master and the worker nodes.
2.1 Switch package repositories
# Switch to the Aliyun yum mirror
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum makecache
# Add the Aliyun docker repository
yum install -y yum-utils
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Add the K8S repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
2.2 Set the hostname
hostnamectl set-hostname master  # on each worker node, use that node's own name instead
# Edit /etc/hosts
[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.153.138 master
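Since this guide builds a one-master, two-worker cluster, every node's /etc/hosts should eventually carry entries for all three machines. The worker IPs (192.168.153.139/140) and names (node1/node2) below are assumptions; substitute your own. This sketch writes the snippet to a temp file so it can be reviewed before being appended to /etc/hosts:

```shell
# Hypothetical worker addresses -- replace with your actual node IPs/hostnames,
# then append these lines to /etc/hosts on every node.
cat <<'EOF' > /tmp/k8s-hosts-snippet
192.168.153.138 master
192.168.153.139 node1
192.168.153.140 node2
EOF
cat /tmp/k8s-hosts-snippet
```

On each node, appending is then just: cat /tmp/k8s-hosts-snippet >> /etc/hosts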
2.3 Synchronize time
yum install chrony ntpdate -y
sed -i "s/^server/#server/g" /etc/chrony.conf  # comment out the default NTP servers (note: -i is required, otherwise the file is not modified)
echo 'server tiger.sina.com.cn iburst' >> /etc/chrony.conf
echo 'server ntp1.aliyun.com iburst' >> /etc/chrony.conf
systemctl enable chronyd ; systemctl start chronyd
ntpdate tiger.sina.com.cn
2.4 Disable SELinux and firewalld
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0  # apply immediately; the config change above only takes effect after a reboot
systemctl disable firewalld; systemctl stop firewalld
2.5 Disable swap
Since version 1.8, Kubernetes has required that system swap be disabled; with the default configuration, the kubelet will not start otherwise.
With swap enabled, the OS pages memory out to disk (the swap space) when a container exceeds its memory allocation, which sharply degrades container performance. K8S may also be unable to accurately monitor and control containers' actual resource usage, breaking resource management.
[root@localhost ~]# free -h
total used free shared buff/cache available
Mem: 1.8G 828M 117M 41M 872M 791M
Swap: 2.0G 3.5M 2.0G
[root@localhost ~]# sudo swapoff -a
[root@localhost ~]# sudo sed -i 's/.*swap.*/#&/' /etc/fstab
[root@localhost ~]# free -h
total used free shared buff/cache available
Mem: 1.8G 829M 112M 45M 877M 788M
Swap: 0B 0B 0B
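The sed expression above comments out every fstab line containing "swap" (the `&` reinserts the whole matched line after the `#`). A quick demonstration on a throwaway file, leaving the real /etc/fstab untouched:

```shell
# Demonstrate the swap-commenting sed on a scratch copy of an fstab
cat <<'EOF' > /tmp/fstab.demo
/dev/mapper/centos-root /        xfs  defaults 0 0
/dev/mapper/centos-swap swap     swap defaults 0 0
EOF
sed -i 's/.*swap.*/#&/' /tmp/fstab.demo
cat /tmp/fstab.demo
```

Only the swap line gains a leading `#`; the root filesystem line is untouched.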
2.6 Enable packet forwarding
Set net.bridge.bridge-nf-call-iptables = 1 # pass traffic on bridge devices through iptables, allowing iptables rules to filter and forward bridged traffic
Set net.ipv4.ip_forward = 1 # enable the Linux kernel's IP forwarding so the host can route IP packets
modprobe overlay
modprobe br_netfilter
# Set the required sysctl parameters; these persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
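modprobe only loads overlay and br_netfilter for the current boot. For the sysctl settings above to keep applying after a reboot, the modules can also be listed in /etc/modules-load.d/, which systemd reads at boot time:

```shell
# Persist the module loading across reboots (systemd loads every module
# listed in /etc/modules-load.d/*.conf at boot)
mkdir -p /etc/modules-load.d
cat <<'EOF' > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
```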
2.7 Install and configure bash-completion for command auto-completion
yum -y install bash-completion
source /etc/profile
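bash-completion only provides the framework; individual tools register their own completions. Once kubectl is installed (section 3.1), its completion script can be hooked into the shell, for example:

```shell
# Load kubectl's bash completion in every new shell
# (takes effect once kubectl is installed in section 3.1)
echo 'source <(kubectl completion bash)' >> ~/.bashrc
grep 'kubectl completion' ~/.bashrc
```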
2.8 Install docker
K8S removed Dockershim in version 1.24; to keep using Docker Engine as the container runtime you would additionally need cri-dockerd, which is cumbersome. See the official documentation for an introduction to container runtimes.
However, installing docker pulls in containerd by default, so this guide still installs docker and uses containerd as the container runtime.
yum install docker-ce -y
systemctl start docker
systemctl enable docker
2.9 Configure containerd
2.9.1 Edit disabled_plugins
containerd installed from a package disables its container-runtime (CRI) plugin by default, so the configuration file must be regenerated.
cd /etc/containerd/
mv config.toml config.toml.orig
containerd config default > config.toml
[root@master containerd]# cat /etc/containerd/config.toml|grep disabled_plugins
disabled_plugins = []
2.9.2 Configure the systemd cgroup driver
cgroups (Linux Control Groups) limit, account for, and isolate the physical resources (CPU, memory, I/O, etc.) used by groups of processes.
K8S supports two cgroup drivers: systemd and cgroupfs. When containerd is used as the CRI runtime, the official recommendation is the systemd driver.
systemd is the system's built-in cgroup manager: it exists from system initialization, is tightly integrated with cgroups, and assigns a cgroup to every process, so it is natural to let it manage them. Choosing cgroupfs instead leaves two cgroup managers on the system, which has been shown to be unstable under resource pressure.
Because kubeadm manages the kubelet as a system service, the systemd driver rather than cgroupfs is recommended for kubeadm-based Kubernetes installations.
In /etc/containerd/config.toml, set SystemdCgroup = true:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
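Rather than editing the file by hand, the switch can be made with a one-line sed. On a real node the target is /etc/containerd/config.toml; below, the same substitution is demonstrated on a scratch copy:

```shell
# On a real node: sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Demonstration on a scratch file with the default value:
cat <<'EOF' > /tmp/config.toml.demo
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /tmp/config.toml.demo
grep SystemdCgroup /tmp/config.toml.demo
```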
2.9.3 Configure the sandbox image
sed -i "s#registry.k8s.io/pause#registry.aliyuncs.com/google_containers/pause#g" /etc/containerd/config.toml
2.9.4 Add a registry mirror
In /etc/containerd/config.toml, point the docker.io mirror at an accelerator endpoint:
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://hub-mirror.c.163.com"]
2.9.5 Restart containerd
systemctl enable containerd;systemctl restart containerd
3. Master node installation
3.1 Install kubeadm, kubectl, and kubelet
yum install -y kubeadm kubelet kubectl
systemctl enable kubelet && systemctl start kubelet
3.2 Generate the cluster initialization file
kubeadm config print init-defaults > kubeadm-init.yml
[root@master ~]# cat kubeadm-init.yml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 0s    # change the token TTL so it never expires
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: x.x.x.x    # change to the k8s-master node IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master    # change to your hostname
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers    # replace with a domestic mirror registry
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16    # set the pod network CIDR; recommended
scheduler: {}
3.3 Check image versions
[root@master ~]# kubeadm config images list --config=kubeadm-init.yml
registry.aliyuncs.com/google_containers/kube-apiserver:v1.28.0
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.0
registry.aliyuncs.com/google_containers/kube-scheduler:v1.28.0
registry.aliyuncs.com/google_containers/kube-proxy:v1.28.0
registry.aliyuncs.com/google_containers/pause:3.9
registry.aliyuncs.com/google_containers/etcd:3.5.9-0
registry.aliyuncs.com/google_containers/coredns:v1.10.1
Note: confirm again that registry.aliyuncs.com/google_containers/pause:3.9 is exactly the pause image and tag that must be set in /etc/containerd/config.toml.
[root@master ~]# cat /etc/containerd/config.toml |grep sandbox
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
# Restart containerd
[root@master ~]# systemctl restart containerd.service
3.4 Pull the images
kubeadm config images pull --config=kubeadm-init.yml
3.5 Initialize the cluster
Run the initialization and record its output to the kubeadm-init.log file:
kubeadm init --config=kubeadm-init.yml | tee kubeadm-init.log
On success, the following is displayed. Follow its instructions, and be sure to save the final join command:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.153.138:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:3e887c7c4a8134d1076df1990555a37b33c3d78caed8c69f9e7bc48c411b77d9
If an error occurs, run kubeadm reset before re-initializing, then reboot the host and initialize again.
3.6 Check the cluster
[root@master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy ok
3.7 Install the Calico network plugin
You may choose any network plugin (see the official docs); this guide uses Calico, via the Canal manifest (Calico network policy combined with Flannel networking).
curl https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/canal.yaml -O
# If the pod CIDR was set to 10.244.0.0/16 in section 3.2, you can run the next step directly
kubectl apply -f canal.yaml
For well-known reasons, image pulls may fail during installation. There are several workarounds; pick whichever suits you:
- Use a working registry mirror [can be difficult]
- Go through a proxy
- Import the images offline [recommended for small test setups]
- Run a private image registry
After installation completes, the node status changes to Ready.
[root@master ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-8d76c5f9b-xmkbg 1/1 Running 0 121m
kube-system canal-kwh5j 2/2 Running 2 (5m21s ago) 121m
kube-system coredns-66f779496c-fvkxl 1/1 Running 0 3h32m
kube-system coredns-66f779496c-gz7pf 1/1 Running 0 3h32m
kube-system etcd-node 1/1 Running 1 (5m21s ago) 3h32m
kube-system kube-apiserver-node 1/1 Running 1 (5m21s ago) 3h32m
kube-system kube-controller-manager-node 1/1 Running 6 (5m21s ago) 3h32m
kube-system kube-proxy-7vm47 1/1 Running 1 (5m21s ago) 3h32m
kube-system kube-scheduler-node 1/1 Running 5 (5m21s ago) 3h32m
[root@master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
node Ready control-plane 3h32m v1.28.2
4. Worker node installation
4.1 Install kubeadm and kubelet
yum install -y kubeadm kubelet
systemctl enable kubelet && systemctl start kubelet
4.2 Join the cluster
After all steps in chapter 2 are complete, a worker can join the cluster with the join command printed during initialization in section 3.5.
Then you can join any number of worker nodes by running the following on each as root:
# Run the following
kubeadm join 192.168.153.138:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:3e887c7c4a8134d1076df1990555a37b33c3d78caed8c69f9e7bc48c411b77d9
If the join command was not recorded, it can be printed again with:
kubeadm token create --print-join-command
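If only the token is at hand, the --discovery-token-ca-cert-hash value can also be recomputed from the cluster CA certificate (on the master, /etc/kubernetes/pki/ca.crt). This is the pipeline from the Kubernetes docs, demonstrated below on a throwaway self-signed certificate, since it works on any x509 cert with an RSA key:

```shell
# Generate a throwaway cert purely for demonstration; on a real master,
# replace /tmp/demo-ca.crt with /etc/kubernetes/pki/ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key \
    -out /tmp/demo-ca.crt -days 1 -subj "/CN=demo" 2>/dev/null
# SHA-256 over the DER-encoded public key -- this is the sha256:... value kubeadm expects
openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```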
Appendix: Troubleshooting
Error 1
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2024-07-20T17:07:01+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
If you installed containerd from a package (e.g., an RPM or .deb), you may find that its CRI integration plugin is disabled by default. CRI support must be enabled to use containerd with a Kubernetes cluster. Make sure cri does not appear in the disabled_plugins list in /etc/containerd/config.toml; if you change the file, restart containerd.
[root@master ~]# cat /etc/containerd/config.toml|grep disabled_plugins
disabled_plugins = []
[root@master ~]# systemctl restart containerd.service
Error 2
Running a crictl command fails with:
FATA[0000] listing containers: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
Run the following commands manually to set the runtime endpoints:
[root@master ~]# crictl config runtime-endpoint unix:///run/containerd/containerd.sock
[root@master ~]# crictl config image-endpoint unix:///run/containerd/containerd.sock
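The two crictl config commands simply write a small YAML file, so an equivalent fix is to create /etc/crictl.yaml directly (runtime-endpoint and image-endpoint are crictl's standard configuration keys):

```shell
# Equivalent to the two crictl config commands above
cat <<'EOF' > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
cat /etc/crictl.yaml
```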