
Installing Kubernetes v1.26 on Ubuntu 20.04: A Complete Guide

article 2025/2/28 10:02:02

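This article walks through installing Kubernetes 1.26.3-00 on Ubuntu 20.04: environment and host configuration, installing kubeadm, kubectl, and kubelet, and configuring containerd. It also covers cluster initialization, joining nodes, installing Helm, deploying the Calico network plugin, and diagnosing and fixing a CoreDNS problem, giving readers a complete procedure for building a K8s cluster.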

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.5 LTS
Release:	20.04
Codename:	focal

The two nodes have the hostnames node01 and node02.
Add hostname resolution
Do this on both node01 and node02 by adding the following entries to /etc/hosts:

192.168.30.4 node01
192.168.30.5 node02
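
If the entries are added by a script rather than by hand, it helps to make the step safe to re-run. The sketch below is one way to do that; add_host_entry is a hypothetical helper name, not something from the original setup, and it takes the hosts file as an optional third argument so it can be tried on a scratch file first.

```shell
# Hypothetical helper (not from the original article): append "IP NAME" to a
# hosts file only when NAME is not already mapped on a non-comment line.
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  # Look for NAME as a whole whitespace-delimited field after the IP column.
  if [ -f "$file" ] && awk -v n="$name" '
       !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage as root on each node (left commented so nothing is changed by accident):
# add_host_entry 192.168.30.4 node01
# add_host_entry 192.168.30.5 node02
```

Running the helper twice with the same arguments leaves a single line in the file.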

Plan
node01 is the master (control-plane) node.
node02 is the worker node.

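Host configuration
Note: run on every node.
# Load the required kernel modules and set the sysctl parameters so that iptables can see bridged traffic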

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
ip_vs
ip_vs_wrr
ip_vs_sh
ip_vs_rr
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF

Apply the parameters

sudo sysctl --system
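
To confirm the parameters took effect, the live values can be read back from /proc/sys. The following is a minimal sketch; check_k8s_sysctl is a hypothetical helper name, and the optional directory argument exists only so the function can be exercised against a scratch directory.

```shell
# Hypothetical helper: report whether the three kernel parameters set above
# currently read 1. The optional argument replaces /proc/sys for dry runs.
check_k8s_sysctl() {
  local root="${1:-/proc/sys}" key val rc=0
  for key in net/bridge/bridge-nf-call-iptables \
             net/bridge/bridge-nf-call-ip6tables \
             net/ipv4/ip_forward; do
    # The bridge entries only exist once br_netfilter is loaded.
    val=$(cat "$root/$key" 2>/dev/null || echo "missing")
    if [ "$val" = "1" ]; then
      echo "ok      $key"
    else
      echo "NOT OK  $key = $val"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on a node: check_k8s_sysctl
```

A "missing" value for the bridge keys usually means br_netfilter is not loaded, which lsmod | grep br_netfilter can confirm.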

Install kubeadm, kubectl, and kubelet
Note: run on all nodes.
Configure the Aliyun Kubernetes apt source

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes.gpg

Write the apt source list entry

echo "deb [signed-by=/etc/apt/keyrings/kubernetes.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Install

sudo apt update
sudo apt install -y kubelet=1.26.3-00 kubeadm=1.26.3-00 kubectl=1.26.3-00

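Note: the command below requires containerd to be running, so install and configure containerd first (see the containerd section later in this article).
Check the version of the pause image that gets pulled and make sure it matches the sandbox_image version in /etc/containerd/config.toml.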

sudo kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers
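
The pause image check described above can be done by eye, or sketched as two small filters. Both function names below are hypothetical; the first reads a list of image names on stdin (for example from kubeadm config images list), the second reads sandbox_image from a containerd config file.

```shell
# Hypothetical helpers for comparing the pause image kubeadm expects with the
# one containerd is configured to use.

# Print the pause image reference found in a list of image names on stdin.
pause_from_image_list() {
  grep -E '/pause:[^[:space:]]+$' | head -n 1
}

# Print the value of sandbox_image from the given containerd config file.
sandbox_image_from_config() {
  sed -n 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage on a node (left commented; needs kubeadm and containerd installed):
# want=$(kubeadm config images list --kubernetes-version v1.26.3 \
#          --image-repository registry.aliyuncs.com/google_containers | pause_from_image_list)
# have=$(sandbox_image_from_config /etc/containerd/config.toml)
# [ "$want" = "$have" ] && echo "pause image matches" || echo "mismatch: $want vs $have"
```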

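Configure the crictl tool
Note: running this on the master node (node01) is enough; repeat on the other nodes if needed.
crictl works much like the docker command and can also be used to inspect and manage containerd images.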

cat <<EOF> /etc/crictl.yaml 
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Install and configure containerd
Note: run on all nodes.

Troubleshooting:

See the CSDN blog post "k8s执行crictl images报错" (an error when running crictl images on k8s).

Install

apt-get  install containerd

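The configuration follows the official documentation. The address below was valid when this article was published; if it no longer works, simply follow the configuration in this article.
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
Generate the default configuration
Note: this configuration file does not exist by default. Generate a default one and make the changes on top of it.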

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

Change the cgroup driver
In /etc/containerd/config.toml, find the following section and set SystemdCgroup to true:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

Make sure the disabled_plugins list in /etc/containerd/config.toml does not contain cri; remove it if it is there. The line should look like this:

disabled_plugins = []

Set sandbox_image in /etc/containerd/config.toml. Use the pause image pulled by this command:
kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers

sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"

Enable containerd at boot and start it (the restart picks up the configuration changes)

systemctl enable containerd
systemctl start containerd
systemctl restart containerd
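
The two config.toml edits described above (SystemdCgroup and sandbox_image) can also be scripted. This is a sketch rather than the author's method: patch_containerd_config is a hypothetical name, and it assumes the file was produced by containerd config default, so that SystemdCgroup = false and a quoted sandbox_image value are both present.

```shell
# Hypothetical helper: set SystemdCgroup to true and replace sandbox_image in a
# containerd config file. The untouched original is kept as FILE.bak.
patch_containerd_config() {
  local file="$1" image="$2"
  cp "$file" "$file.bak"
  # Read from the backup and overwrite the original in place.
  sed \
    -e 's/^\([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/' \
    -e "s|^\([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*\)\".*\"|\1\"$image\"|" \
    "$file.bak" > "$file"
}

# Usage as root, followed by a containerd restart (left commented):
# patch_containerd_config /etc/containerd/config.toml \
#   registry.aliyuncs.com/google_containers/pause:3.9
# systemctl restart containerd
```

Comparing the file with its .bak copy using diff shows exactly which lines were changed.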

Initialize the cluster
Note: run on the master node.

kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.26.3 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all

root@ubuntu:~# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.26.3 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
[init] Using Kubernetes version: v1.26.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ubuntu] and IPs [10.96.0.1 192.168.39.6]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.39.6 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ubuntu] and IPs [192.168.39.6 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s

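View the join command
Note: run only on the master node (node01).
When kubeadm init succeeds, it prints the command for joining the cluster. If you no longer have it, run the following command to print a new one.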

kubeadm token create --print-join-command

kubeadm join 192.168.30.4:6443 --token ftb7lz.919ch4z4h6yqihqp --discovery-token-ca-cert-hash sha256:7a9fc662d5bfb999a6551235f30f1f76f277abc40a0a9bf7fd3381670fe1fc98

Configure KUBECONFIG
Note: do this on whichever node you want to operate the k8s cluster from. This article sets it on the master node (node01).

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bashrc
source ~/.bashrc

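Join the cluster
Run only on node02. For machines added later, first run all the steps marked "all nodes", then join the cluster.
On the master node (node01), run kubeadm token create --print-join-command to get the join command.
Then run that command on the joining node: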

kubeadm join 192.168.30.4:6443 --token ftb7lz.919ch4z4h6yqihqp --discovery-token-ca-cert-hash sha256:7a9fc662d5bfb999a6551235f30f1f76f277abc40a0a9bf7fd3381670fe1fc98

On success, the output looks similar to this:

[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

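Install Helm

Note: run on the master node (node01). Helm can also be installed on any node that can operate the k8s cluster.
See the Helm website:
https://helm.sh/docs/intro/install/
Download the binary package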

wget https://get.helm.sh/helm-v3.11.3-linux-amd64.tar.gz

Extract the package

tar zxvf helm-v3.11.3-linux-amd64.tar.gz

Copy the helm binary

cp linux-amd64/helm /usr/local/bin/

Confirm the installation

helm version

Output similar to the following means the installation succeeded.

version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}

Deploy the network plugin
Note: run on the master node (node01).
This article installs it with Helm.

First add the Helm repo

helm repo add projectcalico https://projectcalico.docs.tigera.io/charts

After adding the repo, update it

helm repo update 

helm search repo projectcalico

Output

NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
projectcalico/tigera-operator	v3.25.1      	v3.25.1    	Installs the Tigera operator for Calico

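Download Calico
Note that the version does not have to match the one shown in the repo. Kubernetes and Calico versions do have a compatibility relationship, though; see the official site for details:
https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements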

helm pull  projectcalico/tigera-operator --version v3.25.1

Extract the Calico chart

tar zxvf tigera-operator-v3.25.1.tgz

Install Calico

helm install calico -n kube-system --create-namespace -f tigera-operator/values.yaml tigera-operator

If the following error appears, KUBECONFIG is not set.

Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused

Fix

export KUBECONFIG=/etc/kubernetes/admin.conf

Run the installation again

helm install calico -n kube-system --create-namespace -f tigera-operator/values.yaml tigera-operator

The following output means Helm ran successfully.

NAME: calico
LAST DEPLOYED: Sat May 13 19:26:37 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

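By default Calico is installed in the calico-system namespace. Check that all pods in that namespace are healthy. If there are problems, see my other blog post on troubleshooting Calico network issues. In the sample output below, the csi-node-driver pods are still in ImagePullBackOff; in the final cluster status at the end of this article they are Running.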

kubectl get pod -n calico-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-6bb86c78b4-p4hmv   1/1     Running            0          86m
calico-node-gzwwd                          1/1     Running            0          86m
calico-node-k2vkc                          1/1     Running            0          86m
calico-typha-674597d59d-4dknd              1/1     Running            0          86m
csi-node-driver-cwwf2                      0/2     ImagePullBackOff   0          86m
csi-node-driver-k9lkh                      1/2     ImagePullBackOff   0          86m
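
On a larger cluster, scanning this table by eye gets tedious. The sketch below filters kubectl get pod output down to the pods that are not fully ready; unhealthy_pods is a hypothetical helper name, and it assumes the three leading columns NAME, READY, STATUS as printed with -n (with -A there is an extra NAMESPACE column, so the field numbers would shift).

```shell
# Hypothetical helper: read `kubectl get pod -n <ns>` output (header included)
# on stdin and print the pods whose STATUS is not Running or whose READY
# counts differ. Completed Job pods would also be listed.
unhealthy_pods() {
  awk 'NR > 1 {
         split($2, r, "/")          # READY column, e.g. "1/2"
         if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
       }'
}

# Usage (left commented; needs a reachable cluster):
# kubectl get pod -n calico-system | unhealthy_pods
```

For each pod it prints, kubectl describe pod <name> -n calico-system shows the events, including the image that failed to pull.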

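Install calicoctl
Note: install on every node.
If Ubuntu cannot reach GitHub, the download may fail. As a workaround, open a Windows cmd prompt and run ping github.com and
ping objects.githubusercontent.com to see which IP addresses they resolve to.

Then add those addresses to /etc/hosts on Ubuntu (the values below are the ones the author obtained; use the ones you see):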

20.205.243.166 github.com
185.199.111.133 objects.githubusercontent.com

Download the calicoctl client. Its version must match the server version; mine is 3.25.1.
Download command
Note: download on every node.

curl -L https://github.com/projectcalico/calico/releases/download/v3.25.1/calicoctl-linux-amd64 -o calicoctl-v3.25.1

Copy the binary into a directory on PATH

cp calicoctl-v3.25.1 /usr/local/bin/calicoctl

Grant execute permission

chmod +x /usr/local/bin/calicoctl

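Run the following command on node01 and node02 to confirm that Calico is installed correctly.
Note that PEER ADDRESS must be the IP address of the peer node; otherwise calico-node shows 0/1 Running (unhealthy).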

calicoctl node status

Output on node01

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.30.5 | node-to-node mesh | up    | 11:29:01 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

Output on node02

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.30.4 | node-to-node mesh | up    | 11:29:00 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
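
The same check can be scripted by parsing the peer table. This is only a sketch that assumes the table layout shown above; bgp_peers_established is a hypothetical helper name. It succeeds when at least one peer row exists and every row reports Established in the INFO column.

```shell
# Hypothetical helper: read `calicoctl node status` output on stdin and return
# success only if there is at least one BGP peer and all peers are Established.
bgp_peers_established() {
  awk -F'|' '
    /^\|/ && $2 !~ /PEER ADDRESS/ {        # table rows, skipping the header row
      peers++
      gsub(/[[:space:]]/, "", $6)          # INFO column
      if ($6 != "Established") bad++
    }
    END { exit !(peers > 0 && bad == 0) }'
}

# Usage on each node (left commented; needs calicoctl and root):
# calicoctl node status | bgp_peers_established && echo "BGP mesh is up"
```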

Troubleshooting CoreDNS failures

kubectl logs -f  coredns-5bbd96d687-zcz4q -n kube-system

[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11
[FATAL] plugin/loop: Loop (127.0.0.1:35209 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 5731636628710972604.1610418445198542983."

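The log already points to the page for diagnosing this problem:
https://coredns.io/plugins/loop/#troubleshooting
According to the official documentation:
When the CoreDNS log contains the message Loop ... detected ..., the loop detection plugin has found an infinite forwarding loop in one of the upstream DNS servers. This is a fatal error, because running with an infinite loop would consume memory and CPU until the host eventually dies from running out of memory.
A forwarding loop usually has one of these causes. Most commonly, CoreDNS forwards requests directly to itself, for example through a loopback address such as 127.0.0.1, ::1, or 127.0.0.53. Alternatively, CoreDNS forwards to an upstream server that in turn forwards the requests back to CoreDNS.
To troubleshoot, look in the Corefile for any forward entries for the zone in which the loop was detected. Make sure they do not forward to a local address or to another DNS server that forwards requests back to CoreDNS. If forward uses a file (for example /etc/resolv.conf), make sure that file contains no local addresses.
When a CoreDNS Pod deployed in Kubernetes detects a loop, the Pod goes into "CrashLoopBackOff", because Kubernetes tries to restart the Pod every time CoreDNS detects the loop and exits.

A common cause of forwarding loops in Kubernetes clusters is an interaction with a local DNS cache on the host node (for example systemd-resolved). In certain configurations, systemd-resolved puts the loopback address 127.0.0.53 into /etc/resolv.conf as a nameserver. By default, Kubernetes (via kubelet) passes this /etc/resolv.conf file to all Pods that use the default dnsPolicy, which leaves them unable to make DNS lookups (this includes the CoreDNS Pods). CoreDNS uses this /etc/resolv.conf as its list of upstreams to forward requests to. Because the file contains a loopback address, CoreDNS ends up forwarding requests to itself.

There are many ways to work around this issue; some are listed below:

Add the following to the kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> (or use the command-line flag --resolv-conf, deprecated in 1.10). The "real" resolv.conf is the one that contains the actual IP addresses of the upstream servers and no local/loopback address. This setting tells kubelet to pass an alternate resolv.conf to Pods. On systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can vary by distribution.
Disable the local DNS cache on the host nodes and restore /etc/resolv.conf to its original state.
A quick and dirty fix is to edit the Corefile, replacing forward . /etc/resolv.conf with the IP address of your upstream DNS, for example forward . 8.8.8.8. This only fixes the issue for CoreDNS; kubelet will continue to pass the invalid resolv.conf to all Pods with the default dnsPolicy, leaving them unable to resolve DNS.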
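
Before picking a workaround, it is worth confirming that the node's resolv.conf really lists a loopback resolver. A minimal sketch, with loopback_nameservers as a hypothetical helper name:

```shell
# Hypothetical helper: print every loopback nameserver (127.x.x.x or ::1)
# listed in a resolv.conf-style file. Prints nothing when the file is clean.
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' \
    "${1:-/etc/resolv.conf}"
}

# Usage on a node:
# loopback_nameservers /etc/resolv.conf
# loopback_nameservers /run/systemd/resolve/resolv.conf
```

If the first call prints 127.0.0.53 and the second prints nothing, the node matches the systemd-resolved situation described above.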

Fixing the CoreDNS problem
cat /etc/netplan/00-installer-config.yaml
Add the following under the network interface:

nameservers:
  addresses: [114.114.114.114]

The contents of /etc/netplan/00-installer-config.yaml after the change:

network:
  ethernets:
    ens33:
      addresses: [192.168.30.4/24]
      routes:
      - to: "default"
        via: "192.168.30.2"
      nameservers:
        addresses: [114.114.114.114]

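Apply the configuration
netplan apply
Delete the coredns pods and wait for them to be recreated.
Some sources state that editing the configuration file and running sudo netplan apply is not enough by itself, and that /etc/resolv.conf must also be made a symlink to the resolv.conf generated by systemd-resolved.
The steps (not performed in this article):
1. sudo rm -rf /etc/resolv.conf
2. sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

Confirm the cluster state
All pods should be Running.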

kubectl get pod -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE
calico-apiserver   calico-apiserver-565d577889-blchx          1/1     Running   0          105m
calico-apiserver   calico-apiserver-565d577889-qwkwz          1/1     Running   0          105m
calico-system      calico-kube-controllers-6bb86c78b4-p4hmv   1/1     Running   0          110m
calico-system      calico-node-gzwwd                          1/1     Running   0          110m
calico-system      calico-node-k2vkc                          1/1     Running   0          110m
calico-system      calico-typha-674597d59d-4dknd              1/1     Running   0          110m
calico-system      csi-node-driver-cwwf2                      2/2     Running   0          110m
calico-system      csi-node-driver-k9lkh                      2/2     Running   0          110m
kube-system        coredns-5bbd96d687-pxm2v                   1/1     Running   0          6m1s
kube-system        coredns-5bbd96d687-vqjfr                   1/1     Running   0          3m23s
kube-system        etcd-node01                                1/1     Running   0          162m
kube-system        kube-apiserver-node01                      1/1     Running   0          162m
kube-system        kube-controller-manager-node01             1/1     Running   0          162m
kube-system        kube-proxy-gw294                           1/1     Running   0          130m
kube-system        kube-proxy-pkq42                           1/1     Running   0          162m
kube-system        kube-scheduler-node01                      1/1     Running   0          162m
kube-system        tigera-operator-5d6845b496-n6cq4           1/1     Running   0          110m
