Pod Scheduling in K8s: Taints and Tolerations
Taints and Tolerations

In Kubernetes, taints and tolerations are two important scheduling concepts. They let administrators mark nodes in order to influence how Pods are scheduled onto them.

The scheduling mechanisms covered so far all work from the Pod's side: attributes added to the Pod decide whether it may be scheduled onto a given node. We can also work from the node's side: by adding taints to a node, we decide whether Pods are allowed to be scheduled onto it.

Once a node is tainted, a repelling relationship exists between it and Pods: the node can refuse to have new Pods scheduled onto it, and can even evict Pods that are already running there.
Taints
A taint is a mark added to a node that expresses a restriction: it prevents Pods that do not meet certain conditions from being scheduled onto that node. A taint has the general form key=value:effect

- key: the identifier of the taint.
- value: the value of the taint; it may be empty, and further qualifies the condition.
- effect: what the taint does; it must be one of the following:
  - NoSchedule: prevents new Pods from being scheduled onto the node, but does not affect Pods already running on it.
  - PreferNoSchedule: the scheduler tries to avoid placing Pods on the node, but this is not mandatory; if no other node is available, a Pod can still be scheduled there.
  - NoExecute: prevents new Pods from being scheduled onto the node, and also evicts (eviction means the Pod is deleted and its resources released) any running Pods that do not tolerate the taint.
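For reference, taints are stored on the Node object itself under spec.taints. A minimal sketch of how the taint used later in this article would appear there:

spec:
  taints:
  - key: tag
    value: wyx
    effect: NoSchedule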
Basic commands:

Add a taint:
$ kubectl taint nodes NODE_NAME key=value:effect
- NODE_NAME is the name of the node to taint.
- key is the identifier of the taint.
- value is the value associated with the taint (if needed).
- effect is the taint's effect: NoSchedule, PreferNoSchedule, or NoExecute.

Remove a taint:
$ kubectl taint nodes NODE_NAME key=value:effect-
- The trailing - (hyphen) after the effect tells Kubernetes to remove the taint with that specific key, value, and effect.

Remove a taint without specifying its value:
$ kubectl taint nodes NODE_NAME key:effect-
- Since no value is given, this removes the taint matching that key and effect regardless of its value. (Likewise, kubectl taint nodes NODE_NAME key- removes all taints with that key.)
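To check which taints a node currently carries, either of the following works (the first form is used later in this article; NODE_NAME is a placeholder):
$ kubectl describe node NODE_NAME | grep Taints
$ kubectl get node NODE_NAME -o jsonpath='{.spec.taints}'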
Set a PreferNoSchedule taint

Shut down the node2 machine, leaving only the master and node1, then set a PreferNoSchedule taint on node1. Note below that a new Pod is still scheduled onto node1: PreferNoSchedule is only a preference, and node1 is the only schedulable worker left.
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 22d v1.21.10
k8s-node1 Ready <none> 22d v1.21.10
k8s-node2 NotReady <none> 22d v1.21.10
[root@k8s-master ~]# kubectl taint nodes k8s-node1 tag=wyx:PreferNoSchedule
node/k8s-node1 tainted
[root@k8s-master ~]# kubectl describe nodes k8s-node1
Name: k8s-node1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node1
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.58.232/24
projectcalico.org/IPv4IPIPTunnelAddr: 10.244.36.64
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 25 Dec 2024 07:56:26 -0500
Taints: tag=wyx:PreferNoSchedule
Unschedulable: false
Lease:
HolderIdentity: k8s-node1
AcquireTime: <unset>
RenewTime: Fri, 17 Jan 2025 03:27:42 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 17 Jan 2025 03:21:38 -0500 Fri, 17 Jan 2025 03:21:38 -0500 CalicoIsUp Calico is running on this node
MemoryPressure False Fri, 17 Jan 2025 03:23:05 -0500 Wed, 25 Dec 2024 07:56:26 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 17 Jan 2025 03:23:05 -0500 Wed, 25 Dec 2024 07:56:26 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 17 Jan 2025 03:23:05 -0500 Wed, 25 Dec 2024 07:56:26 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 17 Jan 2025 03:23:05 -0500 Wed, 25 Dec 2024 08:25:28 -0500 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.58.232
Hostname: k8s-node1
Capacity:
cpu: 2
ephemeral-storage: 17394Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3861288Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 16415037823
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3758888Ki
pods: 110
System Info:
Machine ID: a34fc0322dfe4557acf75b76f37487fb
System UUID: CD4D4D56-2260-3BF8-7A19-6F45865B4C71
Boot ID: b6caabb4-bcfd-4fef-b50a-8c9b0ff56f59
Kernel Version: 3.10.0-1160.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.8
Kubelet Version: v1.21.10
Kube-Proxy Version: v1.21.10
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-kube-controllers-697d846cf4-79hpj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system calico-node-gc547 250m (12%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system coredns-6f6b8cc4f6-5nbb6 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%) 22d
kube-system coredns-6f6b8cc4f6-q9rhc 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%) 22d
kube-system kube-proxy-7hp6l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 22d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 450m (22%) 0 (0%)
memory 140Mi (3%) 340Mi (9%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeReady 22d kubelet Node k8s-node1 status is now: NodeReady
Normal Starting 6m39s kube-proxy Starting kube-proxy.
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 22d v1.21.10
k8s-node1 Ready <none> 22d v1.21.10
k8s-node2 NotReady <none> 22d v1.21.10
[root@k8s-master ~]# kubectl create ns dev
namespace/dev created
[root@k8s-master ~]# kubectl run taint1 --image=nginx:1.17.1 -n dev
pod/taint1 created
[root@k8s-master ~]# kubectl get pod taint1 -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
taint1 0/1 ContainerCreating 0 14s <none> k8s-node1 <none> <none>
[root@k8s-master ~]# kubectl get pod taint1 -n dev -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
taint1 0/1 ContainerCreating 0 18s <none> k8s-node1 <none> <none>
taint1 1/1 Running 0 24s 10.244.36.71 k8s-node1 <none> <none>
^C[root@k8s-master ~]# kubectl describe pods taint1 -n dev
Name: taint1
Namespace: dev
Priority: 0
Node: k8s-node1/192.168.58.232
Start Time: Fri, 17 Jan 2025 03:29:06 -0500
Labels: run=taint1
Annotations: cni.projectcalico.org/containerID: a5db60ee3198eafa2d7e89b6a8f57030d33bb5bc6468d2b75431108f00da36d5
cni.projectcalico.org/podIP: 10.244.36.71/32
cni.projectcalico.org/podIPs: 10.244.36.71/32
Status: Running
IP: 10.244.36.71
IPs:
IP: 10.244.36.71
Containers:
taint1:
Container ID: docker://b43c30799394daaf62e7b9712da5a3c6c9a8ffd7dd71d07a21f26004e9ae0a92
Image: nginx:1.17.1
Image ID: docker-pullable://nginx@sha256:b4b9b3eee194703fc2fa8afa5b7510c77ae70cfba567af1376a573a967c03dbb
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 17 Jan 2025 03:29:30 -0500
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m47x8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-m47x8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 56s default-scheduler Successfully assigned dev/taint1 to k8s-node1
Normal Pulling 54s kubelet Pulling image "nginx:1.17.1"
Normal Pulled 33s kubelet Successfully pulled image "nginx:1.17.1" in 20.507450779s
Normal Created 32s kubelet Created container taint1
Normal Started 32s kubelet Started container taint1
Set a NoSchedule taint

Remove the PreferNoSchedule taint and replace it with NoSchedule. The existing pod taint1 keeps running (NoSchedule does not evict), but the new pod taint2 stays Pending because no node will accept it:
[root@k8s-master ~]# kubectl taint nodes k8s-node1 tag=wyx:PreferNoSchedule-
node/k8s-node1 untainted
[root@k8s-master ~]# kubectl taint nodes k8s-node1 tag=wyx:NoSchedule
node/k8s-node1 tainted
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 22d v1.21.10
k8s-node1 Ready <none> 22d v1.21.10
k8s-node2 NotReady <none> 22d v1.21.10
[root@k8s-master ~]# kubectl get pod taint1 -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
taint1 1/1 Running 0 2m58s 10.244.36.71 k8s-node1 <none> <none>
[root@k8s-master ~]# kubectl run taint2 --image=nginx:1.17.1 -n dev
pod/taint2 created
[root@k8s-master ~]# kubectl get pod taint2 -n dev -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
taint2 0/1 Pending 0 10s <none> <none> <none> <none>
[root@k8s-master ~]# kubectl describe pods taint2 -n dev
Name: taint2
Namespace: dev
Priority: 0
Node: <none>
Labels: run=taint2
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
taint2:
Image: nginx:1.17.1
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cb226 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-cb226:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 49s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 1 node(s) had taint {tag: wyx}, that the pod didn't tolerate.
Warning FailedScheduling 48s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 1 node(s) had taint {tag: wyx}, that the pod didn't tolerate.
Set a NoExecute taint

After switching the taint to NoExecute, all Pods previously running on node1 are evicted, and new Pods are not scheduled there either:
[root@k8s-master ~]# kubectl taint nodes k8s-node1 tag=wyx:NoSchedule-
node/k8s-node1 untainted
[root@k8s-master ~]# kubectl taint nodes k8s-node1 tag=wyx:NoExecute
node/k8s-node1 tainted
[root@k8s-master ~]# kubectl get pod -n dev -o wide | grep k8s-node1
No resources found in dev namespace.
[root@k8s-master ~]# kubectl get pod -n dev -o wide
No resources found in dev namespace.
[root@k8s-master ~]# kubectl run taint3 --image=nginx:1.17.1 -n dev
pod/taint3 created
[root@k8s-master ~]# kubectl get pod -n dev -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
taint3 0/1 Pending 0 5s <none> <none> <none> <none>
Question: why are Pods never scheduled onto the master node?
Because the master node is tainted by default when the cluster is created; the FailedScheduling events above show the taint key node-role.kubernetes.io/master, which the Pods do not tolerate.
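You can verify this directly; on a kubeadm cluster of this version (v1.21), the expected taint is node-role.kubernetes.io/master:NoSchedule:
$ kubectl describe node k8s-master | grep Taints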
Tolerations

In Kubernetes, a taint is a special mark on a node indicating that the node does not want certain Pods scheduled onto it, while a toleration is a Pod attribute that lets the Pod ignore particular taints on a node, allowing it to be scheduled there anyway.
In short: a taint rejects, a toleration exempts. A node rejects Pods through taints; a Pod bypasses the rejection through tolerations.
[root@k8s-master ~]# kubectl describe node k8s-node1 | grep Taints
Taints: tag=wyx:NoExecute
# node1 carries the NoExecute taint, so without a toleration no Pod can be scheduled onto it
[root@k8s-master ~]# vim pod-toleration.yaml
[root@k8s-master ~]# cat pod-toleration.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-toleration
  namespace: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.17.1
  tolerations:
  - key: "tag"
    operator: "Equal"
    value: "wyx"
    effect: "NoExecute"
- key: "tag", the key of the taint on the node.
- operator: "Equal", the toleration matches a taint with the same key and value.
- value: "wyx", the value of the taint on the node.
- effect: "NoExecute", the taint effect being tolerated; without this toleration the Pod would not be scheduled onto the node, and would be evicted if already running there. A variant using the Exists operator is sketched after this list.
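Besides Equal, the operator can be Exists, which matches any value for the given key (the value field is then omitted). For NoExecute taints, tolerationSeconds can additionally bound how long the Pod may keep running after the taint appears. A minimal sketch reusing the same key:

tolerations:
- key: "tag"
  operator: "Exists"        # matches a taint with key "tag" and any value
  effect: "NoExecute"
  tolerationSeconds: 3600   # evicted 3600 seconds after the taint is added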
[root@k8s-master ~]# kubectl apply -f pod-toleration.yaml
pod/pod-toleration created
[root@k8s-master ~]# kubectl get pod pod-toleration -n dev -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-toleration 1/1 Running 0 9s 10.244.36.73 k8s-node1 <none> <none>
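With the toleration in place, the Pod runs on k8s-node1 despite the NoExecute taint. To clean up after the experiment, remove the taint and the test namespace:
$ kubectl taint nodes k8s-node1 tag=wyx:NoExecute-
$ kubectl delete ns dev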