Fixing a Kubernetes cluster that stops working after a network change in a test environment
Environment:
Device | y9000p |
---|---|
Kubernetes deployment method | sealos |
Kubernetes version | v1.28.0 |
Original node IP | 192.168.157.214 |
Current node IP | 192.168.31.187 |
Node name | ubuntu |
Procedure overview:
Check whether the containers are running -> update the hosts entries -> update etcd.yaml and kube-apiserver.yaml -> restart containerd.service and kubelet.service
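Problem and solution
The problem
After a network change (a replaced NIC or a move to a different network), the node has a new IP address and the cluster becomes unusable. Node and pod information cannot be retrieved, and kubectl reports errors that mention apiserver and port 6443, similar to the following: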
root@ubuntu:~# kubectl get node
E0812 22:41:24.005515 7428 memcache.go:265] couldn't get current server API group list: Get "https://apiserver.cluster.local:6443/api?timeout=32s": dial tcp 192.168.157.214:6443: connect: connection timed out
E0812 22:41:27.077445 7428 memcache.go:265] couldn't get current server API group list: Get "https://apiserver.cluster.local:6443/api?timeout=32s": dial tcp 192.168.157.214:6443: connect: connection timed out
E0812 22:41:30.149592 7428 memcache.go:265] couldn't get current server API group list: Get "https://apiserver.cluster.local:6443/api?timeout=32s": dial tcp 192.168.157.214:6443: connect: connection timed out
E0812 22:41:33.221398 7428 memcache.go:265] couldn't get current server API group list: Get "https://apiserver.cluster.local:6443/api?timeout=32s": dial tcp 192.168.157.214:6443: connect: connection timed out
E0812 22:41:36.293482 7428 memcache.go:265] couldn't get current server API group list: Get "https://apiserver.cluster.local:6443/api?timeout=32s": dial tcp 192.168.157.214:6443: connect: connection timed out
Unable to connect to the server: dial tcp 192.168.157.214:6443: connect: connection timed out
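The errors show kubectl still dialing the original address for apiserver.cluster.local. A minimal sketch for confirming where that name points, assuming it is resolved through /etc/hosts as on this node. The helper name `hosts_ip_for` is hypothetical and not part of sealos or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print the IP that an active (uncommented) hosts-file
# entry maps to the given hostname. The file defaults to /etc/hosts.
hosts_ip_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[ \t]*#/ { next }                  # skip commented-out entries
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore trailing comments
                if ($i == name) { print $1; break }
            }
        }
    ' "$file"
}

# Usage (on the node):
#   hosts_ip_for apiserver.cluster.local   # address kubectl will dial
#   hostname -I                            # addresses currently assigned to the node
```

If the first address is not among those printed by `hostname -I`, the hosts entries are stale.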
All of the required containers are in the Exited state:
root@ubuntu:~# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
5b68bc76fedef 7a640256a07e2 31 hours ago Exited config-manager 3 d4217aa5554dd nvidia-device-plugin-daemonset-4vtbs
612d67529d3a6 fdd26b65ab602 31 hours ago Exited gc 3 8f452c6821b37 gpu-operator-node-feature-discovery-gc-5df6bddb8b-vmpcl
bf7d11f8ee2c8 7a640256a07e2 31 hours ago Exited nvidia-device-plugin 3 d4217aa5554dd nvidia-device-plugin-daemonset-4vtbs
db22f08530e49 7a640256a07e2 31 hours ago Exited config-manager 3 fa17c047f63ad gpu-feature-discovery-lbwmz
3a3dfbb33d44b 2ec212cbadbe4 31 hours ago Exited toolkit-validation 3 d4217aa5554dd nvidia-device-plugin-daemonset-4vtbs
87eb6c164bad6 7a640256a07e2 31 hours ago Exited gpu-feature-discovery 3 fa17c047f63ad gpu-feature-discovery-lbwmz
1aeb2fa6d560b ead0a4a53df89 31 hours ago Exited coredns 5 caf38f94689f5 coredns-5dd5756b68-xmwjs
6d8f5f99cf21d 7516425fa9421 31 hours ago Exited calico-apiserver 5 b97616cffa399 calico-apiserver-75cc9b7f5f-dljzs
3c9cbe3bc8a08 2ec212cbadbe4 31 hours ago Exited toolkit-validation 3 fa17c047f63ad gpu-feature-discovery-lbwmz
fea38fce9515c 91f7e5f552688 31 hours ago Exited csi-node-driver-registrar 5 be967adf01956 csi-node-driver-gh5pm
f3246bbcabe18 eaf0c970c8270 31 hours ago Exited calico-csi 5 be967adf01956 csi-node-driver-gh5pm
fe06e16fc2d62 fdd26b65ab602 31 hours ago Exited worker 12 2d52da36853d7 gpu-operator-node-feature-discovery-worker-qjzfk
775dcd0e12799 6b1e38763f401 31 hours ago Exited calico-kube-controllers 5 142042851f058 calico-kube-controllers-85579cbc49-p8s88
8f1e61a939e02 7516425fa9421 31 hours ago Exited calico-apiserver 5 8eb76b56b2bdf calico-apiserver-75cc9b7f5f-5l4jw
b43e1b929c6b9 911a443af3a48 31 hours ago Exited gpu-operator 12 e399516203802 gpu-operator-5f444d849d-mbl27
4100d57f93018 2ec212cbadbe4 31 hours ago Exited nvidia-operator-validator 3 ee54a45bf2d57 nvidia-operator-validator-6fqb8
257893d7cd00b ead0a4a53df89 31 hours ago Exited coredns 5 713cf378bd6a4 coredns-5dd5756b68-86bc7
ab6561e4b7126 2ec212cbadbe4 31 hours ago Exited driver-validation 3 ee54a45bf2d57 nvidia-operator-validator-6fqb8
b7f8860b2d4b7 fdd26b65ab602 31 hours ago Exited master 9 d9677ff0449bf gpu-operator-node-feature-discovery-master-5d7584755c-8xnqj
b5faef446898f 51bef383dc2e5 31 hours ago Exited tigera-operator 8 8e610fe113ab9 tigera-operator-7fdd699b8c-hnv79
8514821c33cc7 dd1ce37f1c317 31 hours ago Exited calico-typha 6 b05634e1e6675 calico-typha-5f5677c9dd-dfjpp
a9a17552c2f1c ea1030da44aa1 31 hours ago Exited kube-proxy 6 528572f3ed53d kube-proxy-sffq7
9cebc8c53196b 3dd4390f2a85a 31 hours ago Exited calico-node 7 513aaf11ebda8 calico-node-dfxlr
a3a6c6e2abb8f 30289a4bd1d4e 31 hours ago Exited nccl-tests 3 070c046f77da1 nccl-tests-v7vjw
8c2278cd47bf8 72bfa61e35b35 31 hours ago Exited flexvol-driver 4 513aaf11ebda8 calico-node-dfxlr
cd408e61cc334 f6f496300a2ae 31 hours ago Exited kube-scheduler 14 4ee75e3ba82b7 kube-scheduler-ubuntu
5bea0c11e635c 4be79c38a4bab 31 hours ago Exited kube-controller-manager 15 95edf5bc520c2 kube-controller-manager-ubuntu
15756cc952a67 bb5e0dde9054c 31 hours ago Exited kube-apiserver 0 803c63008535d kube-apiserver-ubuntu
408d7a5bc729b 73deb9a3f7025 31 hours ago Exited etcd 0 f5662637ed317 etcd-ubuntu
9ea06cad676df 2ec212cbadbe4 2 days ago Exited nvidia-cuda-validator 0 e1a20a3801858 nvidia-cuda-validator-bz7kq
245809a709c98 2ec212cbadbe4 2 days ago Exited cuda-validation 0 e1a20a3801858 nvidia-cuda-validator-bz7kq
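The listing is long, and the entries that matter most here are etcd and kube-apiserver. A small sketch for filtering the output by state, assuming the column layout shown above. The helper name `containers_in_state` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crictl ps -a` output on stdin and print the NAME
# of every container whose STATE column equals the given state.
containers_in_state() {
    awk -v state="$1" '
        NR == 1 { next }                     # skip the header row
        {
            # CREATED spans several words ("31 hours ago"), so locate the
            # STATE column by value; NAME is the field right after it.
            for (i = 3; i < NF; i++) {
                if ($i == state) { print $(i + 1); break }
            }
        }
    '
}

# Usage (on the node):
#   crictl ps -a | containers_in_state Exited | grep -E '^(etcd|kube-apiserver)$'
```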
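Solution
Edit /etc/hosts: add entries that resolve the node names to the current IP, and comment out the entries for the original IP.
Current node IP: 192.168.31.187
Original node IP: 192.168.157.214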
root@ubuntu:~# cat /etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.31.187 ubuntu
192.168.31.187 sealos.hub
192.168.31.187 apiserver.cluster.local
#192.168.157.214 ubuntu
#192.168.157.214 sealos.hub
#192.168.157.214 apiserver.cluster.local
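The same edit can be scripted. This is a sketch only, assuming each entry starts at the beginning of the line with the IP as its first field. The helper name `update_hosts` is hypothetical. Unlike the file above, it writes each new entry directly above its commented-out original rather than grouping them.

```shell
#!/bin/sh
# Hypothetical helper: for every active entry whose first field is the old IP,
# write a copy pointing at the new IP and keep the original as a comment.
# A backup of the untouched file is saved as <file>.bak.
update_hosts() {
    old_ip=$1
    new_ip=$2
    file=${3:-/etc/hosts}
    cp -p "$file" "$file.bak"
    awk -v old="$old_ip" -v newip="$new_ip" '
        $1 == old {
            line = $0
            sub(/^[^ \t]+/, newip, line)     # swap only the leading IP field
            print line                       # new, active entry
            print "#" $0                     # old entry, commented out
            next
        }
        { print }
    ' "$file.bak" > "$file"
}

# Usage (as root, on the node):
#   update_hosts 192.168.157.214 192.168.31.187 /etc/hosts
```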
Update the Kubernetes manifests etcd.yaml and kube-apiserver.yaml
Replace every occurrence of 192.168.157.214 in both files with 192.168.31.187:
sed -i 's/192\.168\.157\.214/192.168.31.187/g' /etc/kubernetes/manifests/etcd.yaml
sed -i 's/192\.168\.157\.214/192.168.31.187/g' /etc/kubernetes/manifests/kube-apiserver.yaml
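Because `sed -i` edits in place, it may be worth saving copies first. The sketch below wraps the same substitution and assumes GNU sed, as shipped with Ubuntu. It writes backups to a separate directory because kubelet scans the manifests directory and may try to load extra files left there. The helper name `replace_ip_in_files` and the backup path in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: back up each file to backup_dir, then replace every
# occurrence of old_ip with new_ip in place (GNU sed assumed for -i).
replace_ip_in_files() {
    old_ip=$1
    new_ip=$2
    backup_dir=$3
    shift 3
    # Escape the dots so they match literally instead of "any character".
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    mkdir -p "$backup_dir"
    for f in "$@"; do
        cp -p "$f" "$backup_dir/$(basename "$f").bak"
        sed -i "s/$old_re/$new_ip/g" "$f"
    done
}

# Usage (as root, on the node; the backup directory is an arbitrary choice):
#   replace_ip_in_files 192.168.157.214 192.168.31.187 /root/manifests-backup \
#       /etc/kubernetes/manifests/etcd.yaml \
#       /etc/kubernetes/manifests/kube-apiserver.yaml
#   grep -n '192.168.157.214' /etc/kubernetes/manifests/*.yaml   # list any leftovers
```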
Restart containerd.service and kubelet.service
systemctl restart containerd.service
systemctl restart kubelet.service
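The control-plane containers can take a while to come back after the restart. A generic polling sketch, assuming `date +%s` is available (it is on Ubuntu). The helper name `retry_until` is hypothetical. The usage line relies on `kubectl get --raw` and the API server's `/readyz` endpoint.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or the time
# limit (in seconds) elapses. Returns 0 on success, 1 on timeout.
retry_until() {
    limit=$1
    interval=$2
    shift 2
    deadline=$(( $(date +%s) + limit ))
    until "$@" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Usage (on the node): wait up to 120 s, polling every 5 s.
#   retry_until 120 5 kubectl get --raw /readyz && echo "apiserver is ready"
```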
Verify that the cluster is usable again
Output similar to the following indicates that the cluster is working normally:
root@ubuntu:~# kubectl get node
NAME STATUS ROLES AGE VERSION
ubuntu Ready control-plane 4d1h v1.28.0
root@ubuntu:~# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
2d11b1db35eff 91f7e5f552688 6 minutes ago Running csi-node-driver-registrar 6 df9733e23f3e7 csi-node-driver-gh5pm
0ed41fd7f9a6f eaf0c970c8270 6 minutes ago Running calico-csi 6 df9733e23f3e7 csi-node-driver-gh5pm
2368760944e60 7516425fa9421 6 minutes ago Running calico-apiserver 6 da5a649b35a56 calico-apiserver-75cc9b7f5f-dljzs
013500389e184 fdd26b65ab602 6 minutes ago Running master 10 e52ab9fe1d1e4 gpu-operator-node-feature-discovery-master-5d7584755c-8xnqj
152f16cab27c4 7a640256a07e2 6 minutes ago Running config-manager 4 aefd2bcbaa3c3 gpu-feature-discovery-lbwmz
e95cc4f072fbf 911a443af3a48 6 minutes ago Running gpu-operator 13 c6582ab36578b gpu-operator-5f444d849d-mbl27
fa7d4ea517695 7a640256a07e2 6 minutes ago Running gpu-feature-discovery 4 aefd2bcbaa3c3 gpu-feature-discovery-lbwmz
6bf6045d53a6f 2ec212cbadbe4 6 minutes ago Running toolkit-validation 4 aefd2bcbaa3c3 gpu-feature-discovery-lbwmz
728bebb205d51 fdd26b65ab602 6 minutes ago Running worker 13 9eb1195b29487 gpu-operator-node-feature-discovery-worker-qjzfk
5b444c7150c7e 7516425fa9421 6 minutes ago Running calico-apiserver 6 c26afbe9b56a8 calico-apiserver-75cc9b7f5f-5l4jw
0b39c5c57fe1f 6b1e38763f401 6 minutes ago Running calico-kube-controllers 6 7fdb765c82174 calico-kube-controllers-85579cbc49-p8s88
fa170b92c6fb1 ead0a4a53df89 6 minutes ago Running coredns 6 cab7c9ce8a1e2 coredns-5dd5756b68-86bc7
d6ba9005dae4d 7a640256a07e2 6 minutes ago Running config-manager 4 c9309555a93e6 nvidia-device-plugin-daemonset-4vtbs
62049d72ad9e0 2ec212cbadbe4 6 minutes ago Running nvidia-operator-validator 4 a9039c63470b0 nvidia-operator-validator-6fqb8
5fdf543364d44 ead0a4a53df89 6 minutes ago Running coredns 6 ddcc5612fd91c coredns-5dd5756b68-xmwjs
b0a1452a1c321 7a640256a07e2 6 minutes ago Running nvidia-device-plugin 4 c9309555a93e6 nvidia-device-plugin-daemonset-4vtbs
771ed3a63ab69 2ec212cbadbe4 6 minutes ago Running toolkit-validation 4 c9309555a93e6 nvidia-device-plugin-daemonset-4vtbs
7bf73a9e89229 fdd26b65ab602 6 minutes ago Running gc 4 37547dacde5b2 gpu-operator-node-feature-discovery-gc-5df6bddb8b-vmpcl
6b0a808f3b486 3dd4390f2a85a 6 minutes ago Running calico-node 9 72f7fa54b6990 calico-node-dfxlr
c6f9f1a795861 dd1ce37f1c317 7 minutes ago Running calico-typha 7 581f155e0f290 calico-typha-5f5677c9dd-dfjpp
root@ubuntu:~# kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-5dd5756b68-86bc7 1/1 Running 6 (22h ago) 4d1h
coredns-5dd5756b68-xmwjs 1/1 Running 6 (22h ago) 4d1h
etcd-ubuntu 1/1 Running 0 9m20s
kube-apiserver-ubuntu 1/1 Running 0 9m20s
root@ubuntu:~# kubectl -n calico-system get pod
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-85579cbc49-p8s88 1/1 Running 6 (22h ago) 4d1h
calico-node-dfxlr 1/1 Running 9 (10m ago) 4d
calico-typha-5f5677c9dd-dfjpp 1/1 Running 7 (22h ago) 4d1h
csi-node-driver-gh5pm 2/2 Running 12 (22h ago) 4d1h
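The node check above can also be done without reading the output by eye. A minimal sketch, assuming the column layout of `kubectl get node` shown earlier. The helper name `not_ready_nodes` is hypothetical. It also flags a cordoned node, whose status reads `Ready,SchedulingDisabled`.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get node --no-headers` output on stdin
# and print the name of every node whose STATUS is not exactly "Ready".
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage (on the node): empty output means every node reports Ready.
#   kubectl get node --no-headers | not_ready_nodes
```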