I built a k8s cluster on virtual machines: node01 (master), node02, node03, node04 and node05.
After rebooting node05, the node stayed NotReady and never recovered.
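The usual first checks for a NotReady node are not in my original notes; as a hedged sketch, this is roughly what I would run (node name and kubelet unit name are the standard kubeadm defaults and may differ in your setup):

```bash
# Rough diagnostic sketch for a NotReady node.
kubectl get nodes -o wide
kubectl describe node node05              # look at the Conditions section
journalctl -u kubelet -n 100 --no-pager   # kubelet logs, run on node05 itself
```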
I logged into node05 and pinged node01; it failed:

```
PING 192.168.55.101 (192.168.55.101) 56(84) bytes of data.
From 192.168.55.101 icmp_seq=1 Destination Port Unreachable
From 192.168.55.101 icmp_seq=2 Destination Port Unreachable
```
Pinging node05 from node01 also failed, but here the From address is not 192.168.55.105:

```
PING 192.168.55.105 (192.168.55.105) 56(84) bytes of data.
From 192.168.55.101 icmp_seq=42 Destination Host Unreachable
From 192.168.55.101 icmp_seq=43 Destination Host Unreachable
From 192.168.55.101 icmp_seq=48 Destination Host Unreachable
```
Very strange: the other nodes can all reach each other, and they can also reach both node01 and node05; only node01 and node05 cannot talk to each other. So I ran tcpdump on node05 and pinged node05 from the other machines:

```
❯ tcpdump -i ens160 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:06:33.916694 IP node02 > node05: ICMP echo request, id 54464, seq 1, length 64
10:06:33.916742 IP node05 > node02: ICMP echo reply, id 54464, seq 1, length 64
10:06:34.953014 IP node02 > node05: ICMP echo request, id 54464, seq 2, length 64
10:06:34.953036 IP node05 > node02: ICMP echo reply, id 54464, seq 2, length 64
10:06:35.977224 IP node02 > node05: ICMP echo request, id 54464, seq 3, length 64
10:06:35.977304 IP node05 > node02: ICMP echo reply, id 54464, seq 3, length 64
10:06:45.592363 IP node01 > node05: ICMP echo request, id 29583, seq 1, length 64
10:06:46.639769 IP node01 > node05: ICMP echo request, id 29583, seq 2, length 64
10:06:47.662847 IP node01 > node05: ICMP echo request, id 29583, seq 3, length 64
10:06:48.686950 IP node01 > node05: ICMP echo request, id 29583, seq 4, length 64
10:06:49.710882 IP node01 > node05: ICMP echo request, id 29583, seq 5, length 64
10:06:50.735306 IP node01 > node05: ICMP echo request, id 29583, seq 6, length 64
```
So node01's pings reach node05, but no replies ever go back out, and I couldn't figure out why. I then rebooted the machines several times in turn, and after that none of the nodes could ping node01 at all; tcpdump showed that nothing was even arriving anymore...
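A check I did not think to run at the time, sketched here only as a hedged example: when pings vanish like this between hosts on the same L2 segment, the neighbor (ARP) tables on both ends are worth a look before digging into anything cluster-related.

```bash
# Layer-2 sanity checks; ens160 is the node NIC as in the captures above.
ip neigh show dev ens160     # are the peers' MAC entries REACHABLE or FAILED?
tcpdump -i ens160 -n arp     # are ARP requests/replies actually crossing the wire?
```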

I searched around online and looked at ip route; it seemed fine, and whatever oddities there were didn't look like the kind of thing that would break node-to-node connectivity:

```
node05 ❯ ip route
default via 192.168.55.2 dev ens160 proto static
192.168.55.0/24 dev ens160 proto kernel scope link src 192.168.55.105

node05 ❯ ifconfig
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.55.105  netmask 255.255.255.0  broadcast 192.168.55.255
        inet6 fe80::20c:29ff:fe67:9fdf  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:67:9f:df  txqueuelen 1000  (Ethernet)
        RX packets 26385  bytes 8519203 (8.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12675  bytes 1453410 (1.4 MB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 45  memory 0x3fe00000-3fe20000

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 41172  bytes 3050209 (3.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 41172  bytes 3050209 (3.0 MB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

node01 ❯ ip route
default via 192.168.55.2 dev ens160 proto static
10.60.114.0/26 via 192.168.55.105 dev ens160 proto 80 onlink
10.60.140.64/26 via 192.168.55.102 dev ens160 proto 80 onlink
10.60.186.192/26 via 192.168.55.103 dev ens160 proto 80 onlink
blackhole 10.60.196.128/26 proto 80
10.60.196.136 dev cali014ebcc433b scope link
10.60.196.137 dev calid77cc23546a scope link
10.60.196.138 dev cali2a559d3a2ae scope link
10.60.196.139 dev calif7f206080d4 scope link
10.60.196.140 dev cali3c60dd4d5fd scope link
10.60.196.141 dev cali54df4040da6 scope link
10.60.248.192/26 via 192.168.55.104 dev ens160 proto 80 onlink
192.168.55.0/24 dev ens160 proto kernel scope link src 192.168.55.101
```
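Eyeballing the routing table only goes so far. A hedged sketch of something I should have done at this point: ask the kernel which route it would actually pick for the problem destination, instead of guessing from the table.

```bash
# Ask the kernel for its actual routing decision for node01's address.
# Run on node05; 192.168.55.101 is node01 as above.
ip route get 192.168.55.101
```

If the answer says `local` and points at `lo` (or anything other than ens160), the packets are never leaving the machine at all.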
The problem was presumably caused by the k8s cluster, so I tried removing node05 from the cluster to see whether the ping would work once the node was torn down. Same as before after removal. Then I read that with ipvs (key point) some configuration may not get cleaned up by kubeadm reset:

```
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:67:9f:df brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.55.105/24 brd 192.168.55.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe67:9fdf/64 scope link
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 42:0b:b7:81:e7:88 brd ff:ff:ff:ff:ff:ff
    inet 10.50.13.46/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.168.55.101/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.107.53/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.225.23/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.54.178/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.206.109/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.44.161/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.122.102/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.3.132/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.208.168/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.50.129.143/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
```
And there it is, a leftover kube-ipvs0 interface, which even carries node01's own address 192.168.55.101/32. Delete it:

```
ip link delete kube-ipvs0
```

After deleting it, the ping works!
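For reference, a hedged sketch of a fuller ipvs cleanup after kubeadm reset; I did not run this verbatim, and it assumes the ipvsadm tool is installed on the node:

```bash
# Clear any leftover IPVS virtual-server table, then remove the dummy
# interface kube-proxy created for it. Run on the node being torn down.
ipvsadm --clear
ip link delete kube-ipvs0
```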
Good. That means the other nodes don't need to be removed from the cluster at all; I can just run ip link delete kube-ipvs0 on them directly.
But it turns out the interface gets automatically recreated. Some searching suggested it is created by kube-proxy, so on the master I first changed the proxy mode back to the default "" (I forget why I had changed it in the first place... probably while installing metallb):

```
kubectl edit configmap -n kube-system kube-proxy

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
```
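Editing the ConfigMap alone doesn't change the kube-proxy pods that are already running; a hedged sketch of how to roll them, assuming the standard kubeadm DaemonSet name and label:

```bash
# Restart the kube-proxy DaemonSet so its pods pick up the edited ConfigMap.
kubectl -n kube-system rollout restart daemonset kube-proxy
# or equivalently, delete the pods and let the DaemonSet recreate them:
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```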
Then on the worker nodes I kept running ip link delete kube-ipvs0 while also restarting containerd:

```
ip link delete kube-ipvs0
systemctl restart containerd
ip link delete kube-ipvs0
```
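Rather than hammering the command by hand, a small hedged sketch of a loop that keeps deleting the interface until nothing recreates it (this assumes the kube-proxy mode change above has already been applied, otherwise it will spin forever):

```bash
# Keep removing kube-ipvs0 until it stops coming back.
while ip link show kube-ipvs0 >/dev/null 2>&1; do
  ip link delete kube-ipvs0
  sleep 2
done
```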
Good, everything is back to normal. I re-joined node05 to the cluster and the problem was gone. Another look at ip a:

```
❯ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:44:45:6a brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.55.102/24 brd 192.168.55.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe44:456a/64 scope link
       valid_lft forever preferred_lft forever
5: cali221f307b1ff@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-c6ed74b0-209f-6a1a-81d5-e623de3135b4
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
6: calic11425bcea5@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-58fdc146-52f2-61dc-0023-23523e1c85f3
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: cali59255564aa0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-2e9f7e61-8b85-3707-0ff4-ef46db0df9f7
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
14: cali607db25d3f3@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-a50ecf75-fa80-61cc-81eb-dc84195de7fe
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
```
While working through the other nodes I noticed that once the cali* interfaces are present, connectivity works even if kube-ipvs0 exists.
So my guess is that during the reboot, kube-ipvs0 (carrying the master's IP) was created first, which made the node unable to reach the master, so the later setup steps (Calico and friends) got stuck and the node never recovered?
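Based on that guess, a hedged one-liner to spot the bad state on any node, i.e. a real node IP showing up on kube-ipvs0 (192.168.55. is this cluster's node network prefix; adjust for yours):

```bash
# Flag the suspicious case where a node/master IP is bound to kube-ipvs0,
# which makes the kernel treat that IP as local and never send traffic out.
ip -4 addr show dev kube-ipvs0 | grep '192\.168\.55\.' \
  && echo "node IP found on kube-ipvs0 - traffic to it will stay local"
```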