【Problem log】k8s worker cannot connect to the master after rebooting the VM

I set up a k8s cluster on virtual machines: node01 (master), node02, node03, node04, node05.
After rebooting node05, the node stayed NotReady and never recovered.
Logging in to node05 and pinging node01 showed it could not get through:
  PING 192.168.55.101 (192.168.55.101) 56(84) bytes of data.
  From 192.168.55.101 icmp_seq=1 Destination Port Unreachable
  From 192.168.55.101 icmp_seq=2 Destination Port Unreachable
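In hindsight, a check worth doing right here (a sketch; it assumes the symptom is the master IP 192.168.55.101 being unreachable from node05) is whether that address is somehow bound to a local interface, because a locally bound address never leaves the host:
  # Sketch, run on node05: ask the kernel how it would route the master's IP.
  # An answer starting with "local" means the kernel thinks this IP belongs to
  # this host, so the ping never goes out on the wire.
  ip route get 192.168.55.101
  # Check which interface (if any) actually holds that address:
  ip -o -4 addr show | grep 192.168.55.101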
Logging in to node01 and pinging node05 also fails, but the "From" address is not 192.168.55.105:
  PING 192.168.55.105 (192.168.55.105) 56(84) bytes of data.
  From 192.168.55.101 icmp_seq=42 Destination Host Unreachable
  From 192.168.55.101 icmp_seq=43 Destination Host Unreachable
  From 192.168.55.101 icmp_seq=48 Destination Host Unreachable
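"Destination Host Unreachable" reported from the sender's own address usually means the sender could not resolve ARP for the target. A sketch of what could be checked on node01 (192.168.55.105 being node05):
  # Sketch, run on node01: a FAILED or INCOMPLETE entry means ARP for node05
  # is not resolving on this segment.
  ip neigh show 192.168.55.105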
Strange. The other nodes can all reach each other, and they can also reach both node01 and node05; only node01 and node05 cannot reach each other. I ran tcpdump on node05 and pinged node05 from the other machines:
  ❯ tcpdump -i ens160 icmp
  tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
  listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
  10:06:33.916694 IP node02 > node05: ICMP echo request, id 54464, seq 1, length 64
  10:06:33.916742 IP node05 > node02: ICMP echo reply, id 54464, seq 1, length 64
  10:06:34.953014 IP node02 > node05: ICMP echo request, id 54464, seq 2, length 64
  10:06:34.953036 IP node05 > node02: ICMP echo reply, id 54464, seq 2, length 64
  10:06:35.977224 IP node02 > node05: ICMP echo request, id 54464, seq 3, length 64
  10:06:35.977304 IP node05 > node02: ICMP echo reply, id 54464, seq 3, length 64
  10:06:45.592363 IP node01 > node05: ICMP echo request, id 29583, seq 1, length 64
  10:06:46.639769 IP node01 > node05: ICMP echo request, id 29583, seq 2, length 64
  10:06:47.662847 IP node01 > node05: ICMP echo request, id 29583, seq 3, length 64
  10:06:48.686950 IP node01 > node05: ICMP echo request, id 29583, seq 4, length 64
  10:06:49.710882 IP node01 > node05: ICMP echo request, id 29583, seq 5, length 64
  10:06:50.735306 IP node01 > node05: ICMP echo request, id 29583, seq 6, length 64
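The mirror-image capture on node01 would show whether its requests at least make it onto the wire and whether ARP completes (a sketch; the interface is assumed to be ens160 on node01 as well):
  # Sketch, run on node01 while pinging node05 from it.
  tcpdump -i ens160 -n 'icmp or arp'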
So for node01 the traffic only comes in and nothing goes back out; node05 never answers, and I couldn't figure out why. Then I rebooted the machines in turn a few more times and, great, now none of the nodes can ping node01 at all, and tcpdump shows the packets don't even arrive anymore...

Searching online suggested checking ip route. It looks fine to me, and even if something were off here, it shouldn't be what's breaking connectivity between the nodes themselves:
  node05 ❯ ip route
  default via 192.168.55.2 dev ens160 proto static
  192.168.55.0/24 dev ens160 proto kernel scope link src 192.168.55.105
  node05 ❯ ifconfig
  ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
          inet 192.168.55.105  netmask 255.255.255.0  broadcast 192.168.55.255
          inet6 fe80::20c:29ff:fe67:9fdf  prefixlen 64  scopeid 0x20<link>
          ether 00:0c:29:67:9f:df  txqueuelen 1000  (Ethernet)
          RX packets 26385  bytes 8519203 (8.5 MB)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 12675  bytes 1453410 (1.4 MB)
          TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
          device interrupt 45  memory 0x3fe00000-3fe20000
  lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
          inet 127.0.0.1  netmask 255.0.0.0
          inet6 ::1  prefixlen 128  scopeid 0x10<host>
          loop  txqueuelen 1000  (Local Loopback)
          RX packets 41172  bytes 3050209 (3.0 MB)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 41172  bytes 3050209 (3.0 MB)
          TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
  node01 ❯ ip route
  default via 192.168.55.2 dev ens160 proto static
  10.60.114.0/26 via 192.168.55.105 dev ens160 proto 80 onlink
  10.60.140.64/26 via 192.168.55.102 dev ens160 proto 80 onlink
  10.60.186.192/26 via 192.168.55.103 dev ens160 proto 80 onlink
  blackhole 10.60.196.128/26 proto 80
  10.60.196.136 dev cali014ebcc433b scope link
  10.60.196.137 dev calid77cc23546a scope link
  10.60.196.138 dev cali2a559d3a2ae scope link
  10.60.196.139 dev calif7f206080d4 scope link
  10.60.196.140 dev cali3c60dd4d5fd scope link
  10.60.196.141 dev cali54df4040da6 scope link
  10.60.248.192/26 via 192.168.55.104 dev ens160 proto 80 onlink
  192.168.55.0/24 dev ens160 proto kernel scope link src 192.168.55.101
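One thing plain ip route does not show is the local table, where addresses assigned to interfaces live; in hindsight that is exactly where the problem was hiding (a sketch):
  # Sketch, run on node05: any "local 192.168.55.101 ..." entry here means the
  # master's IP is treated as belonging to this host.
  ip route show table local | grep 192.168.55.101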
The problem is presumably something the k8s cluster set up, so I tried removing node05 from the cluster to see whether ping would work once it was out. It still didn't. Then I came across a note that with ipvs (key point), some configuration may not be cleaned up by kubeadm reset:
  ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host noprefixroute
         valid_lft forever preferred_lft forever
  2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
      link/ether 00:0c:29:67:9f:df brd ff:ff:ff:ff:ff:ff
      altname enp2s0
      inet 192.168.55.105/24 brd 192.168.55.255 scope global ens160
         valid_lft forever preferred_lft forever
      inet6 fe80::20c:29ff:fe67:9fdf/64 scope link
         valid_lft forever preferred_lft forever
  3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
      link/ether 42:0b:b7:81:e7:88 brd ff:ff:ff:ff:ff:ff
      inet 10.50.13.46/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 192.168.55.101/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.0.10/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.0.1/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.107.53/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.225.23/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.54.178/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.206.109/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.44.161/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.122.102/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.3.132/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.208.168/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
      inet 10.50.129.143/32 scope global kube-ipvs0
         valid_lft forever preferred_lft forever
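kube-proxy in ipvs mode binds every Service address to this dummy kube-ipvs0 interface as a /32, and here the master's node IP 192.168.55.101 is among them (presumably left over from the MetalLB experiment), which would explain why node05 never sends anything to the real master. A narrower way to check just that interface (a sketch):
  # Sketch: show only the addresses kube-proxy bound to the dummy interface;
  # the master's IP 192.168.55.101 should not be in this list.
  ip -4 addr show dev kube-ipvs0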
And indeed there is one. Delete it:
  ip link delete kube-ipvs0
After deleting it, ping works!

Great. That means the other nodes don't need to be removed from the cluster; just run ip link delete kube-ipvs0 on them directly.
But the interface keeps getting recreated automatically. A bit of searching suggests it is created by kube-proxy, so I first went to the master and changed the proxy mode back to the default "" (I forget why I changed it in the first place... probably when installing MetalLB):
  kubectl edit configmap -n kube-system kube-proxy
  # in the editor:
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  kind: KubeProxyConfiguration
  mode: "ipvs"    # <- change this back to the default ""
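Editing the ConfigMap alone changes nothing on the nodes: kube-proxy only reads it at startup, so the kube-proxy DaemonSet pods have to be restarted for the mode change to take effect (a sketch with standard kubectl commands; an empty mode falls back to the default, iptables on Linux):
  # Sketch, run against the cluster after editing the ConfigMap.
  kubectl -n kube-system rollout restart daemonset kube-proxy
  kubectl -n kube-system rollout status daemonset kube-proxy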
Then on the worker nodes I kept running ip link delete kube-ipvs0 while also restarting containerd:
  ip link delete kube-ipvs0
  systemctl restart containerd
  ip link delete kube-ipvs0
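For what it's worth, kubeadm reset itself warns that it does not clean up IPVS tables; the manual cleanup it suggests (a sketch, needs the ipvsadm package installed) is roughly:
  # Sketch, run on the node being cleaned up.
  ipvsadm --clear            # flush all IPVS virtual services
  ip link delete kube-ipvs0  # remove the dummy interface kube-proxy created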
Good, everything is back to normal. I rejoined node05 to the cluster and the problem was solved. Another look at ip a:
  ❯ ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host noprefixroute
         valid_lft forever preferred_lft forever
  2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
      link/ether 00:0c:29:44:45:6a brd ff:ff:ff:ff:ff:ff
      altname enp2s0
      inet 192.168.55.102/24 brd 192.168.55.255 scope global ens160
         valid_lft forever preferred_lft forever
      inet6 fe80::20c:29ff:fe44:456a/64 scope link
         valid_lft forever preferred_lft forever
  5: cali221f307b1ff@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
      link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-c6ed74b0-209f-6a1a-81d5-e623de3135b4
      inet6 fe80::ecee:eeff:feee:eeee/64 scope link
         valid_lft forever preferred_lft forever
  6: calic11425bcea5@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
      link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-58fdc146-52f2-61dc-0023-23523e1c85f3
      inet6 fe80::ecee:eeff:feee:eeee/64 scope link
         valid_lft forever preferred_lft forever
  8: cali59255564aa0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
      link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-2e9f7e61-8b85-3707-0ff4-ef46db0df9f7
      inet6 fe80::ecee:eeff:feee:eeee/64 scope link
         valid_lft forever preferred_lft forever
  14: cali607db25d3f3@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
      link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-a50ecf75-fa80-61cc-81eb-dc84195de7fe
      inet6 fe80::ecee:eeff:feee:eeee/64 scope link
         valid_lft forever preferred_lft forever
While working through the other nodes I noticed that once the cali* interfaces exist, the nodes can still reach each other even if kube-ipvs0 is present.
So my guess is that during the reboot, kube-ipvs0 got created first, the node then couldn't reach the master, the remaining startup steps got stuck, and the node never recovered?
