k8s Single-Host Container Networking (20250216)
The "network stack" that a Linux container sees is in fact isolated inside the container's own Network Namespace.
This "network stack" includes: the network interface (Network Interface), the loopback device (Loopback Device), the routing table (Routing Table), and the iptables rules.
Veth Pair Devices
The defining feature of a Veth Pair device is that, once created, it always appears as a pair of virtual NICs (Veth Peers). A packet sent out from one of the two "NICs" shows up directly on the other one, even when the two ends live in different Network Namespaces.
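Before walking through the real host below, it may help to see this property in isolation. A minimal sketch using only iproute2 (the names ns1/ns2/veth1/veth2 and the 10.200.0.0/24 subnet are invented; run as root on any Linux machine):

```
# Two namespaces standing in for two "containers".
ip netns add ns1
ip netns add ns2

# Create one veth pair and move each end into its own namespace.
ip link add veth1 type veth peer name veth2
ip link set veth1 netns ns1
ip link set veth2 netns ns2

# Address both ends on the same subnet and bring them up.
ip netns exec ns1 ip addr add 10.200.0.1/24 dev veth1
ip netns exec ns2 ip addr add 10.200.0.2/24 dev veth2
ip netns exec ns1 ip link set veth1 up
ip netns exec ns2 ip link set veth2 up

# A packet sent from one end appears directly on the other:
ip netns exec ns1 ping -c 2 10.200.0.2
```

Now on the k8s-master host: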
```
[root@k8s-master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1ee1263b4193 cbb01a7bd410 "/coredns -conf /etc…" 1 second ago Up 1 second k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 3 seconds ago Up 2 seconds k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e 9344fce2372f "/usr/local/bin/kube…" 7 seconds ago Up 6 seconds k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 8 seconds ago Up 7 seconds k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 9 seconds ago Up 8 seconds k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a a0eed15eed44 "etcd --advertise-cl…" 23 seconds ago Up 23 seconds k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef 6fc5e6b7218c "kube-scheduler --au…" 23 seconds ago Up 23 seconds k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62 8a9000f98a52 "kube-apiserver --ad…" 24 seconds ago Up 23 seconds k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4 138fb5a3a2e3 "kube-controller-man…" 24 seconds ago Up 23 seconds k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 26 seconds ago Up 25 seconds k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
[root@k8s-master ~]# docker start nginx-1
nginx-1
[root@k8s-master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
063af5a1782b 17e960f4e39c "start_runit" 12 seconds ago Up 12 seconds k8s_calico-node_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_66
133fda8d5c2f cbb01a7bd410 "/coredns -conf /etc…" 22 seconds ago Up 21 seconds k8s_coredns_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_12
2cad37aaa64d 08c1b67c88ce "/usr/bin/kube-contr…" 22 seconds ago Up 22 seconds k8s_calico-kube-controllers_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_11
245ed185ea4a registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 27 seconds ago Up 26 seconds k8s_POD_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_8
60a93585eea1 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 27 seconds ago Up 27 seconds k8s_POD_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_9
1ee1263b4193 cbb01a7bd410 "/coredns -conf /etc…" 45 seconds ago Up 45 seconds k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 47 seconds ago Up 46 seconds k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e 9344fce2372f "/usr/local/bin/kube…" 51 seconds ago Up 50 seconds k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 52 seconds ago Up 51 seconds k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 53 seconds ago Up 52 seconds k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a a0eed15eed44 "etcd --advertise-cl…" About a minute ago Up About a minute k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef 6fc5e6b7218c "kube-scheduler --au…" About a minute ago Up About a minute k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62 8a9000f98a52 "kube-apiserver --ad…" About a minute ago Up About a minute k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4 138fb5a3a2e3 "kube-controller-man…" About a minute ago Up About a minute k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
d85077c98a69 nginx "/docker-entrypoint.…" 18 hours ago Up 12 seconds 80/tcp nginx-1
```
```
[root@k8s-master ~]# docker exec -it nginx-1 /bin/bash
root@d85077c98a69:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.17.0.2 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:ac:11:00:02 txqueuelen 0 (Ethernet)
        RX packets 14 bytes 1252 (1.2 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@d85077c98a69:/# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
# Back on the host:
root@d85077c98a69:/# exit
exit
[root@k8s-master ~]# ifconfig
cali6632e2eedff: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
        RX packets 3 bytes 125 (125.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 770 (770.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cali7b6489f2f47: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
        RX packets 3 bytes 125 (125.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 770 (770.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
calieaec58fb34e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
        RX packets 3 bytes 125 (125.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 770 (770.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        inet6 fe80::42:5fff:fe05:698c prefixlen 64 scopeid 0x20<link>
        ether 02:42:5f:05:69:8c txqueuelen 0 (Ethernet)
        RX packets 3 bytes 125 (125.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 770 (770.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.117.207 netmask 255.255.255.0 broadcast 192.168.117.255
        inet6 fe80::20c:29ff:fe96:278c prefixlen 64 scopeid 0x20<link>
        ether 00:0c:29:96:27:8c txqueuelen 1000 (Ethernet)
        RX packets 554 bytes 64561 (63.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 596 bytes 65850 (64.3 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 49719 bytes 16290594 (15.5 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 49719 bytes 16290594 (15.5 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth6881202: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::408f:cdff:fe98:623a prefixlen 64 scopeid 0x20<link>
        ether 42:8f:cd:98:62:3a txqueuelen 0 (Ethernet)
        RX packets 3 bytes 167 (167.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 18 bytes 1566 (1.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@k8s-master ~]#
[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth6881202
[root@k8s-master ~]#
```
This is what makes the Veth Pair so useful as a "network cable" connecting two different Network Namespaces.
We started a container named nginx-1.
Inside this container is a NIC named eth0, and it is exactly the container-side end of a Veth Pair device.
Checking nginx-1's routing table with the route command, we can see that eth0 is the container's default route device; all requests for the 172.17.0.0/16 subnet are also handed to eth0 (the second routing rule, for 172.17.0.0).
From the host's ifconfig output, you can see that the Veth Pair device corresponding to nginx-1 appears on the host as a virtual NIC named veth6881202.
And the brctl show output shows that this NIC has been "plugged into" docker0.
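How can we be sure that veth6881202 really is the host-side peer of nginx-1's eth0? One common trick, not shown in the capture above, is to compare interface indexes: a veth device records its peer's ifindex in iflink. A sketch (the index 7 below is only an example of what might be printed):

```
# Inside the container: print the ifindex of eth0's host-side peer.
[root@k8s-master ~]# docker exec nginx-1 cat /sys/class/net/eth0/iflink
7
# On the host: the interface with that index is the peer.
[root@k8s-master ~]# ip link | grep '^7:'
```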
Now suppose we start another Docker container on this same host, say nginx-2:
```
[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth6881202
[root@k8s-master ~]# docker run -d --name nginx-2 nginx
e3b1a33fa82952f99bdf47e1451d05d83a9686cb006798744d2e593f02cf65c8
[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth40408f3
                                                        veth6881202
[root@k8s-master ~]#
```
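Note that brctl comes from the bridge-utils package, which newer distributions often no longer ship. If it is missing, iproute2 gives the same view:

```
# Either command lists the ports attached to the docker0 bridge.
ip link show master docker0
bridge link show
```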
Check nginx-1's container IP:
```
[root@k8s-master ~]#
[root@k8s-master ~]# docker inspect nginx-1
[
    {
        "Id": "d85077c98a69846efe9bf17c4b1b4efb2152ec2078f5de483edc524c674eed76",
        "Created": "2025-02-16T06:21:15.681636573Z",
        "Path": "/docker-entrypoint.sh",
        ----------

        "Links": null,
        "Aliases": null,
        "MacAddress": "02:42:ac:11:00:02",
        "DriverOpts": null,
        "NetworkID": "5ce1ccec1789844b6a4712acd0c8d6f0ef9fba840c00f53be667a0dd6fbae39c",
        "EndpointID": "786e7d287ca79fda20dc3895bb64b9830a99f1989538fd503e9f877e4ad574f3",
        "Gateway": "172.17.0.1",
        "IPAddress": "172.17.0.2",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "DNSNames": null
    }
]
```
The container's IP is given by the "IPAddress" field: 172.17.0.2.
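Instead of scrolling through the full JSON, the address can be pulled out with an inspect format template (this assumes nginx-1 sits on the default bridge network, as it does here):

```
[root@k8s-master ~]# docker inspect -f '{{.NetworkSettings.IPAddress}}' nginx-1
172.17.0.2
```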
Now enter nginx-2 and ping nginx-1 (curl works just as well):
```
[root@k8s-master ~]# docker exec -it nginx-2 /bin/bash
root@e3b1a33fa829:/# ping 172.17.0.2
bash: ping: command not found
root@e3b1a33fa829:/# yum -y install ping
bash: yum: command not found
root@e3b1a33fa829:/# apt-get install -y iputils-ping
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package iputils-ping
root@e3b1a33fa829:/# curl http://172.17.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a target="_blank" href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a target="_blank" href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@e3b1a33fa829:/#
```
When you access nginx-2's IP address from inside the nginx-1 container (say, ping 172.17.0.3), the destination IP matches the second routing rule in nginx-1's routing table. Notice that this rule's gateway (Gateway) is 0.0.0.0, which marks it as a direct route: any IP packet matching it is sent out through the local eth0 NIC and delivered to the destination host directly over the layer-2 network.
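You can also ask the kernel directly which rule a destination would match, using ip route get. Note that the stock nginx image may not ship iproute2, so this is only a sketch of what you would see where the tool is available:

```
root@d85077c98a69:/# ip route get 172.17.0.3
# Expected shape: "172.17.0.3 dev eth0 src 172.17.0.2", i.e. a direct
# on-link route with no "via <gateway>" hop.
```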
This eth0 NIC is one end of a Veth Pair: that end lives in nginx-1's Network Namespace, while the other end sits on the host (in the Host Namespace) and is "plugged into" the host's docker0 bridge.
Once a virtual NIC is "plugged into" a bridge, it becomes a "slave device" of that bridge. A slave device is "stripped" of the right to call the network protocol stack to process packets and is "demoted" to a mere port on the bridge. That port's only job is to receive incoming packets and hand full control over their fate (forwarding or dropping, for example) to the bridge.
To deliver the packet over layer 2, nginx-1 first has to resolve 172.17.0.3 to a MAC address, so its protocol stack sends an ARP broadcast out of eth0, which immediately appears on the host-side veth and thus on docker0. Upon receiving the ARP request, the docker0 bridge plays the role of a layer-2 switch and floods it to the other virtual NICs "plugged into" docker0. The network stack of the nginx-2 container, also attached to docker0, receives the request and replies to nginx-1 with the MAC address corresponding to 172.17.0.3.
With the destination MAC address in hand, nginx-1's eth0 NIC can send the data packet on its way.
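If you want to watch this exchange happen, capturing on the bridge works well. A sketch (tcpdump must be installed on the host; the exact output will vary):

```
[root@k8s-master ~]# tcpdump -i docker0 -n arp
# In another terminal, curl nginx-1 from nginx-2: you should see a
# "who-has" broadcast followed by an "is-at" reply.
```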
So a container process confined to a Network Namespace actually exchanges data with the other containers on the same host through the combination of a Veth Pair device and the host bridge. When a container instead tries to reach a different host, say ping 10.168.0.3, the request packet first passes through the docker0 bridge and appears on the host. Then, following the direct route in the host's routing table (10.168.0.0/24 via eth0), the request for 10.168.0.3 is handed to the host's eth0 for processing.
The packet is then forwarded out of the host's eth0 NIC onto the host network and eventually arrives at the host that owns 10.168.0.3. Naturally, this only works if the two hosts themselves can reach each other.
So whenever a container cannot reach the "outside world", first check whether the docker0 bridge can be pinged, then look for anything abnormal in the iptables rules related to docker0 and the Veth Pair devices; that will usually lead you to the answer.
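A quick host-side checklist along those lines (standard commands; 172.17.0.1 is the docker0 address on this host):

```
# 1. Is the bridge itself up and addressed?
ping -c 2 172.17.0.1
ip addr show docker0

# 2. Do the Docker-managed iptables chains look sane?
iptables-save | grep -i docker

# 3. Is the host forwarding packets between interfaces at all?
sysctl net.ipv4.ip_forward    # should print "= 1"
```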
In short: veth pair means virtual NIC 1 - docker0 - virtual NIC 2. Each end shows up as its own interface, but the two virtual NICs do not parse packets themselves; the docker0 bridge does the resolving and performs the forwarding.
The "Cross-Host Communication" Problem
Suppose another host (say 10.168.0.3) is also running a Docker container. How is our nginx-1 container supposed to reach it?
In Docker's default configuration, the docker0 bridge on one host has no relationship at all to the docker0 bridges on other hosts, and there is no way for them to interconnect. Consequently, the containers attached to these bridges have no way to communicate either.
But what if we used software to create a single bridge "shared" by the whole cluster and attached every container in the cluster to it? Then they could all reach one another.
The heart of building such a container network is this: on top of the existing host network, we use software to construct a virtual network that overlays it and links all the containers together. This technique is therefore called an Overlay Network.
The Overlay Network itself can be composed of a "special bridge" on every host. For example, when Container 1 on Node 1 wants to access Container 3 on Node 2, the "special bridge" on Node 1 must, upon receiving the packet, somehow get it to the right host, Node 2; the "special bridge" on Node 2 must in turn, upon receiving it, somehow forward it to the right container, Container 3.
In fact, a host does not even need such a special bridge: merely programming the host routing tables in some way can be enough to deliver packets to the right host.
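One concrete way to build such a "special bridge" is a VXLAN tunnel device; this is roughly what overlay plugins such as Flannel automate. A hand-rolled, point-to-point sketch between two hypothetical hosts (the device name, VNI 42, and 172.18.0.0/24 subnet are all invented; both sides must agree on the VNI and UDP port):

```
# On Node 1 (host IP 10.168.0.2): a tunnel device pointing at Node 2.
ip link add vxlan42 type vxlan id 42 \
    local 10.168.0.2 remote 10.168.0.3 dev ens33 dstport 4789
ip addr add 172.18.0.1/24 dev vxlan42
ip link set vxlan42 up

# Mirror these commands on Node 2 with local/remote swapped and
# 172.18.0.2/24; the two vxlan42 devices then behave like one shared
# L2 segment that each node's container bridge can join.
```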
The key point is this: for a container to communicate with the outside world at all, the IP packets it sends must first leave its Network Namespace and arrive on the host. The solution is the one we have seen all along: create for the container a Veth Pair device, with one end acting as the container's default NIC and the other end sitting on the host.
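Putting the single-host pieces together by hand, here is a sketch of roughly what Docker sets up for each container (the namespace name and the 172.17.0.100 address are invented; run as root):

```
# A network namespace standing in for the container's.
ip netns add demo

# The Veth Pair: veth-host stays on the host, veth-c goes inside.
ip link add veth-host type veth peer name veth-c
ip link set veth-c netns demo

# Plug the host end into docker0 and bring it up.
ip link set veth-host master docker0
ip link set veth-host up

# Inside the namespace: rename the device to eth0, give it an address
# on docker0's subnet, and default-route via the bridge, like nginx-1.
ip netns exec demo ip link set veth-c name eth0
ip netns exec demo ip addr add 172.17.0.100/16 dev eth0
ip netns exec demo ip link set eth0 up
ip netns exec demo ip route add default via 172.17.0.1

# The namespace can now reach other containers on docker0, e.g. nginx-1:
ip netns exec demo ping -c 2 172.17.0.2
```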