Preface
The previous article, "Virtual LAN (VLAN)", described what virtual NICs and virtual bridges do, and used iptables to give the VLAN internet access. Having learned that much, it is natural to think of today's mainstream container technology, Docker, so the next step is to look at how Docker's bridge network compares with that setup.
Hypothesis
As is well known, Docker offers host, bridge, and none network modes; here we only analyse the bridge mode. With the previous article as background, the concept of a bridge should already be familiar: a bridge is a virtual switch that forwards traffic at the data link layer based on MAC addresses.
So we can now make a bold guess: Docker implements its internal container network on top of the same mechanism.
- Hypothesis 1: when the Docker engine creates a container, it automatically creates a veth pair for it and assigns it a private IP; one end of the pair is attached to the docker0 bridge, and the other end is connected to the container's internal network (a manual sketch of this is given right after this list).
- Hypothesis 2: Docker likewise uses the NAT capability of iptables to forward container traffic to the internet.
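As a side note, hypothesis 1 can be reproduced by hand with iproute2. The sketch below only illustrates the mechanism and is not what the Docker engine literally executes; the namespace name c1, the veth names, and the address 172.17.0.100 are made up for the example (pick any free address in docker0's subnet).

```shell
# Hand-rolled version of what hypothesis 1 describes (names and IP are illustrative).
# Create a network namespace to stand in for the container.
ip netns add c1

# Create a veth pair: veth-c1 stays on the host, eth0-c1 will go into the "container".
ip link add veth-c1 type veth peer name eth0-c1

# Attach the host-side end to the docker0 bridge and bring it up.
ip link set veth-c1 master docker0
ip link set veth-c1 up

# Move the other end into the namespace, give it a private IP from docker0's subnet, bring it up,
# and point its default route at the bridge.
ip link set eth0-c1 netns c1
ip netns exec c1 ip addr add 172.17.0.100/16 dev eth0-c1
ip netns exec c1 ip link set eth0-c1 up
ip netns exec c1 ip route add default via 172.17.0.1

# The namespace should now be able to reach the bridge (and other containers on it).
ip netns exec c1 ping -c 2 172.17.0.1
```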
Verification
Checking the host's NIC list
List the running Docker containers and the host's network interfaces, and check whether the docker bridge and the veth interfaces exist.
```shell
# List the docker containers running on this host (mysql, redis, halo, debian)
[root@VM-8-10-centos ~]# docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
56ffaf39316a   debian               "bash"                   23 hours ago    Up 7 minutes                                                          debian
c8a273ce122e   halohub/halo:1.5.3   "/bin/sh -c 'java -X…"   5 months ago    Up 47 hours    0.0.0.0:8090->8090/tcp, :::8090->8090/tcp              halo
d09fcfa7de0f   redis                "docker-entrypoint.s…"   12 months ago   Up 5 weeks     0.0.0.0:8805->6379/tcp, :::8805->6379/tcp              redis
87a2192f6db4   mysql:5.7            "docker-entrypoint.s…"   2 years ago     Up 5 weeks     0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
# Check the host's NIC list (confirm that docker0 and the veth interfaces exist)
[root@VM-12-15-centos ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:b3:6f:20 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
3: br-67cf5bfe7a5c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:c5:07:22:c7 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:38:d6:1b:ea brd ff:ff:ff:ff:ff:ff
5: br-9fd151a807e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:35:7f:ed:76 brd ff:ff:ff:ff:ff:ff
315: vethf2afb37@if314: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 3a:06:f0:8d:06:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 12
317: veth1ec30f9@if316: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-9fd151a807e7 state UP mode DEFAULT group default
    link/ether 4a:ad:1a:b0:5a:5f brd ff:ff:ff:ff:ff:ff link-netnsid 0
319: vethc408286@if318: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 26:b0:3c:f4:c5:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
321: veth68fb8c6@if320: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 96:ca:a9:42:f8:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 9
323: veth6dba394@if322: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 92:1c:5e:9c:a2:b3 brd ff:ff:ff:ff:ff:ff link-netnsid 4
325: veth1509ed0@if324: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether fa:22:33:da:12:e0 brd ff:ff:ff:ff:ff:ff link-netnsid 11
329: vethef1dbac@if328: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether aa:db:d2:10:36:60 brd ff:ff:ff:ff:ff:ff link-netnsid 3
331: veth69d3e7d@if330: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 86:45:d0:0e:6b:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 5
335: veth98588ae@if334: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether 86:59:55:39:17:ad brd ff:ff:ff:ff:ff:ff link-netnsid 7
349: vetha84d717@if348: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-67cf5bfe7a5c state UP mode DEFAULT group default
    link/ether ee:7f:d2:27:15:83 brd ff:ff:ff:ff:ff:ff link-netnsid 6
354: veth1@if355: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-mybridge state UP mode DEFAULT group default qlen 1000
    link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff link-netns n1
356: br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 72:c8:9e:24:a6:a3 brd ff:ff:ff:ff:ff:ff
```
Inspecting the host's NICs with ip link shows that the host has a virtual bridge named docker0, and that under the bridges there are veth pairs, one for each of the four docker containers debian, halo, redis and mysql.
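If you want to confirm which host-side veth belongs to which container, one common trick is to compare interface indexes: inside the container, eth0 exposes the index of its host-side peer via sysfs. The sketch below uses the debian container from this setup; the numeric index shown is only illustrative.

```shell
# Inside the container, eth0 reports the interface index of its host-side peer:
docker exec debian cat /sys/class/net/eth0/iflink
# -> prints a number, e.g. 325 (illustrative)

# On the host, find the veth whose index matches that number:
ip -o link | grep '^325:'
# -> the matching line is the host-side veth of the debian container
```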
Checking the bridge IP and communication between containers inside Docker
```shell
# The default docker0 bridge has the IP address 172.17.0.1/16
[root@VM-8-10-centos ~]# ip addr show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:6fff:fed7:197e/64 scope link
       valid_lft forever preferred_lft forever
# Check the IP addresses of the containers on the bridge network
# (172.17.0.2/16, 172.17.0.3/16, 172.17.0.4/16 and 172.17.0.5/16)
[root@VM-8-10-centos ~]# docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "2dc75e446719be8cad37e1ea9ae7d1385fcc728b8177646a3c62929c2b289e94",
        "Created": "2024-04-24T09:46:14.399901891+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "56ffaf39316ac9f776c6b3e2a8a79e9f42dfab42aa1f7de7525bd26c686defaa": {
                "Name": "debian",
                "EndpointID": "47dd9441d4a4c8b09afea3bca23652b80ba35e6baa13d44ec21ec89522e722a6",
                "MacAddress": "02:42:ac:11:00:05",
                "IPv4Address": "172.17.0.5/16",
                "IPv6Address": ""
            },
            "87a2192f6db48c9bf2996bf25c79d4c18c3ae2975cac9d55e7fdfdcec03f896b": {
                "Name": "mysql",
                "EndpointID": "00b93de23c5abf2ed1349bac1c2ec93bf7ed516370dabf23348b980f19cfaa9c",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            },
            "c8a273ce122ef5479583908f40898141a90933a3c41c8028dc7966b9af4c465d": {
                "Name": "halo",
                "EndpointID": "ba8ef83c80f3edb6e7987c95ae6d56816a1fc00d07e8bb2bfbb0f19ef543badf",
                "MacAddress": "02:42:ac:11:00:04",
                "IPv4Address": "172.17.0.4/16",
                "IPv6Address": ""
            },
            "d09fcfa7de0f2a7b3ef7927a7e53a8a53fb93021b119b1376fe4616381c5a57c": {
                "Name": "redis",
                "EndpointID": "afbc9128f7d27becfbf64e843a92d36ce23800cd42c131e550abea7afb6a131e",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
# Enter the debian container and test internal and internet connectivity
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping 172.17.0.1
PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.071 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.036 ms
--- 172.17.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.036/0.053/0.071/0.017 ms
root@56ffaf39316a:/# ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.047 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.047/0.057/0.067/0.010 ms
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=59.0 ms
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
--- baidu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 55.400/57.221/59.043/1.821 ms
```
Summary
From the shell output: the docker0 bridge has IP 172.17.0.1/16, hosts on the docker0 subnet can reach each other, and pinging baidu.com shows that internet connectivity also works. We can therefore conclude that Docker's bridge mode works the same way as the VLAN setup in the previous article: both implement the internal network through a virtual bridge.
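To double-check that the containers really hang off docker0, you can also list the bridge's ports directly; either of the commands below works (the actual interface names in the output should match the veth list from ip link above).

```shell
# Interfaces currently attached to the docker0 bridge (iproute2):
bridge link | grep "master docker0"

# Equivalent view with bridge-utils, if installed:
brctl show docker0
```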
How Docker containers communicate with the internet
The previous article accidentally left a loose end: firewalld installs a lot of built-in rules in iptables, which made the traffic analysis unpleasant, so I simply shut firewalld down. I then found a side effect of doing so: when firewalld is stopped, the iptables rules get flushed as well. At the time this did not seem to matter, but thinking back, a large part of the reason the VLAN could reach the internet was the NAT capability of iptables. With iptables emptied, NAT is effectively gone, so anything relying on it loses network connectivity. The shell commands below reproduce and analyse this behaviour.
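For reference, on a host where the rules have not been flushed, the Docker daemon normally installs a masquerade rule for the bridge subnet itself; this is what the com.docker.network.bridge.enable_ip_masquerade option seen in the inspect output controls. The output below shows the typical shape of that rule rather than a capture from this machine.

```shell
# Typical shape of the NAT rule Docker installs for the default bridge
# (illustrative output, not captured from the host used in this article):
iptables -t nat -S POSTROUTING
# -P POSTROUTING ACCEPT
# -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
```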
```shell
# Stop firewalld
[root@VM-8-10-centos ~]# systemctl stop firewalld
# Check iptables (both the filter and nat tables are now empty)
[root@VM-8-10-centos ~]# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
[root@VM-8-10-centos ~]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
# Check internet connectivity from the debian container
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (110.242.68.66) 56(84) bytes of data.
--- baidu.com ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4000ms
# Check internal network connectivity
root@56ffaf39316a:/# ping 172.17.0.1
PING 172.17.0.1 (172.17.0.1) 56(84) bytes of data.
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.17.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.041/0.043/0.046/0.002 ms
root@56ffaf39316a:/# ping 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.081 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.055 ms
--- 172.17.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.055/0.068/0.081/0.013 ms
```
Flushing iptables confirms that the containers do lose internet access, while internal (container-to-container) communication is unaffected.
Restoring a Docker container's internet connectivity by manually adding a NAT rule
```shell
# Add an SNAT (masquerade) rule
[root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# Check internet connectivity from the debian container
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=1 ttl=247 time=55.8 ms
64 bytes from 39.156.66.10 (39.156.66.10): icmp_seq=2 ttl=247 time=55.4 ms
--- baidu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 55.386/55.610/55.834/0.224 ms
```
Personal takeaway: Docker containers really do depend on iptables to communicate with the internet, and the behaviour is almost identical to the VLAN setup, so in my view Docker's bridge networking is essentially a more sophisticated application of the "vlan + iptables" approach.
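A related observation, not verified in this article but easy to check on a normal Docker host: published ports such as halo's 0.0.0.0:8090->8090/tcp are also implemented with iptables, as DNAT rules in the DOCKER chain of the nat table. The rule below is only the typical shape of such an entry, with the container IP taken from the earlier inspect output.

```shell
# Typical shape of the DNAT rule behind a published port such as 8090->8090
# (illustrative; the container IP comes from the inspect output above):
iptables -t nat -S DOCKER
# -A DOCKER -i docker0 -j RETURN
# -A DOCKER ! -i docker0 -p tcp -m tcp --dport 8090 -j DNAT --to-destination 172.17.0.4:8090
```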
Further thoughts
Is network traffic between Docker containers also switched at layer 2?
From what we learned about VLANs, a bridge is a virtual switch that works at the data link layer and forwards frames based on MAC addresses. Since it works at layer 2, it has no notion of IP when switching traffic; it simply forwards frames by MAC address. If that is true, then even after deleting its IP address and the associated route, frames should still be switched between containers.
```shell
[root@VM-8-10-centos ~]# ip addr del 172.17.0.1/16 dev docker0
[root@VM-8-10-centos ~]# ip addr show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:6f:d7:19:7e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::42:6fff:fed7:197e/64 scope link
       valid_lft forever preferred_lft forever
[root@VM-8-10-centos ~]# docker exec -it debian /bin/bash
root@56ffaf39316a:/# ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.066 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.049 ms
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.049/0.057/0.066/0.008 ms
```
The ping still succeeds even with docker0's IP removed, which supports the view that container-to-container traffic is switched purely at layer 2.
How are iptables and the routing table related, and how do they differ? Which one decides a packet's egress interface?
Back when studying VLANs there was already a nagging question: **when a virtual bridge communicates with the internet, what decides that traffic flowing into the bridge is forwarded out of the egress NIC?** When setting up NAT for the VLAN, FORWARD and NAT rules had to be configured in iptables, so it is natural to assume iptables does it. But if so, what is the routing table for? **So is it iptables that forwards the traffic, or the routing table (ip route)?** Or, more concretely: what moves traffic from the docker0 interface to eth0?
A full answer requires digging into how iptables works internally, which I will not do here; below is my personal conclusion, for reference only.
Personal conclusion: the routing table does not modify traffic at all; it only determines which interface a packet leaves through. iptables can filter, modify and redirect IP packets, but the egress interface is ultimately chosen by the routing table.
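A quick way to see this division of labour is to ask the routing table directly which interface it would pick, and then look at what the nat table does on top of that decision. The commands are standard iproute2/iptables; the sample output line is only indicative of its shape.

```shell
# Ask the routing table which interface (and source IP) it would use for an
# external destination; this decision is made independently of iptables:
ip route get 8.8.8.8
# e.g. 8.8.8.8 via <gateway ip> dev eth0 src <host ip>

# The nat table's POSTROUTING chain then only rewrites addresses on packets
# that the routing decision has already directed out of eth0:
iptables -t nat -nvL POSTROUTING
```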
Even without SNAT, shouldn't the packets still be able to reach the remote network?
Every IP packet on the internet carries a source address and a destination address: the destination address determines how the packet is delivered to the remote host, while the source address tells other hosts who sent it. SNAT works precisely by rewriting the source address. Does that mean that even without SNAT the packet can still reach the remote network, only with the other side unable to reply?
```shell
# Suppose I have two cloud servers with public IPv4 addresses, xxx.xxx.xxx.xx1 and xxx.xxx.xxx.xx2.
# There is another host x10 on xx1's LAN.
# On host xx1:
# Use SNAT to rewrite the source IP from xx1 to x10
[root@VM-8-10-centos ~]# iptables -t nat -A POSTROUTING -s xx1 -o eth0 -j SNAT --to-source xxx.xxx.xxx.x10
# Capture ICMP packets on eth0
[root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv | grep x10
# On host xx2:
# Capture ICMP packets on eth0
[root@VM-8-10-centos ~]# tcpdump -i eth0 -p icmp -nv
```
Judging from the tcpdump capture, xx1 really did send packets with source address x10, but the capture on xx2 never showed any packet arriving from xx1 or x10. Most likely the packets were dropped somewhere along the way; cloud providers commonly apply source-address validation (anti-spoofing) and discard packets whose source IP does not belong to the sending host. Or perhaps my understanding is simply wrong.