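Layer 2 and Layer 3 Networking Basics
Table of Contents
- Layer 2 network
- Overall topology
- Configuration
- Configuring the namespaces
- Creating the switch
- Creating the veth devices
- Configuring the veth IP addresses
- Bringing up the veth devices
- Testing
- Layer 3 network
- Configuration
- vm1 configuration
- vm2 configuration
- Testing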
Layer 2 network
We use a Linux bridge to stand in for a physical switch, and network namespaces to stand in for the PCs connected to it.
Overall topology
+------------------+ +------------------+ +------------------+
| | | | | |
| | | | | |
| | | | | |
| ns1 | | ns2 | | ns3 |
| | | | | |
| | | | | |
| | | | | |
| 192.168.1.1/24 | | 192.168.1.2/24 | | 192.168.1.3/24 |
+----(veth-ns1)----+ +----(veth-ns2)----+ +----(veth-ns3)----+
+ + +
| | |
| | |
+ + +
+--(veth-ns1-br)-------------(veth-ns2-br)------------(veth-ns3-br)--+
| |
| linux-bridge |
| |
+--------------------------------------------------------------------+
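Here ns1, ns2 and ns3 represent three PCs, linux-bridge (created below as virtual-bridge) is the switch, and each veth pair can be thought of as a network cable.
Configuration
Configuring the namespaces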
root@i-pvirg1hu:~# ip netns add ns1
root@i-pvirg1hu:~# ip netns add ns2
root@i-pvirg1hu:~# ip netns add ns3
root@i-pvirg1hu:~# ip netns list
ns3
ns2
ns1
root@i-pvirg1hu:~#
Creating the switch
root@i-pvirg1hu:/etc/apt# brctl addbr virtual-bridge
root@i-pvirg1hu:/etc/apt# brctl show
bridge name bridge id STP enabled interfaces
virtual-bridge 8000.000000000000 no
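Creating the veth devices
Create a veth pair, move one end of the pair into a namespace, and attach the other end to the bridge with brctl addif. This is equivalent to plugging each of the three namespaces into the bridge with a network cable.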
root@i-pvirg1hu:/etc/apt# ip link add veth-ns1 type veth peer name veth-ns1-br
root@i-pvirg1hu:/etc/apt# ip link set veth-ns1 netns ns1
root@i-pvirg1hu:/etc/apt# brctl addif virtual-bridge veth-ns1-br
root@i-pvirg1hu:/etc/apt# ip link add veth-ns2 type veth peer name veth-ns2-br
root@i-pvirg1hu:/etc/apt# ip link set veth-ns2 netns ns2
root@i-pvirg1hu:/etc/apt# brctl addif virtual-bridge veth-ns2-br
root@i-pvirg1hu:/etc/apt#
root@i-pvirg1hu:/etc/apt# ip link add veth-ns3 type veth peer name veth-ns3-br
root@i-pvirg1hu:/etc/apt# ip link set veth-ns3 netns ns3
root@i-pvirg1hu:/etc/apt# brctl addif virtual-bridge veth-ns3-br
root@i-pvirg1hu:/etc/apt# ip -n ns1 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth-ns1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:b8:cd:5d:e6:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@i-pvirg1hu:/etc/apt# ip -n ns3 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
9: veth-ns3@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ca:f2:a3:de:a3:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@i-pvirg1hu:/etc/apt# ip -n ns2 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: veth-ns2@if6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 5e:9b:f6:00:fc:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@i-pvirg1hu:/etc/apt# brctl show
bridge name bridge id STP enabled interfaces
virtual-bridge 8000.1641be237cac no veth-ns1-br
veth-ns2-br
veth-ns3-br
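The three command groups above differ only in the namespace name, so they can be folded into a loop. The following is a minimal sketch rather than the exact procedure used above: it uses iproute2's `ip link add ... type bridge` and `ip link set ... master` in place of `brctl addbr` and `brctl addif`, and the `run` and `attach_ns` helpers and the `DRY_RUN` variable are hypothetical names introduced here. By default it only prints the commands it would run.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 and run as root to apply the changes for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# attach_ns NAME BRIDGE: create namespace NAME and cable it to BRIDGE with a veth pair
attach_ns() {
    ns=$1
    br=$2
    run ip netns add "$ns"
    run ip link add "veth-$ns" type veth peer name "veth-$ns-br"
    run ip link set "veth-$ns" netns "$ns"
    run ip link set "veth-$ns-br" master "$br"   # iproute2 counterpart of brctl addif
}

run ip link add virtual-bridge type bridge       # iproute2 counterpart of brctl addbr
for name in ns1 ns2 ns3; do
    attach_ns "$name" virtual-bridge
done
```

Reviewing the printed commands first, then rerunning with `DRY_RUN=0`, keeps the naming scheme (veth-nsX inside the namespace, veth-nsX-br on the bridge) consistent across all three namespaces.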
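Configuring the veth IP addresses
Assign an IP address to the virtual NIC in each of the three namespaces. All of these addresses are in the same subnet, 192.168.1.0/24.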
root@i-pvirg1hu:/etc/apt# ip -n ns1 addr add local 192.168.1.1/24 dev veth-ns1
root@i-pvirg1hu:/etc/apt# ip -n ns2 addr add local 192.168.1.2/24 dev veth-ns2
root@i-pvirg1hu:/etc/apt# ip -n ns3 addr add local 192.168.1.3/24 dev veth-ns3
root@i-pvirg1hu:/etc/apt# ip -n ns1 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth-ns1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 8a:b8:cd:5d:e6:7f brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.1.1/24 scope global veth-ns1
valid_lft forever preferred_lft forever
root@i-pvirg1hu:/etc/apt#
root@i-pvirg1hu:/etc/apt#
root@i-pvirg1hu:/etc/apt# ip -n ns2 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: veth-ns2@if6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 5e:9b:f6:00:fc:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.1.2/24 scope global veth-ns2
valid_lft forever preferred_lft forever
root@i-pvirg1hu:/etc/apt#
root@i-pvirg1hu:/etc/apt#
root@i-pvirg1hu:/etc/apt# ip -n ns3 a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
9: veth-ns3@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ca:f2:a3:de:a3:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.1.3/24 scope global veth-ns3
valid_lft forever preferred_lft forever
Bringing up the veth devices
root@i-pvirg1hu:/etc/apt# ip link set virtual-bridge up
root@i-pvirg1hu:/etc/apt# ip link set veth-ns1-br up
root@i-pvirg1hu:/etc/apt# ip link set veth-ns2-br up
root@i-pvirg1hu:/etc/apt# ip link set veth-ns3-br up
root@i-pvirg1hu:/etc/apt# ip -n ns1 link set veth-ns1 up
root@i-pvirg1hu:/etc/apt# ip -n ns2 link set veth-ns2 up
root@i-pvirg1hu:/etc/apt# ip -n ns3 link set veth-ns3 up
Testing
[root@i-pvirg1hu ~]# ip netns exec ns1 ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.050 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.048 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.058 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=0.055 ms
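Let's look more closely at how the packets are forwarded. Hosts in the same subnet communicate at layer 2, that is, they reach each other by MAC address. But a namespace does not know the MAC addresses of the others in advance, and applications still address their peers by IP. So the first step in establishing communication is resolving the MAC address. A packet capture shows this process:
First delete any existing ARP cache entries in ns1 through ns3. ns1 is shown below as the example; repeat the same steps in ns2 and ns3.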
[root@i-pvirg1hu ~]# ip netns exec ns1 ip neigh show dev veth-ns1
192.168.1.2 lladdr 7a:49:04:82:5c:65 STALE
[root@i-pvirg1hu ~]# ip netns exec ns1 ip neigh del 192.168.1.2 dev veth-ns1
[root@i-pvirg1hu ~]# ip netns exec ns1 ip neigh show dev veth-ns1
[root@i-pvirg1hu ~]#
Then start "tcpdump -i {device} -nel" inside ns2 and ns3, ping ns3 from ns1, and look at the packets that are actually transmitted:
ns1:
[root@i-pvirg1hu ~]# ip netns exec ns1 ping -c 1 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.087 ms
--- 192.168.1.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.087/0.087/0.087/0.000 ms
The capture results are as follows
ns2:
[root@i-pvirg1hu ~]# tcpdump -i veth-ns2 -nel
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth-ns2, link-type EN10MB (Ethernet), capture size 262144 bytes
15:42:49.908862 b2:58:ab:9c:8b:03 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::b058:abff:fe9c:8b03 > ff02::2: ICMP6, router solicitation, length 16
15:43:03.240818 42:17:f1:4d:8a:0d > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.3 tell 192.168.1.1, length 28
ns3:
[root@i-pvirg1hu ~]# tcpdump -i veth-ns3 -nel
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth-ns3, link-type EN10MB (Ethernet), capture size 262144 bytes
15:42:49.908815 b2:58:ab:9c:8b:03 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::b058:abff:fe9c:8b03 > ff02::2: ICMP6, router solicitation, length 16
15:43:03.240802 42:17:f1:4d:8a:0d > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.3 tell 192.168.1.1, length 28
15:43:03.240815 b2:58:ab:9c:8b:03 > 42:17:f1:4d:8a:0d, ethertype ARP (0x0806), length 42: Reply 192.168.1.3 is-at b2:58:ab:9c:8b:03, length 28
15:43:03.240830 42:17:f1:4d:8a:0d > b2:58:ab:9c:8b:03, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.1.3: ICMP echo request, id 7675, seq 1, length 64
15:43:03.240840 b2:58:ab:9c:8b:03 > 42:17:f1:4d:8a:0d, ethertype IPv4 (0x0800), length 98: 192.168.1.3 > 192.168.1.1: ICMP echo reply, id 7675, seq 1, length 64
15:43:08.340788 b2:58:ab:9c:8b:03 > 42:17:f1:4d:8a:0d, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.3, length 28
15:43:08.340837 42:17:f1:4d:8a:0d > b2:58:ab:9c:8b:03, ethertype ARP (0x0806), length 42: Reply 192.168.1.1 is-at 42:17:f1:4d:8a:0d, length 28
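As the captures show, ns1 initially has to send an ARP broadcast (destination ff:ff:ff:ff:ff:ff) to resolve the MAC address for 192.168.1.3, much like @-mentioning everyone in a WeChat group to ask who has the IP 192.168.1.3. Both ns2 and ns3 therefore receive the same broadcast frame, but only ns3 replies, by unicast (the equivalent of a private WeChat message) addressed to ns1's MAC address.
Now let's see what the switch did during this exchange. On the switch you can inspect its FDB (forwarding database), which physical switches call the MAC address table.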
[root@i-pvirg1hu ~]# bridge fdb show br virtual-bridge
33:33:00:00:00:01 dev virtual-bridge self permanent
01:00:5e:00:00:01 dev virtual-bridge self permanent
33:33:ff:ba:35:ad dev virtual-bridge self permanent
42:17:f1:4d:8a:0d dev veth-ns1-br master virtual-bridge
5e:09:23:ba:35:ad dev veth-ns1-br vlan 1 master virtual-bridge permanent
5e:09:23:ba:35:ad dev veth-ns1-br master virtual-bridge permanent
33:33:00:00:00:01 dev veth-ns1-br self permanent
01:00:5e:00:00:01 dev veth-ns1-br self permanent
33:33:ff:ba:35:ad dev veth-ns1-br self permanent
7a:49:04:82:5c:65 dev veth-ns2-br master virtual-bridge
a2:b4:56:53:f6:f2 dev veth-ns2-br vlan 1 master virtual-bridge permanent
a2:b4:56:53:f6:f2 dev veth-ns2-br master virtual-bridge permanent
33:33:00:00:00:01 dev veth-ns2-br self permanent
01:00:5e:00:00:01 dev veth-ns2-br self permanent
33:33:ff:53:f6:f2 dev veth-ns2-br self permanent
b2:58:ab:9c:8b:03 dev veth-ns3-br master virtual-bridge
ea:71:03:73:3c:6e dev veth-ns3-br vlan 1 master virtual-bridge permanent
ea:71:03:73:3c:6e dev veth-ns3-br master virtual-bridge permanent
33:33:00:00:00:01 dev veth-ns3-br self permanent
01:00:5e:00:00:01 dev veth-ns3-br self permanent
33:33:ff:73:3c:6e dev veth-ns3-br self permanent
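Entries marked permanent are static: they cover the MAC addresses of the bridge itself and of its ports (plus the multicast addresses they listen on) and never age out.
# In 227/227 the two numbers are the time since the entry was last used / last updated, in seconds
[root@i-pvirg1hu ~]# bridge -statistics fdb show br virtual-bridge | grep -v perman
7a:49:04:82:5c:65 dev veth-ns2-br used 227/227 master virtual-bridge
This entry means that frames destined for 7a:49:04:82:5c:65 are sent out port veth-ns2-br; in other words, it maps a MAC address to a port. A freshly started switch has no learned entries; they are built up through learning and flooding:
- When a frame arrives on a port, record its source MAC address and the current time in the FDB
- If the frame's destination MAC is found in the FDB, forward the frame to the corresponding port
- If the destination MAC is a broadcast, unknown-unicast or multicast address (collectively BUM: broadcast, unknown-unicast, multicast), send the frame to all ports except the one it arrived on
- Once an FDB entry has gone unrefreshed for longer than the aging time (typically 5 minutes), delete it automatically
In the example above, used 227/227 means this MAC address was last used and last refreshed 227 seconds ago. As long as traffic keeps flowing, the timers keep being reset.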
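The learning and forwarding rules can be played through with a toy model. This is only a sketch of the decision logic, not how the kernel bridge is implemented: aging is left out, the FDB is a plain shell variable, and the `learn`, `lookup` and `forward` function names are hypothetical. The MAC addresses are those of ns1 and ns3 from the capture above.

```shell
# Toy model of the learning rules: the FDB is a list of "mac port" lines.
FDB=""

# learn MAC PORT: record, or refresh, the port on which a source MAC was seen
learn() {
    FDB=$(printf '%s\n' "$FDB" | awk -v m="$1" 'NF && $1 != m'
          printf '%s %s\n' "$1" "$2")
}

# lookup MAC: print the learned port for MAC, or nothing if it is unknown
lookup() {
    printf '%s\n' "$FDB" | awk -v m="$1" '$1 == m { print $2 }'
}

# forward SRC_MAC IN_PORT DST_MAC: learn the source, then decide where the frame goes
forward() {
    learn "$1" "$2"
    out_port=$(lookup "$3")
    if [ -n "$out_port" ]; then
        echo "unicast to $out_port"
    else
        echo "flood to all ports except $2"
    fi
}

# ARP request from ns1 to the broadcast address: no entry for it, so it is flooded
forward 42:17:f1:4d:8a:0d veth-ns1-br ff:ff:ff:ff:ff:ff
# ARP reply from ns3 to ns1: ns1's port was learned from the request, so it is unicast
forward b2:58:ab:9c:8b:03 veth-ns3-br 42:17:f1:4d:8a:0d
```

The first call prints "flood to all ports except veth-ns1-br" and the second prints "unicast to veth-ns1-br", which matches the capture: ns2 saw the ARP request but never saw the reply.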
Layer 3 network
The topology is as follows
route: default gw 192.168.1.1 route: default gw 192.168.2.1
(VM1) (VM2)
+------------------+ +------------------+ +------------------+ +------------------+
| | | | | | | |
| | | | | | | |
| | | | | | | |
| ns1 | | ns2 | | ns1 | | ns2 |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| 192.168.1.2/24 | | 192.168.1.3/24 | | 192.168.2.2/24 | | 192.168.2.3/24 |
+---+(veth-ns1)+---+ +---+(veth-ns2)+---+ +---+(veth-ns1)+---+ +---+(veth-ns2)+---+
+ + + +
| | | |
| | | |
+ + + +
+-+(veth-ns1-br)+-----------+(veth-ns2-br)+-+ +-+(veth-ns1-br)+-----------+(veth-ns2-br)+-+
| | | |
| Linux bridge | | Linux bridge |
| | | |
+-----------------(br0)---------------------+ +-----------------(br0)---------------------+
| |
| |
| |
+-----------------(br0)---------------------+ +-----------------(br0)---------------------+
| 192.168.1.1/24 | | 192.168.2.1/24 |
| default network namespace | | default network namespace |
| (Linux Kernel IP Forwarding) | | (Linux Kernel IP Forwarding) |
| | | |
| 172.16.0.3 | | 172.16.0.2 |
+-----------------(eth0)--------------------+ +-----------------(eth0)--------------------+
+ +
| route: 192.168.2.0/24 via 172.16.0.2 | route: 192.168.1.0/24 via 172.16.0.3
| |
| |
| |
+--------------------------------------------------------------+
Configuration
vm1 configuration
root@i-pvirg1hu:~# ip netns add ns1
root@i-pvirg1hu:~# ip netns add ns2
root@i-pvirg1hu:~# ip link add veth-ns1 type veth peer name veth-ns1-br
root@i-pvirg1hu:~# ip link add veth-ns2 type veth peer name veth-ns2-br
root@i-pvirg1hu:~# ip link set veth-ns1 netns ns1
root@i-pvirg1hu:~# ip link set veth-ns2 netns ns2
root@i-pvirg1hu:~# brctl addbr br0
root@i-pvirg1hu:~# brctl addif br0 veth-ns1-br
root@i-pvirg1hu:~# brctl addif br0 veth-ns2-br
# Assign IP addresses
root@i-pvirg1hu:~# ip -n ns1 a a 192.168.1.2/24 dev veth-ns1
root@i-pvirg1hu:~# ip -n ns2 a a 192.168.1.3/24 dev veth-ns2
root@i-pvirg1hu:~# ip a a 192.168.1.1/24 dev br0
# up
root@i-pvirg1hu:~# ip link set br0 up
root@i-pvirg1hu:~# ip link set veth-ns1-br up
root@i-pvirg1hu:~# ip link set veth-ns2-br up
root@i-pvirg1hu:~# ip -n ns1 link set veth-ns1 up
root@i-pvirg1hu:~# ip -n ns2 link set veth-ns2 up
# Configure the default route
root@i-pvirg1hu:~# ip -n ns1 route add default via 192.168.1.1
root@i-pvirg1hu:~# ip -n ns2 route add default via 192.168.1.1
vm2 configuration
root@i-pvirg1hu:~# ip netns add ns1
root@i-pvirg1hu:~# ip netns add ns2
root@i-pvirg1hu:~# ip link add veth-ns1 type veth peer name veth-ns1-br
root@i-pvirg1hu:~# ip link add veth-ns2 type veth peer name veth-ns2-br
root@i-pvirg1hu:~# ip link set veth-ns1 netns ns1
root@i-pvirg1hu:~# ip link set veth-ns2 netns ns2
root@i-pvirg1hu:~# brctl addbr br0
root@i-pvirg1hu:~# brctl addif br0 veth-ns1-br
root@i-pvirg1hu:~# brctl addif br0 veth-ns2-br
# Assign IP addresses
root@i-pvirg1hu:~# ip -n ns1 a a 192.168.2.2/24 dev veth-ns1
root@i-pvirg1hu:~# ip -n ns2 a a 192.168.2.3/24 dev veth-ns2
root@i-pvirg1hu:~# ip a a 192.168.2.1/24 dev br0
# up
root@i-pvirg1hu:~# ip link set br0 up
root@i-pvirg1hu:~# ip link set veth-ns1-br up
root@i-pvirg1hu:~# ip link set veth-ns2-br up
root@i-pvirg1hu:~# ip -n ns1 link set veth-ns1 up
root@i-pvirg1hu:~# ip -n ns2 link set veth-ns2 up
# Configure the default route
root@i-pvirg1hu:~# ip -n ns1 route add default via 192.168.2.1
root@i-pvirg1hu:~# ip -n ns2 route add default via 192.168.2.1
At this point ns1 and ns2 on each VM can reach each other, and each namespace can also reach its own host.
# To ns2 on the same host
root@i-pvirg1hu:~# ip netns exec ns1 ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.052 ms
# To the host itself
root@i-pvirg1hu:~# ip netns exec ns1 ping 172.16.0.3
PING 172.16.0.3 (172.16.0.3) 56(84) bytes of data.
64 bytes from 172.16.0.3: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 172.16.0.3: icmp_seq=2 ttl=64 time=0.046 ms
But the namespaces on vm1 still cannot reach those on vm2. The last step is to add routes on the hosts to connect the two VMs.
# On vm1
root@i-pvirg1hu:~# ip route add 192.168.2.0/24 via 172.16.0.2
# On vm2
root@i-pvirg1hu:~# ip route add 192.168.1.0/24 via 172.16.0.3
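Routing between the two subnets also depends on the kernel IP forwarding shown in the topology diagram, which the steps above do not explicitly enable. Below is a minimal sketch for checking it on each host, assuming a Linux system; the `forwarding_enabled` helper and its optional file argument are hypothetical names used for illustration.

```shell
# forwarding_enabled [FILE]: succeed if the IPv4 forwarding flag reads 1.
# FILE defaults to the kernel's procfs entry; the argument exists only to ease testing.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is on"
else
    # Enabling it needs root and lasts until the next reboot
    echo "IPv4 forwarding is off; as root run: sysctl -w net.ipv4.ip_forward=1"
fi
```

If forwarding is off on either host, packets arriving at br0 for the other subnet are dropped instead of being routed out eth0.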
Testing
# vm1
root@i-pvirg1hu:~# ip netns exec ns1 ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
64 bytes from 192.168.2.2: icmp_seq=1 ttl=62 time=0.310 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=62 time=0.275 ms
# vm2
root@i-pvirg1hu:~# ip netns exec ns1 ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=62 time=0.223 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=62 time=0.266 ms
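We see ttl=62: the TTL started at 64 and was decremented twice, which means the packet was routed twice on its way to the peer.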
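As a small worked example of the TTL arithmetic, the hop count is the initial TTL minus the observed one. The sketch below assumes an initial TTL of 64, the value seen in the same-host ping replies earlier; other systems may start from a different value, and `hops_from_ttl` is a hypothetical helper name.

```shell
# hops_from_ttl OBSERVED [INITIAL]: number of routers that decremented the TTL.
# INITIAL defaults to 64 (an assumption; it is what the same-host replies show here).
hops_from_ttl() {
    echo $(( ${2:-64} - $1 ))
}

hops_from_ttl 62    # prints 2
```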
Let's trace the route:
root@i-pvirg1hu:~# ip netns exec ns1 traceroute 192.168.2.2
traceroute to 192.168.2.2 (192.168.2.2), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 0.030 ms 0.007 ms 0.005 ms
2 172.16.0.2 (172.16.0.2) 0.254 ms 0.231 ms 0.220 ms
3 192.168.2.2 (192.168.2.2) 0.212 ms 0.243 ms 0.239 ms
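1. ns1 sends an ICMP packet with source address 192.168.1.2 and destination address 192.168.2.2.
2. Because the destination 192.168.2.2 is not in the same subnet as the source 192.168.1.2, the packet is sent to the default gateway 192.168.1.1, which is br0, the Linux bridge's own built-in interface.
3. Once br0 has received the packet, the vm1 host matches the route 192.168.2.0/24 via 172.16.0.2 and forwards the packet to the peer host's NIC. The peer (vm2) then consults its local routing table and hands the packet to its br0.
4. vm2's br0 delivers the packet to the destination 192.168.2.2.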
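The decision in step 2, direct delivery versus default gateway, can be sketched as a prefix comparison. This is a simplification that only holds for /24 masks like the ones used in this topology, and `same_subnet24` and `next_hop` are hypothetical helper names, not part of any tool.

```shell
# same_subnet24 IP_A IP_B: succeed if both addresses share the same /24 prefix.
# Only valid for /24 masks: it simply compares the first three octets.
same_subnet24() {
    [ "${1%.*}" = "${2%.*}" ]
}

# next_hop SRC DST GATEWAY: print where the sender hands the packet first
next_hop() {
    if same_subnet24 "$1" "$2"; then
        echo "$2 (direct, resolved with ARP)"
    else
        echo "$3 (default gateway)"
    fi
}

next_hop 192.168.1.2 192.168.1.3 192.168.1.1   # prints: 192.168.1.3 (direct, resolved with ARP)
next_hop 192.168.1.2 192.168.2.2 192.168.1.1   # prints: 192.168.1.1 (default gateway)
```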