RoCE over VXLAN on Ubuntu with Open vSwitch offload
Environment
Operating system:
uname -a
Linux 5.4.0-187-generic #207-Ubuntu SMP Mon Jun 10 08:16:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Mellanox NIC:
ethtool -i ens6np0
driver: mlx5_core
version: 23.10-2.1.3
firmware-version: 20.39.3004 (MT_0000000222)
Open vSwitch:
ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.15.8
Configure SR-IOV
For details, see https://blog.csdn.net/aashuii/article/details/140313972
In this example only 1 VF is enabled, for simplicity. Check the result:
rdma link
link mlx5_0/1 state DOWN physical_state DISABLED
link mlx5_0/2 state ACTIVE physical_state LINK_UP netdev ens6np0
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens6np0v0
So the PF interface is ens6np0 and the VF interface is ens6np0v0.
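To repeat this PF/VF check in a script, the `rdma link` lines can be parsed directly. The sketch below is a hypothetical helper (`rdma_netdev` is not part of any tool) and assumes the output format shown above, where the netdev name follows the word `netdev`.

```shell
# Hypothetical helper: print the netdev of an RDMA device, reading `rdma link` output on stdin.
# Usage: rdma link | rdma_netdev mlx5_1
rdma_netdev() {
    awk -v dev="$1" '
        # Field 2 is "device/port", e.g. mlx5_1/1; compare only the device part.
        { split($2, a, "/") }
        a[1] == dev {
            # The netdev name is the word right after "netdev".
            for (i = 1; i < NF; i++)
                if ($i == "netdev") print $(i + 1)
        }'
}
```

With the output above, `rdma_netdev mlx5_1` prints the VF netdev and `rdma_netdev mlx5_0` prints the PF netdev.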
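Install Open vSwitch
It can be installed with apt-get install openvswitch-switch, but that version is old (2.9), so here it is built from source instead.
Download and extract the source:
Install the dependencies: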
sudo apt-get install build-essential linux-headers-$(uname -r)
sudo apt-get install graphviz automake bzip2 debhelper dh-autoreconf procps python-all python-qt4 python-twisted-conch python-zopeinterface python-six dkms module-assistant ipsec-tools racoon libc6-dev module-init-tools netbase python-argparse uuid-runtime
sudo apt-get install openssl python-pip libtool autoconf
Build and install:
./boot.sh
./configure --with-linux=/lib/modules/$(uname -r)/build
make
make install
make modules_install
Start OVS:
utilities/ovs-ctl start --system-id=random
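Note 1: when started this way, the logs go to /usr/local/var/log/openvswitch/ by default, not /var/log/openvswitch.
Note 2: when started this way, OVS is not supervised by systemd, so it may need to be started manually after a reboot.
Confirm that it started: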
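A scripted version of this check, assuming a /usr/local source build keeps its pid files in /usr/local/var/run/openvswitch (an assumption; pass another directory if yours differs), is the hypothetical helper below. `utilities/ovs-ctl status` from the source tree reports the same thing. Run it as root, since `kill -0` on a root-owned process fails for ordinary users.

```shell
# Hypothetical helper: succeed if ovsdb-server and ovs-vswitchd are both alive.
# Assumption: pid files live in /usr/local/var/run/openvswitch unless $1 says otherwise.
# Run as root: kill -0 on a root-owned process fails for ordinary users.
ovs_running() {
    local rundir="${1:-/usr/local/var/run/openvswitch}" daemon pid
    for daemon in ovsdb-server ovs-vswitchd; do
        # A missing pid file or a dead process means the daemon is not running.
        pid=$(cat "$rundir/$daemon.pid" 2>/dev/null) || return 1
        kill -0 "$pid" 2>/dev/null || return 1
    done
    return 0
}
# Usage: ovs_running && ovs-vsctl show
```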
Configure offload
Change the NIC eswitch mode:
# Every VF must be unbound (by its PCI address) before the mode can be changed
echo 0000:af:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
# PF interface ens6np0
echo switchdev > /sys/class/net/ens6np0/compat/devlink/mode
echo 0000:af:00.1 > /sys/bus/pci/drivers/mlx5_core/bind
# VF interface ens6np0v0
ifconfig ens6np0v0 10.50.0.1/24 up
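The order matters: unbind all VFs, switch the mode, then rebind the VFs. With more than one VF it is easy to miss one, so the hypothetical helper below only prints the commands for a PF and a list of VF PCI addresses, using the same sysfs paths as the commands above; review the output before running it as root. Where the Mellanox compat path is not available, the upstream equivalent of the mode change is `devlink dev eswitch set pci/<PF PCI address> mode switchdev`, with the PF's own PCI address.

```shell
# Hypothetical helper: print (not execute) the switchdev sequence used above:
# unbind every VF, set the PF's eswitch mode, then rebind every VF.
# Usage: switchdev_steps ens6np0 0000:af:00.1 [more VF PCI addresses...]
# After reviewing the output, run it as root: switchdev_steps ... | sh
switchdev_steps() {
    local pf="$1" vf
    shift
    local drv=/sys/bus/pci/drivers/mlx5_core
    for vf in "$@"; do echo "echo $vf > $drv/unbind"; done
    echo "echo switchdev > /sys/class/net/$pf/compat/devlink/mode"
    for vf in "$@"; do echo "echo $vf > $drv/bind"; done
}
```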
Configure OVS:
ovs-vsctl add-br ovs-br
# local_ip must be the IP of the local PF interface (ens6np0) and remote_ip the peer's PF IP; swap them on the peer
ovs-vsctl add-port ovs-br vx16 -- set interface vx16 type=vxlan options:local_ip=192.168.1.1 options:remote_ip=192.168.1.3 options:key=16
# eth0 is the VF representor in this example
ovs-vsctl add-port ovs-br eth0
After configuring, confirm with ovs-vsctl show:
Check the tc setup of the vxlan device with tc -s qdisc show dev vxlan_sys_4789:
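What to look for here: OVS's tc offload attaches flower filters under an ingress-type qdisc on the vxlan device. A minimal check, assuming `tc` prints that qdisc as `ingress` or `clsact` with handle `ffff:` (which one depends on the OVS and kernel versions), is the hypothetical `has_ingress_qdisc` below.

```shell
# Hypothetical helper: succeed if `tc qdisc show` output on stdin contains an
# ingress-type qdisc (printed as "ingress" or "clsact" with handle ffff:).
# Usage: tc -s qdisc show dev vxlan_sys_4789 | has_ingress_qdisc
has_ingress_qdisc() {
    grep -Eq '^qdisc (ingress|clsact) ffff:'
}
```

To see the filters themselves, `tc -s filter show dev vxlan_sys_4789 ingress` lists them; flower rules that made it into hardware are marked `in_hw`.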
Enable hardware offload:
# Check with: ethtool -k ens6np0 | grep tc-offload
ethtool -K ens6np0 hw-tc-offload on
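The check from the comment, as a small hypothetical helper that succeeds only when `ethtool -k` reports `hw-tc-offload: on`:

```shell
# Hypothetical helper: succeed if `ethtool -k` output on stdin shows hw-tc-offload enabled.
# Usage: ethtool -k ens6np0 | tc_offload_on && echo enabled
tc_offload_on() {
    grep -q '^hw-tc-offload: on'
}
```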
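At this point the VF representor should exist (mlx5 creates VF representors when the eswitch enters switchdev mode); confirm with rdma link:
Continue configuring: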
ifconfig ens6np0v0 10.50.0.1/24 up
ifconfig eth0 up
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
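Verify with ovs-vsctl get Open_vSwitch . other_config:hw-offload. According to the OVS documentation, changing hw-offload requires restarting ovs-vswitchd (for example with utilities/ovs-ctl restart).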
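`ovs-vsctl get` prints the value in double quotes ("true"), so a script should strip the quotes before comparing. A minimal sketch (the helper name is hypothetical):

```shell
# Hypothetical helper: succeed if the value from
# `ovs-vsctl get Open_vSwitch . other_config:hw-offload` is true.
# ovs-vsctl prints it quoted ("true"), so quotes and whitespace are removed first.
# Usage: ovs-vsctl get Open_vSwitch . other_config:hw-offload | hw_offload_enabled
hw_offload_enabled() {
    [ "$(tr -d '"[:space:]')" = "true" ]
}
```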
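Verification
On the server, run: ib_send_bw -d mlx5_1 -x 3 --report_gbits
On the client, run: ib_send_bw -d mlx5_1 -x 3 --report_gbits 10.50.0.1
Note: the value after -x is the GID index. Look it up with show_gids and choose the RoCEv2 entry.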
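To avoid reading the index by eye, the RoCEv2 entry can be picked out of the `show_gids` table. This sketch assumes the column layout printed by the Mellanox OFED `show_gids` script (DEV, PORT, INDEX, GID, IPv4, VER, DEV) and matches on the device, its IPv4 address and VER `v2`; the helper name is hypothetical.

```shell
# Hypothetical helper: print the INDEX of the RoCEv2 GID for a device and IPv4 address,
# reading `show_gids` output on stdin.
# Assumed columns: DEV PORT INDEX GID IPv4 VER DEV (rows without IPv4 never match).
# Usage: show_gids | roce_v2_gid_index mlx5_1 10.50.0.1
roce_v2_gid_index() {
    awk -v dev="$1" -v ip="$2" '$1 == dev && $5 == ip && $6 == "v2" { print $3; exit }'
}
```

For example, on the server: ib_send_bw -d mlx5_1 -x "$(show_gids | roce_v2_gid_index mlx5_1 10.50.0.1)" --report_gbits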
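Getting traffic through does not prove that offload works: you still have to determine whether packets are forwarded in hardware or in software. Very low bandwidth usually points to software forwarding. You can capture on the VF representor (eth0 in this example): if you see many TCP and RoCE packets there, offload has failed. In the normal case it should look like: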
You can also capture on the NIC itself to confirm, using ibdump.
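For the capture on the representor, RoCEv2 uses UDP destination port 4791, so a filtered capture can simply be counted. The sketch below assumes `tcpdump -nn` line output and a hypothetical helper name. The first packets of a flow may legitimately reach the representor before the datapath rule is installed, so the warning sign is a count that keeps growing while ib_send_bw is running.

```shell
# Hypothetical helper: count RoCEv2 packets (UDP destination port 4791)
# in `tcpdump -nn` text output read from stdin.
# Usage: timeout 5 tcpdump -l -nn -i eth0 udp port 4791 2>/dev/null | count_roce_pkts
count_roce_pkts() {
    grep -c '\.4791: '
}
```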
If offload fails, inspect the datapath flows:
# Flows that were not offloaded
ovs-appctl dpctl/dump-flows type=non-offloaded
# Flows that were offloaded
ovs-appctl dpctl/dump-flows type=offloaded
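Comparing packet counters makes the two dumps easier to read. The sketch sums the `packets:` fields of the flows printed by `ovs-appctl dpctl/dump-flows`; run it for both types while the test is running (the helper name is hypothetical).

```shell
# Hypothetical helper: sum the "packets:" counters of the datapath flows on stdin.
# Usage:
#   ovs-appctl dpctl/dump-flows type=offloaded     | flow_packets
#   ovs-appctl dpctl/dump-flows type=non-offloaded | flow_packets
flow_packets() {
    grep -o 'packets:[0-9]*' | awk -F: '{ s += $2 } END { print s + 0 }'
}
```

If offload works, the offloaded total should keep rising during ib_send_bw while the non-offloaded total stays small.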
You can also enable debug-level logging and look for hints:
ovs-appctl vlog/set netdev_offload_tc:dbg
ovs-appctl vlog/set dpif_netlink:dbg
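With debug logging on, the relevant messages can be filtered out of the ovs-vswitchd log. This sketch assumes the default log line format (`timestamp|sequence|module|level|message`) and the /usr/local log directory mentioned in the start-up notes; the helper name is hypothetical.

```shell
# Hypothetical helper: print only netdev_offload_tc and dpif_netlink lines from the
# ovs-vswitchd log. Assumes "timestamp|seq|module|level|message" lines and the
# /usr/local log path of a source build; pass another file as $1 if needed.
# Usage: offload_log | tail -n 50
offload_log() {
    local log="${1:-/usr/local/var/log/openvswitch/ovs-vswitchd.log}"
    awk -F'|' '$3 == "netdev_offload_tc" || $3 == "dpif_netlink"' "$log"
}
```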
Troubleshooting
make modules_install reports SSL errors because the module signing key certs/signing_key.pem cannot be found:
At main.c:167:
- SSL error:02001002:system library:fopen:No such file or directory: …/crypto/bio/bss_file.c:69
- SSL error:2006D080:BIO routines:BIO_new_file:no such file: …/crypto/bio/bss_file.c:76
sign-file: certs/signing_key.pem: No such file or directory
INSTALL /home/lz/openvswitch-2.15.8/datapath/linux/vport-geneve.ko
Following https://blog.51cto.com/SpaceVision/5071551, generate a signing key as below, then rerun make modules_install:
cd /lib/modules/$(uname -r)/build/certs
sudo tee x509.genkey > /dev/null << 'EOF'
[ req ]
default_bits = 4096
distinguished_name = req_distinguished_name
prompt = no
string_mask = utf8only
x509_extensions = myexts
[ req_distinguished_name ]
CN = Modules
[ myexts ]
basicConstraints=critical,CA:FALSE
keyUsage=digitalSignature
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid
EOF
sudo openssl req -new -nodes -utf8 -sha512 -days 36500 -batch -x509 -config x509.genkey -outform DER -out signing_key.x509 -keyout signing_key.pem