Intel I210 NIC
I210 supports AVB and ETF (Earliest TxTime First, Time-Based Scheduling), but does not support TSN (802.1Qbv).
I225 (igc driver) supports TSN: each Tx queue has a start_time and an end_time, both within [0, cycle_time].
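The per-queue window can be illustrated with a minimal sketch. It assumes a queue may transmit while the offset into the current cycle lies in [start_time, end_time); the function name, the half-open interval and the example values are illustrative assumptions, not taken from the I225 datasheet.

```python
def queue_window_open(now_ns: int, start_time: int, end_time: int, cycle_time: int) -> bool:
    """Return True if a Tx queue's window is open at time now_ns.

    Assumption: the window is [start_time, end_time) relative to the start
    of each cycle, and both bounds lie within [0, cycle_time].
    """
    if not (0 <= start_time <= end_time <= cycle_time):
        raise ValueError("start_time and end_time must lie within [0, cycle_time]")
    offset = now_ns % cycle_time  # position inside the current cycle
    return start_time <= offset < end_time


# Hypothetical 1 ms cycle split between two queues.
CYCLE_NS = 1_000_000
WINDOWS = {0: (0, 300_000), 1: (300_000, 1_000_000)}

if __name__ == "__main__":
    now = 1_250_000  # 250 us into the second cycle
    for queue, (start, end) in WINDOWS.items():
        print(queue, queue_window_open(now, start, end, CYCLE_NS))
```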
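1 Intel I210 NIC
1.1 PHY configuration
In Intel LAN chips the PHY is usually integrated, so the type of signal the LAN chip outputs can be configured through the EEPROM. The chip can drive either an RJ45 copper port or an SFP optical port.
If the LAN chip connects to an SFP, set Link Mode in the EEPROM to SerDes (1000Base-BX); if it connects to an RJ45, set Link Mode to copper PHY; if it connects to another, external PHY, set Link Mode to SGMII mode.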
1.2 MAC address registers
The EEPROM is organized as N x 16-bit words. The first three words hold the Ethernet MAC address, and this value is loaded into Receive Address Register 0 (RAL0/RAH0). There are 16 RALx/RAHx pairs; they can be used to steer AVB Rx traffic to separate queues by MAC address. Refer to I210 page 259.
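As a rough illustration of how three 16-bit EEPROM words become a MAC address and a RAL/RAH pair, here is a minimal sketch. It assumes the byte order used by the Linux igb driver (low byte of each word first) and that bit 31 of RAH is the Address Valid flag; the function names and the example words are hypothetical.

```python
RAH_AV = 1 << 31  # assumed: Address Valid bit in RAH


def mac_from_eeprom_words(words):
    """Build the 6-byte MAC from the first three 16-bit EEPROM words.

    Assumption: the low byte of each word comes first on the wire.
    """
    mac = bytearray()
    for word in words[:3]:
        mac.append(word & 0xFF)
        mac.append((word >> 8) & 0xFF)
    return bytes(mac)


def ral_rah_from_mac(mac):
    """Pack a MAC into the 32-bit RAL and RAH register values."""
    ral = int.from_bytes(mac[0:4], "little")
    rah = int.from_bytes(mac[4:6], "little") | RAH_AV
    return ral, rah


def format_mac(mac):
    return ":".join(f"{byte:02x}" for byte in mac)


if __name__ == "__main__":
    words = [0x1100, 0x3322, 0x5544]  # hypothetical EEPROM contents
    mac = mac_from_eeprom_words(words)
    ral, rah = ral_rah_from_mac(mac)
    print(format_mac(mac), hex(ral), hex(rah))
```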
1.3 DMA
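There are five DMA descriptor formats. Legacy and Advanced Receive descriptors cannot be used at the same time; the SRRCTL[n].DESCTYPE register field selects one of them (000b selects Legacy). For Transmit descriptors, bit 29 (DEXT) indicates whether the descriptor is Legacy or Advanced. Once the transfer of the data that an E1000 DMA descriptor points to has completed, the NIC writes the descriptor back, and the written-back format differs considerably from what the software prepared.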
Legacy Receive Descriptor
Advanced Receive Descriptor
Legacy Transmit Descriptor
Advanced Transmit Context Descriptor
Advanced Transmit Data Descriptor
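Four registers relate to the Rx buffer descriptors (BD); n ranges from 0 to 3.
RDBA[n]: base address of the descriptor ring
RDH[n] and RDT[n]: head and tail pointers, stored as offsets from the base address. The NIC uses the descriptors between RDH[n] and RDT[n] to receive packets. The NIC updates RDH[n] after it writes a receive descriptor back to the driver; the driver updates RDT[n] after it hands new receive descriptors to the NIC
RDLEN[n]: total size, in bytes, of the memory allocated for the descriptor ring
Note: if RDH[n] catches up with RDT[n] (RDH[n] == RDT[n]), the receive queue has no free descriptors left and the NIC drops the packet. If a free receive descriptor is available, the NIC copies the packet data into the buffer the descriptor points to, sets the descriptor's DD and EOP status bits, and increments RDH[n]. Reference: Virtio networking: A case study of I/O paravirtualization.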
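The head/tail behaviour described above can be modelled with a small simulation. This is only a sketch: the class and method names are made up, descriptors are reduced to dictionaries, and the driver side is assumed to have already consumed a buffer before handing its descriptor back.

```python
class RxRing:
    """Toy model of one Rx descriptor ring with RDH/RDT indices."""

    def __init__(self, size):
        self.size = size
        self.rdh = 0  # head: next descriptor the NIC will fill
        self.rdt = 0  # tail: one past the last descriptor owned by the NIC
        self.descriptors = [None] * size
        self.dropped = 0

    def driver_refill(self, count):
        """Driver gives up to `count` free descriptors to the NIC by advancing RDT."""
        # One slot stays unused so that RDH == RDT always means "nothing available".
        free_slots = (self.rdh - self.rdt - 1) % self.size
        count = min(count, free_slots)
        self.rdt = (self.rdt + count) % self.size
        return count

    def nic_receive(self, packet):
        """NIC receives one packet; returns False if it had to drop it."""
        if self.rdh == self.rdt:
            self.dropped += 1  # no free descriptor: packet is dropped
            return False
        # Write-back: store the data and set the DD and EOP status bits.
        self.descriptors[self.rdh] = {"data": packet, "DD": True, "EOP": True}
        self.rdh = (self.rdh + 1) % self.size
        return True


if __name__ == "__main__":
    ring = RxRing(4)
    ring.driver_refill(3)
    print([ring.nic_receive(b"pkt") for _ in range(4)], ring.dropped)
```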
1.4 FIFO
In legacy mode, all Tx (Rx) queues share a single Tx (Rx) FIFO; refer to 82576EB.
In Qav mode, each Tx queue has a dedicated Tx FIFO, but all Rx queues still share a single Rx FIFO; refer to I210 page 388. The multiple Tx FIFOs are similar to those of the DWC EQOS.
1.5 Two important functions in a Linux NIC driver
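struct net_device {
    [...]
    dev_addr;            // MAC address, used for example by ARP
    // ifconfig eth0 hw ether 00:11:22:33:44:55
    set_mac_address;     // ndo_set_mac_address in net_device_ops on current kernels
    // promiscuous and multicast
    // ifconfig eth0 promisc
    // ifconfig eth0 -promisc
    set_multicast_list;  // ndo_set_rx_mode in net_device_ops on current kernels
    [...]
};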
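1.6 Disabling hardware TCP segmentation on the I210 NIC
# show the current offload settings
ethtool -k eth0
# turn TCP segmentation offload off
ethtool -K eth0 tso off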
1.7 Checksum offload
Where the software checksum is computed:
dev_queue_xmit() -> validate_xmit_skb()
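For reference, the checksum that has to be computed in software when it is not offloaded is the 16-bit ones' complement Internet checksum (RFC 1071). The following is a plain Python illustration of that algorithm, not the kernel's code; the function name is made up.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data, as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
    print(hex(internet_checksum(sample)))
```

A receiver can verify a packet by running the same function over the data with the checksum field included; a result of 0 means the checksum matches.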
1.8 NIC PXE Option ROM
[24-Feb-2022]
PXE is short for Preboot Execution Environment. It is used for diskless workstations, iSCSI boot, etc. PXE includes a simple BIOS NIC driver plus DHCP and TFTP clients, and is packaged as a NIC boot ROM. Refer to the C3000 Integrated 10 GbE LAN Controller PRM.
1.9 tc, the Linux traffic control tool
The Linux tc mqprio qdisc hooks into the ndo_setup_tc() function pointer in net_device_ops.
Script implementation
qdisc: dev_queue_xmit()
HTB: Hierarchical Token Bucket
burst: size of the bucket, in bytes. This is the maximum number of bytes for which tokens can be available instantaneously. The minimum bucket size is rate / HZ (check HZ with zcat /proc/config.gz | grep HZ), because the token generator adds rate / HZ tokens to the bucket on every tick.
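The rate / HZ rule is easy to evaluate for a given link rate. The helper below is a minimal sketch of that arithmetic only; the function name is hypothetical and the HZ values are examples, so check the value configured in your own kernel.

```python
def min_burst_bytes(rate_bits_per_sec: int, hz: int) -> int:
    """Smallest bucket, in bytes, that can hold one tick's worth of tokens."""
    bytes_per_sec = rate_bits_per_sec // 8
    return -(-bytes_per_sec // hz)  # ceiling division


if __name__ == "__main__":
    rate = 90_000_000  # 90 mbit
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: burst >= {min_burst_bytes(rate, hz)} bytes")
```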
# tc uses the following units when passed as a parameter.
# kbps: Kilobytes per second
# mbps: Megabytes per second
# kbit: Kilobits per second
# mbit: Megabits per second
# bps: Bytes per second
#
# Amounts of data can be specified in:
# kb or k: Kilobytes
# mb or m: Megabytes
# mbit: Megabits
# kbit: Kilobits
# To get the byte figure from bits, divide the number by 8
#
# remove the current root qdisc from eth0
tc qdisc del dev eth0 root
# attach an HTB root qdisc to eth0
# root qdisc handle: 1:
# class IDs: 1:1 and 1:20 (unclassified traffic defaults to 1:20)
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 200kbps ceil 200kbps burst 4k # burst max 4 Kilobytes per 1/HZ
tc class add dev eth0 parent 1: classid 1:20 htb rate 90mbit ceil 95mbit burst 14k # burst max 14 Kilobytes per 1/HZ
# fwmark classifier, u32 classifier (Universal/Ugly 32bit)
tc filter add dev eth0 parent 1: prio 1 protocol ip u32 match ip dport 8000 0xffff flowid 1:1
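The rate suffixes are easy to mix up: kbps means kilobytes per second, so the class 1:1 rate of 200kbps above is 1.6 Mbit/s on the wire. The converter below is a small sketch that covers only the suffixes listed in the script comments and assumes SI (1000-based) prefixes; the function name is made up and fractional values are not handled.

```python
import re

# Multipliers that convert each tc rate suffix to bits per second.
_RATE_UNITS = {
    "bit": 1,
    "kbit": 1_000,
    "mbit": 1_000_000,
    "bps": 8,            # bytes per second
    "kbps": 8_000,       # kilobytes per second
    "mbps": 8_000_000,   # megabytes per second
}


def tc_rate_to_bits_per_sec(text: str) -> int:
    """Convert a tc rate string such as '200kbps' to bits per second."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text.strip().lower())
    if not match or match.group(2) not in _RATE_UNITS:
        raise ValueError(f"unsupported rate: {text!r}")
    return int(match.group(1)) * _RATE_UNITS[match.group(2)]


if __name__ == "__main__":
    for rate in ("200kbps", "90mbit", "95mbit"):
        print(rate, tc_rate_to_bits_per_sec(rate))
```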
2 Wireshark
2.1 Switching the timestamp display to UTC format
View -> Time Display Format -> UTC Date and Time of Day
2.2 Commonly used filter expressions
1) someip
2) ip.src == 192.168.1.1 - replace with the relevant IP address
3) ip.src == 192.168.1.1 and ip.dst == 192.168.1.2 - replace with the relevant IP addresses
4) someip.messageid == 0xffff8100 and ip.src == 192.168.1.2 - replace with the relevant IP address
5) dns - for debugging domain name resolution
6) tcp.port == 8000
7) !(tcp.analysis.retransmission)
8) tcp.flags.syn==1 or tcp.flags.ack==0
9) tcp.flags.fin == 1
10) usb.src == "1.6.1" and usb.dst == "host" - replace with the relevant USB bus_no.addr.ep_no
2.3 How tcpdump captures packets
Rx tap point: __netif_receive_skb_core()
Tx tap point: xmit_one() -> dev_queue_xmit_nit()
3 Tools for analyzing packet loss
set_irq_affinity
netstat -i
netstat -su (for Linux desktop)
ethtool -g eth0
ethtool -G eth0 tx 4096
MSS (Maximum Segment Size) is the largest amount of data a single TCP segment can carry. It equals the MTU minus the 20-byte IP header and the 20-byte TCP header, i.e. MSS + 40 = MTU.
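A quick worked example of the MSS + 40 = MTU relation, assuming IPv4 and TCP headers without options (20 bytes each); the function name is illustrative.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_from_mtu(mtu: int) -> int:
    """Return the TCP MSS for a given MTU, assuming option-free headers."""
    mss = mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("MTU too small to carry a TCP segment")
    return mss


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu} -> MSS {mss_from_mtu(mtu)}")
```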
4 Changing socket send and receive buffer sizes
4.1 Windows
The Windows default is 8 KB.
AFD:Ancillary Function Driver for WinSock
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters]
DefaultReceiveWindow = 1800 (hexadecimal)
DefaultSendWindow = 1800 (hexadecimal)
4.2 Linux
/proc/sys/net/core/rmem_default
/proc/sys/net/core/wmem_default
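Besides the system-wide defaults, an application can size the buffers of a single socket with SO_RCVBUF and SO_SNDBUF. The sketch below uses only the standard Python socket module; the helper names and the 64 KB request are illustrative assumptions.

```python
import socket


def get_buffer_sizes(sock):
    """Return the (receive, send) buffer sizes the kernel reports for sock."""
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


def set_buffer_sizes(sock, rcv_bytes, snd_bytes):
    """Request new buffer sizes and return what the kernel actually applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcv_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, snd_bytes)
    return get_buffer_sizes(sock)


def read_linux_default(name):
    """Read /proc/sys/net/core/<name>; returns None where it is unavailable."""
    try:
        with open(f"/proc/sys/net/core/{name}") as handle:
            return int(handle.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("rmem_default:", read_linux_default("rmem_default"))
    print("wmem_default:", read_linux_default("wmem_default"))
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        print("before:", get_buffer_sizes(sock))
        print("after: ", set_buffer_sizes(sock, 65536, 65536))
```

On Linux the reported value is usually not the requested one: the kernel typically doubles it for bookkeeping and caps it at rmem_max / wmem_max.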
5 Abbreviations
3Com: Computer, Communication, Compatibility; founded by Robert Metcalfe
FM2112: switch IC from Fulcrum Microsystems, which was acquired by Intel in July 2011
IGB: Intel Gigabit Ethernet
ioat: Intel Ethernet I/O Acceleration Technology
MIB: Management Information Base, the switch's Tx/Rx statistics block
OFA: OpenFabrics Alliance
PTP: Precision Time Protocol
Qav: IEEE 802.1Qav, Forwarding and Queuing Enhancements for Time-Sensitive Streams
tdt: tx_descriptor_tail
Traffic Shaping: regulating the rate at which traffic is sent
TSO: TCP Segmentation Offload, which moves TCP segmentation onto the NIC
XAUI: pronounced "Zowie"; a 10 Gigabit Ethernet physical-layer interface with 4 lanes each for Tx and Rx, each lane running at 3.125 Gb/s