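Common Android WiFi Issues and How to Analyze Them
Reference: Android Network/WiFi 那些事儿
Preface
This article discusses several interesting networking questions and introduces an approach to analyzing common WiFi problems on Android.
Networking Basics Q & A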
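I. Why the Network Stack Is Layered
Everyone is familiar with network layering, but have you ever asked why it is layered this way?
Most material online describes what each layer does and rarely explains why the layering exists at all.
From a software design perspective, layering resembles Android architecture patterns such as MVC, MVP, and MVVM.
Its essence is decoupling: responsibilities become clearer, and a change can be confined to one layer instead of rippling through the whole stack.
Each layer also hides its complexity from the others, which makes a problem easier to localize when something goes wrong.
So what is each layer responsible for?
The transport layer identifies the target process, the network layer routes packets between networks, and the data link layer handles delivery within the local network.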
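II. The Role of the MAC Layer
Asked what role the MAC layer plays, you might answer that an IP address can change while a MAC address does not, but that does not really explain it.
First, we need to agree on one point:
Before a packet reaches its final destination, the sender does not know the destination's MAC address. Along the way the IP packet is wrapped in a MAC frame many times, and the destination MAC address in any one of those frames is not necessarily the MAC address of the final destination.
It may be the MAC address of a directly attached router. That router strips the frame, inspects the IP address, finds that it is not on the local network, wraps the packet in a new frame, and sends it to the next router (not necessarily an adjacent one; it is whichever next hop the routing rules select).
The IP address on a packet is used for delivery across networks. Inside a local network that is not enough and a MAC address is required; one reason is that it identifies the correct machine on the LAN.
Delivering a message really requires just two addresses:
• Final destination address
• Next hop address
The IP address is essentially the final destination address and does not change as the packet crosses routers (hops). The MAC address is the next hop address and changes at every router.
[Reference 1]
In one sentence:
MAC addresses are for addressing within a local network; IP addresses are for addressing across networks.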
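The two-address idea can be illustrated with a toy model. The sketch below is not a real network stack: the `Frame` class, the `forward` helper, and all MAC/IP values are made up for illustration. It only shows that the destination IP stays fixed end to end while the destination MAC is rewritten on every link.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dst_mac: str  # next hop address, rewritten at every hop
    dst_ip: str   # final destination address, unchanged end to end


def forward(dst_ip, next_hop_macs):
    """Return the frame as it appears on each link along the path."""
    return [Frame(dst_mac=mac, dst_ip=dst_ip) for mac in next_hop_macs]


# Hypothetical path: phone -> home router -> ISP router -> server
path = ["aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02", "cc:cc:cc:cc:cc:03"]
frames = forward("203.0.113.10", path)
for frame in frames:
    print(frame.dst_mac, frame.dst_ip)
```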
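III. DHCP
The DHCP process as seen in the logs (logcat & driver)
Q: Are DHCP Offer and DHCP ACK broadcast packets?
A: It depends on the capability of the client's IP stack. If the stack cannot receive unicast IP packets while it is still initializing, the client must say so explicitly by setting the BROADCAST flag to 1 in the Flags field of its DHCP Discover/Request messages; the server then uses broadcast to talk to the client.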
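As a rough illustration of where that flag lives: in the fixed BOOTP/DHCP header defined by RFC 2131, the 16-bit flags field sits at byte offset 10 and the BROADCAST flag is its most significant bit. The helper names `wants_broadcast` and `make_header` and the sample field values below are hypothetical; this is a minimal sketch, not a DHCP implementation.

```python
import struct

BROADCAST_FLAG = 0x8000  # most significant bit of the 16-bit flags field


def wants_broadcast(bootp_header: bytes) -> bool:
    """Return True if the BROADCAST flag is set in a BOOTP/DHCP header."""
    # Fixed header layout: op(1) htype(1) hlen(1) hops(1) xid(4) secs(2) flags(2)
    (flags,) = struct.unpack_from("!H", bootp_header, 10)
    return bool(flags & BROADCAST_FLAG)


def make_header(broadcast: bool) -> bytes:
    """Build the first 12 bytes of a request header (illustrative values only)."""
    flags = BROADCAST_FLAG if broadcast else 0
    return struct.pack("!BBBBIHH", 1, 1, 6, 0, 0x12345678, 0, flags)


print(wants_broadcast(make_header(True)), wants_broadcast(make_header(False)))
```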
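Q: Can a statically configured IP conflict with the IP that the DHCP server assigns to a client?
A: In the final confirmation stage, after the client receives the DHCP ACK from the server, it does not use the assigned address immediately. It first sends an ARP request whose target is the assigned address as a last check (a gratuitous ARP).
If no conflict is detected, the client binds the address. If a conflict is detected, it sends a DHCPDECLINE message to the DHCP server, with the conflicting address supplied by the server in the Requested IP Address field (option 50). It then waits for a while before requesting an address again, and repeats until it obtains a usable one.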
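IV. TCP
Next we discuss a few common questions about TCP.
Why does TCP use a three-way handshake but a four-way teardown?
As is well known, the purpose of the handshake is to initialize the sequence numbers on both sides.
The question is: why three steps?
The three-way handshake is actually an optimization of what would otherwise be four steps. Because the connection is being built from scratch, the ACK of the SYN and the passive opener's own SYN can be merged into a single SYN-ACK. That is all there is to it.
Why does closing a TCP connection take four steps rather than three?
The question can be rephrased:
Why can't the ACK of the active closer's FIN be merged with the local side's own FIN?
Because a TCP connection is full duplex. During the handshake, the ACK and SYN can be merged safely because no data is flowing on the connection yet.
Teardown is different. Suppose the client has finished sending and decides to close, while the server still has data to send. To avoid losing or corrupting that data, the server first sends an ACK, and only sends its FIN (with ACK) after its own data has been transmitted.
The step counts for both the three-way handshake and the four-way teardown follow from one fact: TCP is full duplex.
If that is not clear yet, it is worth rereading the handshake and teardown flows above.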
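Why is TCP called a "reliable" protocol?
There is nothing like a physical dedicated line between a TCP sender and receiver. Instead, the client and the server each maintain data structures (a state machine) that record and track the state of the "connection".
As an abstraction running on top of the unreliable IP protocol over an unreliable medium, TCP cannot be inherently reliable.
Since the underlying link is unreliable, how does TCP achieve its so-called reliability?
Through loss detection and timeout retransmission.
How is packet loss detected and handled?
By introducing ACKs and timeout retransmission. In practice it is far from that simple: on complex networks, sliding windows, congestion control, and other mechanisms come into play. For reasons of space, congestion control will be covered in a separate article.
If no ACK arrives within the timeout, the segment is resent; either the segment or the ACK may have been lost in transit.
The figure shows timeout retransmission. The frequently mentioned fast retransmit serves the same purpose of recovering lost segments, but it is an optimization that is triggered by duplicate ACKs instead of waiting for the timer to expire.
Q: What is the priority order during fast retransmit?
A:
1. Segments marked as lost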
IsLost (SeqNum):
This routine returns whether the given sequence number is
considered to be lost. The routine returns true when either
DupThresh discontiguous SACKed sequences have arrived above
'SeqNum' or (DupThresh * SMSS) bytes with sequence numbers greater
than 'SeqNum' have been SACKed. Otherwise, the routine returns
false.
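(From RFC 3517, A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP)
2. New segments that have not yet been sent
3. Segments that are not marked LOST, not SACKed, and have not been retransmitted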
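The IsLost rule quoted in item 1 can be sketched in a few lines. This is a simplified reading of the RFC text, not kernel code: the function name `is_lost`, the `(start, end)` half-open byte ranges, and the default `smss` of 1460 bytes are assumptions, and the SACKed blocks are assumed to be already merged so that each one is a separate discontiguous sequence.

```python
def is_lost(seq_num, sacked_blocks, dup_thresh=3, smss=1460):
    """Apply the two IsLost conditions to SACKed blocks lying above seq_num."""
    blocks_above = [(start, end) for start, end in sacked_blocks if start > seq_num]
    bytes_above = sum(end - start for start, end in blocks_above)
    # Condition 1: DupThresh discontiguous SACKed sequences above seq_num.
    # Condition 2: DupThresh * SMSS bytes above seq_num have been SACKed.
    return len(blocks_above) >= dup_thresh or bytes_above >= dup_thresh * smss


# Three separate holes-and-blocks above byte 1000: considered lost.
print(is_lost(1000, [(2460, 3920), (5380, 6840), (8300, 9760)]))
# Only one small block above byte 1000: not yet considered lost.
print(is_lost(1000, [(2460, 3920)]))
```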
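Q: Does every segment need its own ACK?
A: This touches on the concept of delayed ACK.
With delayed ACK, the receiver can acknowledge several received segments with a single ACK, a behavior also referred to as cumulative acknowledgment.
The receiver therefore does not have to acknowledge every segment. After receiving several segments, if no further segment arrives within the delay period, it sends one ACK carrying the sequence number of the next segment it expects.
The benefit is obvious: it saves bandwidth. The drawback is equally obvious: the delay costs performance and can even cause unnecessary timeout retransmissions.
Q: Is an RTO computed from the RTT accurate?
A: In some scenarios it is not, and the inaccuracy leads to spurious retransmissions.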
"The Transmission Control Protocol (TCP) uses a retransmission timer to
ensure data delivery in the absence of any feedback from the remote data
receiver. The duration of this timer is referred to as RTO (retransmission timeout)."
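Deriving the RTO from the RTT is a fairly involved process in the kernel, and the details are not covered here. If you are interested, see "The Basic Algorithm" in RFC 6298.
Besides RTOs caused by real packet loss, there is another case known as a spurious retransmission timeout. It can occur, for example, when a burst of congestion makes the RTT spike, when a route change results in a higher RTT, or in an interference-prone wireless environment where the RTT fluctuates widely.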
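For readers who want a feel for the RFC 6298 basic algorithm without reading kernel code, here is a minimal sketch. The smoothing constants (alpha = 1/8, beta = 1/4, K = 4) and the update order come from the RFC; the class name `RtoEstimator` and the `granularity` and `min_rto` parameters are illustrative names, and real stacks add clamping, backoff, and other details that are left out here.

```python
class RtoEstimator:
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, granularity=0.001, min_rto=1.0):
        self.srtt = None      # smoothed RTT, seconds
        self.rttvar = None    # RTT variation, seconds
        self.granularity = granularity
        self.min_rto = min_rto
        self.rto = 1.0        # initial RTO before any RTT sample

    def sample(self, r):
        """Feed one RTT measurement (seconds) and return the updated RTO."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto


est = RtoEstimator(min_rto=0.2)  # a 200 ms floor is an assumption for this demo
for rtt in (0.05, 0.05, 0.05):
    est.sample(rtt)
print(est.rto)
```

With a steady 50 ms RTT the estimate settles at the floor, so a single RTT that suddenly exceeds the floor makes the timer fire even though nothing was lost. That is the spurious timeout case described above.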
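Common WiFi Issues
In real projects we regularly run into WiFi problems such as failure to connect or unexpected disconnection. This part lists the common ones and discusses a general approach to analyzing each.
1. Connection Issues
Why does a connection fail?
The more common causes:
Wrong password, Auth failure, handshake failure, DHCP failure, and so on.
I. Handshake Failure
Failure at handshake messages 1 and 2 is usually caused by a wrong password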
17:29:23.276148 [ cds_ol][ 0x10cb45409a9][ 14:28:23.026125]wlan: [6161:D:QDF] DPT: 0379:255 EAPOL: [0] [M1] SA: ec:41:18:08:df:9d < --DA:bc:7f:a4:35:0a:95
17:29:23.276159[ soft_i][ 0x10cb454559e][ 14:28:23.027139] wlan:[ 0:D:QDF] DPT:0381:255EAPOL:[ 0] [ M2] SA:bc:7f:a4:35:0a:95--> DA: ec:41:18:08:df:9d
// Handshake messages 1 and 2 are repeated (about 2 s apart by the driver timestamps)
17:29:26.838232 [ cds_ol][ 0x10cb6a1b219][ 14:28:25.038825]wlan: [6161:D:QDF] DPT: 0384:255 EAPOL: [0] [M1] SA: ec:41:18:08:df:9d < --DA:bc:7f:a4:35:0a:95
17:29:26.838243[ soft_i][ 0x10cb6a1f966][ 14:28:25.039776] wlan:[ 0:D:QDF] DPT:0386:255EAPOL:[ 0] [ M2] SA:bc:7f:a4:35:0a:95--> DA: ec:41:18:08:df:9d
17:29:27.038660 [ cds_ol][ 0x10cb8eb8450][ 14:28:27.038427]wlan: [6161:D:QDF] DPT: 0389:255 EAPOL: [0] [M1] SA: ec:41:18:08:df:9d < --DA:bc:7f:a4:35:0a:95
17:29:29.043572[ soft_i][ 0x10cb8eba31f][ 14:28:27.038839] wlan:[ 0:D:QDF] DPT:0391:255EAPOL:[ 0] [ M2] SA:bc:7f:a4:35:0a:95--> DA: ec:41:18:08:df:9d
// Fails after two retries
17:29:29.043606 [ kworke][ 0x10cbb354e58][ 14:28:29.037922]wlan: [15785:D:MGMT _TXRX] tgt_mgmt _txrx_rx _frame_handler: 1043: Rcvd mgmt frame subtype c0 (frame type 10) from ec:41:18:08:df:9d, seq _num = 1049, rssi = -17 tsf_delta: 0
A wrong password can be further confirmed from the wpa_supplicant log
03-06 18:23:06.585650 1024 1024 D wpa_supplicant: wlan0: State: 4WAY_HANDSHAKE -> DISCONNECTED
03-06 18:23:06.586188 749 1247 D WifiHW : [1] getevent: IFNAME=wlan0 <3>WPA: 4-Way Handshake failed - pre-shared key may be incorrect
03-06 18:23:06.586198 749 1247 D WifiHW : [2] getevent: IFNAME=wlan0 WPA: 4-Way Handshake failed - pre-shared key may be incorrect
Failure at handshake messages 3 and 4, on the other hand, is usually a router configuration problem
10:31:35.954886 [ cds_ol][ 22131061630][ 18:33:35.944171]wlan: [12981:I:QDF] qdf _dp_display _proto_pkt_always: 2093: DPT: 1145:255 EAPOL: [0] [M1] SA: 30:0d:9e:5a:83:6f < --DA:64:a2:00:77:4a:94
10:31:35.954914[ soft_i][ 22131098001][ 18:33:35.946066] wlan:[ 0:I:QDF] qdf_dp_display_proto_pkt_always:2093:DPT:1147:255EAPOL:[ 0] [ M2] SA:64:a2:00:77:4a:94--> DA: 30:0d:9e:5a:83:6f
10:31:35.955028 [ cds_ol][ 22131150806][ 18:33:35.948815]wlan: [12981:I:QDF] qdf _dp_display _proto_pkt_always: 2093: DPT: 1150:255 EAPOL: [0] [M3] SA: 30:0d:9e:5a:83:6f < --DA:64:a2:00:77:4a:94
10:31:35.964895[ soft_i][ 22131186337][ 18:33:35.950667] wlan:[ 0:I:QDF] qdf_dp_display_proto_pkt_always:2093:DPT:1152:255EAPOL:[ 0] [ M4] SA:64:a2:00:77:4a:94--> DA: 30:0d:9e:5a:83:6f
// Handshake messages 3 and 4 are repeated, 1 s apart
10:31:36.948740 [ cds_ol][ 22150330716][ 18:33:36.947769]wlan: [12981:I:QDF] qdf _dp_display _proto_pkt_always: 2093: DPT: 1173:255 EAPOL: [0] [M3] SA: 30:0d:9e:5a:83:6f < --DA:64:a2:00:77:4a:94
10:31:36.954601[ soft_i][ 22150370206][ 18:33:36.949827] wlan:[ 0:I:QDF] qdf_dp_display_proto_pkt_always:2093:DPT:1175:255EAPOL:[ 0] [ M4] SA:64:a2:00:77:4a:94--> DA: 30:0d:9e:5a:83:6f
10:31:37.964228 [ cds_ol][ 22169779798][ 18:33:37.960742]wlan: [12981:I:QDF] qdf _dp_display _proto_pkt_always: 2093: DPT: 1181:255 EAPOL: [0] [M3] SA: 30:0d:9e:5a:83:6f < --DA:64:a2:00:77:4a:94
10:31:37.964267[ soft_i][ 22169799422][ 18:33:37.961765] wlan:[ 0:I:QDF] qdf_dp_display_proto_pkt_always:2093:DPT:1183:255EAPOL:[ 0] [ M4] SA:64:a2:00:77:4a:94--> DA: 30:0d:9e:5a:83:6f
// Fails after two retries
10:31:38.948461 [ schedu][ 22188729766][ 18:33:38.947719]wlan: [12979:I:PE] Deauth RX: vdev 0 from 30:0d:9e:5a:83:6f for 64:a2:00:77:4a:94 RSSI = -48 reason 15 mlm state = 16, sme state = 11 systemrole = 3
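The two patterns above can be told apart mechanically by counting which EAPOL messages repeat. The sketch below assumes driver lines shaped like the excerpts in this section (an `EAPOL:` tag followed by `[M1]` to `[M4]`); the function name `classify_handshake` and the returned labels are made up for illustration, and the verdict is only a hint, not a diagnosis.

```python
import re
from collections import Counter

# Tolerates the irregular spacing seen in the excerpts, e.g. "[ 0] [ M2]".
EAPOL_RE = re.compile(r"EAPOL:\s*\[\s*\d+\]\s*\[\s*(M[1-4])\]")


def classify_handshake(log_lines):
    """Guess the failure stage from which EAPOL messages are repeated."""
    counts = Counter(m.group(1) for line in log_lines if (m := EAPOL_RE.search(line)))
    if counts["M3"] > 1 or counts["M4"] > 1:
        return "router_config_suspected"
    if counts["M1"] > 1 and counts["M3"] == 0:
        return "wrong_password_suspected"
    return "inconclusive"


sample = [
    "wlan: [6161:D:QDF] DPT: 0379:255 EAPOL: [0] [M1] SA: ec:41:18:08:df:9d",
    "wlan:[ 0:D:QDF] DPT:0381:255EAPOL:[ 0] [ M2] SA:bc:7f:a4:35:0a:95",
    "wlan: [6161:D:QDF] DPT: 0384:255 EAPOL: [0] [M1] SA: ec:41:18:08:df:9d",
    "wlan:[ 0:D:QDF] DPT:0386:255EAPOL:[ 0] [ M2] SA:bc:7f:a4:35:0a:95",
]
print(classify_handshake(sample))
```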
II. DHCP Failure
// Normal IpClient.wlan0 StateMachine dump: keyword "onProvisioningSuccess"
2021-02-07T09:29:31.941 - CMD_START wlan0/30 0 0 ProvisioningConfiguration{mEnableIPv4: true, mEnableIPv6: true, mEnablePreconnection: false
2021-02-07T09:29:32.033 - EVENT_PRE_DHCP_ACTION_COMPLETE wlan0/30 0 0 null [rcvd_in=RunningState, proc_in=RunningState]
2021-02-07T09:29:32.591 - INVOKE onProvisioningSuccess({{InterfaceName: wlan0 LinkAddresses: [ 192.168.31.45/24 ] DnsAddresses: [ /192.168.31.1 ]
// Abnormal IpClient.wlan0 StateMachine dump: keyword "onProvisioningFailure"
2021-02-07T12:24:32.069 - CMD_START wlan0/20 0 0 ProvisioningConfiguration{mEnableIPv4: true, mEnableIPv6: true, mEnablePreconnection: false
2021-02-07T12:24:32.095 - EVENT_PRE_DHCP_ACTION_COMPLETE wlan0/20 0 0 null [rcvd_in=RunningState, proc_in=RunningState]
2021-02-07T12:25:08.071 - INVOKE onProvisioningFailure({{InterfaceName: wlan0 LinkAddresses: [ fe80::63aa:c19f:9cf:2868/64 ] DnsAddresses: [ ] Domains: null MTU: 0 TcpBufferSizes: 524288,1048576,8808040,262144,524288,6710886 Routes: [ fe80::/64 -> :: wlan0 ]}})
// Normal DHCP client: the IP and related settings assigned by the DHCP server are received
02-07 09:29:39.834 root 0 0 I [ soft_i][ 0x2d51ef84972][ 09:29:39.834889] wlan: [0:I:QDF] DHCP-D TX: SA:10:3f:44:af:8a:d4 DA:ff:ff:ff:ff:ff:ff msdu_id:861 status: succ
02-07 09:29:39.838 1073 2519 7028 D DhcpClient: Received packet: 10:3f:44:af:8a:d4 OFFER, ip /192.168.31.45, mask /255.255.255.0, DNS servers: /192.168.31.1 , gateways [/192.168.31.1] lease time 43200, domain null
// Abnormal DHCP client: nothing is received from the DHCP server
[ 371.350648] [ soft_i][ 375560042590][ 12:24:32.098901] wlan: [0:I:QDF] DHCP-D TX: SA:e8:5a:8b:de:1f:43 DA:ff:ff:ff:ff:ff:ff msdu_id:1199 status: succ
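Analysis:
This is usually caused by a weak signal. DHCP frames are relatively large, so the exchange can fail when the signal is weak. Check the RSSI value in the logs and run a comparison test at a location with good signal.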
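When comparing a good and a bad run, it helps to measure how long provisioning took. The sketch below assumes IpClient dump lines in the `timestamp - EVENT ...` shape shown above; `provisioning_result` is a hypothetical helper name, and the parsing is deliberately minimal.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^(\S+) - (CMD_START|INVOKE (onProvisioning(?:Success|Failure)))")


def provisioning_result(dump_lines):
    """Return (callback name, seconds since CMD_START), or None if not found."""
    start = None
    for line in dump_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.fromisoformat(m.group(1))
        if m.group(2) == "CMD_START":
            start = ts
        elif start is not None:
            return m.group(3), (ts - start).total_seconds()
    return None


failed = [
    "2021-02-07T12:24:32.069 - CMD_START wlan0/20 0 0 ProvisioningConfiguration{",
    "2021-02-07T12:24:32.095 - EVENT_PRE_DHCP_ACTION_COMPLETE wlan0/20 0 0 null",
    "2021-02-07T12:25:08.071 - INVOKE onProvisioningFailure({{InterfaceName: wlan0",
]
print(provisioning_result(failed))
```

On the two dumps above, the successful run completes in well under a second, while the failing run gives up about 36 seconds after CMD_START.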
III. Auth Failure
2021-05-12 10:22:43.578458 [ schedu][ 589237120609][ 22:22:43.577742]wlan: [26358:I:PE] Auth TX: success
2021-05-12 10:22:43.578514 [ schedu][ 589237125306][ 22:22:43.577987]wlan: [26358:I:PE] Auth RX: vdev 1 sys role 3 lim _state 7 from 54:75:95:e7:6f:61 rssi -63 auth_alg 0 seq 3976
// Normally the code is 0. Code 1 here means the router rejected the authentication; the router-side reason needs further investigation
2021-05-12 10:22:43.578519 [ schedu][ 589237125762][ 22:22:43.578011]wlan: [26358:E:PE] lim _process_auth _frame_type2: 740: rx Auth frame from peer with failure code 1 54:75:95:e7:6f:61
2021-05-12 10:22:43.578524 [ schedu][ 589237126959][ 22:22:43.578073]wlan: [26358:E:PE] lim _process_mlm _auth_cnf: 543: Auth Failure occurred
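Analysis:
In the driver log this usually shows up as "Auth frame from peer with failure", and the code that follows indicates why the authentication failed. It is 0 in the normal case. The same value can be confirmed from the status field of the auth frame in a sniffer capture.
For the meaning of each code, see the 802.11 Association Status Codes table.
For example, one of our projects previously ran into code 17.
According to the table, 17 means the AP rejected the request because too many STAs are already associated and the AP's capacity has been exceeded.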
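Pulling the status code out of the driver line and mapping it to a description is easy to script. The sketch below covers only the three codes discussed in this section (0, 1, and 17, with descriptions paraphrased from the 802.11 status code table); the helper name `auth_failure` is hypothetical, and any other code falls back to a pointer to the table.

```python
import re

STATUS_CODES = {
    0: "Successful",
    1: "Unspecified failure",
    17: "AP is unable to handle additional associated STAs",
}
FAIL_RE = re.compile(r"Auth frame from peer with failure code (\d+)")


def auth_failure(line):
    """Return (code, description) for an auth failure line, else None."""
    m = FAIL_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return code, STATUS_CODES.get(code, "see the 802.11 status code table")


line = ("wlan: [26358:E:PE] lim _process_auth _frame_type2: 740: "
        "rx Auth frame from peer with failure code 1 54:75:95:e7:6f:61")
print(auth_failure(line))
```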
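2. Disconnection Issues
Why do unexpected disconnections happen?
Common causes:
Disconnect initiated by the framework, beacon miss, kickout, NUD failed, Rx disassoc or deauth, and so on.
Before going through the common disconnection problems, let us answer one question first:
How do you tell whether a disconnection was initiated by the upper layer or the lower layer?
In the author's experience:
1. For a lower-layer disconnection, the DISCONNECT event sent up from the lower layer arrives first, and only then does the supplicant switch its state to DISCONNECTED.
2. For an upper-layer disconnection, the disconnection request reaches the supplicant first; the supplicant switches its state to DISCONNECTED and then sends a disconnect request to the driver, which actually tears down the WiFi connection.
To tell them apart in the logs, a simple method is to search for the keywords
"wpa_supplicant: nl80211" and "wpa_supplicant: wlan0" and compare the order in which the key lines are printed.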
I. Poor RSSI
Search the logs for the keyword "processed=L2ConnectedState"
rec[902]: time=03-03 16:01:25.327 processed=L2ConnectedState org=ConnectedState dest=<null> what=CMD_RSSI_POLL screen=on 560
// RSSI is -52, which indicates a good signal at this point
"stevenli_5G" 88:c3:97:32:c6:cc rssi=-52 f=5805 sc=60 link=720 tx=0.1, 0.0, 0.0 rx=1.5 bcn=48014 [on:2941 tx:831 rx:24 period:3009] from screen [on:79247 period:154965] score=60
Analysis:
Signal strength is usually judged from the RSSI value together with the TX and RX values.
As a rule of thumb, an RSSI between -40 and -60 is good, and anything below -65 is considered poor.
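The rule of thumb above is simple enough to apply automatically to every CMD_RSSI_POLL record. In this sketch the thresholds come from the text, while the function name `classify_rssi` and the "borderline" label for the unspecified range between -60 and -65 are assumptions added for illustration.

```python
import re

# Tolerates spacing such as "rssi= -52" as well as "rssi=-52".
RSSI_RE = re.compile(r"rssi\s*=\s*(-?\d+)")


def classify_rssi(line):
    """Return (rssi, label) for a log line containing rssi=<value>, else None."""
    m = RSSI_RE.search(line)
    if m is None:
        return None
    rssi = int(m.group(1))
    if rssi < -65:
        return rssi, "poor"
    if rssi >= -60:
        return rssi, "good"
    return rssi, "borderline"


print(classify_rssi('"stevenli_5G"88:c3: 97: 32:c6:cc rssi= -52f= 5805sc= 60'))
```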
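II. Beacon Timeout
One point needs to be clear first:
The beacon interval is configured by the AP, typically one beacon every 100 ms, and has nothing to do with the station.
How it works (MTK platform):
The hardware first checks whether 10 consecutive beacons have been missed. If so, it reports an event to the firmware. On receiving this interrupt, the firmware reconfigures the link-detect check and widens the beacon listening interrupt to 96 ms, then waits to see whether the hardware reports another beacon timeout interrupt.
Only if it does will the firmware actually report the event to the driver. After receiving the beacon timeout, the driver waits 5 s and attempts a reconnect. If the reconnect succeeds, the beacon timeout is ignored and treated as normal.
If the reconnect still fails within 5 s, the event is reported to the framework, which handles the disconnection.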
< 6> [ 133.745238] (3)[ 827:main_thread][ wlan]
[827]aisIndicationOfMediaStateToHost:(AIS INFO) Postpone the indication of Disconnect for 5 seconds
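Analysis:
This is usually caused by poor RSSI. However, if the radio environment is bad and the noise level is high, beacons can also be missed, which in turn triggers a beacon timeout.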
III. ARP or DNS Anomalies
If the periodic DNS status check fails, a disconnection is triggered
// Keyword "CMD_IP_REACHABILITY_LOST" or "NUD_FAILED"
rec[135]: time=09-12 19:51:14.715 processed=L2ConnectedState org=ConnectedState dest=<null>
what=131221(0x20095)!CMD_IP_REACHABILITY_LOST
rt=210940/210940 FAILURE: LOST_PROVISIONING, NeighborEvent
{elapsedMs=210940, 192.168.1.1, [(null)], RTM_NEWNEIGH, NUD_FAILED}
Alternatively, search for the keyword "arp who-has" and check whether the ARP requests get replies and whether the delay is high.
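IV. Data Stall Triggered
How it works:
This is a Google mechanism that checks the network state once a minute while the screen is on. Once it detects an anomaly, it notifies CS (ConnectivityService).
Under what conditions is the state judged abnormal, that is, a data stall?
1. The packet failure rate over the last poll period reached 80%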
/**
* Default tcp packets fail rate to suspect as a data stall.
*
* Calculated by ((# of packets lost)+(# of packets retrans))/(# of packets sent)*100. Ideally,
* the percentage should be 100%. However, the ongoing packets may not be considered as neither
* lost or retrans yet. It will cause the percentage lower.
*/
public static final int DEFAULT_TCP_PACKETS_FAIL_PERCENTAGE = 80;
2. DNS failures reached 5 within half an hour
// Default configuration values for data stall detection.
public static final int DEFAULT_CONSECUTIVE_DNS_TIMEOUT_THRESHOLD = 5;
public static final int DEFAULT_DATA_STALL_VALID_DNS_TIME_THRESHOLD_MS = 30 * 60 * 1000;
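To make the two thresholds concrete, here is a minimal Python sketch of how they could be evaluated. It is one reading of the constants and the comment quoted above, not the framework's actual code: the function names `tcp_stall` and `dns_stall`, the event tuple shape, and the use of `>=` for both comparisons are assumptions.

```python
TCP_FAIL_PERCENTAGE = 80
DNS_TIMEOUT_THRESHOLD = 5
VALID_DNS_WINDOW_MS = 30 * 60 * 1000


def tcp_stall(sent, lost, retrans):
    """(lost + retrans) / sent * 100 compared against the 80% threshold."""
    if sent <= 0:
        return False
    return (lost + retrans) * 100 / sent >= TCP_FAIL_PERCENTAGE


def dns_stall(events, now_ms):
    """events: chronological list of (timestamp_ms, succeeded) DNS results."""
    consecutive = 0
    for ts, succeeded in reversed(events):
        # Stop at the first success or at a result older than the valid window.
        if succeeded or now_ms - ts > VALID_DNS_WINDOW_MS:
            break
        consecutive += 1
    return consecutive >= DNS_TIMEOUT_THRESHOLD


now = 10_000_000
recent_failures = [(now - i * 1000, False) for i in range(5, 0, -1)]
print(tcp_stall(sent=100, lost=50, retrans=30), dns_stall(recent_failures, now))
```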
Check the driver log to see whether a data stall occurred; the relevant lines look like this
13:44:35.605317 [ kworker/u16:10][ 0x96c51f025d][ 13:44:35.602210]wlan: [6253:D:WMA] Received reason code 6 from FW
13:44:35.605322 [ kworker/u16:10][ 0x96c51f032a][ 13:44:35.602220]wlan: [6253:D:WMA] Data Stall event:
13:44:35.605389 [ kworker/u16:10][ 0x96c51f038a][ 13:44:35.602225]wlan: [6253:D:WMA] data _stall_type: 3 vdev _id_bitmap: 1 reason _code1: 0 reason_code2: 7 recovery_type: 0
13:44:35.605397 [ kworker/u16:10][ 0x96c51f11a3][ 13:44:35.602413]wlan: [6253:D:QDF] cds _set_log _completion: 2136: is_fatal 1 indicator 3 reason_code 6 recovery needed 0
13:44:35.629190 [ wlan_logging_th][ 0x96c51fc43e][ 13:44:35.604795]wlan: [900:D:HDD] send _flush_completion _to_user: Sending flush done to userspace reason code 6
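V. TX/RX Packet Anomalies
Besides checking for a data stall, we also look at how the TX/RX counters change to judge whether the network is currently healthy.
// Abnormal: the RX pkt cnt barely increases (and the TX pkt cnt does not move at all in this excerpt), whereas normally the counters keep growing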
15:04:57.822610 [ wifico][ 0x183a3b70779][ 15:04:57.369087]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450071, nss 2, bw 4]
15:05:00.379218 [ wifico][ 0x183a728afa9][ 15:05:00.378476]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450073, nss 2, bw 4]
15:05:03.686570 [ wifico][ 0x183aa9a651c][ 15:05:03.388042]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450074, nss 2, bw 4]
15:05:07.132915 [ wifico][ 0x183ae0eb713][ 15:05:06.406522]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450075, nss 2, bw 4]
15:05:09.422157 [ wifico][ 0x183b18213a7][ 15:05:09.421730]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450105, nss 2, bw 4]
15:05:13.535129 [ wifico][ 0x183b4f56b07][ 15:05:12.436868]wlan: [1372:D:HDD] wlan _hdd_get _sta_stats:
[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450107, nss 2, bw 4]
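Analysis:
Check in the driver log whether TX/RX packet counts are progressing normally. If not, use a sniffer to determine whether the router failed to forward the packets or the lower layer received them but did not pass them up.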
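Eyeballing counters across many lines is error-prone, so a small script that prints the deltas between consecutive samples can help. The sketch assumes the `[TX: ... pkt cnt N ...]-[RX: ... pkt cnt N ...]` shape shown in the excerpt; `counter_deltas` is a hypothetical helper name.

```python
import re

STATS_RE = re.compile(r"\[TX:.*?pkt cnt (\d+).*?\]-\[RX:.*?pkt cnt (\d+)")


def counter_deltas(lines):
    """Return (tx_delta, rx_delta) between consecutive stats samples."""
    samples = [(int(m.group(1)), int(m.group(2)))
               for line in lines if (m := STATS_RE.search(line))]
    return [(tx2 - tx1, rx2 - rx1)
            for (tx1, rx1), (tx2, rx2) in zip(samples, samples[1:])]


stats = [
    "[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-"
    "[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450071, nss 2, bw 4]",
    "[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-"
    "[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450073, nss 2, bw 4]",
    "[TX: Reporting MCS rate 8, flags 0x14 pkt cnt 267148, nss 2, bw 4]-"
    "[RX: Reporting MCS rate 8, flags 0x14 pkt cnt 450074, nss 2, bw 4]",
]
print(counter_deltas(stats))
```

A TX delta that stays at zero while RX crawls forward by one or two packets per sample matches the abnormal pattern in the excerpt.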
VI. AP-Side Anomalies
09-06 10:48:56.919 < 3> [ 1555.321037] (1)[ 3319:tx_thread][ wlan]
Rx Deauth frame from BSSID=[aa:63:df:5c:db:c5].
09-06 10:48:56.919 < 3> [ 1555.321093] (1)[ 3319:tx_thread][ wlan] Reason code = 7
Besides AP-initiated disconnections like the one above, there are other cases, for example when the AP changes its security mode
2022-02-10 09:31:32.092 <4>[ 3273.328514]<2> (1)[4122:tx_thread][wlan] rsnCheckSecurityModeChanged: (RSN INFO) security change, WPA2 -> WEP
2022-02-10 09:31:32.092 <4>[ 3273.328530]<2> (1)[4122:tx_thread][wlan] scanProcessBeaconAndProbeResp: (SCN INFO) Beacon security mode change detected
2022-02-10 09:31:32.092 <4>[ 3273.328544]<2> (1)[4122:tx_thread][wlan] aisFsmStateAbort_NORMAL_TR: (INIT INFO) aisFsmStateAbort_NORMAL_TR
2022-02-10 09:31:32.092 <4>[ 3273.328559]<2> (1)[4122:tx_thread][wlan] aisFsmDisconnect: (INIT INFO) aisFsmDisconnect
3. Roaming Issues
The most common trigger for roaming is low RSSI, but there are other reasons as well:
/* qcom* ROAM_TRIGGER_REASON_PER:
* Set if the roam has to be triggered based on a bad packet error rates (PER).
* ROAM_TRIGGER_REASON_BEACON_MISS:
* Set if the roam has to be triggered based on beacon misses from the connected AP.
* ROAM_TRIGGER_REASON_POOR_RSSI:
* Set if the roam has to be triggered due to poor RSSI of the connected AP.
* ROAM_TRIGGER_REASON_BETTER_RSSI:
* Set if the roam has to be triggered upon finding a BSSID with a better RSSI than the connected BSSID.
* Here the RSSI of the current BSSID need not be poor.
* ROAM_TRIGGER_REASON_PERIODIC:
* Set if the roam has to be triggered by triggering a periodic scan to find a better AP to roam.
* ROAM_TRIGGER_REASON_DENSE:
* Set if the roam has to be triggered when the connected channel environment is too noisy/congested.
* ROAM_TRIGGER_REASON_BTM:
* Set if the roam has to be triggered when BTM Request frame is received from the connected AP.
//.....
* ROAM_TRIGGER_REASON_DEAUTH:
* Set if the roam has to be triggered when device receives Deauthentication/Disassociation frame from connected AP.
* ROAM_TRIGGER_REASON_IDLE:
* Set if the roam has to be triggered when the device is in idle state (no TX/RX) and suspend mode, if the current RSSI is determined to be a poor one.
* ROAM_TRIGGER_REASON_TX_FAILURES:
* Set if the roam has to be triggered based on continuous TX Data frame failures to the connected AP.
* ROAM_TRIGGER_REASON_EXTERNAL_SCAN:
* Set if the roam has to be triggered based on the scan results obtained from an external scan (not triggered to aim roaming).
Take one of the more common ones, ROAM_TRIGGER_REASON_BEACON_MISS, as an example (qcom)
# Set beacon missed count threshold
# if beacon missed counter > gRoamBmissFirstBcnt+gRoamBmissFinalBcnt,
# heartbeat error triggered
gRoamBmissFirstBcnt= 10
gRoamBmissFinalBcnt= 20
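When the number of missed beacons exceeds gRoamBmissFirstBcnt + gRoamBmissFinalBcnt, roaming is triggered to look for a suitable target AP. If no candidate is found, the connection is dropped.
// first_bmiss_detected=1: 10 beacons missed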
11:39:49.810713 R0: FWMSG: [2dd0beecbe] SWBMISS_TIMER_FN: vdev_id=0, curr_bmiss_bcnt=26, pre_bmiss_detected=1,
first_bmiss_detected=1, final_bmiss_detected=0, b_timeout=1, cons_bmiss_count = 61
11:39:49.834534 R0: FWMSG: [2dd0dcd3ef] SWBMISS_TIMER_FN: vdev_id=0, curr_bmiss_bcnt=27, pre_bmiss_detected=1, first_bmiss_detected=1, final_bmiss_detected=0, b_timeout=1, cons_bmiss_count = 62
11:39:49.835649 R0: FWMSG: [2dd0fb07a0] SWBMISS_TIMER_FN: vdev_id=0, curr_bmiss_bcnt=28, pre_bmiss_detected=1, first_bmiss_detected=1, final_bmiss_detected=0, b_timeout=1, cons_bmiss_count = 63
11:39:49.837121 R0: FWMSG: [2dd118f087] SWBMISS_TIMER_FN: vdev_id=0, curr_bmiss_bcnt=29, pre_bmiss_detected=1, first_bmiss_detected=1, final_bmiss_detected=0, b_timeout=1, cons_bmiss_count = 64
// final_bmiss_detected=1: a further 20 beacons missed on top of the first 10
11:39:49.837648 R0: FWMSG: [2dd11ed6b0] SWBMISS_NULL_SEND_CMPLT: vdev_id=0, isQosNullSuccess=0, isFinalBmiss=1, first_bmiss_detected=1,
final_bmiss_detected=1, fbmiss_evnt_posted=0, vbmiss->connected = 1, cons_bmiss_count = 65
11:39:49.838136 R0: FWMSG: [2dd11eda17] VDEV_MGR_FINAL_BMISS_DETECTED: vdev_id = 0
11:39:49.838657 R0: FWMSG: [2dd11ee1b0] ROAM_FINAL_BMISS_RECVD curr_rssi_avg = -68 dBm bcn_rssi_last = -68 dBm roam_scan_state(IDLE 0/ACTIVE_SCAN 1/ACTIVE_ROAM 2) = 0 scan_param.mode(5-7)/fina_scan(0-4) = 0x62 scan_status(bit3)/candidate_found(bit2)/send_beacon_to_host(bit1)/final_bmiss_recvd(bit0) = 0x06 is_qnull_fbmiss = 0
// Scan triggered before roaming
11:39:49.838724 R0: FWMSG: [2dd11ee4d5] ROAM_SCAN_REQUESTED send_beacon_to_host = 1 scan_type(FULL 0/CHANLIST 1/CHANMAP 2/FORCED 3/5G_ONLY 4) = 2 roam_scan_state (IDLE 0/ACTIVE_SCAN 1/ACTIVE_ROAM 2) = 0 final_scan = 1 num_chan = 13
11:39:49.838738 R0: FWMSG: [2dd11ee532] ROAM_FINAL_BMISS_SCAN scan_policy=0xf,skip_dfs=0,active_dwell=55 birst_dur=900
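The beacon-miss budget and the firmware flags can be checked with a short script. The sketch below uses the two config values shown above and assumes the typical 100 ms beacon interval mentioned earlier in this article; the helper names `bmiss_budget_ms` and `bmiss_flags` are hypothetical, and the parsing only looks at the `first_bmiss_detected` and `final_bmiss_detected` fields.

```python
import re

G_ROAM_BMISS_FIRST_BCNT = 10
G_ROAM_BMISS_FINAL_BCNT = 20
FLAG_RE = re.compile(r"(first|final)_bmiss_detected=(\d)")


def bmiss_budget_ms(beacon_interval_ms=100):
    """Approximate time without beacons before the final beacon miss fires."""
    return (G_ROAM_BMISS_FIRST_BCNT + G_ROAM_BMISS_FINAL_BCNT) * beacon_interval_ms


def bmiss_flags(line):
    """Extract the first/final beacon-miss flags from a FWMSG line."""
    return {name: int(value) for name, value in FLAG_RE.findall(line)}


fw_line = ("SWBMISS_TIMER_FN: vdev_id=0, curr_bmiss_bcnt=26, pre_bmiss_detected=1, "
           "first_bmiss_detected=1, final_bmiss_detected=0, b_timeout=1")
print(bmiss_budget_ms(), bmiss_flags(fw_line))
```

With these defaults the station tolerates roughly 3 seconds without beacons before the final beacon miss triggers the roam scan.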
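Summary
This article covered a few interesting networking questions and walked through approaches to analyzing common WiFi problems on the Android platform. Given the limited space and the limits of the author's knowledge, it surely has shortcomings; hopefully it can serve as a starting point for further discussion.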
References
https://blog.csdn.net/shuyan1115/article/details/102476599
https://juejin.cn/post/6844903986189844493
https://blog.csdn.net/dog250/article/details/52548345