dump与strace命令实战之分析keystore死锁导致watchdog问题
一、问题描述
近期发生一个系统卡死的问题。推测是system_server出现watchdog。查看watchdog文件,是PackageManager卡在了KeyStore.clearUid()接口:
----- pid 1091 at 2023-10-26 09:40:10 -----
Cmd line: system_server
ABI: 'arm64'
"PackageManager" sysTid=1352
#00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl+4)
#01 pc 00000000000290d0 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000005a5e4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+256)
#03 pc 000000000005b48c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+340)
#04 pc 000000000005b1c0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+176)
#05 pc 0000000000051318 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
#06 pc 000000000012f72c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+152)
#07 pc 0000000000406e34 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transactNative+196)
#08 pc 0000000000c00d5c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transact+748)
#09 pc 000000000065e4e8 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.IKeystoreService$Stub$Proxy.clear_uid+216)
#10 pc 0000000000556578 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.KeyStore.clearUid+88)
#11 pc 00000000014bdba4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removeKeystoreDataIfNeeded+244)
#12 pc 00000000014bea94 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removePackageDataLIF+2836)
#13 pc 00000000014900b0 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deleteInstalledPackageLIF+688)
#14 pc 000000000149161c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageLIF+3532)
#15 pc 00000000014dd554 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageX+2724)
#16 pc 0000000000d9735c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService$12.run+476)
#17 pc 0000000000ab737c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Handler.dispatchMessage+76)
#18 pc 0000000000aba4e0 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Looper.loop+1264)
#19 pc 0000000000ab9380 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.HandlerThread.run+544)
#20 pc 00000000007d57b4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.ServiceThread.run+100)
#21 pc 0000000000554d88 /system/lib64/libart.so (art_quick_invoke_stub+584)
#22 pc 00000000000cf6c8 /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+200)
#23 pc 000000000045c840 /system/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
#24 pc 000000000045d8fc /system/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+424)
#25 pc 0000000000488a94 /system/lib64/libart.so (art::Thread::CreateCallback(void*)+1120)
#26 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*)+36)
#27 pc 0000000000023478 /system/lib64/libc.so (__start_thread+68)
出现问题的场景是在应用商店安装卸载某些应用。经过后面调查,在安装删除某个应用后,即出现这个问题。使用adb uninstall 该应用可以发现命令行一直卡在卸载中,没有正常返回。
经过代码走读,keystore服务注册的线程池为1。删除这个应用时候如果keystoreservice卡住了,则后面调用keystoreservice接口的模块也必定会卡住。
二,使用strace命令
8155平台keystoreservice是android的。keymaster是高通的qtikeymaster。于是写下面三个批处理命令,跟踪system_server, keystoreservice, qtikeymaster的系统调用情况。
1,写一个如下的批处理程序 strace_keystore.bat,开始捕获keystore与keymaster的系统调用。
echo "enable adb connecting first..."
adb wait-for-device
adb root
echo Wscript.sleep Wscript.Arguments(0)>Delay.vbs
Delay.vbs 5000
del Delay.vbs
adb shell mkdir /data/strace
md d:\stracelog
adb shell "echo 512 > /sys/module/binder/parameters/debug_mask"
adb shell pidof android.hardware.keymaster@4.0-service-qti > d:\stracelog\keymaster.pid
adb shell pidof keystore > d:\stracelog\keystore.pid
adb shell "strace -p `pidof keystore` -p `pidof android.hardware.keymaster@4.0-service-qti` -o /data/strace/process -ff -tt"
上面命令会把trace log保存在系统/data/strace目录。本地windows目录d:\stracelog文件夹保存出现问题时候keystore与keymaster的进程后,便于后面导出/data/strace/目录LOG后确认进程。
2,通过adb uninstall com.xxx.car复现该问题。
3,写一个 export_strace_logs.bat 批处理程序导出LOG。
adb pull /data/strace/ d:\stracelog
adb shell "rm /data/strace/*"
4,同时导出adb log。
5,分析strace log看,最后keystore卡在了:
09:47:17.190694 ioctl(4, BINDER_WRITE_READ, 0x7ff422e268) = 0
09:47:17.199964 ioctl(7, BINDER_WRITE_READ, 0x7ff4236a78) = 0
09:47:17.200665 futex(0x7339a462c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.200762 ioctl(7, BINDER_WRITE_READ, 0x7ff4236ca8) = 0
09:47:17.200844 futex(0x7339a462c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.200929 newfstatat(AT_FDCWD, "timestamp", {st_mode=S_IFREG|0600, st_size=0, ...}, 0) = 0
09:47:17.201304 ioctl(7, BINDER_WRITE_READ, 0x7ff422e138) = 0
09:47:17.201724 mprotect(0x73bbd6e000, 4096, PROT_READ|PROT_WRITE) = 0
09:47:17.201806 mprotect(0x73bbd6e000, 4096, PROT_READ) = 0
09:47:17.202226 futex(0x73bb575080, FUTEX_WAKE_PRIVATE, 2147483647) = 0
09:47:17.202310 ioctl(7, BINDER_WRITE_READ, 0x7ff422e208) = 0
09:47:17.203316 faccessat(AT_FDCWD, "user_0/10041_USRPKEY_no506b3822wb", R_OK|W_OK) = 0
09:47:17.203436 openat(AT_FDCWD, "user_0/10041_USRPKEY_no506b3822wb", O_RDONLY) = 9
09:47:17.203536 read(9, "\3\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32824) = 823
09:47:17.203612 read(9, "", 32001) = 0
09:47:17.203670 close(9) = 0
09:47:17.203756 ioctl(4, BINDER_WRITE_READ
这个是不正常的,这种情况预示这正在等待binder对端的返回。如果是平时的空闲态,则进程的最后打印应该阻塞在如下代码:
00:05:07.682578 ioctl(7, BINDER_WRITE_READ <detached ...>
继续查看qtikeymaster的strace log:
9:47:17.205302 close(6) = 0
09:47:17.205372 madvise(0x765dcb4000, 102400, MADV_DONTNEED) = 0
09:47:17.205447 madvise(0x765dc92000, 102400, MADV_DONTNEED) = 0
09:47:17.205526 openat(AT_FDCWD, "/dev/hab", O_RDWR) = 6
09:47:17.205613 ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0xa, 0x4, 0x10), 0x7fe6247b68) = 0
09:47:17.206340 getuid() = 1000
09:47:17.206412 writev(5, [{iov_base="\0a\1\245s8e\351,J\f", iov_len=11}, {iov_base="\4", iov_len=1}, {iov_base="uhab\0", iov_len=5}, {iov_base="habmm_socket_open: opened fd 6, "..., iov_len=67}], 4) = 84
09:47:17.206709 ioctl(6, _IOC(_IOC_WRITE, 0xa, 0x2, 0x18), 0x7fe6247b70) = 0
09:47:17.206960 ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0xa, 0x3, 0x18)
最后也是卡在ioctl接口的一个read函数。这个函数没有返回值,说明在卡着。由于keystoreservice也是qtikeymaster的一个client,security环境是厂家实现的。卡在了qtikeymaster。
三、确认卡在哪个接口
通过在system/security/keystore/key_store_service.cpp 加LOG,最后确认了,问题APP在联网与不联网情况下初始化时候调用的keystore接口是不一样的:
可联网场景安装:
keystore: del USRPKEY_CMCC_SDK keystore: del USRCERT_CMCC_SDK keystore: del CACERT_CMCC_SDK keystore: del USRPKEY_no506b3822wb keystore: del USRCERT_no506b3822wb keystore: del CACERT_no506b3822wb keystore: attestKey keystore: del USRPKEY_sec506b3822 keystore: del USRCERT_sec506b3822 keystore: del CACERT_sec506b3822
有网络卸载:
keystore: clear_uid
无网络安装:
keystore: del USRPKEY_CMCC_SDK keystore: del USRCERT_CMCC_SDK keystore: del CACERT_CMCC_SDK
无网络卸载:
keystore: clear_uid
最后卡住的接口是keystore: attestKey。问了确认调用函数栈,我们需要命令来在卡住时候打印调用栈:
dump1_system_server.bat
adb shell "debuggerd -b `pidof system_server`" >d:\stracelog\dump_system_server.txt
dump2_keystore.bat
adb shell "debuggerd -b `pidof keystore`" >d:\stracelog\dump_keystore.txt
dump3_keymaster.bat
adb shell "debuggerd -b `pidof android.hardware.keymaster@4.0-service-qti` " >d:\stracelog\dump_keymaster.txt
注意,debuggerd -b 命令是会让当前进程退出block状态继续往下进行的。所以如果锁有嵌套关系,要先dump出来外层的应该用,后dump根层的应用。所以本例中dump的顺序是先system_server,最后是qtikeymaster.
dump的结果是:
----- pid 962 at 2023-10-26 20:47:27 -----
Cmd line: system_server
ABI: 'arm64'
"PackageManager" sysTid=1121
#00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl+4)
#01 pc 00000000000290d0 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000005a5e4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+256)
#03 pc 000000000005b48c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+340)
#04 pc 000000000005b1c0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+176)
#05 pc 0000000000051318 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
#06 pc 000000000012f72c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+152)
#07 pc 0000000000406e34 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transactNative+196)
#08 pc 0000000000c00d5c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.BinderProxy.transact+748)
#09 pc 000000000065e4e8 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.IKeystoreService$Stub$Proxy.clear_uid+216)
#10 pc 0000000000556578 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.security.KeyStore.clearUid+88)
#11 pc 00000000014bdba4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removeKeystoreDataIfNeeded+244)
#12 pc 00000000014bea94 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.removePackageDataLIF+2836)
#13 pc 00000000014900b0 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deleteInstalledPackageLIF+688)
#14 pc 000000000149161c /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageLIF+3532)
#15 pc 00000000014dd554 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService.deletePackageX+2724)
#16 pc 0000000000d97634 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.pm.PackageManagerService$12.run+1204)
#17 pc 0000000000ab737c /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Handler.dispatchMessage+76)
#18 pc 0000000000aba4e0 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.Looper.loop+1264)
#19 pc 0000000000ab9380 /system/framework/arm64/boot-framework.oat (offset 0x3d2000) (android.os.HandlerThread.run+544)
#20 pc 00000000007d57b4 /system/framework/oat/arm64/services.odex (offset 0x64d000) (com.android.server.ServiceThread.run+100)
#21 pc 0000000000554d88 /system/lib64/libart.so (art_quick_invoke_stub+584)
#22 pc 00000000000cf6c8 /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+200)
#23 pc 000000000045c840 /system/lib64/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+104)
#24 pc 000000000045d8fc /system/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+424)
#25 pc 0000000000488a94 /system/lib64/libart.so (art::Thread::CreateCallback(void*)+1120)
#26 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*)+36)
#27 pc 0000000000023478 /system/lib64/libc.so (__start_thread+68)
----- pid 904 at 2023-10-26 20:47:30 -----
Cmd line: /system/bin/keystore
ABI: 'arm64'
"keystore" sysTid=904
#00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl+4)
#01 pc 00000000000290d0 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000001cfdc /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
#03 pc 000000000001e018 /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::waitForResponse(android::hardware::Parcel*, int*)+60)
#04 pc 000000000001cce4 /system/lib64/libhwbinder.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+132)
#05 pc 00000000000148e8 /system/lib64/android.hardware.keymaster@4.0.so (android::hardware::keymaster::V4_0::BpHwKeymasterDevice::_hidl_attestKey(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const&, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> co #06 pc 00000000000184d8 /system/lib64/android.hardware.keymaster@4.0.so (android::hardware::keymaster::V4_0::BpHwKeymasterDevice::attestKey(android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const&, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> const&)>)+160)
#07 pc 00000000000482c4 /system/bin/keystore (android::hardware::keymaster::V4_0::support::Keymaster4::attestKey(android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const&, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> const&)>)+212)
#08 pc 000000000003d604 /system/bin/keystore (keystore::KeyStoreService::attestKey(android::String16 const&, android::security::keymaster::KeymasterArguments const&, android::security::keymaster::KeymasterCertificateChain*, int*)+832)
#09 pc 0000000000016554 /system/lib64/libkeystore_aidl.so (android::security::BnKeystoreService::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+1432)
#10 pc 000000000004fa60 /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+136)
#11 pc 000000000005ab00 /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+520)
#12 pc 000000000005a83c /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+156)
#13 pc 000000000005af04 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+60)
#14 pc 0000000000049484 /system/bin/keystore (main.cfi+1364)
#15 pc 00000000000acac8 /system/lib64/libc.so (__libc_init+88)
"HwBinder:904_1" sysTid=941
#00 pc 000000000006e1c4 /system/lib64/libc.so (__ioctl+4)
#01 pc 00000000000290d0 /system/lib64/libc.so (ioctl+136)
#02 pc 000000000001cfdc /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::talkWithDriver(bool)+240)
#03 pc 000000000001425c /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand()+20)
#04 pc 0000000000014588 /system/lib64/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool)+268)
#05 pc 000000000001c1ac /system/lib64/libhwbinder.so (android::hardware::PoolThread::threadLoop()+24)
#06 pc 000000000000f9d8 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+280)
#07 pc 00000000000819b4 /system/lib64/libc.so (__pthread_start(void*)+36)
#08 pc 0000000000023478 /system/lib64/libc.so (__start_thread+68)
----- end 904 -----
----- pid 354 at 2023-10-26 20:47:33 -----
Cmd line: /vendor/bin/hw/android.hardware.keymaster@4.0-service-qti
ABI: 'arm64'
"keymaster@4.0-s" sysTid=354
#00 pc 000000000006e1c8 /system/lib64/libc.so (__ioctl+8)
#01 pc 00000000000290d0 /system/lib64/libc.so (ioctl+136)
#02 pc 0000000000001260 /vendor/lib64/libuhab.so (habmm_socket_recv+96)
#03 pc 0000000000002150 /vendor/lib64/libkeymasterdeviceutils.so (km_comm_send_cmd+444)
#04 pc 000000000000cd00 /vendor/lib64/libqtikeymaster4.so (keymasterdevice::KeymasterHalDevice::attest_key(keymaster_key_blob_t const*, keymaster_key_param_set_t const*, keymaster_cert_chain_t*)+892)
#05 pc 000000000000919c /vendor/lib64/libqtikeymaster4.so (keymaster::V4_0::Keymaster4Device::attestKey(android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::keymaster::V4_0::KeyParameter> const&, std::__1::function<void (android::hardware::keymaster::V4_0::ErrorCode, android::hardware::hidl_vec<android::hardware::hidl_vec<unsigned char>> const&)>)+132)
#06 pc 000000000001bf18 /system/lib64/vndk-28/android.hardware.keymaster@4.0.so (android::hardware::keymaster::V4_0::BnHwKeymasterDevice::_hidl_attestKey(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+484)
#07 pc 000000000001e820 /system/lib64/vndk-28/android.hardware.keymaster@4.0.so (android::hardware::keymaster::V4_0::BnHwKeymasterDevice::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+2048)
#08 pc 000000000001d334 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::executeCommand(int)+640)
#09 pc 000000000001430c /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::getAndExecuteCommand()+196)
#10 pc 0000000000014588 /system/lib64/vndk-sp-28/libhwbinder.so (android::hardware::IPCThreadState::joinThreadPool(bool)+268)
#11 pc 0000000000000f64 /vendor/bin/hw/android.hardware.keymaster@4.0-service-qti (main+128)
#12 pc 00000000000acac8 /system/lib64/libc.so (__libc_init+88)
----- end 354 -----
最后就是让qtikeymaster查:
keymasterdevice::KeymasterHalDevice::attest_key
四、QSEE log
QSEE下面,需要把tz log导出来给高通分析。tz log可以在问题复现的时候,在QNX终端敲命令获取:
cat /dev/pdbg/qcore/tzdiag/qseecom
cat /dev/pdbg/qcore/tzdiag/log