内存泄漏系列专题分析之二十:camx swap内存泄漏实例分析
【关注我,后续持续新增专题博文,谢谢!!!】
上一篇我们讲了:内存泄漏系列专题分析之十九: 开机内存占用超标分析一:cameraserver超标1.8M分析
这一篇我们开始讲: 内存泄漏系列专题分析之二十:swap内存泄漏实例分析
目录
一、背景
二、:swap内存泄漏思路
2.1:首先确认camx hal进程内存占用是否正常
2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?
2.3 :基于原理来分析swap内存泄漏
2.4 :分析低内存下kernel dump进程内存分布
2.5 :导出Camx HAL进程的内存细化情况
2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了
2.7 :继续压测monkey正式版本,无法复现
2.8 :基于测试情况对老化版本的猜测
2.9 :使用老化版本高负载下压测,果然复现
2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏
2.11 :分析Camx HAL fdnode代码
2.12 :加日志debug
2.13 :解决问题
2.14 :总结
一、背景
之前一个项目出现老化过程中,出现相机等各种app ANR爆了。分析发现camx hal进程swap内存持续增长到4.5G,很明显的内存泄漏。而swap内存并没有用户态分配内存接口,它由kernel内核控制,导致内存泄漏分析思路完全颠覆了,常规直接切入swap内存分配无法进行,此时该如何分析呢?
二、:swap内存泄漏思路
2.1:首先确认camx hal进程内存占用是否正常
1:我们在这份日志里找到 (4,544,364K in swap),即camx hal进程swap内存已经达到4.5G,而此时相机是退出状态,也说明了这个swap内存并没有释放,即使相机关闭状态,也一直驻留在PSS里。
1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip 4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) (4,544,364K in swap)
2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?
我们下载老化单台机器所有日志进一步确认是否持续增长,结果下载老化几天的日志,dump出camx hal进程内存不同时间段的情况。
可以发现老化压测几天几夜的过程中,单台机器的Camx HAL进程swap内存一直是持续增长过程,无论相机是否打开,swap内存就一直没有降低或释放过,因此我们断定swap内存泄漏了。
单台机器的详细日志:相机camx hal进程内存增长情况: 1739266999645@1739326404235@2025_02_12_10_28_48@SystemLog.zip 198,013K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) ( 145,308K in swap) 1739327328830@1739341135895@2025_02_12_14_41_18@SystemLog.zip 894,815K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) ( 841,472K in swap) 1739341385357@1739357045550@2025_02_12_18_45_45@SystemLog.zip 1,501,088K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) (1,447,940K in swap) 1739368747545@1739369513656@2025_02_12_22_13_35@SystemLog.zip 2,253,697K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) (2,200,436K in swap) 1739391291701@1739392395773@2025_02_13_04_34_54@SystemLog.zip 3,112,163K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) (3,059,024K in swap) 1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip 4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634) (4,544,364K in swap)
2.3 :基于原理来分析swap内存泄漏
这时需要理解swap内存的定义是什么?是否真的内存泄漏了?swap内存的来源是什么?
swap主要是在内存不够用的时候,将部分内存上的数据交换到swap空间上,以便让系统不会因为内存不够用而导致oom或者更致命的情况出现。当内存使用存在压力的时候,开始触发内存回收行为,就可能会使用swap空间。Swap 把这些不常访问的内存先写到磁盘中,然后释放这些内存,给其他更需要的进程使用。再次访问这些内存时,重新从磁盘读入内存就可以了swap内存主要是压缩不活跃的内存,如果内存泄漏了,swap的来源肯定也是持续增长趋势。通常情况下,swap不会去压缩ION内存,更多是压缩malloc分配的ANON等匿名页,也就是进程通过malloc/mmap使用的内存部分。于是我们需要细化观察进程各个内存变化过程。
2.4 :分析低内存下kernel dump进程内存分布
通常如下:(内存理论知识,后续补充发布)
======= dump_procs
comm 32 uid s f pid ppid oom vss anon file shmem swap
vendor.qti.came 0 1047 S 1634 1 -1000 2276752 56001 27834 181 463869
======= dump_procs_dmabuf_info
comm pid size
vendor.qti.came 1634 249424
2.5 :导出Camx HAL进程的内存细化情况
我们写个脚本导出所有老化过程中这个进程的内存细化情况,如下:
梳理swap的过程数据,发现swap后期恶化了,过程数据可以说明和相机无关。压测14小时相机swap稳定在150~200M左右,没有增长。大概02-12 08点之后长时间(几个小时)低内存状态,但这个低内存不是相机导致的。swap恶化时间段,相机是没运行的,很可能是长时间低内存导致不断恶化。日志详解:
{"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}
表示Camx HAL进程的pid是1634,pss是106M,swap内存272M,合计378M。
kernel_log_20250212_074848.txt:02-12 09:15:23.704673 vendor.qti.came 0 1047 S 1634 1 -1000 1771517 89069 30060 510 79668 //356M 318M
表示AON内存是89069,size是89069 * 4 = 356M,swap内存是79668,size是79668 * 4 = 318M(这里的单位是page,即4K,所以要乘以4)
android_log_20250211_181654.txt:02-11 18:17:58.134540 2419 2704 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"107716","swapPss":"140076","total":"247792"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:13.836740 2419 2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"452426","swapPss":"271648","total":"724074"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:15.554367 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"111951","swapPss":"271468","total":"383419"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:18.157295 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106260","swapPss":"274080","total":"380340"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.174628 2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"433823","swapPss":"273476","total":"707299"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.519812 2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"476763","swapPss":"273424","total":"750187"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:20.854754 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"105258","swapPss":"273416","total":"378674"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:25.353172 2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106388","swapPss":"273416","total":"379804"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.068137 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"459636","swapPss":"272712","total":"732348"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.154126 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"486884","swapPss":"272712","total":"759596"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:27.256947 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"493341","swapPss":"273156","total":"766497"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.080545 2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"530484","swapPss":"272756","total":"803240"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.599953 2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"225294","swapPss":"272752","total":"498046"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:29.195314 2419 2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}, len: 80kernel_log_20250212_074848.txt:02-12 09:15:23.704673 vendor.qti.came 0 1047 S 1634 1 -1000 1771517 89069 30060 510 79668 //356M 318M
kernel_log_20250212_074848.txt:02-12 09:15:23.719473 vendor.qti.came 1634 534520
kernel_log_20250212_074848.txt:02-12 09:15:23.732906 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:15:23.779898 vendor.qti.came 1634 57436
kernel_log_20250212_074848.txt:02-12 09:22:26.605359 vendor.qti.came 0 1047 S 1634 1 -1000 1787105 89807 32791 486 90479 //350M 361M
kernel_log_20250212_074848.txt:02-12 09:22:26.641372 vendor.qti.came 1634 543048
kernel_log_20250212_074848.txt:02-12 09:22:26.676573 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:22:26.713047 vendor.qti.came 1634 37016
kernel_log_20250212_074848.txt:02-12 09:23:22.603924 vendor.qti.came 0 1047 S 1634 1 -1000 1696732 87060 26704 460 90752
kernel_log_20250212_074848.txt:02-12 09:23:22.607066 vendor.qti.came 1634 374220
kernel_log_20250212_074848.txt:02-12 09:23:22.613789 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:23:22.625893 vendor.qti.came 1634 21600
kernel_log_20250212_074848.txt:02-12 09:27:39.210729 vendor.qti.came 0 1047 S 1634 1 -1000 1785854 88981 31053 486 90044
kernel_log_20250212_074848.txt:02-12 09:27:39.221070 vendor.qti.came 1634 545020
kernel_log_20250212_074848.txt:02-12 09:27:39.238331 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:39.251598 vendor.qti.came 1634 57436
kernel_log_20250212_074848.txt:02-12 09:27:49.209565 vendor.qti.came 0 1047 R 1634 1 -1000 1970717 58090 28009 695 135891 //232M 543M
kernel_log_20250212_074848.txt:02-12 09:27:49.219898 vendor.qti.came 1634 1209240
kernel_log_20250212_074848.txt:02-12 09:27:49.231472 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:49.235120 vendor.qti.came 1634 57436
kernel_log_20250212_074848.txt:02-12 09:28:46.534259 vendor.qti.came 0 1047 S 1634 1 -1000 1799432 89018 34058 466 90156
kernel_log_20250212_074848.txt:02-12 09:28:46.550716 vendor.qti.came 1634 477708
kernel_log_20250212_074848.txt:02-12 09:28:46.604629 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:28:46.613610 vendor.qti.came 1634 57436
kernel_log_20250212_074848.txt:02-12 09:30:23.664100 vendor.qti.came 0 1047 S 1634 1 -1000 1740557 87743 30190 464 90309
kernel_log_20250212_074848.txt:02-12 09:30:23.668068 vendor.qti.came 1634 463936
kernel_log_20250212_074848.txt:02-12 09:30:23.671577 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:23.687504 vendor.qti.came 1634 33752
kernel_log_20250212_074848.txt:02-12 09:30:38.004229 vendor.qti.came 0 1047 S 1634 1 -1000 1776584 88696 30085 469 90435
kernel_log_20250212_074848.txt:02-12 09:30:38.008920 vendor.qti.came 1634 488416
kernel_log_20250212_074848.txt:02-12 09:30:38.012524 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:38.014416 vendor.qti.came 1634 10756
kernel_log_20250212_074848.txt:02-12 09:31:30.068130 vendor.qti.came 0 1047 S 1634 1 -1000 1799189 88731 32692 469 90584
kernel_log_20250212_074848.txt:02-12 09:31:30.073191 vendor.qti.came 1634 528200
kernel_log_20250212_074848.txt:02-12 09:31:30.078858 vendor.qti.came 1634 2056
kernel_log_20250212_074848.txt:02-12 09:31:30.081252 vendor.qti.came 1634 47524
kernel_log_20250212_093326.txt:02-12 09:34:15.483537 vendor.qti.came 0 1047 R 1634 1 -1000 1696184 61990 29558 443 90922 //247m, 363M
kernel_log_20250212_093326.txt:02-12 09:34:15.518644 vendor.qti.came 1634 259644
kernel_log_20250212_093326.txt:02-12 09:34:15.569083 vendor.qti.came 1634 2056
kernel_log_20250212_093326.txt:02-12 09:34:15.576392 vendor.qti.came 1634 50804android_log_20250212_093815.txt:02-12 09:38:20.401365 2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"53561","swapPss":"441428","total":"494989"}}, len: 79
android_log_20250212_093815.txt:02-12 09:38:21.495971 2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"419354","swapPss":"404776","total":"824130"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:22.117979 2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"446997","swapPss":"402052","total":"849049"}}, len: 80
kernel_log_20250212_093326.txt:02-12 09:38:22.903274 5174 5174 I [57697.813535]osvelte: vendor.qti.came 0 1047 S 1634 1 -1000 1795295 89358 29790 526 100309
kernel_log_20250212_093326.txt:02-12 09:38:22.924792 5174 5174 I [57697.835053]osvelte: vendor.qti.came 1634 558480
kernel_log_20250212_093326.txt:02-12 09:38:22.989238 5174 5174 I [57697.899499]osvelte: vendor.qti.came 1634 2056
kernel_log_20250212_093326.txt:02-12 09:38:23.024083 5174 5174 I [57697.934344]osvelte: vendor.qti.came 1634 57436
android_log_20250212_093815.txt:02-12 09:38:23.974748 2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"436127","swapPss":"400380","total":"836507"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:25.048338 2419 2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"131587","swapPss":"399708","total":"531295"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:45.990464 2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"108438","swapPss":"524372","total":"632810"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:46.877925 2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"413473","swapPss":"522412","total":"935885"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:47.067209 2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"423721","swapPss":"522400","total":"946121"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:48.851122 2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"453347","swapPss":"522660","total":"976007"}}, len: 80
2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了
于是测试和正面分析同步进行。经过压测大量测试场景,均未发现相机进程内存泄漏。一定程度排除内存泄漏的可能,而swap的写入写出不是用户态控制,由kernel内核态控制。分析内核日志后也无特别异常。
2.7 :继续压测monkey正式版本,无法复现
压测monkey正式版本发现hal进程crash,测试完,相机退出,hal进程内存也非常正常,swap内存也只有几百M。
2.8 :基于测试情况对老化版本的猜测
开始怀疑老化版本在高负载情况下才出现,正式版本一般情况下不会出现。
2.9 :使用老化版本高负载下压测,果然复现
于是使用老化版本高负载下压测并dump内存果然出现heap增长,但不是每次都增长,而是概率性增长,而PSS增长到2G左右,swap内存开始增长。正式版本却很难复现。
2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏
开启malloc debug进行复测,发现FDnode代码中VendorFDDeinit没有执行cv_face_destroy_tracker导致出现泄漏。其中malloc debug分析方式参考:内存泄漏系列专题分析之五:使用malloc_debug定位C/C++ native heap内存泄露
VOID FDManagerNode::VendorFDLoad(senseTime_lib_struct *p_lib) {int rc = 0;struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);……………………………………ret = g_senseTime_lib.cv_face_create_tracker(&m_SenseTimeEngine.cv_tracker_handle, NULL, flag);if ((ret != CV_OK) || (m_SenseTimeEngine.cv_tracker_handle == NULL)){CAMX_LOG_ERROR(CamxLogGroupFD, "WIDE: cv_face_create_tracker error: %d", ret);return ret;} } VOID FDManagerNode::VendorFDDeinit() {struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;}CamxAtomicDecU(&m_engineCounter);VendorFDModelDeinit();if (m_attrFeature != NULL){free(m_attrFeature);m_attrFeature = NULL;}if (m_attrEmotion != NULL){free(m_attrEmotion);m_attrEmotion = NULL;}gettimeofday(&mEndTime, NULL);CAMX_LOG_INFO(CamxLogGroupFD, "FD deinit spends %lld ms (pending engine: %d)",((mEndTime.tv_sec - mStartTime.tv_sec) * 1000 + (mEndTime.tv_usec - mStartTime.tv_usec)/1000), m_engineCounter); }
.
2.11 :分析Camx HAL fdnode代码
经过代码分析,FDnode申请和释放是对应关系,并没有特别异常。
申请内存:在VendorFDLoad里调用cv_face_create_tracker
释放内存:在VendorFDDeinit里调用cv_face_destroy_tracker
从代码上看,并没有内存泄漏。
.
2.12 :加日志debug
于是加日志debug,发现FDnode是创建多线程运行,且多个锁操作和原子操作。而在性能极差的情况下,线程同步会出现问题,导致内存handler在构造函数创建线程时已经置空,导致另外一个线程在析构函数未释放。代码如下:
void FDManagerNode::OPLUSFDManagerNode() {……//多线程这里全局变量置NULL了m_SenseTimeEngine.cv_tracker_handle = NULL;…… }VOID FDManagerNode::VendorFDDeinit() {//导致这里无法进入是否内存if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;} }
2.13 :解决问题
解决线程同步异常问题就解决该问题。
2.14 :总结
终于有了最终结论,性能极差下,非常低概率下,线程同步异常导致内存泄漏。
【关注我,后续持续新增专题博文,谢谢!!!】
下一篇讲解: