当前位置: 首页 > news >正文

内存泄漏系列专题分析之二十:camx swap内存泄漏实例分析

【关注我,后续持续新增专题博文,谢谢!!!】

上一篇我们讲了内存泄漏系列专题分析之十九: 开机内存占用超标分析一:cameraserver超标1.8M分析

        这一篇我们开始讲 内存泄漏系列专题分析之二十:swap内存泄漏实例分析

目录

一、背景

二、:swap内存泄漏思路

    2.1:首先确认camx hal进程内存占用是否正常

    2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?

    2.3 :基于原理来分析swap内存泄漏

    2.4 :分析低内存下kernel dump进程内存分布

    2.5 :导出Camx HAL进程的内存细化情况

    2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了

    2.7 :继续压测monkey正式版本,无法复现

    2.8 :基于测试情况对老化版本的猜测

    2.9 :使用老化版本高负载下压测,果然复现

    2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏

    2.11 :分析Camx HAL fdnode代码

    2.12 :加日志debug

    2.13 :解决问题

    2.14 :总结


一、背景

        之前一个项目出现老化过程中,出现相机等各种app ANR爆了。分析发现camx hal进程swap内存持续增长到4.5G,很明显的内存泄漏。而swap内存并没有用户态分配内存接口,它由kernel内核控制,导致内存泄漏分析思路完全颠覆了,常规直接切入swap内存分配无法进行,此时该如何分析呢?

二、:swap内存泄漏思路

    2.1:首先确认camx hal进程内存占用是否正常

1:我们在这份日志里找到 (4,544,364K in swap),即camx hal进程swap内存已经达到4.5G,而此时相机是退出状态,也说明了这个swap内存并没有释放,即使相机关闭状态,也一直驻留在PSS里。

1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip    4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (4,544,364K in swap)

    2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?

        我们下载老化单台机器所有日志进一步确认是否持续增长,结果下载老化几天的日志,dump出camx hal进程内存不同时间段的情况。

        可以发现老化压测几天几夜的过程中,单台机器的Camx HAL进程swap内存一直是持续增长过程,无论相机是否打开,swap内存就一直没有降低或释放过,因此我们断定swap内存泄漏了。

单台机器的详细日志:相机camx hal进程内存增长情况:
1739266999645@1739326404235@2025_02_12_10_28_48@SystemLog.zip    198,013K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (  145,308K in swap)
1739327328830@1739341135895@2025_02_12_14_41_18@SystemLog.zip    894,815K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (  841,472K in swap)
1739341385357@1739357045550@2025_02_12_18_45_45@SystemLog.zip    1,501,088K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (1,447,940K in swap)
1739368747545@1739369513656@2025_02_12_22_13_35@SystemLog.zip    2,253,697K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (2,200,436K in swap)
1739391291701@1739392395773@2025_02_13_04_34_54@SystemLog.zip    3,112,163K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (3,059,024K in swap)
1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip    4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (4,544,364K in swap)

    2.3 :基于原理来分析swap内存泄漏

        这时需要理解swap内存的定义是什么?是否真的内存泄漏了?swap内存的来源是什么?
        swap主要是在内存不够用的时候,将部分内存上的数据交换到swap空间上,以便让系统不会因为内存不够用而导致oom或者更致命的情况出现。当内存使用存在压力的时候,开始触发内存回收行为,就可能会使用swap空间。Swap 把这些不常访问的内存先写到磁盘中,然后释放这些内存,给其他更需要的进程使用。再次访问这些内存时,重新从磁盘读入内存就可以了

        swap内存主要是压缩不活跃的内存,如果内存泄漏了,swap的来源肯定也是持续增长趋势。通常情况下,swap不会去压缩ION内存,更多是压缩malloc分配的ANON等匿名页,也就是进程通过malloc/mmap使用的内存部分。于是我们需要细化观察进程各个内存变化过程。

    2.4 :分析低内存下kernel dump进程内存分布

通常如下:(内存理论知识,后续补充发布)
======= dump_procs
comm             32   uid s f   pid  ppid   oom       vss    anon    file   shmem    swap
vendor.qti.came   0  1047 S    1634     1 -1000   2276752   56001   27834     181  463869
======= dump_procs_dmabuf_info
comm             pid   size
vendor.qti.came   1634 249424

    2.5 :导出Camx HAL进程的内存细化情况

        我们写个脚本导出所有老化过程中这个进程的内存细化情况,如下:
        梳理swap的过程数据,发现swap后期恶化了,过程数据可以说明和相机无关。压测14小时相机swap稳定在150~200M左右,没有增长。大概02-12 08点之后长时间(几个小时)低内存状态,但这个低内存不是相机导致的。swap恶化时间段,相机是没运行的,很可能是长时间低内存导致不断恶化。

        日志详解:

{"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}

表示Camx HAL进程的pid是1634,pss是106M,swap内存272M,合计378M。

kernel_log_20250212_074848.txt:02-12 09:15:23.704673    vendor.qti.came   0  1047 S    1634     1 -1000   1771517   89069   30060     510   79668   //356M 318M

表示AON内存是89069,size是89069 * 4 = 356M,swap内存是79668,size是79668 * 4 = 318M(这里的单位是page,即4K,所以要乘以4)

android_log_20250211_181654.txt:02-11 18:17:58.134540  2419  2704 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"107716","swapPss":"140076","total":"247792"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:13.836740  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"452426","swapPss":"271648","total":"724074"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:15.554367  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"111951","swapPss":"271468","total":"383419"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:18.157295  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106260","swapPss":"274080","total":"380340"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.174628  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"433823","swapPss":"273476","total":"707299"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.519812  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"476763","swapPss":"273424","total":"750187"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:20.854754  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"105258","swapPss":"273416","total":"378674"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:25.353172  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106388","swapPss":"273416","total":"379804"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.068137  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"459636","swapPss":"272712","total":"732348"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.154126  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"486884","swapPss":"272712","total":"759596"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:27.256947  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"493341","swapPss":"273156","total":"766497"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.080545  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"530484","swapPss":"272756","total":"803240"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.599953  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"225294","swapPss":"272752","total":"498046"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:29.195314  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}, len: 80kernel_log_20250212_074848.txt:02-12 09:15:23.704673    vendor.qti.came   0  1047 S    1634     1 -1000   1771517   89069   30060     510   79668   //356M 318M
kernel_log_20250212_074848.txt:02-12 09:15:23.719473    vendor.qti.came   1634 534520
kernel_log_20250212_074848.txt:02-12 09:15:23.732906    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:15:23.779898    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:22:26.605359    vendor.qti.came   0  1047 S    1634     1 -1000   1787105   89807   32791     486   90479	//350M 361M
kernel_log_20250212_074848.txt:02-12 09:22:26.641372    vendor.qti.came   1634 543048
kernel_log_20250212_074848.txt:02-12 09:22:26.676573    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:22:26.713047    vendor.qti.came  1634  37016
kernel_log_20250212_074848.txt:02-12 09:23:22.603924    vendor.qti.came   0  1047 S    1634     1 -1000   1696732   87060   26704     460   90752
kernel_log_20250212_074848.txt:02-12 09:23:22.607066    vendor.qti.came   1634 374220
kernel_log_20250212_074848.txt:02-12 09:23:22.613789    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:23:22.625893    vendor.qti.came  1634  21600
kernel_log_20250212_074848.txt:02-12 09:27:39.210729    vendor.qti.came   0  1047 S    1634     1 -1000   1785854   88981   31053     486   90044
kernel_log_20250212_074848.txt:02-12 09:27:39.221070    vendor.qti.came   1634 545020
kernel_log_20250212_074848.txt:02-12 09:27:39.238331    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:39.251598    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:27:49.209565    vendor.qti.came   0  1047 R    1634     1 -1000   1970717   58090   28009     695  135891   //232M 543M
kernel_log_20250212_074848.txt:02-12 09:27:49.219898    vendor.qti.came   1634 1209240
kernel_log_20250212_074848.txt:02-12 09:27:49.231472    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:49.235120    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:28:46.534259    vendor.qti.came   0  1047 S    1634     1 -1000   1799432   89018   34058     466   90156
kernel_log_20250212_074848.txt:02-12 09:28:46.550716    vendor.qti.came   1634 477708
kernel_log_20250212_074848.txt:02-12 09:28:46.604629    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:28:46.613610    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:30:23.664100    vendor.qti.came   0  1047 S    1634     1 -1000   1740557   87743   30190     464   90309
kernel_log_20250212_074848.txt:02-12 09:30:23.668068    vendor.qti.came   1634 463936
kernel_log_20250212_074848.txt:02-12 09:30:23.671577    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:23.687504    vendor.qti.came  1634  33752
kernel_log_20250212_074848.txt:02-12 09:30:38.004229    vendor.qti.came   0  1047 S    1634     1 -1000   1776584   88696   30085     469   90435
kernel_log_20250212_074848.txt:02-12 09:30:38.008920    vendor.qti.came   1634 488416
kernel_log_20250212_074848.txt:02-12 09:30:38.012524    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:38.014416    vendor.qti.came  1634  10756
kernel_log_20250212_074848.txt:02-12 09:31:30.068130    vendor.qti.came   0  1047 S    1634     1 -1000   1799189   88731   32692     469   90584
kernel_log_20250212_074848.txt:02-12 09:31:30.073191    vendor.qti.came   1634 528200
kernel_log_20250212_074848.txt:02-12 09:31:30.078858    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:31:30.081252    vendor.qti.came  1634  47524
kernel_log_20250212_093326.txt:02-12 09:34:15.483537    vendor.qti.came   0  1047 R    1634     1 -1000   1696184   61990   29558     443   90922  //247m, 363M
kernel_log_20250212_093326.txt:02-12 09:34:15.518644    vendor.qti.came   1634 259644
kernel_log_20250212_093326.txt:02-12 09:34:15.569083    vendor.qti.came   1634 2056
kernel_log_20250212_093326.txt:02-12 09:34:15.576392    vendor.qti.came  1634  50804android_log_20250212_093815.txt:02-12 09:38:20.401365  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"53561","swapPss":"441428","total":"494989"}}, len: 79
android_log_20250212_093815.txt:02-12 09:38:21.495971  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"419354","swapPss":"404776","total":"824130"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:22.117979  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"446997","swapPss":"402052","total":"849049"}}, len: 80
kernel_log_20250212_093326.txt:02-12 09:38:22.903274  5174  5174 I [57697.813535]osvelte: vendor.qti.came   0  1047 S    1634     1 -1000   1795295   89358   29790     526  100309
kernel_log_20250212_093326.txt:02-12 09:38:22.924792  5174  5174 I [57697.835053]osvelte: vendor.qti.came   1634 558480
kernel_log_20250212_093326.txt:02-12 09:38:22.989238  5174  5174 I [57697.899499]osvelte: vendor.qti.came   1634 2056
kernel_log_20250212_093326.txt:02-12 09:38:23.024083  5174  5174 I [57697.934344]osvelte: vendor.qti.came  1634  57436
android_log_20250212_093815.txt:02-12 09:38:23.974748  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"436127","swapPss":"400380","total":"836507"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:25.048338  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"131587","swapPss":"399708","total":"531295"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:45.990464  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"108438","swapPss":"524372","total":"632810"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:46.877925  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"413473","swapPss":"522412","total":"935885"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:47.067209  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"423721","swapPss":"522400","total":"946121"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:48.851122  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"453347","swapPss":"522660","total":"976007"}}, len: 80

    2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了

        于是测试和正面分析同步进行。经过压测大量测试场景,均未发现相机进程内存泄漏。一定程度排除内存泄漏的可能,而swap的写入写出不是用户态控制,由kernel内核态控制。分析内核日志后也无特别异常。

    2.7 :继续压测monkey正式版本,无法复现

压测monkey正式版本发现hal进程crash,测试完,相机退出,hal进程内存也非常正常,swap内存也只有几百M。

    2.8 :基于测试情况对老化版本的猜测

开始怀疑老化版本在高负载情况下才出现,正式版本一般情况下不会出现。

    2.9 :使用老化版本高负载下压测,果然复现

于是使用老化版本高负载下压测并dump内存果然出现heap增长,但不是每次都增长,而是概率性增长,而PSS增长到2G左右,swap内存开始增长。正式版本却很难复现。

    2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏

开启malloc debug进行复测,发现FDnode代码中VendorFDDeinit没有执行cv_face_destroy_tracker导致出现泄漏。其中malloc debug分析方式参考:内存泄漏系列专题分析之五:使用malloc_debug定位C/C++ native heap内存泄露

VOID FDManagerNode::VendorFDLoad(senseTime_lib_struct *p_lib)
{int rc = 0;struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);……………………………………ret = g_senseTime_lib.cv_face_create_tracker(&m_SenseTimeEngine.cv_tracker_handle, NULL, flag);if ((ret != CV_OK) || (m_SenseTimeEngine.cv_tracker_handle == NULL)){CAMX_LOG_ERROR(CamxLogGroupFD, "WIDE: cv_face_create_tracker error: %d", ret);return ret;}
}
VOID FDManagerNode::VendorFDDeinit()
{struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;}CamxAtomicDecU(&m_engineCounter);VendorFDModelDeinit();if (m_attrFeature != NULL){free(m_attrFeature);m_attrFeature = NULL;}if (m_attrEmotion != NULL){free(m_attrEmotion);m_attrEmotion = NULL;}gettimeofday(&mEndTime, NULL);CAMX_LOG_INFO(CamxLogGroupFD, "FD deinit spends %lld ms (pending engine: %d)",((mEndTime.tv_sec - mStartTime.tv_sec) * 1000 + (mEndTime.tv_usec - mStartTime.tv_usec)/1000), m_engineCounter);
}

  .

    2.11 :分析Camx HAL fdnode代码

经过代码分析,FDnode申请和释放是对应关系,并没有特别异常。

申请内存:在VendorFDLoad里调用cv_face_create_tracker

释放内存:在VendorFDDeinit里调用cv_face_destroy_tracker

从代码上看,并没有内存泄漏。

  .

    2.12 :加日志debug

        于是加日志debug,发现FDnode是创建多线程运行,且多个锁操作和原子操作。而在性能极差的情况下,线程同步会出现问题,导致内存handler在构造函数创建线程时已经置空,导致另外一个线程在析构函数未释放。代码如下:

void FDManagerNode::OPLUSFDManagerNode()
{……//多线程这里全局变量置NULL了m_SenseTimeEngine.cv_tracker_handle = NULL;……
}VOID FDManagerNode::VendorFDDeinit()
{//导致这里无法进入是否内存if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;}
}

    2.13 :解决问题

解决线程同步异常问题就解决该问题。

    2.14 :总结

终于有了最终结论,性能极差下,非常低概率下,线程同步异常导致内存泄漏。

【关注我,后续持续新增专题博文,谢谢!!!】

下一篇讲解

http://www.xdnf.cn/news/978517.html

相关文章:

  • Babylon.js引擎(二)
  • 【Chipyard】 conda 环境安装与使用
  • k8s在节点上加污点
  • k8s 部署服务常见错误原因
  • Windows 安装 Maven
  • 1Panel 部署 OpenResty + Redis 实现 IP 动态封禁教程
  • 软考 系统架构设计师系列知识点之杂项集萃(87)
  • Visual Studio 2022 运行提示:未在本地计算机上注册“Microsoft.Jet.OLEDB.4.0”提供程序。
  • jsoncpp ubuntu编译问题
  • Proof of Talk专访CertiK联创顾荣辉:全周期安全方案护航Web3生态
  • Cilium动手实验室: 精通之旅---22.Cilium Traffic Optimization
  • OA协同平台有哪些功能?OA协同办公软件平台如何选择?
  • 腾讯开源 ovCompose 跨平台框架:实现一次跨三端(Android/iOS/鸿蒙)
  • 网络请求与本地存储:Axios 与 AsyncStorage 在 React Native 中的应用
  • 升级 Ubuntu Linux 内核的几种不同方法
  • 同步与异步:软件工程中的时空艺术与实践智慧-以蜻蜓hr人才系统举例-优雅草卓伊凡
  • 二刷苍穹外卖 day02
  • 2023蓝桥杯C/C++ B组国赛
  • PyTorch:让深度学习飞入寻常百姓家(从零开始玩转张量与神经网络!)
  • 开疆智能ModbusTCP转Canopen网关连接汇川PLC配置案例
  • 【android bluetooth 框架分析 04】【bt-framework 层详解 4】【AdapterState介绍】
  • 25-Oracle 23ai DBMS_SEARCH — Ubiquitous Search(无处不在的搜索)
  • Qt Connections详解:信号与槽的核心机制
  • spring boot2 +java-jwt轻量实现jwt
  • AI中间件,构建大模型应用的标准化接入枢纽
  • 文献管理软件EndNote下载与安装教程(详细教程)2025最新版详细图文安装教程
  • 2025年- H80-Lc188--198.打家劫舍(动态规划)--Java版
  • 前端基础知识ES6系列 - 03(数组新增了哪些扩展)
  • SCADA|信创KingSCADA4.0与KingSCADA3.8的几点不同
  • 基于微信小程序的天气预报app