
[C++] Deadlocks: Causes, Diagnosis, and Avoidance

news 2025/7/3 9:21:33

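I. How Deadlocks Arise

A deadlock occurs when two or more threads each wait for a resource that another one holds, so that none of them can make progress. It is a common and serious concurrency problem in multithreaded programs. A deadlock can only occur when all four of the following conditions hold at the same time:

Mutual exclusion: a resource can be held by only one thread at a time.
Hold and wait: a thread already holds at least one resource while waiting for another.
No preemption: a resource cannot be taken away by force; only the holding thread can release it.
Circular wait: the threads form a closed chain in which each one waits for a resource held by the next.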

II. Diagnosing Deadlocks

Suppose the following code needs to be checked for a deadlock.

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx1;
std::mutex mtx2;

void thread_func1() {
    std::lock_guard<std::mutex> lock1(mtx1); // lock mtx1
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulate work
    std::lock_guard<std::mutex> lock2(mtx2); // try to lock mtx2
}

void thread_func2() {
    std::lock_guard<std::mutex> lock1(mtx2); // lock mtx2
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulate work
    std::lock_guard<std::mutex> lock2(mtx1); // try to lock mtx1
}

int main() {
    std::thread t1(thread_func1);
    std::thread t2(thread_func2);
    t1.join();
    t2.join();
    return 0;
}

Because the main thread counts as thread 1, t1 is referred to as thread 2 and t2 as thread 3 in what follows.

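1) Linux commands

Standard Linux commands give a first indication of whether a deadlock has occurred.

1. ps

Check the process state: ps -T -p <pid> -o pid,tid,user,stat,cmd

-T: show all threads of the selected process
-p <pid>: select the process by PID
-o: choose the output columns, given as a comma-separated list of fields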

Ubuntu ❯❯❯ ps -T -p 2311810 -o pid,tid,user,stat,cmd
    PID     TID USER     STAT CMD
2311810 2311810 Raizero+ Sl+  ./deadlock
2311810 2311811 Raizero+ Sl+  ./deadlock
2311810 2311812 Raizero+ Sl+  ./deadlock

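Thread state (STAT): every thread is in state S (interruptible sleep), which means it is waiting for some event such as I/O or a resource. In Sl+, the l marks a multithreaded process and the + a process in the foreground process group. Sleeping is also a perfectly normal state, so this alone does not prove a deadlock. Threads stuck in a mutex deadlock typically show S (D, uninterruptible sleep, is more typical of threads blocked inside the kernel), so further tools are needed to confirm it.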

2. top

Monitor threads and processes in real time: top -H -p <pid>

-H: show individual threads
-p: select the process by PID (process ID)
<pid>: PID of the process suspected of deadlocking

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2292389 Raizero+  20   0   87992   2880   2880 S   0.0   0.0   0:00.00 deadlock
2292390 Raizero+  20   0   87992   2880   2880 S   0.0   0.0   0:00.00 deadlock
2292391 Raizero+  20   0   87992   2880   2880 S   0.0   0.0   0:00.00 deadlock

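Every thread shows 0.0 in %CPU and TIME+ stays at 0:00.00, so none of them is consuming CPU time or doing any work.
The threads are all waiting. Deadlocked threads wait indefinitely without executing anything, so they typically look like this, although an idle but healthy program looks the same.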

2) Tools

1. gdb

Attach gdb to the running process: gdb -p <pid>

-p: attach to a running process
<pid>: ID of the target process

Ubuntu ❯❯❯ gdb -p 2301629
...
__futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=2301630, futex_word=0x7c29d5a00910) at ./nptl/futex-internal.c:57
...

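The output shows a call into futex, the kernel primitive on which glibc builds mutexes and other thread synchronization; a thread parked there is waiting for a lock or a condition. gdb may also report that it cannot find ./nptl/futex-internal.c. That only means the glibc sources are not installed; it is not an error in the program. More digging is needed:

List all threads together with their current frame: info threads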

(gdb) info threads
···
Thread 0x7c29d62ac740 (LWP 2301629) "deadlock" 
__futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=2301630, futex_word=0x7c29d5a00910) 
at ./nptl/futex-internal.c:57
···

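Thread 1 (the main thread) is blocked in a futex wait, waiting for something to be released. The expected value 2301630 is the LWP of thread 2, which is consistent with main waiting in t1.join().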

...
Thread 0x7c29d5a00640 (LWP 2301630) "deadlock" 
futex_wait (private=0, expected=2, futex_word=0x63974eebd1a0 <mtx2>) 
at ../sysdeps/nptl/futex-internal.h:146
...

Thread 2 is waiting for mtx2.

···
Thread 0x7c29d5000640 (LWP 2301631) "deadlock" 
futex_wait (private=0, expected=2, futex_word=0x63974eebd160 <mtx1>) 
at ../sysdeps/nptl/futex-internal.h:146
···

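Thread 3 is waiting for mtx1.

These lines already suggest a deadlock. Thread 2 is waiting for mtx2 and thread 3 is waiting for mtx1; given how the two locks depend on each other, a circular wait is very likely:

Thread 1 is the main thread; it holds neither mutex in this program and is simply waiting for the other threads to finish.
Thread 2 waits for mtx2 while thread 3 waits for mtx1; each is waiting for the other to release its lock, so neither can continue.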

Inspect individual threads: thread <id> switches to a thread and prints its current frame; bt then prints its full backtrace, and thread apply all bt does so for every thread at once.

(gdb) thread 1
[Switching to thread 1 (Thread 0x7c29d62ac740 (LWP 2301629))]
#0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=2301630, futex_word=0x7c29d5a00910) at ./nptl/futex-internal.c:57
57      in ./nptl/futex-internal.c

(gdb) thread 2
[Switching to thread 2 (Thread 0x7c29d5a00640 (LWP 2301630))]
#0  futex_wait (private=0, expected=2, futex_word=0x63974eebd1a0 <mtx2>) at ../sysdeps/nptl/futex-internal.h:146

(gdb) thread 3
[Switching to thread 3 (Thread 0x7c29d5000640 (LWP 2301631))]
#0  futex_wait (private=0, expected=2, futex_word=0x63974eebd160 <mtx1>) at ../sysdeps/nptl/futex-internal.h:146

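Thread 1 is inside __futex_abstimed_wait_common64, that is, it is waiting for a resource or lock to be released.
Threads 2 and 3 are both inside futex_wait, waiting for mtx2 and mtx1 respectively.

This shows that threads 2 and 3 are each waiting for a lock the other holds, the typical signature of a deadlock caused by a circular wait:

Thread 2 waits for mtx2, which is held by thread 3.
Thread 3 waits for mtx1, which is held by thread 2.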

2. valgrind

Use Helgrind, Valgrind's thread-error detector, to look for the deadlock: valgrind --tool=helgrind <program>

Ubuntu ❯❯❯ valgrind --tool=helgrind ./a.out

Where each lock was first acquired
- mtx1 (0x10E160) was first acquired by thread 2:
```
==2305017==  Lock at 0x10E160 was first observed
==2305017==    by 0x109365: thread_func1() (deadlock.cc:9)
```
  This corresponds to deadlock.cc line 9 in thread_func1(), where a std::lock_guard<std::mutex> locks mtx1.
- mtx2 (0x10E1A0) was first acquired by thread 3:
```
==2305017==  Lock at 0x10E1A0 was first observed
==2305017==    by 0x10946C: thread_func2() (deadlock.cc:17)
```
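  This corresponds to deadlock.cc line 17 in thread_func2(), where a std::lock_guard<std::mutex> locks mtx2.
Locks still held when the threads ended
- Thread 2 (holding mtx1) was blocked while trying to acquire mtx2: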
```
==2305017== Thread #2: Exiting thread still holds 1 lock
==2305017==    by 0x1093BA: thread_func1() (deadlock.cc:12)
```
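  This corresponds to deadlock.cc line 12 in thread_func1(), where the thread is stuck acquiring mtx2 (for example std::lock_guard<std::mutex> lock2(mtx2)).
- Thread 3 (holding mtx2) was blocked while trying to acquire mtx1: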
```
==2305017== Thread #3: Exiting thread still holds 1 lock
==2305017==    by 0x1094C1: thread_func2() (deadlock.cc:20)
```
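  This corresponds to deadlock.cc line 20 in thread_func2(), where the thread is stuck acquiring mtx1 (for example std::lock_guard<std::mutex> lock2(mtx1)).

Taken together, the report shows that both threads ended while still holding a lock, so the other thread could never acquire it. Thread #2 (presumably t1 in the code) locks mtx1 in thread_func1() and then tries to acquire mtx2. Thread #3 (presumably t2 in the code) locks mtx2 in thread_func2() and then tries to acquire mtx1. This crossed locking order creates a circular wait, the classic recipe for a deadlock.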

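3. Logging

Add log statements that record when each thread tries to lock, acquires, and releases a resource, together with the resource name, so that the point where the deadlock occurs can be located. For the deadlock example, the logging looks like this: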

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx1;
std::mutex mtx2;

void thread_func1() {
    std::cout << "Thread 1: Trying to lock mtx1\n";
    std::lock_guard<std::mutex> lock1(mtx1); // lock mtx1
    std::cout << "Thread 1: Acquired mtx1\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulate work
    std::cout << "Thread 1: Trying to lock mtx2\n";
    std::lock_guard<std::mutex> lock2(mtx2); // try to lock mtx2
    std::cout << "Thread 1: Acquired mtx2 and completed\n";
}

void thread_func2() {
    std::cout << "Thread 2: Trying to lock mtx2\n";
    std::lock_guard<std::mutex> lock1(mtx2); // lock mtx2
    std::cout << "Thread 2: Acquired mtx2\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulate work
    std::cout << "Thread 2: Trying to lock mtx1\n";
    std::lock_guard<std::mutex> lock2(mtx1); // try to lock mtx1
    std::cout << "Thread 2: Acquired mtx1 and completed\n";
}

int main() {
    std::thread t1(thread_func1);
    std::thread t2(thread_func2);
    t1.join();
    t2.join();
    return 0;
}

Thread 1 stops while trying to lock mtx2: at this point mtx1 is held by thread 1 and mtx2 by thread 2, and each waits for the other to unlock.

Thread 2: Trying to lock mtx2
Thread 2: Acquired mtx2
Thread 2: Trying to lock mtx1
Thread 1: Trying to lock mtx1
Thread 1: Acquired mtx1
Thread 1: Trying to lock mtx2

Thread 2 stops while trying to lock mtx1: the situation is the same, with threads 1 and 2 each holding one lock and waiting for the other.

Thread 1: Trying to lock mtx1
Thread 1: Acquired mtx1
Thread 1: Trying to lock mtx2
Thread 2: Trying to lock mtx2
Thread 2: Acquired mtx2
Thread 2: Trying to lock mtx1

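If the log shows that some threads' lock messages stop halfway, with a "Trying to lock" line never followed by the matching "Acquired" line and the program making no further progress, a deadlock is the likely cause.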
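Writing the log statements by hand around every lock is easy to get wrong. As a minimal sketch, the logging can be folded into an RAII guard. LoggedLock and its message format are hypothetical names introduced here for illustration, not part of the original example.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper, not part of the original example: an RAII guard that
// logs "trying", "acquired" and "releasing" events for a named mutex.
class LoggedLock {
public:
    LoggedLock(std::mutex& mtx, std::string name, std::ostream& out)
        : mtx_(mtx), name_(std::move(name)), out_(out) {
        assert(!name_.empty());  // an unnamed lock would make the log useless
        log("trying");
        mtx_.lock();
        log("acquired");
    }

    ~LoggedLock() {
        log("releasing");  // logged before unlock, so the next owner's "acquired" comes later
        mtx_.unlock();
    }

    LoggedLock(const LoggedLock&) = delete;
    LoggedLock& operator=(const LoggedLock&) = delete;

private:
    void log(const char* event) {
        // Build the whole line first, then write it under a lock so that
        // lines from different threads do not interleave.
        std::ostringstream line;
        line << "tid=" << std::this_thread::get_id() << ' ' << event << ' '
             << name_ << '\n';
        static std::mutex out_mtx;
        std::lock_guard<std::mutex> guard(out_mtx);
        out_ << line.str();
    }

    std::mutex& mtx_;
    std::string name_;
    std::ostream& out_;
};
```

In thread_func1 this would replace the first guard with something like LoggedLock lock1(mtx1, "mtx1", std::clog). A "trying" line without a later "acquired" line for the same thread then marks where that thread is stuck.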

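III. Preventing and Resolving Deadlocks

In practice deadlocks are very hard to rule out completely; the realistic goal is to avoid them as far as possible and to have the means to resolve them when they occur.

1. Use C++ RAII to avoid deadlocks caused by a forgotten unlock

RAII (Resource Acquisition Is Initialization) ties the lifetime of a lock to an object: the constructor acquires the lock and the destructor releases it, which also makes the code exception-safe.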

std::lock_guard

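#include <mutex>

std::mutex mtx;

void update_shared_state() {
    std::lock_guard<std::mutex> lock(mtx); // locks on construction
    // critical section
} // unlocks automatically when the scope ends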

std::unique_lock

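#include <mutex>

std::mutex mtx;

void guarded_operation() {
    try {
        std::unique_lock<std::mutex> lock(mtx);
        // operations that may throw
    } catch (...) {
        // the lock was already released during stack unwinding
    }
}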

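Advantages:

No deadlock from forgetting to call unlock().
Exception safety: even if the critical section throws, the destructor still releases the lock.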
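The exception-safety claim can be checked with a small experiment. The sketch below throws while a std::lock_guard holds the mutex and then verifies that the mutex can be taken again; g_mtx and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_mtx;  // hypothetical shared mutex for this sketch

// Throws while holding the lock; the guard's destructor still unlocks it.
void update_or_throw(bool fail) {
    std::lock_guard<std::mutex> lock(g_mtx);
    if (fail) {
        throw std::runtime_error("simulated failure inside the critical section");
    }
}

// Returns true if the mutex is free again after the exception propagated.
bool mutex_released_after_exception() {
    try {
        update_or_throw(true);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already destroyed the lock_guard at this point.
    }
    // try_lock is formally allowed to fail spuriously, but common
    // implementations succeed on an unlocked mutex.
    if (!g_mtx.try_lock()) {
        return false;
    }
    g_mtx.unlock();
    return true;
}
```

With manual lock() and unlock() calls instead of the guard, the exception would skip the unlock() and the later try_lock() would fail.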

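2. Use std::lock to avoid deadlocks caused by inconsistent lock order

std::lock(mtx1, mtx2, ...) locks several mutexes in one call using a deadlock-avoidance algorithm: it either ends up holding all of them or, if an exception is thrown, none of them. This removes the deadlock that arises when different threads lock the same mutexes in different orders.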

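#include <mutex>

std::mutex mtx1, mtx2;

void update_both() {
    std::unique_lock<std::mutex> lock1(mtx1, std::defer_lock);
    std::unique_lock<std::mutex> lock2(mtx2, std::defer_lock);
    std::lock(lock1, lock2); // locks both without risk of deadlock
    // work on the shared resources
} // both are unlocked automatically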

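Key points:

std::lock only locks; it does not unlock. Either pass it std::unique_lock objects created with std::defer_lock, as above, or lock the raw mutexes and then hand them to guards constructed with std::adopt_lock, which tells the guard that the mutex is already held so it is not locked a second time.
The std::defer_lock form requires std::unique_lock, because a std::lock_guard cannot be created unlocked; std::lock_guard works only with the std::adopt_lock form. Since C++17, std::scoped_lock combines both steps.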
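The std::adopt_lock variant mentioned in the key points can be sketched as follows. Two threads name the mutexes in opposite order, as in the deadlock example, yet both finish because std::lock does the acquiring. The names mtx_a, mtx_b, shared_total, add_locked and run_opposite_order are assumptions made for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

std::mutex mtx_a;      // hypothetical names for this sketch
std::mutex mtx_b;
int shared_total = 0;  // only modified while both mutexes are held

// Locks both mutexes through std::lock, then adopts them into guards.
void add_locked(std::mutex& first, std::mutex& second, int times) {
    assert(&first != &second);  // the two arguments must be different mutexes
    for (int i = 0; i < times; ++i) {
        std::lock(first, second);  // deadlock-free regardless of argument order
        std::lock_guard<std::mutex> g1(first, std::adopt_lock);
        std::lock_guard<std::mutex> g2(second, std::adopt_lock);
        ++shared_total;
    }
}

// Two threads pass the mutexes in opposite order.
int run_opposite_order(int times) {
    shared_total = 0;
    std::thread t1(add_locked, std::ref(mtx_a), std::ref(mtx_b), times);
    std::thread t2(add_locked, std::ref(mtx_b), std::ref(mtx_a), times);
    t1.join();
    t2.join();
    return shared_total;
}
```

With a C++17 compiler, the three lines inside the loop that lock and adopt can be written as a single std::scoped_lock guard(first, second).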

3. Use a recursive mutex when one thread needs to lock repeatedly

std::recursive_mutex allows the same thread to lock it more than once.

#include <mutex>

std::recursive_mutex rmtx;

void nested_locking() {
    std::lock_guard<std::recursive_mutex> lock1(rmtx); // first acquisition
    {
        std::lock_guard<std::recursive_mutex> lock2(rmtx); // same thread locks again
    }
}

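Key points:

A recursive mutex is usually slower than a plain mutex, and every lock() must be matched by an unlock() before another thread can acquire it; RAII guards take care of this.
Prefer restructuring the code so that nested locking is not needed in the first place.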
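A typical situation where re-locking happens is a public member function that calls another public member function of the same object. The following sketch assumes a hypothetical SavingsAccount class; with a plain std::mutex, the inner call would lock a mutex the thread already owns, which is undefined behaviour and usually hangs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class for illustration: deposit_twice() calls deposit()
// while it already holds the lock.
class SavingsAccount {
public:
    void deposit(int amount) {
        assert(amount >= 0);
        std::lock_guard<std::recursive_mutex> lock(mtx_);
        balance_ += amount;
    }

    void deposit_twice(int amount) {
        std::lock_guard<std::recursive_mutex> lock(mtx_);  // first acquisition
        deposit(amount);  // locks mtx_ again on the same thread
        deposit(amount);
    }

    int balance() const {
        std::lock_guard<std::recursive_mutex> lock(mtx_);
        return balance_;
    }

private:
    mutable std::recursive_mutex mtx_;
    int balance_ = 0;
};
```

The restructuring suggested above would instead move the update into a private helper that expects the lock to be held, so that each public function locks exactly once.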

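4. Avoid nested locks

Keep critical sections small and lock only while the shared resource is actually accessed, or move the critical section into its own function, so that fewer locks are held at the same time.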

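#include <mutex>

std::mutex mtx;

void critical_operation() {
    std::lock_guard<std::mutex> lock(mtx);
    // only the operations that must be synchronized
}

void outer_function() {
    // non-critical code
    critical_operation(); // calls the function that takes the lock itself
}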
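One way to shrink a critical section, sketched here under the assumption that the shared data is cheap enough to copy, is to take a snapshot under the lock and do the slow work on the copy. The names samples_mtx, samples, add_sample and sum_of_samples are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <vector>

std::mutex samples_mtx;    // hypothetical names for this sketch
std::vector<int> samples;  // protected by samples_mtx

void add_sample(int value) {
    std::lock_guard<std::mutex> lock(samples_mtx);
    samples.push_back(value);
}

// Copies the data under the lock, then does the (potentially slow) work
// without holding any lock, so this code never waits for a second lock
// while holding the first.
long long sum_of_samples() {
    std::vector<int> snapshot;
    {
        std::lock_guard<std::mutex> lock(samples_mtx);
        snapshot = samples;  // the only step that needs the lock
    }
    return std::accumulate(snapshot.begin(), snapshot.end(), 0LL);
}
```

Because no other function is called while samples_mtx is held, this code cannot become one link in a chain of nested locks. A simple check would be assert(sum_of_samples() == 6) after adding 1, 2 and 3.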

5. Use a fixed lock order

Agree globally that all threads acquire locks in the same order; this breaks the circular-wait condition.

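#include <functional>
#include <mutex>
#include <utility>

// Always lock the mutex with the lower address first. The guards must stay
// alive while the shared data is used, so the critical section is passed in
// as a callable. mtx1 and mtx2 must be two different mutexes.
template <typename F>
void lock_in_order(std::mutex& mtx1, std::mutex& mtx2, F&& critical_section) {
    std::mutex* first = &mtx1;
    std::mutex* second = &mtx2;
    // std::less gives a total order over pointers to unrelated objects
    if (std::less<std::mutex*>{}(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> lock1(*first);
    std::lock_guard<std::mutex> lock2(*second);
    std::forward<F>(critical_section)();
}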

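Ordering by address has caveats: comparing pointers to unrelated objects with < is unspecified in C++, so the result may differ between platforms and compilers. std::less guarantees a total order over pointers, and the order is only meaningful within a single process.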
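As an illustration of a global lock order, the sketch below moves money between two objects that each carry their own mutex. Whatever the direction of the transfer, the mutex of the object with the lower address is locked first. Wallet, transfer and run_transfers are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical type for illustration: each wallet carries its own mutex.
struct Wallet {
    std::mutex mtx;
    int balance;
    explicit Wallet(int initial) : balance(initial) {}
};

void transfer(Wallet& from, Wallet& to, int amount) {
    assert(&from != &to);  // locking the same std::mutex twice is undefined behaviour
    // Every thread agrees on which wallet comes first, whatever the direction.
    const bool from_first = std::less<Wallet*>{}(&from, &to);
    std::mutex& first = from_first ? from.mtx : to.mtx;
    std::mutex& second = from_first ? to.mtx : from.mtx;
    std::lock_guard<std::mutex> lock1(first);
    std::lock_guard<std::mutex> lock2(second);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the combined balance.
int run_transfers(int rounds) {
    Wallet a(1000);
    Wallet b(1000);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If each transfer simply locked from.mtx and then to.mtx, the two threads would take the locks in opposite orders and could deadlock exactly like the example in section II.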

6. Lock timeouts

Use try_lock_for() or try_lock_until() to give up after a timeout instead of waiting forever.

#include <chrono>
#include <mutex>

std::timed_mutex tmtx;

void try_with_timeout() {
    if (tmtx.try_lock_for(std::chrono::milliseconds(100))) {
        // lock acquired
        tmtx.unlock();
    } else {
        // timed out: handle the failure
    }
}
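A timeout only helps if the thread also gives back what it already holds. The following sketch, with hypothetical names res_a, res_b, completed, lock_both_or_retry and run_with_timeouts, takes the first lock, tries the second for a limited time, and on failure releases the first one and retries after a pause. The 10 ms timeout and the different back-off times per thread are arbitrary choices; without different back-offs the threads could keep colliding.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

std::timed_mutex res_a;  // hypothetical resources for this sketch
std::timed_mutex res_b;
int completed = 0;       // only modified while both mutexes are held

// Lock `first`, then try `second` for at most 10 ms. On timeout, release
// `first`, back off for `backoff`, and start over. Returns the attempt count.
int lock_both_or_retry(std::timed_mutex& first, std::timed_mutex& second,
                       std::chrono::milliseconds backoff) {
    assert(&first != &second);
    for (int attempt = 1;; ++attempt) {
        std::unique_lock<std::timed_mutex> l1(first);
        // This constructor calls second.try_lock_for(10 ms).
        std::unique_lock<std::timed_mutex> l2(second, std::chrono::milliseconds(10));
        if (l2.owns_lock()) {
            ++completed;  // critical section: both locks are held
            return attempt;
        }
        l1.unlock();  // roll back so the other thread can proceed
        std::this_thread::sleep_for(backoff);
    }
}

// Two threads take the locks in opposite order, as in the deadlock example.
int run_with_timeouts() {
    completed = 0;
    std::thread t1([] { lock_both_or_retry(res_a, res_b, std::chrono::milliseconds(1)); });
    std::thread t2([] { lock_both_or_retry(res_b, res_a, std::chrono::milliseconds(3)); });
    t1.join();
    t2.join();
    return completed;
}
```

This trades the deadlock for possible retries, so it suits cases where the work done under the first lock can be abandoned cheaply.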

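7. Atomic operations

For simple shared state such as counters or flags, atomic variables remove the need for a lock altogether, and with it the risk of deadlock.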

#include <atomic>

std::atomic<int> counter{0};

void increment() {
    counter.fetch_add(1, std::memory_order_relaxed);
}
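To show the atomic counter under contention, here is a minimal sketch in which several threads increment the same std::atomic<int> without any mutex. The function name count_with_atomics and its parameters are assumptions made for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads increment one counter without a mutex; returns the total.
int count_with_atomics(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // relaxed ordering is enough for a pure counter
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // join makes all increments visible to this thread
    }
    return counter.load();
}
```

No increment is lost and no thread can block another indefinitely. This only works when the shared state fits into a single atomic operation; invariants that span several variables still need a lock.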

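8. Detect and recover

Dynamic detection: record the order in which locks are acquired and check for cycles.
Timeout and rollback: set a timeout; when it expires, release the resources already held and retry.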
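Dynamic detection can be sketched as a small graph check. Each time a thread acquires a lock while holding another one, an edge "held -> wanted" is recorded; a cycle in these edges means that some interleaving can deadlock, even if the current run did not. LockOrderGraph is a hypothetical class written for this illustration, it is not thread-safe on its own, and real detectors track far more detail.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical detector for illustration. Callers must serialize access.
class LockOrderGraph {
public:
    // Record that some thread acquired `wanted` while holding `held`.
    void record(const std::string& held, const std::string& wanted) {
        edges_[held].insert(wanted);
    }

    // True if the recorded orders contain a cycle, i.e. a potential deadlock.
    bool has_cycle() const {
        std::set<std::string> done;
        std::set<std::string> on_path;
        for (const auto& entry : edges_) {
            if (dfs(entry.first, done, on_path)) return true;
        }
        return false;
    }

private:
    // Depth-first search; reaching a node that is already on the current
    // path means the path closes a cycle.
    bool dfs(const std::string& node, std::set<std::string>& done,
             std::set<std::string>& on_path) const {
        if (on_path.count(node)) return true;
        if (done.count(node)) return false;  // already fully explored
        on_path.insert(node);
        const auto it = edges_.find(node);
        if (it != edges_.end()) {
            for (const auto& next : it->second) {
                if (dfs(next, done, on_path)) return true;
            }
        }
        on_path.erase(node);
        done.insert(node);
        return false;
    }

    std::map<std::string, std::set<std::string>> edges_;
};
```

For the example from section II, thread_func1 would contribute the edge mtx1 -> mtx2 and thread_func2 the edge mtx2 -> mtx1, after which has_cycle() reports the inconsistent order.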

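9. Data partitioning

Partition the data so that different threads never access the same resource at the same time, which removes the possibility of deadlock.

Split the data into independent parts and assign each part to a different thread.
This is common when large data sets are processed by multiple threads.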
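A minimal sketch of partitioning: the input is cut into contiguous chunks, each thread sums only its own chunk and writes only to its own result slot, and the partial results are combined after all threads have joined. No mutex is involved, so there is nothing to deadlock on. The function name partitioned_sum and the chunking scheme are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` with `num_threads` workers that share no mutable state.
long long partitioned_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Reads only [begin, end) and writes only partial[t].
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The same idea applies to any per-element work, as long as the partitions really are independent.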
