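Thoroughly fix FunASR's own memory leaks (only a small residual leak remains, from ORT's internal working set and the heap not returning memory to the kernel immediately) by liuyt6515 · Pull Request #2842 · modelscope/FunASR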

liuyt6515 · 2026-04-10T08:10:07Z

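1. Background and goals
The core goal is not to "blindly lower RSS". It is first to use observable instrumentation to localize, stage by stage, the memory behavior of the 2pass WebSocket pipeline, and then to add explicit releases and structural optimizations on the high-risk paths. Key scenarios:
● A single process serving many WebSocket connections (plus comparisons under different load-test concurrency levels)
● How VmRSS changes at three points: after a chunk, after final, and at connection close
● Telling "real leaks" apart from "RSS not dropping back because of allocator/runtime caches"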
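2. Main remediation measures
2.1 End-to-end memory instrumentation (NDJSON + trace_id)
V4 adds a unified memory-tracing facility: each do_decoder call gets a trace_id, and at every key stage the rss_kb is recorded together with its delta from the starting point:
● Entry point and core facility: runtime/onnxruntime/include/memtrace.h, runtime/onnxruntime/src/memtrace.cpp
● Per-stage probes in the WebSocket server: runtime/websocket/bin/websocket-server-2pass.cpp
● Probes along the 2pass processing chain: runtime/onnxruntime/src/funasrruntime.cpp
● Probes at key stages of the online/offline models: runtime/onnxruntime/src/paraformer-online.cpp, runtime/onnxruntime/src/paraformer.cpp
Value: the vague observation that "the container seems to keep growing" is broken down into concrete stages (online encoder, offline forward, result handling, state after trim, etc.), which avoids misattribution.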
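A minimal sketch of what one such probe could record, assuming Linux's /proc/self/status as the RSS source. The names ParseVmRSSKb, ReadVmRSSKb and MemTrace, and the JSON field names, are hypothetical and simplified; they are not the API in memtrace.h. The idea is that each decode call captures its starting RSS once, and every later event reports both the absolute rss_kb and the delta from that start.

```cpp
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: extract VmRSS (in kB) from the text of /proc/self/status.
// Returns 0 if the field is missing.
unsigned long long ParseVmRSSKb(const std::string& status_text) {
  std::istringstream in(status_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {
      // stoull skips leading whitespace and stops at the trailing " kB".
      return std::stoull(line.substr(6));
    }
  }
  return 0;
}

// Reads the current process's resident set size (Linux only).
unsigned long long ReadVmRSSKb() {
  std::ifstream f("/proc/self/status");
  std::stringstream buf;
  buf << f.rdbuf();
  return ParseVmRSSKb(buf.str());
}

// One trace per decode call: remembers the starting RSS so every event can
// report its delta relative to that baseline.
class MemTrace {
 public:
  MemTrace(long long trace_id, unsigned long long start_rss_kb)
      : trace_id_(trace_id), start_kb_(start_rss_kb) {}

  // Formats one NDJSON line for a phase label and an RSS sample.
  std::string Event(const std::string& phase, unsigned long long rss_kb) const {
    char line[256];
    std::snprintf(line, sizeof(line),
                  "{\"trace_id\":%lld,\"phase\":\"%s\",\"rss_kb\":%llu,"
                  "\"delta_kb\":%lld}",
                  trace_id_, phase.c_str(), rss_kb,
                  static_cast<long long>(rss_kb) - static_cast<long long>(start_kb_));
    return line;
  }

 private:
  long long trace_id_;
  unsigned long long start_kb_;
};
```

In a server, a MemTrace would be created at the start of do_decoder with ReadVmRSSKb() as the baseline, and Event(...) called at each stage (WS0, chunk, final, WS9).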

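2.2 Explicit release and trim on the chunk/final paths
In the WebSocket 2pass server path, every chunk result and every final result is followed by the same release step (free the result object, then call malloc_trim on Linux), with a probe to observe the effect:
● File: runtime/websocket/bin/websocket-server-2pass.cpp
Result: RSS visibly drops after chunk/final, often substantially after the first (cold-start) round. It does not always return to the starting point, which is expected with glibc and ORT.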
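The release-then-trim step can be sketched as follows, assuming glibc on Linux. DecodeResult and ReleaseAndTrim are hypothetical stand-ins, not the PR's code. malloc_trim(0) is a glibc extension declared in <malloc.h>; it returns 1 if some memory was released to the system and 0 otherwise, so a 0 return is not an error.

```cpp
#include <memory>
#include <string>
#include <vector>
#if defined(__linux__)
#include <malloc.h>  // malloc_trim (glibc extension)
#endif

// Hypothetical stand-in for a per-chunk or final decode result.
struct DecodeResult {
  std::string text;
  std::vector<float> scores;
};

// Drops the result object, then (on Linux/glibc) asks the allocator to return
// free heap pages to the kernel. Returns malloc_trim's status: 1 if memory was
// released, 0 otherwise (always 0 on non-Linux builds).
int ReleaseAndTrim(std::unique_ptr<DecodeResult>& result) {
  result.reset();  // destructor hands the text/scores buffers back to malloc
#if defined(__linux__)
  return malloc_trim(0);  // pad = 0: keep no extra slack at the top of the heap
#else
  return 0;
#endif
}
```

Freeing the object only returns memory to malloc; whether RSS actually drops depends on whether the allocator can give whole pages back to the kernel, which is why the drop after chunk/final is not guaranteed.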

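2.3 Explicit state reset at the end of a 2pass session (input_finished)
On the input_finished (final) path, session-level reset/end-utterance calls are added for the online state and the decoder, so that state does not accumulate across sentences or segments:
● File: runtime/onnxruntime/src/funasrruntime.cpp
This covers resetting the online ASR/VAD caches, ending the utterance in the offline model, and resetting WFST-related session state.
Result: "stateful" memory after final is now under control, and in most samples the final path shows clear reclamation.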
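One way to picture the end-of-utterance reset, under stated assumptions: TwoPassSession and its cache members are hypothetical, standing in for the online ASR/VAD caches. The point illustrated is that clear() keeps a vector's capacity, so a reset meant to actually free memory swaps with an empty container instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical session object; member names are illustrative, not FunASR's.
// It caches state across the chunks of one utterance.
class TwoPassSession {
 public:
  void OnChunk(const std::vector<float>& feats) {
    asr_cache_.insert(asr_cache_.end(), feats.begin(), feats.end());
    vad_cache_.push_back(feats.size());
  }

  // Called when input_finished (final) arrives: end the utterance and drop
  // every per-utterance cache so nothing carries over into the next one.
  void OnInputFinished() {
    ResetCache(asr_cache_);
    ResetCache(vad_cache_);
    ++utterances_done_;
  }

  std::size_t CachedValues() const { return asr_cache_.size() + vad_cache_.size(); }
  std::size_t CachedCapacity() const { return asr_cache_.capacity() + vad_cache_.capacity(); }
  int UtterancesDone() const { return utterances_done_; }

 private:
  // clear() alone keeps the allocation; swapping with an empty vector frees it.
  template <typename T>
  static void ResetCache(std::vector<T>& v) { std::vector<T>().swap(v); }

  std::vector<float> asr_cache_;
  std::vector<std::size_t> vad_cache_;
  int utterances_done_ = 0;
};
```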

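2.4 Temporary-memory handling on the offline Paraformer path
On the offline Forward path, V4 makes two kinds of changes:
1. On Linux, input tensors use a non-arena allocator (the device allocator).
2. Large temporary containers are released as soon as Run returns, with malloc_trim at suitable points.
● File: runtime/onnxruntime/src/paraformer.cpp
Result: the final path now more clearly shows the pattern "high peak, but it comes back down"; in most cases the residue is reduced to a low level.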
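The effect of releasing temporaries right after Run can be measured with a counting allocator. This is a self-contained illustration, not the PR's code: FakeRun is a hypothetical stand-in for the ORT session call, and the counter only sees C++ container allocations, not ORT's arena or glibc's retained pages. For item 1, a non-arena CPU tensor in the ONNX Runtime C++ API would typically use a MemoryInfo from Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeDefault); the exact call used in the PR is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <vector>

// Global byte counters shared by every CountingAllocator instantiation (C++17).
struct HeapStats {
  static inline std::size_t live = 0;
  static inline std::size_t peak = 0;
  static void Reset() { live = 0; peak = 0; }
};

// Minimal allocator that records live and peak bytes (illustration only).
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) {}
  T* allocate(std::size_t n) {
    HeapStats::live += n * sizeof(T);
    HeapStats::peak = std::max(HeapStats::peak, HeapStats::live);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) {
    HeapStats::live -= n * sizeof(T);
    ::operator delete(p);
  }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using Buf = std::vector<float, CountingAllocator<float>>;

// Hypothetical stand-in for session->Run(): output as large as the input.
Buf FakeRun(const Buf& in) { return Buf(in.size(), 1.0f); }

// Peak bytes when the input buffer stays alive until the end of scope.
std::size_t PeakKeepInput(std::size_t n) {
  HeapStats::Reset();
  Buf input(n, 0.0f);
  Buf output = FakeRun(input);
  Buf post(output.size(), 0.0f);  // post-processing while input is still alive
  return HeapStats::peak;
}

// Peak bytes when the input buffer is freed right after Run().
std::size_t PeakReleaseEarly(std::size_t n) {
  HeapStats::Reset();
  Buf input(n, 0.0f);
  Buf output = FakeRun(input);
  Buf().swap(input);              // free input's storage now, not at scope end
  Buf post(output.size(), 0.0f);
  return HeapStats::peak;
}
```

With 1000 floats per buffer, keeping the input alive peaks at three live buffers (12000 bytes), while releasing it early peaks at two (8000 bytes).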

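2.5 Targeted optimization of online Paraformer (the chunk hot spot)
To address the repeated RSS rises during online chunks seen in the logs, V4 makes these targeted changes:
1. Online-path I/O tensors switch to a non-arena allocator (aligned with the offline main path).
2. CifSearch no longer takes large arguments by value (it now uses references), removing big copies.
3. Large intermediates such as enc_vec, alpha_vec, list_frame and emb_input are released as early as possible, shortening the window in which their peaks overlap.
● File: runtime/onnxruntime/src/paraformer-online.cpp
● Matching declaration changes: runtime/onnxruntime/src/paraformer-online.h
Result: the online path improves, but the conclusion that "the encoder stage is still the main source of growth" still holds. This indicates the cause is not only C++ container objects on the FunASR side but also resident behavior at the ORT/allocator level.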
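The CifSearch change (item 2) is about argument passing. A hedged illustration: Tracked counts deep copies, and CountFires is a toy integrate-and-fire loop standing in for the real CIF search, whose signature and logic may differ. Passing by value copies each large argument on every call; passing by const reference reads it in place.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the (large) payload is deep-copied (C++17 inline static).
struct Tracked {
  static inline int copies = 0;
  std::vector<float> data;
  explicit Tracked(std::size_t n) : data(n) {}
  Tracked(const Tracked& o) : data(o.data) { ++copies; }
  Tracked& operator=(const Tracked& o) { data = o.data; ++copies; return *this; }
  Tracked(Tracked&&) noexcept = default;
  Tracked& operator=(Tracked&&) noexcept = default;
};

// Toy integrate-and-fire: counts how often the running sum of alphas reaches 1.0.
int CountFires(const std::vector<float>& alphas) {
  float acc = 0.0f;
  int fires = 0;
  for (float a : alphas) {
    acc += a;
    if (acc >= 1.0f) { ++fires; acc -= 1.0f; }
  }
  return fires;
}

// By value: every call deep-copies both the encoder output and the alphas.
int CifSearchByValue(Tracked enc, Tracked alphas) {
  (void)enc;  // the real search would also integrate encoder frames
  return CountFires(alphas.data);
}

// By const reference: same result, no copies.
int CifSearchByRef(const Tracked& enc, const Tracked& alphas) {
  (void)enc;
  return CountFires(alphas.data);
}
```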

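2.6 Review of resource reclamation on connection close
V4 also reviewed, and relies on, the logic that frees the online handle and the decoder when a connection closes:
● File: runtime/websocket/bin/websocket-server-2pass.cpp
This includes the on_close flag and the cleanup thread's calls to FunTpassOnlineUninit, FunASRWfstDecoderUninit and similar release functions.
Result: per-connection resources do not pile up without bound, and there is no clear sign of a "pure leak" across requests.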
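The close-then-sweep pattern described above might look roughly like this. ConnRegistry, ConnData and the injected uninit callbacks are hypothetical; in the real server the callbacks would correspond to functions such as FunTpassOnlineUninit and FunASRWfstDecoderUninit, whose actual signatures are not reproduced here. on_close only sets a flag, and the body of one cleanup-thread iteration (Sweep) releases the handles and erases the entry. A real implementation must also make sure no decode is still using a handle when it is freed.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical per-connection record holding opaque engine handles.
struct ConnData {
  std::atomic<bool> closed{false};  // set by the on_close handler
  void* online_handle = nullptr;
  void* decoder_handle = nullptr;
};

class ConnRegistry {
 public:
  using Uninit = std::function<void(void*)>;
  ConnRegistry(Uninit online_uninit, Uninit decoder_uninit)
      : online_uninit_(std::move(online_uninit)),
        decoder_uninit_(std::move(decoder_uninit)) {}

  std::shared_ptr<ConnData> Add(int id) {
    std::lock_guard<std::mutex> lk(mu_);
    auto d = std::make_shared<ConnData>();
    conns_[id] = d;
    return d;
  }

  // on_close only flips the flag; the heavy teardown happens in Sweep().
  void OnClose(int id) {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = conns_.find(id);
    if (it != conns_.end()) it->second->closed = true;
  }

  // One cleanup-thread iteration: release handles of closed connections and
  // drop them from the map. Returns how many connections were removed.
  int Sweep() {
    std::lock_guard<std::mutex> lk(mu_);
    int removed = 0;
    for (auto it = conns_.begin(); it != conns_.end();) {
      if (it->second->closed) {
        ConnData& d = *it->second;
        if (d.online_handle) online_uninit_(d.online_handle);
        if (d.decoder_handle) decoder_uninit_(d.decoder_handle);
        d.online_handle = d.decoder_handle = nullptr;
        it = conns_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t Size() {
    std::lock_guard<std::mutex> lk(mu_);
    return conns_.size();
  }

 private:
  std::mutex mu_;
  std::map<int, std::shared_ptr<ConnData>> conns_;
  Uninit online_uninit_, decoder_uninit_;
};
```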

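3. Problems "substantively solved/mitigated", as measured in this round
Note: "solved" below is meant in the engineering sense (the risk is under control and backed by evidence); it does not mean, in a strict mathematical sense, that "RSS always returns to its initial value".
3.1 Solved: lack of observability made localization impossible
● Memory changes can now be traced precisely per trace_id for every segment of WS0→chunk→final→WS9.
● This replaces the old situation where only docker stats was available and the cause was hard to pin down.
3.2 Mitigated: reclamation after the large peak on the final path
● The Run peak of offline final is still there, but in most samples memory drops clearly afterwards.
● The residue at the end of final is within a controllable range overall (a large improvement over the earlier impression that "every request seems to add a lot").
3.3 Mitigated: state accumulation because sessions were not reset at the end
● With the input_finished session reset in place, stateful memory no longer accumulates from round to round.
3.4 Mitigated: redundant copies on the online path amplifying peaks
● Removing the by-value copy in CifSearch and shortening intermediate-object lifetimes reduced redundant overlapping memory in the C++ layer.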
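4. Problems not fully solved, but clearly attributed
4.1 The encoder stage of online chunks is still the largest source of growth
● Across multiple logs, the main step-up is still concentrated around the encoder Run in ParaformerOnline::ForwardChunk.
● After trim, RSS may still be above that request's WS0 value; this is currently the main source of memory that is "not fully reclaimed".
4.2 RSS not returning to its starting point does not necessarily mean a leak
● There is evidence that connection-level and session-level objects are reclaimed, and that at the end of a whole connection RSS is sometimes lower than at its start.
● The remaining issue looks more like "runtime/allocator retention plus working set" than an obvious object leak.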
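In summary, the following problems are solved or significantly mitigated:
● Leak diagnosis as a black box: it is now a white box, traceable to specific stages and code paths.
● Accumulation of session state: the explicit reset on input_finished essentially removes the risk of state piling up across rounds.
● Result and temporary objects not being released promptly: the release chain after chunk/final is complete, and temporary-object lifetimes are better controlled.
● Unnecessary memory copies on the online path: by-value copies and some lingering large objects have been fixed.
It should also be stated clearly that:
● The RSS step-up caused by the online encoder has not been fully eliminated.
● Current evidence better supports "partially resident/cached behavior" than "a sustained linear leak".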
Key file index
● FunASR-main/runtime/onnxruntime/include/memtrace.h
● FunASR-main/runtime/onnxruntime/src/memtrace.cpp
● FunASR-main/runtime/websocket/bin/websocket-server-2pass.cpp
● FunASR-main/runtime/onnxruntime/src/funasrruntime.cpp
● FunASR-main/runtime/onnxruntime/src/paraformer.cpp
● FunASR-main/runtime/onnxruntime/src/paraformer-online.cpp
● FunASR-main/runtime/onnxruntime/src/paraformer-online.h

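Note: in testing, the new version is 2.5× slower than the original. On a single CPU core, with RTF held at 1, the original version sustained 5 concurrent streams; the new version sustains only 2.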
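Under the simplifying assumption that per-stream compute cost is constant and that streams share one core additively, the reported concurrency translates into a per-stream real-time factor c (compute time divided by audio duration), with N concurrent streams keeping the aggregate RTF at or below 1:

```latex
N\,c \le 1
\quad\Rightarrow\quad
c_{\text{old}} \le \tfrac{1}{5} = 0.2,
\qquad
c_{\text{new}} \le \tfrac{1}{2} = 0.5,
\qquad
\frac{c_{\text{new}}}{c_{\text{old}}} \approx \frac{0.5}{0.2} = 2.5
```

Because N is an integer, these are only bounds: 5 streams fitting but not 6 puts c_old in (1/6, 1/5], and 2 fitting but not 3 puts c_new in (1/3, 1/2]. The true slowdown could therefore lie anywhere from about 1.7× to 3×, with 2.5× as the point estimate.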

gemini-code-assist

Code Review

This pull request introduces a memory tracing framework and several memory management optimizations, including switching to the ONNX Runtime device allocator to reduce arena retention, using smart pointers, and explicitly releasing vector memory via shrink_to_fit and malloc_trim. Feedback identifies a hardcoded local path in the tracing utility and highlights critical performance bottlenecks caused by synchronous file I/O in the logging system and the excessive use of malloc_trim within high-frequency processing loops.

gemini-code-assist · 2026-04-10T08:11:46Z

runtime/onnxruntime/src/memtrace.cpp

+    FILE* fp = std::fopen(e, "a");
+    if (fp) return fp;
+  }
+  FILE* fp2 = std::fopen("/data/project/liuyuntao/FUN_ASR/.cursor/debug-08e994.log", "a");


This hardcoded local path (/data/project/liuyuntao/...) is a leftover from the development environment. It should be removed to ensure portability and avoid errors on systems where this directory structure does not exist.

This should indeed be removed.
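A minimal sketch of the fix the reviewer asks for, assuming the log path comes only from an environment variable. The variable name FUNASR_MEMTRACE_LOG and the function name are hypothetical; the diff excerpt does not show which variable the PR reads. With no hard-coded fallback, tracing is simply disabled when the variable is unset or empty.

```cpp
#include <cstdio>
#include <cstdlib>

// Opens the memtrace log only if the (hypothetical) FUNASR_MEMTRACE_LOG
// environment variable names a file. No developer-machine fallback path.
FILE* OpenMemtraceLogFromEnv() {
  const char* path = std::getenv("FUNASR_MEMTRACE_LOG");
  if (path == nullptr || *path == '\0') return nullptr;  // tracing disabled
  return std::fopen(path, "a");  // may still be nullptr if the path is unwritable
}
```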

gemini-code-assist · 2026-04-10T08:11:46Z

runtime/onnxruntime/src/memtrace.cpp

+  std::lock_guard<std::mutex> lk(g_memtrace_mu);
+  FILE* fp = OpenMemtraceLog();
+  if (!fp) return;
+  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
+                std::chrono::system_clock::now().time_since_epoch())
+                .count();
+  std::fprintf(fp,
+               "{\"sessionId\":\"08e994\",\"hypothesisId\":\"%s\",\"location\":\"memtrace\",\"message\":\"%s\","
+               "\"data\":{\"trace_id\":%lld,\"rss_kb\":%llu,\"delta_from_decoder_start_kb\":%lld,"
+               "\"data_a\":%lld,\"data_b\":%lld},\"timestamp\":%lld}\n",
+               hypothesisId, phase, (long long)trace_id, (unsigned long long)rss, delta, data_a, data_b,
+               (long long)ms);
+  std::fclose(fp);


The logging implementation in MemtraceLog is a major performance bottleneck and likely contributes significantly to the reported 2.5x slowdown:

1. Global Serialization: The use of g_memtrace_mu serializes all logging across all threads, which is a significant bottleneck in a concurrent server environment.
2. File I/O Overhead: Opening and closing the log file on every single log entry is extremely expensive.
3. Frequent System Calls: ReadVmRSSKb (called at line 66) performs file I/O on /proc/self/status every time a log is generated.

Consider using an asynchronous logging approach where logs are buffered and written by a background thread, or keep the file handle open if debugging is enabled.

For production, I recommend removing the logging.

gemini-code-assist · 2026-04-10T08:11:46Z

runtime/websocket/bin/websocket-server-2pass.cpp

+#if defined(__linux__)
+        malloc_trim(0);
+        // #region agent log
+        funasr::MemtraceLog("ws_after_chunk_trim", "WS4t", mem_sess.trace_id(), 0, 0);
+        // #endregion
+#endif


Executing malloc_trim(0) inside the chunk processing loop (while (buffer.size() >= 800 * 2 ...)) is extremely expensive and is likely the primary cause of the 2.5x performance drop mentioned in the PR description. malloc_trim forces the allocator to scan all arenas and release free pages back to the OS, which is a high-latency operation.

It should be removed from the chunk loop and only called at the end of an utterance (when is_final is true) or at a much lower frequency to balance memory footprint and throughput.

That works too; it would speed things up a bit.
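A sketch of the reviewer's suggestion, with hypothetical names: trim on every final, and otherwise at most once every chunk_interval chunks. The interval is a tuning knob assumed here; the PR does not specify one. In the chunk loop the call site would become something like "if (policy.ShouldTrim(is_final)) malloc_trim(0);" inside the existing #if defined(__linux__) guard.

```cpp
#include <cstdint>

// Decides when to call malloc_trim(0): always at the end of an utterance
// (is_final), otherwise once every `chunk_interval` chunks. 0 = final only.
class TrimPolicy {
 public:
  explicit TrimPolicy(std::uint32_t chunk_interval) : interval_(chunk_interval) {}

  bool ShouldTrim(bool is_final) {
    if (is_final) {
      chunks_since_trim_ = 0;
      return true;
    }
    if (interval_ == 0) return false;
    if (++chunks_since_trim_ >= interval_) {
      chunks_since_trim_ = 0;
      return true;
    }
    return false;
  }

 private:
  std::uint32_t interval_;
  std::uint32_t chunks_since_trim_ = 0;
};
```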

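Thoroughly fix FunASR's own memory leaks (only a small residual leak remains, from ORT's internal working set and the heap not returning memory to the kernel immediately)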

cb79c4a

liuyt6515 wants to merge 1 commit into modelscope:main from liuyt6515:bugfix/fix-login-error
