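I: Background
1. The story
A while ago a friend came to me saying that his program's memory usage would surge, and asked me to find out what was going on. He also mentioned that it runs on Linux. Frankly, analyzing a .NET program on Linux is considerably harder, because the open-source tools available there mainly target C/C++ and know nothing about .NET. In the end, Microsoft's debugging support on Linux is still far from sufficient.
Even knowing it could be hard, the analysis still had to be done, so I asked my friend to capture a dump and let WinDbg do the talking.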
II: WinDbg analysis
1. Where exactly is the leak?
Every process has memory regions, so a Linux dump can be inspected with the !address -summary command just like a Windows one.
- 0:000> !address -summary
- --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- <unknown> 1607 ffffffff`cd7a9e00 ( 16.000 EB) 100.00% 100.00%
- Image 41699 0`31e57200 ( 798.340 MB) 0.00% 0.00%
- --- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- 1247 fffffffe`1c910000 ( 16.000 EB) 100.00%
- MEM_PRIVATE 42059 1`e2cf1000 ( 7.544 GB) 0.00% 0.00%
- --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- 1247 fffffffe`1c910000 ( 16.000 EB) 100.00% 100.00%
- MEM_COMMIT 42059 1`e2cf1000 ( 7.544 GB) 0.00% 0.00%
- --- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- PAGE_READWRITE 41067 1`cff54000 ( 7.249 GB) 0.00% 0.00%
- PAGE_READONLY 644 0`07268000 ( 114.406 MB) 0.00% 0.00%
- PAGE_EXECUTE_READ 223 0`06d1f000 ( 109.121 MB) 0.00% 0.00%
- PAGE_EXECUTE_WRITECOPY 125 0`04e16000 ( 78.086 MB) 0.00% 0.00%
- --- Largest Region by Usage ----------- Base Address -------- Region Size ----------
- <unknown> 7ffc`78f8e000 ffff8003`86672000 ( 16.000 EB)
- Image 7ff8`49102000 0`10000000 ( 256.000 MB)
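A quick note here: I have found that many readers are unsure what the 16.000 EB means. It is essentially 2 to the power of 64 bytes, i.e. the size of the full 64-bit address range rather than memory the process actually uses, so the <unknown> row can be ignored.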
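The sizes in the summary can be checked by hand. The following is a minimal Python sketch, not part of the original analysis, that converts the hex values WinDbg prints (with the backtick separator) into bytes. The helper name parse_windbg_size is hypothetical. It also shows that the "GB" and "EB" WinDbg prints are binary units (2^30 and 2^60 bytes).

```python
def parse_windbg_size(text: str) -> int:
    """Convert a WinDbg 64-bit hex value such as "1`e2cf1000" to an integer.

    WinDbg separates the high and low 32 bits with a backtick, so removing
    the backtick leaves a plain hexadecimal number.
    """
    return int(text.replace("`", ""), 16)


# Values copied from the !address -summary output above.
commit = parse_windbg_size("1`e2cf1000")          # MEM_COMMIT row
unknown = parse_windbg_size("ffffffff`cd7a9e00")  # <unknown> row

print(f"MEM_COMMIT: {commit} bytes = {commit / 2**30:.3f} GiB")
print(f"<unknown>:  {unknown / 2**60:.3f} EiB")
print(f"2**64 bytes = {2**64 // 2**60} EiB")
```

The <unknown> value is just short of 2^64 bytes, which is why it rounds to 16.000 EB in the summary.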
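The MEM_COMMIT value of 7.544 GB in the output shows that committed memory is substantial. Next, use !eeheap -gc to see how much of it the managed heap accounts for.
- 0:000> !eeheap -gc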
- ========================================
- Number of GC Heaps: 1
- ----------------------------------------
- generation 0 starts at 7ff688f78f10
- generation 1 starts at 7ff688484e70
- generation 2 starts at 7ff8f7fff000
- ephemeral segment allocation context: none
- Small object heap
- segment begin allocated committed allocated size committed size
- 7ff63fffa000 7ff63fffb000 7ff64fe51d80 7ff64fe5d000 0xfe56d80 (266694016) 0xfe63000 (266743808)
- 7ff6772c8000 7ff6772c9000 7ff6872c2d38 7ff6872c8000 0xfff9d38 (268410168) 0x10000000 (268435456)
- 7ff74bffe000 7ff74bfff000 7ff75bffdfc0 7ff75bffe000 0xfffefc0 (268431296) 0x10000000 (268435456)
- 7ff773ffe000 7ff773fff000 7ff783ffdfc8 7ff783ffe000 0xfffefc8 (268431304) 0x10000000 (268435456)
- 7ff849102000 7ff849103000 7ff859101fe8 7ff859102000 0xfffefe8 (268431336) 0x10000000 (268435456)
- 7ff8f7ffe000 7ff8f7fff000 7ff907ffce88 7ff907ffe000 0xfffde88 (268426888) 0x10000000 (268435456)
- 7ff6872ca000 7ff6872cb000 7ff68a768438 7ff68aa4b000 0x349d438 (55170104) 0x3781000 (58200064)
- Large object heap starts at 7ff907fff000
- segment begin allocated committed allocated size committed size
- 7ff733ff8000 7ff733ff9000 7ff73aedd058 7ff73aefe000 0x6ee4058 (116277336) 0x6f06000 (116416512)
- 7ff743ffc000 7ff743ffd000 7ff744358f10 7ff744379000 0x35bf10 (3522320) 0x37d000 (3657728)
- 7ff7a3ffe000 7ff7a3fff000 7ff7a9d63ee0 7ff7a9d84000 0x5d64ee0 (97930976) 0x5d86000 (98066432)
- 7ff7bbffe000 7ff7bbfff000 7ff7c3dc1090 7ff7c3de2000 0x7dc2090 (131866768) 0x7de4000 (132005888)
- 7ff907ffe000 7ff907fff000 7ff90f048b30 7ff90f069000 0x7049b30 (117742384) 0x706b000 (117878784)
- Pinned object heap starts at 7ff90ffff000
- segment begin allocated committed allocated size committed size
- 7ff90fffe000 7ff90ffff000 7ff9102d15b0 7ff9102d2000 0x2d25b0 (2958768) 0x2d4000 (2965504)
- ------------------------------
- GC Allocated Heap Size: Size: 0x7f36bca0 (2134293664) bytes.
- GC Committed Heap Size: Size: 0x7f710000 (2138112000) bytes.
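The output shows that the GC committed size is only about 2.13 GB (2,138,112,000 bytes), a long way from 7.5 GB, which suggests this is the most complicated kind of case: an unmanaged memory leak.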
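The gap between the two numbers is easy to quantify. Below is a small Python sketch of the arithmetic, using only the two values printed above. The variable names are my own, and the result only says how much committed memory lies outside the GC heap, not what owns it.

```python
GIB = 2 ** 30  # WinDbg's "GB" in !address -summary is a binary gigabyte

# Values copied from the debugger output above.
total_commit = 0x1E2CF1000  # MEM_COMMIT from !address -summary
gc_committed = 0x7F710000   # GC Committed Heap Size from !eeheap -gc

# Committed memory that the GC heap does not explain.
outside_gc = total_commit - gc_committed
share = outside_gc / total_commit

print(f"total commit : {total_commit / GIB:.2f} GiB")
print(f"GC committed : {gc_committed / GIB:.2f} GiB")
print(f"outside GC   : {outside_gc / GIB:.2f} GiB ({share:.0%} of commit)")
```

Note that 2,138,112,000 bytes is 2.13 GB in decimal units but about 1.99 GiB in the binary units WinDbg uses, so the real gap is slightly larger than a naive "7.5 minus 2.13" suggests.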
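2. Unmanaged leak analysis
A .NET debugger, like a doctor, should do everything possible to treat the patient. So where do we look next? Keep in mind that all memory usage ultimately shows up in the virtual address space. A search there turned up roughly 40,000+ DLLs, in line with the 41,699 Image regions in the !address -summary output. How can one program possibly have that many dynamic link libraries? The screenshot is as follows: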

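Having found something suspicious, we keep digging. The next question is whether these DLLs were created by managed or unmanaged code, and elimination is enough here: if managed code created them, each one must be a module under some Assembly, so we can check the loader heap.
- 0:000> !eeheap -loader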
- Loader Heap:
- --------------------------------------
- ...
- Module 00007ff95e265778: Size: 0x0 (0) bytes.
- Module 00007ff95e2661e0: Size: 0x0 (0) bytes.
- Module 00007ff95e266c48: Size: 0x0 (0) bytes.
- Module 00007ff95e2676b0: Size: 0x0 (0) bytes.
- Module 00007ff95e268118: Size: 0x0 (0) bytes.
- Module 00007ff95e268b80: Size: 0x0 (0) bytes.
- Module 00007ff95e2695e8: Size: 0x0 (0) bytes.
- Total size: Size: 0x0 (0) bytes.
- --------------------------------------
- Total LoaderHeap size: Size: 0x4bf4b000 (1274327040) bytes total, 0x4da000 (5087232) bytes wasted.
- =======================================
- 0:000> !dumpmodule 00007ff95e2695e8
- Name: *75db8939-8b3a-4075-94ac-e9bb52acf9d1#40147-0.dll
- Attributes: PEFile IsInMemory IsFileLayout
- ...
- MetaData start address: 00007FF85BBCD330 (2068 bytes)
- 0:000> !DumpAssembly /d 00007ff80d71bba0
- Parent Domain: 0000559009249080
- Name: Unknown
- ClassLoader: 00007FF80D71BC00
- Module
- 00007ff95e2695e8 *75db8939-8b3a-4075-94ac-e9bb52acf9d1#40147-0.dll
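From the output, the loader heap itself is only 1.27 GB, but it has plenty of associated memory besides, and there are as many as 40,000+ dynamic modules. Next, use !dumpmodule -mt to see what types one of them contains.
- 0:000> !dumpmodule -mt 00007ff95e2695e8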
- Name: *75db8939-8b3a-4075-94ac-e9bb52acf9d1#40147-0.dll
- Attributes: PEFile IsInMemory IsFileLayout
- ...
- Types defined in this module
- MT TypeDef Name
- ------------------------------------------------------------------------------
- 00007ff95e269d80 0x02000004 Submission#0
- 00007ff95e269ec0 0x02000005 Submission#0+<>d__0
- Types referenced in this module
- MT TypeRef Name
- ------------------------------------------------------------------------------
- 00007ff92d6152a8 0x0200000c System.Object
- 00007ff92ead4d18 0x0200000d System.Threading.Tasks.Task`1
- 00007ff93133f298 0x0200000e System.Runtime.CompilerServices.IAsyncStateMachine
- 00007ff92d6cec90 0x0200000f System.Exception
- 00007ff93133f648 0x02000010 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1
- 0:000> !dumpmt -md 00007ff95e269ec0
- ...
- MethodDesc Table
- Entry MethodDesc JIT Name
- 00007FF92D620030 00007ff92d615238 JIT System.Object.Finalize()
- 00007FF92D620038 00007ff92d615248 PreJIT System.Object.ToString()
- 00007FF92D620040 00007ff92d615258 JIT System.Object.Equals(System.Object)
- 00007FF92D620058 00007ff92d615298 JIT System.Object.GetHashCode()
- 00007FF95DFFB0E0 00007ff95e269e58 JIT Submission#0+<>d__0.MoveNext()
- 00007FF95DFF9B70 00007ff95e269e78 NONE Submission#0+<>d__0.SetStateMachine(System.Runtime.CompilerServices.IAsyncStateMachine)
- 00007FF95DFF9B60 00007ff95e269e48 JIT Submission#0+<>d__0..ctor()
The output only shows classes prefixed with Submission and their associated state machine classes, with no hint of who created them, so we are stuck again.
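To put a number on the dynamic modules rather than eyeballing the list, the !eeheap -loader output can be saved to a text file and counted. Below is a minimal Python sketch of that idea. The helper name summarize_modules is hypothetical, and the sample text is just the seven Module lines shown earlier, not the full dump.

```python
import re

# Matches lines such as: "Module 00007ff95e265778: Size: 0x0 (0) bytes."
MODULE_LINE = re.compile(
    r"Module\s+([0-9a-fA-F]+):\s+Size:\s+0x[0-9a-fA-F]+\s+\((\d+)\)\s+bytes"
)


def summarize_modules(loader_output: str) -> tuple[int, int]:
    """Return (module count, summed size in bytes) for !eeheap -loader text."""
    count = 0
    total_bytes = 0
    for line in loader_output.splitlines():
        match = MODULE_LINE.search(line)
        if match:
            count += 1
            total_bytes += int(match.group(2))
    return count, total_bytes


# Excerpt copied from the !eeheap -loader output above.
sample = """\
Module 00007ff95e265778: Size: 0x0 (0) bytes.
Module 00007ff95e2661e0: Size: 0x0 (0) bytes.
Module 00007ff95e266c48: Size: 0x0 (0) bytes.
Module 00007ff95e2676b0: Size: 0x0 (0) bytes.
Module 00007ff95e268118: Size: 0x0 (0) bytes.
Module 00007ff95e268b80: Size: 0x0 (0) bytes.
Module 00007ff95e2695e8: Size: 0x0 (0) bytes.
Total size: Size: 0x0 (0) bytes.
"""

count, total_bytes = summarize_modules(sample)
print(f"{count} modules, {total_bytes} bytes attributed to them")
```

Run against the full log, the count should land near the 40,000+ figure if the dynamic modules really dominate the list.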
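3. Who is the culprit?
A good way to capture the creation of dynamic assemblies is the cross-platform dotnet-trace tool, which can record assembly load events. For details see: https://learn.microsoft.com/en-us/dotnet/core/dependency-loading/collect-details . I then asked my friend to let it run for 30 minutes, with a command along these lines:
- dotnet-trace collect -p 4108 --clrevents loader --duration 00:00:30:00
Once the dotnet_xxxx.nettrace file has been generated, it can be opened in PerfView. Open the Events view and search for the AssemblyLoad event. The screenshot is as follows: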

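Judging by the shared 748 prefix in the Time MSec column, dozens of dynamic assemblies are generated within that single second. Next, right-click and choose Open Any Stacks to see which code triggers the loads. The screenshot is as follows: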

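The PerfView output shows that the XXXCusDis method internally calls Microsoft.CodeAnalysis.CSharp.Scripting.CSharpScript.EvaluateAsync, and that this call generates a very large number of assemblies.
The last step was to point my friend at CSharpScript.EvaluateAsync and ask whether it could be taken out to confirm the diagnosis.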
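III: Summary
A quick search shows that Microsoft.CodeAnalysis.CSharp.Scripting is used to compile and run C# script code. As this dump shows, repeated evaluations can leave behind a large number of in-memory assemblies, so be careful when using it, especially in code that runs frequently.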
