一:背景
1. 讲故事
前段时间有位朋友在微信上找到我,说他的程序出现了内存泄漏,能不能帮他看一下,这个问题还是比较经典的,加上好久没上非托管方面的东西了,这篇就和大家分享一下,话不多说,上 WinDbg 说话。
二:WinDbg 分析
1. 到底是哪里的泄漏
好的开始就是成功的一半,否则就南辕北辙了,对吧,还是用经典的 !address -summary 看一下内存排布情况。- 0:000> !address -summary
- --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- Heap 1935 553b3000 ( 1.332 GB) 70.57% 66.59%
- Image 1022 c306000 ( 195.023 MB) 10.09% 9.52%
- <unknown> 1202 c09d000 ( 192.613 MB) 9.97% 9.41%
- Stack 541 b280000 ( 178.500 MB) 9.24% 8.72%
- Free 1158 73ab000 ( 115.668 MB) 5.65%
- TEB 180 20f000 ( 2.059 MB) 0.11% 0.10%
- Other 8 5d000 ( 372.000 kB) 0.02% 0.02%
- PEB 1 3000 ( 12.000 kB) 0.00% 0.00%
- --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- MEM_COMMIT 3077 643c6000 ( 1.566 GB) 83.00% 78.31%
- MEM_RESERVE 1812 1487f000 ( 328.496 MB) 17.00% 16.04%
- MEM_FREE 1158 73ab000 ( 115.668 MB) 5.65%
复制代码 从卦中可以看出,当前 MEM_COMMIT = 1.56 G, 并且 Heap= 1.3 G,既然超出了朋友的预期,很明显这是一个非托管内存泄漏,既然 NTHeap 出现了泄漏,那就挖一下看看,使用 !heap -s 观察一下各个heap句柄。- 0:000> !heap -s
- ************************************************************************************************************************
- NT HEAP STATS BELOW
- ************************************************************************************************************************
- LFH Key : 0xbb72f2a3
- Termination on corruption : DISABLED
- Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
- (k) (k) (k) (k) length blocks cont. heap
- -----------------------------------------------------------------------------
- 00770000 00000002 16576 9716 16364 33 195 5 0 0 LFH
- 006f0000 00001002 1292 148 1080 11 4 2 0 0 LFH
- 00a80000 00001002 3336 1972 3124 88 25 3 0 0 LFH
- 02460000 00001002 60 4 60 0 1 1 0 0
- 023b0000 00041002 60 4 60 2 1 1 0 0
- 02450000 00001002 272 24 60 1 3 1 0 0 LFH
- 04a40000 00041002 1292 80 1080 8 4 2 0 0 LFH
- 06e90000 00001002 64180 56660 63968 1434 473 9 624 7 LFH
- 09dc0000 00001002 60 12 60 3 2 1 0 0
- 0a500000 00001002 7428 3772 7216 43 35 4 0 0 LFH
- -----------------------------------------------------------------------------
复制代码 从卦中的 Commit 列来看,内存占用都不大,最大的也不过 56M ,如果经验丰富的话,你会发现 Virt blocks 高达 624 个,明白 ntheap 的朋友应该知道,凡是大于 512k 的 heapentry 都会单独安排到 VirtualAllocdBlocks 数组中,可以用 dt ntdll!_HEAP 06e90000 给show出来。- 0:000> dt ntdll!_HEAP 06e90000
- ...
- +0x05c VirtualMemoryThreshold : 0xfe00
- +0x09c VirtualAllocdBlocks : _LIST_ENTRY [ 0x6ea4000 - 0x7c0d0000 ]
- ...
复制代码 为了更好的输出 VirtualAllocdBlocks 数组,我们用 windbg 自带的 heap 分析命令。- 0:000> !heap 06e90000 -m
- Index Address Name Debugging options enabled
- 8: 06e90000
- Segment at 06e90000 to 06e9f000 (0000f000 bytes committed)
- Segment at 078f0000 to 079ef000 (000ff000 bytes committed)
- Segment at 08870000 to 08a6f000 (001ff000 bytes committed)
- Segment at 0ec60000 to 0f05f000 (003f9000 bytes committed)
- Segment at 18660000 to 18e5f000 (007fa000 bytes committed)
- Segment at 26b20000 to 27aef000 (00fc0000 bytes committed)
- Segment at 45320000 to 462ef000 (00fcf000 bytes committed)
- Segment at 65bf0000 to 66bbf000 (008bf000 bytes committed)
- Flags: 00001002
- ForceFlags: 00000000
- Granularity: 8 bytes
- Segment Reserve: 03f70000
- Segment Commit: 00002000
- DeCommit Block Thres: 00000800
- DeCommit Total Thres: 00002000
- Total Free Size: 0002cd56
- Max. Allocation Size: 7ffdefff
- Lock Variable at: 06e90258
- Next TagIndex: 0000
- Maximum TagIndex: 0000
- Tag Entries: 00000000
- PsuedoTag Entries: 00000000
- Virtual Alloc List: 06e9009c
- 06ea4000: 00200000 [commited 201000, unused 1000] - busy (b)
- 070b2000: 00200000 [commited 201000, unused 1000] - busy (b)
- 079f4000: 00200000 [commited 201000, unused 1000] - busy (b)
- 07c0f000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0802b000: 00200000 [commited 201000, unused 1000] - busy (b)
- 08238000: 00200000 [commited 201000, unused 1000] - busy (b)
- 08444000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0865f000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0e20f000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0e42b000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0e635000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0e841000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0c661000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0c87e000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0ca8b000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0ea56000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0f062000: 00200000 [commited 201000, unused 1000] - busy (b)
- 0f275000: 00200000 [commited 201000, unused 1000] - busy (b)
- ...
复制代码 从卦中可以看到大量的 commited 201000, unused 1000 ,这里的 0x201000 转换一下大概就是 2M,以经验来说,这 2M 大概就是 pdf,image,bitmap 等这些玩意了,由于没有开启 pageheap 或 ust,没法追踪到底是什么东西分配的,到这里就没法进展下去了。
2. 到底是谁分配的 2M 数据
首先能进入 VirtualAllocdBlocks 数组自然是高层调用了 HeapAlloc 这类API,同时这个数据量高度怀疑是 Bitmap,Pdf 之类的大文件,很大可能是托管代码做了什么导致这个资源没有释放,接下来使用 !dumpheap -stat 看下托管堆。- 0:000> !dumpheap -stat
- Statistics:
- MT Count TotalSize Class Name
- ...
- 09ae7e48 627 15048 System.Drawing.Bitmap
- 6b267040 178 366680 System.Decimal[]
- 6b2cb4a0 1850 601588 System.String[]
- 6b2cdd14 1379 638190 System.Byte[]
- 6b2cac14 15919 1146764 System.String
- 09aec720 66332 1326640 System.Drawing.FontFamily
- 09ae8590 66074 2907256 System.Drawing.Font
- Total 289300 objects
复制代码 从卦中看,System.Drawing.Font 居然高达 6w 个,而且 System.Drawing.Bitmap 和 heap 上的 624 也非常接近,看样子就是 Bitmap 啦,那为什么这个 Bitmap 没有善终呢? 可以用 !frq -stat 观察下终结器队列的 Freachable Queue 情况。- 0:000> !frq -stat
- Freachable Queue:
- Count Total Size Type
- ---------------------------------------------------------
- 152 3648 System.Data.Odbc.CNativeBuffer
- 76 2128 System.Data.Odbc.OdbcConnectionHandle
- 77 1540 System.Transactions.SafeIUnknown
- 76 1824 System.Data.Odbc.OdbcStatementHandle
- 2432 145920 System.Windows.Forms.Control+ControlNativeWindow
- 304 7296 System.Drawing.Bitmap
- 66062 2906728 System.Drawing.Font
- 258 5160 System.Drawing.FontFamily
- 308 9856 System.Drawing.Graphics
- 308 3696 System.Windows.Forms.ImageList+NativeImageList
- 1 12 System.Drawing.Text.InstalledFontCollection
- 12 240 System.Threading.ThreadPoolWorkQueueThreadLocals
- 1 20 System.Security.Cryptography.SafeKeyHandle
- 1 20 Microsoft.Win32.SafeHandles.SafeWaitHandle
- 6 120 Microsoft.Win32.SafeHandles.SafeRegistryHandle
- 12 624 System.Threading.Thread
- 1577 69388 System.Threading.ReaderWriterLock
- 1 20 System.Security.Cryptography.SafeProvHandle
- 71,664 objects, 3,158,240 bytes
复制代码 我去,这个可终结队列居然高达 7.1w ,这是很有问题的,大概率当前的终结器线程瓦特了,接下来追查下 终结器线程 此时正在做什么 ?- 0:000> !t
- ThreadCount: 107
- UnstartedThread: 0
- BackgroundThread: 93
- PendingThread: 0
- DeadThread: 12
- Hosted Runtime: no
- Lock
- ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
- 0 1 138ac 0079fef0 26020 Preemptive 00000000:00000000 00798f38 1 STA
- 2 2 12b08 007adac0 2b220 Preemptive 00000000:00000000 00798f38 0 MTA (Finalizer)
- ...
- 0:000> ~2s
- eax=00000001 ebx=00000000 ecx=00000000 edx=00000000 esi=00000001 edi=00000001
- eip=777b2f8c esp=0466eaf4 ebp=0466ec84 iopl=0 nv up ei pl nz na po nc
- cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
- ntdll!NtWaitForMultipleObjects+0xc:
- 777b2f8c c21400 ret 14h
- 0:002> k
- # ChildEBP RetAddr
- 00 0466ec84 762fc753 ntdll!NtWaitForMultipleObjects+0xc
- 01 0466ec84 7695d9aa KERNELBASE!WaitForMultipleObjectsEx+0x103
- 02 0466ed34 7695c564 combase!MTAThreadWaitForCall+0x1ca [onecore\com\combase\dcomrem\channelb.cxx @ 7234]
- 03 0466edc0 769a9923 combase!MTAThreadDispatchCrossApartmentCall+0xf4 [onecore\com\combase\dcomrem\chancont.cxx @ 229]
- 04 (Inline) -------- combase!CSyncClientCall::SwitchAptAndDispatchCall+0x9e4 [onecore\com\combase\dcomrem\channelb.cxx @ 5856]
- 05 0466efa0 769ab739 combase!CSyncClientCall::SendReceive2+0xad3 [onecore\com\combase\dcomrem\channelb.cxx @ 5459]
- 06 (Inline) -------- combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x29 [onecore\com\combase\dcomrem\callctrl.cxx @ 1542]
- 07 (Inline) -------- combase!CSyncClientCall::SendReceiveInRetryContext+0x29 [onecore\com\combase\dcomrem\callctrl.cxx @ 565]
- ...
复制代码 从上面的 MTAThreadDispatchCrossApartmentCall 可知,这又是一个经典的 COM 释放问题导致终结器线程卡死。。。接下来仔细看下 线程列表的 STA 情况,可以发现有大量的线程是 STA 模式。

接下来就是将结果告诉朋友,为什么开的线程都是 STA 套件模式。
三:总结
总的来说,这次内存泄漏的原因在于朋友开了 STA 模式的线程,导致终结器线程卡死,进而导致 Bitmap 分配之后无法释放,最终引发非托管泄漏。
这个dump告诉我们,不要放弃,一定可以在绝望中找到希望。

免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作! |