一:背景
1. 讲故事
上个月在社区写的文章比较少,不停关注的朋侪应该知道那段时间被狗咬了以及一些琐事处置惩罚,所以手头上也攒了不少需要分享的案例,这段时间比较空闲,逐个给大
家做个分享吧,刚好年后为新版的 .NET高级调试练习营 做案例储备,所以抓紧时间疯狂输出吧!
这次生产变乱的dump是练习营里一位朋侪给到我的,由于朋侪没有分析出来,让我帮忙看看,毕竟我的修车经验相对来说更丰富一些,算是他们背后坚实的保障吧,话不多说上windbg说话。
二:WinDbg分析
1. 为什么会崩溃
由于windbg默认自动定位到崩溃的线程,而崩溃的dump重点是观察它的崩溃前上下文,这里利用 .ecxr 和 k 下令,输出参考如下:- 0:000> .ecxr
- eax=00000000 ebx=4d6d8360 ecx=00000003 edx=00000000 esi=4d6f0ca0 edi=4d6f0c74
- eip=71a567c7 esp=026fd834 ebp=026fd83c iopl=0 nv up di pl nz na po nc
- cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
- System_Windows_Forms_ni!System.Windows.Forms.ImageList.ImageCollection.SetKeyName+0x1b:
- 71a567c7 cc int 3
- 0:000> k
- *** Stack trace for last set context - .thread/.cxr resets it
- # ChildEBP RetAddr
- 00 026fd83c 0c2c4e7e System_Windows_Forms_ni!System.Windows.Forms.ImageList.ImageCollection.SetKeyName+0x1b
- 01 026fe474 0c2c063b xxx!xxx.MainForm.InitializeComponent+0x198e
- 02 026fe488 095cb9de xxx!xxx.MainForm..ctor+0x5fb
- 03 026fe4e4 0da5bc7a xxx!xxx.LoginForm.button_OK_Click+0x52e
- 04 026fe4f8 71a38bdf xxx!xxx.LoginForm.LoginForm_Load+0x9a
- 05 026fe528 710b325a System_Windows_Forms_ni!System.Windows.Forms.Form.OnLoad+0x2f
- ...
复制代码 从卦象看是崩溃在 System.Windows.Forms.ImageList.ImageCollection.SetKeyName 方法上,很显然这个方法属于微软的SDK底层库,不管怎么说是一个托管异常,既然是托管异常我们可以用 !t 观察到具体的崩溃信息。- 0:000> !t
- ThreadCount: 26
- UnstartedThread: 0
- BackgroundThread: 12
- PendingThread: 0
- DeadThread: 13
- Hosted Runtime: no
- Lock
- ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
- 0 1 534 0299e700 a6028 Preemptive 4D6F0EFC:00000000 02997bd0 0 STA System.IndexOutOfRangeException 4d6f0ca0
- 2 2 5b4 029af278 2b228 Preemptive 00000000:00000000 02997bd0 0 MTA (Finalizer)
- 3 6 ff0 02a60eb0 102a228 Preemptive 00000000:00000000 02997bd0 0 MTA (Threadpool Worker)
- ...
- 0:000> !pe
- Exception object: 4d6f0ca0
- Exception type: System.IndexOutOfRangeException
- Message: Index was outside the bounds of the array.
- InnerException: <none>
- StackTrace (generated):
- SP IP Function
- 026FD834 71A567C7 System_Windows_Forms_ni!System.Windows.Forms.ImageList+ImageCollection.SetKeyName(Int32, System.String)+0x33e157
- StackTraceString: <none>
- HResult: 80131508
复制代码 从卦象看非常希奇,怎么底层库中抛了一个数组索引越界异常?难道是底层的bug?一般来说这些代码都是铜墙铁壁,固若金汤,坚如磐石,稳如泰山,自作掩饰,不可能有云云低级的bug。。。
2. 真的是底层库bug吗?
要想寻找答案,可以根据线程栈上的函数寻找底层源码,从源码上寻找答案,修剪后的代码如下:- private void InitializeComponent()
- {
- this.imageList_btnbg.ImageStream = (System.Windows.Forms.ImageListStreamer)resources.GetObject("imageList_btnbg.ImageStream");
- this.imageList_btnbg.TransparentColor = System.Drawing.Color.Transparent;
- this.imageList_btnbg.Images.SetKeyName(0, "normal-main.bmp");
- this.imageList_btnbg.Images.SetKeyName(1, "focus-main.bmp");
- this.imageList_btnbg.Images.SetKeyName(2, "select-main.bmp");
- this.imageList_btnbg.Images.SetKeyName(3, "gray-main.bmp");
- this.imageList_btnbg.Images.SetKeyName(4, "down_1.png");
- this.imageList_btnbg.Images.SetKeyName(5, "down_2.png");
- this.imageList_btnbg.Images.SetKeyName(6, "down_3.png");
- this.imageList_btnbg.Images.SetKeyName(7, "up_1.png");
- this.imageList_btnbg.Images.SetKeyName(8, "up_2.png");
- this.imageList_btnbg.Images.SetKeyName(9, "up_3.png");
- }
- public void SetKeyName(int index, string name)
- {
- if (!IsValidIndex(index))
- {
- throw new IndexOutOfRangeException();
- }
- if (imageInfoCollection[index] == null)
- {
- imageInfoCollection[index] = new ImageInfo();
- }
- ((ImageInfo)imageInfoCollection[index]).Name = name;
- }
- private bool IsValidIndex(int index)
- {
- if (index >= 0)
- {
- return index < Count;
- }
- return false;
- }
- public int Count
- {
- get
- {
- if (owner.HandleCreated)
- {
- return SafeNativeMethods.ImageList_GetImageCount(new HandleRef(owner, owner.Handle));
- }
- int num = 0;
- foreach (Original original in owner.originals)
- {
- if (original != null)
- {
- num += original.nImages;
- }
- }
- return num;
- }
- }
- [Browsable(false)]
- [EditorBrowsable(EditorBrowsableState.Advanced)]
- [DesignerSerializationVisibility(DesignerSerializationVisibility.Hidden)]
- [SRDescription("ImageListHandleCreatedDescr")]
- public bool HandleCreated => nativeImageList != null;
复制代码 仔细通读卦中的代码逻辑,看样子是 IsValidIndex()=false 导致的手工 IndexOutOfRangeException 异常,而 IsValidIndex()=false 是由于 index < Count 的条件成立,背面的 Count 是取自 ImageList_GetImageCount 或者 owner.originals 值。
代码逻辑我们是分析清楚了,接下来就是看汇编来分析下这个dump的现状,入手点就是从 index 值入手,即对 InitializeComponent() 方法举行反汇编。- 0:000> !clrstack
- OS Thread Id: 0x534 (0)
- Child SP IP Call Site
- 026fd784 771316bc [HelperMethodFrame: 026fd784]
- 026fd834 71a567c7 System.Windows.Forms.ImageList+ImageCollection.SetKeyName(Int32, System.String)
- 026fd848 0c2c4e7e xxx.MainForm.InitializeComponent()
- 026fe47c 0c2c063b xxx.MainForm..ctor()
- 026fe490 095cb9de xxx.LoginForm.button_OK_Click(System.Object, System.EventArgs)
- ...
- 0:000> !U /d 0c2c4e7e
- Normal JIT generated code
- xxx.MainForm.InitializeComponent()
- Begin 0c2c34f0, size 5ded
- ...
- 0c2c4e62 e8d9b5d864 call System_Windows_Forms_ni+0x160440 (71050440) (System.Windows.Forms.ImageList.get_Images(), mdToken: 06002599)
- 0c2c4e67 898514f5ffff mov dword ptr [ebp-0AECh],eax
- 0c2c4e6d ff35f0a07f05 push dword ptr ds:[57FA0F0h] ("normal-main.bmp")
- 0c2c4e73 8bc8 mov ecx,eax
- 0c2c4e75 33d2 xor edx,edx
- 0c2c4e77 3909 cmp dword ptr [ecx],ecx
- 0c2c4e79 e8f2374565 call System_Windows_Forms_ni!System.Windows.Forms.ImageList.ImageCollection.SetKeyName (71718670)
- >>> 0c2c4e7e 8b8e74020000 mov ecx,dword ptr [esi+274h]
- ...
复制代码 从卦象看,尼玛。。。执行第一个 SetKeyName(0, "normal-main.bmp"); 就异常啦,这就说明谁人 Count=0,无语了,为什么 Count=0 呢? 接下来寻找Count数据源ImageCollection 集合,可以从线程栈中寻找,利用 !dso 下令即可。- 0:000> !dso
- OS Thread Id: 0x534 (0)
- ESP/REG Object Name
- 026FD774 4d6f0c74 System.Windows.Forms.ImageList+ImageCollection
- 026FD778 4d6f0ca0 System.IndexOutOfRangeException
- ...
- 0:000> !do 4d6f0c74
- Name: System.Windows.Forms.ImageList+ImageCollection
- MethodTable: 71120ff0
- EEClass: 70f230ec
- Size: 20(0x14) bytes
- File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
- Fields:
- MT Field Offset Type VT Attr Value Name
- 7111ecc0 4003916 4 ...s.Forms.ImageList 0 instance 4d6d97b0 owner
- 72d909dc 4003917 8 ...ections.ArrayList 0 instance 4d6f0c88 imageInfoCollection
- 72d8df5c 4003918 c System.Int32 1 instance -1 lastAccessedIndex
- 0:000> !DumpObj /d 4d6d97b0
- Name: System.Windows.Forms.ImageList
- ...
- Fields:
- MT Field Offset Type VT Attr Value Name
- 71121b0c 4001013 10 ...t+NativeImageList 0 instance 4d6f0c40 nativeImageList
- 728e15a0 4001019 1c ...Collections.IList 0 instance 00000000 originals
复制代码 根据卦中的 nativeImageList 和 originals 再配合源代码,应该就是祸首 SafeNativeMethods.ImageList_GetImageCount 方法返回 0 导致的,先观察一下它的署名。- [DllImport("comctl32.dll")]
- public static extern int ImageList_GetImageCount(HandleRef himl);
复制代码 从署名看这是C++写的外部方法,这就沃草了。。。我总不能用 ida 去捋这里面的逻辑吧。。。到这里貌似已经快要撞到南墙了。。。有点慌了。
3. 天要绝人之路吗
颠末短暂的恍含糊惚之后,我突然灵光一现,尼玛这是32bit的内存地点,是不是2G的空间不够用哦?刚好 ImageList_GetImageCount 是一个关于图片的UI控件,用了底层的COM资源,会不会真的是空间不足导致的?有了这个想法之后赶紧 !address -summary 观察提交内存。- 0:000> !address -summary
- ...
- --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
- MEM_COMMIT 1933 6e768000 ( 1.726 GB) 93.52% 86.30%
- MEM_FREE 631 9e01000 ( 158.004 MB) 7.72%
- MEM_RESERVE 607 7a87000 ( 122.527 MB) 6.48% 5.98%
- ...
复制代码 尼玛。。。卦象中的 MEM_COMMIT=1.72G, %ofBusy= 93.52% 早已凌驾了1.2G的临界值,终于真相大白。。。
办理办法就比较简朴了,开启大地点,让程序吃 4G 的内存,厥后朋侪反馈这个问题已不再出来。。。
三:总结
分析完这个dump之后其实我挺感慨的,人生也云云dump一样,在真相和假象之间不断的交织穿梭,有些人走出来了,有些人永远留在了里面。。。
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |