.NET 7: Using GCInfo to Derive How the GC Marks Objects

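Prologue

Starting with this post, we begin working with lldb. This post uses GC info to work out how the GC marks objects.

I previously debugged this in VS; this time I try it on Linux with LLDB.

Code

Let's start with a simple piece of code.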

 internal class Program
 {
    static void Main(string[] args)
    {
       Program pm = new Program();
       Console.WriteLine("Hello, World!");
       GC.Collect();
    }
 }

GCInfo

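The GCInfo of the sample's Main function can only be viewed after RyuJIT has fully compiled Main. So one option is to set a breakpoint at the end of Main.

First, set a breakpoint that is hit before managed Main runs: RunMainInternal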

(lldb) b RunMainInternal
Breakpoint 5: where = libcoreclr.so`RunMainInternal(Param*) + 31 at assembly.cpp:1335:37, address = 0x00007ffff701b8bf

Use the r command to run to RunMainInternal

(lldb) r
Process 111676 launched: '/home/tang/Downloads/runtime/artifacts/bin/coreclr/Linux.x64.Debug/corerun' (x86_64)
Process 111676 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 5.1
    frame #0: 0x00007ffff701b8bf libcoreclr.so`RunMainInternal(pParam=0x00007fffffffcf30) at assembly.cpp:1335:37
   1332	
   1333	static void RunMainInternal(Param* pParam)
   1334	{
-> 1335	    MethodDescCallSite  threadStart(pParam->pFD);
   1336	
   1337	    PTRARRAYREF StrArgArray = NULL;
   1338	    GCPROTECT_BEGIN(StrArgArray);

Set a breakpoint on PreStubWorker in the same way as above

(lldb) b PreStubWorker
Breakpoint 6: where = libcoreclr.so`::PreStubWorker(TransitionBlock *, MethodDesc *) + 38 at prestub.cpp:1866:11, address = 0x00007

Use the c command to run to PreStubWorker

(lldb) c
Process 111676 resuming
Process 111676 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 6.1
    frame #0: 0x00007ffff71bcf36 libcoreclr.so`::PreStubWorker(pTransitionBlock=0x00007fffffffc828, pMD=0x00007fff795a5008) at prestub.cpp:1866:11
   1863	//=============================================================================
   1864	extern "C" PCODE STDCALL PreStubWorker(TransitionBlock* pTransitionBlock, MethodDesc* pMD)
   1865	{
-> 1866	    PCODE pbRetVal = NULL;
   1867	
   1868	    BEGIN_PRESERVE_LAST_ERROR;
   1869	

Now we can look at the MethodDesc of managed Main. First run the p command to print the address of its MethodDesc

(lldb) p pMD
(MethodDesc *) $4 = 0x00007fff795a5008

Then we can see the MethodDesc structure of Main

(lldb) p *pMD 
(MethodDesc) $5 = {
  m_pszDebugMethodName = 0x00007fff795c2a58 "Main"
  m_pszDebugClassName = 0x00007fff795a4fb8 "Program"
  m_pszDebugMethodSignature = 0x00007fff7959efc8 "void *(string[])"
  m_pDebugMethodTable = 0x00007fff795a50a0
  m_GcCover = nullptr
  m_wFlags3AndTokenRemainder = 6
  m_chunkIndex = '\0'
  m_bFlags2 = '\x03'
  m_wSlotNumber = 5
  m_wFlags = 168
}

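Next, run the c command five times in a row so that managed Main is fully compiled. Only then can its GC info be viewed. Run the following command, where 0x00007fff795a5008 is the MethodDesc address of managed Main printed above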

(lldb) sos gcinfo 0x00007fff795a5008
entry point 00007FFF78FB59A0
Normal JIT generated code
GC info 00007FFF797573F0
Pointer table:
Prolog size: 0
Security object: <none>
GS cookie: <none>
PSPSym: <none>
Generics inst context: <none>
PSP slot: <none>
GenericInst slot: <none>
Varargs: 0
Frame pointer: rbp
Wants Report Only Leaf: 0
Size of parameter area: 0
Return Kind: Scalar
Code size: 6b
Untracked: +rbp-8 +rbp-10 +rbp-18
00000018 interruptible
00000036 +rax
0000003e +rdi
00000044 -rdi -rax
00000048 +rdi
0000005c -rdi
00000065 not interruptible

Note this part of the output:

+rbp-8 +rbp-10 +rbp-18

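These are stack slots in managed Main, each addressed as register rbp plus a hexadecimal offset, that hold object references.

Disassemble managed Main with the following command; the address 0x00007fff795a5008 is the MethodDesc address of managed Main.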

(lldb) sos u 0x00007fff795a5008
Normal JIT generated code
Program.Main(System.String[])
ilAddr is 00007FFFF41622A4 pImport is 00000000037AEAD0
Begin 00007FFF78FB59A0, size 6b

/home/tang/Downloads/dotnetDemo/abc/Program.cs @ 4:
00007fff78fb59a0 55                   push    rbp
00007fff78fb59a1 4883ec20             sub     rsp, 0x20
00007fff78fb59a5 488d6c2420           lea     rbp, [rsp + 0x20]
00007fff78fb59aa 33c0                 xor     eax, eax
00007fff78fb59ac 488945f0             mov     qword ptr [rbp - 0x10], rax
00007fff78fb59b0 488945e8             mov     qword ptr [rbp - 0x18], rax
00007fff78fb59b4 48897df8             mov     qword ptr [rbp - 0x8], rdi
00007fff78fb59b8 833d29d35e0000       cmp     dword ptr [rip + 0x5ed329], 0x0
00007fff78fb59bf 7405                 je      0x7fff78fb59c6
00007fff78fb59c1 e8fa3d3a7e           call    0x7ffff73597c0 (JitHelp: CORINFO_HELP_DBG_IS_JUST_MY_CODE)
00007fff78fb59c6 90                   nop     

/home/tang/Downloads/dotnetDemo/abc/Program.cs @ 5:
00007fff78fb59c7 48bfa0505a79ff7f0000 movabs  rdi, 0x7fff795a50a0
00007fff78fb59d1 e8da1f397e           call    0x7ffff73479b0 (JitHelp: CORINFO_HELP_NEWSFAST)
00007fff78fb59d6 488945e8             mov     qword ptr [rbp - 0x18], rax
00007fff78fb59da 488b7de8             mov     rdi, qword ptr [rbp - 0x18]
00007fff78fb59de ff15f4765b00         call    qword ptr [rip + 0x5b76f4]
00007fff78fb59e4 488b7de8             mov     rdi, qword ptr [rbp - 0x18]
00007fff78fb59e8 48897df0             mov     qword ptr [rbp - 0x10], rdi

/home/tang/Downloads/dotnetDemo/abc/Program.cs @ 6:
00007fff78fb59ec 48bfa0180c79ff7f0000 movabs  rdi, 0x7fff790c18a0
00007fff78fb59f6 ff155c647600         call    qword ptr [rip + 0x76645c]
00007fff78fb59fc 90                   nop     

/home/tang/Downloads/dotnetDemo/abc/Program.cs @ 7:
00007fff78fb59fd ff1525217000         call    qword ptr [rip + 0x702125]
00007fff78fb5a03 90                   nop     

/home/tang/Downloads/dotnetDemo/abc/Program.cs @ 8:
00007fff78fb5a04 90                   nop     
00007fff78fb5a05 4883c420             add     rsp, 0x20
00007fff78fb5a09 5d                   pop     rbp
00007fff78fb5a0a c3                   ret     

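Comparing the disassembly above with the GCInfo slots, we can see:

+rbp-8 holds the value of register rdi, which on entry to Main is the first argument, args.
+rbp-10 is written twice: the prolog first stores rax there (zero, after xor eax, eax), and that value is later overwritten with the address of the newly created Program object. This slot is the local variable pm.
+rbp-18 is likewise zeroed in the prolog. It then receives the rax returned by CORINFO_HELP_NEWSFAST, which is the address of the new Program object, before the call that follows (presumably the .ctor) and before the value is copied into rbp-10.

Note

The command sos name2ee abc.dll abc.Program.Main cannot be run in lldb before Main has been fully compiled; it fails with an error.

With this information in hand, let's look further.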

Find Object

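To understand how the GC marks objects, first look at how it finds them. The slots that hold a function's local object references are recorded in its gcinfo, and the GC looks them up there during a collection. To watch this happen, set a breakpoint on ReportStackSlotToGC and delete the other two breakpoints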

(lldb) b ReportStackSlotToGC
Breakpoint 7: where = libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(int, GcStackSlotBase, unsigned int, REGDISPLAY*, unsigned int, void (*)(void*, OBJECTREF*, unsigned int), void*) + 48 at gcinfodecoder.cpp:2005:39, address = 0x00007ffff74ad9b0
(lldb) b
Current breakpoints:
5: name = 'RunMainInternal', locations = 1, resolved = 1, hit count = 1
  5.1: where = libcoreclr.so`RunMainInternal(Param*) + 31 at assembly.cpp:1335:37, address = 0x00007ffff701b8bf, resolved, hit count = 1 

6: name = 'PreStubWorker', locations = 1, resolved = 1, hit count = 6
  6.1: where = libcoreclr.so`::PreStubWorker(TransitionBlock *, MethodDesc *) + 38 at prestub.cpp:1866:11, address = 0x00007ffff71bcf36, resolved, hit count = 6 

7: name = 'ReportStackSlotToGC', locations = 1, resolved = 1, hit count = 0
  7.1: where = libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(int, GcStackSlotBase, unsigned int, REGDISPLAY*, unsigned int, void (*)(void*, OBJECTREF*, unsigned int), void*) + 48 at gcinfodecoder.cpp:2005:39, address = 0x00007ffff74ad9b0, resolved, hit count = 0 

(lldb) br del 5 6
2 breakpoints deleted; 0 breakpoint locations disabled.
(lldb) b
Current breakpoints:
7: name = 'ReportStackSlotToGC', locations = 1, resolved = 1, hit count = 0
  7.1: where = libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(int, GcStackSlotBase, unsigned int, REGDISPLAY*, unsigned int, void (*)(void*, OBJECTREF*, unsigned int), void*) + 48 at gcinfodecoder.cpp:2005:39, address = 0x00007ffff74ad9b0, resolved, hit count = 0 

Run the c command to reach ReportStackSlotToGC

(lldb) c
Process 111676 resuming
Hello, World!
Process 111676 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 7.1
    frame #0: 0x00007ffff74ad9b0 libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(this=0x00007fffffffa0f0, spOffset=-24, spBase=GC_FRAMEREG_REL, gcFlags=0, pRD=0x00007fffffffab30, flags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:2005:39
   2002	{
   2003	    GCINFODECODER_CONTRACT;
   2004	
-> 2005	    OBJECTREF* pObjRef = GetStackSlot(spOffset, spBase, pRD);
   2006	    _ASSERTE(IS_ALIGNED(pObjRef, sizeof(OBJECTREF*)));
   2007	
   2008	#ifdef _DEBUG

The spOffset argument is -24; it can also be printed with a command

(lldb) p spOffset 
(INT32) $6 = -24
(lldb) p/x spOffset  //  the x after p/ means display in hexadecimal
(INT32) $7 = 0xffffffe8

spOffset=0xffffffe8 is -24 in decimal, which is -0x18 in hexadecimal. Remember the three GCInfo slots above? One of them is rbp-0x18.
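The conversion between the two lldb views of spOffset can be checked with a few lines of C++. This is only a sketch: the helper names as_signed, rbp_relative and check_session_offsets are made up for illustration and are not part of the runtime or of SOS, and the output format is just an approximation of how the GCInfo listing writes offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Reinterpret the 32-bit pattern shown by `p/x` as the signed INT32 shown by `p`.
int32_t as_signed(uint32_t bits) {
    int32_t value;
    std::memcpy(&value, &bits, sizeof value);  // bit-for-bit copy, no conversion
    return value;
}

// Render a frame-relative offset as "rbp-18" or "rbp+8": sign, then hex magnitude.
std::string rbp_relative(int32_t offset) {
    const uint32_t raw = static_cast<uint32_t>(offset);
    const uint32_t magnitude = offset < 0 ? 0u - raw : raw;  // |offset| without overflow
    char buf[32];
    std::snprintf(buf, sizeof buf, "rbp%c%x", offset < 0 ? '-' : '+', magnitude);
    return buf;
}

// Checks the offsets seen in the lldb session against the GCInfo listing.
void check_session_offsets() {
    assert(as_signed(0xffffffe8u) == -24);
    assert(rbp_relative(-24) == "rbp-18");
    assert(rbp_relative(-16) == "rbp-10");
    assert(rbp_relative(-8) == "rbp-8");
}
```

Calling check_session_offsets() from a main function passes silently when the three spOffset values line up with rbp-18, rbp-10 and rbp-8.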

Continue with the c command

(lldb) c
Process 111676 resuming
Process 111676 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 7.1
    frame #0: 0x00007ffff74ad9b0 libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(this=0x00007fffffffa0f0, spOffset=-16, spBase=GC_FRAMEREG_REL, gcFlags=0, pRD=0x00007fffffffab30, flags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:2005:39
   2002	{
   2003	    GCINFODECODER_CONTRACT;
   2004	
-> 2005	    OBJECTREF* pObjRef = GetStackSlot(spOffset, spBase, pRD);
   2006	    _ASSERTE(IS_ALIGNED(pObjRef, sizeof(OBJECTREF*)));
   2007	
   2008	#ifdef _DEBUG

The arguments show that this time spOffset=-16, which is -0x10 in hexadecimal and matches the second GCInfo slot.

Continue with the c command

(lldb) c
Process 111676 resuming
Process 111676 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 7.1
    frame #0: 0x00007ffff74ad9b0 libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(this=0x00007fffffffa0f0, spOffset=-8, spBase=GC_FRAMEREG_REL, gcFlags=0, pRD=0x00007fffffffab30, flags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:2005:39
   2002	{
   2003	    GCINFODECODER_CONTRACT;
   2004	
-> 2005	    OBJECTREF* pObjRef = GetStackSlot(spOffset, spBase, pRD);
   2006	    _ASSERTE(IS_ALIGNED(pObjRef, sizeof(OBJECTREF*)));
   2007	
   2008	#ifdef _DEBUG

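Here spOffset=-8, which matches the first GCInfo slot. All three gcinfo slots have now been found, which gives three object addresses. This works exactly as described at the start of this section.

We still need to look at how an object slot is located

->OBJECTREF* pObjRef = GetStackSlot(spOffset, spBase, pRD);
->SIZE_T * pFrameReg = (SIZE_T*) GetRegisterSlot(m_StackBaseRegister, pRD);
->ULONGLONG **ppRax = &pRD->pCurrentContextPointers->Rax; // takes the address of the Rax entry, the first entry of the integer context-pointer table
->return (OBJECTREF*)*(ppRax + regNum); // managed Main's rbp is found at *(ppRax + regNum), where regNum is 5 here, because Rbp sits 5 entries after Rax and each entry is 8 bytes.

The last line is the one to focus on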

(lldb) p *pRD->pCurrentContextPointers
(KNONVOLATILE_CONTEXT_POINTERS) $11 = {
   = {
    FloatingContext = {
      [0] = nullptr
      [1] = nullptr
      [2] = nullptr
      [3] = nullptr
      [4] = nullptr
      [5] = nullptr
      [6] = nullptr
      [7] = nullptr
      [8] = nullptr
      [9] = nullptr
      [10] = nullptr
      [11] = nullptr
      [12] = nullptr
      [13] = nullptr
      [14] = nullptr
      [15] = nullptr
    }
     = {
      Xmm0 = nullptr
      Xmm1 = nullptr
      Xmm2 = nullptr
      Xmm3 = nullptr
      Xmm4 = nullptr
      Xmm5 = nullptr
      Xmm6 = nullptr
      Xmm7 = nullptr
      Xmm8 = nullptr
      Xmm9 = nullptr
      Xmm10 = nullptr
      Xmm11 = nullptr
      Xmm12 = nullptr
      Xmm13 = nullptr
      Xmm14 = nullptr
      Xmm15 = nullptr
    }
  }
   = {
    IntegerContext = {
      [0] = 0x0000000000000000
      [1] = 0x0000000000000000
      [2] = 0x0000000000000000
      [3] = 0x00007fffffffc828
      [4] = 0x00007fffffffb7b8
      [5] = 0x00007fffffffc850
      [6] = 0x0000000000000000
      [7] = 0x0000000000000000
      [8] = 0x0000000000000000
      [9] = 0x0000000000000000
      [10] = 0x0000000000000000
      [11] = 0x0000000000000000
      [12] = 0x00007fffffffc830
      [13] = 0x00007fffffffc838
      [14] = 0x00007fffffffc840
      [15] = 0x00007fffffffc848
    }
     = {
      Rax = 0x0000000000000000
      Rcx = 0x0000000000000000
      Rdx = 0x0000000000000000
      Rbx = 0x00007fffffffc828
      Rsp = 0x00007fffffffb7b8
      Rbp = 0x00007fffffffc850
      Rsi = 0x0000000000000000
      Rdi = 0x0000000000000000
      R8 = 0x0000000000000000
      R9 = 0x0000000000000000
      R10 = 0x0000000000000000
      R11 = 0x0000000000000000
      R12 = 0x00007fffffffc830
      R13 = 0x00007fffffffc838
      R14 = 0x00007fffffffc840
      R15 = 0x00007fffffffc848
    }
  }
}

This matches the inference above exactly: entry [5] of IntegerContext is Rbp. No further explanation is needed.
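The lookup can be modelled in a few lines of standalone C++. This is a simplified sketch, not the runtime's code: ContextPointers is a hypothetical stand-in for KNONVOLATILE_CONTEXT_POINTERS that keeps only the 16 integer entries, and register_slot, stack_slot and demo_finds_three_slots are illustrative names. It assumes a 64-bit target where each entry is 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the integer part of KNONVOLATILE_CONTEXT_POINTERS:
// each entry points at the location where that register's value is saved.
struct ContextPointers {
    uintptr_t* reg[16];  // index 0 = Rax, ..., index 5 = Rbp, ..., index 15 = R15
};

constexpr int kRbp = 5;  // Rbp is 5 entries after Rax

// Mirrors `*(ppRax + regNum)`: index the table starting from the Rax entry.
uintptr_t* register_slot(ContextPointers& ctx, int regNum) {
    uintptr_t** ppRax = &ctx.reg[0];
    return *(ppRax + regNum);
}

// Frame-register-relative slot: read the saved base register, then add spOffset.
uintptr_t* stack_slot(ContextPointers& ctx, int baseReg, int32_t spOffset) {
    uintptr_t* pFrameReg = register_slot(ctx, baseReg);
    return reinterpret_cast<uintptr_t*>(*pFrameReg + spOffset);
}

// Builds a fake frame whose rbp points just above three 8-byte slots and
// checks that offsets -24, -16 and -8 resolve to those slots.
bool demo_finds_three_slots() {
    static_assert(sizeof(uintptr_t) == 8, "sketch assumes a 64-bit target");
    uintptr_t frame[4] = {0x1111, 0x2222, 0x3333, 0};
    uintptr_t saved_rbp = reinterpret_cast<uintptr_t>(&frame[3]);
    ContextPointers ctx{};
    ctx.reg[kRbp] = &saved_rbp;
    return stack_slot(ctx, kRbp, -24) == &frame[0] &&
           stack_slot(ctx, kRbp, -16) == &frame[1] &&
           stack_slot(ctx, kRbp, -8) == &frame[2];
}
```

The two levels of indirection are the point of the sketch: the table entry gives the place where rbp was saved, the value read from there is the frame base, and only then is spOffset applied.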

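One more point to note is how the walk moves on to the next frame, that is, to the next function's address and its stack-frame variables. Two places are involved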

       PCODE adr = m_crawl.pFrame->GetReturnAddress();
        _ASSERTE(adr != (PCODE)POISONC);

        _ASSERTE(!pInlinedFrame || adr);

        if (adr)
        {
            ProcessIp(adr);
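			// The loop (pThread = ThreadStore::GetThreadList(pThread)) != NULL fetches the pFrame of the next Thread. m_crawl is then filled in and initialised through ProcessIp, which advances the current rax and rbp.
			// Only part of the code is pasted here; it is too long.

The other one is

UnwindStackFrame -> VirtualUnwindCallFrame, which is not covered here.

Finally, look at the call stack with the bt command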

(lldb) bt
* thread #1, name = 'corerun', stop reason = breakpoint 7.1
  * frame #0: 0x00007ffff74ad9b0 libcoreclr.so`GcInfoDecoder::ReportStackSlotToGC(this=0x00007fffffffa0f0, spOffset=-8, spBase=GC_FRAMEREG_REL, gcFlags=0, pRD=0x00007fffffffab30, flags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:2005:39
    frame #1: 0x00007ffff74ac90a libcoreclr.so`GcInfoDecoder::ReportSlotToGC(this=0x00007fffffffa0f0, slotDecoder=0x00007fffffff9e90, slotIndex=4, pRD=0x00007fffffffab30, reportScratchSlots=true, inputFlags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.h:696:17
    frame #2: 0x00007ffff74aca9e libcoreclr.so`GcInfoDecoder::ReportUntrackedSlots(this=0x00007fffffffa0f0, slotDecoder=0x00007fffffff9e90, pRD=0x00007fffffffab30, inputFlags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:1040:9
    frame #3: 0x00007ffff74abf13 libcoreclr.so`GcInfoDecoder::EnumerateLiveSlots(this=0x00007fffffffa0f0, pRD=0x00007fffffffab30, reportScratchSlots=false, inputFlags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80)(void*, OBJECTREF*, unsigned int), void*) at gcinfodecoder.cpp:989:9
    frame #4: 0x00007ffff70b385c libcoreclr.so`EECodeManager::EnumGcRefs(this=0x0000555555602770, pRD=0x00007fffffffab30, pCodeInfo=0x00007fffffffa990, flags=0, pCallBack=(libcoreclr.so`GcEnumObject(void*, OBJECTREF*, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007fffffffbc80, relOffsetOverride=4294967295)(void*, OBJECTREF*, unsigned int), void*, unsigned int) at eetwain.cpp:5320:24

	//Part of the call stack is omitted here because it is too long to paste.

Summary

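The process is now fairly clear. When RyuJIT finishes compiling managed Main, it records in the GCInfo the locations of Main's local variables that hold object references. During a garbage collection, when objects are being marked, the GC reads from the GCInfo the slots that the current function needs reported and marks the objects they refer to.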
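The whole flow can be condensed into a toy model. This is a sketch under loud assumptions, not the CoreCLR implementation: GcInfoModel, report_untracked_slots, mark_callback and demo_mark_main_frame are hypothetical names, the "GCInfo" is reduced to a list of frame-relative offsets, and marking is reduced to setting a flag. Only the callback shape (context, slot pointer, flags) follows the signature visible in the backtrace above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Object { bool marked = false; };  // toy heap object
using ObjectRef = Object*;

// Toy GCInfo: only the untracked, frame-relative slot offsets.
struct GcInfoModel { std::vector<int32_t> untrackedOffsets; };

// Same shape as the callback in the backtrace: (context, slot, flags).
using EnumCallback = void (*)(void* context, ObjectRef* slot, unsigned flags);

// Walk every recorded offset, turn it into a slot address, report it.
void report_untracked_slots(const GcInfoModel& info, unsigned char* frameBase,
                            EnumCallback callback, void* context) {
    for (int32_t offset : info.untrackedOffsets) {
        ObjectRef* slot = reinterpret_cast<ObjectRef*>(frameBase + offset);
        callback(context, slot, 0);
    }
}

// Toy "mark": flag the referenced object and count non-null reports.
void mark_callback(void* context, ObjectRef* slot, unsigned /*flags*/) {
    if (*slot != nullptr) {
        (*slot)->marked = true;
        ++*static_cast<int*>(context);
    }
}

// Models Main's frame: rbp-0x18 (temp) and rbp-0x10 (pm) both reference the
// Program object, rbp-8 references args. Returns the number of reports.
int demo_mark_main_frame() {
    static_assert(sizeof(ObjectRef) == 8, "sketch assumes 8-byte slots");
    Object args, program;
    ObjectRef slots[3] = {&program, &program, &args};
    unsigned char* rbp = reinterpret_cast<unsigned char*>(slots + 3);
    GcInfoModel info{{-24, -16, -8}};
    int reported = 0;
    report_untracked_slots(info, rbp, mark_callback, &reported);
    assert(program.marked && args.marked);
    return reported;
}
```

In this model the Program object is reported twice, once per slot that references it, which mirrors the three breakpoint hits in the session even though Main only creates one object.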