Bochs source code analysis - 24: Bochs implementation of Non-PAE 4K paging

Preface

Building on the paging setup from the previous articles, we analyze Bochs's Non-PAE 4K paging mechanism by tracing a mov instruction. Along the way we will see various checks on page attributes, which we summarize at the end.

Note that the TLB (the translation cache) shows up along the way, but we have decided to analyze it later. Whenever TLB-related code appears, we skip it for now and do not go into detail.

Locating the code in the disassembly

We start with the code of test 2 in ex11-2/protected.asm, which is commented out in the source. To make the written value easy to spot, mov [xx], 0 is changed to mov [xx], 4. Then compile with the shell script supplemented in our previous article, as follows:

; Test 2: with CR0.WP=0, ring-0 code writes to address 0x400000
       mov DWORD [0x400000], 4   

Use IDA to disassemble this part and locate the address of the instruction, as follows:

Now run Bochs until it reaches this instruction in the disassembly, as follows:

Stack Analysis-1

The core code is BX_CPU_C::access_write_linear(...). Between the execute1(...) dispatch in the CPU loop and access_write_linear(...) there is a chain of functions that contain no core logic and essentially just wrap the next call, so we simply show the call stack.

>	bochs.exe!BX_CPU_C::access_write_linear(unsigned __int64 laddr, unsigned int len, unsigned int curr_pl, unsigned int xlate_rw, unsigned int ac_mask, void * data) Line 2402	C++
 	bochs.exe!BX_CPU_C::write_linear_dword(unsigned int s, unsigned __int64 laddr, unsigned int data) Line 102	C++
 	bochs.exe!BX_CPU_C::write_virtual_dword(unsigned int s, unsigned __int64 offset, unsigned int data) Line 200	C++
 	bochs.exe!BX_CPU_C::MOV_EdIdM(bxInstruction_c * i) Line 30	C++
 	bochs.exe!BX_CPU_C::cpu_loop() Line 109	C++

access_write_linear(...) function analysis

Let's now analyze access_write_linear(...). The function is defined as follows:

int BX_CPU_C::access_write_linear(
    bx_address laddr, // Linear address 0x400000
    unsigned len,     // Data length 4
    unsigned curr_pl, // Current CPL 0
    unsigned xlate_rw, // Read write BX_WRITE
    Bit32u ac_mask,     // 0x3
    void *data // Pointer to write data
){
    ....
}

Two calls in this function are particularly important, and we focus on them below.

translate_linear(...) converts a linear address into a physical address. Its tlbEntry argument is a TLB entry: if the translation for this address is already cached in the TLB, it is taken from the cache first.

    bx_bool user = (curr_pl == 3);
    BX_CPU_THIS_PTR address_xlation.paddress1 = 
        translate_linear(tlbEntry, laddr, user, xlate_rw);

Next comes the actual write. At this point the physical address is known, so the data is written to that address.

access_write_physical(
    BX_CPU_THIS_PTR address_xlation.paddress1, len, data);

translate_linear(...) function analysis

This part of the code is as follows. First, the type of access is decoded; in our case it is a plain write. Then lpf (the linear page frame) is computed, which is the linear address with its low 12 bits cleared, i.e. rounded down to a 4K page boundary. This value is used to look up the TLB (not analyzed for now). On a hit, the corresponding physical address is taken from the TLB and returned directly.

  unsigned isWrite = rw & 1; // write or r-m-w
  unsigned isExecute = (rw == BX_EXECUTE);
  unsigned isShadowStack = (rw & 4); // 4 if shadowstack and 0 otherwise
  bx_address lpf = LPFOf(laddr); // return laddr & 0xfffffffffffff000; 

    // Query from cache
  if (! isExecute && TLB_LPFOf(tlbEntry->lpf) == lpf){
    ...

  }
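
The page-frame computation above can be tried in isolation. The following is only a minimal sketch, assuming 4K pages; lpf_of, page_offset and same_page are hypothetical stand-ins written for this article, not the Bochs macros themselves.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; they are not the Bochs macros.
constexpr std::uint64_t kPageOffsetMask = 0xfff;  // low 12 bits of a 4K page

// Linear page frame: the address rounded down to a 4K boundary.
constexpr std::uint64_t lpf_of(std::uint64_t laddr) {
  return laddr & ~kPageOffsetMask;
}

// Offset of the address inside its 4K page.
constexpr std::uint64_t page_offset(std::uint64_t laddr) {
  return laddr & kPageOffsetMask;
}

// Two addresses hit the same cached translation only if their page frames match.
constexpr bool same_page(std::uint64_t a, std::uint64_t b) {
  return lpf_of(a) == lpf_of(b);
}
```

For the address used in this article, lpf_of(0x400000) is 0x400000 and page_offset(0x400000) is 0, so the write lands at the very start of its page.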

The following code selects the paging mode and is easy to follow. It first checks whether paging is enabled in CR0 (the PG bit), then whether long mode is active. If so, the 4-level paging walk is used directly; otherwise it checks whether PAE is enabled and chooses the PAE or non-PAE walk accordingly.

In our case translate_linear_legacy(...) must be called. "Legacy" here means "traditional" and corresponds to the non-PAE paging mode. Let's continue with this function.

if(BX_CPU_THIS_PTR cr0.get_PG())
  {
    BX_DEBUG(("page walk for%s address 0x" FMT_LIN_ADDRX, isShadowStack ? " shadow stack" : "", laddr));

#if BX_CPU_LEVEL >= 6
#if BX_SUPPORT_X86_64
    if (long_mode())
      paddress = translate_linear_long_mode(laddr, lpf_mask, pkey, user, rw);
    else
#endif
      if (BX_CPU_THIS_PTR cr4.get_PAE())
        paddress = translate_linear_PAE(laddr, lpf_mask, user, rw);
      else
#endif
        paddress = translate_linear_legacy(laddr, lpf_mask, user, rw);
  }  else {
    // no paging
    paddress = (bx_phy_address) laddr;
    combined_access |= (BX_MEMTYPE_WB << 9); // act as memory type by paging is WB
  }
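
The decision order described above can be condensed into a small function. This is a sketch for illustration only: the PagingMode enum and select_paging_mode are names invented here, and the three flags stand in for CR0.PG, the long-mode state and CR4.PAE.

```cpp
// Hypothetical names, used only to illustrate the order of the checks.
enum class PagingMode { None, Legacy32, PAE, LongMode };

// cr0_pg:    paging enabled (CR0.PG)
// long_mode: CPU is in long mode
// cr4_pae:   PAE enabled (CR4.PAE)
constexpr PagingMode select_paging_mode(bool cr0_pg, bool long_mode, bool cr4_pae) {
  return !cr0_pg   ? PagingMode::None       // no paging: physical == linear
         : long_mode ? PagingMode::LongMode // 4-level walk
         : cr4_pae ? PagingMode::PAE        // PAE walk
                   : PagingMode::Legacy32;  // non-PAE 2-level walk
}
```

With paging on, long mode off and PAE off, the function yields PagingMode::Legacy32, which is the path followed in the rest of this article.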

translate_linear_legacy(...) function analysis

This function walks the page tables in non-PAE paging mode and obtains the physical page. The core code is shown below, and we analyze it piece by piece afterwards.

bx_phy_address BX_CPU_C::translate_linear_legacy(bx_address laddr, Bit32u &lpf_mask, unsigned user, unsigned rw)
{    
  bx_phy_address entry_addr[2], ppf = (Bit32u) BX_CPU_THIS_PTR cr3 & BX_CR3_PAGING_MASK;
  Bit32u entry[2];
  BxMemtype entry_memtype[2] = { 0 };
  int leaf;

  lpf_mask = 0xfff;
  Bit32u combined_access = (BX_COMBINED_ACCESS_WRITE | BX_COMBINED_ACCESS_USER);
  Bit32u curr_entry = (Bit32u) BX_CPU_THIS_PTR cr3;

  for (leaf = BX_LEVEL_PDE;; --leaf) {
    entry_addr[leaf] = ppf + ((laddr >> (10 + 10*leaf)) & 0xffc);

    ...

    access_read_physical(entry_addr[leaf], 4, &entry[leaf]);

    curr_entry = entry[leaf];
    if (!(curr_entry & 0x1)) {
      BX_DEBUG(("%s: entry not present", bx_paging_level[leaf]));
      page_fault(ERROR_NOT_PRESENT, laddr, user, rw);
    }

    ppf = curr_entry & 0xfffff000;

    if (leaf == BX_LEVEL_PTE) break;

#if BX_CPU_LEVEL >= 5
    if ((curr_entry & 0x80) != 0 && BX_CPU_THIS_PTR cr4.get_PSE()) {
      // 4M paging, only if CR4.PSE enabled, ignore PDE.PS otherwise
      if (curr_entry & PAGING_PDE4M_RESERVED_BITS) {
        BX_DEBUG(("PSE PDE4M: reserved bit is set: PDE=0x%08x", entry[BX_LEVEL_PDE]));
        page_fault(ERROR_RESERVED | ERROR_PROTECTION, laddr, user, rw);
      }

      // make up the physical frame number
      ppf = (curr_entry & 0xffc00000);
#if BX_PHY_ADDRESS_WIDTH > 32
      ppf |= ((bx_phy_address)(curr_entry & 0x003fe000)) << 19;
#endif
      lpf_mask = 0x3fffff;
      break;
    }
#endif

    combined_access &= curr_entry; // U/S and R/W
  }
    ...
    ...
}

First, the base address of the page directory (0x100000) is obtained from CR3. The code is as follows, together with the assembly that set CR3 up.

ppf = (Bit32u) BX_CPU_THIS_PTR cr3 & BX_CR3_PAGING_MASK;


;*** 32-bit paging ***
%define PDT32_BASE              100000h
mov eax, PDT32_BASE
mov cr3, eax

Next, the variable combined_access is constructed. The reason is that the read/write and user/supervisor permissions of a page are determined jointly by the entries at every level of the page tables. So we start from the most permissive value and apply an & with each entry as it is read, ending up with the effective access rights of the page.

Bit32u combined_access = (BX_COMBINED_ACCESS_WRITE | BX_COMBINED_ACCESS_USER);
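
The effect of this accumulation can be shown with a few lines. The sketch below uses the architectural bit positions of R/W (bit 1) and U/S (bit 2) in a 32-bit paging entry; the constant and function names are made up for this example, and it is assumed that the Bochs constants use the same bit positions.

```cpp
#include <cstdint>

// Architectural bit positions in a 32-bit PDE/PTE (names are hypothetical).
constexpr std::uint32_t kAccessWrite = 0x2;  // R/W bit
constexpr std::uint32_t kAccessUser = 0x4;   // U/S bit

// Start from "everything allowed" and let each level take rights away.
constexpr std::uint32_t combine_access(std::uint32_t pde, std::uint32_t pte) {
  return (kAccessWrite | kAccessUser) & pde & pte;
}

constexpr bool is_writable(std::uint32_t access) { return (access & kAccessWrite) != 0; }
constexpr bool is_user(std::uint32_t access) { return (access & kAccessUser) != 0; }
```

For example, a PDE of 0x7 (present, writable, user) combined with a PTE of 0x5 (present, read-only, user) gives 0x4: the page is reachable from user mode but not writable, because one level removed the write permission.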

Next, we come to the core loop. Note that the for statement has no termination condition, so the only way out is a break (or a page fault). We are in non-PAE mode, so the walk starts at the PDE level, leaf = BX_LEVEL_PDE, and moves down one level per iteration.

enum {
  BX_LEVEL_PML4 = 3,
  BX_LEVEL_PDPTE = 2,
  BX_LEVEL_PDE = 1,
  BX_LEVEL_PTE = 0
};

for (leaf = BX_LEVEL_PDE;; --leaf) { 
    ... 
}

The following part is easy to understand: it computes the address of the entry. Note that for the PDE the expression is (laddr >> 20) & 0xffc, which for a 32-bit linear address is equivalent to (laddr >> 22) * 4, because each entry in the page directory is 4 bytes long. The address of the PDE ends up in entry_addr[1].

entry_addr[leaf] = ppf + ((laddr >> (10 + 10*leaf)) & 0xffc);

// Code from our earlier C++ simulation
*offset_1 = (address >> 22);
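
To check the equivalence with concrete numbers, the calculation can be reproduced outside Bochs. This is a sketch under the assumptions of this article (32-bit linear address, 4-byte entries); entry_addr_at, pde_index and pte_index are hypothetical helper names.

```cpp
#include <cstdint>

// Same expression as in the loop: leaf = 1 for the PDE, leaf = 0 for the PTE.
constexpr std::uint32_t entry_addr_at(std::uint32_t table_base, std::uint32_t laddr, int leaf) {
  return table_base + ((laddr >> (10 + 10 * leaf)) & 0xffc);
}

// The more familiar "index into the table" view of the same bits.
constexpr std::uint32_t pde_index(std::uint32_t laddr) { return laddr >> 22; }
constexpr std::uint32_t pte_index(std::uint32_t laddr) { return (laddr >> 12) & 0x3ff; }
```

For laddr = 0x400000 and a page directory at 0x100000, pde_index is 1, so the PDE is read from 0x100000 + 1 * 4 = 0x100004, which is exactly what entry_addr_at(0x100000, 0x400000, 1) produces. At the PTE level the index is 0, so the first entry of the page table is used.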

Then access_read_physical(...) is called. It reads the PDE, which holds the base address of the page table (PT), and puts it in entry[1].

access_read_physical(entry_addr[leaf], 4, &entry[leaf]);

After reading the entry, the code checks whether its P (present) bit is set and raises a page fault if it is not. It then checks whether the current level is BX_LEVEL_PTE: if so, the walk is complete and the loop exits; otherwise it carries on. We are still at BX_LEVEL_PDE, so we continue.

    curr_entry = entry[leaf];
    if (!(curr_entry & 0x1)) {
      BX_DEBUG(("%s: entry not present", bx_paging_level[leaf]));
      page_fault(ERROR_NOT_PRESENT, laddr, user, rw);
    }

    ppf = curr_entry & 0xfffff000;

    if (leaf == BX_LEVEL_PTE) break;

The following checks for a large (4M) page, i.e. the PS bit. Besides the PS bit in the PDE, it also checks the CR4.PSE bit to see whether the CPU has this mode enabled. For a large page, the physical frame is taken directly from the PDE and the loop is left with break, without reading a PTE. Our case is a 4K page, so the analysis continues.

    if ((curr_entry & 0x80) != 0 && BX_CPU_THIS_PTR cr4.get_PSE()) {
      // 4M paging, only if CR4.PSE enabled, ignore PDE.PS otherwise
      if (curr_entry & PAGING_PDE4M_RESERVED_BITS) {
        BX_DEBUG(("PSE PDE4M: reserved bit is set: PDE=0x%08x", entry[BX_LEVEL_PDE]));
        page_fault(ERROR_RESERVED | ERROR_PROTECTION, laddr, user, rw);
      }

      // make up the physical frame number
      ppf = (curr_entry & 0xffc00000);
#if BX_PHY_ADDRESS_WIDTH > 32
      ppf |= ((bx_phy_address)(curr_entry & 0x003fe000)) << 19;
#endif
      lpf_mask = 0x3fffff;
      break;
    }
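
Although our test does not take this branch, the large-page case is simple enough to sketch. The code below assumes 32-bit physical addresses (so the extra high bits handled under BX_PHY_ADDRESS_WIDTH > 32 are ignored) and skips the reserved-bit check; is_4m_page and paddr_4m are hypothetical names.

```cpp
#include <cstdint>

// A PDE maps a 4M page only if PDE.PS (bit 7) is set and CR4.PSE is enabled.
constexpr bool is_4m_page(std::uint32_t pde, bool cr4_pse) {
  return (pde & 0x80) != 0 && cr4_pse;
}

// Bits 31..22 come from the PDE, bits 21..0 are the offset inside the 4M page
// (the same 0x3fffff that the excerpt stores in lpf_mask).
constexpr std::uint32_t paddr_4m(std::uint32_t pde, std::uint32_t laddr) {
  return (pde & 0xffc00000u) | (laddr & 0x003fffffu);
}
```

With a PDE of 0x00800083 and a linear address of 0x00412345, the sketch yields the physical address 0x00812345: the frame 0x00800000 from the PDE plus the 22-bit offset 0x012345.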

The following updates the accumulated access rights with the current entry, using the & operator:

combined_access &= curr_entry; // U/S and R/W

Then the second iteration starts: leaf is decremented and reaches BX_LEVEL_PTE. The same steps (computing the entry address, reading memory, checking the P bit) are performed again, and finally the following statements end the loop.

    ppf = curr_entry & 0xfffff000;

    if (leaf == BX_LEVEL_PTE) break;

What is returned is ppf | combined_access, that is, the physical page frame and the combined read/write permissions corresponding to the linear address. The second half of this function consists of various permission checks, which we skip for now.

return (ppf | combined_access);
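
Putting the pieces together, the whole walk for a 4K page can be imitated with a toy memory model. This is a simplified sketch and not the Bochs implementation: it assumes a mask of 0xfffff000 for CR3, models physical memory as a sparse map (PhysMem, WalkResult and walk_legacy_4k are names invented here), reports a missing entry through a flag instead of raising a page fault, and leaves out large pages, reserved-bit checks and permission checks.

```cpp
#include <cstdint>
#include <unordered_map>

// Sparse toy "physical memory": address of a 4-byte entry -> entry value.
// Addresses that were never written read back as 0 (a not-present entry).
using PhysMem = std::unordered_map<std::uint32_t, std::uint32_t>;

struct WalkResult {
  bool present;          // false if a PDE or PTE had P = 0
  std::uint32_t paddr;   // translated physical address, valid only if present
  std::uint32_t access;  // combined R/W (0x2) and U/S (0x4) bits
};

inline std::uint32_t read_entry(const PhysMem& mem, std::uint32_t addr) {
  const auto it = mem.find(addr);
  return it == mem.end() ? 0 : it->second;
}

// Two-level walk: leaf 1 is the PDE, leaf 0 is the PTE.
inline WalkResult walk_legacy_4k(const PhysMem& mem, std::uint32_t cr3, std::uint32_t laddr) {
  std::uint32_t ppf = cr3 & 0xfffff000u;  // page directory base (assumed mask)
  std::uint32_t access = 0x2u | 0x4u;     // start with write + user allowed
  for (int leaf = 1;; --leaf) {
    const std::uint32_t entry_addr = ppf + ((laddr >> (10 + 10 * leaf)) & 0xffc);
    const std::uint32_t entry = read_entry(mem, entry_addr);
    if (!(entry & 0x1)) return WalkResult{false, 0, 0};  // not present
    ppf = entry & 0xfffff000u;  // base of next table, or final page frame
    access &= entry;            // every level can only remove rights
    if (leaf == 0) break;
  }
  return WalkResult{true, ppf | (laddr & 0xfffu), access};
}
```

As a usage example, store 0x00101007 at 0x100004 (the PDE) and 0x00800005 at 0x101000 (the PTE), then call walk_legacy_4k(mem, 0x100000, 0x400010). The result is present, with paddr 0x800010 and access 0x4, because the PTE in this example is read-only.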

access_write_physical(...) function analysis

The function call stack is as follows. There is not much to analyze in this part of the code; it is just a chain of simple function calls.

>	bochs.exe!WriteHostDWordToLittleEndian(unsigned int * hostPtr, unsigned int nativeVar32) Line 545	C++
 	bochs.exe!BX_MEM_C::writePhysicalPage(BX_CPU_C * cpu, unsigned __int64 addr, unsigned int len, void * data) Line 99	C++
 	bochs.exe!BX_CPU_C::access_write_physical(unsigned __int64 paddr, unsigned int len, void * data) Line 2599	C++
 	bochs.exe!BX_CPU_C::access_write_linear(unsigned __int64 laddr, unsigned int len, unsigned int curr_pl, unsigned int xlate_rw, unsigned int ac_mask, void * data) Line 2410	C++

One thing worth noting in the code below: in writePhysicalPage, the destination pointer is returned by get_vector(a20addr). You can see that guest physical memory is stored in the blocks[block] array, and that the offset inside a block is obtained through (Bit32u)(addr & (BX_MEM_BLOCK_LEN-1)).

An expression of the form mem & (0x1000 - 1) keeps only the low 12 bits of mem, i.e. the offset of mem within its 0x1000-byte block. This works because the block length is a power of two.

The blocks[..] array was explained in the earlier chapter on Bochs initialization. You can go back and read the relevant parts.

WriteHostDWordToLittleEndian((Bit32u*) BX_MEM_THIS get_vector(a20addr), *(Bit32u*)data);


BX_CPP_INLINE Bit8u* BX_MEM_C::get_vector(bx_phy_address addr)
{
  Bit32u block = (Bit32u)(addr / BX_MEM_BLOCK_LEN);
#if (BX_LARGE_RAMFILE)
  if (!BX_MEM_THIS blocks[block] || (BX_MEM_THIS blocks[block] == BX_MEM_THIS swapped_out))
#else
  if (!BX_MEM_THIS blocks[block])
#endif
    allocate_block(block);

  return BX_MEM_THIS blocks[block] + (Bit32u)(addr & (BX_MEM_BLOCK_LEN-1));
}
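
The block-plus-offset scheme with allocation on first use can be modelled in a few lines. This is only a sketch of the idea, not the Bochs class: BlockMemory and is_allocated are hypothetical names, the block length is assumed to be a power of two, and blocks are zero-filled when first touched.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy model of block-based guest memory with allocation on first access.
class BlockMemory {
 public:
  // block_len must be a power of two so that (block_len - 1) works as a mask.
  BlockMemory(std::size_t block_len, std::size_t block_count)
      : block_len_(block_len), blocks_(block_count) {}

  // Returns a host pointer for a guest physical address.
  std::uint8_t* get_vector(std::uint64_t addr) {
    const std::size_t block = static_cast<std::size_t>(addr / block_len_);
    std::unique_ptr<std::uint8_t[]>& slot = blocks_.at(block);  // throws if out of range
    if (!slot) {
      slot.reset(new std::uint8_t[block_len_]());  // allocate and zero-fill
    }
    return slot.get() + static_cast<std::size_t>(addr & (block_len_ - 1));
  }

  bool is_allocated(std::size_t block) const { return blocks_.at(block) != nullptr; }

 private:
  std::size_t block_len_;
  std::vector<std::unique_ptr<std::uint8_t[]>> blocks_;
};
```

With a block length of 0x1000, address 0x1234 falls into block 1 at offset 0x234. Only that block is allocated when the address is first touched; the others stay empty until they are needed.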

Finally, this is where the data is actually written:

BX_CPP_INLINE void WriteHostDWordToLittleEndian(Bit32u *hostPtr, Bit32u nativeVar32)
{
  *(hostPtr) = nativeVar32;
}
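
On a little-endian host the function above is a plain 32-bit store, because the host byte order already matches the guest's. As an illustration only, and not as Bochs code, the same little-endian layout could be produced on any host by storing byte by byte. The names write_dword_le and read_dword_le are hypothetical.

```cpp
#include <cstdint>

// Store a 32-bit value in little-endian order, independent of host endianness.
inline void write_dword_le(std::uint8_t* p, std::uint32_t value) {
  p[0] = static_cast<std::uint8_t>(value & 0xff);          // least significant byte first
  p[1] = static_cast<std::uint8_t>((value >> 8) & 0xff);
  p[2] = static_cast<std::uint8_t>((value >> 16) & 0xff);
  p[3] = static_cast<std::uint8_t>((value >> 24) & 0xff);  // most significant byte last
}

// Read it back the same way.
inline std::uint32_t read_dword_le(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Writing the value 4 from our test instruction produces the bytes 04 00 00 00 in guest memory.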

Summary

This completes our walk through Bochs's implementation. We have only covered the general flow here: we saw how translate_linear dispatches to the three paging modes, and next we will focus on the attribute and permission checks.

Several important functions

access_write_physical - write physical page

translate_linear - convert linear address to physical address

translate_linear_long_mode - LongMode mode page parsing

translate_linear_PAE - PAE mode analysis

translate_linear_legacy - non-PAE mode page parsing

The following is a masking trick picked up from the source code. OFFSET is defined as 12, meaning we want the low 12 bits: (1 << OFFSET) - 1 yields the mask 0xfff, and applying it to 0x1234 gives 0x234. We have run into this trick many times before but, through carelessness or lack of experience at the time, never really absorbed it. Looking at it now, it is quite clever.

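#include <iostream>
using namespace std;

#define OFFSET 12
int main(int argc, char const* argv[]) {

	int laddr = 0x1234;
	// (1 << OFFSET) - 1 == 0xfff, a mask that keeps the low 12 bits
	int offset = laddr & ((1 << OFFSET) - 1);
	cout << hex << offset << endl; // 234

	return 0;
}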

Added by bigswifty on Wed, 29 Dec 2021 01:10:37 +0200