Common API obfuscation methods in malicious code and how to handle them


1. Summary

When statically analyzing malicious code, we often find that the import table contains no imported functions. This usually means the malware has obfuscated its API calls. Many malware families obfuscate the APIs they use to resist static analysis, and once the APIs are obfuscated, static analysis yields very little useful information. Below, I summarize the API obfuscation methods that malware commonly uses and how to deal with them.

 

2. Common API obfuscation methods in malicious code

The first method: the malware builds its own IAT and implements functions similar to LoadLibrary and GetProcAddress. The arguments passed in are usually hashes of the DLL name and the function name. The resolved function addresses are stored in a pointer array, and the malware then calls each function through that array. The mailto ransomware described below belongs to this category.

The second method: the malware computes the real function entry point and jumps to it with a jmp instruction. The shellcode of the XShell backdoor uses this method.

The third method: the real function entry point is encrypted and stored in a global variable. When calling the function, the malware decrypts the global variable to recover the entry point. The xdata ransomware described below uses this method.

The fourth method: the DOS header is erased. Malware typically does this in shellcode, either to resist memory forensics tools or to keep a PE file injected into a process from being detected. The CCleaner backdoor described below uses this method.

 

3. Methods for handling API obfuscation

3.1 IDAPython

Taking the mailto ransomware as an example (MD5: 3D6203DF53FCAA16D71ADD5F47BDD060), we first analyze how the sample obfuscates its APIs. The sample builds its own IAT using two self-implemented functions: MwLoadDll obtains the base address of a DLL, and MwImportApi obtains the address of an imported function. MwLoadDll takes the hash of the DLL name as its parameter, and MwImportApi takes the module base address and the hash of the function name as its parameters.

MwLoadDll obtains the PEB through FS:[0x30] and reads the PPEB_LDR_DATA pointer at offset 0xC of the PEB structure. From offset 0x14 of the _PEB_LDR_DATA structure it obtains the InMemoryOrderModuleList linked list, a doubly linked list of LDR_MODULE structures. It traverses the list and checks whether the hash of each module name equals the hash passed in. If they match, it returns the base address of that module.
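The two offsets mentioned above can be checked against the 32-bit structure layouts. The following is a minimal sketch, not code from the sample: the class names `PEB32`, `PEB_LDR_DATA32` and `LIST_ENTRY32` are names chosen here for illustration, pointers are modeled as 4-byte integers so the layout matches an x86 process regardless of the host, and only the leading fields of each structure are declared.

```python
import ctypes

class LIST_ENTRY32(ctypes.Structure):
    # Doubly linked list node: forward and backward links (4-byte pointers on x86)
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class PEB_LDR_DATA32(ctypes.Structure):
    # Leading fields of _PEB_LDR_DATA for a 32-bit process
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),   # padded to 4 bytes by alignment
        ("SsHandle", ctypes.c_uint32),
        ("InLoadOrderModuleList", LIST_ENTRY32),
        ("InMemoryOrderModuleList", LIST_ENTRY32),
        ("InInitializationOrderModuleList", LIST_ENTRY32),
    ]

class PEB32(ctypes.Structure):
    # Leading fields of the PEB for a 32-bit process
    _fields_ = [
        ("InheritedAddressSpace", ctypes.c_uint8),
        ("ReadImageFileExecOptions", ctypes.c_uint8),
        ("BeingDebugged", ctypes.c_uint8),
        ("BitField", ctypes.c_uint8),
        ("Mutant", ctypes.c_uint32),
        ("ImageBaseAddress", ctypes.c_uint32),
        ("Ldr", ctypes.c_uint32),          # pointer to PEB_LDR_DATA
    ]

print(hex(PEB32.Ldr.offset))                                # 0xc
print(hex(PEB_LDR_DATA32.InMemoryOrderModuleList.offset))   # 0x14
```

Seeing these two constants (0xC, then 0x14) close to an FS:[0x30] access is a common hint that a routine is walking the loaded-module list by hand.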

MwImportApi walks the export table of the DLL module through its PE structure and checks whether the hash of each exported function name equals the hash passed in. When they match, it returns the address of that function, which the sample stores in the pointer array.
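The lookup logic described above can be sketched in a few lines of Python. This is an illustration of the idea only: `resolve_by_hash` is a hypothetical helper, the export table is a made-up dictionary with invented addresses, and `zlib.crc32` stands in for whatever hash a given sample uses.

```python
import zlib

def resolve_by_hash(exports, wanted_hash, hashfunc):
    """Return the address of the export whose name hashes to wanted_hash."""
    for name, address in exports.items():
        if hashfunc(name.encode("ascii")) == wanted_hash:
            return address
    return None  # no export matches the requested hash

# Hypothetical export table: name -> address (the addresses are invented)
exports = {"CreateFileW": 0x7C801A28, "ReadFile": 0x7C801812}

# The malware only ever stores the hash, never the readable name
wanted = zlib.crc32(b"ReadFile")
print(hex(resolve_by_hash(exports, wanted, zlib.crc32)))
```

Because only hashes appear in the binary, the analyst has to run the same computation over candidate DLL exports to recover the names.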

After that, the sample calls every function through this pointer array.

Next, we use IDAPython to resolve the APIs the sample has obfuscated.

First, for every call to MwImportApi we need the hash value passed in and the offset at which the return value is stored, and we must pair them one by one (the sample stores several of the offsets out of order to hinder the analyst).

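import idc
from idautils import XrefsTo

apilist = []  # apilist[slot] = hash of the API stored in that pointer-array slot
for addr in XrefsTo(idc.get_name_ea_simple("MwImportApi"), 0):
    if addr.frm > 0x04013C8:  # compare as integers, not as hex strings
        argaddr = addr.frm - 9       # instruction carrying the hash argument
        offsetaddr = addr.frm + 14   # instruction carrying the pointer-array offset
        offsetarg = idc.get_operand_value(offsetaddr, 0)
        arghash = idc.get_operand_value(argaddr, 0)
        index = offsetarg // 4       # 4-byte pointers: offset -> slot index
        if index >= len(apilist):    # slots are not filled in order, so grow on demand
            apilist.extend([0] * (index + 1 - len(apilist)))
        apilist[index] = arghash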

Once we have these pairs, we implement the sample's hash algorithm in Python. In IDA it is easy to see that mailto uses the CRC32 algorithm. We then walk the export table of each DLL, compute the hash of every exported name, check whether it equals a hash stored in the list, and if so write the exported function name into the corresponding slot of the list.

import pefile

def HashExportNames(pe_path, apilist, hashfunc):
    # Replace each hash in apilist with the export name that produces it
    pe = pefile.PE(pe_path, fast_load=False)
    for entry in pe.DIRECTORY_ENTRY_EXPORT.symbols:
        if entry.name is not None:  # skip exports that have an ordinal but no name
            apiname = entry.name.decode("ascii")
            apihash = hashfunc(apiname)  # hashfunc returns a hex string
            inthash = int(apihash, 16)
            if inthash in apilist:
                listidx = apilist.index(inthash)
                apilist[listidx] = apiname
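The hashfunc argument above is not shown in the article. A minimal sketch of one that fits the interface (takes a name, returns a hex string) follows. It assumes the sample's CRC32 is the standard variant implemented by `zlib.crc32` and that names are hashed as-is; neither is confirmed here, and a sample may change the initial value, the final XOR, or the case of the name. `crc32_hash` is a name chosen for illustration.

```python
import zlib

def crc32_hash(apiname):
    # Standard CRC32 of the ASCII name, returned as a hex string so that
    # a caller can convert it back with int(value, 16).
    # Assumption: the sample uses the unmodified zlib/PKZIP CRC32 variant.
    return hex(zlib.crc32(apiname.encode("ascii")) & 0xFFFFFFFF)

print(crc32_hash("123456789"))  # 0xcbf43926, the standard CRC32 check value
```

If the hashes produced this way never match any value collected from the sample, the first things to check are the initial value and whether the sample lowercases or uppercases the name before hashing.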

Finally, by walking the list that maps offsets to function names, we automatically generate a header file containing the structure of function pointers used by the sample. We import the header file into IDA and set the return type of MwGetApiAddr, the function that returns the pointer array, to a pointer to this structure.

with open(APIh_path, 'w') as f:
    f.write("typedef struct MwImportApis{ \n")
    for i in apilist:
        f.write("\tDWORD* %s;\n" % (i))
    f.write('}*PApis ;')
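The header-generation step above writes whatever is in the list, so a slot whose hash was never matched would come out as a bare number, which is not a valid C member name. The following sketch is one way to guard against that. `build_struct_header` and the `unresolved_` naming scheme are hypothetical, not part of the original script.

```python
def build_struct_header(apilist, struct_name="MwImportApis"):
    """Build the header text; unresolved slots get a placeholder member name."""
    lines = ["typedef struct %s {" % struct_name]
    for slot, entry in enumerate(apilist):
        if isinstance(entry, str):
            member = entry  # hash was resolved to an API name
        else:
            # Still a raw hash: encode the byte offset and hash in the name
            member = "unresolved_%02X_%08X" % (slot * 4, entry)
        lines.append("\tDWORD* %s;" % member)
    lines.append("} *PApis;")
    return "\n".join(lines) + "\n"

print(build_struct_header(["CreateFileW", 0xDEADBEEF]))
```

Keeping the offset and hash in the placeholder name makes it easy to see in IDA which slots still need a DLL to be scanned.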

Alternatively, use IDAPython to create the structure directly:

sid = idc.add_struct(-1, "MwImportApis", 0)
for i in apilist:
    # one 4-byte DWORD member per API, appended at the end of the structure
    idc.add_struct_member(sid, i, -1, idc.FF_DATA | idc.FF_DWORD, 0, 4)

At this point, the API obfuscation is resolved.

[Figure: after deobfuscation]

One imperfection remains after the steps above: the function parameters are not displayed. To make static analysis in IDA easier, we can use IDAPython to set a function type for each function pointer in the structure.

First, we look up the structure in IDA by its name:

sid = idaapi.get_struc_id("MwImportApis")
struc = idaapi.get_struc(sid)

Then we enumerate all member variables in the structure

def enum_members(struc):
    idx = 0
    while idx != -1:
        member = struc.get_member(idx)
        yield member
        idx = idaapi.get_next_member_idx(struc, member.soff)

Finally, we use ida_typeinf.get_named_type to get the function type and the IDA API parse_decl2 to parse it. Once the member in the structure is set to this type, IDA identifies the function parameters, which makes the subsequent analysis of the sample easier.

def set_member_type_info(struc, member, decl):
    ti = idaapi.tinfo_t()
    idaapi.parse_decl2(None, decl, ti, 0)
    idaapi.set_member_tinfo(struc, member, 0, ti, 0)

3.2 Emulation

During static analysis in IDA, you sometimes want to call a function in the malicious code and observe what it does, for example string decryption or API deobfuscation. Emulation is a good choice for this. Commonly used tools are flare-emu and qiling. Here we use qiling to process the mailto sample.

We do not need to emulate the entire sample; we only need to emulate the code that resolves the APIs.

We use qiling to hook the return address of MwImportApi and read eax, which holds the function's return value. We then look this address up in qiling's import_symbols to get the function name.

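from qiling import *

def extract_func_name(ql):
    # eax holds the return value of MwImportApi, i.e. the resolved API address
    eax = ql.reg.eax
    func = ql.loader.import_symbols[eax]
    func_name = func["name"].decode("ascii")
    print(f"found {func_name} ")

ql = Qiling(["H:/qiling/examples/bin/mailto.bin"], "B:/qiling_rootfs/x86_windows")
# Hook the return address of MwImportApi
ql.hook_address(extract_func_name, 0x040121A)

# Emulate only the address range that resolves the APIs
ql.run(begin=0x0401360, end=0x0402512)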

We then match each function name to its offset in the pointer array, as above. That step is not repeated here.

Emulation is time-consuming, and functions that perform many encryption and decryption operations take even longer to emulate.

3.3 RemoteLookup

RemoteLookup is a tool developed by FireEye. It enumerates all DLLs loaded by a process, computes the API addresses, and builds a lookup table. First, inside the virtual machine, we select the malware process in the tool and check Allow Remote Queries (port 9000).

Then, from Python on the host, we send an API address to RemoteLookup.exe in the virtual machine, which searches its lookup table and returns the function name. We then rename the address in IDA to that function name.

Take the xdata ransomware as an example (MD5: A0A7022CAA8BD8761D6722FE3172C0AF). First, a brief introduction to how xdata obfuscates its APIs: xdata stores each function address, XORed with a key, in a global variable.

When the sample calls a function, it XORs the global variable with the key to recover the real entry point of the function.
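The decoding step can be sketched offline as follows. The key 0x43C1FBF5 is the value the script later in this section uses as `base`; the function name `decode_pointers` and the test addresses are invented for illustration, and the sketch assumes the encoded pointers are stored as consecutive little-endian DWORDs.

```python
import struct

XOR_KEY = 0x43C1FBF5  # the value used as `base` in the script later in this section

def decode_pointers(blob, key=XOR_KEY):
    """Decode a run of little-endian DWORDs that were XORed with key."""
    count = len(blob) // 4
    values = struct.unpack("<%dI" % count, blob[:count * 4])
    return [value ^ key for value in values]

# Round trip with invented addresses: encode them the way the sample would,
# then recover them from the raw bytes.
real = [0x7C801A28, 0x7C80180E]
blob = struct.pack("<2I", *(address ^ XOR_KEY for address in real))
print([hex(address) for address in decode_pointers(blob)])
```

XOR is its own inverse, so the same routine both encodes and decodes. The recovered addresses are only meaningful inside the process they were taken from, which is why they still have to be resolved against the DLLs loaded in that process.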

The following describes how to use RemoteLookup to handle xdata's obfuscated APIs.

First, we use OllyDbg (OD) to obtain the encrypted API entry points stored in the global variables.

We store these values in a file. We then modify the example script provided with RemoteLookup so that it reads the encrypted values in turn, XOR-decrypts them to obtain the real function entry points, and sends each entry point to RemoteLookup.exe in the virtual machine. RemoteLookup returns the function name from its lookup table, and we rename the global variable in IDA to the corresponding function name.

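# remote and get_encoded_address come from the modified RemoteLookup example script
if remote.attach(6436):  # PID of the malware process in the virtual machine
    start = 0x04121C0    # first global variable holding an encoded pointer
    end = 0x41248C
    base = 0x43C1FBF5    # XOR key

    while start < end:
        temp = get_encoded_address(start)
        real_address = base ^ temp
        addr = hex(real_address)[2:10]  # hex digits without the 0x prefix
        if remote.resolve(addr):
            result = remote.response
            # accept the reply if it reports no error, or if "Error" only
            # appears because the API name itself contains "Win32Error"
            if "Error" not in result or "Win32Error" in result:
                result = result.replace(' ', '')
                Name = result.split(',')[1] + '0'
                idc.MakeName(start, Name)
            else:
                print(result)
        else:
            print("Failed:" + remote.response)
        start += 4
else:
    print('Failed to attach to pid')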

[Figure: before processing]

[Figure: after processing]

3.4 Repairing the PE header

The shellcode decrypted by the CCleaner backdoor (MD5: ef694b89ad7addb9a16bb6f26f1efaf7) is a DLL file whose DOS header has been erased. When the file is loaded into IDA directly, IDA cannot identify the APIs it calls. We can repair the PE file manually or with a tool, after which IDA can identify the APIs. Alternatively, we can use the Volatility memory forensics framework to obtain the APIs and addresses used by the shellcode and then export an IDC script that names the APIs automatically.
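A manual repair can be as small as restoring the two DOS header fields a loader actually needs. The sketch below is an illustration under stated assumptions, not the procedure used for this sample: it assumes only the 64-byte DOS header was wiped, that the NT headers (including the "PE\0\0" signature) are intact, and that the first such signature at or after offset 0x40 is the real one. `repair_dos_header` is a hypothetical helper.

```python
import struct

def repair_dos_header(image):
    """Return a copy of image with a minimal DOS header restored."""
    data = bytearray(image)
    # Assumption: the NT headers survived, so their signature can be located
    pe_off = data.find(b"PE\x00\x00", 0x40)
    if pe_off == -1:
        raise ValueError("PE signature not found; the NT headers may be erased too")
    data[0:2] = b"MZ"                            # e_magic
    struct.pack_into("<I", data, 0x3C, pe_off)   # e_lfanew -> offset of NT headers
    return bytes(data)

# Tiny synthetic buffer: zeroed DOS header, then a PE signature at 0x80
image = bytes(0x80) + b"PE\x00\x00" + b"\x4c\x01" + bytes(32)
fixed = repair_dos_header(image)
print(fixed[:2], hex(struct.unpack_from("<I", fixed, 0x3C)[0]))
```

If the signature search fails or hits a false positive, the offset of the NT headers has to be found by hand, for example by looking for a plausible Machine value and section table.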

[Figure: before repair]

[Figure: after repair]

 

4. Summary

The methods above are enough to handle most malicious code. Some of them apply only in limited scenarios, so choose the method according to the kind of API obfuscation you are facing.

This is the end of the article. Thank you for reading.


Keywords: Python Programming HashMap

Added by wgordonw1 on Thu, 10 Feb 2022 04:53:37 +0200