First Steps with Unicorn Engine

muhe

2018-01-15

Study notes

0x00: About Unicorn Engine

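Unicorn Engine is an emulator. Put simply, it can emulate the execution of a whole program or just a code snippet. That makes it very useful in reverse engineering, for example to work out what a particular piece of code does. For vulnerability hunters, the recent unicorn-afl was a real highlight, though it deserves a deeper look another time.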

0x01: About this post

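By coincidence, Xuanwu Lab's daily push today featured a Unicorn Engine tutorial. I found it very well written, the author is fun to read, and the article even hands out homework, ha. I had no time at work, so after getting home I read through it, followed the first example, and used the author's hints to complete the two homework tasks.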

0x02: Shellcode analysis

The author provides a piece of obfuscated shellcode. Loading it straight into a disassembler does not reveal what it does.

shellcode = "\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80"

# muhe @ muheMacBookPro in /tmp [22:38:29]
$ python -c 'shellcode = "\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80";print shellcode' > sc.dump

# muhe @ muheMacBookPro in /tmp [22:38:37]
$ file sc.dump
sc.dump: data
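The one-liner above is Python 2. Its `print` appends a newline, which is why an extra 0x0a byte shows up right after the last shellcode byte (0x80 at offset 0x73) in the r2 listing below. Under Python 3, `print shellcode` is a syntax error, and printing a str would re-encode every byte at or above 0x80 as UTF-8. The following is a minimal Python 3 sketch that writes the raw bytes instead. The temp-directory path and the helper name `dump_shellcode` are my own choices.

```python
import os
import tempfile

# The same 116 bytes as the shellcode string above, as a bytes literal
SHELLCODE = (
    b"\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03"
    b"\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe"
    b"\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0"
    b"\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37"
    b"\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96"
    b"\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80"
)


def dump_shellcode(path, data=SHELLCODE):
    # "wb" writes the bytes untouched: no text encoding and no trailing newline
    with open(path, "wb") as f:
        f.write(data)


path = os.path.join(tempfile.gettempdir(), "sc.dump")
dump_shellcode(path)
print(path, os.path.getsize(path))
```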

Disassembling it with r2:

[0x00000000]> pd
            0x00000000      e8ffffffff     call 4
            0x00000005      c05d6a05       rcr byte [rbp + 0x6a], 5
            0x00000009      5b             pop rbx
            0x0000000a      29dd           sub ebp, ebx
            0x0000000c      83c54e         add ebp, 0x4e               ; 'N'
            0x0000000f      89e9           mov ecx, ebp
            0x00000011      6a02           push 2                      ; 2
            0x00000013      030c24         add ecx, dword [rsp]
            0x00000016      5b             pop rbx
            0x00000017      31d2           xor edx, edx
            0x00000019      66ba1200       mov dx, 0x12                ; 18
        ┌─> 0x0000001d      8b39           mov edi, dword [rcx]
        ⁝   0x0000001f      c1e710         shl edi, 0x10
        ⁝   0x00000022      c1ef10         shr edi, 0x10
        ⁝   0x00000025      81e9feffffff   sub ecx, 0xfffffffe
        ⁝   0x0000002b      8b4500         mov eax, dword [rbp]
        ⁝   0x0000002e      c1e010         shl eax, 0x10
        ⁝   0x00000031      c1e810         shr eax, 0x10
        ⁝   0x00000034      89c3           mov ebx, eax
        ⁝   0x00000036      09fb           or ebx, edi
        ⁝   0x00000038      21f8           and eax, edi
        ⁝   0x0000003a      f7d0           not eax
        ⁝   0x0000003c      21d8           and eax, ebx
        ⁝   0x0000003e      66894500       mov word [rbp], ax
        ⁝   0x00000042      83c502         add ebp, 2
        ⁝   0x00000045      4a85d2         test rdx, rdx
        └─< 0x00000048      0f85cfffffff   jne 0x1d
            0x0000004e      ec             in al, dx
            0x0000004f      37             invalid
        ┌─< 0x00000050      755d           jne 0xaf
       ┌──< 0x00000052      7a05           jp 0x59
       ││   0x00000054      28ed           sub ch, ch
       ││   0x00000056      24ed           and al, 0xed
       ││   0x00000058      24ed           and al, 0xed
        │   0x0000005a      0b887feb5098   or ecx, dword [rax - 0x67af1481]
        │   0x00000060      38f9           cmp cl, bh
        │   0x00000062      5c             pop rsp
        │   0x00000063      96             xchg eax, esi
        │   0x00000064      2b9670fec6ff   sub edx, dword [rsi - 0x390190]
        │   0x0000006a      c6             invalid
        │   0x0000006b      ff9f321f581e   lcall [rdi + 0x1e581f32]
        │   0x00000071      00d3           add bl, dl
        │   0x00000073      800aff         or byte [rdx], 0xff
        │   0x00000076      ff             invalid
        │   0x00000077      ff             invalid
        │   0x00000078      ff             invalid
        │   0x00000079      ff             invalid
        │   0x0000007a      ff             invalid
        │   0x0000007b      ff             invalid
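This listing is also misleading for two reasons.

- **Wrong mode.** r2 disassembled the bytes as 64-bit code (note rbp, rsp and rdx). For example, 0x4a at 0x45 became a REX prefix for `test rdx, rdx`, whereas in 32-bit mode it is `dec edx`. In r2, `e asm.bits=32` switches the mode.
- **Overlapping instructions.** The first instruction is a `call` with displacement -1. It lands on offset 4, the last byte of the call itself, so execution continues with `ff c0` (inc eax) and then `5d` (pop ebp), which pops the pushed return address into ebp.

The small sketch below only computes that call target. The helper name `call_rel32_target` is hypothetical.

```python
import struct


def call_rel32_target(addr, insn):
    """Destination of an x86 `call rel32` (opcode E8) located at `addr`."""
    if len(insn) < 5 or insn[0] != 0xE8:
        raise ValueError("not a call rel32")
    rel = struct.unpack("<i", insn[1:5])[0]  # signed 32-bit displacement
    return addr + 5 + rel                     # relative to the next instruction


head = b"\xe8\xff\xff\xff\xff\xc0\x5d"        # first 7 bytes of the shellcode
target = call_rel32_target(0, head)
print(target, head[target:target + 2].hex())  # bytes executed after the call
```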

Indeed, nothing useful can be read from it. But the author notes:

Note that the architecture is x86-32 now. List of syscalls numbers can be found here.

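So the code is 32-bit, and it does its work through system calls.
That means I can follow the example in the article: emulate the code, hook the system calls, print their arguments, and then skip over them.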

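According to the references, the syscall number goes in eax and the arguments go in ebx, ecx, edx, esi, edi, in that order.
The hook below intercepts the int 80h instruction and reports those registers.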

My hook function:

def hook_code(mu, address, size, user_data):
    # UC_HOOK_CODE callback: runs before each instruction is executed
    op_code = mu.mem_read(address, size)
    if op_code == b"\xcd\x80":                      # int 0x80
        call_number = mu.reg_read(UC_X86_REG_EAX)   # syscall number
        param1 = mu.reg_read(UC_X86_REG_EBX)
        param2 = mu.reg_read(UC_X86_REG_ECX)
        param3 = mu.reg_read(UC_X86_REG_EDX)
        param4 = mu.reg_read(UC_X86_REG_ESI)
        param5 = mu.reg_read(UC_X86_REG_EDI)

        print ("[*]Result as followed:")

        print ("\tCall number: {0}".format(call_number))
        print ("\tParam1     : {0}".format(param1))
        print ("\tParam2     : {0}".format(param2))
        print ("\tParam3     : {0}".format(param3))
        print ("\tParam4     : {0}".format(param4))
        print ("\tParam5     : {0}".format(param5))

        # move EIP past int 0x80 so the real syscall is never executed
        mu.reg_write(UC_X86_REG_EIP, address + size)

Output:

$ python task1.py
[*]Result as followed:
	Call number: 15
	Param1     : 4194392
	Param2     : 438
	Param3     : 0
	Param4     : 0
	Param5     : 32979
[*]Result as followed:
	Call number: 1
	Param1     : 4194392
	Param2     : 438
	Param3     : 0
	Param4     : 0
	Param5     : 32979

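The fourth and fifth parameters are probably unused. The first call is syscall 15 and the second is syscall 1; looking them up, 15 is chmod and 1 is exit.
chmod takes a file name and a mode. exit's argument is 4194392, the same value as chmod's first argument, so it looks like whatever was left in ebx rather than a meaningful exit status.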

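What remains is to find out which file chmod operates on. 4194392 (0x400058) should be a pointer, so modify the hook function to read the string it points to: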


        print ("\tCall number: {0}".format(call_number))
        if call_number == 15:                       # chmod: ebx points to the path
            path = mu.mem_read(param1, 32).split(b"\x00")[0].decode()
            print ("\t[*]File is {0}".format(path))
        else:
            print ("\tParam1     : {0}".format(param1))

[*]Result as followed:
	Call number: 15
	[*]File is /etc/shadow
	Param2     : 438
	Param3     : 0
	Param4     : 0
	Param5     : 32979
[*]Result as followed:
	Call number: 1
	Param1     : 4194392
	Param2     : 438
	Param3     : 0
	Param4     : 0
	Param5     : 32979

chmod's second argument, 438, is simply 0666 in octal:

>>> oct(438)
'0666'
>>>
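The REPL session above is Python 2, which prints octal as '0666'; Python 3 prints '0o666'. The short sketch below spells out what the mode means using the standard `stat` module: read and write for owner, group and others. In other words, the shellcode makes /etc/shadow world-readable and world-writable.

```python
import stat

mode = 438
print(oct(mode))  # Python 3 spelling of 0666

# 0666 = read + write for owner, group and others
rw_all = (stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP |
          stat.S_IWGRP | stat.S_IROTH | stat.S_IWOTH)
print(mode == rw_all, stat.filemode(stat.S_IFREG | mode))
```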

That concludes the analysis: the shellcode calls chmod("/etc/shadow", 0666) and then exit.
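As a static cross-check of the emulation result, the decode loop at 0x1d-0x48 can be replayed without Unicorn. This is my own reading of the 32-bit code, not the tutorial's method. It works as follows:

- ebp starts at base+0x4e and ecx at base+0x50, one word ahead.
- dx = 0x12 gives 18 iterations.
- Each round computes (a | k) & ~(a & k) on 16-bit words, which is just a ^ k, and writes the result back over a.

The helper name `decode_payload` is hypothetical. The last assertion assumes the author's script mapped the shellcode at 0x400000, which is inferred from the pointer value 4194392 = 0x400058.

```python
SHELLCODE = (
    b"\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03"
    b"\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe"
    b"\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0"
    b"\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37"
    b"\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96"
    b"\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80"
)


def decode_payload(sc, start=0x4E, count=0x12):
    """Replay the shellcode's decode loop statically and return the decoded bytes."""
    buf = bytearray(sc)
    for i in range(count):
        a_off = start + 2 * i            # word at ebp
        k_off = a_off + 2                # word at ecx, not decoded yet
        a = int.from_bytes(buf[a_off:a_off + 2], "little")
        k = int.from_bytes(buf[k_off:k_off + 2], "little")
        # same or/and/not sequence as the shellcode; equivalent to a ^ k
        buf[a_off:a_off + 2] = ((a | k) & ~(a & k)).to_bytes(2, "little")
    return bytes(buf[start:start + 2 * count])


payload = decode_payload(SHELLCODE)
print(payload.hex())
```

Read as 32-bit code, the 36 decoded bytes form the following sequence:

1. `cdq; push 0xf; pop eax; push edx`
2. A `call` that jumps over the string "/etc/shadow", pushing the string's address.
3. `pop ebx; push 0x1b6; pop ecx; int 0x80`, which performs chmod.
4. `push 1; pop eax; int 0x80`, which performs exit with ebx unchanged.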

0x03: Changing a function's return value

Change the program below so that super_function returns 1.

int strcmp(char *a, char *b)
{
    //get length
    int len = 0;
    char *ptr = a;
    while(*ptr)
    {
        ptr++;
        len++;
    }
    
    // compare strings
    for(int i=0; i<=len; i++)
    {
        if (a[i]!=b[i])
            return 1;
    }
    
    return 0;
}

__attribute__((stdcall))
int  super_function(int a, char *b)
{
    if (a==5 && !strcmp(b, "batman"))
    {
        return 1;
    }
    return 0;
}

int main()
{
    super_function(1, "spiderman");
}
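The custom strcmp above returns 0 only on an exact match, because it compares up to and including a's terminating NUL. So super_function returns 1 only when a == 5 and b == "batman". The pure-Python model below is my own mirror of the C logic and is not part of the tutorial. It makes that easy to check; the names `c_strcmp` and `super_function` here are just the Python stand-ins.

```python
def c_strcmp(a: bytes, b: bytes) -> int:
    """Mirror of the custom strcmp above: 0 if equal, 1 otherwise (not libc semantics)."""
    a, b = a + b"\x00", b + b"\x00"   # C strings carry a terminating NUL
    length = a.index(b"\x00")          # the "get length" loop
    for i in range(length + 1):        # i <= len also compares the NUL
        if a[i] != b[i]:
            return 1
    return 0


def super_function(a: int, b: bytes) -> int:
    return 1 if a == 5 and not c_strcmp(b, b"batman") else 0


print(super_function(1, b"spiderman"), super_function(5, b"batman"))
```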

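This one is easy too: call super_function directly and, following the stack layout, simply supply different arguments. On 32-bit x86 both cdecl and stdcall push arguments from right to left, so the pointer to the string "spiderman" is pushed first.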

...
saved ebp
ret addr
1
ptr ---> "spiderman\0"
...

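Roughly as above (lower addresses at the top; saved ebp is there only after the function's prologue has run).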
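The diagram shows the frame after `push ebp`. When emulation starts at the function's first instruction, the layout is different:

- `[esp]` holds the return address.
- The arguments follow at esp+4 and esp+8.

That is why the script below writes 5 and the string pointer at those offsets. This sketch builds the same layout as a plain dict of little-endian dwords. The helper `stdcall_entry_stack` is hypothetical, and the example addresses mirror the script's r_esp (0x40080000) and STRING_ADDR (0x40000000).

```python
import struct


def stdcall_entry_stack(esp, ret_addr, args):
    """Return {address: 4-byte little-endian value} for the stack at function entry.

    [esp] is the return address; arguments follow at esp+4, esp+8, ...
    leftmost first, because they were pushed right to left.
    """
    frame = {esp: struct.pack("<I", ret_addr)}
    for i, arg in enumerate(args):
        frame[esp + 4 * (i + 1)] = struct.pack("<I", arg)
    return frame


frame = stdcall_entry_stack(0x40080000, 0, [5, 0x40000000])
for addr in sorted(frame):
    print(hex(addr), frame[addr].hex())
```

With Unicorn, the dict could be written out with `for addr, data in frame.items(): mu.mem_write(addr, data)`. A return address of 0 is harmless here because emulation stops at RET before it executes.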

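This part is fairly easy: compile the program yourself, then find the start and the end of super_function.

Because I compiled the binary on a Mac, the addresses differ from the tutorial's, so take care when writing the script. It is best to map the binary at the same start address IDA uses for the file. Then the addresses you read in IDA, for example when calling super_function, can be used directly.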
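Two details matter when choosing the mapping.

- **Flat copy.** The script copies the raw file to BASE, so an instruction ends up at BASE plus its file offset. IDA addresses can be reused only when they line up with that.
- **Alignment.** Unicorn's `mem_map` requires the address and size to be multiples of 4 KB. The fixed 1 MB mapping in the script satisfies that as long as the file is smaller than 1 MB.

A sketch for sizing the mapping from the file instead follows. The helper names are hypothetical.

```python
PAGE = 0x1000  # Unicorn's mem_map works in 4 KB units


def align_down(addr, page=PAGE):
    return addr & ~(page - 1)


def align_up(size, page=PAGE):
    return (size + page - 1) & ~(page - 1)


def mapping_for(load_addr, blob_len, page=PAGE):
    """(base, size) for mem_map that covers blob_len bytes loaded at load_addr."""
    base = align_down(load_addr, page)
    end = align_up(load_addr + blob_len, page)
    return base, end - base


print([hex(x) for x in mapping_for(0x10010, 0x1000)])
```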

from unicorn import *
from unicorn.x86_const import *
import struct


def read(name):
    with open(name, "rb") as f:           # binary mode: mem_write expects bytes
        return f.read()

def u32(data):
    return struct.unpack("<I", data)[0]   # little-endian dword

def p32(num):
    return struct.pack("<I", num)

mu = Uc(UC_ARCH_X86, UC_MODE_32)

BASE = 0x00000000
STACK_ADDR = 0x40000000
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, read("./function"))    # flat copy of the file: address = BASE + file offset
r_esp = STACK_ADDR + (STACK_SIZE // 2)    # ESP points to this address at function call

STRING_ADDR = 0x40000000
mu.mem_write(STRING_ADDR, b"batman\x00")  # "batman" at 0x40000000: bottom of the stack mapping, well below ESP

mu.reg_write(UC_X86_REG_ESP, r_esp)     # set ESP; [esp] (return address) stays 0, RET is never executed
mu.mem_write(r_esp+4, p32(5))           # first argument: the integer 5
mu.mem_write(r_esp+8, p32(STRING_ADDR)) # second argument: pointer to the string "batman"


mu.emu_start(0x0000057B, 0x000005B1)      # start at the beginning of super_function, stop at its RET
return_value = mu.reg_read(UC_X86_REG_EAX)
print("The returned value is: %d" % return_value)

# muhe @ muheMacBookPro in ~/Downloads [15:19:10]
$ python task2.py
The returned value is: 1

0x04: An ARM32 crackme

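This is similar to the first demo in the author's article, the CTF challenge, except that this time the architecture is 32-bit ARM; watch out for endianness.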
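Endianness decides how the same four bytes in memory turn into a register value. The script here uses little-endian ARM; for a big-endian target, Unicorn would need `UC_MODE_BIG_ENDIAN`. The sketch below shows the difference with `struct`. Note that the plain "I" format uses the host's native byte order, whereas "<I" pins little-endian so the helpers behave the same on any host.

```python
import struct

word = bytes.fromhex("78563412")            # four bytes as they sit in memory
little = struct.unpack("<I", word)[0]       # little-endian ARM or x86
big = struct.unpack(">I", word)[0]          # a big-endian ARM target
print(hex(little), hex(big))


def u32(data):
    return struct.unpack("<I", data)[0]     # always little-endian


def p32(num):
    return struct.pack("<I", num)
```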

int __cdecl __noreturn main(int argc, const char **argv, const char **envp)
{
  int v3; // r0

  v3 = ccc(0x2710u, (int)argv, (int)envp);
  printf((const char *)&unk_745A4, v3);
}

The goal is to use unicorn to compute this function's result without an ARM environment -.- (even though I do have one, lol).

I looked up how ARM passes arguments:

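Arguments are passed in r0-r3, and any extra ones go on the stack. The return value is placed in r0, or in {r0,r1} or {r0,r1,r2,r3} if it does not fit. For example:
int foo(int a, int b, int c, int d): inputs r0 = a, r1 = b, r2 = c, r3 = d; returns an int in r0.
int *foo(char a, double b, int c, char d): inputs r0 = a, r1 is skipped for alignment (a double needs 8-byte alignment), b = {r2, r3}, c at sp[0] and d at sp[4] on the stack, where sp is the value on entry to the function; returns an int * in r0.
Note that struct return values are a special case:
struct client foo(int a, char b, float c): inputs r0 = a struct client * supplied by the caller, r1 = a, r2 = b, r3 = c; returns a struct client *, the same one the caller passed in.
(These examples assume the base, soft-float procedure call standard; under the hard-float ABI, float and double arguments are passed in VFP registers instead.)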
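The core-register rules above can be written as a small allocator. This is a simplified sketch of the base (soft-float) rules as summarized here, not a full AAPCS implementation. It handles only 4-byte scalars (char, int, pointer, float) and 8-byte scalars (double, long long). It ignores hard-float VFP registers, structs and varargs. For the struct-return case, everything would shift one register because r0 holds the hidden pointer. The helper name `assign_core_args` is hypothetical.

```python
def assign_core_args(sizes):
    """Sketch of base (soft-float) AAPCS argument placement for word/doubleword scalars.

    sizes: argument sizes in bytes after promotion, 4 or 8.
    Returns one location tuple per argument.
    """
    ncrn = 0   # next core register number: r0..r3
    nsaa = 0   # next stacked argument offset, relative to sp at function entry
    locations = []
    for size in sizes:
        if size == 8:
            ncrn += ncrn % 2                 # doublewords start at an even register
            if ncrn <= 2:
                locations.append(("r%d" % ncrn, "r%d" % (ncrn + 1)))
                ncrn += 2
                continue
            nsaa = (nsaa + 7) & ~7           # 8-byte aligned stack slot
        elif size == 4:
            if ncrn < 4:
                locations.append(("r%d" % ncrn,))
                ncrn += 1
                continue
        else:
            raise ValueError("only 4- and 8-byte arguments are modelled")
        ncrn = 4                             # once anything goes on the stack, registers are done
        locations.append(("sp[%d]" % nsaa,))
        nsaa += size
    return locations


# int *foo(char a, double b, int c, char d)
print(assign_core_args([4, 8, 4, 4]))
```

The result reproduces the second example above: a in r0, r1 skipped, b in {r2, r3}, c at sp[0] and d at sp[4].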

from unicorn import *
from unicorn.arm_const import *
import struct


def read(name):
    with open(name, "rb") as f:     # binary mode: mem_write expects bytes
        return f.read()

def u32(data):
    return struct.unpack("<I", data)[0]

def p32(num):
    return struct.pack("<I", num)


mu = Uc(UC_ARCH_ARM, UC_MODE_ARM | UC_MODE_LITTLE_ENDIAN)   # 32-bit ARM (not Thumb), little-endian

BASE = 0x10000
STACK_ADDR = 0x300000
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, read("./task4_arm"))

mu.reg_write(UC_ARM_REG_SP, STACK_ADDR + STACK_SIZE // 2)

CCC_START = 0x000104D0
CCC_END   = 0x00010580

stack = []                                          # arguments of ccc calls that have not returned yet
d = {}                                              # memo table: argument -> return value

def hook_code(mu, address, size, user_data):
    if address == CCC_START:                        # entering ccc?
        arg0 = mu.reg_read(UC_ARM_REG_R0)           # first argument is passed in R0

        if arg0 in d:                               # result already known for this argument?
            ret = d[arg0]
            mu.reg_write(UC_ARM_REG_R0, ret)        # return value goes in R0
            mu.reg_write(UC_ARM_REG_PC, 0x105BC)    # jump to a "BX LR" to return from ccc immediately

        else:
            stack.append(arg0)                      # not cached yet: remember the argument

    elif address == CCC_END:
        arg0 = stack.pop()                          # R0 no longer holds the argument here, so take it from our stack

        ret = mu.reg_read(UC_ARM_REG_R0)            # return value (R0)
        d[arg0] = ret


mu.hook_add(UC_HOOK_CODE, hook_code)

mu.emu_start(0x00010584, 0x000105A8)

print("ret:{0}".format(mu.reg_read(UC_ARM_REG_R1)))   # R1 should hold v3, printf's second argument in main

# muhe @ muheMacBookPro in ~/Downloads [15:34:12]
$ python task4.py
ret:2635833876
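The hook above is memoization done at the instruction level. The tutorial this post follows used a recursive Fibonacci function for the same trick. Since ccc's body is not shown here, the sketch below uses fib as a stand-in and mirrors the hook's `d` table to show how many calls the cache saves. This only works if ccc's result depends solely on R0 and the function has no side effects. That is an assumption the hook makes implicitly.

```python
# fib() stands in for ccc, whose body is not shown in this post
calls = {"plain": 0, "memo": 0}


def fib_plain(n):
    calls["plain"] += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)


d = {}  # argument -> return value, the same role as `d` in the hook


def fib_memo(n):
    if n in d:                  # hook: cached -> set R0, jump to BX LR
        return d[n]
    calls["memo"] += 1          # hook: cache miss -> let the real body run
    ret = n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
    d[n] = ret                  # hook: at CCC_END, store R0 for this argument
    return ret


print(fib_plain(25), fib_memo(25), calls)
```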

0x05: References

Unicorn Engine tutorial

arm平台函数传递参数，反汇编实例分析 (ARM function argument passing, with disassembly examples)