Often, while analyzing a malicious document, for instance a malicious PDF file received as an attachment in a spear-phishing email, we find obfuscated JavaScript.
This obfuscated JavaScript is used to spray the heap of the process with shellcode and a NOP sled. The shellcode may be regular shellcode, which can be disassembled in IDA Pro or analyzed in a debugger. In some cases, however, when you load the shellcode in IDA Pro and disassemble it, the result does not look like valid x86 code.
This is because the shellcode we are trying to analyze is ROP shellcode: a sequence of gadget addresses and their parameters rather than directly executable instructions. Since most recent exploits have to bypass Data Execution Prevention (DEP) on the target, it is becoming more and more common to find ROP shellcode while analyzing malicious files.
To analyze ROP shellcode, we need to find the assembly code corresponding to each ROP gadget. This can be done by manually looking up each gadget in the corresponding module's address space, but that is tedious. To make the process more efficient, I wrote a C program that automatically extracts the opcodes of each ROP gadget from a module's address space.
After you dump the shellcode from the deobfuscated JavaScript into a file, inspect it either by opening it in IDA Pro and checking the disassembly or by opening it in a hex editor. This lets you confirm whether it is regular shellcode or ROP shellcode.
As an example, I have taken a malicious PDF file with the MD5 hash 975d4c98a7ff531c26ab255447127ebb, which was found in the wild exploiting CVE-2010-2883.
After dumping the shellcode into a file and opening it in a hex editor, we can see that it is not regular shellcode. I have highlighted some of the ROP gadgets:
In most cases, all the ROP gadgets are taken from a single non-ASLR module. In this case, as you can see, all the gadgets come from a module whose base address is 0x07000000.
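One rough way to automate the hex-editor check and the base-address observation above is to tally the top byte of every aligned DWORD in the dump. The sketch below is only an illustration and is not the tool described in this post. The names read_le32 and dominant_top_byte are made up for the example, and skipping DWORDs made of one repeated byte (likely sled or padding) is an assumption about how the dump is laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value, the byte order used on x86. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Tally the top byte of every 4-byte-aligned DWORD in the dump and return
 * the most frequent one; *hits receives how many DWORDs carried it.
 * Assumption: DWORDs made of one repeated byte (0x0c0c0c0c, 0x90909090,
 * 0x00000000) are filler and are skipped.
 */
uint8_t dominant_top_byte(const uint8_t *buf, size_t len, size_t *hits)
{
    size_t counts[256] = {0};
    size_t best = 0;

    assert(buf != NULL && hits != NULL);

    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t v = read_le32(buf + off);
        uint8_t low = (uint8_t)(v & 0xff);
        if (v == low * 0x01010101u)
            continue;               /* repeated byte: treat as filler */
        counts[v >> 24]++;
    }
    for (size_t i = 1; i < 256; i++) {
        if (counts[i] > counts[best])
            best = i;
    }
    *hits = counts[best];
    return (uint8_t)best;
}
```

If most of the non-filler DWORDs share a top byte such as 0x07, the dump is probably a ROP chain built from a module mapped in that region. Regular shellcode would not normally cluster this way, although this is a heuristic and not proof.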
Opening Adobe Reader in WinDbg, we can see that the BIB.dll module has the base address 0x07000000.
So, all the ROP gadgets in our case were taken from this module.
I wrote C code that scans the address space of a module, finds the opcodes corresponding to each ROP gadget, and dumps them to another file.
My code differentiates between ROP gadgets and the parameters passed to them. We then load the resulting file in IDA Pro and mark the appropriate sections as code and data.
We can now analyze the ROP shellcode much more efficiently.
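The original program is not reproduced here, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the module has been saved as it is mapped in memory (for example with WinDbg's .writemem), so that a gadget's offset in the image is simply its address minus the base. The names is_gadget, extract_gadget and MAX_GADGET_BYTES are hypothetical. It also finds the end of a gadget by scanning for a ret byte and does not disassemble anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GADGET_BYTES 16  /* hypothetical cap on how far to scan for a ret */

/* A chain entry is treated as a gadget if it points inside the module;
 * anything else is treated as a parameter (data). */
int is_gadget(uint32_t value, uint32_t base, size_t image_size)
{
    return value >= base && (uint64_t)value - base < image_size;
}

/*
 * Copy the bytes of the gadget at `addr` into `out`, up to and including the
 * first ret (C3) or ret imm16 (C2 xx xx). Returns the number of bytes copied,
 * or 0 if `addr` is outside the module, no ret is found within
 * MAX_GADGET_BYTES, or the image ends first.
 */
size_t extract_gadget(const uint8_t *image, size_t image_size, uint32_t base,
                      uint32_t addr, uint8_t out[MAX_GADGET_BYTES])
{
    assert(image != NULL && out != NULL);
    if (!is_gadget(addr, base, image_size))
        return 0;

    size_t start = (size_t)(addr - base);
    for (size_t i = 0; i < MAX_GADGET_BYTES && start + i < image_size; i++) {
        uint8_t op = image[start + i];
        size_t n = 0;

        if (op == 0xC3)
            n = i + 1;              /* ret */
        else if (op == 0xC2)
            n = i + 3;              /* ret imm16: opcode plus two bytes */

        if (n != 0) {
            if (n > MAX_GADGET_BYTES || start + n > image_size)
                return 0;
            memcpy(out, image + start, n);
            return n;
        }
    }
    return 0;
}
```

A caller would walk the shellcode dump one DWORD at a time, write the extracted bytes for entries where is_gadget is true, and write the raw DWORD for the rest, which gives the code and data split described above. Because this is a byte scan, a C3 that is really an operand (for instance the second byte of mov ebx, eax, encoded 89 C3) would cut a gadget short, so the output still needs to be checked in the disassembler.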
In some cases, we may need to step through the ROP shellcode to understand it better, which means debugging it. This can be done by setting a breakpoint on the first ROP gadget in the ROP chain.
As an example, I will use the same PDF, which can exploit Adobe Reader versions 9.0 through 9.4.0.
This malicious PDF contains multiple ROP shellcodes and selects one according to the version of Adobe Reader. We will now look at a ROP shellcode that uses gadgets from icucnv36.dll.
We open Adobe Reader in WinDbg. Enter g to let Adobe Reader run and observe that it loads more modules.
It is important to note that icucnv36.dll has not been loaded by Adobe Reader yet. If I try to set a breakpoint on the first ROP gadget now, WinDbg will not allow it, as shown below:
This is because we are trying to set a breakpoint at an address inside the address space of a DLL that has not yet been loaded.
We can automatically break into the debugger when this module is loaded with the command:
sxe ld icucnv36.dll
Now we can run the Adobe Reader process and open the malicious PDF; the moment it loads icucnv36.dll, we break into the debugger.
We can now set a breakpoint at the first ROP gadget successfully:
We can now resume the process; the moment the first ROP gadget is executed, we break into the debugger. If we inspect the registers, we can see that ESP points to 0x0c0c0c10.
The attacker was able to successfully switch the stack with the help of a stack pivot gadget.
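To see why ESP reads 0x0c0c0c10 and not 0x0c0c0c0c at this breakpoint, it helps to model what a pivot followed by a ret does. The sketch below is a toy model and is not taken from the sample. It assumes the pivot gadget loads ESP with 0x0c0c0c0c and then returns, and the gadget address 0x11223344 and the starting ESP are placeholders. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of the two registers that matter for a ROP chain. */
struct cpu {
    uint32_t esp;
    uint32_t eip;
};

/* Fake stack: `mem` holds the DWORDs that live at address `mem_base`. */
struct stack {
    uint32_t mem_base;
    const uint32_t *mem;
    size_t count;
};

/* Model `ret`: pop the DWORD at ESP into EIP and advance ESP by 4. */
void do_ret(struct cpu *c, const struct stack *s)
{
    assert(c->esp >= s->mem_base && (c->esp - s->mem_base) % 4 == 0);
    size_t idx = (c->esp - s->mem_base) / 4;
    assert(idx < s->count);
    c->eip = s->mem[idx];
    c->esp += 4;
}

/* Model a pivot gadget that sets ESP to an attacker-chosen value and then
 * returns. The exact instructions of the real gadget are not modeled. */
void do_pivot(struct cpu *c, uint32_t new_esp, const struct stack *s)
{
    c->esp = new_esp;
    do_ret(c, s);
}
```

With the chain placed at 0x0c0c0c0c, the pivot's own ret pops the first gadget address into EIP and leaves ESP four bytes further on, at 0x0c0c0c10. That is consistent with the register state observed when the breakpoint on the first gadget is hit.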
If we view the contents of memory at address 0x0c0c0c0c, we can see the entire ROP shellcode there: