|
Smashing Adobe's Heap Memory Management Systems for Profit.
2009.October.16
In-depth research on the recent PDF zero-day exploit
Research and Analysis: Haifei Li
Editors: Guillaume Lovet (Editing & Overview), Derek Manky
Introduction

As pointed out in a recent blog post, PDF vulnerabilities are receiving an increasing amount of attention in the security industry, matching cybercriminal patterns. Very recently, a new high-risk PDF zero-day vulnerability (CVE-2009-3459) was reported on Adobe's blog as being exploited in the wild as part of a targeted attack. As of writing, a vendor patch is available, and we highly recommend applying it. If immediate patching is not possible for some reason, security equipment (AV, IDS, IPS, etc.) must be adjusted to block potentially malicious PDF documents leveraging this vulnerability. To that end, this document provides an analysis of one malicious PDF file found in the wild, as well as in-depth insights into the vulnerability.

Overview

PDF files are mostly made of tags, parameters, and streams, and can include JavaScript code. This vulnerability stems from an integer overflow that occurs when Adobe Reader processes a particular parameter.
Integer overflows are fairly common, but turning one into arbitrary code execution is often tremendously difficult and requires considerable craft. Whoever is behind this exploit managed to do it, introducing in the process a rather innovative strategy (not a universal one, though: it works only on Adobe Reader). There are 5 essential steps in the exploit:
The steps above are studied in sections 1 and 2. Some would say that what happens once the attacker has gained control of the execution flow is of secondary interest: it pertains to the specifics of the attack studied here, not to the exploit in general. The payload of this attack is addressed in section 3.

Section 1: The Vulnerability

The parameter triggering the initial integer overflow (and thus the exploit) sits in the following section of the malicious PDF:
ParamX is the key parameter, and is set to 1073741838 (0x4000000E in hexadecimal). Note that the encoded stream is irrelevant, since by the time the overflow occurs it has already been decoded (i.e. "deflated"). The decoded, original stream is: 00 00 20 00 00 00 10.

After long and thorough debugging sessions, we finally reached the point where the key parameter (0x4000000E) is used:

Figure 1: The point of the vulnerability: Integer overflow

Figure 1 above shows that the process uses the parameter value 0x4000000E (held in ECX) at instruction 009F60A5h (highlighted in red). This instruction seemingly computes the amount of memory, in bytes, needed if one DWORD (4 bytes) is allocated per item, with the number of items given by the key parameter (which means that the parameter most likely represents an item count). That is why the parameter value is multiplied by 4, and this is where the integer overflow occurs: 0x4000000E*4 is 0x100000038, which does not fit in 32 bits and is truncated to 0x38.

Highlighted with a black background and red foreground in Figure 1 is instruction 009F60E5h, a call to the memory allocation routine using the overflowed value (the values of EDX and ECX at 009F60D8h are both 0):

acro_allocate_routine (0x4000000E*4+0x48) ==> acro_allocate_routine (0x80)

Note: We will discuss the "acro_allocate_routine" function in section 2.4.

As a result, a heap block of 0x80 bytes is allocated; it is then processed by function 009F7ED0, called at the highlighted instruction in Figure 2 below:

Figure 2: Before the heap overflow

This function is listed and analyzed below. To better understand the comments (which are the most important part of the listing), please note that:
Again, the decoded stream was "00 00 20 00 00 00 10", which means bits_original_stream is 0x38 (7*8). The listing follows:
So what does this function do? It may be easier to get a feel for its purpose from the following pseudo-code:
Essentially, the function slides over each DWORD in the buffer (remember, the buffer is supposed to contain one DWORD per item counted by ParamX), overwriting it with the result of a "mystery function" (mysterious for now, but addressed in section 2.1) at instruction 003A807F. This is where the heap overflow occurs, and this is the operational heart of the exploit. Because of the integer overflow, the buffer is actually much smaller than expected: it should contain one DWORD per item of ParamX, but it does not, so when the counter enumerates items, it runs past the end of the buffer.

It is interesting to note that the loop only exits when the counter reaches bits_original_stream: v_counter >= ParamX will never happen, because of the huge value of ParamX. We will see later that the bits of the original stream are consumed by the mystery function, one per item. The total length of the memory being written is therefore bits_original_stream * 4 (starting from offset 0x44):

0x38*4 = 0xE0

Considering that the heap block allocated earlier (Figure 1) is only 0x80 bytes long, the function writes 0xA4 (0x44+0xE0-0x80) bytes beyond the allocated heap block. Monitoring these 0xA4 bytes of overflowed memory shows that very few memory values actually changed. This is apparent in Figure 3 below:

Figure 3: After the heap overflow

Indeed, comparing it with Figure 2 above (before the heap overflow), we see that only 2 DWORDs were modified by the heap overflow (Note: offsets are relative to the end of the allocated heap block):
How and why those two slight modifications occur forms the logical heart of the exploit. This is addressed in the next section.

Section 2: The Exploitation

As mentioned above, the overflowed memory is set by the mystery function, which uses the (attacker-provided) decoded stream.

2.1: The Mystery Function
In short, the function first retrieves a bit from the decoded stream, using the current value of the counter (v_counter); as explained earlier, each item in the buffer can be considered to have a corresponding bit in the stream, so we will call this bit "corresponding_bit". This bit is then converted into a DWORD and added to the DWORD (representing an item) being processed:

new_DWORD = old_DWORD + corresponding_bit

Naturally, corresponding_bit can only be 1 or 0. Thus the new DWORD cannot be set to an arbitrary value: either it remains unchanged, or it is incremented by 1. That is all the attacker can control via the bitstream.

Since the original stream is '00 00 20 00 00 00 10', only the bits at positions 0x12 and 0x33 are set to 1. Thus only the two DWORDs located at positions 0x12 and 0x33 will be incremented. Relative to the end of the 0x80-length heap block, those DWORDs sit at the following offsets:
offset1 = 0x12*4 + 0x44 - 0x80 = 0x0C
offset2 = 0x33*4 + 0x44 - 0x80 = 0x90
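The arithmetic of sections 1 and 2.1 can be checked with a short Python sketch. It is a model built only from the figures quoted in the text (ParamX, the 0x48 and 0x44 constants, the 0x80 block, the decoded stream), not a reconstruction of Adobe Reader's code. All function and constant names are hypothetical, and the MSB-first bit ordering is an assumption chosen because it reproduces the bit positions 0x12 and 0x33 stated above.

```python
# Illustrative model only; names are hypothetical, values come from the text.
DECODED_STREAM = bytes.fromhex("00002000000010")  # 00 00 20 00 00 00 10
PARAM_X = 0x4000000E
FIRST_DWORD_OFFSET = 0x44   # offset of the first DWORD the loop touches
BLOCK_SIZE = 0x80           # size of the heap block actually allocated


def alloc_request(param_x: int) -> int:
    """Size computation param_x * 4 + 0x48, truncated to 32 bits."""
    return (param_x * 4 + 0x48) & 0xFFFFFFFF


def set_bit_positions(stream: bytes) -> list[int]:
    """Positions of the 1-bits in the stream (assumed MSB-first per byte)."""
    positions = []
    for index in range(len(stream) * 8):
        byte = stream[index // 8]
        if (byte >> (7 - index % 8)) & 1:
            positions.append(index)
    return positions


def offsets_past_block(stream: bytes) -> list[int]:
    """Offsets of the incremented DWORDs, relative to the end of the block."""
    return [pos * 4 + FIRST_DWORD_OFFSET - BLOCK_SIZE
            for pos in set_bit_positions(stream)]


def bytes_written_past_block(stream: bytes) -> int:
    """Number of bytes the loop touches beyond the allocated block."""
    total_bits = len(stream) * 8
    return FIRST_DWORD_OFFSET + total_bits * 4 - BLOCK_SIZE


if __name__ == "__main__":
    print(hex(alloc_request(PARAM_X)))
    print([hex(p) for p in set_bit_positions(DECODED_STREAM)])
    print([hex(o) for o in offsets_past_block(DECODED_STREAM)])
    print(hex(bytes_written_past_block(DECODED_STREAM)))
```

Changing a single bit of the stream in this model moves or adds an incremented DWORD, which shows how tightly the attacker's 7-byte stream is tied to the layout of the memory following the block.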
This matches what we observed at the end of section 1 above. So, what are those two DWORDs, and why increment them?

2.2: Taking Control of the Execution Flow

After many experiments, it appeared that, whatever the memory situation, one of these two DWORDs was a pointer. This pointer points to a C-style structure at a varying memory location; the structure is used while Adobe Reader renders the page within the PDF. Most of the time it is pointer2 (at offset2, 0x90) that points to the structure. An example is given below. First, we check that the pointer at offset 0x90 was incremented during the overflow:
Before the overflow, the structure's pointer is 0x0124da40; after the overflow, it is 0x0124da41. The following figure shows how the structure is referenced after the overflow:

Figure 4: Going into the shellcode

First, the structure address is put into EAX. Then the first DWORD of the structure is read. If this value is greater than 0x90, execution continues by reading the value stored at offset 0x90 (relative to the beginning of the structure): the value at this offset is read as a function pointer and put into EAX. Finally, if this function pointer is not NULL, execution is transferred to the function.

Thus, the attacker's real goal is to change the value of the pointer at offset 0x90 of the structure (note: it is just a coincidence that this offset is also 0x90, the same offset as that of the pointer to the structure relative to the heap block). This modified function pointer is used to gain control of the execution flow. To achieve this, the attacker can only increment the pointer to the structure (via the bitstream, see previous section), thus shifting all the values of the structure by one byte. He cannot give the function pointer inside the structure an arbitrary value, which makes this exploit quite crafty indeed.

This can be seen by comparing the memory under normal and attack (i.e. when the exploit occurred) conditions. Dumping the structure under normal conditions results in:
Under normal conditions, even though 0x10c is greater than 0x90, nothing happens, because the function pointer at offset 0x90 (address 0124dad0) is NULL. Under attack conditions, after the heap overflow (pay attention to endianness when shifting bytes):
Because 0x50000001 is greater than 0x90, and 0x50000000 is not zero, the control flow is transferred to 0x50000000.

It should be noted that, in our tests, it is sometimes pointer1 (offset 0x0C relative to the end of the 0x80-length heap block) that points to the structure, which is why the attacker increments that pointer as well. The attacker did not change any other DWORDs because changing those two is the minimum needed for the exploit to run. Changing any other DWORD may cause the application to crash before control of EIP is gained.

2.3: The Heap Spray

As pointed out above, the attacker does not precisely control where the final function pointer will branch. He can only have it point to 0x50000000. Luckily for him, this address generally belongs to the heap. Therefore, filling the heap with shellcode helps ensure that malicious code executes from 0x50000000. This is achieved by a classic heap spray in JavaScript (Figure 5):

Figure 5: The decoded Javascript using heap spray technology

The shellcode that this script sprays all over the heap is addressed in section 3, as it is not part of the exploit per se.

2.4: Smashing Adobe's Heap Memory Management System

Why use 0x80 as the bogus heap block allocation length? In other words, why select 0x4000000E as the vulnerable ParamX value?

Generally speaking, heap memory allocations on a Windows system eventually call the system API RtlAllocateHeap to request a new heap block. In such a situation, the memory values around the returned block are unpredictable for the application. However, Adobe Reader uses its own heap management routine: if the block's length is less than or equal to 0x80 bytes, the request is not submitted to the system-level heap management routine. Instead, Adobe Reader's own memory management routine finds a suitable recycled block matching the requested allocation size.
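To illustrate why a recycling allocator makes the neighbourhood of a small block predictable, here is a toy Python model. It is emphatically not Adobe Reader's routine: the class, its free-list layout, the fallback behaviour and the base address are all hypothetical. Only the 0x80 threshold and the ParamX size computation are taken from the text.

```python
# Toy model of a recycling allocator; everything except the 0x80 threshold
# and the size formula is a hypothetical illustration.
SMALL_BLOCK_LIMIT = 0x80


def request_size(param_x: int) -> int:
    """Size requested for ParamX items: param_x * 4 + 0x48, on 32 bits."""
    return (param_x * 4 + 0x48) & 0xFFFFFFFF


class ToyRecyclingAllocator:
    def __init__(self) -> None:
        self._recycled = {}            # size -> addresses freed by the app
        self._next_fresh = 0x01000000  # hypothetical base of fresh memory
        self.system_calls = 0          # stands in for system-level requests

    def allocate(self, size: int) -> int:
        # Small requests are served from recycled blocks when one is available.
        if size <= SMALL_BLOCK_LIMIT and self._recycled.get(size):
            return self._recycled[size].pop()
        # Otherwise fall back to a fresh block (the "system" path in this toy).
        self.system_calls += 1
        address = self._next_fresh
        self._next_fresh += size
        return address

    def free(self, address: int, size: int) -> None:
        # Small blocks are kept for reuse instead of being returned.
        if size <= SMALL_BLOCK_LIMIT:
            self._recycled.setdefault(size, []).append(address)


def demo():
    heap = ToyRecyclingAllocator()
    first = heap.allocate(0x80)
    neighbour = heap.allocate(0x80)   # sits right after `first` in this toy
    heap.free(first, 0x80)
    reused = heap.allocate(request_size(0x4000000E))
    return first, neighbour, reused, heap.system_calls
```

In this model, the block returned for the overflowed request is the one the application released a moment earlier, so whatever was placed after it is still there. A request of 0x81 bytes or more would take the fresh-block path instead, which is consistent with the choice of a ParamX value that yields exactly 0x80.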
Following is the relevant code in function "acro_allocate_routine" in "acrord32.dll":
It can be observed that it does not go into the system-level management routine to allocate a new heap block; instead, it reuses a heap block that has just been marked as "freed" by the application itself (in other words, the system does not actually free this memory). Because of this, the memory values around the reused block can be predicted fairly easily (or at least stay relatively stable). This provides a fairly effective method for exploiting heap-based overflow vulnerabilities in Adobe Reader.

Section 3: Post-Exploit Activity & Embedded Shellcode

The shellcode in the JavaScript piece does just one thing: locate and execute another shellcode in the PDF. Remarkably, the shellcode seeks out the PDF file handle by testing all available handles starting from 0, comparing the size of each to the target size using the API function GetFileSize:

Figure 6: Shellcode is searching for the file handle of the POC file

The original C code may have looked like:
Then, it uses the obtained file handle to locate and execute the new shellcode embedded within the PDF file:

Figure 7: Shellcode is jumping to another shellcode inside the POC

As a side note, this is not new. The "Phantom Team" used this "shellcode seek and execute" strategy in a malicious PDF over a year ago. The new shellcode will perform the following activities:
Following are the key points of the new shellcode:

Figure 8: Step #1 - Drop the exe file to the temp directory

Figure 9: Step #2 - Execute the dropped exe file

Figure 10: Step #3 - Drop the disguised PDF file

Figure 11: Step #4 - Open the disguised PDF with Adobe Reader

Conclusion

Taking a look back over this 0-day attack as a whole, each single part of it is somehow ingenious - whether it be the vulnerability, operational exploitation, the logic behind it, the way to leverage it or the final shellcode. The exploitation method through the heap overflow, which may also be used in future Adobe Reader exploits, is particularly innovative in itself.

FortiGuard Labs has released advisory FGA-2009-35 to respond to this issue, coordinating with Adobe's Security Advisory APSB09-15. Advanced zero-day protection has been available to our customers since October 9th, 2009. The vulnerability can be detected by our IPS signature "Adobe.Reader.Decode.Color.Remote.Code", while our antivirus detects the PDF exploit as "W32/Protux.GK!exploit" and the dropped executable file as "W32/Protux.GK!tr". We nonetheless recommend again that all Adobe Reader and Acrobat users update their application to the latest version as soon as possible.

Disclaimer: Although Fortinet has attempted to provide accurate information in these materials, Fortinet assumes no legal responsibility for the accuracy or completeness of the information. More specific information is available on request from Fortinet. Please note that Fortinet's product information does not constitute or contain any guarantee, warranty or legally binding representation, unless expressly identified as such in a duly signed writing.

About Fortinet ( www.fortinet.com ): Fortinet is the pioneer and leading provider of ASIC-accelerated unified threat management, or UTM, security systems, which are used by enterprises and service providers to increase their security while reducing total operating costs.
Fortinet solutions were built from the ground up to integrate multiple levels of security protection--including firewall, antivirus, intrusion prevention, VPN, spyware prevention and anti-spam -- designed to help customers protect against network and content level threats. Leveraging a custom ASIC and unified interface, Fortinet solutions offer advanced security functionality that scales from remote office to chassis-based solutions with integrated management and reporting. Fortinet solutions have won multiple awards around the world and are the only security products that are certified in six programs by ICSA Labs: (Firewall, Antivirus, IPSec, SSL, Network IPS, and Anti-Spyware). Fortinet is privately held and based in Sunnyvale, California. |