The blog continues at suszter.com/ReversingOnWindows

December 22, 2014

Use of refactor can make Visual C# 2013 crash

For a change, I'm working on a C# project that has little to do with security. I was happy to rediscover and use the refactor feature in Visual C# 2013 until the point it crashed. Here are the simplified steps to reproduce the crash.
  • Create an empty Visual C# project, or open any C# project.
  • Add a class like below.
  • Right-click on hello in hello = true, select Refactor, then select Remove Parameters...

  • Visual C# crashes.
Note that I was also able to trigger the crash another way, simply by using the hotkey Ctrl+R, Ctrl+V while the cursor is within hello in hello = true.

The bug reproduces with the up-to-date version of Visual C# 2013 Community Edition.

December 19, 2014

Variable-length permutation with repetition using backtracking

Recently, I needed an implementation of a search algorithm that meets the following requirements.
  1. It must be able to produce the variable-length permutation with repetition of the given set.
  2. It must be based on backtracking.
  3. It must be a state machine.
  4. It must have basic programming elements only.
  5. It must be clean, without irrelevant parts.
Even though I needed the final implementation in ActionScript, I searched for the algorithm in C, planning to convert it later, as I thought my odds would be better that way. Unluckily, I didn't find anything useful, so I wrote the code myself. Here is the sample implementation in C.

// Features:
// - Variable-length permutation with repetition
// - Backtracking
// - State machine

#include <stdio.h>
#include <stdbool.h>                             // bool (needed when compiling as C)

#define ITEMS            3                       // number of items
#define LASTITEM_IDX     (ITEMS-1)               // index of the last item
#define INIT             -1                      // initial value
#define EMPTY            0                       // empty

static char set[ITEMS]  = {1,2,3};               // set

// State
static char idx[ITEMS]  = {INIT,INIT,INIT};      // indices into set
static char out[ITEMS]  = {EMPTY,EMPTY,EMPTY};   // output

static int index = INIT;
static bool dir = 1;                             // direction: deep(1) / wide(0)
// State - END

bool backtrack();

// item to try
void add_item() {
    idx[index]++;
    out[index] = set[idx[index]];
}

bool step_wide() {
    if (index == INIT) {
        // reached initial level
        // no more items to take
        return false;
    }
    // way to go wider?
    if (idx[index] < LASTITEM_IDX) {
        add_item();          // take item
        return true;
    }
    else {
        return backtrack();  // backtrack a level higher
    }
}

bool backtrack() {
    // backtrack a level higher
    idx[index] = INIT;
    out[index] = EMPTY;
    index--;

    return step_wide();
}

bool step_deep() {
    // way to go deeper?
    if (index < LASTITEM_IDX) {
        index++;             // step a level deeper
        add_item();          // take item
        return true;
    }
    else {
        return step_wide();  // step wide
    }
}

// returns true if step is successful
// returns false if no more items to take
bool step() {
    if (dir) {
        return step_deep();
    } else {
        return step_wide();
    }
}

int main(int argc, char* argv[])
{
    while (step()) {
        for (int i=0; i<ITEMS && out[i]!=EMPTY; i++) {
            printf("%02x ", out[i]);
        }
        printf("\n");
    }

    return 0;
}

The source can be downloaded from here.

The output of the run looks like this.

01
01 01
01 01 01
01 01 02
01 01 03
01 02
01 02 01
01 02 02
01 02 03
01 03
01 03 01
01 03 02
01 03 03
02
02 01
02 01 01
02 01 02
02 01 03
02 02
02 02 01
02 02 02
02 02 03
02 03
02 03 01
02 03 02
02 03 03
03
03 01
03 01 01
03 01 02
03 01 03
03 02
03 02 01
03 02 02
03 02 03
03 03
03 03 01
03 03 02
03 03 03

August 9, 2014

Instrumenting Flash Player to Inspect JITted Pages for Integer Errors

In this blog post I describe the method I'm experimenting with to discover areas in Flash Player that may or may not be prone to integer errors.

I have 26k Flash files that are used as a corpus to generate the test samples for Flash Player. Each test sample has an element of 0x41424241 in the integer pool. In total, I have 344k generated files to test Flash Player with.

During the test, I use a pintool to instrument the JITted code. The pintool is based on this implementation. Since the elements in the integer pool are dereferenced by ActionScript, it makes sense to restrict the instrumentation to JITted code.

I use instruction-level instrumentation, which allows checking the register values at every single instruction executed. If any of the general registers holds the value 0x41424241 and the instruction references that register, the instruction information is logged along with the general registers.

The pintool pre-allocates memory at the address 0x41424241 so that Flash Player won't use that address, which reduces the number of irrelevant lines in the log.

There is no need to wait for the test to finish to get partial results. A log file is generated for each test file rendered by Flash Player.

The sizes of the log files vary. Some are close to 0, especially if the value above makes the code fail early. Many log files are about 4k in size. When execution keeps going for long, the size is about 16k. If the value is involved in a loop, the log can reach 100k, but that's rare.

Logs can be grouped and many can be thrown away as they don't contain instructions associated with vulnerabilities.

What I look for are instructions such as signed shift, addition, subtraction, or multiplication. If the value is used as a displacement in a lea instruction, that counts as suspicious too.

Once an address in the log is chosen for closer inspection, I reproduce the log in isolation, with an option to dump the JITted pages, so I can manually analyze the area surrounding that address in a disassembler. Knowing the state information, it's also possible to debug the code.

If the results are positive, a certain level of automation can be added.

July 30, 2014

Practical Suggestions for Writing a Pintool

This is my list of practical suggestions for people developing a pintool. Since I have dealt with these issues before, I thought I'd jot them down to help others. Applying them should bring you somewhat closer to keeping your pintool from terminating unexpectedly.

Start from scratch. Suppose you are using a sample pintool to develop your own. Rather than modifying the sample, start with an empty project and gradually build it up by taking elements from the sample.

Simplicity. Keep the code-base small and easy to understand.

Testing. As part of development, aim to test whether all blocks have been exercised. Refrain from adding unreachable blocks.

Errors. Check for errors as early as possible, especially when returning from a Pin API.

Safe memory dereference. Whenever you have to dereference the target's memory, use PIN_SafeCopy. Use this function even when you just want to read an integer, rather than the dereference "*" operator.

Thread safety. Be aware that the target may be running multiple threads. You probably want your pintool to be thread safe.

Multi-threading. Sometimes you want your variables to be stored in the thread context to have the ability to distinguish the analysis between threads. In that case looking at the sample inscount_tls.cpp is a good start.

Probe mode. Use of probe mode is always preferred as it gives better performance. However, only limited Pin facilities are available in probe mode.

Limit instrumentation. Consider restricting the instrumentation to particular routines or libraries; you can even skip the instrumentation of shared libraries to get better performance.

Standard library. It's a good idea to use the C++ standard library in a pintool, as it provides the most frequently needed data structures.

Visual Studio. A Visual C++ project file is available with the Pin framework in the MyPinTool folder. Alternatively, you can create one yourself after looking at an earlier post.

Trace vs Ins. Instruction instrumentation is practically the same as trace instrumentation. You can do instruction analysis from the trace by iterating through the instructions.

Output. Having output routines in Fini makes the application run faster than having them in analysis functions. However, if the application terminates unexpectedly and Fini is not called, no results will be shown. Having output routines in analysis functions makes the application run slower, but if the application terminates unexpectedly, partial results may still be shown.

July 26, 2014

Inspection of SAR Instructions

SAR stands for Shift Arithmetic Right, and the instruction performs an arithmetic shift. It preserves the sign of the value being shifted, so the vacant bits are filled according to the sign bit.

Compilers generate a SAR instruction when the right shift operator ">>" is used on a signed integer.

The use of a SAR instruction can lead to a signedness bug if the shift is assumed to be unsigned.

Consider the following simplified example.

char retItem(char* arr, int value)
{
    return arr[value>>24];
}

If value is non-negative, the code works as expected. However, if value is negative, the index is negative too, and the program can read outside the bounds of arr.

Another example would be comparing the signed value after the shift to an unsigned value; the resulting implicit conversion may trigger a bug.

In my experiments, I saw several cases where memory is dereferenced using the result of a SAR instruction. These places may be worth examining for bugs, especially if the value being shifted is user input or otherwise controlled.

An unsigned jump followed by a signed shift could be a potential place to look for bugs as well.

Regular expressions or scripts can be used to search for occurrences of SAR instructions. When it's not feasible to review all occurrences of SAR, a pintool may be used to highlight which SAR instructions have been executed, so you can focus only on those.

July 22, 2014

Examining Native Code by Looking for Patterns

Earlier this year a post was published about examining a data format without using the program that reads it. That post discusses patterns to look for in order to identify certain constructs. This post focuses on static methods of examining code, which can be the complete code section of a file, a memory dump, or just a fragment. It also describes selected ideas about what patterns to look for when examining a given piece of code.

The reason one may look for patterns in code is to locate certain functionality or to get a high-level understanding of what the code does. Others may look for a certain construct that may be a key part of the program from a security point of view.

One can expect this to be a rapid method compared to others, such as line-by-line instruction analysis.

Still, it's always good to read the documentation, if any is available, to get an overview of what to expect.

Some methods are more effective when performed on a small region. Therefore, narrowing the scope of the search is a good thing to do at the beginning of the analysis, although it's not always feasible with enough certainty. In any case, one can always widen the search region at a later stage.

Compilers tend to produce executable files with a particular layout. Some have the library code at the beginning of the code section, while others have it at the end.

If there is no information about the compiler or the layout, there are other ways to locate the library code.

You may look for library function calls, which can be visible in the disassembler. Library code may have a distinct color in the disassembler.

Library/runtime code often has multiple implementations of a function to take advantage of the latest hardware; MSVC is an example. So SSE instructions/functions may indicate the presence of library/runtime code.

Library code can be spotted by looking for strings that can be associated with particular libraries.

Library/runtime code can also be spotted by looking for constant values that can be associated with particular libraries, such as cryptographic libraries, which tend to have many constants.

It's possible to guess the compiler that was used to generate the code by analyzing the library/runtime code.

If the code is just a fragment of user code, you may consider examining how the instructions are encoded. Intel encodings are redundant, and one instruction can have multiple encodings. This can help you guess which compiler was used.

If multiple encodings of the same instruction are found in a binary, the code could have been generated with a polymorphic encoder.

Also, code has other characteristics that may differ between compilers such as padding and stack allocation.

Imports and exports as well as strings can tell a lot. You may check where they are referenced in the code.

Debugging symbols can help an awful lot if the disassembler can handle them. Sometimes they're available, sometimes they're not.

No matter what code you're looking at, it most likely deals with input data. In that case it may get the data from a file or from the network via standard API calls. These are valuable areas to audit for security problems, and it's possible to follow how the data returned by these APIs is used. This may require analyzing caller functions, as these APIs are usually wrapped in many calls before the input is used.

Just as it reads data, the code may write data or send data via standard API calls. These areas may be security-sensitive.

Programs have centralized, well-established functions. These functions, for example, read dword values, read data into structures, and populate other internal storage. Discovering these functions is not considered hard: they are normally small and contain memory read and write instructions. By looking at where they are referenced from, we can find good attack surfaces.

Keep in mind that code sections can contain data besides code, although data is normally stored in the data section. In the disassembler it's convenient to see how the data is referenced, and you may decide whether there is an attack surface nearby.

CRC and hash constants may indicate that some data is being CRC'd or hashed. You may figure out where that data comes from and how you can perform security testing around it.

When a library uses a hardcoded parameter, it's often encoded as part of the instruction rather than stored in the data section of the executable. An example encoding looks like mov eax, <param> or mov al, <param>.

When a data format is parsed, a magic value is often tested. Looking for instructions like cmp reg, <magic> or cmp dword ptr [addr], <magic> can help to locate attack surfaces.

Longer strings may be broken into immediate values and compared with multiple cmp instructions.

Looking for strcmp function calls is a good idea if you want to find code that tests for a data format, as strcmp functions are often used for this purpose.

If the code is optimized for speed, there are many ways to confirm it. Normally the readability of such code is bad, for example when the code performs division or uses the same memory address for multiple variables. If the EBP register is used in arithmetic, or for anything other than storing the stack base address, that could indicate the code is optimized.

There may be circumstances in which looking at the frequency of instructions, or looking for undocumented instructions, rare instructions, or instructions that are absent, gives valuable clues that help the examination.

Intuitively going through the code and looking for undefined patterns can be a good idea once the scientific ways have been exhausted.

July 16, 2014

251 Potential NULL Pointer Dereferences in Flash Player

251 potential NULL pointer dereference issues have been identified in Flash Player 14 using a pattern matching approach. The file examined is NPSWF32_14_0_0_145.dll (17,029,808 bytes).

The issues are classified as CWE-690: Unchecked Return Value to NULL Pointer Dereference.

I won't copy and paste all the issues into this blog post, but here are a few examples.

First Example

0:012> uf 5438a1d0
NPSWF32_14_0_0_145!BrokerMainW+0xf6f6b:
5438a1d0 f6410810        test    byte ptr [ecx+8],10h
5438a1d4 8b4104          mov     eax,dword ptr [ecx+4]
5438a1d7 7411            je      NPSWF32_14_0_0_145!BrokerMainW+0xf6f85 (5438a1ea)

NPSWF32_14_0_0_145!BrokerMainW+0xf6f74:
5438a1d9 85c0            test    eax,eax
5438a1db 740b            je      NPSWF32_14_0_0_145!BrokerMainW+0xf6f83 (5438a1e8)

NPSWF32_14_0_0_145!BrokerMainW+0xf6f78:
5438a1dd 8b4c2404        mov     ecx,dword ptr [esp+4]
5438a1e1 8b448808        mov     eax,dword ptr [eax+ecx*4+8]
5438a1e5 c20400          ret     4

NPSWF32_14_0_0_145!BrokerMainW+0xf6f83:
5438a1e8 33c0            xor     eax,eax <--Set return value to NULL

NPSWF32_14_0_0_145!BrokerMainW+0xf6f85:
5438a1ea c20400          ret     4 <--Return with NULL
0:012> u 5438a47b L2
NPSWF32_14_0_0_145!BrokerMainW+0xf7216:
5438a47b e850fdffff      call    NPSWF32_14_0_0_145!BrokerMainW+0xf6f6b (5438a1d0)
5438a480 8a580c          mov     bl,byte ptr [eax+0Ch] <--Dereference NULL 

Second Example

0:012> uf 54362e60
NPSWF32_14_0_0_145!BrokerMainW+0xcfbfb:
54362e60 8b4128          mov     eax,dword ptr [ecx+28h]
54362e63 8b4c2404        mov     ecx,dword ptr [esp+4]
54362e67 3b4804          cmp     ecx,dword ptr [eax+4]
54362e6a 7205            jb      NPSWF32_14_0_0_145!BrokerMainW+0xcfc0c (54362e71)

NPSWF32_14_0_0_145!BrokerMainW+0xcfc07:
54362e6c 33c0            xor     eax,eax <--Set return value to NULL
54362e6e c20400          ret     4 <--Return with NULL

NPSWF32_14_0_0_145!BrokerMainW+0xcfc0c:
54362e71 56              push    esi
54362e72 8b748808        mov     esi,dword ptr [eax+ecx*4+8]
54362e76 56              push    esi
54362e77 e8e4b0faff      call    NPSWF32_14_0_0_145!BrokerMainW+0x7acfb (5430df60)
54362e7c 83c404          add     esp,4
54362e7f 85c0            test    eax,eax
54362e81 7407            je      NPSWF32_14_0_0_145!BrokerMainW+0xcfc25 (54362e8a)

NPSWF32_14_0_0_145!BrokerMainW+0xcfc1e:
54362e83 8b4010          mov     eax,dword ptr [eax+10h]
54362e86 5e              pop     esi
54362e87 c20400          ret     4

NPSWF32_14_0_0_145!BrokerMainW+0xcfc25:
54362e8a 8bc6            mov     eax,esi
54362e8c 83e0f8          and     eax,0FFFFFFF8h
54362e8f 5e              pop     esi
54362e90 c20400          ret     4
0:012> u NPSWF32_14_0_0_145+006b4eb2 L2
NPSWF32_14_0_0_145!BrokerMainW+0xd1c4d:
54364eb2 e8a9dfffff      call    NPSWF32_14_0_0_145!BrokerMainW+0xcfbfb (54362e60)
54364eb7 8b7004          mov     esi,dword ptr [eax+4] <--Dereference NULL

Third Example

0:012> uf 5429979a
NPSWF32_14_0_0_145!BrokerMainW+0x6535:
5429979a 0fb74108        movzx   eax,word ptr [ecx+8]
5429979e 48              dec     eax
5429979f 48              dec     eax
542997a0 740c            je      NPSWF32_14_0_0_145!BrokerMainW+0x6549 (542997ae)

NPSWF32_14_0_0_145!BrokerMainW+0x653d:
542997a2 83e815          sub     eax,15h
542997a5 7403            je      NPSWF32_14_0_0_145!BrokerMainW+0x6545 (542997aa)

NPSWF32_14_0_0_145!BrokerMainW+0x6542:
542997a7 33c0            xor     eax,eax <--Set return value to NULL
542997a9 c3              ret <--Return with NULL

NPSWF32_14_0_0_145!BrokerMainW+0x6545:
542997aa 8d4110          lea     eax,[ecx+10h]
542997ad c3              ret

NPSWF32_14_0_0_145!BrokerMainW+0x6549:
542997ae 8d410c          lea     eax,[ecx+0Ch]
542997b1 c3              ret
0:012> u NPSWF32_14_0_0_145+005f3423 L2
NPSWF32_14_0_0_145!BrokerMainW+0x101be:
542a3423 e87263ffff      call    NPSWF32_14_0_0_145!BrokerMainW+0x6535 (5429979a)
542a3428 8038fe          cmp     byte ptr [eax],0FEh <--Dereference NULL

You can find a list of 251 potential NULL pointer dereferences in Flash Player here.

July 14, 2014

Issues with Flash Player & Firefox in Non-default Configurations

A few months ago I encountered a bug that occurs when a fuzzed Flash file is rendered by Flash Player in Firefox. The bug can be reached only in the non-default configuration described below, so it's very unlikely that you are affected.

To trigger the bug, the Flash Player module has to be loaded into Firefox's virtual address space. This can be achieved if Flash Player protected mode is disabled and the Firefox plugin-container process is disabled too.

The bug involves dereferencing an arbitrary memory address via a CALL instruction in the vtable dispatcher. Here you can see the bug in the exception state.

0:048> g
Implementation limit exceeded: attempting to allocate too-large object
error: out of memory
(170fc.16998): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00000000 ecx=0034f670 edx=00000000 esi=1600f2c8 edi=0000001c
eip=5996bd5f esp=0034f638 ebp=0034f668 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\SysWOW64\Macromed\Flash\NPSWF32_14_0_0_145.dll - 
NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5c2:
5996bd5f 8b461c          mov     eax,dword ptr [esi+1Ch] ds:002b:1600f2e4=????????
0:000> u eip L10
NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5c2:
5996bd5f 8b461c          mov     eax,dword ptr [esi+1Ch] <--Read unmapped address
5996bd62 a801            test    al,1
5996bd64 7420            je      NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5e9 (5996bd86)
5996bd66 33d2            xor     edx,edx
5996bd68 39550c          cmp     dword ptr [ebp+0Ch],edx
5996bd6b 7519            jne     NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5e9 (5996bd86)
5996bd6d 8b4e04          mov     ecx,dword ptr [esi+4]
5996bd70 83e0fe          and     eax,0FFFFFFFEh
5996bd73 89461c          mov     dword ptr [esi+1Ch],eax
5996bd76 8b06            mov     eax,dword ptr [esi] <--Read unmapped address
5996bd78 51              push    ecx
5996bd79 8bce            mov     ecx,esi
5996bd7b 895604          mov     dword ptr [esi+4],edx
5996bd7e 895618          mov     dword ptr [esi+18h],edx
5996bd81 ff500c          call    dword ptr [eax+0Ch] <--Dereference arbitrary memory content
5996bd84 eb06            jmp     NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5ef (5996bd8c)

I reported this bug to Adobe, and they opened case PSIRT-2707 on 14/April/2014, but so far Adobe hasn't confirmed whether or not it was able to reproduce the bug or the exception state reported.

Again, the bug doesn't affect the default configuration, so it's very unlikely that you're affected. However, users running Firefox with plugin-container disabled and the Flash Player plugin with protected mode disabled are affected by this issue.

The original report is about Flash Player 13_0_0_182 and Firefox 28.0, but the test case also fails with Flash Player 14_0_0_145 and Firefox 30.0 (the latest available as of today).

These are the steps to reproduce the bug.
  • Edit mms.cfg to have ProtectedMode=0 to disable protected mode in Flash Player
  • Start cmd.exe and type "set MOZ_DISABLE_OOP_PLUGINS=1" to disable plugin-container in Firefox
The settings above are required to get the Flash Player plugin loaded in firefox.exe's address space.
  • Start Firefox from the command prompt opened previously
  • Open fuzzed.swf in Firefox (drag and drop should work)
  • Attach Windbg to the firefox.exe process when you notice that Firefox is hanging
  • An exception should occur within a few seconds. If you see the out-of-memory error in the debugger log without an exception, you may restart the browser and try again.
The fuzzed Flash file has the following changes compared to the template file. The value of the first item in the integer pool has been changed to a large value. The TagLength of the DoAbc tag and the FileSize of the main header have therefore been updated to maintain the integrity of the Flash file.

Drop me an email if you think you need the testcase.

May 13, 2014

Security Implications of IsBad*Ptr Calls in Binaries

The IsBad*Ptr [1] functions test whether the memory range specified in the argument list is accessible. Despite the fact that they have been banned, they are still referenced in many binaries shipped with popular applications.

In this post I describe the inner workings of IsBad*Ptr and the steps an attacker may follow to abuse them, and I mention a few examples of binaries that reference these banned functions.

Inner Working

When IsBad*Ptr is executed, it first registers an exception handler. Then it attempts to access the memory specified in the argument list.

For example, IsBadReadPtr has the following instruction to read memory. ECX is the memory address specified in the argument list.

mov al,byte ptr [ecx]

If the instruction raises an exception, execution is transferred to the exception-handler code, and IsBad*Ptr returns TRUE, meaning that it is a "bad" pointer because the data it points to is inaccessible.

If the instruction executes without an exception being raised IsBad*Ptr returns FALSE.

Steps of Attacking

The attack against IsBad*Ptr looks like this.
  1. The attacker attempts to supply an invalid pointer parameter to IsBad*Ptr, so that it returns TRUE.
  2. The attacker refines step #1 in such a way that the supplied invalid pointer points to a valid memory location allocated by heap spray. As a result, IsBad*Ptr returns FALSE, leading to an inconsistent state that may or may not be exploitable.
  3. If the attacker can perform step #2 with IsBadWritePtr, then when the call returns, execution is expected to reach code that writes to the location pointed to by the pointer, a location that holds attacker-controlled data. He thus reaches a presumably exploitable condition.
Whether a binary references IsBad*Ptr can easily be checked during binary analysis, and it is worth doing.

Examples

The code snippet below can be found in msvbvm60.dll in the Windows folder.

.text:72A0FEE5         push    38h                     ; ucb
.text:72A0FEE7         push    edi                     ; attacker supplies pointer was invalid before
.text:72A0FEE8         call    ds:IsBadReadPtr         ; and now it's valid because he's filled memory up
.text:72A0FEEE         test    eax, eax
.text:72A0FEF0         jnz     loc_72A0FF80            ; fall through
.text:72A0FEF6         mov     eax, [edi+4]
.text:72A0FEF9         mov     eax, [eax+4]
.text:72A0FEFC         mov     esi, [eax+8]            ; ESI is attacker controlled
.text:72A0FEFF         and     [ebp+arg_0], 0
.text:72A0FF03         mov     ax, [edi+2]
.text:72A0FF07         test    esi, esi
.text:72A0FF09         jz      short loc_72A0FF42      ; fall through
.text:72A0FF0B         movzx   ebx, ax
.text:72A0FF0E         mov     eax, [esi]              ; EAX is attacker controlled
.text:72A0FF10         push    esi
.text:72A0FF11         call    dword ptr [eax+0Ch]     ; EIP is attacker controlled


The one below can be found in dxtrans.dll in the Windows folder.

.text:35C6142C         push    4                       ; ucb
.text:35C6142E         push    esi                     ; attacker supplies pointer was invalid before
.text:35C6142F         call    ds:__imp__IsBadWritePtr@8 ; and now it's valid because he's filled memory up
.text:35C61435         test    eax, eax
.text:35C61437         jz      short loc_35C61440      ; jump is taken
.text:35C61439         mov     eax, 80004003h
.text:35C6143E         jmp     short loc_35C6144A
.text:35C61440 loc_35C61440:                           ; CODE XREF: CDXBaseSurface::GetAppData(ulong *)+14
.text:35C61440         mov     eax, [ebp+this]
.text:35C61443         mov     eax, [eax+24h]
.text:35C61446         mov     [esi], eax              ; ESI is attacker controlled

The next code snippet is taken from v2.0.50727\mscorwks.dll in the Windows folder. IsBadReadPtr is used to test the pointer that is passed to MultiByteToWideChar.

.text:7A0D17FB         push    eax                     ; ucb
.text:7A0D17FC         push    ebx                     ; attacker supplies pointer was invalid before
.text:7A0D17FD         call    ds:__imp__IsBadReadPtr@8 ; and now it's valid because he's filled memory up
.text:7A0D1803         test    eax, eax
.text:7A0D1805         jz      short loc_7A0D17A7      ; jump is taken
[...]
.text:7A0D17A7         cmp     edi, esi
.text:7A0D17A9         jle     short loc_7A0D17BF
.text:7A0D17AB         push    esi                     ; cchWideChar
.text:7A0D17AC         push    esi                     ; lpWideCharStr
.text:7A0D17AD         push    edi                     ; cbMultiByte
.text:7A0D17AE         push    ebx                     ; lpMultiByteStr - attacker's data
.text:7A0D17AF         push    1                       ; dwFlags
.text:7A0D17B1         push    esi                     ; CodePage
.text:7A0D17B2         call    ?WszMultiByteToWideChar@@YGHIKPBDHPAGH@Z ; WszMultiByteToWideChar(uint,ulong,char const *,int,ushort *,int)

I have been collecting files that reference IsBad*Ptr and have found plenty of others, including but not limited to MSCOMCTL.OCX, EXCEL.EXE, and products from Lenovo, Corel, Nokia, and AVerMedia.

UPDATE 13/May/2014 Adding IsBad*Ptr to a program doesn't automatically create bugs. However, if IsBad*Ptr is present, we have reason to believe that the function expects a pointer that might be invalid in certain circumstances. In that case IsBad*Ptr may be used to attack the program, and that's why it's important to audit accordingly.

UPDATE 16/May/2014 The term "valid pointer" had an ambiguous meaning in the part Steps of Attacking. This is now reworded. Reference.

[1] IsBad*Ptr functions are IsBadReadPtr, IsBadCodePtr, IsBadWritePtr, and IsBadStringPtr. All of them are exports of kernel32.dll.

April 28, 2014

Order of Memory Reads of Intel's String Instructions

Neither the Intel Manual nor Kip R. Irvine's assembly book discusses the behavior I'm describing about x86 string instructions in this post.

Consider the following instruction, which compares the byte at ESI to the byte at EDI.

cmps    byte ptr [esi],byte ptr es:[edi]

To perform the comparison, the instruction must read the bytes first. The question is whether the byte at ESI or the byte at EDI is read first.

The Intel Manual says:
"Compares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, doubleword, or quadword specified with the second source operand"

Kip R. Irvine's book titled Assembly Language for Intel-Based Computers (5th edition) says:
"The CMPSB, CMPSW, and CMPSD instructions each compare a memory operand pointed to by ESI to a memory operand pointed to by EDI"

Both descriptions explain what the instructions do, but neither says how. So I needed to do some experiments in Windbg to find the answer to the question.

The first experiment was not a good one. Initially, I thought I'd put a processor breakpoint (aka memory breakpoint) at ESI and another one at EDI, then execute CMPS and let the debugger break in on whichever processor breakpoint is hit first. Here is why that was a bad idea: the execution of CMPS has to complete before the debugger breaks in, and by the time CMPS completes, it has hit both breakpoints.

The other experiment I came up with goes like this. Set ESI and EDI to point to two distinct unmapped memory addresses. The assumption is that when CMPS is executed, it raises an exception when trying to read memory. By looking at the exception record, we can tell the address the instruction tried to read from. Given that, we can tell whether that value was assigned to ESI or to EDI, and so whether the byte at ESI or the byte at EDI is read first.

Here is how I did the experiment in Windbg.

I opened an executable test file in Windbg. I assembled CMPS into the memory at EIP.

0:000> a
011113be cmpsb
cmpsb

I changed ESI to point to invalid memory. And I did the same with EDI.

0:000> resi=51515151
0:000> redi=d1d1d1d1

Here is the disassembly of CMPS. The ?? values at the end of the disassembly line show that both ESI and EDI point to unmapped memory addresses.

0:000> r
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
test!wmain+0x1e:
011113be a6              cmps    byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=??

I resumed the process, which led to the expected access violation triggered by the CMPS instruction.

0:000> g
(5468.1eec8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
test!wmain+0x1e:
011113be a6              cmps    byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=?? 

I got the details of the exception like below.

0:000> .exr -1
ExceptionAddress: 011113be (test!wmain+0x0000001e)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: d1d1d1d1
Attempt to read from address d1d1d1d1

As you can see, the access violation occurred due to an attempt to read from d1d1d1d1, which is the value of EDI. Therefore, to answer the question at the beginning of the article, the byte at EDI is read first.
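The last step of the reasoning can be written down as a minimal Python sketch. The function name `first_read_operand` and its parameters are made up for illustration. It assumes the access violation layout shown in the .exr output, where Parameter[0] is 0 for a read and Parameter[1] is the faulting address.

```python
def first_read_operand(esi, edi, access_type, fault_address):
    """Name the CMPS operand whose read faulted.

    access_type and fault_address stand for Parameter[0] and
    Parameter[1] of the access violation record (0 means a read).
    """
    if access_type != 0:
        raise ValueError("not a read access violation")
    if fault_address == esi:
        return "ESI"
    if fault_address == edi:
        return "EDI"
    return "unknown"


# Values taken from the Windbg session above.
print(first_read_operand(0x51515151, 0xD1D1D1D1, 0, 0xD1D1D1D1))
```

The check only works because ESI and EDI were set to two distinct unmapped addresses.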

For more fun, you may check how emulators handle this, that is, the order of memory reads of CMPS instructions.

If you like this you may click to read more debugging posts.

UPDATE 28/April/2014 Of course I don't encourage people to rely on undocumented behavior when developing software. Stephen Canon says Intel may optimize the microcode for operands from time to time, and so we don't have a good reason to believe this behavior to be stable.

UPDATE 29/April/2014 Here is a Windows C++ program to test this behavior on your architecture: cmps-probe.cpp

April 23, 2014

Inspection of Division & Multiplication

Division and multiplication can trigger bugs and potentially pose security risks. Here are a few things that I believe to be helpful for those who do binary inspection.

Division

Production-quality binaries are normally built with optimization enabled, which makes the binary run fast. One of the optimization techniques available to the compiler is to emit a series of fast instructions instead of a single slow instruction.

DIV and IDIV instructions are known to be slow. As a part of optimization the compiler emits a series of fast instructions that are functionally equivalent to DIVs. The fast instructions are shift, multiplication, and addition instructions that take magic (constant) values depending on the divisor. Therefore the divisor has to be known at compile-time to apply optimization.
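To make the idea of magic values concrete, here is a small Python sketch of unsigned division by 10 done with a multiplication and a shift. The constant 0xCCCCCCCD is the well-known magic value for a divisor of 10 with 32-bit unsigned operands. The function name `div10` is only an illustration, and the exact instruction sequence a given compiler emits may differ.

```python
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT = 35          # 32 bits for the high half of the product, plus 3


def div10(x):
    """Divide a 32-bit unsigned integer by 10 without a division instruction."""
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("x must fit in 32 bits")
    return (x * MAGIC) >> SHIFT


for value in (0, 9, 10, 12345, 0xFFFFFFFF):
    print(value, div10(value), value // 10)
```

In the binary the same thing shows up as a multiplication by a strange-looking constant followed by a shift, which is a hint that the source code had a division by a constant.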

If the optimized binary has any DIVs, the divisor was most likely not known at compile time. It's only known at run time, so it could be a user-controlled value or a user input taken as it is.

Division can cause an exception if the divisor is 0, or if the result is too large to store.
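Both exception conditions can be modelled in a few lines of Python. This is a simplified sketch of the 32-bit IDIV (signed 64-bit dividend in EDX:EAX, signed 32-bit divisor), not an emulator; `DivideError` and `idiv32` are names invented for the example.

```python
class DivideError(Exception):
    """Stands in for the x86 divide error exception."""


def idiv32(dividend, divisor):
    """Model of 32-bit IDIV; returns the quotient that would end up in EAX."""
    if divisor == 0:
        raise DivideError("division by zero")
    # x86 truncates toward zero; Python's // rounds toward negative infinity.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    if not -2**31 <= quotient <= 2**31 - 1:
        raise DivideError("quotient does not fit in EAX")
    return quotient


print(idiv32(7, -2))
for args in [(1, 0), (-2**31, -1)]:
    try:
        idiv32(*args)
    except DivideError as error:
        print(args, "->", error)
```

The second case is the one that is easy to miss: dividing the smallest negative 32-bit value by -1 gives a quotient that is too large to store.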

Division by Zero in CLR's Native Code

As an interesting experiment I looked at what happens when an integer is divided by zero in C#.

The CLR generates native code with a division instruction in it. When the division by zero is executed, an exception is raised that is handled by the CLR's exception handler.

So the generated code with the division in it doesn't have a test for the divisor. Handling division by zero is left to the exception handler.

Multiplication

Like division, multiplication can be optimized, too, by using a sequence of fast instructions (sometimes one instruction). Whether or not it's worth optimizing depends on the value of multiplier (or multiplicand).

A multiplication you can see in the binary might not be visible at source-code level. Some multiplications cannot be easily spotted in binary code due to optimization. And multiplications can trigger bugs.

Overflow in Multiplication

Multiplication can lead to integer overflow. Multiplying two values is more likely to lead to integer overflow than adding the two values. Multiplying two word-sized (16-bit) integers can overflow a signed 32-bit result, but adding them can't.

A few forms of the IMUL instruction take an immediate value, which is the multiplier. It's easy to calculate which multiplicand overflows the multiplication. The challenging part is to determine how that value could be assigned to the multiplicand to trigger the overflow.
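The calculation can be sketched in Python as follows. It assumes a signed 32-bit multiplication with a positive immediate multiplier; the function name is made up for the example.

```python
INT32_MAX = 2**31 - 1


def smallest_overflowing_multiplicand(multiplier):
    """Smallest positive multiplicand whose product with a positive
    multiplier no longer fits in a signed 32-bit integer."""
    if multiplier <= 0:
        raise ValueError("this sketch handles positive multipliers only")
    return INT32_MAX // multiplier + 1


# Example: for a multiplier of 4, anything from 0x20000000 up overflows.
print(hex(smallest_overflowing_multiplicand(4)))
```

Any multiplicand at or above the returned value overflows; the hard part remains getting such a value into the multiplicand.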

It's worth searching for MUL and IMUL instructions in the binary using the # (hash mark) Windbg command.

Overflow in Multiplication by Scale Factor

Scale factor is a constant value by which the register in the instruction is multiplied. The scale factor is either 1, 2, 4 or 8.

Use of a scale factor could be the result of optimizing a multiplication. The example below multiplies a value by 4.

lea eax,[edi*4]

Another common use case is dereferencing an element of an array. In the example below the array consists of elements of size 8. At source level there is no multiplication, but at low level there is.

mov [edi+edx*8],eax

If the value of the register that is multiplied by the scale factor is large enough, an integer overflow can occur.

Look at the instruction below. Even if the multiplication doesn't overflow, the result can, due to the base (esp) and the displacement (8000).

mov [esp+ebx*4+8000],eax
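The address arithmetic of such operands can be replayed in Python. This is a sketch for 32-bit addressing; the function names are illustrative, and the register values in the example are made up (the displacement is read as hexadecimal, which is an assumption).

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, index, scale, displacement=0):
    """base + index*scale + displacement, wrapped to 32 bits like the CPU does."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & MASK32


def wraps(base, index, scale, displacement=0):
    """True when the unwrapped sum does not fit in 32 bits."""
    return base + index * scale + displacement > MASK32


# The scaled index alone wraps around: 0x40000000 * 4 == 2**32.
print(hex(effective_address(0, 0x40000000, 4)))

# The scaled index fits (0xfffffffc), but base and displacement push it over.
print(hex(effective_address(0x0022FB70, 0x3FFFFFFF, 4, 0x8000)),
      wraps(0x0022FB70, 0x3FFFFFFF, 4, 0x8000))
```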

Method of Inspection

Generally, it's not feasible to review all the occurrences of certain instructions, but in critical areas it might be reasonable to do so. Instruction tracing, and tracing like this, can be a good start to narrow down the area that should be inspected more closely.

April 16, 2014

You May Not Need to Debug SSE Instructions

There are binaries that contain two implementations of an algorithm. The first one is written to run on all architectures, so it consists of i386 instructions only. The second one is optimized to run fast and therefore uses SSE instructions. When the application runs, it checks the architecture to decide which implementation to execute.

It is common for binaries to contain various implementations of the same algorithm. One example is the Microsoft Visual C++ runtime.

You may not need to debug SSE instructions though. All you need to do is tell your application that SSE support is not available, which is most likely a lie in 2014.

Recently, when I debugged a Windows application I noticed it executes SSE instructions. Here is how I got my application to believe that there is no SSE support available.

I knew about the CPUID instruction. It can return plenty of information about the processor. If CPUID is executed with EAX set to 1, feature information is returned in ECX and EDX.

We only need the SSE-related bits of the feature information. Here they are (source: Intel Developer Manual).

In ECX:
    Bit 0  SSE3 Extensions
    Bit 9  SSSE3 Extensions
    Bit 19 SSE4.1
    Bit 20 SSE4.2

In EDX:
    Bit 25 SSE Extensions
    Bit 26 SSE2 Extensions

The idea is that when CPUID is executed with EAX set to 1, we need to clear the SSE bits in ECX and EDX. To clear the SSE bits we have to mask the registers like below.

ECX<-ECX&FFE7FDFE
EDX<-EDX&F9FFFFFF
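The two masks follow directly from the bit positions listed above. A short Python sketch to derive them (the helper name `clear_mask` is made up):

```python
SSE_BITS_ECX = (0, 9, 19, 20)   # SSE3, SSSE3, SSE4.1, SSE4.2
SSE_BITS_EDX = (25, 26)         # SSE, SSE2


def clear_mask(bits):
    """32-bit AND mask with the given bit positions cleared."""
    mask = 0xFFFFFFFF
    for bit in bits:
        mask &= ~(1 << bit)
    return mask & 0xFFFFFFFF


print(f"ECX mask: {clear_mask(SSE_BITS_ECX):08X}")  # FFE7FDFE
print(f"EDX mask: {clear_mask(SSE_BITS_EDX):08X}")  # F9FFFFFF
```

If the application also checks other feature bits, they can be added to the tuples in the same way.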


I used the following Windbg command to search for CPUID instructions in the code section of the virtual image.

# cpuid <address> L?<size>

I saw CPUID in a few places. I checked all of them to find the ones executed with EAX set to 1. I found a few fragments like this one.

xor eax,eax
inc eax
cpuid

I put a breakpoint just after each of the relevant CPUID instructions. When the breakpoint is hit, the SSE flags are cleared and the execution resumes.

bp <address> "reip; recx=ecx&0ffe7fdfe; redx=edx&0f9ffffff; gc"

And it worked as expected in my experiment. The application took the alternate, but slower, code path of i386 instructions.

A final note: this technique may be used to avoid debugging SSE instructions, but it can also be useful to increase code coverage during security testing.

April 8, 2014

Examining Unknown Binary Formats

This post discusses methods for examining unknown binary formats, which can be a file, a file fragment, or a memory dump.

Before discussing the methods, here are a few scenarios in which examining an unknown format is appropriate.

Imagine you deal with an application that handles a certain file type that you want to fuzz. You plan to carry out some dumb fuzzing with a bit of improvement. Before doing so, you may examine the format to create an approximate map of the layout. That gives you an idea of which parts of the file are worth fuzzing, and which fuzzing method is reasonable to apply to each part.

In another scenario you might have the binary file but not the program that parses it. You want to know as much as possible about the format of the binary file to understand its layout.

If the application that reads the file format is available, you can use a debugger to watch how the data is parsed. This scenario is not discussed here.

If the application that writes the format is available, you can try the following idea. Produce an output file using the application, for example with its save, export, or convert options. Then change something minor in the settings and produce a similar output file. Comparing the two output files shows what changed.
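The comparison itself can be a few lines of Python. This sketch assumes the two outputs have the same length, which holds only when the changed setting doesn't shift the rest of the file; `changed_ranges` is a made-up name.

```python
def changed_ranges(a, b):
    """Return (start, end) offsets where two equally long byte strings differ.

    The end offset is exclusive.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes outputs of equal length")
    ranges = []
    start = None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = offset
        elif x == y and start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, len(a)))
    return ranges


print(changed_ranges(b"ABCDEF", b"ABxxEz"))
```

In practice the two byte strings would come from reading the two output files in binary mode.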

Entropy analysis is very useful for locating compressed, encrypted, and otherwise encoded data. Higher entropy can indicate encoding of some kind. Lower entropy is likely anything else, including text, code, headers, and data structures. Redundancy analysis is analogous to entropy analysis: the lower the redundancy, the more likely the data is encoded.
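A minimal Python sketch of entropy analysis, using Shannon entropy in bits per byte. The window size and the function names are choices made for this example, not something prescribed by the method.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


def entropy_profile(data, window=256):
    """Entropy of consecutive windows, as (offset, entropy) pairs."""
    return [(offset, shannon_entropy(data[offset:offset + window]))
            for offset in range(0, len(data), window)]


sample = bytes(256) + bytes(range(256))
for offset, entropy in entropy_profile(sample):
    print(f"{offset:#06x} {entropy:.2f}")
```

Windows with entropy close to 8 are candidates for compressed or encrypted data; what counts as "close" is a judgment call.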

Encoded data could be anything, even multimedia data. Compressed streams can have headers and/or magic bytes that identify the compression type.

The character distribution of the file can tell us a lot. Creating a byte frequency map is straightforward in modern programming languages. It tells us which bytes are the most and the least frequent, and which bytes are not present at all.
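For example, a byte frequency map in Python takes only a few lines. `byte_profile` is a hypothetical helper name.

```python
from collections import Counter


def byte_profile(data, top=5):
    """Return the most frequent, the least frequent, and the absent byte values."""
    counts = Counter(data)
    ranked = counts.most_common()
    most_frequent = ranked[:top]
    least_frequent = ranked[::-1][:top]
    missing = [value for value in range(256) if value not in counts]
    return most_frequent, least_frequent, missing


most, least, missing = byte_profile(b"aaabbc")
print(most, least, len(missing))
```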

Strings can be discovered even with popular tools like a hex editor. The most common encodings are ASCII and Unicode. If there is no terminating zero, the length of the string is likely stored somewhere in the binary, often in the byte(s) preceding the first letter of the string.
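The length-prefix case can be searched for with a simple heuristic. The Python sketch below assumes a one-byte length directly followed by printable ASCII; it will produce false positives inside long text, and the function name is made up.

```python
def length_prefixed_strings(data, min_length=4):
    """Find offsets where a byte value n is followed by n printable ASCII bytes."""
    hits = []
    for offset, n in enumerate(data):
        body = data[offset + 1:offset + 1 + n]
        if (n >= min_length and len(body) == n
                and all(0x20 <= c <= 0x7E for c in body)):
            hits.append((offset, body.decode("ascii")))
    return hits


print(length_prefixed_strings(b"\x00\x05hello\xff\x03abc"))
```

Two-byte or four-byte length fields, in either endianness, can be tried the same way.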

Consecutive patterns and repeated byte sequences are often used for padding, for alignment, or to fill slack space.

Random-looking printable characters can indicate data of any kind encoded as plain text.

Scattered bytes, scattered zeros, and scattered 0FFh bytes can indicate a sequence of encoded integers. The integers can be offsets and lengths. Scattered zeros might also indicate text in Unicode format.

It could be useful to analyze the density of zeros, printable characters, or of other patterns. This could be applied on the whole file or on a particular region of the file.

Consecutive values or integers might indicate an array of pointers. It might be useful to know whether the values are increasing, decreasing, or random.

It's also good to know in what endianness the integers are stored.
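One way to guess the endianness is to decode the same bytes both ways and see which reading gives the more plausible sequence. A Python sketch, with made-up helper names:

```python
import struct


def read_dwords(data, offset, count, byteorder):
    """Decode consecutive 32-bit unsigned integers.

    byteorder is '<' for little endian or '>' for big endian.
    """
    return list(struct.unpack_from(f"{byteorder}{count}I", data, offset))


def trend(values):
    """Classify a sequence as increasing, decreasing, or mixed."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "mixed"


table = struct.pack("<3I", 0x10, 0x40, 0x100)
for order in "<>":
    values = read_dwords(table, 0, 3, order)
    print(order, [hex(v) for v in values], trend(values))
```

Here the little-endian reading gives small, increasing values that could be offsets, while the big-endian reading doesn't.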

x86 code can be detected by running a disassembler on the binary. If you see a sequence of meaningful instructions, that might be a code area.

There is a simpler way to look for x86 code though. You write a small program in some high-level language that searches for E8 (CALL) / E9 (JMP) patterns and calculates the absolute offset the instruction jumps to. If an absolute offset is referenced from several different places, it might be the entry point of a real function. The more functions are identified, the better the chance that you have found code.
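Such a program could look like the Python sketch below. It treats every E8/E9 byte as a candidate instruction with a 32-bit relative operand and keeps only targets that fall inside the data; the function names and the threshold of two references are choices made for the example.

```python
import struct
from collections import Counter


def branch_targets(data):
    """Count in-file targets of E8 (CALL rel32) and E9 (JMP rel32) candidates."""
    targets = Counter()
    for offset in range(len(data) - 4):
        if data[offset] in (0xE8, 0xE9):
            (rel32,) = struct.unpack_from("<i", data, offset + 1)
            target = offset + 5 + rel32  # relative to the next instruction
            if 0 <= target < len(data):
                targets[target] += 1
    return targets


def likely_functions(data, min_refs=2):
    """Offsets referenced by at least min_refs candidate branches."""
    return sorted(t for t, n in branch_targets(data).items() if n >= min_refs)


# Two CALLs, at offsets 0 and 5, both going to offset 0x10.
blob = bytearray(b"\x90" * 0x20)
blob[0:5] = b"\xE8" + struct.pack("<i", 0x10 - 5)
blob[5:10] = b"\xE8" + struct.pack("<i", 0x10 - 10)
print([hex(t) for t in likely_functions(bytes(blob))])
```

Random data contains E8 and E9 bytes too, so a single reference means little; repeated references to the same offset are the interesting signal.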

If you know what native code to look for, you can search for a sequence of common instructions, such as the bytes at a function entry point.

A meaningful text fragment in a high-entropy area might indicate run-length encoding, which is also known as RLE compression.

There is a data format that looks like this. It consists of a sequence of structures, or chunks. The size of each structure is sometimes encoded as the first value in the structure. Sequences of compressed data are commonly stored like that.
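A layout guess like that is easy to test with a small parser. The Python sketch below assumes a 32-bit little-endian size that counts only the bytes following it; real formats differ in field width, endianness, and in whether the size includes itself, so those are assumptions to vary.

```python
import struct


def walk_chunks(data, offset=0):
    """Split data into (offset, payload) chunks prefixed by a 32-bit size."""
    chunks = []
    while offset + 4 <= len(data):
        (size,) = struct.unpack_from("<I", data, offset)
        end = offset + 4 + size
        if end > len(data):
            break  # the size does not fit; the guess is probably wrong here
        chunks.append((offset, data[offset + 4:end]))
        offset = end
    return chunks


blob = (struct.pack("<I", 3) + b"abc"
        + struct.pack("<I", 0)
        + struct.pack("<I", 2) + b"xy")
print(walk_chunks(blob))
```

If the walk ends exactly at the end of the data, the guess about the layout is probably right.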

If it's known that the binary is associated with a certain time stamp or version number, those constants might be worth searching for.

Some methods described here can be combined with xor-search, and with other simple decoding techniques to discover the structure of the file.
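As an illustration of the xor-search idea, the Python sketch below looks for a known byte sequence under every single-byte XOR key. It is a simplified stand-in written for this example, not the interface of any existing xor-search tool.

```python
def xor_search(data, needle):
    """Return (key, offset) pairs where needle occurs XORed with a one-byte key."""
    hits = []
    for key in range(256):
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits


hidden = b"\x00\x01" + bytes(b ^ 0x5A for b in b"PNG") + b"\xff"
print(xor_search(hidden, b"PNG"))
```

Key 0 corresponds to the plain, unencoded sequence.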

April 5, 2014

Thoughts About Finding Race Condition Bugs

Race condition bugs can exist in multi-threaded applications. Improper synchronization can be the root cause of race condition bugs.

Stress testing is a good start to find bugs. It might not be an ideal black-box testing method though, as it is mostly for developers testing their own software. Injecting delays at various points in the target could help find bugs, but we need to know the right locations to inject the delays. Cuzz is a Microsoft tool for finding concurrency bugs by injecting random delays, and it looks promising.

Using DBI (Dynamic Binary Instrumentation) it's possible to tell whether the instruction at a given EIP is executed, and if so by what thread(s). Therefore it's possible to tell what code is executed by what thread(s).

Using DBI it is also possible to tell where (at which value of EIP) a thread context switch happens.

With the above information we can make educated guesses about where to inject the delays.

If a bug is found, it might not be reachable from outside. That's always a possibility. However, it's good to see if you can provide input that makes the application run longer near the location of the intended delay. There might be a ReadFile that takes longer to complete if the file is large enough. Or there might be a loop whose iteration count can be controlled by the user...

April 3, 2014

Change of Execution Flow in Debugger

When debugging, we sometimes need to force the execution to either take or not take a conditional jump.

There are several ways to achieve this. One possibility is to overwrite the conditional jump with either JMP or NOP instruction to force the execution into the desired path.

The next trick is to simply change the instruction pointer. The example below increments the instruction pointer by 2 in Windbg.

reip=eip+2

Another idea is to look at the conditions for taking or not taking the conditional jump. Knowing the conditions, you can change the register or the data at the right memory location to influence the execution flow.

My favorite is to change the x86 flags when the instruction pointer points to the conditional jump. Below is how to set the zero flag in Windbg.

rzf=1

To see more info about flags, check out MSDN or Windbg's help.
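To know which flag to change, it helps to have the conditions of the common conditional jumps at hand. The Python sketch below encodes them as a lookup table; the flag names match the ones Windbg uses (zf, cf, sf, of), while `CONDITIONS` and `is_taken` are names invented for the example.

```python
CONDITIONS = {
    "je":  lambda f: f["zf"] == 1,
    "jne": lambda f: f["zf"] == 0,
    "jb":  lambda f: f["cf"] == 1,
    "jae": lambda f: f["cf"] == 0,
    "jbe": lambda f: f["cf"] == 1 or f["zf"] == 1,
    "ja":  lambda f: f["cf"] == 0 and f["zf"] == 0,
    "jl":  lambda f: f["sf"] != f["of"],
    "jge": lambda f: f["sf"] == f["of"],
    "jle": lambda f: f["zf"] == 1 or f["sf"] != f["of"],
    "jg":  lambda f: f["zf"] == 0 and f["sf"] == f["of"],
}


def is_taken(mnemonic, flags):
    """Tell whether the conditional jump is taken with the given flag values."""
    return CONDITIONS[mnemonic](flags)


flags = {"zf": 0, "cf": 0, "sf": 0, "of": 0}
print(is_taken("je", flags))   # not taken
flags["zf"] = 1                # the effect of rzf=1
print(is_taken("je", flags))   # taken
```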

April 1, 2014

Tracking Down by Pin

Recently, I faced a challenging situation. At first sight it looked like a common debugging problem that can be solved with some experimenting, but the more time I spent on it the more difficult the situation looked.

The situation was the following. The below instruction reads memory.

00400000+006026de mov     eax,dword ptr [ebp+4]

What is the EIP of the instruction that writes [ebp+4]? This is all I wanted to know at that stage.

Note, looking at the instruction, it seems that [ebp+4] refers to a stack address, but it actually refers to a heap address.

First I looked through the function to see if I could find the instruction in it that writes [ebp+4]. It wasn't there, so I investigated the caller functions, and their callers, and so on. Again, it wasn't there, but I noticed something. The functions passed a pointer to a context as a parameter, and the context contained many variables including [ebp+4].

At this point I had a good reason to believe the situation was difficult, because the context is likely set by an initializer that may be on a completely different code path from the one I was investigating.

You may ask why I didn't use a processor breakpoint to see what instruction writes [ebp+4]. It was a heap address that kept changing on every execution, so the address to put the breakpoint on was not known in advance.

I could have gone back to the point when the structure is allocated, and I could have set a breakpoint relative to the base address to see what code writes [ebp+4]. That sounded good and I would have gone in that direction if I hadn't had a better idea.

I thought I could write a PinTool that tracks memory writes and reads. It adds every instruction that writes memory to a list. When the instruction that reads the memory is reached, the program searches the list for instructions that wrote that address. Of course this has to be thread safe.
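The bookkeeping behind this idea can be sketched in Python. This is not the PinTool and doesn't use the Pin API; the class and its callbacks are hypothetical, and it keeps only the last writer per address in a dictionary instead of a full list.

```python
import threading


class WriteTracker:
    """Remember which instruction last wrote each address, report on a watched read."""

    def __init__(self, watched_ip, size):
        self.watched_ip = watched_ip
        self.size = size
        self.last_writer = {}          # address -> EIP of the last write
        self.lock = threading.Lock()   # callbacks may come from several threads
        self.report = []

    def on_write(self, ip, address, size):
        if size != self.size:
            return
        with self.lock:
            self.last_writer[address] = ip

    def on_read(self, ip, address, size):
        if ip != self.watched_ip or size != self.size:
            return
        with self.lock:
            self.report.append(f"{address:08x} is read by {ip:08x}")
            writer = self.last_writer.get(address)
            if writer is not None:
                self.report.append(
                    f"{address:08x} was written by {writer:08x} before read")


tracker = WriteTracker(watched_ip=0x6026DE, size=4)
tracker.on_write(0x5EE358, 0x0BAB05DC, 4)
tracker.on_read(0x6026DE, 0x0BAB05DC, 4)
print("\n".join(tracker.report))
```

In a real instrumentation tool the two callbacks would be invoked for every memory write and read of the target.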

It took me a day to develop the PinTool and find the EIP that writes [ebp+4].

This is how I executed the PinTool from the command line.

pin.exe -t TrackDownWrite.dll -ip 0x6026de -s 4 -- c:\work\<redacted>.exe

-ip is the instruction pointer where [ebp+4] is read
-s is the size of read/write to track

The result looked like this.

0bab05dc is read by 006026de
0bab05dc was written by 005ee358 before read


The wanted instruction is below.

00400000+005ee358 mov     dword ptr [eax+4],edx

The prototype is available for download. After finishing my debugging task I also tested it a bit on Windows x86, and I think it looks useful for similar problems that might arise in the future.

March 10, 2014

On-the-fly Switching Between Debuggers

Sometimes it's useful to switch between debuggers without restarting the target application. An example is when you want to use a capability of another debugger that the current one doesn't have. Here is how to do it using the well-known EB FE trick.
  • Instruct the debugger to break in, and note the two bytes at EIP.
  • Replace the two bytes at EIP with EB FE, that is, JMP EIP.
  • Detach the debugger leaving the application in an endless loop.
  • Attach the other debugger to the running process.
  • Locate the thread of the endless loop by switching between threads, and when found, restore the two bytes you memorized.
  • Carry on with the debugging using the other debugger.
Note, the patched thread could interfere with a watchdog thread, if there is one; however, I haven't experienced that yet.
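Why EB FE jumps to itself can be checked with a little arithmetic: EB is a short JMP with a signed 8-bit displacement that is relative to the next instruction. A Python sketch (the function name is made up, and the address is just an example value):

```python
def short_jump_target(address, rel8_byte):
    """Target of a two-byte JMP rel8 (opcode EB) located at address."""
    # Sign-extend the 8-bit displacement.
    rel8 = rel8_byte - 0x100 if rel8_byte >= 0x80 else rel8_byte
    return (address + 2 + rel8) & 0xFFFFFFFF


eip = 0x011113BE
print(hex(short_jump_target(eip, 0xFE)))  # same as eip: an endless loop
```

0xFE is -2, which cancels out the two bytes of the instruction itself.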

March 5, 2014

Trace And Watch

This is how I recently performed dynamic integer analysis on a 32-bit binary application that reads DWORD values from the file.
  • The file format contains many fields of type DWORD. A sample file was given. I made as many copies of the sample file as it had DWORD fields. I crafted each copy to have 0x41414141 in one DWORD field. Only one DWORD field was changed per copy, so together the copies covered all DWORD fields.
  • I wrote a PinTool for this occasion, called TraceAndWatch, that checks the value of the general registers before every instruction is executed. When a register value matches 0x41414141, it shows the memory state including the disassembly of the instruction.
  • I executed the application under TraceAndWatch and let the application parse the first sample containing 0x41414141. TraceAndWatch produced a log in which I saw what instructions use 0x41414141.
  • In the static disassembly, I located the instructions using 0x41414141 and saw arithmetic and comparison operations with that value.
  • In some cases I realized I could reach another code path by changing 0x41414141 in the sample to another value, e.g. to a negative signed value like 0x88888888. I then re-ran the test, telling TraceAndWatch to trace and watch instructions using 0x88888888.
  • I executed this manual test on all the samples produced earlier.
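The first step, producing one copy per DWORD field, can be automated. The Python sketch below assumes the offsets of the DWORD fields are already known; `make_samples` and the offsets in the example are hypothetical.

```python
import struct


def make_samples(sample, dword_offsets, marker=0x41414141):
    """Yield (offset, copy) pairs; each copy has one DWORD field set to marker."""
    packed = struct.pack("<I", marker)
    for offset in dword_offsets:
        if offset < 0 or offset + 4 > len(sample):
            raise ValueError(f"offset {offset:#x} is outside the sample")
        yield offset, sample[:offset] + packed + sample[offset + 4:]


original = bytes(12)
for offset, patched in make_samples(original, [0, 8]):
    print(f"{offset:#x}: {patched.hex()}")
```

With 0x41414141 the byte order doesn't matter; for a marker like 0x88888888 it doesn't either, but for other values the byte order of the format has to be taken into account.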

The following weaknesses can be audited by this approach.

CWE-839: Numeric Range Comparison Without Minimum Check
CWE-195: Signed to Unsigned Conversion Error
CWE-682: Incorrect Calculation
CWE-190: Integer Overflow or Wraparound
CWE-680: Integer Overflow to Buffer Overflow
CWE-191: Integer Underflow (Wrap or Wraparound)

Final Notes

This is a generic and quick way to locate comparisons and arithmetic on integers.

TraceAndWatch doesn't track anything other than general registers, so you can lose track of an integer when its value is copied to, say, an SSE register.

When arithmetic is performed on the value, e.g. 0x41414141 is multiplied by 2, you need to set TraceAndWatch to look for 0x82828282 so as not to lose track.

TraceAndWatch is available for download on my OneDrive space. If you use it you may contact me with your experience.

February 19, 2014

Bug in Flash Player when processing PNG format

The Bug

A PNG file consists of a sequence of data structures called chunks. A chunk has a Length field that is a DWORD value. A specially crafted Length field can cause an integer overflow in Flash Player, leading to a read outside the designated buffer. Here is the disassembly snippet explaining the bug.

015344a0 e8f7feffff      call    FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c) ;Read CHUNK.Length from attacker controlled buffer
015344a5 8bd8            mov     ebx,eax                                          ;CHUNK.Length = 0ffffffd3h
015344a7 6a04            push    4
015344a9 8d45fc          lea     eax,[ebp-4]
015344ac 50              push    eax
015344ad 8bce            mov     ecx,esi
015344bb e8dcfeffff      call    FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c)
015344c0 8b4d08          mov     ecx,dword ptr [ebp+8]
015344c3 8901            mov     dword ptr [ecx],eax
015344c5 8b560c          mov     edx,dword ptr [esi+0Ch]                          ;Current Position in buffer = 29h
015344c8 8945fc          mov     dword ptr [ebp-4],eax
015344cb 8d441a04        lea     eax,[edx+ebx+4]                                  ;<-First integer overflow
                                                                                  ;TotalValue = Position + CHUNK.Length + 4
                                                                                  ;TotalValue = 29h + 0ffffffd3h + 4 = 0
015344cf 3b4610          cmp     eax,dword ptr [esi+10h]                          ;Compare TotalValue (0) to FileSize (3d0h)
015344d2 7351            jae     FlashPlayer!WinMainSandboxed+0x1f12ab (01534525) ;Unsigned evaluation. Jump is not taken
015344d4 57              push    edi
015344d5 6afc            push    0FFFFFFFCh
015344d7 58              pop     eax
015344d8 83cfff          or      edi,0FFFFFFFFh
015344db 3bd8            cmp     ebx,eax                                          ;Compare CHUNK.Length (0ffffffd3h) to hardcoded 0FFFFFFFCh
015344dd 7e26            jle     FlashPlayer!WinMainSandboxed+0x1f128b (01534505) ;Signed evaluation. Jump is taken.
[...]
01534505 8b4e14          mov     ecx,dword ptr [esi+14h]                          ;Set pointer to Buffer
01534508 03ca            add     ecx,edx                                          ;Set Current Position in Buffer
0153450a 03cb            add     ecx,ebx                                          ;<-Second integer overflow
                                                                                  ;Increment by CHUNK.Length leading to position out of the buffer backward
0153450c e88bfeffff      call    FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c)
[...]
0153439c 0fb601          movzx   eax,byte ptr [ecx]                               ;<-Can read out of designated buffer
0153439f 0fb65101        movzx   edx,byte ptr [ecx+1]                             ;<-Can read out of designated buffer
015343a3 c1e008          shl     eax,8
015343a6 0bc2            or      eax,edx
015343a8 0fb65102        movzx   edx,byte ptr [ecx+2]                             ;<-Can read out of designated buffer
015343ac 0fb64903        movzx   ecx,byte ptr [ecx+3]                             ;<-Can read out of designated buffer
015343b0 c1e008          shl     eax,8
015343b3 0bc2            or      eax,edx
015343b5 c1e008          shl     eax,8
015343b8 0bc1            or      eax,ecx
015343ba c3              ret

The state in the erroneous code path looks like below. The designated buffer containing the content of the PNG file starts at 00e4c810, where the PNG signature is seen. Due to the bug the instruction reads memory 4 bytes before the start of the buffer, at 00e4c80c. Note, the instruction doesn't cause an access violation because the illegally accessed memory address is mapped.

0:000> t
eax=fffffffc ebx=ffffffd3 ecx=00e4c80c edx=00000029 esi=0019e134 edi=ffffffff
eip=0153439c esp=0019dbf4 ebp=0019dc08 iopl=0         nv up ei pl nz na pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000207
FlashPlayer!WinMainSandboxed+0x1f1122:
0153439c 0fb601          movzx   eax,byte ptr [ecx]         ds:002b:00e4c80c=00
0:000> db ecx
00e4c80c  00 00 00 00 89 50 4e 47-0d 0a 1a 0a 00 00 00 0d  .....PNG........
00e4c81c  49 48 44 52 00 00 01 2c-00 00 01 2c 08 02 00 00  IHDR...,...,....
00e4c82c  00 f6 1f 19 22 ff ff ff-d3 49 44 41 54 78 9c ed  ...."....IDATx..
00e4c83c  d9 31 8a c3 40 14 44 c1-1e e3 fb 5f 59 8a 9d 09  .1..@.D...._Y...
00e4c84c  1c bc 40 55 6c b4 20 70-f2 68 98 7f b6 6b bb ce  ..@Ul. p.h...k..
00e4c85c  ef df b6 f3 e8 9f f3 ad-6f 7d fb e7 b7 9f 01 a9  ........o}......
00e4c86c  ef 4e fd 13 e0 dd 44 08-31 11 42 4c 84 10 13 21  .N....D.1.BL...!
00e4c87c  c4 bc 8e 42 cc 12 42 4c-84 10 13 21 c4 44 08 31  ...B..BL...!.D.1

Root Cause

Two incorrect sanity checks were identified.

The sanity check at 015344cf is incorrect because it happens after the overflow (015344cb).
The sanity check at 015344db is incorrect because a signed comparison is performed on CHUNK.Length, which is unsigned.
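The arithmetic behind both checks can be replayed in Python with the values seen in the disassembly comments and the register dump. This is only a recalculation of those numbers; the variable names are made up.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit value as a signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value


position = 0x29             # current position in the buffer
chunk_length = 0xFFFFFFD3   # attacker-controlled CHUNK.Length
file_size = 0x3D0
buffer_start = 0x00E4C810

# 015344cb: the sum wraps around to 0 before it is checked.
total = (position + chunk_length + 4) & MASK32
jae_taken = total >= file_size                 # unsigned compare at 015344cf

# 015344db: CHUNK.Length is compared as a signed value (-45 <= -4).
jle_taken = to_signed32(chunk_length) <= to_signed32(0xFFFFFFFC)

# 01534508..0153450a: the pointer ends up before the start of the buffer.
read_pointer = (buffer_start + position + chunk_length) & MASK32

print(hex(total), jae_taken, jle_taken, hex(read_pointer))
```

The resulting pointer, 00e4c80c, matches the value of ecx in the register dump above.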

Severity

The technical severity of this bug is low because diverting the execution flow is not possible. Further analysis suggests that address disclosure is not possible either, because the memory region that can be accessed outside the designated buffer doesn't contain addresses.

Reproduction

Open Flash Player 12.0.0.38 (flashplayer_12_sa.exe has a size of 10,339,208) in Windbg. Then execute the following commands.
0:006> bp flashplayer + 001f44a0 2
0:006> g
Open the PoC in Flash Player (send me an e-mail for a copy). The debugger breaks in, so you can step through the disassembly and see the data flow as explained above.

I'm aware there is a newer version of Flash Player, 12.0.0.44. I verified that it's affected by this bug, too.

UPDATE On 26th February an Adobe engineer confirmed via e-mail that he could reproduce the bug.

February 13, 2014

Data Flow Tracking in Flash Player: Undocumented Bytecodes and JIT

Undocumented Bytecodes

I analyzed how the bytecodes in the DoABC tag are parsed, and compared the result against what I saw in the AVM2 documentation (May 2007). I found that Flash Player can parse certain bytecodes that are not mentioned in the documentation.

Bytecode Note Bytecode Note Bytecode Note Bytecode Note
0x00 RESERVED 0x40 newfunction 0x80 coerce 0xc0 increment_i
0x01 UNDOCUMENTED 0x41 call 0x81 UNDOCUMENTED 0xc1 decrement_i
0x02 nop 0x42 construct 0x82 coerce_a 0xc2 inclocal_i
0x03 throw 0x43 callmethod 0x83 UNDOCUMENTED 0xc3 declocal_i
0x04 getsuper 0x44 callstatic 0x84 UNDOCUMENTED 0xc4 negate_i
0x05 setsuper 0x45 callsuper 0x85 coerce_s 0xc5 add_i
0x06 dxns 0x46 callproperty 0x86 astype 0xc6 subtract_i
0x07 dxnslate 0x47 returnvoid 0x87 astypelate 0xc7 multiply_i
0x08 kill 0x48 returnvalue 0x88 UNDOCUMENTED 0xc8 RESERVED
0x09 label 0x49 constructsuper 0x89 UNDOCUMENTED 0xc9 RESERVED
0x0a RESERVED 0x4a constructprop 0x8a RESERVED 0xca RESERVED
0x0b RESERVED 0x4b RESERVED 0x8b RESERVED 0xcb RESERVED
0x0c ifnlt 0x4c callproplex 0x8c RESERVED 0xcc RESERVED
0x0d ifnle 0x4d RESERVED 0x8d RESERVED 0xcd RESERVED
0x0e ifngt 0x4e callsupervoid 0x8e RESERVED 0xce RESERVED
0x0f ifnge 0x4f callpropvoid 0x8f RESERVED 0xcf RESERVED
0x10 jump 0x50 UNDOCUMENTED 0x90 negate 0xd0 getlocal_0
0x11 iftrue 0x51 UNDOCUMENTED 0x91 increment 0xd1 getlocal_1
0x12 iffalse 0x52 UNDOCUMENTED 0x92 inclocal 0xd2 getlocal_2
0x13 ifeq 0x53 UNDOCUMENTED 0x93 decrement 0xd3 getlocal_3
0x14 ifne 0x54 RESERVED 0x94 declocal 0xd4 setlocal_0
0x15 iflt 0x55 newobject 0x95 typeof 0xd5 setlocal_1
0x16 ifle 0x56 newarray 0x96 not 0xd6 setlocal_2
0x17 ifgt 0x57 newactivation 0x97 bitnot 0xd7 setlocal_3
0x18 ifge 0x58 newclass 0x98 RESERVED 0xd8 RESERVED
0x19 ifstricteq 0x59 getdescendants 0x99 RESERVED 0xd9 RESERVED
0x1a ifstrictne 0x5a newcatch 0x9a RESERVED 0xda RESERVED
0x1b lookupswitch 0x5b RESERVED 0x9b RESERVED 0xdb RESERVED
0x1c pushwith 0x5c RESERVED 0x9c RESERVED 0xdc RESERVED
0x1d popscope 0x5d findpropstrict 0x9d RESERVED 0xdd RESERVED
0x1e nextname 0x5e findproperty 0x9e RESERVED 0xde RESERVED
0x1f hasnext 0x5f UNDOCUMENTED 0x9f RESERVED 0xdf RESERVED
0x20 pushnull 0x60 getlex 0xa0 add 0xe0 RESERVED
0x21 pushundefined 0x61 setproperty 0xa1 subtract 0xe1 RESERVED
0x22 RESERVED 0x62 getlocal 0xa2 multiply 0xe2 RESERVED
0x23 nextvalue 0x63 setlocal 0xa3 divide 0xe3 RESERVED
0x24 pushbyte 0x64 getglobalscope 0xa4 modulo 0xe4 RESERVED
0x25 pushshort 0x65 getscopeobject 0xa5 lshift 0xe5 RESERVED
0x26 pushtrue 0x66 getproperty 0xa6 rshift 0xe6 RESERVED
0x27 pushfalse 0x67 UNDOCUMENTED 0xa7 urshift 0xe7 RESERVED
0x28 pushnan 0x68 initproperty 0xa8 bitand 0xe8 RESERVED
0x29 pop 0x69 RESERVED 0xa9 bitor 0xe9 RESERVED
0x2a dup 0x6a deleteproperty 0xaa bitxor 0xea RESERVED
0x2b swap 0x6b RESERVED 0xab equals 0xeb RESERVED
0x2c pushstring 0x6c getslot 0xac strictequals 0xec RESERVED
0x2d pushint 0x6d setslot 0xad lessthan 0xed RESERVED
0x2e pushuint 0x6e getglobalslot 0xae lessequals 0xee RESERVED
0x2f pushdouble 0x6f setglobalslot 0xaf greaterequals 0xef debug
0x30 pushscope 0x70 convert_s 0xb0 UNDOCUMENTED 0xf0 debugline
0x31 pushnamespace 0x71 esc_xelem 0xb1 instanceof 0xf1 debugfile
0x32 hasnext2 0x72 esc_xattr 0xb2 istype 0xf2 UNDOCUMENTED
0x33 RESERVED 0x73 convert_i 0xb3 istypelate 0xf3 RESERVED
0x34 RESERVED 0x74 convert_u 0xb4 in 0xf4 RESERVED
0x35 UNDOCUMENTED 0x75 convert_d 0xb5 RESERVED 0xf5 RESERVED
0x36 UNDOCUMENTED 0x76 convert_b 0xb6 RESERVED 0xf6 RESERVED
0x37 UNDOCUMENTED 0x77 convert_o 0xb7 RESERVED 0xf7 RESERVED
0x38 UNDOCUMENTED 0x78 checkfilter 0xb8 RESERVED 0xf8 RESERVED
0x39 UNDOCUMENTED 0x79 RESERVED 0xb9 RESERVED 0xf9 RESERVED
0x3a UNDOCUMENTED 0x7a RESERVED 0xba RESERVED 0xfa RESERVED
0x3b UNDOCUMENTED 0x7b RESERVED 0xbb RESERVED 0xfb RESERVED
0x3c UNDOCUMENTED 0x7c RESERVED 0xbc RESERVED 0xfc RESERVED
0x3d UNDOCUMENTED 0x7d RESERVED 0xbd RESERVED 0xfd RESERVED
0x3e UNDOCUMENTED 0x7e RESERVED 0xbe RESERVED 0xfe RESERVED
0x3f RESERVED 0x7f RESERVED 0xbf RESERVED 0xff RESERVED

The loop and the big switch statement parsing DoABC bytecode are near 0x6087e9. An instruction near 0x58f25d also reads bytecode. The documentation certainly needs an update on Adobe's side so developers can add the currently undocumented bytecodes to their decompilers/disassemblers.

JIT

After adding new functionality to my pintool I ran it against Flash Player. Here are my observations.

When executing a flash file containing a DoAction tag in Flash Player, no memory page was allocated with or set to an *EXECUTE* flag. Thus no dynamically generated code was executed, at least not with the most common method. Therefore I think DoAction works with interpreted execution, meaning every single bytecode is run in isolation rather than a set of bytecodes being compiled and run (JIT).

When executing a flash file containing a DoABC tag in Flash Player, I observed increased usage of VirtualAlloc. The page was allocated with the PAGE_READWRITE flag. Later in the execution the page was set to PAGE_EXECUTE_READ and the execution flow was transferred to the page. When the execution returned to the caller, the page was set back to PAGE_READWRITE. I knew this was part of how JIT works. The memory protection flags have to be changed like this because of DEP.

0x5205a6 is a VirtualProtect call to change the memory protection flags. When it's called with PAGE_READWRITE it's called via 0x5fc39c. When it's called with PAGE_EXECUTE_READ it's called via 0x5fc2e9.

During my experiment I figured out that the instruction at 0x5d20ef calls into the JIT-compiled code, though this might not be the only address JIT-compiled code is called from. I observed many callbacks in the JIT-compiled code. One of the callbacks might be there to give continuous feedback to the caller, for example while a long loop is being executed. I observed that constants are encrypted with xor instructions to make memory spraying more difficult. This is not new, but it was the first time I saw it. This is how 0x41414141 looks when it's encrypted.
03af1f67 b83a7c1959      mov     eax,59197C3Ah
03af1f6c 357b3d5818      xor     eax,18583D7Bh
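The two immediates can be checked with a single XOR. The Python sketch below also shows how such a pair could be produced from a random key; how Flash Player actually picks its keys is not known to me, so `blind_constant` is only an illustration of the principle.

```python
import random


def blind_constant(value, key=None):
    """Split a 32-bit constant into (encrypted value, key) so that value = encrypted ^ key."""
    if key is None:
        key = random.getrandbits(32)
    return value ^ key, key


def unblind(encrypted, key):
    """What the mov/xor pair computes at run time."""
    return encrypted ^ key


# The immediates from the JIT-compiled code above.
print(hex(unblind(0x59197C3A, 0x18583D7B)))  # 0x41414141
```

Because the key differs from run to run, the attacker-chosen constant never appears in the executable page as it is.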
All offsets in this post are RVAs, that is, relative to Flash Player's image base. The offsets apply to Flash Player 12.0.0.38 (flashplayer_12_sa.exe has a size of 10,339,208).
  This blog is written and maintained by Attila Suszter.