The blog continues at suszter.com/ReversingOnWindows

December 28, 2011

Software and Visual Thinking

Being said that the design of Graphical User Interface should not introduce new functionality in the application. The GUI and the functionality should be separated, and implemented on different levels which pretty much seems to be logical for me, but I'd add something important to this general claim.

There are people, including me, who are more of visual types. For me, visual things excite the imagination and when representing things visually, I can create new things from them that have some kind of value.

It happened to me numerous times to get new ideas involving awesome functionalities during the design of GUI. These ideas didn't come to my mind when I was focusing exclusively on functionality design.

Freedom inhibits creativity. When you're on the design of GUI, there are many restrictions (that you can see) and this stimulates the brain to create some new ideas.

I've imagined an application consists of multiple functions. Actually, it is a mix and improvement of some tools I wrote earlier. I'd put these small ideas into one application, by doing so, there is the possibility to connect those small functions to work together and would take the advantage of it. Also, I have some ideas about functionalities that I've seen in other applications, albeit that need change according to imagination.

The basic is as follows. The input is the (unknown) binary file. You can browse the file, can discover the structure of it, can display distribution diagrams and others. Also, can apply particular algorithms on data such as decoding compressed stream or display multimedia content.

December 24, 2011

Analysis of Data Formats by Guessing - Part II

I'm writing about how to find possible TIMESTAMP values in raw binary data.

Here is the first part of data format analysis by guessing but you can read this entry independently anyway.

TIMESTAMP is a DWORD value, represents Unix time, and we assume it's encoded as little-endian somewhere in the binary.

The prototype program reads the binary data into a byte array. It iterates through the array elements by reading four bytes at each position in the array, interprets as DWORD, and compares if the value falls into the date range we are after.

The looser date range you set the more likely to find false positives in your search result.

I've tested the code with a relatively strict date range, i.e. one month, and didn't find any false positives but did find TIMESTAMP. There were cases when multiple TIMESTAMPs found in PE files; for example one in the header and one in the .rdata section.

If you cannot set a loose date range, and there is a likelihood you came across to false positives you can filter out the results by applying some of the below.
  • You know more about the date e.g. it cannot be Saturday or Sunday?
  • It should be in a high entropy region of the dump, or on the contrary?
  • There might be other values stored around, do you know more about those values? Presence of scattered zeroes?
  • Should it be aligned to some value?

December 18, 2011

Analysis of Data Formats by Guessing - Part I

Usually, the most straightforward way to do an analysis of data format is to observe how the data is parsed by the application. However, there might be cases when the application parses the data is not available. In that case, the analysis migh be done by relying on the data itself and on general tools only.

In this post, I discuss three random ideas that might be helpful to explore unknown data formats without using the application parses the data.
  • To spot the presence of run-length encoding (RLE) you might look at string chunks in a relatively non-redundant (high entropy) data dump. String chunks might indicate that the original data contains string; also the string chunks in RLE dump might be found in the original data multiple times, likely, as a part of longer strings.
  • To spot the presence of IA32 code, you might look at instruction patterns such as CALL or JUMP in a relatively redundant data dump. The above instructions have bytecode of E8 and E9 followed by four bytes relative address that could also be included in the validation for more precise discovery.
  • Repetitive bytes could mean padding bytes, for example, to align a structure. Scattered bytes, usually 00s or FFs, could indicate the representation of positive or negative numbers. It could mean either pointer, or offset, or table of numbers, etc.
I call this process abductive reverse engineering. We don't have deductive evidence about the meaning of the structures in the data but we are able to form an explanatory hypothesis about their meaning.

December 16, 2011

An Infinite Loop Bug

In this entry, I'm describing an infinite loop bug caused by integer overflow. The bug involves data stream and erroneous algorithm.

Data stream looks like below.

[size][data] [size][data] ... [size][data]

The stream consists of records. The record consists of the size field followed by the data field.

The problem lies when the algorithm calculates the offset of record like below.

Offset of data plus size. The result of this operation could lead to overflow. If this fields are not properly sanitized, and the result points to previous record, the previous records will be processed again, and again...

I observed that multiple high-profile applications have this weakness during processing similar data structures could lead application hang.

It is possible to detect this algorithm weakness by code review but you certainly have an eye on what you are reviewing.

It is unlikely to detect this weakness by fuzz testing because the result of the overflow must point to previous record. If size is at least 32-bit wide you have a tiny chance to hit any exact result that would point to previous record.

I also call this kind of problem as stream hijacking.

December 3, 2011

Finding the Question

Some time ago, you saw something you were impressed about it. That time, you didn't have a detailed understanding about the inner working of the thing but you found it interesting anyway.

These days, you need to accomplish something. You think this would be similar one to the thing you've seen before. You are trying to look after it on the Internet but you can't find any useful material. You ask your friends about it and they are able to provide something but those don't seem to be relevant for you.

You're looking into the technology to research more info. You find things that seem to be close to the one you've imagined however they include other approach that is not suitable to you. The good new is, at least, you've realized it is possible to accomplish something in a way even if it's not suitable.

More findings reveal that you can't get the thing in a way you've originally imagined. However, you know what you can get, and you know a way how to get that even though that way is not suitable to you. This point you know the question. So you have to look after the answer now.

You know if you do reverse engineering you'll find the answer but actually it looks to be a long process at first glance. You restrict the code by try to focus on the relevant areas and you realized that you can use symbols that makes the code human readable. The start seems to be tedious but you keep looking and exploring the code even if you don't exactly know what you're doing. You have finally found something you are after. It's a small but impressive finding.

You realized to understand much more about the technology by this small finding. You're more confident now. You have an idea how similar technologies are designed.

October 24, 2011

Fuzzing ROUNDs

When you have already implemented some functionality in your application you might be thinking about how could you get even more with less coding, for example by reusing existing code.

This weekend, I wanted to code less, and I had an idea. I've updated my Flash fuzzer with a new functionality that I call ROUND. It's a very simple addition and I expect it to be powerful.

If you use the round command line switch you can define how many fuzzing rounds you want to apply on a particular file. If you use round=1, that wouldn't result any change, it's just a normal one round fuzzing, you should get the same result as without using round switch. However, if you use for example round=2, that means, the fuzzing will be applied twice on the particular file. It is similar to the case when you perform fuzzing on sample that already fuzzed, but you can now achieve it with a command line switch.

So that was the idea for the weekend.

August 29, 2011

Experiences with Signedness

Signed comparasion involves the following instructions: JG, JGE, JL, JLE, JNG, JNGE, JNL, JNLE. Unsigned comparasion: JA, JAE, JB, JBE, JNA, JNAE, JNB, JNBE.

Let's assume value variable has a type of signed integer, that is compared to particular const values. The following will perform signed comparison.
if (value < 0x7fffffff) {}
The following will perform unsigned comparison.
if (value < 0x80000000) {}
Inference: 0x7fffffff is stored as signed integer but 0x80000000 is stored as unsigned integer.

Let's take the above example except value variable has a type of signed char, rather than signed integer, that is compared to particular const values. The results regarding the comparison will be the same as above.
Inference: the comparison will be performed according to the signedness of larger type.

Let's assume value1 has a type of signed integer, and value2 has a type of unsigned integer. The values are compared and the comparison will be unsigned as you have guessed.
if (value1 < value2) {}

Let's assume value1 has a type of unsigned char, and value2 has a type of signed integer. The values are compared and the comparison will be signed as you have guessed.
if (value1 < value2) {}

Important, that the above based on observation, on Windows OS, on IA-32 architecture. When you develop a program it's better to rely on the definition of the language rather than solely on observation.

July 15, 2011

Randomization in SWFz

First of all, SWFz is a static security testing tool (not released, sorry) that is, as the name implies, aware of the Flash file format. Originally, it has been developed to generate test samples for Adobe Flash Player but surely the samples generated by this tool can be used to test any application that parses Flash file.

There are numerous methods embedded in the application to create fuzzed samples. The main reason for this is the demand for different fuzzing opportunities. Additionally, I added methods when parsing of some structures hadn't been completed yet but some ones had. These methods weren't planned but I thought to be a good idea to add them until I finish the relevant code, after all it still provides some creativity in the test cases.

So there are methods being primitive by ignoring some structures and there are methods being much structure aware for targeted testing. Then there are the between methods.

There is one important thing that's same among the methods: the fact of randomization.

As you envisage, SWFz has been evolutionary evolving. Initially, a simple method had been implemented that is dumb fuzzing; I call it fuzzing of linear blocks. The program divides the file into linear blocks, and can overwrite the byte at random position by a random value in each block. That time, I wanted to start with a simple feature and I did it by this. The position of each block and the byte value for altering the original one were chosen by random.

One of the next evolution stage was that the tool is able to locate the position and the size of the methods storing the actionscript bytecodes. Clearly, with this it's possible to launch more targeted fuzzing. The program can overwrite the bytecode at random position by a random value in each method. The position of each method and the byte value for altering the bytecode were chosen by random.

One of the next evolution stage was that the tool is able to locate the position and the size of the methods storing the actionscript bytecodes, moreover it is able to locate the position of each bytecode. The program can overwrite the opcode part of the bytecode at random position by a random opcode defined in a set. Practically, it means the program is able to replace the instruction with an other instruction that has the same size and properties.

I could have used the computer clock to generate the random numbers, that might work, I however have chosen a predictable random generator method called Mersenne twister. Using this, I can safely reproduce the test cases at any time, and the testing of the fuzzer can be done in more straightforward way.

June 12, 2011

The forensic example

I've been busy with a new project involving data formats. It's my C++ training. In the past, I developed a lot in C, was working on file format parser modules. Now there is something new here. Firstly, it's C++. Secondly, it's about data formats, and data structures. Although it will contain parser modules this is not about parsers.

Let's assume a crime. There is some evidence that is a file fragment. You might think it's hard to imagine but let's just think the file fragment is recovered from a corrupted disk. You need to extract evidence from that file. You need to make the best explanation what is the file for. Where is it from? Who or what did produce it? Maybe the meaning of that file depends on a proprietary application but that is not available and you need to extract as many information as possible to draw conclusion. How would you analyze that binary fragment? You might open it in your hex editor, launch entropy analysis on it, extract the printable strings, look for padding bytes, for possible header fields, for code of IA32 or ARM, or for constant values. That's very nice but doing everything manually it's a bit time consuming, and the murder is still out.

This tool is about analyzing binary objects involving unknown ones, discovering the structure of the object. The forensic example mentioned above is just one; I could have said how important to discover the underlying structure of objects for targeted fuzzing, too.
  This blog is written and maintained by Attila Suszter. Read in Feed Reader.