asmDIFF - An x86 binary assembler search tool

asmDIFF is a binary assembly search, diff and disassembly tool. It supports the Windows PE (exe/dll) and Linux ELF binary formats compiled for the x86 and x86_64 architectures. It is particularly useful when searching for asm functions, instructions or memory pointers in a patched, updated or otherwise modified binary. Try out the live demo and read the paper below.

Note: asmDIFF is still closed source as long as we do not know how to proceed with it. Feel free to contact us with questions, suggestions, or licensing requests. However, a fully usable win32 version is included in mmBBQ 3.0.0 and upwards.

Quick start:


|=--------------------------------=[ asmDIFF ]=-------------------------------=|
|=----------------------------=[ by defragger <> ]=---------------------------=|
|=---------------------------=[ and willsteel <> ]=---------------------------=|
|=-----------------------------=[ Aug 04, 2011 ]=-----------------------------=|

<---------------------------( 0x00 Table of Contents )------------------------->

- 0x10   Introduction
- 0x20   PE file extraction
- 0x30   Detecting logical blocks
- 0x40   Finding function bounds
  - 0x41   Recursive method
  - 0x42   Linear method
  - 0x43   Combined method
- 0x50   Function matching
  - 0x51   ASM metrics
  - 0x52   Metric distance
  - 0x53   Metric strength
  - 0x54   Metric profile
  - 0x55   Metric uniqueness
  - 0x56   Metric rank
  - 0x57   Metric score
- 0x60   Data-Section
  - 0x61   Reverse lookup
- 0x70   Diffing ASM binary code
- 0x80   Use Cases
  - 0x81   Binary Search
  - 0x82   DLL Search
  - 0x83   Binary Diff
  - 0x84   Security Issues
- 0x90   Next steps
  - 0x91   Branch instruction blocks
  - 0x92   Call hierarchy graph
- 0xA0


|=-------------------------=[ 0x10  Introduction ]=---------------------------=|

The process of reverse engineering is an important step when hardening soft-
and hardware systems. Because this method is rather expensive in terms of
development time, it can be very helpful to find differences or similarities
between two assembly binaries of different versions. A very common example
for reverse engineers is the hated pointer update between binary patches. This
is not a trivial task. Some of the most obvious problems are: address shifts,
obfuscation, register usage, compile flags, etc.

The approach of asmDIFF is to ignore the impact of those effects. This is done 
by using assembler code metrics that are robust against these effects.

asmDIFF was started as part of a reverse engineering project. The scientific
nature of this approach made us work on this article.

|=-------------------------=[ 0x20  PE file extraction ]=---------------------=|

The first part in building a reverse engineering tool is understanding the
format of the target binaries. The most important file format for reversers is
Microsoft's Portable Executable (also called PE). The second most interesting
format is Unix/Linux ELF. For PE files any necessary information can be found
in MS's header file <winnt.h>. To reverse a given binary it is inevitable to
read the various binary sections of the file. Those sections contain the
executed code (.text sections) and initialized or uninitialized data (.data
sections). There are platform specific differences but the basic idea behind
them is the same.

So basically what every executable file format does is load the code/data
to a certain address range. That range can be fixed or random (ASLR). The
different sections are loaded so that each starts at the beginning of a fresh
memory page. The various access rights (execute, read, write) can be set for
each section separately. After that the import table (IAT) can be filled to
match the current runtime environment.

The first part of a PE file is the IMAGE_DOS_HEADER (winnt.h). The most
important member of this struct is e_lfanew. It is the offset to the
IMAGE_NT_HEADERS which contains the IMAGE_OPTIONAL_HEADER. The various
section headers lie directly behind the IMAGE_NT_HEADERS:
| void* first_section = IMAGE_DOS_HEADER->e_lfanew+sizeof(IMAGE_NT_HEADERS)
That is the offset from the beginning of the opened file. The next section
header is sizeof(IMAGE_SECTION_HEADER) bytes behind the previous one. The
section count is defined by IMAGE_NT_HEADERS->FileHeader->NumberOfSections.
The CPU's entry-point into a PE binary is:
| IMAGE_OPTIONAL_HEADER->AddressOfEntryPoint
The section containing it is also called the ".text" section of a PE file.
Normally, the virtual address of a section is:
| IMAGE_OPTIONAL_HEADER->ImageBase+section->VirtualAddress
The offset to the data of that section within the file is:
| section->PointerToRawData
Finally, the length of a section can be obtained by:
| section->Misc->VirtualSize
There are other sections that can be of interest as well, such as the ".tls"
or ".rdata" sections that contain static content.

|=-------------------------=[ 0x30 Detecting logical blocks ]=----------------=|

In order to show differences between arbitrary data one must be able to detect
certain logical blocks from the nature of that data. In case of text files this
is easy: newlines, paragraphs, pages. In the absence of logical blocks, one can
only do things such as measuring the edit distance. When diffing machine level
code it is obvious to take the function calls and their boundaries as blocks.
Other blocks can be defined as well, but using the function bounds is a really
good start.

We are currently evaluating the usage of branch instructions as logical blocks.
This is more complex but can help when functions get inlined. The call
hierarchy, which forms a graph-like structure, can also be taken into account
when comparing assembly blocks. A generic approach would be to use pattern
recognition techniques to detect the logical blocks of the data automatically.
This can be helpful when the nature of the analyzed data is completely unknown.

|=------------------------=[ 0x40 Finding function bounds ]=------------------=|

To find assembler function boundaries we first need to find the function
entry-points. Those entry-points are typically referenced by a "CALL"
instruction. Most of them are followed by the relative offset to the real
entry-point address. There are three common forms of function calls:

1. The regular relative offset call instruction:
 CALL 0x0052F358
Note: The destination address is the relative offset from the next instruction,
meaning the length of the CALL instruction itself has to be taken into
consideration. On 32-bit x86 this is 5 bytes. This is where the mystic
"len+=5" from various codecaving tutorials comes from.

2. The called address (absolute) is loaded as a static memory pointer into a
register. After that a call is made on that register. Those calls may look like
 MOV EAX, [0x00B6EF2C]
 CALL EAX

3. The offset address is calculated at runtime in a non-static fashion. This
is most commonly used for callback methods or C++ classes, where a function
pointer is attached to some kind of memory structure.
 MOV EAX, [ECX+0xD1]

It is hard to detect entry-points of dynamic function callbacks without
running the program itself. So we start by collecting the entry-points of the
directly referenced functions. The following pseudo code illustrates this step:
| while (instruction = assemble_next())
|    if (instruction.op == 0xE8) /* x86 CALL */
|        add_entry_point(current_offset + current_cmd_len + instruction.arg1)
|    if (instruction.asm == 'push' || instruction.asm == 'mov')
|       if (address_within_textsection(instruction.argX))
|           add_entry_point(instruction.argX)
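The CALL-collecting part of the pseudo code above can be sketched in Python.
Note the simplification: a real pass must decode instruction lengths, while
this naive scan tests every byte position for 0xE8 and therefore produces
false positives that need filtering.

```python
import struct

def collect_call_targets(text, base_va):
    """Naively scan a .text blob for E8 rel32 CALLs and collect targets."""
    targets = set()
    end_va = base_va + len(text)
    for i in range(len(text) - 4):
        if text[i] == 0xE8:  # x86 CALL rel32 opcode
            (rel,) = struct.unpack_from("<i", text, i + 1)
            # target = offset of next instruction + signed displacement;
            # the instruction is 5 bytes long (the famous "len += 5")
            target = base_va + i + 5 + rel
            if base_va <= target < end_va:  # keep in-section targets only
                targets.add(target)
    return targets
```

The register-indirect case (MOV/PUSH of a static .text pointer) would be
handled the same way by checking immediate operands against the section range.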

There are other ways to call asm functions, but we take a break here to face
the next problem: finding the function boundaries. At this point only the
entry-points are known, but we need to know where a function starts, where it
ends and where the next function starts. To make things more complicated, there
are some functions where the entry-point is not the starting offset of the
function. This happens when a function makes a back-jump into a region
somewhere before the entry-point.

|=-------------------------=[ 0x41 Recursive method ]=------------------------=|

To find a function's boundaries we need to follow the path that the CPU will
take through the function. As there are many conditional jumps, we need to be
able to follow each resulting sub-path through the function. That leads to the
first basic approach: following each sub-path in recursive fashion until
certain abort conditions are met. The following conditions suit our needs:

 1. reaching an asm RET, RETN
 2. crossing the previous or next asm function
 3. reaching an invalid instruction
 4. crossing another sub-path that has already been analyzed
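A minimal sketch of this walk over an already-decoded instruction map follows.
The instruction encoding is invented for illustration, and the neighbor
function check (condition 2) is omitted for brevity.

```python
def trace_bounds(insns, entry):
    """Follow every sub-path from `entry`, widening the [lo, hi] bounds
    until an abort condition is hit: RET/RETN (1), an offset outside the
    decoded range (3), or an already-analyzed sub-path (4).

    `insns` maps offset -> (mnemonic, length, jump_target_or_None).
    """
    seen, stack = set(), [entry]
    lo = hi = entry
    while stack:
        off = stack.pop()
        if off in seen or off not in insns:  # conditions 4 and 3
            continue
        seen.add(off)
        lo, hi = min(lo, off), max(hi, off)
        mnem, length, target = insns[off]
        if mnem in ("ret", "retn"):          # condition 1
            continue
        if target is not None:
            stack.append(target)             # taken branch
        if mnem != "jmp":                    # conditional: also fall through
            stack.append(off + length)
    return lo, hi
```

An explicit stack replaces the recursion here, but the sub-path forking at
every conditional jump is the same.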

The advantage of this method is its accuracy. On the other hand, the major
drawback is its performance. Regular assembly binaries contain tons of
assembler functions with tons of sub-paths that we need to follow. Sure, the
overall number can be reduced by making some assumptions. But even then it
takes several minutes to analyze a 3 MByte text section on a recent CPU. This
leads us to the next paragraph.

|=-------------------------=[ 0x42 Linear method ]=---------------------------=|

Most asm functions start at their entry-point address and stop right
before the first instruction of the next method, or at some INT3, NOPs etc.
Sure, the compiler used could behave differently, but most implementations
behave this way because they are optimized for speed and it costs some CPU
cycles to do a back-jump. We can benefit from this fact by gathering as much
information as possible in a single run through the text section. The
following pseudo code illustrates this process:

| function;
| next_function;
| last_function;
| offset;
| instruction;
| while (instruction = assemble_next())
|     function->max_address = Math.max(offset, function->max_address)
|     boolean FIN = false;
|     if (instruction == JUMP)
|         new_offset = get_jump_offset(instruction)
|         if (new_offset > function->max_address && new_offset < next_function->min_address)
|             function->max_address = new_offset
|         if (new_offset < function->min_address && new_offset > last_function->max_address)
|             function->min_address = new_offset
|         // set terminator on back-jumps or on reaching the next function
|         if (new_offset < offset || new_offset >= next_function->min_address)
|             FIN = true;
|     if (instruction == "RET"/"RETN"/"INVALID")
|         FIN = true;
|     if (FIN && offset >= function->max_address)
|         function = next_function ...
|         skip_assemble_to(function->entry_point)

This way we only push the max_address of each function until one of the abort
criteria is reached. These are:

 1. RET / RETN reached
 2. Invalid instructions
 3. Unconditional back-jumps

The min_address is adjusted as well, but code that has not been analyzed yet
is not processed. So we can detect a single back-jump, but not a back-jump
where the reached code includes yet another back-jump.

|=-------------------------=[ 0x43 Combined method ]=-------------------------=|

The last problem remaining are gaps between function blocks where certain
double back-jumps were taken. These can easily be scanned for and corrected
by the recursive method introduced in chapter [0x41].

|=-------------------------=[ 0x50 Function matching ]=-----------------------=|

Now that we know where assembler functions begin and end, we can analyze
them. Our goal is to get a fingerprint of each function. Those fingerprints
shall be robust against compiler specific flags, optimizations, vendors etc.
Then all functions can be compared against each other in order to find
possible matches. So we also have to find a method to compare the function
fingerprints in terms of equality.

|=-------------------------=[ 0x51 ASM metrics ]=-----------------------------=|

Robust and comparable fingerprints can be extracted using a form of
information geometry. Each ASM function is measured multiple times, using a
set of metrics (N). This way we create an N-dimensional vector space where
each ASM function defines a point within this space. To reduce the number of
duplicates, we use several distinct metrics. Our current implementation has
~20 metrics. In order to work, a good metric has to match certain criteria:

 1. It counts some low-level logic of the assembled code. Good metrics count
 the usage of instructions that can hardly be replaced by other instructions,
 for example the square-root function "sqrt". Counting the resources a
 function uses is also a good idea: reference count, calls or stack size. The
 idea behind that is that almost any compiler is forced to build executables
 that perform well when running the binary code. So we can assume they
 will use a similar number of "sqrt" calls in whatever configuration.

 2. The metrics have to be distinct from each other in the sense that they do
 not count the same thing partially or twice. A counter-example are two
 metrics where one counts the number of ASM bytes, and the other the number
 of instructions. It is easy to see that there is a strong correlation
 between those metrics.

 3. It shall be robust against common compiler flags, optimizations or
 obfuscation effects. This is important so we are able to find the "same"
 function in a new binary even if it has changed a lot due to the usage of a
 different compiler configuration.

 4. If (3) is not met in some case, the metric shall add a similar penalty to
 all analyzed functions so that the loss of accuracy can be compensated by
 other metrics. For example, suppose we count the general-purpose register
 usage, and the gcc flag "-fomit-frame-pointer" is used in the new binary.
 Then the same penalty of -1 is added to each method. Note: Because EBP is
 now free to use, the penalty may also be 0.
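A toy extractor over a list of decoded mnemonics illustrates criterion (1).
The three metrics here are our own illustrative stand-ins; asmDIFF's actual
~20 metrics are not published.

```python
def metric_vector(mnemonics):
    """Toy fingerprint: count instructions a compiler can rarely
    substitute, plus cheap resource metrics."""
    return (
        mnemonics.count("sqrtss"),  # hard-to-replace FP square root
        mnemonics.count("call"),    # referenced functions / resources
        len(mnemonics),             # instruction count -- a weak metric,
                                    # correlated with byte count, see (2)
    )
```

Each function thus becomes one point in a 3-dimensional space; adding more
metrics simply adds dimensions.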

|=-------------------------=[ 0x52 Metric distance ]=-------------------------=|

To match functions against each other we can simply look for the closest match
in our n-dimensional vector space. This can be done in geometric fashion:

| metric_vector_a;
| metric_vector_b;
| delta = sqrt((b1-a1)^2 + (b2-a2)^2 + ... + (bn-an)^2)

The metric distance also gives a good tuning point for the application when we
are not able to find good matches. We simply multiply a priority vector onto
the metrics, so that the influence of "bad" metrics can be reduced or disabled
entirely. Each element of such a vector has to be in the interval [0.0, 1.0].
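A minimal sketch of this distance follows; that the priority factor scales
each dimension's difference before squaring is our assumption, not a detail
the text specifies.

```python
import math

def metric_distance(a, b, priority=None):
    """Euclidean distance between two metric vectors.  The optional
    priority vector (components in [0.0, 1.0]) damps or disables the
    influence of individual metrics."""
    if priority is None:
        priority = [1.0] * len(a)
    return math.sqrt(sum((p * (y - x)) ** 2
                         for p, x, y in zip(priority, a, b)))
```

Setting a priority component to 0.0 removes that metric from the comparison
entirely; 1.0 leaves it at full weight.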

|=-------------------------=[ 0x53 Metric strength ]=-------------------------=|

There will be logical blocks that differ dramatically in size. Functions can
consist of only a "RET" or be several kilobytes long. That is a problem, since
the described algorithm performs badly on short functions, because there is
little that can match.

For that reason we introduce the "metric strength" property of a logical block
or function. It is basically the length of its metric vector within the
euclidean n-dimensional vector space. It can be calculated easily by taking the
square root of the summed squared dimensions: SQRT(x^2 + y^2 + z^2 + ...)
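In code, the strength property is just the vector norm:

```python
import math

def metric_strength(vector):
    """Length of a block's metric vector: SQRT(x^2 + y^2 + z^2 + ...)."""
    return math.sqrt(sum(x * x for x in vector))
```

A bare "RET" function yields a near-zero vector and hence a near-zero
strength, flagging it as too weak to match directly.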

Matching pointers with a low strength property can be handled by examining the
call tree to that function. There is a good chance that a high strength block
is referencing / calling the low strength function. So it can be identified by
a reverse lookup: first matching the strong block, then finding the low
strength block by reference.

|=-------------------------=[ 0x54 Metric profile ]=--------------------------=|

In order to increase matching accuracy we can tune the metrics as noted in
[0x52]. This can be done by increasing the "impact" of certain metrics that are
known to be "good". This also reduces the remaining metric dimensions
accordingly. So a profile has to be used that balances metrics against each
other in a way that uncommon assembler properties are favored. This process
improves matching results dramatically.

A small example clarifies this process: We have two metrics, one counting
referenced functions and pointers, the other counting floating point division
operations. It is obvious that the first one will have a higher average
appearance. Let's say an average block has a 10 times higher chance of
referencing pointers than performing a division operation. An average profile
generation will then result in a penalty factor of 0.5 for the referencing
metric and an advantage factor of 5.0 for the division metric. So both metrics
will have the same average impact.
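The worked example can be reproduced as follows. The common target impact is a
free scaling factor; 5.0 is chosen here only to match the numbers above.

```python
def profile_weights(avg_counts, target):
    """Balance metrics so each has the same average impact: a metric
    whose average count is `avg` gets the weight target / avg."""
    return [target / avg for avg in avg_counts]
```

With average counts of 10 (pointer references) and 1 (divisions) and a target
impact of 5.0, this yields the factors 0.5 and 5.0 from the example.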

There are other ways of tuning metrics to optimize results. asmDIFF currently
uses an advanced version of the approach described here that flattens out
spikes in the average profile generation by using some math.

|=-------------------------=[ 0x55 Metric uniqueness ]=-----------------------=|

... to be written when I find the time.