asmDIFF | An x86 binary assembler search tool
asmDiff is an binary assembly search, diff and disassembly tool. It supports Windows PE (exe/dll) and Linux ELF binary format compiled for x86 and x68_64 architectures. It is particular useful when searching for asm functions, instructions or memory pointers in a patched, updated or otherwise modified binary. Try out the live demo and read the paper below.
Note: asmDIFF is still closed source as long as we do not know how to proceed with it. Feel free to contact us on questions, suggestions, or licensing request. However a fully useable win32 version is included in mmBBQ 3.0.0 and upwards.
Quick start:
- upload two binaries files. (exe/dll/elf)
- $> asmdiff search <ADDR> -o <OLDFILE> -n <NEWFILE>
- $> asmdiff diff -o /usr/bin/asmdiff -n /usr/bin/asmdiff
Hints:
- <ADDR> can have prefix '@' or '+': @0x12345678 (absolute) or +0x12345678 (relative to mod_base)
- If no prefix is given, absolute addressing is assumed
- The mode 'asmdiff file ...' can be used to parse and update complete C header files containing address definitions
|=--------------------------------=[ asmDIFF ]=-------------------------------=| |=----------------------------------------------------------------------------=| |=------------------=[ by defragger <rene@0xaa55.org> ]=----------------=| |=------------------=[ and willsteel <michael@willigens.de> ]=----------------=| |=----------------------------------------------------------------------------=| |=-----------------------------=[ Aug 04, 2011 ]=-----------------------------=| <---------------------------( 0x00 Table of Contents )-------------------------> - 0x10 Introduction - 0x20 PE file extraction - 0x30 Detecting logical blocks - 0x40 Finding function bounds - 0x41 Recursive method - 0x42 Linear method - 0x43 Combined method - 0x50 Function matching - 0x51 ASM metrics - 0x52 Metric distance - 0x53 Metric strength - 0x54 Metric profile - 0x55 Metric uniqueness - 0x56 Metric rank - 0x57 Metric score - 0x60 Data-Section - 0x61 Reverse lookup - 0x70 Diffing ASM binary code - 0x80 Use Cases - 0x81 Binary Search - 0x82 DLL Search - 0x83 Binary Diff - 0x84 Security Issues - 0x90 Next steps - 0x91 Branch instruction blocks - 0x92 Call hierarchy graph - 0xA0 <------------------------------------------------------------------------------> |=-------------------------=[ 0x10 Introduction ]=---------------------------=| The process of reverse engineering is an important step when hardening soft- and hardware systems. Because this method is rather expensive in a matter of development time, it can be very helpful to find differences or similarities between two assembly binaries of a different version. A very common example to reverse engineeres is the hated pointer update between binary patches. This is not a trivial task. Some of the most obvious problems are: address shift, obfuscation, register usage, compile flags, etc.. The approach of asmDIFF is to ignore the impact of those effects. This is done by using assembler code metrics that are robust against these effects. asmDIFF was started as part of a reverse engeenering project. The scientific nature of this approach made us work on this article. |=-------------------------=[ 0x20 PE file extraction ]=---------------------=| First part in building a reverse engineering tool is understanding the format of target binaries. The most important file-format for reversers is Microsoft's Portable Executable (also called PE). The second most interesting format is Unix/Linux ELF. For PE files any nessecary information can be found in MS's header file <winnt.h>. To reverse a given binary assembler file it is inevitable to read the various binary sections of the file. Those sections contain the executed code (.text sections) and initialized or uninitialized data (.data sections). There are platform specific differences but the basic idea behind it is the same. So basically what every executable file format does is to load the code/data to a certain address range. That can be fixed or random (ASLR). The different sections are loaded in a way so that they start at the beginning of a fresh memory page. The various access rights (execute, read, write) can be set for each section seperately. After that the import table (IAT) can be filled to match the current runtime environment. The first part in the PE file is the IMAGE_DOS_HEADER (winnt.h). The most important part here is the e_lfanew member of this struct. Its the offset to the IMAGE_NT_HEADER which contains the IMAGE_OPTIONAL_HEADER. The various sections are lying directly behind the IMAGE_NT_HEADERS: -- | void* first_section = IMAGE_DOS_HEADER->e_lfanew+sizeof(IMAGE_NT_HEADERS) -- That is the offset from the beginning of the opened file. The next section is sizeof(IMAGE_SECTION_HEADER) bytes behind the previous one. Section count is defined by IMAGE_NT_HEADER->FileHeader->NumberOfSections. The CPUs entry-point to a PE binary is: -- | IMAGE_OPTIONAL_HEADER->AddressOfEntryPoint+IMAGE_OPTIONAL_HEADER->ImageBase -- This section is also called the ".text" section of a PE file. Normally, the address of this section is: -- | IMAGE_OPTIONAL_HEADER->ImageBase+section->VirtualAddress -- The offset to the data of that section is: -- | section->PointerToRawData -- Finally the length of a section can obtained by: -- | section->Misc->VirtualSize. -- There are other sections that can be of interest as well, such as ".tls" or ".rdata" sections that contain static content. |=-------------------------=[ 0x30 Detecting logical blocks ]=----------------=| In order to show differences between arbitrary data one must be able to detect certain logical blocks from the nature of that data. In case of text files this is easy: newlines, paragraphs, pages. In the absence of logical blocks, one can only do things such as measuring the edit distance. When diffing machine level code it is obvisous to take the function calls and their boundaries as blocks. Other blocks can be defined as well but using the function bounds is a really good start. We are currently evaluating the usage of branch instructions as logical blocks. This is more complex but can help when functions are getting inlined. Also the call hierarchy which forms a graph like structure can be taken into account when comparing assembly blocks. A generic approach would be to use pattern recognition techniques to detect the logical blocks of the data automatically. This can be helpful when the nature of the analized data is completely unkown. |=------------------------=[ 0x40 Finding function bounds ]=------------------=| To find assembler function boundaries we first need to find the function entry-points. Those entry-points are typically referenced by a "CALL" instruction. Most of them are followed by the relative offset to the real entry-point address. There are three common ways of function calls: 1. The regular relative offset call instruction: CALL 0x0052F358 Note: The destination address is the relative offset from the next instruction. Meaning the length of the CALL instruction itself has to be taken into consideration. On 32Bit x86 assembler this is 5 bytes. This is where the mystic "len+=5" from various cadecaving tutorials came from. 2. The called address (absolute) is loaded as a static memory pointer into a register. After that a call is made on that register. Those calls may look like this: MOV EAX, [0x00B6EF2C] CALL EAX 3. The offset address is calculated during runtime in non-static fashion. This is most commonly used on callback methods or C++ classes, where a function pointer is attached to some kind of memory structure. MOV EAX, [ECX+0xD1] CALL EAX Its is hard to detect entry-points of dynamic function callbacks without running the programm itself. So we start by collecting the entry-points of the direct referenced functions. Following pseudo code illustrates this step: -- | while (instruction = assemble_next()) | | if (instruction.op == 0xE8) /* x86 CALL */ | add_entry_point(current_offset + current_cmd_len + instruction.arg1) | | if (instruction.asm == 'push' || instruction.asm == 'mov') | if (address_within_textsection(instruction.argX)) | add_entry_point(instruction.argX) -- There are other ways of asm function calls, but we make a break here to face the next problem of finding the function boundaries. At this time only the entry-points are known, but we need to know where a function starts, where it ends and where the next function starts. To make things more complicated, there are some functions where the entry-point is not the starting offset of a function. This happens when a function makes a back-jump in a region somewhere before the entry-point. |=-------------------------=[ 0x41 Recursive method ]=------------------------=| To find a functions boundaries we need to follow the path that the CPU will take through this function. As there are many conditional jumps we need to be able to follow each resulting sub-path through the function. That leads to the first basic approach by following each sub-path in recursive fashion until certain abort conditions are met. Following conditions suit our needs: 1. reaching an asm RET, RETN 2. crossing the previous or next asm function 3. reaching an invalid instruction 4. crossing another sub-path that has already been analyzed The advantage of this method is its accuracy. On the other hand, the major drawback is its performance. In regular assembly binaries there tons of assembler function with tons of sub-path that we need to follow. Sure the overall number can be reduced by making some assumprions. But even then it takes several minutes to analyze a 3Mbyte text section on a recent CPU. This leads us to the next paragraph. |=-------------------------=[ 0x42 Linear method ]=---------------------------=| Most of the asm functions start with their entry-point address and stop right before the first instruction of the next method, or at some INT3, NOPS etc. Sure, the used compiler could behave different, but most implementations behave this way because they are optimized for speed and it costs some CPU cycles to do a back jump. We can benefit of this fact by gathering as much information as possible in a single run through the text section. Following pseudo code illustrates this process: -- | function; | next_function; | last_function; | offset; | isntruction; | | while (instruction = assemble_next()) | function->max_address = Math.max(offset, function->max_address) | | boolean FIN = false; | | if (instruction == JUMP) | new_offset = get_jump_offset(instruction) | if (new_offset > function->max_address < next_function->min_address) | function->max_address = new_offset | if (new_offset < function->min_address > last_function->max_address) | function->min_address = new_offset | | // set terminator on negative jumps or reaching of next function | if (newoffset > offset || new_offest >= mext_function->min_address) | FIN = true; | | if (isntruction == "RET"/"RETN"/"INVALID") | FIN = true; | | if (FIN && off <= function->max_address) | function = next_function ... | skip_assemble_to(function->entry_point) -- This way we are only pushing the max_address of each function until the abort criteria is reached. As there are: 1. RET / RETX reched 2. Invalid instructions 3. Unconditional back-jumps The min_address is also pushed upwards, but resulting code that has not been analyzed yet is not processed. So we can detect a single back-jump but not a back-jump where the reached code also includes another back-jump. |=-------------------------=[ 0x43 Combined method ]=-------------------------=| The last thing that can happen now are gaps between function blocks where certain double back-jumps were taken. These can be easily scanned for and corrected by the recursive method introduced in prior in chapter [0x41]. |=-------------------------=[ 0x50 Function matching ]=-----------------------=| Now that we know where assembler functions begin and where they end we can analyze them. Our goal is to get a fingerprint of each function. Those fingerprints shall be robust agains compiler specific flags, optimizations, vendor etc.. Now, all functions can be compared against each other in order to find possible matches. So we also have to find a method to compare the function fingerprints in means of equalness. |=-------------------------=[ 0x51 ASM metrics ]=-----------------------------=| Robust and compareable fingerprints can be extracted using a form of information geometry. Each ASM function is measured multiple times, using a set of metics (N). This way we creat a N-dimensional vector space where each ASM functions defines a point within this space. To reduce the number of duplicates, we use serveral distinct metrics. Our current implementation has ~20 metrics. In order to work, a good metric has to match certiain criteria: 1. It counts some low-level logic of the assembled code. Good metrics count the usage of instructions that can hardly be replaced by other instructions. For example the square-root function "sqrt". Also counting the resources a function uses is a good idea: reference count, calls or stack_size. The idea behind that is that almost any compiler is forced to build executables that give a good performance when running the binary code. So we can assume they will use a similar number of "sqrt" calls in whatever configuration. 2. The metrics have to be distinct from each other in a way that they do not count the same thing partially or twice. A Counter-Example are two metrics where one counts the number of ASM bytes, and the other the number of instructions. It is easy to see that there exists some kind of correlation between those metrics. 3. It shall be robust against common compiler flags, optimizations or obfuscation effects. This is important so we are able to find the "same" function in a new binary even if it has changed a lot due to the usage of a different compiler configuration. 4. If (3) is not met in some case, the metric shall add a similar penalty to all analyzed functions so that the loss of accuracy can be compensated by other metrics. For example, if we count the general-purpose register usage, and the gcc flag "-fomit-fame-pointer" is used in the new binary. We add the same penalty to each method that is "-1". Note: Because EBP is now free to use the penalty may also be "0". |=-------------------------=[ 0x52 Metric distance ]=-------------------------=| To match functions agains each other we can simply look for the closest match in our n-dimensional vektor space. This can be done in geometric fashion: -- | metric_vevtor_a; | metric_fector_b; | delta = sqrt((b1-a1) + (b2-a2) + ... + (bn-an)) -- The metric distance also gives a good tuning point for the application when we are not able to find good matches. We simply multiply a priority vector in front of each metric, so that the influence of "bad" metrics can be reduced or disabled at all. Each element in such a vector has to be in the interval of [0.0 and 1.0]. |=-------------------------=[ 0x53 Metric strength ]=-------------------------=| There will be logical blocks that differ differ in size dramatically. Functions can consist of only a "RET" or be several kilobytes long). Thas a problem since our described algorithm will perform badly on short functions, because theres little or less can match. For that reason we introduce the "metric strength" property of a logicel block or function. Its basically the length of its metric vector within its euclidean n-dimensional vetctor space. It can be calculated easily but taking the square square root of the summed quadratic dimensions: SQRT(x^2 + y^2 + z^2 + ...) To match pointers with a low strength property can be handled by examining the call tree to that function. Theres a good chance that a high strength block is referencing / calling this low strength function. So it can be identified by a reverse lookup. First matching the strong block and then looking for low strength block by reference. |=-------------------------=[ 0x54 Metric profile ]=--------------------------=| In order to increase matching accuracy we can tune the metrics as noted in [0x52]. This can be done by incresing the "impact" for certain metrics that are known to be "good". This also reduces the remaing metric dimensions accordingly. So a profile has to be used that balances metrics against each others in a way that uncommon assembler properties will be favored. This process increases matching results dramatically; A small exmaples clearifies this process: We have two metrics. One counting referenced functions and pointer. The other one counting floating point division operations. Its obvious to see that the fisrt one will have a higher average appereance. Lets say for example an average block has 10 times higher chance of referencing pointers than performing an division operation. An average profile generation will result in a penalty of factor 0.5 for the referencing metric and an advantage of factor 5.0 for the divisional metric. So both metric will have the same average impact. There are other ways of tuning metrics to optimize results. asmDIFF currently uses an advanced version of the approch described here that flattens out spikes in average profile generation by using some math. |=-------------------------=[ 0x55 Metric uniqueness ]=-----------------------=| ... to be written when i find the time.