| Courgette Internals |
| =================== |
| |
| Patch Generation |
| ---------------- |
| |
|  |
| |
| - courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch |
| generation by calling ensemble\_create.cc:GenerateEnsemblePatch |
| |
| - The files are read in by courgette:SourceStream objects. |
| |
| - ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which |
| uses MakeGenerator to create |
| patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 classes. |
| |
| - PatchGeneratorX86\_32's Transform method transforms the input file |
| using Courgette's core techniques that make the bsdiff delta |
| smaller. The steps it takes are the following: |
| |
| - _disassemble_ the old and new binaries into AssemblyProgram |
| objects, |
| |
| - _adjust_ the new AssemblyProgram object, and |
| |
| - _encode_ the AssemblyProgram object back into raw bytes. |
| |
| ### Disassemble |
| |
| - The input is a pointer to a buffer containing the raw bytes of the |
| input file. |
| |
| - Disassembly converts certain machine instructions that reference |
| addresses to Courgette instructions. It is not actually |
| disassembly, but this is the term the code-base uses. Specifically, |
| it detects instructions that use absolute addresses given by the |
| binary file's relocation table, and relative addresses used in |
| relative branches. |
| |
| - Done by disassemble:ParseDetectedExecutable, which selects the |
| appropriate Disassembler subclass by looking at the binary file's |
| headers. |
| |
| - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler |
| |
| - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler |
| |
| - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler |
| |
| - The Disassembler replaces the relocation table with a Courgette |
| instruction that can regenerate the relocation table. |
| |
| - The Disassembler builds a list of addresses referenced by the |
| machine code, numbering each one. |
| |
| - The Disassembler replaces each address used in machine instructions |
| with its index number. |
| |
| - The output is an assembly\_program.h:AssemblyProgram class, which |
| contains a list of instructions, machine or Courgette, and a mapping |
| of indices to actual addresses. |
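
The numbering and replacement described above can be illustrated with a minimal C++ sketch. It is not Courgette's code: `ToyProgram` and `IndexAddresses` are hypothetical names, and a program is reduced to just the list of addresses that its instructions reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for an AssemblyProgram: referenced
// addresses are moved into a table and each reference becomes an index.
struct ToyProgram {
  std::vector<std::size_t> refs;         // one index per reference, in program order
  std::vector<std::uint32_t> addresses;  // index -> actual address
};

// Number each distinct address in order of first appearance and replace
// every reference with its index.
ToyProgram IndexAddresses(const std::vector<std::uint32_t>& targets) {
  ToyProgram program;
  std::map<std::uint32_t, std::size_t> index_of;  // address -> assigned index
  for (std::uint32_t address : targets) {
    auto it = index_of.find(address);
    if (it == index_of.end()) {
      // First time this address is seen: give it the next free index.
      it = index_of.emplace(address, program.addresses.size()).first;
      program.addresses.push_back(address);
    }
    program.refs.push_back(it->second);
  }
  return program;
}
```

For the referenced addresses `0x401000, 0x402000, 0x401000`, this sketch yields the index sequence `0, 1, 0` and a two-entry address table.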
| |
| ### Adjust |
| |
| - This step takes the AssemblyProgram objects for the old and new files |
| and reassigns the indices in the new program that map to actual |
| addresses. It is performed by adjustment_method.cc:Adjust(). |
| |
| - The goal is to match the indices from the old program to the new |
| program as closely as possible. |
| |
| - When matched correctly, machine instructions that jump to the same |
| function in both the new and old binary will look the same to |
| bsdiff, even if the function is located in a different part of the |
| binary. |
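
To see why matching indices helps, here is a toy C++ sketch of an adjustment pass over two index streams. The positional heuristic and the name `AdjustIndices` are assumptions made for illustration only; the matching performed by adjustment_method.cc is not described in this document and is likely more involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Renumber the indices used in new_refs so that they agree with old_refs
// wherever a simple positional match allows it. Toy heuristic only.
std::vector<std::size_t> AdjustIndices(const std::vector<std::size_t>& old_refs,
                                       const std::vector<std::size_t>& new_refs) {
  std::map<std::size_t, std::size_t> renumber;  // new index -> adjusted index
  std::set<std::size_t> taken;                  // adjusted indices already in use

  // Pass 1: pair up indices that occur at the same position in both programs.
  const std::size_t common = std::min(old_refs.size(), new_refs.size());
  for (std::size_t i = 0; i < common; ++i) {
    if (renumber.count(new_refs[i]) == 0 && taken.count(old_refs[i]) == 0) {
      renumber[new_refs[i]] = old_refs[i];
      taken.insert(old_refs[i]);
    }
  }

  // Pass 2: give every still-unmatched index the smallest unused number.
  std::size_t next_free = 0;
  std::vector<std::size_t> adjusted;
  adjusted.reserve(new_refs.size());
  for (std::size_t index : new_refs) {
    auto it = renumber.find(index);
    if (it == renumber.end()) {
      while (taken.count(next_free) != 0) ++next_free;
      it = renumber.emplace(index, next_free).first;
      taken.insert(next_free);
    }
    adjusted.push_back(it->second);
  }
  return adjusted;
}
```

With old references `0, 1, 0, 2` and new references `2, 0, 2, 1`, the adjusted sequence is `0, 1, 0, 2`, identical to the old one, so a byte-level diff of the two streams has nothing to record. A real adjustment would also have to permute the index-to-address table to keep it consistent with the renumbering.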
| |
| ### Encode |
| |
| - This step takes an AssemblyProgram object and encodes both the |
| instructions and the mapping of indices to addresses as byte |
| vectors. This format can be written to a file directly, and is also |
| more appropriate for bsdiffing. It is done by |
| AssemblyProgram.Encode(). |
| |
| - encoded_program.h:EncodedProgram defines the binary format and a |
| WriteTo method that writes to a file. |
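
As a rough illustration of encoding into byte vectors, the sketch below serializes a toy program into two separate vectors, one for the index stream and one for the address table. The fixed-width little-endian layout and the names `ToyEncoded`, `AppendUint32LE`, and `EncodeToy` are assumptions; the actual format is the one defined in encoded_program.h.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for the two serialized streams.
struct ToyEncoded {
  std::vector<std::uint8_t> instruction_bytes;  // serialized index stream
  std::vector<std::uint8_t> address_bytes;      // serialized index -> address table
};

// Append a 32-bit value as four bytes, least significant byte first.
void AppendUint32LE(std::uint32_t value, std::vector<std::uint8_t>* out) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
  }
}

// Serialize the index stream and the address table into separate byte
// vectors, so that each can be stored or diffed on its own.
ToyEncoded EncodeToy(const std::vector<std::uint32_t>& refs,
                     const std::vector<std::uint32_t>& addresses) {
  ToyEncoded encoded;
  for (std::uint32_t index : refs) {
    AppendUint32LE(index, &encoded.instruction_bytes);
  }
  for (std::uint32_t address : addresses) {
    AppendUint32LE(address, &encoded.address_bytes);
  }
  return encoded;
}
```

Keeping the address table in its own vector means that a function moving to a new address changes only that vector, while the index stream stays byte-for-byte the same.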
| |
| ### bsdiff |
| |
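| - The encoded old and new programs are then diffed by |
| simple_delta.cc:GenerateSimpleDelta, Courgette's built-in |
| implementation of bsdiff. |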
| |
| Patch Application |
| ----------------- |
| |
|  |
| |
| - courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application |
| by calling ensemble\_apply.cc:ApplyEnsemblePatch |
| |
| - ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the |
| patch's header, then calls the overloaded version of |
| ensemble\_apply.cc:ApplyEnsemblePatch. |
| |
| - The patch is read into an ensemble_apply.cc:EnsemblePatchApplication |
| object, which generates a set of patcher_x86_32.h:PatcherX86_32 |
| objects for the sections in the patch. |
| |
| - The original file is disassembled and encoded via a call to |
| EnsemblePatchApplication.TransformUp, which in turn calls |
| patcher_x86_32.h:PatcherX86_32.Transform. |
| |
| - The transformed file is then bspatched via |
| EnsemblePatchApplication.SubpatchTransformedElements, which calls |
| EnsemblePatchApplication.SubpatchStreamSets, which calls |
| simple_delta.cc:ApplySimpleDelta, Courgette's built-in |
| implementation of bspatch. |
| |
| - Finally, EnsemblePatchApplication.TransformDown assembles the |
| patched binary data, i.e., reverses the encoding and disassembly. |
| This is done by calling PatcherX86_32.Reform, which in turn calls |
| the global function encoded_program.cc:Assemble, which calls |
| EncodedProgram.AssembleTo. |
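
The reversal performed during assembly can be pictured as the inverse of the index substitution done at disassembly. A minimal sketch, assuming a toy representation made of a list of indices plus an index-to-address table; `AssembleToy` is a hypothetical name and not part of Courgette:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace every index with the address it stands for, recovering the
// original sequence of referenced addresses.
std::vector<std::uint32_t> AssembleToy(const std::vector<std::size_t>& refs,
                                       const std::vector<std::uint32_t>& addresses) {
  std::vector<std::uint32_t> targets;
  targets.reserve(refs.size());
  for (std::size_t index : refs) {
    // at() throws std::out_of_range if a corrupt patch yields a bad index.
    targets.push_back(addresses.at(index));
  }
  return targets;
}
```

Given the indices `0, 1, 0` and the table `0x401000, 0x402000`, the sketch returns `0x401000, 0x402000, 0x401000`.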
| |
| |
| Glossary |
| -------- |
| |
| **Adjust**: Reassign address indices in the new program to match more |
| closely those from the old. |
| |
| **Assembly program**: The output of _disassembly_. Contains a list of |
| _Courgette instructions_ and an index of branch target addresses. |
| |
| **Assemble**: Convert an _assembly program_ back into an object file |
| by evaluating the _Courgette instructions_ and leaving the machine |
| instructions in place. |
| |
| **Courgette instruction**: Replaces machine instructions in the |
| program. Courgette instructions replace branches with an index to |
| the target addresses and replace part of the relocation table. |
| |
| **Disassembler**: Takes a binary file and produces an _assembly |
| program_. |
| |
| **Encode**: Convert an _assembly program_ into an _encoded program_ by |
| serializing its data structures into byte vectors more appropriate |
| for storage in a file. |
| |
| **Encoded Program**: The output of encoding. |
| |
| **Ensemble**: A Courgette-style patch containing sections for the list |
| of branch addresses and the encoded program. It supports patching |
| multiple object files at once. |
| |
| **Opcode**: The number corresponding to either a machine or _Courgette |
| instruction_. |