11 - Mach-O Modification
-------------------------

This tutorial deals with Mach-O format modification and introduces some internal aspects of the format.

Files and scripts used in this tutorial are available on the tutorials repository.

By Romain Thomas - @rh0main

------

Introduction
~~~~~~~~~~~~

A basic Mach-O binary (i.e. not FAT) can be represented in four parts, as shown in this diagram:

.. figure:: ../_static/tutorial/11/image1.png
  :align: center

The first part is the header, which can be accessed through the :attr:`lief.MachO.Binary.header` attribute. The second part is the load command table, which can be iterated using :attr:`lief.MachO.Binary.load_commands`. It is optionally followed by padding or free space. Finally, we have the raw data (assembly code, rebase bytecode, signature, ...).

Load commands like :class:`~lief.MachO.SegmentCommand` or :class:`~lief.MachO.DyldInfo` can be associated with *raw data* located after the load command table and the padding section.

The padding section is used by OSX to sign the binary after compilation by adding a custom command. The ``codesign`` utility extends the *raw data* area with the signature and adds a ``LC_CODE_SIGNATURE`` or a ``LC_DYLIB_CODE_SIGN_DRS`` command in the padding area.

Since load commands are the base unit of the Mach-O format -- segments, shared libraries, entry point, etc. are all *commands* of some kind -- being able to add arbitrary commands to a binary enables interesting use cases such as code injection, anti-analysis, ...

Different techniques exist to add a new command to a Mach-O binary:

* One can replace an existing load command that is not mandatory for the execution, like :class:`~lief.MachO.UUIDCommand` or :class:`~lief.MachO.CodeSignature`.
* One can use the padding area to add the new command.

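How much room these techniques have can be estimated from the header alone. The following is a minimal sketch in pure Python rather than LIEF code: the ``padding_size`` helper is hypothetical, the header is synthetic, and the offset of the first section content is assumed to be supplied by the caller.

.. code-block:: python

  import struct

  MH_MAGIC_64 = 0xFEEDFACF
  # mach_header_64: magic, cputype, cpusubtype, filetype,
  #                 ncmds, sizeofcmds, flags, reserved
  HEADER_64 = struct.Struct("<8I")

  def padding_size(blob, first_section_offset):
      """Free bytes between the load command table and the raw data."""
      magic, _, _, _, _ncmds, sizeofcmds, _, _ = HEADER_64.unpack_from(blob, 0)
      if magic != MH_MAGIC_64:
          raise ValueError("not a little-endian 64-bit Mach-O header")
      end_of_commands = HEADER_64.size + sizeofcmds
      return first_section_offset - end_of_commands

  # Synthetic x86-64 executable header: 16 commands using 1296 bytes.
  header = HEADER_64.pack(MH_MAGIC_64, 0x01000007, 3, 2, 16, 1296, 0, 0)

  # Assumption: the first section content starts at file offset 0x1000.
  free = padding_size(header, 0x1000)
  print(free)  # 4096 - 32 - 1296 = 2768

Any command added in place must fit in that many bytes, and ``codesign`` needs part of the same space.
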
The main limitation of these techniques is that the size and the number of commands that can be added are tied to the size of the padding section or to the size of the replaced command.

If the padding is tiny, we can't add a ``LOAD_DYLIB`` command with a *very long* library path. Moreover, ``codesign`` may complain that there is not enough space to add the ``LC_CODE_SIGNATURE`` command, since we are using the space that was reserved for it.

The next parts are about format modifications and how we managed to address this limitation.

When PIE makes things easier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OSX and iOS executables are by default compiled with flags that make them position independent. Instructions generated by the compiler use relative addressing associated with *rebase* information.

To simplify (this is not strictly accurate), PIE allows the loader to *map* the raw data section at a random base address.

The idea implemented in LIEF is that, since the raw data section can be mapped at a random base address, it can also be shifted within the file.

.. figure:: ../_static/tutorial/11/image2.png
  :align: center

Such a transformation also requires keeping the format metadata consistent. In particular, when we shift the raw data we need to update relocations, segment offsets, virtual addresses, etc.

Once the raw data is shifted and the metadata updated, we have an arbitrary amount of space between the load command table and the raw data section. Thus we can extend the load command table as shown in the figure below:

.. figure:: ../_static/tutorial/11/image3.png
  :align: center

.. warning::

  The size of the shift must be aligned on a page size to avoid issues with section and segment alignments.

Keeping the format consistent after the shift transformation is not easy. The next part presents some parts of the Mach-O format that need to be updated in order to keep this consistency.

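The alignment constraint from the warning above can be sketched in a few lines. This is an illustration only: the ``align_up`` helper, the 4 KiB page size (arm64 targets use 16 KiB pages) and the toy offsets are assumptions made for the example, not LIEF's API.

.. code-block:: python

  PAGE_SIZE = 0x1000  # assumption: 4 KiB pages (x86-64)

  def align_up(value, alignment=PAGE_SIZE):
      """Round value up to the next multiple of alignment."""
      return (value + alignment - 1) // alignment * alignment

  # Room requested for one hypothetical 56-byte load command.
  shift = align_up(56)
  print(hex(shift))  # 0x1000

  # Toy metadata: file offsets that point into the raw data move by the shift.
  offsets = {"symbol_offset": 0x3000, "strings_offset": 0x3400}
  shifted = {name: value + shift for name, value in offsets.items()}
  print({name: hex(value) for name, value in shifted.items()})

Even a small command therefore costs a full page, which is why the real list of fields to patch matters more than the size of the shift itself.
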
When Mach-O makes things harder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the shift operation, we need to update several load commands of the Mach-O format:

* :attr:`lief.MachO.SymbolCommand.symbol_offset` / :attr:`lief.MachO.SymbolCommand.strings_offset`
* :attr:`lief.MachO.DataInCode.data_offset`, :attr:`lief.MachO.CodeSignature.data_offset`, :attr:`lief.MachO.SegmentSplitInfo.data_offset`
* :attr:`lief.MachO.MainCommand.entrypoint`
* :attr:`lief.MachO.FunctionStarts.data_offset` / :attr:`lief.MachO.FunctionStarts.functions`
* :class:`~lief.MachO.DynamicSymbolCommand`
* :attr:`lief.MachO.Section.offset` / :attr:`lief.MachO.Section.virtual_address`
* :attr:`lief.MachO.SegmentCommand.offset` / :attr:`lief.MachO.SegmentCommand.virtual_address`
* ...

We also need to update:

* Relocations
* Binding information
* Export information

Whereas the ELF and PE formats use some kind of ``struct`` for internal storage of relocations and exports, the Mach-O format uses a bytecode to *rebase* the binary. Export information is stored in a trie data structure.

The use of a trie and of bytecode reduces the binary size, but it makes the update more difficult as we need to interpret and regenerate the bytecode.

Rebase bytecode
***************

As mentioned in the previous part, recent versions of the Mach-O loader use a bytecode to relocate (or rebase) the binary. The offset and size of the bytecode are given by the :attr:`lief.MachO.DyldInfo.rebase` attribute.

Basically, the bytecode is composed of :class:`~lief.MachO.REBASE_OPCODES` that define the addresses to relocate.

.. warning::

  One can notice that the :class:`~lief.MachO.Section` object has a :attr:`~lief.MachO.Section.relocation_offset` attribute. It seems to be used only for Mach-O object files (:attr:`lief.MachO.FILE_TYPES.OBJECT`) or for executables that use an old version of the Mach-O loader.

  This offset points to a list of relocation structures (not bytecode) whose count is given by :attr:`~lief.MachO.Section.numberof_relocations`.

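To give an idea of what interpreting the rebase bytecode involves, here is a minimal sketch in pure Python. It is not LIEF's implementation: it handles only a handful of opcodes, the function names are hypothetical, and the opcode values are the ``REBASE_OPCODE_*`` constants from Apple's ``mach-o/loader.h``.

.. code-block:: python

  OPCODE_MASK = 0xF0
  IMMEDIATE_MASK = 0x0F

  DONE = 0x00
  SET_TYPE_IMM = 0x10
  SET_SEGMENT_AND_OFFSET_ULEB = 0x20
  DO_REBASE_IMM_TIMES = 0x50
  DO_REBASE_ULEB_TIMES = 0x60

  def read_uleb128(data, pos):
      """Decode an unsigned LEB128 value; return (value, next position)."""
      result = 0
      shift = 0
      while True:
          byte = data[pos]
          pos += 1
          result |= (byte & 0x7F) << shift
          if byte & 0x80 == 0:
              return result, pos
          shift += 7

  def interpret_rebases(bytecode, pointer_size=8):
      """Return the (segment index, segment offset) pairs to rebase."""
      rebases = []
      segment = offset = pos = 0
      while pos < len(bytecode):
          opcode = bytecode[pos] & OPCODE_MASK
          immediate = bytecode[pos] & IMMEDIATE_MASK
          pos += 1
          if opcode == DONE:
              break
          elif opcode == SET_TYPE_IMM:
              pass  # the rebase type (e.g. POINTER) is ignored in this sketch
          elif opcode == SET_SEGMENT_AND_OFFSET_ULEB:
              segment = immediate
              offset, pos = read_uleb128(bytecode, pos)
          elif opcode in (DO_REBASE_IMM_TIMES, DO_REBASE_ULEB_TIMES):
              count = immediate
              if opcode == DO_REBASE_ULEB_TIMES:
                  count, pos = read_uleb128(bytecode, pos)
              for _ in range(count):
                  rebases.append((segment, offset))
                  offset += pointer_size
          else:
              raise NotImplementedError(hex(opcode))
      return rebases

  # Hand-written bytecode: POINTER type, segment 2 at offset 0x20, 26 rebases.
  sample = bytes([0x11, 0x22, 0x20, 0x60, 0x1A, 0x00])
  rebases = interpret_rebases(sample)
  print(len(rebases), hex(rebases[0][1]), hex(rebases[-1][1]))  # 26 0x20 0xe8
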
To know which addresses need to be relocated, we have to interpret the bytecode. The :attr:`lief.MachO.DyldInfo.show_rebases_opcodes` attribute returns the bytecode as *pseudo code*:

.. code-block:: python

  import lief

  app = lief.parse("MachO64_x86-64_binary_id.bin")
  print(app.dyld_info.show_rebases_opcodes)

.. code-block:: text

  [SET_TYPE_IMM]
      Type: POINTER
  [SET_SEGMENT_AND_OFFSET_ULEB]
      Segment Index := 2 (__DATA)
      Segment Offset := 0x20
  [DO_REBASE_ULEB_TIMES]
      for i in range(26):
          rebase(POINTER, __DATA, 0x20)
          Segment Offset += 0x8 (0x28)

          rebase(POINTER, __DATA, 0x28)
          Segment Offset += 0x8 (0x30)

          rebase(POINTER, __DATA, 0x30)
          Segment Offset += 0x8 (0x38)

          rebase(POINTER, __DATA, 0x38)
          Segment Offset += 0x8 (0x40)

          rebase(POINTER, __DATA, 0x40)
          Segment Offset += 0x8 (0x48)
          ...
  [DONE]

From the above output, we can see that the loader will rebase **pointers** in the ``__DATA`` segment at offsets ``0x20, 0x28, 0x30, ...``.

For those who only care about which exact addresses are relocated, this output is not very user-friendly.

LIEF also provides a **representation** of this bytecode by creating :class:`lief.MachO.Relocation` objects. They are the result of the **interpretation** of the bytecode.

The :attr:`lief.MachO.Binary.relocations` attribute returns an iterator over :class:`lief.MachO.Relocation` objects, which **model** relocations in a way similar to :class:`lief.ELF.Relocation` and :class:`lief.PE.Relocation`.

.. code-block:: python

  for relocation in app.relocations:
      print(relocation)

.. code-block:: text

  100002020 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _err
  100002028 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _errx
  100002030 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _exit
  100002038 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _fprintf
  100002040 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _free
  100002048 POINTER 64 DYLDINFO __DATA.__la_symbol_ptr _fwrite
  ...

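The addresses in this listing are consistent with the segment offsets of the pseudo code added to the start of ``__DATA``. A small sketch of that arithmetic, assuming ``__DATA`` starts at ``0x100002000`` (a value inferred from the two listings, not read from the binary) and using a hypothetical helper name:

.. code-block:: python

  # Assumption: base address of __DATA, inferred as 0x100002020 - 0x20.
  DATA_BASE = 0x100002000
  POINTER_SIZE = 8

  def rebase_addresses(segment_base, first_offset, count, stride=POINTER_SIZE):
      """Absolute addresses touched by `count` consecutive pointer rebases."""
      return [segment_base + first_offset + i * stride for i in range(count)]

  addresses = rebase_addresses(DATA_BASE, 0x20, 26)
  print([hex(a) for a in addresses[:3]])
  # ['0x100002020', '0x100002028', '0x100002030']
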
Using this representation, we can update relocations by adding the shift size to the :attr:`lief.MachO.Relocation.address` attribute. When the Mach-O builder reconstructs the final binary, it **regenerates** and optimizes the rebase bytecode according to the current state of the relocations.

The process can be summed up with the following diagram:

.. figure:: ../_static/tutorial/11/lief_bytecode.png
  :align: center

Binding bytecode
****************

The Mach-O loader also uses a bytecode to bind imported functions or imported symbols. This bytecode is used in three different binding methods:

* Normal binding
* Weak binding -- Used when the same symbol is defined multiple times
* Lazy binding -- Bound only when there is an access to the symbol

The bytecode can be pretty printed with the :attr:`~lief.MachO.DyldInfo.show_bind_opcodes`, :attr:`~lief.MachO.DyldInfo.show_weak_bind_opcodes` and :attr:`~lief.MachO.DyldInfo.show_lazy_bind_opcodes` attributes:

.. code-block:: python

  print(app.dyld_info.show_bind_opcodes)

.. code-block:: text

  [SET_DYLIB_ORDINAL_IMM]
      Library Ordinal := 1
  [SET_SYMBOL_TRAILING_FLAGS_IMM]
      Symbol name := ___stderrp
      Is Weak ? false
  [SET_TYPE_IMM]
      Type := POINTER
  [SET_SEGMENT_AND_OFFSET_ULEB]
      Segment := __DATA
      Segment Offset := 0x10
  [DO_BIND]
      bind(POINTER, __DATA, 0x10, ___stderrp, library_ordinal=/usr/lib/libSystem.B.dylib, addend=0, is_weak_import=false)
      Segment Offset += 0x8 (0x18)

The representation and the update process are the same as those described in the *Rebase bytecode* section.

Export Trie
***********

For exported functions and exported symbols, the Mach-O format uses a *trie* structure to store export information. The trie offset and size are given by the :attr:`~lief.MachO.DyldInfo.export_trie` attribute. Once parsed, trie entries are represented by :class:`~lief.MachO.ExportInfo` objects and can be retrieved with the :attr:`~lief.MachO.Symbol.export_info` attribute.

.. code-block:: python

  app = lief.parse("FAT_MachO_x86_x86-64_library_libdyld.dylib")
  print(app.dyld_info.show_export_trie)

.. code-block:: text

  ...
  _@off.0x17
  _N@off.0x21
  _NS@off.0x50
  _NSI@off.0x5d
  _NSInstallLinkEditErrorHandlers@off.0x11d
  _NSInstallLinkEditErrorHandlers{addr: 0x126b, flags: 0}
  ...

.. code-block:: python

  for s in app.symbols:
      if s.has_export_info:
          print(s.export_info)

.. code-block:: text

  Node Offset: 128
  Flags: 0
  Address: 126b
  Symbol: _NSInstallLinkEditErrorHandlers

  Node Offset: 5f6
  Flags: 0
  Address: 2168
  Symbol: _NSIsSymbolDefinedInObjectFileImage

  Node Offset: 1a0
  Flags: 0
  Address: 1391
  Symbol: _NSIsSymbolNameDefined
  ...

After the shift operation, export information is patched by updating the :attr:`~lief.MachO.ExportInfo.address` attribute, then a new export trie is generated from these updates.

Removing signature
~~~~~~~~~~~~~~~~~~

Removing the ``LC_CODE_SIGNATURE`` command is a basic modification that is quite useful when modifying a Mach-O file. Since the signature checks the integrity of the binary, we usually need to remove this command after modifying the file. We can still re-sign the binary once all modifications are finished.

LIEF provides the :meth:`lief.MachO.Binary.remove_signature` function to remove this command:

.. code-block:: python

  import lief

  ssh = lief.parse("/usr/bin/ssh")
  ssh.remove_signature()
  ssh.write("ssh.nosigned")

Code Injection with shared libraries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As explained in the talk about format modification [1]_, one way to inject code within the memory space of a program is to force the loader to load a library (that was not previously linked) that contains a constructor function.

For a Mach-O binary, this can be achieved by adding one of these load commands:

* :attr:`~lief.MachO.LOAD_COMMAND_TYPES.ID_DYLIB`
* :attr:`~lief.MachO.LOAD_COMMAND_TYPES.LOAD_DYLIB`
* ...

Let's take an example with ``clang``. First, we need to create a tiny library which defines a constructor:

.. code-block:: cpp

  #include <stdio.h>

  /* Runs when the library is loaded, before the main function of the host */
  __attribute__((constructor))
  void my_constructor(void) {
    printf("Hello World\n");
  }

It is compiled with:

.. code-block:: console

  $ clang -fPIC -shared libexample.c -o libexample.dylib

Then we add a new :attr:`~lief.MachO.LOAD_COMMAND_TYPES.LOAD_DYLIB` command using the :meth:`lief.MachO.Binary.add_library` function:

.. code-block:: python

  import lief

  clang = lief.parse("/usr/bin/clang")
  clang.add_library("/Users/romain/libexample.dylib")
  clang.write("/tmp/clang.new")

Finally, we run ``clang.new`` and see that ``Hello World`` is printed before the main execution of clang:

.. code-block:: console

  $ chmod u+x /tmp/clang.new
  $ /tmp/clang.new
  Hello World
  clang: error: no input files

We can also observe the new :attr:`~lief.MachO.LOAD_COMMAND_TYPES.LOAD_DYLIB` command with ``otool``:

.. code-block:: console

  $ otool -l /tmp/clang.new|grep -C4 LOAD_DYLIB
  ...
  cmdsize 16
  dataoff 73864
  datasize 0
  Load command 16
  cmd LC_LOAD_DYLIB
  cmdsize 56
  name /Users/romain/libexample.dylib (offset 24)
  time stamp 2 Thu Jan 1 01:00:02 1970
  current version 0.0.0

Adding Section/Segment
~~~~~~~~~~~~~~~~~~~~~~

As we can allocate arbitrary space between the load command table and the raw data, we can also extend an existing :class:`~lief.MachO.LoadCommand`. In particular, Mach-O segments are commands, represented in LIEF by the :class:`lief.MachO.SegmentCommand` object.

To add a new section in the ``__TEXT`` segment, we must extend the load command associated with this segment so that we can add a new section structure. We must also reserve space for the content of the section.

As the content of the ``__TEXT`` segment begins at offset 0 and finishes somewhere in the raw data, the right place to insert the new content is between the end of the load command table and the beginning of the raw data:

.. figure:: ../_static/tutorial/11/extendtxt.png
  :align: center

The process described above is implemented through the :meth:`lief.MachO.Binary.add_section` method.

Here is an example in which we inject assembly code that executes ``/bin/sh``:

.. code-block:: python

  import lief

  app = lief.parse("MachO64_x86-64_binary_id.bin")

  raw_shell = [...]  # Assembly code

  # New section holding the code, flagged as containing instructions
  section = lief.MachO.Section("__shell", raw_shell)
  section.alignment = 2
  section += lief.MachO.SECTION_FLAGS.SOME_INSTRUCTIONS
  section += lief.MachO.SECTION_FLAGS.PURE_INSTRUCTIONS

  section = app.add_section(section)
  print(section)

Then we can change the entry point by setting the :attr:`lief.MachO.MainCommand.entrypoint` attribute, which is expressed relative to the start of the ``__TEXT`` segment:

.. code-block:: python

  __TEXT = app.get_segment("__TEXT")
  app.main_command.entrypoint = section.virtual_address - __TEXT.virtual_address

Finally, we remove the signature and reconstruct the binary:

.. code-block:: python

  app.remove_signature()
  app.write("./id.modified")

Running ``id.modified`` should give an output similar to:

.. code-block:: console

  Mac-mini:tmp romain$ ./id.modified
  tmp @ [romain] $

You can also check other tools such as optool [2]_ or insert_dylib [3]_.

.. rubric:: References

.. [1] https://www.romainthomas.fr/publication/static-instrumentation/
.. [2] https://github.com/alexzielenski/optool
.. [3] https://github.com/Tyilo/insert_dylib

.. rubric:: API

* :meth:`lief.MachO.Binary.add_section`
* :meth:`lief.MachO.Binary.add_library`