10 - Android formats

This tutorial introduces the Android formats DEX, OAT, VDEX and ART, as well as the LIEF API used to work with them.

Files used in this tutorial are available on the tutorials repository.

By Romain Thomas - @rh0main


Introduction

Let’s start with a quick reminder about the compilation, installation and execution of Android applications.

When developing an application, most of the code is usually written in Java. Developers can also write native code (C/C++) through the Java Native Interface (JNI).

In the APK build process, the Java code is eventually transformed into Dalvik bytecode, which is executed by Android’s Java virtual machine. This virtual machine differs from Oracle’s implementation; among other differences, it is register-based whereas Oracle’s JVM is stack-based.

To produce the Dalvik bytecode, Java sources are first compiled with javac into Java bytecode, then Android transforms this bytecode into Dalvik bytecode with the dx compiler (or its successor, D8). This bytecode is finally packaged in one or more DEX files such as classes.dex. The DEX format is specific to Android and is documented in the AOSP “Dalvik executable format” page.

During the installation of the APK, the system optimizes this DEX file in order to speed up execution. Indeed, interpreting bytecode is not as efficient as executing native code, and the Dalvik virtual machine is based on 32-bit registers whereas most recent CPUs are 64-bit.

To address this issue, prior to Android 4.4 (KitKat), the runtime used JIT compilation to transform Dalvik bytecode into native code. The JIT compilation occurred during execution and was performed each time the application ran. Starting with Android 4.4 (as an opt-in option) and by default since Android 5.0 (Lollipop), Android moved to a new runtime, ART, which among other features performs these optimizations at installation time. Consequently, installation takes more time, but the transformation to native code is done only once.

To optimize the Dalvik bytecode, the original DEX file (e.g. classes.dex) is transformed into another file that contains the native code. This new file usually has the .odex or .oat extension and is wrapped in the ELF format. Using ELF makes sense for two main reasons:

  • It’s the default format used by Linux and Android to package native code.

  • It allows the same loader to be used: /system/bin/linker{64}

OAT files are in fact ELF files, which is why we chose to add this format to LIEF. The ELF format is actually used as a wrapper around another, Android-specific format: the OAT format.

Basically, the ELF wrapper exports only a few symbols:

import lief

oat = lief.parse("SomeOAT")
for s in oat.dynamic_symbols:
  print(s)
oatdata                       OBJECT    GLOBAL    1000      1262000
oatexec                       OBJECT    GLOBAL    1263000   10d4060
oatlastword                   OBJECT    GLOBAL    233705c   4
oatbss                        OBJECT    GLOBAL    2338000   f5050
oatbsslastword                OBJECT    GLOBAL    242d04c   4

These symbols act as pointers to specific parts of the OAT format. For example, oatdata points to the beginning of the underlying OAT data whereas oatexec points to the native code. For those interested in a deeper understanding of the OAT internal structures, see:

These formats can be a bit confusing, so to summarize:

DEX files are transformed into .odex files, which are ELF files wrapping a custom OAT format.

[Figure: an OAT file wrapped in an ELF file]

The OAT format is poorly documented and its internal structures change with each Android version, without backward compatibility. This means that OAT files produced on Android 6.0.1 can only be used on that version.
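Because the layout is version-specific, the first field worth reading is the version itself. The following is a minimal stdlib sketch, not LIEF’s implementation: it assumes the address of the oatdata symbol has already been converted to a file offset (for instance with lief.ELF.Binary.virtual_address_to_offset()), and relies on the OAT header starting with the magic oat\n followed by three ASCII digits and a NUL byte (e.g. 064 for Android 6). read_oat_version is a hypothetical helper name.

```python
OAT_MAGIC = b"oat\n"


def read_oat_version(blob: bytes, oatdata_offset: int) -> int:
    """Return the OAT version found at oatdata_offset (hypothetical helper).

    Assumes oatdata_offset is a *file offset*: the address of the oatdata
    symbol must already have been translated through the ELF segments.
    """
    header = blob[oatdata_offset:oatdata_offset + 8]
    if len(header) < 8 or header[:4] != OAT_MAGIC:
        raise ValueError("no OAT magic at the given offset")
    # The version field is three ASCII digits followed by a NUL byte, e.g. b"124\x00".
    version = header[4:8].rstrip(b"\x00")
    if not version.isdigit():
        raise ValueError(f"unexpected OAT version field: {header[4:8]!r}")
    return int(version)
```

In practice, lief.OAT.version() returns the same number directly; the sketch only shows where it comes from.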


In the Android framework, the dex2oat executable is responsible for converting and optimizing the APK DEX files into OAT files. This executable is located in the /system/bin/ directory and its output can be observed through logcat:

$ adb logcat -s "dex2oat:I"
...
05-04 10:16:37.218  1987  1987 I dex2oat : /system/bin/dex2oat --compiler-filter=speed --dex-file=/data/user/0/com.google.android.gms/snet/installed/snet.jar --oat-file=/data/user/0/com.google.android.gms/snet/dalvik-cache/snet.dex
05-04 10:16:37.688  1987  1998 W dex2oat : Compilation of void com.google.android.snet.Snet.enterSnetIdle(android.content.Context, android.os.Bundle) took 116.995ms
05-04 10:16:37.768  1987  1987 I dex2oat : ----------------------------------------------------
05-04 10:16:37.768  1987  1987 I dex2oat : <SS>: S T A R T I N G . . .
05-04 10:16:37.768  1987  1987 E dex2oat : <SS>: oat location is not valid /data/user/0/com.google.android.gms/snet/dalvik-cache/snet.dex
05-04 10:16:37.768  1987  1987 I dex2oat : dex2oat took 552.045ms (threads: 8) arena alloc=3MB java alloc=1150KB native alloc=8MB free=3MB
05-04 12:25:50.878 10460 10460 I dex2oat : /system/bin/dex2oat --compiler-filter=speed
...

The output above shows the transformation of the SafetyNet DEX files, located in /data/user/0/com.google.android.gms/snet/installed/snet.jar, into an OAT file saved in /data/user/0/com.google.android.gms/snet/dalvik-cache/snet.dex.
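When many applications are optimized, pulling the input and output paths out of such logs by hand gets tedious. A minimal sketch, assuming the log line format shown above (the tag followed by "dex2oat : " and then the command line, with arguments separated by whitespace). dex2oat_paths is a hypothetical helper; it does not handle paths containing spaces and only keeps the last value when an option is repeated.

```python
def dex2oat_paths(log_line: str) -> dict[str, str]:
    """Extract --dex-file / --oat-file values from a dex2oat logcat line."""
    # Everything after "dex2oat : " is the logged message (here, the command line).
    _, sep, command = log_line.partition("dex2oat : ")
    if not sep:
        return {}
    paths = {}
    # Plain whitespace split: paths containing spaces are not handled.
    for arg in command.split():
        for option in ("--dex-file", "--oat-file"):
            prefix = option + "="
            if arg.startswith(prefix):
                # A repeated option overwrites the previous value.
                paths[option] = arg[len(prefix):]
    return paths
```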

Note that the output file has the .dex extension, which suggests a DEX file rather than an OAT. However, if we check its type:

$ file snet.dex
snet.dex: ELF 64-bit LSB shared object, ARM aarch64, version 1 (GNU/Linux), dynamically linked, stripped

We can see that it is actually an ELF file.
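The description printed by file comes from the ELF identification bytes and the first header fields. As a sketch of where it comes from (covering only a handful of e_machine values), the following decodes EI_CLASS, EI_DATA, e_type and e_machine; describe_elf is a hypothetical helper.

```python
import struct

# A few e_machine values from the ELF specification.
MACHINES = {3: "Intel 80386", 40: "ARM", 62: "x86-64", 183: "ARM aarch64"}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}


def describe_elf(header: bytes) -> str:
    """Summarize the first 20 bytes of an ELF file, a bit like file(1)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    bits = "32-bit" if ei_class == 1 else "64-bit"
    order = "LSB" if ei_data == 1 else "MSB"
    # e_type and e_machine are two 16-bit fields at offset 16 in both ELF32 and ELF64.
    fmt = "<HH" if ei_data == 1 else ">HH"
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    kind = ELF_TYPES.get(e_type, f"type {e_type}")
    machine = MACHINES.get(e_machine, f"machine {e_machine}")
    return f"ELF {bits} {order} {kind}, {machine}"
```

Usage: describe_elf(open("snet.dex", "rb").read(20)) should start the same way as the file output above.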

Warning

Do not trust extensions: .dex can be DEX or OAT, .odex are OAT, .oat are OAT, …
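A more robust approach is to look at the leading magic bytes instead of the extension. This sketch covers the formats of this tutorial, using the magics found at the start of DEX (dex\n), VDEX (vdex) and ART (art\n) files for the Android versions covered here. An "ELF" result only tells you the file is an ELF: checking for the oatdata dynamic symbol, as in the listing at the beginning of this tutorial, is what identifies an OAT. sniff and sniff_file are hypothetical helpers.

```python
# Leading magic bytes of the formats discussed in this tutorial.
MAGICS = [
    (b"\x7fELF", "ELF"),     # OAT/ODEX files, but also any other ELF
    (b"dex\n", "DEX"),       # followed by a version such as b"035\x00"
    (b"vdex", "VDEX"),
    (b"art\n", "ART"),
    (b"PK\x03\x04", "ZIP"),  # APK/JAR containers
]


def sniff(header: bytes) -> str:
    """Guess a file format from its first bytes, ignoring the extension."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 8 bytes of path and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(8))
```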

The process of converting Java sources into an OAT file can be summarized by the following diagram:

[Figure: from Java sources to the OAT file]

The Missing DEX

When analyzing applications from the Google Play Store, the APK usually contains the classes.dex file(s). Since these files contain the Dalvik bytecode, most tools rely on them to perform the analysis (decompilation, static analysis, …).

However, when analyzing manufacturer firmware (ROMs), these DEX files may be missing. For example, Samsung’s com.android.settings application is located in the /system/priv-app/SecSettings2 directory, which has the following structure:

$ tree system/priv-app/SecSettings2

├── oat
│   └── arm64
│       └── SecSettings2.odex
└── SecSettings2.apk

2 directories, 2 files

Listing the files in SecSettings2.apk shows no .dex file:

$ unzip -l ./SecSettings2.apk|grep -c "classes.dex"
0
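The same check can be scripted with Python’s zipfile module. This sketch lists the DEX entries at the root of an APK; unlike the grep above, it also matches secondary multi-dex entries such as classes2.dex. dex_entries is a hypothetical helper.

```python
import re
import zipfile

# classes.dex, classes2.dex, classes3.dex, ... at the root of the archive
DEX_ENTRY = re.compile(r"classes\d*\.dex")


def dex_entries(apk) -> list[str]:
    """Return the DEX entry names of an APK (path or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        return [name for name in archive.namelist() if DEX_ENTRY.fullmatch(name)]
```

For SecSettings2.apk, dex_entries("SecSettings2.apk") is expected to return an empty list, consistent with the grep count of 0.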

Next to SecSettings2.apk, we find SecSettings2.odex, which is the OAT file resulting from the optimization of the missing DEX file. As ROM developers control the Android version and the target architecture, they only need to ship the OAT file.

They can also use this “feature” to hinder analysis and reverse engineering of the application: since the Dalvik bytecode lives in the DEX file, analysis is quite limited without it.

Thankfully, there is a copy of the original DEX within the OAT! Strictly speaking, it is not an exact copy, as dex2oat replaces some Dalvik instructions (like invoke-virtual) with optimized ones [1], but starting from Android N, the original instructions can be recovered.

Prior to Android Oreo (8.0.0), DEX files were embedded in the OAT itself. Since Oreo, the transformation performed by dex2oat generates two files:

  • classes.odex: OAT containing native code

  • classes.vdex: VDEX file containing copy of original DEX files

The DEX files previously embedded in the OAT have been moved to a new file with a new format: the VDEX format. This format is completely different from OAT; in particular, it is not an ELF file.

As with the OAT format, the VDEX internal structures change with each Android version, without backward compatibility.

Other tools [4] [5] [6] can extract DEX files from OAT/VDEX files, but the extraction [3] is limited either to OAT [4] or to VDEX [5]. With LIEF, we aim to provide a single framework to deal with all of these formats.

OAT and VDEX

As explained in the previous part, the internal structures of these formats change with each Android version. LIEF abstracts these changes so that users can deal with OAT or VDEX files without caring about the underlying OAT version.

It currently supports OAT files from Android 6.0 Marshmallow (OAT v64) to Android 8.1 Oreo (OAT v131).

The OAT version is available with the lief.OAT.version() function:

>>> import lief
>>> lief.OAT.version("classes.odex") # From Android 6
64
>>> lief.OAT.version("classes.odex") # From Android 7
88

One can also retrieve the associated Android version with lief.OAT.android_version():

>>> lief.OAT.android_version(64)
ANDROID_VERSIONS.VERSION_601
>>> lief.OAT.android_version(124)
ANDROID_VERSIONS.VERSION_800
>>> lief.Android.code_name(lief.Android.ANDROID_VERSIONS.VERSION_800)
'Oreo'
>>> lief.Android.version_string(lief.Android.ANDROID_VERSIONS.VERSION_800)
'8.0.0'

To reflect the fact that OAT files are ELF files first, the lief.OAT.Binary class extends lief.ELF.Binary:

>>> import lief
>>> oat = lief.parse("classes.odex")
>>> type(oat)
_pylief.OAT.Binary
>>> isinstance(oat, lief.ELF.Binary)
True

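Thus the whole ELF API is available (adding sections, modifying dynamic entries, etc.), and lief.OAT.Binary adds OAT-specific attributes such as header, dex_files, oat_dex_files, classes and methods. The complete API, including the inherited ELF methods, is: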

class lief.OAT.Binary

OAT binary representation

class FORMATS
ELF = lief._lief.FORMATS.ELF
MACHO = lief._lief.FORMATS.MACHO
OAT = lief._lief.FORMATS.OAT
PE = lief._lief.FORMATS.PE
UNKNOWN = lief._lief.FORMATS.UNKNOWN
class PHDR_RELOC
AUTO = lief._lief.ELF.PHDR_RELOC.AUTO
BSS_END = lief._lief.ELF.PHDR_RELOC.BSS_END
FILE_END = lief._lief.ELF.PHDR_RELOC.FILE_END
PIE_SHIFT = lief._lief.ELF.PHDR_RELOC.PIE_SHIFT
SEGMENT_GAP = lief._lief.ELF.PHDR_RELOC.SEGMENT_GAP
class VA_TYPES
AUTO = lief._lief.VA_TYPES.AUTO
RVA = lief._lief.VA_TYPES.RVA
VA = lief._lief.VA_TYPES.VA
property abstract lief.Binary

Return the abstract representation of the current binary (lief.Binary)

add(*args) lief.ELF.Section | lief.ELF.Note | lief.ELF.DynamicEntry | lief.ELF.Segment

Overloaded function.

  1. add(self, arg: lief._lief.ELF.DynamicEntry, /) -> lief._lief.ELF.DynamicEntry

    Add the given DynamicEntry to the binary

  2. add(self, section: lief._lief.ELF.Section, loaded: bool = True) -> lief._lief.ELF.Section

    Add the given Section to the binary.

    If the section does not aim at being loaded in memory, the loaded parameter has to be set to False (default: True)

  3. add(self, segment: lief._lief.ELF.Segment, base: int = 0) -> lief._lief.ELF.Segment

    Add a new Segment in the binary

  4. add(self, note: lief._lief.ELF.Note) -> lief._lief.ELF.Note

    Add a new Note in the binary

add_dynamic_relocation(self, relocation: lief.ELF.Relocation) lief.ELF.Relocation

Add a new dynamic relocation.

We consider a dynamic relocation as a relocation which is not plt-related.

See: lief.ELF.Binary.add_pltgot_relocation()

add_dynamic_symbol(self, symbol: lief.ELF.Symbol, symbol_version: lief.ELF.SymbolVersion | None) lief.ELF.Symbol

Add a dynamic Symbol to the binary

The function also takes an optional lief.ELF.SymbolVersion

add_exported_function(self, address: int, name: str) lief.ELF.Symbol

Create a symbol for the function at the given address and create an export

add_library(self, library_name: str) lief.ELF.DynamicEntryLibrary

Add a library with the given name as dependency

add_object_relocation(self, relocation: lief.ELF.Relocation, section: lief.ELF.Section) lief.ELF.Relocation

Add relocation for object file (.o)

The first parameter is the relocation to add while the second parameter is the Section associated with the relocation.

If there is an error, this function returns None. Otherwise, it returns the added relocation.

add_pltgot_relocation(self, relocation: lief.ELF.Relocation) lief.ELF.Relocation

Add a .plt.got relocation. This kind of relocation is usually associated with a PLT stub that aims at resolving the underlying symbol.

See: lief.ELF.Binary.add_dynamic_relocation()

add_static_symbol(self, symbol: lief.ELF.Symbol) lief.ELF.Symbol

Add a static Symbol to the binary

property classes lief.OAT.Binary.it_classes

Return an iterator over Class

property concrete lief.ELF.Binary | lief.PE.Binary | lief.MachO.Binary

The concrete representation of the binary. Basically, this property casts a lief.Binary into a lief.PE.Binary, lief.ELF.Binary or lief.MachO.Binary.

See also: lief.Binary.abstract

property ctor_functions list[lief.Function]

Constructor functions that are called prior to any other functions

property dex2dex_json_info str
property dex_files lief.OAT.Binary.it_dex_files

Return an iterator over File

property dtor_functions list[lief.Function]

List of the binary destructors (typically, the functions located in the .fini_array)

property dynamic_entries lief.ELF.Binary.it_dynamic_entries

Return an iterator to DynamicEntry entries as a list

property dynamic_relocations lief.ELF.Binary.it_filter_relocation

Return an iterator over dynamic Relocation

property dynamic_symbols lief.ELF.Binary.it_symbols

Return an iterator to dynamic Symbol

property entrypoint int

Binary’s entrypoint

property eof_offset int

Return the last offset used by the ELF binary according to both, the sections table and the segments table.

export_symbol(*args) lief.ELF.Symbol

Overloaded function.

  1. export_symbol(self, symbol: lief._lief.ELF.Symbol) -> lief._lief.ELF.Symbol

Export the given symbol and create an entry if it doesn’t exist

  2. export_symbol(self, symbol_name: str, value: int = 0) -> lief._lief.ELF.Symbol

Export the symbol with the given name and create an entry if it doesn’t exist

property exported_functions list[lief.Function]

Return the binary’s exported Function

property exported_symbols lief.ELF.Binary.it_filter_symbols

Return dynamic Symbol which are exported

extend(*args) lief.ELF.Section | lief.ELF.Segment

Overloaded function.

  1. extend(self, segment: lief._lief.ELF.Segment, size: int) -> lief._lief.ELF.Segment

Extend the given Segment by the given size

  2. extend(self, segment: lief._lief.ELF.Section, size: int) -> lief._lief.ELF.Section

Extend the given Section by the given size

property format lief.Binary.FORMATS

File format (FORMATS) of the underlying binary.

property functions list[lief.Function]

List of the functions found in the binary

get(*args) lief.ELF.Section | lief.ELF.Note | lief.ELF.DynamicEntry | lief.ELF.Segment

Overloaded function.

  1. get(self, tag: lief._lief.ELF.DYNAMIC_TAGS) -> lief._lief.ELF.DynamicEntry

    Return the first binary’s DynamicEntry from the given DYNAMIC_TAGS.

    It returns None if the dynamic entry can’t be found.

  2. get(self, type: lief._lief.ELF.SEGMENT_TYPES) -> lief._lief.ELF.Segment

    Return the first binary’s Segment from the given SEGMENT_TYPES

    It returns None if the segment can’t be found.

  3. get(self, type: lief._lief.ELF.Note.TYPE) -> lief._lief.ELF.Note

    Return the first binary’s Note from the given TYPE.

    It returns None if the note can’t be found.

  4. get(self, type: lief._lief.ELF.SECTION_TYPES) -> lief._lief.ELF.Section

    Return the first binary’s Section from the given ELF_SECTION_TYPES

    It returns None if the section can’t be found.

get_class(*args) lief.OAT.Class

Overloaded function.

  1. get_class(self, class_name: str) -> lief._lief.OAT.Class

Return the Class from its name

  2. get_class(self, class_index: int) -> lief._lief.OAT.Class

Return the Class from its index

get_content_from_virtual_address(self, virtual_address: int, size: int, va_type: lief.Binary.VA_TYPES) memoryview

Return the content located at the provided virtual address. The virtual address is specified in the first argument and size to read (in bytes) in the second.

If the underlying binary is a PE, one can specify if the virtual address is a RVA or a VA. By default, it is set to AUTO.

get_dynamic_symbol(self, symbol_name: str) lief.ELF.Symbol

Get the dynamic symbol from the given name.

It returns None if it can’t be found.

get_function_address(self, function_name: str) int | lief.lief_errors

Return the address of the given function name

get_library(self, library_name: str) lief.ELF.DynamicEntryLibrary

Return the DynamicEntryLibrary with the given name

It returns None if the library can’t be found.

get_relocation(*args) lief.ELF.Relocation

Overloaded function.

  1. get_relocation(self, symbol_name: str) -> lief._lief.ELF.Relocation

Return the Relocation associated with the given symbol name

  2. get_relocation(self, symbol: lief._lief.ELF.Symbol) -> lief._lief.ELF.Relocation

Return the Relocation associated with the given Symbol

  3. get_relocation(self, address: int) -> lief._lief.ELF.Relocation

Return the Relocation associated with the given address

get_section(self, section_name: str) lief.ELF.Section

Return the Section with the given name

It returns None if the section can’t be found.

get_static_symbol(self, symbol_name: str) lief.ELF.Symbol

Get the static symbol from the given name.

It returns None if it can’t be found.

get_strings(self, min_size: int) list[str]

Return the list of strings used in the current ELF file with a minimal size given in the first parameter (default: 5). It looks for strings in the .rodata section

get_symbol(self, symbol_name: str) lief.Symbol

Return the Symbol from the given name.

If the symbol can’t be found, it returns None.

property gnu_hash lief.ELF.GnuHash

Return the GnuHash object

Hash tables are used by the loader to speed up symbol resolution (GNU version)

has(*args) bool

Overloaded function.

  1. has(self, tag: lief._lief.ELF.DYNAMIC_TAGS) -> bool

    Check if a DynamicEntry with the given DYNAMIC_TAGS exists

  2. has(self, type: lief._lief.ELF.SEGMENT_TYPES) -> bool

Check if a Segment of type (SEGMENT_TYPES) exists

  3. has(self, type: lief._lief.ELF.Note.TYPE) -> bool

Check if a Note of type (TYPE) exists

  4. has(self, type: lief._lief.ELF.SECTION_TYPES) -> bool

Check if a Section of type (SECTION_TYPES) exists

property has_class bool

Check if the class with the given name is present in the current OAT binary

has_dynamic_symbol(self, symbol_name: str) bool

Check if the symbol with the given name exists in the dynamic symbol table

property has_interpreter bool

Check if the binary uses a loader (also named linker or interpreter)

has_library(self, library_name: str) bool

Check if the given library name exists in the current binary

property has_notes bool

True if the binary contains notes

property has_nx bool

Check if the binary has NX protection (non executable stack)

property has_overlay bool

True if data are appended to the end of the binary

has_section(self, section_name: str) bool

Check if a Section with the given name exists in the binary

has_section_with_offset(self, offset: int) bool

Check if a Section that encompasses the given offset exists

has_section_with_va(self, virtual_address: int) bool

Check if a Section that encompasses the given virtual address exists

has_static_symbol(self, symbol_name: str) bool

Check if the symbol with the given name exists in the static symbol table

has_symbol(self, symbol_name: str) bool

Check if a Symbol with the given name exists

property header lief.OAT.Header

Return the OAT Header

property imagebase int

Return the program image base. (e.g. 0x400000)

property imported_functions list[lief.Function]

Return the binary’s imported Function (name)

property imported_symbols lief.ELF.Binary.it_filter_symbols

Return dynamic Symbol which are imported

property interpreter str

ELF interpreter (loader) if any. (e.g. /lib64/ld-linux-x86-64.so.2)

property is_pie bool

Check if the binary has been compiled with -fpie -pie flags

To do so we check if there is a PT_INTERP segment and if the binary type is ET_DYN (Shared object)

class it_classes

Iterator over lief._lief.OAT.Class

class it_dex_files

Iterator over lief._lief.DEX.File

class it_dyn_static_symbols

Iterator over lief._lief.ELF.Symbol

class it_dynamic_entries

Iterator over lief._lief.ELF.DynamicEntry

class it_filter_relocation

Iterator over lief._lief.ELF.Relocation

class it_filter_symbols

Iterator over lief._lief.ELF.Symbol

class it_methods

Iterator over lief._lief.OAT.Method

class it_notes

Iterator over lief._lief.ELF.Note

class it_oat_dex_files

Iterator over lief._lief.OAT.DexFile

class it_relocations

Iterator over lief._lief.ELF.Relocation

class it_sections

Iterator over lief._lief.ELF.Section

class it_segments

Iterator over lief._lief.ELF.Segment

class it_symbols

Iterator over lief._lief.ELF.Symbol

class it_symbols_version

Iterator over lief._lief.ELF.SymbolVersion

class it_symbols_version_definition

Iterator over lief._lief.ELF.SymbolVersionDefinition

class it_symbols_version_requirement

Iterator over lief._lief.ELF.SymbolVersionRequirement

property last_offset_section int

Return the last offset used in binary according to sections table

property last_offset_segment int

Return the last offset used in binary according to segments table

property libraries list[str | bytes]

Return binary’s imported libraries (name)

property methods lief.OAT.Binary.it_methods

Return an iterator over Method

property next_virtual_address int

Return the next virtual address available

property notes lief.ELF.Binary.it_notes

Return an iterator over the Note entries

property oat_dex_files lief.OAT.Binary.it_oat_dex_files

Return an iterator over DexFile

property object_relocations lief.ELF.Binary.it_filter_relocation

Return an iterator over object Relocation

offset_to_virtual_address(self, offset: int, slide: int) int | lief.lief_errors

Convert an offset into a virtual address.

property overlay memoryview

Overlay data that are not a part of the ELF format

patch_address(*args) None

Overloaded function.

  1. patch_address(self, address: int, patch_value: list[int], va_type: lief._lief.Binary.VA_TYPES = lief._lief.VA_TYPES.AUTO) -> None

    Patch the address with the given list of bytes. The virtual address is specified in the first argument and the content in the second (as a list of bytes).

    If the underlying binary is a PE, one can specify if the virtual address is a RVA or a VA. By default, it is set to AUTO.

  2. patch_address(self, address: int, patch_value: int, size: int = 8, va_type: lief._lief.Binary.VA_TYPES = lief._lief.VA_TYPES.AUTO) -> None

    Patch the address with the given integer value. The virtual address is specified in the first argument, the integer in the second and the integer’s size in the third one.

    If the underlying binary is a PE, one can specify if the virtual address is a RVA or a VA. By default, it is set to AUTO.

patch_pltgot(*args) None

Overloaded function.

  1. patch_pltgot(self, symbol_name: str, address: int) -> None

Patch the imported symbol’s name with the address

  2. patch_pltgot(self, symbol: lief._lief.ELF.Symbol, address: int) -> None

Patch the imported Symbol with the address

permute_dynamic_symbols(self, permutation: list[int]) None

Apply the given permutation on the dynamic symbols table

property pltgot_relocations lief.ELF.Binary.it_filter_relocation

Return an iterator over PLT/GOT Relocation

relocate_phdr_table(self, type: lief.ELF.Binary.PHDR_RELOC) int

Force relocating the segments table in a specific way (see: PHDR_RELOC).

This function can be used to enforce a specific relocation of the segments table. Upon successful relocation, the function returns the offset of the relocated segments table. Otherwise, if the function fails, it returns 0

property relocations lief.ELF.Binary.it_relocations

Return an iterator over all Relocation

remove(*args) None

Overloaded function.

  1. remove(self, dynamic_entry: lief._lief.ELF.DynamicEntry) -> None

Remove the given DynamicEntry from the dynamic table

  2. remove(self, tag: lief._lief.ELF.DYNAMIC_TAGS) -> None

Remove all the DynamicEntry with the given DYNAMIC_TAGS

  3. remove(self, section: lief._lief.ELF.Section, clear: bool = False) -> None

Remove the given Section. The clear parameter specifies whether or not we must fill its content with 0 before removing

  4. remove(self, note: lief._lief.ELF.Note) -> None

Remove the given Note

  5. remove(self, type: lief._lief.ELF.Note.TYPE) -> None

Remove all the Note with the given TYPE

remove_dynamic_symbol(*args) None

Overloaded function.

  1. remove_dynamic_symbol(self, arg: lief._lief.ELF.Symbol, /) -> None

Remove the given Symbol from the .dynsym section

  2. remove_dynamic_symbol(self, arg: str, /) -> None

Remove the Symbol with the name given in parameter from the .dynsym section

remove_library(self, library_name: str) None

Remove the given library

remove_section(self, name: str, clear: bool) None

Remove the section with the given name

remove_static_symbol(self, arg: lief.ELF.Symbol) None

Remove the given Symbol from the .symtab section

replace(self, new_segment: lief.ELF.Segment, original_segment: lief.ELF.Segment, base: int) lief.ELF.Segment

Replace the Segment given in 2nd parameter with the Segment given in the first parameter and return the updated segment.

Warning

The original_segment is no longer valid after this function

section_from_offset(self, offset: int, skip_nobits: bool) lief.ELF.Section

Return the Section which encompasses the given offset. It returns None if a section can’t be found.

If skip_nobits is set (which is the case by default), this function won’t consider sections for which the type is SHT_NOBITS (like .bss, .tbss, ...)

section_from_virtual_address(self, address: int, skip_nobits: bool) lief.ELF.Section

Return the Section which encompasses the given virtual address. It returns None if a section can’t be found.

If skip_nobits is set (which is the case by default), this function won’t consider sections for which the type is SHT_NOBITS (like .bss, .tbss, ...)

property sections lief.ELF.Binary.it_sections

Return an iterator over binary’s Section

segment_from_offset(self, offset: int) lief.ELF.Segment

Return the Segment which encompasses the given offset. It returns None if a segment can’t be found.

segment_from_virtual_address(self, address: int) lief.ELF.Segment

Return the Segment which encompasses the given virtual address. It returns None if a segment can’t be found.

property segments lief.ELF.Binary.it_segments

Return an iterator to binary’s Segment

property static_symbols lief.ELF.Binary.it_symbols

Return an iterator to static Symbol

property strings list[str | bytes]

Return the list of strings used in the current ELF file. Basically, this function looks for strings in the .rodata section

strip(self) None

Strip the binary

property symbols lief.ELF.Binary.it_dyn_static_symbols

Return an iterator over both static and dynamic Symbol

property symbols_version lief.ELF.Binary.it_symbols_version

Return an iterator over SymbolVersion

property symbols_version_definition lief.ELF.Binary.it_symbols_version_definition

Return an iterator to SymbolVersionDefinition

property symbols_version_requirement lief.ELF.Binary.it_symbols_version_requirement

Return an iterator to SymbolVersionRequirement

property sysv_hash lief.ELF.SysvHash

Return the SysvHash object

Hash tables are used by the loader to speed up symbol resolution (SYSV version)

property type lief.ELF.ELF_CLASS

Return the binary’s ELF_CLASS

property use_gnu_hash bool

True if GNU hash is used

property use_sysv_hash bool

True if SYSV hash is used

virtual_address_to_offset(self, virtual_address: int) int | lief.lief_errors

Convert the virtual address to a file offset

property virtual_size int

Return the size of the mapped binary

write(*args) None

Overloaded function.

  1. write(self, output: str) -> None

Rebuild the binary and write it in a file

  2. write(self, output: str, config: lief._lief.ELF.Builder.config_t) -> None

Rebuild the binary with the given configuration and write it in a file

xref(self, virtual_address: int) list[int]

Return all virtual addresses that use the address given in parameter

If the given OAT targets Android Marshmallow or Nougat (6 or 7), DEX files can be retrieved with the lief.OAT.Binary.dex_files attribute:

>>> len(oat.dex_files) # > 1 if multi-dex
1
>>> dex = oat.dex_files[0]
>>> dex.save("/tmp/classes.dex")

In the code above, the lief.DEX.File is extracted to /tmp/classes.dex (with de-optimization).
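Before feeding an extracted file to other tools, it can be worth checking its header. The sketch below relies only on the documented DEX header layout (magic at offset 0, Adler-32 checksum of everything after offset 12 stored at offset 8, then file_size, header_size and endian_tag at offsets 32, 36 and 40). check_dex_header is a hypothetical helper, it does not verify the SHA-1 signature, and whether the checksum still matches after LIEF’s de-optimization is something to verify on your own files rather than a guarantee.

```python
import struct
import zlib


def check_dex_header(data: bytes) -> dict:
    """Read a few DEX header fields and check them against the file content."""
    if len(data) < 0x70 or not data.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    version = data[4:7].decode("ascii")  # e.g. "035"
    (checksum,) = struct.unpack_from("<I", data, 8)
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 32)
    return {
        "version": version,
        # Adler-32 covers everything except the magic and the checksum field.
        "checksum_ok": checksum == zlib.adler32(data[12:]),
        "size_ok": file_size == len(data),
        "header_size": header_size,
        "little_endian": endian_tag == 0x12345678,
    }
```

Usage: check_dex_header(open("/tmp/classes.dex", "rb").read()).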

If the given OAT targets Android Oreo or above, the extraction relies on the VDEX file. The lief.OAT.parse() function accepts either an OAT file alone or an OAT file and a VDEX file. When the VDEX file is provided as the second parameter, the resulting lief.OAT.Binary offers the same features as for pre-Oreo OAT files.

If the VDEX file is not provided, lief.OAT.Binary only exposes limited information:

# Without VDEX file
>>> oat_oreo = lief.parse("KeyChain.odex")
>>> len(oat_oreo.dex_files)
0
>>> len(oat_oreo.classes)
0
>>> len(oat_oreo.oat_dex_files)
1
>>> oat_dex_file = oat_oreo.oat_dex_files[0]
>>> print(oat_dex_file)
/system/app/KeyChain/KeyChain.apk - (Checksum: 0x206c8ab1)
# With VDEX file
>>> oat_oreo = lief.OAT.parse("KeyChain.odex", "KeyChain.vdex")
>>> len(oat_oreo.dex_files)
1
>>> len(oat_oreo.classes)
17
>>> oat_oreo.dex_files[0].save("/tmp/classes.dex")
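Since the VDEX file usually sits next to the OAT file with the same base name (KeyChain.odex and KeyChain.vdex here), a small helper can pick the right lief.OAT.parse() call. companion_vdex is hypothetical, and the same-directory, same-stem layout is an assumption matching the example above, not a rule enforced by Android.

```python
from pathlib import Path


def companion_vdex(odex_path):
    """Return the .vdex next to an .odex (same stem), or None if absent."""
    candidate = Path(odex_path).with_suffix(".vdex")
    return candidate if candidate.is_file() else None


# Possible use with LIEF (not executed here):
#   vdex = companion_vdex("KeyChain.odex")
#   oat = lief.OAT.parse("KeyChain.odex", str(vdex)) if vdex else lief.parse("KeyChain.odex")
```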

LIEF’s VDEX module can also be used directly:

>>> vdex = lief.VDEX.parse("KeyChain.vdex")

As the VDEX format is completely different from OAT, ELF, PE and Mach-O, the VDEX parser creates a lief.VDEX.File object and not a Binary. DEX files can also be extracted with the lief.VDEX.File.dex_files attribute:

>>> len(vdex.dex_files)
1
>>> vdex.dex_files[0].save("/tmp/KeyChain.dex") # With de-optimization

DEX

The previous part covered the OAT/VDEX formats and how to access the underlying DEX files. This part introduces the main API of the lief.DEX.File object.

The LIEF DEX module provides information about the Java code, such as strings, class names, Dalvik bytecode, …

Note

As the LIEF project focuses only on formats, the DEX module does not include a Dalvik disassembler.

The main API for a DEX file is the lief.DEX.File object, which can be obtained from an OAT file, from a VDEX file or by parsing a DEX file directly:

>>> oat = lief.parse("SecSettings2.odex")
>>> type(oat.dex_files[0])
_pylief.DEX.File

>>> vdex = lief.VDEX.parse("KeyChain.vdex")
>>> type(vdex.dex_files[0])
_pylief.DEX.File

>>> dex = lief.DEX.parse("classes.dex")
>>> type(dex)
_pylief.DEX.File

Once created, the strings are available through the lief.DEX.File.strings attribute:

>>> len(dex.strings)
23529
>>> for s in dex.strings:
...   if "http" in s:
...     print(s)

https://analytics.mopub.com/i/jot/exchange_client_event
https://app-measurement.com/a
https://mobilecrashreporting.googleapis.com/v1/crashes:batchCreate?key=
https://pagead2.googlesyndication.com/pagead/gen_204?id=gmob-apps
https://plus.google.com/
https://ssl.google-analytics.com
https://support.google.com/dfp_premium/answer/7160685#push
https://www.google.com
...

Similarly, classes and methods are available through the lief.DEX.File.classes and lief.DEX.File.methods attributes:

for cls in dex.classes:
  if cls.source_filename:
    print(cls)
com.avast.android.sdk.antitheft.internal.protection.wipe.a - CalendarWiper.java - 3 Methods
com.avast.android.account.internal.identity.a - AvastIdentityProvider.java - 17 Methods
com.avast.android.account.internal.identity.d - FacebookIdentityProvider.java - 19 Methods
com.avast.android.lib.wifiscanner.internal.b$a - WifiScannerComponentFactory.java - 1 Methods

In the DEX format, classes have a special attribute that records the original source filename: source_file_idx. Some obfuscators mangle class names but keep this attribute! Since a Java source filename usually matches the name of the top-level class it defines, we can often recover the original class name:

for cls in dex.classes:
  if cls.source_filename:
    print(cls.pretty_name + ": ---> " + cls.source_filename)
com.avast.android.sdk.antitheft.internal.protection.wipe.a: ---> CalendarWiper.java
com.avast.android.account.internal.identity.a: ---> AvastIdentityProvider.java
com.avast.android.account.internal.identity.d: ---> FacebookIdentityProvider.java
com.avast.android.lib.wifiscanner.internal.b$a: ---> WifiScannerComponentFactory.java
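The renaming step can be automated. The sketch below is a heuristic, not a LIEF feature: it assumes the source file is named after the outermost class (true for public top-level Java classes, although a file may define other classes too), keeps the package, and preserves inner-class suffixes such as $a. candidate_name is a hypothetical helper.

```python
def candidate_name(obfuscated: str, source_filename: str) -> str:
    """Rename the outermost class of `obfuscated` after its source file."""
    package, _, simple = obfuscated.rpartition(".")
    # Inner classes ("b$a") come from the same source file as their outer class.
    _, sep, inner = simple.partition("$")
    base = source_filename.rsplit("/", 1)[-1]
    # Strip the extension (.java, .kt, ...) if there is one.
    stem, dot, _ = base.rpartition(".")
    stem = stem if dot else base
    new_simple = stem + sep + inner
    return f"{package}.{new_simple}" if package else new_simple


# Possible use with LIEF (not executed here):
#   for cls in dex.classes:
#       if cls.source_filename:
#           print(candidate_name(cls.pretty_name, cls.source_filename))
```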


DEX methods are represented by lief.DEX.Method objects, and the raw Dalvik bytecode is accessible through the lief.DEX.Method.bytecode attribute.
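LIEF does not disassemble this bytecode, but its basic layout is easy to handle: Dalvik instructions are made of 16-bit little-endian code units, and the opcode is the low byte of an instruction’s first unit. The sketch assumes the raw bytecode has been turned into a bytes object (for instance bytes(method.bytecode), assuming that attribute yields byte values). code_units and first_opcode are hypothetical helpers; decoding full instructions would require the per-opcode formats, which this sketch does not implement.

```python
import struct


def code_units(raw: bytes) -> list[int]:
    """Split raw Dalvik bytecode into little-endian 16-bit code units."""
    if len(raw) % 2:
        raise ValueError("Dalvik bytecode is made of 16-bit code units")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def first_opcode(raw: bytes) -> int:
    """Opcode of the first instruction: low byte of its first code unit."""
    return code_units(raw)[0] & 0xFF
```

For example, the bytes of a trivial constructor (invoke-direct, opcode 0x70, followed by return-void, opcode 0x0e) give 0x70 as the first opcode.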

ART

ART is the name of the Android Runtime, but it is also a file format! This format is used by the Android framework for optimization purposes.

As discussed previously, Android has its own implementation of the Java virtual machine based on Dalvik bytecode. This virtual machine is implemented in C++, and core Java classes (java.lang.String, java.lang.Object, etc.) are mirrored by C++ objects:

  • java.lang.Class: art::mirror::Class

  • java.lang.String: art::mirror::String

  • java.lang.reflect.Method: art::mirror::Method

When a Java class is instantiated, the runtime creates the mirrored C++ object (memory allocation, constructor calls, …) and keeps a reference to this C++ object. To speed up the boot process and avoid instantiating well-known classes [2] at each boot, Android uses the ART format to store pre-built instances of these C++ objects. To simplify, it can be seen as a heap dump of C++ objects.

As with OAT and VDEX, the internal structures of this format change with each Android version.

LIEF 0.9 only has basic support for this format and only exposes the ART header (lief.ART.Header). The main API is available through the lief.ART.File object:

art = lief.ART.parse("boot.art")
print(art.header)
Version:                         46
Image Begin:                     0x70000000
Image Size:                      0x238ac8
Checksum:                        0x997c0fb0
OAT File Begin:                  0x70a5b000
OAT File End:                    0x71272000
OAT Data Begin:                  0x70a5c000
OAT Data End:                    0x7126df70
Patch Delta:                     0
Pointer Size:                    8
Compile pic:                     true
Number of sections:              10
Number of methods:               7
Boot Image Begin:                0
Boot Image Size:                 0
Boot OAT Begin:                  0
Boot OAT Size:                   0
Storage Mode:                    UNCOMPRESSED
Data Size:                       0x2389f0


LIEF 0.9 can only read these formats, but future versions should allow modifying them (adding methods, changing names, patching checksums, …).

Enjoy!

Notes

API