The elf file header is always present at the offset 0 of a binary program (remember FILE OFFSETS from the Introduction?) and works as a roadmap to parse ELF binary. Let’s have a look at it.
The files hello_32
and hello_64
are the 32-bit and 64-bit compiled versions of the file hello.c.
Lets have a look at the ELF header of the 32-bit compiled binary. I have cross compiled the source code for a 32 bit platform. Try viewing the file with cat.
ELF��4,4 (44�4� TT�T�����
�
�
����hh�h�DDP�td�����DDQ�tdR�td
�
���/lib/ld-linux.so.2GNUGNU�z�8��u�6�0P#g�G� �K��6
��libc.so.6_IO_stdi,��__libc_start_mainGLIBC_2.0__gmon_start__ii
�S�����k��������t�&�[��5����%
�h������%��f�1�^�����PTR�#����p���P�����PQV����P�������$�f�f�f�f����f�f�f�f�f�f�f��$�f�f�f�f�f�f���=�t$���tU����h��Ѓ��É���'�Í�&��'��-����������t ���tU����Ph��҃��Ít&�Í��=�uU����l������Ív�Í�&��'U��]��U���
...
problably some more crap
We see here, no meaningful data can be scraped out from this output. Therefore, we ought to use a tool to analyse the information provided by the binary, here comes readelf in action.
This tool scraps the ELF binary to extract and display meaningful information from it. The ‘-h’ flag is used to display ELF header of file hello_64
.
critical@d3ad:~/COURSE_DISECTING_BINARIES/ELF/ELF_HEADER$ readelf -h hello_64
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4003b0
Start of program headers: 64 (bytes into file)
Start of section headers: 6256 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 27
Section header string table index: 26
ELF header is actually a structure - Elf64_Ehdr
defined in file - /usr/include/elf.h
.
critical@d3ad:~$ cat /usr/include/elf.h | grep -B16 " Elf64_Ehdr;"
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
e_ident is an array identifying the binary in a processor-independent fashion, meaning this array is probably the only part which doesn’t use encoding of the target processor. It basically works as an identifier for ELF specifying how to decode and interpret the remaining binary. The output above displays 16 hexadecimal bytes in a row. Here the first 4 bytes (7f 45 4c 46) are called magic numbers. These bytes are used to identify an ELF format. Here first byte is fixed (0x7f). Have a look at the ASCII representation of the hex values (type man ASCII
to open up ASCII table) and see the values of bytes after 0x7f, i.e. 0x45 0x4c 0x46 are ‘E’, ‘L’, ‘F’ respectively.
BYTES | Description |
---|---|
7f | It is a Fixed byte |
45 4c 46 | ELF signature. Look at the ASCII table (with command man ASCII ), the values corresponding to them are‘E’, ‘L’, ‘F’. This indicates that it is an ELF executable. |
02 | It is the class described bellow. (ELF64) |
01 | It is the Data field described bellow (2’s complement, little endian) |
01 | Version field bellow (Version Number 1) |
00 | OS/ABI field bellow (UNIX - System V) |
00 00 00 00 00 00 00 00 | Padded bytes for allignment, currently unused, kept for future use. |
The file /usr/include/elf.h
includes the MACROS which define values for each byte.
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A2 "EI_MAG"
107:#define EI_MAG0 0 /* File identification byte 0 index */
108-#define ELFMAG0 0x7f /* Magic number byte 0 */
109-
110:#define EI_MAG1 1 /* File identification byte 1 index */
111-#define ELFMAG1 'E' /* Magic number byte 1 */
112-
113:#define EI_MAG2 2 /* File identification byte 2 index */
114-#define ELFMAG2 'L' /* Magic number byte 2 */
115-
116:#define EI_MAG3 3 /* File identification byte 3 index */
117-#define ELFMAG3 'F' /* Magic number byte 3 */
118-
Verify the magic field by hd (hexdump) tool.
critical@d3ad:~/COURSE_DISECTING_BINARIES/ELF/ELF_HEADER$ hd -n 16 hello_64
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010
It is the 5th byte in ELF header with value of 0x01, specifying a 64-bit binary, i.e. the entire binary is made up of 64-bit objects.
Value | Description |
---|---|
1 | 32-bit objects |
2 | 64-bit ojects |
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A5 "EI_CLASS"
123:#define EI_CLASS 4 /* File class byte index */
124-#define ELFCLASSNONE 0 /* Invalid class */
125-#define ELFCLASS32 1 /* 32-bit objects */
126-#define ELFCLASS64 2 /* 64-bit objects */
127-#define ELFCLASSNUM 3
128-
This field describes the data encoding (little-endian in this case). You can have a look at the /usr/include/elf.h
to know what each value means.
Value | Description |
---|---|
1 | 2’s complement, little endian |
2 | 2’s complement, big endian |
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A5 "EI_DATA"
129:#define EI_DATA 5 /* Data encoding byte index */
130-#define ELFDATANONE 0 /* Invalid data encoding */
131-#define ELFDATA2LSB 1 /* 2's complement, little endian */
132-#define ELFDATA2MSB 2 /* 2's complement, big endian */
133-#define ELFDATANUM 3
134-
ELF header version number.
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A2 "EI_VERSION"
135:#define EI_VERSION 6 /* File version byte index */
136- /* Value must be EV_CURRENT */
137-
These are the values for target OS ABI (Application Binary Interface).
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A19 "EI_OSABI"
138:#define EI_OSABI 7 /* OS ABI identification */
139-#define ELFOSABI_NONE 0 /* UNIX System V ABI */
140-#define ELFOSABI_SYSV 0 /* Alias. */
141-#define ELFOSABI_HPUX 1 /* HP-UX */
142-#define ELFOSABI_NETBSD 2 /* NetBSD. */
143-#define ELFOSABI_GNU 3 /* Object uses GNU ELF extensions. */
144-#define ELFOSABI_LINUX ELFOSABI_GNU /* Compatibility alias. */
145-#define ELFOSABI_SOLARIS 6 /* Sun Solaris. */
146-#define ELFOSABI_AIX 7 /* IBM AIX. */
147-#define ELFOSABI_IRIX 8 /* SGI Irix. */
148-#define ELFOSABI_FREEBSD 9 /* FreeBSD. */
149-#define ELFOSABI_TRU64 10 /* Compaq TRU64 UNIX. */
150-#define ELFOSABI_MODESTO 11 /* Novell Modesto. */
151-#define ELFOSABI_OPENBSD 12 /* OpenBSD. */
152-#define ELFOSABI_ARM_AEABI 64 /* ARM EABI */
153-#define ELFOSABI_ARM 97 /* ARM */
154-#define ELFOSABI_STANDALONE 255 /* Standalone (embedded) application */
155-
156-#define EI_ABIVERSION 8 /* ABI version */
157-
MACROs identifying the type of ELF binary (executable, shared object, relocatable and CORE dumps).
critical@d3ad:~/COURSE_DISECTING_BINARIES/Introduction$ cat /usr/include/elf.h | grep -n -A9 "ET_NONE"
162:#define ET_NONE 0 /* No file type */
163-#define ET_REL 1 /* Relocatable file */
164-#define ET_EXEC 2 /* Executable file */
165-#define ET_DYN 3 /* Shared object file */
166-#define ET_CORE 4 /* Core file */
167-#define ET_NUM 5 /* Number of defined types */
168-#define ET_LOOS 0xfe00 /* OS-specific range start */
169-#define ET_HIOS 0xfeff /* OS-specific range end */
170-#define ET_LOPROC 0xff00 /* Processor-specific range start */
171-#define ET_HIPROC 0xffff /* Processor-specific range end */
It is the machine specification required to properly run this ELF.
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A190 "EM_NONE"
175:#define EM_NONE 0 /* No machine */
176-#define EM_M32 1 /* AT&T WE 32100 */
177-#define EM_SPARC 2 /* SUN SPARC */
178-#define EM_386 3 /* Intel 80386 */
179-#define EM_68K 4 /* Motorola m68k family */
180-#define EM_88K 5 /* Motorola m88k family */
181-#define EM_IAMCU 6 /* Intel MCU */
182-#define EM_860 7 /* Intel 80860 */
183-#define EM_MIPS 8 /* MIPS R3000 big-endian */
184-#define EM_S370 9 /* IBM System/370 */
185-#define EM_MIPS_RS3_LE 10 /* MIPS R3000 little-endian */
....
And tons of other machine types (nearly 200)
Specifies the version of the file. (not the ELF header’s)
critical@d3ad:~$ cat /usr/include/elf.h | grep -n -A4 "EV_NONE"
378:#define EV_NONE 0 /* Invalid ELF version */
379-#define EV_CURRENT 1 /* Current version */
380-#define EV_NUM 2
381-
This specifies the address where the execution of the program starts. If you’ve ever programmed in C/C++ languages or read books regarding the same, you may have read - execution of the program always begins from main(). While this introductory statement might be helpful to beginners, it is not an absolulte truth. Let’s examine what location does this entry point address point to.
critical@d3ad:~$ readelf -h hello_32 | grep Entry
Entry point address: 0x80482e0
critical@d3ad:~$ objdump -d ./hello_32 | grep -A18 80482e0
080482e0 <_start>:
80482e0: 31 ed xor %ebp,%ebp
80482e2: 5e pop %esi
80482e3: 89 e1 mov %esp,%ecx
80482e5: 83 e4 f0 and $0xfffffff0,%esp
80482e8: 50 push %eax
80482e9: 54 push %esp
80482ea: 52 push %edx
80482eb: e8 23 00 00 00 call 8048313 <_start+0x33>
80482f0: 81 c3 10 1d 00 00 add $0x1d10,%ebx
80482f6: 8d 83 70 e4 ff ff lea -0x1b90(%ebx),%eax
80482fc: 50 push %eax
80482fd: 8d 83 10 e4 ff ff lea -0x1bf0(%ebx),%eax
8048303: 50 push %eax
8048304: 51 push %ecx
8048305: 56 push %esi
8048306: c7 c0 f6 83 04 08 mov $0x80483f6,%eax
804830c: 50 push %eax
804830d: e8 ae ff ff ff call 80482c0 <__libc_start_main@plt>
8048312: f4 hlt
As you can see, entry point address (0x80482e0) points to a label named _start
. If you’ve ever programmed in assembly you might already be familiar of _start
, which marks the beginning of programmer specified instructions. In C/C++ compiled binaries, this routine is embedded by the compiler which acts as initialization code to setup environment for the execution of main()
as specified by the programmer.
So, where is the main() and how is it called ?
main() is called by __libc_start_main(). Analysing this function call, we can know what parameters it takes by looking at what all is pushed onto the stack. Search out the function prototype for __libc_start_main() and also read x86 calling conventions. Can you figure out the address of main() by yourself ?
It specifies the offset to the Program Header Table (PHT) (explained later in the course), from the very first byte in the file.
It specifies the offset to the Section Header Table (SHT) (explained later in the course), from the very first byte in the file.
The EFLAGS
register is set to this value when the program is loaded into memory.
Identifies the size of this ‘ELF Header’
Identifies the size of each entry in PHT.
Displays the number of Program Headers (Segments) in the PHT.
Displays size of each section header.
Specifies the number of section headers (starting from the NULL entry) in SHT.
The value is 26. It means the section at index number 26 (in Section Header Table), stores all the section names (which are NULL terminated ASCII strings). Let’s have a look at this section.
critical@d3ad:~/COURSE_DISECTING_BINARIES/ELF/ELF_HEADER$ readelf -p 26 hello_64
String dump of section '.shstrtab':
[ 1] .symtab
[ 9] .strtab
[ 11] .shstrtab
[ 1b] .interp
[ 23] .note.ABI-tag
[ 31] .note.gnu.build-id
[ 44] .gnu.hash
[ 4e] .dynsym
[ 56] .dynstr
[ 5e] .gnu.version
[ 6b] .gnu.version_r
[ 7a] .rela.dyn
[ 84] .init
[ 8a] .text
[ 90] .fini
[ 96] .rodata
[ 9e] .eh_frame_hdr
[ ac] .eh_frame
[ b6] .init_array
[ c2] .fini_array
[ ce] .dynamic
[ d7] .got
[ dc] .got.plt
[ e5] .data
[ eb] .bss
[ f0] .comment
Below is how you can programatically parse ELF header using low-level structures provided by /usr/include/elf.h
(in case someone likes to write their own toolset to play around with low-level mechanisms in the ELF world). Please look parse_elf_header.c for full source. Here parameter map
is the address where the target binary is memory mapped (mmap)
in our calling process address space.
void parseElfHeader (uint8_t *map) {
Elf64_Ehdr *ehdr = (Elf64_Ehdr *) map;
fprintf (stderr, "Entry point: 0x%lx\n", ehdr->e_entry);
/* Similarly, one can access any field of ELF header */
}
You can’t mug up these things. All this will start making sense after getting some hands on disecting and analysing ELF binaries. We’ll now start with SHT and PHT. This was designed to give you an entry point into binary dissection :)