Disassembling on latest Linux distributions
-------------------------------------------

Paper by Angel Ramos - Netsdi Labs

Tested on Debian Linux 2.2 and OpenBSD 2.8.


Notes
~~~~~

This paper assumes that the reader has basic knowledge of C programming
and of ASM on Linux Intel systems, and knows how the stack works. ASM
matters in systems security, so I hope this paper is useful to security
analysts.

This text has been tested on Debian GNU/Linux 2.2r2. Other Unix systems
(not Linux), such as OpenBSD 2.8, show the same behaviour.


Introduction
~~~~~~~~~~~~

The first time I disassembled a program on my Debian GNU/Linux 2.2 I was
surprised. The stack seemed to work in a different way. Other people told
me that some of the latest Linux distributions do 'strange things' when
disassembling. The cause may be the gcc version or the binutils version;
this isn't clear. If you know, please e-mail me. My gcc version is 2.95.2.

Ok, let's play.


A Basic Program
~~~~~~~~~~~~~~~

Let's make a little program:

-----test.c-----------
#include <stdio.h>

process (char *name) {
   char *othername;
}

void main () {
   char *name;

   process (name);
}
---------------------

If I compile this stupid program on my old SuSE this is the result:

$ gcc -ggdb test.c -o test
$ gdb test
(gdb) disassemble main
Dump of assembler code for function main:
0x8048458 <main>:       pushl  %ebp
0x8048459 <main+1>:     movl   %esp,%ebp
0x804845b <main+3>:     subl   $0x4,%esp
0x804845e <main+6>:     movl   0xfffffffc(%ebp),%eax
0x8048461 <main+9>:     pushl  %eax
0x8048462 <main+10>:    call   0x8048440 <process>
0x8048467 <main+15>:    addl   $0x4,%esp
0x804846a <main+18>:    leave
0x804846b <main+19>:    ret
End of assembler dump.

Ok, let's review it line by line:

- First it pushes %ebp (the pointer to the beginning of a function's
  local variables) onto the stack.
- Then it copies the content of %esp into %ebp.
- It decreases %esp by four bytes (because the variable 'name' is a
  pointer, and a pointer is 4 bytes).
- It copies 4 bytes from %ebp-4 (0xfffffffc is -4) to %eax. This is the
  parameter we pass to the function, the 'name' variable.
- It pushes the %eax register onto the stack (this register is 4 bytes).
- It calls the process function.
- When the function returns it adds four bytes to %esp, so that %esp
  points at the 'name' variable again (the parameter that was passed is
  no longer needed).
- The function main leaves and returns.

If I disassemble the process function:

(gdb) disassemble process
Dump of assembler code for function process:
0x8048440 <process>:    pushl  %ebp
0x8048441 <process+1>:  movl   %esp,%ebp
0x8048443 <process+3>:  subl   $0x4,%esp
0x8048446 <process+6>:  leave
0x8048447 <process+7>:  ret
End of assembler dump.

This pushes %ebp onto the stack, moves the content of %esp to %ebp,
decreases %esp by four bytes because the 'othername' variable is a
pointer, leaves and returns to <main+15>. This is the classic behaviour.

But if I compile this program on my Debian 2.2 and then run it through
gdb, the result is different. Let's see:

seamus@apollo:~/desarrollo$ gdb test
GNU gdb 19990928
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xfffffffc(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    call   0x80483a4 <process>
0x80483be <main+18>:    add    $0x10,%esp
0x80483c1 <main+21>:    leave
0x80483c2 <main+22>:    ret
End of assembler dump.

Let's review it:

- It pushes %ebp onto the stack.
- It moves the content of %esp to %ebp.
- It decreases %esp by 24!!! (0x18 = 24). First difference.
- It adds -12 to %esp (0xfffffff4 is -12). Second difference.
- It copies the content of %ebp-4 to %eax.
- It pushes %eax onto the stack.
- It calls the process function.
- When the function returns it adds 16 to %esp instead of 4 as on my old
  SuSE. Third difference.
- The function leaves and returns.

If I disassemble the process function:

(gdb) disassemble process
Dump of assembler code for function process:
0x80483a4 <process>:    push   %ebp
0x80483a5 <process+1>:  mov    %esp,%ebp
0x80483a7 <process+3>:  sub    $0x18,%esp
0x80483aa <process+6>:  leave
0x80483ab <process+7>:  ret
End of assembler dump.

- It's very similar to the old SuSE; the only difference is that it
  decreases %esp by 24 instead of 4.

Ok, now it's time to try to understand this.


Playing with the number of variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's modify our program:

-------test.c--------
#include <stdio.h>

process (char *name) {
   char *othername;
}

void main () {
   char *name;
   char *name2;

   process (name);
}
---------------------

I have added the 'name2' variable to the main function. I compile and
disassemble:

seamus@apollo:~/desarrollo$ gdb test
(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xfffffffc(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    call   0x80483a4 <process>
0x80483be <main+18>:    add    $0x10,%esp
0x80483c1 <main+21>:    leave
0x80483c2 <main+22>:    ret
End of assembler dump.

The result is the same. It seems that it always decreases %esp by 24 to
reserve space on the stack for all the local variables. On my old SuSE,
%esp is decreased only by the number of bytes the local variables take;
here it is decreased by 24 regardless of whether one or two variables are
declared.

Now we change the program:

-------test.c-------
#include <stdio.h>

process (char *name) {
   char *othername;
}

void main () {
   char *name;
   char *name2;
   char *name3;
   char *name4;

   process (name);
}
--------------------

I disassemble...

(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xfffffffc(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    call   0x80483a4 <process>
0x80483be <main+18>:    add    $0x10,%esp
0x80483c1 <main+21>:    leave
0x80483c2 <main+22>:    ret
End of assembler dump.

The same with 4 variables. But if we put five local variables:

--------test.c---------
#include <stdio.h>

process (char *name) {
   char *othername;
}

void main () {
   char *name;
   char *name2;
   char *name3;
   char *name4;
   char *name5;

   process (name);
}
-----------------------

(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x28,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xfffffffc(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    call   0x80483a4 <process>
0x80483be <main+18>:    add    $0x10,%esp
0x80483c1 <main+21>:    leave
0x80483c2 <main+22>:    ret
End of assembler dump.

Surprise!!! Now it decreases %esp by $0x28 instead of $0x18. If we declare
9 variables it will decrease it by $0x38, with 13 variables by $0x48...
It changes every four variables. It decreases %esp by the same amount for
the first four declared variables, by 16 bytes more for the next four,
etc.

But what happens if we declare a variable only 1 byte long, for example
a char:

-----simpletest.c----
#include <stdio.h>

void main () {
   char c;
}
---------------------

(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     leave
0x80483b3 <main+7>:     ret
End of assembler dump.

It decreases %esp by 0x18 too. It makes sense.


Understanding other 'strange' things
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the moment to understand the new aspects. Let's return to our
first program:

--------test.c--------
#include <stdio.h>

process (char *name) {
   char *othername;
}

void main () {
   char* name;

   process (name);
}
----------------------

0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xfffffffc(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    call   0x80483a4 <process>
0x80483be <main+18>:    add    $0x10,%esp
0x80483c1 <main+21>:    leave
0x80483c2 <main+22>:    ret

Ok, let's see the line <main+6>: it adds -12 to %esp, and then <main+12>
pushes %eax (4 bytes) onto the stack. We have an offset of -16 at the
point where the parameter is passed. After the process function returns,
it adds $0x10 (16) to %esp, and %esp points again to where it was right
after the sub instruction, as on my old SuSE. All right.

Let's try another thing:

------------test.c---------
#include <stdio.h>

process (char *name, char *name2) {
   char *othername;
}

void main () {
   char *name;
   char *name2;

   process (name, name2);
}
----------------------------

(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x18,%esp
0x80483b2 <main+6>:     add    $0xfffffff8,%esp
0x80483b5 <main+9>:     mov    0xfffffff8(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    mov    0xfffffffc(%ebp),%eax
0x80483bc <main+16>:    push   %eax
0x80483bd <main+17>:    call   0x80483a4 <process>
0x80483c2 <main+22>:    add    $0x10,%esp
0x80483c5 <main+25>:    leave
0x80483c6 <main+26>:    ret
End of assembler dump.

It changes a little. Let's take a look at <main+6>. Now it only adds -8
to %esp, then it pushes the four bytes of the %eax register (the 'name2'
parameter is passed). I count -12. After this it copies the other
parameter ('name') to %eax and pushes %eax onto the stack again (another
4 bytes). I count -16. At <main+22> it adds 16 to %esp, which puts it
back where it was right after the sub instruction. It makes sense again.

And now one last approach:

------test.c------
#include <stdio.h>

process (char *name, char *name2, char *name3, char *name4, char *name5) {
   char *othername;
}

void main () {
   char *name;
   char *name2;
   char *name3;
   char *name4;
   char *name5;

   process (name, name2, name3, name4, name5);
}
--------------------

(gdb) disassemble main
Dump of assembler code for function main:
0x80483ac <main>:       push   %ebp
0x80483ad <main+1>:     mov    %esp,%ebp
0x80483af <main+3>:     sub    $0x28,%esp
0x80483b2 <main+6>:     add    $0xfffffff4,%esp
0x80483b5 <main+9>:     mov    0xffffffec(%ebp),%eax
0x80483b8 <main+12>:    push   %eax
0x80483b9 <main+13>:    mov    0xfffffff0(%ebp),%eax
0x80483bc <main+16>:    push   %eax
0x80483bd <main+17>:    mov    0xfffffff4(%ebp),%eax
0x80483c0 <main+20>:    push   %eax
0x80483c1 <main+21>:    mov    0xfffffff8(%ebp),%eax
0x80483c4 <main+24>:    push   %eax
0x80483c5 <main+25>:    mov    0xfffffffc(%ebp),%eax
0x80483c8 <main+28>:    push   %eax
0x80483c9 <main+29>:    call   0x80483a4 <process>
0x80483ce <main+34>:    add    $0x20,%esp
0x80483d1 <main+37>:    leave
0x80483d2 <main+38>:    ret
End of assembler dump.

At <main+6> it adds -12 to %esp, then it pushes %eax five times, which is
20 bytes (5x4). I count -32. At <main+34> it adds 32 to %esp. All right
again. I think it's very clear how it works.


Contact
~~~~~~~

If you have questions or new information about all this, please contact
me by e-mail at seamus@salix.org.

This paper is dedicated to Silvia Giner (My little Ironcita).

Madrid 07/03/2001.
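

Summary of the observed arithmetic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The numbers seen in the Debian dumps fit a simple rule, and the small C
sketch below reproduces them. It only restates the observations of this
paper and is not gcc's actual algorithm. The helper names (round_up_16,
frame_size, arg_padding, arg_cleanup) are made up for illustration. The
working assumption is that the compiler keeps %esp aligned to 16 bytes:
gcc 2.95 documents a -mpreferred-stack-boundary option whose default is
4 (2^4 = 16 bytes), which would be consistent with the extra 8 bytes in
every frame, since they pair with the return address and the saved %ebp
to make 16. Only the cases shown in this paper are checked; other cases
(no locals at all, or exactly four arguments) are not covered by the
dumps and may behave differently.

```c
#include <assert.h>
#include <stdio.h>

/* Round n up to the next multiple of 16. */
static unsigned round_up_16(unsigned n)
{
    return (n + 15u) & ~15u;
}

/* Value seen in "sub $N,%esp" for local_bytes (>= 1) bytes of locals.
 * Assumption: locals are rounded up to 16 bytes, plus 8 bytes that pair
 * with the return address and the saved %ebp. */
static unsigned frame_size(unsigned local_bytes)
{
    return round_up_16(local_bytes) + 8u;
}

/* Bytes subtracted from %esp (the "add $0xfffffff4,%esp" style line)
 * before pushing nargs 4-byte arguments. */
static unsigned arg_padding(unsigned nargs)
{
    return round_up_16(4u * nargs) - 4u * nargs;
}

/* Value seen in "add $N,%esp" after the call returns. */
static unsigned arg_cleanup(unsigned nargs)
{
    return 4u * nargs + arg_padding(nargs);
}

int main(void)
{
    unsigned n;

    /* Frame sizes reported in the paper: 1-4, 5, 9 and 13 pointers. */
    assert(frame_size(4u * 1u) == 0x18);
    assert(frame_size(4u * 4u) == 0x18);
    assert(frame_size(4u * 5u) == 0x28);
    assert(frame_size(4u * 9u) == 0x38);
    assert(frame_size(4u * 13u) == 0x48);
    assert(frame_size(1u) == 0x18);          /* a single char */

    /* Argument padding and clean-up reported for 1, 2 and 5 arguments. */
    assert(arg_padding(1u) == 12 && arg_cleanup(1u) == 0x10);
    assert(arg_padding(2u) == 8  && arg_cleanup(2u) == 0x10);
    assert(arg_padding(5u) == 12 && arg_cleanup(5u) == 0x20);

    for (n = 1; n <= 13; n += 4)
        printf("%2u pointer locals: sub $0x%x,%%esp\n", n, frame_size(4u * n));
    for (n = 1; n <= 5; n++)
        printf("%u args: pad %u bytes, add $0x%x,%%esp after the call\n",
               n, arg_padding(n), arg_cleanup(n));
    return 0;
}
```

Compile it with something like "gcc -o frames frames.c" (the file name is
arbitrary). If the asserts pass, the rule agrees with every dump shown in
this paper.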