[Tutorial] Reverse Engineering 101 - Part 2
So, in the last part we set up our sandbox environment and reverse engineered a simple hello world program. Neat, but things will never be this simple again. So, let's add a little complexity and reverse engineer a hello world program where we've written our own puts().

[Image: 3PRrMJT.png]

Now, we're going to do the same as in part one:
Code:
gcc -o hello hello.c
objdump -d ./hello | less

Jump straight to the function main, and you'll see the following:
[Image: ej43tdQ.png]

Now, if you remember the discussion of call frames in part one, you'll notice that the code for this one is slightly different. Specifically, we have a subtract instruction in it:
[Image: j74dqzr.png]
What is this for? From a reverse engineering standpoint, subtracting from rsp means local variables: main is reserving 16 bytes on the stack for them. How much does it actually need? As we'll see below, main keeps a single local, a pointer to our string, and a pointer on x86-64 is only 8 bytes. The string itself is never copied onto the stack; it lives in the read-only data section (.rodata), and only its address is stored in the frame.

So, where do the other 8 bytes come from?
The stack must ALWAYS be 16-byte aligned when you make a call
The x86-64 System V ABI requires rsp to be a multiple of 16 at every call instruction, and main calls myputs, so the compiler rounds the 8 bytes it needs up to 16. The extra bytes are only there for padding; consider them wasted. In the old days programmers used to do tricky things to avoid stack padding and squeeze a little more performance out of their programs.

Now we get to look at some more interesting things. The next bit of assembly may be confusing at first.
[Image: ZE85VEI.png]
If you see something in the asm that looks odd to you, please read this article about memory offset notation.
First, you can ignore the line at 0x400939 (a lone 00). It is almost certainly not padding or a separate instruction: objdump prints at most 7 bytes of an instruction per line, so the last byte of the long movq encoding (the top byte of its 32-bit immediate, 0x00400997) spills onto a line of its own. Passing --insn-width=10 to objdump keeps it on one line.
That leaves two move operations. Don't be tempted to add up their sizes and match the total against the string length; the size of an instruction tells you nothing about the data it handles. The first move stores the constant 0x400997 into the local variable at rbp-8, and the second most likely loads it back into rax (unoptimized gcc code reloads locals from the stack before each use). A constant like 0x400997 looks a lot like an address, so let's go ahead and check what lives there.
Yep, offset 997, the same value as in our first move operation: that's our string. The characters stay in .rodata; only their address is copied into that stack variable.
[Image: eliHyPL.png]
So, we can think of that as:
Code:
const char *str;
str = "Hello, World";
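To make the pointer-versus-characters distinction concrete, here's a small sketch using sizeof. It assumes an LP64 target such as x86-64 Linux, where pointers are 8 bytes, and it includes a hypothetical frame_size helper that applies the rounding rule described above (locals rounded up to a multiple of 16, as the x86-64 System V ABI requires at calls; real gcc frames also depend on saved registers and optimization level). All function names here are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: round the bytes needed for locals up to the next
 * multiple of 16, the rsp alignment the x86-64 System V ABI requires at
 * every call. Adding 15 and clearing the low 4 bits rounds up. */
size_t frame_size(size_t locals)
{
    return (locals + 15) & ~(size_t)15;
}

/* Pointer local: only the address is stored in the frame; the characters
 * stay in .rodata. */
size_t pointer_local_size(void)
{
    const char *str_ptr = "Hello, World";
    return sizeof str_ptr;          /* 8 on LP64 targets such as x86-64 Linux */
}

/* Array local: every character plus the terminating NUL is copied into
 * the frame. */
size_t array_local_size(void)
{
    char str_arr[] = "Hello, World";
    return sizeof str_arr;          /* 12 characters + 1 NUL = 13 */
}
```

On x86-64, frame_size(pointer_local_size()) gives 16, matching the sub instruction in main. Note that an array holding the same text (13 bytes) also rounds up to 16, so the frame size alone can't tell you which one you're looking at; the moves that fill the frame can.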

Now, let's move on to the next bit
[Image: xrD8Wcx.png]
You see here we are copying rax (variable str) into rdi, and then we call myputs.
Why rdi and not the stack? On x86-64 Linux, the System V AMD64 calling convention passes the first six integer or pointer arguments in registers (rdi, rsi, rdx, rcx, r8, r9, in that order) and only falls back to the stack after that; pushing every argument on the stack was the old 32-bit convention. str is the first argument to myputs, so it goes in rdi (R = 64-bit, DI = destination index). Nothing about this is specific to strings. Before we go into the next function, let's finish up main.
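If you'd like to see the register assignment for yourself, compile a function with more than six integer arguments without optimization (for example gcc -O0 -c args.c; the file name is just an example) and disassemble it with objdump -d. Assuming the System V AMD64 convention used on Linux (Windows x64 uses a different one), you should see a through f arrive in rdi, rsi, rdx, rcx, r8 and r9, and g read from the stack. sum7 is a made-up function for illustration.

```c
/* Hypothetical example. Under the System V AMD64 calling convention:
 *   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9,
 *   g -> passed on the stack, since the six argument registers are used up. */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}
```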

[Image: 10acLo1.png]
You see, we zero eax (our return value), but then we see a new instruction: leaveq. It's shorthand for mov %rbp,%rsp followed by pop %rbp, so it undoes both the sub and the push from the prologue, destroying the frame so that we can return to our parent function. When a function never moves rsp, a single pop %rbp is enough to restore the frame, so the compiler just writes that out instead.
After that we simply return from main back into libc, which hands our return value back to the OS.



Now, at this point we have the following:
Code:
int main()
{
    const char *str;
    str = "Hello, World";
    myputs(str);
    return 0;
}

So, we need to do a little tracing. Let's find out where myputs is located to see if we need to reverse it.
[Image: uFagwmI.png]
Using this command we can see which symbols are defined in this executable. We only need to reverse symbols we can't get source code for somewhere else (or link against), so we pay attention to the ones that belong to our binary. We are also only interested in those with a capital T next to them: that marks a global symbol in the text section, which is executable code (a lowercase t would be a local one, such as a static function). Here we see that myputs is on the list, so let's reverse it.
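To see both letters side by side, you could compile a small file like the one below with gcc -O0 -c symbols.c (file and function names are hypothetical) and run nm symbols.o. With GNU nm you should see T next to visible_fn, a global function, and t next to hidden_fn, which is static and therefore local to the file. With optimization turned on, the compiler may inline hidden_fn and drop its symbol entirely.

```c
/* static: local to this file, so nm should list it with a lowercase 't' */
static int hidden_fn(int x)
{
    return x * 2;
}

/* global: visible to the linker, so nm should list it with a capital 'T' */
int visible_fn(int x)
{
    return hidden_fn(x) + 1;
}
```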

[Image: iS74P9S.png]
And hello! There's a good amount of code for us to look at!

The first thing you'll want to do is trim away the frame setup and teardown (the prologue and epilogue), and you might as well remove the return too. What's left is the actual meat of the function:
[Image: Mz439fM.png]
This one gets a little tricky, so I'm going to use some pseudocode to keep things clearer, and we'll convert it to C later on.

[Image: XSqskVz.png]
This is the tricky part: the code doesn't run in the order it's listed. The first thing it does is jump to 0x40091c, so let's start our pseudocode with that jump and then go there.
Code:
goto 91c

91c:
[Image: ieieNll.png]
Now, we're going to call this a gadget (it isn't one in the ROP sense, but it's an easy name). It's just a logical block of code that's going to turn out to be a small expression.
Line 1: move the value at (rbp - 8), our str pointer, into rax
Line 2: load the byte at the address stored in rax into eax with sign extension (the char is promoted to int)
Line 3: set the Z (zero) flag if al, the low byte of eax, is 0 (Z = (al == 0))
Line 4: if (!Z) goto 8ca, i.e. jump back to 8ca while the character is not 0
It will be hard to follow, so read over it a few times before proceeding. Do some research and make sure you think it through!
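movsbl is what you get when a char is widened to int on a target where plain char is signed, as it is for gcc on x86-64; movzbl is the unsigned counterpart. The sketch below makes the signedness explicit so it doesn't depend on that default. The function names are hypothetical, and converting a byte above 0x7F to signed char is implementation-defined in C (gcc wraps it around).

```c
/* Widen one raw byte to int the way movsbl does: bit 7 is copied into
 * bits 8-31, so bytes 0x80-0xFF become negative ints. */
int widen_signed(unsigned char raw)
{
    signed char c = (signed char)raw;   /* reinterpret the byte as signed (gcc wraps) */
    return c;                           /* promotion to int sign-extends */
}

/* Widen the same byte the way movzbl does: the upper bits are filled with zeros. */
int widen_unsigned(unsigned char raw)
{
    return raw;
}
```

For an ASCII character such as 'H' (0x48) both give 72, which is why our loop doesn't care about the difference; only bytes from 0x80 up disagree, e.g. 0xE9 becomes -23 one way and 233 the other.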
Code:
goto 91c

91c:
           chr = *str
           if (chr != 0)
                      goto 8ca

So now, let's go trace 8ca
[Image: kEtYCnE.png]
So, this should look very familiar: it's essentially the same code. The reason the first version looked different is that the loop contains an increment, so we have to work backwards to find it (in the real world we wouldn't know that yet); at the top of the loop there is no increment. And yes, testing the character again here is a double check.
Code:
goto 91c
8ca:
           if (chr == 0)
                      goto 8f9
91c:
           oldchr = *(str)
           str = str + 1
           chr = *(str)
           if (oldchr != 0)
                      goto 8ca

Alright, another trace...yay
[Image: gATMkNp.png]
I left in part of 91c for reference, but let's break this guy down if we can.

[Image: vhyr3lw.png]
You'll learn to recognize these eventually and won't need to write them out as pseudocode anymore (so I won't here). What's going on here is the assignment of two variables. The first one is for the call (we don't care about it), the second is our character.
Code:
goto 91c
8ca:
           if (chr == 0)
                      goto 8f9
8f9:
           //variable assignment we will ignore (you'll see why)
91c:
           oldchr = *(str)
           str = str + 1
           chr = *(str)
           if (oldchr != 0)
                      goto 8ca
So, let's move on for a moment
[Image: OVrz5tW.png]
Hey, that looks familiar! That variable we just assigned? It gets widened to 32 bits and moved into edi (the destination index register, which carries the first argument), then putchar is called.
Code:
goto 91c
8ca:
           if (chr == 0)
                      goto 8f9
8f9:
           putchar(chr)
91c:
           oldchr = *(str)
           str = str + 1
           chr = *(str)
           if (oldchr != 0)
                      goto 8ca

And now let's go back and trace the other side of the branch
[Image: gioPBsN.png]
Notice anything different? Not really: it's the same setup, just a different function being called. For the purposes of our pseudocode we're going to treat the two calls as the same thing. Let's look at that branch, though.
It's an unconditional branch to 91c, so let's reflect that in our pseudocode:
Code:
goto 91c
8ca:
           if (chr == 0)
                      goto 8f9
           fputch(stdout, chr)
           goto 91c
8f9:
           fputch(stdout, chr)
91c:
           oldchr = *(str)
           str = str + 1
           chr = *(str)
           if (oldchr != 0)
                      goto 8ca

Alright, all done. Now, let's go ahead and clean up our pseudo code a little
Code:
goto 91c
8ca:
           fputch(stdout, chr)
           str = str + 1
91c:
           chr = *(str)
           if (chr != 0)
                      goto 8ca

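Alright, now this is making some sense, so let's convert it to C. Note that the compiler lays a while loop out "backwards": it jumps straight to the test, which sits at the bottom, and branches back up to the body for as long as the test holds. Bridge this gap in your head.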
Code:
char chr = *str;
while (chr)
{
    fputc(chr, stdout);   /* "fputch" in the pseudocode; fputc(c, stream) is the real libc call */
    str++;
    chr = *str;
}
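If the "backwards" layout still feels odd, you can check that the two shapes behave the same. This is a sketch with hypothetical names; to make it testable, each version copies characters into a caller-supplied buffer (assumed large enough) instead of printing them. The labels in the goto version mirror the addresses from our pseudocode.

```c
#include <stddef.h>

/* Hypothetical helpers: both copy str into buf (assumed large enough)
 * instead of printing, so the two loop shapes can be compared.
 * Each returns the number of characters copied. */

/* The layout the compiler emits: jump to the test first, test at the bottom. */
size_t copy_goto(const char *str, char *buf)
{
    size_t n = 0;
    char chr = 0;

    goto loop_test;             /* "goto 91c" */
loop_body:                      /* "8ca" */
    buf[n++] = chr;             /* stands in for fputch(stdout, chr) */
    str = str + 1;
loop_test:                      /* "91c" */
    chr = *str;
    if (chr != 0)
        goto loop_body;

    buf[n] = '\0';
    return n;
}

/* The same loop the way you'd write it in C. */
size_t copy_while(const char *str, char *buf)
{
    size_t n = 0;

    while (*str)
        buf[n++] = *str++;

    buf[n] = '\0';
    return n;
}
```

Both return 12 for "Hello, World" and 0 for an empty string; the only difference is where the test sits.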

And we can clean that up by hand into the following (replace every use of chr with its str equivalent):
Code:
while (*str)
    fputc(*str++, stdout);
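The cleanup works because *str++ parses as *(str++): the character at the old address is read, and the pointer is advanced as a side effect. Here's a small sketch (next_char is a hypothetical name) that spells out that single step with the pointer passed by address.

```c
/* Read the current character and advance the caller's pointer,
 * which is exactly what *str++ does in one expression. */
char next_char(const char **str)
{
    return *(*str)++;   /* same as: char c = **str; *str = *str + 1; return c; */
}
```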

So, our function becomes
Code:
void myputs(const char *str)
{
    while (*str)
        fputc(*str++, stdout);   /* write the current char, then advance */
}



Now that we've traced all of our functions, let's assemble the whole program:

Code:
#include <stdio.h>

void myputs(const char *str)
{
    while (*str)
        fputc(*str++, stdout);   /* write the current char, then advance */
}

int main()
{
    const char *str;
    str = "Hello, World";
    myputs(str);
    return 0;
}

I'd say that came out pretty well. Hope you enjoyed!



Please give me feedback!
I need to know which points you understand well and which you're shaky on; otherwise, once we start getting deeper, you WILL get lost.

(10-12-2016, 05:42 AM) Slacker wrote: Posting for future reading please also tag me in new ones as you create.

Consider yourself tagged
(This post was last modified: 10-13-2016, 04:16 AM by phyrrus9.)
