Introduction to Assembly for Linux (Intel 32 and 64 bit)

Deque · 07-03-2013, 12:19 PM

I know that I am late with this and I still haven't included everything I wanted to, but here is the first part. I will include more parts once they are done.
Note: I tested the code in a 64 bit system. If there is someone using 32 bit and it doesn't work for some reason, tell me please.


"If you can’t do it in Fortran, do it in assembly language. If you can’t do it in assembly language, it isn’t worth doing." (Ed Post)

Since: late 1940s
Paradigms: imperative

Advantages:

useful if code needs direct interaction with hardware
useful if program needs extreme optimization
useful if program needs precise timing
useful if processor-specific instructions are needed, which are not implemented by a compiler
useful for: reverse-engineering, device drivers, interrupt handlers, computer viruses, bootloaders, operating system programming
useful to learn how the CPU works

Disadvantages:

assembly is human-readable machine code; it is not advisable for general-purpose programs
assembly is hard to read, hard to write and hard to debug (compared to high level languages)
it takes much longer to write a program in assembly than in a high level language

The very first programs were written in machine language. Machine language is a sequence of bytes that is stored in memory, fetched, interpreted and executed by the computer. Writing programs in machine language is very tedious. You need to know all the addresses of the data and the addresses of program branches, which depend on where the instructions are loaded into memory. Once you modify the data (adding, deleting), these addresses might change. This is why people introduced symbolic names and mnemonics to represent instructions and data (example: MOV). They are easier to remember and reduce or remove the need to calculate addresses.
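To make the difference between a mnemonic and machine language concrete, here is a small sketch for the terminal. It assumes the usual 32-bit encoding of the instruction mov eax, 1 (opcode B8 followed by the constant as four little-endian bytes) and simply dumps those five bytes; the variable name bytes is only for illustration.

```shell
# Machine code for `mov eax, 1` in 32-bit mode: opcode B8, then the
# 32-bit constant 1 in little-endian byte order (01 00 00 00).
# printf takes the bytes as octal escapes (hex B8 is octal 270).
bytes=$(printf '\270\001\000\000\000' | od -An -tx1 | tr -d ' \n')
echo "$bytes"    # prints b801000000
```

Remembering "mov eax, 1" is a lot easier than remembering b8 01 00 00 00, and that is all a mnemonic really is.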

So assembly (also abbreviated ASM) is only one step away from the actual machine language. Thus the resulting code is only usable for the specific architecture and operating system it was written for. So why would you want to learn or use assembly?

Look at the advantages mentioned above. Assembly is so close to the metal that you are able to optimize a lot and create programs with very good performance. The big BUT: Most people aren't skilled enough to outperform, for example, C programs that were optimized and translated into machine language by a compiler. It is very hard to do so, and it is not advisable to use assembly for general-purpose applications. But in some cases assembly is still used today.

Knowledge in assembly is necessary if you want to dive into fields like reverse-engineering, because binaries sometimes can't be decompiled into a high-level language.

Assembly and C work well together. Assembly is able to call C library functions and C is able to embed assembly code directly. Operating systems are often written in both, assembly and C.

Prerequisites:

You should understand:

binaries
hexadecimal
basic Linux terminal commands

Your first assembly program:

In order to write an assembly program and create an executable out of it you will need:

text editor
assembler
linker

Open a text editor of your choice (e.g. Nano, Vim, Emacs, Gedit, Notepad, ...) and write the following program. Save it as exit.nasm. Further down I will explain the code in detail, but for now we will just create an executable program.

Code:
;Name: exit.nasm
;Purpose: Executes the exit system call
;Input: None
;Output: The exit status ($?)

    segment .text
    global _start

_start:
    mov eax, 1  ;1 is exit syscall number
    mov ebx, 5  ;the status value to return
    int 80h     ;execute system call

We will use the NASM assembler. Install it, e.g. in Ubuntu:

Code:
sudo apt-get install nasm

To assemble the file, go to the folder where you saved exit.nasm and type in your terminal:

Code:
nasm -f elf exit.nasm

Afterwards you call the linker on your object file:

Code:
ld -m elf_i386 exit.o -o exit

This will create the executable exit in the folder you are in and you can run it via

Code:
./exit

To display the exit status returned by your program type:

Code:
echo $?

If everything went fine, a 5 is displayed.
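If you want to see how $? behaves without building anything, here is a small sketch. A child shell stands in for ./exit (that substitution is my assumption, it is not the assembly program itself) and terminates with status 5; the variable name status is only for illustration.

```shell
# A child shell stands in for ./exit here: it terminates with status 5.
status=0
sh -c 'exit 5' || status=$?   # $? holds the exit status of the last command
echo "$status"                # prints 5
```

Any status other than 0 counts as a failure for the shell, which is why the hello world program later returns 0.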

Explanation:

Ok, that's great, but what did you actually do there?
First you assembled the file: the assembler took your assembly code, translated the mnemonics into opcodes and resolved symbolic names. The result is the object file exit.o, which contains the so-called object code: machine code together with the symbol information the linker needs.

The linker, which you called with the command ld, takes one or more object files and possibly libraries and creates one executable out of them.
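The two steps can also be collected in one script. This is only a sketch: the temporary folder and the variable name workdir are made up for illustration, and the build is skipped if nasm or ld is not installed. The nasm and ld options are the ones used above, plus -o to choose the output file.

```shell
# Hypothetical helper script; folder and variable names are illustrative.
workdir=$(mktemp -d)
cat > "$workdir/exit.nasm" <<'EOF'
    segment .text
    global _start
_start:
    mov eax, 1  ;1 is exit syscall number
    mov ebx, 5  ;the status value to return
    int 80h     ;execute system call
EOF

if command -v nasm >/dev/null 2>&1 && command -v ld >/dev/null 2>&1; then
    # Step 1: assemble exit.nasm into the object file exit.o
    # Step 2: link exit.o into the executable exit
    if nasm -f elf "$workdir/exit.nasm" -o "$workdir/exit.o" &&
       ld -m elf_i386 "$workdir/exit.o" -o "$workdir/exit"; then
        ls -l "$workdir"    # shows exit.nasm, exit.o and exit
    fi
else
    echo "nasm or ld not found; skipping the build"
fi
```

After a successful build you can run "$workdir/exit" and check echo $? as before.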

Now let's have a look at the program you wrote there.

Code:
;Name:      exit.nasm
;Purpose:   Executes the exit system call
;Input:     None
;Output:    The exit status

    segment .text
    global _start

_start:
    mov eax, 1  ;1 is exit syscall number
    mov ebx, 5  ;the status value to return
    int 80h     ;execute system call

The semicolon is used to write comments into your code. It is usual to comment almost every instruction, because assembly is hard to read without them.

Lines in an assembly file contain either machine instructions, assembler directives or macros.

segment is a directive for the assembler (NASM also accepts section as a synonym, which is used further down). The .text segment is where the program instructions are put.

global _start tells the assembler to make the label _start known for the linker. The _start function is an entry point for your program, you can compare this to the main function in a C program (actually the main function of a C program is called in the _start function of the C library).

_start is a label.

The instruction

Code:
mov eax, 1

moves the constant 1 into a register called EAX (you will learn more about registers later). MOV is a mnemonic that stands for move.

Code:
mov ebx, 5

Here the constant 5 is moved into the register EBX.

Code:
int 80h

The mnemonic INT stands for interrupt. Interrupt 80h is how a 32-bit Linux program makes a system call: it tells the kernel to look at the values you set in the registers and act on them.

I wrote the meaning of the numbers and instructions right behind in comments, but how do you get to know them?

You need a system call reference. For example this one:
http://docs.cs.up.ac.za/programming/asm/...calls.html

Now have a look at the very first function sys_exit in the reference table. You can see that in order to call sys_exit, you have to put the constant 1 into EAX and an integer for the return code into EBX. This is exactly what we did here with the MOV instructions.
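If the reference is not at hand, the numbers can usually be found on your own machine as well. The sketch below assumes that the kernel headers are installed and that a file named unistd_32.h exists somewhere under /usr/include (the exact folder differs between distributions); lookup_syscall is a made-up helper name.

```shell
# Print the #define line for a 32-bit system call number, if a kernel
# header named unistd_32.h is installed somewhere under /usr/include.
# The header location differs between distributions (assumption).
lookup_syscall() {
    header=$(find /usr/include -name unistd_32.h 2>/dev/null | head -n 1)
    if [ -n "$header" ]; then
        grep -w "__NR_$1" "$header" || echo "$1 not listed in $header" >&2
    else
        echo "unistd_32.h not found" >&2
    fi
}

lookup_syscall exit    # typically: #define __NR_exit 1
lookup_syscall write   # typically: #define __NR_write 4
```

The number behind __NR_exit is the value that goes into EAX.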

Congratulations, that was your first assembly program. We will move on to write a hello world program in the next part.

Hello World Program:

This is part 2 of the assembly introduction, where we move on to writing a hello world program.
We want to print a string to the standard output, so we need to define the string. This is done in the .data section with the db directive, which declares an array of bytes. db stands for declare bytes (you can also declare data with dw -> declare words). Every character of the string takes a single byte. The whole array has the name msg.

Code:
msg db 'Hello World!'

In addition we need to add the newline character. Have a look at this table: http://www.bobborst.com/tools/ascii-codes/
There you see that the linefeed is 0AH (the trailing H marks this as a hex value; 0A is 10 in decimal).
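You can check such conversions quickly in the terminal. A minimal sketch using printf, which accepts C-style hex constants like 0x0A; the variable names dec and hex are only for illustration:

```shell
dec=$(printf '%d' 0x0A)   # hex 0A -> decimal
hex=$(printf '%X' 10)     # decimal 10 -> hex
echo "$dec $hex"          # prints 10 A
```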

This is what we get:

Code:
section .data

msg db 'Hello World!', 0AH

For the body of the program (the .text section) we will keep the exit system call, but change the return code to 0 (for success).

The only thing missing is the system call that writes our string to standard output. Let's look at the system call table again and search for a write function. You will see that sys_write does the job, but it might not be clear what the parameters are for.

Here is a complete reference of the system calls: http://asm.sourceforge.net/syscall.html
A description of sys_write can be found here: http://man7.org/linux/man-pages/man2/write.2.html

Looking into the system call table you will discover that sys_write is called by moving 4 into EAX. According to the description, EBX takes a file descriptor. What is a file descriptor? Have a look at this little table: https://en.wikipedia.org/wiki/File_descriptor

There you see:

Code:
0  stdin
1  stdout
2  stderr

So we need to pass 1 as the file descriptor, since we want to write to standard output (stdout).
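The same numbers show up in the shell, which may help to get a feeling for them. In this sketch one line is written to file descriptor 1 and one to file descriptor 2, and only what arrives on stdout is captured; the variable name captured is only for illustration.

```shell
# Write one line to fd 1 (stdout) and one to fd 2 (stderr), then
# capture only what arrived on stdout; stderr is thrown away.
captured=$( { echo "to stdout" >&1; echo "to stderr" >&2; } 2>/dev/null )
echo "$captured"    # prints: to stdout
```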
This is what we have so far, but there are still two parameters missing:

Code:
mov eax, 4
mov ebx, 1
int 80h

Now we have to put the address of our message into ECX and the length of the message into EDX.

Code:
mov ecx, msg
mov edx, len

How do we get the length?
We create another entry in the .data section called len and compute the length of msg there.

Code:
len equ $-msg

The $ sign means "address of here", which at this point is the byte right after the msg string.
msg is the starting address of our string. So by subtracting the start address from the end address we get the length of our string.

equ just creates a symbol whose value is the expression. The result of the expression has to be a constant value.
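To double-check the value the assembler computes for len, you can count the bytes yourself. This sketch assumes the message of the final program, 'Hello, World!' followed by one linefeed; the shell variable len just mirrors the symbol name.

```shell
# Count the bytes of the message including the trailing linefeed.
# The arithmetic expansion strips any padding that wc may print.
len=$(( $(printf 'Hello, World!\n' | wc -c) ))
echo "$len"    # prints 14
```

13 characters plus the linefeed give 14, the same number that $-msg evaluates to for that string.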

Now we can complete our system call to sys_write:

Code:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, len
int 80h

The whole hello world program looks like this:

Code:
;File: helloworld.nasm
;Purpose: prints Hello, World!
;
;nasm -f elf helloworld.nasm
;ld -m elf_i386 helloworld.o -o helloworld
;
;or
;
;nasm -f elf64 helloworld.nasm
;ld -m elf_x86_64 helloworld.o -o helloworld

section .data
    msg db 'Hello, World!', 0AH ;define string
    len equ $-msg               ;compute length of msg

section .text
global  _start

_start:
    mov eax, 4                  ;sys_write
    mov ebx, 1                  ;file descriptor for stdout
    mov ecx, msg                ;pass address of string
    mov edx, len                ;pass string length
    int 80h                     ;execute system call
    mov eax, 1                  ;sys_exit
    mov ebx, 0                  ;success return code
    int 80h                     ;execute system call

References: Introduction to 64 Bit Intel Assembly Language Programming for Linux - Ray Seyfarth (uses yasm)

Psycho_Coder · 07-03-2013, 03:30 PM

Thank you mam for this thread I loved it. But I have a slight confusion for now as I think "Save it as exit.asm" Should it be .asm or .nasm as while executing it on terminal we write exit.nasm

EDIT1 : We should save it as exit.nasm. So mam if you can will you please change that little part ?
Another request, Can you give another simple example of a system call like sys_exit as in the documentation ?

EDIT2 : i have collected some good links so I will post it here if someone else is interested :

http://cs.lmu.edu/~ray/notes/x86assembly/
http://docs.cs.up.ac.za/programming/asm/.../#terminal
http://sourceware.org/binutils/docs-2.23...index.html
http://asm.sourceforge.net/resources.html#tutorials
http://asm.sourceforge.net/howto.html
http://www.tldp.org/LDP/lki/
http://asm.sourceforge.net/syscall.html#p4
http://www.lxhp.in-berlin.de/lhpsyscal.html#intro
http://wotsit.org/list.asp?fc=5
https://www.cs.tcd.ie/~waldroj/itral/cahome.html
http://www.leto.net/writing/nasm.php

EDIT3: I have understood the syscalls so you need not to give any example. Also I have done the "Hello, World". Thanks for everything!

Thank you,
Sincerely,
Psycho_Coder

Deque · 07-03-2013, 04:56 PM

(07-03-2013, 03:30 PM)Psycho_Coder Wrote: Thank you mam for this thread I loved it. But I have a slight confusion for now as I think "Save it as exit.asm" Should it be .asm or .nasm as while executing it on terminal we write exit.nasm

EDIT1 : We should save it as exit.nasm. So mam if you can will you please change that little part ?
Another request, Can you give another simple example of a system call like sys_exit as in the documentation ?

Thanks for pointing that out. I corrected the mistake.
I will extend my tutorial at least to the hello world part, explaining the .data segment and more. Then there is also another system call I will explain.

diana32 · 07-03-2013, 07:29 PM

(07-03-2013, 03:30 PM)Psycho_Coder Wrote: ....................................................................................

EDIT3: I have understood the syscalls so you need not to give any example. Also I have done the "Hello, World". Thanks for everything!

Thank you,
Sincerely,
Psycho_Coder

You are a lucky guy ,but i think that if you already know this thing you could closed an eyes ,and you can notice Deque on the pm , you could work toghether for complish a good quality thread.This programming language is not easy, but i will try to learn all the things that are inside, becouse i'm interest and becouse i don't have a university prepair-i do all by myself,so for the next time try to consultate whith Deque becouse if she has start a thread like this then he try to give to the mortal people a light in the end of the tunnel. godbye world. :bye:

i forgot to thank you for all the link-the first is dead but the others works fine .

Psycho_Coder · 07-03-2013, 08:10 PM

(07-03-2013, 07:29 PM)diana32 Wrote:
(07-03-2013, 03:30 PM)Psycho_Coder Wrote: ....................................................................................

EDIT3: I have understood the syscalls so you need not to give any example. Also I have done the "Hello, World". Thanks for everything!

Thank you,
Sincerely,
Psycho_Coder
You are a lucky guy ,but i think that if you already know this thing you could closed an eyes ,and you can notice Deque on the pm , you could work toghether for complish a good quality thread.This programming language is not easy, but i will try to learn all the things that are inside, becouse i'm interest and becouse i don't have a university prepair-i do all by myself,so for the next time try to consultate whith Deque becouse if she has start a thread like this then he try to give to the mortal people a light in the end of the tunnel. godbye world. :bye:

i forgot to thank you for all the link-the first is dead but the others works fine .

Sorry but I actually didn't understood what you said except two things :-

1. "you could work toghether for complish a good quality thread" ----> Making some thread with Mam (Deque), you must be joking as she a million times more knowledgeable than me :lol:

2. The first link works fine for me here have a look :-


Will you explain Why I am Lucky ? (Reason : Every morning and all the time I think of myself as a very unlucky guy from every aspect of life)

Are you trying to say that why I told her not to give another example ? Is it so ?
I am sorry for that!

diana32 · 07-03-2013, 08:20 PM

you are lucky for knowing what you know-and here we talk only for assembler.For the first link maybe is my browser,and since i see that correct Deque i think that you know well then she the assembler.There is no sutterfuge and don't get me wrong, i only comment on what i read and for what i understend.

Psycho_Coder · 07-03-2013, 08:34 PM

(07-03-2013, 08:20 PM)diana32 Wrote: you are lucky for knowing what you know-and here we talk only for assembler.For the first link maybe is my browser,and since i see that correct Deque i think that you know well then she the assembler.There is no sutterfuge and don't get me wrong, i only comment on what i read and for what i understend.

No problem dear :wub:, that was a simple typing mistake and nothing else. I made many errors like this in my tutorials and later I have fixed them. Its usually hard to make a tutorial complete error free. I am a student and I am still learning.

For that link try use another browser or you can download it from here :- http://www.mediafire.com/download/kahcrl...x86.tar.gz I have uploaded it there.

Thank you,
Sincerely,
Psycho_Coder

diana32 · 07-03-2013, 08:45 PM

I find good a thread whith error-and you know why? becouse make's you judge when you do the things.Is simply to copy and paste the code but where is the fun ?-after two ,three errors the neurons start to collegate each other and only then you learn.

offocurse -thread for learning and not for asking request or when you must compile 1000 line code

Legolas · 07-17-2013, 10:49 PM

Nice Introduction @Deque !

I always love ASM and Delphi.
Some guys think that Asm is very difficult.
Personaly,i think that C is more difficult than ASM.

Thank for this Deque. :-)

The Alchemist · 07-18-2013, 02:46 AM

This was another masterpiece made by the one and only Deque. :)

Simply brilliant.