ARM [Part 2: Writing basic programs] - phyrrus9 - 03-13-2018

Ok, it's been a long while since I started this series, and it's time for another round. This time we're going to go over your first ARM programs, using just basic instructions. This practice will form the foundation of every program you will ever write for ARM. For the most part, you're only going to be using a couple of instructions, and we'll discuss how to optimize them in a later part. For now, let's just go over what they are.

As I discussed in Part 1 of this series, ARM, like all RISC processors, uses a load-store architecture. This is a fundamental change to how you write assembly code if you have any experience with x86, Z80, or a similar architecture. What that means is that all of the instructions that do work can ONLY do work on register arguments. For example, something like the following would be valid for x86:

Code:
ADD EAX, [ESP]

This adds the value held at the top of the stack to EAX and places the result in EAX. This is invalid for ARM: our add instruction is not able to interface with memory. So that brings us to the first fundamental difference.

Loading and Storing

Since ARM can't interface directly with memory in working instructions, we need a way to get data from memory and put it back. We do that using the LDR and STR instructions. They're pretty simple, and they're among the very few ARM instructions that take 2 operands. Remember that ARM normally takes 3 operands: the destination, the source, and op-n.

LDR - the LDR instruction (short for "load register") will retrieve a value from memory and place it in the destination register. The memory address goes in square brackets. To accomplish the same task as we did above, we'd do the following:

Code:
LDR R1, [R13] ; remember, R13 is the stack pointer. You can also use SP in some assemblers

STR - the STR instruction is the reverse of LDR, and it's short for "store register". It writes the value of a register out to memory. This is the second half of the load-store architecture.
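To make the load/work/store pattern concrete, here is a minimal sketch of how the x86 `ADD EAX, [ESP]` idea might look on 32-bit ARM. It assumes GNU assembler syntax, where `@` starts a comment (the listings in this thread use `;`), and it assumes R0 plays the role of EAX. The final STR is not part of the x86 one-liner; it is only there to show the store half.

```asm
@ Sketch only: GNU as syntax for 32-bit ARM, '@' starts a comment.
@ Assumption: R0 stands in for EAX, and the value we want is at the top of the stack.
        LDR     R1, [SP]        @ load the word at the top of the stack into R1
        ADD     R0, R0, R1      @ R0 = R0 + R1 (the ADD itself only touches registers)
        STR     R0, [SP]        @ optional: write the sum back to the top of the stack
```

The point of the sketch is that memory is only ever touched by LDR and STR; the ADD in the middle sees nothing but registers.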
Let's look at the following x86 program:

Code:
ADD EAX, [ESP]

We can use our LDR and STR instructions now to make the same snippet for ARM:

Code:
LDR R1, [R13]

Now, let's talk about another fundamental difference with ARM:

Preserving arithmetic values

Since ARM is a 3-operand system, we can actually preserve the value of R0/EAX when we do our addition. To do this for x86, we would need to do a bunch of crazy stack operations:

Code:
PUSH EAX

Code:
LDR R1, [R13]

Now, some of you might have been wondering why I was using R13 instead of SP. That's because I wanted to plant the thought in your mind that aliases exist for registers. This brings us to another key difference between RISC and x86 (CISC).

Lots of registers

ARM has 15 general purpose registers. This means that you can put whatever you want in them, and your program will still run (for the most part). Our x86 counterpart only has 4. This means that for x86, you will need to put a lot more data into RAM when you don't need it this second, and you waste all of that time pushing and popping it as you work.

Each of these registers is 32 bits long. For the sake of consistency I'm basing this series on 32-bit ARM, since it's more common, and on x86 rather than AMD64. So, that means with x86, you have a total of 128 bits (4 x 32), or 16 bytes, of register storage on the chip. That's 16 bytes of data you are limited to at any given time without having to wait for memory to get more data. In ARM, since you have 15 general purpose registers, you have 480 bits (15 x 32), or 60 bytes, of data you can work with. That's 275% MORE data! This means less time waiting for memory, and less memory used for your programs. Remember that the stack is a place in memory.

Of course, it's not advised for you to use all 15 of these registers, though you can. At any given time, you should only modify 14 of them, and I'll tell you why.
With x86, you have those 4 registers (EAX, EBX, ECX, EDX) that you can play with, and they will work with the majority of the instructions. You can put data into ESI, EDI, and EBP if you want to, but they won't actually work with all of the instructions, making them only useful for temporarily storing something rather than pushing it (though you still need to move it back and forth, which costs CPU time).

Here in ARM-land, we don't have that limitation. We can use 15 of our 16 total registers with any instruction we like. We can use the 16th register with at least half of the instructions. However, there is a catch. ARM treats all registers (except R15) as user registers. This means that all of those registers can operate on data, but they might have other purposes. Exactly 3 of the 16 registers have a primary purpose other than storing computations. These are exposed through register aliases. Here's what they are:

R13 => SP (the stack pointer)
R14 => LR (the link register, we'll get to this one shortly)
R15 => PC (the program counter, aka the instruction pointer)

So, like I said, you can put data in all of these, with the exception of R15 (if you put a computation in that, bad things will happen). But you really don't ever want to overwrite the stack pointer, so you'd want to leave that one alone.

This direct access to these registers makes it very easy to do some awesome hacks with your code. For example, you can write your program in such a way that when you push the program counter out of (4-byte) alignment by one byte, the program changes to another valid program that you want. This means that you can write a program that only uses 1/4 of the memory it might normally need! You would just increment R15 by one and jump back to the start each time, and it would run over the same bytes, but in a slightly different order each time.

I do want to note that R0 - R12 have no other primary purpose. At any time, you can modify these to whatever you feel like.
This makes up 52 bytes (13 registers x 4 bytes) of truly free storage on the CPU die. Now, I mentioned something in the last section about the link register, so let's talk about that for a bit:

The link register (R14)

With x86, when you call a subroutine, the CPU pushes the address of the next instruction onto the stack for you. This takes up time, and it also means that you have to have very careful control over your stack or your program will break (and it leaves room for attackers). However, it does provide you with the means to "return" (RET) out of your subroutine, and you'd store your return value in EAX.

ARM is a little bit different. The ABI for ARM not only doesn't store this value on the stack (saving a little memory and some time), but actually allows you TWO return registers (R0 and R1). The second key bit is that because ARM has lots of registers, you don't need to push your arguments onto the stack either. You supply them in registers, saving you even more time and memory. The standard calling convention (AAPCS) uses R0 - R3 for the first four arguments; in your own hand-written routines nothing stops you from using more (up to 12 arguments).

So, how does ARM know where to return to? Short answer: it doesn't, the programmer is in charge of that. ARM doesn't actually have a call and return setup; it uses a series of branches. In its base form, a branch is identical to a JMP in x86. However, there are 2 special forms of this: the branch with exchange (BX) and the branch with link (BL). It's the second one we're interested in at the moment. BL puts the address of the next instruction in LR before branching, which makes it more like x86's CALL. Here's a sample x86 program:

Code:
start:

Code:
start:

Now, these programs should look pretty much the same.

Ok, so we've now covered all of the basics! Let's get to writing our first programs! We'll start with the classic x86 Linux hello world:

Code:
section .text

Oh boy... every time I see that code I cringe a little. Not because it's terribly different from the ARM variant, but just because it looks so yucky. Let's start writing the ARM version and I'll explain the differences as we go.
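The ARM listing in this copy of the thread is cut off after its first line, so as a reference point here is a rough sketch of what a Linux EABI hello world can look like. This is not the author's original code. It assumes GNU assembler syntax (`@` comments), the Linux EABI convention of putting the syscall number in R7 and issuing `SWI 0`, and the hypothetical labels `message` and `len`.

```asm
@ Sketch only: hello world for 32-bit ARM Linux (EABI), GNU as syntax.
@ Assumptions: syscall number in R7, arguments in R0-R2, 'SWI 0' enters the kernel.
        .global _start          @ export the entry point for the linker

_start:
        MOV     R0, #1          @ fd 1 = stdout
        LDR     R1, =message    @ R1 = address of the text
        LDR     R2, =len        @ R2 = number of bytes to write
        MOV     R7, #4          @ 4 = sys_write
        SWI     0               @ call the OS

        MOV     R0, #0          @ exit status 0
        MOV     R7, #1          @ 1 = sys_exit
        SWI     0               @ call the OS, does not return

        .data
message:
        .ascii  "Hello World\n"
        .equ    len, . - message    @ length computed by the assembler
```

On an ARM Linux box with binutils installed, something like `as -o hello.o hello.s` followed by `ld -o hello hello.o` should be enough to build it. No `.text` directive is needed before `_start` because the assembler starts out in the text section by default.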
First of all, since ARM has had the NX bit for ages (do some research on it), we don't need to differentiate our sections. The OS won't allow us write and execute permission on the same memory. So we already get to skip our section .text. Also, since ARM has a large register file, we pass everything in as registers.

Code:
.global _start ; we still have to declare this, that's just basic linker stuff

Code:
ssize_t sys_write(unsigned int fd, const char * buf, size_t count);

The next thing I want to talk about is how we called the OS. As you'll see, with x86 we call int 0x80, but for ARM we're calling SWI 0. This makes them basically the same, but it wasn't always like that. In fact, using Software Interrupt 0 with the syscall number in R7 is a relatively new EABI feature, and you shouldn't count on it always being that way. With other kernels (like XNU), you don't load R7 at all; you would call SWI 4 in that case. ARM is a great system in that it's very easy to determine the interrupt number that was called. For example:

Code:
SWI 8

Ok, everything else in that program was basically the same, and for this stage in the tutorial series it pretty much will always be the same. I want to spend a little bit of time to show you a list of some other basic instructions you'll be using.

Here's a graphic that lists out ALL of the instructions that do data processing:

[image not available]

You may have seen this before in our CYFA tutorial series (if you haven't, go take a look). These are very basic instructions, and I don't think I need to explain them to you; however, do note that the mnemonics are different from x86 in some places.

So, you saw the hello world program. Now let's write a program (using subroutines) that lets us write "Hello World" to a file. We'll write our very own version of strlen for this, because hard coding the value just isn't right. Let's start with that (strlen).
I'm going to use the same definition of strlen as in section 3 of the manual (man 3 strlen):

Code:
size_t strlen(const char *s);

Let's start off with a basic subroutine structure:

Code:
; strlen - calculate the length of a string

Now, for this we're really only going to need 1 variable, so we'll shove that in R1. To make the programmer's life easier we'll also preserve that register, and note it in our comments.

Code:
; strlen - calculate the length of a string

From this point forward, I'm not going to leave in the previous comments, just so it doesn't look so messy. Ok, so at this point we have our basic structure; all we need is our loop. We're going to loop as long as the current value isn't 0 (the NUL terminator). With ARM this is really easy if we use the S-bit (if you've forgotten what that was, reread the first part of this series).

Code:
; strlen - calculate the length of a string

I aligned the columns so you can see better. Let's start our loop:

Code:
; strlen - calculate the length of a string

Code:
; strlen - calculate the length of a string

Perfect, our strlen function is done. Now, let's make our open and close file subroutines. We'll start with close, since it's the easiest. Again, start with our base:

Code:
; close - closes a file descriptor

Code:
#6

Alright, now we know which registers we'll need. Looks like all we need to save is R7, because we don't have to set any arguments and we don't care about anything sys_close hands back. So, let's push it, pop it, and not bother with clearing R0.

Code:
; close - closes a file descriptor

Now, the rest of this is pretty simple. We just need to move integer 6 into R7 and call the OS. Since the user supplied us with the fd in R0, we don't need to change it at all.

Code:
; close - closes a file descriptor

Now, we need to work out how to open the file. We'll do this in a very basic way.
We'll hard code some of the options so that the file always opens in read-write mode, creating the file if it doesn't exist, truncating it if it does, and opening with permissions 666 (read and write for everyone). This is the same as opening with fopen with the mode string being "w+b". Let's first figure out our syscall number and argument list, then figure out our hardcoded values.

Code:
#5

Code:
O_RDWR => 0x002

Code:
flags => 0x602

Heads up: 0x602 is what you get from the BSD-style flag values. Linux defines O_CREAT as 0x040 and O_TRUNC as 0x200, so the same three flags (O_RDWR | O_CREAT | O_TRUNC) come to 0x242 there.

Alright, let's get started with our barebones subroutine:

Code:
; open - opens a file

Now, let's figure out what registers we need. We know we're going to need R7 for the syscall, and then we need 2 registers for our hard coded arguments. So, we need to preserve R1, R2, and R7, and note that we destroy R0 (which we already did).

Code:
; open - opens a file

Code:
; open - opens a file

Perfect! At this point, we have everything we need, aside from our main subroutine. Your code should look like this right now:

Code:
; strlen - calculate the length of a string

Now, let's quickly pseudocode our main subroutine. It will be a pretty simple one now that we have all these nice wrappers.

Code:
1. open file

For now, let's pretend we have already defined the symbols string and file. We can start with the easiest part of it, step 5.

Code:
start:

Now, let's work on #1. We know that it takes an argument (R0) that is a pointer to our file string, and returns a number (R0) that is the fd it opened. This is as simple as moving the arg in and branching to it (with a LINK!).

Code:
start:

Great, but now we have a different issue: when we go to get the length of the string, we'll end up overwriting our fd with that value. No worries, this is where we get to take advantage of ARM's massive register file. From now on, let's store the fd in R1. To keep the number of steps down, we're going to store the length of the string in R2 (this way we don't have to move it). We'll then need to save R1, R2, and R7 (for the sys_write syscall).
Code:
start:

Great, now let's work on step 2. This one is nearly identical to step 1, so I won't comment it. We're just loading the address of string into R0, branching (with link!) to strlen, then moving the result to R2.

Code:
LDR R0, =string

Now, on to step 3. This one is slightly different, because we're calling the syscall directly. Remember our sys_write stuff from hello world?

Code:
#4

R7 <= 4
R0 <= fd
R1 <= string
R2 <= string_len

Isn't it great that we already did the last step in that list?

Code:
MOV R7, #4

And now for our last step (that we still have to write), we need to close our file. This takes the fd in R0 and we ignore what comes back, so it's simple:

Code:
MOV R0, R1

Sweet! Our finished start subroutine should look like the following:

Code:
start:

That means that our code is completely finished! All we need to do now is add the little bits of linker fluff around it. We know already that we need to have:

Code:
.global start

Code:
.data

And we're done! Our finished ARM assembly program should look like the following (I've removed the step comments as well):

Code:
; strlen - calculate the length of a string

A hello-world-printed-to-a-file program in pure ARM assembly using only 51 lines! Now, I do want to note that this is NOT the most optimized form of this program. There are a couple of ways to shave 2-7 lines out of this, but I wanted to keep it as simple as possible for you guys for now. I'll hand out rep or NSP to anyone who does make it more efficient.

10 NSP and +2 rep to the first person who can tell me exactly how long this program will take to run on a 1 GHz CPU! (You can treat the software interrupts like a NOP for this.)

I hope you enjoyed this one, it took me like 4 hours to type all of this up for you. @"Ender" I know you wanted to read this one, so here you go.
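Because the listings in this copy of the thread are truncated, here is a rough end-to-end sketch of the program the walkthrough describes. It is a reconstruction under assumptions, not the author's 51-line listing: it uses GNU assembler syntax (`@` comments), Linux EABI syscalls (R7 plus `SWI 0`), the Linux flag value 0x242 for O_RDWR | O_CREAT | O_TRUNC, a hypothetical file name `hello.txt`, and it keeps the fd in R4 rather than R1 so that R1 stays free for the sys_write buffer argument. The strlen here uses a plain CMP rather than the S-bit, and nothing checks for errors.

```asm
@ Sketch only: write "Hello World" to a file on 32-bit ARM Linux (EABI).
        .global _start

@ strlen - calculate the length of a string
@ in:  R0 = pointer to NUL-terminated string
@ out: R0 = length in bytes (R1 and R2 are preserved)
strlen:
        PUSH    {R1, R2}
        MOV     R2, R0              @ remember where the string starts
strlen_loop:
        LDRB    R1, [R0], #1        @ load one byte, then advance R0
        CMP     R1, #0              @ was it the terminator?
        BNE     strlen_loop
        SUB     R0, R0, R2          @ bytes walked, terminator included
        SUB     R0, R0, #1          @ drop the terminator from the count
        POP     {R1, R2}
        BX      LR

@ close - closes a file descriptor
@ in:  R0 = fd (R7 is preserved, R0 holds the kernel's result)
close:
        PUSH    {R7}
        MOV     R7, #6              @ 6 = sys_close
        SWI     0
        POP     {R7}
        BX      LR

@ open - opens a file read-write, creating or truncating it
@ in:  R0 = pointer to path
@ out: R0 = fd, or a negative error code (R1, R2, R7 are preserved)
open:
        PUSH    {R1, R2, R7}
        LDR     R1, =0x242          @ O_RDWR | O_CREAT | O_TRUNC (Linux values)
        LDR     R2, =0x1B6          @ mode 0666 octal
        MOV     R7, #5              @ 5 = sys_open
        SWI     0
        POP     {R1, R2, R7}
        BX      LR

_start:
        LDR     R0, =file
        BL      open                @ step 1: open the file
        MOV     R4, R0              @ keep the fd somewhere safe

        LDR     R0, =string
        BL      strlen              @ step 2: measure the string
        MOV     R2, R0              @ R2 = length for sys_write

        MOV     R0, R4              @ step 3: write(fd, string, length)
        LDR     R1, =string
        MOV     R7, #4              @ 4 = sys_write
        SWI     0

        MOV     R0, R4
        BL      close               @ step 4: close the file

        MOV     R0, #0              @ step 5: exit(0)
        MOV     R7, #1              @ 1 = sys_exit
        SWI     0

        .data
file:   .asciz  "hello.txt"         @ hypothetical output path
string: .asciz  "Hello World\n"
```

The two `LDR Rn, =value` lines in open are there because neither 0x242 nor 0x1B6 fits ARM's rotated 8-bit immediate encoding, so a plain MOV would be rejected; the assembler turns the `=value` form into a load from a literal pool instead.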
In part 3, we'll talk about how to plan out these programs so they aren't as inefficient as this one is, and in part 4 we'll do some hard core optimizations of our code and really show those x86 idiots that ARM is king!

RE: ARM [Part 2: Writing basic programs] - Blink - 03-18-2018

Well that was uhhm... long... Nice though, thanks for writing this. I expected it to end at "Hello World", but nope, you dove into file I/O. I'll get that NSP later today (I hope), and I'll also read part 3. Time to break out a Raspberry Pi; hell, maybe I'll even write an ARM kernel to learn more about this. I find it interesting how ARM chose a 3-argument format, you generally see only 2. I'm considering learning a few assemblies for the hell of it: MIPS, 6502, Z80, whatever else. Z80 would be an easy one though, I already know 8086...

RE: ARM [Part 2: Writing basic programs] - phyrrus9 - 03-18-2018

(03-18-2018, 02:08 AM)Ender Wrote: Well that was uhhm... long... Nice though, thanks for writing this

They actually did some crafty stuff so that using 3-arg instructions and 32-bit fixed width wouldn't prevent you from doing the same things that you can with variable width. The example here being the very wise use of the barrel shifter, which iirc only ARM has. It allows you to effectively do

Code:
R0 <= R2 + (R1 << 25)

Code:
ADD R0, R2, R1, LSL #25
RE: ARM [Part 2: Writing basic programs] - m1nam - 02-27-2022

thank you for sharing this with us.

RE: ARM [Part 2: Writing basic programs] - Kelso - 02-15-2023

This is a fantastic written guide, thank you very much fren! I'm sure it'll come in handy whenever I feel like catching up on ARM-ASM.