Tutorial CYFA - Creating Your First Assembler - Single Data Transfers

phyrrus9 · 10-16-2017, 10:27 AM

Ok, first thing I want to do before I start this off is make a slight correction from last time. In the previous part I implied that the rotate field on the immediate section was the number of bits to rotate, but I was wrong. In fact, it works by first zero-extending the 8-bit immediate to 32 bits and then rotating it right by two times that field. Secondly, in the last part I said I would start working on this part after I had gotten 5 replies, and that just didn't happen. I decided to write this anyway because I'm fucking bored to death and I actually want this series to get finished. Fucking reply to these threads, god dammit! Alright, with that taken care of, let's start to think about part 4: the single data transfer.
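To make the corrected rotate rule concrete, here's a rough Python sketch of how the operand gets expanded. The name `decode_rotated_imm` is made up for illustration and is not part of the assembler from this series; it assumes a 4-bit rotate field, an 8-bit immediate, and a rotate to the right.

```python
def decode_rotated_imm(rotate: int, imm8: int) -> int:
    """Expand a 4-bit rotate field and an 8-bit immediate into a 32-bit value."""
    if not 0 <= rotate <= 0xF:
        raise ValueError("rotate field is 4 bits")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate is 8 bits")
    amount = (rotate * 2) % 32   # rotate by twice the field value
    value = imm8                 # already zero-extended: upper 24 bits are 0
    # 32-bit rotate right, masked so the result stays within 32 bits
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


print(hex(decode_rotated_imm(4, 0xFF)))  # rotate right by 8 bits
```

With a rotate field of 4, the 0xFF byte ends up in the top 8 bits of the word, since it was rotated right by 8 and not by 4.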

Side note, this is my 100th thread (in nearly 5 years). Woot, throw a party, I've already got an 8 drink head start. I know the number is low, but I don't write threads unless it's good info.

First off, we gotta know what these instructions do. The single data transfer type of instruction has one and only one purpose, to move data between registers and the memory controller. We don't care what happens to it before or after that. Here's a graphic that explains the layout of this one:
[Image: vK4LLFZ.png]

Ok, I'm really not going to go into too much detail on what all of these bits do, the graphic explains it well enough. For our assembler we will be using the following configuration of bits that will be static:

Code:
I = 0
P = 1
B = 0
W = 0

Ok, so that leaves us with bits U and L, which are really all we need. There's a lot of other stuff you can do with these instructions, but we won't.
NOTE: our assembler has to check that R15 is not Rd.

Now, the U bit: this one is important to touch on. In most every programming language, integers are signed by default, meaning that you can have the following:

Code:
int i = -1;

This is exactly backwards from assembly, if you interpreted it in the literal sense:

Code:
mov r0, -1

you would actually get
0xFFFFFFFF (4294967295).

So, negative numbers aren't possible then?
No, they are. 4294967295 and -1 are actually the same number in a computer (32-bit). This works by using two's complement (we won't get into it, you can google though). Because of that, positive and negative numbers share the same space (there isn't a separate "negative" flag stored next to the number). So, to get a negative number, you just interpret the bits as if they were a signed number, and the CPU doesn't do this at all. To the CPU, these aren't even numbers. You have to be the one who interprets them.
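If you want to see that the two readings really are the same 32 bits, here's a small Python sketch. The helper names `to_unsigned32` and `to_signed32` are just for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_unsigned32(value: int) -> int:
    """Keep the low 32 bits, i.e. what actually sits in the register."""
    return value & MASK32


def to_signed32(bits: int) -> int:
    """Read a 32-bit pattern as a two's complement signed number."""
    bits &= MASK32
    # If the top bit is set, the signed reading is the value minus 2**32.
    return bits - (1 << 32) if bits & 0x80000000 else bits


print(to_unsigned32(-1))        # the pattern stored for -1, read as unsigned
print(to_signed32(0xFFFFFFFF))  # the same pattern, read as signed
```

Same bits both times; only the interpretation changes.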

Ok, how does this apply?
Well, our 12-bit immediate field is an unsigned field (like all fields), so what if we want to address something that's 4 bytes below us (like say we want to read the instruction we're executing)? In assembly we'd write this as -4, but the field would actually hold positive 4, and we would clear the U bit. U = 1 means "up" (add the offset to the base register) and U = 0 means "down" (subtract it), so clearing it is what tells the CPU to subtract the offset.
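Here's one way the assembler could turn a signed offset from the source into the two fields, as a Python sketch. The function name `split_offset` is hypothetical; the U convention follows the ARM encoding, where U = 1 adds the offset and U = 0 subtracts it.

```python
from typing import Tuple


def split_offset(offset: int) -> Tuple[int, int]:
    """Return (u_bit, imm12) for a signed byte offset written in the source."""
    magnitude = abs(offset)
    if magnitude > 0xFFF:
        raise ValueError("offset does not fit in the 12-bit field")
    u_bit = 0 if offset < 0 else 1   # 0 = subtract (down), 1 = add (up)
    return u_bit, magnitude


print(split_offset(-4))  # the field holds 4, the sign lives in U
print(split_offset(4))
```

The field itself never holds a negative value; the sign is carried entirely by U.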

Next, we have the L bit. This one is pretty simple: it differentiates between a load (LDR) and a store (STR) instruction. These will be the only 2 instructions we support (no MRS or any of that shit, feel free to read about them though), so really this will be one of the easiest fields to populate. We simply have to set 2 bits, fill in an offset (which will likely be 0), and know if it's a setter or a getter. Pretty nice huh?
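Putting the fixed bits, U, and L together, a minimal encoder could look like the Python sketch below. The function name `encode_single_transfer` and the constant `COND_ALWAYS` are assumptions for illustration: condition fields haven't been covered in this series yet, so the sketch hard-codes the "always" condition (0xE). The bit positions follow the standard ARM single data transfer layout.

```python
COND_ALWAYS = 0xE  # assumption: every instruction is unconditional


def encode_single_transfer(load: bool, rd: int, rn: int, offset: int = 0) -> int:
    """Encode LDR (load=True) or STR (load=False) with an immediate offset."""
    if not (0 <= rd <= 15 and 0 <= rn <= 15):
        raise ValueError("registers are r0 to r15")
    if rd == 15:
        raise ValueError("this assembler does not allow R15 as Rd")
    if abs(offset) > 0xFFF:
        raise ValueError("offset does not fit in the 12-bit field")

    i_bit, p_bit, b_bit, w_bit = 0, 1, 0, 0   # the static configuration
    u_bit = 0 if offset < 0 else 1            # 1 = add offset, 0 = subtract
    l_bit = 1 if load else 0                  # 1 = LDR, 0 = STR

    return (
        COND_ALWAYS << 28
        | 0b01 << 26          # bits 27:26 mark a single data transfer
        | i_bit << 25
        | p_bit << 24
        | u_bit << 23
        | b_bit << 22
        | w_bit << 21
        | l_bit << 20
        | rn << 16            # base register
        | rd << 12            # register being loaded or stored
        | abs(offset)         # unsigned 12-bit magnitude
    )


print(hex(encode_single_transfer(True, rd=0, rn=15, offset=-4)))
print(hex(encode_single_transfer(False, rd=0, rn=15, offset=0)))
```

The only things that vary between calls are U, L, the two registers, and the offset, which is why this instruction type is so quick to support.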

Ok, cool. Since this one is really short, I want to work through a couple examples as well as explain another bit, particularly the I bit.
The I bit tells us what type of offset we're dealing with. These can get really complicated, so we're not going to bother too much with it, but note that with it set to 0, the offset is a plain 12-bit immediate applied to the base register, which means every address is relative to whatever that register holds. Because of that, the following 2 lines do not reference the same spot in memory:

Code:
ldr r0, [pc, -0x4]
ldr r0, [pc, -0x4]

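Each of those lines would reference itself, but not the others. (Note that on real ARM hardware in ARM state, PC reads as the address of the current instruction plus 8, so there the offset that lands on the executing instruction would be -8.) Another neat example would be a self-perpetuating program: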

Code:
ldr r0, [pc, -0x4]
str [pc, 0x0], r0

These two lines would keep copying themselves through memory until they ran out of it. PC would eventually hit 0, and the CPU would fault (exception vectors).

Since we have the I bit cleared (I = 0) and only support the [register, offset] form, we always have to access RAM relatively. We can't do this:

Code:
ldr r0, 0x5005

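but we can do this if the address isn't within 12 bits of another register (assuming the address can be built in a register with the data processing instructions from the earlier parts):

Code:
mov r0, 0x5005
ldr r0, [r0, 0x0]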
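"Within 12 bits of another register" is easy to check if the assembler happens to know what a register holds at that point. Here's a rough Python sketch; the function name `reachable_offset` and the idea of tracking a known base value are assumptions for illustration, not something the assembler in this series does.

```python
def reachable_offset(base: int, target: int):
    """Return the signed offset from base to target, or None if it won't fit."""
    delta = target - base
    # The field holds a 12-bit magnitude; the sign is carried by the U bit.
    return delta if abs(delta) <= 0xFFF else None


print(reachable_offset(0x5000, 0x5005))  # close enough for one instruction
print(reachable_offset(0x8000, 0x5005))  # too far away
```

When the result is None, the address has to be put into a register first, which is the two-instruction pattern shown above.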

Alright, that's probably all for this one, I tried to keep it REALLY basic. Next time we're going to talk about branches, and I'll likely get into the condition fields at that point.

FUCKING REPLY DAMMIT!

Blink · 10-18-2017, 06:05 AM

That was nice. You should've explained what LDR and STR do so us Intel-users can understand what's happening.

To Intel People:
As stated earlier, they change based on the flags set and stuff, but in this case, LDR loads a register (specified by Rd) with a value from a memory address relative to a register, and STR loads memory relative to a register with the contents of the register specified by Rd.
(Rd changes places from right to left in the assembly code, so it'll make more sense for the programmer)

phyrrus9 · 10-18-2017, 06:11 AM

LDR = LoaD Register
STR = STore Register

Probably should have mentioned that RISC is a load-store architecture. Intel lets you access memory straight from MOV instructions, which isn't a possibility with these setups. There are only a few instructions that can access memory. LDR and STR are the most important two.

For those who want a little more reading on the subject, here is the ARM infodoc on these instructions. The link is for the M3, but it's the same instruction. We'll be working with the first format:

Code:
op{type}{cond} Rt, [Rn {, #offset}]        ; immediate offset

For our example, the type is always WORD, so we don't fill that field in.
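For anyone who wants to play with that format, here's a rough Python sketch of a parser for the immediate offset form. Everything in it is an assumption for illustration: the function name `parse_transfer`, the regex, and the restriction to plain `rN` register names (no `pc` alias, no `{type}` or `{cond}` suffixes). It follows the ARM doc syntax with the `#` prefix, not the syntax used earlier in this thread.

```python
import re

# op Rt, [Rn {, #offset}] with op limited to ldr/str and registers as rN
_TRANSFER = re.compile(
    r"^\s*(ldr|str)\s+r(\d+)\s*,\s*\[\s*r(\d+)\s*"
    r"(?:,\s*#(-?(?:0x[0-9a-f]+|\d+)))?\s*\]\s*$",
    re.IGNORECASE,
)


def parse_transfer(line: str):
    """Return (op, rt, rn, offset) for one LDR/STR line, offset defaulting to 0."""
    match = _TRANSFER.match(line)
    if match is None:
        raise ValueError("not an immediate-offset LDR/STR: " + line)
    op, rt, rn, offset_text = match.groups()
    # int(text, 0) accepts both decimal and 0x-prefixed hex, with a sign
    offset = int(offset_text, 0) if offset_text else 0
    return op.lower(), int(rt), int(rn), offset


print(parse_transfer("ldr r0, [r1, #-0x4]"))
print(parse_transfer("STR r2, [r3]"))
```

The tuple it returns has everything needed to fill in L, Rd, Rn, U, and the offset field.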