Tutorial [ARM] - Switch statements filter_list Author Message
[ARM] - Switch statements #1
So, it's been quite some time since I updated my tutorial list, and I figured I'd give it another go.

I've told you in the past how various things don't exist in programming, such as if-blocks, for loops, pointers, and data types. In each of these, I've explained how they actually do work. Well now is that anticipated moment when I combine them all together and show you something more abstract. Yes, I'm talking about the switch statement.

Take a look at this C code. How do you think it's implemented?

Code:
```int main(void) {        int foo;        int var;        foo = 2;        switch (foo)        {                case 1:                        var = 2;                        break;                case 2:                        var = 3;                case 3:                        var = 4;                        break;                default:                        var = 1;                        break;        }        return 0; }```

Did you say "a bunch of comparisons and branches"?

Well, if you did, you'd be wrong. That's terrible for efficiency. You should only need 1 comparison to implement this (or any sized) switch statement (with one exception). Only one comparison!? How on Earth can you manage that!?

It's actually pretty simple. Switch statements are one of those special types of abstract control structure. We know the following information from looking at the C code
1. There are exactly 4 cases we can fall into
2. All 3 specified cases are linear (no missing numbers)
3. We know the start and ending numbers (1 and 3)

Based on this knowledge, we can construct a table of sorts. To accomplish this feat, let's first start with the assumption that any number is valid. What we'll do is make a one-column table, something like this:
• case0
• case1
• case2
• case3

This looks pretty useless, but bear with me. As an array, we can use these as pointers to the correct label of something. We'd just need to do the following:

Code:
```void *table[4] = { &case0, &case1, &case2, &case3 }; void *jump_addr = table[foo]; // jump to jump_addr```

Make sense? Well, our table actually is exactly like that, but without all the complicated mess. We can just use some basic algebra to figure it out.
We know that ARM addresses memory on a 32-bit bus, meaning that every location (and thus label address) will take up 4 bytes. Because of that, we can figure out the address stored in table[foo] with the following formula:
Code:
`table + (foo /*4)`
But wait you say. ARM doesn't have an efficient multiply instruction. Don't you worry, there's a shortcut. Notice how 4 is one of those numbers that sticks in your head? Seems familiar huh? Like maybe from the RAM cards themselves and that they're always in multiples of 2?

Well, they actually aren't multiples of 2, they're powers of 2. This is due to addressing and your computers logic being in base 2. It wouldn't be able to access bits outside of that. Because this is base 2, we can multiply by 2 just by shifting every bit to the left 1 location, and inserting a 0 in the empty space. Here's an example.

Code:
```0x0001 = 1 0x0010 = 2 0x0100 = 4 0x1000 = 8 and likewise with odd numbers 0x0101 = 5 0x1010 = 10```

So, we can use that pretty reliably. In fact, so reliably we get to cut an entire cycle off our code. ARM has this neat little way of doing logic shifts and additions in the same operation. So we can now get our address with this:
Code:
```// assuming R0 holds base address and R1 holds value of foo ADD R2, R0, R1 LSL #0x2 // R2 = R0 + (R1 << 2)```
So, that's neat and all, but what does it mean? That means we only have 4 bytes to hold our code? Seems pretty useless.

Well, normally, yes. Jump tables make this a lot easier. You can use that 4 bytes to hold a branch instruction to the label you need it to be.

Ok, now that I've confused you enough, here's the actual assembly.

Code:
```caseentry:        LDR     R1,     =foo        LDR     R0,     [R1]        LDR     R2,     =casetable        ANDS    R3,     R0,     #0xFFFFFFFC        LDREQ   pc,     [R2, R0, lsl #2]         a:        MOV     R4,     #0x1        B       caseend b:        MOV     R4,     #0x2        B       caseend c:        MOV     R4,     #0x3 d:        MOV     R4,     #0x4        B       caseend caseend:        LDR     R1,     =var        STR     R4,     [R1]        MOV     R7,     #0x1        XOR     R0,     R0,     R0        SWI     #0x0 casetable:        .word   a        .word   b        .word   c        .word   d foo:    .word   #0x2 var:    .space  #0x4```

Now, let's go through this pretty briefly, but enough that someone with ARM knowledge should know what's going on.
Those first 3 lines are all loads, they should be pretty obvious. First we get the value of foo and store it in R0, then we store the address of our jump table in R2. We don't want to get any values from that yet, we need to do some math first.

Next up, we bitwise and with 0xfffffffc. What that is doing is removing the last 3 bits (the ones we care about). We store that value in R3. We wanted to remove the bits we care about to see if there are any bits we don't care about, since any number higher than 3 will be in the default case, we want to see first if that is the case. This is also our only comparison, if you notice the S following the AND instruction. This instructs the processor to compare that result with 0 after it's finished and set our condition bits. ARM has this amazing feature that does it as part of a math operation for us if we choose, eliminating the need for us to waste another clock cycle and do the comparison ourselves. Since this comparison is all we really care about, we don't actually need the value in R3, and we're free to reuse it again (even though I did not to make it easy to read).

Next up is where all of the magic happens. You should see that the right side of this instruction looks pretty similar to the one we made before. It actually does the same thing in this case. I mentioned that ARM was powerful enough to do a shift and an addition in one cycle, well it's also powerful enough to use that result in a memory fetch operation. All of this happens in 1 single clock cycle. What this actually comes out to is
Code:
`PC = R2 + (R0 << 2)`
Remember, PC is our program counter, which tells the processor where to execute memory from. Setting the PC to something is equivalent to a branch instruction, which is exactly what's going on here. We're branching to the base of the table + 4 times the value of foo.

Now, about the EQ at the end of the LDR instrucion. This instructs the processor to only perform this operation if our previous comparison was equal to 0, meaning that the number was between 0 and 3 inclusively. Because of this, we don't need to and it with 3 to mask off the extra bits, because this will only get executed if there are no extra bits.

Now, if you look at the case table, you can see that I just define 4 words in succession. These 4 words just hold the address of a label, and so when we jump to the address held in them, we get to our 4 different cases. One label for each of the 4 valid cases (0-3). But what about the default case?

Well, the default case just happens to be the same as the 0 case, so the C programmer left off the 0 case and just used the default one. If he had created both a 0 case and a default case that were the same, the compiler would have optimized it to the same C code I wrote. Back to that EQ suffix on the LDR instruction. If there were extra bits (meaning a number not in the range of 0-3), then  that EQ suffix prevents the jump from happening, and it continues on to label a, which happens to be the same as our default (and zero) case.

The same is true for the fallthrough case (2). It is the only one without a branch to the case end label, meaning that after it executes, it will fall through to label d, or case for 3.

Because this is assembly, and it's been optimized, we use R4 for temporary storage and write the value into var at the end of the case. This allows us to have fewer load and store operations in the code, and provides a central place to debug an issue from.

Well, that's about it, let me know what you think, and if you have any questions.

3 users Like phyrrus9's post
RE: [ARM] - Switch statements #2
Fuck this is crazy. Good job

RE: [ARM] - Switch statements #3
(07-28-2018, 05:35 AM)Mimiakira Wrote: Fuck this is crazy. Good job

I actually wrote this all out like 2 months ago and just never really got around to explaining it until now. Life's been pretty busy.

RE: [ARM] - Switch statements #4
(07-28-2018, 06:28 AM)phyrrus9 Wrote:
(07-28-2018, 05:35 AM)Mimiakira Wrote: Fuck this is crazy. Good job

I actually wrote this all out like 2 months ago and just never really got around to explaining it until now. Life's been pretty busy.

1 user Likes MimiE's post
RE: [ARM] - Switch statements #5
(07-28-2018, 07:36 AM)Mimiakira Wrote:
(07-28-2018, 06:28 AM)phyrrus9 Wrote:
(07-28-2018, 05:35 AM)Mimiakira Wrote: Fuck this is crazy. Good job

I actually wrote this all out like 2 months ago and just never really got around to explaining it until now. Life's been pretty busy.

No, just sitting in ~/code for ages.

1 user Likes phyrrus9's post
RE: [ARM] - Switch statements #6
I'm assuming that there isn't supposed to be a '/' here?
Code:
`table + (foo /*4)`

Nice job. It's nice that you can use normal operations like LDR on PC instead of being limited to only JMP/B. Also conditional execution is nice too of course...

(07-28-2018, 07:37 AM)phyrrus9 Wrote:
(07-28-2018, 07:36 AM)Mimiakira Wrote:
(07-28-2018, 06:28 AM)phyrrus9 Wrote: I actually wrote this all out like 2 months ago and just never really got around to explaining it until now. Life's been pretty busy.

No, just sitting in ~/code for ages.

So.... A draft?

(11-02-2018, 02:51 AM)Skullmeat Wrote: Ok, there no real practical reason for doing this, but that's never stopped me.

RE: [ARM] - Switch statements #7
(07-29-2018, 08:24 AM)Ender Wrote: I'm assuming that there isn't supposed to be a '/' here?
Code:
`table + (foo /*4)`

Nice job. It's nice that you can use normal operations like LDR on PC instead of being limited to only JMP/B. Also conditional execution is nice too of course...

(07-28-2018, 07:37 AM)phyrrus9 Wrote:
(07-28-2018, 07:36 AM)Mimiakira Wrote: I bet. was this a draft till now?

No, just sitting in ~/code for ages.

So.... A draft?

No, there isn't supposed to be a / there. Remember, we're trimming off any value greater than 3, so we simply multiply by 4 to get the offset in memory.

Users browsing this thread: 1 Guest(s)