[Tutorial] Multiple precision arithmatic (x86) 08-19-2015, 02:13 AM
#1
So, sooner or later you will run into the problem of having to do math on massive numbers and find that it isn't quite as simple as you thought it would be. For this tutorial I make the assumption that you know at least one compiled programming language (C, C++, Assembly, etc).
Microprocessors do math a little bit differently than you would imagine. Let us take, for example an 8-bit processor. That is, the CPU can only handle 8 bits of data at a time. Let's make up a pretend register set and a pretend instruction set containing just a few instructions:
Now, a sample program:
this should be equivalent to the following in C:
All this makes sense, right? Good. Now what if I told you that there are actually 9 bits in this operation instead of just 8?
Well, this is where things get cool. A microprocessor has a state register that keeps track of all of the operations results as it goes along. Here is an annotated version of the status register all the way through:
But, that still doesn't make much of a difference...In 8-bit math, the highest value we can hold is 255 (0xFF). So what is 0xFF + 0x01? Well, 0x100 of course! This is called overflow in some cases, but in most cases this is simply a carry. Here is another annotated program in our fake language:
When this is all done, A=0, B=1, and C=6 (zero and carry set)
So, by this logic, 0xFF + 0x01 = 0x00. Well, that isn't right!
In C:
Note, the result is zero.
As far as the CPU knows, the result is [1] 0000 0000 where that 1 is actually stored somewhere else. This is known as the carry bit and will be essential to our task.
Alright, let's part from the fictional shit and get into real processors. To make this simple, I am only going to use the x86 cpu family (32-bit). If you want to see a more complicated example, check out this which uses 64-bit intel assembly to work on integers up to 8192-bit.
So, if you don't know much about assembly (which I doubt most of you do), that's alright, but you should at least grasp the fundamentals if you want to understand this (registers, mov, push, pop, add, and sub).
So, let's write a simple function in assembly that takes 3 parameters:
We will assume these parameters are passed via the registers
So, basically right now we have implemented a software 32-bit adder...the equivalent in C would be:
But now is where it gets interesting. 32-bit processors have been able to handle 64-bit numbers for ages...thats what long is, right? WRONG. Nice assumption, though. Now you get to see how that actually works:
64-bit addition over a 32-bit bus
The way we are going to do this, is by using our 32-bit adder twice. But, first I am going to make some modifications to it (saving extra registers, and changing the ADD to an ADC, which I will explain later).
Alright, now we have our 32-bit adder function, let's write a 64-bit adder? Yeah![Smile Smile](https://sinister.ly/images/smilies/set/smile.png)
First thing, I am going to explain why I changed ADD to ADC.
ADD is Add...duh.... ADC is Add With Carry and is the key to this whole thing (and why we have to do this in assembly if we want efficiency). If we add 2^32 and 1, we get 0. But when we do that we also set the carry bit. ADC looks at the carry bit first, and if it is set, adds 1 before doing any other add. think of it like this in C:
For the sake of time, I won't spoonfeed every step of writing the 64-bit adder, but I will comment it well.
It should make sense now that to add 64 bits, we just do it 32 bits at a time. Now, for the last iteration we will make that 128-bit just to give you something cool to walk away with. I am going to modify this again just a little (you will see why later)
Here is the whole file so far:
Alright...With that done, we can start chaining them together. To add 128-bits we can either call add32 four times, or add64 two times. I prefer to use the latter of the methods, but you can do it however you wish. Here is the add128 function (should look very similar) [actually, the whole file is here]
Cool! Now what? Do I have to write all my programs in assembly to use this?
No! We can call these directly from C if we wanted to. The problem is, we aren't conforming to the C standard... Those of you trying to use Windows with this, you're going to have a much harder time, but the smart ones using linux this should be a breeze.
Anyways, let's make a wrapper for all this and write a C program.
Now, you can call it from a C program! I don't want to fiddle with endianness on this one, check out the github repo I linked at the top of this article if you want to play with something more serious.
Microprocessors do math a little bit differently than you would imagine. Let us take, for example an 8-bit processor. That is, the CPU can only handle 8 bits of data at a time. Let's make up a pretend register set and a pretend instruction set containing just a few instructions:
Code:
Registers:
A - 8 bit register
B - 8 bit register
C - 4 bit state register { N Z C S } (negative, zero, carry, stop)
Instructions:
LDA - Load value into A
LDB - Load value into B
ADD - Add A and B and store in A
SUB - Subtract B from A and store in A
Now, a sample program:
Code:
LDA 2
LDB 2
ADD
Code:
A = 2;
B = 2;
A += B;
All this makes sense, right? Good. Now what if I told you that there are actually 9 bits in this operation instead of just 8?
Well, this is where things get cool. A microprocessor has a state register that keeps track of all of the operations results as it goes along. Here is an annotated version of the status register all the way through:
Code:
LDA 2 { ? ? ? 0 }
LDB 2 { ? ? ? 0 }
ADD { 0 0 0 0 }
{ 0 0 0 1 }
But, that still doesn't make much of a difference...In 8-bit math, the highest value we can hold is 255 (0xFF). So what is 0xFF + 0x01? Well, 0x100 of course! This is called overflow in some cases, but in most cases this is simply a carry. Here is another annotated program in our fake language:
Code:
LDA 255 { ? ? ? 0 }
LDB 1 { ? ? ? 0 }
ADD { 0 1 1 0 } --note, zero and carry are set
{ 0 1 1 1 }
So, by this logic, 0xFF + 0x01 = 0x00. Well, that isn't right!
In C:
Code:
unsigned char A = 0xFF, B = 0x01;
A += B;
printf("%d\n", A);
As far as the CPU knows, the result is [1] 0000 0000 where that 1 is actually stored somewhere else. This is known as the carry bit and will be essential to our task.
Alright, let's part from the fictional shit and get into real processors. To make this simple, I am only going to use the x86 cpu family (32-bit). If you want to see a more complicated example, check out this which uses 64-bit intel assembly to work on integers up to 8192-bit.
So, if you don't know much about assembly (which I doubt most of you do), that's alright, but you should at least grasp the fundamentals if you want to understand this (registers, mov, push, pop, add, and sub).
So, let's write a simple function in assembly that takes 3 parameters:
- Destination memory address ®
- First parameter (A)
- Second parameter (B)
We will assume these parameters are passed via the registers
- R - EAX
- A - EBX
- B - ECX
Code:
[BITS 32]
adder:
;EAX R
;EBX A
;ECX B
MOV EDX, EAX ;Save our copy of R
PUSH EBX ;Save our copy of A
PUSH ECX ;Save our copy of B
MOV DWORD EAX, [EBX] ;Copy 32-bits from address in EBX (A)
MOV DWORD EBX, [ECX] ;Copy 32-bits from address in ECX (B)
ADD EAX, EBX ;EAX = EAX + EBX (EAX = A + B)
MOV DWORD [EDX], EAX ;Store the result back into address in EDX (R)
POP ECX ;Restore values
POP EBX ;Restore values
MOV EAX, EDX ;Restore values
RET
So, basically right now we have implemented a software 32-bit adder...the equivalent in C would be:
Code:
void adder(unsigned int *R, unsigned int *A, unsigned int *B)
{
*R = *A + *B;
}
But now is where it gets interesting. 32-bit processors have been able to handle 64-bit numbers for ages...thats what long is, right? WRONG. Nice assumption, though. Now you get to see how that actually works:
64-bit addition over a 32-bit bus
The way we are going to do this, is by using our 32-bit adder twice. But, first I am going to make some modifications to it (saving extra registers, and changing the ADD to an ADC, which I will explain later).
Code:
[BITS 32]
add32:
;EAX R
;EBX A
;ECX B
PUSH EBX ;Save our copy of A
PUSH ECX ;Save our copy of B
PUSH EDX
MOV EDX, EAX ;Save our copy of R
MOV DWORD EAX, [EBX] ;Copy 32-bits from address in EBX (A)
MOV DWORD EBX, [ECX] ;Copy 32-bits from address in ECX (B)
ADC EAX, EBX ;EAX = EAX + EBX (EAX = A + B)
MOV DWORD [EDX], EAX ;Store the result back into address in EDX (R)
MOV EAX, EDX ;Restore values
POP EDX ;Restore values
POP ECX ;Restore values
POP EBX ;Restore values
RET
Alright, now we have our 32-bit adder function, let's write a 64-bit adder? Yeah
![Smile Smile](https://sinister.ly/images/smilies/set/smile.png)
First thing, I am going to explain why I changed ADD to ADC.
ADD is Add...duh.... ADC is Add With Carry and is the key to this whole thing (and why we have to do this in assembly if we want efficiency). If we add 2^32 and 1, we get 0. But when we do that we also set the carry bit. ADC looks at the carry bit first, and if it is set, adds 1 before doing any other add. think of it like this in C:
Code:
R = A + B + C ? 1 : 0;
For the sake of time, I won't spoonfeed every step of writing the 64-bit adder, but I will comment it well.
Code:
add64:
;EAX R (First 32 bits)
;EBX A (First 32 bits)
;ECX B (First 32 bits)
;Example: A is actually A:A+4, B=B:B+4, etc
PUSH EAX ;v
PUSH EBX ;Save registers
PUSH ECX ;^
CLC ;Clear the carry bit (so we don't get an off by 1
CALL add32 ;Add the first 32 bits
ADD EAX, 4 ;v
ADD EBX, 4 ;Shift to the second 32 bits
ADD ECX, 4 ;^
CALL add32 ;Add the second 32 bits
POP ECX ;v
POP EBX ;Restore registers
POP EAX ;^
RET
It should make sense now that to add 64 bits, we just do it 32 bits at a time. Now, for the last iteration we will make that 128-bit just to give you something cool to walk away with. I am going to modify this again just a little (you will see why later)
Here is the whole file so far:
Code:
[BITS 32]
addinit:
CLC ;Put this into its own sub for now
RET ;Make sure to call it before any adders
add32:
;EAX R
;EBX A
;ECX B
PUSH EBX ;Save our copy of A
PUSH ECX ;Save our copy of B
PUSH EDX
MOV EDX, EAX ;Save our copy of R
MOV DWORD EAX, [EBX] ;Copy 32-bits from address in EBX (A)
MOV DWORD EBX, [ECX] ;Copy 32-bits from address in ECX (B)
ADC EAX, EBX ;EAX = EAX + EBX (EAX = A + B)
MOV DWORD [EDX], EAX ;Store the result back into address in EDX (R)
MOV EAX, EDX ;Restore values
POP EDX ;Restore values
POP ECX ;Restore values
POP EBX ;Restore values
RET
add64:
;EAX R (First 32 bits)
;EBX A (First 32 bits)
;ECX B (First 32 bits)
;Example: A is actually A:A+4, B=B:B+4, etc
PUSH EAX ;v
PUSH EBX ;Save registers
PUSH ECX ;^
CALL add32 ;Add the first 32 bits
ADD EAX, 4 ;v
ADD EBX, 4 ;Shift to the second 32 bits
ADD ECX, 4 ;^
CALL add32 ;Add the second 32 bits
POP ECX ;v
POP EBX ;Restore registers
POP EAX ;^
RET
Alright...With that done, we can start chaining them together. To add 128-bits we can either call add32 four times, or add64 two times. I prefer to use the latter of the methods, but you can do it however you wish. Here is the add128 function (should look very similar) [actually, the whole file is here]
Code:
[BITS 32]
addinit:
CLC ;Put this into its own sub for now
RET ;Make sure to call it before any adders
add32:
;EAX R
;EBX A
;ECX B
PUSH EBX ;Save our copy of A
PUSH ECX ;Save our copy of B
PUSH EDX
MOV EDX, EAX ;Save our copy of R
MOV DWORD EAX, [EBX] ;Copy 32-bits from address in EBX (A)
MOV DWORD EBX, [ECX] ;Copy 32-bits from address in ECX (B)
ADC EAX, EBX ;EAX = EAX + EBX (EAX = A + B)
MOV DWORD [EDX], EAX ;Store the result back into address in EDX (R)
MOV EAX, EDX ;Restore values
POP EDX ;Restore values
POP ECX ;Restore values
POP EBX ;Restore values
RET
add64:
;EAX R (First 32 bits)
;EBX A (First 32 bits)
;ECX B (First 32 bits)
;Example: A is actually A:A+4, B=B:B+4, etc
PUSH EAX ;v
PUSH EBX ;Save registers
PUSH ECX ;^
CALL add32 ;Add the first 32 bits
ADD EAX, 4 ;v
ADD EBX, 4 ;Shift to the second 32 bits
ADD ECX, 4 ;^
CALL add32 ;Add the second 32 bits
POP ECX ;v
POP EBX ;Restore registers
POP EAX ;^
RET
add128:
PUSH EAX ;v
PUSH EBX ;Save registers
PUSH ECX ;^
CALL add64 ;Add the first 64 bits
ADD EAX, 8 ;v
ADD EBX, 8 ;Shift to the second 32 bits
ADD ECX, 8 ;^
CALL add64 ;Add the second 64 bits
POP ECX ;v
POP EBX ;Restore registers
POP EAX ;^
RET
Cool! Now what? Do I have to write all my programs in assembly to use this?
No! We can call these directly from C if we wanted to. The problem is, we aren't conforming to the C standard... Those of you trying to use Windows with this, you're going to have a much harder time, but the smart ones using linux this should be a breeze.
Anyways, let's make a wrapper for all this and write a C program.
Code:
wrap128:
MOV EAX, ECX ;Get R
MOV EBX, EDX ;Get A
POP EDX ;Get the return address
POP ECX ;Get B
PUSH EDX ;Save the return address
CALL addinit ;Initialize our library
CALL add128 ;Do the add
RET
Now, you can call it from a C program! I don't want to fiddle with endianness on this one, check out the github repo I linked at the top of this article if you want to play with something more serious.