Internal representation of floating point numbers (ANSI/IEEE 754) 02-16-2013, 06:23 PM
This paper explains by example how floating point numbers are represented internally (e.g. the float data type in C). This representation is specified in the IEEE Standard for Floating-Point Arithmetic (IEEE 754). In my opinion it is a must-know for programmers. I was asked about this topic and decided to write a paper instead of explaining it via PM.
A 4-byte floating point number, also known as single precision, has the following layout:
[table]
[row]
[cell]sign [/cell]
[cell]characteristic (= c)[/cell]
[cell]significant bits (= s)[/cell]
[/row]
[row]
[cell]1 bit [/cell]
[cell]8 bits [/cell]
[cell]23 bits [/cell]
[/row]
[/table]
where:
exponent = c - 127
(don't worry about this yet, it is just a definition we use later)
The following formula gives the value of the number as we know it:
Code:
(-1)^(sign) * 2^(c-127) * ((1.s)(bin))
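To make the formula concrete, here is a minimal Python sketch. It is only an illustration, not part of the standard: the function names float_fields and normalised_value are made up for this example, and it only covers the normalised case described so far. It uses the standard struct module to get the 32 bits of a single precision number.

```python
import struct

def float_fields(x):
    """Split x, stored as a 4-byte float, into (sign, c, s)."""
    # Pack as big-endian single precision, then reinterpret the
    # same 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31          # 1 bit
    c = (raw >> 23) & 0xFF    # 8 bits (characteristic)
    s = raw & 0x7FFFFF        # 23 bits (significant bits)
    return sign, c, s

def normalised_value(sign, c, s):
    """Evaluate (-1)^sign * 2^(c-127) * (1.s)(bin), normalised case only."""
    return (-1) ** sign * 2.0 ** (c - 127) * (1 + s / 2 ** 23)

sign, c, s = float_fields(-19.625)
print(sign, c, format(s, "023b"))    # 1 131 00111010000000000000000
print(normalised_value(sign, c, s))  # -19.625
```

The term (1 + s / 2^23) is just (1.s)(bin) written as a decimal fraction, because s holds the 23 bits after the binary point.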
Example for conversion from decimal to IEEE 754:
-19.625 (dec)
Set sign = 1, because the number is negative.
Now you need the exponent. First convert -19.625 to binary:
-19.625 (dec) = -10011.101(bin)
Now you can see that you need exponent = 4 to move the point directly behind the leading 1:
-10011.101(bin) = -1.0011101(bin) * 2^4
Calculate the characteristic from the exponent:
4 = c - 127
c = 131 = (10000011)(bin)
Extract s from 1.0011101(bin)
1.s = 1.0011101
s = 0011101 ... fill the rest with 0 (s has 23 bits)
Thus our internal representation is:
sign + c + s
11000001100111010000000000000000
Use this website to practise and check your results: http://www.h-schmidt.net/FloatConverter/IEEE754.html
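The manual steps of this example can also be sketched in Python. This is an illustration under stated assumptions, not a full encoder: encode_by_hand is a hypothetical helper that only handles normalised numbers that fit exactly into 23 significant bits (it truncates instead of rounding), and encode_with_struct is only there to compare against what the machine stores.

```python
import math
import struct

def encode_by_hand(x):
    """Follow the manual steps for a normalised, exactly representable x."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only non-zero finite numbers are handled here")
    sign = 1 if x < 0 else 0
    m = abs(x)
    exponent = 0
    # Move the binary point until exactly one 1 is in front of it.
    while m >= 2:
        m /= 2
        exponent += 1
    while m < 1:
        m *= 2
        exponent -= 1
    c = exponent + 127            # characteristic
    s = int((m - 1) * 2 ** 23)    # bits after the point; truncates, no rounding
    return f"{sign:01b}{c:08b}{s:023b}"

def encode_with_struct(x):
    """Bit pattern of x as stored in a 4-byte float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")

print(encode_by_hand(-19.625))      # 11000001100111010000000000000000
print(encode_with_struct(-19.625))  # 11000001100111010000000000000000
```

For numbers that need more than 23 significant bits, the two functions can differ in the last bit, because the real conversion rounds while this sketch simply cuts off.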
Example for conversion from IEEE 754 back to decimal:
Now let's convert a number from its IEEE 754 representation back to decimal:
01000001101110011000001100010010
You just have to use the formula given above
First bit is the sign = 0, so it is positive
Next 8 bits are c: c = 10000011(bin) = 131
The rest is s: s = 01110011000001100010010
Prepend a 1 (which is implicit): 1.s = 1.01110011000001100010010
Using the formula, the number is: (-1)^0 * 2^(131-127) * (1.01110011000001100010010)(bin) = (10111.0011000001100010010)(bin) ≈ 23.189
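The same conversion can be checked with a few lines of Python. This is a minimal sketch for the normalised case only; decode_bits and decode_with_struct are names chosen for this example.

```python
import struct

def decode_bits(bits):
    """Decode a 32-character bit string by hand (normalised case only)."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0], 2)   # first bit
    c = int(bits[1:9], 2)    # next 8 bits
    s = int(bits[9:], 2)     # remaining 23 bits
    return (-1) ** sign * 2.0 ** (c - 127) * (1 + s / 2 ** 23)

def decode_with_struct(bits):
    """Let the machine reinterpret the same 32 bits as a 4-byte float."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

bits = "01000001101110011000001100010010"
print(round(decode_bits(bits), 3))                   # 23.189
print(decode_bits(bits) == decode_with_struct(bits)) # True
```

Note that the stored value is not exactly 23.189. It is the closest number that fits into 23 significant bits, which is why the result is rounded for printing.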
Special cases:
The examples above only cover the normalised case.
For very small values (denormalised numbers) the representation changes so that they can still be represented.
If c = 0 and s = 0: The value is 0 (+0 or -0, depending on the sign bit)
If c = 0 and s != 0: The formula is (-1)^(sign) * 2^(-126) * (0.s)(bin)
Example (sign = 0, s = 00000000000000000000001):
2^-126 * 2^-23 = 2^-149 ≈ 1.4013 * 10^-45
This is the smallest positive number that can be represented.
If c = 255 and s = 0: "(-1)^(sign) * ∞" (--> -∞ or ∞)
If c = 255 and s != 0: NaN (not a number, e.g. the result of sqrt(-2), 0/0 or 0 * ∞)
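All cases together can be sketched as one small Python function. Again this is only an illustration: decode_any is a hypothetical name, and the sketch treats every NaN pattern the same way.

```python
import math

def decode_any(bits):
    """Decode a 32-character bit string, including the special cases."""
    sign = int(bits[0], 2)
    c = int(bits[1:9], 2)
    s = int(bits[9:], 2)
    factor = (-1.0) ** sign
    if c == 255:
        # s == 0 means infinity, anything else is NaN
        return factor * math.inf if s == 0 else math.nan
    if c == 0:
        # zero (s == 0) and denormalised numbers: 2^(-126) * (0.s)(bin)
        return factor * 2.0 ** -126 * (s / 2 ** 23)
    # normalised case: 2^(c-127) * (1.s)(bin)
    return factor * 2.0 ** (c - 127) * (1 + s / 2 ** 23)

print(decode_any("0" * 31 + "1") == 2.0 ** -149)  # True, smallest positive number
print(decode_any("1" + "1" * 8 + "0" * 23))       # -inf
print(decode_any("0" + "1" * 8 + "1" + "0" * 22)) # nan
```

The zero case needs no branch of its own here, because (0.s)(bin) is 0 when s = 0, and the sign factor then gives +0 or -0.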
Sources: Study notes from university