Internal representation of floating point numbers (ANSI/IEEE 754)

Deque · 02-16-2013, 06:23 PM


This paper explains by example how floating point numbers are represented internally (e.g. the float data type in C). This representation is specified in the IEEE Standard for Floating-Point Arithmetic (IEEE 754). It is a must-know for programmers in my opinion. I was asked about this topic and decided to write a paper instead of explaining it via PM.

A 4-byte floating point number, also known as single precision, has the following representation:

[table]
[row]
[cell]sign [/cell]
[cell]characteristic (= c)[/cell]
[cell]significant bits (= s)[/cell]
[/row]
[row]
[cell]1 bit [/cell]
[cell]8 bits [/cell]
[cell]23 bits [/cell]
[/row]
[/table]

where:
exponent = c - 127
(don't worry about that for now, it is just a definition we use later)

The following formula is used to get the number as we know it:

Code:
(-1)^(sign) * 2^(c-127) * ((1.s)(bin))
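
To make the layout and the formula more concrete, here is a minimal sketch in Python (my choice of language for illustration, the thread itself does not prescribe one). The helper names float_fields and normalised_value are made up for this example, and 0.15625 is just an arbitrary test value. The sketch only covers the normalised case (0 < c < 255).

```python
import struct

def float_fields(x):
    """Split x, stored as IEEE 754 single precision, into (sign, c, s)."""
    # Pack as a big-endian 4-byte float, then read the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31            # 1 bit
    c = (bits >> 23) & 0xFF      # 8 bits (characteristic)
    s = bits & 0x7FFFFF          # 23 bits (significant bits)
    return sign, c, s

def normalised_value(sign, c, s):
    """Apply (-1)^sign * 2^(c-127) * (1.s)(bin); only valid for 0 < c < 255."""
    one_point_s = 1 + s / 2**23  # prepend the implicit 1
    return (-1) ** sign * 2.0 ** (c - 127) * one_point_s

sign, c, s = float_fields(0.15625)
print(sign, c, format(s, "023b"))     # 0 124 01000000000000000000000
print(normalised_value(sign, c, s))   # 0.15625
```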

Example for conversion from decimal to IEEE 754:
-19.625 (dec)

You set sign = 1, because it is negative

Now you need the exponent. First convert -19.625 to binary:
-19.625 (dec) = -10011.101(bin)

Now you can see that you need exponent = 4 to move the binary point behind the first digit:
-10011.101(bin) = -1.0011101(bin) * 2^4

Calculate the characteristic from the exponent:

4 = c - 127
c = 131 = (10000011)(bin)

Extract s from 1.0011101(bin)
1.s = 1.0011101
s = 0011101 ... fill the rest with 0 (s has 23 bits)

Thus our internal representation is:
sign + c + s
11000001100111010000000000000000
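
The steps of this example can be retraced in a few lines of Python. This is only a sketch under some assumptions: encode_single is a hypothetical helper name, the input must be nonzero, finite and in the normalised range, and the significant bits are simply truncated instead of rounded (which is fine for -19.625 because it fits exactly). The result is compared against what the struct module produces.

```python
import math
import struct

def encode_single(x):
    """Build the 32-bit pattern of a normalised value by hand (no rounding)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles only nonzero, finite values")
    sign = 1 if x < 0 else 0
    m = abs(x)
    exponent = 0
    # Move the binary point until the value has the form 1.xxx (1 <= m < 2).
    while m >= 2:
        m /= 2
        exponent += 1
    while m < 1:
        m *= 2
        exponent -= 1
    c = exponent + 127             # characteristic
    s = int((m - 1) * 2**23)       # 23 significant bits, truncated
    return f"{sign:01b}{c:08b}{s:023b}"

# Reference pattern from the struct module for comparison.
reference = format(struct.unpack(">I", struct.pack(">f", -19.625))[0], "032b")

print(encode_single(-19.625))               # 11000001100111010000000000000000
print(encode_single(-19.625) == reference)  # True
```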

Use this website to practise and deepen your understanding: http://www.h-schmidt.net/FloatConverter/IEEE754.html

Example for conversion from IEEE 754 back to decimal:

Now let's convert a number from its IEEE 754 representation back to decimal:
01000001101110011000001100010010
You just have to use the formula given above.

First bit is the sign = 0, so it is positive

Next 8 bits are c: c = 10000011(bin) = 131

The rest is s: s = 01110011000001100010010
Prepend a 1 (which is implicit): 1.s = 1.01110011000001100010010

Using the formula the number is: (-1)^0 * 2^(131-127) * (1.01110011000001100010010)(bin) = 2^4 * (1.01110011000001100010010)(bin) = 10111.0011000001100010010(bin) ≈ 23.189
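
As a check of this example, the following Python sketch slices the bit string into sign, c and s and evaluates the formula with exponent = c - 127. The function name decode_single is made up for illustration, and the sketch assumes a normalised number (0 < c < 255).

```python
def decode_single(bit_string):
    """Decode a 32-character bit string of a normalised single precision number."""
    sign = int(bit_string[0], 2)       # first bit
    c = int(bit_string[1:9], 2)        # next 8 bits
    s = int(bit_string[9:], 2)         # remaining 23 bits
    one_point_s = 1 + s / 2**23        # prepend the implicit 1
    return (-1) ** sign * 2.0 ** (c - 127) * one_point_s

value = decode_single("01000001101110011000001100010010")
print(round(value, 3))   # 23.189
```

The same function returns -19.625 for the bit pattern from the first example.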

Special cases:

The examples above cover only the normalised case (0 < c < 255).
The values c = 0 and c = 255 are reserved. With c = 0 the representation changes so that very small values (denormalised numbers) can still be represented.

If c = 0 and s = 0: The value is 0 (+0 or -0, depending on the sign bit)
If c = 0 and s != 0: The formula is (-1)^(sign) * 2^(-126) * (0.s)(bin)

Example (sign = 0, c = 0, s = 00000000000000000000001):
2^-126 * 2^-23 = 2^-149 ≈ 1.4013 * 10^-45
This is the smallest positive number that can be represented.

If c = 255 and s = 0: "(-1)^(sign) * ∞" (--> -∞ or ∞)
If c = 255 and s != 0: NaN (not a number, e.g. the result of sqrt(-2), 0/0 or 0 * ∞)
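
The four special cases together with the normalised case can be put into one small decoder. Again this is just a sketch in Python, and decode_bits is a hypothetical name. It takes the 32 bits as an integer and follows the case distinction listed above.

```python
import math

def decode_bits(bits):
    """Decode a 32-bit integer as IEEE 754 single precision, special cases included."""
    sign = bits >> 31
    c = (bits >> 23) & 0xFF
    s = bits & 0x7FFFFF
    factor = -1.0 if sign else 1.0     # (-1)^sign
    if c == 255:
        # s == 0: infinity with sign, otherwise NaN
        return factor * math.inf if s == 0 else math.nan
    if c == 0:
        # zero or denormalised: exponent fixed at -126, no implicit leading 1
        return factor * 2.0 ** -126 * (s / 2**23)
    # normalised case
    return factor * 2.0 ** (c - 127) * (1 + s / 2**23)

print(decode_bits(0x00000001))   # smallest positive number, about 1.4013e-45
print(decode_bits(0xFF800000))   # -inf
print(decode_bits(0x7FC00000))   # nan
```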

Sources: Study notes from university

H4R0015K · 02-17-2013, 05:14 PM

how to convert float numbers to binary?

Deque · 02-18-2013, 10:01 AM

(02-17-2013, 05:14 PM)H4R0015K Wrote: how to convert float numbers to binary?

For example, for 19.625 convert 19 and 0.625 separately.
You know how to convert 19 to binary, so I won't discuss that.
0.625 has to be multiplied by 2. If the result is >= 1.0, you note a 1 and go on with the fractional part only. If it is < 1.0, you note a 0. If the result is exactly 1, you can stop. Otherwise keep going until you have enough bits.

0.625 * 2 = 1.250 --> note 1 and go on with 0.25
0.25 * 2 = 0.5 --> note 0 and go on with 0.5
0.5 * 2 = 1 --> note 1 and stop

The result is: 0.625 (dec) = 0.101 (bin)

Just add the result of 19 (=10011) to it, to get 19.625.
--> 10011 + 0.101 = 10011.101
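
The multiply-by-2 procedure can be sketched in Python like this. fraction_to_binary is a made-up helper name, and the max_bits limit of 23 is my own assumption so that the loop always ends. The fraction is passed as a string and handled with exact arithmetic, so the digits are not distorted by the float type itself.

```python
from fractions import Fraction

def fraction_to_binary(decimal_string, max_bits=23):
    """Convert a decimal fraction 0 <= f < 1 (given as a string) to binary digits."""
    f = Fraction(decimal_string)       # exact value, e.g. "0.625" -> 5/8
    digits = []
    while f != 0 and len(digits) < max_bits:
        f *= 2
        if f >= 1:
            digits.append("1")         # note 1 and go on with the fractional part
            f -= 1
        else:
            digits.append("0")         # note 0
    return "0." + ("".join(digits) or "0")

fraction_part = fraction_to_binary("0.625")
print(fraction_part)                           # 0.101
print(format(19, "b") + fraction_part[1:])     # 10011.101
```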

Other example:
0.2 * 2 = 0.4 --> note 0
0.4 * 2 = 0.8 --> note 0
0.8 * 2 = 1.6 --> note 1
0.6 * 2 = 1.2 --> note 1
0.2 * 2 = 0.4 --> now you realize that this is a loop which goes on forever. Just stop here

The result is 0.2 (dec) = 0.00110011001100110011 ...
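
Spotting such a loop can also be automated: if a fractional part shows up a second time, the digits produced since its first appearance repeat forever. The sketch below (Python, with the hypothetical name repeating_binary) remembers every fractional part it has seen and returns the non-repeating digits and the repeating block.

```python
from fractions import Fraction

def repeating_binary(decimal_string):
    """Return (prefix, cycle) of the binary digits after the point for 0 <= f < 1."""
    f = Fraction(decimal_string)
    seen = {}                  # fractional part -> position where it first appeared
    digits = []
    while f != 0 and f not in seen:
        seen[f] = len(digits)
        f *= 2
        if f >= 1:
            digits.append("1")
            f -= 1
        else:
            digits.append("0")
    if f == 0:
        return "".join(digits), ""         # terminates, nothing repeats
    start = seen[f]                        # the loop begins here
    return "".join(digits[:start]), "".join(digits[start:])

print(repeating_binary("0.2"))     # ('', '0011')
print(repeating_binary("0.625"))   # ('101', '')
```

For 0.2 the block 0011 repeats without end, so the value has to be cut off to fit into the 23 bits of s.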

This dec to bin converter will also calculate floats: http://www.mathsisfun.com/binary-decimal...erter.html

ArkPhaze · 05-23-2013, 10:18 AM

I posted a solution in C for converting float to binary representation: http://www.hackcommunity.com/Thread-Cont...#pid140626