Floating Point Trouble! [Information]

Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013

Comparing Floating Point Values

Alright, I'm keeping this a short note, because there is honestly too much to write about this topic, and I don't have the time or patience to explain all of the details. The truth is that floating point datatypes are not as simple as people seem to think; floating point math is not easy! Most people think they understand their datatypes, yet I can guarantee that most don't know as much as they think. Here are some basic rules that, if followed, should keep you on the right path, whether they are obvious to you or not:

1. Don't use floating point data types when you need exact values, on any scale.
2. Don't use floating point data types for currency!
3. [Insert any other rules that you can probably derive from 1. or 2. here]

Let's go through a quick example:

Code:
#include <stdio.h>

int main(void)
{
    const float epsilon = 0.1f;
    printf("2: %f\n", epsilon * 6);
    getchar();

    double d = 0.0;
    for (int i = 0 ;; i++)
    {
        d += 0.1;
        printf("%f\n", d);
        getchar(); // press Enter for the next value
    }
}

Okay, nothing suspicious... Now let's try the same code, slightly modified to print (a little) more precision:

Code:
#include <stdio.h>

int main(void)
{
    const float epsilon = 0.1f;
    printf("2: %.16f\n", epsilon * 6);
    getchar();

    double d = 0.0;
    for (int i = 0 ;; i++)
    {
        d += 0.1;
        printf("%.16f\n", d);
        getchar(); // press Enter for the next value
        // Notice anything strange after a few iterations? ;)
    }
}

(The previous values seen were actually being rounded.)

It's easier to see with a breakpoint though, so here's some other code I had:
[Image: SgK7vNP.png]

NOTE: I pinned the values of d1 and d2 after the breakpoint was hit, and as you can see they are not exactly the same. Also shown is the suggested method of comparing floating point values for equality. I then modified that code further to compare the traditional == operator against this epsilon method, here:

[Image: HmEpLPf.png]

And the results:
[Image: J3BKA10.png]

Surprising? Or maybe not, because we know what those values really are and can see that they are not equal? :whistle: Assuming you didn't know the exact values, perhaps your initial guess would have been different though.

Commonly, a regular int is 4 bytes (32 bits), a short is 2 bytes (16 bits), and a 64 bit integer (long long, or long on most 64 bit Unix-like systems) is 8 bytes in memory. The exact sizes are up to the platform, but it's not difficult to tell what the values represented as a byte array would be for these integral datatypes...

What about floating point datatypes? They are not meant to be exact. Unlike integers, they can hold a fractional part as well as a whole number part, and they pay for that with precision. I wrote a bit of code in C for checking the bytes of a double:

Code:
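#include <stdio.h>

int main(void)
{
    // Output seen (least significant byte first, i.e. little-endian):
    // 0xE2 0x06 0x7C 0x7E 0x98 0x51 0x7B 0x40
    double n = 437.09973;
    printf("%f. Byte Values:\n", n);

    // Any object may be inspected through an unsigned char pointer.
    const unsigned char* bytes = (const unsigned char*)&n;
    for (size_t j = 0; j < sizeof(n); j++)
    {
        printf("0x%.2X ", bytes[j]);
    }
    printf("\n");
    return 0;
}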

NOTE: The same can be modified to view the bytes of some other datatypes as well.

The 32 bit floating point datatype consists of 1 bit reserved for the sign, 8 bits reserved for the exponent, and 23 bits reserved for what is known as the "mantissa" field. This type is usually associated with the float keyword in most languages. For a more in depth analysis of this data type, you can look into the IEEE 754-1985 standard.

Just as a mention, this is not a bug either, but rather just the specification behind the way they are stored in memory. A value like 0.1 has no finite representation in binary, so it has to be rounded to the nearest value that fits in the available bits, which leaves the stored data slightly "off." You will find the same issue in other languages as well, because it comes from the datatype's format itself and no other factor.

If you are now discouraged from using floating point datatypes at all because you consider them "unreliable," let's say this, as I mentioned in the beginning of this post: if you intend to use floating point anything as a means of calculating exact values, that is your first mistake. This is also why it is usually not suggested to use floating point data types for currency.

They are perfectly fine for estimated values though, where you only need to be in the ballpark.

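edit: For float you can use FLT_EPSILON (from <float.h>) btw. And for those of you who do not know, epsilon is just a one word way of saying "small value." FLT_EPSILON specifically is the gap between 1.0f and the next representable float.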

~ArkPhaze

RE: Floating Point Trouble! - Deque - 05-22-2013

That's a good write up and indeed a topic most people struggle with.
We had a challenge about that, so that people can learn more about it by writing their own converter: http://www.hackcommunity.com/Thread-Contest-floating-point-converter
I have also known for a few days (you know I am learning C# atm) that C# has a decimal data type which is exactly for the cases where you need precision. That's quite nice.

For your thread: You can consider adding a tutorial tag. I know it is not a tutorial, but now it looks like your thread was a question or a problem you have which is misleading.

Edit: Interesting facts for the programmers who still think this is nothing serious:
Taken from: http://introcs.cs.princeton.edu/java/91float/

Quote:Real-world numerical catastrophes.
Ariane 5 rocket. Ariane 5 rocket exploded 40 seconds after being launched by European Space Agency. Maiden voyage after a decade and 7 billion dollars of research and development. Sensor reported acceleration that was so large that it caused an overflow in the part of the program responsible for recalibrating inertial guidance. 64-bit floating point number was converted to a 16-bit signed integer, but the number was larger than 32,767 and the conversion failed. Unanticipated overflow was caught by a general systems diagnostic and dumped debugging data into an area of memory used for guiding the rocket's motors. Control was switched to a backup computer, but this had the same data. This resulted in a drastic attempt to correct the nonexistent problem, which separated the motors from their mountings, leading to the end of Ariane 5.

Patriot missile accident. On February 25, 1991 an American Patriot missile failed to track and destroy an Iraqi Scud missile. Instead the Scud hit an Army barracks, killing 26 people. The cause was later determined to be an inaccurate calculation caused by measuring time in tenths of a second. Couldn't represent 1/10 exactly since it used 24 bit floating point. Software to fix problem arrived in Dhahran on February 26.

Intel FDIV bug. Error in Pentium hardware floating point divide circuit. Discovered by Intel in July 1994, rediscovered and publicized by math professor in September 1994. Intel recall in December 1994 cost $300 million. Another floating point bug discovered in 1997.

Sinking of Sleipner oil rig. Sleipner A, a $700 million platform for producing oil and gas, sprang a leak and sank in North Sea in August, 1991. An inaccurate finite element approximation underestimated shear stress by 47%.

Vancouver stock exchange. Vancouver stock exchange index was undervalued by over 50% after 22 months of accumulated roundoff error. The obvious algorithm is to add up all the stock prices after each trade. Instead a "clever" analyst decided it would be more efficient to recompute the index by adding the net change of a stock after each trade. This computation was done using four decimal places and truncating (not rounding) the result to three.

Our professor told us about these catastrophes in the very first semester to make us pay attention, as we are the people who will make or avoid such mistakes, and who may have to take responsibility for such catastrophes if we don't do it right.

RE: Floating Point Trouble! - ArkPhaze - 05-22-2013

Quote:I have also known for a few days (you know I am learning C# atm) that C# has a decimal data type which is exactly for the cases where you need precision. That's quite nice.

It is what you should use for currency instead of a datatype like double. You will see lots of people using double though... I even helped someone with a program that dealt with financial management for his/her course, and the instructor told them to use System.Double.

I can add the Tutorial tag now... I usually don't use them because I'm not used to them being available.

I think its existence, without any real warning about its use lol, causes problems like that. It comes from people who think they know what they are doing, because they expect things to work and to be simple.

I could have written this a bit better though. I was just fooling around in my IDE, and having been promoted to secondary leader of the dev group I thought I should try to contribute something. It's a concept that I have in the back of my mind all the time when programming for accuracy in calculations. That wasn't always the case though: it took one project, a big headache, and some research of my own to find out what was happening. I haven't forgotten it since, because experiences are the best way of learning...

edit:
I guess I do show a proper way of comparing floating point datatypes, so I'll consider it a tutorial/informational thread. Originally I just posted this with the intention of handing off information to others who don't fully understand floating point datatypes.

edit2: And somehow I missed that challenge, so maybe I'll try it later.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013

Very informative post sir. I have a doubt, see the following :-

Code:
double d = 0.0;
for (int i = 0 ;; i++)
{
    d += 0.1;
    printf("%f\n", d);
    getchar();
}

In printf the format specifier you used is %f. Shouldn't this be %lf, since this is C code? In the second code too you have used %.16f instead of %.16lf. Since you are using the double datatype, %lf should be used.

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013

(05-22-2013, 05:58 AM)Psycho_Coder Wrote: Very informative post sir. I have a doubt, see the following :-

Code:
double d = 0.0;
for (int i = 0 ;; i++)
{
    d += 0.1;
    printf("%f\n", d);
    getchar();
}

In printf the format specifier you used is %f. Shouldn't this be %lf, since this is C code? In the second code too you have used %.16f instead of %.16lf. Since you are using the double datatype, %lf should be used.

I read that %lf is for long double based on the C99 standard. There seems to be confusion between C89 and C99, depending on what you read out there.

Source:

Quote:ISO/IEC 9899, second edition (the C99 Standard), sections 7.19.6.1 and 7.19.6.2.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013

(05-22-2013, 06:14 AM)ArkPhaze Wrote:

I read that %lf is for long double, you could be right though. There seems to be confusion between C89 and C99, depending on what you read out there.

Well, %f is for float, %lf is for double, and %Lf is for long double; I knew this. Here, have a look: http://www.ethernut.de/nutwiki/Output_Format_Specifiers

When we have to use scanf then for double we have to give %f and according to C99 %Lf is used for long double. Here :- http://www.cplusplus.com/reference/cstdio/printf/

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013

Actually, by the C99 standard, the l in %lf has no effect... The l length modifier only changes the meaning of the following conversion specifiers: d, i, o, u, x, or X (and n, c, and s, as quoted below).

l (ell):

Quote:Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.

Edit: Further:

Quote:If a length modifier appears with any conversion specifier other than as specified above, the behavior is undefined

You don't use %lf...

[Image: aJ4dm5u.png]

edit: That bit of text also verifies that rounding does happen, which is why I had to loosen up on the precision for the number of decimal places.

in C11:

Quote:%lf conversion specifier allowed in printf

Although the official meaning of "l" hasn't changed, and %f still means double.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013

On most of the sites I have read that %lf is specifically to be used for double

Code:
double radius;
int length;
scanf("%lf%d", &radius, &length);
scanf("%lf%d", &radius, &length);
printf("%lf  %d", radius, length);

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013

(05-22-2013, 07:00 AM)Psycho_Coder Wrote: On most of the sites I have read that %lf is specifically to be used for double

Code:
double radius;
int length;
scanf("%lf%d", &radius, &length);
scanf("%lf%d", &radius, &length);
printf("%lf %d", radius, length);

Regardless of what these sites say, I'm taking my source directly from the C99 and C11 specifications. Only C11 says that %lf is allowed, yet that doesn't mean anything has changed: the meaning of the l length modifier has not changed since C99, and %f still means double.

@"Psycho_Coder"

Quote:When we have to use scanf then for double we have to give %f and according to C99 %Lf is used for long double. Here :- http://www.cplusplus.com/reference/cstdio/printf/

I'm also not using a long double, I'm only using a double.

RE: Floating Point Trouble! [Information] - Deque - 05-22-2013

ArkPhaze is correct. printf never receives a float, so it doesn't have a format specifier for one. If you pass a float to printf, it is converted to double. So you use %f for double.

The internet is a bitch when it comes to finding correct information and examples. It is always advisable to look into language specifications.

Edit: Download C99 here: http://www.open-std.org/jtc1/sc22/wg14/www/standards.html