Floating Point Trouble! [Information] - Sinisterly, Forum: C, C++, & Obj-C
Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013
Comparing Floating Point Values
Alright, I'm going to keep this short, because there's honestly too much to write about this topic, and I don't have the time or patience to explain every detail. The truth is that floating point datatypes are not as simple as people seem to think; floating point math is not easy! Most people think they understand their datatypes, yet I can guarantee that most don't know as much as they think. Here are some basic rules, though, that if followed should keep you on the right path, whether they are obvious to you or not:
Let's go through a quick example:
Code: const float epsilon = 0.1f;
Okay, nothing suspicious... Now let's try the same code, slightly modified for (a little) more precision:
Code: const float epsilon = 0.1f;
(The values seen previously were actually being rounded.) It's easier to see with breakpoints, though, so here's some other code I had:
[screenshot]
NOTE: I pinned the values of d1 and d2 after the breakpoint was hit, and as you can see they are not exactly the same. Also shown is the suggested method of comparing floating point values for equality; I modified that code further to compare the traditional == operator against this epsilon method, here:
[screenshot]
And the results:
[screenshot]
Surprising? Or maybe not, because we know what those values really are and can see that they are not equal. Had you not known the exact values, though, your initial guess might have been different.

Commonly, we know that an int is usually 4 bytes (32 bits), a short is a 16-bit integer (2 bytes in memory), and a 64-bit integer (long long, or long on most 64-bit Unix-like systems) is 8 bytes in memory. It's not difficult to tell what the byte array for these integral datatypes would look like... What about floating point datatypes? Unlike integers, they can represent fractional values, but they are not meant to be exact. I wrote a bit of code in C for checking the bytes of a double:
Code: // 0xE2 0x06 0x7C 0x7E 0x98 0x51 0x7B 0x40
NOTE: The same code can be modified to view the bytes of other datatypes as well.

The 32-bit floating point datatype consists of 1 bit reserved for the sign, 8 bits reserved for the exponent, and 23 bits reserved for what is known as the "mantissa" (significand) field. This type is usually associated with the float keyword in most languages. The 64-bit double uses the same scheme with 1 sign bit, 11 exponent bits, and 52 mantissa bits. For a more in-depth analysis of these datatypes, you can look into the IEEE 754-1985 standard.
Just as a mention, this is not a bug either; it is simply how the specification says these values are stored in memory. The error comes from having to fit a value into a fixed number of bits: many decimal fractions, such as 0.1, have no exact binary representation, so they are rounded to the nearest value that does fit, and the stored data ends up slightly "off." You will find the same issue in other languages as well, because it is inherent to the datatype itself and not to any other factor. If you are now discouraged from using floating point datatypes at all because you consider them "unreliable," let me say this, as I mentioned at the beginning of this post: if you intend to use floating point anything as a means of calculating precise values, that would be your first mistake, and it is generally advised not to use floating point datatypes for currency for this reason. They are perfectly fine for estimated values, though, where you only need to be in the ballpark.
edit: For float you can use FLT_EPSILON btw (and DBL_EPSILON for double). And for those of you who do not know, epsilon is just a one-word way of saying "small value."
~ArkPhaze

RE: Floating Point Trouble! - Deque - 05-22-2013
That's a good write-up, and indeed a topic most people struggle with. We had a challenge about that, so that people can learn more about it by writing their own converter: http://www.hackcommunity.com/Thread-Contest-floating-point-converter
I've also known for a few days (you know I am learning C# atm) that C# has a decimal data type which is exactly for the cases where you need precision. That's quite nice.
For your thread: you could consider adding a tutorial tag. I know it is not a tutorial, but right now it looks like your thread is a question or a problem you have, which is misleading.
Edit: Interesting facts for the programmers who still think this is nothing serious:
Taken from: http://introcs.cs.princeton.edu/java/91float/
Quote: Real-world numerical catastrophes.
Our professor told us about these catastrophes in the very first semester to make us pay attention, since we are the people who will make (or avoid) such mistakes, and we may have to take responsibility for such catastrophes if we don't do it right.

RE: Floating Point Trouble! - ArkPhaze - 05-22-2013
Quote: I've also known for a few days (you know I am learning C# atm) that C# has a decimal data type which is exactly for the cases where you need precision. That's quite nice.
It's what you should use for currency instead of a datatype like double. You will see lots of people using double, though... I even helped someone with a program that dealt with financial management for his/her course, and the instructor told them to use System.Double. I can add the Tutorial tag now... I usually don't use tags because I'm not used to them being available. I think its existence, without any real warning attached to its use, lol, causes problems like that. It comes from people who think they know what they are doing, because they expect things to work and to be simple.
I could have written this a bit better, though. I was just fooling around in my IDE, and having been promoted to secondary leader of the dev group, I thought I should try to contribute something. It's a concept that I keep in the back of my mind all the time when programming for accuracy in calculations. It wasn't always that way, though; it took one project, a big headache, and some research of my own to find out what was happening. I haven't forgotten it since, because experience is the best way of learning...
edit: I guess I do show a proper way of comparing floating point datatypes, so I'll consider it a tutorial/informational thread. Originally I just posted this with the intention of passing on information to others who don't fully understand floating point datatypes.
edit2: And somehow I missed that challenge, so maybe I'll try it later.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013
Very informative post, sir. I have a doubt; see the following:
Code: double d = 0.0;
In printf the format specifier you used is %f; shouldn't this be %lf in C (since this is C code)? In the second code you have also used %.16f instead of %.16lf; since you are using the double datatype, %lf should be used.

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013
(05-22-2013, 05:58 AM) Psycho_Coder Wrote: Very informative post, sir. I have a doubt; see the following:
I read that %lf is for long double based on the C99 standard. There seems to be confusion between C89 and C99, depending on what you read out there...
Source: Quote: ISO/IEC 9899, second edition (the C99 Standard), sections 7.19.6.1 and 7.19.6.2.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013
(05-22-2013, 06:14 AM) ArkPhaze Wrote: (05-22-2013, 05:58 AM) Psycho_Coder Wrote: Very informative post, sir. I have a doubt; see the following:
Well, %f is for float, %lf is for double, and %Lf is for long double; I knew this. Here, have a look: http://www.ethernut.de/nutwiki/Output_Format_Specifiers
When we have to use scanf then for double we have to give %f and according to C99 %Lf is used for long double. Here: http://www.cplusplus.com/reference/cstdio/printf/

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013
Actually, by the C99 standard, %lf is undefined... The l length modifier only applies to the following conversion specifiers: d, i, o, u, x, or X.
l (ell): Quote: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.
Edit: Further: Quote: If a length modifier appears with any conversion specifier other than as specified above, the behavior is undefined.
You don't use %lf...
edit: That bit of text also verifies that rounding does happen, which is why I had to loosen up on the precision for the number of decimal places.
In C11: Quote: %lf conversion specifier allowed in printf
The official meaning of "l" hasn't changed, though, and %f still means double.

RE: Floating Point Trouble! [Information] - Psycho_Coder - 05-22-2013
On most of the sites I have read, %lf is specifically to be used for double:
Code: double radius;

RE: Floating Point Trouble! [Information] - ArkPhaze - 05-22-2013
(05-22-2013, 07:00 AM) Psycho_Coder Wrote: On most of the sites I have read, %lf is specifically to be used for double
Regardless of what these sites say, I'm taking my source directly from the C99 and C11 specifications. Only C11 says that %lf is allowed, yet that doesn't mean anything has changed: the meaning of the l length modifier has not changed since C99, and %f still means double.
@"Psycho_Coder" Quote: When we have to use scanf then for double we have to give %f and according to C99 %Lf is used for long double. Here: http://www.cplusplus.com/reference/cstdio/printf/
I'm also not using a long double; I'm only using a double.

RE: Floating Point Trouble! [Information] - Deque - 05-22-2013
ArkPhaze is correct. printf doesn't take any float, so it doesn't have a format specifier for one: if you give a float to printf, it is converted to double. So you use %f for double. The internet is a bitch when it comes to finding correct information and examples. It is always advisable to look into the language specifications.
Edit: Download C99 here: http://www.open-std.org/jtc1/sc22/wg14/www/standards.html