Floating-Point Constants - Advantages and Disadvantages

When you write a floating-point constant in a program, in which floating-point type does the program store it? By default, floating-point constants such as 8.24 and 2.4E8 are type double. If you want a constant to be type float, use an f or F suffix. For type long double, use an l or L suffix.

1.234f // a float constant
2.45E20F // a float constant
2.345324E28 // a double constant
2.2L // a long double constant

Floating-Point Advantages and Disadvantages
Floating-point numbers have two advantages over integers. First, they can represent values between integers. Second, because of the scaling factor, they can represent a much greater range of values. On the other hand, floating-point operations are slower than integer operations, at least on computers without math coprocessors, and you can lose precision. Listing 3.8 illustrates the last point.

Listing 3.8 fltadd.cpp
// fltadd.cpp -- precision problems with float
#include <iostream>
using namespace std;
int main()
{
 float a = 2.34E+22f;
 float b = a + 1.0f;
 
 cout << "a = " << a << "\n";
 cout << "b - a = " << b - a << "\n";
 return 0;
}

Compatibility Note
Some ancient C++ implementations based on pre-ANSI C compilers don't support the f suffix for floating-point constants. If you find yourself facing this problem, you can replace 2.34E+22f with 2.34E+22 and replace 1.0f with (float) 1.0.

The program takes a number, adds 1, and then subtracts the original number. That should result in a value of 1. Does it? Here is the output for one system:

a = 2.34e+022
b - a = 0

The problem is that 2.34E+22 represents a number with 23 digits to the left of the decimal point. By adding 1, you are attempting to add 1 to the 23rd digit in that number. But type float can represent only the first 6 or 7 digits in a number, so trying to change the 23rd digit has no effect on the value.

Classifying the Types
C++ brings some order to its basic types by classifying them into families. Types signed char, short, int, and long are termed signed integer types. The unsigned versions are termed unsigned integer types. The bool, char, wchar_t, signed integer, and unsigned integer types together are termed integral types or integer types. The float, double, and long double types are termed floating-point types. Integer and floating-point types collectively are termed arithmetic types.

C++ Arithmetic Operators
Perhaps you have warm memories of doing arithmetic drills in grade school. You can give that same pleasure to your computer. C++ uses operators to do arithmetic. It provides operators for five basic arithmetic calculations: addition, subtraction, multiplication, division, and taking the modulus. Each of these operators uses two values (called operands) to calculate a final answer. Together, the operator and its operands constitute an expression. For example, consider the following statement:

int wheels = 4 + 2;

The values 4 and 2 are operands, the + symbol is the addition operator, and 4 + 2 is an expression whose value is 6.

Here are C++'s five basic arithmetic operators:
  • The + operator adds its operands. For example, 4 + 20 evaluates to 24.
  • The - operator subtracts the second operand from the first. For example, 12 - 3 evaluates to 9.
  • The * operator multiplies its operands. For example, 28 * 4 evaluates to 112.
  • The / operator divides its first operand by the second. For example, 1000 / 5 evaluates to 200. If both operands are integers, the result is the integer portion of the quotient. For example, 17 / 3 is 5, with the fractional part discarded.
  • The % operator finds the modulus of its first operand with respect to the second. That is, it produces the remainder of dividing the first by the second. For example, 19 % 6 is 1, because 6 goes into 19 three times with a remainder of 1. Both operands must be integer types. If one of the operands is negative, the sign of the result depended on the implementation in older versions of C++; since C++11, the result takes the sign of the first operand.

Of course, you can use variables as well as constants for operands. Listing 3.9 does just that. Because the % operator works only with integers, we'll leave it for a later example.

Listing 3.9 arith.cpp
// arith.cpp -- some C++ arithmetic
#include <iostream>
using namespace std;
int main()
{
 float hats, heads;
 
 cout.setf(ios_base::fixed, ios_base::floatfield); // fixed-point
 cout << "Enter a number: ";
 cin >> hats;
 cout << "Enter another number: ";
 cin >> heads;
 
 cout << "hats = " << hats << "; heads = " << heads << "\n";
 cout << "hats + heads = " << hats + heads << "\n";
 cout << "hats - heads = " << hats - heads << "\n";
 cout << "hats * heads = " << hats * heads << "\n";
 cout << "hats / heads = " << hats / heads << "\n";
 return 0;
}

Compatibility Note
If your compiler does not accept the ios_base forms in setf(), try using the older ios forms instead; that is, substitute ios::fixed for ios_base::fixed, etc.

Here's sample output. As you can see, you can trust C++ to do simple arithmetic:

Enter a number: 50.25
Enter another number: 11.17
hats = 50.250000; heads = 11.170000
hats + heads = 61.419998
hats - heads = 39.080002
hats * heads = 561.292480
hats / heads = 4.498657

Well, maybe you can't trust it completely. Adding 11.17 to 50.25 should yield 61.42, but the output reports 61.419998. This is not an arithmetic problem; it's a problem with the limited capacity of type float to represent significant figures.

Remember, C++ guarantees just six significant figures for float. If you round 61.419998 to six figures, you get 61.4200, which is the correct value to the guaranteed precision. The moral is that if you need greater accuracy, use double or long double.

