systemhalted by Palak Mathur

IEEE 754 Doubles - The Numbers That Lie With A Straight Face


In Java, double feels like a real number. You write 1.0, the compiler nods, the program runs, and everything looks fine. Until it does not.

Take this tiny example:

List<Double> values = List.of(
    1e16, 1.0, 1.0, 1.0, 1.0
);

double s1 = values.stream().reduce(0.0, Double::sum);
double s2 = values.parallelStream().reduce(0.0, Double::sum);

System.out.println(s1);
System.out.println(s2);

You might see

1.0E16
1.0000000000000004E16

Same data, same operation, different result.

To understand why, you have to stop thinking of doubles as numbers and start thinking of them as compressed approximations of numbers with strict rules and sharp edges.

Doubles are just bit patterns

An IEEE 754 double is 64 bits laid out like this:

  1. One bit for the sign
  2. Eleven bits for the exponent
  3. Fifty two bits for the fraction (often called the mantissa or significand)

That layout represents numbers of the form:

(-1)^sign × 1.fraction_bits × 2^(exponent - bias)

There is no infinite continuum here. There is a huge but finite set of exactly representable values. Between any two nearby doubles, there is literally nothing.
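
If you want to see the layout for yourself, Double.doubleToLongBits hands you the raw 64 bits, and a few shifts and masks pull the fields apart. A minimal sketch; the variable names are mine, not part of any API:

long bits = Double.doubleToLongBits(0.1);

long sign     = (bits >>> 63) & 0x1L;            // 1 bit
long exponent = (bits >>> 52) & 0x7FFL;          // 11 bits, stored with a bias of 1023
long fraction = bits & 0x000FFFFFFFFFFFFFL;      // 52 bits of significand

System.out.println(sign);             // 0
System.out.println(exponent - 1023);  // -4, because 0.1 is about 1.6 × 2^-4
System.out.println(Long.toBinaryString(fraction));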

So when you write:

double x = 0.1;

Java does its best to find the closest representable double to 0.1. But 0.1 in base 10 is a repeating fraction in base 2, just like 1/3 is repeating in base 10. There is no exact binary representation. The runtime rounds to the nearest representable double and moves on.

If you print it with enough digits, you see the approximation leak through:

System.out.printf("%.20f%n", 0.1);

You will get something like:

0.10000000000000000555

The lie is small, but it is always there.
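
If you want the whole truth at once, new BigDecimal(double) converts the stored bit pattern exactly, not the decimal literal you typed, so it shows the value Java is actually working with:

System.out.println(new BigDecimal(0.1));
// 0.1000000000000000055511151231257827021181583404541015625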

Why 1e16 + 1 == 1e16

Back to our friend 1e16.

At that scale, the distance between adjacent representable doubles is larger than 1. Think of the number line as a ladder. Near zero the rungs are very close. Near 1e16 the rungs are far apart. If adding 1 does not reach the next rung, the result rounds back to the same double.

In practice:

double a = 1e16;
double b = a + 1.0;

System.out.println(a == b); // true

It looks absurd, but it is perfectly legal in IEEE land. You are not adding real numbers. You are adding approximations and rounding the result back into the finite set of doubles.
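
You do not have to take the ladder metaphor on faith. Math.ulp reports the gap between a double and the next representable one, which is exactly the rung spacing at that height:

System.out.println(Math.ulp(1.0));   // 2.220446049250313E-16
System.out.println(Math.ulp(1e16));  // 2.0, so + 1.0 can never reach the next rung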

Now imagine how this interacts with summation order.

Sequential stream:

double s1 = values.stream().reduce(0.0, Double::sum);

Evaluation order is effectively:

((((0 + 1e16) + 1) + 1) + 1) + 1

After the first addition you have 1e16. Each subsequent + 1 is below the resolution of the ladder at that height, so it keeps snapping back to 1e16. End result: 1.0E16.

Parallel stream:

double s2 = values.parallelStream().reduce(0.0, Double::sum);

Now the framework is allowed to regroup:

  1. One thread might sum 1e16 + 1.0 to get 1e16.
  2. Another might sum 1.0 + 1.0 + 1.0 to get 3.0.
  3. Then it adds 1e16 + 3.0.

Depending on the exact rounding behavior, that last addition might actually hit the next rung on the ladder and give:

1.0000000000000004E16

Same math on paper. Different rounding path in silicon.

You have just met the most important fact about floating point arithmetic:

Associativity is broken in practice.

Mathematically:

(a + b) + c == a + (b + c)

In floating point, (a + b) + c and a + (b + c) can differ by a few bits.
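
You can watch it happen with the same numbers from the stream example, changing nothing but the grouping:

double left  = (1e16 + 1.0) + 1.0;  // each + 1.0 rounds back down to 1e16
double right = 1e16 + (1.0 + 1.0);  // 2.0 is exactly one rung, so it sticks

System.out.println(left);           // 1.0E16
System.out.println(right);          // 1.0000000000000002E16
System.out.println(left == right);  // false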

Equality with doubles is a trap

This is why direct equality checks with doubles are dangerous:

double x = 0.1 + 0.2;
double y = 0.3;

System.out.println(x == y); // false

The left side is “nearest double to (0.1 + 0.2) after two rounding steps”. The right side is “nearest double to 0.3 after one rounding step”.

You are really comparing two approximations that reached the neighborhood of 0.3 via different routes. The neighborhood is small, but the routes do not always end at the same exact bit pattern.
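
You can see the two routes ending one bit apart by printing the raw patterns of the x and y from above:

System.out.println(Long.toHexString(Double.doubleToLongBits(x)));  // 3fd3333333333334
System.out.println(Long.toHexString(Double.doubleToLongBits(y)));  // 3fd3333333333333

One unit in the last place apart, which is all it takes for == to say no.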

The usual advice is:

  1. Compare with an epsilon, for example Math.abs(x - y) < 1e-9, tuned to your domain (a small sketch follows below).
  2. Or avoid equality checks altogether and reason in ranges, ratios or integers when possible.
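
A minimal sketch of the epsilon approach; nearlyEqual is just an illustrative helper, and 1e-9 is a placeholder tolerance, not a universal constant:

static boolean nearlyEqual(double a, double b, double epsilon) {
    // absolute tolerance: fine for values of moderate magnitude,
    // very large or very tiny values usually want a relative tolerance instead
    return Math.abs(a - b) < epsilon;
}

System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9));  // true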

Why parallel reduce cares about associativity

The Java Streams API assumes your reduction operator is associative and has a proper identity. On paper, Double::sum with the identity 0.0 satisfies both requirements.

The parallel stream uses this to split the work:

  1. Compute partial sums in parallel.
  2. Combine partials in arbitrary order.

From basic math, this is fine.

From IEEE 754 reality, it means you get “approximately the same result most of the time, with possible tiny differences depending on grouping, thread scheduling and platform”.

In most business code that is acceptable. In some domains it is not.

If you need reproducible sums independent of order, you have options.

Use BigDecimal:

BigDecimal sum = values.stream()
.map(BigDecimal::valueOf)
.reduce(BigDecimal.ZERO, BigDecimal::add);

This gives you honest base 10 arithmetic at the cost of performance.

Or implement a compensated summation like Kahan’s algorithm, possibly in a custom Collector, to reduce error accumulation.
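
A minimal sketch of Kahan summation as a plain loop (a custom Collector would carry the same two pieces of state); kahanSum is just an illustrative name:

static double kahanSum(List<Double> values) {
    double sum = 0.0;
    double compensation = 0.0;         // low-order bits lost in previous additions
    for (double v : values) {
        double y = v - compensation;   // re-inject what was lost last time
        double t = sum + y;
        compensation = (t - sum) - y;  // what just got rounded away
        sum = t;
    }
    return sum;
}

For the list at the top of the post, this comes out as 1.0000000000000004E16, which in this case is the exact mathematical sum.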

Or structure your algorithm so you sum small magnitude values first and large values later. That makes the ladder problem less vicious.
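
A sketch of that ordering trick with streams (Comparator here is java.util.Comparator):

double ordered = values.stream()
        .sorted(Comparator.comparingDouble(Math::abs))
        .reduce(0.0, Double::sum);

For our example list, the four 1.0 values combine into 4.0 first, and 1e16 + 4.0 lands exactly on a rung, so the ladder never gets a chance to swallow anything.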

None of these fix floating point. They just manage its tradeoffs more consciously.

Doubles are not broken, they are engineered

It is tempting to call all this a bug. It is not.

The IEEE 754 design is a compromise between range, precision, performance and hardware simplicity. Doubles give you:

  1. A huge dynamic range, from tiny numbers around 1e-308 up to around 1e308.
  2. About 15 to 17 decimal digits of precision.
  3. Fast operations supported directly by the CPU.

The cost is:

  1. Not all decimal fractions are exact.
  2. Rounding happens all the time.
  3. Algebraic laws like associativity and distributivity become “mostly true, but not guaranteed”.

Once you internalize that, your mental model shifts.

You stop thinking “the computer is bad at math”.

You start thinking “the computer is doing carefully specified approximate math on a discrete set of representable values, and I need to respect that contract”.

Where this matters in real systems

In an auto loan system, this can show up in subtle ways.

You might compute a customer’s total interest by:

  1. Summing daily interest over the life of the loan.
  2. Summing per period interest.
  3. Using a closed form formula.

All three approaches should be equivalent mathematically, but can drift by a few cents because of rounding and summation order. Then you run in parallel, or refactor a loop into a stream, or change the order in which fees and interest are applied, and suddenly some accounts are off by a cent.

Nothing catastrophic, but enough to fail reconciliation tests.

Once you remember that doubles are approximations and order matters, you design differently:

  1. Use BigDecimal for money, as sketched below.
  2. Keep calculations stable and deterministic.
  3. Avoid relying on bit perfect equality when the underlying math is continuous.
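
Concretely, the first habit looks something like this; the scale and rounding mode are illustrative choices, not a prescription:

BigDecimal payment  = new BigDecimal("199.99");
BigDecimal interest = new BigDecimal("3.275");

BigDecimal total = payment.add(interest)
        .setScale(2, RoundingMode.HALF_UP);  // round to cents explicitly, once

System.out.println(total);  // 203.27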

And when your parallelStream() sum differs from your sequential sum in the thirteenth decimal place, you do not panic. You smile a little and think:

Floating point kept its side of the bargain. I just finally started reading the fine print.

Computer Science   Programming   Software Engineering   Technology