Rounding to 13 digits

A while back I was porting the PRNG (pseudorandom number generator) from the video game Balatro into Rust to search for weird seeds. Along the way, I had to translate this line of Lua from the game's source:

tonumber(string.format("%.13f", ...))

Formatting to a string and parsing it back just to round a number feels wrong. Even if it weren't, it's not obvious that Rust's string formatting is guaranteed to match LuaJIT's exactly. I needed a faster equivalent. Quickly rounding a double to 13 decimal places sounds like it should be simple, but floating-point math rarely is.

One constraint worth noting upfront: the values being rounded are in the range [0, 1], so we only need to handle non-negative inputs — a restriction that matters for some of the approaches below.

The naive approach

The intuitive solution is to scale the number up, round it to the nearest integer, and scale it back down.

fn round13_naive(x: f64) -> f64 {
    (x * 1e13).round() / 1e13
}

To verify this against the game's logic, I used LuaJIT's FFI to monkey test the Rust implementation against the string.format behavior.

local ffi = require("ffi")

ffi.cdef("double round13_naive(double x)")
local lib = ffi.load("./libround13.so")

while true do
    local x = math.random()
    local output = lib.round13_naive(x)
    local expected = tonumber(string.format("%.13f", x))

    if output ~= expected then
        -- %.17f because that guarantees an exact round-trip
        print(string.format("x:        %.17f", x))
        print(string.format("output:   %.17f", output))
        print(string.format("expected: %.17f", expected))
        print("-----------------------------")
    end
end

And very quickly it found a lot of failures:

x:        0.16354471362765000
output:   0.16354471362770001
expected: 0.16354471362760001
-----------------------------
x:        0.89032982907944991
output:   0.89032982907949998
expected: 0.89032982907939995
-----------------------------
x:        0.91479517807684996
output:   0.91479517807690003
expected: 0.91479517807680000
-----------------------------
...

Let's look at the first failing case: x = 0.16354471362765000. If we print the exact representation of these numbers, the problem becomes obvious:

>>> x = 0.16354471362765000
>>> "%.50f" % x
'0.16354471362764999575745150650618597865104675292969'
>>> "%.50f" % (x * 1e13)
'1635447136276.50000000000000000000000000000000000000000000000000'

In memory, x is slightly less than ...65. Lua's string.format correctly looks at this exact value, sees ...64999..., and rounds down to ...6.

But our naive Rust code relies on x * 1e13. Because floating-point multiplication isn't perfectly precise, the result has to be rounded to the nearest representable f64. Here that implicitly rounds the intermediate value up to an exact .5, causing our explicit .round() to push the final result up to ...7.
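We can reproduce this failure directly in Rust (a standalone check; the asserted constants are the exact values printed above):

```rust
fn main() {
    let x = 0.16354471362765_f64;
    // In memory, x is 0.16354471362764999575..., but multiplying by 1e13
    // rounds the product up to an exact .5:
    assert_eq!(x * 1e13, 1635447136276.5);
    // round() then pushes that half up, landing one step too high:
    assert_eq!((x * 1e13).round() / 1e13, 0.1635447136277);
}
```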

More precision

The naive approach failed because x * 1e13 rounded the intermediate result. The straightforward fix is to just use more precision:

fn round13_f128(x: f64) -> f64 {
    ((x as f128 * 1e13).round() / 1e13) as f64
}

This is the same multiply-round-divide as the naive approach, but in f128 (a nightly-only Rust type at the time of writing). With 112 bits of significand instead of 52, the intermediate rounding error is far too small to affect the final result for our inputs. The monkey test ran for 30 minutes without finding a single mismatch.

It's a brute-force solution, though. f128 arithmetic is emulated in software on most platforms, making it slow and not SIMD-friendly. There's usually a faster way to get the same result by recovering the lost precision rather than avoiding it entirely.

Fused Multiply-Add

Fused multiply-add (FMA) computes x * a + b with a single rounding at the very end — internally, the full exact product is computed before adding b. In Rust, this is x.mul_add(a, b).
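A tiny standalone example (not from the game's code) makes that single rounding concrete: squaring 1 + 2^-52 produces an exact product that needs more than 53 bits, and mul_add can recover precisely the bit the plain multiply throws away.

```rust
fn main() {
    // x = 1 + 2^-52, the smallest f64 above 1.0
    let x = 1.0 + f64::EPSILON;
    // The exact square is 1 + 2^-51 + 2^-104, which doesn't fit in 53
    // significand bits, so the plain multiply rounds the 2^-104 term away...
    let rounded = x * x;
    // ...but mul_add subtracts from the full exact product before rounding,
    // exposing exactly the term that was lost:
    let err = x.mul_add(x, -rounded);
    assert_eq!(err, 2f64.powi(-104));
}
```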

FMA doesn't seem related to rounding at first, but the key insight is that we can use it to recover the precision lost by x * 1e13. The idea is to compute a tentative floor(x * 1e13), then use FMA to find the true remainder — the exact distance between that floor and the true (unrounded) product.

fn round13_fma(x: f64) -> f64 {
    let y = (x * 1e13).floor();
    let r = x.mul_add(1e13, -y);
    if r >= 0.5 {
        (y + 1.0) / 1e13
    } else {
        y / 1e13
    }
}

The key line is the FMA: x.mul_add(1e13, -y) computes x * 1e13 - y with the full exact product before rounding. Since y is the floor of x * 1e13 and can only underestimate the true product*, r is the fractional part of the true product. This fractional part lies in [0, 1), so it can be represented with much better relative precision than the full product, precise enough to reliably compare against 0.5 to decide the rounding direction.

Since r lies in [0, 1), r.round() gives 1.0 when r >= 0.5 and 0.0 otherwise, so we can simplify:

 fn round13_fma(x: f64) -> f64 {
     let y = (x * 1e13).floor();
     let r = x.mul_add(1e13, -y);
-    if r >= 0.5 {
-        (y + 1.0) / 1e13
-    } else {
-        y / 1e13
-    }
+    (y + r.round()) / 1e13
 }

Like the f128 approach, this passed 30 minutes of monkey testing without a single mismatch.

*Strictly speaking, x * 1e13 may round up before the floor, making y one too high. Then r is slightly negative (at most 0.5 ULP) which rounds to 0, so y + 0 gives the same result as (y - 1) + 1 would have.

Magic Rounding

The FMA approach works, but it still needs a floor, a separate round, and an addition. We can do better by exploiting a property of IEEE 754 doubles.

Recall that a double is essentially a sign, an exponent, and a 52-bit significand. When the exponent places a value in the range [2^52, 2^53), all 52 significand bits are used to represent the integer part, leaving no bits for a fractional component. This means every representable f64 in this range is an integer, and consecutive values differ by exactly 1.

So when you add 2^52 to a non-negative value less than 2^52, the sum lands in this integer-only range, and the hardware rounds it to the nearest integer for free. Subtracting 2^52 recovers the rounded value. For example, (6.7 + 2^52) - 2^52 gives 7.0 — the addition forces the .7 to be rounded away, and the subtraction shifts back down.
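The shifting trick on its own looks like this (a minimal sketch; the hardware's default round-to-nearest-even mode does the rounding, though no exact ties occur in these examples):

```rust
fn main() {
    const MAGIC: f64 = (1u64 << 52) as f64; // 2^52
    // Adding 2^52 forces the sum into the integer-only range, so the
    // fractional part is rounded away; subtracting shifts back down.
    assert_eq!((6.7 + MAGIC) - MAGIC, 7.0);
    assert_eq!((6.3 + MAGIC) - MAGIC, 6.0);
}
```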

By combining this with FMA, we get the best of both worlds: x.mul_add(1e13, MAGIC) computes the exact product x * 1e13 at full precision and then adds MAGIC with a single rounding, one that happens in the range where consecutive doubles are 1 apart, so it rounds to the nearest integer. Subtracting MAGIC and dividing back down gives us the result in one clean expression:

fn round13_magic(x: f64) -> f64 {
    const MAGIC: f64 = (1u64 << 52) as f64; // 2^52
    (x.mul_add(1e13, MAGIC) - MAGIC) / 1e13
}

This is just an FMA, a subtract, and a divide — no branches, no floor, no round. Note that this only works for non-negative inputs where x * 1e13 < 2^52, but our inputs are in [0, 1], well within this range. Like the previous approaches, it passed 30 minutes of monkey testing without a mismatch.
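As a quick spot check (far weaker than the monkey test, but self-contained), both finished implementations reproduce string.format's expected results on the failing inputs found earlier:

```rust
fn round13_fma(x: f64) -> f64 {
    let y = (x * 1e13).floor();
    let r = x.mul_add(1e13, -y);
    (y + r.round()) / 1e13
}

fn round13_magic(x: f64) -> f64 {
    const MAGIC: f64 = (1u64 << 52) as f64; // 2^52
    (x.mul_add(1e13, MAGIC) - MAGIC) / 1e13
}

fn main() {
    // (input, expected) pairs taken from the monkey test output above
    let cases = [
        (0.16354471362765, 0.1635447136276),
        (0.89032982907944991, 0.8903298290794),
        (0.91479517807684996, 0.9147951780768),
    ];
    for (x, expected) in cases {
        assert_eq!(round13_fma(x), expected);
        assert_eq!(round13_magic(x), expected);
    }
}
```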

Benchmarks

There are two axes to measure here: throughput and latency.

Throughput is how many values you can process per second given a batch of independent inputs — what matters when rounding millions of values in a loop.
Latency is how long a single call takes from input to output — what matters when each call depends on the previous one's result.

All benchmarks were compiled with -C target-cpu=native to ensure hardware FMA and rounding instructions are used. Without this, Rust emits libc function calls for round, floor, and mul_add, which are dramatically slower. Results below are from an i7-12700K.
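For reference, a minimal throughput loop might look like the following. This is only a sketch using std::time::Instant and std::hint::black_box, not the harness behind the numbers below, and it will be noisier than a proper benchmark:

```rust
use std::hint::black_box;
use std::time::Instant;

fn round13_magic(x: f64) -> f64 {
    const MAGIC: f64 = (1u64 << 52) as f64; // 2^52
    (x.mul_add(1e13, MAGIC) - MAGIC) / 1e13
}

fn main() {
    // A batch of independent inputs in [0, 1]
    let inputs: Vec<f64> = (0..1_000_000).map(|i| i as f64 / 1e6).collect();
    let inputs = black_box(inputs); // keep the compiler from precomputing
    let start = Instant::now();
    // Summing keeps the results live without storing them
    let sum: f64 = inputs.iter().map(|&x| round13_magic(x)).sum();
    let ops = inputs.len() as f64 / start.elapsed().as_secs_f64();
    println!("checksum {sum}, {ops:.3e} op/s");
}
```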

Throughput was measured with auto-vectorization enabled (AVX2, 4 doubles per instruction):

Approach    Throughput           Latency
tostring    13.3 million op/s    75 ns
Naive       2.35 billion op/s    6.7 ns
f128        17.0 million op/s    64 ns
FMA         2.34 billion op/s    10.0 ns
Magic       2.35 billion op/s    4.4 ns

The tostring approach is the slowest overall — string formatting and parsing dominate. The f128 approach is similarly slow since f128 arithmetic is emulated in software with no SIMD support. The other three have essentially identical throughput because every approach ends with a division by 1e13, and vdivpd (packed double division) is the bottleneck that hides all other work.

For latency, the magic approach has the shortest dependency chain — just an FMA, a subtract, and a divide. The naive approach is longer because Rust's round() (round half away from zero) compiles to multiple instructions. The FMA approach is the longest, needing a floor, an FMA remainder, and a round before the final divide.