Added "Fast Inverse Root"

This commit is contained in:
Mark 2025-02-11 10:09:58 -08:00
parent 62ce3ddaa7
commit 6d4127f5a5
7 changed files with 748 additions and 0 deletions

View File

@ -0,0 +1,28 @@
#import "@local/handout:0.1.0": *
// Bonus:
// - Floats vs fixed point
// - Float density
// - Find non-floatable rational numbers
// - What if we use `n`-bit floats?
#show: doc => handout(
doc,
group: "Advanced 2",
title: [Fast Inverse Square Root],
by: "Mark",
)
#include "parts/00 int.typ"
#pagebreak()
#include "parts/01 float.typ"
#pagebreak()
#include "parts/02 approx.typ"
#pagebreak()
#include "parts/03 quake.typ"
#pagebreak()
#include "parts/04 bonus.typ"

View File

@ -0,0 +1,7 @@
[metadata]
title = "Fast Inverse Square Root"
[publish]
handout = true
solutions = true

View File

@ -0,0 +1,96 @@
#import "@local/handout:0.1.0": *
= Integers
#definition()
A _bit string_ is a string of binary digits. \
In this handout, we'll denote bit strings with the prefix `0b`. \
That is, $1010 =$ "one thousand and one," while $#text([`0b1001`]) = 2^3 + 2^0 = 9$
#v(2mm)
We will separate long bit strings with underscores for readability. \
Underscores have no meaning: $#text([`0b1111_0000`]) = #text([`0b11110000`])$.
#problem()
What is the value of the following bit strings, if we interpret them as integers in base 2?
- `0b0001_1010`
- `0b0110_0001`
#solution([
- $#text([`0b0001_1010`]) = 2 + 8 + 16 = 26$
- $#text([`0b0110_0001`]) = 1 + 32 + 64 = 95$
])
#v(1fr)
#definition()
We can interpret a bit string in any number of ways. \
One such interpretation is the _unsigned integer_, or `uint` for short. \
`uint`s allow us to represent positive (hence "unsigned") integers using 32-bit strings.
#v(2mm)
The value of a `uint` is simply its value as a binary number:
- $#text([`0b00000000_00000000_00000000_00000000`]) = 0$
- $#text([`0b00000000_00000000_00000000_00000011`]) = 3$
- $#text([`0b00000000_00000000_00000000_00100000`]) = 32$
- $#text([`0b00000000_00000000_00000000_10000010`]) = 130$
#problem()
What is the largest number we can represent with a 32-bit `uint`?
#solution([
$#text([`0b01111111_11111111_11111111_11111111`]) = 2^(31)$
])
#v(1fr)
#pagebreak()
#problem()
Find the value of each of the following 32-bit unsigned integers:
- `0b00000000_00000000_00000101_00111001`
- `0b00000000_00000000_00000001_00101100`
- `0b00000000_00000000_00000100_10110000`
#hint([The third conversion is easy---look carefully at the second.])
#solution([
- $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$
- $#text([`0b00000000_00000000_00000001_00101100`]) = 300$
- $#text([`0b00000000_00000000_00000010_01011000`]) = 1200$
Notice that the third int is the second shifted left twice (i.e, multiplied by 4)
])
#v(1fr)
#definition()
In general, division of `uints` is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \
Division by powers of two, however, is incredibly easy: \
To divide by two, all we need to do is shift the bits of our integer right.
#v(2mm)
For example, consider $#text[`0b0000_0110`] = 6$. \
If we insert a zero at the left end of this bit string and delete the digit at the right \
(thus "shifting" each bit right), we get `0b0000_0011`, which is 3. \
#v(2mm)
Of course, we loose the remainder when we left-shift an odd number: \
$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`.
#problem()
Right shifts are denoted by the `>>` symbol: \
$#text[`00110`] #text[`>>`] n$ means "shift `0b0110` right $n$ times." \
Find the value of the following:
- $12 #text[`>>`] 1$
- $27 #text[`>>`] 3$
- $16 #text[`>>`] 8$
#solution[
- $12 #text[`>>`] 1 = 6$
- $27 #text[`>>`] 3 = 3$
- $16 #text[`>>`] 8 = 0$
]
#v(1fr)

View File

@ -0,0 +1,200 @@
#import "@local/handout:0.1.0": *
#import "@preview/cetz:0.3.1"
= Floats
#definition()
_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \
In base 10, we interpret place value as follows:
- $0.1 = 10^(-1)$
- $0.03 = 3 times 10^(-2)$
- $0.0208 = 2 times 10^(-2) + 8 times 10^(-4)$
#v(5mm)
We can do the same in base 2:
- $#text([`0.1`]) = 2^(-1) = 0.5$
- $#text([`0.011`]) = 2^(-2) + 2^(-3) = 0.375$
- $#text([`101.01`]) = 5.125$
#v(5mm)
#problem()
Rewrite the following binary decimals in base 10: \
#note([You may leave your answer as a fraction.])
- `1011.101`
- `110.1101`
#v(1fr)
#pagebreak()
#definition()
Another way we can interpret a bit string is as a _signed floating-point decimal_, or a `float` for short. \
Floats represent a subset of the real numbers, and are interpreted as follows: \
#note([The following only applies to floats that consist of 32 bits. We won't encounter any others today.])
#align(
center,
box(
inset: 2mm,
cetz.canvas({
import cetz.draw: *
let chars = (
`0`,
`b`,
`0`,
`_`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`_`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`_`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`_`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
`0`,
)
let x = 0
for c in chars {
content((x, 0), c)
x += 0.25
}
let y = -0.4
line((0.3, y), (0.65, y))
content((0.45, y - 0.2), [s])
line((0.85, y), (2.9, y))
content((1.9, y - 0.2), [exponent])
line((3.10, y), (9.4, y))
content((6.3, y - 0.2), [fraction])
}),
),
)
- The first bit denotes the sign of the float's value
We'll label it $s$. \
If $s = #text([`1`])$, this float is negative; if $s = #text([`0`])$, it is positive.
- The next eight bits represent the _exponent_ of this float.
#note([(we'll see what that means soon)]) \
We'll call the value of this eight-bit binary integer $E$. \
Naturally, $0 <= E <= 255$ #note([(since $E$ consist of eight bits.)])
- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \
These 23 bits are interpreted as the fractional part of a binary decimal. \
For example, the bits `0b10100000_00000000_00000000` represents $0.5 + 0.125 = 0.625$.
#problem(label: "floata")
Consider `0b01000001_10101000_00000000_00000000`. \
Find the $s$, $E$, and $F$ we get if we interpret this bit string as a `float`. \
#note([Leave $F$ as a sum of powers of two.])
#solution([
$s = 0$ \
$E = 258$ \
$F = 2^31+2^19 = 2,621,440$
])
#v(1fr)
#definition(label: "floatdef")
The final value of a float with sign $s$, exponent $E$, and fraction $F$ is
$
(-1)^s times 2^(E - 127) times (1 + F / (2^(23)))
$
Notice that this is very similar to decimal scientific notation, which is written as
$
(-1)^s times 10^(e) times (f)
$
#problem()
Consider `0b01000001_10101000_00000000_00000000`. \
This is the same bit string we used in @floata. \
#v(2mm)
What value do we get if we interpret this bit string as a float? \
#hint([$21 div 16 = 1.3125$])
#solution([
This is 21:
$
2^(131) times (1 + (2^(21) + 2^(19)) / (2^(23)))
= 2^(4) times (1 + 0.25 + 0.0625)
= 16 times (1.3125)
= 21
$
])
#v(1fr)
#pagebreak()
#problem()
Encode $12.5$ as a float. \
#hint([$12.5 div 8 = 1.5625$])
#solution([
$
12.5
= 8 times 1.5625
= 2^(3) times (1 + (0.5 + 0.0625))
= 2^(130) times (1 + (2^(22) + 2^(19)) / (2^(23)))
$
which is `0b01000001_01001000_00000000_00000000`. \
])
#v(1fr)
#definition()
Say we have a bit string $x$. \
We'll let $x_f$ denote the value we get if we interpret $x$ as a float, \
and we'll let $x_i$ denote the value we get if we interpret $x$ an integer.
#problem()
Let $x = #text[`0b01000001_01001000_00000000_00000000`]$. \
What are $x_f$ and $x_i$? #note([As always, you may leave big numbers as powers of two.])
#solution([
$x_f = 12.5$
#v(2mm)
$x_i = 2^30 + 2^24 + 2^22 + 2^19 = 11,095,237,632$
])
#v(1fr)

View File

@ -0,0 +1,173 @@
#import "@local/handout:0.1.0": *
#import "@preview/cetz:0.3.1"
#import "@preview/cetz-plot:0.1.0": plot, chart
= Integers and Floats
#generic("Observation:")
For small values of $x$, $log_2(1 + x)$ is approximately equal to $x$. \
Note that this equality is exact for $x = 0$ and $x = 1$, since $log_2(1) = 0$ and $log_2(2) = 1$.
#v(5mm)
We'll add the _correction term_ $epsilon$ to our approximation: $log_2(1 + a) approx a + epsilon$. \
This allows us to improve the average error of our linear approximation:
#table(
stroke: none,
align: center,
columns: (1fr, 1fr),
inset: 5mm,
[$log(1+x)$ and $x + 0$]
+ cetz.canvas({
import cetz.draw: *
let f1(x) = calc.log(calc.abs(x + 1), base: 2)
let f2(x) = x
// Set-up a thin axis style
set-style(axes: (stroke: .5pt, tick: (stroke: .5pt)))
plot.plot(
size: (7, 7),
x-tick-step: 0.2,
y-tick-step: 0.2,
y-min: 0,
y-max: 1,
x-min: 0,
x-max: 1,
legend: none,
axis-style: "scientific-auto",
x-label: none,
y-label: none,
{
let domain = (0, 1)
plot.add(
f1,
domain: domain,
label: $log(1+x)$,
style: (stroke: ogrape),
)
plot.add(
f2,
domain: domain,
label: $x$,
style: (stroke: oblue),
)
},
)
})
+ [
Max error: 0.086 \
Average error: 0.0573
],
[$log(1+x)$ and $x + 0.045$]
+ cetz.canvas({
import cetz.draw: *
let f1(x) = calc.log(calc.abs(x + 1), base: 2)
let f2(x) = x + 0.0450466
// Set-up a thin axis style
set-style(axes: (stroke: .5pt, tick: (stroke: .5pt)))
plot.plot(
size: (7, 7),
x-tick-step: 0.2,
y-tick-step: 0.2,
y-min: 0,
y-max: 1,
x-min: 0,
x-max: 1,
legend: none,
axis-style: "scientific-auto",
x-label: none,
y-label: none,
{
let domain = (0, 1)
plot.add(
f1,
domain: domain,
label: $log(1+x)$,
style: (stroke: ogrape),
)
plot.add(
f2,
domain: domain,
label: $x$,
style: (stroke: oblue),
)
},
)
})
+ [
Max error: 0.041 \
Average error: 0.0254
],
)
A suitiable value of $epsilon$ can be found using calculus or with computational trial-and-error. \
We won't bother with this---we'll simply leave the correction term as an opaque constant $epsilon$.
#v(1fr)
#note(
type: "Note",
[
"Average error" above is simply the area of the region between the two graphs:
$
integral_0^1 abs( #v(1mm) log(1+x) - (x+epsilon) #v(1mm))
$
Feel free to ignore this note, it isn't a critical part of this handout.
],
)
#pagebreak()
#problem(label: "convert")
Use the fact that $log_2(1 + a) approx a + epsilon$ to approximate $log_2(x_f)$ in terms of $x_i$. \
Namely, show that
$
log_2(x_f) = (x_i) / (2^23) - 127 + epsilon
$
#note([
In other words, we're finding an expression for $x$ as a float
in terms of $x$ as an int.
])
#solution([
Let $E$ and $F$ be the exponent and float bits of $x_f$. \
We then have:
$
log_2(x_f)
&= log_2 ( 2^(E-127) times (1 + (F) / (2^23)) ) \
&= E - 127 + log_2(1 + F / (2^23)) \
& approx E-127 + F / (2^23) + epsilon \
&= 1 / (2^23)(2^23 E + F) - 127 + epsilon \
&= 1 / (2^23)(x_i) - 127 + epsilon
$
])
#v(1fr)
#problem()
Using basic log rules, rewrite $log_2(1 / sqrt(x))$ in terms of $log_2(x)$.
#solution([
$
log_2(1 / sqrt(x)) = (-1) / (2)log_2(x)
$
])
#v(1fr)

View File

@ -0,0 +1,208 @@
#import "@local/handout:0.1.0": *
= The Fast Inverse Square Root
The following code is present in _Quake III Arena_ (1999):
#v(5mm)
```c
float Q_rsqrt( float number ) {
long i = * ( long * ) &number;
i = 0x5f3759df - ( i >> 1 );
return * ( float * ) &i;
}
```
#v(5mm)
This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root.
If we rewrite this using notation we're familiar with, we get the following:
$
#text[`Q_sqrt`] (n_f) =
#h(5mm)
6240089 - (n_i div 2)
#h(5mm)
approx 1 / sqrt(n_f)
$
#note[
`0x5f3759df` is $6240089$ in hexadecimal. \
It is a magic number hard-coded into `Q_sqrt`.
]
#v(2mm)
Our goal in this section is to understand why this works:
- How does Quake approximate $1 / sqrt(x)$ by simply subtracting and dividing by two?
- What's special about $6240089$?
#v(1fr)
#remark()
For those that are interested, here are the details of the "code-to-math" translation:
- "`long i = * (long *) &number`" is C magic that tells the compiler \
to set `i` to the `uint` value of the bits of `number`. \
#note[
"long" refers to a "long integer", which has 32 bits. \
Normal `int`s have 16 bits, `short int`s have 8.
] \
In other words, `number` is $n_f$ and `i` is $n_i$.
#v(2mm)
- Notice the right-shift in the second line of the function. \
We translated `(i >> i)` into $(n_i div 2)$.
#v(2mm)
- "`return * (float *) &i`" is again C magic. \
Much like before, it tells us to return the value of the bits of `i` as a float.
#pagebreak()
#generic("Setup:")
We are now ready to show that $#text[`Q_sqrt`] (x) approx 1/sqrt(x)$. \
For convenience, let's call the bit string of the inverse square root $r$. \
In other words,
$
r_f := 1 / (sqrt(n_f))
$
This is the value we want to approximate.
#problem(label: "finala")
Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \
#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + a)$.]
#solution[
$
log_2(r_f)
= log_2(1 / sqrt(n_f))
= (-1) / 2 log_2(n_f)
approx (-1) / 2 ( (n_i) / (2^23) + epsilon - 127 )
$
]
#v(1fr)
#problem(label: "finalb")
Let's call the "magic number" in the code above $kappa$, so that
$
#text[`Q_sqrt`] (n_f) = kappa - (n_i div 2)
$
Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$
#solution[
From @convert, we know that
$
log_2(r_f) approx (r_i) / (2^23) + epsilon - 127
$
Combining this with the result from @finala, we get:
$
(r_i) / (2^23) + epsilon - 127
&approx (-1) / (2) ( (n_i) / (2^23) + epsilon - 127) \
(r_i) / (2^23)
&approx (-1) / (2) ( (n_i) / (2^23)) + 3 / 2 (127 - epsilon) \
r_i
&approx (-1) / 2 (n_i) + 2^23 3 / 2(127 - epsilon)
= 2^23 3 / 2 (127 - epsilon) - (n_i) / 2
$
#v(2mm)
This is exactly what we need! If we set $kappa$ to $(3 times 2^22) (127-epsilon)$, then
$
r_i approx kappa - (n_i div 2) = #text[`Q_sqrt`] (n_f)
$
]
#v(1fr)
#problem(label: "finalc")
What is the exact value of $kappa$ in terms of $epsilon$? \
#hint[Look at @finalb. We already found it!]
#solution[
This problem makes sure our students see that
$kappa = (3 times 2^22) (127 - epsilon)$. \
See the solution to @finalb.
]
#v(2cm)
#pagebreak()
#remark()
In @finalc we saw that $kappa = (3 times 2^22) (127 - epsilon)$. \
Looking at the code again, we see that $kappa = #text[`0x5f3759df`]$ in _Quake_:
#v(2mm)
```c
float Q_rsqrt( float number ) {
long i = * ( long * ) &number;
i = 0x5f3759df - ( i >> 1 );
return * ( float * ) &i;
}
```
#v(2mm)
Using a calculator and some basic algebra, we can find the $epsilon$ this code uses: \
#note[Remember, #text[`0x5f3759df`] is $6240089$ in hexadecimal.]
$
(3 times 2^22) (127 - epsilon) &= 6240089 \
(127 - epsilon) &= 126.955 \
epsilon &= 0.0450466
$
So, $0.045$ is the $epsilon$ used by Quake. \
Online sources state that this constant was generated by trial-and-error, \
though it is fairly close to the ideal $epsilon$.
#remark()
And now, we're done! \
We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well, \
thanks to the approximation $log(1+a) = a + epsilon$.
#v(2mm)
Notably, `Q_sqrt` uses _zero_ divisions or multiplications (`>>` doesn't count). \
This makes it _very_ fast when compared to more traditional approximation techniques (i.e, Taylor series).
#v(2mm)
In the case of _Quake_, this is very important. 3D graphics require thousands of inverse-square-root calculations to render a single frame#footnote[e.g, to generate normal vectors], which is not an easy task for a Playstation running at 300MHz.
#instructornote[
Let $x$ be a bit string. If we assume $x_f$ is positive and $E$ is even, then
$
(x #text[`>>`] 1)_f = 2^((E div 2) - 127) times (1 + (F div 2) / (2^(23)))
$
Notably: a right-shift divides the exponent of $x_f$ by two, \
which is, of course, a square root!
#v(2mm)
This intuition is hand-wavy, though: \
If $E$ is odd, its lowest-order bit becomes the highest-order bit of $F$ when we shift $x$ right. \
Also, a right shift doesn't divide the _entire_ exponent, skipping the $-127$ offset. \
#v(2mm)
Remarkably, this intuition is still somewhat correct. \
The bits align _just so_, and our approximation still works.
#v(8mm)
One can think of the fast inverse root as a "digital slide rule": \
The integer representation of $x_f$ already contains $log_2(x_f)$, offset and scaled. \
By subtracting and dividing in "log space", we effectively invert and root $x_f$!
After all,
$
- 1 / 2 log_2(n_f) = 1 / sqrt(n_f)
$
]

View File

@ -0,0 +1,36 @@
#import "@local/handout:0.1.0": *
= Bonus -- More about Floats
#problem()
Convince yourself that all numbers that can be represented as a float are rational.
#problem()
Find a rational number that cannot be represented as a float.
#v(1fr)
#problem()
What is the smallest positive 32-bit float?
#v(1fr)
#problem()
What is the largest positive 32-bit float?
#v(1fr)
#problem()
How many floats are between $-1$ and $1$?
#v(1fr)
#problem()
How many floats are between $1$ and $2$?
#v(1fr)
#problem()
How many floats are between $1$ and $128$?
#v(1fr)