211 lines
5.7 KiB
Typst
211 lines
5.7 KiB
Typst
#import "@local/handout:0.1.0": *
|
|
|
|
= The Fast Inverse Square Root
|
|
|
|
A simplified version of the _Quake_ routine we are studying is reproduced below.
|
|
|
|
#v(2mm)
|
|
|
|
```c
|
|
float Q_rsqrt( float number ) {
|
|
long i = * ( long * ) &number;
|
|
i = 0x5f3759df - ( i >> 1 );
|
|
return * ( float * ) &i;
|
|
}
|
|
```
|
|
|
|
#v(2mm)
|
|
|
|
This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root.
|
|
If we rewrite this using notation we're familiar with, we get the following:
|
|
$
|
|
#text[`Q_sqrt`] (n_f) =
|
|
6240089 - (n_i div 2)
|
|
#h(10mm)
|
|
approx 1 / sqrt(n_f)
|
|
$
|
|
|
|
#note[
|
|
`0x5f3759df` is $6240089$ in hexadecimal. \
|
|
Ask an instructor to explain if you don't know what this means. \
|
|
It is a magic number hard-coded into `Q_sqrt`.
|
|
]
|
|
|
|
#v(2mm)
|
|
|
|
Our goal in this section is to understand why this works:
|
|
- How does Quake approximate $1 / sqrt(x)$ by simply subtracting and dividing by two?
|
|
- What's special about $6240089$?
|
|
|
|
|
|
|
|
#v(1fr)
|
|
|
|
#remark()
|
|
For those that are interested, here are the details of the "code-to-math" translation:
|
|
|
|
- "`long i = * (long *) &number`" is C magic that tells the compiler \
|
|
to set `i` to the `uint` value of the bits of `number`. \
|
|
#note[
|
|
"long" refers to a "long integer", which has 32 bits. \
|
|
Normal `int`s have 16 bits, `short int`s have 8.
|
|
] \
|
|
In other words, `number` is $n_f$ and `i` is $n_i$.
|
|
#v(2mm)
|
|
|
|
|
|
- Notice the right-shift in the second line of the function. \
|
|
We translated `(i >> 1)` into $(n_i div 2)$.
|
|
#v(2mm)
|
|
|
|
- "`return * (float *) &i`" is again C magic. \
|
|
Much like before, it tells us to return the value of the bits of `i` as a float.
|
|
#pagebreak()
|
|
|
|
#generic("Setup:")
|
|
We are now ready to show that $#text[`Q_sqrt`] (x)$ effectively approximates $1/sqrt(x)$. \
|
|
For convenience, let's call the bit string of the inverse square root $r$. \
|
|
In other words,
|
|
$
|
|
r_f := 1 / (sqrt(n_f))
|
|
$
|
|
This is the value we want to approximate. \
|
|
|
|
#problem(label: "finala")
|
|
Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \
|
|
#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + x)$.]
|
|
|
|
#solution[
|
|
$
|
|
log_2(r_f)
|
|
= log_2(1 / sqrt(n_f))
|
|
= (-1) / 2 log_2(n_f)
|
|
approx (-1) / 2 ( (n_i) / (2^23) + epsilon - 127 )
|
|
$
|
|
]
|
|
|
|
#v(1fr)
|
|
|
|
#problem(label: "finalb")
|
|
Let's call the "magic number" in the code above $kappa$, so that
|
|
$
|
|
#text[`Q_sqrt`] (n_f) = kappa - (n_i div 2)
|
|
$
|
|
Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$ \
|
|
#note(type: "Note")[
|
|
If we know $r_i$, we know $r_f$. \
|
|
We don't even need to convert between the two---the underlying bits are the same!
|
|
]
|
|
|
|
#solution[
|
|
From @convert, we know that
|
|
$
|
|
log_2(r_f) approx (r_i) / (2^23) + epsilon - 127
|
|
$
|
|
|
|
Combining this with the result from @finala, we get:
|
|
$
|
|
(r_i) / (2^23) + epsilon - 127
|
|
&approx (-1) / (2) ( (n_i) / (2^23) + epsilon - 127) \
|
|
(r_i) / (2^23)
|
|
&approx (-1) / (2) ( (n_i) / (2^23)) + 3 / 2 (127 - epsilon) \
|
|
r_i
|
|
&approx (-1) / 2 (n_i) + 2^23 3 / 2(127 - epsilon)
|
|
= 2^23 3 / 2 (127 - epsilon) - (n_i) / 2
|
|
$
|
|
|
|
#v(2mm)
|
|
|
|
This is exactly what we need! If we set $kappa$ to $(3 times 2^22) (127-epsilon)$, then
|
|
$
|
|
r_i approx kappa - (n_i div 2) = #text[`Q_sqrt`] (n_f)
|
|
$
|
|
]
|
|
|
|
#v(1fr)
|
|
|
|
#problem(label: "finalc")
|
|
What is the exact value of $kappa$ in terms of $epsilon$? \
|
|
#hint[Look at @finalb. We already found it!]
|
|
|
|
#solution[
|
|
This problem makes sure our students see that
|
|
$kappa = (3 times 2^22) (127 - epsilon)$. \
|
|
See the solution to @finalb.
|
|
]
|
|
|
|
#v(2cm)
|
|
|
|
#pagebreak()
|
|
|
|
#remark()
|
|
In @finalc we saw that $kappa = (3 times 2^22) (127 - epsilon)$. \
|
|
Looking at the code again, we see that $kappa = #text[`0x5f3759df`]$ in _Quake_:
|
|
|
|
#v(2mm)
|
|
|
|
```c
|
|
float Q_rsqrt( float number ) {
|
|
long i = * ( long * ) &number;
|
|
i = 0x5f3759df - ( i >> 1 );
|
|
return * ( float * ) &i;
|
|
}
|
|
```
|
|
|
|
#v(2mm)
|
|
Using a calculator and some basic algebra, we can find the $epsilon$ this code uses: \
|
|
#note[Remember, #text[`0x5f3759df`] is $6240089$ in hexadecimal.]
|
|
$
|
|
(3 times 2^22) (127 - epsilon) &= 6240089 \
|
|
(127 - epsilon) &= 126.955 \
|
|
epsilon &= 0.0450466
|
|
$
|
|
|
|
So, $0.045$ is the $epsilon$ used by Quake. \
|
|
Online sources state that this constant was generated by trial-and-error, \
|
|
though it is fairly close to the ideal $epsilon$.
|
|
|
|
#remark()
|
|
And now, we're done! \
|
|
We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well. \
|
|
|
|
#v(2mm)
|
|
|
|
Notably, `Q_sqrt` uses _zero_ divisions or multiplications (`>>` doesn't count). \
|
|
This makes it _very_ fast when compared to more traditional approximation techniques (i.e, Taylor series).
|
|
|
|
#v(2mm)
|
|
|
|
In the case of _Quake_, this is very important. 3D graphics require thousands of inverse-square-root calculations to render a single frame#footnote[e.g, to generate normal vectors], which is not an easy task for a Playstation running at 300MHz.
|
|
|
|
#instructornote[
|
|
Let $x$ be a bit string. If we assume $x_f$ is positive and $E$ is even, then
|
|
$
|
|
(x #text[`>>`] 1)_f = 2^((E div 2) - 127) times (1 + (F div 2) / (2^(23)))
|
|
$
|
|
Notably: a right-shift divides the exponent of $x_f$ by two, \
|
|
which is, of course, a square root!
|
|
|
|
#v(2mm)
|
|
|
|
This intuition is hand-wavy, though: \
|
|
If $E$ is odd, its lowest-order bit becomes the highest-order bit of $F$ when we shift $x$ right. \
|
|
Also, a right shift doesn't divide the _entire_ exponent, skipping the $-127$ offset. \
|
|
|
|
#v(2mm)
|
|
|
|
Remarkably, this intuition is still somewhat correct. \
|
|
The bits align _just so_, and our approximation still works.
|
|
|
|
#v(8mm)
|
|
|
|
One can think of the fast inverse root as a "digital slide rule": \
|
|
The integer representation of $x_f$ already contains $log_2(x_f)$, offset and scaled. \
|
|
By subtracting and dividing in "log space", we effectively invert and root $x_f$!
|
|
|
|
After all,
|
|
$
|
|
- 1 / 2 log_2(n_f) = 1 / sqrt(n_f)
|
|
$
|
|
]
|