Tom edits
parent
6d4127f5a5
commit
b4852e7fcd
@ -13,16 +13,19 @@
|
|||||||
by: "Mark",
|
by: "Mark",
|
||||||
)
|
)
|
||||||
|
|
||||||
#include "parts/00 int.typ"
|
#include "parts/00 intro.typ"
|
||||||
#pagebreak()
|
#pagebreak()
|
||||||
|
|
||||||
#include "parts/01 float.typ"
|
#include "parts/01 int.typ"
|
||||||
#pagebreak()
|
#pagebreak()
|
||||||
|
|
||||||
#include "parts/02 approx.typ"
|
#include "parts/02 float.typ"
|
||||||
#pagebreak()
|
#pagebreak()
|
||||||
|
|
||||||
#include "parts/03 quake.typ"
|
#include "parts/03 approx.typ"
|
||||||
#pagebreak()
|
#pagebreak()
|
||||||
|
|
||||||
#include "parts/04 bonus.typ"
|
#include "parts/04 quake.typ"
|
||||||
|
#pagebreak()
|
||||||
|
|
||||||
|
#include "parts/05 bonus.typ"
|
||||||
|
45
src/Advanced/Fast Inverse Root/parts/00 intro.typ
Normal file
@ -0,0 +1,45 @@
|
|||||||
|
#import "@local/handout:0.1.0": *
|
||||||
|
|
||||||
|
= Introduction
|
||||||
|
|
||||||
|
In 2005, id Software published the source code of _Quake III Arena_, a popular game released in 1999. \
|
||||||
|
This caused quite a stir: id Software was responsible for many games popular among old-school engineers (most notably _Doom_, which has a place in programmer humor even today).
|
||||||
|
|
||||||
|
#v(2mm)
|
||||||
|
|
||||||
|
Naturally, this community immediately began dissecting _Quake_'s source. \
|
||||||
|
One particularly interesting function is reproduced below, with original comments: \
|
||||||
|
|
||||||
|
#v(3mm)
|
||||||
|
|
||||||
|
```c
|
||||||
|
float Q_rsqrt( float number ) {
|
||||||
|
long i;
|
||||||
|
float x2, y;
|
||||||
|
const float threehalfs = 1.5F;
|
||||||
|
|
||||||
|
x2 = number * 0.5F;
|
||||||
|
y = number;
|
||||||
|
i = * ( long * ) &y; // evil floating point bit level hacking
|
||||||
|
i = 0x5f3759df - ( i >> 1 ); // [redacted]
|
||||||
|
y = * ( float * ) &i;
|
||||||
|
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
|
||||||
|
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
|
||||||
|
|
||||||
|
return y;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
#v(3mm)
|
||||||
|
|
||||||
|
This code defines a function `Q_rsqrt`, which was used as a fast approximation of the inverse square root in graphics routines. In other words, `Q_rsqrt` efficiently approximates $1 div sqrt(x)$.
|
||||||
|
|
||||||
|
#v(3mm)
|
||||||
|
|
||||||
|
The key word here is "fast": _Quake_ ran on very limited hardware, and traditional approximation techniques (like Taylor series)#footnote[Taylor series aren't used today, and for the same reason. There are better ways.] were too computationally expensive to be viable.
|
||||||
|
|
||||||
|
#v(3mm)
|
||||||
|
|
||||||
|
Our goal today is to understand how `Q_rsqrt` works. \
|
||||||
|
To do that, we'll first need to understand how computers represent numbers. \
|
||||||
|
We'll start with simple binary integers---turn the page.
|
@ -5,7 +5,8 @@
|
|||||||
#definition()
|
#definition()
|
||||||
A _bit string_ is a string of binary digits. \
|
A _bit string_ is a string of binary digits. \
|
||||||
In this handout, we'll denote bit strings with the prefix `0b`. \
|
In this handout, we'll denote bit strings with the prefix `0b`. \
|
||||||
That is, $1010 =$ "one thousand and one," while $#text([`0b1001`]) = 2^3 + 2^0 = 9$
|
#note[This prefix is only notation---it is _not_ part of the string itself.] \
|
||||||
|
For example, $1001$ is the number "one thousand and one," while $#text([`0b1001`])$ is the string of bits "1 0 0 1".
|
||||||
|
|
||||||
#v(2mm)
|
#v(2mm)
|
||||||
We will separate long bit strings with underscores for readability. \
|
We will separate long bit strings with underscores for readability. \
|
||||||
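The `0b` notation introduced in this hunk happens to match Python's binary literals, underscores included, so the difference between the number $1001$ and the bit string `0b1001` can be checked directly. This is only an illustrative sketch; the helper name `bit_string_value` is hypothetical and not part of the handout.

```python
def bit_string_value(bits: str) -> int:
    """Interpret a string of binary digits as an unsigned integer.

    Underscores are treated purely as visual separators.
    """
    digits = bits.replace("_", "")
    value = 0
    for digit in digits:
        # Shift the running value left by one place, then add the new bit.
        value = value * 2 + int(digit)
    return value


nine = bit_string_value("1001")            # 2^3 + 2^0
grouped = bit_string_value("1111_0000")    # underscores are ignored

print(nine, grouped)
print(nine == 0b1001, grouped == 0b1111_0000)
```

The decimal number `1001` and the bit string `0b1001` share digits but not values, which is exactly the distinction the reworded example draws.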
@ -40,7 +41,7 @@ The value of a `uint` is simply its value as a binary number:
|
|||||||
What is the largest number we can represent with a 32-bit `uint`?
|
What is the largest number we can represent with a 32-bit `uint`?
|
||||||
|
|
||||||
#solution([
|
#solution([
|
||||||
$#text([`0b01111111_11111111_11111111_11111111`]) = 2^(31)$
|
$#text([`0b11111111_11111111_11111111_11111111`]) = 2^(32)-1$
|
||||||
])
|
])
|
||||||
|
|
||||||
#v(1fr)
|
#v(1fr)
|
||||||
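The corrected solution can be sanity-checked with a few lines of Python, whose integers are not limited to 32 bits. The sketch below compares the old and new bit strings; the variable names are illustrative only.

```python
WIDTH = 32

# All 32 bits set: the largest value a 32-bit uint can hold.
all_ones = 0b11111111_11111111_11111111_11111111

# The bit string from the old solution has its top bit cleared.
top_bit_clear = 0b01111111_11111111_11111111_11111111

print(all_ones == 2**WIDTH - 1)
print(top_bit_clear == 2**(WIDTH - 1) - 1)
```

Both comparisons hold, so the old bit string was not the maximum (and its value was $2^(31) - 1$ rather than $2^(31)$).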
@ -53,6 +54,10 @@ Find the value of each of the following 32-bit unsigned integers:
|
|||||||
- `0b00000000_00000000_00000100_10110000`
|
- `0b00000000_00000000_00000100_10110000`
|
||||||
#hint([The third conversion is easy---look carefully at the second.])
|
#hint([The third conversion is easy---look carefully at the second.])
|
||||||
|
|
||||||
|
#instructornote[
|
||||||
|
Consider making a list of the powers of two $>= 1024$ on the board.
|
||||||
|
]
|
||||||
|
|
||||||
#solution([
|
#solution([
|
||||||
- $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$
|
- $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$
|
||||||
- $#text([`0b00000000_00000000_00000001_00101100`]) = 300$
|
- $#text([`0b00000000_00000000_00000001_00101100`]) = 300$
|
||||||
@ -64,20 +69,20 @@ Find the value of each of the following 32-bit unsigned integers:
|
|||||||
|
|
||||||
|
|
||||||
#definition()
|
#definition()
|
||||||
In general, division of `uints` is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \
|
In general, fast division of `uints` is difficult#footnote([One may use repeated subtraction, but this isn't efficient.]). \
|
||||||
Division by powers of two, however, is incredibly easy: \
|
Division by powers of two, however, is incredibly easy: \
|
||||||
To divide by two, all we need to do is shift the bits of our integer right.
|
To divide by two, all we need to do is shift the bits of our integer right.
|
||||||
|
|
||||||
#v(2mm)
|
#v(2mm)
|
||||||
|
|
||||||
For example, consider $#text[`0b0000_0110`] = 6$. \
|
For example, consider $#text[`0b0000_0110`] = 6$. \
|
||||||
If we insert a zero at the left end of this bit string and delete the digit at the right \
|
If we insert a zero at the left end of this string and delete the zero at the right \
|
||||||
(thus "shifting" each bit right), we get `0b0000_0011`, which is 3. \
|
(thus "shifting" each bit right), we get `0b0000_0011`, which is 3. \
|
||||||
|
|
||||||
#v(2mm)
|
#v(2mm)
|
||||||
|
|
||||||
Of course, we loose the remainder when we left-shift an odd number: \
|
Of course, we lose the remainder when we right-shift an odd number: \
|
||||||
$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`.
|
$9$ shifted right is $4$, since `0b0000_1001` shifted right is `0b0000_0100`.
|
||||||
|
|
||||||
#problem()
|
#problem()
|
||||||
Right shifts are denoted by the `>>` symbol: \
|
Right shifts are denoted by the `>>` symbol: \
|
||||||
@ -86,6 +91,7 @@ Find the value of the following:
|
|||||||
- $12 #text[`>>`] 1$
|
- $12 #text[`>>`] 1$
|
||||||
- $27 #text[`>>`] 3$
|
- $27 #text[`>>`] 3$
|
||||||
- $16 #text[`>>`] 8$
|
- $16 #text[`>>`] 8$
|
||||||
|
#note[Naturally, you'll have to convert these integers to binary first.]
|
||||||
|
|
||||||
#solution[
|
#solution[
|
||||||
- $12 #text[`>>`] 1 = 6$
|
- $12 #text[`>>`] 1 = 6$
|
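The claim that a right shift divides by two and discards the remainder can be checked mechanically. The sketch below uses the two values from the text (6 and 9) and then compares `>>` against floor division for every 8-bit value; it assumes non-negative integers, as in the handout's `uint` setting.

```python
def show_shift(value: int, places: int) -> int:
    """Right-shift `value` by `places` bits and print both bit strings."""
    shifted = value >> places
    print(f"{value:#010b} >> {places} = {shifted:#010b}  ({value} -> {shifted})")
    return shifted


six_halved = show_shift(6, 1)    # even: nothing is lost
nine_halved = show_shift(9, 1)   # odd: the remainder is dropped

# For non-negative integers, shifting right by k places matches
# floor division by 2**k.
shift_matches_division = all(
    (n >> k) == n // 2**k for n in range(256) for k in range(9)
)
print(shift_matches_division)
```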
@ -3,7 +3,7 @@
|
|||||||
|
|
||||||
= Floats
|
= Floats
|
||||||
#definition()
|
#definition()
|
||||||
_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \
|
_Binary decimals_#footnote([Note that "binary decimal" is a misnomer---"deci" means "ten"!]) are very similar to base-10 decimals.\
|
||||||
In base 10, we interpret place value as follows:
|
In base 10, we interpret place value as follows:
|
||||||
- $0.1 = 10^(-1)$
|
- $0.1 = 10^(-1)$
|
||||||
- $0.03 = 3 times 10^(-2)$
|
- $0.03 = 3 times 10^(-2)$
|
||||||
@ -107,11 +107,13 @@ Floats represent a subset of the real numbers, and are interpreted as follows: \
|
|||||||
- The next eight bits represent the _exponent_ of this float.
|
- The next eight bits represent the _exponent_ of this float.
|
||||||
#note([(we'll see what that means soon)]) \
|
#note([(we'll see what that means soon)]) \
|
||||||
We'll call the value of this eight-bit binary integer $E$. \
|
We'll call the value of this eight-bit binary integer $E$. \
|
||||||
Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits.)])
|
Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits)])
|
||||||
|
|
||||||
- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \
|
- The remaining 23 bits represent the _fraction_ of this float. \
|
||||||
These 23 bits are interpreted as the fractional part of a binary decimal. \
|
They are interpreted as the fractional part of a binary decimal. \
|
||||||
For example, the bits `0b1010000_00000000_00000000` represents $0.5 + 0.125 = 0.625$.
|
For example, the bits `0b1010000_00000000_00000000` represent $0.5 + 0.125 = 0.625$. \
|
||||||
|
We'll call the value of these bits as a binary integer $F$. \
|
||||||
|
Their value as a binary decimal is then $F div 2^23$. #note([(convince yourself of this)])
|
||||||
|
|
||||||
|
|
||||||
#problem(label: "floata")
|
#problem(label: "floata")
|
||||||
@ -135,12 +137,17 @@ $
|
|||||||
(-1)^s times 2^(E - 127) times (1 + F / (2^(23)))
|
(-1)^s times 2^(E - 127) times (1 + F / (2^(23)))
|
||||||
$
|
$
|
||||||
|
|
||||||
Notice that this is very similar to decimal scientific notation, which is written as
|
Notice that this is very similar to base-10 scientific notation, which is written as
|
||||||
|
|
||||||
$
|
$
|
||||||
(-1)^s times 10^(e) times (f)
|
(-1)^s times 10^(e) times (f)
|
||||||
$
|
$
|
||||||
|
|
||||||
|
#note[
|
||||||
|
We subtract 127 from $E$ so we can represent both positive and negative exponents. \
|
||||||
|
$E$ is an eight-bit binary integer, so $0 <= E <= 255$ and thus $-127 <= (E - 127) <= 128$.
|
||||||
|
]
|
||||||
|
|
||||||
#problem()
|
#problem()
|
||||||
Consider `0b01000001_10101000_00000000_00000000`. \
|
Consider `0b01000001_10101000_00000000_00000000`. \
|
||||||
This is the same bit string we used in @floata. \
|
This is the same bit string we used in @floata. \
|
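The interpretation rule $(-1)^s times 2^(E - 127) times (1 + F / 2^(23))$ can be tried out on a bit string that is deliberately different from the one in the problem above. This is a minimal sketch: `decode_float` is a hypothetical helper, and it assumes $E$ is neither 0 nor 255 (those patterns are treated specially by real hardware and the formula does not apply to them).

```python
import struct


def decode_float(bits: int) -> float:
    """Apply the sign / exponent / fraction formula to a 32-bit pattern."""
    s = bits >> 31              # 1 sign bit
    E = (bits >> 23) & 0xFF     # 8 exponent bits
    F = bits & 0x7FFFFF         # 23 fraction bits, read as an integer
    return (-1) ** s * 2.0 ** (E - 127) * (1 + F / 2**23)


example = 0b00111110_00100000_00000000_00000000

by_formula = decode_float(example)

# Cross-check: let the struct module reinterpret the same four bytes
# as an IEEE 754 single-precision float.
by_hardware = struct.unpack(">f", example.to_bytes(4, "big"))[0]

print(by_formula, by_hardware)
```

Here $E = 124$ and $F = 2^(21)$, so the formula gives $2^(-3) times 1.25$, and the cross-check through `struct` agrees.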
@ -5,7 +5,7 @@
|
|||||||
= Integers and Floats
|
= Integers and Floats
|
||||||
|
|
||||||
#generic("Observation:")
|
#generic("Observation:")
|
||||||
For small values of $x$, $log_2(1 + x)$ is approximately equal to $x$. \
|
If $x$ is between 0 and 1, $log_2(1 + x)$ is approximately equal to $x$. \
|
||||||
Note that this equality is exact for $x = 0$ and $x = 1$, since $log_2(1) = 0$ and $log_2(2) = 1$.
|
Note that this equality is exact for $x = 0$ and $x = 1$, since $log_2(1) = 0$ and $log_2(2) = 1$.
|
||||||
|
|
||||||
#v(5mm)
|
#v(5mm)
|
||||||
@ -18,7 +18,7 @@ This allows us to improve the average error of our linear approximation:
|
|||||||
align: center,
|
align: center,
|
||||||
columns: (1fr, 1fr),
|
columns: (1fr, 1fr),
|
||||||
inset: 5mm,
|
inset: 5mm,
|
||||||
[$log(1+x)$ and $x + 0$]
|
[$log_2(1+x)$ and $x + 0$]
|
||||||
+ cetz.canvas({
|
+ cetz.canvas({
|
||||||
import cetz.draw: *
|
import cetz.draw: *
|
||||||
|
|
||||||
@ -64,7 +64,7 @@ This allows us to improve the average error of our linear approximation:
|
|||||||
Max error: 0.086 \
|
Max error: 0.086 \
|
||||||
Average error: 0.0573
|
Average error: 0.0573
|
||||||
],
|
],
|
||||||
[$log(1+x)$ and $x + 0.045$]
|
[$log_2(1+x)$ and $x + 0.045$]
|
||||||
+ cetz.canvas({
|
+ cetz.canvas({
|
||||||
import cetz.draw: *
|
import cetz.draw: *
|
||||||
|
|
||||||
@ -125,7 +125,7 @@ We won't bother with this---we'll simply leave the correction term as an opaque
|
|||||||
[
|
[
|
||||||
"Average error" above is simply the area of the region between the two graphs:
|
"Average error" above is simply the area of the region between the two graphs:
|
||||||
$
|
$
|
||||||
integral_0^1 abs( #v(1mm) log(1+x) - (x+epsilon) #v(1mm))
|
integral_0^1 abs( #v(1mm) log_2(1+x) - (x+epsilon) #v(1mm))
|
||||||
$
|
$
|
||||||
Feel free to ignore this note, it isn't a critical part of this handout.
|
Feel free to ignore this note, it isn't a critical part of this handout.
|
||||||
],
|
],
|
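The error figures quoted in these hunks can be reproduced numerically. The sketch below samples $abs(log_2(1+x) - (x + epsilon))$ at evenly spaced midpoints on $[0, 1]$; the function name `approximation_errors` and the sample count are arbitrary choices for illustration, not something taken from the handout.

```python
import math


def approximation_errors(eps: float, samples: int = 100_000) -> tuple[float, float]:
    """Return (max error, average error) of x + eps as an estimate of log2(1 + x).

    The average is a midpoint-rule estimate of the integral of the
    absolute difference over [0, 1].
    """
    worst = 0.0
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples
        err = abs(math.log2(1 + x) - (x + eps))
        worst = max(worst, err)
        total += err
    return worst, total / samples


max_plain, avg_plain = approximation_errors(0.0)
max_shifted, avg_shifted = approximation_errors(0.045)

print(f"eps = 0:     max {max_plain:.4f}, average {avg_plain:.4f}")
print(f"eps = 0.045: max {max_shifted:.4f}, average {avg_shifted:.4f}")
```

With $epsilon = 0$ this reproduces the 0.086 and 0.0573 figures shown in the first panel, and the shifted line has a smaller error on both measures.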
@ -2,10 +2,9 @@
|
|||||||
|
|
||||||
= The Fast Inverse Square Root
|
= The Fast Inverse Square Root
|
||||||
|
|
||||||
|
A simplified version of the _Quake_ routine we are studying is reproduced below.
|
||||||
|
|
||||||
The following code is present in _Quake III Arena_ (1999):
|
#v(2mm)
|
||||||
|
|
||||||
#v(5mm)
|
|
||||||
|
|
||||||
```c
|
```c
|
||||||
float Q_rsqrt( float number ) {
|
float Q_rsqrt( float number ) {
|
||||||
@ -15,20 +14,20 @@ float Q_rsqrt( float number ) {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
#v(5mm)
|
#v(2mm)
|
||||||
|
|
||||||
This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root.
|
This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root.
|
||||||
If we rewrite this using notation we're familiar with, we get the following:
|
If we rewrite this using notation we're familiar with, we get the following:
|
||||||
$
|
$
|
||||||
#text[`Q_sqrt`] (n_f) =
|
#text[`Q_sqrt`] (n_f) =
|
||||||
#h(5mm)
|
|
||||||
1597463007 - (n_i div 2)
|
1597463007 - (n_i div 2)
|
||||||
#h(5mm)
|
#h(10mm)
|
||||||
approx 1 / sqrt(n_f)
|
approx 1 / sqrt(n_f)
|
||||||
$
|
$
|
||||||
|
|
||||||
#note[
|
#note[
|
||||||
`0x5f3759df` is $1597463007$ written in hexadecimal. \
|
`0x5f3759df` is $1597463007$ written in hexadecimal. \
|
||||||
|
Ask an instructor to explain if you don't know what this means. \
|
||||||
It is a magic number hard-coded into `Q_sqrt`.
|
It is a magic number hard-coded into `Q_sqrt`.
|
||||||
]
|
]
|
||||||
|
|
||||||
@ -56,7 +55,7 @@ For those that are interested, here are the details of the "code-to-math" transl
|
|||||||
|
|
||||||
|
|
||||||
- Notice the right-shift in the second line of the function. \
|
- Notice the right-shift in the second line of the function. \
|
||||||
We translated `(i >> i)` into $(n_i div 2)$.
|
We translated `(i >> 1)` into $(n_i div 2)$.
|
||||||
#v(2mm)
|
#v(2mm)
|
||||||
|
|
||||||
- "`return * (float *) &i`" is again C magic. \
|
- "`return * (float *) &i`" is again C magic. \
|
||||||
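The "C magic" in these bullets is a reinterpretation of bits, not a numeric conversion. Python has no pointer casts, but the standard `struct` module can pack a value as a 32-bit float and unpack the same four bytes as a 32-bit unsigned integer, which mimics `* ( long * ) &y` under the assumption that `long` is 32 bits wide. The helper names below are hypothetical.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x, stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


one_as_bits = float_to_bits(1.0)
print(hex(one_as_bits))   # the bit pattern of 1.0
print(int(1.0))           # an ordinary numeric conversion, for contrast

# Reinterpreting in both directions returns the original value.
round_trip = bits_to_float(float_to_bits(0.15625))
print(round_trip)
```

The bit pattern of `1.0` is `0x3f800000`, which as an integer is over a billion, while `int(1.0)` is simply 1. That gap is the difference between $n_f$ and $n_i$.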
@ -64,17 +63,17 @@ For those that are interested, here are the details of the "code-to-math" transl
|
|||||||
#pagebreak()
|
#pagebreak()
|
||||||
|
|
||||||
#generic("Setup:")
|
#generic("Setup:")
|
||||||
We are now ready to show that $#text[`Q_sqrt`] (x) approx 1/sqrt(x)$. \
|
We are now ready to show that $#text[`Q_sqrt`] (x)$ effectively approximates $1/sqrt(x)$. \
|
||||||
For convenience, let's call the bit string of the inverse square root $r$. \
|
For convenience, let's call the bit string of the inverse square root $r$. \
|
||||||
In other words,
|
In other words,
|
||||||
$
|
$
|
||||||
r_f := 1 / (sqrt(n_f))
|
r_f := 1 / (sqrt(n_f))
|
||||||
$
|
$
|
||||||
This is the value we want to approximate.
|
This is the value we want to approximate. \
|
||||||
|
|
||||||
#problem(label: "finala")
|
#problem(label: "finala")
|
||||||
Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \
|
Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \
|
||||||
#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + a)$.]
|
#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + x)$.]
|
||||||
|
|
||||||
#solution[
|
#solution[
|
||||||
$
|
$
|
||||||
@ -92,7 +91,11 @@ Let's call the "magic number" in the code above $kappa$, so that
|
|||||||
$
|
$
|
||||||
#text[`Q_sqrt`] (n_f) = kappa - (n_i div 2)
|
#text[`Q_sqrt`] (n_f) = kappa - (n_i div 2)
|
||||||
$
|
$
|
||||||
Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$
|
Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$ \
|
||||||
|
#note(type: "Note")[
|
||||||
|
If we know $r_i$, we know $r_f$. \
|
||||||
|
We don't even need to convert between the two---the underlying bits are the same!
|
||||||
|
]
|
||||||
|
|
||||||
#solution[
|
#solution[
|
||||||
From @convert, we know that
|
From @convert, we know that
|
||||||
@ -164,8 +167,7 @@ though it is fairly close to the ideal $epsilon$.
|
|||||||
|
|
||||||
#remark()
|
#remark()
|
||||||
And now, we're done! \
|
And now, we're done! \
|
||||||
We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well, \
|
We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well. \
|
||||||
thanks to the approximation $log(1+a) = a + epsilon$.
|
|
||||||
|
|
||||||
#v(2mm)
|
#v(2mm)
|
||||||
|
|
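To see the whole argument in action, here is a rough Python port of the routine, offered as a sketch and not a faithful reimplementation. It assumes positive, normal inputs; the helper names are hypothetical; and because Python floats are double precision, the refinement step is computed more precisely than in the C original, so the low digits will differ slightly.

```python
import struct

MAGIC = 0x5F3759DF  # the constant from the C source


def float_to_bits(x: float) -> int:
    """Bit pattern of x as a single-precision float (n_f -> n_i)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i: int) -> float:
    """Single-precision float with the given bit pattern (r_i -> r_f)."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def q_rsqrt(number: float, newton_steps: int = 1) -> float:
    """Approximate 1 / sqrt(number) for a positive, normal float."""
    i = float_to_bits(number)
    i = MAGIC - (i >> 1)          # the integer trick: kappa - (n_i div 2)
    y = bits_to_float(i)
    for _ in range(newton_steps):
        # The refinement line marked "1st iteration" in the original code.
        y = y * (1.5 - 0.5 * number * y * y)
    return y


for x in (0.01, 1.0, 2.0, 4.0, 100.0):
    exact = x ** -0.5
    rough = q_rsqrt(x, newton_steps=0)
    refined = q_rsqrt(x, newton_steps=1)
    print(f"x = {x:>6}: exact {exact:.6f}, bits only {rough:.6f}, refined {refined:.6f}")
```

With `newton_steps=0` the function is exactly the simplified routine analyzed above, and its relative error stays within a few percent on these inputs. One refinement step brings that down to a fraction of a percent.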