diff --git a/src/Advanced/Fast Inverse Root/main.typ b/src/Advanced/Fast Inverse Root/main.typ index 71043b5..e36d184 100644 --- a/src/Advanced/Fast Inverse Root/main.typ +++ b/src/Advanced/Fast Inverse Root/main.typ @@ -13,16 +13,19 @@ by: "Mark", ) -#include "parts/00 int.typ" +#include "parts/00 intro.typ" #pagebreak() -#include "parts/01 float.typ" +#include "parts/01 int.typ" #pagebreak() -#include "parts/02 approx.typ" +#include "parts/02 float.typ" #pagebreak() -#include "parts/03 quake.typ" +#include "parts/03 approx.typ" #pagebreak() -#include "parts/04 bonus.typ" +#include "parts/04 quake.typ" +#pagebreak() + +#include "parts/05 bonus.typ" diff --git a/src/Advanced/Fast Inverse Root/parts/00 intro.typ b/src/Advanced/Fast Inverse Root/parts/00 intro.typ new file mode 100644 index 0000000..d0ca1b5 --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/00 intro.typ @@ -0,0 +1,45 @@ +#import "@local/handout:0.1.0": * + += Introduction + +In 2005, id Software published the source code of _Quake III Arena_, a popular game released in 1999. \ +This caused quite a stir: id Software was responsible for many games popular among old-school engineers (most notably _Doom_, which has a place in programmer humor even today). + +#v(2mm) + +Naturally, this community immediately began dissecting _Quake_'s source.
\ +One particularly interesting function is reproduced below, with original comments: \ + +#v(3mm) + +```c +float Q_rsqrt( float number ) { + long i; + float x2, y; + const float threehalfs = 1.5F; + + x2 = number * 0.5F; + y = number; + i = * ( long * ) &y; // evil floating point bit level hacking + i = 0x5f3759df - ( i >> 1 ); // [redacted] + y = * ( float * ) &i; + y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration +// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed + + return y; +} +``` + +#v(3mm) + +This code defines a function `Q_rsqrt`, which was used as a fast approximation of the inverse square root in graphics routines. In other words, `Q_rsqrt` efficiently approximates $1 div sqrt(x)$. + +#v(3mm) + +The key word here is "fast": _Quake_ ran on very limited hardware, and traditional approximation techniques (like Taylor series)#footnote[Taylor series aren't used for this today either, for the same reason: better methods exist.] were too computationally expensive to be viable. + +#v(3mm) + +Our goal today is to understand how `Q_rsqrt` works. \ +To do that, we'll first need to understand how computers represent numbers. \ +We'll start with simple binary integers---turn the page. diff --git a/src/Advanced/Fast Inverse Root/parts/00 int.typ b/src/Advanced/Fast Inverse Root/parts/01 int.typ similarity index 75% rename from src/Advanced/Fast Inverse Root/parts/00 int.typ rename to src/Advanced/Fast Inverse Root/parts/01 int.typ index 9de9189..0bc07bc 100644 --- a/src/Advanced/Fast Inverse Root/parts/00 int.typ +++ b/src/Advanced/Fast Inverse Root/parts/01 int.typ @@ -5,7 +5,8 @@ #definition() A _bit string_ is a string of binary digits. \ In this handout, we'll denote bit strings with the prefix `0b`. \ -That is, $1010 =$ "one thousand and one," while $#text([`0b1001`]) = 2^3 + 2^0 = 9$ +#note[This prefix is only notation---it is _not_ part of the string itself.]
\ +For example, $1001$ is the number "one thousand and one," while $#text([`0b1001`])$ is the string of bits "1 0 0 1". #v(2mm) We will separate long bit strings with underscores for readability. \ @@ -40,7 +41,7 @@ The value of a `uint` is simply its value as a binary number: What is the largest number we can represent with a 32-bit `uint`? #solution([ - $#text([`0b01111111_11111111_11111111_11111111`]) = 2^(31)$ + $#text([`0b11111111_11111111_11111111_11111111`]) = 2^(32)-1$ ]) #v(1fr) @@ -53,6 +54,10 @@ Find the value of each of the following 32-bit unsigned integers: - `0b00000000_00000000_00000100_10110000` #hint([The third conversion is easy---look carefully at the second.]) +#instructornote[ + Consider making a list of the powers of two $>= 1024$ on the board. +] + #solution([ - $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$ - $#text([`0b00000000_00000000_00000001_00101100`]) = 300$ @@ -64,20 +69,20 @@ Find the value of each of the following 32-bit unsigned integers: #definition() -In general, division of `uints` is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \ +In general, fast division of `uints` is difficult#footnote([One may use repeated subtraction, but this isn't efficient.]). \ Division by powers of two, however, is incredibly easy: \ To divide by two, all we need to do is shift the bits of our integer right. #v(2mm) For example, consider $#text[`0b0000_0110`] = 6$. \ -If we insert a zero at the left end of this bit string and delete the digit at the right \ +If we insert a zero at the left end of this string and delete the zero at the right \ (thus "shifting" each bit right), we get `0b0000_0011`, which is 3. \ #v(2mm) -Of course, we loose the remainder when we left-shift an odd number: \ -$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`. 
+Of course, we lose the remainder when we right-shift an odd number: \ +$9$ shifted right is $4$, since `0b0000_1001` shifted right is `0b0000_0100`. #problem() Right shifts are denoted by the `>>` symbol: \ @@ -86,6 +91,7 @@ Find the value of the following: - $12 #text[`>>`] 1$ - $27 #text[`>>`] 3$ - $16 #text[`>>`] 8$ +#note[Naturally, you'll have to convert these integers to binary first.] #solution[ - $12 #text[`>>`] 1 = 6$ diff --git a/src/Advanced/Fast Inverse Root/parts/01 float.typ b/src/Advanced/Fast Inverse Root/parts/02 float.typ similarity index 84% rename from src/Advanced/Fast Inverse Root/parts/01 float.typ rename to src/Advanced/Fast Inverse Root/parts/02 float.typ index 1125a1f..db056ff 100644 --- a/src/Advanced/Fast Inverse Root/parts/01 float.typ +++ b/src/Advanced/Fast Inverse Root/parts/02 float.typ @@ -3,7 +3,7 @@ = Floats #definition() -_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \ +_Binary decimals_#footnote([Note that "binary decimal" is a misnomer---"deci" means "ten"!]) are very similar to base-10 decimals. \ In base 10, we interpret place value as follows: - $0.1 = 10^(-1)$ - $0.03 = 3 times 10^(-2)$ @@ -107,11 +107,13 @@ Floats represent a subset of the real numbers, and are interpreted as follows: \ - The next eight bits represent the _exponent_ of this float. #note([(we'll see what that means soon)]) \ We'll call the value of this eight-bit binary integer $E$. \ - Naturally, $0 <= E <= 255$ #note([(since $E$ consist of eight bits.)]) + Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits)]) -- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \ - These 23 bits are interpreted as the fractional part of a binary decimal. \ - For example, the bits `0b10100000_00000000_00000000` represents $0.5 + 0.125 = 0.625$. +- The remaining 23 bits represent the _fraction_ of this float.
\ + They are interpreted as the fractional part of a binary decimal. \ + For example, the bits `0b1010000_00000000_00000000` represent $0.5 + 0.125 = 0.625$. \ + We'll call the value of these bits, read as a binary integer, $F$. \ + Their value as a binary decimal is then $F div 2^23$. #note([(convince yourself of this)]) #problem(label: "floata") @@ -135,12 +137,17 @@ $ (-1)^s times 2^(E - 127) times (1 + F / (2^(23))) $ -Notice that this is very similar to decimal scientific notation, which is written as +Notice that this is very similar to base-10 scientific notation, which is written as $ (-1)^s times 10^(e) times (f) $ +#note[ + We subtract 127 from $E$ so that the exponent $E - 127$ can be positive or negative. \ + $E$ is an eight-bit binary integer, so $0 <= E <= 255$ and thus $-127 <= (E - 127) <= 128$. +] #problem() Consider `0b01000001_10101000_00000000_00000000`. \ This is the same bit string we used in @floata. \ diff --git a/src/Advanced/Fast Inverse Root/parts/02 approx.typ b/src/Advanced/Fast Inverse Root/parts/03 approx.typ similarity index 94% rename from src/Advanced/Fast Inverse Root/parts/02 approx.typ rename to src/Advanced/Fast Inverse Root/parts/03 approx.typ index 439e262..5934628 100644 --- a/src/Advanced/Fast Inverse Root/parts/02 approx.typ +++ b/src/Advanced/Fast Inverse Root/parts/03 approx.typ @@ -5,7 +5,7 @@ = Integers and Floats #generic("Observation:") -For small values of $x$, $log_2(1 + x)$ is approximately equal to $x$. \ +If $0 <= x <= 1$, $log_2(1 + x)$ is approximately equal to $x$. \ Note that this equality is exact for $x = 0$ and $x = 1$, since $log_2(1) = 0$ and $log_2(2) = 1$.
#v(5mm) @@ -18,7 +18,7 @@ This allows us to improve the average error of our linear approximation: align: center, columns: (1fr, 1fr), inset: 5mm, - [$log(1+x)$ and $x + 0$] + [$log_2(1+x)$ and $x + 0$] + cetz.canvas({ import cetz.draw: * @@ -64,7 +64,7 @@ This allows us to improve the average error of our linear approximation: Max error: 0.086 \ Average error: 0.0573 ], - [$log(1+x)$ and $x + 0.045$] + [$log_2(1+x)$ and $x + 0.045$] + cetz.canvas({ import cetz.draw: * @@ -125,7 +125,7 @@ We won't bother with this---we'll simply leave the correction term as an opaque [ "Average error" above is simply the area of the region between the two graphs: $ - integral_0^1 abs( #v(1mm) log(1+x) - (x+epsilon) #v(1mm)) + integral_0^1 abs( #v(1mm) log_2(1+x) - (x+epsilon) #v(1mm)) $ Feel free to ignore this note, it isn't a critical part of this handout. ], diff --git a/src/Advanced/Fast Inverse Root/parts/03 quake.typ b/src/Advanced/Fast Inverse Root/parts/04 quake.typ similarity index 90% rename from src/Advanced/Fast Inverse Root/parts/03 quake.typ rename to src/Advanced/Fast Inverse Root/parts/04 quake.typ index f2b96ae..5e2d57f 100644 --- a/src/Advanced/Fast Inverse Root/parts/03 quake.typ +++ b/src/Advanced/Fast Inverse Root/parts/04 quake.typ @@ -2,10 +2,9 @@ = The Fast Inverse Square Root +A simplified version of the _Quake_ routine we are studying is reproduced below. -The following code is present in _Quake III Arena_ (1999): - -#v(5mm) +#v(2mm) ```c float Q_rsqrt( float number ) { @@ -15,20 +14,20 @@ float Q_rsqrt( float number ) { } ``` -#v(5mm) +#v(2mm) This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root. If we rewrite this using notation we're familiar with, we get the following: $ #text[`Q_sqrt`] (n_f) = - #h(5mm) 6240089 - (n_i div 2) - #h(5mm) + #h(10mm) approx 1 / sqrt(n_f) $ #note[ `0x5f3759df` is $6240089$ in hexadecimal. \ + Ask an instructor to explain if you don't know what this means.
\ It is a magic number hard-coded into `Q_sqrt`. ] @@ -56,7 +55,7 @@ For those that are interested, here are the details of the "code-to-math" transl - Notice the right-shift in the second line of the function. \ - We translated `(i >> i)` into $(n_i div 2)$. + We translated `(i >> 1)` into $(n_i div 2)$. #v(2mm) - "`return * (float *) &i`" is again C magic. \ @@ -64,17 +63,17 @@ For those that are interested, here are the details of the "code-to-math" transl #pagebreak() #generic("Setup:") -We are now ready to show that $#text[`Q_sqrt`] (x) approx 1/sqrt(x)$. \ +We are now ready to show that $#text[`Q_sqrt`] (x)$ effectively approximates $1/sqrt(x)$. \ For convenience, let's call the bit string of the inverse square root $r$. \ In other words, $ r_f := 1 / (sqrt(n_f)) $ -This is the value we want to approximate. +This is the value we want to approximate. \ #problem(label: "finala") Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \ -#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + a)$.] +#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + x)$.] #solution[ $ @@ -92,7 +91,11 @@ Let's call the "magic number" in the code above $kappa$, so that $ #text[`Q_sqrt`] (n_f) = kappa - (n_i div 2) $ -Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$ +Use @convert and @finala to show that $#text[`Q_sqrt`] (n_f) approx r_i$ \ +#note(type: "Note")[ + If we know $r_i$, we know $r_f$. \ + We don't even need to convert between the two---the underlying bits are the same! +] #solution[ From @convert, we know that @@ -164,8 +167,7 @@ though it is fairly close to the ideal $epsilon$. #remark() And now, we're done! \ -We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well, \ -thanks to the approximation $log(1+a) = a + epsilon$. +We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well. 
\ #v(2mm) diff --git a/src/Advanced/Fast Inverse Root/parts/04 bonus.typ b/src/Advanced/Fast Inverse Root/parts/05 bonus.typ similarity index 100% rename from src/Advanced/Fast Inverse Root/parts/04 bonus.typ rename to src/Advanced/Fast Inverse Root/parts/05 bonus.typ
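A hedged companion sketch, not part of this patch or of the _Quake_ source: the C below repeats the steps of `Q_rsqrt`, but uses `uint32_t` and `memcpy` in place of `long` and pointer casts. It assumes `float` is an IEEE 754 single-precision (32-bit) type. The names `float_to_bits`, `bits_to_float`, `q_rsqrt_portable`, and `rsqrt_residual` are hypothetical, chosen only for illustration. The substitution matters because `long` is 64 bits wide on many modern platforms, and reading a `float` through a `long *` breaks C's aliasing rules; `memcpy` copies exactly the 32 bits we want. For reference, the magic constant `0x5f3759df` is 1597463007 in decimal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* This sketch assumes float and uint32_t have the same size (IEEE 754 binary32). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the 32 bits of a float into an unsigned integer (n_f -> n_i).
   memcpy is well defined here; reading a float through a long * is not. */
static uint32_t float_to_bits(float f) {
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* The reverse direction (n_i -> n_f). */
static float bits_to_float(uint32_t i) {
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* The same steps as Q_rsqrt, using a fixed-width 32-bit integer.
   Assumes number is a positive, normal single-precision float. */
static float q_rsqrt_portable(float number) {
    assert(number > 0.0f);
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;

    uint32_t i = float_to_bits(number);
    i = 0x5f3759dfu - (i >> 1);           /* magic number minus (n_i >> 1) */
    float y = bits_to_float(i);

    y = y * (threehalfs - (x2 * y * y));  /* one Newton iteration, as in Quake */
    return y;
}

/* Hypothetical check: x * y^2 - 1 is zero exactly when y = 1/sqrt(x). */
static float rsqrt_residual(float x) {
    float y = q_rsqrt_portable(x);
    return x * y * y - 1.0f;
}
```

As a quick check, `float_to_bits(21.0f)` gives `0x41A80000`, which is the bit string `0b01000001_10101000_00000000_00000000` from the float section, and `rsqrt_residual` stays within a few thousandths of zero for ordinary positive inputs such as `0.01f`, `1.0f`, or `21.0f`.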