diff --git a/src/Advanced/Fast Inverse Root/main.typ b/src/Advanced/Fast Inverse Root/main.typ new file mode 100644 index 0000000..71043b5 --- /dev/null +++ b/src/Advanced/Fast Inverse Root/main.typ @@ -0,0 +1,28 @@ +#import "@local/handout:0.1.0": * + +// Bonus: +// - Floats vs fixed point +// - Float density +// - Find non-floatable rational numbers +// - What if we use `n`-bit floats? + +#show: doc => handout( + doc, + group: "Advanced 2", + title: [Fast Inverse Square Root], + by: "Mark", +) + +#include "parts/00 int.typ" +#pagebreak() + +#include "parts/01 float.typ" +#pagebreak() + +#include "parts/02 approx.typ" +#pagebreak() + +#include "parts/03 quake.typ" +#pagebreak() + +#include "parts/04 bonus.typ" diff --git a/src/Advanced/Fast Inverse Root/meta.toml b/src/Advanced/Fast Inverse Root/meta.toml new file mode 100644 index 0000000..5a1bdbd --- /dev/null +++ b/src/Advanced/Fast Inverse Root/meta.toml @@ -0,0 +1,7 @@ +[metadata] +title = "Fast Inverse Square Root" + + +[publish] +handout = true +solutions = true diff --git a/src/Advanced/Fast Inverse Root/parts/00 int.typ b/src/Advanced/Fast Inverse Root/parts/00 int.typ new file mode 100644 index 0000000..9de9189 --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/00 int.typ @@ -0,0 +1,96 @@ +#import "@local/handout:0.1.0": * + += Integers + +#definition() +A _bit string_ is a string of binary digits. \ +In this handout, we'll denote bit strings with the prefix `0b`. \ +That is, $1001 =$ "one thousand and one," while $#text([`0b1001`]) = 2^3 + 2^0 = 9$. + +#v(2mm) +We will separate long bit strings with underscores for readability. \ +Underscores have no meaning: $#text([`0b1111_0000`]) = #text([`0b11110000`])$. + +#problem() +What is the value of the following bit strings, if we interpret them as integers in base 2?
+- `0b0001_1010` +- `0b0110_0001` + +#solution([ + - $#text([`0b0001_1010`]) = 2 + 8 + 16 = 26$ + - $#text([`0b0110_0001`]) = 1 + 32 + 64 = 97$ +]) + +#v(1fr) + +#definition() +We can interpret a bit string in any number of ways. \ +One such interpretation is the _unsigned integer_, or `uint` for short. \ +`uint`s allow us to represent positive (hence "unsigned") integers using 32-bit strings. + +#v(2mm) + +The value of a `uint` is simply its value as a binary number: +- $#text([`0b00000000_00000000_00000000_00000000`]) = 0$ +- $#text([`0b00000000_00000000_00000000_00000011`]) = 3$ +- $#text([`0b00000000_00000000_00000000_00100000`]) = 32$ +- $#text([`0b00000000_00000000_00000000_10000010`]) = 130$ + +#problem() +What is the largest number we can represent with a 32-bit `uint`? + +#solution([ + $#text([`0b11111111_11111111_11111111_11111111`]) = 2^(32) - 1$ +]) + +#v(1fr) +#pagebreak() + +#problem() +Find the value of each of the following 32-bit unsigned integers: +- `0b00000000_00000000_00000101_00111001` +- `0b00000000_00000000_00000001_00101100` +- `0b00000000_00000000_00000100_10110000` +#hint([The third conversion is easy---look carefully at the second.]) + +#solution([ + - $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$ + - $#text([`0b00000000_00000000_00000001_00101100`]) = 300$ + - $#text([`0b00000000_00000000_00000100_10110000`]) = 1200$ + Notice that the third int is the second shifted left twice (i.e., multiplied by 4). +]) + +#v(1fr) + + +#definition() +In general, division of `uints` is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \ +Division by powers of two, however, is incredibly easy: \ +To divide by two, all we need to do is shift the bits of our integer right. + +#v(2mm) + +For example, consider $#text[`0b0000_0110`] = 6$. \ +If we insert a zero at the left end of this bit string and delete the digit at the right \ +(thus "shifting" each bit right), we get `0b0000_0011`, which is 3.
\ + +#v(2mm) + +Of course, we lose the remainder when we right-shift an odd number: \ +$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`. + +#problem() +Right shifts are denoted by the `>>` symbol: \ +$#text[`0b0110`] #text[`>>`] n$ means "shift `0b0110` right $n$ times." \ +Find the value of the following: +- $12 #text[`>>`] 1$ +- $27 #text[`>>`] 3$ +- $16 #text[`>>`] 8$ + +#solution[ + - $12 #text[`>>`] 1 = 6$ + - $27 #text[`>>`] 3 = 3$ + - $16 #text[`>>`] 8 = 0$ +] + +#v(1fr) diff --git a/src/Advanced/Fast Inverse Root/parts/01 float.typ b/src/Advanced/Fast Inverse Root/parts/01 float.typ new file mode 100644 index 0000000..1125a1f --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/01 float.typ @@ -0,0 +1,200 @@ +#import "@local/handout:0.1.0": * +#import "@preview/cetz:0.3.1" + += Floats +#definition() +_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \ +In base 10, we interpret place value as follows: +- $0.1 = 10^(-1)$ +- $0.03 = 3 times 10^(-2)$ +- $0.0208 = 2 times 10^(-2) + 8 times 10^(-4)$ + +#v(5mm) + +We can do the same in base 2: +- $#text([`0.1`]) = 2^(-1) = 0.5$ +- $#text([`0.011`]) = 2^(-2) + 2^(-3) = 0.375$ +- $#text([`101.01`]) = 5.25$ + +#v(5mm) + +#problem() +Rewrite the following binary decimals in base 10: \ +#note([You may leave your answer as a fraction.]) +- `1011.101` +- `110.1101` + + +#v(1fr) +#pagebreak() + +#definition() +Another way we can interpret a bit string is as a _signed floating-point decimal_, or a `float` for short. \ +Floats represent a subset of the real numbers, and are interpreted as follows: \ +#note([The following only applies to floats that consist of 32 bits.
We won't encounter any others today.]) + +#align( + center, + box( + inset: 2mm, + cetz.canvas({ + import cetz.draw: * + + let chars = ( + `0`, + `b`, + `0`, + `_`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `_`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `_`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `_`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + `0`, + ) + + let x = 0 + for c in chars { + content((x, 0), c) + x += 0.25 + } + + let y = -0.4 + line((0.3, y), (0.65, y)) + content((0.45, y - 0.2), [s]) + + line((0.85, y), (2.9, y)) + content((1.9, y - 0.2), [exponent]) + + line((3.10, y), (9.4, y)) + content((6.3, y - 0.2), [fraction]) + }), + ), +) + +- The first bit denotes the sign of the float's value. + We'll label it $s$. \ + If $s = #text([`1`])$, this float is negative; if $s = #text([`0`])$, it is positive. + +- The next eight bits represent the _exponent_ of this float. + #note([(we'll see what that means soon)]) \ + We'll call the value of this eight-bit binary integer $E$. \ + Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits.)]) + +- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \ + These 23 bits are interpreted as the fractional part of a binary decimal. \ + For example, the bits `0b10100000_00000000_00000000` represent $0.5 + 0.125 = 0.625$. + + +#problem(label: "floata") +Consider `0b01000001_10101000_00000000_00000000`. \ +Find the $s$, $E$, and $F$ we get if we interpret this bit string as a `float`.
\ +#note([Leave $F$ as a sum of powers of two.]) + +#solution([ + $s = 0$ \ + $E = 131$ \ + $F = 2^21+2^19 = 2,621,440$ +]) + +#v(1fr) + + +#definition(label: "floatdef") +The final value of a float with sign $s$, exponent $E$, and fraction $F$ is + +$ + (-1)^s times 2^(E - 127) times (1 + F / (2^(23))) +$ + +Notice that this is very similar to decimal scientific notation, which is written as + +$ + (-1)^s times 10^(e) times (f) +$ + +#problem() +Consider `0b01000001_10101000_00000000_00000000`. \ +This is the same bit string we used in @floata. \ + +#v(2mm) + +What value do we get if we interpret this bit string as a float? \ +#hint([$21 div 16 = 1.3125$]) + +#solution([ + This is 21: + $ + 2^(131 - 127) times (1 + (2^(21) + 2^(19)) / (2^(23))) + = 2^(4) times (1 + 0.25 + 0.0625) + = 16 times (1.3125) + = 21 + $ +]) + +#v(1fr) +#pagebreak() + +#problem() +Encode $12.5$ as a float. \ +#hint([$12.5 div 8 = 1.5625$]) + +#solution([ + $ + 12.5 + = 8 times 1.5625 + = 2^(3) times (1 + (0.5 + 0.0625)) + = 2^(130 - 127) times (1 + (2^(22) + 2^(19)) / (2^(23))) + $ + + which is `0b01000001_01001000_00000000_00000000`. \ +]) + + +#v(1fr) + +#definition() +Say we have a bit string $x$. \ +We'll let $x_f$ denote the value we get if we interpret $x$ as a float, \ +and we'll let $x_i$ denote the value we get if we interpret $x$ as an integer. + +#problem() +Let $x = #text[`0b01000001_01001000_00000000_00000000`]$. \ +What are $x_f$ and $x_i$?
#note([As always, you may leave big numbers as powers of two.]) +#solution([ + $x_f = 12.5$ + + #v(2mm) + + $x_i = 2^30 + 2^24 + 2^22 + 2^19 = 1,095,237,632$ +]) + +#v(1fr) diff --git a/src/Advanced/Fast Inverse Root/parts/02 approx.typ b/src/Advanced/Fast Inverse Root/parts/02 approx.typ new file mode 100644 index 0000000..439e262 --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/02 approx.typ @@ -0,0 +1,173 @@ +#import "@local/handout:0.1.0": * +#import "@preview/cetz:0.3.1" +#import "@preview/cetz-plot:0.1.0": plot, chart + += Integers and Floats + +#generic("Observation:") +For small values of $x$, $log_2(1 + x)$ is approximately equal to $x$. \ +Note that this approximation is exact for $x = 0$ and $x = 1$, since $log_2(1) = 0$ and $log_2(2) = 1$. + +#v(5mm) + +We'll add the _correction term_ $epsilon$ to our approximation: $log_2(1 + a) approx a + epsilon$. \ +This allows us to improve the average error of our linear approximation: + +#table( + stroke: none, + align: center, + columns: (1fr, 1fr), + inset: 5mm, + [$log(1+x)$ and $x + 0$] + + cetz.canvas({ + import cetz.draw: * + + let f1(x) = calc.log(calc.abs(x + 1), base: 2) + let f2(x) = x + + // Set-up a thin axis style + set-style(axes: (stroke: .5pt, tick: (stroke: .5pt))) + + + plot.plot( + size: (7, 7), + x-tick-step: 0.2, + y-tick-step: 0.2, + y-min: 0, + y-max: 1, + x-min: 0, + x-max: 1, + legend: none, + axis-style: "scientific-auto", + x-label: none, + y-label: none, + { + let domain = (0, 1) + + plot.add( + f1, + domain: domain, + label: $log(1+x)$, + style: (stroke: ogrape), + ) + + plot.add( + f2, + domain: domain, + label: $x$, + style: (stroke: oblue), + ) + }, + ) + }) + + [ + Max error: 0.086 \ + Average error: 0.0573 + ], + [$log(1+x)$ and $x + 0.045$] + + cetz.canvas({ + import cetz.draw: * + + let f1(x) = calc.log(calc.abs(x + 1), base: 2) + let f2(x) = x + 0.0450466 + + // Set-up a thin axis style + set-style(axes: (stroke: .5pt, tick: (stroke: .5pt))) + + + plot.plot( + size: (7,
7), + x-tick-step: 0.2, + y-tick-step: 0.2, + y-min: 0, + y-max: 1, + x-min: 0, + x-max: 1, + legend: none, + axis-style: "scientific-auto", + x-label: none, + y-label: none, + { + let domain = (0, 1) + + plot.add( + f1, + domain: domain, + label: $log(1+x)$, + style: (stroke: ogrape), + ) + + plot.add( + f2, + domain: domain, + label: $x + epsilon$, + style: (stroke: oblue), + ) + }, + ) + }) + + [ + Max error: 0.041 \ + Average error: 0.0254 + ], +) + + +A suitable value of $epsilon$ can be found using calculus or with computational trial-and-error. \ +We won't bother with this---we'll simply leave the correction term as an opaque constant $epsilon$. + + + +#v(1fr) + +#note( + type: "Note", + [ + "Average error" above is simply the area of the region between the two graphs: + $ + integral_0^1 abs( #v(1mm) log(1+x) - (x+epsilon) #v(1mm)) dif x + $ + Feel free to ignore this note; it isn't a critical part of this handout. + ], +) + + +#pagebreak() + +#problem(label: "convert") +Use the fact that $log_2(1 + a) approx a + epsilon$ to approximate $log_2(x_f)$ in terms of $x_i$. \ +Namely, show that +$ + log_2(x_f) approx (x_i) / (2^23) - 127 + epsilon +$ +#note([ + In other words, we're finding an expression for $x$ as a float + in terms of $x$ as an int. +]) + +#solution([ + Let $E$ and $F$ be the exponent and fraction bits of $x_f$. \ + We then have: + $ + log_2(x_f) + &= log_2 ( 2^(E-127) times (1 + (F) / (2^23)) ) \ + &= E - 127 + log_2(1 + F / (2^23)) \ + & approx E-127 + F / (2^23) + epsilon \ + &= 1 / (2^23)(2^23 E + F) - 127 + epsilon \ + &= 1 / (2^23)(x_i) - 127 + epsilon + $ +]) + +#v(1fr) + + +#problem() +Using basic log rules, rewrite $log_2(1 / sqrt(x))$ in terms of $log_2(x)$.
#solution([ + $ + log_2(1 / sqrt(x)) = (-1) / (2)log_2(x) + $ +]) + +#v(1fr) diff --git a/src/Advanced/Fast Inverse Root/parts/03 quake.typ b/src/Advanced/Fast Inverse Root/parts/03 quake.typ new file mode 100644 index 0000000..f2b96ae --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/03 quake.typ @@ -0,0 +1,208 @@ +#import "@local/handout:0.1.0": * + += The Fast Inverse Square Root + + +The following code is present in _Quake III Arena_ (1999): + +#v(5mm) + +```c +float Q_rsqrt( float number ) { + long i = * ( long * ) &number; + i = 0x5f3759df - ( i >> 1 ); + return * ( float * ) &i; +} +``` + +#v(5mm) + +This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root. +If we rewrite this using notation we're familiar with, we get the following: +$ + #text[`Q_rsqrt`] (n_f) = + #h(5mm) + 1597463007 - (n_i div 2) + #h(5mm) + approx 1 / sqrt(n_f) +$ + +#note[ + `0x5f3759df` is $1597463007$ in decimal. \ + It is a magic number hard-coded into `Q_rsqrt`. +] + +#v(2mm) + +Our goal in this section is to understand why this works: +- How does Quake approximate $1 / sqrt(x)$ by simply subtracting and dividing by two? +- What's special about $1597463007$? + + + +#v(1fr) + +#remark() +For those that are interested, here are the details of the "code-to-math" translation: + +- "`long i = * (long *) &number`" is C magic that tells the compiler \ + to set `i` to the `uint` value of the bits of `number`. \ + #note[ + "long" refers to a "long integer", which has 32 bits on the platforms Quake targeted. \ + There, `int`s are also 32 bits, and `short int`s are 16. + ] \ + In other words, `number` is $n_f$ and `i` is $n_i$. +#v(2mm) + + +- Notice the right-shift in the second line of the function. \ + We translated `(i >> 1)` into $(n_i div 2)$. +#v(2mm) + +- "`return * (float *) &i`" is again C magic. \ + Much like before, it tells us to return the value of the bits of `i` as a float.
#pagebreak() + +#generic("Setup:") +We are now ready to show that $#text[`Q_rsqrt`] (x) approx 1/sqrt(x)$. \ +For convenience, let's call the bit string of the inverse square root $r$. \ +In other words, +$ + r_f := 1 / (sqrt(n_f)) +$ +This is the value we want to approximate. + +#problem(label: "finala") +Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$. \ +#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + a)$.] + +#solution[ + $ + log_2(r_f) + = log_2(1 / sqrt(n_f)) + = (-1) / 2 log_2(n_f) + approx (-1) / 2 ( (n_i) / (2^23) + epsilon - 127 ) + $ +] + +#v(1fr) + +#problem(label: "finalb") +Let's call the "magic number" in the code above $kappa$, so that +$ + #text[`Q_rsqrt`] (n_f) = kappa - (n_i div 2) +$ +Use @convert and @finala to show that $#text[`Q_rsqrt`] (n_f) approx r_i$. + +#solution[ + From @convert, we know that + $ + log_2(r_f) approx (r_i) / (2^23) + epsilon - 127 + $ + + Combining this with the result from @finala, we get: + $ + (r_i) / (2^23) + epsilon - 127 + &approx (-1) / (2) ( (n_i) / (2^23) + epsilon - 127) \ + (r_i) / (2^23) + &approx (-1) / (2) ( (n_i) / (2^23)) + 3 / 2 (127 - epsilon) \ + r_i + &approx (-1) / 2 (n_i) + 2^23 3 / 2(127 - epsilon) + = 2^23 3 / 2 (127 - epsilon) - (n_i) / 2 + $ + + #v(2mm) + + This is exactly what we need! If we set $kappa$ to $(3 times 2^22) (127-epsilon)$, then + $ + r_i approx kappa - (n_i div 2) = #text[`Q_rsqrt`] (n_f) + $ +] + +#v(1fr) + +#problem(label: "finalc") +What is the exact value of $kappa$ in terms of $epsilon$? \ +#hint[Look at @finalb. We already found it!] + +#solution[ + This problem makes sure our students see that + $kappa = (3 times 2^22) (127 - epsilon)$. \ + See the solution to @finalb. +] + +#v(2cm) + +#pagebreak() + +#remark() +In @finalc we saw that $kappa = (3 times 2^22) (127 - epsilon)$.
\ +Looking at the code again, we see that $kappa = #text[`0x5f3759df`]$ in _Quake_: + +#v(2mm) + +```c +float Q_rsqrt( float number ) { + long i = * ( long * ) &number; + i = 0x5f3759df - ( i >> 1 ); + return * ( float * ) &i; +} +``` + +#v(2mm) +Using a calculator and some basic algebra, we can find the $epsilon$ this code uses: \ +#note[Remember, #text[`0x5f3759df`] is $1597463007$ in decimal.] +$ + (3 times 2^22) (127 - epsilon) &= 1597463007 \ + (127 - epsilon) &approx 126.955 \ + epsilon &approx 0.045 +$ + +So, $0.045$ is the $epsilon$ used by Quake. \ +Online sources state that this constant was generated by trial-and-error, \ +though it is fairly close to the ideal $epsilon$. + +#remark() +And now, we're done! \ +We've shown that `Q_rsqrt(x)` approximates $1/sqrt(x)$ fairly well, \ +thanks to the approximation $log(1+a) approx a + epsilon$. + +#v(2mm) + +Notably, `Q_rsqrt` uses _zero_ divisions or multiplications (`>>` doesn't count). \ +This makes it _very_ fast when compared to more traditional approximation techniques (e.g., Taylor series). + +#v(2mm) + +In the case of _Quake_, this is very important. 3D graphics require thousands of inverse-square-root calculations to render a single frame#footnote[e.g., to generate normal vectors], which was no easy task for the consumer hardware of 1999. + +#instructornote[ + Let $x$ be a bit string. If we assume $x_f$ is positive and $E$ is even, then + $ + (x #text[`>>`] 1)_f = 2^((E div 2) - 127) times (1 + (F div 2) / (2^(23))) + $ + Notably: a right-shift divides the exponent of $x_f$ by two, \ + which is, of course, a square root! + + #v(2mm) + + This intuition is hand-wavy, though: \ + If $E$ is odd, its lowest-order bit becomes the highest-order bit of $F$ when we shift $x$ right. \ + Also, a right shift doesn't divide the _entire_ exponent, skipping the $-127$ offset. \ + + #v(2mm) + + Remarkably, this intuition is still somewhat correct. \ + The bits align _just so_, and our approximation still works.
#v(8mm) + + One can think of the fast inverse root as a "digital slide rule": \ + The integer representation of $x_f$ already contains $log_2(x_f)$, offset and scaled. \ + By subtracting and dividing in "log space", we effectively invert and root $x_f$! + + After all, + $ + - 1 / 2 log_2(n_f) = log_2(1 / sqrt(n_f)) + $ +] diff --git a/src/Advanced/Fast Inverse Root/parts/04 bonus.typ b/src/Advanced/Fast Inverse Root/parts/04 bonus.typ new file mode 100644 index 0000000..3cf5177 --- /dev/null +++ b/src/Advanced/Fast Inverse Root/parts/04 bonus.typ @@ -0,0 +1,36 @@ +#import "@local/handout:0.1.0": * + += Bonus -- More about Floats + +#problem() +Convince yourself that all numbers that can be represented as a float are rational. + +#problem() +Find a rational number that cannot be represented as a float. + +#v(1fr) + +#problem() +What is the smallest positive 32-bit float? + +#v(1fr) + +#problem() +What is the largest positive 32-bit float? + +#v(1fr) + +#problem() +How many floats are between $-1$ and $1$? + +#v(1fr) + +#problem() +How many floats are between $1$ and $2$? + +#v(1fr) + +#problem() +How many floats are between $1$ and $128$? + +#v(1fr)