From 2259ce1bcbabb07414d037fc25df0b16b7d9e625 Mon Sep 17 00:00:00 2001 From: Mark Date: Mon, 10 Feb 2025 19:43:56 -0800 Subject: [PATCH] Finish main sections --- src/Advanced/Fast Inverse Root/main.typ | 8 -- .../Fast Inverse Root/parts/00 int.typ | 75 ++++++++++--------- .../Fast Inverse Root/parts/02 approx.typ | 12 +++ .../Fast Inverse Root/parts/03 quake.typ | 57 ++++++++++---- 4 files changed, 97 insertions(+), 55 deletions(-) diff --git a/src/Advanced/Fast Inverse Root/main.typ b/src/Advanced/Fast Inverse Root/main.typ index f9cd1b3..63c34f5 100644 --- a/src/Advanced/Fast Inverse Root/main.typ +++ b/src/Advanced/Fast Inverse Root/main.typ @@ -1,11 +1,5 @@ #import "@local/handout:0.1.0": * -// Intro: -// - we don't need signed ints -// - Why is division expensive? -// Add a few problems add/multiplying/dividing floats -// - Spend more time on left-shift -// // Another look: // - Highlight that left-shift divides the exponent by two // - Highlight that log(ri) is already in its integer representation @@ -19,10 +13,8 @@ #show: doc => handout( doc, group: "Advanced 2", - title: [Fast Inverse Root], by: "Mark", - subtitle: "Based on a handout by Bryant Mathews", ) #include "parts/00 int.typ" diff --git a/src/Advanced/Fast Inverse Root/parts/00 int.typ b/src/Advanced/Fast Inverse Root/parts/00 int.typ index 209179d..6101049 100644 --- a/src/Advanced/Fast Inverse Root/parts/00 int.typ +++ b/src/Advanced/Fast Inverse Root/parts/00 int.typ @@ -22,57 +22,32 @@ What is the value of the following bit strings, if we interpret them as integers ]) #v(1fr) -#pagebreak() #definition() We can interpret a bit string in any number of ways. \ -One such interpretation is the _signed integer_, or `int` for short. \ -`ints` allow us to represent negative and positive integers using 32-bit strings. +One such interpretation is the _unsigned integer_, or `uint` for short. \ +`uint`s allow us to represent positive (hence "unsigned") integers using 32-bit strings. 
#v(2mm) -The first bit of an `int` tells us its sign: -- if the first bit is `1`, the _int_ represents a negative number; -- if the first bit is `0`, it represents a positive number. - -We do not need negative numbers today, so we will assume that the first bit is always zero. \ -#note([If you'd like to know how negative integers are written, look up "two's complement} after class.]) - -#v(2mm) - -The value of a positive signed `long` is simply the value of its binary digits: +The value of a `uint` is simply its value as a binary number: - $#text([`0b00000000_00000000_00000000_00000000`]) = 0$ - $#text([`0b00000000_00000000_00000000_00000011`]) = 3$ - $#text([`0b00000000_00000000_00000000_00100000`]) = 32$ - $#text([`0b00000000_00000000_00000000_10000010`]) = 130$ #problem() -What is the largest number we can represent with a 32-bit `int`? +What is the largest number we can represent with a 32-bit `uint`? #solution([ $#text([`0b01111111_11111111_11111111_11111111`]) = 2^(31)$ ]) #v(1fr) +#pagebreak() #problem() -What is the smallest possible number we can represented with a 32-bit `int`? \ -#hint([ - You do not need to know _how_ negative numbers are represented. \ - Assume that we do not skip any integers, and don't forget about zero. -]) - -#solution([ - There are $2^(64)$ possible 32-bit patterns, - of which 1 represents zero and $2^(31)$ represent positive numbers. - We therefore have access to $2^(64) - 1 - 2^(31)$ negative numbers, - giving us a minimum representable value of $-2^(31) + 1$. 
-]) - -#v(1fr) - -#problem() -Find the value of each of the following 32-bit `int`s: +Find the value of each of the following 32-bit unsigned integers: - `0b00000000_00000000_00000101_00111001` - `0b00000000_00000000_00000001_00101100` - `0b00000000_00000000_00000100_10110000` @@ -82,8 +57,40 @@ Find the value of each of the following 32-bit `int`s: - $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$ - $#text([`0b00000000_00000000_00000001_00101100`]) = 300$ - $#text([`0b00000000_00000000_00000010_01011000`]) = 1200$ -]) -Notice that the third int is the second shifted left twice (i.e, multiplied by 4) + Notice that the third int is the second shifted left twice (i.e., multiplied by 4) ]) -#v(2fr) +#v(1fr) + + +#definition() +In general, division of `uint`s is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \ +Division by powers of two, however, is incredibly easy: \ +To divide by two, all we need to do is shift the bits of our integer right. + +#v(2mm) + +For example, consider $#text[`0b0000_0110`] = 6$. \ +If we insert a zero at the left end of this bit string and delete the digit at the right \ +(thus "shifting" each bit right), we get `0b0000_0011`, which is 3. \ + +#v(2mm) + +Of course, we lose the remainder when we right-shift an odd number: \ +$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`. + +#problem() +Right shifts are denoted by the `>>` symbol: \ +$#text[`0b0110`] #text[`>>`] n$ means "shift `0b0110` right $n$ times." 
\ +Find the value of the following: +- $12 #text[`>>`] 1$ +- $27 #text[`>>`] 3$ +- $16 #text[`>>`] 8$ + +#solution[ + - $12 #text[`>>`] 1 = 6$ + - $27 #text[`>>`] 3 = 3$ + - $16 #text[`>>`] 8 = 0$ +] + +#v(1fr) diff --git a/src/Advanced/Fast Inverse Root/parts/02 approx.typ b/src/Advanced/Fast Inverse Root/parts/02 approx.typ index 7bd75b6..439e262 100644 --- a/src/Advanced/Fast Inverse Root/parts/02 approx.typ +++ b/src/Advanced/Fast Inverse Root/parts/02 approx.typ @@ -159,3 +159,15 @@ $ ]) #v(1fr) + + +#problem() +Using basic log rules, rewrite $log_2(1 / sqrt(x))$ in terms of $log_2(x)$. + +#solution([ + $ + log_2(1 / sqrt(x)) = (-1) / (2)log_2(x) + $ +]) + +#v(1fr) diff --git a/src/Advanced/Fast Inverse Root/parts/03 quake.typ b/src/Advanced/Fast Inverse Root/parts/03 quake.typ index ca7c48c..c5222fb 100644 --- a/src/Advanced/Fast Inverse Root/parts/03 quake.typ +++ b/src/Advanced/Fast Inverse Root/parts/03 quake.typ @@ -20,14 +20,17 @@ float Q_rsqrt( float number ) { This code defines a function `Q_rsqrt` that consumes a float `number` and approximates its inverse square root. If we rewrite this using notation we're familiar with, we get the following: $ - #text[`Q_sqrt`] (n_f) = 6240089 - (n_i div 2) + #text[`Q_sqrt`] (n_f) = + #h(5mm) + 6240089 - (n_i div 2) + #h(5mm) approx 1 / sqrt(n_f) $ -#note([ +#note[ `0x5f3759df` is $6240089$ in hexadecimal. \ It is a magic number hard-coded into `Q_sqrt`. -]) +] #v(2mm) @@ -36,16 +39,28 @@ Our goal in this section is to understand why this works: - What's special about $6240089$? -#problem() -Using basic log rules, rewrite $log_2(1 / sqrt(x))$ in terms of $log_2(x)$. - -#solution([ - $ - log_2(1 / sqrt(x)) = (-1) / (2)log_2(x) - $ -]) #v(1fr) + +#remark() +For those that are interested, here are the details of the "code-to-math" translation: + +- "`long i = * (long *) &number`" is C magic that tells the compiler \ + to set `i` to the `uint` value of the bits of `number`. 
\ + #note[ + "long" refers to a "long integer", which has 32 bits on the platforms _Quake_ targeted. \ + The C standard only guarantees at least 16 bits for `int`s and `short int`s. + ] \ + In other words, `number` is $n_f$ and `i` is $n_i$. +#v(2mm) + + +- Notice the right-shift in the second line of the function. \ + We translated `(i >> 1)` into $(n_i div 2)$. +#v(2mm) + +- "`return * (float *) &i`" is again C magic. \ + Much like before, it tells us to return the value of the bits of `i` as a float. #pagebreak() #generic("Setup:") @@ -144,6 +159,22 @@ $ $ So, $0.045$ is the $epsilon$ used by Quake. \ -This constant was likely generated by trial-and-error, and is fairly close to the ideal $epsilon$. +Online sources state that this constant was generated by trial-and-error, \ +though it is fairly close to the ideal $epsilon$. + +#v(4mm) + +#remark() +And now, we're done! \ +We've shown that `Q_sqrt(x)` approximates $1/sqrt(x)$ fairly well, \ +thanks to the approximation $log_2(1+a) approx a + epsilon$. + +#v(2mm) + +Notably, `Q_sqrt` uses _zero_ divisions or multiplications (`>>` doesn't count). \ +This makes it _very_ fast when compared to more traditional approximation techniques like Fourier series. + +#v(2mm) + +In the case of _Quake_, this is very important. 3D graphics require thousands of inverse-square-root calculations to render a single frame#footnote[e.g., to generate normal vectors], which is not an easy task for a PlayStation running at 300 MHz. -#if_no_solutions(v(2cm))
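The "code-to-math" translation described in the remark added to `parts/03 quake.typ` can be checked with a short C sketch. This is an illustration, not the Quake source: the helper names `float_bits`, `bits_to_float`, and `rsqrt_first_guess` are hypothetical, `memcpy` is swapped in for the pointer cast as a portable way to reinterpret bits, and only the first guess (magic constant minus the right-shifted integer) is computed, assuming a 32-bit IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the 32 bits of a float as an unsigned
 * integer (n_f -> n_i). memcpy replaces the original pointer cast. */
static uint32_t float_bits(float x) {
    uint32_t bits;
    assert(sizeof bits == sizeof x); /* assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: read a 32-bit pattern back as a float. */
static float bits_to_float(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* First guess only: magic constant minus (n_i >> 1).
 * No division and no multiplication are used. */
static float rsqrt_first_guess(float number) {
    uint32_t i = float_bits(number);
    i = 0x5f3759dfu - (i >> 1);
    return bits_to_float(i);
}
```

Calling `rsqrt_first_guess(4.0f)` should give a value within a few percent of $1/sqrt(4) = 0.5$, which matches the handout's claim that the approximation is "fairly" good without any refinement step.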