Tom edits
All checks were successful
CI / Typst formatting (pull_request) Successful in 4s
CI / Typos (pull_request) Successful in 13s
CI / Build (pull_request) Successful in 8m20s

Mark 2025-02-12 16:37:14 -08:00
parent 623450af26
commit 1c3db4b18d
4 changed files with 30 additions and 16 deletions

@@ -5,7 +5,8 @@
 #definition()
 A _bit string_ is a string of binary digits. \
 In this handout, we'll denote bit strings with the prefix `0b`. \
-That is, $1010 =$ "one thousand and one," while $#text([`0b1001`]) = 2^3 + 2^0 = 9$
+#note[This prefix is only notation---it is _not_ part of the string itself.] \
+For example, $1010$ is the number "one thousand and one," while $#text([`0b1001`])$ is the string of bits "1 0 0 1".
 #v(2mm)
 We will separate long bit strings with underscores for readability. \
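
A minimal C sketch of reading this notation back into a number, assuming bit strings of at most 32 bits. The helper `parse_bits` is hypothetical and not defined by the handout: it skips the `0b` prefix and any underscores, then appends one bit at a time.

```c
#include <stdint.h>

/* Hypothetical helper (not from the handout): read a bit string such as
   "0b0000_1001" as an unsigned 32-bit integer. Assumes at most 32 bits.
   Returns 0 if it meets a character other than '0', '1', or '_'. */
uint32_t parse_bits(const char *s) {
    uint32_t value = 0;
    if (s[0] == '0' && s[1] == 'b') {
        s += 2;                                      /* "0b" is notation, not data */
    }
    for (; *s != '\0'; s++) {
        if (*s == '_') {
            continue;                                /* underscores only aid reading */
        }
        if (*s != '0' && *s != '1') {
            return 0;                                /* not a bit: give up */
        }
        value = (value << 1) | (uint32_t)(*s - '0'); /* append the next bit */
    }
    return value;
}
```

For example, `parse_bits("0b1001")` and `parse_bits("0b0000_1001")` both return 9.
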
@@ -40,7 +41,7 @@ The value of a `uint` is simply its value as a binary number:
 What is the largest number we can represent with a 32-bit `uint`?
 #solution([
-$#text([`0b01111111_11111111_11111111_11111111`]) = 2^(31)$
+$#text([`0b11111111_11111111_11111111_11111111`]) = 2^(32)-1$
 ])
 #v(1fr)
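
The corrected answer can be sanity-checked in C, assuming `uint32_t` from `<stdint.h>` stands in for the 32-bit `uint`. The function names are illustrative only. Setting all 32 bits gives 2^32 - 1, while a pattern with the top bit clear and the other 31 bits set is 2^31 - 1.

```c
#include <stdint.h>

/* All 32 bits set: the largest value a 32-bit uint can hold, 2^32 - 1. */
uint32_t largest_uint32(void) {
    return ~(uint32_t)0;           /* flip every bit of zero */
}

/* Top bit clear, remaining 31 bits set: 2^31 - 1. */
uint32_t top_bit_clear(void) {
    return largest_uint32() >> 1;  /* a right shift brings a zero into the top bit */
}
```
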
@@ -53,6 +54,10 @@ Find the value of each of the following 32-bit unsigned integers:
 - `0b00000000_00000000_00000100_10110000`
 #hint([The third conversion is easy---look carefully at the second.])
+#instructornote[
+Consider making a list of the powers of two $>= 1024$ on the board.
+]
 #solution([
 - $#text([`0b00000000_00000000_00000101_00111001`]) = 1337$
 - $#text([`0b00000000_00000000_00000001_00101100`]) = 300$
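
Going the other direction helps when checking conversions like these. The hypothetical helper `format_bits` below writes a 32-bit unsigned integer in the handout's `0b` notation, most significant bit first, with an underscore between bytes. It is a sketch that assumes the caller supplies a buffer of at least 38 characters.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write v as "0b" followed by 32 bits in groups of
   eight separated by underscores. out needs 2 + 32 + 3 + 1 = 38 chars
   (prefix, bits, underscores, terminating NUL). */
void format_bits(uint32_t v, char out[38]) {
    size_t pos = 2;
    memcpy(out, "0b", 2);                        /* notation prefix */
    for (int k = 31; k >= 0; k--) {              /* most significant bit first */
        out[pos++] = ((v >> k) & 1u) ? '1' : '0';
        if (k % 8 == 0 && k != 0) {
            out[pos++] = '_';                    /* byte boundary */
        }
    }
    out[pos] = '\0';
}
```

With `char buf[38];`, calling `format_bits(1337, buf)` fills `buf` with the first bit string in the solution above.
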
@@ -64,7 +69,7 @@ Find the value of each of the following 32-bit unsigned integers:
 #definition()
-In general, division of `uints` is nontrivial#footnote([One may use repeated subtraction, but that isn't efficient.]). \
+In general, fast division of `uints` is difficult.#footnote([One may use repeated subtraction, but this isn't efficient.]) \
 Division by powers of two, however, is incredibly easy: \
 To divide by two, all we need to do is shift the bits of our integer right.
@@ -76,8 +81,8 @@ If we insert a zero at the left end of this bit string and delete the digit at t
 #v(2mm)
-Of course, we loose the remainder when we left-shift an odd number: \
-$9 div 2 = 4$, since `0b0000_1001` shifted right is `0b0000_0100`.
+Of course, we lose the remainder when we right-shift an odd number: \
+$9$ shifted right is $4$, since `0b0000_1001` shifted right is `0b0000_0100`.
 #problem()
 Right shifts are denoted by the `>>` symbol: \
@@ -86,6 +91,7 @@ Find the value of the following:
 - $12 #text[`>>`] 1$
 - $27 #text[`>>`] 3$
 - $16 #text[`>>`] 8$
+#note[Naturally, you'll have to convert these integers to binary first.]
 #solution[
 - $12 #text[`>>`] 1 = 6$
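
A short C sketch of the right-shift material in this file: for unsigned operands, `x >> k` discards the k rightmost bits, which is floor division by 2^k, and those discarded bits are exactly the lost remainder. The function names are illustrative, not from the handout. Shifting a 32-bit value by 32 or more bits is undefined in C, hence the `k < 32` restriction.

```c
#include <stdint.h>

/* Quotient: x >> k is floor(x / 2^k) for unsigned x. Requires k < 32. */
uint32_t shift_quotient(uint32_t x, unsigned k) {
    return x >> k;
}

/* Remainder: the k low bits that fall off the right end, x mod 2^k.
   Requires k < 32. */
uint32_t shift_remainder(uint32_t x, unsigned k) {
    return x & ((UINT32_C(1) << k) - 1u);  /* mask keeping only the k low bits */
}
```

For instance, `shift_quotient(9, 1)` is 4 and `shift_remainder(9, 1)` is 1, matching the `0b0000_1001` example.
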

@@ -3,7 +3,7 @@
 = Floats
 #definition()
-_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \
+_Binary decimals_#footnote([Note that "binary decimal" is a misnomer---"deci" means "ten"!]) are very similar to base-10 decimals. \
 In base 10, we interpret place value as follows:
 - $0.1 = 10^(-1)$
 - $0.03 = 3 times 10^(-2)$
@@ -107,11 +107,13 @@ Floats represent a subset of the real numbers, and are interpreted as follows: \
 - The next eight bits represent the _exponent_ of this float.
 #note([(we'll see what that means soon)]) \
 We'll call the value of this eight-bit binary integer $E$. \
-Naturally, $0 <= E <= 255$ #note([(since $E$ consist of eight bits.)])
-- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \
-These 23 bits are interpreted as the fractional part of a binary decimal. \
-For example, the bits `0b10100000_00000000_00000000` represents $0.5 + 0.125 = 0.625$.
+Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits)])
+- The remaining 23 bits represent the _fraction_ of this float. \
+They are interpreted as the fractional part of a binary decimal. \
+For example, the bits `0b1010000_00000000_00000000` represent $0.5 + 0.125 = 0.625$. \
+We'll call the value of these bits, read as a binary integer, $F$. \
+Their value as a binary decimal is then $F div 2^23$. #note([(Convince yourself that this is true!)])
 #problem(label: "floata")
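
The claim that the fraction bits are worth F / 2^23 as a binary decimal can also be checked numerically. This C sketch (function names hypothetical) sums the place values 2^-1, 2^-2, and so on of the 23 fraction bits directly, and separately divides the integer F by 2^23.

```c
#include <stdint.h>

/* Read the 23 fraction bits as the binary decimal 0.b1 b2 ... b23:
   the k-th bit from the left contributes 2^(-k). */
double fraction_by_place_value(uint32_t F) {
    double value = 0.0;
    double place = 0.5;                /* 2^(-1), the leftmost bit's place */
    for (int k = 22; k >= 0; k--) {    /* bit 22 is the leftmost of the 23 bits */
        if ((F >> k) & 1u) {
            value += place;
        }
        place /= 2.0;                  /* move one place to the right */
    }
    return value;
}

/* The same quantity computed as F / 2^23. */
double fraction_by_division(uint32_t F) {
    return F / 8388608.0;              /* 8388608 = 2^23 */
}
```

`0x500000` is the 23-bit pattern `0b1010000_00000000_00000000`; both functions return 0.625 for it. Every partial sum here is exactly representable as a `double`, so the two results can be compared with `==`.
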
@@ -135,12 +137,17 @@ $
 (-1)^s times 2^(E - 127) times (1 + F / (2^(23)))
 $
-Notice that this is very similar to decimal scientific notation, which is written as
+Notice that this is very similar to base-10 scientific notation, which is written as
 $
 (-1)^s times 10^(e) times (f)
 $
+#note[
+We subtract 127 from $E$ so that the exponent $E - 127$ can be negative as well as positive. \
+$E$ is an eight-bit binary integer, so $0 <= E <= 255$ and $-127 <= E - 127 <= 128$.
+]
 #problem()
 Consider `0b01000001_10101000_00000000_00000000`. \
 This is the same bit string we used in @floata. \
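
As a cross-check of the formula, here is a C sketch that splits a 32-bit pattern into s, E and F and evaluates (-1)^s * 2^(E - 127) * (1 + F / 2^23). It assumes `float` is a 32-bit IEEE 754 single, which holds on common platforms, and it ignores E = 0 and E = 255, which real floats reserve for zero/subnormals and infinities/NaN. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Evaluate (-1)^s * 2^(E - 127) * (1 + F / 2^23) from the raw bits.
   Sketch only: does not handle the special cases E == 0 and E == 255. */
double decode_float_bits(uint32_t bits) {
    uint32_t s = bits >> 31;              /* sign bit */
    uint32_t E = (bits >> 23) & 0xFFu;    /* eight exponent bits */
    uint32_t F = bits & 0x7FFFFFu;        /* 23 fraction bits */

    double scale = 1.0;                   /* build 2^(E - 127) by doubling or halving */
    for (int e = (int)E - 127; e > 0; e--) scale *= 2.0;
    for (int e = (int)E - 127; e < 0; e++) scale /= 2.0;

    double value = scale * (1.0 + F / 8388608.0);  /* 8388608 = 2^23 */
    return s ? -value : value;
}

/* Let the hardware interpret the same bits, for comparison.
   Assumes float and uint32_t share size and byte order. */
float bits_as_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Try both functions on the bit string from the problem, `0x41A80000` in hexadecimal, and compare the results.
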

@@ -18,7 +18,7 @@ This allows us to improve the average error of our linear approximation:
 align: center,
 columns: (1fr, 1fr),
 inset: 5mm,
-[$log(1+x)$ and $x + 0$]
+[$log_2(1+x)$ and $x + 0$]
 + cetz.canvas({
 import cetz.draw: *
@@ -64,7 +64,7 @@ This allows us to improve the average error of our linear approximation:
 Max error: 0.086 \
 Average error: 0.0573
 ],
-[$log(1+x)$ and $x + 0.045$]
+[$log_2(1+x)$ and $x + 0.045$]
 + cetz.canvas({
 import cetz.draw: *

@@ -29,6 +29,7 @@ $
 #note[
 `0x5f3759df` is $1597463007$ written in hexadecimal. \
+Ask an instructor to explain if you don't know what this means. \
 It is a magic number hard-coded into `Q_sqrt`.
 ]
@@ -56,7 +57,7 @@ For those that are interested, here are the details of the "code-to-math" transl
 - Notice the right-shift in the second line of the function. \
-We translated `(i >> i)` into $(n_i div 2)$.
+We translated `(i >> 1)` into $(n_i div 2)$.
 #v(2mm)
 - "`return * (float *) &i`" is again C magic. \
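
For readers who want to run the trick, here is a C sketch in the spirit of `Q_sqrt` as described in this section. It is not the handout's own listing. It reads the float's bits as an integer n_i, computes the magic number minus `(i >> 1)`, and reads the result back as a float, with no Newton's-method refinement. It assumes a 32-bit IEEE 754 `float`, uses `uint32_t` so the integer width matches, and uses `memcpy` in place of pointer casts such as `* (float *) &i`, which violate C's strict-aliasing rules. The name `q_sqrt_sketch` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0 using only integer operations on the
   float's bits. 0x5f3759df is 1597463007 in decimal. */
float q_sqrt_sketch(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);       /* n_i: the bits of x read as an integer */
    i = 0x5f3759dfu - (i >> 1);     /* magic number minus n_i / 2 */
    float r;
    memcpy(&r, &i, sizeof r);       /* read the new bits back as a float */
    return r;                       /* raw estimate, not refined further */
}
```

For example, `q_sqrt_sketch(4.0f)` gives roughly 0.483, within a few percent of the exact 0.5.
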
@@ -64,7 +65,7 @@ For those that are interested, here are the details of the "code-to-math" transl
 #pagebreak()
 #generic("Setup:")
-We are now ready to show that $#text[`Q_sqrt`] (x) approx 1/sqrt(x)$. \
+We are now ready to show that $#text[`Q_sqrt`] (x)$ effectively approximates $1/sqrt(x)$. \
 For convenience, let's call the bit string of the inverse square root $r$. \
 In other words,
 $
@@ -74,7 +75,7 @@ This is the value we want to approximate.
 #problem(label: "finala")
 Find an approximation for $log_2(r_f)$ in terms of $n_i$ and $epsilon$ \
-#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + a)$.]
+#note[Remember, $epsilon$ is the correction constant in our approximation of $log_2(1 + x)$.]
 #solution[
 $