
#import "@local/handout:0.1.0": *
#import "@preview/cetz:0.3.1"

= Floats
#definition()
_Binary decimals_#footnote["decimal" is a misnomer, but that's ok.] are very similar to base-10 decimals. \
In base 10, we interpret place value as follows:
- $0.1 = 10^(-1)$
- $0.03 = 3 times 10^(-2)$
- $0.0208 = 2 times 10^(-2) + 8 times 10^(-4)$

#v(5mm)

We can do the same in base 2:
- $#text([`0.1`]) = 2^(-1) = 0.5$
- $#text([`0.011`]) = 2^(-2) + 2^(-3) = 0.375$
- $#text([`101.01`]) = 2^2 + 2^0 + 2^(-2) = 5.25$

#v(5mm)
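
The place-value rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `binary_decimal` is made up for this handout, and it assumes its input contains nothing but the digits `0` and `1` and at most one point.

```python
from fractions import Fraction


def binary_decimal(digits: str) -> Fraction:
    """Exact value of a binary decimal such as "101.01"."""
    whole, _, frac = digits.partition(".")
    # The part before the point is an ordinary binary integer.
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # The k-th digit after the point has place value 2^(-k).
    for k, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2**k)
    return value


print(binary_decimal("0.011"))   # 3/8
print(binary_decimal("101.01"))  # 21/4
```

Using `Fraction` keeps every answer exact, which matches the note that answers may be left as fractions.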

#problem()
Rewrite the following binary decimals in base 10: \
#note([You may leave your answer as a fraction.])
- `1011.101`
- `110.1101`


#v(1fr)
#pagebreak()

#definition()
Another way we can interpret a bit string is as a _signed floating-point decimal_, or a `float` for short. \
Floats represent a subset of the real numbers, and are interpreted as follows: \
#note([The following only applies to floats that consist of 32 bits. We won't encounter any others today.])

#align(
  center,
  box(
    inset: 2mm,
    cetz.canvas({
      import cetz.draw: *

      let chars = (
        `0`,
        `b`,
        `0`,
        `_`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `_`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `_`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `_`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
        `0`,
      )

      let x = 0
      for c in chars {
        content((x, 0), c)
        x += 0.25
      }

      let y = -0.4
      line((0.3, y), (0.65, y))
      content((0.45, y - 0.2), [s])

      line((0.85, y), (2.9, y))
      content((1.9, y - 0.2), [exponent])

      line((3.10, y), (9.4, y))
      content((6.3, y - 0.2), [fraction])
    }),
  ),
)

- The first bit denotes the sign of the float's value. \
  We'll label it $s$. \
  If $s = #text([`1`])$, this float is negative; if $s = #text([`0`])$, it is positive.

- The next eight bits represent the _exponent_ of this float.
  #note([(we'll see what that means soon)]) \
  We'll call the value of this eight-bit binary integer $E$. \
  Naturally, $0 <= E <= 255$ #note([(since $E$ consists of eight bits)])

- The remaining 23 bits represent the _fraction_ of this float, which we'll call $F$. \
  These 23 bits are interpreted as the fractional part of a binary decimal. \
  For example, the bits `0b1010000_00000000_00000000` represent $0.5 + 0.125 = 0.625$.
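
The three fields described above can be pulled out of a 32-bit pattern with shifts and masks. The sketch below is one possible way to do it in Python; the name `split_float_bits` is hypothetical, and it returns $F$ as the integer value of the 23 fraction bits, so dividing by $2^(23)$ gives the fractional value described above.

```python
def split_float_bits(x: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into (s, E, F)."""
    s = (x >> 31) & 0x1   # first bit: sign
    E = (x >> 23) & 0xFF  # next eight bits: exponent
    F = x & 0x7FFFFF      # remaining 23 bits: fraction, as an integer
    return s, E, F


s, E, F = split_float_bits(0b10111111_11000000_00000000_00000000)
print(s, E, F)        # 1 127 4194304
print(F / 2**23)      # 0.5
```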


#problem(label: "floata")
Consider `0b01000001_10101000_00000000_00000000`. \
Find the $s$, $E$, and $F$ we get if we interpret this bit string as a `float`. \
#note([Leave $F$ as a sum of powers of two.])

#solution([
  $s = 0$ \
  $E = 131$ \
  $F = 2^21 + 2^19 = 2,621,440$
])

#v(1fr)


#definition(label: "floatdef")
The final value of a float with sign $s$, exponent $E$, and fraction $F$ is

$
  (-1)^s times 2^(E - 127) times (1 + F / (2^(23)))
$

Notice that this is very similar to decimal scientific notation, which is written as

$
  (-1)^s times 10^(e) times (f)
$
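
As a sanity check, the float formula can be evaluated directly and compared against what the machine does with the same four bytes. The Python sketch below is illustrative only: `float_value` and `hardware_value` are names chosen here, and the comparison assumes $1 <= E <= 254$, since the IEEE 754 standard treats $E = 0$ and $E = 255$ as special cases that this formula does not cover.

```python
import struct


def float_value(x: int) -> float:
    """Apply (-1)^s * 2^(E - 127) * (1 + F / 2^23) to a 32-bit pattern."""
    s = (x >> 31) & 0x1
    E = (x >> 23) & 0xFF
    F = x & 0x7FFFFF
    return (-1) ** s * 2.0 ** (E - 127) * (1 + F / 2**23)


def hardware_value(x: int) -> float:
    """Reinterpret the same four bytes as a 32-bit float."""
    return struct.unpack(">f", struct.pack(">I", x))[0]


x = 0b00111111_11000000_00000000_00000000
print(float_value(x), hardware_value(x))  # 1.5 1.5
```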

#problem()
Consider `0b01000001_10101000_00000000_00000000`. \
This is the same bit string we used in @floata. \

#v(2mm)

What value do we get if we interpret this bit string as a float? \
#hint([$21 div 16 = 1.3125$])

#solution([
  This is 21:
  $
    2^(131 - 127) times (1 + (2^(21) + 2^(19)) / (2^(23)))
    = 2^(4) times (1 + 0.25 + 0.0625)
    = 16 times (1.3125)
    = 21
  $
])

#v(1fr)
#pagebreak()

#problem()
Encode $12.5$ as a float. \
#hint([$12.5 div 8 = 1.5625$])

#solution([
  $
    12.5
    = 8 times 1.5625
    = 2^(3) times (1 + (0.5 + 0.0625))
    = 2^(130 - 127) times (1 + (2^(22) + 2^(19)) / (2^(23)))
  $

  which is `0b01000001_01001000_00000000_00000000`.
])
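
The encoding steps (pull out a power of two, then read off $E$ and $F$) can be sketched in Python as follows. The helper names `encode_float` and `reference_bits` are hypothetical, and the sketch assumes a nonzero value that a 32-bit float with $1 <= E <= 254$ can represent exactly; it makes no attempt at rounding.

```python
import math
import struct


def encode_float(value: float) -> int:
    """Find s, E, F for a nonzero value and pack them into 32 bits."""
    s = 1 if value < 0 else 0
    # frexp gives abs(value) = m * 2^e with 0.5 <= m < 1,
    # so abs(value) = (2m) * 2^(e - 1) with 1 <= 2m < 2.
    m, e = math.frexp(abs(value))
    E = (e - 1) + 127
    F = int((2 * m - 1) * 2**23)
    return (s << 31) | (E << 23) | F


def reference_bits(value: float) -> int:
    """Bits Python's struct module produces for the same value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


bits = encode_float(6.5)
print(hex(bits), bits == reference_bits(6.5))  # 0x40d00000 True
```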


#v(1fr)

#definition()
Say we have a bit string $x$. \
We'll let $x_f$ denote the value we get if we interpret $x$ as a float, \
and we'll let $x_i$ denote the value we get if we interpret $x$ as an integer.

#problem()
Let $x = #text[`0b01000001_01001000_00000000_00000000`]$. \
What are $x_f$ and $x_i$? #note([As always, you may leave big numbers as powers of two.])
#solution([
  $x_f = 12.5$

  #v(2mm)

  $x_i = 2^30 + 2^24 + 2^22 + 2^19 = 1,095,237,632$
])
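
Both readings of one bit string can be checked on a computer. The short Python sketch below uses the standard `struct` module to reinterpret four bytes; the helper names `as_float` and `as_bits` are made up for illustration, and the example uses a different bit string than the problem above.

```python
import struct


def as_float(x: int) -> float:
    """Read a 32-bit pattern as a float."""
    return struct.unpack(">f", struct.pack(">I", x))[0]


def as_bits(value: float) -> int:
    """Read a 32-bit float's bytes back as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


x = 0b00111111_11000000_00000000_00000000
x_i = x            # integer interpretation: 2^30 - 2^22
x_f = as_float(x)  # float interpretation
print(x_i, x_f)    # 1069547520 1.5
```

The two values look unrelated, even though they come from exactly the same 32 bits.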

#v(1fr)