Advanced handouts

Add missing file
Co-authored-by: Mark <mark@betalupi.com>
Co-committed-by: Mark <mark@betalupi.com>
2025-01-22 12:28:44 -08:00
parent 13b65a6c64
commit dd4abdbab0
177 changed files with 20658 additions and 0 deletions

View File

@ -0,0 +1,31 @@
% use [nosolutions] flag to hide solutions.
% use [solutions] flag to show solutions.
\documentclass[
solutions,
singlenumbering
]{../../../lib/tex/ormc_handout}
\usepackage{../../../lib/tex/macros}
\input{tikzset.tex}
\usepackage{units}
\usepackage{pdfpages}
\uptitlel{Advanced 2}
\uptitler{\smallurl{}}
\title{Compression}
\subtitle{Prepared by Mark on \today{}}
% TODO: add a section on info theory,
% shannon entropy. etc.
\begin{document}
\maketitle
\input{parts/0 intro.tex}
\input{parts/1 runlength.tex}
\input{parts/2 lzss.tex}
\input{parts/3 huffman.tex}
\input{parts/4 bonus.tex}
\end{document}

View File

@ -0,0 +1,6 @@
[metadata]
title = "Compression"
[publish]
handout = true
solutions = true

View File

@ -0,0 +1,38 @@
\section{Introduction}
\definition{}
An \textit{alphabet} is a set of symbols. Two examples are
$\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ and $\{\texttt{0}, \texttt{1}\}$.
\definition{}
A \textit{string} is a sequence of symbols from an alphabet. \par
For example, \texttt{CBCAADDD} is a string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$.
\problem{}
Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary sequence. \par
How many bits will we need? \par
\hint{
Our alphabet has four symbols, so we can encode each symbol using two bits, \par
mapping $\texttt{A} \rightarrow \texttt{00}$,
$\texttt{B} \rightarrow \texttt{01}$,
$\texttt{C} \rightarrow \texttt{10}$, and
$\texttt{D} \rightarrow \texttt{11}$.
}
\begin{solution}
$2n$ bits.
\end{solution}
\vfill
\problem{}<naivelen>
Similarly, we can encode an $n$-symbol string over an alphabet of size $k$ \par
using $n \times \lceil \log_2k \rceil$ bits. Show that this is true. \par
\note[Note]{We'll call this the \textit{na\"ive coding scheme}.}
\vfill
As you might expect, this isn't ideal: we can do much better than $n \times \lceil \log_2k \rceil$.
We will spend the rest of this handout exploring more efficient ways of encoding such sequences of symbols.
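If you'd like to experiment on a computer, the na\"ive scheme is easy to implement. The Python sketch below is purely illustrative (the function name and the decision to sort the alphabet are ours); it assigns every symbol a fixed-width code of $\lceil \log_2 k \rceil$ bits.

\begin{verbatim}
# A minimal sketch of the naive coding scheme (illustrative only).
from math import ceil, log2

def naive_encode(string, alphabet):
    width = ceil(log2(len(alphabet)))      # bits per symbol
    code = {sym: format(i, f"0{width}b")
            for i, sym in enumerate(sorted(alphabet))}
    return "".join(code[s] for s in string)

# naive_encode("CBCAADDD", "ABCD") uses 2 bits per symbol: 16 bits total.
\end{verbatim}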
\pagebreak

View File

@ -0,0 +1,190 @@
% TODO:
% Basic run-length
% LZ77
\section{Run-length Coding}
%\definition{}
%\textit{Entropy} is a measure of information in a certain sequence. \par
%A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
%For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
%\begin{itemize}
% \item \texttt{AAAAAAAAAA}
% \item \texttt{pDa3:7?j;F}
%\end{itemize}
%The first string clearly contains less information than the second.
%It's much harder to describe \texttt{pDa3:7?j;F} than it is \texttt{AAAAAAAAAA}.
%Thus, we say that the first has low entropy, and the second has fairly high entropy.
%
%\vspace{2mm}
%
%The definition above is intentionally hand-wavy. \par
%Formal definitions of entropy exist, but we won't need them today---we just need
%an intuitive understanding of the \say{density} of information in a given string.
%
%\footnotetext{
% American Standard Code for Information Interchange, an early character encoding for computers. \par
% It contains 128 symbols, including numbers, letters, and
% \texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
%}
%\vspace{5mm}
\problem{}<runlenone>
Using the na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} in binary. \par
\note[Note]{
We're still using the four-symbol alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. \par
Dots ($\cdot$) in the string are drawn for readability. Ignore them.
}
\begin{solution}
There are eight \texttt{A}s on each end of that string. Mapping symbols as before, \par
we get \texttt{[00 00 00 00 00 00 00 00 01 10 11 00 00 00 00 00 00 00 00]}
\begin{instructornote}
In this handout, all encoded binary is written in square brackets. \par
Spaces, dashes, dots, and the like are added for readability and should be ignored.
\end{instructornote}
\end{solution}
\vfill
In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low \textit{entropy}. \par
That is, they have predictable patterns, sequences of symbols that don't contain a lot of information. \par
\note{
For example, consider the text in this document. \par
The symbols \texttt{e}, \texttt{t}, and \texttt{<space>} are much more common than any others. \par
Also, certain subsequences are repeated: \texttt{th}, \texttt{and}, \texttt{encode}, and so on.
}
We can exploit this fact to develop encoding schemes that need relatively few bits per letter.
\example{}
A simple example of such a coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
in their binary form, we'll add a \textit{count} to each letter, shortening repeated instances of the same symbol.
\vspace{2mm}
We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
\begin{center}
\begin{tikzpicture}
\node[anchor=west,color=gray] at (-2.3, 0) {Bits};
\node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
\draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
\draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
\node at (0, 0) {\texttt{0}};
\node at (1, 0) {\texttt{0}};
\node at (2, 0) {\texttt{1}};
\node at (3, 0) {\texttt{1}};
\node at (4, 0) {\texttt{0}};
\node at (5, 0) {\texttt{1}};
\draw (-0.5, 0.25) -- (5.5, 0.25);
\draw (-0.5, -0.25) -- (5.5, -0.25);
\draw (-0.5, -0.75) -- (5.5, -0.75);
\draw (-0.5, 0.25) -- (-0.5, -0.75);
\draw (3.5, 0.25) -- (3.5, -0.75);
\draw (5.5, 0.25) -- (5.5, -0.75);
\node at (1.5, -0.5) {number of copies};
\node at (4.5, -0.5) {symbol};
\end{tikzpicture}
\end{center}
So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
\note[Notation]{
Just like dots, dashes and spaces are added for readability. Pretend they don't exist. \par
Encoded binary sequences will always be written in square brackets. \texttt{[]}.
}
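If you want to check your work by computer, here is a rough Python sketch of this block format. The helper names are ours, and we cap each run at 15, the largest four-bit count; treat it as an illustration rather than a reference implementation.

\begin{verbatim}
# Rough sketch of the 6-bit run-length blocks described above.
SYMS = "ABCD"

def runlength_encode(string):
    blocks, i = [], 0
    while i < len(string):
        run = 1
        while (i + run < len(string)
               and string[i + run] == string[i]
               and run < 15):                  # 4-bit count: at most 15
            run += 1
        blocks.append(format(run, "04b") + format(SYMS.index(string[i]), "02b"))
        i += run
    return "".join(blocks)

def runlength_decode(bits):
    out = ""
    for i in range(0, len(bits), 6):
        count, sym = int(bits[i:i+4], 2), int(bits[i+4:i+6], 2)
        out += SYMS[sym] * count
    return out

# runlength_encode("BBB") == "001101"   (the example above)
\end{verbatim}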
\problem{}
Decode \texttt{[010000001111]} using this scheme.
\begin{solution}
\texttt{AAAADDD}
\end{solution}
\vfill
\problem{}
Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
Is this more or less efficient than \ref{runlenone}?
\begin{solution}
\texttt{[1000-00 0001-01 0001-10 0001-11 1000-00]} \par
This requires 30 bits, as compared to 38 in \ref{runlenone}.
\end{solution}
\vfill
\pagebreak
\problem{}
Give an example of a message on $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$
that uses $n$ bits when encoded with a na\"ive scheme, and \textit{fewer} than $\nicefrac{n}{2}$ bits
when encoded using the scheme described on the previous page.
\vfill
\problem{}
Give an example of a message on $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$
that uses $n$ bits when encoded with a na\"ive scheme, and \textit{more} than $2n$ bits
when encoded using the scheme described on the previous page.
\vfill
\problem{}
Is run-length coding always more efficient than na\"ive coding? \par
When does it work well, and when does it fail?
\vfill
\problem{}
Our coding scheme wastes a lot of space when our string has few runs of the same symbol. \par
Fix this problem: modify the scheme so that single occurrences of symbols do not waste space. \par
\hint{We don't need a run length for every symbol. We only need one for \textit{repeated} symbols.}
\begin{solution}
One idea is as follows: \par
\begin{itemize}
\item Encode single symbols na\"ively: \texttt{ABCD} becomes \texttt{[00 01 10 11]}
\item Signal runs using two copies of the same symbol: \texttt{AAAAAA} becomes \texttt{[00 00 0110]}. \par
When our decoder sees two copies of the same symbol, it will interpret the next four bits as
a run length.
\end{itemize}
\texttt{BDC$\cdot$DDDDD$\cdot$AADBDC} will be encoded as \texttt{[01 11 10 11-11-0101 00-00-0010 11 01 11 10]}.
\end{solution}
\vfill
\problem{}<firstlz>
Consider the following string: \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD}. \par
\begin{itemize}
\item How many bits do we need to encode this na\"ively? \par
\item How about with the (unmodified) run-length scheme described on the previous page?
\end{itemize}
\hint{You don't need to encode this string---just find the length of its encoded form.}
\begin{solution}
Na\"ively: \tab 22 bits \par
Run-length: \tab $6 \times 21 = 126$ bits. Watch out for the two repeated \texttt{A}s!
\end{solution}
\vfill
Neither solution to \ref{firstlz} is ideal. Run-length is very wasteful due to the lack of runs, and na\"ive coding
does not take advantage of repetition in the string. We'll need a better coding scheme.
\pagebreak

View File

@ -0,0 +1,174 @@
\section{LZ Codes}
The LZ family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) takes advantage of repeated substrings
in a string. These codes are the basis of most modern compression algorithms, including DEFLATE, which is used in the ZIP, PNG,
and GZIP formats.
\footnotetext{
Named after Abraham Lempel and Jacob Ziv, the original inventors. \par
LZ77 is the algorithm described in their first paper on the topic, which was published in 1977. \par
LZ78, LZSS, and LZMA are minor variations on the same general idea.
}
\vspace{2mm}
The idea behind LZ is to represent repeated substrings as \textit{pointers} to previous parts of the string. \par
Pointers take the form \texttt{<pos, len>}, where \texttt{pos} is the position of the string to repeat and
\texttt{len} is the number of symbols to copy.
\vspace{2mm}
For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par
The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}), and copy the next 4 symbols. \par
Note that pointers refer to the partially decoded output---\textit{not} to the encoded string. \par
This allows pointers to reference other pointers, and ensures that codes like \texttt{A<1,9>} are valid. \par
\note{For example, \texttt{[B<1,2>]} decodes to \texttt{BBB}.}
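For those who like to tinker, the decoding rule can be written in a few lines of Python. The sketch below is illustrative only: it models a pointer as a \texttt{(pos, len)} tuple and a plain symbol as a one-character string.

\begin{verbatim}
# Sketch of LZ decoding: pointers copy from the partially decoded OUTPUT.
def lz_decode(tokens):
    out = []
    for t in tokens:
        if isinstance(t, tuple):           # a pointer <pos, len>
            pos, length = t
            for _ in range(length):        # copy one symbol at a time, so a
                out.append(out[-pos])      # pointer may run past its own start
        else:                              # a plain symbol
            out.append(t)
    return "".join(out)

# lz_decode(list("ABRACAD") + [(7, 4)]) == "ABRACADABRA"
# lz_decode(["B", (1, 2)])              == "BBB"
\end{verbatim}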
\problem{}
Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using this scheme. \par
Then, decode the following:
\begin{itemize}
\item \texttt{[ABCD<4,4>]}
\item \texttt{[A<1,9>]}
\item \texttt{[DAC<3,5>]}
\end{itemize}
\begin{solution}
% spell:off
\texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} becomes \texttt{[ABCD<4, 4> BA<2,4> ABCD<4,4>]}.
% spell:on
\linehack{}
In parts two and three, remember that we're reading the \textit{output string.} \par
The ten \texttt{A}s in part two are produced one by one, \par
with the decoder's \say{read head} following its \say{write head.}
\begin{itemize}
\item \texttt{ABCD$\cdot$ABCD}
\item \texttt{AAAAA$\cdot$AAAAA}
\item \texttt{DACDACDA}
\end{itemize}
\end{solution}
\vfill
\problem{}
Convince yourself that LZ is a generalization of the run-length code we discussed in the previous section.
\hint{\texttt{[A<1,9>]} and \texttt{[1010-00]} are the same thing: both decode to ten \texttt{A}s!}
\remark{}
Note that we left a few things out of this section: we didn't discuss the algorithm that converts a string to an LZ-encoded blob,
nor did we discuss how we should represent strings encoded with LZ in binary. We skipped these details because they are
problems of implementation---they're the engineer's headache, not the mathematician's. \par
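For the curious engineer, though, here is one possible greedy encoder in Python. At each position it emits a pointer to the longest match found in the text seen so far, and a plain symbol otherwise. Real implementations limit how far back they search and use much cleverer bookkeeping, so treat this only as an illustrative sketch.

\begin{verbatim}
# One possible greedy LZ encoder (illustrative only).
def lz_encode(string, min_len=3):
    tokens, i = [], 0
    while i < len(string):
        best_pos, best_len = 0, 0
        for pos in range(1, i + 1):            # how far back the pointer looks
            length = 0
            while (i + length < len(string)
                   and string[i + length] == string[i - pos + length]):
                length += 1                    # overlapping matches are fine
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= min_len:                # short matches aren't worth a pointer
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(string[i])
            i += 1
    return tokens

# lz_encode("ABRACADABRA") == ['A', 'B', 'R', 'A', 'C', 'A', 'D', (7, 4)]
\end{verbatim}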
\pagebreak
%\begin{instructornote}
% A simple LZ-scheme can work as follows. We encode our string into a sequence of
% nine-bit blocks, drawn below. The first bit of each block tells us whether or not
% this block is a pointer, and the next eight bits contain either a \texttt{pos, len} pair
% (using, say, four bits for each number) or a plain eight-bit symbol code.
% \begin{center}
% \begin{tikzpicture}
% \node[anchor=west,color=gray] at (-2.3, 0) {Bits};
% \node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
% \draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
% \draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
%
% \node at (0, 0) {\texttt{0}};
% \node at (1, 0) {\texttt{0}};
% \node at (2, 0) {\texttt{1}};
% \node at (3, 0) {\texttt{0}};
% \node at (4, 0) {\texttt{1}};
% \node at (5, 0) {\texttt{1}};
% \node at (6, 0) {\texttt{0}};
% \node at (7, 0) {\texttt{0}};
% \node at (8, 0) {\texttt{1}};
%
% \draw (-0.5, 0.25) -- (8.5, 0.25);
% \draw (-0.5, -0.25) -- (8.5, -0.25);
% \draw (-0.5, -0.75) -- (8.5, -0.75);
%
% \draw (-0.5, 0.25) -- (-0.5, -0.75);
% \draw (0.5, 0.25) -- (0.5, -0.75);
% \draw (8.5, 0.25) -- (8.5, -0.75);
%
% \node at (0, -0.5) {flag};
% \node at (4.5, -0.5) {if flag \texttt{<pos, len>}, else eight-bit symbol};
% \end{tikzpicture}
% \end{center}
%
% To encode a string, we read it using a \say{window}, shown below. This window consists of
% a search buffer and a lookahead buffer, both of which have a fixed (but configurable) size.
% This window passes over the string one character at a time, inserting a pointer if it finds
% the lookahead buffer inside its search buffer, and a plain character otherwise.
%
%
% \begin{center}
% \begin{tikzpicture}
% % Text tape
% \node[color=gray] at (-0.75, 0) {\texttt{...}};
% \node[color=gray] at (0.0, 0) {\texttt{D}};
% \node at (0.5, 0) {\texttt{A}};
% \node at (1.0, 0) {\texttt{B}};
% \node at (1.5, 0) {\texttt{C}};
% \node at (2.0, 0) {\texttt{D}};
% \node at (2.5, 0) {\texttt{A}};
% \node at (3.0, 0) {\texttt{B}};
% \node at (3.5, 0) {\texttt{C}};
% \node at (4.0, 0) {\texttt{D}};
% \node[color=gray] at (4.5, 0) {\texttt{B}};
% \node[color=gray] at (5.0, 0) {\texttt{D}};
% \node[color=gray] at (5.5, 0) {\texttt{A}};
% \node[color=gray] at (6.0, 0) {\texttt{C}};
% \node[color=gray] at (6.75, 0) {\texttt{...}};
%
% \draw (-1.75, 0.25) -- (7.25, 0.25);
% \draw (-1.75, -0.25) -- (7.25, -0.25);
%
%
% \draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
% \draw[line width = 0.7mm, color=oblue]
% (-1.25, 0.5)
% -- (4.25, 0.5)
% -- (4.25, -0.5)
% -- (-1.25, -0.5)
% -- cycle
% ;
%
% \draw
% (4.2, -0.625)
% -- (4.2, -0.75)
% to node[anchor=north, midway] {lookahead} (2.3, -0.75)
% -- (2.3, -0.625)
% ;
%
% \draw
% (2.2, -0.625)
% -- (2.2, -0.75)
% to node[anchor=north, midway] {search buffer} (-1.1, -0.75)
% -- (-1.1, -0.625)
% ;
%
% \draw[color=gray]
% (2.2, 0.625)
% -- (2.2, 0.75)
% to node[anchor=south, midway] {match!} (0.3, 0.75)
% -- (0.3, 0.625)
% ;
%
% %\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
% \node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
% \end{tikzpicture}
% \end{center}
%
% This is not the exact process used in practice---but it's close enough. \par
% This process may be tweaked in any number of ways.
%\end{instructornote}
%
%\makeatletter\if@solutions
% \vfill
% \pagebreak
%\fi\makeatother

View File

@ -0,0 +1,424 @@
\section{Huffman Codes}
\example{}
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
With the na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
\begin{itemize}
\item $\texttt{A}$ to $\texttt{000}$
\item $\texttt{B}$ to $\texttt{001}$
\item $\texttt{C}$ to $\texttt{010}$
\item $\texttt{D}$ to $\texttt{011}$
\item $\texttt{E}$ to $\texttt{100}$
\end{itemize}
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
It is easy to see that this scheme uses an average of three bits per symbol.
\vspace{2mm}
However, one could argue that this coding scheme is wasteful: \par
we're not using three of the eight possible three-bit sequences!
\example{}
There is, of course, a better way. \par
Consider the following mapping:
\begin{itemize}
\item $\texttt{A}$ to $\texttt{00}$
\item $\texttt{B}$ to $\texttt{01}$
\item $\texttt{C}$ to $\texttt{10}$
\item $\texttt{D}$ to $\texttt{110}$
\item $\texttt{E}$ to $\texttt{111}$
\end{itemize}
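A mapping like this is easy to apply by machine. The short Python sketch below (illustrative, with the table stored as a dictionary) simply concatenates the code words; decoding is more interesting, and we'll return to it shortly.

\begin{verbatim}
# Encoding with the variable-length table above (illustrative).
CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}

def encode(string):
    return "".join(CODE[s] for s in string)

# encode("ABC") == "000110"   (6 bits, versus 9 with the three-bit scheme)
\end{verbatim}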
\problem{}
\begin{itemize}
\item Using the above code, encode \texttt{ADEBCE}.
\item Then, decode \texttt{[110011001111]}.
\end{itemize}
\begin{solution}
\texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par
and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}.
\end{solution}
\vfill
\problem{}
How many bits does this code need per symbol, on average?
\begin{solution}
\begin{equation*}
\frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4
\end{equation*}
\end{solution}
\vfill
\problem{}
Consider the code below. How is it different from the one on the previous page? \par
Is this a good way to encode five-letter strings?
\begin{itemize}
\item $\texttt{A}$ to $\texttt{00}$
\item $\texttt{B}$ to $\texttt{01}$
\item $\texttt{C}$ to $\texttt{10}$
\item $\texttt{D}$ to $\texttt{110}$
\item $\texttt{E}$ to $\texttt{11}$
\end{itemize}
\begin{solution}
No. The code for \texttt{E} occurs inside the code for \texttt{D},
and we thus can't decode sequences uniquely. For example, we could
decode the fragment \texttt{[11001$\cdot\cdot\cdot$]} as \texttt{EA}
or as \texttt{DB}.
\end{solution}
\vfill
\pagebreak
\remark{}
The code from the previous page can be visualized as a full binary tree: \par
\note{Every node in a \textit{full binary tree} has either zero or two children.}
\vspace{-5mm}
\null\hfill
\begin{minipage}[t]{0.48\textwidth}
\vspace{0pt}
\begin{itemize}
\item $\texttt{A}$ encodes as $\texttt{00}$
\item $\texttt{B}$ encodes as $\texttt{01}$
\item $\texttt{C}$ encodes as $\texttt{10}$
\item $\texttt{D}$ encodes as $\texttt{110}$
\item $\texttt{E}$ encodes as $\texttt{111}$
\end{itemize}
\end{minipage}
\hfill
\begin{minipage}[t]{0.48\textwidth}
\vspace{0pt}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{A}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{D}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill\null
You can think of each symbol's code as its \say{address} in this tree.
When decoding a string, we start at the topmost node. Reading the binary sequence
bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
and a right edge if we see a \texttt{1}.
Once we reach a letter, we return to the top node and repeat the process.
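Here is that walk written as a short Python sketch, with the tree stored as nested \texttt{(left, right)} pairs and letters at the leaves (a representation chosen purely for illustration).

\begin{verbatim}
# Decoding with a code tree: walk down bit by bit, emit a leaf, restart.
# Internal nodes are (left, right) pairs; leaves are single characters.
TREE = (("A", "B"), ("C", ("D", "E")))    # A=00, B=01, C=10, D=110, E=111

def tree_decode(bits, tree=TREE):
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]   # 0 = left edge, 1 = right edge
        if isinstance(node, str):                 # reached a letter
            out.append(node)
            node = tree                           # return to the top node
    return "".join(out)

# tree_decode("0111110") == "BEC"
\end{verbatim}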
\definition{}
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
\problem{}
Convince yourself that trees like the one above always produce a prefix-free code.
\problem{}<treedecode>
Decode \texttt{[110111001001110110]} using the tree above.
\begin{solution}
This is \texttt{[110$\cdot$111$\cdot$00$\cdot$10$\cdot$01$\cdot$110$\cdot$110]}, which is \texttt{DEACBDD}
\end{solution}
\vfill
\problem{}
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?
\begin{solution}
This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
\end{solution}
\vfill
\pagebreak
\problem{}
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
\vspace{2mm}
Draw a tree that encodes this string more efficiently. \par
\begin{solution}
Two possible solutions are below. \par
\begin{itemize}
\item The left tree encodes \texttt{DEACBDD} as \texttt{[00$\cdot$111$\cdot$110$\cdot$10$\cdot$01$\cdot$00$\cdot$00]}, using 16 bits.
\item The right tree encodes \texttt{DEACBDD} as \texttt{[0$\cdot$111$\cdot$101$\cdot$110$\cdot$100$\cdot$0$\cdot$0]}, using 15 bits.
\end{itemize}
\null\hfill
\begin{minipage}{0.48\textwidth}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{D}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{A}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill
\begin{minipage}{0.48\textwidth}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {\texttt{D}};
\node[int] (1) at (0.75, -1) {};
\node[end] (10) at (0.25, -2) {};
\node[int] (11) at (1.25, -2) {};
\node[end] (100) at (-0.15, -3) {\texttt{A}};
\node[end] (101) at (0.6, -3) {\texttt{B}};
\node[end] (110) at (0.9, -3) {\texttt{C}};
\node[end] (111) at (1.6, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(10) to node[edg] {\texttt{0}} (101)
(10) to node[edg] {\texttt{1}} (100)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill\null
\end{solution}
\vfill
\problem{}
Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} efficiently than before.
\begin{solution}
Bury \texttt{D} as deep as possible in the tree, so that we need four bits to encode it.
\end{solution}
\vfill
\remark{}
As we just saw, constructing a prefix-free code is fairly easy. \par
Constructing the \textit{most efficient} prefix-free code for a
given message is a bit more difficult. \par
\pagebreak
\remark{}
Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
\begin{equation*}
\mathcal{B}_f(T) = \sum_{a \in A} f(a) \times d_T(a)
\end{equation*}
Where...
\begin{itemize}[itemsep=1mm]
\item $a$ is a symbol in $A$
\item $d_T(a)$ is the \say{depth} of $a$ in our tree. \par
\note{In other words, $d_T(a)$ is the number of bits we need to encode $a$}
\item $f(a)$ is a frequency function that maps each symbol in $A$ to a value in $[0, 1]$. \par
You can think of this as the distribution of symbols in messages we expect to encode. \par
For example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}\}$:
\begin{itemize}
\item In $\texttt{AAA}$, $f(\texttt{A}) = 1$ and $f(\texttt{B}) = f(\texttt{C}) = 0$.
\item In $\texttt{ABC}$, $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{3}$.
\end{itemize}
\note{Note that $f(a) \geq 0$ and $\sum f(a) = 1$.}
\end{itemize}
\vspace{2mm}
Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
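In code, $\mathcal{B}_f(T)$ is just a weighted sum. The sketch below (function name ours) computes it from a table of frequencies and a table of depths.

\begin{verbatim}
# B_f(T): the expected code length, i.e. average bits per symbol.
def average_bits(freq, depth):
    # freq[a]  = f(a), the relative frequency of symbol a (sums to 1)
    # depth[a] = d_T(a), the depth of a in the tree T
    return sum(freq[a] * depth[a] for a in freq)

# The five-letter code from earlier, with all symbols equally likely:
# average_bits({s: 0.2 for s in "ABCDE"},
#              {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3})  ->  2.4 bits per symbol
\end{verbatim}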
\problem{}<hufptone>
Let $f$ be fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\begin{equation*}
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{equation*}
\begin{solution}
$\mathcal{B}_f(T)$ and $\mathcal{B}_f(T')$ are nearly identical: they differ only in the terms for $a$ and $b$.
So, we get...
\begin{align*}
\mathcal{B}_f(T) - \mathcal{B}_f(T')
&= f(a)d_T(a) + f(b)d_T(b) - f(a)d_T(b) - f(b)d_T(a) \\
&= f(a)\bigl(d_T(a) - d_T(b)\bigr) + f(b)\bigl(d_T(b) - d_T(a)\bigr) \\
&= \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{align*}
\end{solution}
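The identity is also easy to sanity-check numerically. The Python snippet below uses an arbitrary (illustrative) frequency function and a tree with leaves at depths 1, 2, 3, 3; the dyadic frequencies are chosen only so the arithmetic is exact.

\begin{verbatim}
# Numerical check of the swap identity on an arbitrary example.
freq  = {"a": 0.125, "b": 0.5, "c": 0.25, "d": 0.125}  # f (sums to 1)
depth = {"a": 3, "b": 1, "c": 2, "d": 3}               # d_T for some tree T
swapped = dict(depth, a=depth["b"], b=depth["a"])      # T': a and b exchanged

B_T  = sum(freq[s] * depth[s]   for s in freq)         # B_f(T)  = 1.75
B_Tp = sum(freq[s] * swapped[s] for s in freq)         # B_f(T') = 2.5

assert B_T - B_Tp == (freq["a"] - freq["b"]) * (depth["a"] - depth["b"])  # -0.75
\end{verbatim}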
\vfill
\pagebreak
\problem{}<hufpttwo>
Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
\hint{You may assume that an optimal tree exists. There are a few cases.}
\begin{solution}
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
If there is a tie among three or more symbols, pick $a, b$ to be those with the greatest depth. \par
Label $a$ and $b$ so that $d_T(a) \geq d_T(b)$.
\vspace{1mm}
If $a$ and $b$ share a parent, we're done.
If $a$ and $b$ do not share a parent, we have three cases:
\begin{itemize}[itemsep=1mm]
\item There is a node $x$ with $d_T(x) > d_T(a)$. \par
Create $T'$ by swapping $a$ and $x$. Since $a$ has minimal frequency and we broke frequency ties by depth, $f(a) < f(x)$, and thus
by \ref{hufptone} $\mathcal{B}_f(T) > \mathcal{B}_f(T')$. This is a contradiction,
since we chose $T$ as an optimal tree---so this case is impossible.
\item $a$ is an only child. Create $T'$ by removing $a$'s parent and replacing it with $a$. \par
Then $\mathcal{B}_f(T) > \mathcal{B}_f(T')$, same contradiction as above. \par
\note{If we assume $T$ is a full binary tree, this case doesn't exist.}
\item $a$ has a sibling $x$, and $x$ isn't $b$. \par
Let $T'$ be the tree created by swapping $x$ and $b$ (thus making $a$ and $b$ siblings). \par
By \ref{hufptone}, $\mathcal{B}_f(T) \geq \mathcal{B}_f(T')$. $T$ is optimal, so there cannot
be a tree with a better average length---thus $\mathcal{B}_f(T) = \mathcal{B}_f(T')$ and $T'$
is also optimal.
\end{itemize}
\end{solution}
\vfill
\pagebreak
\problem{}
Devise an algorithm that builds an optimal tree given an alphabet $A$ and a frequency function $f$. \par
Then, use the previous two problems to show that your algorithm indeed produces an ideal tree. \par
\hint{
First, make an algorithm that makes sense intuitively. \par
Once you have something that looks good, start your proof.
} \par
\hint{Build from the bottom.}
\begin{solution}
\textbf{The Algorithm:} \par
Given an alphabet $A$ and a frequency function $f$...
\begin{itemize}
\item If $|A| = 1$, return a single node.
\item Let $a, b$ be two symbols with the smallest frequency.
\item Let $A' = A - \{a, b\} + \{x\}$ \tab \note{(Where $x$ is a new \say{placeholder} symbol)}
\item Let $f'(x) = f(a) + f(b)$, and $f'(s) = f(s)$ for all other symbols $s$.
\item Compute $T'$ by repeating this algorithm on $A'$ and $f'$
\item Create $T$ from $T'$ by adding $a$ and $b$ as children of $x$.
\end{itemize}
\vspace{2mm}
In plain English: pick the two symbols with the smallest frequencies, combine them into a single
\say{compound symbol}, and repeat until only one symbol is left.
\linehack{}
\textbf{The Proof:} \par
We'll proceed by induction on $|A|$. \par
Let $f$ be an arbitrary frequency function.
\vspace{4mm}
\textbf{Base case:} $|A| = 1$. We only have one vertex, and we thus only have one tree. \par
The algorithm above produces this tree. Done.
\vspace{4mm}
\textbf{Induction:} Assume that for all $A$ with $|A| = n - 1$, the algorithm above produces an ideal tree.
First, we'll show that $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$, \par
where $x$ is the placeholder symbol from the algorithm, so that $d_T(a) = d_T(b) = d_{T'}(x) + 1$:
\begin{align*}
\mathcal{B}_f(T)
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f(a)d_T(a) + f(b)d_T(b) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + \Bigl(f(a)+f(b)\Bigr)\Bigl(d_{T'}(x) + 1\Bigr) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f'(x)d_{T'}(x) + f(a) + f(b) \\
&= \sum_{s \in A'} \Bigl(f'(s)d_{T'}(s)\Bigr) + f(a) + f(b) \\
&= \mathcal{B}_{f'}(T') + f(a) + f(b)
\end{align*}
Now, assume that $T$ is not optimal. There then exists an optimal tree $U$ with $a$ and $b$ as siblings (by \ref{hufpttwo}).
Let $U'$ be the tree created by removing $a$ and $b$ from $U$ and treating their parent as the placeholder $x$. \par
Then $U'$ is a tree for $A'$ and $f'$, so we can repeat the calculation
above to find that $\mathcal{B}_f(U) = \mathcal{B}_{f'}(U') + f(a) + f(b)$.
\vspace{2mm}
So, $
\mathcal{B}_{f'}(T')
~=~ \mathcal{B}_f(T) - f(a) - f(b)
~>~ \mathcal{B}_f(U) - f(a) - f(b)
~=~ \mathcal{B}_{f'}(U')
$. \par
Since $T'$ is optimal for $A'$ and $f'$, this is a contradiction. $T$ must therefore be optimal.
\end{solution}
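For completeness, here is a rough Python sketch of this greedy procedure (this construction is the \textit{Huffman code} of the section title). It uses a heap to repeatedly pull out the two smallest frequencies and builds the tree as nested \texttt{(left, right)} pairs, mirroring the diagrams earlier; the details of the representation are ours.

\begin{verbatim}
# Sketch of the greedy (Huffman) construction described above.
import heapq
from itertools import count

def build_tree(freq):
    # freq maps symbols to frequencies. Returns a tree of nested
    # (left, right) pairs with symbols at the leaves.
    tiebreak = count()                       # avoids comparing trees directly
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)       # the two smallest frequencies...
        fb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (fa + fb, next(tiebreak), (a, b)))  # ...merged
    return heap[0][2]

# build_tree({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
#     -> ('A', ('B', ('C', 'D')))   i.e. A=0, B=10, C=110, D=111
\end{verbatim}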
\vfill
\pagebreak

View File

@ -0,0 +1,40 @@
\section{Bonus problems}
\problem{}
Make sense of the document on the next page. \par
What does it describe, and how does it work?
\problem{}
Given a table with a marked point, $O$, and with $2013$ properly working watches put down on the table, prove that there exists a moment in time when the sum of the distances from $O$ to the watches' centers is less than the sum of the distances from $O$ to the tips of the watches' minute hands.
\vfill
\problem{A Minor Inconvenience}
A group of eight friends goes out to dinner. Each drives his own car, checking it in with the valet upon arrival.
Unfortunately, the valet attendant forgot to tag the friends' keys. Thus, when the group leaves the restaurant,
each friend is handed a random key.
\begin{itemize}
\item What is the probability that everyone gets the correct set of keys?
\item What is the probability that each friend gets the wrong set?
\end{itemize}
\vfill
\problem{Bimmer Parking}
A parking lot has a row of 16 spaces, of which a random 12 are taken. \par
Ivan drives a BMW, and thus needs two adjacent spaces to park. \par
What is the probability he'll find a spot?
\vfill
\pagebreak
\includepdf[
pages=1,
fitpaper=true
]{parts/qoi-specification.pdf}
\pagebreak

Binary file not shown.

View File

@ -0,0 +1,68 @@
\usetikzlibrary{arrows.meta}
\usetikzlibrary{shapes.geometric}
\usetikzlibrary{patterns}
% We put nodes in a separate layer, so we can
% slightly overlap with paths for a perfect fit
\pgfdeclarelayer{nodes}
\pgfdeclarelayer{path}
\pgfsetlayers{main,nodes}
% Layer settings
\tikzset{
% Layer hack, lets us write
% layer = * in scopes.
layer/.style = {
execute at begin scope={\pgfonlayer{#1}},
execute at end scope={\endpgfonlayer}
},
%
% Arrowhead tweak
>={Latex[ width=2mm, length=2mm ]},
%
% Labels inside edges
label/.style = {
rectangle,
% For automatic red background in solutions
fill = \ORMCbgcolor,
draw = none,
rounded corners = 0mm
},
%
% Nodes
edg/.style = {
midway,
fill = \ORMCbgcolor,
text = gray
},
int/.style = {},
end/.style = {
anchor=north
},
%
% Loop tweaks
loop above/.style = {
min distance = 2mm,
looseness = 8,
out = 45,
in = 135
},
loop below/.style = {
min distance = 5mm,
looseness = 10,
out = 315,
in = 225
},
loop right/.style = {
min distance = 5mm,
looseness = 10,
out = 45,
in = 315
},
loop left/.style = {
min distance = 5mm,
looseness = 10,
out = 135,
in = 215
}
}