
\section{Huffman Codes}


\example{}
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
With the na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
\begin{itemize}
	\item $\texttt{A}$ to $\texttt{000}$
	\item $\texttt{B}$ to $\texttt{001}$
	\item $\texttt{C}$ to $\texttt{010}$
	\item $\texttt{D}$ to $\texttt{011}$
	\item $\texttt{E}$ to $\texttt{100}$
\end{itemize}
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
It is easy to see that this scheme uses an average of three bits per symbol.

\vspace{2mm}

However, one could argue that this coding scheme is wasteful: \par
we're not using three of the eight possible three-bit sequences!

\example{}
There is, of course, a better way. \par
Consider the following mapping:

\begin{itemize}
	\item $\texttt{A}$ to $\texttt{00}$
	\item $\texttt{B}$ to $\texttt{01}$
	\item $\texttt{C}$ to $\texttt{10}$
	\item $\texttt{D}$ to $\texttt{110}$
	\item $\texttt{E}$ to $\texttt{111}$
\end{itemize}

\problem{}
\begin{itemize}
	\item Using the above code, encode \texttt{ADEBCE}.
	\item Then, decode \texttt{[110011001111]}.
\end{itemize}

\begin{solution}
	\texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par
	and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}.
\end{solution}

\vfill

\problem{}
How many bits does this code need per symbol, on average?

\begin{solution}
	\begin{equation*}
		\frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4
	\end{equation*}
\end{solution}

\vfill

\problem{}
Consider the code below. How is it different from the one on the previous page? \par
Is this a good way to encode strings over our five-letter alphabet?
\begin{itemize}
	\item $\texttt{A}$ to $\texttt{00}$
	\item $\texttt{B}$ to $\texttt{01}$
	\item $\texttt{C}$ to $\texttt{10}$
	\item $\texttt{D}$ to $\texttt{110}$
	\item $\texttt{E}$ to $\texttt{11}$
\end{itemize}

\begin{solution}
	No. The code for \texttt{E} occurs at the start of the code for \texttt{D},
	and we thus can't decode sequences uniquely. For example, we could
	decode the fragment \texttt{[11001$\cdot\cdot\cdot$]} as \texttt{EA}
	or as \texttt{DB}.
\end{solution}

\vfill
\pagebreak


\remark{}
The code from the previous page can be visualized as a full binary tree: \par
\note{Every node in a \textit{full binary tree} has either zero or two children.}

\vspace{-5mm}
\null\hfill
\begin{minipage}[t]{0.48\textwidth}
	\vspace{0pt}

	\begin{itemize}
		\item $\texttt{A}$ encodes as $\texttt{00}$
		\item $\texttt{B}$ encodes as $\texttt{01}$
		\item $\texttt{C}$ encodes as $\texttt{10}$
		\item $\texttt{D}$ encodes as $\texttt{110}$
		\item $\texttt{E}$ encodes as $\texttt{111}$
	\end{itemize}
\end{minipage}
\hfill
\begin{minipage}[t]{0.48\textwidth}
	\vspace{0pt}

	\begin{center}
		\begin{tikzpicture}[scale=1.0]
			\begin{scope}[layer = nodes]
				\node[int] (x) at (0, 0) {};
				\node[int] (0) at (-0.75, -1) {};
				\node[int] (1) at (0.75, -1) {};
				\node[end] (00) at (-1.25, -2) {\texttt{A}};
				\node[end] (01) at (-0.25, -2) {\texttt{B}};
				\node[end] (10) at (0.25, -2) {\texttt{C}};
				\node[int] (11) at (1.25, -2) {};
				\node[end] (110) at (0.75, -3) {\texttt{D}};
				\node[end] (111) at (1.75, -3) {\texttt{E}};
			\end{scope}

			\draw[-]
				(x) to node[edg] {\texttt{0}} (0)
				(x) to node[edg] {\texttt{1}} (1)
				(0) to node[edg] {\texttt{0}} (00)
				(0) to node[edg] {\texttt{1}} (01)
				(1) to node[edg] {\texttt{0}} (10)
				(1) to node[edg] {\texttt{1}} (11)
				(11) to node[edg] {\texttt{0}} (110)
				(11) to node[edg] {\texttt{1}} (111)
			;
		\end{tikzpicture}
	\end{center}
\end{minipage}
\hfill\null
You can think of each symbol's code as its \say{address} in this tree.
When decoding a string, we start at the topmost node. Reading the binary sequence
bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
and a right edge if we see a \texttt{1}.
Once we reach a letter, we return to the top node and repeat the process.
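
\vspace{2mm}

As a small illustration (the bit string here is chosen only as an example), decoding \texttt{[11001]} with this tree goes as follows:
\begin{itemize}
	\item Start at the top. Read \texttt{1}, \texttt{1}, \texttt{0}: go right, right, left, and arrive at \texttt{D}.
	\item Return to the top. Read \texttt{0}, \texttt{1}: go left, right, and arrive at \texttt{B}.
\end{itemize}
So, \texttt{[11001]} decodes to \texttt{DB}.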


\definition{}
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par

\problem{}
Convince yourself that trees like the one above always produce a prefix-free code.

\problem{}<treedecode>
Decode \texttt{[110111001001110110]} using the tree above.

\begin{solution}
	This is \texttt{[110$\cdot$111$\cdot$00$\cdot$10$\cdot$01$\cdot$110$\cdot$110]}, which is \texttt{DEACBDD}.
\end{solution}

\vfill

\problem{}
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?

\begin{solution}
	This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
\end{solution}


\vfill
\pagebreak

\problem{}
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}

\vspace{2mm}
Draw a tree that encodes this string more efficiently. \par

\begin{solution}
	Two possible solutions are below. \par
	\begin{itemize}
		\item The left tree encodes \texttt{DEACBDD} as \texttt{[00$\cdot$111$\cdot$110$\cdot$10$\cdot$01$\cdot$00$\cdot$00]}, using 16 bits.
		\item The right tree encodes \texttt{DEACBDD} as \texttt{[0$\cdot$111$\cdot$101$\cdot$110$\cdot$100$\cdot$0$\cdot$0]}, using 15 bits.
	\end{itemize}

	\null\hfill
	\begin{minipage}{0.48\textwidth}
		\begin{center}
			\begin{tikzpicture}[scale=1.0]
				\begin{scope}[layer = nodes]
					\node[int] (x) at (0, 0) {};
					\node[int] (0) at (-0.75, -1) {};
					\node[int] (1) at (0.75, -1) {};
					\node[end] (00) at (-1.25, -2) {\texttt{D}};
					\node[end] (01) at (-0.25, -2) {\texttt{B}};
					\node[end] (10) at (0.25, -2) {\texttt{C}};
					\node[int] (11) at (1.25, -2) {};
					\node[end] (110) at (0.75, -3) {\texttt{A}};
					\node[end] (111) at (1.75, -3) {\texttt{E}};
				\end{scope}

				\draw[-]
					(x) to node[edg] {\texttt{0}} (0)
					(x) to node[edg] {\texttt{1}} (1)
					(0) to node[edg] {\texttt{0}} (00)
					(0) to node[edg] {\texttt{1}} (01)
					(1) to node[edg] {\texttt{0}} (10)
					(1) to node[edg] {\texttt{1}} (11)
					(11) to node[edg] {\texttt{0}} (110)
					(11) to node[edg] {\texttt{1}} (111)
				;
			\end{tikzpicture}
		\end{center}
	\end{minipage}
	\hfill
	\begin{minipage}{0.48\textwidth}
		\begin{center}
			\begin{tikzpicture}[scale=1.0]
				\begin{scope}[layer = nodes]
					\node[int] (x) at (0, 0) {};
					\node[end] (0) at (-0.75, -1) {\texttt{D}};
					\node[int] (1) at (0.75, -1) {};
					\node[int] (10) at (0.25, -2) {};
					\node[int] (11) at (1.25, -2) {};
					\node[end] (100) at (-0.15, -3) {\texttt{B}};
					\node[end] (101) at (0.6, -3) {\texttt{A}};
					\node[end] (110) at (0.9, -3) {\texttt{C}};
					\node[end] (111) at (1.6, -3) {\texttt{E}};
				\end{scope}

				\draw[-]
					(x) to node[edg] {\texttt{0}} (0)
					(x) to node[edg] {\texttt{1}} (1)
					(1) to node[edg] {\texttt{0}} (10)
					(1) to node[edg] {\texttt{1}} (11)
					(10) to node[edg] {\texttt{0}} (100)
					(10) to node[edg] {\texttt{1}} (101)
					(11) to node[edg] {\texttt{0}} (110)
					(11) to node[edg] {\texttt{1}} (111)
				;
			\end{tikzpicture}
		\end{center}
	\end{minipage}
	\hfill\null
\end{solution}

\vfill

\problem{}
Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} efficiently than before.

\begin{solution}
	Bury \texttt{D} as deep as possible in the tree, so that we need four bits to encode it.
\end{solution}

\vfill

\remark{}
As we just saw, constructing a prefix-free code is fairly easy. \par
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
\pagebreak


\remark{}
Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes

\begin{equation*}
	\mathcal{B}_f(T) = \sum_{a \in A} f(a) \times d_T(a)
\end{equation*}

Where...
\begin{itemize}[itemsep=1mm]
	\item $a$ is a symbol in $A$

	\item $d_T(a)$ is the \say{depth} of $a$ in our tree. \par
	\note{In other words, $d_T(a)$ is the number of bits we need to encode $a$}

	\item $f$ is a frequency function that maps each symbol in $A$ to a value in $[0, 1]$. \par
	You can think of this as the distribution of symbols in messages we expect to encode. \par
	For example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}\}$:
	\begin{itemize}
		\item In $\texttt{AAA}$, $f(\texttt{A}) = 1$ and $f(\texttt{B}) = f(\texttt{C}) = 0$.
		\item In $\texttt{ABC}$, $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{3}$.
	\end{itemize}
	\note{Note that $f(a) \geq 0$ and $\sum f(a) = 1$.}
\end{itemize}

\vspace{2mm}

Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
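
\vspace{2mm}

As a quick check of this claim, take the string \texttt{DEACBDD} and the tree from \ref{treedecode}, with $f$ assumed to be the symbol frequencies of this one string:
$f(\texttt{D}) = \nicefrac{3}{7}$, and $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = f(\texttt{E}) = \nicefrac{1}{7}$. Then
\begin{equation*}
	\mathcal{B}_f(T)
	= \frac{3}{7} \times 3 + \frac{1}{7} \times (2 + 2 + 2 + 3)
	= \frac{18}{7} \approx 2.57
\end{equation*}
which is exactly $18$ bits spread over $7$ symbols.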


\problem{}<hufptone>
Let $f$ be a fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\begin{equation*}
	\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{equation*}

\begin{solution}
	$\mathcal{B}_f(T)$ and $\mathcal{B}_f(T')$ are nearly identical, and differ only in the terms for $a$ and $b$.
	Since the swap gives $d_{T'}(a) = d_T(b)$ and $d_{T'}(b) = d_T(a)$, we get...

	\begin{align*}
		\mathcal{B}_f(T) - \mathcal{B}_f(T')
		&= f(a)d_T(a) + f(b)d_T(b) - f(a)d_T(b) - f(b)d_T(a) \\
		&= f(a)\bigl(d_T(a) - d_T(b)\bigr) + f(b)\bigl(d_T(b) - d_T(a)\bigr) \\
		&= \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
	\end{align*}
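
	\vspace{2mm}

	As a sanity check (an illustrative example, with $f$ assumed to be the symbol frequencies of \texttt{DEACBDD}),
	swap \texttt{A} and \texttt{D} in the tree from \ref{treedecode}.
	Here $f(\texttt{A}) = \nicefrac{1}{7}$, $f(\texttt{D}) = \nicefrac{3}{7}$, $d_T(\texttt{A}) = 2$, and $d_T(\texttt{D}) = 3$.
	Only the terms for \texttt{A} and \texttt{D} change, so
	\begin{equation*}
		\mathcal{B}_f(T) - \mathcal{B}_f(T')
		= \Bigl(\frac{1}{7} \times 2 + \frac{3}{7} \times 3\Bigr) - \Bigl(\frac{1}{7} \times 3 + \frac{3}{7} \times 2\Bigr)
		= \frac{11}{7} - \frac{9}{7}
		= \frac{2}{7}
	\end{equation*}
	which equals $\bigl(\nicefrac{1}{7} - \nicefrac{3}{7}\bigr) \times (2 - 3)$.
	Moving the more frequent symbol \texttt{D} closer to the root saves $\nicefrac{2}{7}$ bits per symbol, or two bits over the whole seven-symbol string.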
\end{solution}

\vfill
\pagebreak

\problem{}<hufpttwo>
Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
\hint{You may assume that an optimal tree exists. There are a few cases.}

\begin{solution}
	Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
	If there is a tie among three or more symbols, pick $a, b$ to be those with the greatest depth. \par
	Label $a$ and $b$ so that $d_T(a) \geq d_T(b)$.

	\vspace{1mm}

	If $a$ and $b$ share a parent, we're done.
	If $a$ and $b$ do not share a parent, we have three cases:
	\begin{itemize}[itemsep=1mm]
		\item There is a symbol $x$ with $d_T(x) > d_T(a)$. \par
		Create $T'$ by swapping $a$ and $x$. By our choice of $a$, $f(a) < f(x)$, and thus
		by \ref{hufptone}, $\mathcal{B}_f(T) > \mathcal{B}_f(T')$. This is a contradiction,
		since we chose $T$ as an optimal tree---so this case is impossible.

		\item $a$ is an only child. Create $T'$ by removing $a$'s parent and replacing it with $a$. \par
		Then $\mathcal{B}_f(T) > \mathcal{B}_f(T')$, same contradiction as above. \par
		\note{If we assume $T$ is a full binary tree, this case doesn't exist.}

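		\item $a$ has a sibling $x$, and $x$ isn't $b$. \par
		Since the first case is impossible, nothing in $T$ is deeper than $a$, so $x$ is a symbol with $d_T(x) = d_T(a) \geq d_T(b)$ and $f(x) \geq f(b)$. \par
		Let $T'$ be the tree created by swapping $x$ and $b$ (thus making $a$ and $b$ siblings). \par
		By \ref{hufptone}, $\mathcal{B}_f(T) \geq \mathcal{B}_f(T')$. $T$ is optimal, so there cannot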
		be a tree with a better average length---thus $\mathcal{B}_f(T) = \mathcal{B}_f(T')$ and $T'$
		is also optimal.
	\end{itemize}
\end{solution}

\vfill
\pagebreak

\problem{}
Devise an algorithm that builds an optimal tree given an alphabet $A$ and a frequency function $f$. \par
Then, use the previous two problems to show that your algorithm indeed produces an optimal tree. \par
\hint{
	First, make an algorithm that makes sense intuitively. \par
	Once you have something that looks good, start your proof.
} \par
\hint{Build from the bottom.}

\begin{solution}
	\textbf{The Algorithm:} \par
	Given an alphabet $A$ and a frequency function $f$...
	\begin{itemize}
		\item If $|A| = 1$, return a single node.
		\item Otherwise, let $a, b$ be two symbols with the smallest frequencies.
		\item Let $A' = A - \{a, b\} + \{x\}$ \tab \note{(Where $x$ is a new \say{placeholder} symbol)}
		\item Let $f'(x) = f(a) + f(b)$, and $f'(s) = f(s)$ for all other symbols $s$.
		\item Compute $T'$ by repeating this algorithm on $A'$ and $f'$
		\item Create $T$ from $T'$ by adding $a$ and $b$ as children of $x$.
	\end{itemize}

	\vspace{2mm}
	In plain English: pick the two nodes with the smallest frequency, combine them,
	and replace them with a \say{compound symbol}. Repeat until you're done.
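
	\vspace{2mm}
	As an illustrative run (with $f$ assumed to be the symbol frequencies of \texttt{DEACBDD}, and ties broken arbitrarily):
	\begin{itemize}
		\item Start with $f(\texttt{D}) = \nicefrac{3}{7}$ and $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = f(\texttt{E}) = \nicefrac{1}{7}$.
		\item Combine \texttt{A} and \texttt{B} into $x_1$, with $f(x_1) = \nicefrac{2}{7}$.
		\item Combine \texttt{C} and \texttt{E} into $x_2$, with $f(x_2) = \nicefrac{2}{7}$.
		\item Combine $x_1$ and $x_2$ into $x_3$, with $f(x_3) = \nicefrac{4}{7}$.
		\item Combine \texttt{D} and $x_3$ into the root.
	\end{itemize}
	This puts \texttt{D} at depth $1$ and every other symbol at depth $3$, so
	$\mathcal{B}_f(T) = \frac{3}{7} \times 1 + \frac{4}{7} \times 3 = \frac{15}{7}$, or $15$ bits for the whole string.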


	\linehack{}
	\textbf{The Proof:} \par
	We'll proceed by induction on $|A|$. \par
	Let $f$ be an arbitrary frequency function.

	\vspace{4mm}

	\textbf{Base case:} $|A| = 1$. We only have one vertex, and we thus only have one tree. \par
	The algorithm above produces this tree. Done.

	\vspace{4mm}

	\textbf{Induction:} Assume that for all $A$ with $|A| = n - 1$, the algorithm above produces an optimal tree. \par
	Now take $|A| = n$, with $a$, $b$, $x$, $A'$, $f'$, $T'$, and $T$ as in the algorithm.
	First, we'll show that $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$:
	\begin{align*}
		\mathcal{B}_f(T)
		&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f(a)d_T(a) + f(b)d_T(b) \\
		&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + \Bigl(f(a)+f(b)\Bigr)\Bigl(d_{T'}(x) + 1\Bigr) \\
		&= \sum_{s \in A - \{a, b\}} \Bigl(f'(s)d_{T'}(s)\Bigr) + f'(x)d_{T'}(x) + f(a) + f(b) \\
		&= \sum_{s \in A'} \Bigl(f'(s)d_{T'}(s)\Bigr) + f(a) + f(b) \\
		&= \mathcal{B}_{f'}(T') + f(a) + f(b)
	\end{align*}

	Now, assume that $T$ is not optimal. By \ref{hufpttwo}, there then exists an optimal tree $U$ with $a$ and $b$ as siblings, and $\mathcal{B}_f(U) < \mathcal{B}_f(T)$.
	Let $U'$ be the tree created by removing $a, b$ from $U$ and labeling their parent $x$. $U'$ is a tree for $A'$ and $f'$, so we can repeat the calculation
	above to find that $\mathcal{B}_f(U) = \mathcal{B}_{f'}(U') + f(a) + f(b)$.

	\vspace{2mm}

	So, $
		\mathcal{B}_{f'}(T')
		~=~ \mathcal{B}_f(T) - f(a) - f(b)
		~>~ \mathcal{B}_f(U) - f(a) - f(b)
		~=~ \mathcal{B}_{f'}(U')
	$. \par
	Since $T'$ is optimal for $A'$ and $f'$ (by the induction hypothesis), this is a contradiction. $T$ must therefore be optimal.
\end{solution}

\vfill
\pagebreak