
\section{Huffman Codes}
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
With the na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
\item $\texttt{A}$ to $\texttt{000}$
\item $\texttt{B}$ to $\texttt{001}$
\item $\texttt{C}$ to $\texttt{010}$
\item $\texttt{D}$ to $\texttt{011}$
\item $\texttt{E}$ to $\texttt{100}$
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
It is easy to see that this scheme uses an average of three bits per symbol.
However, one could argue that this coding scheme is wasteful: \par
we're not using three of the eight possible three-bit sequences!
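The fixed-width scheme above can be sketched in a few lines of Python. This is only an illustration: the names \texttt{FIXED\_CODE} and \texttt{encode\_fixed} are made up for this example and are not part of the handout.

```python
# Fixed-width code from the list above: every symbol gets exactly three bits.
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}


def encode_fixed(message: str) -> str:
    """Concatenate the three-bit code word of each symbol."""
    return "".join(FIXED_CODE[symbol] for symbol in message)


bits = encode_fixed("ADEBCE")
print(bits, len(bits))  # 000011100001010100 18

# Three-bit patterns that are never used as a code word.
all_patterns = {format(i, "03b") for i in range(8)}
unused = sorted(all_patterns - set(FIXED_CODE.values()))
print(unused)  # ['101', '110', '111']
```

The three unused patterns are exactly the waste mentioned above.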
There is, of course, a better way. \par
Consider the following mapping:
\item $\texttt{A}$ to $\texttt{00}$
\item $\texttt{B}$ to $\texttt{01}$
\item $\texttt{C}$ to $\texttt{10}$
\item $\texttt{D}$ to $\texttt{110}$
\item $\texttt{E}$ to $\texttt{111}$
\item Using the above code, encode \texttt{ADEBCE}.
\item Then, decode \texttt{[110011001111]}.
\texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par
and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}.
How many bits does this code need per symbol, on average?
\[
\frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4
\]
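To check answers like these mechanically, here is a minimal Python sketch of the variable-length code. The function names are hypothetical, and the decoder assumes that no code word is a prefix of another, which holds for this particular mapping.

```python
# Variable-length code from the list above.
CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def encode(message: str) -> str:
    """Concatenate the code word of each symbol."""
    return "".join(CODE[symbol] for symbol in message)


def decode(bits: str) -> str:
    """Read bits left to right, emitting a symbol whenever a code word completes."""
    inverse = {word: symbol for symbol, word in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)


print(encode("ADEBCE"))        # 001101110110111
print(decode("110011001111"))  # DBCBE

# Average code word length, weighting all five symbols equally.
average = sum(len(word) for word in CODE.values()) / len(CODE)
print(average)  # 2.4
```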
Consider the code below. How is it different from the one on the previous page? \par
Is this a good way to encode five-letter strings?
\item $\texttt{A}$ to $\texttt{00}$
\item $\texttt{B}$ to $\texttt{01}$
\item $\texttt{C}$ to $\texttt{10}$
\item $\texttt{D}$ to $\texttt{110}$
\item $\texttt{E}$ to $\texttt{11}$
No. The code for \texttt{E} is a prefix of the code for \texttt{D},
so we can't decode sequences uniquely. For example, a message that starts
with \texttt{[11001$\cdot\cdot\cdot$]} could begin with \texttt{EA}
or with \texttt{DB}.
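The ambiguity can be made concrete by listing every way to split a bit string into code words. The sketch below is a brute-force search written for this illustration (the names \texttt{BAD\_CODE} and \texttt{all\_decodings} are assumptions). It uses the complete string \texttt{1100110}, which extends the fragment above.

```python
# The flawed code: E's code word "11" is a prefix of D's code word "110".
BAD_CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "11"}
GOOD_CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def all_decodings(bits: str, code: dict) -> list:
    """Return every message whose encoding under `code` is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


print(sorted(all_decodings("1100110", BAD_CODE)))   # ['DBC', 'EAD']
print(sorted(all_decodings("1100110", GOOD_CODE)))  # ['DBC']
```

With the flawed code the same bits have two valid readings, while the earlier code admits only one.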
The code from the previous page can be visualized as a full binary tree: \par
\note{Every node in a \textit{full binary tree} has either zero or two children.}
\item $\texttt{A}$ encodes as $\texttt{00}$
\item $\texttt{B}$ encodes as $\texttt{01}$
\item $\texttt{C}$ encodes as $\texttt{10}$
\item $\texttt{D}$ encodes as $\texttt{110}$
\item $\texttt{E}$ encodes as $\texttt{111}$
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{A}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{D}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
You can think of each symbol's code as its \say{address} in this tree.
When decoding a string, we start at the topmost node. Reading the binary sequence
bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
and a right edge if we see a \texttt{1}.
Once we reach a letter, we return to the top node and repeat the process.
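The decoding walk described above can be sketched in Python. The nested-tuple representation of the tree is a choice made for this example only: a leaf is a letter, and an internal node is a pair whose first entry is the \texttt{0} child and whose second entry is the \texttt{1} child.

```python
# The tree from the figure: (zero_child, one_child) pairs, with letters as leaves.
TREE = (("A", "B"), ("C", ("D", "E")))


def decode_with_tree(bits: str, tree=TREE) -> str:
    """Walk down the tree one bit at a time, restarting at the root after each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a letter
            out.append(node)
            node = tree            # return to the top node
    if node is not tree:
        raise ValueError("input ended in the middle of a code word")
    return "".join(out)


print(decode_with_tree("110011001111"))  # DBCBE
```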
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
Convince yourself that trees like the one above always produce a prefix-free code.
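The prefix-free property is easy to test directly from its definition. The following is a small quadratic-time check written for illustration, with \texttt{is\_prefix\_free} as a made-up name.

```python
def is_prefix_free(code: dict) -> bool:
    """True if no code word is a prefix of a different symbol's code word."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


tree_code = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
flawed_code = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "11"}

print(is_prefix_free(tree_code))    # True
print(is_prefix_free(flawed_code))  # False
```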
Decode \texttt{[110111001001110110]} using the tree above.
This is \texttt{[110$\cdot$111$\cdot$00$\cdot$10$\cdot$01$\cdot$110$\cdot$110]}, which is \texttt{DEACBDD}.
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?
This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
Draw a tree that encodes this string more efficiently. \par
Two possible solutions are below. \par
\item The left tree encodes \texttt{DEACBDD} as \texttt{[00$\cdot$111$\cdot$110$\cdot$10$\cdot$01$\cdot$00$\cdot$00]}, using 16 bits.
\item The right tree encodes \texttt{DEACBDD} as \texttt{[0$\cdot$111$\cdot$101$\cdot$110$\cdot$100$\cdot$0$\cdot$0]}, using 15 bits.
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{D}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{A}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[end] (0) at (-0.75, -1) {\texttt{D}};
\node[int] (1) at (0.75, -1) {};
\node[int] (10) at (0.25, -2) {};
\node[int] (11) at (1.25, -2) {};
\node[end] (100) at (-0.15, -3) {\texttt{B}};
\node[end] (101) at (0.6, -3) {\texttt{A}};
\node[end] (110) at (0.9, -3) {\texttt{C}};
\node[end] (111) at (1.6, -3) {\texttt{E}};
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(10) to node[edg] {\texttt{0}} (100)
(10) to node[edg] {\texttt{1}} (101)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} efficiently than before.
Bury \texttt{D} as deep as possible in the tree, so that we need four bits to encode it.
As we just saw, constructing a prefix-free code is fairly easy. \par
Constructing the \textit{most efficient} prefix-free code for a
given message is a bit more difficult. \par
Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
\[
\mathcal{B}_f(T) = \sum_{a \in A} f(a) \times d_T(a)
\]
\item $a$ is a symbol in $A$
\item $d_T(a)$ is the \say{depth} of $a$ in our tree. \par
\note{In other words, $d_T(a)$ is the number of bits we need to encode $a$}
\item $f$ is a frequency function that maps each symbol $a$ in $A$ to a value $f(a)$ in $[0, 1]$. \par
You can think of this as the distribution of symbols in messages we expect to encode. \par
For example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}\}$:
\item In $\texttt{AAA}$, $f(\texttt{A}) = 1$ and $f(\texttt{B}) = f(\texttt{C}) = 0$.
\item In $\texttt{ABC}$, $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{3}$.
\note{Note that $f(a) \geq 0$ and $\sum f(a) = 1$.}
Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
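As a rough numerical check of this definition, the sketch below computes $f$ from a message and evaluates $\mathcal{B}_f(T)$ for two codes. The helper names are hypothetical, the depth $d_T(a)$ is taken to be the length of the code word for $a$, and the second code is one possible tree that places \texttt{D} directly below the root.

```python
from collections import Counter


def frequencies(message: str) -> dict:
    """Relative frequency f(a) of each symbol in `message`."""
    counts = Counter(message)
    return {symbol: n / len(message) for symbol, n in counts.items()}


def average_bits(freq: dict, code: dict) -> float:
    """B_f(T): the sum of f(a) * d_T(a), with d_T(a) = len(code[a])."""
    return sum(freq[symbol] * len(code[symbol]) for symbol in freq)


FIRST_TREE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
SHALLOW_D = {"D": "0", "B": "100", "A": "101", "C": "110", "E": "111"}

message = "DEACBDD"
f = frequencies(message)
for name, code in [("first tree", FIRST_TREE), ("D near the root", SHALLOW_D)]:
    per_symbol = average_bits(f, code)
    # Multiplying the average by the message length recovers the total bit count.
    print(name, round(per_symbol * len(message)))  # 18, then 15
```

Multiplying $\mathcal{B}_f(T)$ by the message length gives the total number of bits, so minimizing one minimizes the other.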
Let $f$ be a fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\[
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\]
$\mathcal{B}_f(T)$ and $\mathcal{B}_f(T')$ are nearly identical: they differ only in the terms for $a$ and $b$, whose depths are exchanged.
So, we get...
\begin{align*}
\mathcal{B}_f(T) - \mathcal{B}_f(T')
&= f(a)d_T(a) + f(b)d_T(b) - f(a)d_T(b) - f(b)d_T(a) \\
&= f(a)\bigl(d_T(a) - d_T(b)\bigr) + f(b)\bigl(d_T(b) - d_T(a)\bigr) \\
&= \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{align*}
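A quick numerical sanity check of this computation, using exact fractions. The setup is an assumption chosen for illustration: the frequencies come from \texttt{DEACBDD}, the depths come from the first tree, and we swap \texttt{A} and \texttt{D}. The code compares the direct difference with the four-term expansion in the first line of the derivation.

```python
from fractions import Fraction


def cost(freq: dict, depth: dict) -> Fraction:
    """B_f(T) for a tree described by the depth of each symbol."""
    return sum(freq[s] * depth[s] for s in freq)


# Frequencies of the symbols in DEACBDD.
freq = {
    "A": Fraction(1, 7), "B": Fraction(1, 7), "C": Fraction(1, 7),
    "D": Fraction(3, 7), "E": Fraction(1, 7),
}
# Depths in the first tree (A=00, B=01, C=10, D=110, E=111).
depth = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}

# T' is T with A and D swapped, so the two symbols exchange depths.
swapped = dict(depth)
swapped["A"], swapped["D"] = depth["D"], depth["A"]

direct = cost(freq, depth) - cost(freq, swapped)
expanded = (
    freq["A"] * depth["A"] + freq["D"] * depth["D"]
    - freq["A"] * depth["D"] - freq["D"] * depth["A"]
)
print(direct, expanded)  # 2/7 2/7
```

The difference is positive, so moving the frequent symbol \texttt{D} closer to the root lowers the average here.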
Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
\hint{You may assume that an optimal tree exists. There are a few cases.}
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
If there is a tie among three or more symbols, pick $a, b$ to be those with the greatest depth. \par
Label $a$ and $b$ so that $d_T(a) \geq d_T(b)$.
If $a$ and $b$ share a parent, we're done.
If $a$ and $b$ do not share a parent, we have three cases:
\item There is a symbol $x$ with $d_T(x) > d_T(a)$. \par
Create $T'$ by swapping $a$ and $x$. By our choice of $a$ and $b$ (including the tie-breaking rule), $f(a) < f(x)$, and thus
by \ref{hufptone} $\mathcal{B}_f(T) > \mathcal{B}_f(T')$. This is a contradiction,
since we chose $T$ as an optimal tree---so this case is impossible.
\item $a$ is an only child. Create $T'$ by removing $a$'s parent and replacing it with $a$. \par
Then $\mathcal{B}_f(T) > \mathcal{B}_f(T')$, same contradiction as above. \par
\note{If we assume $T$ is a full binary tree, this case doesn't exist.}
\item $a$ has a sibling $x$, and $x$ isn't $b$. \par
Let $T'$ be the tree created by swapping $x$ and $b$ (thus making $a$ and $b$ siblings). \par
By \ref{hufptone}, $\mathcal{B}_f(T) \geq \mathcal{B}_f(T')$. $T$ is optimal, so there cannot
be a tree with a better average length---thus $\mathcal{B}_f(T) = \mathcal{B}_f(T')$ and $T'$
is also optimal.
Devise an algorithm that builds an optimal tree given an alphabet $A$ and a frequency function $f$. \par
Then, use the previous two problems to show that your algorithm indeed produces an optimal tree. \par
\hint{
First, make an algorithm that makes sense intuitively. \par
Once you have something that looks good, start your proof.
} \par
\hint{Build from the bottom.}
\textbf{The Algorithm:} \par
Given an alphabet $A$ and a frequency function $f$...
\item If $|A| = 1$, return a single node.
\item Let $a, b$ be two symbols with the smallest frequency.
\item Let $A' = A - \{a, b\} + \{x\}$ \tab \note{(Where $x$ is a new \say{placeholder} symbol)}
\item Let $f'(x) = f(a) + f(b)$, and $f'(s) = f(s)$ for all other symbols $s$.
\item Compute $T'$ by repeating this algorithm on $A'$ and $f'$
\item Create $T$ from $T'$ by adding $a$ and $b$ as children of $x$.
In plain English: pick the two nodes with the smallest frequency, combine them,
and replace them with a \say{compound symbol}. Repeat until you're done.
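One way to sketch this algorithm in Python is shown below. It is not the only possible implementation: it replaces the recursion with a loop over a priority queue (\texttt{heapq} from the standard library), which performs the same merges in the same order. The function name \texttt{huffman\_code} is made up, symbols are assumed to be strings, and weights may be counts or frequencies.

```python
import heapq
from itertools import count


def huffman_code(freq: dict) -> dict:
    """Return {symbol: code word}, built by repeatedly merging the two rarest entries."""
    if len(freq) == 1:
        # A single node has depth zero, so the only symbol gets the empty code word.
        return {symbol: "" for symbol in freq}

    tiebreak = count()  # unique counter so equal weights never compare subtrees
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        weight_a, _, a = heapq.heappop(heap)  # smallest weight
        weight_b, _, b = heapq.heappop(heap)  # second smallest
        # The pair (a, b) plays the role of the placeholder symbol x.
        heapq.heappush(heap, (weight_a + weight_b, next(tiebreak), (a, b)))

    _, _, tree = heap[0]
    code = {}

    def assign(node, prefix):
        if isinstance(node, tuple):  # internal node: (zero_child, one_child)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                        # leaf: a symbol
            code[node] = prefix

    assign(tree, "")
    return code


counts = {"D": 3, "E": 1, "A": 1, "C": 1, "B": 1}  # symbol counts in DEACBDD
code = huffman_code(counts)
total_bits = sum(counts[s] * len(code[s]) for s in counts)
print(total_bits)  # 15
```

For \texttt{DEACBDD} this needs 15 bits. The exact code words depend on how ties are broken, but the total does not.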
\textbf{The Proof:} \par
We'll proceed by induction on $|A|$. \par
Let $f$ be an arbitrary frequency function.
\textbf{Base case:} $|A| = 1$. We only have one vertex, and we thus only have one tree. \par
The algorithm above produces this tree. Done.
\textbf{Induction:} Assume that for all $A$ with $|A| = n - 1$, the algorithm above produces an optimal tree.
First, we'll show that $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$. \par
\note{Here $x$ is the placeholder symbol, so $d_T(a) = d_T(b) = d_{T'}(x) + 1$. Every other symbol $s$ has $d_T(s) = d_{T'}(s)$ and $f(s) = f'(s)$.}
\begin{align*}
\mathcal{B}_f(T)
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f(a)d_T(a) + f(b)d_T(b) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + \Bigl(f(a)+f(b)\Bigr)\Bigl(d_{T'}(x) + 1\Bigr) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f'(s)d_{T'}(s)\Bigr) + f'(x)d_{T'}(x) + f(a) + f(b) \\
&= \sum_{s \in A'} \Bigl(f'(s)d_{T'}(s)\Bigr) + f(a) + f(b) \\
&= \mathcal{B}_{f'}(T') + f(a) + f(b)
\end{align*}
Now, assume that $T$ is not optimal. There then exists an optimal tree $U$ with $a$ and $b$ as siblings (by \ref{hufpttwo}).
Let $U'$ be the tree created by removing $a, b$ from $U$ and labeling their former parent $x$. $U'$ is a tree for $A'$ and $f'$, so we can repeat the calculation
above to find that $\mathcal{B}_f(U) = \mathcal{B}_{f'}(U') + f(a) + f(b)$.
So, $
\mathcal{B}_{f'}(T')
~=~ \mathcal{B}_f(T) - f(a) - f(b)
~>~ \mathcal{B}_f(U) - f(a) - f(b)
~=~ \mathcal{B}_{f'}(U')
$. \par
Since $T'$ is optimal for $A'$ and $f'$ (by the induction hypothesis), this is a contradiction. $T$ must therefore be optimal.