\section{Huffman Codes} \example{} Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par With a na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping... \begin{itemize} \item $\texttt{A}$ to $\texttt{000}$ \item $\texttt{B}$ to $\texttt{001}$ \item $\texttt{C}$ to $\texttt{010}$ \item $\texttt{D}$ to $\texttt{011}$ \item $\texttt{E}$ to $\texttt{100}$ \end{itemize} For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we need an average of three bits per symbol. \vspace{2mm} However, one could argue that this coding scheme is wasteful: \par we're not using three of the eight possible three-bit sequences! \example{} There is, of course, a better way. \par Consider the following mapping: \begin{itemize} \item $\texttt{A}$ to $\texttt{00}$ \item $\texttt{B}$ to $\texttt{01}$ \item $\texttt{C}$ to $\texttt{10}$ \item $\texttt{D}$ to $\texttt{110}$ \item $\texttt{E}$ to $\texttt{111}$ \end{itemize} \problem{} \begin{itemize} \item Using the above code, encode \texttt{ADEBCE}. \item Then, decode \texttt{[110011001111]}. \end{itemize} \begin{solution} \texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}. \end{solution} \vfill \problem{} How many bits does this code need per symbol, on average? \begin{solution} \begin{equation*} \frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4 \end{equation*} \end{solution} \vfill \problem{} Consider the code below. How is it different from the one on the previous page? \par Is this a good way to encode five-letter strings? \begin{itemize} \item $\texttt{A}$ to $\texttt{00}$ \item $\texttt{B}$ to $\texttt{01}$ \item $\texttt{C}$ to $\texttt{10}$ \item $\texttt{D}$ to $\texttt{110}$ \item $\texttt{E}$ to $\texttt{11}$ \end{itemize} \begin{solution} No. 
The code for \texttt{E} is the beginning of the code for \texttt{D}, so we can't decode sequences uniquely. For example, we could decode the fragment \texttt{[11001$\cdots$]} as \texttt{EA} or as \texttt{DB}.
\end{solution}
\vfill
\pagebreak
\remark{}
The code from the previous page can be visualized as a full binary tree: \par
\note{Every node in a \textit{full binary tree} has either zero or two children.}
\vspace{-5mm}
\null\hfill
\begin{minipage}[t]{0.48\textwidth}
\vspace{0pt}
\begin{itemize}
\item $\texttt{A}$ encodes as $\texttt{00}$
\item $\texttt{B}$ encodes as $\texttt{01}$
\item $\texttt{C}$ encodes as $\texttt{10}$
\item $\texttt{D}$ encodes as $\texttt{110}$
\item $\texttt{E}$ encodes as $\texttt{111}$
\end{itemize}
\end{minipage}
\hfill
\begin{minipage}[t]{0.48\textwidth}
\vspace{0pt}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{A}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{D}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill\null
You can think of each symbol's code as its \say{address} in this tree. When decoding a string, we start at the topmost node. Reading the binary sequence bit by bit, we move down the tree, taking a left edge if we see a \texttt{0} and a right edge if we see a \texttt{1}. Once we reach a letter, we return to the top node and repeat the process.
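\par
\vspace{2mm}
As a small illustration of this procedure, here is a trace on the short sequence \texttt{[0011010]}. The sequence is made up for this example. Each row follows one walk from the top node down to a letter.
\begin{center}
\begin{tabular}{lll}
	bits read & edges taken & letter reached \\
	\hline
	\texttt{00} & left, left & \texttt{A} \\
	\texttt{110} & right, right, left & \texttt{D} \\
	\texttt{10} & right, left & \texttt{C} \\
\end{tabular}
\end{center}
Reading the last column from top to bottom gives \texttt{ADC}.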
\definition{}
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
\problem{}
Convince yourself that trees like the one above always produce a prefix-free code.
\problem{}
Decode \texttt{[110111001001110110]} using the tree above.
\begin{solution}
This is \texttt{[110$\cdot$111$\cdot$00$\cdot$10$\cdot$01$\cdot$110$\cdot$110]}, which is \texttt{DEACBDD}.
\end{solution}
\vfill
\problem{}
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?
\begin{solution}
This is \texttt{[00 01 110 111 10 01 111]}, which uses 17 bits. The na\"ive scheme needs $3 \times 7 = 21$ bits, so we save four bits.
\end{solution}
\vfill
\pagebreak
\problem{}
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
\vspace{2mm}
Draw a tree that encodes this string more efficiently. \par
\begin{solution}
Two possible solutions are below. \par
\begin{itemize}
\item The left tree encodes \texttt{DEACBDD} as \texttt{[00$\cdot$111$\cdot$110$\cdot$10$\cdot$01$\cdot$00$\cdot$00]}, using 16 bits.
\item The right tree encodes \texttt{DEACBDD} as \texttt{[0$\cdot$111$\cdot$101$\cdot$110$\cdot$100$\cdot$0$\cdot$0]}, using 15 bits.
\end{itemize}
\null\hfill
\begin{minipage}{0.48\textwidth}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[int] (0) at (-0.75, -1) {};
\node[int] (1) at (0.75, -1) {};
\node[end] (00) at (-1.25, -2) {\texttt{D}};
\node[end] (01) at (-0.25, -2) {\texttt{B}};
\node[end] (10) at (0.25, -2) {\texttt{C}};
\node[int] (11) at (1.25, -2) {};
\node[end] (110) at (0.75, -3) {\texttt{A}};
\node[end] (111) at (1.75, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(0) to node[edg] {\texttt{0}} (00)
(0) to node[edg] {\texttt{1}} (01)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill
\begin{minipage}{0.48\textwidth}
\begin{center}
\begin{tikzpicture}[scale=1.0]
\begin{scope}[layer = nodes]
\node[int] (x) at (0, 0) {};
\node[end] (0) at (-0.75, -1) {\texttt{D}};
\node[int] (1) at (0.75, -1) {};
\node[int] (10) at (0.25, -2) {};
\node[int] (11) at (1.25, -2) {};
\node[end] (100) at (-0.15, -3) {\texttt{A}};
\node[end] (101) at (0.6, -3) {\texttt{B}};
\node[end] (110) at (0.9, -3) {\texttt{C}};
\node[end] (111) at (1.6, -3) {\texttt{E}};
\end{scope}
\draw[-]
(x) to node[edg] {\texttt{0}} (0)
(x) to node[edg] {\texttt{1}} (1)
(1) to node[edg] {\texttt{0}} (10)
(1) to node[edg] {\texttt{1}} (11)
(10) to node[edg] {\texttt{0}} (101)
(10) to node[edg] {\texttt{1}} (100)
(11) to node[edg] {\texttt{0}} (110)
(11) to node[edg] {\texttt{1}} (111)
;
\end{tikzpicture}
\end{center}
\end{minipage}
\hfill\null
\end{solution}
\vfill
\problem{}
Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} efficiently than before.
\begin{solution}
Bury \texttt{D} as deep as possible in the tree, so that we need four bits to encode it.
\end{solution}
\vfill
\remark{}
As we just saw, constructing a prefix-free code is fairly easy. \par
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
\pagebreak
\remark{}
Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
\begin{equation*}
\mathcal{B}_f(T) = \sum_{a \in A} f(a) \times d_T(a)
\end{equation*}
where...
\begin{itemize}[itemsep=1mm]
\item $a$ is a symbol in $A$
\item $d_T(a)$ is the \say{depth} of $a$ in our tree. \par
\note{In other words, $d_T(a)$ is the number of bits we need to encode $a$}
\item $f$ is a frequency function that maps each symbol in $A$ to a value in $[0, 1]$. \par
You can think of this as the distribution of symbols in messages we expect to encode. \par
For example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}\}$:
\begin{itemize}
\item In $\texttt{AAA}$, $f(\texttt{A}) = 1$ and $f(\texttt{B}) = f(\texttt{C}) = 0$.
\item In $\texttt{ABC}$, $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{3}$.
\end{itemize}
\note{Note that $f(a) \geq 0$ and $\sum f(a) = 1$.}
\end{itemize}
\vspace{2mm}
Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
\problem{}
Let $f$ be a fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\begin{equation*}
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{equation*}
\begin{solution}
$\mathcal{B}_f(T)$ and $\mathcal{B}_f(T')$ are nearly identical. They differ only in the terms for $a$ and $b$, since the swap gives $d_{T'}(a) = d_T(b)$ and $d_{T'}(b) = d_T(a)$. So, we get...
\begin{align*}
\mathcal{B}_f(T) - \mathcal{B}_f(T')
&= f(a)d_T(a) + f(b)d_T(b) - f(a)d_T(b) - f(b)d_T(a) \\
&= f(a)\bigl(d_T(a) - d_T(b)\bigr) - f(b)\bigl(d_T(a) - d_T(b)\bigr) \\
&= \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{align*}
\end{solution}
\vfill
\pagebreak
\problem{}
Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
\hint{You may assume that an optimal tree exists. There are a few cases.}
\begin{solution}
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
If there is a tie among three or more symbols, pick $a, b$ to be those with the greatest depth. \par
Label $a$ and $b$ so that $d_T(a) \geq d_T(b)$.
\vspace{1mm}
If $a$ and $b$ share a parent, we're done. If $a$ and $b$ do not share a parent, we have three cases:
\begin{itemize}[itemsep=1mm]
\item There is a symbol $x$ with $d_T(x) > d_T(a)$. \par
Create $T'$ by swapping $a$ and $x$. By our choice of $a$ and $b$ (including the tie-breaking rule), $f(a) < f(x)$, and thus by \ref{hufptone} $\mathcal{B}_f(T) > \mathcal{B}_f(T')$. This is a contradiction, since we chose $T$ as an optimal tree, so this case is impossible.
\item $a$ is an only child. Create $T'$ by removing $a$'s parent and replacing it with $a$. \par
Then $\mathcal{B}_f(T) > \mathcal{B}_f(T')$, same contradiction as above. \par
\note{If we assume $T$ is a full binary tree, this case doesn't exist.}
\item $a$ has a sibling $x$, and $x$ isn't $b$. \par
Let $T'$ be the tree created by swapping $x$ and $b$ (thus making $a$ and $b$ siblings). \par
Since $f(x) \geq f(b)$ and $d_T(x) = d_T(a) \geq d_T(b)$, \ref{hufptone} gives $\mathcal{B}_f(T) \geq \mathcal{B}_f(T')$. $T$ is optimal, so there cannot be a tree with a better average length. Thus $\mathcal{B}_f(T) = \mathcal{B}_f(T')$ and $T'$ is also optimal.
\end{itemize}
\end{solution}
\vfill
\pagebreak
\problem{}
Devise an algorithm that builds an optimal tree given an alphabet $A$ and a frequency function $f$.
\par
Then, use the previous two problems to show that your algorithm indeed produces an optimal tree. \par
\hint{
First, make an algorithm that makes sense intuitively. \par
Once you have something that looks good, start your proof.
} \par
\hint{Build from the bottom.}
\begin{solution}
\textbf{The Algorithm:} \par
Given an alphabet $A$ and a frequency function $f$...
\begin{itemize}
\item If $|A| = 1$, return a single node.
\item Otherwise, let $a, b$ be two symbols with the smallest frequency.
\item Let $A' = A - \{a, b\} + \{x\}$ \tab \note{(Where $x$ is a new \say{placeholder} symbol)}
\item Let $f'(x) = f(a) + f(b)$, and $f'(s) = f(s)$ for all other symbols $s$.
\item Compute $T'$ by repeating this algorithm on $A'$ and $f'$
\item Create $T$ from $T'$ by adding $a$ and $b$ as children of $x$.
\end{itemize}
\vspace{2mm}
In plain English: pick the two nodes with the smallest frequency, combine them, and replace them with a \say{compound symbol}. Repeat until you're done.
\linehack{}
\textbf{The Proof:} \par
We'll proceed by induction on $|A|$. \par
Let $f$ be an arbitrary frequency function.
\vspace{4mm}
\textbf{Base case:} $|A| = 1$. We only have one vertex, and we thus only have one tree. \par
The algorithm above produces this tree. Done.
\vspace{4mm}
\textbf{Induction:} Let $|A| = n$, and assume that the algorithm above produces an optimal tree for every alphabet with $n - 1$ symbols. In particular, $T'$ is optimal for $A'$ and $f'$. First, we'll show that $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$. Note that $d_T(a) = d_T(b) = d_{T'}(x) + 1$, and that every other symbol $s$ has $d_T(s) = d_{T'}(s)$ and $f(s) = f'(s)$:
\begin{align*}
\mathcal{B}_f(T)
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f(a)d_T(a) + f(b)d_T(b) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + \Bigl(f(a)+f(b)\Bigr)\Bigl(d_{T'}(x) + 1\Bigr) \\
&= \sum_{s \in A - \{a, b\}} \Bigl(f'(s)d_{T'}(s)\Bigr) + f'(x)d_{T'}(x) + f(a) + f(b) \\
&= \sum_{s \in A'} \Bigl(f'(s)d_{T'}(s)\Bigr) + f(a) + f(b) \\
&= \mathcal{B}_{f'}(T') + f(a) + f(b)
\end{align*}
Now, assume that $T$ is not optimal. There then exists an optimal tree $U$ with $a$ and $b$ as siblings (by \ref{hufpttwo}).
Let $U'$ be the tree created by removing $a$ and $b$ from $U$ and labeling their former parent $x$. $U'$ is a tree for $A'$ and $f'$, so we can repeat the calculation above to find that $\mathcal{B}_f(U) = \mathcal{B}_{f'}(U') + f(a) + f(b)$.
\vspace{2mm}
So, $
\mathcal{B}_{f'}(T')
~=~ \mathcal{B}_f(T) - f(a) - f(b)
~>~ \mathcal{B}_f(U) - f(a) - f(b)
~=~ \mathcal{B}_{f'}(U')
$. \par
Since $T'$ is optimal for $A'$ and $f'$ (by the induction hypothesis), this is a contradiction. $T$ must therefore be optimal.
\end{solution}
\vfill
\pagebreak
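\begin{solution}
	\textbf{A worked example:} \par
	As a rough sanity check (not part of the proof), here is one way the algorithm above can play out on \texttt{DEACBDD}, where $f(\texttt{D}) = \nicefrac{3}{7}$ and every other symbol has frequency $\nicefrac{1}{7}$. The placeholder names $x_1, \dots, x_4$ are introduced only for this illustration, and the first tie is broken arbitrarily.
	\begin{itemize}
		\item Merge \texttt{A} and \texttt{B} into $x_1$, with frequency $\nicefrac{2}{7}$.
		\item Merge \texttt{C} and \texttt{E} into $x_2$, with frequency $\nicefrac{2}{7}$.
		\item Merge $x_1$ and $x_2$ into $x_3$, with frequency $\nicefrac{4}{7}$.
		\item Merge \texttt{D} and $x_3$ into $x_4$, with frequency $1$. Only one symbol remains, so we stop.
	\end{itemize}
	Unwinding the recursion puts \texttt{D} at depth 1 and each of \texttt{A}, \texttt{B}, \texttt{C}, \texttt{E} at depth 3, so
	\begin{equation*}
		\mathcal{B}_f(T) = \frac{3}{7} \times 1 + 4 \times \frac{1}{7} \times 3 = \frac{15}{7}
	\end{equation*}
	This is 15 bits for the seven-symbol message, which matches the 15-bit tree found earlier. \par
	\vspace{2mm}
	The identity $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$ gives a second way to get the same number. Each merge adds the combined frequency to the cost, and a single-node tree costs nothing, so the total is the sum of the four merged frequencies:
	\begin{equation*}
		\frac{2}{7} + \frac{2}{7} + \frac{4}{7} + \frac{7}{7} = \frac{15}{7}
	\end{equation*}
\end{solution}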