
\section{Huffman Codes}


\remark{}
As a first example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
With a na\"ive fixed-length coding scheme, we can encode a length-$n$ string in $3n$ bits by mapping...
\begin{itemize}
	\item $\texttt{A}$ to $\texttt{000}$
	\item $\texttt{B}$ to $\texttt{001}$
	\item $\texttt{C}$ to $\texttt{010}$
	\item $\texttt{D}$ to $\texttt{011}$
	\item $\texttt{E}$ to $\texttt{100}$
\end{itemize}
With this scheme, the string \texttt{ADEBCE} becomes \texttt{[000 011 100 001 010 100]}. \par
This matches what we computed in \ref{naivelen}: ~ $6 \times \lceil \log_2(5) \rceil = 6 \times 3 = 18$ bits. \par
\note[Notation]{
	The spaces in \texttt{[000 011 100 001 010 100]} are provided for convenience. \par
	This is equivalent to \texttt{[000011100001010100]}, but is easier to read. \par
	In this handout, encoded binary blobs will always be written in square brackets.
}

\vspace{2mm}

You could argue that this coding scheme is wasteful: we're not using three of the eight possible three-bit sequences!
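\par
As a quick check of this claim, here is a short count. The symbols $k$ (alphabet size) and $b$ (bits per symbol) are introduced only for this illustration and are not used elsewhere in this section. \par
A fixed-length code with $b$ bits per symbol has $2^b$ codewords available, and the smallest $b$ that covers $k$ symbols is $\lceil \log_2 k \rceil$. For our five-letter alphabet,
\[
	b = \lceil \log_2 5 \rceil = 3,
	\qquad
	2^b - k = 8 - 5 = 3.
\]
With the mapping above, the three unused sequences are \texttt{101}, \texttt{110}, and \texttt{111}.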

\vfill
\pagebreak