\section{Introduction}

\definition{}
An \textit{alphabet} is a set of symbols. Two examples are
$\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ and $\{\texttt{0}, \texttt{1}\}$.

\definition{}
A \textit{string} is a sequence of symbols from an alphabet. \par
For example, \texttt{CBCAADDD} is a string over the alphabet
$\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$.

\problem{}
Say we want to store a length-$n$ string over the alphabet
$\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary sequence. \par
How many bits will we need? \par
\hint{
	Our alphabet has four symbols, so we can encode each symbol using two bits, \par
	mapping
	$\texttt{A} \rightarrow \texttt{00}$,
	$\texttt{B} \rightarrow \texttt{01}$,
	$\texttt{C} \rightarrow \texttt{10}$, and
	$\texttt{D} \rightarrow \texttt{11}$.
}
\begin{solution}
	$2n$ bits.
\end{solution}

\vfill

\problem{}
Similarly, we can use a na\"ive coding scheme to encode an $n$-symbol string
over an alphabet of size $k$ \par
using $n \times \lceil \log_2 k \rceil$ bits. Convince yourself that this is true.

\vfill

As you might expect, this isn't ideal: we can often do much better than
$n \times \lceil \log_2 k \rceil$ bits. We will spend the rest of this handout
exploring more efficient ways of encoding such sequences of symbols.

\pagebreak
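\par
As a quick sanity check of the na\"ive bound $n \times \lceil \log_2 k \rceil$ from the introduction,
here is one worked instance. The values $k = 5$ and $n = 10$ are chosen only for illustration
and do not come from the problems above. A fixed-width code with $b$ bits per symbol can
distinguish at most $2^b$ symbols, so we need the smallest $b$ with $2^b \geq k$, which is
$b = \lceil \log_2 k \rceil$. For $k = 5$ we have $2^2 = 4 < 5 \leq 8 = 2^3$, so
\[
	\lceil \log_2 5 \rceil = 3
	\qquad \Longrightarrow \qquad
	n \times \lceil \log_2 k \rceil = 10 \times 3 = 30.
\]
So the na\"ive scheme spends 30 bits on this string. Note that three bits can label eight symbols,
but this alphabet has only five, so three of the eight codewords are never used.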