\section{Introduction}

\definition{}
An \textit{alphabet} is a set of symbols. Two examples are
$\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ and $\{\texttt{0}, \texttt{1}\}$.

\definition{}
A \textit{string} is a sequence of symbols from an alphabet. \par
For example, \texttt{CBCAADDD} is a string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$.

\problem{}
Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary sequence. \par
How many bits will we need? \par
\hint{
	Our alphabet has four symbols, so we can encode each symbol using two bits, \par
	mapping $\texttt{A} \rightarrow \texttt{00}$,
	$\texttt{B} \rightarrow \texttt{01}$,
	$\texttt{C} \rightarrow \texttt{10}$, and
	$\texttt{D} \rightarrow \texttt{11}$.
}

\begin{solution}
	$2n$ bits.
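	\par
	As a quick check, assuming the mapping suggested in the hint, the length-8 string \texttt{CBCAADDD} is encoded as
	\texttt{10 01 10 00 00 11 11 11}, which is $2 \times 8 = 16$ bits.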
\end{solution}

\vfill


\problem{}<naivelen>
Similarly, we can use a na\"ive coding scheme to encode an $n$-symbol string over an alphabet of size $k$ \par
using $n \times \lceil \log_2k \rceil$ bits. Convince yourself that this is true.
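
\begin{solution}
	One possible argument (a sketch, not the only way to see it): let $b = \lceil \log_2 k \rceil$.
	There are $2^b$ distinct binary words of length $b$, and
	\[
		2^b = 2^{\lceil \log_2 k \rceil} \geq 2^{\log_2 k} = k,
	\]
	so we can assign a different $b$-bit word to each of the $k$ symbols.
	Encoding each of the $n$ symbols with its word takes $n \times b = n \times \lceil \log_2 k \rceil$ bits. \par
	For example, if $k = 5$, then $\lceil \log_2 5 \rceil = 3$, so a string of $n = 10$ symbols takes $10 \times 3 = 30$ bits.
\end{solution}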


\vfill
As you might expect, this isn't ideal: in many cases we can do much better than $n \times \lceil \log_2 k \rceil$ bits.
We will spend the rest of this handout exploring more efficient ways of encoding such sequences of symbols.
\pagebreak