diff --git a/Advanced/Compression/parts/1 runlength.tex b/Advanced/Compression/parts/1 runlength.tex index 6b4d5e6..7696ac1 100644 --- a/Advanced/Compression/parts/1 runlength.tex +++ b/Advanced/Compression/parts/1 runlength.tex @@ -88,6 +88,15 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows: So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par \note[Notation]{Just like spaces, dashes in a binary blob are added for readability.} + +\remark{Notation} +In this handout, encoded binary blobs will always be written in square brackets. \par +Spaces and dashes are only there for readability and should be ignored. \par +For example, the binary sequences \texttt{[000 011 100 001 010 100]} and \texttt{[000011100001010100]} \par +are identical. The first, however, is easier to read. + +\pagebreak + \problem{} Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par Is this more or less efficient than \ref{runlenone}? @@ -98,7 +107,7 @@ Is this more or less efficient than \ref{runlenone}? \end{solution} \vfill -\pagebreak + \problem{} Is run-length coding always efficient? When does it work well, and when does it fail? diff --git a/Advanced/Compression/parts/3 huffman.tex b/Advanced/Compression/parts/3 huffman.tex index 5824fc6..d027734 100644 --- a/Advanced/Compression/parts/3 huffman.tex +++ b/Advanced/Compression/parts/3 huffman.tex @@ -1,8 +1,8 @@ \section{Huffman Codes} -\remark{} -As a first example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par +\example{} +Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, by mapping... 
\begin{itemize} \item $\texttt{A}$ to $\texttt{000}$ @@ -11,17 +11,119 @@ With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, \item $\texttt{D}$ to $\texttt{011}$ \item $\texttt{E}$ to $\texttt{100}$ \end{itemize} -With this scheme, the string \texttt{ADEBCE} becomes \texttt{[000 011 100 001 010 100]}. \par -This matches what we computed in \ref{naivelen}: ~ $6 \times \lceil \log_2(5) \rceil = 6 \times 3 = 18$. \par -\note[Notation]{ - The spaces in \texttt{[000 011 100 001 010 100]} are provided for convenience. \par - This is equivalent to \texttt{[000011100001010100]}, but is easier to read. \par - In this handout, encoded binary blobs will always be written in square brackets. -} +For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par +To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we +need an average of three bits per symbol. \vspace{2mm} -You could argue that this coding scheme is wasteful: we're not using three of the eight possible three-bit sequences! +One could argue that this coding scheme is wasteful: \par +we're not using three of the eight possible three-bit sequences! + +\example{} +There is, of course, a better way. \par +Consider the following mapping: + +\begin{itemize} + \item $\texttt{A}$ to $\texttt{00}$ + \item $\texttt{B}$ to $\texttt{01}$ + \item $\texttt{C}$ to $\texttt{10}$ + \item $\texttt{D}$ to $\texttt{110}$ + \item $\texttt{E}$ to $\texttt{111}$ +\end{itemize} + +\problem{} +\begin{itemize} + \item Using the above code, encode \texttt{ADEBCE}. + \item Then, decode \texttt{[110011001111]}. +\end{itemize} + +\begin{solution} + \texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par + and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}. +\end{solution} + +\vfill + +\problem{} +How many bits does this code need per symbol, on average, if all five letters are equally likely? 
+ +\begin{solution} + \begin{equation*} + \frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4 + \end{equation*} +\end{solution} + +\vfill + +\problem{} +Consider the code below. How is it different from the one above? \par +Is this a good way to encode strings over this five-letter alphabet? +\begin{itemize} + \item $\texttt{A}$ to $\texttt{00}$ + \item $\texttt{B}$ to $\texttt{01}$ + \item $\texttt{C}$ to $\texttt{10}$ + \item $\texttt{D}$ to $\texttt{110}$ + \item $\texttt{E}$ to $\texttt{11}$ +\end{itemize} + +\begin{solution} + No. The code for \texttt{E} is a prefix of the code for \texttt{D}, + so we can't always decode sequences uniquely. For example, we could + decode the fragment \texttt{[11001$\cdot\cdot\cdot$]} as \texttt{EA} + or as \texttt{DB}. +\end{solution} + +\vfill +\pagebreak + + +\remark{} +Huffman codes can be visualized as a tree, which we traverse while decoding our sequence. \par +We start at the topmost node, taking the left edge if we see a \texttt{0} and the right edge if we see a \texttt{1}.
\par +As an example, consider the code for $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ +on the previous page: + + +\begin{itemize} + \item $\texttt{A}$ encodes as $\texttt{00}$ + \item $\texttt{B}$ encodes as $\texttt{01}$ + \item $\texttt{C}$ encodes as $\texttt{10}$ + \item $\texttt{D}$ encodes as $\texttt{110}$ + \item $\texttt{E}$ encodes as $\texttt{111}$ +\end{itemize} + +Drawing this scheme as a tree, we get the following: + + +\begin{center} +\begin{tikzpicture}[scale=1.0] + \begin{scope}[layer = nodes] + \node[int] (x) at (0, 0) {}; + \node[int] (0) at (-0.75, -1) {}; + \node[int] (1) at (0.75, -1) {}; + \node[end] (00) at (-1.25, -2) {\texttt{A}}; + \node[end] (01) at (-0.25, -2) {\texttt{B}}; + \node[end] (10) at (0.25, -2) {\texttt{C}}; + \node[int] (11) at (1.25, -2) {}; + \node[end] (110) at (0.75, -3) {\texttt{D}}; + \node[end] (111) at (1.75, -3) {\texttt{E}}; + \end{scope} + + \draw[-] + (x) to node[midway, fill=white, text=gray] {\texttt{0}} (0) + (x) to node[midway, fill=white, text=gray] {\texttt{1}} (1) + (0) to node[midway, fill=white, text=gray] {\texttt{0}} (00) + (0) to node[midway, fill=white, text=gray] {\texttt{1}} (01) + (1) to node[midway, fill=white, text=gray] {\texttt{0}} (10) + (1) to node[midway, fill=white, text=gray] {\texttt{1}} (11) + (11) to node[midway, fill=white, text=gray] {\texttt{0}} (110) + (11) to node[midway, fill=white, text=gray] {\texttt{1}} (111) + ; +\end{tikzpicture} +\end{center} + + \vfill \pagebreak \ No newline at end of file diff --git a/Advanced/Compression/tikzset.tex b/Advanced/Compression/tikzset.tex index d83fa32..8bfdf1c 100644 --- a/Advanced/Compression/tikzset.tex +++ b/Advanced/Compression/tikzset.tex @@ -30,11 +30,9 @@ }, % % Nodes - main/.style = { - draw, - circle, - fill = white, - line width = 0.35mm + int/.style = {}, + end/.style = { + anchor=north }, % % Loop tweaks
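The decoding procedure in the new `\remark{}` (start at the root, follow `0` to the left and `1` to the right, emit a letter at each leaf, then restart at the root) can be sketched in Python to check the worked answers above. This is a minimal illustration that is not part of the patch: the names `CODE`, `encode`, `build_tree`, `decode` and `is_prefix_free` are hypothetical, and the tree is stored as nested dictionaries instead of being drawn with TikZ.

```python
# Hypothetical helpers, not part of the handout: a sketch of prefix-code
# encoding and tree-walk decoding for the code A=00, B=01, C=10, D=110, E=111.
CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def encode(text: str, code: dict) -> str:
    """Concatenate the codeword of each symbol."""
    return "".join(code[ch] for ch in text)


def build_tree(code: dict) -> dict:
    """Build a binary tree: internal nodes are dicts keyed by '0'/'1',
    leaves are symbols (str). Assumes the code is prefix-free."""
    root: dict = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol
    return root


def decode(bits: str, code: dict) -> str:
    """Walk the tree from the root ('0' = left, '1' = right);
    at a leaf, emit its symbol and restart at the root."""
    root = build_tree(code)
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


def is_prefix_free(code: dict) -> bool:
    """True if no codeword is a prefix of a different symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )
```

For example, `decode("110011001111", CODE)` walks the tree to `DBCBE`, and `is_prefix_free` returns `False` once `E` is remapped to `11`, which is the ambiguity discussed in the third problem. `decode` is only meaningful for prefix-free codes, since `build_tree` silently overwrites a subtree when one codeword is a prefix of another.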