Minor edits
parent
311f09f00e
commit
8ba834de59
@ -88,6 +88,15 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
|
||||
So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
|
||||
\note[Notation]{Just like spaces, dashes in a binary blob are added for readability.}
|
||||
|
||||
|
||||
\remark{Notation}
|
||||
In this handout, encoded binary blobs will always be written in square brackets. \par
|
||||
Ignore spaces and dashes; they are provided for convenience. \par
|
||||
For example, the binary sequences \texttt{[000 011 100 001 010 100]} and \texttt{[000011100001010100]} \par
|
||||
are identical. The first, however, is easier to read.
|
||||
|
||||
\pagebreak
|
||||
|
||||
\problem{}
|
||||
Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
|
||||
Is this more or less efficient than \ref{runlenone}?
|
||||
@ -98,7 +107,7 @@ Is this more or less efficient than \ref{runlenone}?
|
||||
\end{solution}
|
||||
|
||||
\vfill
|
||||
\pagebreak
|
||||
|
||||
|
||||
\problem{}
|
||||
Is run-length coding always efficient? When does it work well, and when does it fail?
|
||||
|
@ -1,8 +1,8 @@
|
||||
\section{Huffman Codes}
|
||||
|
||||
|
||||
\remark{}
|
||||
As a first example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
|
||||
\example{}
|
||||
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
|
||||
With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, by mapping...
|
||||
\begin{itemize}
|
||||
\item $\texttt{A}$ to $\texttt{000}$
|
||||
@ -11,17 +11,119 @@ With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits,
|
||||
\item $\texttt{D}$ to $\texttt{011}$
|
||||
\item $\texttt{E}$ to $\texttt{100}$
|
||||
\end{itemize}
|
||||
With this scheme, the string \texttt{ADEBCE} becomes \texttt{[000 011 100 001 010 100]}. \par
|
||||
This matches what we computed in \ref{naivelen}: ~ $6 \times \lceil \log_2(5) \rceil = 6 \times 3 = 18$. \par
|
||||
\note[Notation]{
|
||||
The spaces in \texttt{[000 011 100 001 010 100]} are provided for convenience. \par
|
||||
This is equivalent to \texttt{[000011100001010100]}, but is easier to read. \par
|
||||
In this handout, encoded binary blobs will always be written in square brackets.
|
||||
}
|
||||
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
|
||||
To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
|
||||
need an average of three bits per symbol.
|
||||
|
||||
\vspace{2mm}
|
||||
|
||||
You could argue that this coding scheme is wasteful: we're not using three of the eight possible three-bit sequences!
|
||||
One could argue that this coding scheme is wasteful: \par
|
||||
we're not using three of the eight possible three-bit sequences!
|
||||
|
||||
\example{}
|
||||
There is, of course, a better way. \par
|
||||
Consider the following mapping:
|
||||
|
||||
\begin{itemize}
|
||||
\item $\texttt{A}$ to $\texttt{00}$
|
||||
\item $\texttt{B}$ to $\texttt{01}$
|
||||
\item $\texttt{C}$ to $\texttt{10}$
|
||||
\item $\texttt{D}$ to $\texttt{110}$
|
||||
\item $\texttt{E}$ to $\texttt{111}$
|
||||
\end{itemize}
|
||||
|
||||
\problem{}
|
||||
\begin{itemize}
|
||||
\item Using the above code, encode \texttt{ADEBCE}.
|
||||
\item Then, decode \texttt{[110011001111]}.
|
||||
\end{itemize}
|
||||
|
||||
\begin{solution}
|
||||
\texttt{ADEBCE} becomes \texttt{[00 110 111 01 10 111]}, \par
|
||||
and \texttt{[110 01 10 01 111]} is \texttt{DBCBE}.
|
||||
\end{solution}
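The encoding and decoding steps in this problem can be sketched in Python. This is only an illustration under stated assumptions: the names `CODE`, `encode`, and `decode` are hypothetical and do not come from the handout; only the code table itself is taken from the text. The decoder reads bits one at a time and emits a symbol as soon as the buffer matches a codeword, which is enough here because no codeword in this table is a prefix of another.

```python
# Code table from the handout; all function names are illustrative assumptions.
CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def encode(text: str) -> str:
    """Concatenate the codeword of every symbol in `text`."""
    return "".join(CODE[ch] for ch in text)


def decode(bits: str) -> str:
    """Greedily match codewords from left to right."""
    inverse = {word: symbol for symbol, word in CODE.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


if __name__ == "__main__":
    print(encode("ADEBCE"))
    print(decode("110011001111"))
```

The spaces shown in the handout's bracket notation are omitted from the strings here, since they carry no information.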
|
||||
|
||||
\vfill
|
||||
|
||||
\problem{}
|
||||
How many bits does this code need per symbol, on average?
|
||||
|
||||
\begin{solution}
|
||||
\begin{equation*}
|
||||
\frac{2 + 2 + 2 + 3 + 3}{5} = \frac{12}{5} = 2.4
|
||||
\end{equation*}
|
||||
\end{solution}
|
||||
|
||||
\vfill
|
||||
|
||||
\problem{}
|
||||
Consider the code below. How is it different from the one above? \par
|
||||
Is this a good way to encode strings over our five-letter alphabet?
|
||||
\begin{itemize}
|
||||
\item $\texttt{A}$ to $\texttt{00}$
|
||||
\item $\texttt{B}$ to $\texttt{01}$
|
||||
\item $\texttt{C}$ to $\texttt{10}$
|
||||
\item $\texttt{D}$ to $\texttt{110}$
|
||||
\item $\texttt{E}$ to $\texttt{11}$
|
||||
\end{itemize}
|
||||
|
||||
\begin{solution}
|
||||
No. The code for \texttt{E} is a prefix of the code for \texttt{D},
|
||||
and we thus can't decode sequences uniquely. For example, we could
|
||||
decode the fragment \texttt{[11001$\cdot\cdot\cdot$]} as \texttt{EA}
|
||||
or as \texttt{DB}.
|
||||
\end{solution}
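The failure described in this solution can be checked mechanically. The sketch below is an assumption-laden illustration: `is_prefix_free`, `GOOD`, and `BAD` are hypothetical names, and the test simply compares every ordered pair of codewords. It also exhibits one complete bit string that two different messages share under the second code.

```python
from itertools import permutations

# The two code tables from the handout; variable names are illustrative.
GOOD = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}
BAD = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "11"}


def is_prefix_free(code: dict) -> bool:
    """Return True if no codeword is a prefix of a different codeword."""
    return not any(
        longer.startswith(shorter)
        for shorter, longer in permutations(code.values(), 2)
    )


def encode(text: str, code: dict) -> str:
    """Concatenate the codeword of every symbol in `text`."""
    return "".join(code[ch] for ch in text)


if __name__ == "__main__":
    print(is_prefix_free(GOOD), is_prefix_free(BAD))
    # Under BAD, two different messages produce the same bits.
    print(encode("EAD", BAD) == encode("DBC", BAD))
```

Here `EAD` and `DBC` both encode to the same seven bits under the second table, so a decoder has no way to tell them apart.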
|
||||
|
||||
\vfill
|
||||
\pagebreak
|
||||
|
||||
|
||||
\remark{}
|
||||
Huffman codes can be visualized as a tree which we traverse while decoding our sequence. \par
|
||||
We start at the topmost node, taking the left edge if we see a \texttt{0} and the right edge if we see a \texttt{1}. \par
|
||||
As an example, consider the code for $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$
|
||||
on the previous page:
|
||||
|
||||
|
||||
\begin{itemize}
|
||||
\item $\texttt{A}$ encodes as $\texttt{00}$
|
||||
\item $\texttt{B}$ encodes as $\texttt{01}$
|
||||
\item $\texttt{C}$ encodes as $\texttt{10}$
|
||||
\item $\texttt{D}$ encodes as $\texttt{110}$
|
||||
\item $\texttt{E}$ encodes as $\texttt{111}$
|
||||
\end{itemize}
|
||||
|
||||
Drawing this scheme as a tree, we get the following:
|
||||
|
||||
|
||||
\begin{center}
|
||||
\begin{tikzpicture}[scale=1.0]
|
||||
\begin{scope}[layer = nodes]
|
||||
\node[int] (x) at (0, 0) {};
|
||||
\node[int] (0) at (-0.75, -1) {};
|
||||
\node[int] (1) at (0.75, -1) {};
|
||||
\node[end] (00) at (-1.25, -2) {\texttt{A}};
|
||||
\node[end] (01) at (-0.25, -2) {\texttt{B}};
|
||||
\node[end] (10) at (0.25, -2) {\texttt{C}};
|
||||
\node[int] (11) at (1.25, -2) {};
|
||||
\node[end] (110) at (0.75, -3) {\texttt{D}};
|
||||
\node[end] (111) at (1.75, -3) {\texttt{E}};
|
||||
\end{scope}
|
||||
|
||||
\draw[-]
|
||||
(x) to node[midway, fill=white, text=gray] {\texttt{0}} (0)
|
||||
(x) to node[midway, fill=white, text=gray] {\texttt{1}} (1)
|
||||
(0) to node[midway, fill=white, text=gray] {\texttt{0}} (00)
|
||||
(0) to node[midway, fill=white, text=gray] {\texttt{1}} (01)
|
||||
(1) to node[midway, fill=white, text=gray] {\texttt{0}} (10)
|
||||
(1) to node[midway, fill=white, text=gray] {\texttt{1}} (11)
|
||||
(11) to node[midway, fill=white, text=gray] {\texttt{0}} (110)
|
||||
(11) to node[midway, fill=white, text=gray] {\texttt{1}} (111)
|
||||
;
|
||||
\end{tikzpicture}
|
||||
\end{center}
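The tree traversal described above can be sketched in Python. This is a minimal illustration, not the handout's own method: the nested-dictionary representation and the names `build_tree` and `decode_with_tree` are assumptions made for this example. Internal nodes are dictionaries keyed by the edge label (`"0"` or `"1"`), and leaves are the symbols themselves.

```python
# Code table from the handout; the tree representation is an assumption.
CODE = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def build_tree(code: dict) -> dict:
    """Build a binary tree: dicts are internal nodes, strings are leaves."""
    root: dict = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes as needed
        node[word[-1]] = symbol  # the last edge leads to the leaf
    return root


def decode_with_tree(bits: str, root: dict) -> str:
    """Walk from the root, following one edge per bit; restart at each leaf."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)


if __name__ == "__main__":
    tree = build_tree(CODE)
    print(decode_with_tree("110011001111", tree))
```

Following `1`, `1`, `0` from the root lands on the leaf `D`, matching the right-hand branch of the drawing.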
|
||||
|
||||
|
||||
|
||||
\vfill
|
||||
\pagebreak
|
@ -30,11 +30,9 @@
|
||||
},
|
||||
%
|
||||
% Nodes
|
||||
main/.style = {
|
||||
draw,
|
||||
circle,
|
||||
fill = white,
|
||||
line width = 0.35mm
|
||||
int/.style = {},
|
||||
end/.style = {
|
||||
anchor=north
|
||||
},
|
||||
%
|
||||
% Loop tweaks
|
||||
|