Mark 2024-04-23 17:33:58 -07:00
parent d8698b4c81
commit 8269bf1135
4 changed files with 200 additions and 157 deletions


@ -9,7 +9,7 @@ A \textit{string} is a sequence of symbols from an alphabet. \par
For example, \texttt{CBCAADDD} is a string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. For example, \texttt{CBCAADDD} is a string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$.
\problem{} \problem{}
Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary blob. \par Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary sequence. \par
How many bits will we need? \par How many bits will we need? \par
\hint{ \hint{
Our alphabet has four symbols, so we can encode each symbol using two bits, \par Our alphabet has four symbols, so we can encode each symbol using two bits, \par
@ -32,6 +32,6 @@ using $n \times \lceil \log_2k \rceil$ bits. Convince yourself that this is true
\vfill \vfill
Of course, this isn't ideal---we can do much better than $n \times \lceil \log_2k \rceil$. As you might expect, this isn't ideal: we can do much better than $n \times \lceil \log_2k \rceil$.
We will spend the rest of this handout exploring more efficient ways of encoding such sequences of symbols. We will spend the rest of this handout exploring more efficient ways of encoding such sequences of symbols.
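\vspace{2mm}

As a quick check of the $n \times \lceil \log_2k \rceil$ count (our own arithmetic on the example string from the start of this section, not part of the original problem), take \texttt{CBCAADDD}, where $n = 8$ and $k = 4$: \par
\begin{equation*}
	n \times \lceil \log_2 k \rceil = 8 \times \lceil \log_2 4 \rceil = 8 \times 2 = 16
\end{equation*}
So the na\"ive scheme spends 16 bits on this string. If the alphabet had five symbols instead, we would need $\lceil \log_2 5 \rceil = 3$ bits per symbol, or $8 \times 3 = 24$ bits in total.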
\pagebreak \pagebreak


@ -5,37 +5,37 @@
\section{Run-length Coding} \section{Run-length Coding}
\definition{} %\definition{}
\textit{Entropy} is a measure of information in a certain sequence. \par %\textit{Entropy} is a measure of information in a certain sequence. \par
A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little. %A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
For example, consider the following two ten-symbol ASCII\footnotemark{} strings: %For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
\begin{itemize} %\begin{itemize}
\item \texttt{AAAAAAAAAA} % \item \texttt{AAAAAAAAAA}
\item \texttt{pDa3:7?j;F} % \item \texttt{pDa3:7?j;F}
\end{itemize} %\end{itemize}
The first string clearly contains less information than the second. %The first string clearly contains less information than the second.
It's much harder to describe \texttt{pDa3:7?j;F} than it is \texttt{AAAAAAAAAA}. %It's much harder to describe \texttt{pDa3:7?j;F} than it is \texttt{AAAAAAAAAA}.
Thus, we say that the first has low entropy, and the second has fairly high entropy. %Thus, we say that the first has low entropy, and the second has fairly high entropy.
%
%\vspace{2mm}
%
%The definition above is intentionally hand-wavy. \par
%Formal definitions of entropy exist, but we won't need them today---we just need
%an intuitive understanding of the \say{density} of information in a given string.
\vspace{2mm} %
%\footnotetext{
The definition above is intentionally hand-wavy. \par % American Standard Code for Information Interchange, an early character encoding for computers. \par
Formal definitions of entropy exist, but we won't need them today---we just need % It contains 128 symbols, including numbers, letters, and
an intuitive understanding of the \say{density} of information in a given string. % \texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
%}
\footnotetext{ %\vspace{5mm}
American Standard Code for Information Interchange, an early character encoding for computers. \par
It contains 128 symbols, including numbers, letters, and
\texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
}
\vspace{5mm}
\problem{}<runlenone> \problem{}<runlenone>
Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} as binary blob. \par Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} in binary. \par
\note[Note]{ \note[Note]{
We're still using the four-symbol alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. \par We're still using the four-symbol alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. \par
Dots ($\cdot$) in the string are drawn for readability. Ignore them. Dots ($\cdot$) in the string are drawn for readability. Ignore them.
@ -48,12 +48,13 @@ Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AA
\vfill \vfill
In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low entropy. In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low \textit{entropy}. \par
We can leverage this fact to develop efficient encoding schemes. They have predictable patterns, sequences of symbols that don't contain a lot of information. \par
We can exploit this fact to develop efficient encoding schemes.
\example{} \example{}
The simplest such coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string A simple example of such a coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
in their binary form, we'll add a \textit{count} to each letter, compressing repeated sequences of the same symbol. in their binary form, we'll add a \textit{count} to each letter, shortening repeated instances of the same symbol.
\vspace{2mm} \vspace{2mm}
@ -86,16 +87,10 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
\end{tikzpicture} \end{tikzpicture}
\end{center} \end{center}
So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
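
\vspace{2mm}

Here is one more worked instance of the same block layout. It is only a sketch: we assume the two-bit symbol codes are \texttt{A} = \texttt{00}, \texttt{B} = \texttt{01}, \texttt{C} = \texttt{10}, \texttt{D} = \texttt{11}, which agrees with the \texttt{BBB} example but is otherwise our guess. \par
\begin{itemize}
	\item \texttt{AAAAA} (a run of five) becomes \texttt{[0101-00]}, since five is \texttt{0101} in binary.
	\item \texttt{CC} (a run of two) becomes \texttt{[0010-10]}.
\end{itemize}
Under these assumptions, \texttt{AAAAACC} is encoded as \texttt{[0101-00 0010-10]}. This takes 12 bits, compared to $7 \times 2 = 14$ bits with the na\"ive scheme. \par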
\note[Notation]{Just like spaces, dashes in a binary blob are added for readability.} \note[Notation]{
Just like dots, dashes and spaces are added for readability. \par
Encoded binary sequences will always be written in square brackets: \texttt{[]}.
\remark{Notation} }
In this handout, encoded binary blobs will always be written in square brackets. \par
Ignore spaces and dashes, they are provided for convenience. \par
For example, the binary sequences \texttt{[000 011 100 001 010 100]} and \texttt{[000011100001010100]} \par
are identical. The first, however, is easier to read.
\pagebreak
\problem{} \problem{}
Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
@ -107,6 +102,15 @@ Is this more or less efficient than \ref{runlenone}?
\end{solution} \end{solution}
\vfill \vfill
\pagebreak
\problem{} \problem{}
@ -137,7 +141,7 @@ Fix this problem: modify the scheme so that single occurrences of symbols do not
Consider the following string: \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD}. \par Consider the following string: \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD}. \par
\begin{itemize} \begin{itemize}
\item How many bits do we need to encode this na\"ively? \par \item How many bits do we need to encode this na\"ively? \par
\item How about with the (unmodified) run-length scheme described above? \item How about with the (unmodified) run-length scheme described on the previous page?
\end{itemize} \end{itemize}
\hint{You don't need to encode this string---just find the length of its encoded form.} \hint{You don't need to encode this string---just find the length of its encoded form.}


@ -1,6 +1,6 @@
\section{LZ Codes} \section{LZ Codes}
The LZ-family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) take advantage of repeated sequences of symbols The LZ-family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) take advantage of repeated subsequences
in a string. They are the basis of most modern compression algorithms, including DEFLATE, which is used in the ZIP, PNG, in a string. They are the basis of most modern compression algorithms, including DEFLATE, which is used in the ZIP, PNG,
and GZIP formats. and GZIP formats.
@ -21,10 +21,10 @@ Pointers take the form \texttt{<pos, len>}, where \texttt{pos} is the position o
For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par
The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}), and copy the next 4 symbols. \par The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}), and copy the next 4 symbols. \par
Note that pointers refer to the partially decoded output---\textit{not} to the encoded string. \par Note that pointers refer to the partially decoded output---\textit{not} to the encoded string. \par
This allows pointers to reference other pointers, and ensures codes like \texttt{A<1,9>} are valid. This allows pointers to reference other pointers, and ensures that codes like \texttt{A<1,9>} are valid.
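
\vspace{2mm}

To see how a pointer can overlap the symbols it is producing, here is a step-by-step decoding of \texttt{[AB<2,5>]}. This small example is ours, chosen only for illustration. The decoder first writes \texttt{AB}, then copies five symbols one at a time, each read from two positions behind the end of the current output: \par
\begin{itemize}
	\item Output is \texttt{AB}. Copy \texttt{A}, giving \texttt{ABA}.
	\item Output is \texttt{ABA}. Copy \texttt{B}, giving \texttt{ABAB}.
	\item Output is \texttt{ABAB}. Copy \texttt{A}, giving \texttt{ABABA}.
	\item Output is \texttt{ABABA}. Copy \texttt{B}, giving \texttt{ABABAB}.
	\item Output is \texttt{ABABAB}. Copy \texttt{A}, giving \texttt{ABABABA}.
\end{itemize}
The result is \texttt{ABABABA}. The last three steps read symbols that this same pointer wrote, which only works because pointers refer to the decoded output. \par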
\problem{} \problem{}
Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using LZ. Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using this scheme. \par
Then, decode the following: Then, decode the following:
\begin{itemize} \begin{itemize}
\item \texttt{[ABCD<4,4>]} \item \texttt{[ABCD<4,4>]}
@ -39,7 +39,7 @@ Then, decode the following:
\linehack{} \linehack{}
In parts two and three, remember that we're reading the \textit{output string.} \par In parts two and three, remember that we're reading the \textit{output string.} \par
The nine \texttt{A}s in part two are produced one by one, \par The ten \texttt{A}s in part two are produced one by one, \par
with the decoder's \say{read head} following its \say{write head.} with the decoder's \say{read head} following its \say{write head.}
\begin{itemize} \begin{itemize}
@ -58,98 +58,114 @@ Convince yourself that LZ is a generalization of the run-length code we discusse
\remark{} \remark{}
Note that we left a few things out of this section: we didn't discuss the algorithm that converts a string to an LZ-encoded blob, Note that we left a few things out of this section: we didn't discuss the algorithm that converts a string to an LZ-encoded blob,
nor did we discuss how we should represent strings encoded with LZ in binary. We skipped these details because they are nor did we discuss how we should represent strings encoded with LZ in binary. We skipped these details because they are
problems of implementation---they're the engineer's headache, not the mathematician's. If you're interested, a brief explanation is below. problems of implementation---they're the engineer's headache, not the mathematician's. \par
Ask an instructor to explain.
\begin{center}
\begin{tikzpicture}
\node[anchor=west,color=gray] at (-2.3, 0) {Bits};
\node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
\draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
\draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
\node at (0, 0) {\texttt{0}};
\node at (1, 0) {\texttt{0}};
\node at (2, 0) {\texttt{1}};
\node at (3, 0) {\texttt{0}};
\node at (4, 0) {\texttt{1}};
\node at (5, 0) {\texttt{1}};
\node at (6, 0) {\texttt{0}};
\node at (7, 0) {\texttt{0}};
\node at (8, 0) {\texttt{1}};
\draw (-0.5, 0.25) -- (8.5, 0.25);
\draw (-0.5, -0.25) -- (8.5, -0.25);
\draw (-0.5, -0.75) -- (8.5, -0.75);
\draw (-0.5, 0.25) -- (-0.5, -0.75);
\draw (0.5, 0.25) -- (0.5, -0.75);
\draw (8.5, 0.25) -- (8.5, -0.75);
\node at (0, -0.5) {flag};
\node at (4.5, -0.5) {if flag \texttt{<pos, len>}, else eight-bit symbol};
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}
% Text tape
\node[color=gray] at (-0.75, 0) {\texttt{...}};
\node[color=gray] at (0.0, 0) {\texttt{D}};
\node at (0.5, 0) {\texttt{A}};
\node at (1.0, 0) {\texttt{B}};
\node at (1.5, 0) {\texttt{C}};
\node at (2.0, 0) {\texttt{D}};
\node at (2.5, 0) {\texttt{A}};
\node at (3.0, 0) {\texttt{B}};
\node at (3.5, 0) {\texttt{C}};
\node at (4.0, 0) {\texttt{D}};
\node[color=gray] at (4.5, 0) {\texttt{B}};
\node[color=gray] at (5.0, 0) {\texttt{D}};
\node[color=gray] at (5.5, 0) {\texttt{A}};
\node[color=gray] at (6.0, 0) {\texttt{C}};
\node[color=gray] at (6.75, 0) {\texttt{...}};
\draw (-1.75, 0.25) -- (7.25, 0.25);
\draw (-1.75, -0.25) -- (7.25, -0.25);
\draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
\draw[line width = 0.7mm, color=oblue]
(-1.25, 0.5)
-- (4.25, 0.5)
-- (4.25, -0.5)
-- (-1.25, -0.5)
-- cycle
;
\draw
(4.2, -0.625)
-- (4.2, -0.75)
to node[anchor=north, midway] {lookahead} (2.3, -0.75)
-- (2.3, -0.625)
;
\draw
(2.2, -0.625)
-- (2.2, -0.75)
to node[anchor=north, midway] {search buffer} (-1.1, -0.75)
-- (-1.1, -0.625)
;
\draw[color=gray]
(2.2, 0.625)
-- (2.2, 0.75)
to node[anchor=south, midway] {match!} (0.3, 0.75)
-- (0.3, 0.625)
;
%\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
\node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
\end{tikzpicture}
\end{center}
\vfill
\pagebreak \pagebreak
%\begin{instructornote}
% A simple LZ-scheme can work as follows. We encode our string into a sequence of
% nine-bit blocks, drawn below. The first bit of each block tells us whether or not
% this block is a pointer, and the next eight bits contain either a \texttt{pos, len} pair
% (using, say, four bits for each number) or a plain eight-bit symbol code.
% \begin{center}
% \begin{tikzpicture}
% \node[anchor=west,color=gray] at (-2.3, 0) {Bits};
% \node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
% \draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
% \draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
%
% \node at (0, 0) {\texttt{0}};
% \node at (1, 0) {\texttt{0}};
% \node at (2, 0) {\texttt{1}};
% \node at (3, 0) {\texttt{0}};
% \node at (4, 0) {\texttt{1}};
% \node at (5, 0) {\texttt{1}};
% \node at (6, 0) {\texttt{0}};
% \node at (7, 0) {\texttt{0}};
% \node at (8, 0) {\texttt{1}};
%
% \draw (-0.5, 0.25) -- (8.5, 0.25);
% \draw (-0.5, -0.25) -- (8.5, -0.25);
% \draw (-0.5, -0.75) -- (8.5, -0.75);
%
% \draw (-0.5, 0.25) -- (-0.5, -0.75);
% \draw (0.5, 0.25) -- (0.5, -0.75);
% \draw (8.5, 0.25) -- (8.5, -0.75);
%
% \node at (0, -0.5) {flag};
% \node at (4.5, -0.5) {if flag \texttt{<pos, len>}, else eight-bit symbol};
% \end{tikzpicture}
% \end{center}
%
% To encode a string, we read it using a \say{window}, shown below. This window consists of
% a search buffer and a lookahead buffer, both of which have a fixed (but configurable) size.
% This window passes over the string one character at a time, inserting a pointer if it finds
% the lookahead buffer inside its search buffer, and a plain character otherwise.
%
%
% \begin{center}
% \begin{tikzpicture}
% % Text tape
% \node[color=gray] at (-0.75, 0) {\texttt{...}};
% \node[color=gray] at (0.0, 0) {\texttt{D}};
% \node at (0.5, 0) {\texttt{A}};
% \node at (1.0, 0) {\texttt{B}};
% \node at (1.5, 0) {\texttt{C}};
% \node at (2.0, 0) {\texttt{D}};
% \node at (2.5, 0) {\texttt{A}};
% \node at (3.0, 0) {\texttt{B}};
% \node at (3.5, 0) {\texttt{C}};
% \node at (4.0, 0) {\texttt{D}};
% \node[color=gray] at (4.5, 0) {\texttt{B}};
% \node[color=gray] at (5.0, 0) {\texttt{D}};
% \node[color=gray] at (5.5, 0) {\texttt{A}};
% \node[color=gray] at (6.0, 0) {\texttt{C}};
% \node[color=gray] at (6.75, 0) {\texttt{...}};
%
% \draw (-1.75, 0.25) -- (7.25, 0.25);
% \draw (-1.75, -0.25) -- (7.25, -0.25);
%
%
% \draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
% \draw[line width = 0.7mm, color=oblue]
% (-1.25, 0.5)
% -- (4.25, 0.5)
% -- (4.25, -0.5)
% -- (-1.25, -0.5)
% -- cycle
% ;
%
% \draw
% (4.2, -0.625)
% -- (4.2, -0.75)
% to node[anchor=north, midway] {lookahead} (2.3, -0.75)
% -- (2.3, -0.625)
% ;
%
% \draw
% (2.2, -0.625)
% -- (2.2, -0.75)
% to node[anchor=north, midway] {search buffer} (-1.1, -0.75)
% -- (-1.1, -0.625)
% ;
%
% \draw[color=gray]
% (2.2, 0.625)
% -- (2.2, 0.75)
% to node[anchor=south, midway] {match!} (0.3, 0.75)
% -- (0.3, 0.625)
% ;
%
% %\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
% \node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
% \end{tikzpicture}
% \end{center}
%
% This is not the exact process used in practice---but it's close enough. \par
% This process may be tweaked in any number of ways.
%\end{instructornote}
%
%\makeatletter\if@solutions
% \vfill
% \pagebreak
%\fi\makeatother


@ -3,7 +3,7 @@
\example{} \example{}
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, by mapping... With a na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
\begin{itemize} \begin{itemize}
\item $\texttt{A}$ to $\texttt{000}$ \item $\texttt{A}$ to $\texttt{000}$
\item $\texttt{B}$ to $\texttt{001}$ \item $\texttt{B}$ to $\texttt{001}$
@ -12,12 +12,12 @@ With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits,
\item $\texttt{E}$ to $\texttt{100}$ \item $\texttt{E}$ to $\texttt{100}$
\end{itemize} \end{itemize}
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
To encoding strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
need an average of three bits per symbol. need an average of three bits per symbol.
\vspace{2mm} \vspace{2mm}
One could argue that this coding scheme is wasteful: \par However, one could argue that this coding scheme is wasteful: \par
we're not using three of the eight possible three-bit sequences! we're not using three of the eight possible three-bit sequences!
\example{} \example{}
@ -86,9 +86,8 @@ Is this a good way to encode five-letter strings?
\remark{} \remark{}
The code from the previous page can be visualized as a tree which we traverse while decoding our sequence. The code from the previous page can be visualized as a full binary tree: \par
Starting from the topmost node, we take the left edge if we see a \texttt{0} and the right edge if we see a \texttt{1}. \note{Every node in a \textit{full binary tree} has either zero or two children.}
Once we reach a letter, we return to the top node and repeat the process.
\vspace{-5mm} \vspace{-5mm}
\null\hfill \null\hfill
@ -135,10 +134,19 @@ Once we reach a letter, we return to the top node and repeat the process.
\end{center} \end{center}
\end{minipage} \end{minipage}
\hfill\null \hfill\null
You can think of each symbol's code as its \say{address} in this tree.
When decoding a string, we start at the topmost node. Reading the binary sequence
bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
and a right edge if we see a \texttt{1}.
Once we reach a letter, we return to the top node and repeat the process.
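
\vspace{2mm}

As a short illustration of this walk, assume the tree assigns \texttt{A} to \texttt{00}, \texttt{B} to \texttt{01}, \texttt{C} to \texttt{10}, \texttt{D} to \texttt{110}, and \texttt{E} to \texttt{111}. These code words are consistent with the encoded strings elsewhere in this section, but treat them as an assumption if your tree differs. Decoding \texttt{[1011100]} then goes as follows: \par
\begin{itemize}
	\item Read \texttt{1}, go right. Read \texttt{0}, go left. We reach \texttt{C} and return to the top.
	\item Read \texttt{1}, \texttt{1}, \texttt{1}, going right three times. We reach \texttt{E} and return to the top.
	\item Read \texttt{0}, \texttt{0}, going left twice. We reach \texttt{A}.
\end{itemize}
The decoded string is \texttt{CEA}. Note that we never needed a separator between code words. \par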
\definition{}
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
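For instance (an example of our own), the code that maps \texttt{A} to \texttt{0}, \texttt{B} to \texttt{01}, and \texttt{C} to \texttt{1} is \textit{not} prefix-free: the code word for \texttt{A} is a prefix of the code word for \texttt{B}. With this code, \texttt{[01]} could be read as \texttt{B} or as \texttt{AC}, so a decoder cannot tell where a symbol ends. Mapping \texttt{A} to \texttt{00}, \texttt{B} to \texttt{01}, and \texttt{C} to \texttt{1} avoids the problem. \par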
\problem{}
Convince yourself that trees like the one above always produce a prefix-free code.
\problem{}<treedecode> \problem{}<treedecode>
Decode \texttt{[110111001001110110]} using the tree above. Decode \texttt{[110111001001110110]} using the tree above.
@ -149,6 +157,18 @@ Decode \texttt{[110111001001110110]} using the tree above.
\vfill \vfill
\problem{}
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?
\begin{solution}
This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
\end{solution}
\vfill
\pagebreak
\problem{} \problem{}
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.} \note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
@ -236,13 +256,19 @@ Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} ef
\vfill \vfill
\remark{} \remark{}
We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par As we just saw, constructing a prefix-free code is fairly easy. \par
As we've seen, it is fairly easy to construct a prefix-free variable-length code using a binary tree. \par
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
We'll spend the rest of this section solving this problem.
\pagebreak \pagebreak
\remark{} \remark{}
Let's restate our problem. \par Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
@ -270,16 +296,13 @@ Where...
\vspace{2mm} \vspace{2mm}
Also, notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems. Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
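
\vspace{2mm}

For a concrete instance, using numbers that appear elsewhere in this section: the tree code spends 18 bits on the seven-symbol string \texttt{DEACBDD}. Assuming $f$ records how often each symbol occurs in this particular string, the average works out to \par
\begin{equation*}
	\frac{18}{7} \approx 2.57
\end{equation*}
bits per symbol, compared to 3 bits per symbol for the na\"ive scheme. \par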
\problem{}<hufptone> \problem{}<hufptone>
Let $f$ be a fixed frequency function over an alphabet $A$. \par Let $f$ be a fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\vspace{2mm}
Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\begin{equation*} \begin{equation*}
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr) \mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{equation*} \end{equation*}
@ -300,8 +323,8 @@ Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\pagebreak \pagebreak
\problem{}<hufpttwo> \problem{}<hufpttwo>
Show that is an optimal tree in which the two symbols with the lowest frequencies have the same parent. Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
\hint{You may assume that an optimal tree exists. Check three nontrivial cases.} \hint{You may assume that an optimal tree exists. There are a few cases.}
\begin{solution} \begin{solution}
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
@ -356,7 +379,7 @@ Then, use the previous two problems to show that your algorithm indeed produces
\vspace{2mm} \vspace{2mm}
In plain English: pick the two nodes with the smallest frequency, combine them, In plain English: pick the two nodes with the smallest frequency, combine them,
and add that into the alphabet as a \say{compound symbol}. Repeat until you're done. and replace them with a \say{compound symbol}. Repeat until you're done.
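
\vspace{2mm}

A small worked run of this procedure may help. The frequencies below are made up for illustration: \texttt{A} occurs 5 times, \texttt{B} occurs 2 times, and \texttt{C} and \texttt{D} occur once each. \par
\begin{itemize}
	\item Combine \texttt{C} and \texttt{D} into a compound symbol \texttt{CD} with frequency $1 + 1 = 2$. We now have \texttt{A}: 5, \texttt{B}: 2, \texttt{CD}: 2.
	\item Combine \texttt{B} and \texttt{CD} into \texttt{BCD} with frequency $2 + 2 = 4$. We now have \texttt{A}: 5, \texttt{BCD}: 4.
	\item Combine \texttt{A} and \texttt{BCD}. Only one node is left, so we stop.
\end{itemize}
In the resulting tree, \texttt{A} sits at depth 1, \texttt{B} at depth 2, and \texttt{C} and \texttt{D} at depth 3. A message with these counts therefore needs
\begin{equation*}
	5 \times 1 + 2 \times 2 + 1 \times 3 + 1 \times 3 = 15
\end{equation*}
bits, compared to $9 \times 2 = 18$ bits with a na\"ive two-bit code. \par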
\linehack{} \linehack{}