2024-04-23 17:33:58 -07:00
parent d8698b4c81
commit 8269bf1135
4 changed files with 200 additions and 157 deletions


@ -3,7 +3,7 @@
\example{}
Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
-With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, by mapping...
+With a na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
\begin{itemize}
\item $\texttt{A}$ to $\texttt{000}$
\item $\texttt{B}$ to $\texttt{001}$
@ -12,12 +12,12 @@ With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits,
\item $\texttt{E}$ to $\texttt{100}$
\end{itemize}
For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
-To encoding strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
+To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
need an average of three bits per symbol.
\vspace{2mm}
-One could argue that this coding scheme is wasteful: \par
+However, one could argue that this coding scheme is wasteful: \par
we're not using three of the eight possible three-bit sequences!
\example{}
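A minimal Python sketch of the fixed-length scheme above, useful for checking the worked example. The names \texttt{NAIVE}, \texttt{encode\_naive}, and \texttt{unused\_words} are illustrative and do not come from the handout. The sketch reproduces the encoding of \texttt{ADEBCE` and lists the three-bit words that no symbol uses.

```python
# Fixed-length ("naive") code from the example: five symbols need
# 3 bits each, because 2**2 = 4 < 5 <= 8 = 2**3.
NAIVE = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}


def encode_naive(message):
    """Return the list of 3-bit code words, one per symbol of `message`."""
    return [NAIVE[symbol] for symbol in message]


def unused_words(codes, width=3):
    """Return, sorted, the width-bit sequences that no symbol is mapped to."""
    all_words = {format(i, f"0{width}b") for i in range(2 ** width)}
    return sorted(all_words - set(codes.values()))


if __name__ == "__main__":
    words = encode_naive("ADEBCE")
    print(" ".join(words))        # 000 011 100 001 010 100
    print(unused_words(NAIVE))    # ['101', '110', '111']
```

Each symbol always costs three bits, so the average is exactly three bits per symbol. The three unused words are the waste the text points out.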
@ -86,9 +86,8 @@ Is this a good way to encode five-letter strings?
\remark{}
-The code from the previous page can be visualized as a tree which we traverse while decoding our sequence.
-Starting from the topmost node, we take the left edge if we see a \texttt{0} and the right edge if we see a \texttt{1}.
-Once we reach a letter, we return to the top node and repeat the process.
+The code from the previous page can be visualized as a full binary tree: \par
+\note{Every node in a \textit{full binary tree} has either zero or two children.}
\vspace{-5mm}
\null\hfill
@ -135,10 +134,19 @@ Once we reach a letter, we return to the top node and repeat the process.
\end{center}
\end{minipage}
\hfill\null
+You can think of each symbol's code as its \say{address} in this tree.
+When decoding a string, we start at the topmost node. Reading the binary sequence
+bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
+and a right edge if we see a \texttt{1}.
+Once we reach a letter, we return to the top node and repeat the process.
+\definition{}
+We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
\problem{}
Convince yourself that trees like the one above always produce a prefix-free code.
\problem{}<treedecode>
Decode \texttt{[110111001001110110]} using the tree above.
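The decoding walk and the prefix-free property can be sketched in Python. This sketch rests on two assumptions. First, trees are nested tuples: a leaf is a one-character symbol and an internal node is a \texttt{(left, right)} pair. Second, \texttt{TREE} is a reconstruction of the handout's tree with codes \texttt{A=00}, \texttt{B=01}, \texttt{C=10}, \texttt{D=110}, \texttt{E=111}. All function names are hypothetical.

```python
# Assumed reconstruction of the tree above: A=00, B=01, C=10, D=110, E=111.
# A leaf is a symbol (str); an internal node is a (left, right) tuple.
TREE = (("A", "B"), ("C", ("D", "E")))


def decode(tree, bits):
    """Walk down the tree bit by bit; at each leaf, emit it and restart at the root."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # 0 -> left edge, 1 -> right edge
        if isinstance(node, str):                  # reached a letter
            out.append(node)
            node = tree                            # return to the top node
    if node is not tree:
        raise ValueError("bit string ended in the middle of a code word")
    return "".join(out)


def code_table(tree, prefix=""):
    """Each symbol's code word is its 'address': the path from the root to its leaf."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**code_table(left, prefix + "0"), **code_table(right, prefix + "1")}


def is_prefix_free(codes):
    """True if no code word is a prefix of the code word of a different symbol."""
    words = list(codes.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )
```

Symbols sit only at the leaves, so no leaf's address can continue past another leaf. That is why \texttt{is\_prefix\_free(code\_table(TREE))} holds for any tree built this way. The decoder never needs separators between code words.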
@ -149,6 +157,18 @@ Decode \texttt{[110111001001110110]} using the tree above.
\vfill
\problem{}
Encode \texttt{ABDECBE} using this tree. \par
How many bits do we save over a na\"ive scheme?
\begin{solution}
This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
\end{solution}
\vfill
\pagebreak
\problem{}
In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
\note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
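A short Python sketch of encoding with the same variable-length code and counting the savings over the three-bit scheme. The \texttt{CODES} table is assumed to match the tree above. \texttt{encode} and \texttt{bits\_saved} are illustrative names.

```python
# Code words assumed to match the tree above: A=00, B=01, C=10, D=110, E=111.
CODES = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def encode(message, codes):
    """Concatenate code words; a prefix-free code needs no separators."""
    return "".join(codes[symbol] for symbol in message)


def bits_saved(message, codes, naive_width=3):
    """Bits saved relative to a fixed-length code using naive_width bits per symbol."""
    return naive_width * len(message) - len(encode(message, codes))


if __name__ == "__main__":
    print(encode("ABDECBE", CODES))      # 00011101111001111
    print(bits_saved("ABDECBE", CODES))  # 4
    print(bits_saved("DEACBDD", CODES))  # 3
```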
@ -236,13 +256,19 @@ Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} ef
\vfill
\remark{}
-We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
-As we've seen, it is fairly easy to construct a prefix-free variable-length code using a binary tree. \par
+As we just saw, constructing a prefix-free code is fairly easy. \par
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
We'll spend the rest of this section solving this problem.
\pagebreak
\remark{}
Let's restate our problem. \par
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
@ -270,16 +296,13 @@ Where...
\vspace{2mm}
-Also, notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
+Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
\problem{}<hufptone>
Let $f$ be a fixed frequency function over an alphabet $A$. \par
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
\vspace{2mm}
-Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
+Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\begin{equation*}
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
\end{equation*}
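The swap identity can be checked numerically. The sketch below assumes that $\mathcal{B}_f(T) = \sum_{a \in A} f(a)\, d_T(a)$, where $d_T(a)$ is the depth of $a$ in $T$. The full definition is elided in this view, so treat this as an assumption. Swapping $a$ and $b$ exchanges their depths and leaves every other term unchanged. That gives $\mathcal{B}_f(T) - \mathcal{B}_f(T') = (f(a) - f(b))(d_T(a) - d_T(b))$.

The tree, the example frequencies (the letter frequencies of \texttt{DEACBDD}), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed tree (A=00, B=01, C=10, D=110, E=111) as nested (left, right) tuples.
TREE = (("A", "B"), ("C", ("D", "E")))

# Example f: relative letter frequencies in DEACBDD (exact fractions).
F = {"A": Fraction(1, 7), "B": Fraction(1, 7), "C": Fraction(1, 7),
     "D": Fraction(3, 7), "E": Fraction(1, 7)}


def depths(tree, depth=0):
    """Map each symbol a to d_T(a), its distance from the root."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**depths(left, depth + 1), **depths(right, depth + 1)}


def avg_bits(tree, f):
    """B_f(T): the sum of f(a) * d_T(a) over the alphabet."""
    d = depths(tree)
    return sum(f[a] * d[a] for a in f)


def swap(tree, a, b):
    """Return a copy of the tree with the leaves a and b exchanged."""
    if isinstance(tree, str):
        if tree == a:
            return b
        if tree == b:
            return a
        return tree
    left, right = tree
    return (swap(left, a, b), swap(right, a, b))


if __name__ == "__main__":
    print(avg_bits(TREE, F))  # 18/7, i.e. 18 bits for 7 symbols
    d = depths(TREE)
    for a, b in combinations(F, 2):
        lhs = avg_bits(TREE, F) - avg_bits(swap(TREE, a, b), F)
        rhs = (F[a] - F[b]) * (d[a] - d[b])
        assert lhs == rhs, (a, b)
```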
@ -300,8 +323,8 @@ Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
\pagebreak
\problem{}<hufpttwo>
-Show that is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
-\hint{You may assume that an optimal tree exists. Check three nontrivial cases.}
+Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
+\hint{You may assume that an optimal tree exists. There are a few cases.}
\begin{solution}
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequencies. \par
@ -356,7 +379,7 @@ Then, use the previous two problems to show that your algorithm indeed produces
\vspace{2mm}
In plain English: pick the two nodes with the smallest frequencies, combine them,
-and add that into the alphabet as a \say{compound symbol}. Repeat until you're done.
+and replace them with a \say{compound symbol}. Repeat until you're done.
\linehack{}
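The merge-the-two-smallest procedure described above is Huffman's algorithm. Below is a minimal Python sketch using the standard-library \texttt{heapq} module as the priority queue. It assumes the same nested-tuple tree representation as before (a leaf is a symbol; an internal node is a \texttt{(left, right)} pair). \texttt{huffman\_tree} and \texttt{depths} are illustrative names.

A counter breaks ties between equal frequencies. This also keeps the heap from ever comparing two subtrees. The choice is arbitrary: other tie-breaking rules can produce different trees with the same total cost.

```python
import heapq
from itertools import count


def huffman_tree(freq):
    """Repeatedly merge the two lowest-frequency nodes into a compound symbol.

    `freq` maps symbols to frequencies (counts or proportions). The result
    is a nested (left, right) tuple with the symbols at the leaves.
    """
    if not freq:
        raise ValueError("alphabet must be non-empty")
    tiebreak = count()  # unique ints, so ties never fall through to comparing subtrees
    heap = [(f, next(tiebreak), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest
        # Replace the pair with one compound symbol whose frequency is their sum.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    return heap[0][2]


def depths(tree, depth=0):
    """Map each symbol to its depth, i.e. the length of its code word."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**depths(left, depth + 1), **depths(right, depth + 1)}


if __name__ == "__main__":
    counts = {"A": 1, "B": 1, "C": 1, "D": 3, "E": 1}  # letters of DEACBDD
    tree = huffman_tree(counts)
    d = depths(tree)
    print(sum(counts[s] * d[s] for s in counts))  # 15
```

On the letter counts of \texttt{DEACBDD}, this tree needs 15 bits in total. The tree used earlier needs 18 bits for the same string.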