Added huffman problems
parent
b37af6cc27
commit
d8698b4c81
@ -8,7 +8,7 @@
|
|||||||
\usepackage{../../resources/macros}
|
\usepackage{../../resources/macros}
|
||||||
|
|
||||||
\input{tikzset.tex}
|
\input{tikzset.tex}
|
||||||
|
\usepackage{units}
|
||||||
|
|
||||||
\uptitlel{Advanced 2}
|
\uptitlel{Advanced 2}
|
||||||
\uptitler{\smallurl{}}
|
\uptitler{\smallurl{}}
|
||||||
|
@ -241,4 +241,161 @@ As we've seen, it is fairly easy to construct a prefix-free variable-length code
|
|||||||
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
|
Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
|
||||||
We'll spend the rest of this section solving this problem.
|
We'll spend the rest of this section solving this problem.
|
||||||
|
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
|
\remark{}
|
||||||
|
Let's restate our problem. \par
|
||||||
|
Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
|
||||||
|
|
||||||
|
\begin{equation*}
|
||||||
|
\mathcal{B}_f(T) = \sum_{a \in A} f(a) \times d_T(a)
|
||||||
|
\end{equation*}
|
||||||
|
|
||||||
|
Where...
|
||||||
|
\begin{itemize}[itemsep=1mm]
|
||||||
|
\item $a$ is a symbol in $A$
|
||||||
|
|
||||||
|
\item $d_T(a)$ is the \say{depth} of $a$ in our tree. \par
|
||||||
|
\note{In other words, $d_T(a)$ is the number of bits we need to encode $a$}
|
||||||
|
|
||||||
|
\item $f$ is a frequency function that maps each symbol $a$ in $A$ to a value $f(a)$ in $[0, 1]$. \par
|
||||||
|
You can think of this as the distribution of symbols in messages we expect to encode. \par
|
||||||
|
For example, consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}\}$:
|
||||||
|
\begin{itemize}
|
||||||
|
\item In $\texttt{AAA}$, $f(\texttt{A}) = 1$ and $f(\texttt{B}) = f(\texttt{C}) = 0$.
|
||||||
|
\item In $\texttt{ABC}$, $f(\texttt{A}) = f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{3}$.
|
||||||
|
\end{itemize}
|
||||||
|
\note{Note that $f(a) \geq 0$ and $\sum f(a) = 1$.}
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
\vspace{2mm}
|
||||||
|
|
||||||
|
Also, notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
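To make the definition concrete, here is a minimal Python sketch that computes $f$ from a message and evaluates $\mathcal{B}_f(T)$. It assumes the tree is described only by the depth of each symbol; the names \texttt{frequencies} and \texttt{average\_bits} are illustrative and do not come from this handout.

```python
from collections import Counter


def frequencies(message):
    """Return f as a dict: symbol -> relative frequency in `message`."""
    counts = Counter(message)
    total = len(message)
    return {symbol: count / total for symbol, count in counts.items()}


def average_bits(f, depth):
    """Compute B_f(T) = sum over a of f(a) * d_T(a).

    `depth` maps each symbol to its depth in the tree T.
    """
    return sum(f[a] * depth[a] for a in f)


# Example: the message AABC, with A at depth 1 and B, C at depth 2.
f = frequencies("AABC")
print(average_bits(f, {"A": 1, "B": 2, "C": 2}))  # prints 1.5
```

Here $f(\texttt{A}) = \nicefrac{1}{2}$ and $f(\texttt{B}) = f(\texttt{C}) = \nicefrac{1}{4}$, so the sum is $\nicefrac{1}{2} + \nicefrac{1}{2} + \nicefrac{1}{2} = 1.5$ bits per symbol.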
|
||||||
|
|
||||||
|
|
||||||
|
\problem{}<hufptone>
|
||||||
|
Let $f$ be a fixed frequency function over an alphabet $A$. \par
|
||||||
|
Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
|
||||||
|
|
||||||
|
\vspace{2mm}
|
||||||
|
|
||||||
|
Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
|
||||||
|
\begin{equation*}
|
||||||
|
\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
|
||||||
|
\end{equation*}
|
||||||
|
|
||||||
|
\begin{solution}
|
||||||
|
$\mathcal{B}_f(T)$ and $\mathcal{B}_f(T')$ are nearly identical: the sums differ only in the terms for $a$ and $b$, whose depths are exchanged.
|
||||||
|
So, we get...
|
||||||
|
|
||||||
|
\begin{align*}
|
||||||
|
\mathcal{B}_f(T) - \mathcal{B}_f(T')
|
||||||
|
&= f(a)d_T(a) + f(b)d_T(b) - f(a)d_T(b) - f(b)d_T(a) \\
|
||||||
|
&= f(a)\bigl(d_T(a) - d_T(b)\bigr) + f(b)\bigl(d_T(b) - d_T(a)\bigr) \\
|
||||||
|
&= \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
|
||||||
|
\end{align*}
|
||||||
|
\end{solution}
|
||||||
|
|
||||||
|
\vfill
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
|
\problem{}<hufpttwo>
|
||||||
|
Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
|
||||||
|
\hint{You may assume that an optimal tree exists. Check three nontrivial cases.}
|
||||||
|
|
||||||
|
\begin{solution}
|
||||||
|
Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
|
||||||
|
If there is a tie among three or more symbols, pick $a, b$ to be those with the greatest depth. \par
|
||||||
|
Label $a$ and $b$ so that $d_T(a) \geq d_T(b)$.
|
||||||
|
|
||||||
|
\vspace{1mm}
|
||||||
|
|
||||||
|
If $a$ and $b$ share a parent, we're done.
|
||||||
|
If $a$ and $b$ do not share a parent, we have three cases:
|
||||||
|
\begin{itemize}[itemsep=1mm]
|
||||||
|
\item There is a node $x$ with $d_T(x) > d_T(a)$. \par
|
||||||
|
Create $T'$ by swapping $a$ and $x$. By our choice of $a$ and $b$ (including the tie-breaking rule), $f(a) < f(x)$, and thus
|
||||||
|
by \ref{hufptone} $\mathcal{B}_f(T) > \mathcal{B}_f(T')$. This is a contradiction,
|
||||||
|
since we chose $T$ as an optimal tree---so this case is impossible.
|
||||||
|
|
||||||
|
\item $a$ is an only child. Create $T'$ by removing $a$'s parent and replacing it with $a$. \par
|
||||||
|
Then $\mathcal{B}_f(T) > \mathcal{B}_f(T')$, same contradiction as above. \par
|
||||||
|
\note{If we assume $T$ is a full binary tree, this case doesn't exist.}
|
||||||
|
|
||||||
|
\item $a$ has a sibling $x$, and $x$ isn't $b$. \par
|
||||||
|
Let $T'$ be the tree created by swapping $x$ and $b$ (thus making $a$ and $b$ siblings). \par
|
||||||
|
By \ref{hufptone}, $\mathcal{B}_f(T) \geq \mathcal{B}_f(T')$. $T$ is optimal, so there cannot
|
||||||
|
be a tree with a better average length---thus $\mathcal{B}_f(T) = \mathcal{B}_f(T')$ and $T'$
|
||||||
|
is also optimal.
|
||||||
|
\end{itemize}
|
||||||
|
\end{solution}
|
||||||
|
|
||||||
|
\vfill
|
||||||
|
\pagebreak
|
||||||
|
|
||||||
|
\problem{}
|
||||||
|
Devise an algorithm that builds an optimal tree given an alphabet $A$ and a frequency function $f$. \par
|
||||||
|
Then, use the previous two problems to show that your algorithm indeed produces an optimal tree. \par
|
||||||
|
\hint{
|
||||||
|
First, make an algorithm that makes sense intuitively. \par
|
||||||
|
Once you have something that looks good, start your proof.
|
||||||
|
} \par
|
||||||
|
\hint{Build from the bottom.}
|
||||||
|
|
||||||
|
\begin{solution}
|
||||||
|
\textbf{The Algorithm:} \par
|
||||||
|
Given an alphabet $A$ and a frequency function $f$...
|
||||||
|
\begin{itemize}
|
||||||
|
\item If $|A| = 1$, return a single node.
|
||||||
|
\item Let $a, b$ be two symbols with the smallest frequency.
|
||||||
|
\item Let $A' = A - \{a, b\} + \{x\}$ \tab \note{(Where $x$ is a new \say{placeholder} symbol)}
|
||||||
|
\item Let $f'(x) = f(a) + f(b)$, and $f'(s) = f(s)$ for all other symbols $s$.
|
||||||
|
\item Compute $T'$ by repeating this algorithm on $A'$ and $f'$
|
||||||
|
\item Create $T$ from $T'$ by adding $a$ and $b$ as children of $x$.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
\vspace{2mm}
|
||||||
|
In plain English: pick the two nodes with the smallest frequency, combine them,
|
||||||
|
and add that into the alphabet as a \say{compound symbol}. Repeat until you're done.
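The procedure above can be sketched in Python. This is only one possible rendering: it runs the merges in a loop instead of recursing, keeps the alphabet in a heap, and returns the depth of each symbol and not an explicit tree. The name \texttt{huffman\_depths}, the dictionary input, and the tie-breaking counter are assumptions made for this illustration.

```python
import heapq
from itertools import count


def huffman_depths(f):
    """Return {symbol: depth} for a tree built by repeatedly merging
    the two entries with the smallest frequency.

    `f` maps each symbol to its frequency.
    """
    depth = {symbol: 0 for symbol in f}
    # The counter makes every heap entry unique, so ties in frequency
    # never fall through to comparing the symbol lists.
    tiebreak = count()
    heap = [(freq, next(tiebreak), [symbol]) for symbol, freq in f.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        freq_a, _, group_a = heapq.heappop(heap)
        freq_b, _, group_b = heapq.heappop(heap)
        merged = group_a + group_b
        # Every symbol under the new compound node moves one level down.
        for symbol in merged:
            depth[symbol] += 1
        # The compound symbol re-enters the alphabet with f'(x) = f(a) + f(b).
        heapq.heappush(heap, (freq_a + freq_b, next(tiebreak), merged))

    return depth


print(huffman_depths({"A": 0.5, "B": 0.25, "C": 0.25}))
# prints {'A': 1, 'B': 2, 'C': 2}
```

With a single symbol the loop never runs and that symbol stays at depth $0$, matching the first step of the algorithm.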
|
||||||
|
|
||||||
|
|
||||||
|
\linehack{}
|
||||||
|
\textbf{The Proof:} \par
|
||||||
|
We'll proceed by induction on $|A|$. \par
|
||||||
|
Let $f$ be an arbitrary frequency function.
|
||||||
|
|
||||||
|
\vspace{4mm}
|
||||||
|
|
||||||
|
\textbf{Base case:} $|A| = 1$. We only have one vertex, and we thus only have one tree. \par
|
||||||
|
The algorithm above produces this tree. Done.
|
||||||
|
|
||||||
|
\vspace{4mm}
|
||||||
|
|
||||||
|
\textbf{Induction:} Let $|A| = n$, and assume that for every alphabet of size $n - 1$ (with any frequency function), the algorithm above produces an optimal tree.
|
||||||
|
First, we'll show that $\mathcal{B}_f(T) = \mathcal{B}_{f'}(T') + f(a) + f(b)$:
|
||||||
|
\begin{align*}
|
||||||
|
\mathcal{B}_f(T)
|
||||||
|
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + f(a)d_T(a) + f(b)d_T(b) \\
|
||||||
|
&= \sum_{s \in A - \{a, b\}} \Bigl(f(s)d_T(s)\Bigr) + \Bigl(f(a)+f(b)\Bigr)\Bigl(d_{T'}(x) + 1\Bigr) \\
|
||||||
|
&= \sum_{s \in A - \{a, b\}} \Bigl(f'(s)d_{T'}(s)\Bigr) + f'(x)d_{T'}(x) + f(a) + f(b) \\
|
||||||
|
&= \sum_{s \in A'} \Bigl(f'(s)d_{T'}(s)\Bigr) + f(a) + f(b) \\
|
||||||
|
&= \mathcal{B}_{f'}(T') + f(a) + f(b)
|
||||||
|
\end{align*}
|
||||||
|
|
||||||
|
Now, assume that $T$ is not optimal. There then exists an optimal tree $U$ with $a$ and $b$ as siblings (by \ref{hufpttwo}).
|
||||||
|
Let $U'$ be the tree created by removing $a, b$ from $U$ and labeling their former parent with the placeholder $x$. $U'$ is a tree for $A'$ and $f'$, so we can repeat the calculation
|
||||||
|
above to find that $\mathcal{B}_f(U) = \mathcal{B}_{f'}(U') + f(a) + f(b)$.
|
||||||
|
|
||||||
|
\vspace{2mm}
|
||||||
|
|
||||||
|
So, $
|
||||||
|
\mathcal{B}_{f'}(T')
|
||||||
|
~=~ \mathcal{B}_f(T) - f(a) - f(b)
|
||||||
|
~>~ \mathcal{B}_f(U) - f(a) - f(b)
|
||||||
|
~=~ \mathcal{B}_{f'}(U')
|
||||||
|
$. \par
|
||||||
|
Since $T'$ is optimal for $A'$ and $f'$, this is a contradiction. $T$ must therefore be optimal.
|
||||||
|
\end{solution}
|
||||||
|
|
||||||
|
\vfill
|
||||||
\pagebreak
|
\pagebreak
|