Mark 2024-04-24 15:33:33 -07:00
parent 8269bf1135
commit 0bfe54d69b
4 changed files with 37 additions and 9 deletions


@ -2,8 +2,7 @@
% use [solutions] flag to show solutions.
\documentclass[
solutions,
-singlenumbering,
-unfinished
+singlenumbering
]{../../resources/ormc_handout}
\usepackage{../../resources/macros}


@ -27,8 +27,9 @@ How many bits will we need? \par
\problem{}<naivelen>
-Similarly, we can use a na\"ive coding scheme to encode an $n$-symbol string over an alphabet of size $k$ \par
-using $n \times \lceil \log_2k \rceil$ bits. Convince yourself that this is true.
+Similarly, we can encode an $n$-symbol string over an alphabet of size $k$ \par
+using $n \times \lceil \log_2k \rceil$ bits. Show that this is true. \par
+\note[Note]{We'll call this the \textit{na\"ive coding scheme}.}
\vfill
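As a quick check of the $n \times \lceil \log_2 k \rceil$ bit count above, here is a minimal Python sketch. The function name `naive_bits` is hypothetical and not part of the handout, and it assumes every symbol gets a fixed-width binary code.

```python
def naive_bits(n: int, k: int) -> int:
    """Bits needed to store n symbols from an alphabet of size k
    when every symbol gets the same fixed-width binary code."""
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    # (k - 1).bit_length() equals ceil(log2(k)) for every integer k >= 1,
    # and avoids floating-point rounding in math.log2.
    bits_per_symbol = (k - 1).bit_length()
    return n * bits_per_symbol


if __name__ == "__main__":
    print(naive_bits(5, 4))  # 5 symbols, 2 bits each
    print(naive_bits(5, 5))  # 5 symbols, 3 bits each
```

Note that an alphabet of size 5 already costs 3 bits per symbol, since 2 bits can only distinguish 4 symbols.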


@ -49,8 +49,13 @@ Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AA
\vfill
In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low \textit{entropy}. \par
-They have predictable patterns, sequences of symbols that don't contain a lot of information. \par
-We can exploit this fact to develop efficient encoding schemes.
+That is, they have predictable patterns, sequences of symbols that don't contain a lot of information. \par
+\note{
+For example, consider the text in this document. \par
+The symbols \texttt{e}, \texttt{t}, and \texttt{<space>} are much more common than any others. \par
+Also, certain subsequences are repeated: \texttt{th}, \texttt{and}, \texttt{encode}, and so on.
+}
+We can exploit this fact to develop encoding schemes that need relatively few bits per letter.
\example{}
A simple example of such a coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
@ -88,10 +93,18 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
\end{center}
So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
\note[Notation]{
-Just like dots, dashes and spaces are added for readability. \par
+Just like dots, dashes and spaces are added for readability. Pretend they don't exist. \par
Encoded binary sequences will always be written in square brackets: \texttt{[]}.
}
+\problem{}
+Decode \texttt{[010000001111]} using this scheme.
+\begin{solution}
+\texttt{AAAADDD}
+\end{solution}
+\vfill
\problem{}
Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
Is this more or less efficient than \ref{runlenone}?
@ -109,12 +122,26 @@ Is this more or less efficient than \ref{runlenone}?
+\problem{}
+Give an example of a message on $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$
+that uses $n$ bits when encoded with a na\"ive scheme, and \textit{fewer} than $\nicefrac{n}{2}$ bits
+when encoded using the scheme described on the previous page.
+\vfill
+\problem{}
+Give an example of a message on $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$
+that uses $n$ bits when encoded with a na\"ive scheme, and \textit{more} than $2n$ bits
+when encoded using the scheme described on the previous page.
+\vfill
\problem{}
-Is run-length coding always efficient? When does it work well, and when does it fail?
+Is run-length coding always more efficient than na\"ive coding? \par
+When does it work well, and when does it fail?
\vfill
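The 6-bit run-length scheme used in the problems above can be sketched in Python as follows. This is an illustration under stated assumptions, not the handout's reference implementation: the function names are hypothetical, the codes for \texttt{A}, \texttt{B}, and \texttt{D} are inferred from the handout's examples, the code \texttt{10} for \texttt{C} is an assumption, and runs longer than 15 are assumed to be split across several blocks.

```python
# 2-bit symbol codes. A, B and D match the handout's examples;
# C = "10" is an assumption made for this sketch.
SYMBOLS = {"00": "A", "01": "B", "10": "C", "11": "D"}
CODES = {symbol: bits for bits, symbol in SYMBOLS.items()}


def rle_decode(bits: str) -> str:
    """Decode 6-bit blocks: a 4-bit run length followed by a 2-bit symbol."""
    # Brackets, dashes and spaces are only there for readability.
    bits = "".join(ch for ch in bits if ch in "01")
    if len(bits) % 6 != 0:
        raise ValueError("input is not a whole number of 6-bit blocks")
    out = []
    for i in range(0, len(bits), 6):
        count = int(bits[i:i + 4], 2)
        out.append(SYMBOLS[bits[i + 4:i + 6]] * count)
    return "".join(out)


def rle_encode(text: str) -> str:
    """Encode a string over {A, B, C, D}; runs longer than 15 are split
    (an assumption, since a 4-bit count cannot exceed 15)."""
    text = text.replace("·", "").replace(" ", "")
    blocks = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < 15:
            j += 1
        blocks.append(format(j - i, "04b") + CODES[text[i]])
        i = j
    return "".join(blocks)


if __name__ == "__main__":
    print(rle_decode("[0011-01]"))  # the handout's BBB example
    print(rle_encode("BBB"))
```

Every block costs 6 bits regardless of run length, so a string with no repeated neighbours pays 6 bits per symbol instead of the 2 bits the na\"ive scheme needs.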


@ -21,7 +21,8 @@ Pointers take the form \texttt{<pos, len>}, where \texttt{pos} is the position o
For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par
The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}), and copy the next 4 symbols. \par
Note that pointers refer to the partially decoded output---\textit{not} to the encoded string. \par
-This allows pointers to reference other pointers, and ensures that codes like \texttt{A<1,9>} are valid.
+This allows pointers to reference other pointers, and ensures that codes like \texttt{A<1,9>} are valid. \par
+\note{For example, \texttt{[B<1,2>]} decodes to \texttt{BBB}.}
\problem{}
Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using this scheme. \par
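The pointer rule described above (look back \texttt{pos} symbols in the partially decoded output, then copy \texttt{len} symbols) can be sketched in Python. The function name `lz_decode` and the regular expression used to find pointers are assumptions made for this illustration. Copying one symbol at a time is what lets a pointer read symbols that it has itself just produced.

```python
import re

# Matches either a pointer "<pos, len>" or any single literal symbol.
TOKEN = re.compile(r"<\s*(\d+)\s*,\s*(\d+)\s*>|(.)")


def lz_decode(code: str) -> str:
    """Decode a string of literals and <pos, len> back-references."""
    code = code.strip("[]")  # square brackets only mark encoded text
    out = []
    for match in TOKEN.finditer(code):
        if match.group(3) is not None:
            out.append(match.group(3))  # literal symbol
            continue
        pos, length = int(match.group(1)), int(match.group(2))
        if pos < 1 or pos > len(out):
            raise ValueError("pointer reaches before the start of the output")
        start = len(out) - pos
        # Copy symbol by symbol so that len may exceed pos:
        # the copy can overlap the output it is still producing.
        for k in range(length):
            out.append(out[start + k])
    return "".join(out)


if __name__ == "__main__":
    print(lz_decode("[ABRACAD<7, 4>]"))
    print(lz_decode("[B<1,2>]"))
```

With this rule, \texttt{A<1,9>} expands to ten copies of \texttt{A}, because each copied symbol immediately becomes available to the rest of the same pointer.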