Polish
@ -5,37 +5,37 @@
\section{Run-length Coding}
\definition{}
\textit{Entropy} is a measure of information in a certain sequence. \par
A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
\begin{itemize}
\item \texttt{AAAAAAAAAA}
\item \texttt{pDa3:7?j;F}
\end{itemize}
The first string clearly contains less information than the second.
It's much harder to describe \texttt{pDa3:7?j;F} than it is to describe \texttt{AAAAAAAAAA}.
Thus, we say that the first has low entropy, and the second has fairly high entropy.
%\definition{}
%\textit{Entropy} is a measure of information in a certain sequence. \par
%A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
%For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
%\begin{itemize}
% \item \texttt{AAAAAAAAAA}
% \item \texttt{pDa3:7?j;F}
%\end{itemize}
%The first string clearly contains less information than the second.
%It's much harder to describe \texttt{pDa3:7?j;F} than it is to describe \texttt{AAAAAAAAAA}.
%Thus, we say that the first has low entropy, and the second has fairly high entropy.
%
%\vspace{2mm}
%
%The definition above is intentionally hand-wavy. \par
%Formal definitions of entropy exist, but we won't need them today---we just need
%an intuitive understanding of the \say{density} of information in a given string.
\vspace{2mm}
The definition above is intentionally hand-wavy. \par
Formal definitions of entropy exist, but we won't need them today---we just need
an intuitive understanding of the \say{density} of information in a given string.
%
%\footnotetext{
% American Standard Code for Information Interchange, an early character encoding for computers. \par
% It contains 128 symbols, including numbers, letters, and
% \texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
%}
\footnotetext{
American Standard Code for Information Interchange, an early character encoding for computers. \par
It contains 128 symbols, including numbers, letters, and
\texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
}
\vspace{5mm}
%\vspace{5mm}
\problem{}<runlenone>
Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} as a binary blob. \par
Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} in binary. \par
\note[Note]{
We're still using the four-symbol alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. \par
Dots ($\cdot$) in the string are drawn for readability. Ignore them.
@ -48,12 +48,13 @@ Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AA
\vfill
In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low entropy.
We can leverage this fact to develop efficient encoding schemes.
In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low \textit{entropy}. \par
They have predictable patterns, sequences of symbols that don't contain a lot of information. \par
We can exploit this fact to develop efficient encoding schemes.
\example{}
The simplest such coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
in their binary form, we'll add a \textit{count} to each letter, compressing repeated sequences of the same symbol.
A simple example of such a coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
in their binary form, we'll add a \textit{count} to each letter, shortening repeated instances of the same symbol.
\vspace{2mm}
@ -86,16 +87,10 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
\end{tikzpicture}
\end{center}
So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
\note[Notation]{Just like spaces, dashes in a binary blob are added for readability.}
\remark{Notation}
In this handout, encoded binary blobs will always be written in square brackets. \par
Ignore spaces and dashes; they are provided for convenience. \par
For example, the binary sequences \texttt{[000 011 100 001 010 100]} and \texttt{[000011100001010100]} \par
are identical. The first, however, is easier to read.
\pagebreak
\note[Notation]{
Just like dots, dashes and spaces are added for readability. \par
Encoded binary sequences will always be written in square brackets (\texttt{[]}).
}
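
\vspace{2mm}

A quick cost comparison may help here. It is only a sketch, and it rests on two assumptions that are not spelled out above: \par
the na\"ive scheme spends two bits on each symbol of our four-symbol alphabet, and every run-length block \par
is six bits long (a four-bit count followed by a two-bit symbol, as \texttt{[0011-01]} suggests). \par
Under these assumptions, a run of $n$ identical symbols whose count fits in a single block costs $2n$ bits na\"ively, \par
but only $6$ bits as one run-length block. For \texttt{BBB} we have $n = 3$, so
\[
    2n = 2 \cdot 3 = 6,
\]
and the two encodings have the same length.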
\problem{}
Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
@ -107,6 +102,15 @@ Is this more or less efficient than \ref{runlenone}?
\end{solution}
\vfill
\pagebreak
\problem{}
@ -137,7 +141,7 @@ Fix this problem: modify the scheme so that single occurrences of symbols do not
Consider the following string: \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD}. \par
\begin{itemize}
\item How many bits do we need to encode this na\"ively? \par
\item How about with the (unmodified) run-length scheme described above?
\item How about with the (unmodified) run-length scheme described on the previous page?
\end{itemize}
\hint{You don't need to encode this string---just find the length of its encoded form.}