\section{LZ Codes}
The LZ family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) takes advantage of repeated sequences of symbols in a string.
They are the basis of most modern general-purpose compression algorithms, including DEFLATE, which is used in the ZIP, PNG, and GZIP formats.
\footnotetext{
	Named after Abraham Lempel and Jacob Ziv, the original inventors. \par
	LZ77 is the algorithm described in their first paper on the topic, which was published in 1977. \par
	LZ78 (from their 1978 follow-up paper), LZSS, and LZMA are variations on the same general idea.
}
\vspace{2mm}
The idea behind LZ is to represent repeated substrings as \textit{pointers} to earlier parts of the string. \par
Pointers take the form \texttt{<pos, len>}, where \texttt{pos} is how many positions to look back to find the substring to repeat, and \texttt{len} is the number of symbols to copy.
\vspace{2mm}
For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par
The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}) and copy the next 4 symbols, \texttt{ABRA}. \par
Note that pointers refer to the partially decoded output, \textit{not} to the encoded string. \par
This lets a pointer copy symbols produced by earlier pointers, or even by itself, which is why codes like \texttt{A<1,9>} are valid.
\problem{}
Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using LZ.
Then, decode the following:
\begin{itemize}
	\item \texttt{[ABCD<4,4>]}
	\item \texttt{[A<1,9>]}
	\item \texttt{[DAC<3,5>]}
\end{itemize}
\begin{solution}
	One valid encoding of \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} is \texttt{[ABCD<4,4> BA<2,4> ABCD<4,4>]}. \par
	The final \texttt{ABCD<4,4>} could also be replaced by the single pointer \texttt{<14,8>}, which copies the first eight symbols.
	\linehack{}
	In parts two and three, remember that pointers read from the \textit{output string}. \par
	The nine \texttt{A}s copied in part two are produced one at a time,
	with the decoder's \say{read head} following one step behind its \say{write head.}
	\begin{itemize}
		\item \texttt{ABCD$\cdot$ABCD}
		\item \texttt{AAAAA$\cdot$AAAAA}
		\item \texttt{DACDACDA}
	\end{itemize}
\end{solution}
\vfill
\problem{}
Convince yourself that LZ is a generalization of the run-length code we discussed in the previous section.
\hint{\texttt{[A<1,9>]} and \texttt{[00-1001]} are the same thing!}
\remark{}
We left a few things out of this section: we didn't discuss the algorithm that converts a string into an LZ-encoded blob, nor how LZ-encoded strings should be represented in binary.
We skipped these details because they are problems of implementation: they're the engineer's headache, not the mathematician's.
If you're interested, the two diagrams below sketch one approach. Each entry begins with a flag bit that says whether a pointer or an eight-bit literal symbol follows, and the encoder slides a window over the text, searching recent output (the search buffer) for a match to the upcoming symbols (the lookahead).
Ask an instructor to explain the details.
\begin{center}
\begin{tikzpicture}
	\node[anchor=west,color=gray] at (-2.3, 0) {Bits};
	\node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
	\draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
	\draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
	\node at (0, 0) {\texttt{0}};
	\node at (1, 0) {\texttt{0}};
	\node at (2, 0) {\texttt{1}};
	\node at (3, 0) {\texttt{0}};
	\node at (4, 0) {\texttt{1}};
	\node at (5, 0) {\texttt{1}};
	\node at (6, 0) {\texttt{0}};
	\node at (7, 0) {\texttt{0}};
	\node at (8, 0) {\texttt{1}};
	\draw (-0.5, 0.25) -- (8.5, 0.25);
	\draw (-0.5, -0.25) -- (8.5, -0.25);
	\draw (-0.5, -0.75) -- (8.5, -0.75);
	\draw (-0.5, 0.25) -- (-0.5, -0.75);
	\draw (0.5, 0.25) -- (0.5, -0.75);
	\draw (8.5, 0.25) -- (8.5, -0.75);
	\node at (0, -0.5) {flag};
	\node at (4.5, -0.5) {if flag, \texttt{<pos, len>}; else eight-bit symbol};
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}
	% Text tape
	\node[color=gray] at (-0.75, 0) {\texttt{...}};
	\node[color=gray] at (0.0, 0) {\texttt{D}};
	\node at (0.5, 0) {\texttt{A}};
	\node at (1.0, 0) {\texttt{B}};
	\node at (1.5, 0) {\texttt{C}};
	\node at (2.0, 0) {\texttt{D}};
	\node at (2.5, 0) {\texttt{A}};
	\node at (3.0, 0) {\texttt{B}};
	\node at (3.5, 0) {\texttt{C}};
	\node at (4.0, 0) {\texttt{D}};
	\node[color=gray] at (4.5, 0) {\texttt{B}};
	\node[color=gray] at (5.0, 0) {\texttt{D}};
	\node[color=gray] at (5.5, 0) {\texttt{A}};
	\node[color=gray] at (6.0, 0) {\texttt{C}};
	\node[color=gray] at (6.75, 0) {\texttt{...}};
	\draw (-1.75, 0.25) -- (7.25, 0.25);
	\draw (-1.75, -0.25) -- (7.25, -0.25);
	\draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
	\draw[line width = 0.7mm, color=oblue] (-1.25, 0.5) -- (4.25, 0.5) -- (4.25, -0.5) -- (-1.25, -0.5) -- cycle;
	\draw (4.2, -0.625) -- (4.2, -0.75) to node[anchor=north, midway] {lookahead} (2.3, -0.75) -- (2.3, -0.625);
	\draw (2.2, -0.625) -- (2.2, -0.75) to node[anchor=north, midway] {search buffer} (-1.1, -0.75) -- (-1.1, -0.625);
	\draw[color=gray] (2.2, 0.625) -- (2.2, 0.75) to node[anchor=south, midway] {match!} (0.3, 0.75) -- (0.3, 0.625);
	%\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
	\node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
\end{tikzpicture}
\end{center}
\vfill
\pagebreak
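The remark above leaves the encoder unspecified. The Python sketch below is one way to make both directions concrete; it is an illustration, not the algorithm used by DEFLATE or any other particular format. Its token format is our own assumption: a one-character string is a literal symbol, and a tuple \texttt{(pos, len)} is a pointer that looks back \texttt{pos} symbols and copies \texttt{len} of them. The names \texttt{lz\_decode} and \texttt{lz\_encode}, the greedy longest-match search, the \texttt{window} parameter (playing the role of the search buffer in the diagram), and the three-symbol minimum match length are all hypothetical choices made for this example.

```python
from typing import List, Tuple, Union

# Assumed token format (illustrative, not from any real library):
#   a one-character str is a literal symbol;
#   a tuple (pos, length) is a pointer: look back `pos` symbols, copy `length`.
Token = Union[str, Tuple[int, int]]


def lz_decode(tokens: List[Token]) -> str:
    """Expand literals and <pos, len> pointers into the decoded string."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)  # literal: write it straight to the output
            continue
        pos, length = tok
        if pos < 1 or pos > len(out):
            raise ValueError(f"pointer <{pos},{length}> looks back too far")
        # Copy one symbol at a time. The read head trails the write head by
        # `pos`, so a pointer may read symbols it has just written itself.
        for _ in range(length):
            out.append(out[-pos])
    return "".join(out)


def lz_encode(text: str, window: int = 4096, min_len: int = 3) -> List[Token]:
    """Greedy sketch: at each position, emit the longest match that starts in
    the last `window` symbols, or a literal if no match has `min_len` symbols."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_pos = 0, 0
        # Brute-force search over every start j in the search buffer.
        for j in range(max(0, i - window), i):
            k = 0
            # text[j + k] may lie at or beyond i; that gives an overlapping
            # copy such as <1,9>, which the decoder handles symbol by symbol.
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, i - j
        if best_len >= min_len:
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    encoded = lz_encode("ABRACADABRA")
    print(encoded)  # ['A', 'B', 'R', 'A', 'C', 'A', 'D', (7, 4)]
    print(lz_decode(encoded))  # ABRACADABRA
```

Because \texttt{lz\_decode} copies one symbol at a time instead of slicing, overlapping pointers such as \texttt{<1,9>} behave exactly as in the exercises. The encoder's brute-force search takes quadratic time; practical compressors typically use hash tables or similar structures to find candidate matches quickly.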