Polish

2024-04-23 17:33:58 -07:00
parent d8698b4c81
commit 8269bf1135
4 changed files with 200 additions and 157 deletions
--- a/Advanced/Compression/parts/0
+++ b/Advanced/Compression/parts/0
@@ -9,7 +9,7 @@ A \textit{string} is a sequence of symbols from an alphabet. \par
 For example, \texttt{CBCAADDD} is a string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$.

 \problem{}
-Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary blob. \par
+Say we want to store a length-$n$ string over the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$ as a binary sequence. \par
 How many bits will we need? \par
 \hint{
 	Our alphabet has four symbols, so we can encode each symbol using two bits, \par
@@ -32,6 +32,6 @@ using $n \times \lceil \log_2k \rceil$ bits. Convince yourself that this is true


 \vfill
-Of course, this isn't ideal---we can do much better than $n \times \lceil \log_2k \rceil$.
+As you might expect, this isn't ideal: we can do much better than $n \times \lceil \log_2k \rceil$.
 We will spend the rest of this handout exploring more efficient ways of encoding such sequences of symbols.
 \pagebreak
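A minimal sketch of the $n \times \lceil \log_2 k \rceil$ count from this part, for readers who want to check small cases. The function name `naive_bits` is hypothetical and does not appear in the handout.

```python
def naive_bits(n: int, k: int) -> int:
    """Bits needed to store a length-n string over a k-symbol alphabet
    when every symbol gets the same fixed-width binary code."""
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    # For k >= 1, (k - 1).bit_length() equals ceil(log2(k)),
    # and it avoids floating-point rounding.
    bits_per_symbol = (k - 1).bit_length()
    return n * bits_per_symbol


if __name__ == "__main__":
    # Four-symbol alphabet {A, B, C, D}: two bits per symbol.
    print(naive_bits(8, 4))
    # Five-symbol alphabet {A, B, C, D, E}: three bits per symbol.
    print(naive_bits(7, 5))
```

Integer arithmetic is used instead of `math.log2` so that large values of `k` cannot be affected by rounding.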
--- a/Advanced/Compression/parts/1
+++ b/Advanced/Compression/parts/1
@@ -5,37 +5,37 @@
 \section{Run-length Coding}


-\definition{}
-\textit{Entropy} is a measure of information in a certain sequence. \par
-A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
-For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
-\begin{itemize}
-	\item \texttt{AAAAAAAAAA}
-	\item \texttt{pDa3:7?j;F}
-\end{itemize}
-The first string clearly contains less information than the second.
-It's much harder to describe \texttt{pDa3:7?j;F} than it is \texttt{AAAAAAAAAA}.
-Thus, we say that the first has low entropy, and the second has fairly high entropy.
+%\definition{}
+%\textit{Entropy} is a measure of information in a certain sequence. \par
+%A sequence with high entropy contains a lot of information, and a sequence with low entropy contains relatively little.
+%For example, consider the following two ten-symbol ASCII\footnotemark{} strings:
+%\begin{itemize}
+%	\item \texttt{AAAAAAAAAA}
+%	\item \texttt{pDa3:7?j;F}
+%\end{itemize}
+%The first string clearly contains less information than the second.
+%It's much harder to describe \texttt{pDa3:7?j;F} than it is \texttt{AAAAAAAAAA}.
+%Thus, we say that the first has low entropy, and the second has fairly high entropy.
+%
+%\vspace{2mm}
+%
+%The definition above is intentionally hand-wavy. \par
+%Formal definitions of entropy exist, but we won't need them today---we just need
+%an intuitive understanding of the \say{density} of information in a given string.

-\vspace{2mm}
-
-The definition above is intentionally hand-wavy. \par
-Formal definitions of entropy exist, but we won't need them today---we just need
-an intuitive understanding of the \say{density} of information in a given string.
+%
+%\footnotetext{
+%	American Standard Code for Information Interchange, an early character encoding for computers. \par
+%	It contains 128 symbols, including numbers, letters, and
+%	\texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
+%}


-\footnotetext{
-	American Standard Code for Information Exchange, an early character encoding for computers. \par
-	It contains 128 symbols, including numbers, letters, and
-	\texttt{!"\#\$\%\&`()*+,-./:;<=>?@[\textbackslash]\^\_\{|\}\textasciitilde}
-}
-
-
-\vspace{5mm}
+%\vspace{5mm}


 \problem{}<runlenone>
-Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} as binary blob. \par
+Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} in binary. \par
 \note[Note]{
 	We're still using the four-symbol alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}\}$. \par
 	Dots ($\cdot$) in the string are drawn for readability. Ignore them.
@@ -48,12 +48,13 @@ Using a na\"ive coding scheme, encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AA


 \vfill
-In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low entropy.
-We can leverage this fact to develop efficient encoding schemes.
+In \ref{runlenone}---and often, in the real world---the strings we want to encode have fairly low \textit{entropy}. \par
+They contain predictable patterns: sequences of symbols that carry little information. \par
+We can exploit this fact to develop efficient encoding schemes.

 \example{}
-The simplest such coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
-in their binary form, we'll add a \textit{count} to each letter, compressing repeated sequences of the same symbol.
+A simple example of such a coding scheme is \textit{run-length encoding}. Instead of simply listing letters of a string
+in their binary form, we'll add a \textit{count} to each letter, shortening repeated instances of the same symbol.

 \vspace{2mm}

@@ -86,16 +87,10 @@ We'll encode our string into a sequence of 6-bit blocks, interpreted as follows:
 	\end{tikzpicture}
 \end{center}
 So, the sequence \texttt{BBB} will be encoded as \texttt{[0011-01]}. \par
-\note[Notation]{Just like spaces, dashes in a binary blob are added for readability.}
-
-
-\remark{Notation}
-In this handout, encoded binary blobs will always be written in square brackets. \par
-Ignore spaces and dashes, they are provided for convenience. \par
-For example, the binary sequences \texttt{[000 011 100 001 010 100]} and \texttt{[000011100001010100]} \par
-are identical. The first, however, is easier to read.
-
-\pagebreak
+\note[Notation]{
+	Just like dots, dashes and spaces are added for readability. \par
+	Encoded binary sequences will always be written in square brackets: \texttt{[]}.
+}

 \problem{}
 Encode \texttt{AAAA$\cdot$AAAA$\cdot$BCD$\cdot$AAAA$\cdot$AAAA} using this scheme. \par
@@ -107,6 +102,15 @@ Is this more or less efficient than \ref{runlenone}?
 \end{solution}

 \vfill
+\pagebreak
+
+
+
+
+
+
+
+


 \problem{}
@@ -137,7 +141,7 @@ Fix this problem: modify the scheme so that single occurrences of symbols do not
 Consider the following string: \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD}. \par
 \begin{itemize}
 	\item How many bits do we need to encode this na\"ively? \par
-	\item How about with the (unmodified) run-length scheme described above?
+	\item How about with the (unmodified) run-length scheme described on the previous page?
 \end{itemize}
 \hint{You don't need to encode this string---just find the length of its encoded form.}

--- a/Advanced/Compression/parts/2
+++ b/Advanced/Compression/parts/2
@@ -1,6 +1,6 @@
 \section{LZ Codes}

-The LZ-family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) take advantage of repeated sequences of symbols
+The LZ-family\footnotemark{} of codes (LZ77, LZ78, LZSS, LZMA, and others) take advantage of repeated subsequences
 in a string. They are the basis of most modern compression algorithms, including DEFLATE, which is used in the ZIP, PNG,
 and GZIP formats.

@@ -21,10 +21,10 @@ Pointers take the form \texttt{<pos, len>}, where \texttt{pos} is the position o
 For example, we can encode the string \texttt{ABRACADABRA} as \texttt{[ABRACAD<7, 4>]}. \par
 The pointer \texttt{<7, 4>} tells us to look back 7 positions (to the first \texttt{A}), and copy the next 4 symbols. \par
 Note that pointers refer to the partially decoded output---\textit{not} to the encoded string. \par
-This allows pointers to reference other pointers, and ensures codes like \texttt{A<1,9>} are valid.
+This allows pointers to reference other pointers, and ensures that codes like \texttt{A<1,9>} are valid.

 \problem{}
-Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using LZ.
+Encode \texttt{ABCD$\cdot$ABCD$\cdot$BABABA$\cdot$ABCD$\cdot$ABCD} using this scheme. \par
 Then, decode the following:
 \begin{itemize}
 	\item \texttt{[ABCD<4,4>]}
@@ -39,7 +39,7 @@ Then, decode the following:
 	\linehack{}

 	In parts two and three, remember that we're reading the \textit{output string.} \par
-	The nine \texttt{A}s in part two are produced one by one, \par
+	The ten \texttt{A}s in part two are produced one by one, \par
 	with the decoder's \say{read head} following its \say{write head.}

 	\begin{itemize}
@@ -58,98 +58,114 @@ Convince yourself that LZ is a generalization of the run-length code we discusse
 \remark{}
 Note that we left a few things out of this section: we didn't discuss the algorithm that converts a string to an LZ-encoded blob,
 nor did we discuss how we should represent strings encoded with LZ in binary. We skipped these details because they are
-problems of implementation---they're the engineer's headache, not the mathematician's. If you're interested, a brief explanation is below.
-Ask an instructor to explain.
+problems of implementation---they're the engineer's headache, not the mathematician's. \par

-\begin{center}
-	\begin{tikzpicture}
-		\node[anchor=west,color=gray] at (-2.3, 0) {Bits};
-		\node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
-		\draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
-		\draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
-
-		\node at (0, 0) {\texttt{0}};
-		\node at (1, 0) {\texttt{0}};
-		\node at (2, 0) {\texttt{1}};
-		\node at (3, 0) {\texttt{0}};
-		\node at (4, 0) {\texttt{1}};
-		\node at (5, 0) {\texttt{1}};
-		\node at (6, 0) {\texttt{0}};
-		\node at (7, 0) {\texttt{0}};
-		\node at (8, 0) {\texttt{1}};
-
-		\draw (-0.5, 0.25) -- (8.5, 0.25);
-		\draw (-0.5, -0.25) -- (8.5, -0.25);
-		\draw (-0.5, -0.75) -- (8.5, -0.75);
-
-		\draw (-0.5, 0.25) -- (-0.5, -0.75);
-		\draw (0.5, 0.25) -- (0.5, -0.75);
-		\draw (8.5, 0.25) -- (8.5, -0.75);
-
-		\node at (0, -0.5) {flag};
-		\node at (4.5, -0.5) {if flag \texttt{<pos, len>}, else eight-bit symbol};
-	\end{tikzpicture}
-\end{center}
-
-
-\begin{center}
-	\begin{tikzpicture}
-		% Text tape
-		\node[color=gray] at (-0.75, 0) {\texttt{...}};
-		\node[color=gray] at (0.0, 0) {\texttt{D}};
-		\node at (0.5, 0) {\texttt{A}};
-		\node at (1.0, 0) {\texttt{B}};
-		\node at (1.5, 0) {\texttt{C}};
-		\node at (2.0, 0) {\texttt{D}};
-		\node at (2.5, 0) {\texttt{A}};
-		\node at (3.0, 0) {\texttt{B}};
-		\node at (3.5, 0) {\texttt{C}};
-		\node at (4.0, 0) {\texttt{D}};
-		\node[color=gray] at (4.5, 0) {\texttt{B}};
-		\node[color=gray] at (5.0, 0) {\texttt{D}};
-		\node[color=gray] at (5.5, 0) {\texttt{A}};
-		\node[color=gray] at (6.0, 0) {\texttt{C}};
-		\node[color=gray] at (6.75, 0) {\texttt{...}};
-
-		\draw (-1.75, 0.25) -- (7.25, 0.25);
-		\draw (-1.75, -0.25) -- (7.25, -0.25);
-
-
-		\draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
-		\draw[line width = 0.7mm, color=oblue]
-			(-1.25, 0.5)
-			-- (4.25, 0.5)
-			-- (4.25, -0.5)
-			-- (-1.25, -0.5)
-			-- cycle
-		;
-
-		\draw
-			(4.2, -0.625)
-			-- (4.2, -0.75)
-			to node[anchor=north, midway] {lookahead} (2.3, -0.75)
-			-- (2.3, -0.625)
-		;
-
-		\draw
-			(2.2, -0.625)
-			-- (2.2, -0.75)
-			to node[anchor=north, midway] {search buffer} (-1.1, -0.75)
-			-- (-1.1, -0.625)
-		;
-
-		\draw[color=gray]
-			(2.2, 0.625)
-			-- (2.2, 0.75)
-			to node[anchor=south, midway] {match!} (0.3, 0.75)
-			-- (0.3, 0.625)
-		;
-
-		%\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
-		\node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
-	\end{tikzpicture}
-\end{center}
-
-
-\vfill
 \pagebreak
+
+%\begin{instructornote}
+%	A simple LZ-scheme can work as follows. We encode our string into a sequence of
+%	nine-bit blocks, drawn below. The first bit of each block tells us whether or not
+%	this block is a pointer, and the next eight bits contain either a \texttt{pos, len} pair
+%	(using, say, four bits for each number) or a plain eight-bit symbol code.
+%	\begin{center}
+%		\begin{tikzpicture}
+%			\node[anchor=west,color=gray] at (-2.3, 0) {Bits};
+%			\node[anchor=west,color=gray] at (-2.3, -0.5) {Meaning};
+%			\draw[color=gray] (-2.3, -0.25) -- (5.5, -0.25);
+%			\draw[color=gray] (-2.3, 0.15) -- (-2.3, -0.65);
+%
+%			\node at (0, 0) {\texttt{0}};
+%			\node at (1, 0) {\texttt{0}};
+%			\node at (2, 0) {\texttt{1}};
+%			\node at (3, 0) {\texttt{0}};
+%			\node at (4, 0) {\texttt{1}};
+%			\node at (5, 0) {\texttt{1}};
+%			\node at (6, 0) {\texttt{0}};
+%			\node at (7, 0) {\texttt{0}};
+%			\node at (8, 0) {\texttt{1}};
+%
+%			\draw (-0.5, 0.25) -- (8.5, 0.25);
+%			\draw (-0.5, -0.25) -- (8.5, -0.25);
+%			\draw (-0.5, -0.75) -- (8.5, -0.75);
+%
+%			\draw (-0.5, 0.25) -- (-0.5, -0.75);
+%			\draw (0.5, 0.25) -- (0.5, -0.75);
+%			\draw (8.5, 0.25) -- (8.5, -0.75);
+%
+%			\node at (0, -0.5) {flag};
+%			\node at (4.5, -0.5) {if flag \texttt{<pos, len>}, else eight-bit symbol};
+%		\end{tikzpicture}
+%	\end{center}
+%
+%	To encode a string, we read it using a \say{window}, shown below. This window consists of
+%	a search buffer and a lookahead buffer, both of which have a fixed (but configurable) size.
+%	This window passes over the string one character at a time, inserting a pointer if it finds
+%	the lookahead buffer inside its search buffer, and a plain character otherwise.
+%
+%
+%	\begin{center}
+%		\begin{tikzpicture}
+%			% Text tape
+%			\node[color=gray] at (-0.75, 0) {\texttt{...}};
+%			\node[color=gray] at (0.0, 0) {\texttt{D}};
+%			\node at (0.5, 0) {\texttt{A}};
+%			\node at (1.0, 0) {\texttt{B}};
+%			\node at (1.5, 0) {\texttt{C}};
+%			\node at (2.0, 0) {\texttt{D}};
+%			\node at (2.5, 0) {\texttt{A}};
+%			\node at (3.0, 0) {\texttt{B}};
+%			\node at (3.5, 0) {\texttt{C}};
+%			\node at (4.0, 0) {\texttt{D}};
+%			\node[color=gray] at (4.5, 0) {\texttt{B}};
+%			\node[color=gray] at (5.0, 0) {\texttt{D}};
+%			\node[color=gray] at (5.5, 0) {\texttt{A}};
+%			\node[color=gray] at (6.0, 0) {\texttt{C}};
+%			\node[color=gray] at (6.75, 0) {\texttt{...}};
+%
+%			\draw (-1.75, 0.25) -- (7.25, 0.25);
+%			\draw (-1.75, -0.25) -- (7.25, -0.25);
+%
+%
+%			\draw[line width = 0.7mm, color=oblue, dotted] (2.25, 0.5) -- (2.25, -0.5);
+%			\draw[line width = 0.7mm, color=oblue]
+%				(-1.25, 0.5)
+%				-- (4.25, 0.5)
+%				-- (4.25, -0.5)
+%				-- (-1.25, -0.5)
+%				-- cycle
+%			;
+%
+%			\draw
+%				(4.2, -0.625)
+%				-- (4.2, -0.75)
+%				to node[anchor=north, midway] {lookahead} (2.3, -0.75)
+%				-- (2.3, -0.625)
+%			;
+%
+%			\draw
+%				(2.2, -0.625)
+%				-- (2.2, -0.75)
+%				to node[anchor=north, midway] {search buffer} (-1.1, -0.75)
+%				-- (-1.1, -0.625)
+%			;
+%
+%			\draw[color=gray]
+%				(2.2, 0.625)
+%				-- (2.2, 0.75)
+%				to node[anchor=south, midway] {match!} (0.3, 0.75)
+%				-- (0.3, 0.625)
+%			;
+%
+%			%\draw[->, color=gray] (2.5, 0.3) -- (2.5, 0.8) to[out=90,in=90] (0.5, 0.8);
+%			\node at (7.0, -0.75) {Result: \texttt{[$\cdot\cdot\cdot$DABCD<4,4>$\cdot\cdot\cdot$]}};
+%		\end{tikzpicture}
+%	\end{center}
+%
+%	This is not the exact process used in practice---but it's close enough. \par
+%	This process may be tweaked in any number of ways.
+%\end{instructornote}
+%
+%\makeatletter\if@solutions
+%	\vfill
+%	\pagebreak
+%\fi\makeatother
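The decoding rule for \texttt{<pos, len>} pointers described in this part can be sketched as below. This is a sketch of the handout's textual notation only, not of any real LZ77 or DEFLATE bitstream; the names `TOKEN` and `lz_decode` are hypothetical. It copies one symbol at a time, so a pointer may read symbols that the same pointer has just written.

```python
import re

# A token is either a pointer "<pos, len>" or a single literal symbol.
TOKEN = re.compile(r"<\s*(\d+)\s*,\s*(\d+)\s*>|(.)")


def lz_decode(encoded: str) -> str:
    """Decode a string of literals and <pos, len> pointers."""
    # Accept the handout's bracketed notation, e.g. "[ABCD<4,4>]".
    if encoded.startswith("[") and encoded.endswith("]"):
        encoded = encoded[1:-1]
    out = []
    for match in TOKEN.finditer(encoded):
        pos, length, literal = match.groups()
        if literal is not None:
            out.append(literal)
            continue
        pos, length = int(pos), int(length)
        if pos < 1 or pos > len(out):
            raise ValueError(f"pointer looks back {pos}, but only {len(out)} symbols exist")
        start = len(out) - pos
        # Copy symbol by symbol: the read position trails the write position,
        # so len may exceed pos.
        for i in range(length):
            out.append(out[start + i])
    return "".join(out)


if __name__ == "__main__":
    print(lz_decode("[ABRACAD<7, 4>]"))
    print(lz_decode("[A<1,9>]"))
```

Appending to `out` while reading from it is what makes codes like \texttt{A<1,9>} valid: each copied symbol becomes available to the next copy step.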
--- a/Advanced/Compression/parts/3
+++ b/Advanced/Compression/parts/3
@@ -3,7 +3,7 @@

 \example{}
 Now consider the alphabet $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$. \par
-With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits, by mapping...
+With a na\"ive coding scheme, we can encode a length $n$ string with $3n$ bits, by mapping...
 \begin{itemize}
 	\item $\texttt{A}$ to $\texttt{000}$
 	\item $\texttt{B}$ to $\texttt{001}$
@@ -12,12 +12,12 @@ With a na\"ive coding scheme, we can encode a length-$n$ string with $3n$ bits,
 	\item $\texttt{E}$ to $\texttt{100}$
 \end{itemize}
 For example, this encodes \texttt{ADEBCE} as \texttt{[000 011 100 001 010 100]}. \par
-To encoding strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
+To encode strings over $\{\texttt{A}, \texttt{B}, \texttt{C}, \texttt{D}, \texttt{E}\}$ with this scheme, we
 need an average of three bits per symbol.

 \vspace{2mm}

-One could argue that this coding scheme is wasteful: \par
+However, one could argue that this coding scheme is wasteful: \par
 we're not using three of the eight possible three-bit sequences!

 \example{}
@@ -86,9 +86,8 @@ Is this a good way to encode five-letter strings?


 \remark{}
-The code from the previous page can be visualized as a tree which we traverse while decoding our sequence.
-Starting from the topmost node, we take the left edge if we see a \texttt{0} and the right edge if we see a \texttt{1}.
-Once we reach a letter, we return to the top node and repeat the process.
+The code from the previous page can be visualized as a full binary tree: \par
+\note{Every node in a \textit{full binary tree} has either zero or two children.}

 \vspace{-5mm}
 \null\hfill
@@ -135,10 +134,19 @@ Once we reach a letter, we return to the top node and repeat the process.
 	\end{center}
 \end{minipage}
 \hfill\null
+You can think of each symbol's code as its \say{address} in this tree.
+When decoding a string, we start at the topmost node. Reading the binary sequence
+bit by bit, we move down the tree, taking a left edge if we see a \texttt{0}
+and a right edge if we see a \texttt{1}.
+Once we reach a letter, we return to the top node and repeat the process.



+\definition{}
+We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par

+\problem{}
+Convince yourself that trees like the one above always produce a prefix-free code.

 \problem{}<treedecode>
 Decode \texttt{[110111001001110110]} using the tree above.
@@ -149,6 +157,18 @@ Decode \texttt{[110111001001110110]} using the tree above.

 \vfill

+\problem{}
+Encode \texttt{ABDECBE} using this tree. \par
+How many bits do we save over a na\"ive scheme?
+
+\begin{solution}
+	This is \texttt{[00 01 110 111 10 01 111]}, and saves four bits.
+\end{solution}
+
+
+\vfill
+\pagebreak
+
 \problem{}
 In \ref{treedecode}, we needed 18 bits to encode \texttt{DEACBDD}. \par
 \note{Note that we'd need $3 \times 7 = 21$ bits to encode this string na\"ively.}
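The tree walk used for decoding can be sketched as below. The code table is read off the handout's own examples (the solution for \texttt{ABDECBE} and the 18-bit encoding of \texttt{DEACBDD}); the nested-dictionary tree and the names `CODES`, `build_tree`, `encode` and `decode` are hypothetical choices made for this illustration.

```python
# Code words implied by the handout's worked examples.
CODES = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}


def build_tree(codes):
    """Build a binary tree as nested dicts keyed by '0' and '1'; leaves are symbols."""
    root = {}
    for symbol, bits in codes.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol
    return root


def encode(text, codes):
    """Concatenate the code word of each symbol."""
    return "".join(codes[symbol] for symbol in text)


def decode(bits, tree):
    """Walk the tree bit by bit; on reaching a leaf, emit it and restart at the root."""
    out = []
    node = tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            out.append(node)
            node = tree
    if node is not tree:
        raise ValueError("input ended in the middle of a code word")
    return "".join(out)


if __name__ == "__main__":
    tree = build_tree(CODES)
    print(decode("110111001001110110", tree))
    print(len(encode("DEACBDD", CODES)), "bits")
```

Because every symbol sits at a leaf, the decoder never needs a separator between code words, which is the prefix-free property in action.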
@@ -236,13 +256,19 @@ Now, do the opposite: draw a tree that encodes \texttt{DEACBDD} \textit{less} ef
 \vfill

 \remark{}
-We say a coding scheme is \textit{prefix-free} if no whole code word is a prefix of another code word. \par
-As we've seen, it is fairly easy to construct a prefix-free variable-length code using a binary tree. \par
+As we just saw, constructing a prefix-free code is fairly easy. \par
 Constructing the \textit{most efficient} prefix-free code for a given message is a bit more difficult. \par
-We'll spend the rest of this section solving this problem.
-
 \pagebreak

+
+
+
+
+
+
+
+
+
 \remark{}
 Let's restate our problem. \par
 Given an alphabet $A$ and a frequency function $f$, we want to construct a binary tree $T$ that minimizes
@@ -270,16 +296,13 @@ Where...

 \vspace{2mm}

-Also, notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.
+Also notice that $\mathcal{B}_f(T)$ is the \say{average bits per symbol} metric we saw in previous problems.


 \problem{}<hufptone>
 Let $f$ be a fixed frequency function over an alphabet $A$. \par
 Let $T$ be an arbitrary tree for $A$, and let $a, b$ be two symbols in $A$. \par
-
-\vspace{2mm}
-
-Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
+Construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
 \begin{equation*}
 	\mathcal{B}_f(T) - \mathcal{B}_f(T') = \Bigl(f(a) - f(b)\Bigr) \times \Bigl(d_T(a) - d_T(b)\Bigr)
 \end{equation*}
@@ -300,8 +323,8 @@ Now, construct $T'$ by swapping $a$ and $b$ in $T$. Show that \par
 \pagebreak

 \problem{}<hufpttwo>
-Show that is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
-\hint{You may assume that an optimal tree exists. Check three nontrivial cases.}
+Show that there is an optimal tree in which the two symbols with the lowest frequencies have the same parent.
+\hint{You may assume that an optimal tree exists. There are a few cases.}

 \begin{solution}
 	Let $T$ be an optimal tree, and let $a, b$ be the two symbols with the lowest frequency. \par
@@ -356,7 +379,7 @@ Then, use the previous two problems to show that your algorithm indeed produces

 	\vspace{2mm}
 	In plain English: pick the two nodes with the smallest frequency, combine them,
-	and add that into the alphabet as a \say{compound symbol}. Repeat until you're done.
+	and replace them with a \say{compound symbol}. Repeat until you're done.


 	\linehack{}
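The "combine the two least frequent nodes" procedure described in plain English above can be sketched with a priority queue. This is one possible implementation under stated assumptions: symbols are assumed not to be tuples (tuples mark compound symbols here), ties are broken by insertion order, and the names `huffman_code` and `total_bits` are hypothetical.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return a dict mapping each symbol to a bit string built by greedy merging."""
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A one-symbol alphabet still needs a nonempty code word.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Take the two least frequent nodes and replace them with a compound symbol.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # compound symbol: descend into both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def total_bits(freq, codes):
    """Total encoded length when freq holds symbol counts."""
    return sum(freq[symbol] * len(codes[symbol]) for symbol in freq)


if __name__ == "__main__":
    counts = {"D": 3, "E": 1, "A": 1, "C": 1, "B": 1}  # symbol counts in DEACBDD
    codes = huffman_code(counts)
    print(codes, total_bits(counts, codes))
```

Different tie-breaking rules can give different trees, but each such tree has the same total cost, so the choice made here only affects which optimal code is returned.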