diff --git a/Advanced/Error-Correcting Codes/main.tex b/Advanced/Error-Correcting Codes/main.tex new file mode 100755 index 0000000..8960bf5 --- /dev/null +++ b/Advanced/Error-Correcting Codes/main.tex @@ -0,0 +1,28 @@ +% https://git.betalupi.com/Mark/latex-packages +% use [nosolutions] flag to hide solutions. +% use [solutions] flag to show solutions. +% Last built with version 1.1.0 +\documentclass[ + solutions +]{ormc_handout} + +\usepackage{tikz} + +\begin{document} + + \maketitle + + + {Error-Correcting Codes} + { + Based on a handout by Yingkun Li \\ + Revised by Mark on \today + } + + + \input{parts/00 detection} + \input{parts/01 correction} + \input{parts/02 distance} + \input{parts/03 bonus} + +\end{document} \ No newline at end of file diff --git a/Advanced/Error-Correcting Codes/parts/00 detection.tex b/Advanced/Error-Correcting Codes/parts/00 detection.tex new file mode 100755 index 0000000..6b25436 --- /dev/null +++ b/Advanced/Error-Correcting Codes/parts/00 detection.tex @@ -0,0 +1,164 @@ +\section{Error Detection} + +An ISBN\footnote{International Standard Book Number} is a unique numeric book identifier. It comes in two forms: ISBN-10 and ISBN-13. Naturally, ISBN-10s have ten digits, and ISBN-13s have thirteen. The final digit in both versions is a \textit{check digit}. + +\vspace{3mm} + +Say we have a sequence of nine digits, forming a partial ISBN-10: $n_1 n_2 ... n_9$. \\ +The final digit, $n_{10}$, is calculated as follows: + +$$ + n_{10} = \Biggl( \sum_{i = 1}^{9} i \times n_i \Biggr) \text{ mod } 11 +$$ + +If $n_{10}$ is equal to 10, it is written as \texttt{X}. + + +\problem{} +Which of the following could be valid ISBNs? + +\begin{itemize} + \item \texttt{0-134-54896-2} + \item \texttt{0-307-29206-3} + \item \texttt{0-316-00395-6} +\end{itemize} + +\begin{solution} + Only the third has a consistent check digit; the first two do not.
+\end{solution} + +\vfill +\pagebreak + +\problem{} +Show that the following sum is divisible by 11 iff $n_1n_2...n_{10}$ is a valid ISBN-10. +$$ + \sum_{i = 1}^{10} (11 - i)n_i +$$ + +\begin{solution} + Proof that valid $\implies$ divisible, working in mod 11: + + \vspace{2mm} + + $10n_1 + 9n_2 + ... + 2n_9 + n_{10} \equiv$ \\ + $(-n_1) + (-2n_2) + ... + (-9n_9) + n_{10} \equiv$ \\ + $-n_{10} + n_{10} \equiv 0$ + + \vspace{2mm} + + Having done this, the rest is easy. Work in reverse, or note that each step above is an iff. + +\end{solution} + +\vfill + +\problem{} +Take a valid ISBN-10 and change one digit. Is it possible that you get another valid ISBN-10? \\ +Provide an example or a proof. + +\begin{solution} + Let $S$ be the sum $10n_1 + 9n_2 + ... + 2n_9 + n_{10}$, before any digits are changed. + + \vspace{3mm} + + If you change one digit of the ISBN, $S$ changes by $km$, where $k \in \{1,2,...,10\}$ and $1 \leq |m| \leq 10$. \\ + Neither $k$ nor $m$ is divisible by 11 and 11 is prime, so $km$ cannot be divisible by 11. + + \vspace{3mm} + + We know that $S \equiv 0 \text{ (mod 11)}$. \\ + After the change, the checksum is $S + km \equiv km \not\equiv 0 \text{ (mod 11)}$. +\end{solution} + +\vfill + +\problem{} +Take a valid ISBN-10 and swap two adjacent digits. When will the result be a valid ISBN-10? \\ +Swapping two adjacent digits is called a \textit{transposition error}. + + +\begin{solution} + Let $n_1n_2...n_{10}$ be a valid ISBN-10. \\ + When we swap $n_i$ and $n_{i+1}$, we subtract $n_i$ and add $n_{i+1}$ to the checksum. + + \vspace{3mm} + + If the new ISBN is to be valid, we must have that $n_{i+1} - n_i \equiv 0 \text{ (mod 11)}$. \\ + This is impossible unless $n_i = n_{i+1}$. Figure out why yourself. +\end{solution} + +\vfill +\pagebreak + +\problem{} +ISBN-13 error checking is slightly different. Given a partial ISBN-13 $n_1 n_2 n_3 ...
n_{12}$, the final digit is given by + +$$ + n_{13} = \Biggl[ \sum_{i=1}^{12} n_i \times (2 + (-1)^i) \Biggr] \text{ mod } 10 +$$ + +What is the last digit of the following ISBN-13? \\ +\texttt{978-0-380-97726-?} + +\begin{solution} + The final digit is 0. +\end{solution} + +\vfill + +\problem{} +Take a valid ISBN-13 and change one digit. Is it possible that you get another valid ISBN-13? \\ +Provide an example or a proof. + +\begin{solution} + Let $n_1n_2...n_{13}$ be a valid ISBN-13. Choose some $n_i$ and change it to $m_i$. \\ + + \vspace{3mm} + + Since $n_i$, $m_i$ $\in \{0, 1, 2, ..., 9\}$, $-9 \leq n_i - m_i \leq 9$. \\ + + \vspace{2mm} + + Case 0: $i$ is 13 \\ + This is trivial. + + \vspace{2mm} + + Case 1: $i$ is odd and $i \neq 13$ \\ + For the new ISBN to be valid, we need $n_i - m_i \equiv 0 \text{ (mod 10)}$. \\ + This cannot happen if $n_i \neq m_i$. + + \vspace{2mm} + + Case 2: $i$ is even \\ + For the new ISBN to be valid, we need $3(n_i - m_i) \equiv 0 \text{ (mod 10)}$. \\ + This cannot happen if $n_i \neq m_i$, since 10 and 3 are coprime. +\end{solution} + +\vfill + +\problem{} +Take a valid ISBN-13 and swap two adjacent digits. When will the result be a valid ISBN-13? \\ +\hint{The answer here is more interesting than it was last time.} + +\begin{solution} + Say we swap $n_i$ and $n_{i+1}$, where $i \in \{1, 2, ..., 11\}$. \\ + The checksum changes by $\pm 2(n_{i+1} - n_i)$, and will \\ + remain the same if this value is $\equiv 0 \text{ (mod 10)}$, that is, if $n_i$ and $n_{i+1}$ are equal or differ by 5. +\end{solution} + +\vfill + +\problem{} +\texttt{978-0-08-2066-46-6} was a valid ISBN until I changed a single digit. \\ +Can you tell me which digit I changed? + +\begin{solution} + Nope, unless you look at the meaning of each digit in the spec. \\ + If you're unlucky, maybe not even then.
+\end{solution} + + +\vfill +\pagebreak diff --git a/Advanced/Error-Correcting Codes/parts/01 correction.tex b/Advanced/Error-Correcting Codes/parts/01 correction.tex new file mode 100644 index 0000000..883c85a --- /dev/null +++ b/Advanced/Error-Correcting Codes/parts/01 correction.tex @@ -0,0 +1,122 @@ +\section{Error Correction} + +Error detection is helpful, but we'd also like to fix errors when we find them. One example of such a system is the QR code, which remains readable even if a significant amount of it is removed. QR codes with icons inside aren't special--they're just missing their central elements. The error-correcting codes in the QR specification allow us to recover the lost data. +\begin{figure}[h] + \centering + \includegraphics[width = 3cm]{qr} +\end{figure} + +\definition{Repeating codes} +The simplest possible error-correcting code is a \say{repeating code}. It works just as you'd expect: \\ +Instead of sending data once, it sends multiple copies. If a few bits are damaged, the damage can be both detected and repaired. \\ + +For example, consider the following three-repeat code encoding the binary string $101$: + +$$ + 111~000~111 +$$ + +If we flip any one bit, we can easily find and fix the error. + +\definition{Code Efficiency} +The efficiency of an error-correcting code is calculated as follows: +$$ +\frac{\text{number of data bits}}{\text{total bits sent}} +$$ + +For example, the efficiency of the three-repeat code above is $\frac{3}{9} = \frac{1}{3} \approx 0.33$. + +\problem{} +What is the efficiency of a $k$-repeat code? + +\vfill + +\problem{} +How many repeated digits do you need to... +\begin{itemize} + \item[-] detect a transposition error? + \item[-] correct a transposition error? +\end{itemize} + +\vfill +\pagebreak + +\definition{Hamming's Square Code} + +A more effective coding scheme comes in the form of Hamming's square code. +Take a four-bit message and arrange it in a $2 \times 2$ square.
\\ + +Compute the parity of each row and write it at the right. \\ +Compute the parity of each column and write it at the bottom. \\ +Finally, compute the parity of the entire message and write it in the lower right corner. + +\vspace{3mm} + +Read the result row by row to get the encoded message. \\ +For example, the message 1011 generates the sequence 101110011: + +$$ +1011 +\longrightarrow +\begin{array}{cc|} + 1 & 0 \\ + 1 & 1 \\ + \hline +\end{array} +\longrightarrow +\begin{array}{cc|c} + 1 & 0 & 1 \\ + 1 & 1 & 0 \\ \hline + 0 & 1 & +\end{array} +\longrightarrow +\begin{array}{cc|c} + 1 & 0 & 1 \\ + 1 & 1 & 0 \\ \hline + 0 & 1 & 1 +\end{array} +\longrightarrow +101110011 +$$ + +\problem{} +The following messages are encoded using the method above. +Find and correct any single-digit or transposition errors. +\begin{enumerate} + \item \texttt{110 110 011} %101110011 + \item \texttt{100 101 011} %110101011 + \item \texttt{001 010 110} %000110110 +\end{enumerate} + +\begin{solution} + \begin{enumerate} + \item \texttt{101 110 011} or \texttt{110 101 011} + \item \texttt{110 101 011} + \item \texttt{000 110 110} + \end{enumerate} +\end{solution} + +\vfill + +\problem{} +What is the efficiency of this coding scheme? + +\vfill + +\problem{} +Can we correct a single-digit error in the encoded message? \\ +Can we correct a transposition error in the encoded message? + +\vfill + +\problem{} +Let's generalize this coding scheme to a non-square table: \\ +Given a message of length $ab$, construct a rectangle with dimensions $a \times b$ as described above. +\begin{itemize} + \item What is the efficiency of an $a \times b$ rectangle code? + \item Can the $a \times b$ rectangle code detect and fix single-bit errors? + \item Can the $a \times b$ rectangle code detect and fix two-bit errors?
+\end{itemize} + +\vfill +\pagebreak diff --git a/Advanced/Error-Correcting Codes/parts/02 distance.tex b/Advanced/Error-Correcting Codes/parts/02 distance.tex new file mode 100644 index 0000000..a7ec4f4 --- /dev/null +++ b/Advanced/Error-Correcting Codes/parts/02 distance.tex @@ -0,0 +1,41 @@ +\section{Hamming Distance} + +\definition{} +The \textit{Hamming distance} between two strings $x = x_1x_2...x_n$ and $y = y_1y_2...y_n$ is the number of positions at which the digits of $x$ and $y$ are different. + +\problem{} +Compute the Hamming distance between \texttt{1010} and \texttt{0001}. + +\vfill + +\problem{} +Read $d_H(x, y)$ as \say{the Hamming distance between $x$ and $y$.} \\ +Prove the following statements: +\begin{enumerate} +\item $d_H(x, y) \ge 0$ with equality if and only if $x = y$, +\item $d_H(x, y) = d_H(y, x)$, +\item $d_H(x, z) \le d_H(x, y) + d_H(y, z)$. +\end{enumerate} + +\vfill + +\problem{} +Say we encode and send a message with the three-repeat code. A few bits are damaged in transit. \\ +When the transmission is decoded, a different message is read. + +\vspace{2mm} + +What is the minimum possible Hamming distance between the undamaged encoded message and the damaged encoded message? + +\vfill + +\problem{} +Say we encode and send a message with Hamming's square code. A few bits are damaged in transit. \\ +When the transmission is decoded, no uncorrectable errors are detected and a different message is read. + +\vspace{2mm} + +What is the minimum possible Hamming distance between the undamaged encoded message and the damaged encoded message? + +\vfill +\pagebreak \ No newline at end of file diff --git a/Advanced/Error-Correcting Codes/parts/03 bonus.tex b/Advanced/Error-Correcting Codes/parts/03 bonus.tex new file mode 100644 index 0000000..f58ecd7 --- /dev/null +++ b/Advanced/Error-Correcting Codes/parts/03 bonus.tex @@ -0,0 +1,94 @@ +\section{Hat Puzzles: The Revenge} + +\problem{} +Three people are sitting in a circle.
A black or a white hat will be placed on each person's head, with equal probability. Each person can see everyone's hat color except their own. \\ + +\vspace{1mm} + +The participants are asked to simultaneously write down \say{Black}, \say{White}, or \say{Pass}. \\ +They win if at least one person guesses their hat correctly and nobody guesses incorrectly. \\ +They lose if anyone guesses incorrectly, or if everyone passes. + +\vspace{1mm} + +How can they maximize their chance of winning? + +\vfill + + +\problem{} +Consider the same game with $2^n-1$ people. How can they achieve a win rate of $\frac{2^n-1}{2^n}$? + +\vfill +\pagebreak + +% A copy of the post these problems are based on. Contains the solution. +% https://cornellmath.wordpress.com/2007/09/20/hat-guessing-puzzles-the-revenge +% +% I guess since my previous hat color guessing problem was so popular, I might as well talk about the other one I know. However, this one isn't meant to attack the foundations of mathematics. The problem is as follows: +% +% Three people are sitting in a circle. Black or white hats (50% chance of each) will all be placed on their heads, and they will be able to see everyone's hat color but their own. They will all simultaneously write down on a piece of paper either "Black", "White", or "Pass", trying to guess their own hat color. All the people collectively win (whatever that means) if at least someone guesses their hat correctly and no one guesses incorrectly. They lose if anyone guesses incorrectly, or everyone passes. If they can agree on a strategy beforehand, what is their best chance of winning? +% +% Again, there is the problem that no information can be conveyed to someone about their own hat color, so they would seem to be guessing blindly (talking and facial expressions are prohibited). However, they can still win 75% of the time. Figure it out! +% +% Once you solve the easy version of this puzzle, the harder version is with larger numbers of people.
As a partial spoiler, stick to 2^n-1, where the best win rate is 2^n-1 out of 2^n. How is this possible? (Answer below the fold) +% +% The trick to the puzzle is realizing that, even though any specific person who elects not to pass has only a 50% chance of being right, the strategy can be chosen so that the wrong guesses are all concentrated into a small number of possibilities. That is, because you only need % one right guess to win and multiple wrong guesses don't make a loss worse, the strategy should attempt to make as many people wrong simultaneously if anyone is going to guess wrong. +% +% The three-person case makes a good example. Consider the following strategy: +% +% If you see two hats of the same color: Guess the opposite color. +% +% If you see two different hat colors: Pass. +% +% What happens? It's not hard to write down all the possibilities: +% +% 3 black hats: Everyone sees two black hats, and guesses White. Everyone is wrong. +% +% 2 black hats, 1 white hat: The people in black hats see both colors and Pass; the person in the white hat sees two black hats and says White. One person is correct and everyone else passed. +% +% 1 black hat, 2 white hats: This is identical to the previous case, with colors reversed. Its a win. +% +% 3 white hats: This is identical to the first case, with colors reversed. Its a loss. +% +% So unless all three hats were the same color, everyone won. However, the chances of all three hats being the same color is only 1 in 4, so its a win 75% of the time. Notice that the key was getting everyone to be wrong at the same time, but only having one correct guess in winning situations. +% +% Ok, what about more people, say, n of them? Well, we need a strategy where the wrong guesses are concentrated and the right guesses are spread out. Let's make this a little bit more mathematical, by turning white hats into 1s and black hats into 0s. 
Now, a possible hat scenario is a sequence of n binary digits, and every sequence is equally likely. +% +% Since the optimal strategy seems to be when all the wrong guesses happen simultaneously, we need to agree on some sequences that will be the wrong sequences, that is, the scenarios where everyone will guess incorrectly. How does this work? Say 0000000 is one of the agreed upon wrong sequences (this is for n=7). Then, if someone looks around and sees all zeros/black hats, they will guess white. That way, everyone will be wrong if it is all black hats; but if there is exactly one white hat, everyone wins! Since it is n times more likely for there to be exactly one white hat than no white hats, this seems to work pretty well. +% +% The general strategy if you have a whole bunch of wrong sequences is for everyone to look around, and: +% +% If it looks like you might be in a wrong sequence, guess the opposite possibility. +% +% If you are definitely not in a wrong sequence, pass. +% +% (Note that we are assuming that no two wrong sequences differ by a single digit, so that there is always an 'opposite possiblity') How well does this strategy work? +% +% It loses every wrong sequence. +% +% It wins every sequence that differs from a wrong sequences by exactly one digit. +% +% It loses every sequence that differs from every wrong sequence by at least two digits (since everyone passes). +% +% So what we want is a collection of wrong sequences that are evenly spread out amongst the possibilities, ie, we want to 'cover' as many possibilites as possible with the fewest number of wrong sequences. +% +% This is actually a problem that real people care about, even some who don't wear hats. This is (roughly) the problem of finding an error correcting code. Sometimes, one computer will be sending another computer information in the form of a sequence of 1s and 0s, and by some fluke % a single digit will get flipped. 
The goal of error correcting codes is to turn the sequence of 1s and 0s you want to send into a longer sequence, which has the property that the receiving computer can tell if a digit got flipped and repair it. +% +% A silly example is the Tripling Code, where if what I want to do is send you 011, I instead send you 011011011 (we always agree on what code we are using ahead of time). Now, if one digit gets flipped, you will see two of the three copies of the sequence agreeing and one differing, and you will know what I was trying to say. However, this is a wildly inefficient code, since it takes three times as long to say anything. +% +% What does an error-correcting code look like? Well, we agree ahead of time upon which possible sequences are the codewords (ie, the ones that are correct), and how to turn them into the messages we really wanted to send. Then, if you get something that differs from a codeword by % exactly one digit, you know how to correct it (this is assuming that the codewords are far enough apart that there is only one close one). So the goal for making an efficient code is to pick codewords spread apart evenly enough that as many possible sequences are exactly one away % from a codeword. This is exactly what we were looking for with our 'wrong sequences', even though the names were different. +% +% Therefore, we can invoke some fancy error-correcting codes to find the optimal hat guessing strategy. In particular, if the number of people/length of sequence is 2^n-1, there is a 'perfect code' called the Hamming code, which will give us a choice of wrong sequences such that every possibility is either 1) a wrong sequence, or 2) exactly 1 digit away from a wrong sequence. Hence, this is best possible strategy for hat guessing. I am not going into the details of the Hamming codes, since the important thing here is that they exist. +% +% However, this only solves the problem for a very specific number of people. What about other numbers? 
Theres a complication in these cases, in that its impossible to have a perfect code. That is, it is impossible to choose wrong sequences so that every possible sequence is either wrong, or one digit away from exactly one wrong sequence. +% +% We can ask what the nearest possibility to a perfect code is, but its not clear which way to be less than perfect is optimal: +% +% 1) Having some of the correct guesses overlap, that is, having some wrong sequences differ by 2 digits. +% +% 2) Having some sequences which are lost because everyone passes. +% +% 3) Most significantly, moving away from the 'wrong sequence' strategy. +% +% The last one, which I would guess is the correct way to proceed, is bad because the tools from computer science become useless rapidly. I really have no idea what the optimal solution looks like here. \ No newline at end of file diff --git a/Advanced/Error-Correcting Codes/qr.png b/Advanced/Error-Correcting Codes/qr.png new file mode 100644 index 0000000..195d3e3 Binary files /dev/null and b/Advanced/Error-Correcting Codes/qr.png differ