From 3bc44ed86706ff3325491601f363d31f14e5e529 Mon Sep 17 00:00:00 2001
From: Mark
Date: Tue, 27 Jun 2023 21:23:37 -0700
Subject: [PATCH] Cleanup

---
 Misc/Warm-Ups/regex.tex | 78 ++++++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 36 deletions(-)

diff --git a/Misc/Warm-Ups/regex.tex b/Misc/Warm-Ups/regex.tex
index b846b37..3bf0952 100644
--- a/Misc/Warm-Ups/regex.tex
+++ b/Misc/Warm-Ups/regex.tex
@@ -6,6 +6,8 @@
 \usepackage{xcolor}
 \usepackage{soul}
+\usepackage{hyperref}
+\usepackage[T1]{fontenc} % Fixes texttt braces
 \definecolor{Light}{gray}{.90}
 \sethlcolor{Light}
@@ -20,30 +22,33 @@
 \maketitle
- Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \\
- (abbreviated \say{regex}, which is pronounced like \say{gif})
+ Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \par
+ This is often abbreviated \say{regex}, which is pronounced like \say{gif.}
 \vspace{2mm}
- Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \\
+ Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \par

- Often enough, a clever regex pattern can do the work of a few hundred lines of code. \\
+ Often enough, a clever regex pattern can do the work of a few hundred lines of code.
 \vspace{2mm}
- Like the DFAs we have studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology when discussing regex, instead opting to say a pattern \textit{matches} or \textit{doesn't match} a string. \\
+ Like the DFAs we've studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology with regex, and instead say that a string \textit{matches} or \textit{doesn't match} a pattern.
 \vspace{5mm}
- \textbf{Quantifiers} \\
- Quantifiers tell us how many of a character to match. \\
- There are four of them:
- \htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}}
+ A regex pattern is built from characters, quantifiers, sets, and groups.
+
+ \vspace{5mm}
+
+ \textbf{Quantifiers} \par
+ Quantifiers specify how many of the preceding token to match. \par
+ There are four of these: \htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}}.
 \vspace{2mm}
- \htexttt{+} means \say{match one or more of the preceding token} \\
- \htexttt{*} means \say{match zero or more of the preceding token} \\
+ \htexttt{+} means \say{match one or more of the preceding token} \par
+ \htexttt{*} means \say{match zero or more of the preceding token}
 For example, the pattern \htexttt{ca+t} will match the following strings:
 \begin{itemize}
@@ -51,19 +56,19 @@
 \item \texttt{caat}
 \item \texttt{caaaaaaaat}
 \end{itemize}
- \htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \\
+ \htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \par
 The pattern \htexttt{ca*t} will match all the strings above, including \texttt{ct}.
 \vspace{2mm}
- \htexttt{?} means \say{match one or none of the preceding token} \\
- The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}. \\
+ \htexttt{?} means \say{match one or none of the preceding token} \par
+ The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}.
 \vspace{2mm}
- Brackets \htexttt{\{min, max\}} are the most flexible quantifier. \\
- They specify exactly how many tokens to match: \\
- \htexttt{ab\{2\}a} will match only \texttt{abba}. \\
- \htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \\
+ Braces \htexttt{\{min, max\}} are the most flexible quantifier. \par
+ They specify how many tokens to match, either exactly or as a range: \par
+ \htexttt{ab\{2\}a} will match only \texttt{abba}. \par
+ \htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \par
 \htexttt{ab\{2,\}a} will match any \texttt{ab...ba} with at least two \texttt{b}s.
 \vspace{5mm}
@@ -83,52 +88,52 @@
- \textbf{Characters, Sets, and Groups} \\
- We specify characters literally, as shown above: \\
- \texttt{a+} means \say{one or more \texttt{a} character} \\
+ \textbf{Characters, Sets, and Groups} \par
+ In the previous section, we saw how we can specify characters literally: \par
+ \texttt{a+} means \say{one or more \texttt{a} characters}
 \vspace{2mm}
- There are, however, other ways we can specify characters. \\
+ There are, of course, other ways we can specify characters.
 \vspace{2mm}
- The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set can pretend to be any character inside it. \\
- For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. \\
- \htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \\
- You may negate a set with a \htexttt{\textasciicircum}. \\
+ The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set matches any single character listed inside it. \par
+ For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. \par
+ \htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \par
+ You may negate a set by placing a \htexttt{\textasciicircum} right after the opening bracket. \par
 \htexttt{[\textasciicircum abc]} will match any character except \texttt{a}, \texttt{b}, or \texttt{c}, including symbols and spaces.
 \vspace{2mm}
- If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \\
+ If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \par

- Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \\
- \htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}. \\
+ Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \par
+ \htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}.
 \footnotetext{In other words, \say{unbreakable}}
 \problem{}
- You are now familiar with most of the tools regex has to offer. \\
+ You are now familiar with most of the tools regex has to offer. \par
 Write patterns that match the following strings:
 \begin{enumerate}[itemsep=1mm]
- \item An ISO-8601 date, like \texttt{2022-10-29}. \\
+ \item An ISO-8601 date, like \texttt{2022-10-29}. \par
 \hint{Invalid dates like \texttt{2022-13-29} should also be matched.}
- \item An email address. \\
+ \item An email address. \par
 \hint{Don't forget about subdomains, like \texttt{math.ucla.edu}.}
 \item A UCLA room number, like \texttt{MS 5118} or \texttt{Kinsey 1220B}.
- \item Any ISBN-10 of the form \texttt{0-316-00395-7}. \\
+ \item Any ISBN-10 of the form \texttt{0-316-00395-7}. \par
 \hint{Remember that the check digit may be an \texttt{X}. Dashes are optional.}
- \item A word of even length. \\
+ \item A word of even length. \par
 \hint{The set \texttt{[A-z]} contains every english letter, capitalized and lowercase. \\ \texttt{[a-z]} will only match lowercase letters.}
- \item A word with exactly 3 vowels. \\
+ \item A word with exactly 3 vowels. \par
 \hint{The special token \texttt{\textbackslash w} will match any word character. It is equivalent to \texttt{[A-z0-9\_]} \\ \texttt{\_} stands for a literal underscore.}
 \item A word that has even length and exactly 3 vowels.
@@ -145,6 +150,7 @@
 \problem{}
- If you'd like to know more, check out \texttt{regexr.com}. It offers an interative regex prompt, as well as a cheatsheet that explains every other regex token there is. You will find a nice set of challenges at \texttt{http://regex.alf.nu}. \\
+ If you'd like to know more, check out \url{https://regexr.com}. It offers an interactive regex prompt, as well as a cheat sheet that explains the many other tokens regex supports. \par
+ You will find a nice set of challenges at \url{https://alf.nu/RegexGolf}.
 I especially encourage you to look into this if you are interested in computer science.
 \end{document}
\ No newline at end of file
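The match claims made in the updated warm-up can be spot-checked mechanically. The sketch below assumes Python 3 and its standard `re` module, and uses `re.fullmatch` so that a pattern must match the whole string, which mirrors a DFA accepting or rejecting an entire input. The patterns and the matching strings come from the text; the extra rejected strings (such as `lineaar` and `abbbba`) and the helper name `check` are illustrative assumptions. The last line illustrates a caveat in the unchanged hints: in ASCII, the range `[A-z]` also covers the six characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_`, and the backtick), so `[A-Za-z]` is the letters-only set, and `\w` in ASCII mode is equivalent to `[A-Za-z0-9_]` rather than `[A-z0-9_]`.

```python
import re

# (pattern, strings the text says it matches, strings it should reject).
# Patterns and matching strings come from the warm-up; the extra rejected
# strings (e.g. "lineaar", "abbbba") are illustrative assumptions.
CASES = [
    (r"ca+t", ["cat", "caat", "caaaaaaaat"], ["ct"]),
    (r"ca*t", ["cat", "caat", "caaaaaaaat", "ct"], []),
    (r"linea?r", ["linear", "liner"], ["lineaar"]),
    (r"ab{2}a", ["abba"], ["aba", "abbba"]),
    (r"ab{1,3}a", ["aba", "abba", "abbba"], ["aa", "abbbba"]),
    (r"ab{2,}a", ["abba", "abbbbbba"], ["aba"]),
    (r"m[aoy]th", ["math", "moth", "myth"], ["meth"]),
    (r"a[01]+b", ["a0b", "a111b", "a1100110b"], ["ab", "a2b"]),
    (r"[^abc]", ["d", "?", " "], ["a", "b", "c"]),
    (r"a(01)+b", ["a01b", "a010101b"], ["a0b", "a1b", "a1100110b"]),
]


def check(cases):
    """Return (pattern, string, expected, actual) for every claim that fails.

    re.fullmatch requires the pattern to match the *whole* string, which is
    the DFA-style notion of "matches" used in the warm-up.
    """
    failures = []
    for pattern, accepted, rejected in cases:
        for s in accepted:
            if re.fullmatch(pattern, s) is None:
                failures.append((pattern, s, True, False))
        for s in rejected:
            if re.fullmatch(pattern, s) is not None:
                failures.append((pattern, s, False, True))
    return failures


if __name__ == "__main__":
    problems = check(CASES)
    for pattern, s, expected, actual in problems:
        print(f"{pattern!r} on {s!r}: expected {expected}, got {actual}")
    print("all claims hold" if not problems else f"{len(problems)} failures")

    # Caveat for the [A-z] hint: in ASCII this range also includes
    # [ \ ] ^ _ and the backtick, which sit between 'Z' and 'a'.
    print(re.fullmatch(r"[A-z]", "_") is not None)
```

Running it should print `all claims hold` followed by `True`. Using `re.search` instead of `re.fullmatch` would report a match anywhere inside the string, which is how most regex tools behave by default but is a looser notion than the accept/reject reading above.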