diff --git a/src/Warm-Ups/Regex/main.tex b/src/Warm-Ups/Regex/main.tex deleted file mode 100644 index a8f48e7..0000000 --- a/src/Warm-Ups/Regex/main.tex +++ /dev/null @@ -1,153 +0,0 @@ -\documentclass[ - solutions, - hidewarning, -]{../../../lib/tex/ormc_handout} -\usepackage{../../../lib/tex/macros} - - -\usepackage{xcolor} -\usepackage{soul} -\usepackage{hyperref} - -\definecolor{Light}{gray}{.90} -\sethlcolor{Light} -\newcommand{\htexttt}[1]{\texttt{\hl{#1}}} - - -\title{The Regex Warm-Up} -\uptitler{\smallurl{}} -\subtitle{Prepared by Mark on \today} - -\begin{document} - - \maketitle - - - Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \par - This is often abbreviated \say{regex}, which is pronounced like \say{gif.} - - \vspace{2mm} - - Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \par - - Often enough, a clever regex pattern can do the work of a few hundred lines of code. - - \vspace{2mm} - - Like the DFAs we've studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology with regex, and instead say that a string \textit{matches} or \textit{doesn't match} a pattern. - - \vspace{5mm} - - Regex strings consist of characters, quantifiers, sets, and groups. - - \vspace{5mm} - - \textbf{Quantifiers} \par - Quantifiers specify how many of a character to match. 
\par - There are four of these: \htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}} - - \vspace{2mm} - - \htexttt{+} means \say{match one or more of the preceding token} \par - \htexttt{*} means \say{match zero or more of the preceding token} - - For example, the pattern \htexttt{ca+t} will match the following strings: - \begin{itemize} - \item \texttt{cat} - \item \texttt{caat} - \item \texttt{caaaaaaaat} - \end{itemize} - \htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \par - The pattern \htexttt{ca*t} will match all the strings above, including \texttt{ct}. - \vspace{2mm} - - - \htexttt{?} means \say{match one or none of the preceding token} \par - The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}. - \vspace{2mm} - - Brackets \htexttt{\{min, max\}} are the most flexible quantifier. \par - They specify exactly how many tokens to match: \par - \htexttt{ab\{2\}a} will match only \texttt{abba}. \par - \htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \par - % spell:off - \htexttt{ab\{2,\}a} will match any \texttt{ab...ba} with at least two \texttt{b}s. - % spell:on - - \vspace{5mm} - - \problem{} - Write the patterns \htexttt{a*} and \htexttt{a+} using only \htexttt{\{ \}}. - \vfill - - \problem{} - Draw a DFA equivalent to the regex pattern \htexttt{01*0}. - \vfill - - \pagebreak - - - - - - - \textbf{Characters, Sets, and Groups} \par - In the previous section, we saw how we can specify characters literally: \par - \texttt{a+} means \say{one or more \texttt{a} character} - - \vspace{2mm} - - There are, of course, other ways we can specify characters. - - \vspace{2mm} - - The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set can pretend to be any character inside it. \par - For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. 
\par - \htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \par - You may negate a set with a \htexttt{\textasciicircum}. \par - \htexttt{[\textasciicircum abc]} will match any character except \texttt{a}, \texttt{b}, or \texttt{c}, including symbols and spaces. - - \vspace{2mm} - - If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \par - - Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \par - \htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}. - - \footnotetext{In other words, \say{unbreakable}} - - - \problem{} - You are now familiar with most of the tools regex has to offer. \par - Write patterns that match the following strings: - \begin{enumerate}[itemsep=1mm] - \item An ISO-8601 date, like \texttt{2022-10-29}. \par - \hint{Invalid dates like \texttt{2022-13-29} should also be matched.} - - \item An email address. \par - \hint{Don't forget about subdomains, like \texttt{math.ucla.edu}.} - - \item A UCLA room number, like \texttt{MS 5118} or \texttt{Kinsey 1220B}. - - \item Any ISBN-10 of the form \texttt{0-316-00395-7}. \par - \hint{Remember that the check digit may be an \texttt{X}. Dashes are optional.} - - \item A word of even length. \par - \hint{The set \texttt{[A-z]} contains every english letter, capitalized and lowercase. \\ - \texttt{[a-z]} will only match lowercase letters.} - - \item A word with exactly 3 vowels. \par - \hint{The special token \texttt{\textbackslash w} will match any word character. It is equivalent to \texttt{[A-z0-9\_]} \\ \texttt{\_} stands for a literal underscore.} - - \item A word that has even length and exactly 3 vowels. - - \item A sentence that does not start with a capital letter. 
- \end{enumerate} - - \vfill - - \problem{} - If you'd like to know more, check out \url{https://regexr.com}. It offers an interactive regex prompt, as well as a cheatsheet that explains every other regex token there is. \par - You will find a nice set of challenges at \url{https://alf.nu/RegexGolf}. - I especially encourage you to look into this if you are interested in computer science. -\end{document} \ No newline at end of file diff --git a/src/Warm-Ups/Regex/main.typ b/src/Warm-Ups/Regex/main.typ new file mode 100644 index 0000000..74b2a77 --- /dev/null +++ b/src/Warm-Ups/Regex/main.typ @@ -0,0 +1,138 @@ +#import "@local/handout:0.1.0": * + +#show: doc => handout( + doc, + quarter: link( + "https://betalupi.com/handouts", + "betalupi.com/handouts", + ), + + title: [The Regex Warm-Up], + by: "Mark", +) + + +Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \ +This is often abbreviated "regex," which is pronounced like "gif." + +#v(2mm) + +Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \ + +Often enough, a clever regex pattern can do the work of a few hundred lines of code. + +#v(2mm) + +Like the DFAs we've studied, a regex pattern _accepts_ or _rejects_ a string. However, we don't usually use this terminology with regex, and instead say that a string _matches_ or _doesn't match_ a pattern. + +#v(5mm) + +Regex strings consist of characters, quantifiers, sets, and groups. + +#v(5mm) + + + +*Quantifiers* \ +Quantifiers specify how many of a character to match. \ +There are four of these: `+`, `*`, `?`, and `{ }`. 
+ +#v(4mm) + +`+` means "match one or more of the preceding token" \ +`*` means "match zero or more of the preceding token" + +For example, the pattern `ca+t` will match the following strings: +- `cat` +- `caat` +- `caaaaaaaat` +`ca+t` will *not* match the string `ct`. \ +The pattern `ca*t` will match all the strings above, including `ct`. + + +#v(4mm) + + +`?` means "match one or none of the preceding token" \ +The pattern `linea?r` will match only `linear` and `liner`. + +#v(4mm) + +Brackets `{min, max}` are the most flexible quantifier. \ +They specify exactly how many tokens to match: \ +`ab{2}a` will match only `abba`. \ +`ab{1,3}a` will match only `aba`, `abba`, and `abbba`. \ +`ab{2,}a` will match any `ab...ba` with at least two `b`s. + +#problem() +Write the patterns `a*` and `a+` using only `{ }`. +#v(1fr) + +#problem() +Draw a DFA equivalent to the regex pattern `01*0`. +#v(1fr) + +#pagebreak() + + + + + + +*Characters, Sets, and Groups* \ +In the previous section, we saw how we can specify characters literally: \ +`a+` means "one or more `a` characters" \ +There are, of course, other ways we can specify characters. + +#v(4mm) + +The first such way is the _set_, denoted `[ ]`. A set can pretend to be any character inside it. \ +For example, `m[aoy]th` will match `math`, `moth`, or `myth`. \ +`a[01]+b` will match `a0b`, `a111b`, `a1100110b`, and any other similar string. \ + +#v(4mm) + +We can negate a set with a `^`. \ +`[^abc]` will match any single character except `a`, `b`, or `c`, including symbols and spaces. + +#v(4mm) + +If we want to keep characters together, we can use the _group_, denoted `( )`. \ + +Groups work exactly as you'd expect, representing an atomic#footnote([In other words, "unbreakable"]) group of characters. \ +`a(01)+b` will match `a01b` and `a010101b`, but will *not* match `a0b`, `a1b`, or `a1100110b`. + +#problem() +You are now familiar with most of the tools regex has to offer. 
\ +Write patterns that match the following strings: + +- An ISO-8601 date, like `2022-10-29`. \ + #hint([Invalid dates like `2022-13-29` should also be matched.]) + +- An email address. \ + #hint([Don't forget about subdomains, like `math.ucla.edu`.]) + +- A UCLA room number, like `MS 5118` or `Kinsey 1220B`. + +- Any ISBN-10 of the form `0-316-00395-7`. \ + #hint([Remember that the check digit may be an `X`. Dashes are optional.]) + +- A word of even length. \ + #hint([The set `[A-Za-z]` contains every English letter, uppercase and lowercase. \ + `[a-z]` will only match lowercase letters.]) + +- A word with exactly 3 vowels. \ + #hint([The special token `\w` will match any word character. \ + For ASCII text, it is equivalent to `[A-Za-z0-9_]`. `_` represents a literal underscore. +]) + +- A word that has even length and exactly 3 vowels. + +- A sentence that does not start with a capital letter. +#v(1fr) + +#problem() +If you'd like to know more, check out `https://regexr.com`. +It offers an interactive regex prompt, +as well as a cheatsheet that explains every other regex token there is. \ +You can find a nice set of challenges at `https://alf.nu/RegexGolf`.
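The match and non-match claims made in the handout can be checked mechanically. The sketch below is not part of the handout or of this change. It assumes Python's standard `re` module and uses `re.fullmatch`, so that a pattern must describe the whole string, which mirrors a DFA accepting or rejecting its entire input. The names `EXAMPLES` and `check_examples` are hypothetical, and the extra rejected strings (such as `cbt` and `meth`) are illustrative additions rather than examples from the text.

```python
import re

# Each entry maps a pattern from the handout to a pair:
# (strings that should match, strings that should not match).
# Strings not mentioned in the handout are illustrative assumptions.
EXAMPLES = {
    r"ca+t": (["cat", "caat", "caaaaaaaat"], ["ct"]),
    r"ca*t": (["ct", "cat", "caat"], ["cbt"]),
    r"linea?r": (["linear", "liner"], ["lineaar"]),
    r"ab{2}a": (["abba"], ["aba", "abbba"]),
    r"ab{1,3}a": (["aba", "abba", "abbba"], ["aa", "abbbba"]),
    r"ab{2,}a": (["abba", "abbbbba"], ["aba"]),
    r"m[aoy]th": (["math", "moth", "myth"], ["meth"]),
    r"a[01]+b": (["a0b", "a111b", "a1100110b"], ["ab"]),
    r"[^abc]": (["d", " ", "!"], ["a", "b", "c"]),
    r"a(01)+b": (["a01b", "a010101b"], ["a0b", "a1b", "a1100110b"]),
}


def check_examples(examples):
    """Return (pattern, string, expected_match) triples that disagree with re.fullmatch."""
    failures = []
    for pattern, (accepted, rejected) in examples.items():
        for text in accepted:
            # fullmatch returns None when the whole string does not fit the pattern.
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text, True))
        for text in rejected:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text, False))
    return failures


if __name__ == "__main__":
    # An empty list means every claim above agrees with Python's regex engine.
    print(check_examples(EXAMPLES))
```

`re.fullmatch` is used instead of `re.search` because `re.search` would report a match whenever the pattern occurs anywhere inside the string, which is a weaker claim than the handout's "matches only".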